Saturday, March 22, 2008
By John Kohl, SAS Institute (reprinted with permission from Client Side News)
In a previous article, Uwe Muegge speculated about why we don’t hear about more companies using controlled languages. According to Muegge, “anyone new to the field may have a hard time finding reliable, vendor-independent information on what [controlled-language] solutions are available and what the costs and benefits of deploying those solutions are.”
We found that to be true in our investigations at SAS Institute as well, but at least part of the problem is in the interpretation of “controlled language.” In the course of our investigation, we found that the technology has evolved to controlled-authoring, and that it is no longer limited to helping authors conform to a strictly controlled language such as Simplified Technical English. Instead, many companies use controlled-authoring software to:
Our investigations led us to one such controlled-authoring product: acrocheck. The acrocheck suite of Content Quality Management tools is based on a Natural Language Processing engine that evolved over the course of 15 years of research and development at the German Research Center for Artificial Intelligence (DFKI) in Saarbruecken. Acrocheck is sold and supported by acrolinx GmbH in Berlin, with offices in the US.
Overview of SAS
SAS is the largest privately owned software company in the world, and it is the global leader in business intelligence and analytical software. It has 10,000 employees worldwide and annual revenues of about $1.9 billion. In our Documentation Division we have 53 technical writers and 12 editors. Of course, we have content creators in other divisions as well, but so far we have implemented acrocheck only in the Documentation Division.
Why acrocheck
Our implementation was motivated partly by the need to standardize and control terminology. In recent years, SAS products have become more integrated. We also began publishing documentation on the web with a consolidated index and full-text search. Terminology issues became more visible to us, and to customers, than ever before.
The intensified pace of globalization also meant that we had to find an efficient way of making our documentation more suitable for translation and easier for nonnative speakers of English to understand. To address this second issue, we have developed a detailed set of “Global English” guidelines. But even the best technical writers find it difficult to apply complex style guidelines or to consistently conform to lists of approved and deprecated terms. Deadlines and time pressures make it impractical for authors and editors to refer to style guides and glossaries frequently.
Since SAS is all about using technology to support business processes and decision-making, it is only natural that we would look for a technological solution to help our authors follow our style and terminology guidelines. We also anticipate that the increased consistency in our documentation will make the use of translation memory more effective, and that consistent terminology and phrasing will make our documentation more usable for all our audiences.
Implementation
We were fortunate to have an active executive-level champion, in addition to great management support throughout the company. To emphasize the goal of helping our authors communicate clearly and consistently, we used Assisted Writing and Editing (AWE) as the name of the project. Although we realize that we are controlling the English language to some degree, we wanted to avoid the negative connotation of the term “controlled authoring.” Besides, we’re not really depriving authors of anything that they can’t easily do without; we’re just helping them make optimal choices.
The rollout, which began in May of 2007, has gone quite smoothly. Overall, the response from our writers and editors has been extremely favorable. We’ve gotten quite a number of positive comments, including the following:
Because acrocheck gives authors immediate feedback on their own writing, they quickly learn to follow guidelines that they never quite grasped before. After an initial productivity hit, this training effect leads to the opposite: a significant productivity increase. Writers fix grammar, spelling, style and terminology issues early in the writing process, so there are fewer corrections to be made late in the documentation cycle, when the pressure to deliver is greatest. Because much of the copy editing work is now done during the writing process, our editors have more time to devote to more substantive issues.
Implementation Details
The whole implementation process, from the initial decision to proceed through our rollout, took a little more than a year, although I hasten to add that our experience was atypical. Most companies have done it in half that time, or less.
The first issue that extended the timeline was our extensive collection of approved and deprecated terms, for which we had some background information that we did not want to lose. Because that information was scattered across several places, it took a while to consolidate it into one Excel spreadsheet that we could then use as the basis for an acrocheck term bank. Then we had to specify what the Help topics for those terms should look like, because we wanted to structure them differently from acrocheck’s default approach.
Second, SAS documentation contains many oddities that we had to “teach” acrocheck to handle. For example, acrocheck initially interpreted the word “%tmfilter” (the name of a software concept called a macro) as two “tokens”: the percent sign and “tmfilter.” That issue became apparent when “tmfilter” was flagged as a spelling error, as if it had no percent sign attached to it. Acrolinx defined dozens of token classes such as “PercentLowercaseWord” for us so that acrocheck would recognize that these “word shapes” were single terms and that they should not be checked for spelling. According to the chief linguist at acrolinx, SAS has more token classes than any other acrolinx customer!
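The idea of a token class can be illustrated with a minimal Python sketch. It is not how acrocheck is implemented; the regular expressions, the "Word" and "Other" classes, and the function names are assumptions made for illustration. Only the class name "PercentLowercaseWord" comes from the article.

```python
import re

# Hypothetical token classes, tried in order. Only "PercentLowercaseWord"
# is named in the article; the patterns and other classes are assumptions.
TOKEN_CLASSES = [
    ("PercentLowercaseWord", re.compile(r"%[a-z]+")),
    ("Word", re.compile(r"[A-Za-z]+")),
    ("Other", re.compile(r"\S")),  # any single non-space character
]

# Token classes that the spelling check should skip.
NO_SPELLCHECK = {"PercentLowercaseWord", "Other"}


def tokenize(text):
    """Return (token_class, token) pairs, using the first class that matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern in TOKEN_CLASSES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
    return tokens


def spelling_candidates(text):
    """Return only the tokens that should be sent to the spelling check."""
    return [tok for name, tok in tokenize(text) if name not in NO_SPELLCHECK]
```

With these definitions, "%tmfilter" is kept as one token and is never passed to the spelling check, whereas a tokenizer without the first class would split off the percent sign and treat "tmfilter" as an ordinary word.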
Third, a few months into our implementation, acrolinx rebuilt their batch client, which we planned to use for checking HTML documents. We gladly accepted a two-month delay because the new client included support for checking SGML documents. That was a huge benefit to SAS. We are moving to an XML-based publishing system, but a lot of our content is still authored in SGML.
Fourth, we tested and optimized the acrocheck rules quite thoroughly in order to reduce “false alarms” to a minimum. In hindsight, it wasn’t necessary to be so thorough, because most of the standard acrocheck rules are quite accurate “out of the box,” but at the time we were perhaps more concerned about user acceptance than we needed to be.
To facilitate testing, we assembled a large collection of our documentation as our test corpus, which took quite a while. We ended up with 64,000 files and 17,000,000 words in our collection. We used a 1,000,000-word subset of the collection for early testing of new rules that we asked acrolinx to develop. We would run acrocheck in batch mode against the entire collection, review the output from each rule, work with acrolinx to make corrections, and test again.
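The test loop described above (take a subset, run every rule over every file, then review the output rule by rule) can be sketched roughly in Python. This is a generic illustration, not the acrocheck batch client: the function names, the in-memory dictionary of documents, and the toy "doubled_word" rule are all assumptions.

```python
from collections import defaultdict


def take_subset(documents, max_words):
    """Collect documents in order until adding another would exceed max_words."""
    subset, total = {}, 0
    for name, text in documents.items():
        count = len(text.split())
        if total + count > max_words:
            break
        subset[name] = text
        total += count
    return subset


def run_batch(documents, rules):
    """Apply every rule to every document and group the flags by rule name."""
    flags_by_rule = defaultdict(list)
    for name, text in documents.items():
        for rule_name, rule in rules.items():
            for flagged in rule(text):
                flags_by_rule[rule_name].append((name, flagged))
    return dict(flags_by_rule)


def doubled_words(text):
    """Toy rule (hypothetical): flag a word that is immediately repeated."""
    words = text.lower().split()
    return [a for a, b in zip(words, words[1:]) if a == b]


# Tiny stand-in corpus; the file names and contents are made up.
docs = {
    "a.htm": "This is is a test",
    "b.htm": "All fine here",
    "c.htm": "the the end",
}
report = run_batch(docs, {"doubled_word": doubled_words})
```

Grouping the flags by rule rather than by file mirrors the review step in the text: each rule's output can be read in one pass to judge how many of its flags are false alarms.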
Most of the refinements to the grammar and style rules reflect the nature of our content. For example, acrocheck flags the following sentence as an error because “the at” seems to be an ungrammatical sequence of words:
The remaining seven characters can include letters, digits, underscores, the dollar sign ($), or the at sign (@).
But you can prevent that “false alarm” from being triggered by modifying the rule so that it ignores any occurrence of “the at” that is immediately followed by ...
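A rule exception of this general kind can be sketched in Python as follows. This is not acrocheck's rule syntax, and the exact condition used in the original rule is not shown in the text, so the set of following words that suppress the flag is left as a parameter. The function name and the value passed in the usage lines are assumptions chosen to match the example sentence.

```python
import re


def flag_the_at(sentence, allowed_followers=()):
    """Flag each "the at" unless the word right after it is in allowed_followers."""
    flags = []
    pattern = r"\bthe at\b(?:\s+(\w+))?"
    for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
        follower = (match.group(1) or "").lower()
        if follower not in allowed_followers:
            flags.append(match.group(0))
    return flags


example = (
    "The remaining seven characters can include letters, digits, "
    "underscores, the dollar sign ($), or the at sign (@)."
)

# Without an exception the sequence is flagged.
unfiltered = flag_the_at(example)

# With an illustrative exception (an assumption, not the original rule)
# the same sentence passes.
filtered = flag_the_at(example, allowed_followers={"sign"})
```

Keeping the exception list outside the rule logic means a new false alarm can be handled by adding a word to the list rather than rewriting the rule.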
Thursday, April 10, 2008
Sworn members of The Typo Eradication Advancement League (TEAL), folks who pledge to dedicate themselves to “a more perfectly spelling union”, have been traveling across the US stamping out as many typos as they can find. While a typo-free society may be a long way off, TEAL members say they believe “that only through working together with vigilance and a love of correctness can we achieve the beauty of a typo-free society.”
TEAL ends its cross-country adventure in May. Read the TEAL blog for daily updates, photographs of the typos spotted, and humorous stories about the project and the people they met along the way.
Wednesday, September 26, 2007
Scott Abel, The Content Wrangler, will give his wildly popular presentation, Web 2.0 and Its Impact on Technical Communication, at LavaCon in New Orleans, October 27–30, 2007. Before the program, participants are invited to see the devastation inflicted on the Big Easy by Hurricane Katrina and to participate in a Community Service program.
