Feature Article

Wednesday, September 24, 2008

Moving Legacy Content To XML: Affordable, Self-Service Analysis/Modeling Tools Needed, Survey Says

By Scott Abel, The Content Wrangler and The Content Wrangler Community

Moving to structured XML authoring is a challenge for many organizations, especially those with lots of legacy content locked away in unstructured, proprietary Microsoft Word files, a new survey of 500 content professionals found. The survey, conducted by The Content Wrangler, found widespread interest in self-service, web-based legacy content analysis tools: software that can analyze content and provide metrics-rich reports designed to help users make the business case for adopting XML authoring and component content management. It also uncovered the importance of security, the attractiveness of variable pricing, and the need for a free demonstration version of legacy content analysis software, according to feedback provided by the respondents.

The survey was sponsored by a software vendor and was designed to determine the level of interest among content professionals (especially technical communicators) in a product designed to analyze large sets of unstructured legacy content and uncover patterns in the content that humans alone cannot easily detect. According to the software vendor, the tool would also provide users with reports loaded with useful metrics designed to help make the business case for new ways of creating, managing and delivering content. Such a tool is needed, content management analysts say, because most organizations don’t know what it costs them to develop their content. Unless a strong business case can be made for change, analysts say, it’s unlikely an organization will receive funding to move to XML.

“Anything that suggests—even roughly—the amount of time, some metrics, or the number/type of resources necessary to get to some level of XML would be helpful,” volunteered one survey respondent. “The hardest thing to sell to management is the time and cost necessary to get to XML.”

“As a manager, I often find I’m making the business case to spend money to make my ‘real’ business case,” offered another respondent.

The survey questions and responses (by percentage and number of respondents) are listed below.

Question One: How interested would you be in a web-based tool that could analyze legacy unstructured documents (Microsoft Word files) and provide you with meaningful metrics designed to help you make the business case for change?

Results

More than 80% of survey respondents (414) said they would be either “very interested” (199) or “interested” (215) in a web-based tool that could analyze legacy unstructured documents (Microsoft Word files). However, a large number indicated that they would also like the tool to analyze unstructured Adobe FrameMaker files. FrameMaker is a popular long-document creation and publishing tool that has been in widespread use in the technical communication industry for more than a decade. A small percentage of respondents also mentioned the need for the tool to analyze other unstructured content file types (PDF, EPS, Adobe InDesign, RTF, OpenDocument, HTML, FrontPage, SGML, Adobe RoboHelp). Two respondents said they need a tool that can also analyze wiki content.

“There aren’t strong enough words available to suitably emphasize how interested I’d be in a tool like this that really delivered on its promises,” said one respondent.

Question Two: Which of the following features (below) of the proposed new product is most important to you? Which is the least important? Please rank from 1 (least important) to 5 (most important).

Features:

  • Self-service (you upload documents and get report delivered via email)
  • Secure site (using a commonly used web security protocol)
  • Browser agnostic (works on IE, Firefox, Safari)
  • A free demonstration version (so you can try before you buy)
  • Memory (reports are saved for retrieval at later date)

Results

Of the five features survey respondents were asked to rank, “secure site” and “self-service” tied as the most important, though they led the other features only narrowly. In fact, many respondents said that it was difficult to rank these five proposed features because they are all of nearly equal importance, which the survey results support. Comments provided by the respondents point out the importance of “a free demonstration version” in making the business case to management and establishing confidence that the product actually does what the marketing and sales folks claim it does.

“I feel a trial version is really a standard part of the sales process,” said one respondent. “A demo version is a big deal when showing off a product to management,” another wrote.

Other suggestions for features:

  • A configurable reporting engine (to allow users to create customized reports)
  • A file comparison tool (to compare and report upon the various differences detected in different versions of the same document set)
  • Download capability (to save the results locally, preferably in popular file formats such as Excel or Access)
  • Online file memory (your results are stored on a secure, password-protected website for later retrieval)
  • Support for multilingual content

Question Three: Would the following attributes make you more or less interested in purchasing such a product?

Features:

  • Variable pricing (fees based on quantity of documents analyzed)
  • Special pricing for independent content consultants
  • Support for analyzing documents in multiple languages
  • Help making the business case for converting legacy content to XML
  • Help creating information models to support structured XML authoring

Results

The results of this question are no surprise to technical communication consultants and structured content professionals. While making the business case is always a challenge, survey respondents also admit needing help creating information models to support structured XML authoring, and if a content analysis tool can provide that help, they’ll take it.

Help is needed, some analysts say, because far too many organizations have tackled moving to structured XML authoring without understanding the changes the new paradigm introduces. Ultimately, they say, these projects fail.

“XML is not a good word where I work,” wrote one survey respondent. “The group that uses XML single-source authoring is a mess. As a result, XML will not be a selling point to management.”

“Help is the keyword here,” one respondent shared. “The more help tools can offer in this process (creation of information models) the better.”

Independent consultants, presumably with more experience creating information models, shared markedly different comments.

“As a consultant,” one respondent said, “creating good information models for my clients is my job, but I guess I wouldn’t mind having this feature.”

Question Four: How likely are you to buy this product?

Results

More than half of the respondents said they would be “very likely” (9.6%) or “somewhat likely” (46.4%) to purchase a self-service, web-based legacy content analysis tool. Of course, “price” and, close behind it, the ability to “make the business case to management” were the key deciding factors, survey respondents said.

“Depends on price. It may do everything I want, but if it costs a million bucks, I’m not going to buy it,” shared one respondent.

“Even if a tool is helpful, it can still be difficult to convince the company to purchase. Sadly, I have to make the business case in order to get the money to make the business case,” said one respondent. “It’s difficult to get management to spend money on a tool to analyze (unstructured content) when they don’t understand the need to move to structured authoring in the first place,” wrote another.

Even in organizations that claim they have already made the business case, funding can be a challenge.

“Even though we have already made the business case, we still don’t have the level of funding I think we’ll need,” one respondent said. “Just because a tool exists does not mean the budget does.”

Question Five: Are there additional attributes that you would want incorporated into this product?

Results

More than 150 of the 500+ survey respondents offered a wide range of suggestions for additional features. A small sampling follows:

  • Easy for non-technical people to run and understand
  • Ability to publish results to the web
  • Integration with authoring tools (Adobe FrameMaker, Adobe RoboHelp, various help authoring tools)
  • Free, top-notch technical support, online help, and training
  • Ability to see fuzzy and identical matches in order to edit documents for optimal reusability prior to conversion to XML
  • Integration with computer-assisted translation tools
  • Support for analyzing the text in graphics (for example, text in vector graphics—EPS and SVG)
  • Ability to compare images, photos, illustrations
  • Support for Macintosh computers
  • Additional help, perhaps tutorials, on how to convert XML content to the desired DTD or schema
  • Reports that identify possibilities for content reuse
  • Ability to point the software at a document repository and have it scan the files (instead of uploading them to a web-based tool)
  • Access to aggregate report data (from all customers) to establish benchmarks
  • Statistics on quality aspects of the content included in the reports generated
  • Integration with popular content management systems
  • Ability to apply stylesheets to the converted content
  • Ranking of legacy documents based on the amount of effort needed to convert each one (important when analyzing a large number of files)
  • Provide a “map” of the files created
  • Integration with the DITA Open Toolkit

Additional resources

If you are interested in learning more about structured XML authoring and related topics, consider joining the Writing for Reuse group on The Content Wrangler Community.

Drawing winners

The survey offered respondents a chance to win one of ten free tickets [see the winners list] to the Documentation and Training 2008 East conference, held October 29-November 1, 2008 in Burlington, MA.

Related article: Paradigm Shifts are Never Pretty: Advice on Making the Move to XML Authoring
