Feature Article

Tuesday, April 27, 2004

DITA: An XML Architecture For Publishing Technical Information

From the Arbortext Publishing Network

DITA is one of the most important innovations in XML publishing in recent memory. And if you’re using or plan to use XML for publishing technical documentation, you will encounter DITA sooner or later.

Short for “Darwin Information Typing Architecture,” DITA is an IBM invention that the company recently contributed to the community under the auspices of OASIS, the Organization for the Advancement of Structured Information Standards. (More information is available from the technical committee that OASIS formed for DITA.)

DITA is an architecture based on XML for publishing technical information. In that sense, it’s like DocBook. But there are two aspects of DITA that make it special:

  • Modular – DITA defines a Topic DTD that supports a modular approach to creating information. A topic is an information component, not a complete book. A topic covers one aspect of a specific area of interest. (For example, this article could be divided into three topics: introduction, overview, and origins.) DITA defines a mechanism for combining topics into documents so that the documents contain a hierarchy that is appropriate for the document. For example, the hierarchy of a book usually consists of chapters, sections and sub-sections.
  • Adaptable – The Topic DTD is similar to HTML in that it specifies a set of generic elements, such as titles, paragraphs and lists, each with its own formatting. To adapt the Topic DTD to your specific needs, DITA defines a mechanism called “specialization” that allows you to define new tags that inherit their behavior and properties from tags in Topic.
    Specialization allows downstream applications that are DITA-aware to handle an unknown tag by treating it as the tag from which it inherits its properties. For example, you could create a tag called “Procedure” that inherits from “Ordered List” and a tag called “Step” that inherits from “List Item.” Although you may want to add specific processing for Procedure and Step to your application, a DITA-aware application that knows nothing about these tags would handle them as if they were Ordered List and List Item tags instead. For instance, a DITA-aware publishing application that knows nothing about Step would format it as if it were a List Item.
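The fallback behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not DITA tooling: it assumes each element carries its ancestry in a class attribute written from most general to most specialized, in the style DITA uses, and the “proc” module, the tag names, and the functions resolve_type and render are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Element types this hypothetical application knows how to format.
KNOWN_TYPES = {"topic/ol", "topic/li"}


def resolve_type(class_attr, known=KNOWN_TYPES):
    """Return the most specialized type in class_attr that the application knows."""
    # Tokens run from most general to most specialized; the leading "-" is a marker.
    tokens = [t for t in class_attr.split() if "/" in t]
    for token in reversed(tokens):
        if token in known:
            return token
    return None


def render(element):
    """Format an element as plain text according to its resolved base type."""
    kind = resolve_type(element.get("class", ""))
    if kind == "topic/ol":
        items = [render(child) for child in element]
        return "\n".join(f"{n}. {text}" for n, text in enumerate(items, 1))
    if kind == "topic/li":
        return (element.text or "").strip()
    raise ValueError(f"no known base type for <{element.tag}>")


# Assumed markup: Procedure specializes Ordered List, Step specializes List Item.
SAMPLE = """
<procedure class="- topic/ol proc/procedure ">
  <step class="- topic/li proc/step ">Open the cover.</step>
  <step class="- topic/li proc/step ">Replace the cartridge.</step>
</procedure>
"""

if __name__ == "__main__":
    print(render(ET.fromstring(SAMPLE.strip())))
```

The application has no entry for proc/procedure or proc/step, so it formats them as an ordered list of list items. Adding "proc/step" to KNOWN_TYPES, together with a matching branch in render, is how specific processing for Step would be layered on later.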

Knowing the definition of DITA does not give you enough background for understanding its implications. So now let’s start the story.

From Necessity to Invention – The Origins of DITA

Let’s begin by re-visiting some of the key objectives of an XML publishing system:

  • Reuse – To eliminate redundancy, improve accuracy, and reduce the effort to update information, XML helps you reuse and repurpose information so that you can create a single source of truth
  • Sharing – XML lets you construct your information in a way that allows other groups both within and outside your organization to incorporate your information seamlessly into their own processes, adding further value to the information you create
  • Relevance – You can use XML to create your information in modules that you assemble automatically according to the needs of each reader, so that each reader gets everything needed and nothing more
  • Automation – To achieve these objectives cost-effectively, automation holds the key; XML makes that automation possible by allowing you to enforce the absolutely consistent structure that automated processes require

This “absolutely consistent structure” is defined by the “data model” – the DTD or Schema that prescribes which tags are allowed in your documents and how those tags may be used.
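To make the idea of a data model concrete, here is a deliberately simplified Python sketch. It assumes a toy model that records only which child tags each tag may contain; a real DTD or Schema also governs order, repetition, and attributes. The ALLOWED_CHILDREN table and the find_violations function are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data model: each tag maps to the set of child tags it may contain.
ALLOWED_CHILDREN = {
    "topic": {"title", "p", "ul"},
    "ul": {"li"},
    "title": set(),
    "p": set(),
    "li": set(),
}


def find_violations(element, model=ALLOWED_CHILDREN):
    """Return a list of messages for every tag used where the model forbids it."""
    if element.tag not in model:
        return [f"unknown tag <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in model[element.tag]:
            problems.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        else:
            # Only descend into children that are legal in this position.
            problems.extend(find_violations(child, model))
    return problems


if __name__ == "__main__":
    good = ET.fromstring("<topic><title>T</title><ul><li>a</li></ul></topic>")
    bad = ET.fromstring("<topic><title>T</title><ul><p>a</p></ul></topic>")
    print(find_violations(good))
    print(find_violations(bad))
```

An automated process can rely on a document only after a check of this kind comes back empty, which is why every change to the model ripples into the applications that depend on it.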

Now imagine you’re part of a huge company with an incredibly diverse product line and your responsibility is to oversee technical publishing for every single product. You’ve chosen to use XML because of its potential to achieve dramatic efficiencies in authoring, translation and publishing while enabling your organization to deliver more accurate, timely and relevant information to your customers.

At the heart of your system is a data model you designed for publishing technical documentation by every group within the company. Your company-wide approach to publishing will not only reduce your costs for obtaining publishing tools and developing publishing applications, it will also allow your company to present a consistent face to your customers while combining content for various products into documentation that’s precisely tailored to each customer’s needs.

As each group starts up, however, they find holes in your data model – they have needs you did not anticipate and they want changes. Each change affects not only the data model itself, but also all the downstream applications that rely on that data model – the most notable of which are assembly and publishing.

As more and more groups come on line and demand changes, and as existing groups find new opportunities that require further changes, you come to realize not only that the changes will never stop, but that you are falling behind – and will never be able to catch up.

Enter DITA

Given these challenges, you decide a new approach is in order. You need to find a way to serve diverse needs, adapt easily to new requirements, and support highly modular information, all while keeping things simple enough so that authors can become productive quickly.

You begin by creating a DTD that contains all of the common formatting constructs that technical publications require. And what better place to start than HTML, which has proven its flexibility across millions of Web pages? Titles, paragraphs, bulleted lists, numbered lists, italicized words – HTML represents all of these and more.

Your new DTD contains no tags specific to your business – just “generic” tags similar to those of HTML. Your new DTD also contains no document hierarchy – it represents only a module of information. (Later, you will come up with a way of combining Topic-based modules into complete documents for delivery to your customers.)
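The separation between hierarchy-free modules and the documents built from them can be sketched as follows. This Python example is an assumption-laden illustration rather than DITA’s own mechanism: the TOPICS store, the nested-list document format, and the outline function are all hypothetical, and the topic names come from the three-topic example given earlier in this article.

```python
# Hypothetical store of standalone topics, keyed by an identifier.
# Each topic knows nothing about where it sits in any document.
TOPICS = {
    "intro": "Introduction",
    "overview": "Overview",
    "origins": "Origins",
}

# A document is only a hierarchy of references: (topic_id, children) pairs.
ARTICLE = [
    ("intro", []),
    ("overview", [("origins", [])]),
]


def outline(entries, topics, prefix=""):
    """Return numbered outline lines for a hierarchy of topic references."""
    lines = []
    for n, (topic_id, children) in enumerate(entries, 1):
        number = f"{prefix}{n}"
        lines.append(f"{number} {topics[topic_id]}")
        # Children are numbered beneath their parent, e.g. 2.1, 2.2.
        lines.extend(outline(children, topics, prefix=f"{number}."))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ARTICLE, TOPICS)))
```

Because the hierarchy lives outside the topics, a second document could reference the same three topics in a different order or at different depths without touching their content.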

You give your DTD a name – “Topic” – and then you design a mechanism so that you can easily adapt Topic to your specific needs. Inspired by the work on “architectural forms,” you create a method for defining new tags based on inheriting properties from existing tags. This will allow any application that understands your syntax for defining new tags to process those tags.

Because you’re creating technical documentation, you then turn your attention to the specific types of information such documents require. Based on the principles of information architecture, around which exists a substantial body of research, understanding and experience, you define three specializations of Topic that will be the building blocks of your documents:

  • Concept
  • Task
  • Reference

These specializations are more than just your great idea – they represent another important innovation of DITA: creating data models that guide authors to create information that’s both easier to write and easier to understand. You will learn more about this in an upcoming article.
