Feature Article

Friday, April 11, 2008

Choosing an XML Schema: DocBook or DITA?

By Richard Hamilton, special to The Content Wrangler (reprinted with permission)

If you follow the latest trends or have been to a conference recently, you may find the idea of choosing an XML schema puzzling.  Isn’t the question really, “How should I customize DITA to do what I want?”  While there are many good reasons to choose DITA, it’s not the only schema in town.

The two most popular schemas at the moment are DocBook and DITA, and I’ll use them as examples.  There are other choices—S1000D and TEI come to mind—but the chances are good that if you’re not in an industry that mandates a particular schema, you’ll end up using DocBook or DITA.

Full disclosure: I’m a long-time DocBook user (this article was authored in DocBook) and a member of the OASIS DocBook Technical Committee. That makes me at least somewhat partisan. I’ll do my best to be even-handed, but you should know where I’m coming from.

Your decision will depend on the following considerations:

  • Content:  Is your content narrative (books, articles, etc.), modular (topics, reference pages, help pages, etc.), or both?

  • Deliverables:  Do you deliver printed documentation, web pages, help systems, or all of the above?  And, how important is each type relative to the others?

  • Customization:  How specialized is your content?  Do you need to create new markup unique to your application?

  • Scale:  How much content do you need to manage and how many writers do you have working on that content?

Let’s consider each of these in turn.

Content

Common wisdom is that if you develop narrative content, you should use DocBook, and if you develop modular or topic-based content, you should use DITA.  While this is true to an extent, it’s misleading.  You can write books using DITA and modular content using DocBook.

That said, there are important differences, and you will probably find that one or the other will be a more natural fit for your content.  To look for the best fit, you need to consider two kinds of markup:

  • Structural:  Structural markup defines the organization of your content.  It includes markup to identify sections, modules, chapters, or books, as well as markup that builds larger structures from smaller pieces.

    DITA is designed for modular, topic-oriented content. Typically, writers create individual topics, which are then aggregated into deliverables of various kinds using a “ditamap.”

    DocBook was originally designed to support documentation structured like a book, with front matter, chapters, and back matter (appendices, glossary, index, etc.).  However, it has evolved over the years to support a much wider variety of structures.

  • Inline:  Inline markup lets you tag pieces of content, typically to define their semantics.  For example, in this article, inline markup is used to identify things like links, author information, and book titles.

    DITA and DocBook have markup for most common software and hardware components, as well as standard inlines for links, meta-data, references, and so forth.  In keeping with its philosophy, DITA has fewer inline elements and encourages users to create additional elements as specializations.  DocBook has more choices, and is less likely to need specialization.
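To make the structural difference concrete, here is a minimal sketch of a ditamap that pulls four separately authored topics into one deliverable. The title and the topic file names are hypothetical, and the DOCTYPE assumes the OASIS DITA DTDs are available locally as map.dtd.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- A map contains no content of its own; it only references topics
     and arranges them into a hierarchy for a given deliverable. -->
<map>
  <title>Widget User Guide</title>
  <topicref href="overview.dita">
    <!-- Nested topicrefs become subordinate topics in the output. -->
    <topicref href="installing.dita"/>
    <topicref href="configuring.dita"/>
  </topicref>
  <topicref href="troubleshooting.dita"/>
</map>
```

Because the hierarchy lives in the map rather than in the topics, the same topic file can be referenced from more than one map. In DocBook, the equivalent hierarchy is usually expressed by nesting chapters and sections inside a book.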

The most significant markup differences between DITA and DocBook are structural; therefore, I give them the greatest weight. If you use a topic-based, modular methodology, and you want the schema to help enforce that methodology, you will probably find DITA more to your liking.  If you use a more traditional methodology, or if you don’t want to enforce a particular methodology, you will probably find DocBook more to your liking.

If your team writes both narrative and modular documentation, it’s a tougher call.  Each will handle both types of documentation, but other things being equal, I give the edge to DocBook. I think DocBook does a better job handling modularity than DITA does handling books.

Regarding inlines, I suggest that you look at the choices offered by each schema.  If you need a lot of new inlines, you’ll probably be happier with DITA, but if your content is mainstream software or hardware documentation, you’ll probably find that DocBook already contains what you need.

Deliverables

Both schemas have open source XSL stylesheets that generate a range of deliverables.  As I’m writing this, both the DocBook XSL stylesheets and the DITA Open Toolkit will produce: print (using XSL-FO), HTML, XHTML, HTML Help (HTML that can be compiled into Microsoft HTML help), JavaHelp, and Eclipse help.  In addition, DocBook has stylesheets to generate WordML and plain text, and DITA has stylesheets to generate Microsoft’s Rich Text Format (RTF) and troff.  And, if you can’t make up your mind, there are stylesheets that convert DocBook to DITA and DITA to DocBook.

Since both support essentially the same formats, the important differentiator is how well the stylesheets work for your deliverables.  The DocBook stylesheets are more mature, and are well documented in Bob Stayton’s DocBook XSL: The Complete Guide.  The DITA stylesheets are newer, and have less well developed documentation.  Both are actively maintained, and both have strong communities of interest that are willing to help.

No matter which you choose, you will need to customize the stylesheets.  I’ve never seen an organization that didn’t need something different from the standard look and feel.  If you have XSL-knowledgeable staff or contractors to customize your stylesheets, either will serve you well.  But, if you’re running on a shoestring, or if you have less experienced staff, you will probably prefer DocBook.  It is better documented and the standard stylesheets provide a wider range of parameters that can be adjusted without programming.

Customization

Both schemas can be used “out of the box” to author useful content.  And, both can be customized relatively easily. (Note: in this section customization refers to modifying or extending the markup, not the stylesheets.)

DITA is designed with customization, or in its terminology “specialization,” in mind.  It uses an object-oriented model that allows you to create new elements that are specializations of existing elements.  For example, if you were writing an application for an airline, you might want to mark up flight numbers.  Using DITA, you can create a new element called <flightnum> that is a specialization of an existing DITA element (for example, the <prognum> element).  See Eliot Kimber’s DITA Specialization Tutorial for more information.

DocBook starts out with a larger set of elements, so for many applications you can avoid specialization.  It also has a user-defined attribute, the role attribute, which lets you differentiate variant types of an element.  For example, for flight numbers you could add a role attribute to the productnumber element like this: <productnumber role="flightnum">.  Not as pretty, but it avoids customization.  If you do need to modify the schema, the latest version, DocBook 5.0, provides excellent support for customization, including specialization.  See DocBook 5.0: The Transition Guide for more information.
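As a rough illustration of the role approach, the flight-number markup might look like this inside a DocBook 5.0 paragraph. This is a fragment rather than a complete document, and the flight number and the sentence around it are made up for the example; only the productnumber element and the role value come from the discussion above.

```xml
<!-- Fragment of a DocBook 5.0 document; a complete document would also
     carry version="5.0" on its root element. -->
<para xmlns="http://docbook.org/ns/docbook">
  Boarding for flight
  <productnumber role="flightnum">XY 123</productnumber>
  begins at gate 12.
</para>
```

The schema is untouched, so the fragment validates against stock DocBook. The trade-off is that nothing enforces the role value: a typo such as role="flightnumber" is still valid markup.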

The connection between customization and stylesheets is tighter with DITA than it is with DocBook. When you define a new element, you define it as a specialization (in object-oriented terms, a child) of another element.  If you do nothing else, that new element will be processed the same way its parent is processed.  DocBook can do this (see Norm Walsh’s article, DITA for DocBook), but it’s not automatic.
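A sketch of how this fallback works in practice, assuming the flight-number specialization of prognum mentioned earlier: each DITA element carries a class attribute that lists its ancestry, and DITA stylesheets conventionally match on tokens in that attribute rather than on element names. The module name "airline" and the flight number are hypothetical, and the template body is a placeholder rather than the DITA Open Toolkit's actual rule.

```xml
<!-- Instance markup. The class value is normally supplied as a default
     by the DTD or schema; it is written out here only for illustration. -->
<flightnum class="- topic/prognum airline/flightnum ">XY 123</flightnum>

<!-- A rule written for the base element tests for the " topic/prognum "
     token, so it also matches the specialized flightnum element. -->
<xsl:template xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
              match="*[contains(@class, ' topic/prognum ')]">
  <xsl:apply-templates/>
</xsl:template>
```

To give flight numbers their own formatting, you would add a more specific rule that tests for the ' airline/flightnum ' token. Until you do, the base rule keeps the output working.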

If you plan to extensively customize the schema, you’ll probably be happier with DITA.  If you are looking for a schema that will cover a wider variety of content “out of the box,” you’ll probably be happier with DocBook.

Scale

In general, XML may not be your best choice for small, isolated projects.  The overhead may simply be greater than any benefit. If your project is big enough to justify XML but still relatively small, DocBook may be more cost-effective.

There are several reasons for this:

  • DocBook is less likely to require customization.

  • The DocBook stylesheets are, at this writing, more comprehensive, better documented, easier to modify, and more likely to be useful with minimal modification.

  • For any given project, DITA will almost always require more files and more links between files than DocBook. While small projects using either schema can be managed without CMS software, as you scale up in size, DITA will need a CMS sooner than DocBook.

For larger projects, it’s unlikely that choosing DocBook or DITA will make a significant difference in the cost of moving to XML.  The cost of analyzing your content, designing and implementing a solution, training your team, and maintaining your solution will overwhelm any likely differential between DocBook and DITA.

There is one potential exception to this.  If you need to re-use content extensively, have a large documentation set, have many writers, and use a Content Management System (CMS), DITA may enable you to design in efficiencies that would be harder to achieve with DocBook.  The only way you’ll know is by doing a detailed analysis of your existing content and your proposed solution.  A great deal of ink has been spilled over the notion that DITA will give you these efficiencies.  While I’m willing to be proven wrong, I’m inclined to believe that most, if not all, of the benefit comes from a good methodology, a good CMS, and well-structured content, rather than the schema.

Buzz

There’s one other factor you should at least think about: popularity.  As I write this, more people use DocBook than DITA, and it is clearly more stable and mature.  However, the “buzz” clearly favors DITA.  If you go to a technical communication conference, you will find many sessions on DITA and few (or none) on DocBook.  There is more and more being written about DITA, and some of the best known consultants are devoting a lot of time to it.

I’m not convinced this foreshadows the demise of DocBook, but there is at least the possibility that DocBook has seen the peak in its popularity.  If that’s the case, then over time it will become less well supported.

At the moment, however, there are strong, active communities supporting both DITA and DocBook, and I expect that to be the case for a long time.

Making the Choice

To decide, you need to weigh the relative importance of each factor for your situation.  Usually, the most important factors will be content and deliverables, in that order.  Your writers will spend more time with the schema than anyone else; if you pick one that isn’t a good match for them, they will be less productive.

The importance of other factors will vary.  For example, if marking up your content would require customization of one schema, but not the other, that’s an important consideration. If you’re already committed to a CMS that has better support for one schema than the other, that’s also an important factor.

One thing to guard against is letting a secondary factor dominate your decision for the wrong reasons.  For example, if you happen to have an engineer who is a DocBook expert, that’s a plus for DocBook, but if everything else points to DITA, don’t let that one factor overwhelm your decision.

A skillful vendor presentation can also have a disproportionate effect.  You’re best off avoiding sales presentations until you’ve analyzed your situation and made some basic choices. Then use the vendor’s presentation as a way to find out how well their product matches your requirements, rather than letting the vendor steer you towards a solution that matches their product.

In the end, you need to analyze your current situation and your proposed future situation to make sure you understand the core reasons for making a change.  Remember that the schema is only one of many decisions you will need to make as you move to an XML-based environment.  Look at all of these decisions together, and if you need to, work with an outside consultant to help you better analyze your choices.  Be aware, however, that most consultants have their pet solutions, and many have ties to particular vendors.  Make sure you pick one who can give you an independent evaluation.

About the Author

Richard Hamilton is principal consultant with R.L. Hamilton & Associates, specializing in documentation management and the application of XML technology to documentation.  He has managed documentation teams at AT&T, Novell, and Hewlett-Packard. His teams have developed technical documentation for Unix and Linux systems, web-based applications, and many other software and hardware projects. He also has led development teams, including a team at Hewlett-Packard that developed a DocBook XML-based environment that delivers print, web, and online help content. He has been a member of the DocBook Technical Committee since December, 2001, and is a contributing author to the DocBook 5.0 Transition Guide.  He is currently writing a book about managing documentation; excerpts can be found at: http://rlhamilton.net/blog
