Oops, We Accidentally Undersold XML
The forgotten business value of XML, and why AI is making interoperability matter again
Looking back, I think we accidentally undersold Extensible Markup Language (XML). We spent years celebrating 🎉 its publishing capabilities while paying far less attention to the larger problem its creators were trying to solve: interoperability. The W3C XML Working Group wasn't simply building a better publishing format. They were building a framework for making information portable, reusable, and understandable across disparate systems.
For most of the last twenty-five years, XML was presented primarily as a publishing solution. We talked about single-source publishing, content reuse, multichannel delivery, conditional processing, and translation savings. Those benefits were real, and they helped many organizations justify investments in structured content, content management systems, and XML-based authoring environments.
Looking back, though, I think we focused on the most visible benefit rather than the most important one. XML certainly made publishing more efficient (which was needed then, and is definitely needed now), but publishing was never its greatest contribution.
XML’s real value was interoperability. It gave us a way to represent our information so that it could move between systems, retain its meaning, and be repurposed in contexts that had nothing to do with the original publication.
The rise of generative AI has brought that original promise back into focus.
Related reading: The Role of XML in Interoperability
What The Automotive Industry Figured Out First
Long before AI entered the conversation, manufacturers faced a problem that should sound familiar to anyone responsible for creating, managing, and delivering enterprise content.
Information existed everywhere, but it didn't travel well. That was true between organizations, where suppliers and manufacturers constantly exchanged documents, and it was often true within organizations as well. Information created by one department frequently had to be manually interpreted, reformatted, or reentered (typed by hand) before another department could put it to work. The result was delay, duplication, cost, and mistakes.
Purchase orders, invoices, shipping notices, inventory reports, and production schedules arrived through the postal service. Someone had to receive those documents, open the envelopes, sort the contents, route them to the right department, read them, and manually enter the information into another system.
Every step required work.
Handoffs from one party to another introduced delays. When information was rekeyed, accuracy depended on another human hand. Studies of manual transcription show error rates ranging from roughly 1% to more than 10%, depending on the workflow and data type, which means every extra handoff created another opportunity for mistakes.
Technology gradually improved the speed of delivery, but it didn’t eliminate the underlying problem.
👉🏾 Paper became fax transmissions
👉🏾 Faxes became scanned images
👉🏾 Scanned images became PDFs
Our information moved faster than before, but much of it remained trapped inside formats that humans could interpret and machines could not.
The automotive industry eventually realized that the bottleneck wasn't the movement of their information but the lack of a shared way to represent it. Without a common structure, every transfer required specialized interpretation, creating cost, delay, and opportunities for error.
Before the advent of XML, Electronic Data Interchange (EDI) addressed that challenge by giving manufacturers, suppliers, and trading partners a standardized way to represent business information. Business documents could move directly between systems because everyone agreed on how the information would be structured and what it meant.
The value of EDI wasn’t that information became electronic. The value came from standardization. Once information could move between systems without being repeatedly interpreted by humans, costs fell, errors declined, and entire supply chains became more efficient.
Years later, when I encountered XML, I recognized a similar idea operating at a broader scale.
Related reading: The Evolution of EDI: From Automotive Innovation to Universal Standard
The Moment XML Clicked For Me
Like many tech writers of a certain age, I was introduced to XML through publishing. The pitch was straightforward: create content once and publish it many ways. Compared with the pain of maintaining separate source files for print, online help, web content, and training materials, that sounded revolutionary.
What surprised me was how quickly my attention shifted away from publishing.
The deeper I dug into XML, the more fascinated I became by the underlying model (I am that nerd 🧐).
XML separated information from presentation. It allowed our content to be validated against rules. It enabled us to transform our content into different forms automatically without rewriting it. And, importantly, it provided a mechanism for structure and meaning to travel with our content.
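Here is a minimal sketch of those three ideas using Python's standard library. The `task` markup, the element names, and the `REQUIRED` list are hypothetical, invented for illustration. The `validate` function is a simplified stand-in for a real DTD or schema, which the standard library does not check.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic markup; the element names are illustrative, not a real schema.
SOURCE = """
<task id="replace-filter">
  <title>Replace the air filter</title>
  <step>Open the housing.</step>
  <step>Swap the filter.</step>
</task>
"""

# Assumption: a simplified stand-in for the rules a DTD or schema would express.
REQUIRED = ("title", "step")


def validate(task):
    """Return the names of required child elements that are missing."""
    return [name for name in REQUIRED if task.find(name) is None]


def to_text(task):
    """Render the task as numbered plain text."""
    lines = [task.findtext("title")]
    for number, step in enumerate(task.findall("step"), start=1):
        lines.append(f"{number}. {step.text}")
    return "\n".join(lines)


def to_html(task):
    """Render the same task as an HTML fragment, without touching the source."""
    section = ET.Element("section", {"id": task.get("id")})
    ET.SubElement(section, "h2").text = task.findtext("title")
    ordered_list = ET.SubElement(section, "ol")
    for step in task.findall("step"):
        ET.SubElement(ordered_list, "li").text = step.text
    return ET.tostring(section, encoding="unicode")


task = ET.fromstring(SOURCE.strip())
problems = validate(task)      # an empty list means the rules are satisfied
text_output = to_text(task)    # one presentation of the content
html_output = to_html(task)    # another presentation of the same content
```

The source says nothing about fonts, numbering, or page layout. Each output function decides that for itself, which is the separation of information from presentation in miniature.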
It was 1999 when I first realized that documentation wasn’t merely a collection of documents. I came to understand that technical documentation was assistive information, and that information had tremendous business value independent of the pages on which it appeared.
That realization completely changed the direction of my career.
I became interested in content models, metadata, taxonomy, terminology management, information architecture, and structured authoring because I began to see them as ways of making information more useful. What captivated me wasn’t the ability to generate another PDF, although that was impressive at the time. It was the possibility of creating information that could move automatically between systems, participate in business processes, and retain its meaning wherever it went.
I was completely bedazzled by the idea. 💡
Related reading: What Is The Unified Content Strategy?
The People Who Influenced My Thinking
Several people shaped my understanding during that period.
Ann Rockley (today recognized as “the mother of content strategy”) was among the first to argue convincingly that content should be treated as a business asset rather than a publishing artifact. JoAnn Hackos helped organizations understand how structured content and information architecture could support broader business objectives.
Charles F. Goldfarb’s XML Handbook (1998 — now in its 5th edition) also had a profound influence on me. Goldfarb consistently described markup as a way to represent information rather than merely format documents. His work reinforced the idea that information becomes more valuable when it can survive changes in software, platforms, and delivery channels.
The work that probably influenced me most came from Robert J. Glushko. His research explored the economic value of standardized information and the capabilities organizations gain when information is represented using common structures and shared semantics.
One observation that stayed with me was Glushko’s argument that standard document representations reduce the costs associated with exchanging, integrating, processing, and reusing information.
When organizations stop reinventing information structures for every application and workflow, they can build capabilities that would otherwise be impractical or prohibitively expensive.
That insight helped me understand that XML wasn’t fundamentally about documents. It was about creating information that could participate in larger systems.
Related reading: The Discipline of Organizing (4th Professional Edition) by Robert J. Glushko
Related reading: Document Engineering: Analyzing And Designing Documents For Business Informatics & Web Services by Robert J. Glushko and Tim McGrath
The W3C Told Us This From The Beginning
One of the most overlooked XML documents is the W3C’s “XML in 10 Points.” Reading it today is a useful reminder of what XML’s architects were trying to accomplish.
The document emphasizes that XML is extensible, platform-independent, international, easy to process, and capable of supporting a wide variety of applications. It highlights the separation of content from presentation and the ability to exchange structured information across different environments.
What stands out now is how little emphasis is placed on publishing.
The W3C wasn’t primarily describing a better way to create web pages or manuals. It was describing a framework for making information portable and interoperable.
Those ten points read almost like a checklist for what organizations now call AI readiness. AI systems need information that is structured, consistent, machine-processable, portable, and capable of being validated. They need information whose meaning remains intact as it moves between repositories, applications, retrieval systems, and user interfaces.
Those requirements may sound modern, but they are remarkably close to the design goals XML was created to support nearly three decades ago.
Interoperability Was The Larger Goal
The idea behind interoperability is simple: information should be able to move from one person, department, organization, or software system to another without losing its meaning along the way.
XML made this possible by separating information from the applications that created it. Instead of exchanging proprietary files that only specific software could understand, organizations could exchange structured XML documents built on shared rules and vocabularies.
In publishing, this meant content could be created once and reused across print, websites, online help systems, training materials, translation workflows, and business applications. The breakthrough wasn't that XML made publishing easier, although it did. The real opportunity was that XML allowed information itself to become portable.
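To make portability concrete, here is a rough sketch in Python of one system producing an XML message and two unrelated systems consuming it with no rekeying. The `purchaseOrder` vocabulary, the attribute names, and the warehouse and accounting roles are all assumptions made up for this example. They are not drawn from any real EDI or XML business standard.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def build_order(order_id, lines):
    """Sending system: serialize an order using a shared, agreed vocabulary."""
    order = ET.Element("purchaseOrder", {"id": order_id})
    for sku, quantity, unit_price in lines:
        ET.SubElement(order, "line", {
            "sku": sku,
            "quantity": str(quantity),
            "unitPrice": str(unit_price),
        })
    return ET.tostring(order, encoding="unicode")


def units_by_sku(order_xml):
    """Receiving system A (say, a warehouse): needs only SKUs and quantities."""
    order = ET.fromstring(order_xml)
    return {line.get("sku"): int(line.get("quantity")) for line in order.iter("line")}


def order_total(order_xml):
    """Receiving system B (say, accounting): needs only the amount owed."""
    order = ET.fromstring(order_xml)
    return sum(
        (Decimal(line.get("unitPrice")) * int(line.get("quantity"))
         for line in order.iter("line")),
        Decimal("0"),
    )


# Hypothetical order data, for illustration only.
message = build_order("PO-1001", [
    ("BRK-20", 4, Decimal("12.50")),
    ("FLT-07", 10, Decimal("3.20")),
])
picked = units_by_sku(message)
owed = order_total(message)
```

Neither receiver knows or cares which application wrote the message. Both rely only on the agreed element and attribute names, which is where the interoperability comes from.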
We Focused On Publishing Because It Was Easier To See And To Explain
To be fair, our profession accomplished a great deal with XML. We reduced duplication, improved content reuse, streamlined localization workflows, and created sophisticated publishing environments capable of delivering information to multiple channels from a common source. Those achievements generated measurable savings — 💵 💶 — and solved real business problems.
Publishing efficiency was relatively easy to explain. Executives understood reduced production costs and faster publishing cycles, and they were always keen to lower translation expenses.
Interoperability was harder to sell because its value extended beyond the documentation department. It required us to think about information as a shared business asset rather than as the raw material for producing manuals and help systems.
As a result, many of us spent years talking about outputs when we might’ve been more successful talking about infrastructure.
Why AI Changes The “Why XML?” Conversation
While some tech writers were questioning XML's future, arguing that lightweight markup languages and docs-as-code approaches (think Markdown) would eventually displace it, XML (and, by extension, the Darwin Information Typing Architecture, or DITA) quietly continued to power some of the world's largest documentation, publishing, financial, manufacturing, and government information systems.
Meanwhile, generative AI has unexpectedly revived the business case for structured information.
🤞🏽 Hopefully, most organizations are quickly discovering that AI systems are heavily dependent on the quality of the information they retrieve. Humans can often compensate for ambiguity, inconsistent terminology, missing context, and poorly organized content. AI systems struggle much more with those conditions.
When information is duplicated, contradictory, poorly structured, or difficult to interpret, AI systems frequently produce answers that reflect those same weaknesses.
This is why discussions about AI increasingly lead to conversations about metadata, taxonomy, controlled vocabularies, semantic structure, content models, and governance. These topics may have once sounded like specialized concerns for information architects and technical communicators. Today they are becoming strategic concerns for organizations trying to improve the quality of AI-generated answers.
The issue isn’t the intelligence of the model. The issue is whether the information being supplied to the model is structured in ways that support reliable retrieval and interpretation.
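As a rough illustration of what "structured in ways that support reliable retrieval" can look like, the Python sketch below filters a small XML library by metadata before anything is handed to a model. The `library` and `topic` elements, the `product`, `audience`, and `status` attributes, and the sample content are all hypothetical. A real retrieval pipeline would involve much more than attribute matching.

```python
import xml.etree.ElementTree as ET

# Hypothetical content library. Topics t1 and t2 contradict each other,
# and only the status metadata says which one is still valid.
LIBRARY = """
<library>
  <topic id="t1" product="X200" audience="technician" status="current">
    <title>Reset the X200 controller</title>
    <body>Hold the reset button for five seconds.</body>
  </topic>
  <topic id="t2" product="X200" audience="technician" status="superseded">
    <title>Reset the X200 controller</title>
    <body>Hold the reset button for ten seconds.</body>
  </topic>
  <topic id="t3" product="X300" audience="operator" status="current">
    <title>Reset the X300 controller</title>
    <body>Use the Reset option in the settings menu.</body>
  </topic>
</library>
"""


def select_topics(library_xml, **required):
    """Return topics whose attributes match every required metadata value."""
    library = ET.fromstring(library_xml.strip())
    return [
        topic for topic in library.iter("topic")
        if all(topic.get(name) == value for name, value in required.items())
    ]


def as_context(topics):
    """Format the selected topics as labeled passages to supply to a model."""
    return "\n\n".join(
        f"[{topic.get('id')}] {topic.findtext('title')}\n{topic.findtext('body')}"
        for topic in topics
    )


matches = select_topics(LIBRARY, product="X200", status="current")
context = as_context(matches)
```

Without the `status` attribute, both X200 topics look equally relevant, and the model has no basis for choosing between five seconds and ten. With it, the contradiction never reaches the model.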
XML’s Second Act
For the past two decades, XML’s most visible success story involved publishing. Today I think its larger legacy may involve helping organizations make knowledge more interoperable.
Information now moves through a growing ecosystem of content management systems, support platforms, translation environments, knowledge graphs, search engines, retrieval systems, AI assistants, and autonomous agents. Organizations increasingly need information that can move across those environments without losing its meaning or requiring constant human intervention.
That challenge isn’t fundamentally different from the one the automotive industry faced decades ago. The problem wasn’t moving documents from one place to another. Instead, the problem was moving the meaning locked inside them.
EDI addressed that challenge for supply chains. XML addressed it for content. AI is reminding us why it matters.
Looking back, I think many of us accidentally undersold XML. We emphasized publishing because it was the easiest benefit to demonstrate. What we should have spent more time discussing was interoperability.
Information becomes more valuable when it can move between systems. It becomes even more valuable when every system understands what that information means. That was XML’s promise from the beginning. It may also be the reason XML remains relevant in the age of AI. 🤠









