The Content Wrangler

Your PDF Docs Look Beautiful To Your Boss, But Your AI Thinks They're A Crime 🫆 Scene

Discover why allowing PDF tech docs to power AI-answer engines might not be your best idea 💡

Scott Abel
May 06, 2026

There’s a quaint kind of optimism that appears when some of us start talking about AI and tech docs. It usually starts like this: “We already have thousands of pages of documentation in PDF, so the model can just read those and answer questions from them. Voila!”

I get the impulse. Really, I do. I ❤️ me a good PDF, but only when I need one.

PDFs are familiar. They look finished, feel official, and are what many tech docs teams ship, archive, email, and point to when someone asks where the user assistance lives.

To a human reader, a well-made PDF can seem perfectly clear.

  • The heading introduces the procedure

  • The steps appear in order

  • The warning box is seriously hard to miss

  • The diagram sits exactly where it belongs, quietly earning its keep

But when a large language model encounters that same file, it often isn’t getting the polished experience you imagine. It’s getting a reconstruction of the content through whatever extraction method sits between the PDF and the model, and that reconstruction can be a hot 🔥 mess.
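
To make that concrete, here is a minimal sketch in Python. It does not use a real PDF library and it is not how any particular extraction tool works. The page fragments, their coordinates, the sample procedure, and the function names are all invented for illustration. The only assumption is that the page is a set of positioned text fragments, with a procedure in the left column and a warning box in the right one, and that something has to decide what order to read them in.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A hypothetical piece of positioned text on a page (y grows downward)."""
    x: float
    y: float
    text: str


# Invented sample page: a procedure on the left, a warning box on the right.
PAGE = [
    Fragment(50, 100, "To reset the device:"),
    Fragment(50, 120, "1. Unplug the power cable."),
    Fragment(50, 140, "2. Hold the button for ten seconds."),
    Fragment(320, 120, "WARNING: Do not unplug"),
    Fragment(320, 140, "while the light is blinking."),
]


def naive_extract(fragments):
    """Read top to bottom, then left to right, ignoring columns and boxes."""
    ordered = sorted(fragments, key=lambda f: (f.y, f.x))
    return " ".join(f.text for f in ordered)


def column_aware_extract(fragments, column_split=300):
    """Read the whole left column, then the right one.

    The split position is an assumption; a real page does not announce it.
    """
    left = sorted((f for f in fragments if f.x < column_split), key=lambda f: f.y)
    right = sorted((f for f in fragments if f.x >= column_split), key=lambda f: f.y)
    return " ".join(f.text for f in left + right)


if __name__ == "__main__":
    print(naive_extract(PAGE))
    print(column_aware_extract(PAGE))
```

The naive version threads the warning through the middle of the steps, so step 2 lands inside the warning sentence. The column-aware version keeps the two apart, but only because someone told it where the column boundary was.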
