AIToday

A coalition of tech companies is creating DocLang, a new document format designed to be cheaper and easier for AI systems to process than PDFs and other existing formats.

Hacker News · 23h ago · 3 min read

3 Key Points

  1. What happened: The LF AI & Data Foundation has formed a working group led by IBM, NVIDIA, Red Hat, ABBYY, HumanSignal, and Forgis to develop DocLang, an open standard format that restructures documents for AI consumption. DocLang uses a limited XML vocabulary aligned with how language models tokenize text, and it is lossless, meaning no information is lost in the conversion.

  2. Why it matters: Existing formats like PDF, Markdown, HTML, and LaTeX were designed for human reading, not machine parsing, which forces AI models to spend tokens deciphering layout instead of extracting meaning. According to ABBYY benchmarks on IBM's 2025 annual report, converting a PDF to DocLang reduced input tokens from 8,421 to 5,310 and cut latency from 4.2s to 2.7s while improving accuracy. At scale, token costs are reported to be 4× to more than 30× lower, depending on the model and document complexity.

  3. What to watch: DocLang also preserves document metadata and governance information that typically gets stripped during conversion, addressing a practical pain point for enterprises managing document provenance. The standard is open and free, and the group is actively inviting more technology providers and enterprises to join.
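
To put the benchmark figures from the second point in proportion, the arithmetic can be sketched in a few lines of Python. This is only an illustration using the four numbers quoted above; the helper names `percent_reduction` and `reduction_factor` are hypothetical and do not come from DocLang or ABBYY.

```python
# Figures as quoted in the summary (ABBYY benchmark on IBM's 2025 annual report).
PDF_TOKENS, DOCLANG_TOKENS = 8_421, 5_310
PDF_LATENCY_S, DOCLANG_LATENCY_S = 4.2, 2.7


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent (hypothetical helper)."""
    return (before - after) / before * 100


def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before` (hypothetical helper)."""
    return before / after


if __name__ == "__main__":
    print(
        f"input tokens: {percent_reduction(PDF_TOKENS, DOCLANG_TOKENS):.1f}% fewer "
        f"({reduction_factor(PDF_TOKENS, DOCLANG_TOKENS):.2f}x)"
    )
    print(
        f"latency:      {percent_reduction(PDF_LATENCY_S, DOCLANG_LATENCY_S):.1f}% lower "
        f"({reduction_factor(PDF_LATENCY_S, DOCLANG_LATENCY_S):.2f}x)"
    )
```

On this single document the drop works out to roughly 37% fewer input tokens and roughly 36% lower latency, a factor of about 1.6×. That is well short of the 4× to 30× cost range, which the summary ties to scale, model, and document complexity rather than to this one benchmark.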
