
What happened: The LF AI & Data Foundation has formed a working group led by IBM, NVIDIA, Red Hat, ABBYY, HumanSignal, and Forgis to develop DocLang, an open standard format that restructures documents for AI consumption. DocLang uses a limited XML vocabulary aligned with how language models tokenize text, and it is lossless, meaning no information is lost in the conversion.
Why it matters: Existing formats like PDF, Markdown, HTML, and LaTeX were designed for human reading, not machine parsing, which forces AI models to spend tokens deciphering layout instead of extracting meaning. According to ABBYY benchmarks on IBM's 2025 annual report, converting a PDF to DocLang reduced input tokens from 8,421 to 5,310 and cut latency from 4.2s to 2.7s while improving accuracy. At scale, token costs are reported to be 4× to more than 30× lower, depending on the model and document complexity.
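To put the benchmark figures in relative terms, here is a minimal Python sketch that computes the percentage reductions implied by the numbers quoted above. The helper name `reduction_pct` is hypothetical and used only for illustration; the inputs are the token and latency figures from the ABBYY benchmark as reported here, not independently verified.

```python
def reduction_pct(before: float, after: float) -> float:
    """Return the percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100


# Figures as quoted in the summary (ABBYY benchmark on IBM's 2025 annual report).
tokens_pdf, tokens_doclang = 8421, 5310
latency_pdf_s, latency_doclang_s = 4.2, 2.7

token_cut = reduction_pct(tokens_pdf, tokens_doclang)
latency_cut = reduction_pct(latency_pdf_s, latency_doclang_s)
token_ratio = tokens_pdf / tokens_doclang  # how many times fewer input tokens

print(f"Input tokens reduced by {token_cut:.1f}%")
print(f"Latency reduced by {latency_cut:.1f}%")
print(f"Token ratio (PDF / DocLang): {token_ratio:.2f}x")
```

On this single document the reduction works out to roughly 37% fewer input tokens and about 36% lower latency, a ratio of about 1.6×. The 4× to more than 30× cost figure is a separate, at-scale claim, and the summary does not say how it was derived.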
What to watch: DocLang also preserves document metadata and governance information that typically gets stripped during conversion, addressing a practical pain point for enterprises managing document provenance. The standard is open and free, and the group is actively inviting more technology providers and enterprises to join.