IBM releases Granite 4.1 LLM family trained on ~15T tokens with five-phase pre-training pipeline and long-context extension to 512K tokens
Hugging Face Blog · April 29, 2026
AI Summary
• Granite 4.1 comprises three dense transformer models (3B, 8B, and 30B parameters) trained from scratch on approximately 15 trillion tokens using a five-phase pre-training pipeline that progressively shifts from broad web data to curated, domain-specific content, with the context window extended to 512K tokens in the final phase (see the schedule sketch after this list).
• The 8B instruct model matches or surpasses the previous Granite 4.0-H-Small, a mixture-of-experts model with 32B total and 9B active parameters, despite using fewer parameters and a simpler dense architecture; it was refined through supervised fine-tuning on ~4.1M curated samples and reinforcement learning via on-policy GRPO with DAPO loss (sketched below).
• All Granite 4.1 models are released under the Apache 2.0 license and are available via a Hugging Face collection and a GitHub repository (see the loading example below).
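
The summary describes the phase-to-phase data shift only qualitatively. As a minimal sketch, such a staged recipe could be expressed as a mixture schedule like the one below; the five phases and the final 512K context window come from the summary above, while all mixture weights, intermediate context lengths, source categories, and the names `PRETRAIN_PHASES` and `mixture_for` are illustrative assumptions, not IBM's published recipe.

```python
# Illustrative sketch of a progressive data-mixture schedule. Only the
# phase count and the final 512K context are from the release summary;
# every weight and intermediate context length here is an assumption.
PRETRAIN_PHASES = [
    {"phase": 1, "context_len": 4_096,   "mix": {"web": 0.80, "code": 0.15, "curated": 0.05}},
    {"phase": 2, "context_len": 4_096,   "mix": {"web": 0.60, "code": 0.20, "curated": 0.20}},
    {"phase": 3, "context_len": 8_192,   "mix": {"web": 0.40, "code": 0.25, "curated": 0.35}},
    {"phase": 4, "context_len": 32_768,  "mix": {"web": 0.25, "code": 0.25, "curated": 0.50}},
    {"phase": 5, "context_len": 524_288, "mix": {"web": 0.10, "code": 0.20, "curated": 0.70}},
]

def mixture_for(phase: int) -> dict:
    # Sampling weights for a given phase; each mix sums to 1.0.
    return next(p["mix"] for p in PRETRAIN_PHASES if p["phase"] == phase)
```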
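On the post-training side, GRPO scores each sampled response against the rest of its sampling group instead of using a learned value model, and DAPO modifies the standard PPO objective with a token-level loss and an asymmetric ("clip-higher") upper bound. A minimal sketch of both pieces, assuming per-token log-probabilities and a response mask; the epsilon defaults and the `grpo_advantages`/`dapo_loss` helpers are illustrative, not IBM's training code.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Group-relative advantages: each response's reward is normalized
    # against the mean/std of its own sampling group.
    # rewards: [num_prompts, group_size]
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

def dapo_loss(logp_new, logp_old, advantages, mask,
              eps_low=0.2, eps_high=0.28):
    # Token-level clipped surrogate with DAPO's asymmetric clip bounds,
    # averaged over all valid tokens in the batch rather than per
    # sequence, and with no KL penalty term.
    # logp_new, logp_old, mask: [num_responses, seq_len]
    # advantages: one scalar per response, broadcast over its tokens
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages.unsqueeze(-1)
    clipped = torch.clamp(ratio, 1 - eps_low, 1 + eps_high) * advantages.unsqueeze(-1)
    per_token = -torch.minimum(unclipped, clipped)
    return (per_token * mask).sum() / mask.sum()
```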
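Because the models are plain dense transformers under Apache 2.0, they should load with the standard `transformers` workflow. The checkpoint id below is a hypothetical guess based on prior Granite naming; check the Granite 4.1 Hugging Face collection for the exact model names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id; verify against the Granite 4.1 collection.
model_id = "ibm-granite/granite-4.1-8b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Granite 4.1 release."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```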