IBM releases Granite 4.1 LLM family trained on ~15T tokens with five-phase pre-training pipeline and long-context extension to 512K tokens
Hugging Face Blog · April 29, 2026
AI Summary
• Granite 4.1 comprises three dense transformer models (3B, 8B, and 30B parameters) trained from scratch on approximately 15 trillion tokens using a five-phase pre-training pipeline that progressively shifts from broad web data to curated, domain-specific content, with the context window extended to 512K tokens in the final phase (see the schedule sketch after this list).
• The 8B instruct model matches or surpasses the previous Granite 4.0-H-Small, a mixture-of-experts model with 32B total and 9B active parameters, despite using fewer parameters and a simpler dense architecture; it was refined through supervised fine-tuning on ~4.1M curated samples and reinforcement learning via on-policy GRPO with DAPO loss (sketched below).
• All Granite 4.1 models are released under the Apache 2.0 license and are available via a Hugging Face collection and a GitHub repository (see the loading example below).
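
The summary describes the phase-to-phase data shift only qualitatively. As a minimal sketch, such a staged recipe could be expressed as a mixture schedule like the one below; the five phases and the final 512K context window come from the summary above, while all mixture weights, intermediate context lengths, source categories, and the names `PRETRAIN_PHASES` and `mixture_for` are illustrative assumptions, not IBM's published recipe.

```python
# Illustrative sketch of a progressive data-mixture schedule. Only the
# phase count and the final 512K context are from the release summary;
# every weight and intermediate context length here is an assumption.
PRETRAIN_PHASES = [
    {"phase": 1, "context_len": 4_096,   "mix": {"web": 0.80, "code": 0.15, "curated": 0.05}},
    {"phase": 2, "context_len": 4_096,   "mix": {"web": 0.60, "code": 0.20, "curated": 0.20}},
    {"phase": 3, "context_len": 8_192,   "mix": {"web": 0.40, "code": 0.25, "curated": 0.35}},
    {"phase": 4, "context_len": 32_768,  "mix": {"web": 0.25, "code": 0.25, "curated": 0.50}},
    {"phase": 5, "context_len": 524_288, "mix": {"web": 0.10, "code": 0.20, "curated": 0.70}},
]

def mixture_for(phase: int) -> dict:
    # Sampling weights for a given phase; each mix sums to 1.0.
    return next(p["mix"] for p in PRETRAIN_PHASES if p["phase"] == phase)
```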
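On the post-training side, GRPO scores each sampled response against the rest of its sampling group instead of using a learned value model, and DAPO modifies the standard PPO objective with a token-level loss and an asymmetric ("clip-higher") upper bound. A minimal sketch of both pieces, assuming per-token log-probabilities and a response mask; the epsilon defaults and the `grpo_advantages`/`dapo_loss` helpers are illustrative, not IBM's training code.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Group-relative advantages: each response's reward is normalized
    # against the mean/std of its own sampling group.
    # rewards: [num_prompts, group_size]
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

def dapo_loss(logp_new, logp_old, advantages, mask,
              eps_low=0.2, eps_high=0.28):
    # Token-level clipped surrogate with DAPO's asymmetric clip bounds,
    # averaged over all valid tokens in the batch rather than per
    # sequence, and with no KL penalty term.
    # logp_new, logp_old, mask: [num_responses, seq_len]
    # advantages: one scalar per response, broadcast over its tokens
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages.unsqueeze(-1)
    clipped = torch.clamp(ratio, 1 - eps_low, 1 + eps_high) * advantages.unsqueeze(-1)
    per_token = -torch.minimum(unclipped, clipped)
    return (per_token * mask).sum() / mask.sum()
```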
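Because the models are plain dense transformers under Apache 2.0, they should load with the standard `transformers` workflow. The checkpoint id below is a hypothetical guess based on prior Granite naming; check the Granite 4.1 Hugging Face collection for the exact model names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id; verify against the Granite 4.1 collection.
model_id = "ibm-granite/granite-4.1-8b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Granite 4.1 release."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```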