Researchers propose Group Fine-Tuning (GFT) to improve language model training by addressing fundamental limitations in supervised fine-tuning and reinforcement learning approaches.
arXiv cs.AI · April 17, 2026
AI Summary
• Study shows supervised fine-tuning (SFT) functions as a special case of policy gradient optimization with sparse rewards and unstable inverse-probability weighting, which causes training instability
• The Group Fine-Tuning framework introduces Group Advantage Learning, which constructs diverse response groups and applies normalized contrastive supervision to mitigate reward sparsity
• A Dynamic Coefficient Rectification mechanism adaptively bounds the inverse-probability weights, stabilizing optimization and preventing gradient explosion
• GFT aims to unify knowledge injection with robust generalization, addressing the single-path dependency and entropy collapse problems of current post-training methods
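The two mechanisms above can be sketched in a few lines. Note the summary does not give the paper's exact formulas, so the group mean/std normalization and the clipping threshold `w_max` below are illustrative assumptions, not GFT's actual definitions:

```python
import math

def group_advantages(rewards):
    """Normalize per-response rewards within a sampled group (illustrative:
    subtract the group mean, divide by the group std), turning a sparse
    scalar reward into dense contrastive supervision across the group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against uniform-reward groups
    return [(r - mean) / std for r in rewards]

def rectified_weight(prob, w_max=10.0):
    """Coefficient-rectification sketch: SFT's implicit inverse-probability
    weight 1/p explodes as p -> 0; cap it at a hypothetical threshold
    w_max so per-token gradients stay bounded."""
    return min(1.0 / max(prob, 1e-8), w_max)

# Toy group of 4 sampled responses with rewards and model probabilities.
rewards = [1.0, 0.0, 0.0, 1.0]
probs = [0.30, 0.05, 0.001, 0.20]
advs = group_advantages(rewards)          # -> [1.0, -1.0, -1.0, 1.0]
weights = [rectified_weight(p) for p in probs]
# Per-response loss coefficient: rectified weight times group advantage.
coeffs = [w * a for w, a in zip(weights, advs)]
```

The low-probability third response would receive an implicit weight of 1000 under plain inverse-probability weighting; the cap truncates it to `w_max`, which is the stabilization the third bullet describes.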