Study reveals how different fine-tuning methods affect LLM interpretability in code compliance tasks, with full fine-tuning producing more focused attribution patterns than parameter-efficient alternatives.
arXiv cs.CL · April 20, 2026
AI Summary
•Researchers used perturbation-based attribution analysis (see the sketch after this list) to compare interpretive behaviors across three fine-tuning strategies: full fine-tuning (FFT), low-rank adaptation (LoRA), and quantized LoRA (QLoRA)
•Full fine-tuning produced attribution patterns that were statistically distinct from, and more focused than, those of the parameter-efficient methods
•Models at larger scales develop distinct interpretive strategies, such as prioritizing numerical constraints and rule identifiers when analyzing code compliance
•Study addresses a gap in existing LLM research by moving beyond treating models as black boxes to examine how training decisions shape model behavior
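Perturbation-based attribution, as mentioned in the first bullet, is commonly operationalized as occlusion: mask one input token at a time and measure how much the model's output score drops. The sketch below is a minimal, framework-agnostic illustration of that general technique, not the paper's actual pipeline; `score_fn`, `toy_score`, the mask token, and the example tokens are all hypothetical stand-ins.

```python
# Minimal sketch of perturbation-based (occlusion) attribution:
# mask each input token in turn and record how much a compliance
# score drops. `score_fn` is a hypothetical stand-in for whatever
# scoring the fine-tuned model produces (e.g., probability of a
# "compliant" verdict).

from typing import Callable, List


def perturbation_attribution(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
    mask_token: str = "<mask>",
) -> List[float]:
    """Attribution of each token = baseline score minus the score
    with that token replaced by a mask. Larger values mean the
    token mattered more to the model's decision."""
    baseline = score_fn(tokens)
    attributions = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask_token] + tokens[i + 1:]
        attributions.append(baseline - score_fn(perturbed))
    return attributions


if __name__ == "__main__":
    # Toy scorer: pretends numeric literals and rule identifiers
    # drive the verdict, mirroring the reported finding that larger
    # models prioritize numerical constraints and rule IDs.
    def toy_score(toks: List[str]) -> float:
        return sum(0.4 for t in toks if t.isdigit() or t.startswith("RULE-"))

    code_tokens = ["if", "timeout", ">", "30", ":", "flag", "(", "RULE-7", ")"]
    for tok, s in zip(code_tokens, perturbation_attribution(code_tokens, toy_score)):
        print(f"{tok:>8}  {s:+.2f}")
```

Under this setup, "more focused" attribution would correspond to the attribution mass concentrating on a few tokens (for instance, lower entropy over the normalized per-token scores), which is one plausible way to compare patterns across FFT, LoRA, and QLoRA models.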