Researchers develop Twin-Pass CoT-Ensembling to fix unreliable confidence scores in telecom LLMs like Gemma-3
arXiv cs.LG · April 16, 2026
AI Summary
• LLMs used for telecommunications tasks (3GPP analysis, O-RAN troubleshooting) produce biased, overconfident self-assessments, which makes them unsafe for real-world deployment
• The study evaluated Gemma-3 models (4B, 12B, and 27B parameters) on three telecom benchmarks: TeleQnA, ORANBench, and srsRANBench
• Standard single-pass verbalized confidence estimates frequently assign high confidence to incorrect predictions, so the stated confidence does not track actual correctness
• The proposed Twin-Pass Chain-of-Thought (CoT) Ensembling method uses multiple independent passes to improve the reliability of confidence estimates in telecom-domain LLMs
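
The summary does not say how the passes are combined or how calibration is scored, so the following is only a minimal sketch under stated assumptions: confidences from independent passes are simply averaged, and miscalibration is measured with expected calibration error (ECE), a standard metric that may or may not be the one the study uses. The function names and the toy numbers are hypothetical and are not results from the paper.

```python
from statistics import mean


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = mean([c for c, _ in items])
        accuracy = mean([1.0 if ok else 0.0 for _, ok in items])
        ece += (len(items) / total) * abs(accuracy - avg_conf)
    return ece


def ensemble_confidence(pass_confidences):
    """Combine per-pass confidences for one answer.

    ASSUMPTION: plain averaging. The source only says that multiple
    independent passes are used, not how they are aggregated.
    """
    return mean(pass_confidences)


# Made-up toy data for illustration only (not from the study):
# verbalized confidences from two independent passes on four questions.
pass_a = [0.95, 0.95, 0.95, 0.95]
pass_b = [0.95, 0.55, 0.95, 0.45]
correct = [True, False, True, False]

ece_single = expected_calibration_error(pass_a, correct)
ensembled = [ensemble_confidence(pair) for pair in zip(pass_a, pass_b)]
ece_ensembled = expected_calibration_error(ensembled, correct)
```

In this toy setup the first pass reports the same high confidence for every answer, including the wrong ones, which is the overconfidence pattern described above. Averaging with a second pass lowers the confidence on the wrong answers and so lowers the ECE, but only because the second pass was constructed to disagree there.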