New self-supervised method enriches medical imaging reports by adding omitted positive findings, boosting vision-language model performance by up to 7.47%

arXiv cs.LG · April 14, 2026

AI Summary

  • SemEnrich addresses the bias in medical datasets where clinicians predominantly report abnormalities while omitting positive/neutral findings
  • The method uses semantic clustering of report sentences to automatically enrich training data with relevant observations from different clusters
  • Reported gains in testing: 5.63% on COMET score, 7.47% on RadGraph-F1, 7.40% on Sentence BLEU, and 5.30% on CheXbert-F1
  • Ablation studies indicated that the semantic clustering itself drives the improvements, rather than random data augmentation
  • Researchers also developed a way to incorporate semantic cluster information into reward design for GRPO training
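
The summary above does not give implementation details, so the following is only a rough sketch of how cluster-based enrichment and a cluster-aware reward might look. Everything in it is an assumption made for illustration: the keyword lookup `KEYWORD_TO_CLUSTER` stands in for a real semantic clustering step (for example, sentence embeddings followed by clustering), the `candidates` mapping of one sentence per cluster is hypothetical, and the F1-style `cluster_coverage_reward` is one plausible way to use cluster information in a reward, not the reward described in the paper.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a real semantic clustering step. A real system
# would likely embed sentences and cluster the embeddings; here a keyword
# lookup assigns each sentence to a named cluster.
KEYWORD_TO_CLUSTER: Dict[str, str] = {
    "heart": "cardiac",
    "cardiac": "cardiac",
    "lung": "lungs",
    "lungs": "lungs",
    "effusion": "pleura",
    "pleural": "pleura",
}


def assign_cluster(sentence: str) -> str:
    """Return the cluster of the first known keyword, or 'other'."""
    cleaned = sentence.lower().replace(".", " ").replace(",", " ")
    for token in cleaned.split():
        if token in KEYWORD_TO_CLUSTER:
            return KEYWORD_TO_CLUSTER[token]
    return "other"


def covered_clusters(report: List[str]) -> Set[str]:
    """Set of clusters mentioned by at least one sentence of the report."""
    return {assign_cluster(sentence) for sentence in report}


def enrich_report(report: List[str], candidates: Dict[str, str]) -> List[str]:
    """Append one candidate sentence for every cluster the report omits.

    Assumption: `candidates` holds sentences already judged relevant for
    this study; how relevance is decided is not stated in the summary.
    """
    covered = covered_clusters(report)
    extra = [
        sentence
        for cluster, sentence in sorted(candidates.items())
        if cluster not in covered
    ]
    return list(report) + extra


def cluster_coverage_reward(generated: List[str], reference: List[str]) -> float:
    """F1 overlap between the clusters covered by two reports (assumed form)."""
    gen, ref = covered_clusters(generated), covered_clusters(reference)
    overlap = len(gen & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


report = ["Small left pleural effusion."]
candidates = {
    "cardiac": "Heart size is normal.",
    "lungs": "Lungs are clear.",
    "pleura": "No pleural effusion.",
}
enriched = enrich_report(report, candidates)
reward = cluster_coverage_reward(["Lungs are clear."], enriched)
```

In this toy run the original report covers only the pleura cluster, so the cardiac and lung sentences are appended while the pleura candidate is skipped. A generated report that mentions only the lungs then scores partial credit against the enriched reference, which is the kind of signal a GRPO reward could in principle combine with other metrics.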