Study audits four LLMs for reliability in psychiatric hospitalization risk assessment, finding that clinically insignificant variables increase predicted risk scores and output variability across all models.

arXiv cs.LG · April 27, 2026

AI Summary

  • Researchers evaluated Gemini 2.5 Flash, LLaMa 3.3 70b, Claude Sonnet 4.6, and GPT-4o mini using synthetic patient profiles (n = 50) with 15 clinically relevant features and up to 50 clinically insignificant features, tested across four prompt reframings (neutral, logical, human impact, clinical judgment).
  • Including medically insignificant variables resulted in a statistically significant increase in absolute mean predicted hospitalization risk and output variability across all models and prompts, indicating reduced predictive stability as contextual noise increased. Prompt variations independently affected the trajectory of instability in a model-dependent manner.
  • The findings demonstrate that LLM-based psychiatric risk assessments are sensitive to non-clinical information, highlighting the need for systematic evaluations of attributional stability and uncertainty behavior before clinical deployment.
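
The audit design summarized above (score each profile with and without clinically insignificant features, then compare mean risk and spread) can be sketched roughly as follows. This is a minimal illustration, not the study's code: `toy_score` is a hypothetical stand-in for an LLM call, and the feature names, noise levels, and the 0.004 drift constant are assumptions chosen only to make the example run.

```python
import random
import statistics
from typing import Callable, Dict, List

Profile = Dict[str, float]


def make_profiles(n: int, n_relevant: int, rng: random.Random) -> List[Profile]:
    """Build n synthetic profiles with n_relevant placeholder clinical features."""
    return [
        {f"clinical_{j}": rng.random() for j in range(n_relevant)}
        for _ in range(n)
    ]


def add_noise_features(profile: Profile, k: int, rng: random.Random) -> Profile:
    """Return a copy of the profile with k extra clinically insignificant features."""
    noisy = dict(profile)
    for i in range(k):
        noisy[f"irrelevant_{i}"] = rng.random()
    return noisy


def toy_score(profile: Profile) -> float:
    """Hypothetical scorer standing in for a model's predicted risk in [0, 1].

    It drifts upward with the number of irrelevant features purely so the
    audit has something to detect; a real audit would query the model here.
    """
    relevant = [v for name, v in profile.items() if not name.startswith("irrelevant_")]
    n_noise = len(profile) - len(relevant)
    return min(1.0, statistics.mean(relevant) + 0.004 * n_noise)


def audit_stability(
    profiles: List[Profile],
    score: Callable[[Profile], float],
    noise_levels: List[int],
    seed: int = 0,
) -> Dict[int, Dict[str, float]]:
    """Compare scores at each noise level against the noise-free baseline."""
    rng = random.Random(seed)
    baseline = [score(p) for p in profiles]
    report: Dict[int, Dict[str, float]] = {}
    for k in noise_levels:
        scores = [score(add_noise_features(p, k, rng)) for p in profiles]
        shifts = [abs(s - b) for s, b in zip(scores, baseline)]
        report[k] = {
            "mean_risk": statistics.mean(scores),
            "mean_abs_shift": statistics.mean(shifts),
            "stdev": statistics.stdev(scores),
        }
    return report


# Sizes mirror the summary (50 profiles, 15 relevant features, up to 50 noise features).
profiles = make_profiles(n=50, n_relevant=15, rng=random.Random(42))
report = audit_stability(profiles, toy_score, noise_levels=[0, 10, 25, 50])
```

In a real evaluation, `score` would wrap a model call and the whole loop would be repeated for each prompt reframing. A stable model would show `mean_abs_shift` near zero at every noise level.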
