Large Language Models · AI Regulation and Policy

Researchers develop a 524-item benchmark to measure how well large language models monitor their own accuracy across six cognitive domains.

arXiv cs.CL · April 20, 2026

AI Summary

• The Metacognitive Monitoring Battery applies frameworks from human psychology to evaluate how well 20 frontier LLMs monitor their own accuracy, for 10,480 evaluations in total (524 items × 20 models)
• The tests span six domains: learning, metacognitive calibration, social cognition, attention, executive function, and prospective regulation. Each is based on an established experimental paradigm
• After each answer, the model is asked to KEEP or WITHDRAW its response and to place a BET. The key metric is the "withdraw delta", the difference in withdrawal rates between incorrect and correct answers
• Five of the six task groups were pre-registered on the Open Science Framework before data collection to ensure methodological rigor
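
The withdraw delta described above can be illustrated with a minimal Python sketch. The summary does not give the paper's exact formula, so the sign convention here (withdrawal rate on incorrect answers minus withdrawal rate on correct answers) is an assumption, and the `Trial` record and `withdraw_delta` function are hypothetical names introduced only for illustration. They are not taken from the benchmark's code.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One evaluated item (hypothetical record, not the benchmark's schema)."""
    correct: bool   # was the model's answer correct?
    withdrew: bool  # did the model choose WITHDRAW rather than KEEP?


def withdraw_delta(trials: list[Trial]) -> float:
    """Withdrawal rate on incorrect answers minus the rate on correct answers.

    Assumed sign convention: a positive value means the model withdraws
    wrong answers more often than right ones.
    """
    wrong = [t for t in trials if not t.correct]
    right = [t for t in trials if t.correct]
    if not wrong or not right:
        # The difference is undefined unless both groups are non-empty.
        raise ValueError("need at least one correct and one incorrect trial")
    rate_wrong = sum(t.withdrew for t in wrong) / len(wrong)
    rate_right = sum(t.withdrew for t in right) / len(right)
    return rate_wrong - rate_right


# Made-up example: 3 of 4 wrong answers withdrawn, 1 of 4 right answers withdrawn.
example = (
    [Trial(correct=False, withdrew=True)] * 3
    + [Trial(correct=False, withdrew=False)]
    + [Trial(correct=True, withdrew=True)]
    + [Trial(correct=True, withdrew=False)] * 3
)
print(withdraw_delta(example))  # 0.75 - 0.25 = 0.5
```

Under this convention, a delta near zero would suggest that the decision to withdraw carries little information about correctness, while a larger positive delta would suggest the model can tell its wrong answers from its right ones.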
