
A fact-checker at WIRED tested AI models (ChatGPT, Claude, Gemini, and Grok) on a professional fact-checking task. None of the models actually performed the fact-checking work; they all provided plans but stopped short of executing them.
Recent benchmarks show stark accuracy gaps: Claude led RealFactBench with 73 percent accuracy, but OpenAI's SimpleQA found that none of the tested models exceeded 50 percent accuracy. A March 2025 Tow Center study found that more than 60 percent of AI-powered search engine responses were inaccurate, while a BBC study put chatbot error rates closer to 45 percent.
For verification work, WIRED's fact-checking desk mainly encounters AI through AI Overviews in Google search, which the author assesses as wrong about a third of the time. A 2025 Association for the Advancement of Artificial Intelligence report found that 60 percent of surveyed researchers doubted the 'factuality' problem would be solved anytime soon.