Study reveals leading AI search agents rely on memorized knowledge rather than genuine web research, with performance collapsing on questions requiring current information

THE DECODER · May 31, 2026

3 Key Points

Researchers from Harbin Institute of Technology and Xiaohongshu tested eleven AI models, including GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6, DeepSeek-V4-Pro, and Kimi-K2.6, and found that the models solved a surprisingly large share of BrowseComp benchmark tasks without any internet access: MiniMax M2.5 reached 44.5 percent from memory alone, and Kimi-K2.6 hit 62 percent on the Chinese BrowseComp-ZH variant.
When search results contained no supporting documents, every model performed worse than without any tools at all; MiniMax M2.5 dropped from 44.5 to 8.0 percent, and Kimi-K2.6 fell from 25.5 to 2.3 percent, showing that search actively pulls agents away from correct answers when confirming information is unavailable.
On LiveBrowseComp, a new benchmark of 335 questions that depend on facts from the 90 days before its creation, every model scored below two percent accuracy in closed-book testing. With tools enabled, scores landed about 25 to 40 points below the same models' BrowseComp results, and the rankings shifted significantly: DeepSeek v3.2 rose from last place on BrowseComp to first on LiveBrowseComp.
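
To make the size of the second finding concrete, the short Python sketch below computes the absolute and relative accuracy drops from the figures quoted above. It is only an illustration of the arithmetic, not the researchers' evaluation code; the names `scores`, `point_drop`, and `relative_drop` are hypothetical.

```python
# Accuracy figures (percent) as quoted in the key points above.
# "closed_book": no tools at all.
# "no_support": search enabled, but results contain no supporting documents.
scores = {
    "MiniMax M2.5": {"closed_book": 44.5, "no_support": 8.0},
    "Kimi-K2.6": {"closed_book": 25.5, "no_support": 2.3},
}


def point_drop(closed_book: float, no_support: float) -> float:
    """Absolute drop in percentage points, rounded to one decimal."""
    return round(closed_book - no_support, 1)


def relative_drop(closed_book: float, no_support: float) -> float:
    """Drop as a percentage of the closed-book score, rounded to one decimal."""
    return round(100 * (closed_book - no_support) / closed_book, 1)


# Hypothetical summary table: model -> (points lost, share of closed-book score lost).
drops = {
    model: (point_drop(**s), relative_drop(**s))
    for model, s in scores.items()
}

for model, (points, share) in drops.items():
    print(f"{model}: -{points} points ({share}% of closed-book accuracy lost)")
```

On these figures, MiniMax M2.5 loses 36.5 points and Kimi-K2.6 loses 23.2 points, which is roughly 82 and 91 percent of their respective closed-book accuracy.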
