AIToday

Study reveals leading AI search agents rely on memorized knowledge rather than genuine web research, with performance collapsing on questions requiring current information

THE DECODER · 2d ago · 2 min read

3 Key Points

  1. Researchers from Harbin Institute of Technology and Xiaohongshu tested eleven AI models, including GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6, DeepSeek-V4-Pro, and Kimi-K2.6. The models solved a surprisingly high share of BrowseComp benchmark tasks without any internet access: MiniMax M2.5 reached 44.5 percent from memory alone, and Kimi-K2.6 hit 62 percent on BrowseComp-ZH, the Chinese variant.

  2. When search results contained no supporting documents, every model performed worse than it did without any tools at all. MiniMax M2.5 dropped from 44.5 to 8.0 percent and Kimi-K2.6 fell from 25.5 to 2.3 percent, which suggests that search actively pulls agents away from correct answers when confirming information is unavailable.

  3. On LiveBrowseComp, a new benchmark of 335 questions that depend on facts from the 90 days before its creation, all models scored below two percent in closed-book testing. With tools enabled, scores landed about 25 to 40 points below the same models' BrowseComp results, and rankings shifted significantly: DeepSeek v3.2 rose from last place on BrowseComp to first on LiveBrowseComp.
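To put the second key point in perspective, the quoted drops can be worked out as absolute and relative losses. The following is only a minimal arithmetic sketch in Python: the four accuracy figures are the ones quoted above, while the `drop` helper and the `scores` table are illustrative names that do not come from the study, and the sketch assumes that 44.5 and 25.5 are the respective no-tools scores.

```python
# Accuracy in percent, as quoted in the key points:
# (no tools, search with no supporting documents).
# Assumption: the first number in each pair is the model's no-tools score.
scores = {
    "MiniMax M2.5": (44.5, 8.0),
    "Kimi-K2.6": (25.5, 2.3),
}


def drop(no_tools: float, with_search: float) -> tuple[float, float]:
    """Return (absolute drop in points, relative drop in percent)."""
    absolute = no_tools - with_search
    relative = 100.0 * absolute / no_tools
    return round(absolute, 1), round(relative, 1)


for model, (no_tools, with_search) in scores.items():
    points, percent = drop(no_tools, with_search)
    print(f"{model}: -{points} points ({percent}% of no-tools accuracy lost)")
```

Under these assumptions, MiniMax M2.5 loses 36.5 points (about 82 percent of its no-tools accuracy) and Kimi-K2.6 loses 23.2 points (about 91 percent).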
