Researchers use AI coding agent to discover test-time scaling algorithms that outperform human-designed methods

THE DECODER · May 24, 2026

3 Key Points

A team from UMD, UVA, WUSTL, UNC, Google, and Meta created AutoTTS, which uses Claude Code to search for better control algorithms in a simulated environment instead of having humans write them by hand. The discovery run cost about $40 and took 160 minutes.
The agent-discovered algorithm tracks how a model's confidence shifts over multiple rounds and dynamically adjusts the number of solution paths (width) and their depth, logic the authors say humans probably would not have designed by hand. On math benchmarks such as AIME and HMMT, it achieves better accuracy per unit of compute than established methods, cutting token usage by about 70 percent compared with standard self-consistency while holding accuracy steady.
The algorithm transfers to different models (tested on DeepSeek-R1-Distill-Llama-8B) and different benchmark types (GPQA-Diamond), suggesting the approach generalizes beyond the initial discovery environment.
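
To make the idea of a confidence-tracking controller concrete, here is a minimal Python sketch. It is not the algorithm AutoTTS discovered, which the summary does not spell out. The function names (`vote_confidence`, `adaptive_controller`), the `sample(width, depth)` callback, the stopping threshold, and the "deepen if confidence rose, otherwise widen" rule are all illustrative assumptions, and compute is counted in abstract units of width times depth rather than real tokens.

```python
from collections import Counter
from typing import Callable, List, Tuple


def vote_confidence(answers: List[str]) -> Tuple[str, float]:
    """Return the majority answer and its share of all votes so far."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def adaptive_controller(
    sample: Callable[[int, int], List[str]],  # hypothetical: returns `width` answers
    max_rounds: int = 4,
    width: int = 4,
    depth: int = 1,
    stop_at: float = 0.75,
) -> Tuple[str, float, int]:
    """Toy controller: stop early when votes agree, otherwise adapt width/depth.

    Assumed rule (not from the paper): if confidence rose since the last
    round, spend more depth per path; if it stalled or fell, double the width.
    """
    answers: List[str] = []
    prev_conf = 0.0
    spent = 0  # abstract compute units: width * depth per round
    top, conf = "", 0.0
    for _ in range(max_rounds):
        answers.extend(sample(width, depth))
        spent += width * depth
        top, conf = vote_confidence(answers)
        if conf >= stop_at:
            break  # enough agreement, stop spending compute
        if conf > prev_conf:
            depth += 1  # confidence is improving: think longer per path
        else:
            width *= 2  # confidence stalled: try more independent paths
        prev_conf = conf
    return top, conf, spent
```

With a sampler that always agrees, this toy controller stops after the first round, whereas fixed-width self-consistency would spend its full sample budget regardless. That early-exit behavior is the kind of saving the summary describes, though the real method's logic and numbers come from the paper, not from this sketch.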
