A developer released state-harness, a tool that detects when AI agents spiral out of control by monitoring token consumption, without requiring extra AI calls to diagnose what went wrong.

Hacker News3h ago3 min read

Summaries like this, in your inbox every morning.

3 Key Points

1
What happened: State-harness is a Python library that monitors multi-turn AI agent loops by tracking token usage against a baseline. When token growth exceeds a threshold (e.g., 2× baseline) for consecutive steps, it trips and classifies the failure pattern—context spiral, retry storm, policy drift, early explosion, or budget exhaustion—with actionable fix suggestions, all inferred from the energy trajectory alone without calling an LLM.
2
Why it matters: Production multi-agent systems fail at rates of 41–87% (Kore.ai 2026), and when they spiral, a budget cap kills the task but reveals nothing about why. State-harness provides failure diagnostics and early termination; in benchmark tests across 3,175 runs, it reduced search nodes by 38.6% and wall time by 30% on SWE-bench by stopping dead-end branches early, achieving these gains with zero false positives across seven models.
3
What to watch: The tool is available now via `pip install state-harness` with pre-built wheels for Linux, macOS, and Windows (x86_64 + ARM64), requiring Python ≥ 3.10 and no Rust toolchain. It integrates with LangGraph, CrewAI, and vanilla Python, and is most useful for search-tree agents (MCTS, beam search) and platform teams; it is not needed for chatbots, RAG, or ReAct loops with fewer than 10 turns.

No comments yet. Be the first to share your thoughts!

Get curated AI news from 200+ sources delivered daily to your inbox. Free to use.

Free · takes 30 seconds · unsubscribe anytime

5 minutes a day. The AI essentials.

200+ sources · Email / LINE / Slack