A company cut its LLM costs while moving from Sonnet 4.0 to Opus 4.6 by adopting a tiered agent architecture that routes about 80% of CI failures away from the expensive model.
Hacker News · April 29, 2026
AI summary
•**What happened**: The company analyzed around 4,000 CI failures: 818 were new problems and 3,187 were known issues resurfacing. It replaced a single Sonnet 4.0 agent with a two-tier system in which Haiku agents handle triage (about 65% of input tokens) and only about 1 in 5 failures (818 of 4,005) is escalated to Opus 4.6 for deeper investigation.
•**How it works**: A cheap Haiku agent with access to semantic search (pgvector) and exact matching determines whether a failure is already tracked. If it is, the agent stops; if not, it escalates to Opus. Opus then spawns Haiku sub-agents with specific prompts to fetch logs, search code history, or investigate particular aspects, but those sub-agents cannot spawn sub-agents of their own. The orchestrator (Opus) plans and decides; the cheap agents execute focused tasks and return structured summaries.
•**So what**: A triager match costs roughly 1/25 as much as a full investigation. Haiku processes ~65% of input tokens but accounts for only ~36% of the company's LLM spend. Without the model hierarchy, the daily bill would more than double. The company now pays less running Opus 4.6 than it did running everything on Sonnet 4.0.
•**Why it became possible now**: Six months earlier, with Sonnet 4.0, the models struggled to write correct ClickHouse queries, and Haiku 4.0 was only useful for yes/no classification. Today Opus 4.6 can plan investigations and write precise sub-agent prompts, and Haiku 4.5 can handle narrow, directed tasks.
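
The routing described under "How it works" can be sketched in a few lines of Python. This is only an illustration of the control flow, not the company's implementation: the `Agent` class, `is_known`, `handle`, the task names, and the `semantic_match` callback (standing in for a pgvector similarity lookup) are all hypothetical, and no model or database is actually called.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Agent:
    model: str        # illustrative label only, e.g. "haiku" or "opus"
    task: str = ""    # the focused prompt this agent was given
    depth: int = 0    # 0 = orchestrator, 1 = sub-agent

    def spawn(self, model: str, task: str) -> "Agent":
        # Sub-agents (depth >= 1) may not spawn further sub-agents.
        if self.depth >= 1:
            raise PermissionError("sub-agents cannot spawn sub-agents")
        return Agent(model=model, task=task, depth=self.depth + 1)


def is_known(signature: str,
             tracked: set[str],
             semantic_match: Callable[[str], Optional[str]]) -> bool:
    """Triage step: exact match first, then a semantic lookup.

    `semantic_match` is a hypothetical stand-in for a vector search; it
    returns the id of a similar tracked issue, or None if nothing matches.
    """
    if signature in tracked:
        return True
    return semantic_match(signature) is not None


def handle(signature: str,
           tracked: set[str],
           semantic_match: Callable[[str], Optional[str]]) -> dict:
    # Cheap tier: stop here if the failure is already tracked.
    if is_known(signature, tracked, semantic_match):
        return {"route": "known", "summaries": []}

    # Expensive tier: the orchestrator plans, cheap sub-agents execute.
    orchestrator = Agent(model="opus")
    summaries = []
    for task in ("fetch logs", "search code history"):  # assumed task list
        worker = orchestrator.spawn("haiku", task)
        # A real worker would call a model; here we only record the plan.
        summaries.append({"model": worker.model, "task": worker.task})
    return {"route": "escalated", "summaries": summaries}
```

Calling `handle` with a signature that is in `tracked` returns at the triage step without creating an orchestrator, which is the path the summary says most failures take. The one-level depth check in `spawn` is one simple way to enforce the rule that sub-agents cannot delegate further.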