A company cut its LLM costs even while upgrading its top model from Sonnet 4.0 to Opus 4.6, thanks to a tiered agent architecture that routes about 80% of CI failures away from the expensive model.

Hacker News · April 29, 2026

AI summary

  • **What happened**: The company analyzed around 4,000 CI failures: 818 were new problems and 3,187 were known issues surfacing again. It replaced a single Sonnet 4.0 agent with a two-tier system in which Haiku triage agents (which handle ~65% of input tokens) escalate only about 1 in 5 failures to Opus 4.6 for deeper investigation.
  • **How it works**: A cheap Haiku agent with access to semantic search (pgvector) and exact matching determines whether a failure is already tracked. If it is, triage stops there; if not, the failure is escalated to Opus. Opus then spawns Haiku sub-agents with specific prompts to fetch logs, search code history, or investigate particular aspects, but the sub-agents cannot spawn further sub-agents. The orchestrator (Opus) plans and decides; the cheap agents execute focused tasks and return structured summaries.
  • **So what**: A triage match costs roughly 1/25 as much as a full investigation. Haiku handles ~65% of input tokens but accounts for only ~36% of the company's LLM spend. Without the model hierarchy, the daily bill would more than double. The company now pays less running Opus 4.6 than it did running everything on Sonnet 4.0.
  • **Why it became possible now**: Six months ago, on Sonnet 4.0, the models struggled to write correct ClickHouse queries, and Haiku 4.0 was only useful for yes/no classification. Today Opus 4.6 can plan investigations and write precise sub-agent prompts, and Haiku 4.5 can handle narrow, directed tasks.
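The triage-then-escalate flow described in the summary can be sketched in Python. This is a minimal, hypothetical illustration and not the company's code. Every name here (`KnownIssue`, `triage`, `investigate`, `run_subagent`, `handle_failure`) is an assumption made for this sketch, as are the 0.95 similarity threshold and the letter-frequency "embedding." Model calls are stubbed so the routing logic runs on its own. In a real system, the semantic lookup would be a pgvector query, and the two tiers would call Haiku and Opus.

```python
"""Hypothetical sketch of a two-tier CI-failure triage pipeline.

All names, the threshold, and the toy embedding are illustrative
assumptions; LLM calls are replaced by stubs.
"""
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class KnownIssue:
    issue_id: str
    signature: str          # exact error text used for exact matching
    embedding: list[float]  # vector that would live in a pgvector column


def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: a letter-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def triage(log_line: str, known: list[KnownIssue], threshold: float = 0.95) -> str | None:
    """Cheap tier (Haiku in the article): return a tracked issue id, or None."""
    for issue in known:                       # 1. exact match first
        if issue.signature == log_line:
            return issue.issue_id
    query = embed(log_line)                   # 2. then nearest-neighbour search
    best = max(known, key=lambda i: cosine(query, i.embedding), default=None)
    if best is not None and cosine(query, best.embedding) >= threshold:
        return best.issue_id
    return None                               # not tracked: escalate


def run_subagent(task: str) -> dict:
    # Stub for a cheap sub-agent: does one focused task and returns a
    # structured summary. It has no way to spawn further sub-agents.
    return {"task": task, "summary": f"result of {task!r}"}


def investigate(log_line: str) -> list[dict]:
    """Expensive tier (Opus in the article): plan, then delegate."""
    plan = ["fetch logs", "search code history"]  # hypothetical fixed plan
    return [run_subagent(f"{step} for {log_line!r}") for step in plan]


def handle_failure(log_line: str, known: list[KnownIssue]) -> dict:
    issue_id = triage(log_line, known)
    if issue_id is not None:
        return {"status": "known", "issue": issue_id}  # stop at the cheap tier
    return {"status": "escalated", "findings": investigate(log_line)}
```

For example, with one tracked issue `KnownIssue("FLAKY-12", "timeout in test_upload", embed("timeout in test_upload"))`, the input `"Timeout in test_upload"` misses the exact match but is caught by the similarity search. By contrast, `"segfault in parser"` is escalated. Because `run_subagent` cannot call `investigate`, delegation stops after one level, which mirrors the limit described in the article: only the orchestrator plans, and sub-agents just return summaries.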
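The counts in the summary can be checked with a little arithmetic. The sketch below makes several simplifying assumptions of its own. A triage match costs 1 unit, and a full investigation costs about 25 units (the article's approximate ratio), with that figure already including its triage step. The comparison it prints is against investigating every failure, which is a different baseline from the article's "without the model hierarchy" figure.

```python
"""Back-of-the-envelope routing arithmetic from the article's counts.

Assumptions made here: one triage match = 1.0 unit, and a full
investigation (~25 units) already includes its triage step.
"""

NEW_FAILURES = 818          # escalated to the expensive tier
KNOWN_FAILURES = 3_187      # stopped by the triager
INVESTIGATION_COST = 25.0   # in triage-match units (approximate)


def escalation_rate(new: int, known: int) -> float:
    return new / (new + known)


def expected_cost_per_failure(rate: float, investigation_cost: float) -> float:
    # Known failures cost one triage match; escalated ones cost a full
    # investigation, weighted by how often each case occurs.
    return (1 - rate) * 1.0 + rate * investigation_cost


rate = escalation_rate(NEW_FAILURES, KNOWN_FAILURES)
avg = expected_cost_per_failure(rate, INVESTIGATION_COST)
print(f"escalation rate: {rate:.1%}")
print(f"average cost per failure: {avg:.2f} triage units")
print(f"vs. investigating every failure: {avg / INVESTIGATION_COST:.0%}")
```

Under these assumptions, the script prints an escalation rate of 20.4% (the article's "1 in 5"), an average of about 5.90 triage units per failure, and roughly 24% of the cost of sending every failure to a full investigation.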
