Large Language Models

A company cut its LLM costs while moving from Sonnet 4.0 to Opus 4.6 by adopting a tiered agent architecture that routes about 80% of CI failures away from the expensive model.

Hacker News · April 29, 2026

AI Summary

• **What happened**: The company analyzed around 4,000 CI failures; 818 were new problems and 3,187 were known issues resurfacing. It replaced a single Sonnet 4.0 agent with a two-tier system: Haiku agents handle triage (and ~65% of input tokens), and only about 1 in 5 failures is escalated to Opus 4.6 for deeper investigation.
• **How it works**: A cheap Haiku agent with access to semantic search (pgvector) and exact matching determines whether a failure is already tracked. If it is, the agent stops; if not, it escalates to Opus. Opus then spawns Haiku sub-agents with specific prompts to fetch logs, search code history, or investigate particular aspects, but the sub-agents cannot spawn further sub-agents. The orchestrator (Opus) plans and decides; the cheap agents execute focused tasks and return structured summaries.
• **So what**: A triager match costs around 25× less than a full investigation. Haiku handles ~65% of input tokens but accounts for only ~36% of the company's LLM spend. Without the model hierarchy, the daily bill would more than double. The company now pays less running Opus 4.6 than it did running everything on Sonnet 4.0.
• **Why it became possible now**: Six months ago, on Sonnet 4.0, the models struggled to write correct ClickHouse queries, and Haiku 4.0 was only useful for yes/no classification. Today Opus 4.6 can plan investigations and write precise sub-agent prompts, and Haiku 4.5 can handle narrow, directed tasks.
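
The routing described in the summary can be sketched roughly as follows. This is a minimal illustration, not the company's implementation: the `Failure` type, the `triage`, `investigate`, and `handle` functions, the `KNOWN` table, and the `ISSUE-1` id are all hypothetical, and a plain dict lookup stands in for the pgvector semantic search and exact matching. No model API is called; the planner and sub-agent are stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Failure:
    job: str
    signature: str  # hypothetical normalized error text


def triage(failure: Failure, known: Dict[str, str]) -> Optional[str]:
    """Cheap tier: return the tracked issue id, or None if the failure looks new.

    Assumption: a dict lookup stands in for exact matching plus semantic search.
    """
    return known.get(failure.signature)


def investigate(
    failure: Failure,
    plan: Callable[[Failure], List[str]],
    run_subagent: Callable[[str], str],
) -> str:
    """Expensive tier: plan focused prompts, delegate each, combine summaries.

    Sub-agents receive only a prompt string and return a summary string, so
    they have no handle with which to spawn further sub-agents.
    """
    summaries = [run_subagent(prompt) for prompt in plan(failure)]
    return "; ".join(summaries)


def handle(
    failure: Failure,
    known: Dict[str, str],
    plan: Callable[[Failure], List[str]],
    run_subagent: Callable[[str], str],
) -> str:
    """Stop at triage when the failure is already tracked, else escalate."""
    issue_id = triage(failure, known)
    if issue_id is not None:
        return f"known issue {issue_id}"
    return "new: " + investigate(failure, plan, run_subagent)


# Stub planner and sub-agent, for illustration only.
def demo_plan(failure: Failure) -> List[str]:
    return [
        f"fetch logs for {failure.job}",
        f"search code history for {failure.signature}",
    ]


def demo_subagent(prompt: str) -> str:
    return f"summary({prompt})"


KNOWN = {"timeout in test_db": "ISSUE-1"}  # hypothetical tracked issue

# Escalation rate implied by the counts in the summary: 818 new of 4,005 total.
escalation_rate = 818 / (818 + 3187)

if __name__ == "__main__":
    print(handle(Failure("build", "timeout in test_db"), KNOWN, demo_plan, demo_subagent))
    print(handle(Failure("build", "segfault"), KNOWN, demo_plan, demo_subagent))
    print(f"escalation rate: {escalation_rate:.1%}")
```

The counts in the summary give an escalation rate of 818 / 4,005, or about 20%, which matches the "1 in 5" figure. Keeping sub-agents as plain prompt-in, summary-out functions is one simple way to enforce the rule that only the orchestrator delegates.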
