Cerebrium introduces Thalamus, a distributed router that directs AI inference requests across multiple global clusters based on latency, capacity, and cost.

Hacker News · Jun 8, 2026

3 Key Points

Cerebrium built Thalamus to route requests across GPU clusters in multiple regions, data centers, and providers when a single cluster cannot handle demand. Thalamus makes each routing decision within a budget of a few milliseconds.
Instead of querying each cluster during a request (which would exceed the latency budget), Thalamus uses a push model: each cluster runs a service called cluster-aggregator that continuously reports local state (replicas, capacity, GPU types, health, costs) to a central data store. Every Thalamus instance reads from a local, continuously-synced copy of that snapshot, keeping the request path free of remote calls.
Cerebrium uses Turso, a distributed SQLite-style database with embedded read-only replicas, to sync the state snapshot to each Thalamus instance, so routing decisions are made from a locally cached view rather than by querying remote systems.
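
The push model in the points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cerebrium's implementation: Python's built-in `sqlite3` module stands in for Turso, `Connection.backup` stands in for replica sync, and the table layout, function names, and scoring weights are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per cluster, overwritten on every report.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cluster_state (
    cluster        TEXT PRIMARY KEY,
    healthy        INTEGER NOT NULL,
    free_replicas  INTEGER NOT NULL,
    latency_ms     REAL NOT NULL,
    cost_per_hour  REAL NOT NULL
)
"""


def new_store():
    """Create an in-memory SQLite database (stand-in for a Turso database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def report_state(central, cluster, healthy, free_replicas, latency_ms, cost_per_hour):
    """What a cluster-aggregator might push: its latest local state."""
    with central:  # commits on success, rolls back on error
        central.execute(
            "INSERT OR REPLACE INTO cluster_state VALUES (?, ?, ?, ?, ?)",
            (cluster, int(healthy), free_replicas, latency_ms, cost_per_hour),
        )


def sync_replica(central, replica):
    """Copy the central snapshot into the router's local replica.

    Assumption: a full copy via sqlite3's backup API is used here only to
    simulate the continuous sync an embedded replica would perform.
    """
    central.backup(replica)


def pick_cluster(replica, latency_weight=1.0, cost_weight=10.0):
    """Route using only the local replica: no remote calls on this path.

    The weighted score is an assumed example, not the article's formula.
    Lower score wins; ties break on cluster name for determinism.
    """
    rows = replica.execute(
        "SELECT cluster FROM cluster_state "
        "WHERE healthy = 1 AND free_replicas > 0 "
        "ORDER BY latency_ms * ? + cost_per_hour * ?, cluster "
        "LIMIT 1",
        (latency_weight, cost_weight),
    ).fetchall()
    return rows[0][0] if rows else None


if __name__ == "__main__":
    central, replica = new_store(), new_store()
    report_state(central, "us-east", True, 4, 20.0, 2.0)
    report_state(central, "eu-west", True, 0, 5.0, 1.0)   # no free capacity
    report_state(central, "ap-south", True, 2, 35.0, 1.0)
    sync_replica(central, replica)
    print(pick_cluster(replica))

    # us-east turns unhealthy; the router only notices after the next sync.
    report_state(central, "us-east", False, 4, 20.0, 2.0)
    print(pick_cluster(replica))
    sync_replica(central, replica)
    print(pick_cluster(replica))
```

In this sketch the router's view can lag the true cluster state by up to one sync interval, which is presumably the trade-off accepted in exchange for keeping the request path free of remote calls.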
