AIToday

Cerebrium introduces Thalamus, a distributed router that directs AI inference requests across multiple global clusters based on latency, capacity, and cost.

Hacker News · 2d ago · 2 min read

3 Key Points

  1. Cerebrium built Thalamus to route requests across GPU clusters in multiple regions, data centers, and providers when a single cluster cannot handle demand. Thalamus makes each routing decision within a budget of a few milliseconds.

  2. Querying every cluster during a request would exceed that latency budget, so Thalamus uses a push model: each cluster runs a service called cluster-aggregator that continuously reports local state (replicas, capacity, GPU types, health, costs) to a central data store. Every Thalamus instance reads from a local, continuously synced copy of that snapshot, which keeps the request path free of remote calls.

  3. Cerebrium uses Turso, a distributed SQLite-style database with embedded read-only replicas, to sync the state snapshot to each Thalamus instance, so routing decisions are made from a locally cached view rather than by querying remote systems.
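
The push model in points 2 and 3 can be illustrated with a minimal sketch. It is not Cerebrium's implementation: an in-memory SQLite database from Python's standard library stands in for the Turso embedded replica, and the table schema, the function names (push_state, pick_cluster), and the scoring weights are all hypothetical, chosen only to show a routing decision made entirely from local state.

```python
import sqlite3

# Hypothetical schema; the article does not describe the real one.
SCHEMA = """
CREATE TABLE cluster_state (
    cluster TEXT PRIMARY KEY,
    healthy INTEGER NOT NULL,
    free_replicas INTEGER NOT NULL,
    latency_ms REAL NOT NULL,
    cost_per_hour REAL NOT NULL
)
"""


def push_state(db, cluster, healthy, free_replicas, latency_ms, cost_per_hour):
    """Stand-in for a cluster-aggregator report: store the latest state row."""
    db.execute(
        "INSERT OR REPLACE INTO cluster_state VALUES (?, ?, ?, ?, ?)",
        (cluster, int(healthy), free_replicas, latency_ms, cost_per_hour),
    )


def pick_cluster(db, latency_weight=1.0, cost_weight=10.0):
    """Choose a cluster from the local snapshot only (no remote calls)."""
    rows = db.execute(
        "SELECT cluster, latency_ms, cost_per_hour FROM cluster_state "
        "WHERE healthy = 1 AND free_replicas > 0"
    ).fetchall()
    if not rows:
        return None
    # Lower score is better; the weights are illustrative assumptions.
    best = min(rows, key=lambda r: latency_weight * r[1] + cost_weight * r[2])
    return best[0]


# In-memory SQLite plays the role of the locally synced replica (assumption).
db = sqlite3.connect(":memory:")
db.execute(SCHEMA)

push_state(db, "us-east", True, 4, 20.0, 2.5)    # score 20 + 25 = 45
push_state(db, "eu-west", True, 2, 30.0, 1.0)    # score 30 + 10 = 40
push_state(db, "ap-south", True, 0, 10.0, 1.0)   # no free capacity, skipped
push_state(db, "us-west", False, 8, 5.0, 0.5)    # unhealthy, skipped

first_choice = pick_cluster(db)

# A later report from us-east lowers its cost; the next decision reflects it.
push_state(db, "us-east", True, 4, 20.0, 1.0)    # score 20 + 10 = 30
second_choice = pick_cluster(db)
```

Here first_choice is "eu-west" and second_choice is "us-east". The point of the design is that pick_cluster only ever runs a local query, so its cost does not grow with network distance to the clusters; freshness depends on how often state is pushed and synced, which the summary does not quantify.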
