
Summaries like this, in your inbox every morning.
Sign up free →Cerebrium built Thalamus to route requests across GPU clusters spread across multiple regions, data centers, and providers when a single cluster cannot handle demand. Thalamus makes the routing decision in a budget of a few milliseconds.
Instead of querying each cluster during a request (which would exceed the latency budget), Thalamus uses a push model: each cluster runs a service called cluster-aggregator that continuously reports local state (replicas, capacity, GPU types, health, costs) to a central data store. Every Thalamus instance reads from a local, continuously-synced copy of that snapshot, keeping the request path free of remote calls.
Cerebrium uses Turso, a distributed SQLite-style database with embedded read-only replicas, to sync state snapshots to each Thalamus instance so routing decisions are made from a locally cached view rather than querying remote systems.
No comments yet. Be the first to share your thoughts!
Log in to join the discussion





Get curated AI news from 200+ sources delivered daily to your inbox. Free to use.
Get Started FreeFree · takes 30 seconds · unsubscribe anytime
5 minutes a day. The AI essentials.
200+ sources · Email / LINE / Slack