
Sail Research, a startup offering cheaper batch inference for AI agents, has closed Series A funding. The company routes requests to open-source models at dramatically lower cost than real-time services—GLM-5.1 on Sail costs 6× less per token than Anthropic's Haiku—by using spare server capacity and queuing requests instead of reserving capacity per user. As AI agents move from chat assistants to background workers processing data overnight, this batch-focused approach may become the dominant inference model.
What happened
Sail Research, founded by Neil Movva and Samir Menon, announced a Series A investment alongside Kleiner Perkins, Redpoint, and Sequoia. The company routes asynchronous inference requests across open models like DeepSeek, Qwen, Kimi, and GLM, selecting the cheapest capable model for each task. GLM-5.1 on Sail costs 6× less per token than Anthropic's Haiku.
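To make "selecting the cheapest capable model for each task" concrete, here is a minimal sketch of one way such a rule could work. It is not Sail's actual router: the `Model` type, the capability tags, the model names, and the prices are all hypothetical placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    price_per_million_tokens: float  # hypothetical price in USD
    capabilities: frozenset          # hypothetical capability tags


def cheapest_capable(models: Iterable[Model], required: set) -> Optional[Model]:
    """Return the lowest-priced model whose capabilities cover `required`."""
    capable = [m for m in models if required <= m.capabilities]
    return min(capable, key=lambda m: m.price_per_million_tokens, default=None)


# Placeholder catalog: names, prices, and tags are invented, not real pricing.
CATALOG = [
    Model("model-a", 0.20, frozenset({"summarize"})),
    Model("model-b", 0.50, frozenset({"summarize", "code-review"})),
    Model("model-c", 1.50, frozenset({"summarize", "code-review", "long-context"})),
]

choice = cheapest_capable(CATALOG, {"code-review"})
print(choice.name if choice else "no capable model")  # prints "model-b"
```

A production router would presumably also weigh queue depth, deadlines, and output quality. This sketch only shows the filter-then-minimize step.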
Why it matters
As AI agents shift from chat assistants into background workers running overnight tasks, most inference workloads will likely flow through batch queues rather than real-time systems. Batch inference costs far less because it uses spot capacity and idle server time instead of reserving capacity per request, making it economically viable for long-running tasks like code review, research, and document processing.
What to watch
Sailboxes—cloud computers that hold state across agent tasks, pause during inference waits, and resume in seconds—let customers pay only for active compute time. Sail has already served trillions of tokens to customers in code review, deep research, and cybersecurity.
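The pay-only-for-active-time idea can be illustrated with a little arithmetic. This is a sketch under assumptions, not Sail's billing logic: the timeline below and the `billable_seconds` helper are hypothetical.

```python
def billable_seconds(active_intervals):
    """Sum the durations of (start, end) intervals, in seconds, when the box was active."""
    return sum(end - start for start, end in active_intervals)


# Hypothetical timeline: the box runs for 30 s, pauses for 10 minutes while
# a batch inference request is queued, then resumes for another 30 s.
intervals = [(0, 30), (630, 660)]

wall_clock = intervals[-1][1] - intervals[0][0]
print(billable_seconds(intervals), wall_clock)  # prints "60 660"
```

In this made-up case the task spans 660 seconds of wall-clock time, but only 60 seconds would count as active compute.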