What hardware does StreamTTS run on?

It runs on an NVIDIA Jetson Orin Nano Super, which has 1024 CUDA cores and 32 tensor cores and is rated at 67 TOPS (trillion operations per second).

How does the service avoid building complex backend infrastructure?

Instead of a request-response API backed by separate queues, databases, and object storage, StreamTTS uses S2-Lite durable streams as a single abstraction. The worker appends audio chunks to a named stream, and clients read from it at any offset, so live delivery and replay share the same code path.
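As a rough illustration of the append-and-read-at-offset idea, here is a minimal in-memory sketch. The `DurableStream` class, its method names, and the `job-123` stream name are hypothetical stand-ins chosen for this example; this is not the S2-Lite API, and a real durable stream would persist records to disk rather than to a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class DurableStream:
    """Toy in-memory stand-in for one named stream (hypothetical, not the S2-Lite API)."""
    name: str
    records: list[bytes] = field(default_factory=list)

    def append(self, chunk: bytes) -> int:
        """Store one record and return its offset (sequence number)."""
        self.records.append(chunk)
        return len(self.records) - 1

    def read(self, offset: int = 0) -> list[bytes]:
        """Return every record at or after `offset`; offset 0 replays from the start."""
        return self.records[offset:]


# Worker side: append audio chunks as the model produces them.
stream = DurableStream(name="job-123")
for chunk in (b"chunk-0", b"chunk-1", b"chunk-2"):
    stream.append(chunk)

# Client side: the same read call serves a full replay or a mid-stream join.
full_replay = stream.read(0)
late_join = stream.read(2)
```

Because both kinds of reader go through `read`, there is no separate "download finished file" path to maintain alongside the live one.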

What happens if I close my browser while audio is still generating?

The audio generation continues on the Jetson and is persisted in the durable stream. When you return to the link later, you can replay the entire audio from the beginning or jump to where you left off.
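One way to picture "jump to where you left off" is a client that remembers the next offset it needs. The sketch below assumes the client saves that cursor somewhere between visits (where it is stored is not described in the article), and `read_from` is a hypothetical helper over a plain list, not a real client call.

```python
def read_from(records: list[bytes], offset: int) -> tuple[list[bytes], int]:
    """Return records at or after `offset` plus the next offset to resume from."""
    batch = records[offset:]
    return batch, offset + len(batch)


# Records the worker has persisted so far (placeholder bytes, not real audio).
records = [b"a0", b"a1"]

# First visit: play what exists, then the tab is closed.
played, cursor = read_from(records, 0)

# Generation carries on without any client connected.
records += [b"a2", b"a3"]

# Return visit: resume from the saved cursor, or pass 0 to replay everything.
resumed, cursor = read_from(records, cursor)
replayed, _ = read_from(records, 0)
```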


Developer Builds Self-Hosted Text-to-Speech on Jetson Using Durable Streams

Hacker News · 8h ago · 4 min read

Key takeaway

A developer demonstrated how to run a text-to-speech AI service on a consumer-grade NVIDIA Jetson by using durable streams—a persistence pattern that lets multiple clients read the same generated audio incrementally, whether they connect during generation, arrive late, or return days later. Rather than building traditional backend infrastructure (queues, databases, object stores), the design treats each inference job as a named, persistent sequence of audio records that the browser client reads from the start and follows to the tail, unifying live and replay in a single code path.
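The "read from the start and follow to the tail" behaviour can be sketched with a single generator that serves both live and late listeners. Everything here is an assumption made for illustration: `FollowableStream`, `follow`, and the explicit `close` marker are hypothetical names, and the threading-based wait stands in for whatever mechanism the real service uses to notify readers.

```python
import threading


class FollowableStream:
    """Toy append-only stream whose readers can wait at the tail (hypothetical)."""

    def __init__(self) -> None:
        self._records: list[bytes] = []
        self._closed = False
        self._cond = threading.Condition()

    def append(self, chunk: bytes) -> None:
        with self._cond:
            self._records.append(chunk)
            self._cond.notify_all()

    def close(self) -> None:
        """Mark the job finished so followers stop waiting."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def follow(self, offset: int = 0):
        """Yield records from `offset`, waiting at the tail until the stream closes."""
        while True:
            with self._cond:
                while offset >= len(self._records) and not self._closed:
                    self._cond.wait()
                if offset >= len(self._records):
                    return  # closed and fully drained
                chunk = self._records[offset]
            offset += 1
            yield chunk


stream = FollowableStream()
live: list[bytes] = []

# A live listener starts following before any audio exists.
listener = threading.Thread(target=lambda: live.extend(stream.follow()))
listener.start()

for i in range(3):
    stream.append(f"chunk-{i}".encode())
stream.close()
listener.join()

# A late listener runs the same call after the job has finished.
late = list(stream.follow())
```

Both listeners end up with the same records in the same order, which is the property that lets one client code path cover live playback and replay.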


3 Key Points

What happened
A developer created StreamTTS, a text-to-speech application running on an NVIDIA Jetson Orin Nano Super (rated at 67 TOPS), powered by the Kokoro-82M neural model. The service uses S2-Lite, an open-source durable streams implementation, to handle inference jobs and deliver audio output as incremental, replayable streams rather than single request-response transactions.
Why it matters
The architecture decouples inference timing from client connections, allowing users to submit text, receive a shareable link immediately, and listen to audio as it is generated—even if they disconnect and return later. This approach avoids building separate queues, databases, object storage, and retry logic by unifying live delivery and replay under a single durable stream abstraction, which may be useful for developers building similar incremental AI workloads on resource-constrained hardware.
What to watch
The service is live at streamtts.dev and self-hosted on the developer's Jetson. The architecture relies on durable streams, each an ordered sequence of persisted records that clients can read from the beginning, seek to a known position, or follow live. That makes it relevant for anyone exploring how to serve local AI inference reliably without cloud dependencies.
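The decoupling described in the points above, where submitting text returns a shareable link immediately while inference continues in the background, might look roughly like this. All names are hypothetical: `fake_synthesize` is a placeholder for the TTS model, the `/listen/` URL shape is invented, and a dictionary of lists stands in for the durable stream store.

```python
import threading
import uuid

# Stream name -> persisted records (in-memory stand-in for a durable stream store).
streams: dict[str, list[bytes]] = {}
workers: dict[str, threading.Thread] = {}


def fake_synthesize(text: str):
    """Placeholder for the TTS model: yields one fake audio chunk per word."""
    for word in text.split():
        yield word.encode()


def run_job(job_id: str, text: str) -> None:
    """Worker: append each chunk to the job's stream as it is produced."""
    for chunk in fake_synthesize(text):
        streams[job_id].append(chunk)


def submit(text: str) -> str:
    """Create the stream, start inference in the background, return a link at once."""
    job_id = uuid.uuid4().hex
    streams[job_id] = []
    worker = threading.Thread(target=run_job, args=(job_id, text))
    workers[job_id] = worker
    worker.start()
    return f"/listen/{job_id}"  # hypothetical URL shape


link = submit("hello durable streams")
job_id = link.rsplit("/", 1)[-1]
workers[job_id].join()  # only so this demo can inspect the finished stream
```

The caller never waits on the model: the link is valid as soon as the stream exists, and readers pick up whatever has been appended by the time they arrive.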

