DeepSeek's DSpark software undermines Nvidia's new hardware decode bet

Yahoo Finance AI · 14h ago · 5 min read

Key takeaway

DeepSeek released DSpark, an open-source software module that it says speeds up per-user AI text generation by 60–85%, just as Nvidia prepares to ship specialized decode hardware (Groq 3 LPX) designed to address that same bottleneck. The timing threatens Nvidia's plan to sell LPX as a required add-on to its Vera Rubin GPU systems: customers may get sufficient performance gains from free software running on GPUs alone, or may choose hyperscaler-native alternatives such as the AWS-Cerebras collaboration launching in the same timeframe.

3 Key Points

What happened
DeepSeek released DSpark on June 27, 2026, a speculative decoding module that, by DeepSeek's own benchmarks, improves per-user generation speed by 60% to 85% on V4-Flash and 57% to 78% on V4-Pro. The module is open-sourced under the MIT license, and its companion DeepSpec training framework already extends to the Qwen and Gemma model families. Meanwhile, Nvidia is shipping its Groq 3 LPX specialized decode rack in the second half of 2026, designed to pair with Vera Rubin GPU systems and deliver up to 35 times higher inference throughput per megawatt for trillion-parameter models.
Why it matters
Nvidia's strategy assumes customers will make a separate purchase decision for LPX decode hardware on top of their existing GPU investment. DSpark and DeepSeek's MLA architecture—which reduces memory requirements to roughly 10% of what V3.2 needed for million-token conversations—are free, open-source, and already in production. If software solutions running on standard Rubin GPUs alone deliver sufficient efficiency, the business case for LPX becomes harder to justify. AWS and Cerebras have also announced a competing disaggregated prefill-decode architecture launching in the same timeframe, widening the competitive field.
What to watch
Nvidia's full-year fiscal 2027 Data Center revenue consensus is near $343 billion (about ¥55 trillion). The critical question is whether LPX attach rates meet expectations as customers choose among a second rack purchase, general GPU-based software optimization, and hyperscaler-native decode alternatives. Geopolitical protections secure some hardware supply, but they do not cover the open-source ideas behind DSpark and similar software approaches.
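
For readers unfamiliar with the technique named under "What happened", the following is a minimal sketch of generic greedy speculative decoding. It is not DSpark's implementation, which the article does not describe. The two toy "models" (`target_next`, `draft_next`) and the draft length `k` are hypothetical stand-ins chosen for illustration. The idea is that a cheap draft model proposes several tokens, and the expensive target model only has to verify them, so it produces more than one token per pass while the final text stays identical to what the target alone would have written.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def target_next(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: a deterministic toy rule, not a real LLM."""
    return (ctx[-1] * 3 + len(ctx)) % 11


def draft_next(ctx: List[Token]) -> Token:
    """Hypothetical 'small' model: matches the target except at every 4th position."""
    guess = target_next(ctx)
    return (guess + 1) % 11 if len(ctx) % 4 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Draft proposes up to k tokens; target keeps the longest prefix it agrees with."""
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The draft model proposes a short continuation, token by token.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position. In a real system this is
        #    a single batched forward pass, so it is counted as one pass here.
        target_passes += 1
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # First disagreement: keep the agreed prefix plus the target's token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out, target_passes


baseline = greedy_decode(target_next, [1], 20)
spec, passes = speculative_decode(target_next, draft_next, [1], 20)
print("identical output:", spec == baseline)
print("target passes:", passes, "vs", 20, "without drafting")
```

The speed-up in such a scheme depends on how often the draft agrees with the target, which is why reported gains are typically given as a range rather than a single number.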

FAQ

When will Nvidia's Groq 3 LPX become available? LPX is shipping to early customers in the second half of 2026, on the same timeline as AWS and Cerebras's competing decode solution through Amazon Bedrock.
How much faster is DSpark? DeepSeek reports per-user generation speed improving 60% to 85% on V4-Flash and 57% to 78% on V4-Pro, with throughput at a fixed service level improving 51%. These figures come from DeepSeek's own benchmarking and have not been independently verified.
Is DSpark available to use? Yes. DSpark is live in production and open-sourced under the MIT license, and the companion DeepSpec training framework already extends to the Qwen and Gemma model families.
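
To put the reported percentages in perspective, here is a small worked calculation. It assumes that "generation speed improving by X%" means tokens per second rise by X%, which is an interpretation and not something the article states. Under that assumption, a speed gain does not translate one-to-one into time saved: the time to generate a response of fixed length shrinks by 1 - 1/(1 + X/100). The function name `time_saved_fraction` is illustrative only.

```python
def time_saved_fraction(speed_gain_pct: float) -> float:
    """Fraction of generation time saved if tokens/sec rise by speed_gain_pct percent."""
    return 1.0 - 1.0 / (1.0 + speed_gain_pct / 100.0)


# Ranges as reported by DeepSeek (not independently verified).
reported = {"V4-Flash": (60, 85), "V4-Pro": (57, 78)}

for model, (low, high) in reported.items():
    print(
        f"{model}: {time_saved_fraction(low):.1%} to "
        f"{time_saved_fraction(high):.1%} less time per response"
    )
```

On this reading, a 60% speed gain corresponds to a response arriving in 37.5% less time, and an 85% gain to roughly 46% less time.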
