Nvidia is acquiring certain assets of Groq and licensing the start-up’s technology in a $20 billion cash deal, according to an exclusive report by CNBC and statements from both companies. The purchase brings Groq’s founder, Jonathan Ross, its president, Sunny Madra, and other staff to Nvidia in what appears to be the largest acquisition in Nvidia’s history, one that adds a new low-latency capability aimed squarely at AI workloads.
Groq will remain an independent company with Simon Edwards as CEO, and the service GroqCloud will continue to be available without interruption.
Nvidia CEO Jensen Huang said in an email quoted by CNBC that the company will work to integrate Groq’s low-latency cores into Nvidia’s AI chip architecture and expand support for real-time inference workloads.
What Nvidia Gets With Groq’s Low-Latency Architecture
Groq is best known for a compiler-first, deterministic architecture that minimizes latency when running inference on large language models. Where Nvidia’s GPUs rule AI training and high-throughput batch inference, Groq’s architecture comes into its own on real-time workloads: voice assistants, live agents, and streaming AI, where milliseconds make a difference. Melding Groq’s low-latency execution with Nvidia’s CUDA, TensorRT, and Triton stack could form a broader platform spanning very large batch jobs all the way to interactive inference.
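In practice, "latency" here means two numbers: time to first token and the gap between subsequent tokens. As a rough illustration (this is not Nvidia's or Groq's tooling), the sketch below measures both against an OpenAI-compatible streaming endpoint of the kind GroqCloud exposes; the base URL and model name are assumptions for illustration.

```python
import time
from openai import OpenAI  # pip install openai

# GroqCloud exposes an OpenAI-compatible API; the base URL and model
# name below are assumptions for illustration, not confirmed values.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",
)

start = time.perf_counter()
first_token_at = None
token_times = []

stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # assumed model name
    messages=[{"role": "user", "content": "Say hello in five words."}],
    stream=True,
)
for chunk in stream:
    now = time.perf_counter()
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = now  # first visible output
        token_times.append(now)

print(f"time to first token: {(first_token_at - start) * 1000:.0f} ms")
if len(token_times) > 1:
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    print(f"mean inter-token gap: {1000 * sum(gaps) / len(gaps):.1f} ms")
```

For a voice agent or live co-pilot, it is these two figures, not aggregate throughput, that determine whether the experience feels instantaneous.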
The strategic rationale is simple: training requires huge compute up front, but inference volume grows with every user. As generative AI moves from demos to production systems, latency budgets and service-level agreements will matter as much as raw FLOPs. Groq’s technology is designed for predictability and speed, a good fit for the “AI factory” model that Huang has pushed for real-time digital services.
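A back-of-envelope calculation makes the latency-budget point concrete. The numbers below are illustrative assumptions, not figures from either company, but they show what a real-time voice agent demands of an inference stack before speech can even begin.

```python
# Illustrative latency budget for one voice-agent turn (assumed numbers).
budget_ms = 500            # end-to-end target before audio starts playing
ttft_ms = 200              # time to first generated token
asr_tts_overhead_ms = 150  # speech-to-text plus start of audio synthesis
decode_budget_ms = budget_ms - ttft_ms - asr_tts_overhead_ms

tokens_needed = 15         # enough text to begin speaking naturally
required_tok_per_s = tokens_needed / (decode_budget_ms / 1000)
print(f"decode budget: {decode_budget_ms} ms")
print(f"required decode speed: {required_tok_per_s:.0f} tokens/s")
# -> 100 tokens/s sustained just to start speaking on time;
#    any scheduling jitter eats directly into that margin.
```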
Deal Structure and Leadership Moves Following the Agreement
Nvidia is buying assets and licensing Groq’s technology rather than absorbing the entire company. A number of Groq’s senior leaders will move to Nvidia to speed integration. Groq keeps its brand and operations under new leadership, and GroqCloud continues to serve customers, which matters to enterprises that have already built atop its APIs and tooling.
A deal of this size generally draws regulatory scrutiny. Though an asset purchase can be easier to clear than a full corporate takeover, U.S. and international competition reviews could still apply, given Nvidia’s outsized position in AI compute. The companies have not disclosed an anticipated closing date.
Nvidia’s Biggest Bet Yet Signals Focus on AI Inference
The hefty price tag eclipses Nvidia’s $6.9 billion purchase of Mellanox, its largest completed deal, and demonstrates how central inference has become to the AI business model. Nvidia’s $40 billion agreement to buy Arm was announced but never completed; the asset-and-license structure here signals Nvidia’s willingness to spend heavily on its stack even when a full takeover is not on the table.
Demand is not the issue. Huang has said that cloud GPUs are oversubscribed and that demand for the Blackwell generation is “off the charts.” Research firms such as Omdia estimate that Nvidia holds more than 80% of the AI training accelerator market, but inference is far more broad-based than training, and competition there turns on cost per query and latency. Adding Groq’s strengths could help Nvidia protect share as workloads diversify.
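Cost per query is simple arithmetic once per-token prices are fixed, which is why small pricing or efficiency deltas matter enormously at scale. The prices below are placeholders for illustration, not quotes from Nvidia or Groq.

```python
# Cost per query from per-token pricing (placeholder prices, not quotes).
price_per_m_input = 0.50   # $ per million input tokens (assumed)
price_per_m_output = 1.50  # $ per million output tokens (assumed)

def cost_per_query(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the assumed per-token prices."""
    return (input_tokens * price_per_m_input
            + output_tokens * price_per_m_output) / 1_000_000

# A chat turn with a long prompt and a short answer:
per_query = cost_per_query(2_000, 300)
print(f"${per_query:.6f} per query")
# At 10 million queries a day, per-query deltas compound fast:
print(f"${per_query * 10_000_000:,.0f} per day")
```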
What It Means for the Intensifying AI Inference Race
Rivals have been homing in on inference as a beachhead. AMD’s MI300-series accelerators have made strides in performance and efficiency, while Intel’s Gaudi line targets price-performance. Startups from Cerebras to Tenstorrent pitch specialized hardware for specific workloads. Groq gives Nvidia a differentiated low-latency angle that plays into the rise of agentic AI, real-time translation, and interactive co-pilots that can’t afford jitter or delayed token generation.
Software will be the real test. If Nvidia can integrate Groq’s compiler-first approach into CUDA and Triton, and surface it cleanly through frameworks like TensorRT and NeMo, developers could get a straightforward way to tune for latency without rewriting models. Watch how GroqCloud fits into Nvidia’s ecosystem, whether through unified tooling, model gardens, or managed services.
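What “tuning for latency without rewriting models” might look like is not yet public. One plausible shape, sketched below with entirely hypothetical names (none of these are real CUDA, TensorRT, or Triton APIs), is a deployment-time profile that trades batching throughput for queue delay while the model artifact stays unchanged.

```python
# Hypothetical sketch only: no such API exists in Nvidia's stack today.
# It illustrates how Groq-style scheduling could surface as a
# deployment-time knob rather than a model change.
from dataclasses import dataclass

@dataclass
class DeploymentProfile:
    mode: str                  # "throughput" (batch) or "realtime"
    max_batch_size: int
    max_queue_delay_ms: float  # how long requests may wait to form a batch

REALTIME = DeploymentProfile("realtime", max_batch_size=1,
                             max_queue_delay_ms=0.0)
BATCH = DeploymentProfile("throughput", max_batch_size=64,
                          max_queue_delay_ms=50.0)

def deploy(model_path: str, profile: DeploymentProfile) -> None:
    # In a real stack, this is where a compiler-first backend would pick
    # deterministic low-latency kernels versus batched high-throughput ones.
    print(f"deploying {model_path}: batch<={profile.max_batch_size}, "
          f"queue delay<={profile.max_queue_delay_ms} ms")

deploy("llama-3.1-8b", REALTIME)  # same model artifact, different profile
deploy("llama-3.1-8b", BATCH)
```

The design point is that latency becomes a service-level choice per endpoint, not a property baked into the model export.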
What to Watch Next as Nvidia Integrates Groq Technology
Key milestones include closing conditions, the disposition of assets and IP, and early benchmarks showing end-to-end latency improvements on widely used models. Enterprise buyers will be listening for pricing signals, chiefly whether Nvidia can offer a lower total cost per token for real-time workloads, and for guarantees that their existing GroqCloud investments will remain supported.
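The benchmarks that matter will report tail latency, not just averages, because a real-time service lives or dies by its worst requests. A minimal harness for that might look like the following sketch, where measure_once is a stand-in for any client call against the endpoint under test.

```python
import random
import statistics

def measure_once() -> float:
    """Stand-in for one end-to-end request; returns latency in ms.
    Replace the body with a real client call to the endpoint under test."""
    return max(0.0, random.gauss(mu=120, sigma=25))  # simulated data

samples = sorted(measure_once() for _ in range(1000))
p50 = samples[len(samples) // 2]
p99 = samples[int(len(samples) * 0.99) - 1]
print(f"p50: {p50:.1f} ms   p99: {p99:.1f} ms   "
      f"mean: {statistics.mean(samples):.1f} ms")
```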
To investors and competitors, the message is clear: Nvidia is widening an already huge lead, pushing from training into inference’s toughest corner. If the integration delivers as advertised, the company would set the bar not just for peak performance but for the metric most critical to live AI experiences: latency, measured in milliseconds.