LiveKit, the open-source-born infrastructure provider powering real-time voice and video AI experiences, has raised $100 million at a $1 billion valuation, cementing its position as a core layer in the rapidly emerging voice AI stack. The round was led by Index Ventures, with participation from returning backers including Altimeter Capital Management, Hanabi Capital, and Redpoint Ventures.
Best known as the engine behind OpenAI’s ChatGPT voice mode, LiveKit has become a go-to provider for companies building low-latency, conversational applications. Its customer roster spans xAI, Salesforce, and Tesla, but also reaches high-stakes environments such as 911 emergency service operators and mental health providers, where reliability and real-time performance are non-negotiable.

Why Real-Time Voice AI Needs New Infrastructure
Human conversation is unforgiving to lag. Researchers have long noted that natural turn-taking typically leaves only a couple of hundred milliseconds between speakers, and telecom guidelines such as ITU-T G.114 have historically recommended keeping one-way voice latency at or below roughly 150 milliseconds for a high-quality experience. Voice assistants that wait for a full sentence before responding feel robotic; they need streaming audio in, streaming audio out, and an LLM in the loop that can reason while listening. That orchestration is nontrivial at scale.
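To make that arithmetic concrete, here is a back-of-the-envelope budget for a single conversational turn. Every figure is an illustrative round number, not a measurement of LiveKit or any particular vendor:

```python
# Back-of-the-envelope latency budget for one conversational turn.
# All figures are illustrative round numbers, not vendor measurements.
BUDGET_MS = {
    "uplink (mic to server)": 30,
    "ASR finalizes last words": 150,
    "LLM first token": 250,
    "TTS first audio chunk": 120,
    "downlink (server to ear)": 30,
}

total = sum(BUDGET_MS.values())
print(f"time to first reply audio: ~{total} ms")  # ~580 ms here
# Each stage must stream, emitting partial output early; if any stage
# waits for its complete input, the gap between speakers quickly
# exceeds the ~200 ms pause people expect in conversation.
```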
LiveKit’s architecture leans on WebRTC-style media transport and a selective forwarding unit (SFU) topology to move audio and video with minimal jitter, then coordinates the real-time pipeline across automatic speech recognition, an LLM, and text-to-speech. The company’s technology has been a natural match for OpenAI’s push into real-time voice—OpenAI’s 2024 demos of conversational models showed sub-second, interruptible dialogue that depends on precisely this kind of low-latency, barge-in-friendly infrastructure.
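The shape of that pipeline is easier to see in code. The sketch below is a toy asyncio loop, not LiveKit’s actual SDK; the Fake classes are hypothetical stand-ins for whatever ASR, LLM, and TTS vendors a team plugs in. The key move is cancelling the in-flight reply the instant new speech arrives:

```python
import asyncio
from dataclasses import dataclass

# Toy orchestration in the shape the article describes: streaming
# ASR -> LLM -> TTS, with the reply cancelled the moment the user
# speaks again ("barge-in"). The Fake* classes are hypothetical
# stand-ins for vendor SDKs, not LiveKit's API.

@dataclass
class AsrEvent:
    type: str            # "speech_started" or "final_transcript"
    text: str = ""

class FakeASR:
    async def events(self):
        yield AsrEvent("speech_started")
        await asyncio.sleep(0.1)
        yield AsrEvent("final_transcript", "what's the weather?")
        await asyncio.sleep(0.12)            # user interrupts mid-reply
        yield AsrEvent("speech_started")
        await asyncio.sleep(0.1)
        yield AsrEvent("final_transcript", "actually, never mind")

class FakeLLM:
    async def stream(self, prompt):          # fake: ignores the prompt
        for token in ["It", " looks", " sunny", " where", " you", " are."]:
            await asyncio.sleep(0.04)         # token-by-token generation
            yield token

class FakeTTS:
    async def say(self, token):
        print(token, end="", flush=True)      # stand-in for audio playback
        await asyncio.sleep(0.04)

async def speak_reply(text, llm, tts):
    # Pipe tokens straight into synthesis so audio can start
    # before the full reply has been generated.
    async for token in llm.stream(text):
        await tts.say(token)
    print()

async def conversation_loop(asr, llm, tts):
    reply_task = None
    async for event in asr.events():
        if event.type == "speech_started" and reply_task:
            reply_task.cancel()               # barge-in: stop talking now
            print(" [interrupted]")
            reply_task = None
        elif event.type == "final_transcript":
            reply_task = asyncio.create_task(speak_reply(event.text, llm, tts))
    if reply_task:
        await reply_task

asyncio.run(conversation_loop(FakeASR(), FakeLLM(), FakeTTS()))
```

In a real deployment, that cancellation also has to flush audio already buffered toward the listener, which is precisely the kind of timing detail the media and session layer owns.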
From Open Source Roots To Enterprise Demand
Founded in 2021 by Russ d’Sa and David Zhao, LiveKit started as an open source toolkit for building reliable, real-time audio and video apps, born in an era when much of the world lived on video calls. What began as a developer-centric project quickly found enterprise traction as companies asked for a managed cloud with SLAs, observability, and global scaling. The voice AI boom turned that demand into a business model.
Today, the company straddles the open source ecosystem and a managed service, giving teams a choice of self-hosting for control and compliance or tapping a fully managed environment for speed. That flexibility matters for regulated sectors. U.S. emergency services handle on the order of 240 million 911 calls annually, according to national associations, and mental health providers face stringent privacy obligations—both require deterministic performance, regional data control, and failover options that typical app frameworks don’t offer out of the box.
Where LiveKit Fits In The Voice AI Stack
Voice AI has three layers: the speech layer (ASR and TTS), the reasoning layer (an LLM), and the real-time routing/coordination layer. LiveKit focuses on the third, ensuring media transport, session control, and event timing are precise enough for natural conversations. In practice, that means interruptibility, fast turn-taking, adaptive bitrate, and resilience to packet loss—plus clean integrations with model providers such as OpenAI for LLM inference.
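One way to picture that separation of concerns, as a hypothetical sketch rather than LiveKit’s published interfaces:

```python
from typing import AsyncIterator, Protocol

# Hypothetical boundaries for the three layers named above. The
# routing layer depends only on these interfaces, so speech and
# model vendors can be swapped without touching media transport.

class SpeechLayer(Protocol):
    def transcribe(self, audio: AsyncIterator[bytes]) -> AsyncIterator[str]: ...
    def synthesize(self, text: AsyncIterator[str]) -> AsyncIterator[bytes]: ...

class ReasoningLayer(Protocol):
    def respond(self, transcript: str) -> AsyncIterator[str]: ...

class RoutingLayer(Protocol):
    # Owns transport, session control, and event timing: jitter
    # buffers, adaptive bitrate, interruptions, packet-loss recovery.
    def attach(self, speech: SpeechLayer, reasoning: ReasoningLayer) -> None: ...
```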

The company’s customers use that foundation for varied tasks: automotive copilots that understand drivers without cloud round trips, sales and service agents that can hold fluid conversations, and safety-critical triage lines that blend human operators and AI. For enterprises already invested in CRM and analytics platforms, LiveKit’s role is to deliver reliable real-time I/O while the choice of speech and model vendors can evolve underneath.
Competition and the Investor Thesis for Real-Time Voice AI
The space is crowded but fragmented. Communications infrastructure providers such as Twilio, Agora, Vonage, and Daily have developer-friendly media tooling; speech specialists like Deepgram, AssemblyAI, and ElevenLabs focus on accuracy and voice fidelity; and AI platforms including OpenAI are pushing multimodal, real-time models. LiveKit’s bet is that a neutral, programmable real-time layer, tuned for the timing demands of AI-driven conversation, will be indispensable across all of these ecosystems.
Investors are backing that thesis as real-time AI moves from demos to production. Index Ventures’ lead role underscores a broader shift in AI spending from pure model training to inference infrastructure and orchestration. NVIDIA’s push with Riva and other real-time speech offerings further validates the demand for ultra-low-latency pipelines, but enterprises still need a cohesive media and session layer to make it all work together.
What to Watch Next for LiveKit and Real-Time AI
Expect LiveKit to double down on edge acceleration, on-device fallbacks, and richer multimodal features like real-time translation, emotion-aware TTS, and synchronized gestures in video avatars. For buyers, key metrics will be end-to-end latency, quality under network stress, observability, and compliance options such as regional isolation or private cloud deployment.
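On the latency metric in particular, percentile summaries tell buyers more than averages. A trivial illustration with invented numbers:

```python
import statistics

# Invented latency samples (ms) from hypothetical test calls; a real
# benchmark would collect these per network condition and region.
samples_ms = [310, 290, 450, 300, 980, 320, 305, 610, 295, 330]

mean = statistics.fmean(samples_ms)
p50 = statistics.median(samples_ms)
p95 = statistics.quantiles(samples_ms, n=20)[-1]   # 95th percentile
print(f"mean={mean:.0f} ms  p50={p50:.0f} ms  p95={p95:.0f} ms")
# Tail latency is what users notice: a system with a good average
# but occasional one-second stalls still feels broken mid-conversation.
```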
With fresh capital, a flagship partnership powering ChatGPT’s voice mode, and a growing list of high-stakes users, LiveKit’s unicorn round signals that the battleground for AI is no longer just about bigger models—it’s about making conversations feel human in the milliseconds that matter.
