Lemon Slice has raised $10.5 million in seed funding to further develop real-time, AI-generated video avatars, doubling down on a belief that chatbots are about to get faces and expressions.
The round was led by Matrix Partners and Y Combinator, with participation from Dropbox co-founder Arash Ferdowsi, Twitch co-founder Emmett Shear and the artist-investors behind The Chainsmokers.

The startup’s pitch is simple: most AI agents today are text boxes, and the next growth curve comes from agents that look and act like people or characters in live video. Its new model, Lemon Slice-2, turns a single image into a customizable avatar that can answer questions, teach lessons, guide shoppers or play brand characters in real-time streams.
Inside Lemon Slice-2, the real-time video avatar model
Lemon Slice-2 is a “general purpose video diffusion transformer” with 20 billion parameters, designed to run efficiently enough to livestream avatars at about 20 frames per second on a single GPU, the company said. That balance of scale and speed points to heavy inference optimization, table stakes if avatars are to respond without awkward pauses.
The model is available via an API that developers can plug into their applications, or as a drop-in widget for teams. Once an avatar is created from a single image, users can switch backgrounds, styling and clothing, or shift from human-like to non-human characters without further training. Voice synthesis is handled by ElevenLabs, and the company says its avatars can draw on external knowledge bases so that whatever’s on screen isn’t just lip-synced but stays contextually relevant with up-to-date information.
The company’s founders — Lina Colucci, Sidney Primas and Andrew Weitz — started the business in 2024 with the belief that most avatars cross into the uncanny valley after just seconds of interaction. Their wager is that a single model trained end to end will produce more natural micro-expressions, eye gaze and temporal continuity than pipelines that stitch together disparate components.
Why investors are interested in live AI video avatars
Backers point to two converging forces: the “bitter lesson” that data and compute win in AI, and consumer behavior that already skews video-first. Ilya Sukhar of Matrix has argued that general-purpose AI models have outpaced bespoke systems in each modality, and he expects that dynamic to hold as video avatars scale.
Y Combinator’s Jared Friedman has likewise argued that diffusion-style video models can bridge the gap between photorealistic output and interactivity — the central friction that has stifled avatar adoption in support, training and creator tools. If avatars look convincing and respond in real time, the argument goes, they might finally clear a practical version of an “avatar Turing test” for everyday use.

A crowded arena with clear stakes in AI video avatars
Competition is intense. Fast-growing video generation platforms like D-ID and HeyGen have made talking-head videos a go-to format for marketing and training, while avatar-focused players like Genies, Soul Machines, Praktika and AvatarOS are chasing branded characters and coaching assistants. Lemon Slice’s differentiation, meanwhile, is live, interactive video rather than batch-rendered clips, and a general-purpose model designed to work with both human and non-human faces from a single image.
The timing may be favorable. Sandvine’s Global Internet Phenomena Report has consistently estimated that video makes up more than 65% of downstream internet traffic, and education research shows a marked user preference for video explainers over dense text. Pairing that preference with agentic AI could make customer support, tutoring and product discovery feel like more conversational, on-screen experiences.
The company is already pursuing early use cases such as language learning, e-commerce guidance, corporate training and interactive education. For brands, standing up an informed on-screen guide is straightforward; for educators, an animated tutor can deliver personalized feedback. And because the model handles non-human characters, a company mascot can front the experience just as easily as a human presenter.
Guardrails, identity risks and compliance
As synthetic media gets more realistic, the danger of misuse increases. Lemon Slice says it has built checks to prevent unauthorized cloning of faces and voices, and that it uses large language models to moderate content in real time. Those controls will face scrutiny as regulators toughen expectations around biometric consent, deepfake disclosures and provenance — issues flagged by groups like the Partnership on AI and under debate in numerous jurisdictions.
Beyond policy, the technical bar is climbing. Even as low-latency inference becomes feasible on commodity GPUs, enterprises will still demand strong lip-sync, eye contact and consistent identity across frames. If avatars are to be used in regulated categories like health coaching or financial services triage, benchmarks for temporal coherence and alignment with scripted or generated text will count as much as pure photorealism.
What the funding fuels for Lemon Slice’s next phase
The startup, now an eight-person team, intends to use the seed round for engineering and go-to-market hires, as well as growth and compute budgets for training and scaling its diffusion models. The API-first posture telegraphs an ecosystem play: get developers to embed an avatar in their app with a single line of code, then expand features — language coverage, emotion control, latency reduction — based on real-world feedback.
If Lemon Slice can keep avatars responsive at 20 frames per second on modest hardware, while keeping expressions convincingly human and safety controls robust, it could stake a claim in the fast-growing market for interactive agents. The next iteration of AI may not just answer questions but also make eye contact and react to what you say.
