Mirelo, a Berlin startup working on just that — an AI “soundtrack” engine for video, something described as a “holy grail” in terms of its potential scale and (if successful) impact — has closed a $41 million round of funding to continue building out its technology and expanding partnerships. In addition to Index Ventures and Andreessen Horowitz (which led a $3.7 million seed round), Everkeen, Lakestar, All Iron Capital and Kraft Heinz participated in this Series A. Mirelo’s previous backers include Merci Victoria Grace, Michael Ovitz, Modern Music and IVP.
The company is developing models that analyze video frames and generate synchronized sound effects and ambience on the fly — addressing a long-standing gap in text-to-video and video-editing tools, many of which ship without native audio.
Funding from the round will be used to take Mirelo’s video-to-audio platform to market, following the launch of Mirelo SFX v1.5, which syncs in-game actions — footsteps, impacts, environmental noises — to tightly timed sound. The more of the audio lift the model handles, say founders Johannes Simon-Gabriel and Florian Wenzel, AI researchers and musicians who have known each other since grade school in Kronach, Germany, the more complete creators can make their AI video feel, rather than like a silent demo.
The company is positioning its tech as a plug-in layer for the expanding universe of AI video tools and a workspace for the average creator who can’t afford fully staffed post-production but wants something polished.
Why Sound Is the Missing Piece of AI Video
While image fidelity and motion have made rapid leaps, audio has lagged behind. As a result, most creators still pull videos out of AI engines, then hunt for stock effects on the side, loop ambience by hand, or give up and go without sound. That slows the workflow and risks breaking the sense of realism: the same clip can feel enthralling or dull depending on its sonic layer, as film editors and sound designers have long insisted.
Big platforms are finally playing catch-up. Google DeepMind’s Veo 3.1 model, for example, can now generate audio synchronized to the video it produces, and top audio startups have demonstrated what is possible with voice and music. But granular, event-level sound design — the crunch of gravel as a character pivots, a door closing with reverb that matches the room — remains a specialized challenge.
How Mirelo’s Model Works for Synchronized AI SFX
Mirelo SFX ingests video, identifies objects and movements, and predicts audio aligned with the timeline. Under the hood, that approach marries visual encoders and cross-modal transformers to translate what’s happening on screen into credible, synced-up sound. The result is a multilayered mix with stems that creators can edit — core effects, ambient beds and transitional textures — meaning it drops into existing timelines with little cleanup.
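Mirelo has not published its architecture, so purely as an illustration, here is a minimal sketch of the event-to-stem idea that description implies: detected on-screen events (timestamps plus labels) are bucketed onto per-category audio stems, so an editor receives separately editable layers rather than one flat mix. Every name and category below is hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from detected event labels to stem names; Mirelo's
# real taxonomy is not public.
STEM_FOR = {"footstep": "core_fx", "impact": "core_fx",
            "rain": "ambience", "whoosh": "transitions"}

def events_to_stems(events):
    """Group (time_seconds, label) events into per-stem, time-sorted cues."""
    stems = defaultdict(list)
    for t, label in events:
        stem = STEM_FOR.get(label, "core_fx")  # default bucket
        stems[stem].append({"time": round(t, 3), "sfx": label})
    # Sort each stem's cues so they lay onto a timeline in order.
    return {name: sorted(cues, key=lambda c: c["time"])
            for name, cues in stems.items()}

detected = [(0.40, "footstep"), (0.15, "rain"), (1.20, "impact")]
stems = events_to_stems(detected)
```

The point of the per-stem split is the editability the article describes: an editor can mute or re-time the ambience bed without touching the core effects.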
To mitigate the rights questions that have long plagued generative audio, Mirelo says it is training on public and purchased sound libraries, as well as inking revenue-sharing agreements with some rights holders. Georgia Stevenson, the Index Ventures partner who led the investment, said that data provenance and artist compensation were table stakes for backing the company.
With the models available via API, developers can deploy them through Fal.ai and Replicate to reduce integration friction. For creators, the company is building Mirelo Studio, a workspace focused on rapid iteration on sound beds and SFX, with shorts, trailers and social clips in mind. It offers a freemium plan, with a paid tier at €20 per month (roughly $23.50) for higher-quality exports and priority processing.
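For developers, a Replicate-hosted model is typically invoked with the official Python client’s `replicate.run()`. Mirelo’s actual model slug and input schema are not public, so the slug and fields below are assumptions for illustration only; the sketch just assembles a plausible request payload.

```python
# Illustrative only: the model slug and input fields are hypothetical,
# not Mirelo's documented interface.
def build_sfx_request(video_url, stems=("core_fx", "ambience"), seed=None):
    """Assemble an input payload for a hypothetical video-to-SFX model."""
    payload = {"video": video_url, "stems": list(stems)}
    if seed is not None:
        payload["seed"] = seed  # reproducible generations, if supported
    return payload

payload = build_sfx_request("https://example.com/clip.mp4", seed=42)

# Actual call would need a REPLICATE_API_TOKEN and a published model:
# import replicate
# output = replicate.run("mirelo/sfx-v1.5", input=payload)  # hypothetical slug
```

The same payload shape would translate to Fal.ai’s queue-based client with minimal changes, which is the integration-friction point the company is targeting.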

Funding, Talent, and Go-To-Market for Mirelo
The new investment builds on an earlier pre-seed round led by Berlin-based Atlantic and brings Mirelo’s total raised to $44 million. Angel backers include Mistral CEO Arthur Mensch, Hugging Face chief science officer Thomas Wolf and Fal.ai co-founder Burkay Gur — a roster that suggests deep connections throughout the open-model and tooling universe.
Mirelo’s goal is to extend its research, scale product hiring and develop a go-to-market motion focused on API usage by platforms and studios. Leadership would not disclose the new valuation but said it had grown significantly since the company’s stealth-stage round.
Crucially, the team is betting on workflows that meet creators where they already work. Expect integrations tied more closely to popular AI video suites and NLEs, so SFX stems arrive pre-aligned to edits and creators spend less time hopping between tools.
A Crowded but Not Finished Field for AI Audio SFX
Competition is intensifying. Big media and tech companies such as Sony and Tencent have demonstrated video-to-SFX models, while Kuaishou’s Kling AI and another Andreessen Horowitz-backed company, ElevenLabs, are pushing further into multimodal sound. Music generators such as Suno, Udio and Meta’s AudioCraft, meanwhile, demonstrate how quickly audio model architectures can evolve once data, user interfaces and — perhaps most importantly — latency improve.
Mirelo is betting that, in contrast to music and voice, sound effects are a less crowded research track, offering some time to establish a defensible lead. If the company can keep matching micro-events with convincing audio across genres — from animated shorts to sports highlights — its models could become a default layer for AI video editors, social platforms and game engines.
What Comes Next for Mirelo’s AI Video Sound Platform
On the roadmap: more comprehensive SFX libraries, richer ambience modeling and, eventually, music generation that locks to scene beats and emotional arcs. Expect more focus on transparency, too, as content creators and labels push for training sources to be clearly documented. Emerging standards from efforts such as the Content Authenticity Initiative and C2PA could help Mirelo propagate provenance signals through to its outputs.
The bigger picture is simple. With AI video moving out of beta experiments and into publishable content, silence is no longer an option. With new capital and the lead in event-level audio, Mirelo hopes to turn synchronized sound from a novelty into the norm.