Two Google veterans are rolling out enterprise-grade infrastructure that turns sprawling archives of video and audio into searchable, decision-ready data. Their startup, InfiniMind, is targeting the vast pool of “dark data” that enterprises collect but rarely analyze, promising to convert petabytes of footage into structured insights at scale.
Why Video Dark Data Is a Big Deal for Enterprises
Enterprises now generate more video than they can possibly review. Think broadcast vaults, retail and logistics cameras, training libraries, and hours of product and user research. Seagate and IDC have estimated that 68% of enterprise data goes unanalyzed, and analysts routinely put unstructured data at well over 80% of what organizations store. Video sits at the heart of that problem: it’s information-dense, expensive to process, and historically hard to interrogate beyond basic object detection.
InfiniMind’s founders, CEO Aza Kai and COO Hiraku Yanagita, both worked on data, ads, and video systems at Google Japan. They argue the industry is at an inflection point: multimodal AI is now capable of understanding longer sequences, capturing context across scenes, and linking visuals with speech and sound—exactly what enterprises need to derive real business value from their footage.
Inside the New Video Intelligence Stack at Scale
InfiniMind is building end-to-end infrastructure that ingests raw video and audio, applies long-form multimodal analysis, and produces a queryable knowledge layer. The system fuses computer vision with speech and sound understanding to identify who’s speaking, what’s being shown, how scenes change over time, and which events matter. Instead of tagging single frames, the platform models narratives, causality, and continuity—enabling questions like “Where did a product appear, who mentioned it, and what sentiment followed?”
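InfiniMind hasn’t published its internals, but the shape of such a knowledge layer is easy to sketch. The toy Python below, with invented field names and a made-up `Segment` record, shows how segment-level annotations fusing vision, speech, and sentiment could answer that composite question; it is an illustration of the concept, not the company’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One annotated slice of footage (hypothetical schema)."""
    video_id: str
    start_s: float
    end_s: float
    objects: list      # visual detections, e.g. ["logo:AcmeCola"]
    speakers: list     # diarized speaker labels for this slice
    transcript: str    # ASR output
    sentiment: float   # -1.0 .. 1.0 from speech/tone analysis

def product_mentions(segments, product):
    """Where did `product` appear, who mentioned it, and what
    sentiment followed? Answered by filtering structured segments."""
    for seg in segments:
        shown = any(product in obj for obj in seg.objects)
        spoken = product.lower() in seg.transcript.lower()
        if shown or spoken:
            yield {
                "video": seg.video_id,
                "at": (seg.start_s, seg.end_s),
                "shown": shown,
                "speakers": seg.speakers if spoken else [],
                "sentiment": seg.sentiment,
            }
```

Once footage is reduced to records like these, the “narrative” questions become ordinary filters and joins rather than per-frame model calls.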
The company says its flagship product, DeepFrame, can process hundreds of hours of footage in one pass and then surface scenes, speakers, or events through natural language search. A no-code interface aims to broaden access beyond data scientists, while an index built from both vector and symbolic representations supports fast retrieval and auditable results. Cost is a core design constraint: InfiniMind emphasizes adaptive sampling and pipeline optimizations to keep long-form analysis economically viable at enterprise scale.
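The company hasn’t detailed its sampling strategy, but the cost-control idea is familiar: decode cheaply, and keep frames only when the picture meaningfully changes. The snippet below uses OpenCV with illustrative stride and threshold values; it is a minimal sketch of that pattern under those assumptions, not DeepFrame’s pipeline.

```python
import cv2
import numpy as np

def adaptive_sample(video_path, base_stride=30, diff_threshold=12.0):
    """Yield (frame_index, frame) pairs, skipping visually static stretches.

    Every `base_stride` frames, compare a downscaled grayscale proxy
    against the last kept frame; emit the frame only if the mean
    absolute pixel difference exceeds `diff_threshold`.
    """
    cap = cv2.VideoCapture(video_path)
    last_kept = None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % base_stride == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            small = cv2.resize(gray, (64, 36)).astype(np.int16)
            if last_kept is None or np.abs(small - last_kept).mean() > diff_threshold:
                last_kept = small
                yield idx, frame
        idx += 1
    cap.release()
```

For hundreds of hours of largely static footage (a warehouse camera overnight, say), a heuristic like this can cut the frames sent to expensive multimodal models by orders of magnitude.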
Products and Early Traction Across Media and Retail
InfiniMind’s first product, TV Pulse, launched in Japan with media and retail customers. It analyzes live and recorded television to quantify product exposure, brand presence, sentiment shifts, and PR impact. The platform has moved beyond pilots, signing paying customers that include wholesalers and media owners who need fast answers to questions that previously took teams days to compile—if they were answerable at all.
DeepFrame is the company’s global push. It’s designed for vast archives and continuous streams: broadcast libraries, safety and security monitoring, operations footage in warehouses, and marketing content repositories. Enterprises can bring their own data, run queries like “find every instance of forklift near a pedestrian with no barrier” or “surface segments where our logo appears with positive sentiment,” and export structured results to BI tools or data warehouses.
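As a rough illustration of what a safety query like the forklift example might compile down to, here is a toy spatial rule over per-frame detections, plus a flat CSV export of the kind BI tools ingest. The labels, distance threshold, and record layout are assumptions for the sketch, not DeepFrame’s schema.

```python
import csv

def flag_forklift_risk(frames, max_dist_px=150):
    """Flag frames where a forklift and a pedestrian are close together
    and no barrier is detected. `frames` holds dicts like:
    {"t_s": 12.4, "detections": [{"label": "forklift", "cx": 310, "cy": 220}, ...]}
    """
    rows = []
    for f in frames:
        labels = {d["label"] for d in f["detections"]}
        if "barrier" in labels or not {"forklift", "pedestrian"} <= labels:
            continue
        forks = [d for d in f["detections"] if d["label"] == "forklift"]
        peds = [d for d in f["detections"] if d["label"] == "pedestrian"]
        for fk in forks:
            for p in peds:
                dist = ((fk["cx"] - p["cx"]) ** 2 + (fk["cy"] - p["cy"]) ** 2) ** 0.5
                if dist <= max_dist_px:
                    rows.append({"t_s": f["t_s"], "distance_px": round(dist, 1)})
    return rows

def export_csv(rows, path):
    """Write flagged events as a flat table for a BI tool or warehouse load."""
    with open(path, "w", newline="") as fh:
        w = csv.DictWriter(fh, fieldnames=["t_s", "distance_px"])
        w.writeheader()
        w.writerows(rows)
```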
How It Differs from Existing Video AI Solutions
The video AI market is crowded but fragmented. Companies such as TwelveLabs offer general-purpose video understanding APIs serving a broad mix of users. InfiniMind is positioning itself squarely around enterprise workflows: long-duration analysis, multimodal fusion across visuals, speech, and sound, and a no-code layer that plugs into existing analytics stacks. The team points to support for effectively unlimited video length and to cost efficiency as its key differentiators, areas where many tools force customers to choose among accuracy, scope, and budget.
Beyond retrieval, InfiniMind emphasizes repeatable, explainable insights. By extracting events, entities, and timelines—not just embeddings—the system can support compliance checks, safety audits, and executive reporting with traceable evidence. That type of structure is crucial for regulated industries and for teams that need more than a highlight reel.
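To see why that structure matters, compare pure embedding search with a hybrid pass that filters ranked candidates against extracted facts. The sketch below (record fields invented for illustration) returns only hits it can justify with concrete entities and time ranges, which is what an audit trail needs and a similarity score alone cannot provide.

```python
import numpy as np

def hybrid_search(query_vec, segments, entity=None, event=None, top_k=5):
    """Rank segments by embedding similarity, then keep only those whose
    symbolic annotations satisfy the requested entity/event constraints.

    Each `segment` is a dict with "embedding" (np.ndarray), "entities",
    "events", "video_id", "start_s", and "end_s" (hypothetical schema).
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Soft ranking from the vector side...
    ranked = sorted(segments, key=lambda s: cos(query_vec, s["embedding"]), reverse=True)
    # ...hard constraints from the symbolic side.
    hits = [s for s in ranked
            if (entity is None or entity in s["entities"])
            and (event is None or event in s["events"])]
    return [{"video": s["video_id"],
             "at": (s["start_s"], s["end_s"]),
             "entities": s["entities"]} for s in hits[:top_k]]
```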
Funding and Expansion Plans for U.S. and Japan Growth
InfiniMind has raised $5.8 million in seed funding led by UTEC, with participation from CX2, Headline Asia, Chiba Dojo, and an AI researcher associated with a16z Scout. The startup is relocating its headquarters to the U.S. while retaining an engineering presence in Japan, which it credits as a rigorous proving ground thanks to strong hardware ecosystems and demanding enterprise customers.
The new capital is earmarked for scaling DeepFrame, expanding the company’s AI and infrastructure teams, and accelerating go-to-market in both Japan and the U.S. The founders see industrial and media applications as immediate growth engines, with longer-term ambitions to push the boundaries of general video intelligence—an area they view as central to building systems that better understand real-world context.
The Stakes for Enterprises as Video Becomes Data
For organizations that have treated video as a compliance liability or storage headache, the calculus is changing. With credible ways to map visuals, speech, and sound into timely, searchable data, video becomes a living asset: forecast demand from televised exposure, flag safety risks before incidents escalate, quantify PR impact, and mine archives for reusable content. As data volumes rocket upward and analytics budgets come under scrutiny, the winners will be the tools that turn hours of footage into answers in minutes—and do it at a cost enterprises can justify.