General Intuition has raised $134 million in a seed round to train AI agents on spatial-temporal reasoning from an unusually rich source of data: video game clips. The new research lab is an offshoot of Medal, a platform that lets gamers share their gameplay so they can collaborate and learn from one another; it is building foundation models that learn how objects and actors move through space and time based solely on what a player would have seen on screen.
The round, led by Khosla Ventures and General Catalyst with participation from Raine, is unusually large for a seed-stage deal and signals investor conviction that spatial reasoning is a missing layer in today's language-focused AI stack.

The startup’s first markets are advanced game bots and search-and-rescue.
A Data Moat Constructed of Player Perspectives
General Intuition begins with that deluge of first-person play from Medal. The company says it collects roughly 2 billion videos per year from 10 million monthly active users across tens of thousands of titles. Unlike mixed-quality public video, this corpus is uniformly task-oriented, captured from the player's perspective, and sometimes synchronized with controller input and in-game HUD signals.
That pairing matters. Agents that learn from the same pixels a human sees, and learn to act on them, generally transfer better across environments than agents trained on abstract state logs alone. The pairing also reduces reliance on expensive manual annotation, a long-standing bottleneck in video learning.
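To make that pairing concrete, here is a minimal sketch of what one training record might look like; the schema and field names are illustrative assumptions, not Medal's published format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ControllerEvent:
    """One gamepad input sample (hypothetical schema)."""
    t_ms: int                        # milliseconds from clip start
    left_stick: Tuple[float, float]  # (x, y), each in [-1, 1]
    buttons: int                     # bitmask of pressed buttons

@dataclass
class ClipRecord:
    """One gameplay clip paired with the player's inputs (hypothetical schema)."""
    clip_id: str
    game_title: str
    fps: float
    frame_paths: List[str]           # player-view frames, in order
    controller: Optional[List[ControllerEvent]] = None  # present when capture includes input
```

A record like this supervises both halves of the problem at once: the frames teach perception, and the synchronized inputs teach control, with no human labeling in between.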
The scale and consistent perspective help video-based models handle occlusions, fast camera motion, and long-range temporal dependencies. Prior work such as Meta AI's Ego4D already demonstrated the value of egocentric video for learning about hands, tools, and goals; Medal's stream is orders of magnitude larger and far more diverse, spanning physics-driven puzzles, open-world navigation, racing games, and competitive shooters.
Teaching Agents To Anticipate Actions And Act In Worlds
General Intuition trains agents to predict what happens next and to select and execute actions that satisfy multi-step objectives without privileged information. The models consume raw frames and learn control policies that produce human-like play, moving through worlds by issuing the same joystick and button commands a player would. Preliminary results, the team says, show zero-shot competence on unseen games and levels, with agents anticipating trajectories, occlusions, and object interactions.
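A rough sketch of that loop, raw frames in and gamepad-style commands out; the toy linear model and its shapes are placeholders, since General Intuition has not published its architecture.

```python
import numpy as np

class FrameActionPolicy:
    """Toy stand-in for a learned policy: raw pixels -> controller command.

    A real model would be a large video network; a random linear map is
    used here only to show the input/output contract.
    """
    def __init__(self, frame_shape=(128, 128, 3), n_buttons=8, seed=0):
        rng = np.random.default_rng(seed)
        dim = int(np.prod(frame_shape))
        self.w_stick = rng.normal(0.0, 1e-3, (dim, 2))        # continuous stick head
        self.w_buttons = rng.normal(0.0, 1e-3, (dim, n_buttons))  # discrete button head

    def act(self, frame: np.ndarray):
        x = frame.astype(np.float32).ravel() / 255.0  # normalize raw pixels
        stick = np.tanh(x @ self.w_stick)             # (x, y) deflection in [-1, 1]
        buttons = (x @ self.w_buttons) > 0.0          # press / no-press per button
        return stick, buttons

# The same interface works in any environment that renders a player-view frame.
policy = FrameActionPolicy()
frame = np.zeros((128, 128, 3), dtype=np.uint8)       # stand-in for a rendered frame
stick, buttons = policy.act(frame)
```

The key property is that nothing in the interface is game-specific: any title that renders frames and accepts gamepad input fits the same contract, which is what makes zero-shot transfer plausible.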
Internally, the company builds world models to simulate in and plan with. But unlike peers productizing those simulations, General Intuition says the world model is a means, not an end. Legal risk around game assets partly informs that stance, as does a focus on deployable agents rather than content tools.
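In practice, "simulate in and plan with" often means something like model-predictive control: roll candidate action sequences through the learned world model and keep the best one. A minimal random-shooting sketch, where `model.rollout` and `score` are assumed interfaces, not anything General Intuition has published:

```python
import numpy as np

def plan_with_world_model(model, frame, score, horizon=8, n_candidates=64, seed=0):
    """Return the first action of the best-scoring imagined rollout.

    `model.rollout(frame, actions)` and `score(trajectory)` are assumed
    interfaces standing in for an unpublished planner.
    """
    rng = np.random.default_rng(seed)
    best_actions, best_score = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, (horizon, 2))  # candidate stick commands
        trajectory = model.rollout(frame, actions)      # imagined future frames
        s = score(trajectory)                           # task-specific objective
        if s > best_score:
            best_actions, best_score = actions, s
    return best_actions[0]
```

Under this framing the world model never ships as a product; it is burned as compute at decision time to make the deployed agent act better.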

Commercial Directions In Games And Robotics
In games, the near-term product is bots and nonplayer characters smarter than those that play out scripted behavior regardless of what players do. Replacing static NPCs with adaptive agents could help tune difficulty, generate long-tail content, and playtest levels for designers. Studios have long relied on rule-based AI that breaks at the first edge case; a policy trained across millions of emergent situations is far less brittle, as the sketch below illustrates.
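The brittleness gap is easy to illustrate: a scripted NPC enumerates cases by hand and falls through to a default on anything unanticipated, while a learned policy maps whatever it observes to an action. A schematic contrast, with hypothetical game-state fields:

```python
def scripted_guard(state):
    """Hand-written behavior: every branch is a designer's guess."""
    if state["player_visible"] and state["distance"] < 10:
        return "attack"
    if state["heard_noise"]:
        return "investigate"
    return "patrol"  # anything unanticipated silently falls through to this default

def learned_guard(frame, policy):
    """Learned behavior: one policy covers the situations it was trained on."""
    return policy.act(frame)  # e.g. a network trained on millions of player encounters
```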
Beyond entertainment, the company is targeting robots and drones operating in unfamiliar environments. Search-and-rescue missions often unfold without GPS, without reliable maps, and in poor light. An agent trained to parse complex, dynamic scenes and to infer feasible actions from a live video stream can assist with semi-autonomous navigation, triage, and teleoperation. Using controller-like action spaces also narrows the sim-to-real gap, since many physical systems already accept gamepad input.
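Because the action space is already a gamepad, pointing the same policy at a physical platform is largely a wiring exercise. A sketch of that idea, with hypothetical drone-side names:

```python
def stick_to_velocity(stick_xy, max_speed_mps=2.0):
    """Map a normalized stick deflection to a velocity setpoint.

    Many drone and rover stacks already accept exactly this kind of
    gamepad-style command, which is what narrows the sim-to-real gap.
    """
    x, y = stick_xy
    return (x * max_speed_mps, y * max_speed_mps)

def control_step(policy, camera_frame, send_setpoint):
    """Hypothetical teleoperation tick: camera frame in, velocity setpoint out."""
    stick, buttons = policy.act(camera_frame)
    vx, vy = stick_to_velocity(stick)
    send_setpoint(vx, vy)  # e.g. forwarded over MAVLink or a vendor SDK
```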
This approach tracks larger trends in embodied AI. Research teams at leading labs have found that training and fine-tuning on physics-rich video improves performance on downstream manipulation and locomotion tasks, and that hybrid human-in-the-loop control improves robustness in safety-critical missions. General Intuition's bet is that the same policy can generalize across virtual and real worlds with little task-specific tuning.
Scale Of Funding And Competitive Landscape
The size of the seed puts General Intuition among the best-funded early-stage AI companies, an outlier for this phase according to PitchBook trends. The data moat had already attracted attention: The Information reported that OpenAI explored acquiring Medal for about $500 million, a measure of how strategically valuable large proprietary video corpora have become.
Competitively, the company sits among those building general world models. DeepMind's Genie and World Labs' Marble aim to create interactive environments and content that developers can use for training or creation. General Intuition is taking a different track, commercializing agent capabilities directly and avoiding distribution of generated game assets that could raise copyright issues with publishers.
Why Spatial Reasoning Matters For AI Progress Now
Language models excel at pattern completion in text but struggle with physical causality, occlusion, and long-horizon planning. Spatial-temporal reasoning helps bridge that divide by grounding decisions in how the world actually changes across frames. The aim, as de Witte and Baier-Lentz stress, is an intuition layer that serves language rather than replaces it: agents that can see, predict, and act with the fluency of expert players.
The next milestones are clear: generate new training data and worlds to stress-test policies, and demonstrate robust autonomy in physical scenarios the agents have never seen. If General Intuition can turn billions of casual gaming clips into reliable embodied intelligence, it could change how AI learns to navigate both virtual and real worlds.