Google DeepMind has released a public preview of Gemini 2.5 Computer Use, an agent that interacts with a web browser the way you’d expect a competent assistant to operate it. Built on Gemini 2.5 Pro, it can click, type, and scroll; follow on-screen prompts; and carry out multi-step tasks from a plain-English command.
The launch moves AI beyond passive chat toward hands-on software control. Developers can access the model through the Gemini API in Google AI Studio and Vertex AI, with a hosted demo available via Browserbase as well.
- What the Computer Use model actually does
- How the Gemini 2.5 Computer Use agent works behind the scenes
- Benchmarks and early results from Google’s evaluations
- Where it fits for developers, teams, and pilot use cases
- Safety, limits, and oversight for responsible deployment
- What to watch next as Gemini Computer Use scales up
What the Computer Use model actually does
Give it a request along the lines of “Open Wikipedia, search for Atlantis and summarize the history of that cultural myth,” and the agent can navigate to the site, take screenshots, and parse the interface to decide what to do next. In a visible text panel, it explains its reasoning and actions so that users can follow along and intervene if they want.
If a prompt involves sensitive operations (such as making a purchase or editing data), the model can pause and ask you to confirm before it carries out those instructions. In Google’s demos, which were sped up 3x, the agent updated records in a customer relationship management tool and reorganized content in a Jamboard interface.
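To make that confirmation step concrete, here is a minimal Python sketch of such a gate. The action format, the SENSITIVE_ACTIONS set, and the helper name are illustrative assumptions, not Google’s actual API surface.

```python
# Illustrative confirmation gate for sensitive agent actions.
# The action dict format and SENSITIVE_ACTIONS set are assumptions,
# not the real Gemini Computer Use API.

SENSITIVE_ACTIONS = {"purchase", "submit_payment", "delete", "edit_record"}

def confirm_if_sensitive(action: dict) -> bool:
    """Return True if the action may proceed, pausing for user
    approval whenever the action is potentially sensitive."""
    if action["kind"] not in SENSITIVE_ACTIONS:
        return True
    reply = input(f"Allow the agent to '{action['kind']}'? [y/N] ")
    return reply.strip().lower() == "y"
```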
How the Gemini 2.5 Computer Use agent works behind the scenes
Gemini 2.5 Computer Use operates in an iterative loop: look at the page, decide what to do next, act, and repeat.
It keeps a running history of recent actions and screen states, which helps it maintain context as a task proceeds.
This looped control is critical for modern dynamic web apps, where the same action can produce different outcomes depending on previous clicks. Because the model supplements visual context (screenshots) with interface cues, it can also handle forms, menus, and modal dialogs that trip up traditional scripted automations.
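As a rough illustration of that observe-decide-act loop, here is a self-contained Python sketch. Every helper in it (take_screenshot, plan_next_action, execute) is a hypothetical stand-in for the real model and browser APIs, not the actual Gemini interface.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the real Gemini Computer Use API: the model
# receives the goal, the latest screenshot, and recent history, and
# proposes one UI action per turn.

@dataclass
class Action:
    kind: str         # e.g. "click", "type", "scroll", "done"
    target: str = ""  # element description or coordinates
    text: str = ""    # text to type, if any

def take_screenshot() -> bytes:
    """Capture the current browser viewport (stub)."""
    return b""

def plan_next_action(goal: str, screenshot: bytes, history: list) -> Action:
    """Ask the model for the next action (stub)."""
    return Action(kind="done")

def execute(action: Action) -> None:
    """Dispatch the action to the browser (stub)."""
    print(f"executing {action.kind} on {action.target!r}")

def run_agent(goal: str, max_steps: int = 30) -> None:
    history: list[tuple[Action, bytes]] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                        # observe
        action = plan_next_action(goal, screenshot, history)  # decide
        if action.kind == "done":
            break
        execute(action)                                       # act
        history.append((action, screenshot))                  # remember
```

The running history is what lets the model disambiguate states that look identical in a single screenshot, such as a form before and after a failed submission.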
Benchmarks and early results from Google’s evaluations
According to Google’s technical notes, the agent outperformed rival tools from Anthropic and OpenAI on both accuracy and latency across a broad array of web and mobile control benchmarks. One such benchmark is Online-Mind2Web, developed to assess how well agents browse and interact with live websites.
While the company didn’t release all its numbers in the announcement, the claim is consistent with the model’s design: tight action loops that track state explicitly and include built-in explainability. In practice, these features can alleviate common failure modes such as losing context after navigating between pages.
Where it fits for developers, teams, and pilot use cases
The agent targets browser tasks first, with some encouraging early signs on mobile. For practitioners, that means pragmatic pilots around:
- Lead enrichment in CRMs
- Routine procurement steps
- Quality assurance in UI flows
- Knowledge discovery from internal dashboards
Computer Use joins a wave of similar offerings from the major AI labs. Google has already experimented with Project Mariner, an action-taking Chrome extension, and other providers have unveiled browser agents that navigate to sites on request. The difference here is tighter integration with Gemini 2.5 Pro and an enterprise path through Vertex AI.
Safety, limits, and oversight for responsible deployment
Google is providing developer controls to halt risky activity like attempts to bypass CAPTCHAs, exfiltrate sensitive data, or take control of critical systems such as medical devices. Policies can mandate user approval for specific tasks, enforce allowlists, and restrict domains.
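Here is a minimal sketch of what such a policy check might look like in practice. The domain allowlist, action names, and approval helper are all hypothetical assumptions, not Google’s actual controls API.

```python
from urllib.parse import urlparse

# Hypothetical policy layer: block navigation outside an allowlist and
# require human approval for designated action types. Names and the
# action dict format are illustrative assumptions.

ALLOWED_DOMAINS = {"crm.example.com", "wiki.example.com"}
ACTIONS_NEEDING_APPROVAL = {"purchase", "delete"}

def user_has_approved(action: dict) -> bool:
    reply = input(f"Approve '{action['kind']}'? [y/N] ")
    return reply.strip().lower() == "y"

def policy_allows(action: dict) -> bool:
    """Return True only if the action passes domain and approval rules."""
    if action["kind"] == "navigate":
        host = urlparse(action["url"]).hostname or ""
        return host in ALLOWED_DOMAINS
    if action["kind"] in ACTIONS_NEEDING_APPROVAL:
        return user_has_approved(action)
    return True
```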
The system card notes known limitations, including hallucinations, gaps in causal reasoning, and difficulty with complex logical or counterfactual claims. This echoes findings elsewhere in the field; for instance, recent work from Anthropic, which simulated “whistleblowing” scenarios, showed that large models can get ethically charged questions wrong by misreading context.
For production use, this means running the agent with guardrails: clear scopes for what it can act on, audit logs, and human-in-the-loop checkpoints, especially when actions are irreversible (e.g., financial approvals or data deletion).
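For the audit-log piece specifically, an append-only record per action is often enough to start. The field names below are illustrative assumptions, not a standard schema.

```python
import json
import time

def audit(action: dict, outcome: str, log_path: str = "agent_audit.jsonl") -> None:
    """Append one structured record per agent action for later review.
    Field names are illustrative, not a standard schema."""
    record = {
        "ts": time.time(),
        "action": action,
        "outcome": outcome,  # e.g. "executed", "blocked", "awaiting_approval"
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```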
What to watch next as Gemini Computer Use scales up
If Computer Use pans out in real-world trials, anticipate a wider transition from chatbots to task-capable assistants that manage entire workflows inside the browser. Important indicators to watch are task completion rates, average action latency, and error recovery on highly dynamic sites.
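For teams running their own pilots, those indicators are straightforward to compute from per-run records. The record format in this sketch is an assumption, not a standard benchmark schema.

```python
from statistics import mean

def summarize(runs: list[dict]) -> dict:
    """Compute the three headline metrics from per-run records.
    Assumed fields: completed (bool), action_latencies_s (list of
    floats), had_error (bool), recovered (bool)."""
    latencies = [lat for r in runs for lat in r["action_latencies_s"]]
    errored = [r for r in runs if r.get("had_error")]
    recovered = [r for r in errored if r.get("recovered")]
    return {
        "task_completion_rate": sum(r["completed"] for r in runs) / len(runs),
        "mean_action_latency_s": mean(latencies) if latencies else 0.0,
        "error_recovery_rate": len(recovered) / len(errored) if errored else None,
    }
```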
The immediate takeaway is that Gemini 2.5 Computer Use is an early but significant step toward reliable, legible agents that don’t just talk but actually run software. For teams willing to pilot it with a safety net, the model offers a realistic way to deliver tangible productivity gains.