
Google Gemini Agent Surfs The Web Just Like You

By Bill Thompson
Last updated: October 10, 2025 3:07 am
Technology
7 Min Read

Google DeepMind has released Computer Use, an evolution of its Gemini 2.5 Pro model that browses the web more like a human user. Instead of calling hidden APIs behind the scenes, it looks at pages, clicks buttons, types into fields and scrolls through content, and it tells you what it is doing as it goes.

The goal is simple but ambitious: let AI perform real tasks inside real websites with minimal hand-holding while keeping humans informed and in control. Developers can now access it through the Gemini API and Vertex AI, with a public demo available through Browserbase.

Table of Contents
  • How Gemini Computer Use Works Across Real Websites
  • What Gemini Computer Use Can Accomplish Today
  • Performance and Benchmarks on Real-World Browsing Tasks
  • Safety Guardrails and Known Limits for AI Agents
  • How It Stacks Up Against Other AI Agents and Tools
  • Availability and What to Try First With Gemini Agents
Image: Google Gemini AI agent browsing web pages in a desktop browser interface

How Gemini Computer Use Works Across Real Websites

Give the model a natural-language instruction, such as “Open Wikipedia, find Atlantis and summarize the history of its myth,” and it fetches the page, takes screenshots and analyzes the interface. It reads what you see on your screen and works out which elements to interact with, from search boxes to dropdowns and pagination controls.

Behind the scenes, it runs a simple observe-act loop. The model re-reads the page state after each action (click, type, scroll) to determine what comes next. This short-term memory of past actions is critical for UI work, where text changes, modals pop up and elements move. The loop repeats until the target state is reached or the model needs an answer from a human.
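
Conceptually, the loop looks something like the sketch below. The helper functions (take_screenshot, propose_next_action, execute_action) are hypothetical placeholders standing in for the browser harness and the model call; they are not part of any Google SDK.

from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                 # e.g. "click", "type", "scroll", "done"
    target: str = ""          # description of the UI element involved
    text: str = ""            # text to type, if any


@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # short-term memory of past actions


def take_screenshot() -> bytes:
    """Hypothetical: capture the current page as an image for the model."""
    return b""


def propose_next_action(state: AgentState, screenshot: bytes) -> Action:
    """Hypothetical: send goal + history + screenshot to the model and parse its proposed action."""
    return Action(kind="done")


def execute_action(action: Action) -> None:
    """Hypothetical: perform the click/type/scroll in a real browser."""


def run_agent(goal: str, max_steps: int = 20) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        screenshot = take_screenshot()              # observe the current page state
        action = propose_next_action(state, screenshot)
        if action.kind == "done":                   # target state reached
            break
        execute_action(action)                      # act, then loop and re-observe
        state.history.append(action)
    return state


if __name__ == "__main__":
    run_agent("Open Wikipedia, find Atlantis and summarize the history of its myth")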

This is closer to how people move through sites than traditional integrations are. Rather than encoding site-specific scripts, the model relies on general visual and structural cues in how pages look. It builds on earlier Google experiments such as Project Mariner and fits the larger trend in AI toward agents that act, rather than just chat.

What Gemini Computer Use Can Accomplish Today

In Google’s demos, the agent is shown updating a record on a customer relationship management dashboard and reordering content in Jamboard’s interface. Those aren’t contrived examples; they are real-world cases that involve nested menus, confirmations and validating changes.

Some common scenarios are:

  • Collecting data from multiple tabs
  • Filling out multi-step forms
  • Controlling e-commerce carts
  • Scheduling a doctor’s appointment
  • Cleaning up shared documents

If an instruction is sensitive — for example, “purchase this item” — the model can stop and request explicit approval first.
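
A rough sketch of how that approval pause could be wired into the loop above follows. The requirement that each action carry a kind and a target description is a convention of this sketch, not the actual API surface.

SENSITIVE_KINDS = {"purchase", "delete", "export"}


def confirm_with_user(action) -> bool:
    """Ask the human operator to approve or reject a sensitive step."""
    answer = input(f"Agent wants to {action.kind} on {action.target!r}. Proceed? [y/N] ")
    return answer.strip().lower() == "y"


def execute_with_gate(action, execute_action) -> bool:
    """Run the action only if it is non-sensitive or explicitly approved."""
    if action.kind in SENSITIVE_KINDS and not confirm_with_user(action):
        return False  # the agent stops here and reports back instead of acting
    execute_action(action)
    return True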

Performance and Benchmarks on Real-World Browsing Tasks

According to Google, the model outperforms competing agents from Anthropic and OpenAI across a combination of web and mobile control benchmarks, including Online-Mind2Web, a framework for evaluating agents on a variety of real-world browsing tasks.

Google’s announcement touts improvements in both task accuracy and latency.

Image: Google Gemini agent surfing the web, executing tasks across browser tabs

Success rates on benchmarks matter, but so does responsiveness. Google’s own public videos are sped up, and real-world timing will vary with how complex your pages are, network conditions and the number of steps needed. For enterprise rollouts, teams will want to test target workflows end-to-end and track success rates, retries and median time-to-completion, as in the sketch that follows.
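
A small sketch of that kind of tracking: success rate, average retries and median time-to-completion over a set of representative workflow runs. The Run record is a local convention for illustration, not part of any Google tooling.

from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    workflow: str
    succeeded: bool
    retries: int
    seconds: float


def summarize(runs: list[Run]) -> dict:
    """Aggregate end-to-end metrics for a batch of agent runs."""
    total = len(runs)
    return {
        "success_rate": sum(r.succeeded for r in runs) / total,
        "avg_retries": sum(r.retries for r in runs) / total,
        "median_seconds": median(r.seconds for r in runs),
    }


if __name__ == "__main__":
    sample = [
        Run("update-crm-record", True, 0, 42.0),
        Run("update-crm-record", True, 1, 61.5),
        Run("update-crm-record", False, 2, 120.0),
    ]
    print(summarize(sample))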

Safety Guardrails and Known Limits for AI Agents

Above all, the release emphasizes control. Developers can limit what the agent can do, stop it from circumventing CAPTCHAs, deny it access to certain pages and add a confirmation step for actions such as making purchases or exporting data. Activity is also logged so it can be audited, a must in regulated industries.
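
A sketch of what developer-side guardrails in that spirit might look like: an allow-list of actions, blocked URL patterns, actions that require confirmation, and an audit trail for everything decided. The field names are assumptions for illustration, not the actual Gemini API configuration.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from fnmatch import fnmatch


@dataclass
class AgentPolicy:
    allowed_actions: set = field(default_factory=lambda: {"click", "type", "scroll"})
    blocked_url_patterns: list = field(default_factory=lambda: ["*/admin/*"])
    confirm_actions: set = field(default_factory=lambda: {"purchase", "export"})
    audit_log: list = field(default_factory=list)

    def check(self, action_kind: str, url: str) -> str:
        """Return 'deny', 'confirm', or 'allow' for a proposed step."""
        if any(fnmatch(url, pattern) for pattern in self.blocked_url_patterns):
            return "deny"
        if action_kind in self.confirm_actions:
            return "confirm"
        if action_kind not in self.allowed_actions:
            return "deny"
        return "allow"

    def record(self, action_kind: str, url: str, outcome: str) -> None:
        """Append an auditable trace entry for every decision."""
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action_kind,
            "url": url,
            "outcome": outcome,
        })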

Google’s system card also lists well-known frontier-model limits: hallucinations, gaps in causal understanding, and problems with complex logical deduction and counterfactual reasoning. In practice, those limits mean the agent may sometimes misread ambiguous interfaces or take suboptimal paths. Human-in-the-loop checkpoints and constraints remain best practice.

How It Stacks Up Against Other AI Agents and Tools

OpenAI and Anthropic have also been training agents that can run browsers and manipulate desktops. The common theme is generalized UI control: models that learn to use new websites without tailored scripts. Google’s claim to lead the industry on these benchmarks implies an advantage in perception-action loops and latency, though your mileage will vary depending on what you’re trying to do.

One distinction is the focus on explicit, visible action traces: users can see what steps were taken and why as the agent operates. That kind of transparency builds trust, aids debugging and gives teams a way to add review gates at critical points.

Availability and What to Try First With Gemini Agents

The model is accessible through the Gemini API in Google AI Studio and via Vertex AI, with a demo available on Browserbase.
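
For developers, a minimal sketch of what a call through the google-genai Python SDK might look like is below. The model identifier and the computer-use tool wiring are assumptions based on the preview naming pattern; check the official Gemini API documentation for the exact names before relying on them.

from google import genai
from google.genai import types

# Assumes an API key is configured in the environment (e.g. GOOGLE_API_KEY).
client = genai.Client()

response = client.models.generate_content(
    # Assumed preview model name; verify against the current model list.
    model="gemini-2.5-computer-use-preview",
    contents="Open Wikipedia, find Atlantis and summarize the history of its myth",
    config=types.GenerateContentConfig(
        # Assumed configuration for the computer-use tool; field names may differ.
        tools=[types.Tool(
            computer_use=types.ComputerUse(
                environment=types.Environment.ENVIRONMENT_BROWSER
            )
        )],
    ),
)

# The response carries the model's proposed UI actions rather than plain text.
print(response.candidates[0].content.parts)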

Although it’s mainly optimized for the web, Google sees substantial potential in mobile use cases and suggests cross-device control is next.

Early adopters should begin with well-scoped tasks that are high-value, low-risk: internal dashboard updates, report generation and structured data entry. Set up a sandbox, add confirmations for risky operations, and test success rates with a handful of representative sites before rolling it out.

The conclusion is larger than any one demo: AI agents are graduating from answering questions to performing actions in the very interfaces we all use. If Google’s results generalize beyond the lab, browser-native automation may move from brittle scripts to adaptive, auditable AI, one cautious click at a time.

Bill Thompson is a veteran technology columnist and digital culture analyst with decades of experience reporting on the intersection of media, society, and the internet. His commentary has been featured across major publications and global broadcasters. Known for exploring the social impact of digital transformation, Bill writes with a focus on ethics, innovation, and the future of information.