Google’s next step for Gemini on Android is coming into focus, with fresh evidence that the assistant is gearing up to act directly inside other apps. New clues in the latest Google app beta point to a feature called “screen automation,” designed to let Gemini place orders, book rides, and complete other multi-step tasks without you tapping through each screen yourself. It’s the most concrete sign yet that last year’s agentic demos are moving toward real-world use.
- What the new code reveals about Gemini screen automation in Android
- How it might work under the hood on Android devices
- Controls and safeguards for safe Gemini screen automation
- Why it matters for Android users and everyday tasks
- How it compares to Siri, Shortcuts, Bixby, and Copilot
- What to watch next as Google tests Gemini screen automation

What the new code reveals about Gemini screen automation in Android
Strings discovered in version 17.4.66 of the Google app for Android outline a short onboarding flow that frames the capability as getting tasks done with Gemini. The text explicitly references using “screen automation” in select apps to perform actions such as placing online orders and hailing rides. Internally, the feature appears to carry the codename “bonobo,” a signal that it is being treated as a distinct initiative on Google’s assistant roadmap.
There are also hints of supporting UI changes, including potential “Purchases” or “My Orders” views tied to Gemini’s personal content area. If implemented, that would make outcomes traceable—important for trust—by keeping a record of what the assistant initiated on your behalf.
How it might work under the hood on Android devices
Screen automation suggests a blend of on-device perception and secure action execution: Gemini would parse what’s on your display, identify interactive elements, and perform taps, scrolls, or form fills as needed. Android already permits carefully scoped automation via accessibility services, App Actions, and Shortcuts; the difference here is that Gemini could fluidly stitch steps together across different apps and WebViews in response to natural-language goals.
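Google has not documented the mechanism, but the building blocks already exist in the platform. As a rough illustration of the kind of primitive involved (not Gemini’s actual implementation), the Kotlin sketch below shows a minimal AccessibilityService that finds a clickable element by its visible label and taps it; the service name and helper are hypothetical.

```kotlin
import android.accessibilityservice.AccessibilityService
import android.view.accessibility.AccessibilityEvent
import android.view.accessibility.AccessibilityNodeInfo

// Illustrative sketch only: shows the scoped-automation primitive Android already
// exposes, an AccessibilityService that taps an element identified by its label.
class ScreenActionService : AccessibilityService() {

    override fun onAccessibilityEvent(event: AccessibilityEvent?) {
        // A real agent would be driven by a planner reacting to window changes;
        // this sketch only exposes the single manual action below.
    }

    override fun onInterrupt() = Unit

    // Returns true if a node matching `label` (or a clickable ancestor) was tapped.
    fun tapByLabel(label: String): Boolean {
        val root = rootInActiveWindow ?: return false
        val matches = root.findAccessibilityNodeInfosByText(label) ?: return false
        val target = matches.firstOrNull { it.isClickable }
            ?: matches.firstOrNull()?.let { clickableAncestor(it) }
            ?: return false
        return target.performAction(AccessibilityNodeInfo.ACTION_CLICK)
    }

    private fun clickableAncestor(node: AccessibilityNodeInfo): AccessibilityNodeInfo? {
        var current: AccessibilityNodeInfo? = node.parent
        while (current != null) {
            if (current.isClickable) return current
            current = current.parent
        }
        return null
    }
}
```

A production agent would presumably pair this kind of low-level action with a planner, consent prompts, and a way to hand control back to the user at any step.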
Think of telling your phone, “Reorder my usual groceries and choose the earliest delivery window,” and Gemini handling the entire flow—even if that means jumping between a retailer’s app, your delivery preferences, and payment confirmation. Google previewed this kind of end-to-end orchestration in its Project Astra demos, where the model understood on-screen content and took context-appropriate actions.
Controls and safeguards for safe Gemini screen automation
The onboarding text emphasizes user oversight. Google cautions that Gemini can make errors and reminds you that you remain responsible for what it does, with options to stop the process, monitor progress, or take manual control at any time. Expect a conservative launch: limited app support, explicit consent prompts, and visible progress indicators so you can intervene quickly.
Privacy language in the beta notes that screenshots from these interactions may be reviewed by trained personnel to improve services when activity history is enabled. The guidance advises against entering logins or payment details into chats and against using the feature for sensitive or urgent scenarios. To earn trust, Google will likely lean on on-device processing for screen understanding where possible, automatic redaction of sensitive fields, and per-app toggles, mirroring the patterns it uses in other privacy-sensitive features.
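One way to picture that redaction step: Android already flags password inputs at the accessibility layer, so a textual screen summary can be scrubbed before anything is sent for processing. The Kotlin sketch below, with a hypothetical summarizeScreen helper, illustrates that pattern only and says nothing about how Gemini actually handles it.

```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Illustrative sketch only: collect visible text from the accessibility tree while
// redacting anything Android itself marks as a password field, so a scrubbed
// summary (rather than a raw screenshot) could be handed to a planning model.
fun summarizeScreen(root: AccessibilityNodeInfo): List<String> {
    val lines = mutableListOf<String>()
    fun walk(node: AccessibilityNodeInfo?) {
        if (node == null) return
        val text = node.text?.toString()
        if (!text.isNullOrBlank()) {
            lines += if (node.isPassword) "[REDACTED]" else text
        }
        for (i in 0 until node.childCount) walk(node.getChild(i))
    }
    walk(root)
    return lines
}
```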

Why it matters for Android users and everyday tasks
Agentic help promises to flatten chores that typically require repetitive taps and context switching. A few practical examples: rebooking a ride after a cancellation, checking out with a saved cart when a restock alert hits, or applying a promo code during a limited-time sale—all without you navigating multiple screens. The payoff is speed and accessibility; for many users, turning a compound task into a single request is the difference between doing it now and putting it off.
Developers stand to benefit, too. If Google exposes clear contracts for safe surfaces, such as designating screens and fields that are automation-friendly, apps can become more “agent-ready” without sacrificing security. That would complement existing Android capabilities such as App Actions, capability declarations, and deep links, giving Gemini structured hooks when precision beats free-form screen parsing.
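As a hypothetical example of what an “agent-ready” surface could look like, the Kotlin sketch below accepts an invented myshop://reorder deep link and starts a reorder flow, handing an assistant a structured entry point instead of forcing it to parse screens.

```kotlin
import android.app.Activity
import android.os.Bundle

// Hypothetical sketch: the myshop://reorder link and its parameters are invented
// here to show a structured entry point an assistant could target directly.
class ReorderActivity : Activity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Example link: myshop://reorder?list=usual&delivery=earliest
        val uri = intent?.data
        if (uri == null) {
            finish()
            return
        }
        val list = uri.getQueryParameter("list") ?: "usual"
        val delivery = uri.getQueryParameter("delivery") ?: "earliest"
        startReorderFlow(list, delivery)
    }

    private fun startReorderFlow(list: String, deliveryWindow: String) {
        // App-specific logic: load the saved cart and preselect the delivery window,
        // leaving the final confirmation step to the user.
    }
}
```

Registered behind an intent filter (and, ideally, an App Actions capability), a hook like this lets an assistant land on the right screen with parameters already filled in while the user still confirms the purchase.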
How it compares to Siri, Shortcuts, Bixby, and Copilot
Apple’s Shortcuts and Siri intents let users build automations, but they generally require explicit setup or developer-provided intents. Samsung’s Bixby Routines excels at trigger-based actions across device settings. Microsoft’s Copilot has shown promise with web-based transaction flows. Gemini’s angle is natural-language, cross-app execution that adapts to whatever is on screen, even when an app lacks first-party integrations, bringing accessibility-service-style automation into a mainstream, safety-guarded assistant.
What to watch next as Google tests Gemini screen automation
Key signs of progress will include a toggle in Gemini settings for screen automation, a list of supported apps, and a transaction log under “My Stuff” or “Orders.” Early availability will likely arrive via the Google app beta with a narrow capability set—food delivery, ride-hailing, and basic e-commerce—before expanding to more complex workflows.
The bigger question is how Google balances autonomy with accountability. Expect granular controls, per-app permissions, and clear visual affordances when Gemini is “driving.” If the implementation aligns with the company’s Project Astra vision—fast perception, grounded actions, and transparent guardrails—Android could be the first mainstream platform where a general-purpose AI reliably gets things done inside your apps.