FindArticles © 2025. All Rights Reserved.

Gemini Image Markup Tools Hint at Smarter Visual AI

By Gregory Zuckerman
Last updated: October 24, 2025 10:18 pm
Technology · 6 Min Read

Google is testing a simple idea with potentially big impact: on-screen markup that lets you draw over parts of an image on your device screen before asking Gemini a question or giving it a command. The feature, observed in its latest Google app build, offers a more structured way to guide Gemini’s focus, making visual questions faster, clearer, and more accurate.

How the New Workflow of Markup Could Work

Strings and UI elements in version 16.42.61 of the Google app’s ARM64 build reveal a new option to draw on an image chosen from your gallery or captured with the camera. You could circle or underline an area, then ask Gemini to analyze only that part: “interpret just this label,” “find the damage on this corner,” or “contrast these two logos on the shelf.”
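Under the hood, a markup tool has to turn a freehand stroke into something a model can consume. Here is a minimal sketch of that step, assuming the tool collapses each stroke to a bounding box; the function names and payload shape are hypothetical, not strings from the app teardown:

```python
# Hypothetical sketch: a drawn stroke is just a list of (x, y) points, and a
# simple machine-usable region is the stroke's axis-aligned bounding box.

def stroke_to_bbox(points):
    """Collapse a freehand stroke into an axis-aligned bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def region_prompt(bbox, instruction):
    """Pair a region with an instruction, as a markup tool might do
    before handing both to a multimodal model."""
    x0, y0, x1, y1 = bbox
    return {"region": {"x0": x0, "y0": y0, "x1": x1, "y1": y1},
            "instruction": instruction}

# A rough circle drawn around a product label:
stroke = [(120, 80), (180, 70), (200, 120), (150, 160), (110, 130)]
payload = region_prompt(stroke_to_bbox(stroke), "Interpret just this label")
print(payload["region"])  # {'x0': 110, 'y0': 70, 'x1': 200, 'y1': 160}
```

A real implementation would likely keep the raw stroke too, since an irregular outline carries more intent than a rectangle, but a bounding box is the cheapest representation to attach to a prompt.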

Table of Contents
  • How the New Workflow of Markup Could Work
  • Why Region Prompts Are Important for Multimodal AI
  • Editing Hooks and On-Device Clues in Gemini’s Tools
  • Use Cases That Make Sense in the Real World
  • How It Compares With Rivals in Visual AI Markup
  • Caveats and What to Watch as Google Tests Markup
[Image: A collage of Google app icons, including Google Chat, YouTube Kids, Google Translate, Google Wallet, Google One, Google Pay, Google Home, and Google Find My Device, arranged in rows on a soft blue and green gradient background.]

While nothing is confirmed, early evidence points to a color picker and multiple highlight modes, which implies that more than one region could be marked in a single shot. That would allow multistep prompts like “describe the chart in green, then pull a sentence from the box in blue.” The strings don’t spell out exactly how the model interprets the marks, but region cues act as visual signposts that tell Gemini where to focus.
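If multiple highlight colors do map to separate regions, the tool could compose them into a single ordered instruction. A hypothetical sketch of that composition step, with invented data shapes:

```python
# Speculative illustration of multi-region prompts: each mark carries a color
# tag, a bounding box, and a per-region task, which get flattened into one
# numbered instruction string for the model.

def build_multistep_prompt(marks):
    """Turn color-tagged regions into one ordered instruction string."""
    steps = [f"{i}. In the {color} region {bbox}: {task}"
             for i, (color, bbox, task) in enumerate(marks, 1)]
    return "\n".join(steps)

marks = [
    ("green", (40, 40, 300, 220), "describe the chart"),
    ("blue", (320, 60, 560, 140), "pull out one sentence"),
]
print(build_multistep_prompt(marks))
```

Encoding order through numbering is one plausible use for the colors; labeling categories, as the article notes below, is another.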

Why Region Prompts Are Important for Multimodal AI

Multimodal models generally perform better when ambiguity is low. Rather than vague language like “that thing on the left,” a hasty scrawl becomes an unambiguous point of reference. Work on referring expressions and visual grounding from Google and other labs has consistently shown that spatial guidance improves object recognition, visual question answering, and caption accuracy.

This plays to Gemini’s long-context capabilities, too. Gemini 1.5 can ingest enormous contexts across images and documents, on the order of a million tokens or more. Markup is a cheap way to narrow the field of view, which can improve response times and reduce computational load by limiting the regions that need attention.
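The savings are easy to ballpark. Google’s published guidance for Gemini image inputs counts roughly 258 tokens per 768-pixel tile; taking those figures as given (and treating the specific image sizes below as illustrative), cropping to a marked region shrinks the token bill dramatically:

```python
# Back-of-the-envelope sketch of why cropping to a marked region cuts cost.
# The 258-tokens-per-768px-tile figure follows Google's Gemini image-input
# guidance; the example dimensions are invented for illustration.
import math

TILE = 768          # tile edge in pixels
TOKENS_PER_TILE = 258

def image_tokens(width, height):
    """Approximate token cost of an image, tiled into TILE x TILE crops."""
    tiles = math.ceil(width / TILE) * math.ceil(height / TILE)
    return tiles * TOKENS_PER_TILE

full = image_tokens(4032, 3024)  # a full-resolution phone photo
crop = image_tokens(600, 400)    # just the circled label
print(full, crop)  # 6192 258
```

A twenty-fold reduction in image tokens is the kind of headroom that makes region prompts attractive even before any accuracy gains.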

Editing Hooks and On-Device Clues in Gemini’s Tools

The interface suggests more than analysis. Internal references to features known by the monikers “Nano” and “Banana” imply hooks into on-device image editing flows — like quickly cutting out an unwanted piece of a screenshot or tidying up a photo background. That fits Google’s larger split between on-device Gemini Nano for privacy-sensitive, light work and cloud models for heavier lifting.

If the markup tool routes tasks efficiently, handling edits on-device where possible and falling back to the cloud when it makes sense, users could get both performance and fidelity. It would be a similar play to how Pixel features such as Magic Eraser and Audio Magic Eraser pair simple gestures with AI-driven adjustments, except baked into the Gemini prompt flow itself.
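That routing decision can be sketched as a simple policy. This is purely speculative: the task names and the size threshold below are invented for illustration, not Google’s actual logic.

```python
# Hypothetical on-device vs. cloud router. Light, privacy-sensitive edits
# stay local; heavy analysis goes to the cloud. Thresholds are made up.

LOCAL_TASKS = {"erase", "crop", "redact"}  # light, privacy-sensitive edits

def route(task, image_megapixels):
    """Prefer on-device processing for small, simple, private edits;
    fall back to the cloud for anything heavier."""
    if task in LOCAL_TASKS and image_megapixels <= 12:
        return "on-device"
    return "cloud"

print(route("erase", 8))    # on-device
print(route("analyze", 8))  # cloud
```

The interesting engineering question is where that boundary sits in practice, since it trades latency and privacy against model quality.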


Use Cases That Make Sense in the Real World

Practical scenarios are everywhere. Students can circle a particular axis in a chart and request a trend summary along that dimension only. A sticker in a storefront window can be circled for translation without dragging in the reflections behind it. A support rep can circle an error message in a screenshot and ask for a fix, bypassing UI clutter that isn’t relevant.

Enterprise users stand to benefit too, particularly retail and productivity teams. Product catalogers could select a SKU label and extract structured data reliably, and designers could isolate a logo for brand-compliance checks. In medical and insurance workflows, visual annotations could mark sensitive sections for redaction or flag a location for review, working alongside existing privacy assurances.

How It Compares With Rivals in Visual AI Markup

Competitors have already been heading in this direction. Chat assistants from OpenAI and Microsoft now enable users to tap or draw on images to zero in on a query. Adding first-party markup within Gemini’s core flow would maintain parity and benefit from advantages of Google’s ecosystem: existing native Android markup tools, Google Photos’ image editing stack, and close integration with the Google app.

Caveats and What to Watch as Google Tests Markup

Like all pre-release features, the UI looks provisional and will probably change. It’s not entirely clear, for example, what the multiple colors are for: ordering steps, labeling categories, or pure aesthetics. Availability may also be staged, likely arriving first on devices with enough on-device AI muscle.

Still, the direction is compelling. Visual markup is what turns Gemini from a capable generalist into an assistant that knows exactly what you’re talking about when you point. If Google ships this and tightens up region-aware prompting, users should get quicker responses with fewer misunderstandings.

Gregory Zuckerman is a veteran investigative journalist and financial writer with decades of experience covering global markets, investment strategies, and the business personalities shaping them. His writing blends deep reporting with narrative storytelling to uncover the hidden forces behind financial trends and innovations. Over the years, Gregory’s work has earned industry recognition for bringing clarity to complex financial topics, and he continues to focus on long-form journalism that explores hedge funds, private equity, and high-stakes investing.