
Gemini Image Markup Tools Hint at Smarter Visual AI

By Gregory Zuckerman
Last updated: October 23, 2025 9:57 am
Technology
6 Min Read

Google is testing a simple idea with the potential for big impact: on-screen markup that lets you draw over parts of an image before asking Gemini a question or giving it a command. The feature, spotted in the latest Google app build, would offer a more structured way to guide Gemini’s focus, making visual questions faster, clearer, and more accurate.

How the New Markup Workflow Could Work

Strings and UI elements in version 16.42.61 of the Google app’s ARM64 build show a new option to draw on an image selected from your gallery or captured with the camera. You can circle or underline an area, then tell Gemini to analyze only that part: “interpret just this label,” “find the damage on this corner,” or “contrast these two logos on the shelf.”

Gemini visual AI markup tools interface with smart annotations, tags, and labels

While nothing is confirmed, early evidence points to a color picker and multiple highlight modes, suggesting that more than one region could be marked in a single shot. That would allow multistep prompts like “describe the chart in green, then pull a sentence from the box in blue.” The strings don’t spell any of this out, but region cues act as visual signposts that help Gemini work out where to focus.
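The feature itself isn’t public, but the idea is easy to approximate today with the publicly available Gemini API: draw the highlights client-side, then reference them by color in the prompt. The sketch below uses Pillow and the google-generativeai SDK; the file name, coordinates, colors, and prompt wording are illustrative assumptions, not the leaked feature’s actual behavior.

```python
# Illustrative sketch only: approximating region-guided prompts with the
# public Gemini API and Pillow. The coordinates, colors, and prompt text
# are assumptions standing in for the markup UI described above.
from PIL import Image, ImageDraw
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Draw two highlight regions on the photo, mimicking multi-color markup:
# green around a chart, blue around a block of text.
image = Image.open("shelf_photo.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
draw.rectangle((40, 60, 420, 380), outline="green", width=6)
draw.rectangle((500, 90, 780, 220), outline="blue", width=6)

# Reference the marked regions by color, the way the leaked strings imply
# a user would after circling parts of the screen.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([
    "Describe the chart inside the green box, then transcribe the text "
    "inside the blue box. Ignore everything outside the marked regions.",
    image,
])
print(response.text)
```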

Why Region Prompts Are Important for Multimodal AI

Multimodal models tend to perform best when ambiguity is low. A quick scrawl over the image replaces vague language like “that thing on the left” with a clear point of reference. Work on referring expressions and visual grounding from Google and other labs has consistently shown that spatial guidance improves object recognition, visual question answering, and caption accuracy.

This plays to Gemini’s long-context capabilities, too. With Gemini 1.5, context windows spanning images and documents can be enormous, on the order of a million tokens or more. Markup is a cheap way to narrow the field of view, which may improve response times and reduce the computational burden by limiting the regions that need attention.
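To make the efficiency argument concrete, a client could simply crop to the marked region before uploading, so the model spends attention (and image tokens) only on the area that matters. A minimal sketch, assuming a bounding box captured from the markup gesture; the path, coordinates, and question are hypothetical:

```python
from PIL import Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def ask_about_region(path: str, box: tuple, question: str) -> str:
    """Crop to the user's marked region so only that area is sent to the model."""
    region = Image.open(path).convert("RGB").crop(box)  # (left, upper, right, lower)
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content([question, region]).text

# Hypothetical example: a circled error message in a screenshot.
print(ask_about_region("screenshot.png", (120, 300, 760, 420),
                       "What does this error mean and how do I fix it?"))
```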

Editing Hooks and On-Device Clues in Gemini’s Tools

The interface suggests more than analysis. Internal references to features known by the monikers “Nano” and “Banana” imply hooks into on-device image editing flows — like quickly cutting out an unwanted piece of a screenshot or tidying up a photo background. That fits Google’s larger split between on-device Gemini Nano for privacy-sensitive, light work and cloud models for heavier lifting.

If the markup tool routes tasks efficiently, handling edits locally where possible and deferring to the cloud when it makes sense, users could get both speed and fidelity. It would be a similar play to how Pixel features such as Magic Eraser and Audio Magic Eraser pair simple gestures with AI-driven adjustments, except baked into the Gemini prompt flow itself.
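The routing itself could be as simple as a capability check. The sketch below is purely hypothetical: the task names, size threshold, and both handlers are stand-ins, since Google hasn’t documented how the “Nano” and “Banana” hooks would divide the work.

```python
# Hypothetical routing sketch; none of these names come from the leaked build.
LIGHT_TASKS = {"remove_object", "cleanup_background", "crop_to_region"}

def edit_locally(task: str, image_bytes: bytes) -> str:
    return f"on-device edit: {task}"   # placeholder for an on-device (Nano) call

def edit_in_cloud(task: str, image_bytes: bytes) -> str:
    return f"cloud edit: {task}"       # placeholder for a cloud Gemini call

def route_edit(task: str, image_bytes: bytes, on_device_available: bool) -> str:
    # Prefer on-device for quick, privacy-sensitive edits on small images,
    # and fall back to the cloud for heavier, higher-fidelity work.
    if on_device_available and task in LIGHT_TASKS and len(image_bytes) < 4_000_000:
        return edit_locally(task, image_bytes)
    return edit_in_cloud(task, image_bytes)

print(route_edit("remove_object", b"\x00" * 1024, on_device_available=True))
```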

Google Gemini visual AI markup tools highlighting objects with bounding boxes and labels

Use Cases That Make Sense in the Real World

Practical scenarios are everywhere. Students can highlight a particular axis in a chart and ask for a trend summary along that dimension only. A sticker in a storefront window can be circled for translation without dragging in the reflections behind it. A support rep can circle an error message in a screenshot and ask for a fix, bypassing UI clutter that isn’t relevant.

Workplace use stands to benefit as well, particularly for retail and productivity teams. Product catalogers can select a SKU label and extract structured data reliably, and designers can isolate a logo for brand-compliance checks, as sketched below. In medical and insurance processes, visual cues could flag health-related or other sensitive sections for redaction, or mark a location for review, working alongside existing privacy assurances.
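For the cataloging case, the payoff is structured output. A rough sketch with the public Gemini API, cropping to the marked label and requesting JSON; the crop box and field names are assumptions for illustration:

```python
from PIL import Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Crop to the SKU label the user circled, then ask for machine-readable fields.
label = Image.open("shelf_photo.jpg").convert("RGB").crop((220, 410, 560, 520))

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config={"response_mime_type": "application/json"},
)
response = model.generate_content([
    "Extract the product name, SKU code, and price from this label as JSON "
    "with keys: name, sku, price.",
    label,
])
print(response.text)  # e.g. {"name": "...", "sku": "...", "price": "..."}
```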

How It Compares With Rivals in Visual AI Markup

Competitors are already heading in this direction. Chat assistants from OpenAI and Microsoft now let users tap or draw on images to zero in on a query. Adding first-party markup to Gemini’s core flow would maintain parity while leaning on Google’s ecosystem advantages: native Android markup tools, Google Photos’ editing stack, and tight integration with the Google app.

Caveats and What to Watch as Google Tests Markup

As with any pre-release feature, the UI looks provisional and will probably change. It’s not entirely clear, for example, why there are so many colors: are they for ordering steps, labeling categories, or purely aesthetic? Availability might also be staged, arriving first on devices with enough on-device AI muscle.

Still, the direction is compelling. Visual markup is what turns Gemini from a capable generalist into an assistant that knows exactly what you mean when you point. If Google ships this and tightens up region-aware prompting, users should get quicker answers with fewer misunderstandings.

By Gregory Zuckerman
Gregory Zuckerman is a veteran investigative journalist and financial writer with decades of experience covering global markets, investment strategies, and the business personalities shaping them. His writing blends deep reporting with narrative storytelling to uncover the hidden forces behind financial trends and innovations. Over the years, Gregory’s work has earned industry recognition for bringing clarity to complex financial topics, and he continues to focus on long-form journalism that explores hedge funds, private equity, and high-stakes investing.