Gemini may soon get native image annotation tools
To close a nagging workflow gap that has left users bouncing between the AI and external markup apps, Google is reportedly readying native image annotation capabilities for Gemini.
The in-product tools would let users draw, highlight, and add text directly onto generated images, and, perhaps most importantly, use those markups to direct Gemini toward targeted edits without leaving the interface.
- What annotation inside Gemini means for everyday editing
- What the leak reveals about native annotation in Gemini
- How it might work under the hood for region-aware edits
- Closing the loop on real workflows with in-app markups
- Where it fits in Google’s AI roadmap across products
- Timing still unclear as testing rolls out in stages

What annotation inside Gemini means for everyday editing
Gemini's image models have matured quickly, and deservedly earned praise for how well they refine and iterate on generations. But one thing that hasn't been so easy is simple annotation: circling something, scribbling a note, or adding a quick caption. Today, that often means downloading the image, opening a separate editor whose changes aren't reflected back in the app, and then re-importing it, which adds friction and invites version confusion when you're collaborating.
Competitors are already blurring the line between prompt-driven editing and hand-guided correction. Adobe's Generative Fill lets users mask part of an image before generating new content into it, Canva's Magic Edit confines changes to a brushed-over region, and Microsoft Designer supports brush-based selections. ChatGPT's image editor likewise supports selection-driven iterations. Native annotation would help Gemini catch up on these practical, everyday tasks that save clicks and keep the creative flow going.
What the leak reveals about native annotation in Gemini
According to TestingCatalog (on X), Google is experimenting with an annotation layer in Gemini on the web that would allow drawing and text overlays on generated images. Earlier app teardowns also suggested that markup tools are headed to mobile, with Google laying the groundwork across platforms.
The interesting part is how markups could double as instructions. Think of circling a background object and typing "remove," or drawing an arrow to a shirt and writing "change to red." That is localized, prompt-driven editing: the drawing and its caption become masks and constraints the model can use. It bridges loose freeform prompts and the exacting art direction many creators have been clamoring for.
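To make that idea concrete, here is a purely illustrative TypeScript sketch of how a circled region or an arrow, paired with its caption, might be packaged into a region-scoped edit request. The types, field names, and helper function are assumptions made for this example and do not reflect Gemini's actual API.

```ts
// Hypothetical shapes for treating a markup plus caption as an edit instruction.
// None of these types reflect Gemini's real API; they only illustrate the idea
// of binding a drawn region to a short textual directive.

type Annotation =
  | { kind: "circle"; cx: number; cy: number; r: number; caption: string }
  | { kind: "arrow"; from: [number, number]; to: [number, number]; caption: string };

interface RegionEditRequest {
  imageId: string; // identifier of the generated image being edited
  region: { x: number; y: number; width: number; height: number }; // bounding box in pixels
  instruction: string; // the user's caption, e.g. "remove" or "change to red"
}

// Convert a single annotation into a region-scoped edit request by taking its
// bounding box and pairing it with the caption the user typed.
function toEditRequest(imageId: string, a: Annotation): RegionEditRequest {
  if (a.kind === "circle") {
    return {
      imageId,
      region: { x: a.cx - a.r, y: a.cy - a.r, width: 2 * a.r, height: 2 * a.r },
      instruction: a.caption,
    };
  }
  // For an arrow, treat its head as the point of interest and pad it into a
  // small box the model can use as a spatial hint.
  const pad = 32;
  return {
    imageId,
    region: { x: a.to[0] - pad, y: a.to[1] - pad, width: 2 * pad, height: 2 * pad },
    instruction: a.caption,
  };
}

// Example: circle a background object and type "remove".
const request = toEditRequest("img_123", {
  kind: "circle", cx: 640, cy: 210, r: 80, caption: "remove",
});
console.log(request);
```

However the real feature is wired up, the core move is the same: turn a gesture into coordinates and a caption into a constraint, then hand both to the model together.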
How it might work under the hood for region-aware edits
Technically, a canvas in the browser can record vector strokes, shapes, and text as layers in image coordinates. Those layers can be rasterized into masks and fed to the model for region-aware edits, a common approach in diffusion-based systems. Google also has a long research record in interactive segmentation and mask-guided image understanding, so plugging annotation signals into Gemini's image pipeline would be a natural fit.
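As a rough illustration of that first step, the sketch below uses the standard browser Canvas API to rasterize freehand strokes into a mask image whose white pixels mark the editable region. It assumes nothing about Gemini's real pipeline; the Stroke type and the strokesToMask function are invented for this example.

```ts
// Minimal sketch, assuming a browser environment: rasterize freehand strokes
// into a mask with the Canvas API. The resulting pixels could accompany the
// original image and a prompt for region-aware (inpainting-style) edits.

type Point = { x: number; y: number };
type Stroke = { points: Point[]; brushSize: number };

function strokesToMask(strokes: Stroke[], width: number, height: number): ImageData {
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext("2d")!;

  // Start from a fully black (unselected) mask.
  ctx.fillStyle = "black";
  ctx.fillRect(0, 0, width, height);

  // Paint each stroke in white; white pixels mark the editable region.
  ctx.strokeStyle = "white";
  ctx.lineCap = "round";
  ctx.lineJoin = "round";
  for (const stroke of strokes) {
    ctx.lineWidth = stroke.brushSize;
    ctx.beginPath();
    stroke.points.forEach((p, i) =>
      i === 0 ? ctx.moveTo(p.x, p.y) : ctx.lineTo(p.x, p.y)
    );
    ctx.stroke();
  }

  // The raw RGBA buffer can later be thresholded into a 1-bit mask server-side.
  return ctx.getImageData(0, 0, width, height);
}
```

Keeping the strokes as vectors until this final rasterization step also makes undo, layer ordering, and re-editing cheap, which is likely why annotation tools are typically built as overlay layers rather than destructive pixel edits.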

Also look for safety and provenance features. SynthID-style watermarking of AI imagery could persist through annotation workflows, and edit histories plus metadata would let teams audit changes. Most of the processing on the web would probably remain cloud-based, while lighter-weight on-device actions, such as simple highlights or captions, could complement server-side edits on mobile for speed.
Closing the loop on real workflows with in-app markups
For marketers, teachers, and social teams, this is a shortcut to done. Rather than exporting an image to a markup app just to add arrows and callouts, users could apply them in Gemini and then output a version sized for Slides, Docs, or Shorts. For designers and product people, a quick circle plus a short prompt could dictate changes like moving a button, loosening a logo, or softening a texture, without lengthy text prompts or cumbersome file round trips.
The update would also bolster Gemini as a creative playground for Chromebook and Android users, who are likely to lean on lightweight tools while on the go. Combined with Gemini's iterative refinements and style adjustments, annotation turns the chat window into a practical art-direction surface rather than just a prompt box.
Where it fits in Google’s AI roadmap across products
One way or another, Google is closing the gap between chat, creation, and action in Gemini by pulling in Workspace apps, mobile integrations, and media generation. An annotation layer looks like the natural next step: a bridge between conversational intent and pixel-level control that keeps the surrounding context intact. It also pairs well with Google's image-generation testbeds, including the recently discussed "Nano Banana Pro" experiments, by putting their power to work on everyday tasks.
Timing still unclear as testing rolls out in stages
Because this is a leak, plans are subject to change. There is no official timetable, and features like this often ship behind a flag to a subset of accounts before gaining wider visibility. Keep an eye out for early indicators in the web interface first, then Android and iOS, with piecemeal additions such as shape tools, layer ordering, and export presets if testing goes smoothly.
If Google ships annotation tied to promptable masks, Gemini's image workflow becomes significantly more capable and far less dependent on third-party editors. Less context switching, clearer intent, faster iteration: the kind of quality-of-life change users will feel immediately without ever having to think about models and masks.
