I put Gemini Nano Banana 2 to a very human test: turning dense information into sketchnotes. The on-device model produced layouts and lettering that often looked classroom-ready, but it also tripped over basics like numbering, ordering, and consistency. Here’s what it nailed, what it botched, and how to coax better results if you’re using it for visual note-taking.
What Gemini Got Surprisingly Right for Sketchnotes
Visual tone and hierarchy were the standout strengths. The model defaulted to pastel “highlighter” colors, hand-drawn containers, and playful but readable lettering—exactly the sketchbook vibe that sketchnoters prize. It grouped related ideas into tidy clusters, used icons to hint at meaning, and left generous white space so the page didn’t feel cramped.
On a test sketchnote summarizing the U.S. Bill of Rights, it quickly produced 10 labeled sections with clear headings and line-art illustrations. The result looked like something a practiced visual scribe could have drawn during a lecture—only it took seconds, not an hour. That speed is the headline: Nano Banana 2 can draft a structured first pass faster than most humans can open their notebook.
The Hilarious and Head-Scratching Misses
The model’s weak spot is text precision inside images. Across my runs, it routinely mixed Arabic and Roman numerals, duplicated numbers, or shuffled items out of order. In one Bill of Rights layout, “5” appeared twice in separate bubbles; in another, items 1–4 jumped to the bottom while 7–10 floated to the top. It took six iterations to land a perfect 10-item grid.
Instruction carryover across prompts also caused odd echoes. After asking for a centered title in one session, later sketchnotes began centering headlines even when I didn’t request it. At one point, the generator dumped what looked like internal instructions rather than an image, a classic sign that the session context had gotten tangled. A fresh session immediately cleared that behavior.
Long-form source material introduced a different failure mode. When I asked it to sketchnote a lengthy service-comparison article, the first attempt included numeric labels I had explicitly banned. The second tried to fix that… but only on the top half of the canvas. The third crashed. The fourth succeeded, but only after I made the model summarize the article in text first, then asked for the image.
Why Text in Images Still Trips Up Models
Generating legible, well-placed text is a notorious challenge for image models because they juggle two jobs at once: drawing and typesetting. Diffusion pipelines excel at style and composition, but they’re brittle when you add constraints like “number these 1 through 10 with no repeats and place them in a specific sequence.” Research from design usability groups such as Nielsen Norman Group has long shown that typography and layout rules matter for comprehension; image models don’t reliably enforce those rules yet.
Multimodal benchmarks that touch text—like COCO-Text and TextCaps—highlight how recognition and rendering of characters lag behind general image quality. Add summarization on top (turn a URL into an outline, then draw it), and small hallucinations compound: a misread heading becomes a mislabeled bubble, which becomes an out-of-order diagram. Iterative prompting helps, but it isn’t a silver bullet.
Prompts and Tactics That Worked for Better Sketchnotes
Be excruciatingly explicit. The winning prompt for my Bill of Rights page spelled out layout and sequencing: put items 1–4 across the top in order, place 5 to the left of the title, 6 to the right, and 7–10 along the bottom in order. I also added “Use only Arabic numerals, do not repeat any number, and double-check order before rendering.” That single prompt removed three recurring errors at once.
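Since the model won’t reliably double-check its own numbering, I found it useful to verify labels myself before calling a run done. A minimal sketch of that check (the function name is my own, and `labels` is simply the list of numerals you read off the rendered image, in reading order):

```python
import re

def check_labels(labels):
    """Validate numeric labels transcribed from a generated sketchnote.

    Returns a list of human-readable problems; an empty list means the
    numbering passed all three rules from the prompt: Arabic numerals
    only, no repeats, ascending reading order.
    """
    problems = []
    for lab in labels:
        # Flag Roman-numeral strings like "IV" or "X".
        if re.fullmatch(r"[IVXLCDM]+", lab):
            problems.append(f"Roman numeral found: {lab}")
        elif not lab.isdigit():
            problems.append(f"Not an Arabic numeral: {lab}")
    nums = [int(lab) for lab in labels if lab.isdigit()]
    # No repeats anywhere on the page.
    seen = set()
    for n in nums:
        if n in seen:
            problems.append(f"Duplicate number: {n}")
        seen.add(n)
    # Reading order should already be ascending.
    if nums != sorted(nums):
        problems.append("Labels are out of reading order")
    return problems
```

It is crude, but it catches exactly the three failure modes I kept hitting: Roman numerals, duplicates, and shuffled order.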
Reset when things go weird. If you see internal notes or stylistic echoes from earlier prompts, start a new session to clear context. In my tests, that immediately stopped carryover quirks.
Separate thinking from drawing. For longer sources, ask for a bullet summary first. Once the outline looks right, tell Nano Banana 2 to “turn that outline into a sketchnote” with your constraints (colors, numbering rules, placement). This two-step flow cut failures dramatically in my runs.
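If you drive the model through an API or script rather than a chat app, the two-step flow is easy to encode as a pair of prompt builders. A minimal sketch (the prompt wording and function names are mine, reconstructed from the flow above, not an official recipe):

```python
def summary_prompt(article_text):
    # Step 1: make the model think in text before it draws.
    return (
        "Summarize the following article as a short bullet outline, "
        "one bullet per main point. Do not draw anything yet.\n\n"
        + article_text
    )

def sketchnote_prompt(outline, constraints):
    # Step 2: feed the approved outline back with explicit style rules.
    rules = "\n".join(f"- {c}" for c in constraints)
    return (
        "Turn this outline into a single sketchnote image.\n"
        f"Outline:\n{outline}\n"
        f"Hard constraints:\n{rules}"
    )

# The constraint list that worked best in my runs.
CONSTRAINTS = [
    "Use only Arabic numerals",
    "Do not repeat any number",
    "Classic highlighter colors, hand-drawn shapes",
    "No Roman numerals anywhere",
]
```

The point of splitting the prompts is that you can eyeball and correct the outline from step 1 before any pixels are spent on step 2.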
Constrain style, not just content. Request “classic highlighter colors,” “hand-drawn shapes,” and “no Roman numerals anywhere.” These cues consistently produced cleaner, more legible cards and avoided font-like artifacts that image models sometimes invent.
Expect a few passes. Across projects, I averaged 3–6 iterations to get a presentation-ready sketchnote. That matches what human-centered AI studies from academic labs and industry research have shown about iterative prompting: quality improves meaningfully after a handful of targeted refinements.
Bottom Line for Sketchnoters Using Gemini Nano Banana 2
Gemini Nano Banana 2 can draft sketchnotes that look delightfully human, and it does so fast. But if your page depends on strict ordering, numbering, or labels, you’ll need to micromanage prompts and iterate. The sweet spot today: use the model to explore layouts, iconography, and color systems in minutes, then lock in the final version with one last guided pass—or a quick human edit. Win some, lose some, and learn the prompts that stack the wins.