Google has pushed out Veo 3.1, its newest generative video model, inside the Flow video editor with improved audio, more refined controls and better image-to-video results. Expanding on May’s Veo 3 release, the update brings more natural movement and better prompt following, and is also coming to the Gemini app and, for developers, the Vertex AI and Gemini APIs.
“The fact that Flow is making it possible for people to create something as stunning as this with just their phone, we know the bar has moved and creators are in a new place,” said Josh Lovejoy, a staff interaction designer on Google Research’s user experience team who worked on the app.

What Veo 3.1 Adds: audio, controls, and fidelity
The key addition here is audio throughout. Clips created with reference images, first/last-frame guidance or video extension can now carry AI-generated sound, so creators can finish share-worthy videos without jumping out to a separate audio tool. Think ambient soundscapes, Foley effects or transitional cues that follow on-screen action.
Veo 3.1 also includes object-level compositing. Users can add an object to a scene and have it blend with the clip’s style and lighting, hinting at more sophisticated scene understanding and temporal consistency. Google also says object removal is coming to Flow soon, which could let editors clean up shots or change a scene without reshoots.
Existing Veo features also get sharper. Image-to-video fidelity improves, and those gains help translate brand style boards and character sheets into cohesive motion. Reference images can direct a character’s look across shots. The first-and-last-frame workflow operates much like a pair of lightweight keyframes, and video extension can grow a clip from its final frames for seamless continuations.
Flow Integration and Availability Across Platforms
By baking Veo 3.1 into Flow, Google is defining the editor as something greater than a prompt box; it’s shaping up to be an all-in-one workspace for ideation, drafting and finishing work. The integration cuts down on tool-switching and maintains creative context—prompts, references, frames—within a single timeline.
In addition to Flow, access through the Gemini app and via the Vertex and Gemini APIs gives teams several on-ramps. Creative studios can prototype in Flow, then move production workflows to Vertex with managed infrastructure and permissions. Developers can integrate Veo 3.1 into pipelines for marketing automation, social publishing or product visualization.
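Video generation is a long-running job rather than a single request-response call, so API-side integrations typically submit a job and poll its status until it completes. A minimal sketch of that pattern in Python, with the status callable standing in for whatever operation-lookup the Vertex or Gemini API actually exposes (the field names `done` and `video_uri` here are illustrative assumptions, not the real schema):

```python
import time
from typing import Callable

def poll_until_done(get_status: Callable[[], dict],
                    interval_s: float = 2.0,
                    timeout_s: float = 600.0) -> dict:
    """Poll a long-running generation job until it reports completion.

    get_status: callable returning the job's current status dict
                (hypothetical shape: {"done": bool, "video_uri": str}).
    Raises TimeoutError if the job does not finish within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("done"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("video generation did not finish in time")

# Usage with a stubbed status endpoint (two polls, then done):
_responses = iter([
    {"done": False},
    {"done": True, "video_uri": "gs://example-bucket/clip.mp4"},
])
result = poll_until_done(lambda: next(_responses), interval_s=0.0)
print(result["video_uri"])  # → gs://example-bucket/clip.mp4
```

In a real pipeline the stub would be replaced by an authenticated call to the provider’s operation endpoint, and the returned URI handed off to the publishing or review step.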
For Google, multi-surface availability is strategic: consumer usage in Flow informs model polish, while enterprise deployment through Vertex channels the polished model into governed environments with predictable scaling and billing.

Why It Matters for Creators and Brands Using Flow
Sound matters for engagement and pacing. Synchronized audio tied to generated motion means short-form editors, indie marketers and social teams can ship finished videos faster. Object insertion extends that: product shots, signage or safety elements can be dropped into scenes without reshoots, and the upcoming removal tool will streamline cleanup of distractions or IP-sensitive items.
Reference-based generation and first-and-last-frame control are particularly relevant to brand compliance. Teams can lock a character’s look or a product’s silhouette, then direct motion with frames that correspond to storyboards. That moves generative video closer to traditional previsualization and post workflows, rather than one-off prompt experiments.
Competition and context in generative video tools
Veo 3.1 arrives in an already busy market. OpenAI teased Sora with high-fidelity, long-duration clips; Runway’s Gen-3 brought stronger physics and camera control; Pika and Luma have expanded fast-turnaround creative tools. Google’s differentiator is less a single flashy demo than the confluence of model capability and accessible editing, plus an enterprise pathway through Vertex.
Safety and provenance will be scrutinized as adoption expands. Google has previously promoted methods such as SynthID watermarking for AI-generated media; enterprises will be watching how content labeling, moderation and usage policies translate to Veo 3.1 outputs across Flow and API touchpoints.
What to watch next as Veo 3.1 rolls out in Flow
The larger questions now concern temporal consistency in complex scenes, audio sync for fast edits, and latency for iterative workflows. Pricing and rate limits through the APIs will determine how quickly agencies and platforms integrate Veo 3.1 at scale, while the launch schedule for Flow’s object removal tool will indicate how rapidly Google can turn research prototypes into daily editing tools.
If Google keeps pairing creative control (object-level tweaks, keyframe-like guidance) with accessible packaging, Veo 3.1 could push generative video from novelty to repeatable production, narrowing the gap between prompt-driven drafts and publishable stories.