Last week, I spent a lot of time running prompts through OpenAI’s Sora 2 and Google’s Veo 3. The difference isn’t subtle. On a typical run, Sora 2 shows a strikingly better grasp of physics, human nuance, and scene continuity. Veo 3 still scores some cinematic wins and offers handy reliability, but the balance of power is tipping.
How I tested: methods, prompts, and evaluation criteria
I wrote plain-English prompts that stick to basics: a slow-motion espresso pour, a golden retriever racing through a bustling park, a motorcyclist carving across wet sand at dusk, and an iconic yellow taxi bolting through traffic in Kolkata. For the sci-fi fans, I added a well-known sci-fi character walking the streets of Bangkok, a prompt meant to probe each model’s guardrails.
I ran multiple takes per prompt at default settings to keep things fair, rated each output on realism, temporal stability, camera control, and instruction-following, then averaged the ratings per prompt.
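The averaging step can be sketched in a few lines of Python. This is a minimal illustration, not the exact scoring sheet used for this comparison: the 1-to-5 scale, the `average_ratings` helper, the field names, and the `model_a` label are all assumptions, and the numbers are placeholders rather than measured results.

```python
from collections import defaultdict
from statistics import mean

# The four criteria named in the text; the 1-5 scale is an assumption.
CRITERIA = ("realism", "temporal_stability", "camera_control", "instruction_following")


def average_ratings(takes):
    """Average each criterion across all takes of the same (model, prompt) pair."""
    grouped = defaultdict(list)
    for take in takes:
        grouped[(take["model"], take["prompt"])].append(take["ratings"])
    return {
        key: {c: mean(r[c] for r in ratings) for c in CRITERIA}
        for key, ratings in grouped.items()
    }


# Placeholder ratings for two takes of one prompt (not real results).
takes = [
    {"model": "model_a", "prompt": "espresso",
     "ratings": {"realism": 4, "temporal_stability": 5,
                 "camera_control": 3, "instruction_following": 4}},
    {"model": "model_a", "prompt": "espresso",
     "ratings": {"realism": 5, "temporal_stability": 4,
                 "camera_control": 4, "instruction_following": 4}},
]

summary = average_ratings(takes)
for (model, prompt), scores in summary.items():
    print(model, prompt, scores)
```

Keeping the per-criterion averages separate, rather than collapsing them into one score, makes it easier to see where a model wins on mood but loses on physics.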
Results, prompt by prompt: head‑to‑head comparisons
Espresso shot: Veo 3 gave me a pretty, commercial-grade clip with convincing crema and camera language that felt pro. Sora 2 took it up a notch, matching viscosity, splash physics, and fine surface swirls with uncanny precision. It felt like a tabletop shoot, not an AI approximation.
Dog in a busy park: Veo 3 delivered some energy and decent depth of field, though background extras occasionally looked as if they were assembled from spare body parts. Sora 2 was more robust in my tests: fur dynamics, footfalls, and crowd behavior held together with fewer obvious tells such as flickering or edge-bleeding artifacts. The realism verged on documentary.
Motorcycle on a beach at sunset: This was Veo 3’s high point. It rendered tires gripping the sand, believable dust plumes, and gorgeous golden-hour lighting, with cinema-worthy framing included. Sora 2 fell short at points here, occasionally fumbling contact between the bike and the surface, which recalled how earlier-generation models struggled when vehicles meet deformable ground.
Kolkata taxi in daytime: The city’s visual density is a death trap for generative models. Veo 3 did a good job capturing ambient atmosphere, but it occasionally fell into the “gliding object” trap and mangled smaller text. Sora 2 delivered lived-in street textures, solid vehicle dynamics, and believable passenger movement inside the cabin. It felt grounded.
Copyright probe: Veo 3 produced a near-cinematic likeness of a familiar sci-fi hero, complete with armor reflections and silhouette. Sora 2 declined, citing stricter content requirements. In other words, Veo 3 showed technical chops with looser guardrails, while Sora 2 paired strong capability with tighter compliance.
Why Sora 2 pulls ahead in realism, physics, and stability
The most obvious leap is physical plausibility. Sora 2 respects mass, momentum, and material properties better than any other model I’ve tried. That means “frictionless world” glitches occur less often and object permanence is better: people and props don’t flicker, clip through each other, or teleport as much. In practice, you spend less time regenerating clips to repair broken motion.
The other is temporal consistency. Both vendors tout longer-range coherence, but Sora 2’s frame-to-frame logic felt tighter across handheld shots and complex backgrounds. That is in line with where independent evaluations are heading: research benchmarks like VBench and GenEval reward stability, motion realism, and instruction adherence, and newer frontier models have been steadily raising those scores year after year, according to academic groups at Tsinghua and HKUST.
Where Veo 3 still wins on cinematography and mood
Cinematography is where Veo 3 still leads. It commonly nails tone and camera grammar: rack focuses, tasteful lens flares, and graded rather than auto-balanced colors. In the motorcycle scene, it understood how tires bite into sand and how low light sculpts metal. For creators more interested in mood than microscopic physical fidelity, Veo 3 can be the shortcut to a usable shot.
Veo 3 was also impressive for cleanly composed lifestyle visuals: short ads, product shots, and scenic B-roll. That lines up with what media researchers and production studios have reported from pilot projects: models trained and tuned for ad-friendly composition can excel on controlled vignettes, even if they struggle with chaotic, crowded scenes.
Guardrails and reliability: policy, safety, and consistency
Sora 2’s refusals on copyrighted characters may also signal a tougher policy climate. They are consistent with what large labs have described in recent system cards and safety briefings: expanded content filters, provenance features, and watermarking experiments. Veo 3 seems looser in edge cases, which can help creators prototype but raises clear rights questions for production use.
Access and cost: availability, pricing, and workflows
As of my testing, Sora 2 is still invite-only. OpenAI has suggested that free access will not scale indefinitely and that some form of usage limits during demand peaks should be expected. Veo 3 is available through Google’s premium offerings, either with daily limits or through credit-based pricing plans, which keeps experimentation inexpensive while still serving higher-end workflows. Studios I’ve spoken to treat both as previsualization tools first, not as final-delivery engines.
Bottom line: choosing the right model for your needs
All in all, Sora 2 feels like a generational step for street-level realism, especially with people, liquids, and busy scenes, while Veo 3 still has the edge on cinematic polish and certain environment-physics combinations. The gap is jarring not because Veo 3 is a failure, but because Sora 2 so often feels like a camera rather than a renderer. For most content creators, that means fewer retries, more realistic movement, and a better chance that your first render is the keeper.
If you’re creating polished ad visuals or stylized B-roll, Veo 3 is still great. If you want grounded, physics-faithful clips that hold up to scrutiny, Sora 2 is the one to beat, and it is now my default choice.