Google Video AI Roadmap: Veo 3.1, Gemini Omni, and the Veo 4 Question

Google's video AI roadmap is becoming more layered. The public documentation still centers on Veo 3.1, creator products such as Flow keep gaining workflow features, and a newer name, Gemini Omni, has started appearing in pre-I/O reporting. That mix can make the next step feel confusing, especially for people searching for Veo 4.

As of May 17, 2026, the safest way to understand the market is not to treat these names as direct replacements for one another. Veo is the documented video model family. Gemini is the broader AI product ecosystem. Flow is a filmmaking workspace. Gemini Omni, for now, is a reported signal rather than an official launch.

The Confirmed Layer: Veo 3.1

Veo 3.1 is the name with the strongest official footing today. Google's public developer documentation describes Veo 3.1 as the current video-generation model line on Vertex AI, with support for text-to-video, image-to-video, prompt rewriting, and first-and-last-frame generation.

Google's own product updates point in the same direction. Recent Veo 3.1 announcements highlight better narrative control, reference-image workflows, support for ingredients-to-video creation, and improved editing options in Flow. These are not minor naming updates. They show that Google is still actively developing the Veo stack.

For builders, that matters because Veo 3.1 is the part you can plan around. It has documentation, product surfaces, model behavior to test, and a path into Google-supported infrastructure. Rumored names may influence search demand, but production roadmaps should still be anchored to released capabilities.
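
One way to keep that discipline in a codebase is to record each name alongside its evidence status and let only confirmed entries into planning. The sketch below is illustrative only: the Status labels, the Surface record, and the plannable helper are hypothetical names, and the status given to each entry mirrors this article's reading as of May 17, 2026, not an official Google classification.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    CONFIRMED = "confirmed"      # documented by Google
    REPORTED = "reported"        # media coverage or interface sightings only
    SPECULATIVE = "speculative"  # inferred from a naming pattern


@dataclass(frozen=True)
class Surface:
    name: str
    status: Status


# Statuses mirror this article's reading; they are assumptions, not official labels.
ROADMAP = [
    Surface("Veo 3.1", Status.CONFIRMED),
    Surface("Flow", Status.CONFIRMED),
    Surface("Gemini Omni", Status.REPORTED),
    Surface("Veo 4", Status.SPECULATIVE),
]


def plannable(roadmap):
    """Return only the names that have official footing."""
    return [s.name for s in roadmap if s.status is Status.CONFIRMED]


print(plannable(ROADMAP))  # ['Veo 3.1', 'Flow']
```

When a reported name becomes official, only its status entry changes; the planning logic stays the same.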

The Product Layer: Gemini, Flow, and Google Vids

Google's video story is no longer just about a raw model. The model still matters, but the user experience built around it matters more and more.

Flow is aimed at AI filmmaking, where a creator may need shot planning, character consistency, image references, audio, and repeatable scene control. Gemini brings video generation closer to a conversational interface. Google Vids brings AI-assisted video creation into workplace communication.

This is why the next Google video release may not look like a simple model-number announcement. It could appear as a new Gemini feature, a Flow editing upgrade, a Vertex AI model update, or a combination of several surfaces at once.

That product layering also explains why "Veo 4" is not the only useful search term. A creator may care less about the model family name and more about whether Google can generate, revise, extend, and edit clips without forcing a full restart each time.

The Unconfirmed Layer: Gemini Omni

Gemini Omni is currently best treated as an unconfirmed product or model signal. Media coverage and interface sightings suggest a Gemini-native video experience focused on creation, remixing, templates, and chat-based editing. That would fit Google's broader direction, but reports and sightings are not the same as official Google documentation.

The interesting part is the emphasis on editing. If Gemini Omni becomes real, its biggest value may not be "another text-to-video model." The more important promise would be iterative control: generate a clip, ask for a specific change, preserve the parts that already work, and avoid restarting from a blank prompt.

That is a different product problem from generating a first clip from a prompt. It is closer to a creative workstation, where chat, references, timeline changes, and model output all need to work together.

Where The Veo 4 Question Fits

Veo 4 is still a reasonable thing for people to search for. Google released Veo, then Veo 2, then Veo 3, so a fourth version sounds natural. But a naming pattern is not an announcement.

There are several realistic outcomes:

Google could eventually announce a model called Veo 4.
Google could keep improving Veo 3.1 while exposing new features through Gemini and Flow.
Google could use Gemini Omni as the user-facing name for a Veo-powered creation layer.
Google could launch Omni as a separate Gemini-native video system while Veo remains available to developers.

The practical conclusion is simple: do not design a product, article strategy, or paid campaign around Veo 4 as if it were already official. Track it as a search-demand signal, but keep the factual baseline tied to Veo 3.1 and Google's released product surfaces.

What This Means For Creators And Teams

If you create video content today, the most useful question is not "What will the next model be called?" The useful question is "Which workflow will reduce failed generations and make revisions easier?"

That means watching for features such as:

reliable image-to-video control;
stable subjects, products, and logos across revisions;
direct editing of existing clips;
native audio and dialogue control;
first-frame and last-frame guidance;
faster preview models paired with higher-quality final models;
clear commercial-use, watermark, and quota rules.

These capabilities affect daily work more than the headline model name. A model that edits predictably can be more valuable than one that only produces a more impressive first attempt.
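
A rough way to see why is to compare the expected number of generations per accepted result. The sketch below assumes each attempt is accepted independently with a fixed probability p, so the expected number of attempts is 1/p. The function name and the acceptance rates are hypothetical illustrations, not measured figures for any Google model.

```python
def expected_attempts(success_rate: float) -> float:
    """Expected attempts until the first accepted result (geometric model)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


# Hypothetical numbers for illustration only.
# Workflow A: every revision is a full regeneration, accepted 25% of the time.
# Workflow B: a targeted edit of an existing clip, accepted 50% of the time.
full_restart = expected_attempts(0.25)
targeted_edit = expected_attempts(0.5)

print(full_restart, targeted_edit)  # 4.0 2.0
```

Under these assumed rates, each revision costs about half as many generations in the editing workflow, regardless of how impressive the first attempt was.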

A Practical Watchlist For Google I/O 2026

Google I/O 2026 is the next obvious place to watch for clarification. The important details will be specific:

Is Gemini Omni officially announced?
Is it a model, a Gemini feature, a Flow feature, or a brand for several capabilities?
Are there API model IDs for developers?
Does it support video-to-video editing, or only text-to-video and image-to-video?
What happens to Veo branding after the announcement?
How will pricing, credits, quotas, and watermarks work?

Until those answers are public, the cleanest roadmap is this: use Veo 3.1 where official support matters, follow Gemini Omni as the likely next conversation around Google video AI, and treat Veo 4 as an unresolved naming question.
