Genie 3 brings real-time control to world models

Google DeepMind’s Genie 3 is one of the clearest signs that “AI-generated game” will not always mean a folder of images, models, and scripts. The model generates interactive worlds from prompts, lets users navigate them in real time, and can keep a scene visually consistent for a few minutes at 720p and 24 frames per second.

It sounds like video generation until the player moves. Rather than playing back a fixed clip, Genie 3 generates each new frame in response to the user's latest action, taking into account the world it has already produced.
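The difference between a clip and an interactive rollout can be shown with a toy sketch. Everything here is an assumption made for illustration: the "frame" is reduced to an (x, y) position, and `play_clip`, `step_world`, and `rollout` are hypothetical names, not part of any Genie API.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a frame: the viewer's (x, y) position. (Illustrative assumption.)
State = Tuple[int, int]

MOVES: Dict[str, Tuple[int, int]] = {
    "left": (-1, 0),
    "right": (1, 0),
    "up": (0, -1),
    "down": (0, 1),
    "stay": (0, 0),
}


def play_clip(clip: List[State], t: int) -> State:
    """Passive video: the frame at time t is fixed in advance and input is ignored."""
    return clip[min(t, len(clip) - 1)]


def step_world(history: List[State], action: str) -> State:
    """Interactive step: the next state depends on what came before and on the action."""
    x, y = history[-1]
    dx, dy = MOVES[action]
    return (x + dx, y + dy)


def rollout(start: State, actions: List[str]) -> List[State]:
    """Generate states one at a time, feeding each result back in as history."""
    history = [start]
    for action in actions:
        history.append(step_world(history, action))
    return history


trajectory = rollout((0, 0), ["right", "right", "down"])
```

In the clip case the sequence exists before anyone presses a key. In the rollout case, no frame exists until the previous frame and the next action are both known, which is why real-time generation is the hard part.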

DeepMind says Genie 3 can remember generated details as far back as about a minute, supports promptable world events such as changing weather or adding objects, and can host environments in which DeepMind's SIMA agent pursues navigation goals.

The lineage matters. The original Genie paper described an 11B-parameter foundation world model trained on unlabeled internet videos, using a tokenizer, a dynamics model, and a latent action model to produce controllable environments without hand-labeled actions. Genie 2 moved the idea into richer 3D spaces, but many examples were short. Genie 3 extends that horizon.
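A toy analogue may help show how the three parts named in the Genie paper fit together. This is not the paper's architecture, which uses learned neural components. Frames are shrunk to single numbers, and `tokenize`, `infer_latent_action`, and `fit_dynamics` are hypothetical stand-ins chosen for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(frame: float) -> int:
    """Tokenizer stand-in: map a raw 'frame' (here a float) to a discrete token."""
    return round(frame)


def infer_latent_action(prev_token: int, next_token: int) -> int:
    """Latent action stand-in: label a transition using only the two frames.

    No human-provided action labels are involved. The 'action' is whatever
    explains the change between consecutive tokens, clipped to {-1, 0, +1}.
    """
    return max(-1, min(1, next_token - prev_token))


def fit_dynamics(videos: List[List[float]]) -> Dict[Tuple[int, int], int]:
    """Dynamics stand-in: most frequent next token for each (token, latent action)."""
    counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for video in videos:
        tokens = [tokenize(frame) for frame in video]
        for prev, nxt in zip(tokens, tokens[1:]):
            action = infer_latent_action(prev, nxt)
            counts[(prev, action)][nxt] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Unlabeled "videos": sequences of frames with no recorded controller input.
videos = [[0.1, 1.0, 2.1, 2.9], [3.0, 2.2, 0.9, 0.0], [1.0, 1.1, 2.0]]
dynamics = fit_dynamics(videos)

# At play time a user supplies the action and the dynamics table predicts the next token.
next_token = dynamics[(1, 1)]
```

The point of the sketch is the data flow: actions are inferred from the footage itself during training, then supplied by the player at run time.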

Project Genie shows the product direction more plainly: users create a world and character from text or images, refine the setup, then step into a navigable environment that builds itself around them.

But Genie 3 is not a shipped game engine. DeepMind lists several limitations: a limited direct action space, difficulty simulating multiple independent agents, imperfect real-world geography, weak text rendering, and only a few minutes of continuous interaction.

Generated games need more than navigable scenery. They need rules that survive player intent, goals that can be completed, state that persists across sessions, and editing tools that let creators fix what the model misunderstood.

The next milestone is reliable structure: objects, rules, triggers, memory, and controllable behaviors that a creator can inspect and edit.
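As a rough illustration of what inspectable structure could look like, here is a minimal sketch of world state plus named triggers that a creator can read, disable, or rewrite. The schema is entirely hypothetical: `Trigger`, `World`, and the door example are assumptions for this sketch and do not describe how Genie 3 or Project Genie represent worlds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

WorldState = Dict[str, object]


@dataclass
class Trigger:
    """A named rule: when `condition` holds, apply `effect` to the state."""
    name: str
    condition: Callable[[WorldState], bool]
    effect: Callable[[WorldState], None]
    enabled: bool = True  # a creator can switch a rule off without deleting it


@dataclass
class World:
    state: WorldState = field(default_factory=dict)
    triggers: List[Trigger] = field(default_factory=list)

    def tick(self) -> List[str]:
        """Evaluate every enabled trigger once and return the names that fired."""
        fired = []
        for trigger in self.triggers:
            if trigger.enabled and trigger.condition(self.state):
                trigger.effect(self.state)
                fired.append(trigger.name)
        return fired


def open_door(state: WorldState) -> None:
    state["door_open"] = True


world = World(state={"has_key": False, "door_open": False})
world.triggers.append(
    Trigger(
        name="open_door_with_key",
        condition=lambda s: bool(s["has_key"]) and not s["door_open"],
        effect=open_door,
    )
)
```

Because the rule has a name and lives outside the pixels, a creator can check why the door did or did not open, which is the kind of editing a purely generated scene does not yet offer.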

This article was written with assistance from Wonder Bricks AI Agent and edited by SunnyLabs.