OpenGame-Bench asks a cleaner question than most AI game demos: can the generated project actually build, load, render, and follow the requested game design?
That sounds obvious until you look at how AI game systems are usually shown. A prompt becomes a clip. A clip becomes a claim. The hard parts of games (state, controls, collisions, timing, goals, feedback, persistence) disappear behind the fact that something moved on screen.
OpenGame, from researchers at CUHK MMLab, is an open-source agentic framework for making browser games from natural-language prompts. Its paper introduces GameCoder-27B, a code model trained for game development, plus a reusable Game Skill system that combines template scaffolding with a debugging protocol.
OpenGame-Bench evaluates 150 game-generation tasks across five genres: platformers, top-down shooters, puzzle games, arcade classics, and strategy games. Each prompt is treated as the full design spec. There is no starter project or reference implementation.
The generated project is served in a headless browser, checked for build and runtime failures, and scored on three metrics, each from 0 to 100.
Build Health measures whether the game compiles, loads, and renders without breaking. Visual Usability checks whether the output is coherent, animated, and visibly interactable. Intent Alignment compares the generated game against structured requirements derived from the original prompt.
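One way to picture the three metrics is as a small record per generated game. The sketch below is illustrative only: the class name `GameScore`, its field names, and the range check are assumptions made for this article, not the benchmark's actual schema or scoring code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameScore:
    """Hypothetical per-game result; names are assumptions, not OpenGame-Bench's schema."""

    build_health: float      # compiles, loads, and renders without breaking
    visual_usability: float  # coherent, animated, visibly interactable
    intent_alignment: float  # matches requirements derived from the prompt

    def __post_init__(self) -> None:
        # Each metric is described as a 0-100 score, so reject values outside that range.
        for name, value in vars(self).items():
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be in [0, 100], got {value}")


# Made-up numbers for illustration, not benchmark results.
example = GameScore(build_health=90, visual_usability=70, intent_alignment=40)
```

Keeping the three values as separate fields, rather than folding them into one number, is what lets a reader see which part of the game failed.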
That split matters. A model can make a game-shaped page that looks lively but ignores the requested mechanics. Another can compile cleanly while producing a dull or unreadable scene. Screenshot evaluation collapses those failures into “looks good.” OpenGame-Bench at least gives them separate names.
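A toy example makes the point concrete. The numbers below are invented for illustration and are not benchmark results; the metric keys and the helper `weakest_metric` are hypothetical names. Two games with the same average score fail in different places, and only the per-metric view shows where.

```python
from statistics import mean


def weakest_metric(scores: dict[str, float]) -> str:
    """Return the name of the lowest-scoring metric."""
    return min(scores, key=lambda name: scores[name])


# Invented scores: a lively page that ignores the requested mechanics...
lively_but_off_spec = {
    "build_health": 95,
    "visual_usability": 90,
    "intent_alignment": 31,
}
# ...and a clean build that renders a dull or unreadable scene.
clean_but_unreadable = {
    "build_health": 96,
    "visual_usability": 30,
    "intent_alignment": 90,
}

for label, scores in [
    ("lively but off-spec", lively_but_off_spec),
    ("clean but unreadable", clean_but_unreadable),
]:
    # Both averages come out the same; the weakest metric differs.
    print(label, mean(scores.values()), weakest_metric(scores))
```

A single blended score would rank these two games as equal, which is the same collapse the article attributes to screenshot evaluation.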
The limits are real. OpenGame-Bench does not prove that a generated game is fun after 20 minutes, safe for children, original enough to distribute, easy to edit, or ready for multiplayer state sync.
But this is the right direction. Until generated games are tested on whether they build, play, and match the prompt's intent, a screenshot can only show that a model drew a game. It cannot show that the model built one.
This article was written with assistance from Wonder Bricks AI Agent and edited by SunnyLabs.