How to benchmark AI game builders with the same prompt

To compare AI game builders fairly, every test needs to start from the same conditions. If one system receives an easy prompt and another receives a complicated spec, the results reflect the difference in inputs, not the difference between the builders.

Wonder News benchmarks give every builder the same prompt, leave settings at their defaults, run a short playtest, and apply the same review criteria.

Example prompt

Create a checkpoint racing game that ends in three minutes.
The player should dodge obstacles, pass checkpoints,
and see a score and completion message at the finish.
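
One way to keep these conditions fixed is to record them once and reuse the same record for every builder. The sketch below is a minimal illustration in Python, not Wonder News tooling: BenchmarkRun, plan_runs, same_conditions, the builder names, and the five-minute playtest length are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# The example prompt, stored once so every builder receives identical text.
PROMPT = (
    "Create a checkpoint racing game that ends in three minutes. "
    "The player should dodge obstacles, pass checkpoints, "
    "and see a score and completion message at the finish."
)


@dataclass(frozen=True)
class BenchmarkRun:
    """One builder tested under the shared conditions (hypothetical structure)."""

    builder: str
    prompt: str
    use_default_settings: bool = True
    playtest_minutes: int = 5  # assumed value; the article only says "short"


def plan_runs(builders: list[str], prompt: str = PROMPT) -> list[BenchmarkRun]:
    """Create one run per builder, all sharing the same prompt and defaults."""
    return [BenchmarkRun(builder=name, prompt=prompt) for name in builders]


def same_conditions(runs: list[BenchmarkRun]) -> bool:
    """Return True when every run uses identical prompt, settings, and playtest length."""
    conditions = {
        (run.prompt, run.use_default_settings, run.playtest_minutes) for run in runs
    }
    return len(conditions) <= 1


# Placeholder builder names, not real products.
runs = plan_runs(["builder_a", "builder_b"])
print(same_conditions(runs))
```

Freezing the dataclass means a run cannot be quietly adjusted for one builder after the plan is made, which is the point of holding conditions constant.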

What we compare

We do not compare screenshots alone. We look at the player's first action, how the controls respond, how clear the goal is, what feedback follows failure and success, and whether there is a reason to play again.
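
These criteria can be written down as a fixed checklist so that every builder is judged on the same items. The following is a minimal sketch that assumes a simple pass/fail note per criterion; the CRITERIA keys, the review function, and the pass/fail scale are illustrative assumptions, not the scoring method Wonder News actually uses.

```python
from __future__ import annotations

# Hypothetical keys for the review criteria named in the text.
CRITERIA = (
    "first_action",
    "control_response",
    "goal_clarity",
    "failure_feedback",
    "success_feedback",
    "reason_to_replay",
)


def review(observations: dict[str, bool]) -> dict:
    """Summarize a playtest, refusing to score if any criterion was skipped."""
    missing = [name for name in CRITERIA if name not in observations]
    if missing:
        raise ValueError(f"unreviewed criteria: {missing}")
    failed = [name for name in CRITERIA if not observations[name]]
    return {
        "passed": len(CRITERIA) - len(failed),
        "total": len(CRITERIA),
        "failed": failed,
    }


# Example: a game that works but gives no reason to play again.
notes = {name: True for name in CRITERIA}
notes["reason_to_replay"] = False
print(review(notes))
```

Raising an error on a skipped criterion keeps a reviewer from scoring one builder on fewer items than another.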

The real difference between AI game builders shows up less in a single generated result and more in whether that result keeps working as a game.

This article was written with assistance from Wonder Bricks AI Agent and edited by SunnyLabs.