#4 What on Earth is a "World Model"?
A deep dive into tech terms you hear about more and more but perhaps couldn't explain to your parents. This week: WORLD MODELS
A world model is a generative AI model that allows a robot to dream up potential realities before it takes an action
Human analogy
Imagine a 3-year-old building a block tower in the living room
Whenever the tower falls over, the child throws a tantrum (as 3-year-olds do)
Now, before pulling out the bottom block, the child hesitates, head tilting, eyes squinting
In that half-second pause you can almost hear toddler-gears turning as a private cartoon is generated, predicting exactly how the tower will topple and how Mom will react. The move is canceled; the tantrum is avoided.
That half-second of foresight relies on what roboticists call a world model
From Talkative Circuits to Day-Dreaming Machines
For the past few years AI headlines have read like a never-ending open-mic night for LLMs:
ChatGPT writes your emails, Claude drafts your legal docs, Gemini summarizes research papers.
However, words alone are not enough to describe the world around us
Words lack gravity, friction, momentum; they float in a purely symbolic soup
Enter world models: neural networks that allow robots to “dream” before they act
A Crystal Ball for Humanoids
Norwegian robotics company 1X has a super cool demo of a world model in action
Their humanoid robot, NEO, spends most of its workday imagining instead of moving
Inside a bank of GPUs it rolls the universe forward half a second in a dream-like state. It “dreams ahead” just long enough to understand that it needs to open the door before it can enter a room
It’s like “giving robots a crystal ball.”
How Do You Build a Robot Dream?
At the heart of most modern world models is a family of algorithms with dreamy names: Dreamer, Dreamer V3, DayDreamer.
They work like a Netflix series your laptop binge-watches overnight:
Compress sensory inputs (images, joint angles, touch)
Use a dynamics network to generate multiple possible futures
Hand each imagined future to a reward network that rates how much the robot would “enjoy” it
Wake up and act out the future with the highest score.
If that sounds abstract, imagine a chess grandmaster visualizing countless board positions before lifting a single pawn
Robots are now learning the same tricks
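For the technically curious, here is a deliberately tiny Python sketch of that imagine-score-act loop. The "networks" are random matrices standing in for learned models, and every name, dimension, and number is illustrative; real Dreamer-style agents learn these components from experience rather than using the toy stand-ins below.

```python
# Toy sketch of the "imagine, score, act" loop behind Dreamer-style world models.
# The stand-in "networks" below are random matrices; names and dimensions are
# illustrative, not taken from any real Dreamer codebase.
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, ACTION_DIM, OBS_DIM = 8, 2, 16
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM))                        # encoder stand-in
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM)) * 0.1  # dynamics stand-in
w_rew = rng.normal(size=LATENT_DIM)                                   # reward stand-in

def encode(observation):
    """Step 1: compress raw sensory input into a compact latent state."""
    return np.tanh(W_enc @ observation)

def imagine_step(latent, action):
    """Step 2: the dynamics network predicts the next latent state ("dreaming")."""
    return np.tanh(W_dyn @ np.concatenate([latent, action]))

def reward(latent):
    """Step 3: the reward network rates how much the robot would 'enjoy' this state."""
    return float(w_rew @ latent)

def plan(observation, horizon=5, n_candidates=64):
    """Step 4: dream many futures, then act out the first move of the best one."""
    latent0 = encode(observation)
    best_score, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, ACTION_DIM))  # one candidate plan
        latent, score = latent0, 0.0
        for a in actions:                      # roll the world forward in imagination
            latent = imagine_step(latent, a)
            score += reward(latent)
        if score > best_score:
            best_score, best_action = score, actions[0]
    return best_action

observation = rng.normal(size=OBS_DIM)         # e.g. camera pixels + joint angles, flattened
print("chosen action:", plan(observation))
```

The point of the sketch is the shape of the loop, not the numbers: everything physical happens only once, at the very end, after thousands of imagined futures have already been scored and discarded.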
NVIDIA’s Universe-in-a-Box
Training these hallucinations takes horsepower, and no one sells GPUs better than NVIDIA.
Earlier this year the company unwrapped Cosmos World Foundation Models, a suite of neural networks that synthesize videos from text or sensor prompts
This means you can now basically get “world-model-as-a-service”: upload a few camera feeds, click Generate, and receive a sandbox where your robot can practice for 10,000 hours without moving a joint
It’s the flight-sim metaphor for robots
Pilots rehearse thunderstorms on Windows PCs
Tomorrow’s robots rehearse walking on slippery floors in GPU clouds
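To make that workflow concrete, here is a purely hypothetical sketch of what such a service could look like from a developer's chair. The WorldModelClient class, its methods, and all parameters are invented for illustration and do not reflect NVIDIA's actual Cosmos interfaces.

```python
# Hypothetical "world-model-as-a-service" workflow.
# WorldModelClient and everything it does are invented stand-ins, not a real API.

class WorldModelClient:
    """Pretend cloud client that turns sensor logs into a practice sandbox."""

    def upload_sensor_logs(self, paths):
        print(f"uploading {len(paths)} sensor logs ...")

    def generate_sandbox(self, prompt, hours_of_practice):
        print(f"synthesizing '{prompt}' scenarios for {hours_of_practice:,}h of practice")
        return "sandbox-001"  # handle to the simulated world

    def train_policy(self, sandbox_id, task):
        print(f"robot rehearses '{task}' inside {sandbox_id} without moving a joint")
        return "policy-v1"


client = WorldModelClient()
client.upload_sensor_logs(["kitchen_cam.mp4", "wrist_cam.mp4"])
sandbox = client.generate_sandbox(prompt="slippery warehouse floor", hours_of_practice=10_000)
policy = client.train_policy(sandbox, task="walk without falling")
print("deployable policy:", policy)
```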
Why Should Anyone Outside a Lab Care?
Because when robots can “dream,” they need far fewer real-world mistakes to learn. The economic ripple effects are enormous:
Safety first. Robots that have rehearsed their mistakes in simulation are far safer to operate around people
Lower costs. Simulation time is much cheaper than running physical tests
Faster iteration. A new task, whether folding laundry, stocking shelves, or inspecting turbines, becomes a software update, not a redesign
Talkers Meet Walkers
LLMs aren’t going away; they’re getting new jazz partners
Imagine Alexa-GPT telling a household robot, “Please make coffee.”
The robot’s world model plans a safe grip, previews trajectories, and chooses a path that neither spills espresso nor steps on the dog
Language supplies the what; world modelling supplies the how.
That fusion, symbolic intent plus grounded physics, may be the closest thing yet to a full-stack artificial mind
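Here is an illustrative sketch of that division of labour, with the language side turning a request into subgoals and the world-model side previewing a safe way to carry each one out. Both components are stubbed out; parse_intent, safe_plan, and the imagined trajectories are hypothetical stand-ins, not a real LLM or robot API.

```python
# Illustrative split: language supplies the "what", the world model supplies the "how".
# Both functions are hypothetical stubs for the purpose of this sketch.

def parse_intent(utterance: str) -> list[str]:
    """Stand-in for an LLM: turns a spoken request into high-level subgoals."""
    if "coffee" in utterance.lower():
        return ["grasp mug", "move to machine", "pour espresso", "deliver mug"]
    return ["do nothing"]

def safe_plan(subgoal: str) -> str:
    """Stand-in for the world model: previews trajectories and keeps only a safe one."""
    imagined = {
        "grasp mug": "firm two-finger grip, gentle force",
        "move to machine": "arc that clears the dog sleeping by the counter",
        "pour espresso": "slow tilt so nothing spills",
        "deliver mug": "handover at chest height, wait for the human to grip",
    }
    return imagined.get(subgoal, "stop and ask for help")

for goal in parse_intent("Please make coffee."):
    print(f"{goal:>16} -> {safe_plan(goal)}")
```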
The Road Still Has Potholes
No simulation is perfect
The friction of a surface may be different from what the GPU had predicted
The exact flow of running water is hard to simulate in your dreams
Bridging the so-called sim-to-real gap remains the grand challenge of Physical AI
But history suggests accuracy improves as the cost of development falls
A Closing Dream
The next time you set a mug on your desk, take a quick pause
Somewhere, a robot in a server farm might be dreaming that exact maneuver thousands of times per minute, calibrating grip force, the motion of the coffee, and the potential for regret
When that robot finally rolls out of the factory and hands you a fresh cup of coffee without spilling a drop, remember: it practiced in its sleep
About Me
My name is Andreas, and I work at the interface between frontier technology and rapidly evolving business models, developing the frameworks, tools, and mental models needed to keep up and get ahead in our technological world.
Having trained as a robotics engineer and worked on the business and finance side for over a decade, I seek to understand those few asymmetric developments that truly shape our world
If you want to read about similar topics - or just have a chat - you can also find me on X, LinkedIn or www.andreasproesch.com