#4 What on Earth is a "World Model"?
A deep dive into tech terms you hear about more and more but perhaps couldn't explain to your parents. This week: WORLD MODELS
A world model is a generative AI model that allows a robot to dream up potential realities before it takes an action
Human analogy
Imagine a 3-year-old building a block tower in the living room
Whenever the tower falls over, the child throws a tantrum (as 3-year-olds do)
Now, before pulling out the bottom block, the child hesitates, head tilting, eyes squinting
In that half-second pause you can almost hear toddler-gears turning as a private cartoon is generated, predicting exactly how the tower will topple and how Mom will react. The move is canceled; the tantrum is avoided.
That half-second of foresight relies on what roboticists call a world model
From Talkative Circuits to Day-Dreaming Machines
For the past few years AI headlines have read like a never-ending open-mic night for LLMs:
ChatGPT writes your emails, Claude drafts your legal docs, Gemini summarizes research papers.
However, words alone are not enough to describe the world around us
Words lack gravity, friction, momentum; they float in a purely symbolic soup
Enter world models: neural networks that allow robots to “dream” before they act
A Crystal Ball for Humanoids
Norwegian robotics company 1X has a super cool demo of a world model in action
Their humanoid robot, NEO, spends most of its workday imagining instead of moving
Inside a bank of GPUs it rolls the universe forward half a second in a dream-like state. It “dreams ahead” just long enough to understand that it needs to open the door before it can enter a room
It’s like “giving robots a crystal ball.”
How Do You Build a Robot Dream?
At the heart of most modern world models is a family of algorithms with dreamy names: Dreamer, Dreamer V3, DayDreamer.
They work like a Netflix series your laptop binge-watches overnight:
Compress sensory inputs (images, joint angles, touch)
Use a dynamics network to generate multiple possible futures
Hand each imagined future to a reward network that rates how much the robot would “enjoy” it
Wake up and act out the future with the highest score.
If that sounds abstract, imagine a chess grandmaster visualizing countless board positions before lifting a single pawn
Robots are now learning the same tricks
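For the technically curious, here is a deliberately tiny Python sketch of that imagine-score-act loop. The "networks" are random matrices standing in for learned models, and every name, dimension, and number is illustrative; real Dreamer-style agents learn these components from experience rather than using the toy stand-ins below.

```python
# Toy sketch of the "imagine, score, act" loop behind Dreamer-style world models.
# The stand-in "networks" below are random matrices; names and dimensions are
# illustrative, not taken from any real Dreamer codebase.
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, ACTION_DIM, OBS_DIM = 8, 2, 16
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM))                        # encoder stand-in
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM)) * 0.1  # dynamics stand-in
w_rew = rng.normal(size=LATENT_DIM)                                   # reward stand-in

def encode(observation):
    """Step 1: compress raw sensory input into a compact latent state."""
    return np.tanh(W_enc @ observation)

def imagine_step(latent, action):
    """Step 2: the dynamics network predicts the next latent state ("dreaming")."""
    return np.tanh(W_dyn @ np.concatenate([latent, action]))

def reward(latent):
    """Step 3: the reward network rates how much the robot would 'enjoy' this state."""
    return float(w_rew @ latent)

def plan(observation, horizon=5, n_candidates=64):
    """Step 4: dream many futures, then act out the first move of the best one."""
    latent0 = encode(observation)
    best_score, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, ACTION_DIM))  # one candidate plan
        latent, score = latent0, 0.0
        for a in actions:                      # roll the world forward in imagination
            latent = imagine_step(latent, a)
            score += reward(latent)
        if score > best_score:
            best_score, best_action = score, actions[0]
    return best_action

observation = rng.normal(size=OBS_DIM)         # e.g. camera pixels + joint angles, flattened
print("chosen action:", plan(observation))
```

The point of the sketch is the shape of the loop, not the numbers: everything physical happens only once, at the very end, after thousands of imagined futures have already been scored and discarded.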
NVIDIA’s Universe-in-a-Box
Training these hallucinations takes horsepower, and no one sells GPUs better than NVIDIA.
Earlier this year the company unwrapped Cosmos World Foundation Models, a suite of neural networks that synthesize videos from text or sensor prompts
This means you can now basically get “world-model-as-a-service”: upload a few camera feeds, click Generate, and receive a sandbox where your robot can practice for 10,000 hours without moving a joint
It’s the flight-sim metaphor for robots
Pilots rehearse thunderstorms on Windows PCs
Tomorrow’s robots rehearse walking on slippery floors in GPU clouds
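To make that workflow concrete, here is a purely hypothetical sketch of what such a service could look like from a developer's chair. The WorldModelClient class, its methods, and all parameters are invented for illustration and do not reflect NVIDIA's actual Cosmos interfaces.

```python
# Hypothetical "world-model-as-a-service" workflow.
# WorldModelClient and everything it does are invented stand-ins, not a real API.

class WorldModelClient:
    """Pretend cloud client that turns sensor logs into a practice sandbox."""

    def upload_sensor_logs(self, paths):
        print(f"uploading {len(paths)} sensor logs ...")

    def generate_sandbox(self, prompt, hours_of_practice):
        print(f"synthesizing '{prompt}' scenarios for {hours_of_practice:,}h of practice")
        return "sandbox-001"  # handle to the simulated world

    def train_policy(self, sandbox_id, task):
        print(f"robot rehearses '{task}' inside {sandbox_id} without moving a joint")
        return "policy-v1"


client = WorldModelClient()
client.upload_sensor_logs(["kitchen_cam.mp4", "wrist_cam.mp4"])
sandbox = client.generate_sandbox(prompt="slippery warehouse floor", hours_of_practice=10_000)
policy = client.train_policy(sandbox, task="walk without falling")
print("deployable policy:", policy)
```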
Why Should Anyone Outside a Lab Care?
Because when robots can “dream,” they need far fewer real-world mistakes to learn. The economic ripple effects are enormous:
Safety first. Robots that have rehearsed their mistakes in simulation are far safer to operate around people
Lower costs. Simulation time is much cheaper than running physical tests
Faster iteration. A new task, whether folding laundry, stocking shelves, or inspecting turbines, becomes a software update, not a redesign
Talkers Meet Walkers
LLMs aren’t going away; they’re getting new jazz partners
Imagine Alexa-GPT telling a household robot, “Please make coffee.”
The robot’s world model plans a safe grip, previews trajectories, and chooses a path that neither spills espresso nor steps on the dog
Language supplies the what; world modelling supplies the how.
That fusion, symbolic intent plus grounded physics, may be the closest thing yet to a full-stack artificial mind
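Here is an illustrative sketch of that division of labour, with the language side turning a request into subgoals and the world-model side previewing a safe way to carry each one out. Both components are stubbed out; parse_intent, safe_plan, and the imagined trajectories are hypothetical stand-ins, not a real LLM or robot API.

```python
# Illustrative split: language supplies the "what", the world model supplies the "how".
# Both functions are hypothetical stubs for the purpose of this sketch.

def parse_intent(utterance: str) -> list[str]:
    """Stand-in for an LLM: turns a spoken request into high-level subgoals."""
    if "coffee" in utterance.lower():
        return ["grasp mug", "move to machine", "pour espresso", "deliver mug"]
    return ["do nothing"]

def safe_plan(subgoal: str) -> str:
    """Stand-in for the world model: previews trajectories and keeps only a safe one."""
    imagined = {
        "grasp mug": "firm two-finger grip, gentle force",
        "move to machine": "arc that clears the dog sleeping by the counter",
        "pour espresso": "slow tilt so nothing spills",
        "deliver mug": "handover at chest height, wait for the human to grip",
    }
    return imagined.get(subgoal, "stop and ask for help")

for goal in parse_intent("Please make coffee."):
    print(f"{goal:>16} -> {safe_plan(goal)}")
```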
The Road Still Has Potholes
No simulation is perfect
The friction of a surface may be different from what the GPU had predicted
The exact flow of running water is hard to simulate in your dreams
Bridging the so-called sim-to-real gap remains the grand challenge of Physical AI
But history suggests accuracy improves as the cost of development falls
A Closing Dream
The next time you set a mug on your desk, take a quick pause
Somewhere, a robot in a server farm might be dreaming that exact maneuver thousands of times per minute, calibrating grip force, the motion of the coffee, and the potential for regret
When that robot finally rolls out of the factory and hands you a fresh cup of coffee without spilling a drop, remember: it practiced in its sleep
About Me
My name is Andreas, and I work at the interface between frontier technology and rapidly evolving business models, developing the frameworks, tools, and mental models needed to keep up and get ahead in our technological world.
Having trained as a robotics engineer and worked on the business and finance side for over a decade, I seek to understand those few asymmetric developments that truly shape our world
If you want to read about similar topics - or just have a chat - you can also find me on X, LinkedIn or www.andreasproesch.com