#5 Tokens & Parameters, the Lego bricks and the muscles of an LLM
A deep dive into tech terms that you hear more and more often, but perhaps couldn't explain to your parents. This week: TOKENS & PARAMETERS
To listen to this essay as a 2-person podcast (11 minutes): PODCAST
If you want to understand how AI really works, not the sci-fi version, but the economics and mechanics of it, you need to understand two key terms:
Tokens
Parameters
Why do people talk about Tokens and Parameters?
OpenAI charges you by the token
All the major LLM providers seem to mention parameter counts in their press releases as if they were the protein content in a power bar
But unless you're deep in the weeds of foundational model architecture, it can all feel like magic numbers
It’s not magic, it’s Math
And understanding it helps you understand how models think, how vendors charge, and how to optimize your AI usage
Here’s the simplest way to think about it:
Tokens are the Lego bricks of language
Parameters are the muscles of intelligence
Let’s break it down
Tokens: The Lego Bricks of Language
In AI-speak, a token is a slice of text - but it’s not always a word
Sometimes it’s a full word: cat
Sometimes it’s one of several parts of a word: elec, tri, city
Sometimes it’s just punctuation: . or !
When you write a prompt like:
“Draft a funny email subject line for a productivity app.”
That sentence gets broken down into ~10–12 tokens depending on the tokenizer used
Why?
LLMs don’t read like humans, they read like machines
They need to deconstruct language into bite-sized chunks that can be mapped mathematically
This process is called tokenization, and it’s the first step in any model’s workflow.
How does tokenization actually work?
Most LLMs today use subword tokenization. You can think of it like Lego bricks of language. The most common method is Byte Pair Encoding (BPE)
BPE starts with characters and keeps merging frequently used character pairs until it builds a smart vocabulary
It’s like giving the model a toolbox:
Small bricks for rare or complex words
Big bricks for common phrases
Efficient and flexible
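Want to see the bricks? Here's a minimal sketch using OpenAI's open-source tiktoken library (assuming you have Python and `pip install tiktoken` handy). Other vendors ship their own tokenizers, so the exact split and count will differ:

```python
# Minimal tokenization sketch using OpenAI's open-source tiktoken library.
# Other models use different tokenizers, so the split and count below are
# illustrative, not universal.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

prompt = "Draft a funny email subject line for a productivity app."
token_ids = enc.encode(prompt)

print(len(token_ids), "tokens")
print([enc.decode([t]) for t in token_ids])  # the Lego bricks the model actually sees
```

Run it and you'll see the sentence fall apart into little pieces, most of them whole words in this case, with the final period as its own token.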
But Tokens cost money
Every interaction with an LLM is priced based on token count
Both for what you send in, and what it sends back.
That’s why prompt engineering matters.
Writing with tokens is like sending a message via carrier pigeon that charges by the gram.
More tokens = higher cost
Just for context, 1,000 tokens ≈ 750 words ≈ roughly a page and a half of single-spaced text.
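Here's a rough back-of-the-envelope sketch of how per-token billing adds up for a single request. The prices are placeholder assumptions for illustration, not any vendor's actual rates:

```python
# Back-of-the-envelope cost of one LLM request, billed per token.
# Prices below are hypothetical placeholders in USD per 1M tokens;
# always check your provider's current price sheet.
PRICE_INPUT_PER_M = 2.50
PRICE_OUTPUT_PER_M = 10.00

input_tokens = 1_500    # what you send in (prompt + context)
output_tokens = 500     # what the model sends back

cost = (input_tokens / 1e6) * PRICE_INPUT_PER_M + (output_tokens / 1e6) * PRICE_OUTPUT_PER_M
print(f"${cost:.4f} for this request")  # well under a cent with these assumptions
```

Fractions of a cent per request sounds cheap, until you multiply it by millions of requests.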
Newer models can handle a lot of tokens in a single context window
Claude 4 models = a 200,000-token context window
GPT-4o = a 128,000-token context window
In order-of-magnitude terms, that’s the length of a typical novel in a single context window
Parameters: The Muscles of an LLM
If tokens are the Lego bricks, parameters are the weights and wiring of an LLM
A parameter is a number, or more precisely a learned coefficient inside a neural network
To simplify, parameters are what turn static inputs into generative outputs
Think of the model as a giant brain made of billions of adjustable dials
During training, these dials get tuned to recognize patterns:
Grammar
Logic
Humor
Code structure
Sarcasm (sort of)
So when you hear “GPT-3 had 175 billion parameters,” that’s 175 billion knobs it has tuned to understand and generate language
And when DeepSeek R1 says “I’m a 671B MoE (Mixture-of-Experts) model but I only activate 37B parameters per token”
That’s like saying “I’m a stadium full of experts, but I only send the best few into the game each time.”
Smart and efficient
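To make "billions of adjustable dials" a bit more concrete, here's a toy sketch that counts the learned parameters in a tiny fully connected network. The layer widths are made up, and real LLMs use far more elaborate architectures, but the counting logic is the same:

```python
# Toy parameter count for a tiny fully connected network.
# Layer widths are made up for illustration; a real LLM stacks many
# such (and more elaborate) layers until the count reaches billions.
layer_widths = [512, 2048, 512]

total_params = 0
for n_in, n_out in zip(layer_widths, layer_widths[1:]):
    weights = n_in * n_out   # one learned coefficient per connection
    biases = n_out           # one learned offset per output unit
    total_params += weights + biases

print(f"{total_params:,} parameters")  # 2,099,712 for this toy example

# The MoE idea in one line: lots of parameters stored, few used per token.
total_moe, active_moe = 671e9, 37e9    # figures quoted for DeepSeek R1 above
print(f"Active per token: {active_moe / total_moe:.0%}")  # ~6% of the stadium
```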
Why does this matter for YOU?
You probably don’t need to debug tensor flows on a daily basis, but you probably do need to make strategic decisions during your work day
Even if token costs continue to drop at dramatic speed, Jevons Paradox would argue that the cheaper something becomes, the more it also gets used (as long as it keeps delivering increasing value)
Some examples of things to consider today (may vary in the future as LLMs develop):
Prompt length = cost control: A team writing 500 prompts/day can slash costs 40% by optimizing tokens (see the sketch after this list).
Model choice = speed vs intelligence: Don’t throw the latest (trillion-parameter?) model at every customer service request. Sometimes a smaller model is plenty
Multilingual? It’s trickier than it looks: Some languages (like Arabic or Chinese) may require 2–5× more tokens to say the same thing. That affects cost, latency, and accuracy.
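As a back-of-the-envelope sketch of that first point, here's what a hypothetical team's token bill might look like, and how trimming prompts moves it. Every number (usage, prices, savings) is an assumption for illustration, not a benchmark:

```python
# Hypothetical "token economy" budget for a team writing 500 prompts/day.
# All usage figures and prices are assumptions for illustration only.
PROMPTS_PER_DAY = 500
AVG_INPUT_TOKENS = 1_200
AVG_OUTPUT_TOKENS = 400
PRICE_IN_PER_M, PRICE_OUT_PER_M = 2.50, 10.00   # USD per 1M tokens, placeholder rates

def daily_cost(input_tokens_per_prompt: int) -> float:
    tokens_in = PROMPTS_PER_DAY * input_tokens_per_prompt
    tokens_out = PROMPTS_PER_DAY * AVG_OUTPUT_TOKENS
    return (tokens_in / 1e6) * PRICE_IN_PER_M + (tokens_out / 1e6) * PRICE_OUT_PER_M

baseline = daily_cost(AVG_INPUT_TOKENS)
trimmed = daily_cost(int(AVG_INPUT_TOKENS * 0.6))   # prompts trimmed by 40%

print(f"Baseline: ~${baseline:.2f}/day (~${baseline * 30:.0f}/month)")
print(f"Trimmed prompts: ~${trimmed:.2f}/day")
```

The absolute numbers are small at this scale; the point is that the input side of the bill scales linearly with every token you trim.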
If you’re deploying AI across a company, you’re not just buying compute, you’re managing a token economy (hey, that’s a great name for either a podcast or a rock band)
Trade-offs & Hidden Complexities
A few things people often miss:
Token inflation in translation: Translating an English doc into German or Japanese? Expect a multiple of the token usage (see the sketch after this list)
Parameter scaling ≠ linear improvement: Jumping from 7B to 70B parameters won’t give you 10x performance. Diminishing returns are real, and it’s often more about the model architecture than the number of parameters for a particular use case
Interoperability headaches: Every model has its own tokenizer. Switching between models (say from OpenAI to Mistral) may break your token expectations
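A quick way to see both the inflation point and the interoperability point is to run the same sentence through a tokenizer. The example sentences and the tokenizer choice below are illustrative; the ratios vary a lot by model and language:

```python
# Same meaning, different token bills. Sentences and tokenizer choice
# (tiktoken's cl100k_base) are illustrative; ratios vary by model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Please summarize this report by tomorrow morning.",
    "German": "Bitte fassen Sie diesen Bericht bis morgen früh zusammen.",
    "Japanese": "明日の朝までにこのレポートを要約してください。",
}

for lang, text in samples.items():
    print(f"{lang}: {len(enc.encode(text))} tokens")
```

Swap in a different tokenizer and the counts shift again, which is exactly why moving between providers can break your cost assumptions.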
Final Thoughts: Understand what it means for YOU
You don’t need to be a machine learning engineer to master this.
But if you understand tokens and parameters, it will help you understand how Generative AI is evolving as a marketplace: why LLM providers do what they do, and why companies buy what they buy
Questions like:
Where are we spending tokens for different AI applications?
Are we overpaying for performance we don’t need?
Can we compress prompts without losing meaning?
Is a particular model over-parameterized for a given task?
Because in the age of AI, language is data, and data is money
Tokens and parameters are the Lego bricks and muscles of this new digital infrastructure.
Learn them. Understand them. Optimize them
Your bottom line will thank you
About Me
My name is Andreas, and I work at the interface between frontier technology and rapidly evolving business models, developing the frameworks, tools, and mental models to keep up and get ahead in our Technological World.
Having trained as a robotics engineer but also worked on the business / finance side for over a decade, I seek to understand those few asymmetric developments that truly shape our world
If you want to read about similar topics - or just have a chat - you can also find me on LinkTree, X, LinkedIn or www.andreasproesch.com