#5 Tokens & Parameters, the Lego bricks and the muscles of an LLM
A deep dive into tech terms that you hear more and more often, but perhaps couldn't explain to your parents. This week: TOKENS & PARAMETERS
To listen to this essay as a 2-person podcast (11 minutes): PODCAST
If you want to understand how AI really works, not the sci-fi version, but the economics and mechanics of it, you need to understand two key terms:
Tokens
Parameters
Why do people talk about Tokens and Parameters?
OpenAI charges you by the token
All the major LLM providers seem to mention parameter counts in their press releases as if they were the protein content in a power bar
But unless you're deep in the weeds of foundational model architecture, it can all feel like magic numbers
It’s not magic, it’s Math
And understanding it helps you understand how models think, how vendors charge, and how to optimize your AI usage
Here’s the simplest way to think about it:
Tokens are the Lego bricks of language
Parameters are the muscles of intelligence
Let’s break it down
Tokens: The Lego Bricks of Language
In AI-speak, a token is a slice of text - but it’s not always a word
Sometimes it’s a full word: cat
Sometimes it’s one of several parts of a word: elec, tri, city
Sometimes it’s just punctuation: . or !
When you write a prompt like:
“Draft a funny email subject line for a productivity app.”
That sentence gets broken down into ~10–12 tokens depending on the tokenizer used
Why?
LLMs don’t read like humans, they read like machines
They need to deconstruct language into bite-sized chunks that can be mapped mathematically
This process is called tokenization, and it’s the first step in any model’s workflow.
How does tokenization actually work?
Most LLMs today use subword tokenization. You can think of it like Lego bricks of language. The most common method is Byte Pair Encoding (BPE)
BPE starts with characters and keeps merging frequently used character pairs until it builds a smart vocabulary
It’s like giving the model a toolbox:
Small bricks for rare or complex words
Big bricks for common phrases
Efficient and flexible
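Want to see the bricks? Here's a minimal sketch using OpenAI's open-source tiktoken library (assuming you have Python and `pip install tiktoken` handy). Other vendors ship their own tokenizers, so the exact split and count will differ:

```python
# Minimal tokenization sketch using OpenAI's open-source tiktoken library.
# Other models use different tokenizers, so the split and count below are
# illustrative, not universal.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

prompt = "Draft a funny email subject line for a productivity app."
token_ids = enc.encode(prompt)

print(len(token_ids), "tokens")
print([enc.decode([t]) for t in token_ids])  # the Lego bricks the model actually sees
```

Run it and you'll see the sentence fall apart into little pieces, most of them whole words in this case, with the final period as its own token.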
But Tokens cost money
Every interaction with an LLM is priced based on token count
Both for what you send in, and what it sends back.
That’s why prompt engineering matters.
Writing with tokens is like sending a message via carrier pigeon that charges by the gram.
More tokens = higher cost
Just for context, 1,000 tokens ≈ 750 words ≈ roughly a page and a half of single-spaced text.
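Here's a rough back-of-the-envelope sketch of how per-token billing adds up for a single request. The prices are placeholder assumptions for illustration, not any vendor's actual rates:

```python
# Back-of-the-envelope cost of one LLM request, billed per token.
# Prices below are hypothetical placeholders in USD per 1M tokens;
# always check your provider's current price sheet.
PRICE_INPUT_PER_M = 2.50
PRICE_OUTPUT_PER_M = 10.00

input_tokens = 1_500    # what you send in (prompt + context)
output_tokens = 500     # what the model sends back

cost = (input_tokens / 1e6) * PRICE_INPUT_PER_M + (output_tokens / 1e6) * PRICE_OUTPUT_PER_M
print(f"${cost:.4f} for this request")  # well under a cent with these assumptions
```

Fractions of a cent per request sounds cheap, until you multiply it by millions of requests.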
Newer models can handle a lot of tokens in a single context window
Claude 4 models = a 200,000-token context window
GPT-4o = a 128,000-token context window
In order-of-magnitude terms, that’s the length of a typical novel in a single context window
Parameters: The Muscles of an LLM
If tokens are the Lego bricks, parameters are the weights and wiring of an LLM
A parameter is a number, or more precisely a learned coefficient inside a neural network
To simplify, parameters are what turn static inputs into generative outputs
Think of the model as a giant brain made of billions of adjustable dials
During training, these dials get tuned to recognize patterns:
Grammar
Logic
Humor
Code structure
Sarcasm (sort of)
So when you hear “GPT-3 had 175 billion parameters,” that’s 175 billion knobs it has tuned to understand and generate language
And when DeepSeek R1 says “I’m a 671B MoE (Mixture-of-Experts) model but I only activate 37B parameters per token”
That’s like saying “I’m a stadium full of experts, but I only send the best few into the game each time.”
Smart and efficient
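To make "billions of adjustable dials" a bit more concrete, here's a toy sketch that counts the learned parameters in a tiny fully connected network. The layer widths are made up, and real LLMs use far more elaborate architectures, but the counting logic is the same:

```python
# Toy parameter count for a tiny fully connected network.
# Layer widths are made up for illustration; a real LLM stacks many
# such (and more elaborate) layers until the count reaches billions.
layer_widths = [512, 2048, 512]

total_params = 0
for n_in, n_out in zip(layer_widths, layer_widths[1:]):
    weights = n_in * n_out   # one learned coefficient per connection
    biases = n_out           # one learned offset per output unit
    total_params += weights + biases

print(f"{total_params:,} parameters")  # 2,099,712 for this toy example

# The MoE idea in one line: lots of parameters stored, few used per token.
total_moe, active_moe = 671e9, 37e9    # figures quoted for DeepSeek R1 above
print(f"Active per token: {active_moe / total_moe:.0%}")  # ~6% of the stadium
```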
Why does this matter for YOU?
You probably don’t need to debug tensor flows on a daily basis, but you probably do need to make strategic decisions during your work day
Even if token costs continue to drop at dramatic speed, Jevons Paradox would argue that the cheaper something becomes, the more it also gets used (as long as it keeps delivering increasing value)
Some examples of things to consider today (may vary in the future as LLMs develop):
Prompt length = cost control: A team writing 500 prompts/day can slash costs 40% by optimizing tokens (see the sketch after this list).
Model choice = speed vs intelligence: Don’t throw the latest (trillion-parameter?) model at every customer service request. Sometimes a smaller model is plenty
Multilingual? It’s trickier than it looks: Some languages (like Arabic or Chinese) may require 2–5× more tokens to say the same thing. That affects cost, latency, and accuracy.
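As a back-of-the-envelope sketch of that first point, here's what a hypothetical team's token bill might look like, and how trimming prompts moves it. Every number (usage, prices, savings) is an assumption for illustration, not a benchmark:

```python
# Hypothetical "token economy" budget for a team writing 500 prompts/day.
# All usage figures and prices are assumptions for illustration only.
PROMPTS_PER_DAY = 500
AVG_INPUT_TOKENS = 1_200
AVG_OUTPUT_TOKENS = 400
PRICE_IN_PER_M, PRICE_OUT_PER_M = 2.50, 10.00   # USD per 1M tokens, placeholder rates

def daily_cost(input_tokens_per_prompt: int) -> float:
    tokens_in = PROMPTS_PER_DAY * input_tokens_per_prompt
    tokens_out = PROMPTS_PER_DAY * AVG_OUTPUT_TOKENS
    return (tokens_in / 1e6) * PRICE_IN_PER_M + (tokens_out / 1e6) * PRICE_OUT_PER_M

baseline = daily_cost(AVG_INPUT_TOKENS)
trimmed = daily_cost(int(AVG_INPUT_TOKENS * 0.6))   # prompts trimmed by 40%

print(f"Baseline: ~${baseline:.2f}/day (~${baseline * 30:.0f}/month)")
print(f"Trimmed prompts: ~${trimmed:.2f}/day")
```

The absolute numbers are small at this scale; the point is that the input side of the bill scales linearly with every token you trim.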
If you’re deploying AI across a company, you’re not just buying compute, you’re managing a token economy (hey, that’s a great name for either a podcast or a rock band)
Trade-offs & Hidden Complexities
A few things people often miss:
Token inflation in translation: Translating an English doc into German or Japanese? Expect a multiple of the token usage (see the sketch after this list)
Parameter scaling ≠ linear improvement: Jumping from 7B to 70B parameters won’t give you 10x performance. Diminishing returns are real, and it’s often more about the model architecture than the number of parameters for a particular use case
Interoperability headaches: Every model has its own tokenizer. Switching between models (say from OpenAI to Mistral) may break your token expectations
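A quick way to see both the inflation point and the interoperability point is to run the same sentence through a tokenizer. The example sentences and the tokenizer choice below are illustrative; the ratios vary a lot by model and language:

```python
# Same meaning, different token bills. Sentences and tokenizer choice
# (tiktoken's cl100k_base) are illustrative; ratios vary by model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Please summarize this report by tomorrow morning.",
    "German": "Bitte fassen Sie diesen Bericht bis morgen früh zusammen.",
    "Japanese": "明日の朝までにこのレポートを要約してください。",
}

for lang, text in samples.items():
    print(f"{lang}: {len(enc.encode(text))} tokens")
```

Swap in a different tokenizer and the counts shift again, which is exactly why moving between providers can break your cost assumptions.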
Final Thoughts: Understand what it means for YOU
You don’t need to be a machine learning engineer to master this.
But if you understand tokens and parameters, it will help you understand how Generative AI is evolving as a marketplace: why LLM providers do what they do, and why companies buy what they buy
Questions like:
Where are we spending tokens for different AI applications?
Are we overpaying for performance we don’t need?
Can we compress prompts without losing meaning?
Is a particular model over-parameterized for a given task?
Because in the age of AI, language is data, and data is money
Tokens and parameters are the Lego bricks and muscles of this new digital infrastructure.
Learn them. Understand them. Optimize them
Your bottom line will thank you
About Me
My name is Andreas, and I work at the interface between frontier technology and rapidly evolving business models, developing the frameworks, tools, and mental models to keep up and get ahead in our Technological World.
Having trained as a robotics engineer but also worked on the business / finance side for over a decade, I seek to understand those few asymmetric developments that truly shape our world
If you want to read about similar topics - or just have a chat - you can also find me on LinkTree, X, LinkedIn or www.andreasproesch.com