#12 Why Smart Money is Chasing HBM Right Now
A deep dive into the tech terms you hear about more and more, but perhaps couldn't explain to your parents. This week: HIGH BANDWIDTH MEMORY (HBM)
“Nvidia’s trillion-dollar empire depends on a few millimeters of stacked silicon”
Why you need to understand what “HBM” means
Every investor in AI should understand one acronym right now:
HBM: High-Bandwidth Memory
It’s the component that decides how many GPUs Nvidia can ship, how fast models can train, and which countries can control the global AI infrastructure
(Hope that intro is interesting enough to keep you reading on)
HBM is a high-performance memory component that goes into GPUs, like Blackwell from NVIDIA
But it’s not NVIDIA who builds HBM. It’s three suppliers: SK hynix, Samsung, and Micron
They are the world’s three suppliers of the HBM that goes into every new generation of GPU
And one company, TSMC, assembles nearly every GPU that HBM goes into.
Control that ecosystem, and you control the speed of AI progress itself
The Memory Wall
GPUs are like Formula 1 engines: if you don’t have good fuel lines feeding data into that engine, it’s like fuelling a high-performance race car through a paper straw
GPUs can perform trillions of calculations per second, but the data for those operations needs to arrive fast enough if you want to do really heavy and fast computations
But it’s technically not just about moving raw data, it’s about keeping that data in short-term memory so the LLM can access it quickly
If you remember from previous articles, LLMs are based on Attention and Context, which basically means that they generate the next word based on all the words they generated before that word. This requires short-term memory
And the more an LLM generates (e.g. the longer the conversation you have with ChatGPT, the more data you put into the context window, or the more complicated a task you give an AI agent), the more short-term memory you need
But accessing that memory depends on your “memory bandwidth”
The size of that bandwidth very quickly becomes the physical limit for every large model, training run, inference cluster, and data center build out
HBM’s task is to move memory closer to the actual computing action, and to widen the path between them
Instead of scattering lots of memory chips across a circuit board (think of what a classic circuit board looks like), HBM stacks them vertically like a skyscraper, linked by microscopic elevators and parked right beside the GPU
Thousands of parallel data lanes can then deliver terabytes per second of throughput, in order to keep the GPU engines roaring
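To make the memory wall concrete, here’s a rough back-of-the-envelope sketch in Python (the numbers are illustrative assumptions, not vendor specs): whichever is lower, the compute limit or the memory limit, sets the real speed of the chip:

```python
# Back-of-the-envelope "memory wall" check (illustrative numbers, not vendor specs)

def achievable_tflops(peak_tflops: float,
                      bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline-style estimate: you get the lower of what the compute units
    could do and what the memory system can actually feed them."""
    compute_limit = peak_tflops                     # TFLOP/s the chip could do
    memory_limit = bandwidth_tb_s * flops_per_byte  # TFLOP/s the memory can sustain
    return min(compute_limit, memory_limit)

# Assumed example: ~1000 TFLOP/s of peak compute, 3.35 TB/s of HBM bandwidth,
# and a memory-hungry inference workload doing ~2 operations per byte it reads
print(achievable_tflops(peak_tflops=1000, bandwidth_tb_s=3.35, flops_per_byte=2))
# -> 6.7, i.e. the GPU runs at well under 1% of its peak: it is memory-bound
```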
Packaging: The New Battlefield
Each GPU die has limited “beachfront property” where you can connect stuff to it. Some industry analysts will use the term “the shoreline problem”
There is basically only so much physical space on the edge of the GPU where you can connect other things to it
Enter TSMC’s CoWoS-L packaging (“Chip-on-Wafer-on-Substrate”), which literally extends the physical silicon edge/interface so Nvidia can anchor up to eight HBM stacks around a single GPU
That packaging capacity, not transistor density (ref. Moore’s Law, which is quickly becoming less and less relevant in terms of AI performance), now caps the output of modern GPUs
So, when TSMC’s physical CoWoS supply, concentrated in Taiwan, gets bottlenecked, GPU shipments get constrained
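To get a feel for why that edge space matters, here’s a toy calculation (the die and interface dimensions are made-up placeholders, purely to illustrate the constraint):

```python
# Toy "shoreline" check: how many HBM stacks physically fit around a GPU die
# (all dimensions are made-up placeholders, purely to illustrate the constraint)

def max_hbm_stacks(die_width_mm: float, die_height_mm: float,
                   stack_interface_mm: float) -> int:
    """Each HBM stack needs a slice of the die's edge for its interface,
    so the usable perimeter ("beachfront") caps how many stacks can connect."""
    perimeter_mm = 2 * (die_width_mm + die_height_mm)
    return int(perimeter_mm // stack_interface_mm)

# With an assumed 26 x 30 mm die and ~12 mm of edge needed per stack:
print(max_hbm_stacks(26, 30, stack_interface_mm=12))  # -> 9: the edge, not the memory, is the limit
```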
The Real Players Behind Nvidia
Here’s a quick overview of the ecosystem underpinning a major value pool of AI’s supply chain today:
SK hynix: Supplies most of the HBM3 and HBM3E memory used in Nvidia’s H100, H200, and next-gen Blackwell (B100/B200) chips
Nvidia has effectively pre-purchased years of Hynix capacity, which means they have locked up some of the best-yielding memory chips on Earth
Samsung: the “backup supplier”, racing to qualify its HBM4 as a second source by late 2025
Micron: Developing U.S.-based HBM4 lines so Nvidia and the U.S. government have a domestic option
TSMC, meanwhile, assembles everything using CoWoS-L. Without TSMC’s packaging, even a warehouse full of HBM chips can’t become a single GPU
Nvidia may design the world’s best compute engines, but its ability to ship them depends entirely on this four-company ecosystem
So, the real engine of AI doesn’t live inside the GPU, but in the supply chain around it
Why Bandwidth Equals Money
In a data center, bandwidth determines your actual profits
A GPU waiting on memory is like an idle factory, with a massive cost of capital burning through your balance sheet really, really fast
Remember, GPUs require huge amounts of CAPEX upfront, so you need to make sure you’re feeding those GPUs with data continuously to make any return on that investment
Doubling memory bandwidth can lift GPU utilization from 60% to 90%, which is the equivalent of adding half again as many GPUs without buying any new ones (nice!)
That’s why Nvidia’s latest modules deliver up to 13 TB/s of memory throughput.
Faster memory translates directly into higher margins and lower energy per inference.
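Here’s a tiny illustrative calculation of why that utilization jump is worth real money (the fleet size and utilization figures are assumptions for the example, not market data):

```python
# Why bandwidth-driven utilization gains are worth real money (assumed figures)

def effective_gpus(num_gpus: int, utilization: float) -> float:
    """A GPU that sits idle waiting for memory earns nothing, so the
    effective fleet size is the real fleet scaled by utilization."""
    return num_gpus * utilization

fleet = 1000
starved = effective_gpus(fleet, 0.60)   # memory-starved: 600 "working" GPUs
well_fed = effective_gpus(fleet, 0.90)  # better bandwidth: 900 "working" GPUs

print(well_fed / starved)  # -> 1.5: the same fleet does 50% more work,
                           # i.e. "half again as many GPUs" without buying any
```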
The KV-Cache Analogy
Let’s cover one more technical term, though I’ll probably write a separate post on it later:
KV-Cache
LLMs rely on remembering context (that’s the whole idea behind Transformers, really)
Every time an LLM generates a new word as part of an answer, it must flip through a growing library of everything it has already generated (a.k.a. the “key-value cache” / KV Cache) to recall the context of what it is trying to answer
Keeping that library in HBM is like keeping all that context readily available on your work desk, accessible in the blink of an eye
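To get a feel for how quickly that library grows, here’s a rough KV-cache size estimate in Python (the model dimensions are assumptions for illustration, not the specs of any particular LLM):

```python
# Rough KV-cache size for a transformer (assumed dimensions, for illustration only)

def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Every token in the context keeps one key and one value vector per layer,
    so the cache grows linearly with context length."""
    total_bytes = 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9

# Assumed mid-sized model: 32 layers, 8 KV heads of dimension 128, fp16 values
print(kv_cache_gb(32, 8, 128, context_tokens=8_000))    # ~1 GB per conversation
print(kv_cache_gb(32, 8, 128, context_tokens=128_000))  # ~17 GB per conversation
```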
But HBM is expensive
You can also store some of the context in cheaper memory like LPDDR or flash, but that would be like keeping all your notes in the basement: much slower but more affordable (unless your work desk is infinitely huge)
Modern AI systems move data dynamically between different types of memory storage, shuffling hot tokens into HBM and cold ones out, a kind of real-time cash-flow management for memory
This is why we’ve seen massive improvements in some LLMs’ performance, even when you shrink the model size or constrain compute
In other words, if you manage the memory of an LLM efficiently, you can make much smaller models and get much more out of your existing compute capacity (super nice!)
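A toy version of that hot/cold shuffling could look like the sketch below (purely illustrative; real inference systems do this at a much finer grain, with hardware-specific logic):

```python
from collections import OrderedDict

# Toy two-tier cache: a small fast tier ("HBM") in front of a big slow tier
# ("LPDDR/flash"). Purely illustrative of the hot/cold shuffling idea.

class TieredKVCache:
    def __init__(self, fast_capacity: int):
        self.fast = OrderedDict()  # hot entries, limited capacity (the "work desk")
        self.slow = {}             # cold entries, cheap but slow (the "basement")
        self.fast_capacity = fast_capacity

    def put(self, token_id: int, kv) -> None:
        self.fast[token_id] = kv
        self.fast.move_to_end(token_id)  # mark as most recently used
        while len(self.fast) > self.fast_capacity:
            cold_id, cold_kv = self.fast.popitem(last=False)  # evict the coldest entry
            self.slow[cold_id] = cold_kv                      # demote it to the slow tier

    def get(self, token_id: int):
        if token_id in self.fast:        # hot hit: cheap and fast
            self.fast.move_to_end(token_id)
            return self.fast[token_id]
        kv = self.slow.pop(token_id)     # cold hit: fetch it and promote it back
        self.put(token_id, kv)
        return kv

cache = TieredKVCache(fast_capacity=2)
for t in range(4):
    cache.put(t, f"kv_{t}")
print(list(cache.fast), list(cache.slow))  # [2, 3] stay hot, [0, 1] were demoted
```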
To close off for today
The AI boom’s true bottleneck isn’t just algorithms or GPUs
It’s increasingly becoming “bandwidth per watt”
Basically, if you only have a certain amount of power (watts) available for your data center, you want to make sure that the bandwidth of your GPUs is maximised
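A crude way to put numbers on “bandwidth per watt” (the figures below are assumptions for illustration, not measurements):

```python
# "Bandwidth per watt" as a budgeting metric (assumed numbers, for illustration)

def deployable_bandwidth_tb_s(power_budget_kw: float,
                              watts_per_gpu: float,
                              bandwidth_per_gpu_tb_s: float) -> float:
    """With a fixed power budget, total usable bandwidth is
    (how many GPUs you can power) x (bandwidth per GPU)."""
    gpus = (power_budget_kw * 1000) // watts_per_gpu
    return gpus * bandwidth_per_gpu_tb_s

# Same assumed 1 MW budget, two hypothetical GPU generations:
print(deployable_bandwidth_tb_s(1000, watts_per_gpu=700, bandwidth_per_gpu_tb_s=3.35))  # ~4,784 TB/s
print(deployable_bandwidth_tb_s(1000, watts_per_gpu=1000, bandwidth_per_gpu_tb_s=8.0))  # 8,000 TB/s
# The second generation draws more power per GPU but delivers more bandwidth per watt,
# so the same data center moves far more data
```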
So one could argue that “packaging of memory” is a new version of Moore’s Law
And the companies mastering HBM and its packaging (SK hynix, Samsung, Micron, and TSMC) quietly define the limits of the world’s AI capacity (and the growth of companies like NVIDIA and AMD?)
If compute is the engine, HBM is the bloodstream
And whoever controls that bloodstream controls the future of AI
About Me
My name is Andreas. I work at the interface between frontier technology and rapidly evolving business models, where I develop frameworks, tools, and mental models to keep up and get ahead in our Technological World
Having trained as a robotics engineer but also worked on the business / finance side for over a decade, I seek to understand those few asymmetric developments that truly shape our world
If you want to read about similar topics - or just have a chat - you can also find me on LinkTree, X, LinkedIn or www.andreasproesch.com