A processor can only compute as fast as it can be fed. Memory bandwidth and unified memory often decide real performance more than the core count on the box.
Buy a chip and the spec sheet leads with cores and clock speed. Those numbers describe how fast the processor can compute once it has data to work on. They say nothing about how fast it can get that data. For a large share of real workloads, the second question is the one that decides performance, and the answer comes down to memory bandwidth and how the memory is organized.
The processor starves
A processor core does its work on data held in registers and small, fast caches right next to it. When the data it needs is not in cache, it has to fetch it from main memory, which is much slower to reach. While it waits, the core can stall, sitting idle because there is nothing to compute.
This is the memory wall: processors have gotten fast enough that feeding them is often the bottleneck. Adding more cores can make this worse rather than better, because more cores mean more simultaneous demand for data, all competing for the same path to memory. If that path is narrow, the extra cores spend more of their time waiting.
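The effect of cores competing for one memory path can be sketched with a deliberately simple model. Everything here is an assumption made for illustration: the function name `core_utilization`, the idea that every core streams data at a fixed rate, and the figures of 10 GB/s per core and 100 GB/s of total bandwidth. Real chips have caches and prefetchers that complicate the picture.

```python
def core_utilization(num_cores, per_core_demand_gbs, memory_bandwidth_gbs):
    """Fraction of the time each core can stay busy when all cores stream from memory.

    Simplified model (an assumption, not a measurement): each core wants
    per_core_demand_gbs of data per second, and all cores share one memory
    path that delivers at most memory_bandwidth_gbs in total.
    """
    total_demand = num_cores * per_core_demand_gbs
    if total_demand <= memory_bandwidth_gbs:
        return 1.0  # the path keeps up, so no core waits on bandwidth
    return memory_bandwidth_gbs / total_demand  # the path is saturated and shared


if __name__ == "__main__":
    # Hypothetical figures: 10 GB/s wanted per core, 100 GB/s available overall.
    for cores in (4, 8, 16, 32):
        u = core_utilization(cores, per_core_demand_gbs=10.0, memory_bandwidth_gbs=100.0)
        print(f"{cores:2d} cores: utilization {u:.3f}, effective cores {cores * u:.1f}")
```

Under these assumptions the number of effective cores stops growing once total demand reaches the available bandwidth. Past that point, extra cores add waiting rather than work.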
Two numbers describe that path:
- Bandwidth: how much data per second can move between memory and the processor. This is the width of the pipe.
- Latency: how long it takes to get a specific piece of data after asking for it. This is the delay before the pipe delivers.
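A rough way to see how the two numbers combine is to model one transfer as a fixed delay plus the time the data spends in the pipe. This is only a sketch: the function name `transfer_time_s`, the 100 ns latency, and the 100 GB/s bandwidth are assumed values chosen for illustration, not figures for any particular chip.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Approximate time to fetch size_bytes: a fixed delay plus time in the pipe."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


LATENCY_S = 100e-9   # assumed: 100 nanoseconds before the first byte arrives
BANDWIDTH = 100e9    # assumed: 100 GB/s of sustained bandwidth

if __name__ == "__main__":
    # From a single 64-byte cache line up to a 64 MiB block.
    for size in (64, 64 * 1024, 64 * 1024 * 1024):
        total = transfer_time_s(size, LATENCY_S, BANDWIDTH)
        latency_share = LATENCY_S / total
        print(f"{size:>10d} bytes: {total * 1e6:10.3f} us total, "
              f"latency share {latency_share:.3f}")
```

In this model a small fetch is almost entirely latency, while a large one is almost entirely bandwidth.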
Bandwidth tends to be the limiter for workloads that stream through large amounts of data: high-resolution media, large simulations, and especially modern AI models, which move enormous quantities of numbers.
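For a workload that streams through its data, bandwidth sets a floor on how long one pass can take, no matter how many cores are available. The sketch below works through that arithmetic. The function name `min_stream_time_s` is illustrative, and the 7 billion values of 2 bytes each and the bandwidth figures are hypothetical inputs, not measurements.

```python
def min_stream_time_s(num_values, bytes_per_value, bandwidth_gb_per_s):
    """Lower bound on the time to read every value once from memory.

    Ignores latency, caching, and compute time, so a real pass takes at
    least this long under the stated assumptions.
    """
    total_bytes = num_values * bytes_per_value
    return total_bytes / (bandwidth_gb_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical workload: 7 billion values at 2 bytes each (14 GB per pass).
    num_values = 7_000_000_000
    for bandwidth in (50, 100, 400, 800):
        t = min_stream_time_s(num_values, 2, bandwidth)
        print(f"{bandwidth:4d} GB/s: at least {t * 1000:.1f} ms per pass")
```

The floor scales inversely with bandwidth and does not depend on core count at all, which is why the spec sheet's headline numbers say little about this kind of workload.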






