Problem: The Path Between a CPU Chip and Off-chip Memory is Slow


This path is relatively slow, forcing the CPU to wait for up to 200 clock cycles just to do a store to, or a load from, memory.

Depending on your CPU's ability to process instructions out-of-order, it might go idle during this time.

This is a huge performance hit!


Cache and Memory are Named by "Distance Level" from the ALU



Demonstrating the Cache-Miss Problem - Down Columns
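
A rough sketch of why walking down columns hurts. It is a model of addresses, not a timing measurement. It assumes a row-major square array of 4-byte floats, 64-byte cache lines, and a deliberately tiny cache that holds only one line. The name `count_line_loads` and all of the constants are illustrative assumptions, not taken from the slides.

```python
FLOAT_BYTES = 4    # assumed element size
LINE_BYTES = 64    # assumed cache line size


def count_line_loads(n, down_columns):
    """Count cache line loads while visiting every element of an n x n
    row-major array, using a model cache that holds a single line."""
    loads = 0
    current_line = None
    for outer in range(n):
        for inner in range(n):
            # Across rows: the inner loop walks along a row.
            # Down columns: the inner loop walks down a column.
            row, col = (inner, outer) if down_columns else (outer, inner)
            address = (row * n + col) * FLOAT_BYTES
            line = address // LINE_BYTES
            if line != current_line:
                loads += 1
                current_line = line
    return loads


N = 64
across_rows = count_line_loads(N, down_columns=False)
down_cols = count_line_loads(N, down_columns=True)
print("across rows: ", across_rows, "line loads")
print("down columns:", down_cols, "line loads")
```

In this model, going across rows uses all 16 floats in a line before moving on, while going down columns jumps a full row (256 bytes here) on every access, so each access lands on a different line. A real cache holds many lines, so the actual penalty depends on the array size relative to the cache size.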









False Sharing - the Effect of Spreading Your Data to Multiple Cache Lines
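
A minimal sketch of how padding spreads per-thread data across cache lines. It assumes a hypothetical layout in which each thread writes one 4-byte float followed by `num_pad` 4-byte padding integers, that cache lines are 64 bytes, and that the array starts on a cache line boundary. The function names are illustrative. The code only computes which line each thread's value falls on; it does not measure run time.

```python
FLOAT_BYTES = 4    # assumed size of each thread's value
INT_BYTES = 4      # assumed size of one padding integer
LINE_BYTES = 64    # assumed cache line size


def lines_touched(num_threads, num_pad):
    """Return the cache line index holding each thread's value, assuming
    the array begins exactly on a cache line boundary."""
    element_bytes = FLOAT_BYTES + num_pad * INT_BYTES
    return [(t * element_bytes) // LINE_BYTES for t in range(num_threads)]


def has_false_sharing(num_threads, num_pad):
    """True if at least two threads' values land on the same cache line."""
    lines = lines_touched(num_threads, num_pad)
    return len(set(lines)) < len(lines)


for num_pad in (0, 7, 15):
    print(num_pad, lines_touched(4, num_pad), has_false_sharing(4, num_pad))
```

With no padding, all four values sit on line 0, so a write by one thread invalidates the line for the other three. With 15 padding integers each element fills a whole 64-byte line, so each thread gets its own line, at the cost of using 16 times as much memory.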







