





|                             | L1             | L2                | L3                                                       | Memory      | Disk          |
|-----------------------------|----------------|-------------------|----------------------------------------------------------|-------------|---------------|
| Type of Storage             | On-chip        | On-chip           | On-chip                                                  | Off-chip    | Disk          |
| Typical Size                | 100 KB         | 8 MB              | 32 MB                                                    | 32 GB       | Many<br>GBs   |
| Typical Access<br>Time (ns) | .25            | .50               | 10.8                                                     | 50          | 5,000,000     |
| Scaled Access<br>Time       | 1 second       | 2 seconds         | 43 seconds                                               | 3.3 minutes | 231 days      |
| Managed by                  | Hardware       | Hardware          | Hardware                                                 | OS          | OS            |

Source: *Computer Architecture: A Quantitative Approach*, Morgan-Kaufmann, 2007 (4th Edition).









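**Demonstrating the Cache-Miss Problem – Across Rows**

```c
#include <stdio.h>
#include <time.h>

#define NUM 10000

float Array[NUM][NUM];		// 10000 x 10000 floats = 400 MB

// The slide only declares MyTimer( ); this clock( )-based body is an assumed stand-in
double
MyTimer( )
{
	return (double)clock( ) / (double)CLOCKS_PER_SEC;
}

int
main( int argc, char *argv[ ] )
{
	float sum = 0.;
	double start = MyTimer( );
	for( int i = 0; i < NUM; i++ )
		for( int j = 0; j < NUM; j++ )
		{
			sum += Array[ i ][ j ];		// access across a row
		}
	double finish = MyTimer( );
	double row_secs = finish - start;

	// print the results so the compiler cannot discard the loop
	printf( "sum = %f, row_secs = %lf\n", sum, row_secs );
	return 0;
}
```

Oregon State University Computer Graphics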































**Cache Architectures**

N-way Set Associative: a cache line from a particular block of memory can appear in a limited number of places in cache. Each "limited place" is called a set of cache lines, and a set contains N cache lines. The memory block can appear in any cache line in its set.

Most caches today are N-way set associative. N is typically 4 for L1 and 8 or 16 for L2.

*[Figure: 64-byte cache line blocks in memory (the numbers) and what cache line set, Set 0 through Set 3, they map to (the colors). This would be called "2-way Set Associative."]*






**How do you figure out where in cache a specific memory address will live?**

































































**False Sharing – Fix #2**

```c
#include <stdlib.h>
#include <omp.h>

struct s
{
	float value;
} Array[4];

int
main( int argc, char *argv[ ] )
{
	omp_set_num_threads( 4 );

	const int SomeBigNumber = 10000000;

	#pragma omp parallel for
	for( int i = 0; i < 4; i++ )
	{
		// Makes this a private variable that lives in each thread's individual stack
		float tmp = Array[ i ].value;
		for( int j = 0; j < SomeBigNumber; j++ )
		{
			tmp = tmp + (float)rand( );
		}
		Array[ i ].value = tmp;		// only one write to the shared array per thread
	}

	return 0;
}
```

*[Figure: memory layout with a separate stack for each thread, plus a common program executable, common globals, and common heap.]*

This works because a localized temporary variable is created in each thread's stack area, so little or no cache line conflict exists.











