## Hardware implications on cache design

- Caches are basically *the* thing that makes real workloads fast
- Cache size and cache speed pull against each other
  - Smaller caches are faster
- And every bit counts
- This is why caches use as few **bits** as possible to do their work
  - This makes caches tricky to walk through as a human

## A simple cache

Address string:

| Address (decimal) | Address (binary) |
|-------------------|------------------|
| 4                 | 00000100         |
| 8                 | 00001000         |
| 12                | 00001100         |
| 4                 | 00000100         |
| 8                 | 00001000         |
| 20                | 00010100         |
| 4                 | 00000100         |
| 8                 | 00001000         |
| 20                | 00010100         |
| 24                | 00011000         |
| 12                | 00001100         |
| 8                 | 00001000         |
| 4                 | 00000100         |



4 entries, each block holds one word, any block can hold any word.

- A cache that can put a line of data anywhere is called *fully associative*.
- The most popular replacement strategy is *LRU* (Least Recently Used).
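To make the walk-through concrete, here is a minimal sketch that replays the address string through a 4-entry fully associative cache with LRU replacement. The function name, the 4-byte word size, and the use of `OrderedDict` to track recency are assumptions made for illustration; real hardware tracks recency with a few bits per line, not a dictionary.

```python
from collections import OrderedDict

ADDRESSES = [4, 8, 12, 4, 8, 20, 4, 8, 20, 24, 12, 8, 4]
WORD_BYTES = 4  # assumption: one word is 4 bytes


def simulate_fully_associative(addresses, num_entries=4):
    """Return 'hit' or 'miss' for each access, using LRU replacement."""
    cache = OrderedDict()  # keys are word numbers, ordered oldest use -> newest use
    outcomes = []
    for addr in addresses:
        word = addr // WORD_BYTES
        if word in cache:
            cache.move_to_end(word)  # mark as most recently used
            outcomes.append("hit")
        else:
            if len(cache) == num_entries:
                cache.popitem(last=False)  # evict the least recently used word
            cache[word] = True
            outcomes.append("miss")
    return outcomes


outcomes = simulate_fully_associative(ADDRESSES)
for addr, outcome in zip(ADDRESSES, outcomes):
    print(f"{addr:3d} {addr:08b} {outcome}")
print(outcomes.count("hit"), "hits,", outcomes.count("miss"), "misses")
```

Under these assumptions the address string produces 6 hits and 7 misses. The first eviction happens at address 24, which pushes out the word at address 12.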

## A simpler cache

Address string:

| Address (decimal) | Address (binary) |
|-------------------|------------------|
| 4                 | 00000100         |
| 8                 | 00001000         |
| 12                | 00001100         |
| 4                 | 00000100         |
| 8                 | 00001000         |
| 20                | 00010100         |
| 4                 | 00000100         |
| 8                 | 00001000         |
| 20                | 00010100         |
| 24                | 00011000         |
| 12                | 00001100         |
| 8                 | 00001000         |
| 4                 | 00000100         |



4 entries, each block holds one word, each word in memory maps to exactly one cache location.

- A cache that can put a line of data in exactly one place is called *direct mapped*.
- Advantages/disadvantages vs. fully-associative?
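The same address string can be replayed through a direct-mapped cache. The sketch below assumes 4-byte words (so the low 2 address bits are a byte offset) and uses the next 2 bits as the index into the 4 entries; the function names and the bit split are illustrative assumptions, not something the slides specify.

```python
ADDRESSES = [4, 8, 12, 4, 8, 20, 4, 8, 20, 24, 12, 8, 4]
OFFSET_BITS = 2  # assumption: 4-byte words, low 2 bits select a byte
INDEX_BITS = 2   # 4 entries -> 2 index bits


def split_address(addr):
    """Split an address into (tag, index); the byte offset is discarded."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


def simulate_direct_mapped(addresses):
    """Return 'hit' or 'miss' for each access to a direct-mapped cache."""
    tags = [None] * (1 << INDEX_BITS)  # one stored tag per entry, None = empty
    outcomes = []
    for addr in addresses:
        tag, index = split_address(addr)
        if tags[index] == tag:
            outcomes.append("hit")
        else:
            tags[index] = tag  # the only candidate line is overwritten
            outcomes.append("miss")
    return outcomes


outcomes = simulate_direct_mapped(ADDRESSES)
for addr, outcome in zip(ADDRESSES, outcomes):
    tag, index = split_address(addr)
    print(f"{addr:3d} {addr:08b} tag={tag} index={index} {outcome}")
print(outcomes.count("hit"), "hits,", outcomes.count("miss"), "misses")
```

With this bit split, addresses 4 and 20 both land on index 1 and keep evicting each other, so the run ends with 4 hits and 9 misses. In exchange, each lookup checks exactly one stored tag and needs no replacement policy.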

## A set-associative cache

Address string:

| Address (decimal) | Address (binary) |
|-------------------|------------------|
| 4                 | 00000100         |
| 8                 | 00001000         |
| 12                | 00001100         |
| 4                 | 00000100         |
| 8                 | 00001000         |
| 20                | 00010100         |
| 4                 | 00000100         |
| 8                 | 00001000         |
| 20                | 00010100         |
| 24                | 00011000         |
| 12                | 00001100         |
| 8                 | 00001000         |
| 4                 | 00000100         |

[Figure: cache diagram with two tag/data column pairs]

4 entries, each block holds one word, each word in memory maps to one of a set of *n* cache lines.

- A cache that can put a line of data in exactly *n* places is called *n*-way set-associative.
- The cache lines/blocks that share the same index are a cache **set**.
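A set-associative cache sits between the two earlier designs, and one parameterized sketch can cover all three. The code below assumes 4-byte words, LRU replacement within each set, and a hypothetical `ways` parameter; none of these details are fixed by the slides.

```python
from collections import OrderedDict

ADDRESSES = [4, 8, 12, 4, 8, 20, 4, 8, 20, 24, 12, 8, 4]
WORD_BYTES = 4  # assumption: one word is 4 bytes


def simulate_set_associative(addresses, num_entries=4, ways=2):
    """Return 'hit' or 'miss' per access; LRU is applied within each set."""
    num_sets = num_entries // ways
    # One OrderedDict per set, keyed by tag, ordered oldest use -> newest use.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for addr in addresses:
        word = addr // WORD_BYTES
        index = word % num_sets   # which set this word maps to
        tag = word // num_sets    # identifies the word within its set
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used in the set
            outcomes.append("hit")
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the set's LRU line
            lines[tag] = True
            outcomes.append("miss")
    return outcomes


for ways in (1, 2, 4):
    result = simulate_set_associative(ADDRESSES, ways=ways)
    print(f"{ways}-way: {result.count('hit')} hits, {result.count('miss')} misses")
```

With `ways=1` this behaves like the direct-mapped cache (4 hits), and with `ways=4` there is a single set, so it behaves like the fully associative cache (6 hits). For this particular address string the 2-way configuration also reaches 6 hits.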

## Longer cache blocks

Address string:

| Address (decimal) | Address (binary) |
|-------------------|------------------|
| 4                 | 00000100         |
| 8                 | 00001000         |
| 12                | 00001100         |
| 4                 | 00000100         |
| 8                 | 00001000         |
| 20                | 00010100         |
| 4                 | 00000100         |
| 8                 | 00001000         |
| 20                | 00010100         |
| 24                | 00011000         |
| 12                | 00001100         |
| 8                 | 00001000         |
| 4                 | 00000100         |



4 entries, each block holds two words, each word in memory maps to exactly one cache location (this cache is twice the total size of the prior caches).

- Large cache blocks take advantage of *spatial locality*.
- Too large a block size can waste cache space.
- Longer cache blocks require less tag space: one tag covers more data.
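The effect of longer blocks can be sketched by adding a block size to a direct-mapped simulation. The code below assumes 4-byte words (so a two-word block is 8 bytes), 8-bit addresses as shown in the address string, and power-of-two sizes; the function names are hypothetical.

```python
ADDRESSES = [4, 8, 12, 4, 8, 20, 4, 8, 20, 24, 12, 8, 4]


def simulate_direct_mapped_blocks(addresses, num_entries=4, block_bytes=8):
    """Return 'hit' or 'miss' per access for a direct-mapped cache."""
    tags = [None] * num_entries  # one stored tag per entry, None = empty
    outcomes = []
    for addr in addresses:
        block = addr // block_bytes   # drop the offset within the block
        index = block % num_entries
        tag = block // num_entries
        if tags[index] == tag:
            outcomes.append("hit")
        else:
            tags[index] = tag  # fetch the whole block, replacing the old one
            outcomes.append("miss")
    return outcomes


def tag_bits(block_bytes, num_entries=4, address_bits=8):
    """Tag width per entry; assumes sizes are powers of two."""
    offset_bits = (block_bytes - 1).bit_length()
    index_bits = (num_entries - 1).bit_length()
    return address_bits - offset_bits - index_bits


for block_bytes in (4, 8):
    result = simulate_direct_mapped_blocks(ADDRESSES, block_bytes=block_bytes)
    print(f"{block_bytes}-byte blocks: {result.count('hit')} hits, "
          f"{tag_bits(block_bytes)} tag bits per entry")
```

Under these assumptions, two-word blocks raise the hit count from 4 to 9 on this address string, because addresses 8 and 12 now share a block. The tag shrinks from 4 bits to 3 bits per entry even though each entry holds twice the data.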
