Lab on cache direct mapping and set associativity

These exercises explain the cache principle, with simulations as well.


Review

How is this cache different if…

- the block is 4 words?

- the index field is 12 bits?

[Figure: a direct-mapped cache with 1024 entries (indexes 0–1023). The 32-bit address splits into a 20-bit tag, a 10-bit index, and a 2-bit byte offset; a comparator on the tag and the valid bit produces Hit, and a mux uses the offset to select the addressed byte of the data word.]
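To make the review concrete, here is a minimal Python sketch (the field_widths helper is ours, not from the slides) that computes the tag/index/offset split for the cache in the figure and for the two variations asked about:

```python
def field_widths(addr_bits, num_entries, block_bytes):
    """Return (tag, index, block-offset) bit widths for a direct-mapped cache."""
    offset_bits = (block_bytes - 1).bit_length()    # log2(block size in bytes)
    index_bits = (num_entries - 1).bit_length()     # log2(number of entries)
    return addr_bits - index_bits - offset_bits, index_bits, offset_bits

# The cache in the figure: 1024 entries, one-word (4-byte) blocks.
print(field_widths(32, 1024, 4))     # (20, 10, 2)

# ...with 4-word (16-byte) blocks: two more offset bits, two fewer tag bits.
print(field_widths(32, 1024, 16))    # (18, 10, 4)

# ...with a 12-bit index (4096 entries): again the tag shrinks to 18 bits.
print(field_widths(32, 4096, 4))     # (18, 12, 2)
```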

2-way set associative implementation

[Figure: a 2-way set-associative cache with 2^k sets. An m-bit address splits into an (m-k-n)-bit tag, a k-bit index, and an n-bit block offset selecting within a 2^n-byte block; both ways' tags are compared in parallel, and a 2-to-1 mux picks the data from the way that hits.]
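The lookup path in this figure can be sketched in a few lines of Python; the names below are illustrative, and the replacement policy is not modeled:

```python
# Sketch of a 2-way set-associative lookup: the index picks a set, both
# ways' tags are compared (in parallel, in hardware), and the data of the
# matching way is selected -- the "2-to-1 mux" in the figure.

def make_cache(num_sets):
    return [[{"valid": False, "tag": None, "data": None} for _ in range(2)]
            for _ in range(num_sets)]

def lookup(cache, addr, index_bits, offset_bits):
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    for way in cache[index]:              # two comparators, one per way
        if way["valid"] and way["tag"] == tag:
            return True, way["data"]      # hit: the mux selects this way
    return False, None                    # miss in both ways
```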

Compare a 2-way set-associative cache with a fully-associative cache:

- Only 2 comparators are needed, rather than one per block.
- Cache tags are a little shorter too.
- Deciding what to replace is also simpler, since each set holds only two candidates.

Set associative caches are a general idea.

By now you have noticed that a 1-way set associative cache is the same as a direct-mapped cache.

Similarly, if a cache has 2^k blocks, a 2^k-way set associative cache would be the same as a fully-associative cache.

For example, a cache with 8 blocks can be organized as:

- 1-way: 8 sets, 1 block each (direct mapped)
- 2-way: 4 sets, 2 blocks each
- 4-way: 2 sets, 4 blocks each
- 8-way: 1 set, 8 blocks (fully associative)
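The sets-versus-ways arithmetic can be checked with a tiny Python loop (illustrative only): at every point on the spectrum, sets times ways equals the total number of blocks.

```python
# For a fixed number of blocks, associativity just trades sets for ways.
blocks = 8
for ways in (1, 2, 4, 8):
    sets = blocks // ways
    label = {1: "direct mapped", blocks: "fully associative"}.get(ways, "set associative")
    print(f"{ways}-way: {sets} sets of {ways} block(s)  ({label})")
```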

Summary

Larger block sizes can take advantage of spatial locality by loading data from not just one address, but also nearby addresses, into the cache.

Associative caches assign each memory address to a particular set within the cache, but not to any specific block within that set.

 Set sizes range from 1 (direct-mapped) to 2^k (fully associative).
 Larger sets and higher associativity lead to fewer cache conflicts and lower miss rates, but they also increase the hardware cost.
 In practice, 2-way through 16-way set-associative caches strike a good balance between lower miss rates and higher costs.

Next, we'll talk more about measuring cache performance, and also discuss the issue of writing data to a cache.

Inconsistent memory

But now the cache and memory contain different, inconsistent data!

First Rule of Data Management: No inconsistent data.
Second Rule: Don't Even Think About Violating the 1st Rule.

How can we ensure that subsequent loads will return the right value?

This is also problematic if other devices are sharing the main memory, as in I/O or a multiprocessor system.

[Figure: after the store, the cache line at index 110 holds the new value while main memory still holds the old 42803.]

Write-through caches

A write-through cache solves the inconsistency problem by forcing all writes to update both the cache and the main memory.

This is simple to implement and keeps the cache and memory consistent.

Why might this not be so good?

[Figure: the store Mem[214] = 21763 updates both the cache line at index 110 and main memory.]

The bad thing is that by forcing every write to go to main memory, we use up bandwidth between the cache and the memory.
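As a rough sketch of the policy (dict-based stand-ins for the cache and memory, with our own helper name; not a hardware model), every store goes to memory whether or not it hits in the cache:

```python
memory = {}    # address -> value (stand-in for main memory)
cache = {}     # index -> {"tag": ..., "data": ...}

def store_write_through(addr, value, index_bits=3):
    index = addr & ((1 << index_bits) - 1)
    tag = addr >> index_bits
    line = cache.get(index)
    if line is not None and line["tag"] == tag:
        line["data"] = value     # update the cache on a write hit...
    memory[addr] = value         # ...but always update main memory

store_write_through(0b11010110, 21763)   # Mem[214] = 21763
print(memory[214])                       # 21763 is in memory immediately
```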

Write buffers

Write-through caches can result in slow writes, so processors typically include a write buffer, which queues pending writes to main memory and permits the CPU to continue …

Buffers are commonly used when two devices run at different speeds.

 If a producer generates data too quickly for a consumer to handle, the extra data is stored in a buffer and the producer can continue on with other tasks, without waiting for the consumer.
 Conversely, if the producer slows down, the consumer can continue running at full speed as long as there is excess data in the buffer.

For us, the producer is the CPU and the consumer is the main memory.

Producer → Buffer → Consumer
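A write buffer is just a queue between the fast producer and the slow consumer; here is a minimal Python sketch (the function names are ours):

```python
from collections import deque

memory = {}
write_buffer = deque()

def cpu_store(addr, value):
    write_buffer.append((addr, value))   # the CPU continues immediately

def memory_drain_one():
    if write_buffer:                     # the slower memory empties the queue
        addr, value = write_buffer.popleft()
        memory[addr] = value

cpu_store(214, 21763)    # returns right away, no stall
memory_drain_one()       # memory catches up later
print(memory)            # {214: 21763}
```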

Finishing the write back

We don't need to store the new value back to main memory unless the cache block gets replaced.

For example, on a read from Mem[142], which maps to the same cache block, the modified cache contents will first be written to main memory.

Only then can the cache block be replaced with data from address 142.

[Figure: the cache line at index 110 holding 21763 with its dirty bit set; after the write back to main memory, the dirty bit is clear.]

Write-back cache discussion

The advantage of write-back caches is that not all write operations need to access main memory, as they do with write-through caches.

 If a single address is frequently written to, then it doesn't pay to keep writing that data through to main memory.
 If several bytes within the same cache block are modified, they will only force one memory write operation, at write-back time.

Write-back cache discussion

Each block in a write-back cache needs a dirty bit to indicate whether or not it must be saved to main memory before being replaced; otherwise we might perform unnecessary writebacks.

Notice the penalty for the main memory access will not be applied until the execution of some subsequent instruction following the write.

 In our example, the write to Mem[214] affected only the cache.
 But the load from Mem[142] resulted in two memory accesses: one to save data to address 214, and one to load data from address 142.
  • The write can be "buffered" as was shown for write-through.
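Putting these slides together, here is a minimal write-back sketch (direct-mapped with a 3-bit index, dict-based, helper names ours) that reproduces the Mem[214]/Mem[142] example; both addresses share index 110 but have different tags:

```python
INDEX_BITS = 3
memory = {}
cache = {}    # index -> {"tag": ..., "data": ..., "dirty": ...}

def access(addr, store_value=None):
    tag, index = addr >> INDEX_BITS, addr & ((1 << INDEX_BITS) - 1)
    line = cache.get(index)
    if line is None or line["tag"] != tag:           # miss
        if line is not None and line["dirty"]:       # finish the write back first
            memory[(line["tag"] << INDEX_BITS) | index] = line["data"]
        line = {"tag": tag, "data": memory.get(addr, 0), "dirty": False}
        cache[index] = line
    if store_value is not None:                      # a store only dirties the line
        line["data"], line["dirty"] = store_value, True
    return line["data"]

access(0b11010110, 21763)   # store to Mem[214]: cache only, memory untouched
access(0b10001110)          # load Mem[142], same index: forces the write back
print(memory[214])          # 21763 reached memory only at replacement time
```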

Write misses

A second scenario is if we try to write to an address that is not already contained in the cache; this is called a write miss.

Let's say we want to store 21763 into Mem[1101 0110], but we find that address is not currently in the cache.

When we update Mem[1101 0110], should we also load it into the cache? A write-allocate cache loads the block into the cache on a write miss; a write-no-allocate cache writes only to main memory and leaves the cache unchanged.

[Figure: the cache line at index 110 currently holds unrelated data (6378), so the store to Mem[1101 0110] misses.]

Which is it?

Given the following trace of accesses, can you determine whether the cache is write-allocate or write-no-allocate?

 Assume A and B are distinct, and can be in the cache simultaneously.

Load A   → Miss
Store B  → Miss
Store A  → Hit
Load A   → Hit
Load B   → Miss
Load B   → Hit
Load A   → Hit

The first Load B misses even though B was just stored: the earlier Store B never put B in the cache. On a write-allocate cache this would be a hit.

Answer: Write-no-allocate
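The quiz can be checked with a few lines of Python; the cache is modeled as a bare set of allocated addresses, which suffices because A and B never conflict:

```python
def run(trace, write_allocate):
    cached, results = set(), []
    for op, addr in trace:
        hit = addr in cached
        results.append("Hit" if hit else "Miss")
        if not hit and (op == "Load" or write_allocate):
            cached.add(addr)   # loads always allocate; stores only if write-allocate
    return results

trace = [("Load", "A"), ("Store", "B"), ("Store", "A"), ("Load", "A"),
         ("Load", "B"), ("Load", "B"), ("Load", "A")]

print(run(trace, write_allocate=False))  # matches the trace above: first Load B misses
print(run(trace, write_allocate=True))   # Store B allocates, so every Load B hits
```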

First Observations

Split Instruction/Data caches:

 Pro: No structural hazard between IF & MEM stages.
  • A single-ported unified cache stalls instruction fetch during a load or store.
 Con: Static partitioning of the cache between instructions & data.
  • Bad if the working sets are unequal: e.g., code/DATA or CODE/data.

Cache Hierarchies:

 Trade-off between access time & hit rate.
  • L1 cache can focus on fast access time (okay hit rate).
  • L2 cache can focus on good hit rate (okay access time).
 Such hierarchical design is another "big idea".

CPU ↔ L1 cache ↔ L2 cache ↔ Main Memory

Opteron Vital Statistics

L1 Caches: Instruction & Data

 64 kB
 64 byte blocks
 2-way set associative
 2 cycle access time

L2 Cache:

 1 MB
 64 byte blocks
 4-way set associative
 16 cycle access time (total, not just miss penalty)

Memory:

 200+ cycle access time
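These latencies make the usual average memory access time (AMAT) calculation easy to sketch. The hit rates below are made-up illustrative numbers, not Opteron measurements; only the cycle counts come from the list above:

```python
L1_TIME, L2_TIME, MEM_TIME = 2, 16, 200   # cycles, from the statistics above
l1_hit, l2_hit = 0.95, 0.80               # hypothetical hit rates

# On an L1 miss we pay the full 16-cycle L2 access; on an L2 miss we
# additionally pay the trip to main memory.
amat = L1_TIME + (1 - l1_hit) * (L2_TIME + (1 - l2_hit) * MEM_TIME)
print(f"AMAT = {amat:.1f} cycles")        # 2 + 0.05 * (16 + 0.2 * 200) = 4.8
```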
