












This course focuses on the quantitative principles of computer design, instruction set architectures, datapath and control, memory hierarchy design, main memory, cache, hard drives, multiprocessor architectures, storage and I/O systems, and computer clusters. This lecture includes: Synchronization, Parallel, Architecture, Shared, Memory, Performance, Multiprocessor, Symmetric, Distributed
Typology: Slides
Recap: Performance of Multiprocessors with
Synchronization in Parallel Architecture
So far we have discussed the sharing of caches for multiprocessing in the:
Symmetric shared-memory architecture
Distributed shared-memory architecture
We have studied the cache coherence problem in symmetric and distributed shared-memory multiprocessors, and have noticed that this problem is indeed performance-critical
Snooping protocols employ write-invalidate and write-broadcast techniques
Here, each cached block of memory is in one of three states, and the cache tracks the state of every block it holds; and
the controller responds to read/write requests for a block, both from the processor and from the bus
Recap: Implementation Complications of snoopy protocols
The three states of the basic FSM are: Shared, Exclusive and Invalid
However, complications such as write races, interventions and invalidations have been observed in the implementation of snoopy protocols; and
to overcome these complications, a number of variations of the FSM controller have been suggested
These variations are: the MESI Protocol, the Berkeley Protocol and the Illinois Protocol
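The basic three-state controller described above can be sketched as a transition table. This is only an illustrative sketch for a write-invalidate protocol that considers accesses to a single block; the event names (`cpu_read`, `bus_write_miss`, and so on) and the `step` helper are assumptions made for this example and do not come from the lecture.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"
    EXCLUSIVE = "Exclusive"


# (current state, event) -> (next state, bus action or None)
# "cpu_*" events come from the processor; "bus_*" events are snooped on the bus.
TRANSITIONS = {
    (State.INVALID, "cpu_read"): (State.SHARED, "read miss"),
    (State.INVALID, "cpu_write"): (State.EXCLUSIVE, "write miss"),
    (State.SHARED, "cpu_read"): (State.SHARED, None),
    (State.SHARED, "cpu_write"): (State.EXCLUSIVE, "write miss"),
    (State.SHARED, "bus_write_miss"): (State.INVALID, None),
    (State.EXCLUSIVE, "cpu_read"): (State.EXCLUSIVE, None),
    (State.EXCLUSIVE, "cpu_write"): (State.EXCLUSIVE, None),
    (State.EXCLUSIVE, "bus_read_miss"): (State.SHARED, "write back"),
    (State.EXCLUSIVE, "bus_write_miss"): (State.INVALID, "write back"),
}


def step(state, event):
    """Return (next_state, bus_action); unlisted pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), (state, None))


state, action = step(State.INVALID, "cpu_write")   # write miss, block becomes Exclusive
state, action = step(state, "bus_read_miss")       # another cache reads: write back, Shared
```

Replacement of a block by a conflicting address is deliberately left out to keep the table short.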
Larger multiprocessor systems employ distributed shared memory, i.e., a separate memory is provided per processor
Here, cache coherency is achieved using non-cached pages or a directory containing information for every block in memory
Recap: Directory Based Protocol
The directory-based protocol tracks the state of every block in every cache, and finds which caches hold copies of a block and whether the block is dirty or clean
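One common way to hold this per-block information is a presence bit vector plus a dirty bit. The sketch below is a minimal illustration under that assumption; the class name `DirectoryEntry` and its methods are hypothetical and are not taken from the lecture.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    presence: int = 0     # bit i set means processor i caches the block
    dirty: bool = False   # True when the single cached copy differs from memory

    def add_sharer(self, proc: int) -> None:
        self.presence |= 1 << proc

    def remove_sharer(self, proc: int) -> None:
        self.presence &= ~(1 << proc)

    def sharers(self) -> list:
        """List the processor ids whose presence bit is set."""
        return [i for i in range(self.presence.bit_length())
                if (self.presence >> i) & 1]


# One entry per memory block; the key is a (hypothetical) block address.
directory = {"A1": DirectoryEntry()}
directory["A1"].add_sharer(0)
directory["A1"].add_sharer(3)
```

With one bit per processor, the directory storage grows with the number of memory blocks times the number of processors.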
Similar to the snoopy protocol, the directory-based protocol is implemented by an FSM with three states: Shared, Uncached and Exclusive
Recap: Directory Based Protocols
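These protocols involve three types of processors or nodes, namely: local, home and remote nodes; the local node issues the request, the home node holds the memory location and directory entry for the address, and a remote node holds a cached copy of the block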
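To illustrate how the three kinds of nodes interact, the following sketch lists the message hops for a read miss. It assumes the usual textbook message names (read miss, fetch, data write back, data value reply); the function `read_miss_hops` and its parameters are hypothetical.

```python
def read_miss_hops(local, home, owner=None):
    """Return (sender, receiver, message) hops for a read miss issued by `local`.

    `owner` is the remote node holding the block exclusively, or None when
    the home memory already has an up-to-date copy.
    """
    hops = [(local, home, "read miss")]
    if owner is not None:
        # The home node asks the remote owner for the block and the owner
        # sends the data back to the home node.
        hops.append((home, owner, "fetch"))
        hops.append((owner, home, "data write back"))
    hops.append((home, local, "data value reply"))
    return hops


for hop in read_miss_hops("P2", "Home", owner="P1"):
    print(hop)
```

When the block is clean, only the first and last hops are needed, so the remote node is not involved at all.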
Recap: Directory-based Protocol
The transactions are caused by messages such as: read misses, write misses, invalidates and data fetch requests
These messages are sent to the directory to cause actions such as updating the directory state and satisfying requests
The controller tracks all copies of a memory block and indicates an action that updates the sharing set
Here, if the required data is not in any cache and is available only in the memory associated with the respective processor, the state machine is said to be in the Uncached state; transitions to other states are caused by messages such as: read miss, write miss, invalidate and data fetch request
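The directory-side behaviour described above can be sketched as a small state machine. This is a simplified illustration, not the lecture's implementation: the class `Directory`, the message strings, and the choice to return the outgoing messages as a list are all assumptions made for the example.

```python
class Directory:
    """Minimal directory controller: one state and one sharer set per block."""

    def __init__(self):
        self.state = {}     # addr -> "Uncached" | "Shared" | "Exclusive"
        self.sharers = {}   # addr -> set of processor ids

    def handle(self, msg, proc, addr):
        """Apply one incoming message and return the messages sent in response."""
        state = self.state.get(addr, "Uncached")
        sharers = self.sharers.setdefault(addr, set())
        out = []
        if msg == "read miss":
            if state == "Exclusive":
                # Get the up-to-date data from the current owner first.
                out.append(("fetch", next(iter(sharers))))
            sharers.add(proc)
            self.state[addr] = "Shared"
            out.append(("data value reply", proc))
        elif msg == "write miss":
            if state == "Shared":
                out += [("invalidate", p) for p in sorted(sharers - {proc})]
            elif state == "Exclusive":
                out.append(("fetch/invalidate", next(iter(sharers))))
            sharers.clear()
            sharers.add(proc)
            self.state[addr] = "Exclusive"
            out.append(("data value reply", proc))
        elif msg == "write back":
            sharers.clear()
            self.state[addr] = "Uncached"
        return out


d = Directory()
d.handle("write miss", "P1", "A1")   # Uncached -> Exclusive, sharers {P1}
d.handle("read miss", "P2", "A1")    # Exclusive -> Shared, sharers {P1, P2}
```

Every transition updates the sharing set, which is the only information the directory needs in order to decide whom to invalidate or fetch from.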
A1 and A2 map to the same cache block
Abbreviations: WrMs = write miss, RdMs = read miss, DaRp = data value reply, Ftch = fetch

| Step | Processor 1 (State, Addr, Value) | Processor 2 (State, Addr, Value) | Interconnect (Action, Proc., Addr, Value) | Directory (Addr, State, {Procs}) | Memory (Value) |
|---|---|---|---|---|---|
| P1: Write 10 to A1 | | | WrMs P1 A1 | A1 Ex {P1} | |
| | Excl. A1 10 | | DaRp P1 A1 0 | | |
| P1: Read A1 | | | | | |
| P2: Read A1 | | | | | |
| P2: Write 20 to A1 | | | | | |
| P2: Write 40 to A2 | | | | | |
At Step 2: P1 reads A1; a CPU read hit occurs, hence the FSM stays in the Exclusive state
| Step | Processor 1 (State, Addr, Value) | Processor 2 (State, Addr, Value) | Interconnect (Action, Proc., Addr, Value) | Directory (Addr, State, {Procs}) | Memory (Value) |
|---|---|---|---|---|---|
| P1: Write 10 to A1 | | | WrMs P1 A1 | A1 Ex {P1} | |
| | Excl. A1 10 | | DaRp P1 A1 0 | | |
| P1: Read A1 | Excl. A1 10 | | | | |
| P2: Read A1 | | | | | |
| P2: Write 20 to A1 | | | | | |
| P2: Write 40 to A2 | | | | | |
| Step | Processor 1 (State, Addr, Value) | Processor 2 (State, Addr, Value) | Interconnect (Action, Proc., Addr, Value) | Directory (Addr, State, {Procs}) | Memory (Value) |
|---|---|---|---|---|---|
| P1: Write 10 to A1 | | | WrMs P1 A1 | A1 Ex {P1} | |
| | Excl. A1 10 | | DaRp P1 A1 0 | | |
| P1: Read A1 | Excl. A1 10 | | | | |
| P2: Read A1 | | Shar. A1 | RdMs P2 A1 | | |
| | Shar. A1 10 | | Ftch P1 A1 10 | | 10 |
| | | Shar. A1 10 | DaRp P2 A1 10 | A1 Shar. {P1,P2} | 10 |
| P2: Write 20 to A1 | | | | | |
| P2: Write 40 to A2 | | | | | |

Write back: the fetch returns P1's value 10 for A1, which is written back to memory
At Step 4: P2 writes 20 to A1
i) P1 holds A1 in the Shared state and sees a remote write to it, so its controller changes state from Shared to Invalid
ii) P2 sees a CPU write, so it places a write miss on the bus, changes state from Shared to Exclusive and writes the value 20 to A1
iii) The directory entry for A1 is updated so that its sharer set contains only {P2}
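Steps 1 to 4 of the example can be replayed with a small simulation to check the final states. This is a rough sketch under simplifying assumptions: invalid lines are modelled as absent, never-written memory reads as 0, and the A1/A2 block conflict is ignored because only A1 is accessed in these four steps. The function `replay` and its data layout are hypothetical.

```python
def replay(trace):
    """Replay (proc, op, addr, value) accesses; return (caches, directory, memory)."""
    caches = {"P1": {}, "P2": {}}   # proc -> {addr: [state, value]}
    directory = {}                  # addr -> [state, set of sharers]
    memory = {}                     # addr -> value
    for proc, op, addr, value in trace:
        line = caches[proc].get(addr)
        entry = directory.setdefault(addr, ["Uncached", set()])
        if op == "read":
            if line is None:                          # read miss
                if entry[0] == "Exclusive":           # fetch from owner, write back
                    owner = next(iter(entry[1]))
                    memory[addr] = caches[owner][addr][1]
                    caches[owner][addr][0] = "Shared"
                entry[0] = "Shared"
                entry[1].add(proc)
                caches[proc][addr] = ["Shared", memory.get(addr, 0)]
            # A read hit changes nothing.
        else:                                         # write
            if line is None or line[0] == "Shared":   # write miss
                for other in entry[1] - {proc}:
                    if entry[0] == "Exclusive":
                        memory[addr] = caches[other][addr][1]
                    del caches[other][addr]           # invalidate the remote copy
                entry[0] = "Exclusive"
                entry[1] = {proc}
            caches[proc][addr] = ["Exclusive", value]
    return caches, directory, memory


trace = [
    ("P1", "write", "A1", 10),    # Step 1
    ("P1", "read", "A1", None),   # Step 2
    ("P2", "read", "A1", None),   # Step 3
    ("P2", "write", "A1", 20),    # Step 4
]
caches, directory, memory = replay(trace)
print(caches, directory, memory)
```

After Step 4, P1 no longer holds A1, P2 holds A1 in the Exclusive state with value 20, the directory lists {P2} for A1, and memory still holds the value 10 written back in Step 3.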