Synchronization in Parallel Architecture-Advance Computer Architecture-Lecture Slides, Slides of Advanced Computer Architecture

This course focuses on the quantitative principles of computer design, instruction set architectures, datapath and control, memory hierarchy design, main memory, cache, hard drives, multiprocessor architectures, storage and I/O systems, and computer clusters. This lecture includes: Synchronization, Parallel, Architecture, Shared, Memory, Performance, Multiprocessor, Symmetric, Distributed

Typology: Slides

2011/2012

Uploaded on 08/06/2012

amrusha

Today’s Topics

Recap: Performance of Multiprocessors with

  • Symmetric Shared-Memory
  • Distributed Shared Memory

Synchronization in Parallel Architecture

Conclusion

Recap: Cache Coherence Problem

So far we have discussed the sharing of caches for multiprocessing in the:

  • Symmetric shared-memory architecture
  • Distributed shared-memory architecture

We have studied the cache coherence problem in symmetric and distributed shared-memory multiprocessors, and have seen that this problem is indeed performance-critical

Recap: Snooping Protocols

Snooping protocols employ write-invalidate and write-broadcast techniques

Here, each block of memory is in one of three states, and each cached block tracks which of these states it is in

The controller responds to read/write requests for a block of memory or a cached block, both from the processor and from the bus
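The controller behaviour described above can be sketched as a small transition table. This is a minimal illustration, not the lecture's own implementation: it assumes the three states are Invalid, Shared and Exclusive, a write-invalidate policy, and requests to a single block only (replacement misses are ignored). The event names and action strings are hypothetical labels chosen for this sketch.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"
    EXCLUSIVE = "Exclusive"


# (current state, event) -> (next state, action placed on the bus, or None).
# "cpu_*" events come from the local processor; "bus_*" events are snooped
# from the bus. Event and action names are assumptions made for this sketch.
TRANSITIONS = {
    (State.INVALID, "cpu_read"): (State.SHARED, "place read miss on bus"),
    (State.INVALID, "cpu_write"): (State.EXCLUSIVE, "place write miss on bus"),
    (State.SHARED, "cpu_read"): (State.SHARED, None),        # read hit
    (State.SHARED, "cpu_write"): (State.EXCLUSIVE, "place write miss on bus"),
    (State.SHARED, "bus_write_miss"): (State.INVALID, None),  # remote write
    (State.EXCLUSIVE, "cpu_read"): (State.EXCLUSIVE, None),   # read hit
    (State.EXCLUSIVE, "cpu_write"): (State.EXCLUSIVE, None),  # write hit
    (State.EXCLUSIVE, "bus_read_miss"): (State.SHARED, "write back block"),
    (State.EXCLUSIVE, "bus_write_miss"): (State.INVALID, "write back block"),
}


def step(state, event):
    """Return (next_state, bus_action); unlisted pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), (state, None))
```

For example, `step(State.EXCLUSIVE, "bus_read_miss")` returns the Shared state together with a write-back action, since the only up-to-date copy must be supplied before another cache can read it.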

Recap: Implementation Complications of snoopy protocols

The three states of the basic FSM are: Shared, Exclusive or Invalid

However, complications such as write races, interventions and invalidations have been observed in the implementation of snoopy protocols; and

to overcome these complications, a number of variations of the FSM controller have been suggested

These variations are: the MESI Protocol, the Berkeley Protocol and the Illinois Protocol

Recap: Directory based Protocols

Larger multiprocessor systems employ distributed shared memory, i.e., a separate memory is provided per processor

Here, cache coherency is achieved either by using non-cached pages or by using a directory that holds information for every block in memory

Recap: Directory Based Protocol

The directory-based protocol tracks the state of every block in every cache, and identifies which caches hold a copy of the block and whether that copy is dirty or clean

Similar to the snoopy protocol, the directory-based protocol is implemented by an FSM with three states: Shared, Uncached and Exclusive
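One way to picture what the directory stores per block is a state plus a set of sharers. The sketch below is illustrative only: the class and field names are hypothetical, and the consistency rule (no sharers when Uncached, exactly one owner when Exclusive, at least one sharer when Shared) is the usual textbook convention rather than something spelled out on the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class DirState(Enum):
    UNCACHED = "Uncached"    # no cache holds a copy of the block
    SHARED = "Shared"        # one or more caches hold a clean copy
    EXCLUSIVE = "Exclusive"  # exactly one cache holds the (dirty) copy


@dataclass
class DirectoryEntry:
    """Hypothetical per-block directory record: a state plus the sharer set."""
    state: DirState = DirState.UNCACHED
    sharers: set = field(default_factory=set)

    def is_consistent(self):
        """Check that the sharer set matches what the state implies."""
        if self.state is DirState.UNCACHED:
            return len(self.sharers) == 0
        if self.state is DirState.EXCLUSIVE:
            return len(self.sharers) == 1
        return len(self.sharers) >= 1
```

A freshly created `DirectoryEntry()` starts Uncached with an empty sharer set, which satisfies the check.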

Recap: Directory Based Protocols

These protocols involve three kinds of processors or nodes, namely local, home and remote nodes

  • Local node: the node where the request originates
  • Home node: the node where the memory location of an address resides
  • Remote node: a node that holds a copy of the cache block, whether exclusive or shared

Recap: Directory-based Protocol

Transactions are triggered by messages such as read misses, write misses, invalidates and data fetch requests

These messages are sent to the directory, where they cause actions such as updating the directory state and satisfying the request

The controller tracks all copies of each memory block, and each message indicates an action that updates the sharing set
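How incoming messages update the directory state and the sharing set can be sketched as follows. This is a simplified model under stated assumptions: the message names (`read_miss`, `write_miss`, `write_back`, `fetch`, `fetch_invalidate`, `invalidate`, `data_reply`) are hypothetical labels, data values are not modelled, and a `write_back` is assumed to arrive only for a block that is currently Exclusive.

```python
class Directory:
    """Home-node directory: per-block state plus the set of sharing processors."""

    def __init__(self):
        self.state = {}    # addr -> "Uncached" | "Shared" | "Exclusive"
        self.sharers = {}  # addr -> set of processor names

    def handle(self, msg, proc, addr):
        """Apply one incoming message; return the messages the directory sends."""
        state = self.state.get(addr, "Uncached")
        sharers = self.sharers.setdefault(addr, set())
        out = []
        if msg == "read_miss":
            if state == "Exclusive":
                owner = next(iter(sharers))
                out.append(("fetch", owner, addr))  # owner supplies the block
            out.append(("data_reply", proc, addr))
            sharers.add(proc)                       # requester joins the sharers
            self.state[addr] = "Shared"
        elif msg == "write_miss":
            if state == "Shared":
                out += [("invalidate", p, addr) for p in sorted(sharers - {proc})]
            elif state == "Exclusive":
                owner = next(iter(sharers))
                out.append(("fetch_invalidate", owner, addr))
            out.append(("data_reply", proc, addr))
            sharers.clear()
            sharers.add(proc)                       # requester is the only owner
            self.state[addr] = "Exclusive"
        elif msg == "write_back":
            sharers.clear()                         # block lives only in memory
            self.state[addr] = "Uncached"
        else:
            raise ValueError(f"unknown message: {msg}")
        return out
```

For instance, a `write_miss` from P1 on an Uncached block leaves the block Exclusive with sharer set {P1}; a later `read_miss` from P2 triggers a fetch from P1 and leaves the block Shared with sharer set {P1, P2}.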

Example: Working of Finite State Machine Controller

Here, if the required data is not in any cache and is available only in the memory associated with the respective processor, the state machine is said to be in the Uncached state; transitions to other states are caused by messages such as read miss, write miss, invalidate and data fetch request

Example: Dealing with read/write misses

A1 and A2 map to the same cache block

| Step | Processor 1 (State, Addr, Value) | Processor 2 (State, Addr, Value) | Interconnect (Action, Proc., Addr, Value) | Directory (Addr, State, {Procs}) | Memory (Value) |
| P1: Write 10 to A1 | | | | | |
| P1: Read A1 | | | | | |
| P2: Read A1 | | | | | |
| P2: Write 20 to A1 | | | | | |
| P2: Write 40 to A2 | | | | | |

Example: Working of Finite State Machine Controller

At Step 1, P1 writes 10 to A1 and the directory state of A1 changes from Uncached to Exclusive; the operations involved are highlighted in red on the slide

| Step | Processor 1 (State, Addr, Value) | Processor 2 (State, Addr, Value) | Interconnect (Action, Proc., Addr, Value) | Directory (Addr, State, {Procs}) | Memory (Value) |
| P1: Write 10 to A1 | | | WrMs, P1, A1 | A1, Ex, {P1} | |
| | Excl., A1, 10 | | DaRp, P1, A1, 0 | | |
| P1: Read A1 | | | | | |
| P2: Read A1 | | | | | |
| P2: Write 20 to A1 | | | | | |
| P2: Write 40 to A2 | | | | | |

Example: Working of Finite State Machine Controller

At Step 2, P1 reads A1; a CPU read hit occurs, hence the FSM stays in the Exclusive state

| Step | Processor 1 (State, Addr, Value) | Processor 2 (State, Addr, Value) | Interconnect (Action, Proc., Addr, Value) | Directory (Addr, State, {Procs}) | Memory (Value) |
| P1: Write 10 to A1 | | | WrMs, P1, A1 | A1, Ex, {P1} | |
| | Excl., A1, 10 | | DaRp, P1, A1, 0 | | |
| P1: Read A1 | Excl., A1, 10 | | | | |
| P2: Read A1 | | | | | |
| P2: Write 20 to A1 | | | | | |
| P2: Write 40 to A2 | | | | | |

Example: Working of FSM Controller

A1 and A2 map to the same cache block

| Step | Processor 1 (State, Addr, Value) | Processor 2 (State, Addr, Value) | Interconnect (Action, Proc., Addr, Value) | Directory (Addr, State, {Procs}) | Memory (Value) |
| P1: Write 10 to A1 | | | WrMs, P1, A1 | A1, Ex, {P1} | |
| | Excl., A1, 10 | | DaRp, P1, A1, 0 | | |
| P1: Read A1 | Excl., A1, 10 | | | | |
| P2: Read A1 | | Shar., A1 | RdMs, P2, A1 | | |
| | Shar., A1, 10 | | Ftch, P1, A1, 10 | | 10 |
| | | Shar., A1, 10 | DaRp, P2, A1, 10 | A1, Shar., {P1,P2} | 10 |
| P2: Write 20 to A1 | | | | | |
| P2: Write 40 to A2 | | | | | |

Write back: in response to the Ftch, P1 writes the block back, so memory now holds 10

Example: Working of Finite State Machine Controller

At Step 4: P2 writes 20 to A1

  i) A1 and A2 map to the same cache block; P1 sees a remote write, so the state of its controller changes from Shared to Invalid
  ii) P2 sees a CPU write, so it places a write miss on the bus, changes its state from Shared to Exclusive, and writes the value 20 to A1
  iii) The directory entry for A1 now has a sharer set containing only {P2}
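The effect of Step 4 on the caches and the directory can be replayed with a few lines of code. This is a sketch under stated assumptions: the starting snapshot is the state after Step 3 (both P1 and P2 hold A1 as Shared with value 10, and the directory lists {P1, P2}), the function name and the `Inval` message label are hypothetical, and memory is left out because the write stays in P2's cache.

```python
def write_to_shared_block(caches, directory, writer, addr, value):
    """Write-invalidate step: `writer` writes `value` to a block that is Shared."""
    messages = [("WrMs", writer, addr)]        # writer places a write miss
    for proc in sorted(directory[addr]["procs"] - {writer}):
        caches[proc][addr] = ("Inv.", None)    # remote copy is invalidated
        messages.append(("Inval", proc, addr))
    caches[writer][addr] = ("Excl.", value)    # writer becomes the sole owner
    directory[addr] = {"state": "Excl.", "procs": {writer}}
    return messages


# Assumed snapshot after Step 3: A1 is Shared by P1 and P2 with value 10.
caches = {"P1": {"A1": ("Shar.", 10)}, "P2": {"A1": ("Shar.", 10)}}
directory = {"A1": {"state": "Shar.", "procs": {"P1", "P2"}}}

# Step 4: P2 writes 20 to A1.
msgs = write_to_shared_block(caches, directory, "P2", "A1", 20)
```

After the call, P1's copy of A1 is Invalid, P2 holds A1 as Exclusive with value 20, and the directory entry for A1 lists only P2, matching points i) to iii).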