


















































Computer Architecture: Parallel Programs (Lecture Notes)
Parallel Computation/Program Issues

- Dependency Analysis:
  - Types of dependency
  - Dependency Graphs
  - Bernstein's Conditions of Parallelism
- Asymptotic Notations for Algorithm Complexity Analysis
- Parallel Random-Access Machine (PRAM)
  - Example: sum algorithm on P processor PRAM
- Network Model of Message-Passing Multicomputers
  - Example: Asynchronous Matrix Vector Product on a Ring
- Levels of Parallelism in Program Execution
- Hardware vs. Software Parallelism
- Parallel Task Grain Size
- Software Parallelism Types: Data vs. Functional Parallelism
- Example Motivating Problem with high levels of concurrency
- Limited Parallel Program Concurrency: Amdahl's Law
- Parallel Performance Metrics: Degree of Parallelism (DOP)
  - Concurrency Profile
- Steps in Creating a Parallel Program:
  - 1- Decomposition, 2- Assignment, 3- Orchestration, 4- Mapping
  - Program Partitioning Example (handout)
  - Static Multiprocessor Scheduling Example (handout)
Parallel Programs: Definitions

- A parallel program is comprised of a number of tasks running as threads (or processes) on a number of processing elements that cooperate/communicate as part of a single parallel computation (i.e. parallelism at the Thread Level Parallelism, TLP, level).
- Task:
  - An arbitrary piece of undecomposed work in a parallel computation.
  - Executed sequentially on a single processor; concurrency in a parallel computation is only across tasks.
- Parallel or Independent Tasks:
  - Tasks with no dependencies among them, which can therefore run in parallel on different processing elements.
- Parallel Task Grain Size: the amount of computation in a task.
- Process (thread):
  - Abstract program entity that performs the computations assigned to a task.
  - Processes communicate and synchronize to perform their tasks.
- Processor (or Processing Element):
  - Physical computing engine on which a process executes sequentially.
  - Processes virtualize the machine to the programmer: first write the program in terms of processes, then map processes to processors.
- Communication-to-Computation Ratio (C-to-C Ratio): represents the amount of communication between the tasks of a parallel program relative to the amount of computation. In general, for a parallel computation, a lower C-to-C ratio is desirable and usually indicates better parallel performance.

Note: the parallel execution time is determined by the processor with the maximum execution time, where each processor's time includes its computation, communication, and other parallelization overheads.
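As a concrete illustration of the last two points, the following is a minimal sketch (not part of the original notes): it computes the parallel execution time as the maximum per-processor time and an overall C-to-C ratio. The number of processing elements and all timing values are made-up assumptions.

```c
#include <stdio.h>

#define P 4  /* assumed number of processing elements */

int main(void) {
    /* hypothetical per-processor times, in arbitrary time units */
    double compute[P] = {10.0, 12.0,  9.0, 11.0};
    double comm[P]    = { 2.0,  1.0,  3.0,  2.5};

    double par_time = 0.0, total_comp = 0.0, total_comm = 0.0;
    for (int i = 0; i < P; i++) {
        double t = compute[i] + comm[i];   /* this PE's busy time            */
        if (t > par_time) par_time = t;    /* slowest PE sets parallel time  */
        total_comp += compute[i];
        total_comm += comm[i];
    }

    printf("Parallel execution time = %.1f\n", par_time);
    printf("C-to-C ratio            = %.2f\n", total_comm / total_comp);
    return 0;
}
```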
Dependency Analysis & Conditions of Parallelism

Algorithm/program task dependencies (here considered down to task = instruction; a task only executes on the one processor to which it has been mapped or allocated):

- Data Dependence (algorithm related):
  - True Data or Flow Dependence
  - Name Dependence (parallel program and programming model related):
    - Anti-dependence
    - Output (or write) dependence
- Control Dependence (algorithm related)
- Hardware/Architecture Resource Dependence (parallel architecture related)
Conditions of Parallelism: Data & Name Dependence

Assume task S2 follows task S1 in sequential program order. As part of the algorithm/computation:

1. True Data or Flow Dependence: S2 is data- (flow-) dependent on S1 if an output (result) of S1 is an input (operand) of S2. Denoted S1 ⎯→ S2 in task dependency graphs.
2. Anti-dependence: S2 is anti-dependent on S1 if S2 writes to a name (register or memory location) that S1 reads. A name dependence, denoted S1 ⎯→ S2 (anti-dependence arc) in dependency graphs.
3. Output (write) dependence: S2 is output-dependent on S1 if both S1 and S2 write to the same name. A name dependence, denoted S1 ⎯→ S2 (output-dependence arc) in task dependency graphs.
Name Dependence Classification: Anti-Dependence

- Assume task S2 follows task S1 in sequential program order.
- Task S1 reads one or more values from one or more names (registers or memory locations).
- Task S2 writes one or more values to the same names (the same registers or memory locations read by S1).
- Then task S2 is said to be anti-dependent on task S1.
- Changing the relative execution order of tasks S1, S2 in the parallel program violates this name dependence and may result in incorrect execution.

Task dependency graph representation: S1 (read) ⎯→ S2 (write), an anti-dependence arc; the name is a register or memory location, e.g. shared memory locations in a shared address space (SAS).

Question (program related): does anti-dependence matter for message passing?
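To make the definition concrete, here is a small hedged C sketch (not from the slides): S1 reads x, S2 then writes x, so S2 is anti-dependent on S1, and reversing their order would change the result; renaming S2's destination removes the name dependence.

```c
#include <stdio.h>

int main(void) {
    int x = 5;
    int y, x_renamed;

    /* Program order: S1 then S2.
       S1 reads the name x; S2 writes the same name x,
       so S2 is anti-dependent (write-after-read) on S1.
       If S2 were moved before S1, y would become 11 instead of 6. */
    y = x + 1;          /* S1: reads x  */
    x = 10;             /* S2: writes x */

    /* Renaming removes the anti-dependence: the write now targets a new
       name, so S1 and the renamed S2 could execute in either order (or
       in parallel), with later code using x_renamed instead of x.      */
    x_renamed = 10;

    printf("y = %d, x = %d, x_renamed = %d\n", y, x, x_renamed);
    return 0;
}
```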
Name Dependence Classification: Output (or Write) Dependence

- Assume task S2 follows task S1 in sequential program order.
- Both tasks S1, S2 write to the same name or names (the same registers or memory locations).
- Then task S2 is said to be output-dependent on task S1.
- Changing the relative execution order of tasks S1, S2 in the parallel program violates this name dependence and may result in incorrect execution.

Task dependency graph representation: S1 (write) ⎯→ S2 (write), an output-dependence arc; the name is a register or memory location, e.g. shared memory locations in a shared address space (SAS).

Question (program related): does output dependence matter for message passing?
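A matching hedged C sketch (again, not from the slides) for output dependence: S1 and S2 both write the name x, so their relative order determines the value seen by any later read of x; renaming one of the writes removes the dependence.

```c
#include <stdio.h>

int main(void) {
    int x;
    int x2;   /* renamed destination for S2 */

    /* S1 and S2 both write the name x: S2 is output-dependent
       (write-after-write) on S1.  Swapping them would leave x = 1
       instead of x = 2 for any later reader of x.                 */
    x = 1;            /* S1: writes x */
    x = 2;            /* S2: writes x */

    /* With S2's destination renamed to x2, the two writes no longer
       share a name and may execute in any order or in parallel.    */
    x2 = 2;

    printf("x = %d, x2 = %d\n", x, x2);
    return 0;
}
```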
Dependency Graph Example

MIPS code (each instruction is treated as a task; the last source register of instructions 1, 2 and 4 is not given):

    1:  ADD.D  F2, F1, F…
    2:  ADD.D  F4, F2, F…
    3:  ADD.D  F2, F2, F4
    4:  ADD.D  F4, F2, F…

Task dependencies (the task dependency graph has one node per instruction, 1 to 4, and an arc for each dependence below):

    True data (flow) dependences: (1,2), (1,3), (2,3), (3,4), i.e. 1 ⎯→ 2, 1 ⎯→ 3, 2 ⎯→ 3, 3 ⎯→ 4
    Output dependences: (1,3), (2,4), i.e. 1 ⎯→ 3, 2 ⎯→ 4
    Anti-dependences: (2,3), (3,4), i.e. 2 ⎯→ 3, 3 ⎯→ 4
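This classification can be mechanized. Below is a hedged C sketch (not part of the lecture) that encodes each instruction's source and destination registers as bit sets and derives the flow, anti, and output dependences for the four-instruction example above. The registers F0, F3 and F6 are arbitrary stand-ins for the source operands not shown in the listing; they take part in no dependence. The per-register scan keeps only "last writer" and "readers since last write", so it reports exactly the direct dependences listed on the slide rather than transitive ones.

```c
#include <stdio.h>
#include <stdint.h>

#define N 4            /* number of instructions */
#define NREGS 32       /* F0..F31                */
#define RBIT(r) (1u << (r))

int main(void) {
    /* Read (source) and write (destination) register sets per instruction.
       F0, F3 and F6 are assumed stand-ins for the operands not shown.     */
    uint32_t rd[N] = { RBIT(1) | RBIT(0),    /* 1: ADD.D F2, F1, F0 */
                       RBIT(2) | RBIT(3),    /* 2: ADD.D F4, F2, F3 */
                       RBIT(2) | RBIT(4),    /* 3: ADD.D F2, F2, F4 */
                       RBIT(2) | RBIT(6) };  /* 4: ADD.D F4, F2, F6 */
    uint32_t wr[N] = { RBIT(2), RBIT(4), RBIT(2), RBIT(4) };

    for (int r = 0; r < NREGS; r++) {
        int last_writer = -1;       /* most recent writer of F<r>         */
        uint32_t readers = 0;       /* instructions reading F<r> since it */
        for (int j = 0; j < N; j++) {
            if (rd[j] & RBIT(r)) {                     /* j reads F<r>  */
                if (last_writer >= 0)
                    printf("true (flow) dependence: %d -> %d on F%d\n",
                           last_writer + 1, j + 1, r);
                readers |= 1u << j;
            }
            if (wr[j] & RBIT(r)) {                     /* j writes F<r> */
                for (int i = 0; i < j; i++)
                    if (readers & (1u << i))
                        printf("anti-dependence:        %d -> %d on F%d\n",
                               i + 1, j + 1, r);
                if (last_writer >= 0)
                    printf("output dependence:      %d -> %d on F%d\n",
                           last_writer + 1, j + 1, r);
                last_writer = j;
                readers = 0;
            }
        }
    }
    return 0;
}
```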
Dependency Graph Example (from 551)

MIPS code (each instruction is treated as a task):

    1:  L.D    F0, 0(R1)
    2:  ADD.D  F4, F0, F2
    3:  S.D    F4, 0(R1)
    4:  L.D    F0, -8(R1)
    5:  ADD.D  F4, F0, F2
    6:  S.D    F4, -8(R1)

Task dependencies (one graph node per instruction, 1 to 6):

    True data (flow) dependences: (1,2), (2,3), (4,5), (5,6), i.e. 1 ⎯→ 2, 2 ⎯→ 3, 4 ⎯→ 5, 5 ⎯→ 6
    Output dependences: (1,4), (2,5), i.e. 1 ⎯→ 4, 2 ⎯→ 5
    Anti-dependences: (2,4), (3,5), i.e. 2 ⎯→ 4, 3 ⎯→ 5

Questions:
- Can instruction 4 (the second L.D) be moved just after instruction 1 (the first L.D)? If not, what dependencies are violated?
- Can instruction 3 (the first S.D) be moved just after instruction 4 (the second L.D)? How about moving 3 after 5 (the second ADD.D)? If not, what dependencies are violated?
Conditions of Parallelism

- Control Dependence: the order of execution of tasks cannot be fully determined before run time because it depends on the outcome of control statements (e.g. conditional branches).
- Resource Dependence: tasks compete for shared hardware resources: functional units (integer, floating point), memory areas, communication links, etc.
- Bernstein's Conditions of Parallelism: two processes P1, P2 with input (read) sets I1, I2 and output (results produced) sets O1, O2 can execute in parallel (denoted by P1 || P2) if:

    I1 ∩ O2 = ∅
    I2 ∩ O1 = ∅
    O1 ∩ O2 = ∅

  The first two conditions mean there is no flow (data) dependence and no anti-dependence between P1 and P2 (which condition rules out which depends on the order of P1, P2); the third condition means there is no output dependence.
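As a hedged illustration (not part of the original notes), the sketch below encodes two processes' input and output sets as bit sets over a handful of variable names and tests the three conditions; the two example processes and their statements are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>

/* one bit per variable name; here: A=0, B=1, C=2, D=3 */
enum { A, B, C, D };
#define VAR(v) (1u << (v))

/* P1 || P2  iff  I1∩O2 = ∅,  I2∩O1 = ∅,  O1∩O2 = ∅ */
static int bernstein_parallel(uint32_t i1, uint32_t o1,
                              uint32_t i2, uint32_t o2) {
    return !(i1 & o2) && !(i2 & o1) && !(o1 & o2);
}

int main(void) {
    /* P1: C = A + B   (reads {A,B}, writes {C}) */
    uint32_t i1 = VAR(A) | VAR(B), o1 = VAR(C);
    /* P2: D = A * A   (reads {A},   writes {D}) */
    uint32_t i2 = VAR(A),          o2 = VAR(D);

    printf("P1 || P2 ? %s\n",
           bernstein_parallel(i1, o1, i2, o2) ? "yes" : "no");
    return 0;
}
```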
Bernstein's Conditions: An Example

For the following instructions P1, P2, P3, P4, P5:
- Each instruction requires one step to execute.
- Two adders are available.

    P1: C = D x E
    P2: M = G + C
    P3: A = B + C
    P4: C = L + M
    P5: F = G ÷ E

Using Bernstein's conditions after checking statement pairs:

    P1 || P5,  P2 || P3,  P2 || P5,  P3 || P5,  P4 || P5

Dependence graph: data dependences (solid lines), resource dependences (dashed lines).

Sequential execution takes five steps (one instruction per step). Parallel execution takes three steps, assuming two adders are available per step; expressed with parallel constructs: P1; Co-Begin P2, P3, P5 Co-End; P4.
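The pairwise check on this slide can be reproduced mechanically. The following hedged C sketch (not part of the notes) encodes the read and write sets of P1..P5 from the statements above and prints every pair satisfying Bernstein's conditions; it should report exactly the five parallel pairs listed.

```c
#include <stdio.h>
#include <stdint.h>

/* one bit per variable name appearing in P1..P5 */
enum { A, B, C, D, E, F, G, L, M };
#define VAR(v) (1u << (v))

int main(void) {
    /* P1: C = D x E,  P2: M = G + C,  P3: A = B + C,
       P4: C = L + M,  P5: F = G / E                   */
    uint32_t in[5]  = { VAR(D) | VAR(E), VAR(G) | VAR(C), VAR(B) | VAR(C),
                        VAR(L) | VAR(M), VAR(G) | VAR(E) };
    uint32_t out[5] = { VAR(C), VAR(M), VAR(A), VAR(C), VAR(F) };

    for (int i = 0; i < 5; i++)
        for (int j = i + 1; j < 5; j++) {
            /* Bernstein: Ii∩Oj = ∅, Ij∩Oi = ∅, Oi∩Oj = ∅ */
            if (!(in[i] & out[j]) && !(in[j] & out[i]) && !(out[i] & out[j]))
                printf("P%d || P%d\n", i + 1, j + 1);
        }
    return 0;
}
```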
Asymptotic Notations for Algorithm Analysis

- Asymptotic Lower Bound: Big Omega Notation, Ω
  Used in the analysis of the lower limit of algorithm performance.
  f(n) = Ω(g(n)) if there exist positive constants c, n0 such that |f(n)| ≥ c |g(n)| for all n > n0
  ⇒ i.e. g(n) is a lower bound on f(n)

- Asymptotic Tight Bound: Big Theta Notation, Θ (AKA tight bound)
  Used in finding a tight limit on algorithm performance.
  f(n) = Θ(g(n)) if there exist positive constants c1, c2, and n0 such that c1 |g(n)| ≤ |f(n)| ≤ c2 |g(n)| for all n > n0
  ⇒ i.e. g(n) is both an upper and a lower bound on f(n)
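A short worked example (not from the slide) showing how the Θ definition is applied to a concrete running time; the function and constants are chosen only for illustration.

```latex
% Show that f(n) = 3n^2 + 5n is Theta(n^2)
\[
  3n^2 \;\le\; 3n^2 + 5n \;\le\; 3n^2 + 5n^2 = 8n^2
  \qquad \text{for all } n > 1,
\]
\[
  \text{so with } c_1 = 3,\; c_2 = 8,\; n_0 = 1:\quad
  c_1\, n^2 \;\le\; f(n) \;\le\; c_2\, n^2
  \;\Rightarrow\; f(n) = \Theta(n^2).
\]
```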
Rate of Growth of Common Computing Time Functions

[Plot: growth of log n, n, n log n, n², 2ⁿ as functions of n]

O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ)
Theoretical Models of Parallel Computers
PRAM: An Idealized Shared-Memory Parallel Computer Model

- Parallel Random-Access Machine (PRAM):
  - p processor, global shared memory model.
  - Models idealized parallel shared-memory computers with zero synchronization, communication or memory access overhead.
  - Utilized in parallel algorithm development and in scalability and complexity analysis.
- PRAM variants (more realistic models than pure PRAM):
  - EREW-PRAM: simultaneous memory reads or writes to/from the same memory location are not allowed.
  - CREW-PRAM: simultaneous memory writes to the same location are not allowed. (Better to model SAS MIMD?)
  - ERCW-PRAM: simultaneous reads from the same memory location are not allowed.
  - CRCW-PRAM: concurrent reads or writes to/from the same memory location are allowed.

Why? Sometimes used to model SIMD since no memory is shared.
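The outline mentions a sum algorithm on a p-processor PRAM. As a hedged sketch (not the lecture's own code), the loop below simulates the usual O(log n) PRAM reduction on a single machine: in the step with stride s, each "processor" adds in the element a distance s away, halving the number of active positions each step. The array size and contents are arbitrary assumptions.

```c
#include <stdio.h>

#define N 8   /* n = number of elements, assumed a power of two */

int main(void) {
    int a[N] = {3, 1, 4, 1, 5, 9, 2, 6};   /* made-up input values */

    /* Simulated EREW-PRAM reduction: log2(N) synchronous steps.
       In the step with stride s, "processor" i (i a multiple of 2s)
       computes a[i] += a[i + s]; all such additions touch disjoint
       locations and would run in parallel on a real PRAM.           */
    for (int s = 1; s < N; s *= 2)
        for (int i = 0; i + s < N; i += 2 * s)
            a[i] += a[i + s];

    printf("sum = %d\n", a[0]);   /* 3+1+4+1+5+9+2+6 = 31 */
    return 0;
}
```

On an actual PRAM the inner loop would be performed by p = N/2 processors in one time step, giving the O(log n) parallel time that the complexity analysis in the outline refers to.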