Vector Architecture - Principles of Computer Architecture - Lecture Slides, Slides of Advanced Computer Architecture

In this short course we study the basic concepts and principles of computer architecture. In these lecture slides the key points are: Vector Architecture, Vector Processor, Design of Vector Architecture, Vector Processing Development, Solomon Project, Massively Parallel Computing, Supercomputers, Control Data Corporation, Vector Registers.


Vector Architecture

VECTOR PROCESSOR

Contents

 History of Vector Architecture

 Definition of Vector Processor

 Why Vector Processing?

 Basic Vector Architecture

 Design of Vector Architecture

 Implementation

 Examples

 Advantages and Disadvantages

 Conclusion

History (cont’d)

  • When used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world.
  • The ILLIAC approach of using a separate ALU for each data element is not common in later designs, and is often referred to under a separate category, massively parallel computing.

Supercomputers

  • The first successful implementations of vector processing were the Control Data Corporation (CDC) STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC).
  • The basic ASC (i.e., "one pipe") ALU used a pipeline architecture that supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain.
  • The STAR was otherwise slower than CDC's own supercomputers, such as the CDC 7600.

Cray-1 (cont’d)

  • These vector-specific registers provided for faster computation than repeated memory accesses would allow.
  • The Cray-1 also used a technique called "vector chaining", in which the results of one vector operation are forwarded directly to a dependent vector operation before the first completes.
  • The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS – a respectable number even today.

Seymour Cray

The Father of Vector Processing and Supercomputing

Later Vector Processing

 A number of companies attempted to follow up on the success of the Cray-1, but none could really compete with Cray.

 Cray continued its dominance of the vector processing field with its Cray-2, Cray X-MP, and Cray Y-MP computers.

Vector Processor

  • A vector processor, or array processor, is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors.
  • Vector processors are special-purpose computers suited to a range of scientific computing tasks. These tasks usually involve large active data sets, often with poor locality, and long run times.
  • Vector processors provide vector instructions.

Why is This More Efficient?

  • Because only one instruction is needed, the vector processor does not have to fetch and decode many instructions. Thus, memory bandwidth and control unit overhead are reduced considerably.
  • Reduces branches and branch problems in pipelines.
  • The computation of each result in the vector is independent of the computation of other results in the same vector.
  • The vector machine is faster at performing mathematical operations on long vectors than a scalar machine such as MIPS.

Typical Vector Operations

  • Add two vectors to produce a third
  • Subtract two vectors to produce a third
  • Multiply two vectors to produce a third
  • Divide two vectors to produce a third
  • Load a vector from memory
  • Store a vector to memory

There are Two Specific Kinds of Machines

  • Memory to Memory: operands are fetched from memory and passed directly to the functional unit; the results are then written back out to memory to complete the process.
  • Register to Register: operands are loaded into a set of vector registers, the functional units fetch the operands from the vector registers, and the results are returned to a vector register.

Representation of Code in Two Formats

for (i = 0; i < N; i++)
{
    C[i] = A[i] + B[i];
    D[i] = A[i] - B[i];
}

Example Source Code

ADDV C, A, B
SUBV D, A, B

Vector Memory-Memory Code

LV   V1, A
LV   V2, B
ADDV V3, V1, V2
SV   V3, C
SUBV V4, V1, V2
SV   V4, D

Vector Register Code

Components of Vector Architecture

Vector Register: a fixed-length bank holding a single vector

  • typically 8-32 vector registers, each holding 64-128 64-bit elements.

Vector Functional Units (FUs): fully pipelined, start a new operation every clock

  • typically 4 to 8 FUs: FP add, FP multiply, FP reciprocal (1/X), integer add, logical shift.

Vector Load-Store Units (LSUs): fully pipelined units to load or store a vector

Scalar Registers: single element for an FP scalar or an address

Cross-bar: connects the FUs, LSUs, and registers

Properties of Vector Processor

  • A single vector instruction implies lots of work (an entire loop) => fewer instruction fetches
  • The computation of each result in the vector is independent of the computation of other results in the same vector
  • Hardware need only check for data hazards between two vector instructions once per vector operand, not once for every element within the vectors.
  • Vector instructions that access memory have a known access pattern.
  • Because an entire loop is replaced by a vector instruction whose behavior is predetermined, control hazards that would normally arise from the loop branch are nonexistent.
  • Reduces branches and branch problems in pipelines.