Vector Architecture
VECTOR PROCESSOR
Contents
History of Vector Architecture
Definition of Vector Processor
Why Vector Processing?
Basic Vector Architecture
Design of Vector Architecture
Implementation
Examples
Advantages and Disadvantages
Conclusion
History (cont’d)
- When used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world.
- The ILLIAC approach of using separate ALUs for each data element is not common to later designs, and is often referred to under a separate category, massively parallel computing.
Super Computers
- First successful implementations of vector processing: the Control Data Corporation (CDC) STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC).
- The basic ASC (i.e., "one pipe") ALU used a pipeline architecture that supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain.
- The STAR was otherwise slower than CDC's own supercomputers like the CDC 7600.
Cray-1 (cont’d)
- These vector-specific registers provided for faster computations than requiring memory access would allow.
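- The Cray-1 also used a process called "vector chaining", in which the results of one vector instruction are fed directly into a dependent vector instruction as they are produced.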
- The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS – a respectable number even today.
Seymour Cray
The Father of Vector Processing and Supercomputing
Later Vector Processing
A number of companies attempted to follow up on the success of the Cray-1 machine, but none could really compete with Cray. Cray continued its dominance of the vector processing field with its Cray-2, Cray X-MP, and Cray Y-MP computers.
Vector Processor
- A vector processor, or array processor, is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors.
- Vector processors are special-purpose computers that match a range of scientific computing tasks. These tasks usually consist of large active data sets, often poor locality, and long run times.
- Vector processors provide vector instructions.
Why is This More Efficient?
- Because a single vector instruction does the work of an entire loop, the vector processor does not need to fetch and decode many instructions. Thus, memory bandwidth and control unit overhead are reduced considerably.
- Reduces branches and branch problems in pipelines
- The computation of each result in the vector is independent of the computation of other results in the same vector
- The vector machine performs mathematical operations on large vectors faster than the MIPS machine does.
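The fetch-and-decode saving can be made concrete with a rough count. The sketch below is only illustrative: the function names and the per-iteration instruction count are assumptions, not figures from any real instruction set.

```c
#include <assert.h>
#include <stddef.h>

/* Dynamic instruction fetches for a scalar loop over n elements.
   instrs_per_iter is an assumed, illustrative count (loads, add, store,
   index update, branch); it is not taken from a real ISA. */
size_t scalar_fetches(size_t n, size_t instrs_per_iter)
{
    return n * instrs_per_iter;
}

/* Fetches for the vector version: one fetch per vector instruction,
   independent of n, provided n fits in one vector register. */
size_t vector_fetches(size_t vector_instrs, size_t n, size_t max_vector_len)
{
    assert(n <= max_vector_len);
    return vector_instrs;
}
```

With an assumed 6 instructions per iteration and 64 elements, scalar_fetches(64, 6) gives 384 fetches, while a 4-instruction vector sequence needs only 4.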
Typical Vector Operations
- Add two vectors to produce a third
- Subtract two vectors to produce a third
- Multiply two vectors to produce a third
- Divide two vectors to produce a third
- Load a vector from memory
- Store a vector to memory
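As a rough software model, each arithmetic operation above is an element-wise loop over two input vectors. The sketch below assumes double-precision elements; vec_op and vec_binop are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The four element-wise arithmetic operations listed above. */
typedef enum { VADD, VSUB, VMUL, VDIV } vec_op;

/* dst[i] = a[i] op b[i] for every i in [0, n).
   Each result depends only on elements at the same index. */
void vec_binop(vec_op op, double *dst, const double *a, const double *b,
               size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (op) {
        case VADD: dst[i] = a[i] + b[i]; break;
        case VSUB: dst[i] = a[i] - b[i]; break;
        case VMUL: dst[i] = a[i] * b[i]; break;
        case VDIV: dst[i] = a[i] / b[i]; break;
        }
    }
}
```

A vector processor performs the whole loop as one instruction; the C loop only shows what that instruction computes.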
There are Two Specific Kinds of Machines
- Memory to Memory: operands are fetched from memory and passed on directly to the functional unit. The results are then written back out to memory to complete the process.
- Register to Register: operands are first loaded into a set of vector registers; the functional units then fetch operands from the vector registers and return results to a vector register.
Representation of Code in Two Formats
for (i=0; i<N; i++)
{
    C[i] = A[i] + B[i];
    D[i] = A[i] - B[i];
}
Example Source Code
ADDV C, A, B
SUBV D, A, B
Vector Memory-Memory Code
LV V1, A
LV V2, B
ADDV V3, V1, V2
SV V3, C
SUBV V4, V1, V2
SV V4, D
Vector Register Code
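The two code formats can be compared with a small software model that counts element-level memory traffic. This is a minimal sketch under stated assumptions: MVL, the vreg type, the counters, and every function name are hypothetical, and real machines differ in many details.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64 /* assumed maximum vector length (elements per register) */

typedef struct { double e[MVL]; } vreg; /* one vector register */

size_t mem_reads, mem_writes; /* element-level memory traffic counters */

/* LV: load n elements from memory into a vector register. */
void lv(vreg *v, const double *mem, size_t n)
{
    for (size_t i = 0; i < n; i++) v->e[i] = mem[i];
    mem_reads += n;
}

/* SV: store n elements from a vector register to memory. */
void sv(const vreg *v, double *mem, size_t n)
{
    for (size_t i = 0; i < n; i++) mem[i] = v->e[i];
    mem_writes += n;
}

/* Register-register ADDV and SUBV: no memory traffic at all. */
void addv(vreg *d, const vreg *a, const vreg *b, size_t n)
{
    for (size_t i = 0; i < n; i++) d->e[i] = a->e[i] + b->e[i];
}

void subv(vreg *d, const vreg *a, const vreg *b, size_t n)
{
    for (size_t i = 0; i < n; i++) d->e[i] = a->e[i] - b->e[i];
}

/* Memory-memory code: ADDV C, A, B then SUBV D, A, B.
   Each instruction streams both operands from memory. */
void run_memory_memory(double *c, double *d, const double *a,
                       const double *b, size_t n)
{
    mem_reads = mem_writes = 0;
    for (size_t i = 0; i < n; i++) c[i] = a[i] + b[i];
    mem_reads += 2 * n; mem_writes += n;
    for (size_t i = 0; i < n; i++) d[i] = a[i] - b[i];
    mem_reads += 2 * n; mem_writes += n;
}

/* Register code: A and B are loaded once and reused by both operations. */
void run_register(double *c, double *d, const double *a,
                  const double *b, size_t n)
{
    vreg v1, v2, v3, v4;
    assert(n <= MVL);
    mem_reads = mem_writes = 0;
    lv(&v1, a, n);
    lv(&v2, b, n);
    addv(&v3, &v1, &v2, n);
    sv(&v3, c, n);
    subv(&v4, &v1, &v2, n);
    sv(&v4, d, n);
}
```

In this model both versions produce the same C and D, but the memory-memory version reads 4N elements while the register version reads only 2N, because A and B stay in V1 and V2 between the add and the subtract.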
Components of Vector Architecture
Vector Register: fixed-length bank holding a single vector
- typically 8-32 vector registers, each holding 64-128 64-bit elements.
Vector Functional Units (FUs): fully pipelined, start new operation every clock
- typically 4 to 8 FUs: FP add, FP mul, FP reciprocal (1/X), integer add, logical shift.
Vector Load-Store Units (LSUs): fully pipelined unit to load or store a vector
Scalar Registers: single element for FP scalar or address
Cross-bar: to connect FUs, LSUs, registers
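"Fully pipelined, start new operation every clock" has a simple timing consequence, sketched below for an idealized unit. The pipeline depth is an assumed parameter and the function names are hypothetical; stalls, memory latency, and start-up overhead beyond the pipeline depth are ignored.

```c
#include <assert.h>

/* Cycles for one vector instruction of n elements on a fully pipelined
   unit: the first result appears after `depth` cycles, then one more
   result completes on every following clock. */
unsigned long pipelined_cycles(unsigned long depth, unsigned long n)
{
    assert(depth >= 1);
    return n == 0 ? 0 : depth + n - 1;
}

/* The same work on a unit that is not pipelined and is busy for all
   `depth` cycles of each element before starting the next one. */
unsigned long unpipelined_cycles(unsigned long depth, unsigned long n)
{
    assert(depth >= 1);
    return depth * n;
}
```

For an assumed 6-stage unit and a 64-element vector, the pipelined unit needs 69 cycles against 384 without pipelining, which is why long vectors approach one result per clock.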
Properties of Vector Processor
- A single vector instruction implies lots of work (an entire loop) => fewer instruction fetches
- The computation of each result in the vector is independent of the computation of other results in the same vector
- Hardware need only check for data hazards between two vector instructions once per vector operand, not once for every element within the vectors.
- Vector instructions that access memory have a known access pattern.
- Because an entire loop is replaced by a vector instruction whose behavior is predetermined, control hazards that would normally arise from the loop branch are nonexistent.
- Reduces branches and branch problems in pipelines