CS203 Advanced Computer Architecture (lecture notes)
Performance (1):
What does “perfect” mean?
Hung-Wei Tseng
Recap: Processors and memory modules are everywhere!
[Figure: examples of processors and memory modules]
Recap: Demo
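// Optionally sort the data before scanning it.
if (option)
    std::sort(data, data + arraySize);
// Scan the array 1000 times, counting elements that are at least a random threshold.
for (unsigned c = 0; c < arraySize * 1000; ++c) {
    int t = std::rand();
    if (data[c % arraySize] >= t)
        sum++;
}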
If option is set to 1: O(n log₂ n), but faster!
Otherwise: O(n).
Recap: Demo (2)
A:
for (i = 0; i < ARRAY_SIZE; i++) {
    for (j = 0; j < ARRAY_SIZE; j++) {
        c[i][j] = a[i][j] + b[i][j];
    }
}
B:
for (j = 0; j < ARRAY_SIZE; j++) {
    for (i = 0; i < ARRAY_SIZE; i++) {
        c[i][j] = a[i][j] + b[i][j];
    }
}
Complexity: A is O(n^2); B is O(n^2).
Performance? A is a lot better; B is worse.
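A minimal sketch of how the two loop orders might be timed. The size N and the names addRowMajor, addColumnMajor, and timeMs are hypothetical (the slide does not give the value of ARRAY_SIZE). Both functions do the same O(n^2) work; only the traversal order differs. Because C and C++ store 2D arrays row by row, version A touches adjacent memory locations while version B jumps a full row between accesses.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical size; the slide does not give the value of ARRAY_SIZE.
constexpr std::size_t N = 512;

// A flat N x N matrix stored row by row, like a C 2D array.
using Matrix = std::vector<double>;

// Version A: i outer, j inner. Consecutive accesses are adjacent in memory.
void addRowMajor(const Matrix& a, const Matrix& b, Matrix& c) {
    assert(a.size() == N * N && b.size() == N * N && c.size() == N * N);
    for (std::size_t i = 0; i < N; i++)
        for (std::size_t j = 0; j < N; j++)
            c[i * N + j] = a[i * N + j] + b[i * N + j];
}

// Version B: j outer, i inner. Consecutive accesses are N elements apart.
void addColumnMajor(const Matrix& a, const Matrix& b, Matrix& c) {
    assert(a.size() == N * N && b.size() == N * N && c.size() == N * N);
    for (std::size_t j = 0; j < N; j++)
        for (std::size_t i = 0; i < N; i++)
            c[i * N + j] = a[i * N + j] + b[i * N + j];
}

// Wall-clock time of one call to f, in milliseconds.
template <typename F>
double timeMs(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeMs([&] { addRowMajor(a, b, c); }) and then the same with addColumnMajor gives two numbers to compare. The size of the gap depends on the machine, the value of N, and the compiler flags, so no specific ratio is claimed here.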
Outline
- Definition of “Performance”
- The classical CPU performance equation
- Other important metrics
What do we really mean by performance?
Gemini vs. ChatGPT
- Comparing the experiments we have done with Gemini and ChatGPT, on how many of the following metrics does Gemini outperform ChatGPT?
  - Response time
  - Throughput
  - End-to-end latency (i.e., total execution time)
  - Quality of results
  A. 0  B. 1  C. 2  D. 3  E. 4
Important performance metrics
- End-to-end latency: how much time the program/operation takes from beginning to end
- Response time: how long it takes before the user starts to feel the program is running or finishing
- Throughput/bandwidth: the average amount of work/data the program/system can deliver within the execution time
- Energy consumption: the power aggregated over the execution time
- Cost of operation: the amount of money necessary to finish an operation
- Quality of results: how humans perceive the execution result
- Power consumption: the rate at which the circuit consumes energy, which determines the heat it generates
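To make the relationships among these metrics concrete, here is a small sketch. The struct name RunStats, its fields, and the sample numbers in the usage note are hypothetical and not from the lecture. Energy is computed under the simplifying assumption that a single average power figure covers the whole run.

```cpp
#include <cassert>

// Hypothetical measurements for one run of a program.
struct RunStats {
    double latencySec;     // end-to-end latency: start to finish
    double workItems;      // amount of work or data delivered
    double avgPowerWatts;  // average power drawn during the run
};

// Throughput: average work delivered per unit of execution time.
double throughput(const RunStats& r) {
    assert(r.latencySec > 0.0);
    return r.workItems / r.latencySec;
}

// Energy: power accumulated over the execution time (joules = watts * seconds),
// assuming the average power figure covers the whole run.
double energyJoules(const RunStats& r) {
    return r.avgPowerWatts * r.latencySec;
}
```

For example, a hypothetical run RunStats r{2.0, 1000.0, 50.0} (2 seconds, 1000 items, 50 W) gives throughput(r) of 500 items per second and energyJoules(r) of 100 J. Note that halving latency at the same power halves energy, which is one reason latency is treated as the starting metric.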
Takeaways: What does “perfect” mean?
- Latency is the most fundamental performance metric
Let’s start with “end-to-end latency” as the default metric: how long does it take to execute a program?
Performance equation
- Consider the following C code snippet and the x86 instructions that implement it.
  C:
  for (i = 0; i < count; i++) {
      s += a[i];
  }
  x86:
  .L3:
      movslq (%rdi), %rdx
      addq   $4, %rdi
      addq   %rdx, %rax
      cmpq   %rcx, %rdi
      jne    .L
  If (1) count is set to 1,000,000,000, (2) a memory instruction takes 5 cycles, (3) a branch/jump instruction takes 2 cycles, (4) other instructions take 1 cycle on average, and (5) the processor runs at 4 GHz, how much time does it take to finish executing the code snippet?
  A. 0.5 sec  B. 1 sec  C. 2.5 sec  D. 3.75 sec  E. 4 sec
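One way to work the question above is to add up the cycles of a single loop iteration and scale by the iteration count. The sketch below does that under the question's stated assumptions. The constant and function names are hypothetical, and it counts only the five instructions in the loop body, ignoring any setup code before .L3.

```cpp
#include <cassert>

// Cycle costs stated in the question.
constexpr double kMemoryCycles = 5.0;  // movslq (%rdi), %rdx reads memory
constexpr double kBranchCycles = 2.0;  // jne
constexpr double kOtherCycles  = 1.0;  // addq, addq, cmpq

// Cycles for one iteration of the five-instruction loop:
// one memory instruction, three other instructions, one branch.
constexpr double cyclesPerIteration() {
    return 1 * kMemoryCycles + 3 * kOtherCycles + 1 * kBranchCycles;
}

// Execution time in seconds = total cycles / clock frequency in Hz.
constexpr double executionSeconds(double iterations, double clockHz) {
    return iterations * cyclesPerIteration() / clockHz;
}
```

With count = 1,000,000,000 and a 4 GHz clock, executionSeconds(1e9, 4e9) evaluates to 10 cycles per iteration times 10^9 iterations divided by 4 x 10^9 cycles per second, which is 2.5 seconds.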
Execution time of a program in the von Neumann model
[Figure: the von Neumann model with three parts: Processor, Memory, and Storage. The program int main(){ printf("Hello, world!\n"); } is shown as binary Instructions and Data held in Storage and in Memory. Processor components shown: Program Counter, Instruction Fetch, Instruction Decode, Registers, Arithmetic Logical Units (ALU), Complex Arithmetic Operations (Mul/div), Branch/Jump, Memory Operations. Example decoded instruction: sub $0x8,%rsp]
How long do we need for each instruction on average?
How many instruction “instances” does the program execute?
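These two questions are the ingredients of the classical CPU performance equation named in the outline. As a hedged worked instance, the derivation below applies it to the five-instruction loop and the cycle costs from the earlier question, ignoring any instructions outside the loop:

```latex
\begin{align*}
\text{Execution time} &= \text{Instruction count} \times \text{CPI} \times \text{Cycle time} \\
\text{Instruction count} &= 5 \times 10^{9} \\
\text{CPI} &= \frac{5 + 1 + 1 + 1 + 2}{5} = 2 \\
\text{Cycle time} &= \frac{1}{4\,\text{GHz}} = 0.25\,\text{ns} \\
\text{Execution time} &= 5 \times 10^{9} \times 2 \times 0.25\,\text{ns} = 2.5\,\text{s}
\end{align*}
```

Here the instruction count is the number of dynamic instruction “instances” (5 per iteration, 10^9 iterations), and CPI is the average number of cycles per instruction.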