CS203 Advanced Computer Architecture, Lecture notes of Computer Science

Typology: Lecture notes

2023/2024

Uploaded on 05/23/2025

fancycode 🇺🇸

Performance (2):
What can I change?
Hung-Wei Tseng
Recap: von Neumann architecture

[Figure: a processor (instruction fetch, instruction decode, program counter, registers, ALU, complex arithmetic operations (mul/div), branch/jump, and memory operations) connected to memory and storage. The program int main(){ printf("Hello, world!\n"); } is held in both storage and memory as instruction bytes (f30f1efa, 4883ec, 488d3d, ...) and data bytes (6c6c6f2c, 20776f, ...). The processor fetches the instruction bytes beginning 4883ec and decodes them as sub $0x8,%rsp.]

By loading different programs into memory, your computer can perform different functions

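Classic CPU Performance Equation (ET of a program)

$$\text{Execution Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} = IC \times CPI \times CT$$

  • Instructions/Program (IC): How many instruction "instances" for the program?
  • Cycles/Instruction (CPI): How long do we need for each instruction on average?
  • Seconds/Cycle (CT): the cycle time, i.e., 1 / clock frequency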

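C code:

    #include <stdint.h>
    #include <stdlib.h>

    int init_data(int64_t *data, int data_size) {
        register unsigned int i = 0;
        int s = 0;  /* accumulator; its declaration is missing on the slide */
        for (i = 0; i < data_size; i++) {
            s += data[i];
        }
        return s;
    }

    int main(int argc, char **argv) {
        int64_t *data = malloc(8000000000);
        init_data(data, 1000000000);
        return 0;
    }

x86 instructions (some label and mnemonic suffixes are truncated in this copy):

    init_data:
    .LFB16:
            endbr
            testl   %esi, %esi
            jle     .L
            leal    -1(%rsi), %ecx
            xorq    %rax, %rax
    .L3:
            movslq  (%rdi), %rdx
            addq    $4, %rdi
            addq    %rdx, %rax
            cmpq    %rcx, %rdi
            jne     .L
    .L2:
            xorq    %rax, %rax
            ret

The .L3 loop body runs 1000000000 times and contains 5 instructions (1 data memory access, 1 branch, 3 others), so IC ≈ 5 × 10^9.

If data memory access instructions take 5 cycles, branches take 2 cycles, all others take only 1 cycle, and CPU freq. = 4 GHz:

$$CPI_{average} = \underbrace{20\% \times 5}_{\text{memory inst.}} + \underbrace{20\% \times 2}_{\text{branch inst.}} + 60\% \times 1 = 2$$

$$ET = (5 \times 10^{9}) \times 2 \times \frac{1}{4 \times 10^{9}}\ \text{sec} = 2.5\ \text{sec}$$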

Recap: Is TFLOPS (Tera FLoating-point Operations Per Second) a good metric?

$$\text{TFLOPS} = \frac{\#\text{ of floating point instructions} \times 10^{-12}}{\text{Execution Time}} = \frac{IC \times \%\text{ of floating point instructions} \times 10^{-12}}{IC \times CPI \times CT} = \frac{\%\text{ of floating point instructions} \times 10^{-12}}{CPI \times CT}$$

IC is gone!

  • What if we have more iterations or larger datasets? That potentially changes the IC
  • What if the hardware trades (cheats) performance for accuracy?
  • Cannot compare different ISAs/compilers
    • What if the compiler can generate code with fewer instructions?
    • What if a new architecture has a higher IC but also a lower CPI?
  • What if floating point operations are not critical in the target application?

Outline

  • What can affect each factor in the classic CPU performance equation?
  • Programming languages
  • Programmers
  • Compilers
  • Complexity

What Affects Each Factor in Performance Equation

https://www.pollev.com/hungweitseng

Programming languages

  • Which of the following programming languages needs the highest instruction count to print "Hello, world!" on screen?

    A. C
    B. C++
    C. Java
    D. Perl
    E. Python

Use "performance counters" to figure out!

  • Modern processors provide performance counters
    • instruction counts
    • cache accesses/misses
    • branch instructions/mis-predictions
  • How to get their values?
    • You may use "perf stat" on Linux
    • You may use Instruments → Time Profiler on a Mac
    • Intel's VTune (for Intel processors; available on Windows and Linux)
    • You can also create your own functions to obtain counter values

Programming languages

  • How many instructions are there in "Hello, world!"?

    Language           Instruction count   LOC   Ranking
    C                  600k                6     1
    C++                3M                  6     2
    Java               ~145M               8     5
    Perl               ~12M                4     3
    Python             ~33M                1     4
    GO (Interpreter)   ~1200M              1     6
    GO (Compiled)      ~1.7M               1
    Rust               ~1.4M               1

Answer to the poll: C. Java. At ~145M instructions it has the highest instruction count of the five options (C, C++, Java, Perl, Python).

x86 ISA: the abstracted machine

[Figure: the machine abstraction exposed by the x86 ISA. CPU: 64-bit registers RAX, RBX, RCX, RDX, RSP, RBP, RSI, RDI, and R8 through R15, plus RIP, FLAGS, and the segment registers CS, SS, DS, ES, FS, GS. ALU and control operations: ADD, SUB, IMUL, AND, OR, XOR, JMP, JE, CALL, RET. Memory: 2^64 bytes, addressed with 64-bit addresses up to 0xFFFFFFFFFFFFFFFF. MOV moves data between registers and memory.]

Start with this simple program in C

    int A[] = {1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10};

    int main() {
        int i = 0, sum = 0;
        for (i = 0; i < 20; i++) {
            sum += A[i];
        }
        return 0;
    }

Instruction categories highlighted on the slide: memory access, logical operations, arithmetic operations, control flow operations.

Contents of section .text (some bytes are truncated in this copy):

    0000 f30f1efa 554889e5 c745f800 000000c
    0010 45fc0000 0000c745 f8000000 00eb1e8b
    0020 45f84898 488d1485 00000000 488d
    0030 0000008b 04020145 fc8345f8 01837df
    0040 137edcb8 00000000 5dc

Assembly (jump targets and the endbr suffix are truncated in this copy):

    main:
    .LFB0:
            endbr
            pushq   %rbp
            movq    %rsp, %rbp
            movl    $0, -8(%rbp)
            movl    $0, -4(%rbp)
            movl    $0, -8(%rbp)
            jmp     .L
    .L3:
            movl    -8(%rbp), %eax
            cltq
            leaq    0(,%rax,4), %rdx
            leaq    A(%rip), %rax
            movl    (%rdx,%rax), %eax
            addl    %eax, -4(%rbp)
            addl    $1, -8(%rbp)
    .L2:
            cmpl    $19, -8(%rbp)
            jle     .L
            movl    $0, %eax
            popq    %rbp
            ret

Contents of section .data:

    0000 01000000 02000000 03000000 04000000
    0010 05000000 06000000 07000000 08000000
    0020 09000000 0a000000 01000000 02000000
    0030 03000000 04000000 05000000 06000000
    0040 07000000 08000000 09000000 0a

Recap: How my "Java code" becomes a "program"

[Figure: Source code → compiler (e.g., javac) → Java bytecode (.class), a file of data and instructions that begins with the bytes cafebabe → Java Virtual Machine (e.g., java), which also loads other .class files. Compilation is a one-time cost; the Java Virtual Machine step is paid every time we run the program.]

What’s in your java classes?

    public static int fibonacci(int n) {
        if (n == 0)
            return 0;
        else if (n == 1)
            return 1;
        else
            return fibonacci(n - 1) + fibonacci(n - 2);
    }

Bytecode:
0: iload_
1: ifne 6
4: iconst_
5: ireturn
6: iload_
7: iconst_
8: if_icmpne 13
11: iconst_
12: ireturn
13: iload_
14: iconst_
15: isub
16: invokestatic
19: iload_
20: iconst_
21: isub
22: invokestatic
25: iadd
26: ireturn

The numbers on the left are labels (bytecode offsets, used as branch targets such as "ifne 6"). Most instructions don't have an argument!