CS203 Advanced Computer Architecture, Lecture notes of Computer Science

Typology: Lecture notes

2023/2024

Uploaded on 05/23/2025

fancycode

Performance (3):
Do the right thing
Hung-Wei Tseng

Recap: von Neumann architecture

[Figure: a processor connected to memory and storage. The program int main(){ printf("Hello, world!\n"); } is held in memory as instruction bytes (f30f1efa 4883ec 488d3d and so on) and data bytes (including 02004865 6c6c6f2c, the ASCII codes for "Hello,"). The processor contains instruction fetch, instruction decode, the program counter, registers, arithmetic logic units (ALUs), complex arithmetic operations (mul/div), branch/jump, and memory operations. The instruction bytes beginning 4883ec are shown decoded as sub $0x8,%rsp.]

By loading different programs into memory, your computer can perform different functions.

  • If we turn on the “-O3” flag when using gcc to compile both code snippets A and B, how many of the following can we expect?

① Compiler optimizations can reduce IC for both

② Compiler optimizations can make the CPI lower for both

③ Compiler optimizations can make the ET lower for both

④ Compiler optimizations can transform code B into code A

A. 0

B. 1

C. 2

D. 3

E. 4

How compilers affect performance

Code A:

    for(i = 0; i < ARRAY_SIZE; i++) {
        for(j = 0; j < ARRAY_SIZE; j++) {
            c[i][j] = a[i][j]+b[i][j];
        }
    }

Code B:

    for(j = 0; j < ARRAY_SIZE; j++) {
        for(i = 0; i < ARRAY_SIZE; i++) {
            c[i][j] = a[i][j]+b[i][j];
        }
    }

  • The compiler can naively apply loop unrolling and constant propagation to reduce IC
  • Reduced IC does not necessarily mean lower CPI: the compiler may pick one longer instruction to replace a few shorter ones
  • The compiler cannot guarantee that the combined effects lead to better performance!
  • “Most compilers” will not significantly change the programmer’s code, since the compiler cannot guarantee that doing so would preserve correctness
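The difference between code A and code B is only the order in which the same elements are visited. The sketch below is a minimal illustration, assuming a small `ARRAY_SIZE` of 64 and `double` arrays (neither is given in the slides); the function names are hypothetical. It shows that both orders produce identical results, while code B's inner loop jumps `ARRAY_SIZE` elements between accesses because C stores two-dimensional arrays in row-major order.

```c
#include <assert.h>

#define ARRAY_SIZE 64 /* assumed size, chosen only for illustration */

static double a[ARRAY_SIZE][ARRAY_SIZE], b[ARRAY_SIZE][ARRAY_SIZE];
static double c_a[ARRAY_SIZE][ARRAY_SIZE], c_b[ARRAY_SIZE][ARRAY_SIZE];

/* Code A: the inner loop walks j, so consecutive accesses are adjacent in memory. */
static void add_code_a(void) {
    for (int i = 0; i < ARRAY_SIZE; i++)
        for (int j = 0; j < ARRAY_SIZE; j++)
            c_a[i][j] = a[i][j] + b[i][j];
}

/* Code B: the inner loop walks i, so consecutive accesses are ARRAY_SIZE elements apart. */
static void add_code_b(void) {
    for (int j = 0; j < ARRAY_SIZE; j++)
        for (int i = 0; i < ARRAY_SIZE; i++)
            c_b[i][j] = a[i][j] + b[i][j];
}

/* Fill the inputs, run both versions, and return 1 if the outputs are identical. */
static int same_result(void) {
    for (int i = 0; i < ARRAY_SIZE; i++)
        for (int j = 0; j < ARRAY_SIZE; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(2 * i);
        }
    add_code_a();
    add_code_b();
    for (int i = 0; i < ARRAY_SIZE; i++)
        for (int j = 0; j < ARRAY_SIZE; j++)
            if (c_a[i][j] != c_b[i][j])
                return 0;
    return 1;
}
```

Whether the two versions differ in execution time, and by how much, depends on the machine and the compiler; the sketch only establishes that they are functionally equivalent.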

Outline

  • What does better mean?
  • Amdahl’s Law and its implications

Quantitative Analysis of “Better”

Speedup

  • The relative performance between two machines, X and Y: Y is n times faster than X
  • The speedup of Y over X:

$$\text{Speedup} = n = \frac{\text{Execution Time}_X}{\text{Execution Time}_Y}$$
Speedup of Y over X

  • Consider the same program on the following two machines, X and Y. By how much is Y faster than X?

            Clock    Dynamic              Type-A Insts.    Type-B Insts.    Type-C Insts.
            Rate     Instruction Count    %       CPI      %       CPI      %       CPI
Machine X   4 GHz    5000000000           20%     5        20%     2        60%     1
Machine Y   6 GHz    5000000000           20%     7        20%     2        60%     1

A. 0. B. 0. C. 0. D. 1. E. No changes

$$ET_X = (5 \times 10^9) \times (20\% \times 5 + 20\% \times 2 + 60\% \times 1) \times \frac{1}{4 \times 10^9}\ \text{sec} = 2.5\ \text{sec}$$

$$ET_Y = (5 \times 10^9) \times (20\% \times 7 + 20\% \times 2 + 60\% \times 1) \times \frac{1}{6 \times 10^9}\ \text{sec} = 2\ \text{sec}$$

$$\text{Speedup} = \frac{\text{Execution Time}_X}{\text{Execution Time}_Y} = \frac{2.5}{2} = 1.25$$
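The calculation for Machine X and Machine Y can be sketched in C as ET = IC × CPI × cycle time, with the average CPI taken as the weighted sum over instruction types. The struct and function names are hypothetical; only the numbers come from the table in the question.

```c
#include <assert.h>

/* Hypothetical helper type: one instruction class in the dynamic mix. */
struct inst_class {
    double fraction; /* share of dynamic instructions, between 0 and 1 */
    double cpi;      /* cycles per instruction for this class */
};

/* Average CPI: sum of fraction * CPI over all instruction classes. */
static double average_cpi(const struct inst_class *mix, int n) {
    double cpi = 0.0;
    for (int k = 0; k < n; k++)
        cpi += mix[k].fraction * mix[k].cpi;
    return cpi;
}

/* ET = IC * CPI * cycle time, where cycle time = 1 / clock rate in Hz. */
static double execution_time(double ic, const struct inst_class *mix, int n,
                             double clock_hz) {
    assert(clock_hz > 0.0);
    return ic * average_cpi(mix, n) / clock_hz;
}

/* Speedup of Y over X, using the values from the question's table. */
static double speedup_y_over_x(void) {
    const struct inst_class x[] = {{0.20, 5.0}, {0.20, 2.0}, {0.60, 1.0}};
    const struct inst_class y[] = {{0.20, 7.0}, {0.20, 2.0}, {0.60, 1.0}};
    double et_x = execution_time(5e9, x, 3, 4e9); /* about 2.5 seconds */
    double et_y = execution_time(5e9, y, 3, 6e9); /* about 2 seconds */
    return et_x / et_y;
}
```

Machine Y has the higher clock rate but also a higher CPI for Type-A instructions, so the speedup is smaller than the 1.5x ratio of the clock rates.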

Amdahl’s Law and Its Implication in the Multicore Era

Mark D. Hill, University of Wisconsin-Madison

Michael R. Marty, Google

In IEEE Computer, vol. 41, no. 7

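Amdahl’s Law

$$\text{Speedup}_{enhanced}(f, s) = \frac{1}{(1 - f) + \frac{f}{s}}$$

  • f: the fraction of the original program’s execution time that the enhancement applies to
  • s: the speedup we can achieve on f

$$\text{Speedup}_{enhanced} = \frac{\text{Execution Time}_{baseline}}{\text{Execution Time}_{enhanced}}$$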
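As a minimal sketch, the formula translates directly into a C function. The names `amdahl_speedup` and `amdahl_limit` are hypothetical; the second function is an added illustration of the bound the formula approaches as s grows without limit, which is 1 / (1 − f).

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction f of the original
 * execution time is sped up by a factor s. */
static double amdahl_speedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound on the overall speedup as s grows without limit:
 * 1 / (1 - f). Requires f < 1. */
static double amdahl_limit(double f) {
    assert(f >= 0.0 && f < 1.0);
    return 1.0 / (1.0 - f);
}
```

For example, doubling the speed of half the program (f = 0.5, s = 2) gives 1 / (0.5 + 0.25), roughly 1.33x overall, and no value of s can push it past 2x.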

https://www.pollev.com/hungweitseng

Practicing Amdahl’s Law

  • Final Fantasy XV spends lots of time loading a map. During that period, 95% of the time is spent accessing the H.D.D.; the rest is spent in the operating system, the file system, and the I/O protocol. If we replace the H.D.D. with a flash drive that provides 100x faster access time, by how much can we speed up the map loading process?

A. ~7x
B. ~10x
C. ~17x
D. ~29x
E. ~100x

[Figure: Hard Disk Drive latency (us), axis from 0 to 8000, broken down into File System, Operating System, and HDD.]
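Plugging the numbers from this question into Amdahl’s Law, with f = 0.95 for the time spent on the H.D.D. and s = 100 for the flash drive, gives the following worked calculation:

```latex
\text{Speedup}_{enhanced}(0.95, 100)
  = \frac{1}{(1 - 0.95) + \frac{0.95}{100}}
  = \frac{1}{0.05 + 0.0095}
  = \frac{1}{0.0595}
  \approx 16.8
```

The remaining 5% of software time dominates after the change, which is why a 100x faster device yields only about 17x overall.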
Amdahl’s Law on Multiple Optimizations

  • We can apply Amdahl’s law for multiple optimizations
  • These optimizations must be disjoint!
    • If optimization #1 and optimization #2 are disjoint:

[Figure: execution time divided into fOpt1, fOpt2, and 1-fOpt1-fOpt2.]

$$\text{Speedup}_{enhanced}(f_{Opt1}, f_{Opt2}, s_{Opt1}, s_{Opt2}) = \frac{1}{(1 - f_{Opt1} - f_{Opt2}) + \frac{f_{Opt1}}{s_{Opt1}} + \frac{f_{Opt2}}{s_{Opt2}}}$$

    • If optimization #1 and optimization #2 are not disjoint:

[Figure: execution time divided into fOnlyOpt1, fOnlyOpt2, fBothOpt1Opt2, and 1-fOnlyOpt1-fOnlyOpt2-fBothOpt1Opt2.]

$$\text{Speedup}_{enhanced}(f_{OnlyOpt1}, f_{OnlyOpt2}, f_{BothOpt1Opt2}, s_{OnlyOpt1}, s_{OnlyOpt2}, s_{BothOpt1Opt2}) = \frac{1}{(1 - f_{OnlyOpt1} - f_{OnlyOpt2} - f_{BothOpt1Opt2}) + \frac{f_{BothOpt1Opt2}}{s_{BothOpt1Opt2}} + \frac{f_{OnlyOpt1}}{s_{OnlyOpt1}} + \frac{f_{OnlyOpt2}}{s_{OnlyOpt2}}}$$
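The multi-optimization form generalizes to any number of disjoint regions. The sketch below assumes the caller has already split execution time into non-overlapping fractions; `amdahl_multi` and `amdahl_two` are hypothetical names, not from the slides.

```c
#include <assert.h>

/* Amdahl's Law for n disjoint optimizations.
 * f[k]: fraction of the original execution time covered by optimization k
 * s[k]: speedup achieved on that fraction */
static double amdahl_multi(const double *f, const double *s, int n) {
    double untouched = 1.0; /* fraction no optimization applies to */
    double enhanced = 0.0;  /* total time of the optimized fractions */
    for (int k = 0; k < n; k++) {
        assert(f[k] >= 0.0 && s[k] > 0.0);
        untouched -= f[k];
        enhanced += f[k] / s[k];
    }
    assert(untouched > -1e-12); /* fractions must not sum past 1 */
    return 1.0 / (untouched + enhanced);
}

/* Convenience wrapper for the two-optimization disjoint case. */
static double amdahl_two(double f1, double s1, double f2, double s2) {
    const double f[] = {f1, f2};
    const double s[] = {s1, s2};
    return amdahl_multi(f, s, 2);
}
```

The non-disjoint case fits the same function: treat the "only optimization #1", "only optimization #2", and "both" regions as three disjoint fractions and call `amdahl_multi` with n = 3.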

https://www.pollev.com/hungweitseng

Speedup further!

  • With the latest flash memory technologies, the system spends 16% of the time accessing the flash, and the software overhead is now 84%. If your company asks you and your team to invent a new memory technology that replaces flash to achieve a 2x speedup on loading maps, how much faster does the new technology need to be?

A. ~5x
B. ~10x
C. ~20x
D. ~100x
E. None of the above

[Figure: Flash SSD latency (us), axis from 0 to 50, broken down into File System, Operating System, and Hardware.]
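Questions of this kind invert Amdahl’s Law: given f and a target overall speedup, solve for s. Rearranging 1 / target = (1 − f) + f / s gives s = f / (1 / target − (1 − f)). The sketch below uses a hypothetical function name and an assumed convention of returning a negative value when the target cannot be reached.

```c
#include <assert.h>

/* Solve Amdahl's Law for s: the speedup needed on fraction f to reach an
 * overall speedup of `target`. Returns a negative value when the target is
 * unattainable, that is, when target >= 1 / (1 - f). */
static double required_speedup(double f, double target) {
    assert(f > 0.0 && f <= 1.0);
    assert(target >= 1.0);
    /* Share of the original time the enhanced part may still take. */
    double budget = 1.0 / target - (1.0 - f);
    if (budget <= 0.0)
        return -1.0; /* even an infinitely fast part would not be enough */
    return f / budget;
}
```

With f = 0.16, the untouched 84% alone caps the overall speedup at 1 / 0.84, about 1.19x, so `required_speedup(0.16, 2.0)` reports the 2x target as unattainable no matter how fast the new memory technology is.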