
















































CS203 Advanced Computer Architecture
Typology: Lecture notes
Recap: von Neumann architecture

[Figure: a processor, memory, and storage. The same program bytes (instructions and data) appear in both storage and memory. Inside the processor: program counter, instruction fetch, instruction decode, registers, arithmetic logical units (ALU), complex arithmetic operations (mul/div), branch/jump, and memory operations. Example instruction from the figure: sub $0x8,%rsp]

    int main(){ printf("Hello, world!\n"); }

f30f1efa 4883ec 488d3d 0f0000e dcffffff 31c c408c30f
08400000 00000100 02004865 6c6c6f2c 20776f 6c 00000000 00000000

By loading different programs into memory, your computer can perform different functions.
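The stored-program idea in this recap can be sketched with a toy machine. Everything in it is an assumption made for illustration: the four opcodes, the two-byte instruction format, and the single accumulator register are hypothetical and do not correspond to x86 or to any real ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-instruction ISA, invented only for this sketch. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADDI = 2, OP_MULI = 3 };

/*
 * Runs the program stored in `memory` and returns the accumulator.
 * Each instruction is assumed to occupy two bytes: opcode, then an
 * immediate operand.
 */
int32_t run_program(const uint8_t *memory, size_t size)
{
    assert(memory != NULL);

    size_t pc = 0;    /* program counter */
    int32_t acc = 0;  /* the machine's only register */

    while (pc + 1 < size) {
        uint8_t opcode = memory[pc];       /* instruction fetch */
        uint8_t operand = memory[pc + 1];
        pc += 2;                           /* advance to the next instruction */

        switch (opcode) {                  /* instruction decode and execute */
        case OP_LOADI: acc = operand;  break;
        case OP_ADDI:  acc += operand; break;
        case OP_MULI:  acc *= operand; break;
        case OP_HALT:
        default:       return acc;         /* stop on halt or unknown opcode */
        }
    }
    return acc;
}
```

With memory holding {OP_LOADI, 5, OP_ADDI, 3, OP_HALT, 0} the machine computes 5 + 3; with {OP_LOADI, 5, OP_MULI, 3, OP_HALT, 0} the same loop computes 5 × 3. Only the bytes in memory differ.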
How compilers affect performance

Compilers can apply optimizations such as loop unrolling and constant propagation naively to reduce the instruction count (IC).

Version A:

    for(i = 0; i < ARRAY_SIZE; i++) {
        for(j = 0; j < ARRAY_SIZE; j++) {
            c[i][j] = a[i][j] + b[i][j];
        }
    }

Version B:

    for(j = 0; j < ARRAY_SIZE; j++) {
        for(i = 0; i < ARRAY_SIZE; i++) {
            c[i][j] = a[i][j] + b[i][j];
        }
    }

A reduced IC does not necessarily mean a lower CPI: the compiler may pick one longer instruction to replace a few shorter ones.
The compiler cannot guarantee that the combined effects lead to better performance!
"Most compilers" will not significantly change the programmer's code, since the compiler cannot guarantee that doing so preserves correctness.
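The two loop orders above can be wrapped into functions to confirm that they compute the same result while visiting memory in a different order. This is a minimal sketch: the names `add_version_a`, `add_version_b`, and `versions_agree`, the `double` element type, and the value of `ARRAY_SIZE` are assumptions, not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE 64  /* assumed size, chosen only for illustration */

/* Version A: the inner loop walks j, so consecutive accesses touch
 * adjacent elements (C stores 2D arrays row by row). */
void add_version_a(double c[ARRAY_SIZE][ARRAY_SIZE],
                   double a[ARRAY_SIZE][ARRAY_SIZE],
                   double b[ARRAY_SIZE][ARRAY_SIZE])
{
    assert(c != NULL && a != NULL && b != NULL);
    for (int i = 0; i < ARRAY_SIZE; i++) {
        for (int j = 0; j < ARRAY_SIZE; j++) {
            c[i][j] = a[i][j] + b[i][j];
        }
    }
}

/* Version B: the inner loop walks i, so consecutive accesses are
 * ARRAY_SIZE elements apart. */
void add_version_b(double c[ARRAY_SIZE][ARRAY_SIZE],
                   double a[ARRAY_SIZE][ARRAY_SIZE],
                   double b[ARRAY_SIZE][ARRAY_SIZE])
{
    assert(c != NULL && a != NULL && b != NULL);
    for (int j = 0; j < ARRAY_SIZE; j++) {
        for (int i = 0; i < ARRAY_SIZE; i++) {
            c[i][j] = a[i][j] + b[i][j];
        }
    }
}

/* Returns 1 if both versions produce identical output for the inputs. */
int versions_agree(double a[ARRAY_SIZE][ARRAY_SIZE],
                   double b[ARRAY_SIZE][ARRAY_SIZE])
{
    static double ca[ARRAY_SIZE][ARRAY_SIZE];
    static double cb[ARRAY_SIZE][ARRAY_SIZE];

    add_version_a(ca, a, b);
    add_version_b(cb, a, b);
    for (int i = 0; i < ARRAY_SIZE; i++) {
        for (int j = 0; j < ARRAY_SIZE; j++) {
            if (ca[i][j] != cb[i][j]) {
                return 0;
            }
        }
    }
    return 1;
}
```

Both functions execute the same number of additions, so any difference in run time between them would come from how the accesses behave in the memory system rather than from the amount of arithmetic.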
Speedup of Y over X

| Machine | Clock Rate | Dynamic Instruction Count | Percentage of Type-A Insts. | CPI of Type-A Insts. | Percentage of Type-B Insts. | CPI of Type-B Insts. | Percentage of Type-C Insts. | CPI of Type-C Insts. |
|---|---|---|---|---|---|---|---|---|
| Machine X | 4 GHz | 5000000000 | 20% | 5 | 20% | 2 | 60% | 1 |
| Machine Y | 6 GHz | 5000000000 | 20% | 7 | 20% | 2 | 60% | 1 |

A. 0. B. 0. C. 0. D. 1. E. No changes

$$ET_X = (5 \times 10^9) \times (20\% \times 5 + 20\% \times 2 + 60\% \times 1) \times \frac{1}{4 \times 10^9}\ \text{sec} = 2.5\ \text{sec}$$

$$ET_Y = (5 \times 10^9) \times (20\% \times 7 + 20\% \times 2 + 60\% \times 1) \times \frac{1}{6 \times 10^9}\ \text{sec} = 2\ \text{sec}$$

$$\text{Speedup} = \frac{\text{Execution Time}_X}{\text{Execution Time}_Y} = \frac{2.5}{2} = 1.25$$
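The Machine X and Machine Y calculation can be checked with a few lines of C. This is a sketch under stated assumptions: the struct `inst_class` and the functions `average_cpi`, `execution_time`, and `speedup` are hypothetical names introduced here; only the numbers come from the table.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction class: its share of the dynamic instruction count
 * (0.20 for 20%) and its cycles per instruction. */
struct inst_class {
    double fraction;
    double cpi;
};

/* Average CPI = sum over classes of fraction_i * CPI_i. */
double average_cpi(const struct inst_class *mix, size_t n)
{
    double cpi = 0.0;
    for (size_t i = 0; i < n; i++) {
        cpi += mix[i].fraction * mix[i].cpi;
    }
    return cpi;
}

/* Execution time in seconds = IC * CPI * (1 / clock rate in Hz). */
double execution_time(double inst_count, const struct inst_class *mix,
                      size_t n, double clock_hz)
{
    assert(clock_hz > 0.0);
    return inst_count * average_cpi(mix, n) / clock_hz;
}

/* Speedup of the enhanced machine over the baseline machine. */
double speedup(double et_baseline, double et_enhanced)
{
    assert(et_enhanced > 0.0);
    return et_baseline / et_enhanced;
}
```

Calling `execution_time(5e9, mix, 3, 4e9)` with the mix {0.20, 5}, {0.20, 2}, {0.60, 1} reproduces the 2.5 sec for Machine X. Changing the first CPI to 7 and the clock to 6e9 reproduces the 2 sec for Machine Y. Machine Y has the higher clock rate but also a higher average CPI, so the speedup is smaller than the 1.5x ratio of the clock rates.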
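Amdahl's Law

$$\text{Speedup}_{enhanced}(f, s) = \frac{1}{(1 - f) + \frac{f}{s}}$$

f: the fraction of execution time in the original program that the enhancement applies to
s: the speedup we can achieve on f

$$\text{Speedup}_{enhanced} = \frac{\text{Execution Time}_{baseline}}{\text{Execution Time}_{enhanced}}$$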
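Amdahl's Law translates directly into a one-line function. The sketch below assumes the hypothetical name `amdahl_speedup`; the values in the note that follows are made up for illustration and do not come from the slides.

```c
#include <assert.h>

/*
 * Amdahl's Law.
 * f: fraction of the original execution time that the enhancement affects
 * s: speedup achieved on that fraction
 */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For example, with f = 0.5 and s = 2 the result is 1 / (0.5 + 0.25), roughly 1.33. However large s becomes, the result stays below 1 / (1 - f), which is 2 in this example, because the untouched half of the program still takes the same time.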
https://www.pollev.com/hungweitseng
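For two optimizations that apply to disjoint fractions of the program:

$$\text{Speedup}_{enhanced}(f_{Opt1}, f_{Opt2}, s_{Opt1}, s_{Opt2}) = \frac{1}{(1 - f_{Opt1} - f_{Opt2}) + \frac{f_{Opt1}}{s_{Opt1}} + \frac{f_{Opt2}}{s_{Opt2}}}$$

For two optimizations whose fractions can overlap:

$$\text{Speedup}_{enhanced}(f_{OnlyOpt1}, f_{OnlyOpt2}, f_{BothOpt1Opt2}, s_{OnlyOpt1}, s_{OnlyOpt2}, s_{BothOpt1Opt2}) = \frac{1}{(1 - f_{OnlyOpt1} - f_{OnlyOpt2} - f_{BothOpt1Opt2}) + \frac{f_{BothOpt1Opt2}}{s_{BothOpt1Opt2}} + \frac{f_{OnlyOpt1}}{s_{OnlyOpt1}} + \frac{f_{OnlyOpt2}}{s_{OnlyOpt2}}}$$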
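The two-optimization form of Amdahl's Law can be sketched as a single function in which the program is split into four parts: untouched, only Opt1, only Opt2, and both. The function name `amdahl_two_opts` and its parameter names are assumptions chosen to mirror the symbols above; the numbers in the note are hypothetical.

```c
#include <assert.h>

/*
 * Amdahl's Law for two optimizations.
 * f_only1, f_only2: fractions of original time affected by exactly one
 *                   of the optimizations
 * f_both:           fraction affected by both optimizations
 * s_*:              speedup achieved on the matching fraction
 * Passing f_both = 0 gives the disjoint two-optimization case.
 */
double amdahl_two_opts(double f_only1, double s_only1,
                       double f_only2, double s_only2,
                       double f_both, double s_both)
{
    double untouched = 1.0 - f_only1 - f_only2 - f_both;

    /* Small tolerance so fractions that sum to 1 survive rounding. */
    assert(untouched > -1e-12);
    assert(s_only1 > 0.0 && s_only2 > 0.0 && s_both > 0.0);

    return 1.0 / (untouched
                  + f_only1 / s_only1
                  + f_only2 / s_only2
                  + f_both / s_both);
}
```

With the made-up values f_only1 = 0.2 (s = 2), f_only2 = 0.3 (s = 3), and f_both = 0.1 (s = 5), the denominator is 0.4 + 0.1 + 0.1 + 0.02 = 0.62, so the overall speedup is about 1.61.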