




























































































Hennessy and Patterson 4th edition solutions
1.1.14 Program that translates symbolic instructions to binary instructions: (21) assembler
1.1.15 High-level language for business data processing: (25) Cobol
1.1.16 Binary language that the processor can understand: (19) machine language
1.1.17 Commands that the processors understand: (17) instruction
1.1.18 High-level language for scientific computation: (26) Fortran
1.1.19 Symbolic representation of machine instructions: (18) assembly language
1.1.20 Interface between user’s program and hardware providing a variety of services and supervision functions: (14) operating system
1.1.21 Software/programs developed by the users: (24) application software
1.1.22 Binary digit (value 0 or 1): (16) bit
1.1.23 Software layer between the application software and the hardware that includes the operating system and the compilers: (23) system software
1.1.24 High-level language used to write application and system software: (20) C
1.1.25 Portable language composed of words and algebraic expressions that must be translated into assembly language before run in a computer: (22) high-level language
1.1.26 10^12 or 2^40 bytes: (6) terabyte
Solution 1.
1.2.1 8 bits × 3 colors = 24 bits/pixel = 3 bytes/pixel.
a. Configuration 1: 640 × 480 pixels = 307,200 pixels => 307,200 × 3 = 921,600 bytes/frame
   Configuration 2: 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame
b. Configuration 1: 1024 × 768 pixels = 786,432 pixels => 786,432 × 3 = 2,359,296 bytes/frame
   Configuration 2: 2560 × 1600 pixels = 4,096,000 pixels => 4,096,000 × 3 = 12,288,000 bytes/frame
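The frame-size arithmetic above can be checked with a short script. This is a minimal sketch: the function name `frame_bytes` and its parameters are illustrative and not part of the original solution. It assumes 8 bits per color channel and 3 channels, as stated in 1.2.1.

```python
def frame_bytes(width, height, bits_per_channel=8, channels=3):
    """Bytes needed to store one frame at the given resolution."""
    # 8 bits x 3 colors = 24 bits/pixel = 3 bytes/pixel
    bytes_per_pixel = bits_per_channel * channels // 8
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    for w, h in [(640, 480), (1280, 1024), (1024, 768), (2560, 1600)]:
        print(f"{w} x {h}: {frame_bytes(w, h):,} bytes/frame")
```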
1.3.2 No. cycles = time × clock rate
time = (No. instr × CPI)/clock rate, then No. instructions = No. cycles/CPI
a. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
   cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
   cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
   No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
   No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
   No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
b. cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9
   cycles(P2) = 10 × 3 × 10^9 = 30 × 10^9
   cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
   No. instructions(P1) = 20 × 10^9/1.2 = 16.66 × 10^9
   No. instructions(P2) = 30 × 10^9/0.8 = 37.5 × 10^9
   No. instructions(P3) = 40 × 10^9/2 = 20 × 10^9
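The two relations used in 1.3.2 (cycles = time × clock rate, instructions = cycles/CPI) can be sketched as follows. The helper names are hypothetical; the inputs are the part (a) values for P1 shown above.

```python
def cycle_count(time_s, clock_hz):
    """Clock cycles elapsed in time_s seconds at clock_hz."""
    return time_s * clock_hz


def instruction_count(time_s, clock_hz, cpi):
    """Instructions executed, given run time, clock rate, and average CPI."""
    return cycle_count(time_s, clock_hz) / cpi


if __name__ == "__main__":
    # P1 in part (a): 10 s at 3 GHz with CPI 1.5
    print(cycle_count(10, 3e9))             # cycles
    print(instruction_count(10, 3e9, 1.5))  # instructions
```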
1.3.3 time_new = time_old × 0.7 = 7 s
a. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
   f = No. instr × CPI/time, then
   f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
   f(P2) = 25 × 10^9 × 1.2/7 = 4.28 GHz
   f(P3) = 18.18 × 10^9 × 2.6/7 = 6.75 GHz
b. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.44, CPI(P2) = 0.96, CPI(P3) = 2.4
   f = No. instr × CPI/time, then
   f(P1) = 16.66 × 10^9 × 1.44/7 = 3.42 GHz
   f(P2) = 37.5 × 10^9 × 0.96/7 = 5.14 GHz
   f(P3) = 20 × 10^9 × 2.4/7 = 6.85 GHz
1.3.4 IPC = 1/CPI = No. instr/(time × clock rate)
a. IPC(P1) = 0. IPC(P2) = 1. IPC(P3) = 2. b. IPC(P1) = 2 IPC(P2) = 1. IPC(P3) = 0.
1.3.
a. Time_new/Time_old = 7/10 = 0.7. So f_new = f_old/0.7 = 2.5 GHz/0.7 = 3.57 GHz.
b. Time_new/Time_old = 5/8 = 0.625. So f_new = f_old/0.625 = 4.8 GHz.
1.3.
a. Time_new/Time_old = 9/10 = 0.9. Then Instructions_new = Instructions_old × 0.9 = 30 × 10^9 × 0.9 = 27 × 10^9.
b. Time_new/Time_old = 7/8 = 0.875. Then Instructions_new = Instructions_old × 0.875 = 26.25 × 10^9.
Solution 1.
1.4.
Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr × CPI/clock rate
a. Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
   Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
b. Total time P1 = (10^5 × 2 + 2 × 10^5 × 1.5 + 5 × 10^5 × 2 + 2 × 10^5)/(2.5 × 10^9) = 6.8 × 10^-4 s
   Total time P2 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 + 2 × 10^5)/(3 × 10^9) = 4 × 10^-4 s
1.4.2 CPI = time × clock rate/No. instr
a. CPI(P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
   CPI(P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. CPI(P1) = 6.8 × 10^-4 × 2.5 × 10^9/10^6 = 1.7
   CPI(P2) = 4 × 10^-4 × 3 × 10^9/10^6 = 1.2
1.4.
a. clock cycles (P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
   clock cycles (P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5
b. clock cycles (P1) = 17 × 10^5
   clock cycles (P2) = 12 × 10^5
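The per-class calculations in 1.4.1 through 1.4.3 all follow from one weighted sum of instruction counts and CPIs. The sketch below uses hypothetical helper names. The instruction counts and the part (a) CPIs for P1 are read off the expressions above.

```python
def total_cycles(counts, cpis):
    """Sum of (instruction count x CPI) over all instruction classes."""
    return sum(n * c for n, c in zip(counts, cpis))


def exec_time(counts, cpis, clock_hz):
    """Execution time in seconds: clock cycles / clock rate."""
    return total_cycles(counts, cpis) / clock_hz


def average_cpi(counts, cpis):
    """Overall CPI: clock cycles / total instruction count."""
    return total_cycles(counts, cpis) / sum(counts)


if __name__ == "__main__":
    counts = [1e5, 2e5, 5e5, 2e5]   # classes A, B, C, D
    p1_cpis = [1, 2, 3, 3]          # P1, part (a)
    print(total_cycles(counts, p1_cpis))      # clock cycles
    print(exec_time(counts, p1_cpis, 2.5e9))  # seconds
    print(average_cpi(counts, p1_cpis))       # cycles per instruction
```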
1.4.
a. (650 × 1 + 100 × 5 + 600 × 5 + 50 × 2) × 0.5 × 10^-9 s = 2,125 ns
b. (750 × 1 + 250 × 5 + 500 × 5 + 500 × 2) × 0.5 × 10^-9 s = 2,750 ns
1.4.5 CPI = time × clock rate/No. instr
a. CPI = 2,125 × 10^-9 × 2 × 10^9/1,400 = 3.04
b. CPI = 2,750 × 10^-9 × 2 × 10^9/2,000 = 2.75
Solution 1.
1.6.1 CPI = Texec × f/No. Instr
     Compiler A CPI    Compiler B CPI
a.   1.8               1.
b.   1.1               1.
1.6.2 f_A/f_B = (No. Instr(A) × CPI(A))/(No. Instr(B) × CPI(B))
a. fA/fB = 1 b. fA/fB = 0.
1.6.
     Speedup vs. Compiler A    Speedup vs. Compiler B
a.   T_new/T_A = 0.36          T_new/T_B = 0.
b.   T_new/T_A = 0.6           T_new/T_B = 0.
1.6.
     P1 Peak              P2 Peak
a.   4 × 10^9 Inst/s      2 × 10^9 Inst/s
b.   4 × 10^9 Inst/s      3 × 10^9 Inst/s
1.6.5 Speedup, P1 versus P2:
a. T 1 /T 2 = 1. b. T 1 /T 2 = 1.
1.6.
a. 4.37 GHz b. 6 GHz
Solution 1.
1.7.
Geometric mean clock rate ratio = (1.28 × 1.56 × 2.64 × 3.03 × 10.00 × 1.80 × 0.74)^(1/7) = 2.15
Geometric mean power ratio = (1.24 × 1.20 × 2.06 × 2.88 × 2.59 × 1.37 × 0.92)^(1/7) = 1.62
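The geometric means above can be reproduced with a few lines of Python. This is only a sketch: `geometric_mean` is an illustrative helper name, and the ratios are copied from the two expressions above.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n positive ratios."""
    # Summing logs is equivalent to taking the root of the product
    # and avoids overflow for long lists.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    clock_ratios = [1.28, 1.56, 2.64, 3.03, 10.00, 1.80, 0.74]
    power_ratios = [1.24, 1.20, 2.06, 2.88, 2.59, 1.37, 0.92]
    print(round(geometric_mean(clock_ratios), 2))
    print(round(geometric_mean(power_ratios), 2))
```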
1.7. Largest clock rate ratio = 2000 MHz/200 MHz = 10 (Pentium Pro to Pentium 4 Willamette) Largest power ratio = 29.1 W/10.1 W = 2.88 (Pentium to Pentium Pro)
1.7. Clock rate: 2.667 × 10^9/(12.5 × 10^6) = 213.36
Power: 95 W/3.3 W = 28.79
1.7.4 C = P/(V^2 × clock rate)
80286: C = 0.0105 × 10^-6
80386: C = 0.01025 × 10^-6
80486: C = 0.00784 × 10^-6
Pentium: C = 0.00612 × 10^-6
Pentium Pro: C = 0.0133 × 10^-6
Pentium 4 Willamette: C = 0.0122 × 10^-6
Pentium 4 Prescott: C = 0.00183 × 10^-6
Core 2: C = 0.0294 × 10^-6
1.7.5 3.3/1.75 = 1.78 (Pentium Pro to Pentium 4 Willamette)
1.7. Pentium to Pentium Pro: 3.3/5 = 0.66
Pentium Pro to Pentium 4 Willamette: 1.75/3.3 = 0.53
Pentium 4 Willamette to Pentium 4 Prescott: 1.25/1.75 = 0.71
Pentium 4 Prescott to Core 2: 1.1/1.25 = 0.88
Geometric mean = 0.68
Solution 1.
1.8.1 Power = V^2 × clock rate × C. Power_2 = 0.9 × Power_1
a. C_2/C_1 = 0.9 × 1.75^2 × 1.5 × 10^9/(1.2^2 × 2 × 10^9) = 1.44
b. C_2/C_1 = 0.9 × 1.1^2 × 3 × 10^9/(0.8^2 × 4 × 10^9) = 1.28
1.8.2 Power_2/Power_1 = V_2^2 × clock rate_2/(V_1^2 × clock rate_1)
a. Power 2 /Power 1 = 0.62 => Reduction of 38% b. Power 2 /Power 1 = 0.7 => Reduction of 30%
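Both 1.8.1 and 1.8.2 rearrange the same dynamic power relation, Power = C × V^2 × clock rate. The sketch below uses hypothetical function and parameter names. The voltages and clock rates are the part (a) values that appear in the 1.8.1 expression.

```python
def capacitance_ratio(power_ratio, v_old, f_old, v_new, f_new):
    """C_new/C_old when Power_new = power_ratio * Power_old."""
    return power_ratio * (v_old ** 2 * f_old) / (v_new ** 2 * f_new)


def power_ratio_same_capacitance(v_old, f_old, v_new, f_new):
    """Power_new/Power_old when the capacitive load is unchanged."""
    return (v_new ** 2 * f_new) / (v_old ** 2 * f_old)


if __name__ == "__main__":
    # Part (a): 1.75 V at 1.5 GHz versus 1.2 V at 2 GHz
    print(capacitance_ratio(0.9, 1.75, 1.5e9, 1.2, 2e9))
    print(power_ratio_same_capacitance(1.75, 1.5e9, 1.2, 2e9))
```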
1.9.
a. Power_st/Power_dyn = 10/50 = 0.2
b. Power_st/Power_dyn = 60/90 = 0.67
1.9.4 Powerst/Powerdyn = 0.6 => Powerst = 0.6 × Powerdyn
a. Power_st = 0.6 × 35 W = 21 W
b. Power_st = 0.6 × 30 W = 18 W
1.9.
     1.2 V                              1.0 V                          0.8 V
a.   P_st = 12.5 W, P_dyn = 62.5 W      P_st = 10 W, P_dyn = 50 W      P_st = 5.8 W, P_dyn = 29.2 W
b.   P_st = 24.8 W, P_dyn = 37.2 W      P_st = 20 W, P_dyn = 30 W      P_st = 12 W, P_dyn = 18 W
1.9.
a. 29. b. 23.
Solution 1.
a.   Processors    Instructions per Processor    Total Instructions
     1             4096                          4096
     2             2048                          4096
     4             1024                          4096
     8             512                           4096

b.   Processors    Instructions per Processor    Total Instructions
     1             4096                          4096
     2             2048                          4096
     4             1024                          4096
     8             512                           4096
Then defect per area = (2/die area) × (y^(-1/2) − 1)
1.12.1 CPI = clock rate × CPU time/instr count
- Chapter 1 Solutions S
1.12.
(12.86 × 14.98)^(1/2) = 13.88
1.12.4 CPU time = No. instr × CPI/clock rate
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is, 10%.
1.12.5 CPU time(before) = No. instr × CPI/clock rate
CPU time(after) = 1.1 × No. instr × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155
Thus, CPU time is increased by 15.5%.
1.12.6 SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 1/1.155 = 0.866
Thus, the SPECratio is decreased by about 13.4%.
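The scaling argument in 1.12.4 through 1.12.6 can be sketched numerically. The function names are illustrative. The factors 1.1 (instruction count) and 1.05 (CPI) are the ones used in 1.12.5.

```python
def cpu_time_ratio(instr_factor=1.0, cpi_factor=1.0, clock_factor=1.0):
    """CPU time(after)/CPU time(before) from CPU time = instr x CPI / clock rate."""
    return instr_factor * cpi_factor / clock_factor


def specratio_ratio(time_ratio):
    """SPECratio(after)/SPECratio(before); SPECratio = reference time / CPU time."""
    return 1.0 / time_ratio


if __name__ == "__main__":
    r = cpu_time_ratio(instr_factor=1.1, cpi_factor=1.05)
    print(f"CPU time ratio: {r:.3f}")
    print(f"SPECratio ratio: {specratio_ratio(r):.3f}")
```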
Solution 1.
1.13.1 CPI = (CPU time × clock rate)/No. instr
a. CPI = 700 × 4 × 10^9/(0.85 × 2,389 × 10^9) = 1.37
b. CPI = 620 × 4 × 10^9/(0.85 × 1,658 × 10^9) = 1.75
1.13.2 Clock rate ratio = 4 GHz/3 GHz = 1.33
a. CPI @ 4 GHz = 1.37, CPI @ 3 GHz = 0.94, ratio = 1. b. CPI @ 4 GHz = 1.75, CPI @ 3 GHz = 1.26, ratio = 1.
They are different because although the number of instructions has been reduced by 15%, the CPU time has been reduced by a lower percentage.
1.13.
a. 700/750 = 0.933. CPU time reduction: 6.7% b. 620/700 = 0.886. CPU time reduction: 11.4%
1.13.4 No. instr = CPU time × clock rate/CPI
a. No. instr = 960 × 0.9 × 4 × 10^9 /1.61 = 2,146 × 10^9 b. No. instr = 690 × 0.9 × 4 × 10^9 /1.79 = 1,387 × 10^9
1.13.5 Clock rate = No. instr × CPI/CPU time.
Clock rate_new = No. instr × CPI/(0.9 × CPU time) = clock rate_old/0.9 = 3.33 GHz
1.14.
a. T(P1) = (5 × 10^5 × 0.75 + 4 × 10^5 × 1 + 10 × 10^5 × 1.5)/(4 × 10^9) = 5.68 × 10^-4 s
   CPI(P1) = 5.68 × 10^-4 × 4 × 10^9/10^6 = 2.27
   MIPS(P1) = 4 × 10^9/(2.27 × 10^6) = 1.76 × 10^3
   T(P2) = (2 × 10^6 × 1.25 + 2 × 10^6 × 0.8 + 1 × 10^6 × 1.25)/(3 × 10^9) = 1.78 × 10^-3 s
   CPI(P2) = 1.78 × 10^-3 × 3 × 10^9/(5 × 10^6) = 1.068
   MIPS(P2) = 3 × 10^9/(1.068 × 10^6) = 2.81 × 10^3
b. T(P1) = (1.5 × 10^6 × 1.5 + 1.5 × 10^6 × 1 + 2 × 10^6 × 2)/(4 × 10^9) = 1.93 × 10^-3 s
   CPI(P1) = 1.93 × 10^-3 × 4 × 10^9/(5 × 10^6) = 1.54
   MIPS(P1) = 4 × 10^9/(1.54 × 10^6) = 2.59 × 10^3
   T(P2) = (0.8 × 10^6 × 1.25 + 0.6 × 10^6 × 1 + 0.6 × 10^6 × 2.5)/(3 × 10^9) = 1.03 × 10^-3 s
   CPI(P2) = 1.03 × 10^-3 × 3 × 10^9/(2 × 10^6) = 1.54
   MIPS(P2) = 3 × 10^9/(1.54 × 10^6) = 1.94 × 10^3
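The MIPS figures above all come from MIPS = clock rate/(CPI × 10^6). A minimal sketch of that step follows. The name `mips` is illustrative, and the sample inputs are the clock rate and CPI values quoted in the lines above.

```python
def mips(clock_hz, cpi):
    """Millions of instructions per second for a given clock rate and CPI."""
    return clock_hz / (cpi * 1e6)


if __name__ == "__main__":
    print(f"{mips(4e9, 2.27):.0f}")   # 4 GHz, CPI 2.27
    print(f"{mips(3e9, 1.068):.0f}")  # 3 GHz, CPI 1.068
```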
1.14.
a. T(P1) = 5.68 × 10^-4 s (see problem 1.14.5)
   performance(P1) = 1/T(P1) = 1.7 × 10^3
   T(P2) = 1.78 × 10^-3 s (see problem 1.14.5)
   performance(P2) = 1/T(P2) = 5.6 × 10^2
   perf(P1) > perf(P2), MIPS(P1) > MIPS(P2), MFLOPS(P1) < MFLOPS(P2)
b. T(P1) = 1.93 × 10^-3 s (see problem 1.14.5)
   performance(P1) = 1/T(P1) = 5.1 × 10^2
   T(P2) = 1.03 × 10^-3 s (see problem 1.14.5)
   performance(P2) = 1/T(P2) = 9.7 × 10^2
   perf(P1) < perf(P2), MIPS(P1) < MIPS(P2), MFLOPS(P1) > MFLOPS(P2)
Solution 1. 1.15.
a. T_fp = 70 × 0.8 = 56 s. T_new = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
b. T_fp = 40 × 0.8 = 32 s. T_new = 32 + 90 + 60 + 20 = 202 s. Reduction: 3.8%
1.15.
a. T_new = 250 × 0.8 = 200 s, T_fp + T_l/s + T_branch = 165 s, T_int = 35 s. Reduction time INT: 58.8%
b. T_new = 210 × 0.8 = 168 s, T_fp + T_l/s + T_branch = 120 s, T_int = 48 s. Reduction time INT: 46.6%
1.15.
a. Tnew = 250 × 0.8 = 200 s, Tfp + Tint + Tl/s = 210 s. NO b. Tnew = 210 × 0.8 = 168 s, Tfp + Tint + Tl/s = 190 s. NO
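The three 1.15 sub-problems above share one pattern: total time is the sum of per-class times, and a target total fixes how much time one class may take. The sketch below uses hypothetical helper names. The per-class times (FP 70 s, INT 85 s, L/S 55 s, branch 40 s) are inferred from the part (a) sums above.

```python
def time_after_speedup(times, component, factor):
    """Total time after scaling one component's time by `factor`."""
    return sum(t * factor if name == component else t
               for name, t in times.items())


def required_component_time(times, component, target_total):
    """Time the component may take to reach target_total; negative means infeasible."""
    others = sum(t for name, t in times.items() if name != component)
    return target_total - others


if __name__ == "__main__":
    times_a = {"fp": 70, "int": 85, "ls": 55, "branch": 40}
    total = sum(times_a.values())
    target = 0.8 * total  # 20% reduction in total time
    print(time_after_speedup(times_a, "fp", 0.8))
    print(required_component_time(times_a, "int", target))
    print(required_component_time(times_a, "branch", target))  # negative: not possible
```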
1.15.
Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)
a. 2 processors: clock cycles = 4,096 × 10^6 ; Tcpu = 2.048 s b. 16 processors: clock cycles = 512 × 10^6 ; Tcpu = 0.256 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPI_improved fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved fp = (clock cycles/2 − (CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.))/No. FP instr.
a. 2 processors: CPIimproved fp = (2,048 – 3,816)/280 < 0 ==> not possible b. 16 processors: CPIimproved fp = (256 – 462)/50 < 0 ==> not possible
1.15.5 Using the clock cycle data from 1.15.4:
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved l/s = (clock cycles/2 − (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.
a. 2 processors: CPI_improved l/s = (2,048 – 1,536)/640 = 0.8
b. 16 processors: CPI_improved l/s = (256 – 198)/80 = 0.725
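The "halve the clock cycles" calculation used for both FP and L/S instructions can be sketched as one function. The name and the `None` return for infeasible cases are illustrative choices. The inputs are the figures quoted in 1.15.4 and 1.15.5, taken here to be in units of 10^6 cycles and 10^6 instructions (an assumption based on the clock cycle totals given in 1.15.4).

```python
def cpi_to_halve_cycles(total_cycles, other_cycles, n_instr):
    """CPI one instruction class needs so that total cycles drop by half.

    total_cycles: current total clock cycles
    other_cycles: cycles spent in all the other instruction classes
    n_instr:      instruction count of the class being improved
    Returns None when the other classes alone already exceed the target.
    """
    cpi = (total_cycles / 2 - other_cycles) / n_instr
    return cpi if cpi > 0 else None


if __name__ == "__main__":
    print(cpi_to_halve_cycles(4096, 3816, 280))  # FP, 2 processors
    print(cpi_to_halve_cycles(4096, 1536, 640))  # L/S, 2 processors
    print(cpi_to_halve_cycles(512, 198, 80))     # L/S, 16 processors
```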
1.15.
Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)
1.16.5 Geometric mean of computing time ratios = 0.62. Multiplying this by the computing time for a 64-processor system gives a computing time for a 128-processor system of 11.474 ms.
Geometric mean of routing time ratios = 1.19. Multiplying this by the routing time for a 64-processor system gives a routing time for a 128-processor system of 30.9 ms.
1.16.6 Computing time = 201/0.62 = 324 ms. Routing time = 0, since no communication is required.