Hennesey and Patterson 4th edition solutions, Study Guides, Projects, Research of Advanced Computer Architecture

Typology: Study Guides, Projects, Research

2017/2018

Uploaded on 04/25/2018

niket-agrawal
1 Solutions

Solution 1.1

1.1.1 Computer used to run large problems and usually accessed via a network: (3) servers

1.1.2 10^15 or 2^50 bytes: (7) petabyte

1.1.3 A class of computers composed of hundreds to thousands of processors and terabytes of memory and having the highest performance and cost: (5) supercomputers

1.1.4 Today’s science fiction application that probably will be available in the near future: (1) virtual worlds

1.1.5 A kind of memory called random access memory: (12) RAM

1.1.6 Part of a computer called central processor unit: (13) CPU

1.1.7 Thousands of processors forming a large cluster: (8) data centers

1.1.8 Microprocessors containing several processors in the same chip: (10) multicore processors

1.1.9 Desktop computer without a screen or keyboard usually accessed via a network: (4) low-end servers

1.1.10 A computer used to run one predetermined application or collection of software: (9) embedded computers

1.1.11 Special language used to describe hardware components: (11) VHDL

1.1.12 Personal computer delivering good performance to single users at low cost: (2) desktop computers

1.1.13 Program that translates statements in high-level language to assembly language: (15) compiler

S2 Chapter 1 Solutions

1.1.14 Program that translates symbolic instructions to binary instructions: (21) assembler

1.1.15 High-level language for business data processing: (25) Cobol

1.1.16 Binary language that the processor can understand: (19) machine language

1.1.17 Commands that the processors understand: (17) instruction

1.1.18 High-level language for scientific computation: (26) Fortran

1.1.19 Symbolic representation of machine instructions: (18) assembly language

1.1.20 Interface between user’s program and hardware providing a variety of services and supervision functions: (14) operating system

1.1.21 Software/programs developed by the users: (24) application software

1.1.22 Binary digit (value 0 or 1): (16) bit

1.1.23 Software layer between the application software and the hardware that includes the operating system and the compilers: (23) system software

1.1.24 High-level language used to write application and system software: (20) C

1.1.25 Portable language composed of words and algebraic expressions that must be translated into assembly language before being run on a computer: (22) high-level language

1.1.26 10^12 or 2^40 bytes: (6) terabyte

Solution 1.2

1.2.1 8 bits × 3 colors = 24 bits/pixel = 3 bytes/pixel.

a. Configuration 1: 640 × 480 pixels = 307,200 pixels => 307,200 × 3 = 921,600 bytes/frame
   Configuration 2: 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame
b. Configuration 1: 1024 × 768 pixels = 786,432 pixels => 786,432 × 3 = 2,359,296 bytes/frame
   Configuration 2: 2560 × 1600 pixels = 4,096,000 pixels => 4,096,000 × 3 = 12,288,000 bytes/frame
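The frame-size arithmetic in 1.2.1 can be checked with a short sketch. The function name `frame_bytes` and its parameters are illustrative assumptions, not part of the original solution; the resolutions are the four configurations listed above.

```python
def frame_bytes(width, height, bits_per_color=8, colors=3):
    """Bytes needed to store one frame (assumed helper, not from the text)."""
    bits_per_pixel = bits_per_color * colors  # 8 bits x 3 colors = 24 bits
    return width * height * bits_per_pixel // 8

# Resolutions of the four configurations in 1.2.1
for w, h in [(640, 480), (1280, 1024), (1024, 768), (2560, 1600)]:
    print(f"{w} x {h}: {frame_bytes(w, h):,} bytes/frame")
```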


1.3.2 No. cycles = time × clock rate
time = (No. instr × CPI)/clock rate, then No. instructions = No. cycles/CPI

a. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
   cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
   cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
   No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
   No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
   No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
b. cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9
   cycles(P2) = 10 × 3 × 10^9 = 30 × 10^9
   cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
   No. instructions(P1) = 20 × 10^9/1.2 = 16.66 × 10^9
   No. instructions(P2) = 30 × 10^9/0.8 = 37.5 × 10^9
   No. instructions(P3) = 40 × 10^9/2 = 20 × 10^9

1.3.3 time_new = time_old × 0.7 = 7 s

a. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
   f = No. instr × CPI/time, then
   f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
   f(P2) = 25 × 10^9 × 1.2/7 = 4.28 GHz
   f(P3) = 18.18 × 10^9 × 2.6/7 = 6.75 GHz
b. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.44, CPI(P2) = 0.96, CPI(P3) = 2.4
   f = No. instr × CPI/time, then
   f(P1) = 16.66 × 10^9 × 1.44/7 = 3.42 GHz
   f(P2) = 37.5 × 10^9 × 0.96/7 = 5.14 GHz
   f(P3) = 20 × 10^9 × 2.4/7 = 6.85 GHz
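As a rough cross-check of 1.3.2 and 1.3.3, the sketch below recomputes the instruction counts and the clock rate needed for a 30% shorter run time with a 20% higher CPI. The clock rates and CPIs are the part a values used in the formulas above; the dictionary layout and function names are assumptions made for illustration.

```python
# (clock rate in Hz, CPI) for part a, with a 10 s run time as in 1.3.2
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
TIME_OLD = 10.0  # seconds

def cycles(time_s, clock_hz):
    """No. cycles = time x clock rate."""
    return time_s * clock_hz

def instructions(time_s, clock_hz, cpi):
    """No. instructions = No. cycles / CPI."""
    return cycles(time_s, clock_hz) / cpi

def required_clock(n_instr, cpi_new, time_new):
    """f = No. instr x CPI / time."""
    return n_instr * cpi_new / time_new

for name, (clock, cpi) in processors.items():
    n = instructions(TIME_OLD, clock, cpi)
    f_new = required_clock(n, cpi * 1.2, TIME_OLD * 0.7)
    print(f"{name}: {n:.3e} instructions, new clock {f_new / 1e9:.2f} GHz")
```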

1.3.4 IPC = 1/CPI = No. instr/(time × clock rate)

a. IPC(P1) = 0.
   IPC(P2) = 1.
   IPC(P3) = 2.
b. IPC(P1) = 2
   IPC(P2) = 1.
   IPC(P3) = 0.

1.3.5

a. Time_new/Time_old = 7/10 = 0.7. So f_new = f_old/0.7 = 2.5 GHz/0.7 = 3.57 GHz.
b. Time_new/Time_old = 5/8 = 0.625. So f_new = f_old/0.625 = 4.8 GHz.


1.3.

a. Time_new/Time_old = 9/10 = 0.9. Then Instructions_new = Instructions_old × 0.9 = 30 × 10^9 × 0.9 = 27 × 10^9.
b. Time_new/Time_old = 7/8 = 0.875. Then Instructions_new = Instructions_old × 0.875 = 26.25 × 10^9.

Solution 1.4

1.4.1

Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.

Time = No. instr × CPI/clock rate

a. Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
   Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
b. Total time P1 = (10^5 × 2 + 2 × 10^5 × 1.5 + 5 × 10^5 × 2 + 2 × 10^5)/(2.5 × 10^9) = 6.8 × 10^-4 s
   Total time P2 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 + 2 × 10^5)/(3 × 10^9) = 4 × 10^-4 s

1.4.2 CPI = time × clock rate/No. instr

a. CPI (P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
   CPI (P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. CPI (P1) = 6.8 × 10^-4 × 2.5 × 10^9/10^6 = 1.7
   CPI (P2) = 4 × 10^-4 × 3 × 10^9/10^6 = 1.2

1.4.3

a. clock cycles (P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
   clock cycles (P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5
b. clock cycles (P1) = 17 × 10^5
   clock cycles (P2) = 12 × 10^5

1.4.4

a. (650 × 1 + 100 × 5 + 600 × 5 + 50 × 2) × 0.5 × 10^-9 s = 2,125 ns
b. (750 × 1 + 250 × 5 + 500 × 5 + 500 × 2) × 0.5 × 10^-9 s = 2,750 ns

1.4.5 CPI = time × clock rate/No. instr

a. CPI = 2,125 × 10^-9 × 2 × 10^9/1,400 = 3.04
b. CPI = 2,750 × 10^-9 × 2 × 10^9/2,000 = 2.75
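The execution time and average CPI of an instruction mix follow directly from the per-class counts and CPIs. The sketch below uses the part a mix of 1.4.4 and a 0.5 ns cycle time (a 2 GHz clock); the list-of-pairs representation and the function names are assumptions for illustration.

```python
CYCLE_TIME_NS = 0.5  # 2 GHz clock, as in 1.4.4

# (instruction count, CPI) per instruction class, part a of 1.4.4
mix_a = [(650, 1), (100, 5), (600, 5), (50, 2)]

def total_cycles(mix):
    """Sum of count x CPI over all instruction classes."""
    return sum(count * cpi for count, cpi in mix)

def exec_time_ns(mix, cycle_time_ns=CYCLE_TIME_NS):
    """Execution time = clock cycles x cycle time."""
    return total_cycles(mix) * cycle_time_ns

def average_cpi(mix):
    """Average CPI = clock cycles / instruction count."""
    return total_cycles(mix) / sum(count for count, _ in mix)

print(exec_time_ns(mix_a), "ns; average CPI", round(average_cpi(mix_a), 2))
```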


Solution 1.6

1.6.1 CPI = T_exec × f/No. instr

     Compiler A CPI    Compiler B CPI
a.   1.8               1.
b.   1.1               1.

1.6.2 f_A/f_B = (No. instr(A) × CPI(A))/(No. instr(B) × CPI(B))

a. f_A/f_B = 1
b. f_A/f_B = 0.

1.6.3

     Speedup vs. Compiler A    Speedup vs. Compiler B
a.   T_new/T_A = 0.36          T_new/T_B = 0.
b.   T_new/T_A = 0.6           T_new/T_B = 0.

1.6.4

     P1 Peak              P2 Peak
a.   4 × 10^9 Inst/s      2 × 10^9 Inst/s
b.   4 × 10^9 Inst/s      3 × 10^9 Inst/s

1.6.5 Speedup, P1 versus P2:

a. T_1/T_2 = 1.
b. T_1/T_2 = 1.

1.6.6

a. 4.37 GHz
b. 6 GHz

Solution 1.7

1.7.1

Geometric mean clock rate ratio = (1.28 × 1.56 × 2.64 × 3.03 × 10.00 × 1.80 × 0.74)^(1/7) = 2.15

Geometric mean power ratio = (1.24 × 1.20 × 2.06 × 2.88 × 2.59 × 1.37 × 0.92)^(1/7) = 1.62
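The two geometric means in 1.7.1 can be reproduced with a few lines. This is a minimal sketch: the function name `geometric_mean` is an assumption, and the ratio lists are copied from the expressions above.

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))

# Generation-to-generation ratios as listed in 1.7.1
clock_ratios = [1.28, 1.56, 2.64, 3.03, 10.00, 1.80, 0.74]
power_ratios = [1.24, 1.20, 2.06, 2.88, 2.59, 1.37, 0.92]

print(f"clock rate: {geometric_mean(clock_ratios):.2f}")
print(f"power:      {geometric_mean(power_ratios):.2f}")
```

`math.prod` requires Python 3.8 or later.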


1.7.2 Largest clock rate ratio = 2000 MHz/200 MHz = 10 (Pentium Pro to Pentium 4 Willamette)
Largest power ratio = 29.1 W/10.1 W = 2.88 (Pentium to Pentium Pro)

1.7.3 Clock rate: 2.667 × 10^9/(12.5 × 10^6) = 213.36
Power: 95 W/3.3 W = 28.79

1.7.4 C = P/(V^2 × clock rate)

80286: C = 0.0105 × 10^-6
80386: C = 0.01025 × 10^-6
80486: C = 0.00784 × 10^-6
Pentium: C = 0.00612 × 10^-6
Pentium Pro: C = 0.0133 × 10^-6
Pentium 4 Willamette: C = 0.0122 × 10^-6
Pentium 4 Prescott: C = 0.00183 × 10^-6
Core 2: C = 0.0294 × 10^-6

1.7.5 3.3/1.75 = 1.89 (Pentium Pro to Pentium 4 Willamette)

1.7.6

Pentium to Pentium Pro: 3.3/5 = 0.66
Pentium Pro to Pentium 4 Willamette: 1.75/3.3 = 0.53
Pentium 4 Willamette to Pentium 4 Prescott: 1.25/1.75 = 0.71
Pentium 4 Prescott to Core 2: 1.1/1.25 = 0.88
Geometric mean = 0.68

Solution 1.8

1.8.1 Power = V^2 × clock rate × C. Power_2 = 0.9 × Power_1

a. C_2/C_1 = 0.9 × 1.75^2 × 1.5 × 10^9/(1.2^2 × 2 × 10^9) ≈ 1.44
b. C_2/C_1 = 0.9 × 1.1^2 × 3 × 10^9/(0.8^2 × 4 × 10^9) ≈ 1.28

1.8.2 Power_2/Power_1 = V_2^2 × clock rate_2/(V_1^2 × clock rate_1)

a. Power_2/Power_1 = 0.62 => Reduction of 38%
b. Power_2/Power_1 = 0.7 => Reduction of 30%
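Both ratios in Solution 1.8 come from the same dynamic power relation, Power = V^2 × clock rate × C. The sketch below is an illustration under that relation only; the function names are assumptions, and the voltages and clock rates are the part a values from the expressions above.

```python
def power_ratio(v1, f1, v2, f2, cap_ratio=1.0):
    """Power_2/Power_1 for Power = V^2 x clock rate x C (cap_ratio = C_2/C_1)."""
    return cap_ratio * (v2 ** 2 * f2) / (v1 ** 2 * f1)

def capacitance_ratio(target_power_ratio, v1, f1, v2, f2):
    """C_2/C_1 needed so that Power_2/Power_1 equals target_power_ratio."""
    return target_power_ratio * (v1 ** 2 * f1) / (v2 ** 2 * f2)

# Part a: 1.75 V at 1.5 GHz versus 1.2 V at 2 GHz
print(round(capacitance_ratio(0.9, 1.75, 1.5e9, 1.2, 2e9), 3))  # 1.8.1 a
print(round(power_ratio(1.75, 1.5e9, 1.2, 2e9), 3))             # 1.8.2 a
```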


1.9.

a. Power_st/Power_dyn = 10/50 = 0.2
b. Power_st/Power_dyn = 60/90 = 0.67

1.9.4 Power_st/Power_dyn = 0.6 => Power_st = 0.6 × Power_dyn

a. Power_st = 0.6 × 35 W = 21 W
b. Power_st = 0.6 × 30 W = 18 W

1.9.5

     1.2 V              1.0 V            0.8 V
a.   P_st = 12.5 W      P_st = 10 W      P_st = 5.8 W
     P_dyn = 62.5 W     P_dyn = 50 W     P_dyn = 29.2 W
b.   P_st = 24.8 W      P_st = 20 W      P_st = 12 W
     P_dyn = 37.2 W     P_dyn = 30 W     P_dyn = 18 W

1.9.

a. 29. b. 23.

Solution 1.

a.   Processors   Instructions per Processor   Total Instructions
     1            4096                         4096
     2            2048                         4096
     4            1024                         4096
     8            512                          4096

b.   Processors   Instructions per Processor   Total Instructions
     1            4096                         4096
     2            2048                         4096
     4            1024                         4096
     8            512                          4096

1.10.

a.   Processors   Execution Time (μs)
     1            4.
     2            2.
     4            1.
     8            1.

b.   Processors   Execution Time (μs)
     1            4.
     2            2.
     4            1.
     8            0.

1.10.

a.   Processors   Execution Time (μs)
     1            5.
     2            3.
     4            1.
     8            1.

b.   Processors   Execution Time (μs)
     1            5.
     2            3.
     4            1.
     8            1.

1.10.

a.   Cores   Execution Time (s) @ 3 GHz
     1       4.
     2       2.
     4       1.
     8       1.

b.   Cores   Execution Time (s) @ 3 GHz
     1       3.
     2       2.
     4       1.
     8       0.

Yield = 1/(1 + (defect per area × die area)/2)^2

a. Yield = 0.
b. Yield = 0.

1.11.2 Cost per die = cost per wafer/(dies per wafer × yield)

a. Cost per die = 0.
b. Cost per die = 0.

1.11.3

a. Dies per wafer = 1.1 × 84 = 92
   Defects per area = 1.15 × 0.02 = 0.023 defects/cm^2
   Die area = wafer area/dies per wafer = 176.7/92 = 1.92 cm^2
   Yield = 0.
b. Dies per wafer = 1.1 × 100 = 110
   Defects per area = 1.15 × 0.031 = 0.036 defects/cm^2
   Die area = wafer area/dies per wafer = 314.2/110 = 2.86 cm^2
   Yield = 0.

1.11.4 Yield = 1/(1 + (defect per area × die area)/2)^2

Then defect per area = (2/die area)(y^(−1/2) − 1)

Replacing values for T1 and T2 we get:

T1: defects per area = 0.00085 defects/mm^2 = 0.085 defects/cm^2
T2: defects per area = 0.00060 defects/mm^2 = 0.060 defects/cm^2
T3: defects per area = 0.00043 defects/mm^2 = 0.043 defects/cm^2
T4: defects per area = 0.00026 defects/mm^2 = 0.026 defects/cm^2

1.11.5 no solution provided

Solution 1.12

1.12.1 CPI = clock rate × CPU time/instr count

clock rate = 1/cycle time = 3 GHz

a. CPI(bzip2) = 3 × 10^9 × 750/(2,389 × 10^9) = 0.94
b. CPI(go) = 3 × 10^9 × 700/(1,658 × 10^9) = 1.26

1.12.2 SPECratio = ref. time/execution time

a. SPECratio(bzip2) = 9,650/750 = 12.86
b. SPECratio(go) = 10,490/700 = 14.98


1.12.3

(12.86 × 14.98)^(1/2) = 13.88

1.12.4 CPU time = No. instr × CPI/clock rate

If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is, 10%.

1.12.5 CPU time(before) = No. instr × CPI/clock rate
CPU time(after) = 1.1 × No. instr × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155

Thus, CPU time is increased by 15.5%.

1.12.6 SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 1/1.155 ≈ 0.866

Thus, the SPECratio is decreased by about 13.4%.
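The scaling argument of 1.12.5 and 1.12.6 does not depend on the absolute values, so it can be sketched with everything normalized to 1. The function names and the normalization are assumptions for illustration only.

```python
def cpu_time(n_instr, cpi, clock_rate):
    """CPU time = No. instr x CPI / clock rate."""
    return n_instr * cpi / clock_rate

def spec_ratio(reference_time, cpu_time_s):
    """SPECratio = reference time / CPU time."""
    return reference_time / cpu_time_s

# Normalized baseline: 1 instruction, CPI 1, clock rate 1 (assumed units)
before = cpu_time(1.0, 1.0, 1.0)
after = cpu_time(1.1, 1.05, 1.0)  # 10% more instructions, 5% higher CPI

time_ratio = after / before
spec_change = spec_ratio(1.0, after) / spec_ratio(1.0, before)
print(f"CPU time ratio: {time_ratio:.3f}, SPECratio ratio: {spec_change:.3f}")
```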

Solution 1.13

1.13.1 CPI = (CPU time × clock rate)/No. instr

a. CPI = 700 × 4 × 10^9/(0.85 × 2,389 × 10^9) = 1.37
b. CPI = 620 × 4 × 10^9/(0.85 × 1,658 × 10^9) = 1.75

1.13.2 Clock rate ratio = 4 GHz/3 GHz = 1.33

a. CPI @ 4 GHz = 1.37, CPI @ 3 GHz = 0.94, ratio ≈ 1.46
b. CPI @ 4 GHz = 1.75, CPI @ 3 GHz = 1.26, ratio ≈ 1.39

They are different because although the number of instructions has been reduced by 15%, the CPU time has been reduced by a lower percentage.

1.13.3

a. 700/750 = 0.933. CPU time reduction: 6.7%
b. 620/700 = 0.886. CPU time reduction: 11.4%

1.13.4 No. instr = CPU time × clock rate/CPI

a. No. instr = 960 × 0.9 × 4 × 10^9/1.61 = 2,146 × 10^9
b. No. instr = 690 × 0.9 × 4 × 10^9/1.79 = 1,387 × 10^9

1.13.5 Clock rate = No. instr × CPI/CPU time.
Clock rate_new = No. instr × CPI/(0.9 × CPU time) = (1/0.9) × clock rate_old = 3.33 GHz


1.14.5

a. T(P1) = (5 × 10^5 × 0.75 + 4 × 10^5 × 1 + 10 × 10^5 × 1.5)/(4 × 10^9) = 5.69 × 10^-4 s
   CPI(P1) = 5.69 × 10^-4 × 4 × 10^9/10^6 ≈ 2.27
   MIPS(P1) = 4 × 10^9/(2.27 × 10^6) = 1.76 × 10^3
   T(P2) = (2 × 10^6 × 1.25 + 2 × 10^6 × 0.8 + 1 × 10^6 × 1.25)/(3 × 10^9) = 1.78 × 10^-3 s
   CPI(P2) = 1.78 × 10^-3 × 3 × 10^9/(5 × 10^6) = 1.068
   MIPS(P2) = 3 × 10^9/(1.068 × 10^6) = 2.81 × 10^3
b. T(P1) = (1.5 × 10^6 × 1.5 + 1.5 × 10^6 × 1 + 2 × 10^6 × 2)/(4 × 10^9) = 1.93 × 10^-3 s
   CPI(P1) = 1.93 × 10^-3 × 4 × 10^9/(5 × 10^6) = 1.54
   MIPS(P1) = 4 × 10^9/(1.54 × 10^6) = 2.59 × 10^3
   T(P2) = (0.8 × 10^6 × 1.25 + 0.6 × 10^6 × 1 + 0.6 × 10^6 × 2.5)/(3 × 10^9) = 1.03 × 10^-3 s
   CPI(P2) = 1.03 × 10^-3 × 3 × 10^9/(2 × 10^6) = 1.54
   MIPS(P2) = 3 × 10^9/(1.54 × 10^6) = 1.94 × 10^3

1.14.6

a. T(P1) = 5.69 × 10^-4 s (see problem 1.14.5)
   performance(P1) = 1/T(P1) = 1.76 × 10^3
   T(P2) = 1.78 × 10^-3 s (see problem 1.14.5)
   performance(P2) = 1/T(P2) = 5.6 × 10^2
   perf(P1) > perf(P2), MIPS(P1) > MIPS(P2), MFLOPS(P1) < MFLOPS(P2)
b. T(P1) = 1.93 × 10^-3 s (see problem 1.14.5)
   performance(P1) = 1/T(P1) = 5.1 × 10^2
   T(P2) = 1.03 × 10^-3 s (see problem 1.14.5)
   performance(P2) = 1/T(P2) = 9.7 × 10^2
   perf(P1) < perf(P2), MIPS(P1) < MIPS(P2), MFLOPS(P1) > MFLOPS(P2)

Solution 1.15

1.15.1

a. T_fp = 70 × 0.8 = 56 s. T_new = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
b. T_fp = 40 × 0.8 = 32 s. T_new = 32 + 90 + 60 + 20 = 202 s. Reduction: 3.8%

1.15.2

a. T_new = 250 × 0.8 = 200 s, T_fp + T_l/s + T_branch = 165 s, T_int = 35 s. Reduction time INT: 58.8%
b. T_new = 210 × 0.8 = 168 s, T_fp + T_l/s + T_branch = 120 s, T_int = 48 s. Reduction time INT: 46.6%


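1.15.3

a. T_new = 250 × 0.8 = 200 s, T_fp + T_int + T_l/s = 210 s. No: the other classes alone take more than the 200 s target.
b. T_new = 210 × 0.8 = 168 s, T_fp + T_int + T_l/s = 190 s. No: the other classes alone take more than the 168 s target.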
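The three questions in 1.15.1 to 1.15.3 share one pattern: fix the time of every class except one and see what the remaining class must do. The sketch below uses the part a class times implied by the sums above (FP 70 s, INT 85 s, L/S 55 s, branch 40 s); the dictionary keys and function names are assumptions for illustration.

```python
# Per-class execution times in seconds for part a (total 250 s)
times_a = {"fp": 70.0, "int": 85.0, "ls": 55.0, "branch": 40.0}

def total_after_speedup(times, cls, factor):
    """Total time when class `cls` takes `factor` times its original time."""
    return sum(t * factor if k == cls else t for k, t in times.items())

def required_class_time(times, cls, target_total):
    """Time left for `cls` if the total must equal target_total.

    A negative result means the target cannot be met by improving `cls` alone.
    """
    others = sum(t for k, t in times.items() if k != cls)
    return target_total - others

total = sum(times_a.values())
target = 0.8 * total  # 20% shorter total time

print(total_after_speedup(times_a, "fp", 0.8))         # 1.15.1: FP 20% faster
print(required_class_time(times_a, "int", target))     # 1.15.2: INT time needed
print(required_class_time(times_a, "branch", target))  # 1.15.3: negative => impossible
```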

1.15.4

Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.

T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)

a. 2 processors: clock cycles = 4,096 × 10^6; T_cpu = 2.048 s
b. 16 processors: clock cycles = 512 × 10^6; T_cpu = 0.256 s

To halve the number of clock cycles by improving the CPI of FP instructions:

CPI_improved fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2

CPI_improved fp = (clock cycles/2 − (CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.))/No. FP instr.

a. 2 processors: CPI_improved fp = (2,048 − 3,816)/280 < 0 ==> not possible
b. 16 processors: CPI_improved fp = (256 − 462)/50 < 0 ==> not possible

1.15.5 Using the clock cycle data from 1.15.4:

To halve the number of clock cycles by improving the CPI of L/S instructions:

CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2

CPI_improved l/s = (clock cycles/2 − (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.

a. 2 processors: CPI_improved l/s = (2,048 − 1,536)/640 = 0.8
b. 16 processors: CPI_improved l/s = (256 − 198)/80 = 0.725
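The feasibility checks in 1.15.4 and 1.15.5 use the same rearranged formula, so they can be sketched with one helper. The function name `required_cpi` and its argument names are assumptions; the cycle and instruction counts (in millions) are the ones that appear in the expressions above.

```python
def required_cpi(total_cycles, other_cycles, n_instr, factor=0.5):
    """CPI one class needs so the total drops to factor x total_cycles.

    other_cycles is the cycle count of all remaining classes, which stays fixed.
    A negative result means the target is not reachable.
    """
    return (total_cycles * factor - other_cycles) / n_instr

# Values in millions of cycles / millions of instructions
print(required_cpi(4096, 3816, 280))  # FP, 2 processors: negative, not possible
print(required_cpi(4096, 1536, 640))  # L/S, 2 processors
print(required_cpi(512, 198, 80))     # L/S, 16 processors
```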

1.15.

Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.

T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)


1.16.5 Geometric mean of computing time ratios = 0.62. Multiplying this by the computing time for a 64-processor system gives a computing time for a 128-processor system of 11.474 ms.

Geometric mean of routing time ratios = 1.19. Multiplying this by the routing time for a 64-processor system gives a routing time for a 128-processor system of 30.9 ms.

1.16.6 Computing time = 201/0.62 = 324 ms. Routing time = 0, since no communication is required.
