Hennesey and Patterson 4th edition solutions, Study Guides, Projects, Research of Advanced Computer Architecture

Indian Institute of Technology Ropar Advanced Computer Architecture


Typology: Study Guides, Projects, Research

2017/2018

Uploaded on 04/25/2018

niket-agrawal 🇮🇳


1 Solutions

Solution 1.1

1.1.1 Computer used to run large problems and usually accessed via a network: (3) servers

1.1.2 10^15 or 2^50 bytes: (7) petabyte

1.1.3 A class of computers composed of hundreds to thousands of processors and terabytes of memory and having the highest performance and cost: (5) supercomputers

1.1.4 Today’s science fiction application that probably will be available in the near future: (1) virtual worlds

1.1.5 A kind of memory called random access memory: (12) RAM

1.1.6 Part of a computer called central processor unit: (13) CPU

1.1.7 Thousands of processors forming a large cluster: (8) data centers

1.1.8 Microprocessors containing several processors in the same chip: (10) multicore processors

1.1.9 Desktop computer without a screen or keyboard usually accessed via a network: (4) low-end servers

1.1.10 A computer used to run one predetermined application or collection of software: (9) embedded computers

1.1.11 Special language used to describe hardware components: (11) VHDL

1.1.12 Personal computer delivering good performance to single users at low cost: (2) desktop computers

1.1.13 Program that translates statements in high-level language to assembly language: (15) compiler


S2 Chapter 1 Solutions

1.1.14 Program that translates symbolic instructions to binary instructions: (21) assembler

1.1.15 High-level language for business data processing: (25) Cobol

1.1.16 Binary language that the processor can understand: (19) machine language

1.1.17 Commands that the processors understand: (17) instruction

1.1.18 High-level language for scientific computation: (26) Fortran

1.1.19 Symbolic representation of machine instructions: (18) assembly language

1.1.20 Interface between user’s program and hardware providing a variety of services and supervision functions: (14) operating system

1.1.21 Software/programs developed by the users: (24) application software

1.1.22 Binary digit (value 0 or 1): (16) bit

1.1.23 Software layer between the application software and the hardware that includes the operating system and the compilers: (23) system software

1.1.24 High-level language used to write application and system software: (20) C

1.1.25 Portable language composed of words and algebraic expressions that must be translated into assembly language before being run on a computer: (22) high-level language

1.1.26 10^12 or 2^40 bytes: (6) terabyte

Solution 1.2

1.2.1 8 bits × 3 colors = 24 bits/pixel = 3 bytes/pixel.

a. Configuration 1: 640 × 480 pixels = 307,200 pixels => 307,200 × 3 = 921,600 bytes/frame
Configuration 2: 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame
b. Configuration 1: 1024 × 768 pixels = 786,432 pixels => 786,432 × 3 = 2,359,296 bytes/frame
Configuration 2: 2560 × 1600 pixels = 4,096,000 pixels => 4,096,000 × 3 = 12,288,000 bytes/frame
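The frame-size arithmetic in 1.2.1 can be checked with a few lines of Python. This is only a sketch: the function name `frame_bytes` and its parameters are illustrative and do not come from the original solution.

```python
def frame_bytes(width: int, height: int, bits_per_color: int = 8, colors: int = 3) -> int:
    """Bytes needed to hold one frame at the given resolution."""
    bits_per_pixel = bits_per_color * colors  # 8 bits x 3 colors = 24 bits/pixel
    return width * height * bits_per_pixel // 8


# The four resolutions listed in 1.2.1.
for width, height in [(640, 480), (1280, 1024), (1024, 768), (2560, 1600)]:
    print(f"{width} x {height}: {frame_bytes(width, height):,} bytes/frame")
```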


1.3.2 No. cycles = time × clock rate
time = (No. instr × CPI)/clock rate, then No. instructions = No. cycles/CPI

a. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
b. cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9
cycles(P2) = 10 × 3 × 10^9 = 30 × 10^9
cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
No. instructions(P1) = 20 × 10^9/1.2 = 16.66 × 10^9
No. instructions(P2) = 30 × 10^9/0.8 = 37.5 × 10^9
No. instructions(P3) = 40 × 10^9/2 = 20 × 10^9
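As a quick check of 1.3.2, the two relations (cycles = time × clock rate, instructions = cycles/CPI) can be written out directly. The helper names below are illustrative assumptions; the clock rates and CPIs are the case a values used above.

```python
def cycle_count(time_s: float, clock_hz: float) -> float:
    """Number of clock cycles executed in time_s seconds."""
    return time_s * clock_hz


def instruction_count(time_s: float, clock_hz: float, cpi: float) -> float:
    """Number of instructions executed, given the average CPI."""
    return cycle_count(time_s, clock_hz) / cpi


# (clock rate in Hz, CPI) for each processor, case a, 10 s of execution.
case_a = {"P1": (3e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4e9, 2.2)}
for name, (clock_hz, cpi) in case_a.items():
    cycles = cycle_count(10, clock_hz)
    instrs = instruction_count(10, clock_hz, cpi)
    print(f"{name}: {cycles:.4g} cycles, {instrs:.4g} instructions")
```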

1.3.3 time_new = time_old × 0.7 = 7 s

a. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
f = No. instr × CPI/time, then
f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 10^9 × 1.2/7 = 4.28 GHz
f(P3) = 18.18 × 10^9 × 2.6/7 = 6.75 GHz
b. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.44, CPI(P2) = 0.96, CPI(P3) = 2.4
f = No. instr × CPI/time, then
f(P1) = 16.66 × 10^9 × 1.44/7 = 3.42 GHz
f(P2) = 37.5 × 10^9 × 0.96/7 = 5.14 GHz
f(P3) = 20 × 10^9 × 2.4/7 = 6.85 GHz

1.3.4 IPC = 1/CPI = No. instr/(time × clock rate)

a. IPC(P1) = 0. IPC(P2) = 1. IPC(P3) = 2. b. IPC(P1) = 2 IPC(P2) = 1. IPC(P3) = 0.

1.3.5

a. Time_new/Time_old = 7/10 = 0.7. So f_new = f_old/0.7 = 2.5 GHz/0.7 = 3.57 GHz.
b. Time_new/Time_old = 5/8 = 0.625. So f_new = f_old/0.625 = 4.8 GHz.


1.3.6

a. Time_new/Time_old = 9/10 = 0.9. Then Instructions_new = Instructions_old × 0.9 = 30 × 10^9 × 0.9 = 27 × 10^9.
b. Time_new/Time_old = 7/8 = 0.875. Then Instructions_new = Instructions_old × 0.875 = 26.25 × 10^9.

Solution 1.4

1.4.1

Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.

Time = No. instr × CPI/clock rate

a. Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^−4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^−4 s
b. Total time P1 = (10^5 × 2 + 2 × 10^5 × 1.5 + 5 × 10^5 × 2 + 2 × 10^5)/(2.5 × 10^9) = 6.8 × 10^−4 s
Total time P2 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 + 2 × 10^5)/(3 × 10^9) = 4 × 10^−4 s

1.4.2 CPI = time × clock rate/No. instr

a. CPI(P1) = 10.4 × 10^−4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.66 × 10^−4 × 3 × 10^9/10^6 = 2.0
b. CPI(P1) = 6.8 × 10^−4 × 2.5 × 10^9/10^6 = 1.7
CPI(P2) = 4 × 10^−4 × 3 × 10^9/10^6 = 1.2

1.4.3

a. clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5
b. clock cycles(P1) = 17 × 10^5
clock cycles(P2) = 12 × 10^5
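The three quantities in 1.4.1 to 1.4.3 (total cycles, execution time and average CPI) all follow from the per-class instruction counts and CPIs. A minimal sketch, using the case a values quoted above; the function and variable names are illustrative, not from the original solution.

```python
# Instruction counts for classes A, B, C, D.
COUNTS = [100_000, 200_000, 500_000, 200_000]


def total_cycles(counts, cpis):
    """Sum of (instruction count x CPI) over all instruction classes."""
    return sum(n * cpi for n, cpi in zip(counts, cpis))


def exec_time(counts, cpis, clock_hz):
    """Execution time in seconds: total cycles divided by clock rate."""
    return total_cycles(counts, cpis) / clock_hz


def average_cpi(counts, cpis):
    """Average CPI: total cycles divided by total instruction count."""
    return total_cycles(counts, cpis) / sum(counts)


# Per-class CPIs and clock rate for each processor, case a.
processors = {"P1": ([1, 2, 3, 3], 2.5e9), "P2": ([2, 2, 2, 2], 3e9)}
for name, (cpis, clock_hz) in processors.items():
    print(
        f"{name}: cycles = {total_cycles(COUNTS, cpis):,}, "
        f"time = {exec_time(COUNTS, cpis, clock_hz):.3e} s, "
        f"CPI = {average_cpi(COUNTS, cpis):.2f}"
    )
```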

1.4.4

a. (650 × 1 + 100 × 5 + 600 × 5 + 50 × 2) × 0.5 × 10^−9 s = 2,125 ns
b. (750 × 1 + 250 × 5 + 500 × 5 + 500 × 2) × 0.5 × 10^−9 s = 2,750 ns

1.4.5 CPI = time × clock rate/No. instr

a. CPI = 2,125 × 10^−9 × 2 × 10^9/1,400 = 3.04
b. CPI = 2,750 × 10^−9 × 2 × 10^9/2,000 = 2.75


Solution 1.6

1.6.1 CPI = T_exec × f/No. instr

| | Compiler A CPI | Compiler B CPI |
|---|---|---|
| a. | 1.8 | 1. |
| b. | 1.1 | 1. |

1.6.2 f_A/f_B = (No. instr(A) × CPI(A))/(No. instr(B) × CPI(B))

a. f_A/f_B = 1
b. f_A/f_B = 0.

1.6.3

| | Speedup vs. Compiler A | Speedup vs. Compiler B |
|---|---|---|
| a. | T_new/T_A = 0.36 | T_new/T_B = 0. |
| b. | T_new/T_A = 0.6 | T_new/T_B = 0. |

1.6.4

| | P1 Peak | P2 Peak |
|---|---|---|
| a. | 4 × 10^9 Inst/s | 2 × 10^9 Inst/s |
| b. | 4 × 10^9 Inst/s | 3 × 10^9 Inst/s |

1.6.5 Speedup, P1 versus P2:

a. T_1/T_2 = 1.
b. T_1/T_2 = 1.

1.6.6

a. 4.37 GHz
b. 6 GHz

Solution 1.7

1.7.1

Geometric mean clock rate ratio = (1.28 × 1.56 × 2.64 × 3.03 × 10.00 × 1.80 × 0.74)^(1/7) = 2.15

Geometric mean power ratio = (1.24 × 1.20 × 2.06 × 2.88 × 2.59 × 1.37 × 0.92)^(1/7) = 1.62
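The geometric means in 1.7.1 are the n-th root of the product of the n ratios. A short sketch using the ratios listed above (the function name `geometric_mean` is illustrative; `math.prod` needs Python 3.8 or later):

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))


# Generation-to-generation ratios quoted in 1.7.1.
clock_ratios = [1.28, 1.56, 2.64, 3.03, 10.00, 1.80, 0.74]
power_ratios = [1.24, 1.20, 2.06, 2.88, 2.59, 1.37, 0.92]

print(f"clock rate: {geometric_mean(clock_ratios):.2f}")
print(f"power:      {geometric_mean(power_ratios):.2f}")
```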


1.7.2 Largest clock rate ratio = 2000 MHz/200 MHz = 10 (Pentium Pro to Pentium 4 Willamette)
Largest power ratio = 29.1 W/10.1 W = 2.88 (Pentium to Pentium Pro)

1.7.3 Clock rate: 2.667 × 10^9/(12.5 × 10^6) = 213.36
Power: 95 W/3.3 W = 28.79

1.7.4 C = P/(V^2 × clock rate)
80286: C = 0.0105 × 10^−6
80386: C = 0.01025 × 10^−6
80486: C = 0.00784 × 10^−6
Pentium: C = 0.00612 × 10^−6
Pentium Pro: C = 0.0133 × 10^−6
Pentium 4 Willamette: C = 0.0122 × 10^−6
Pentium 4 Prescott: C = 0.00183 × 10^−6
Core 2: C = 0.0294 × 10^−6

1.7.5 3.3/1.75 = 1.89 (Pentium Pro to Pentium 4 Willamette)

1.7.6 Pentium to Pentium Pro: 3.3/5 = 0.66
Pentium Pro to Pentium 4 Willamette: 1.75/3.3 = 0.53
Pentium 4 Willamette to Pentium 4 Prescott: 1.25/1.75 = 0.71
Pentium 4 Prescott to Core 2: 1.1/1.25 = 0.88
Geometric mean = 0.68

Solution 1.8

1.8.1 Power = V^2 × clock rate × C. Power_2 = 0.9 × Power_1

a. C_2/C_1 = 0.9 × 1.75^2 × 1.5 × 10^9/(1.2^2 × 2 × 10^9) = 1.44
b. C_2/C_1 = 0.9 × 1.1^2 × 3 × 10^9/(0.8^2 × 4 × 10^9) = 1.28
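Rearranging Power = V^2 × clock rate × C for the two designs gives C_2/C_1 = (Power_2/Power_1) × V_1^2 × f_1/(V_2^2 × f_2). The sketch below evaluates that ratio for the two cases in 1.8.1; the function and parameter names are illustrative assumptions.

```python
def capacitance_ratio(power_ratio, v_old, f_old, v_new, f_new):
    """C_new/C_old implied by P = V^2 * f * C, given P_new/P_old."""
    return power_ratio * (v_old ** 2 * f_old) / (v_new ** 2 * f_new)


# Case a: 1.75 V at 1.5 GHz -> 1.2 V at 2 GHz, with 10% less power.
print(f"a. {capacitance_ratio(0.9, 1.75, 1.5e9, 1.2, 2e9):.2f}")
# Case b: 1.1 V at 3 GHz -> 0.8 V at 4 GHz, with 10% less power.
print(f"b. {capacitance_ratio(0.9, 1.1, 3e9, 0.8, 4e9):.2f}")
```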

1.8.2 Power_2/Power_1 = V_2^2 × clock rate_2/(V_1^2 × clock rate_1)

a. Power_2/Power_1 = 0.62 => Reduction of 38%
b. Power_2/Power_1 = 0.7 => Reduction of 30%


1.9.3

a. Power_st/Power_dyn = 10/50 = 0.2
b. Power_st/Power_dyn = 60/90 = 0.67

1.9.4 Power_st/Power_dyn = 0.6 => Power_st = 0.6 × Power_dyn

a. Power_st = 0.6 × 35 W = 21 W
b. Power_st = 0.6 × 30 W = 18 W

1.9.5

| | 1.2 V | 1.0 V | 0.8 V |
|---|---|---|---|
| a. | P_st = 12.5 W, P_dyn = 62.5 W | P_st = 10 W, P_dyn = 50 W | P_st = 5.8 W, P_dyn = 29.2 W |
| b. | P_st = 24.8 W, P_dyn = 37.2 W | P_st = 20 W, P_dyn = 30 W | P_st = 12 W, P_dyn = 18 W |

1.9.6

a. 29.
b. 23.

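Solution 1.10

1.10.1

a.

| Processors | Instructions per Processor | Total Instructions |
|---|---|---|
| 1 | 4096 | 4096 |
| 2 | 2048 | 4096 |
| 4 | 1024 | 4096 |
| 8 | 512 | 4096 |

b.

| Processors | Instructions per Processor | Total Instructions |
|---|---|---|
| 1 | 4096 | 4096 |
| 2 | 2048 | 4096 |
| 4 | 1024 | 4096 |
| 8 | 512 | 4096 |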

1.10.2

a.

| Processors | Execution Time (μs) |
|---|---|
| 1 | 4. |
| 2 | 2. |
| 4 | 1. |
| 8 | 1. |

b.

| Processors | Execution Time (μs) |
|---|---|
| 1 | 4. |
| 2 | 2. |
| 4 | 1. |
| 8 | 0. |

1.10.3

a.

| Processors | Execution Time (μs) |
|---|---|
| 1 | 5. |
| 2 | 3. |
| 4 | 1. |
| 8 | 1. |

b.

| Processors | Execution Time (μs) |
|---|---|
| 1 | 5. |
| 2 | 3. |
| 4 | 1. |
| 8 | 1. |

1.10.4

a.

| Cores | Execution Time (s) @ 3 GHz |
|---|---|
| 1 | 4. |
| 2 | 2. |
| 4 | 1. |
| 8 | 1. |

b.

| Cores | Execution Time (s) @ 3 GHz |
|---|---|
| 1 | 3. |
| 2 | 2. |
| 4 | 1. |
| 8 | 0. |

Solution 1.11

1.11.1 Yield = 1/(1 + (defect per area × die area)/2)^2

a. Yield = 0.
b. Yield = 0.

1.11.2 Cost per die = cost per wafer/(dies per wafer × yield)

a. Cost per die = 0.
b. Cost per die = 0.

1.11.3

a. Dies per wafer = 1.1 × 84 = 92
Defects per area = 1.15 × 0.02 = 0.023 defects/cm^2
Die area = wafer area/Dies per wafer = 176.7/92 = 1.92 cm^2
Yield = 0.
b. Dies per wafer = 1.1 × 100 = 110
Defects per area = 1.15 × 0.031 = 0.036 defects/cm^2
Die area = wafer area/Dies per wafer = 314.2/110 = 2.86 cm^2
Yield = 0.

1.11.4 Yield = 1/(1 + (defect per area × die area)/2)^2

Then defect per area = (2/die area)(y^(−1/2) − 1)

Replacing values for T1 and T2 we get:

T1: defects per area = 0.00085 defects/mm^2 = 0.085 defects/cm^2
T2: defects per area = 0.00060 defects/mm^2 = 0.060 defects/cm^2
T3: defects per area = 0.00043 defects/mm^2 = 0.043 defects/cm^2
T4: defects per area = 0.00026 defects/mm^2 = 0.026 defects/cm^2
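The yield relations of Solution 1.11 can be sketched as three small functions. One assumption is made explicit here: the yield model is taken in its squared form, 1/(1 + defects × area/2)^2, because that is the form the inversion (2/die area)(y^(−1/2) − 1) quoted in 1.11.4 implies. All function names are illustrative.

```python
def die_yield(defects_per_area: float, die_area: float) -> float:
    """Yield under the assumed model 1 / (1 + defects * area / 2)^2."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2


def defects_per_area_from_yield(yield_: float, die_area: float) -> float:
    """Invert the yield model: defects = (2 / area) * (yield^(-1/2) - 1)."""
    return (2.0 / die_area) * (yield_ ** -0.5 - 1.0)


def cost_per_die(cost_per_wafer: float, dies_per_wafer: float, yield_: float) -> float:
    """Cost per good die: wafer cost spread over the dies that work."""
    return cost_per_wafer / (dies_per_wafer * yield_)


# Inputs from 1.11.3 case a: 0.023 defects/cm^2 and a 1.92 cm^2 die.
y = die_yield(0.023, 1.92)
print(f"yield = {y:.3f}")
print(f"recovered defect density = {defects_per_area_from_yield(y, 1.92):.3f} defects/cm^2")
```

Feeding a computed yield back through `defects_per_area_from_yield` returns the original defect density, which is a convenient consistency check on the two formulas.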
1.11.5 No solution provided.

Solution 1.12

1.12.1 CPI = clock rate × CPU time/instr count

clock rate = 1/cycle time = 3 GHz

a. CPI(bzip2) = 3 × 10^9 × 750/(2,389 × 10^9) = 0.94
b. CPI(go) = 3 × 10^9 × 700/(1,658 × 10^9) = 1.26

1.12.2 SPECratio = ref. time/execution time

a. SPECratio(bzip2) = 9,650/750 = 12.86
b. SPECratio(go) = 10,490/700 = 14.98


1.12.3

(12.86 × 14.98)^(1/2) = 13.88

1.12.4 CPU time = No. instr × CPI/clock rate
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is, 10%.

1.12.5 CPU time(before) = No. instr × CPI/clock rate
CPU time(after) = 1.1 × No. instr × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155
Thus, CPU time is increased by 15.5%.

1.12.6 SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 1/1.155 = 0.866
Thus, the SPECratio is decreased by about 13.4%.
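Because CPU time = No. instr × CPI/clock rate, relative changes simply multiply, and the SPECratio scales with the reciprocal of CPU time. A minimal sketch of 1.12.4 to 1.12.6 under that reading; the function names are illustrative.

```python
def cpu_time_factor(instr_factor: float, cpi_factor: float = 1.0, clock_factor: float = 1.0) -> float:
    """Factor by which CPU time changes when each term is scaled."""
    return instr_factor * cpi_factor / clock_factor


def specratio_factor(time_factor: float) -> float:
    """SPECratio = reference time / CPU time, so it scales as 1 / time."""
    return 1.0 / time_factor


only_instr = cpu_time_factor(1.1)            # 10% more instructions
instr_and_cpi = cpu_time_factor(1.1, 1.05)   # plus 5% higher CPI
print(f"CPU time x{only_instr:.3f}, x{instr_and_cpi:.3f}")
print(f"SPECratio x{specratio_factor(instr_and_cpi):.3f}")
```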

Solution 1.13

1.13.1 CPI = (CPU time × clock rate)/No. instr

a. CPI = 700 × 4 × 10^9/(0.85 × 2,389 × 10^9) = 1.37
b. CPI = 620 × 4 × 10^9/(0.85 × 1,658 × 10^9) = 1.75

1.13.2 Clock rate ratio = 4 GHz/3 GHz = 1.33

a. CPI @ 4 GHz = 1.37, CPI @ 3 GHz = 0.94, ratio = 1.46
b. CPI @ 4 GHz = 1.75, CPI @ 3 GHz = 1.26, ratio = 1.39

They are different because although the number of instructions has been reduced by 15%, the CPU time has been reduced by a lower percentage.

1.13.3

a. 700/750 = 0.933. CPU time reduction: 6.7%
b. 620/700 = 0.886. CPU time reduction: 11.4%

1.13.4 No. instr = CPU time × clock rate/CPI

a. No. instr = 960 × 0.9 × 4 × 10^9/1.61 = 2,146 × 10^9
b. No. instr = 690 × 0.9 × 4 × 10^9/1.79 = 1,387 × 10^9

1.13.5 Clock rate = No. instr × CPI/CPU time.
Clock rate_new = No. instr × CPI/(0.9 × CPU time) = (1/0.9) × clock rate_old = 3.33 GHz


1.14.5

a. T(P1) = (5 × 10^5 × 0.75 + 4 × 10^5 × 1 + 10 × 10^5 × 1.5)/(4 × 10^9) = 5.68 × 10^−4 s
CPI(P1) = 5.68 × 10^−4 × 4 × 10^9/10^6 = 2.27
MIPS(P1) = 4 × 10^9/(2.27 × 10^6) = 1.76 × 10^3
T(P2) = (2 × 10^6 × 1.25 + 2 × 10^6 × 0.8 + 1 × 10^6 × 1.25)/(3 × 10^9) = 1.78 × 10^−3 s
CPI(P2) = 1.78 × 10^−3 × 3 × 10^9/(5 × 10^6) = 1.068
MIPS(P2) = 3 × 10^9/(1.068 × 10^6) = 2.81 × 10^3
b. T(P1) = (1.5 × 10^6 × 1.5 + 1.5 × 10^6 × 1 + 2 × 10^6 × 2)/(4 × 10^9) = 1.93 × 10^−3 s
CPI(P1) = 1.93 × 10^−3 × 4 × 10^9/(5 × 10^6) = 1.54
MIPS(P1) = 4 × 10^9/(1.54 × 10^6) = 2.59 × 10^3
T(P2) = (0.8 × 10^6 × 1.25 + 0.6 × 10^6 × 1 + 0.6 × 10^6 × 2.5)/(3 × 10^9) = 1.03 × 10^−3 s
CPI(P2) = 1.03 × 10^−3 × 3 × 10^9/(2 × 10^6) = 1.54
MIPS(P2) = 3 × 10^9/(1.54 × 10^6) = 1.94 × 10^3

1.14.6

a. T(P1) = 5.68 × 10^−4 s (see problem 1.14.5)
performance(P1) = 1/T(P1) = 1.76 × 10^3
T(P2) = 1.78 × 10^−3 s (see problem 1.14.5)
performance(P2) = 1/T(P2) = 5.6 × 10^2
perf(P1) > perf(P2), MIPS(P1) < MIPS(P2), MFLOPS(P1) < MFLOPS(P2)
b. T(P1) = 1.93 × 10^−3 s (see problem 1.14.5)
performance(P1) = 1/T(P1) = 5.1 × 10^2
T(P2) = 1.03 × 10^−3 s (see problem 1.14.5)
performance(P2) = 1/T(P2) = 9.7 × 10^2
perf(P1) < perf(P2), MIPS(P1) > MIPS(P2), MFLOPS(P1) > MFLOPS(P2)
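To compare the two metrics used in this solution, MIPS = clock rate/(CPI × 10^6) and performance = 1/T can be printed side by side. The sketch below uses the case b figures quoted in 1.14.5 (clock rate, CPI and execution time); the function names are illustrative.

```python
def mips(clock_hz: float, cpi: float) -> float:
    """Millions of instructions per second: clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


def performance(exec_time_s: float) -> float:
    """Performance defined as the reciprocal of execution time."""
    return 1.0 / exec_time_s


# name: (clock rate in Hz, CPI, execution time in s), case b.
case_b = {"P1": (4e9, 1.54, 1.93e-3), "P2": (3e9, 1.54, 1.03e-3)}
for name, (clock_hz, cpi, exec_time_s) in case_b.items():
    print(f"{name}: MIPS = {mips(clock_hz, cpi):.0f}, 1/T = {performance(exec_time_s):.0f}")
```

Printing both values for each processor makes it easy to see whether the MIPS ranking and the execution-time ranking agree.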

Solution 1.15

1.15.1

a. T_fp = 70 × 0.8 = 56 s. T_new = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
b. T_fp = 40 × 0.8 = 32 s. T_new = 32 + 90 + 60 + 20 = 202 s. Reduction: 3.8%

1.15.2

a. T_new = 250 × 0.8 = 200 s, T_fp + T_l/s + T_branch = 165 s, T_int = 35 s. Reduction time INT: 58.8%
b. T_new = 210 × 0.8 = 168 s, T_fp + T_l/s + T_branch = 120 s, T_int = 48 s. Reduction time INT: 46.6%
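Both parts of this solution treat total time as a sum of per-class times: 1.15.1 scales one class and re-adds, while 1.15.2 fixes a target total and solves for the one class that is allowed to change. A minimal sketch with the case a times (70, 85, 55 and 40 s); the dictionary keys and function names are illustrative.

```python
def scaled_total(times: dict, klass: str, factor: float) -> float:
    """Total time after multiplying one class's time by `factor`."""
    return sum(t * factor if name == klass else t for name, t in times.items())


def required_class_time(times: dict, klass: str, target_total: float) -> float:
    """Time `klass` must take for the total to hit target_total.

    A negative result means the target cannot be reached by changing
    this class alone.
    """
    others = sum(t for name, t in times.items() if name != klass)
    return target_total - others


case_a = {"fp": 70, "int": 85, "l/s": 55, "branch": 40}
total = sum(case_a.values())

new_total = scaled_total(case_a, "fp", 0.8)
print(f"FP 20% faster: {new_total:.0f} s, reduction {100 * (1 - new_total / total):.1f}%")

t_int = required_class_time(case_a, "int", 0.8 * total)
print(f"INT must take {t_int:.0f} s, reduction {100 * (1 - t_int / case_a['int']):.1f}%")
```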


1.15.3

a. T_new = 250 × 0.8 = 200 s, T_fp + T_int + T_l/s = 210 s. NO
b. T_new = 210 × 0.8 = 168 s, T_fp + T_int + T_l/s = 190 s. NO

1.15.4

Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.

T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)

a. 2 processors: clock cycles = 4,096 × 10^6; T_cpu = 2.048 s
b. 16 processors: clock cycles = 512 × 10^6; T_cpu = 0.256 s

To halve the number of clock cycles by improving the CPI of FP instructions:

CPI_improved fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2

CPI_improved fp = (clock cycles/2 − (CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.))/No. FP instr.

a. 2 processors: CPI_improved fp = (2,048 − 3,816)/280 < 0 ==> not possible
b. 16 processors: CPI_improved fp = (256 − 462)/50 < 0 ==> not possible

1.15.5 Using the clock cycle data from 1.15.4:

To halve the number of clock cycles by improving the CPI of L/S instructions:

CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2

CPI_improved l/s = (clock cycles/2 − (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.

a. 2 processors: CPI_improved l/s = (2,048 − 1,536)/640 = 0.8
b. 16 processors: CPI_improved l/s = (256 − 198)/80 = 0.725
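The "halve the clock cycles" calculation has the same shape for FP and for L/S instructions: subtract the cycles spent in the unchanged classes from the target and divide by the instruction count of the class being improved. A short sketch with the aggregate figures quoted in 1.15.4 and 1.15.5 (all in millions); the function name `required_cpi` is illustrative.

```python
def required_cpi(target_cycles: float, other_cycles: float, instr_count: float) -> float:
    """CPI one class needs so that total cycles equal target_cycles.

    target_cycles: desired total cycle count
    other_cycles:  cycles spent in all the unchanged classes
    instr_count:   instruction count of the class being improved
    A negative result means the target cannot be met.
    """
    return (target_cycles - other_cycles) / instr_count


cases = {
    "FP, 2 processors": (2048, 3816, 280),
    "FP, 16 processors": (256, 462, 50),
    "L/S, 2 processors": (2048, 1536, 640),
    "L/S, 16 processors": (256, 198, 80),
}
for label, (target, others, count) in cases.items():
    cpi = required_cpi(target, others, count)
    verdict = f"CPI = {cpi:.3f}" if cpi > 0 else "not possible"
    print(f"{label}: {verdict}")
```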

1.15.6

Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.

T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)


1.16.5 Geometric mean of computing time ratios = 0.62. Multiplying this by the computing time for a 64-processor system gives a computing time for a 128-processor system of 11.474 ms.

Geometric mean of routing time ratios = 1.19. Multiplying this by the routing time for a 64-processor system gives a routing time for a 128-processor system of 30.9 ms.

1.16.6 Computing time = 201/0.62 = 324 ms. Routing time = 0, since no communication is required.
