MIPS Rating - Computer Architecture and Engineering - Solved Exams, Exams of Computer Architecture and Organization

Kannur University Computer Architecture and Organization

Main points of this past exam are: MIPS Rating, Yield Runtime, Floating Point, Lower-Performance Version, Original Processor, Original Cost, New Processor, Parallel Prefix, Possible Speedup

Typology: Exams

2012/2013

Uploaded on 04/02/2013

shashikanth_0p3 🇮🇳


University of California, Berkeley

College of Engineering

Computer Science Division  EECS

Spring 2001

John Kubiatowicz

Midterm I

SOLUTIONS

March 1, 2001

CS152 Computer Architecture and Engineering

Your Name:

SID Number:

Discussion Section:

Problem Possible Score

1 20

2 20

3 30

4 30

Total

[ This page left for π ]

Problem 1c: What is the CPI and MIPS rating of the new processor?

CPI = (0.35 × 1) + (0.25 × 2) + (0.25 × 3) + (0.15 × 13) = 3.55

MIPS = 300 MHz ÷ 3.55 = 84.5
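As a quick arithmetic check, the CPI is a frequency-weighted sum of cycle counts and the MIPS rating is the clock rate in MHz divided by the CPI. The sketch below is in Python (not part of the exam). Only the four (frequency, cycles) pairs appear in the expression above, so the instruction classes are left unnamed, and the function names are made up for illustration.

```python
# Instruction mix for the new processor: (fraction of instructions, cycles).
# The pairs are taken from the CPI expression above; the class each pair
# belongs to is not shown there, so they are left unnamed.
MIX = [(0.35, 1), (0.25, 2), (0.25, 3), (0.15, 13)]
CLOCK_MHZ = 300.0


def cpi(mix):
    """Average cycles per instruction: sum of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix)


def mips_rating(clock_mhz, cpi_value):
    """Millions of instructions per second = clock (MHz) / CPI."""
    return clock_mhz / cpi_value


new_cpi = cpi(MIX)
new_mips = mips_rating(CLOCK_MHZ, new_cpi)
print(f"CPI = {new_cpi:.2f}, MIPS = {new_mips:.1f}")
```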

Problem 1d: What is the original cost per (working) processor?

dieCost = waferCost ÷ (dies per wafer × dieYield)

[Worked numbers garbled in this copy; surviving fragments: 36, 2, 12, 0.27]

Problem 1e: What is the new cost per (working) processor?

dieCost = waferCost ÷ (dies per wafer × dieYield)

[Worked numbers garbled in this copy; surviving fragments: 56, 2, 10, 0.36]

Problem 1f: Assume that we are considering the other direction of improving the original processor by increasing the speed of floating point. What is the best possible speedup that we could get, and what would the CPI and MIPS rating be of the new processor?

The easiest thing to do is use Amdahl's law:

$$\text{speedup} = \frac{1}{(1 - f) + \frac{f}{n}} \;\to\; \frac{1}{1 - f} \quad \text{as } n \to \infty$$

(i.e. speeding up floating-point really well). In this case, f is the fraction of time normally devoted to floating point (in time!). So, f = CPI_float/CPI = (0.15 × 5) ÷ 2.35 = 0.319

Max speedup = (1 - 0.319)^-1 = 1.47

CPI computed with "zero-cycle" floating-point instructions: 2.35 - (0.15 × 5) = 1.6

MIPS = 300/1.6 = 187.5
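The same limit can be checked numerically. This is a small Python sketch, not part of the exam solution; the function name `amdahl_speedup` is made up here, and the inputs (original CPI of 2.35, floating point at 15% of instructions and 5 cycles each, 300 MHz clock) are the ones used in the expressions above.

```python
def amdahl_speedup(f, n):
    """Overall speedup when a fraction f of the run time is sped up n-fold."""
    return 1.0 / ((1.0 - f) + f / n)


ORIG_CPI = 2.35                 # original processor CPI
FP_FREQ, FP_CYCLES = 0.15, 5    # floating-point share of instructions, cycles each
CLOCK_MHZ = 300.0

f = FP_FREQ * FP_CYCLES / ORIG_CPI         # fraction of *time* spent in FP
max_speedup = 1.0 / (1.0 - f)              # limit of amdahl_speedup as n grows
best_cpi = ORIG_CPI - FP_FREQ * FP_CYCLES  # FP instructions cost zero cycles
best_mips = CLOCK_MHZ / best_cpi

print(f"f = {f:.3f}, max speedup = {max_speedup:.2f}")
print(f"CPI = {best_cpi:.2f}, MIPS = {best_mips:.1f}")
```

Note that the maximum speedup is also just the ratio of the two CPIs (2.35 ÷ 1.6), since the clock rate does not change.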

Problem 2: Parallel Prefix

Assume the following characteristics for NAND gates: Input load: 120fF, Internal delay: TPlh=0.3ns, TPhl=0.6ns, Load-Dependent delay: TPlhf=0.0020ns/fF, TPhlf=0.0021ns/fF

Problem 2a: Suppose that we construct an XOR, as follows:

Compute the standard parameters for the linear delay models for this complex gate, assuming the parameters given above for the NAND gate. Assume that a wire doubles the input capacitance of the gate that it is attached to:

A Input Capacitance: 240fF
B Input Capacitance: 240fF

Load-dependent Delays:
TPAYlhf: 0.0020 ns/fF    TPAYhlf: 0.0021 ns/fF
TPBYlhf: 0.0020 ns/fF    TPBYhlf: 0.0021 ns/fF

Maximum Internal delays for A⇒Y:

TPAYlh:

Critical path goes through 3 gates. Low-to-high on output implies high-to-low on inputs to last gate, which implies low-to-high on input A. Note that the two internal nodes are driven, so we multiply capacitance by 2:

TPAYlh = 0.3ns+(2)(240fF)(0.0020ns/fF) + 0.6ns + (2)(120fF)(0.0021ns/fF) + 0.3ns = 2.664ns

TPAYhl:

High-to-low on output implies low-to-high on inputs to last gate, which implies high-to-low on input A.

TPAYhl = 0.6ns + (2)(240fF)(0.0021ns/fF) + 0.3ns + (2)(120fF)(0.0020ns/fF) + 0.6ns = 2.988ns
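Both internal delays are sums of per-gate terms of the form "internal delay + slope × load". The Python sketch below (not part of the exam) evaluates them from the NAND parameters. The helper name `stage` is made up, and the 240 fF and 120 fF loads on the two internal nodes are simply read off the expressions above rather than derived from the schematic.

```python
# NAND parameters from the problem statement.
TP_LH, TP_HL = 0.3, 0.6            # internal delays, ns
TP_LHF, TP_HLF = 0.0020, 0.0021    # load-dependent delays, ns per fF
WIRE_FACTOR = 2                    # a wire doubles the capacitance it drives


def stage(internal_ns, slope_ns_per_ff, load_ff):
    """Linear delay model for one gate driving a wired load."""
    return internal_ns + WIRE_FACTOR * load_ff * slope_ns_per_ff


# Output rising: the three gates on the path switch rise, fall, rise.
# The last gate is unloaded because this is an internal (unloaded) delay.
tp_ay_lh = stage(TP_LH, TP_LHF, 240) + stage(TP_HL, TP_HLF, 120) + TP_LH

# Output falling: the same path with every edge inverted.
tp_ay_hl = stage(TP_HL, TP_HLF, 240) + stage(TP_LH, TP_LHF, 120) + TP_HL

print(f"TPAYlh = {tp_ay_lh:.3f} ns, TPAYhl = {tp_ay_hl:.3f} ns")
```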

[Figure: XOR gate schematic with inputs A and B and output Y]

Problem 2c: Now, put these 2-input blocks together to produce a 4-input block that takes I0, I1, I2, I3, and Cdown and produces:

O0 = I0 ⊕ Cdown
O1 = I1 ⊕ I0 ⊕ Cdown
O2 = I2 ⊕ I1 ⊕ I0 ⊕ Cdown
O3 = I3 ⊕ I2 ⊕ I1 ⊕ I0 ⊕ Cdown
Cup = I3 ⊕ I2 ⊕ I1 ⊕ I0

Your goal is to minimize the output delay of each block.

Using only blocks from part 2b:

Compute the input capacitance for each input:

I0: 480fF, I1: 240fF, I2: 480fF, I3: 240fF, Cdown: 480fF

Identify the critical path of your circuit and compute the unloaded delay for this path.

Critical path is from I0 to O3. Arrange so that two internal nodes go from high-to-low:

TPI0O3hl = 3 TPhl_xor + 2 [TPhlf_xor (2)(2)(240)] = 12.996 ns
TPI0O3lh = 2 TPhl_xor + 2 [TPhlf_xor (2)(2)(240)] + TPlh_xor = 12.672 ns
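A short Python check of these two sums (a sketch, not part of the exam). The XOR internal delays are the part 2a results, where 2.988 ns is what the TPAYhl expression evaluates to, and the (2)(2)(240) fF internal-node load is copied from the expressions above. The variable names are made up for illustration.

```python
TP_LH_XOR = 2.664    # ns, XOR internal delay for a rising output (part 2a)
TP_HL_XOR = 2.988    # ns, XOR internal delay for a falling output (part 2a)
TP_HLF_XOR = 0.0021  # ns per fF, XOR load-dependent delay (falling)

# Load on each of the two internal nodes, as written in the solution: (2)(2)(240) fF.
NODE_LOAD_FF = 2 * 2 * 240

load_delay = 2 * (TP_HLF_XOR * NODE_LOAD_FF)   # two loaded internal nodes

tp_i0_o3_hl = 3 * TP_HL_XOR + load_delay
tp_i0_o3_lh = 2 * TP_HL_XOR + load_delay + TP_LH_XOR

print(f"TPI0O3hl = {tp_i0_o3_hl:.3f} ns, TPI0O3lh = {tp_i0_o3_lh:.3f} ns")
```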

[Figure: 4-input block with inputs I3 to I0 and Cdown, outputs O3 to O0 and Cup]

Problem 2d: Finally, show how the 4-input prefix circuit can be used as a building block to produce a 16-element prefix circuit that minimizes gate reuse and which has minimal delay. What is the critical path and how many XOR gates are in it?

Hint: this is very similar to a carry-lookahead adder.

The critical path is from I0 up through the central logic and back through the Cdown of the last stage to O14 or O15.

Adding this up, we get: 2 + 3 + 2 = 7 XOR gates

Problem 2e: How many XOR gates are in the critical path of a 64-bit parallel-prefix circuit?

This adds one more level of blocks. Tracing the first input to last output, we note that we have 2 for each level up, 3 for the top level, and 2 for each level down: 2 + 2 + 3 + 2 + 2 = 11 xor gates.
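The counting rule (2 per level going up, 3 at the top, 2 per level coming down) can be written as a small function. This Python sketch is not part of the exam; the function name is made up, and extending the rule to sizes other than 16 and 64 is an extrapolation of the pattern stated above.

```python
def critical_path_xors(n_inputs):
    """XOR gates on the first-input-to-last-output path of an n-input
    prefix circuit built as a tree of 4-input blocks (n a power of 4)."""
    levels, size = 0, 1
    while size < n_inputs:
        size *= 4
        levels += 1
    if size != n_inputs or levels == 0:
        raise ValueError("n_inputs must be a power of 4, at least 4")
    # 2 XORs per level going up, 3 at the top level, 2 per level coming down.
    return 2 * (levels - 1) + 3 + 2 * (levels - 1)


for n in (16, 64):
    print(n, critical_path_xors(n))
```

Each extra level of blocks adds 4 XOR gates to the path, so the depth grows with the logarithm of the number of inputs.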

[Figure: 16-input prefix circuit built from 4-input blocks (each with inputs I3 to I0, outputs o3 to o0, cup and cdn); overall inputs I15 to I0, outputs o15 to o0, Cup and Cdown]

Recall how division (in base 10) works. The following shows a division of 1 by 23:

Suppose we had a procedure that produced each of the digits (zeros) in the dividend, one at a time. Consider the remainders as integers from the current decimal point. So, for instance, we have the remainders 1, 10, 100, 80, 110, 180, etc. At each stage, we multiply by ten, add the incoming digit (zero in the example), then

Each incoming digit could be combined with the current remainder by multiplying the remainder by 10, adding the new digit (which is zero in this case), then seeing how many times the divisor goes into the result.
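The digit-at-a-time procedure can be sketched in a few lines of Python (an illustration, not part of the exam; the name `divide_stream` is made up). Each incoming dividend digit produces exactly one quotient digit, and only the running remainder is carried between steps.

```python
def divide_stream(dividend_digits, divisor):
    """Yield one quotient digit per incoming dividend digit."""
    remainder = 0
    for digit in dividend_digits:
        remainder = remainder * 10 + digit            # shift in the new digit
        quotient_digit, remainder = divmod(remainder, divisor)
        yield quotient_digit


# 1 / 23: the dividend is a 1 followed by zeros after the decimal point.
digits = list(divide_stream([1] + [0] * 8, 23))
print(digits)
```

The first digit is the ones place, so the list reads as 0.04347826..., which matches 1/23.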

Here is complete pseudo code for computing one of the streams (Note: we have fixed a couple of the typos):

Stream (digitnum, incoming, oddnum, sign, xsquared, termID, maxtermID) {
    ARemainder = A_REMARRAY[termID];
    ARemainder = ARemainder × 10 + incoming;

    ; This is a quotient/remainder operation
    (ADigit, ARemainder) = ARemainder / xsquared;
    A_REMARRAY[termID] = ARemainder;

    BRemainder = B_REMARRAY[termID];
    BRemainder = BRemainder × 10 + ADigit;
    (BDigit, BRemainder) = BRemainder / oddnum;
    B_REMARRAY[termID] = BRemainder;

    AddInDigit (BDigit, digitnum, sign);

    if ((termID == maxtermID) && (ADigit != 0)) {
        A_REMARRAY[termID+1] = 0;
        B_REMARRAY[termID+1] = 0;    /* This was missing originally */
        maxtermID++;
    }

    if (termID < maxtermID) {
        maxtermID = Stream (digitnum, ADigit, (oddnum+2), -sign, xsquared, (termID+1), maxtermID);
    }
    return maxtermID;    /* This was missing originally */
}


Problem 3a: Write MIPS assembly for this pseudo code. Make sure to adhere to MIPS conventions. Assume that A_REMARRAY[] and B_REMARRAY[] are word arrays that are addressed via constants (assume that you can use the la pseudo instruction to load their addresses into registers). Also, assume that there are 7 argument registers ($a0 - $a6) for the sake of this problem. Note that AddInDigit is a procedure call.

Stream:
    subiu $sp, $sp, 36            ; 7 args, 1 ret addr, 1 temp (ADigit)
    sw    $ra, 36($sp)            ; Save return address
    sw    $a0, 32($sp)            ; Save $a0
    <... etc ...>                 ; Save $a1 - $a5
    sw    $a6, 8($sp)             ; Save $a6
    sll   $t0, $a5, 2             ; Convert termID to word index
    la    $t1, A_REMARRAY
    addu  $t1, $t1, $t0           ; address of ARemainder
    lw    $t2, 0($t1)             ; Get ARemainder
    mul   $t2, $t2, 10            ; x10 (pseudo-instruction)
    addu  $t2, $t2, $a1           ; Add in incoming digit
    divu  $t2, $a4                ; Divide by xsquared
    mfhi  $t2                     ; New remainder
    sw    $t2, 0($t1)             ; Save it into array
    mflo  $t3                     ; Get ADigit
    sw    $t3, 4($sp)             ; Save ADigit for later
    la    $t1, B_REMARRAY
    addu  $t1, $t1, $t0           ; address of BRemainder
    lw    $t2, 0($t1)             ; Get BRemainder
    mul   $t2, $t2, 10            ; x10 (pseudo-instruction)
    addu  $t2, $t2, $t3           ; Add in ADigit
    divu  $t2, $a2                ; Divide by oddnum
    mfhi  $t2                     ; New BRemainder
    sw    $t2, 0($t1)             ; Save back into array
    move  $a2, $a3                ; sign (third arg)
    move  $a1, $a0                ; digitnum (second arg)
    mflo  $a0                     ; Get BDigit
    jal   AddInDigit
    lw    $a0, 32($sp)            ; Restore digitnum (arg 1)
    lw    $a1, 4($sp)             ; Restore ADigit to $a1
    lw    $a2, 24($sp)            ; restore oddnum
    lw    $a3, 20($sp)            ; restore sign
    lw    $a4, 16($sp)            ; restore xsquared
    lw    $a5, 12($sp)            ; restore termID
    lw    $v0, 8($sp)             ; restore maxtermID (will return)
    bne   $a5, $v0, finalcheck    ; termID != maxtermID
    beq   $a1, $zero, finalcheck  ; ADigit == 0 ($t3 is not preserved across the call)
    sll   $t0, $a5, 2             ; Recompute word index ($t0 is not preserved either)
    la    $t1, A_REMARRAY
    addu  $t1, $t1, $t0           ; address of A_REMARRAY[termID]
    sw    $zero, 4($t1)           ; store zero at A_REMARRAY[termID+1]
    la    $t1, B_REMARRAY
    addu  $t1, $t1, $t0           ; address of B_REMARRAY[termID]
    sw    $zero, 4($t1)           ; store zero at B_REMARRAY[termID+1]
    addiu $v0, $v0, 1             ; maxtermID++

finalcheck:
    bge   $a5, $v0, return        ; Recurse only if termID < maxtermID (pseudo-op)
    addiu $a2, $a2, 2             ; oddnum+2
    subu  $a3, $zero, $a3         ; sign = -sign
    addiu $a5, $a5, 1             ; termID+1
    move  $a6, $v0                ; pass the current maxtermID
    jal   Stream                  ; new maxtermID comes back in $v0

return:
    lw    $ra, 36($sp)
    addiu $sp, $sp, 36            ; restore stack
    jr    $ra                     ; return

Problem 3c: Explain the initialization of the A_REMVALUE[] and B_REMVALUE[] arrays if we were going to compute 4 arctan(1/5). What is the purpose of the termID and maxtermID parameters?

We are just going to fold the 4 into our calculations. If we let the 4 be part of the A0 computation, then every other term will be multiplied by 4 automatically (since A1 depends on A0, etc). Thus, we simply have an outer loop that produces the digits of 4/5 one at a time and feeds them to "stream". So, we will use A_REMVALUE[] and B_REMVALUE[] for all terms beyond the first one. Since each new remainder gets zeroed as it is needed, we merely have to set the first element of each array to zero. Thus, let A_REMVALUE[0] = 0 and B_REMVALUE[0] = 0.

The variable termID tracks which term of the series we are currently working on. Since the first term (the 1/x term, here 1/5) is a little special (it is not derived from other terms by dividing by x²), we will let termID=0 be the 1/(3x³) term, termID=1 be the 1/(5x⁵) term, etc. The maxtermID is the maximum term that we have produced nonzero values for up to now. Note that in the early stages of the computation, almost all terms are zero, hence we start with termID=maxtermID=0.

Problem 3d: Explain the initialization of the FINALVALUE array:

Each digit of the FINALVALUE array must be initialized to zero before it is used. Since we are walking through the "answer" one digit at a time, we can choose to initialize each digit just before we use it. (I.e. when we are working on the 10ths place, we don't care what is in the 100ths or 1000ths place, since we know to ignore it.)

Problem 3e: Write pseudo-code to compute 4 arctan(1/5) using stream(). Assume that the initializations in (3c) and (3d) have been accomplished.

FINALVALUE[0] = 0;                     ; Set ones place to zero
FINALVALUE[1] = 8;                     ; This is 4/5
A_REMVALUE[0] = B_REMVALUE[0] = 0;     ; Start with 1 term

; Handle first digit (10ths place)
maxtermID = stream(1, 8, 3, -1, 25, 0, 0);
for (digitnum = 2; true; digitnum = digitnum + 1) {
    FINALVALUE[digitnum] = 0;
    maxtermID = stream(digitnum, 0, 3, -1, 25, 0, maxtermID);
}
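To see the whole scheme run, here is a Python port of the stream pseudo code together with the driver loop. It is a sketch under stated assumptions, not the exam's reference implementation: the exam only names AddInDigit, so the `add_in_digit` body (including the carry and borrow ripple) is hypothetical; the driver stops after `NDIGITS` places instead of looping forever; and it advances `digitnum` by one place per call.

```python
import math

NDIGITS = 20                       # how many decimal places to produce
final_value = [0] * (NDIGITS + 1)  # FINALVALUE: index k holds the 10^-k digit
a_rem = [0]                        # A_REMARRAY, one remainder per term
b_rem = [0]                        # B_REMARRAY, one remainder per term


def add_in_digit(digit, digitnum, sign):
    """Hypothetical AddInDigit: add sign*digit at place digitnum and ripple
    any carry or borrow toward the more significant places."""
    pos = digitnum
    final_value[pos] += sign * digit
    while pos > 0 and not 0 <= final_value[pos] <= 9:
        carry = final_value[pos] // 10    # floor division gives -1 for a borrow
        final_value[pos] -= 10 * carry
        pos -= 1
        final_value[pos] += carry


def stream(digitnum, incoming, oddnum, sign, xsquared, term_id, maxterm_id):
    # Quotient/remainder steps: divide by xsquared, then by oddnum.
    a_digit, a_rem[term_id] = divmod(a_rem[term_id] * 10 + incoming, xsquared)
    b_digit, b_rem[term_id] = divmod(b_rem[term_id] * 10 + a_digit, oddnum)
    add_in_digit(b_digit, digitnum, sign)

    if term_id == maxterm_id and a_digit != 0:
        a_rem.append(0)            # A_REMARRAY[termID+1] = 0
        b_rem.append(0)            # B_REMARRAY[termID+1] = 0
        maxterm_id += 1
    if term_id < maxterm_id:
        maxterm_id = stream(digitnum, a_digit, oddnum + 2, -sign,
                            xsquared, term_id + 1, maxterm_id)
    return maxterm_id


final_value[1] = 8                 # 4/5 = 0.8, the first term
maxterm_id = stream(1, 8, 3, -1, 25, 0, 0)
for digitnum in range(2, NDIGITS + 1):
    maxterm_id = stream(digitnum, 0, 3, -1, 25, 0, maxterm_id)

value = sum(d * 10.0 ** -k for k, d in enumerate(final_value))
print("".join(str(d) for d in final_value[1:]), value, 4 * math.atan(0.2))
```

Every term is truncated rather than rounded at the last place, so the final few digits can be off by a small amount; the leading digits agree with 4 × atan(0.2).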

[ This page intentionally left blank]

op | rs | rt | rd | shamt | funct = MEM[PC]
op | rs | rt | Imm16 = MEM[PC]

INST   Register Transfers
ADDU   R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU   R[rd] ← R[rs] - R[rt]; PC ← PC + 4
ORI    R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LW     R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
SW     MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
BEQ    if (R[rs] == R[rt]) then PC ← PC + 4 + sign_ext(Imm16) || 00
       else PC ← PC + 4

For your reference, here is the microcode for two of the 6 MIPS instructions:

Label     ALU   SRC1  SRC2     ALUDest  Memory  MemReg  PCWrite     Sequence
Fetch     Add   PC    4                 ReadPC  IR      ALU         Seq
Dispatch  Add   PC    ExtShft                                       Dispatch
RType     Func  rs    rt                                            Seq
                               rd-ALU                               Fetch
BEQ       Sub   rs    rt                                ALUoutCond  Fetch

In this problem, we are going to add four new instructions to this data path:

jal ⇒
    PC ← zero_ext(Instr[25:0]) || 00
    R[31] ← PC + 4

add $rd, $rs, $rt ⇒
    if (R[rs] + R[rt] doesn't overflow) then
        R[rd] ← R[rs] + R[rt]
        PC ← PC + 4
    else
        EPC ← PC
        Cause ← 12
        PC ← 0x80000080

mfc0 $rd, $rt ⇒
    if ($rt == 13) then R[rd] ← Cause
    else if ($rt == 14) then R[rd] ← EPC
    PC ← PC + 4

compmul $rd, $rs, $rt ⇒
    R[rd] = (R[rs]×R[rt]) - (R[rs+1]×R[rt+1])
    R[rd+1] = (R[rs]×R[rt]) + (R[rs+1]×R[rt+1])
    PC ← PC + 4

This math was a typo. The real way to compute complex multiply is:

compmul $rd, $rs, $rt ⇒
    R[rd] = (R[rs]×R[rt]) - (R[rs+1]×R[rt+1])
    R[rd+1] = (R[rs]×R[rt+1]) + (R[rs+1]×R[rt])
    PC ← PC + 4

We will give the solution with the original spec (for fairness).

The jal instruction is familiar to you from the normal MIPS instruction set.
The add instruction is a normal add except that it causes an overflow exception if there is overflow. You need to implement the EPC (error PC) and Cause registers. Just assume that EPC gets the PC of the bad instruction and Cause gets the number 12.
The mfc0 instruction is used to get the EPC and Cause values into normal registers.
The compmul instruction does a complex multiply. It is assumed that the registers rd, rs, and rt are even registers and that the two source complex values are in R[rs], R[rs+1] (real, imaginary) and R[rt], R[rt+1] (real, imaginary), and that the results are put into R[rd] and R[rd+1] (real,imaginary).
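The difference between the spec as printed on the exam and a true complex multiply is easy to see on a small example. The Python sketch below is an illustration only: the register file is modeled as a plain list, the function names are made up, and 32-bit wraparound is ignored.

```python
def compmul_as_specified(regs, rd, rs, rt):
    """Exam spec (with the typo): both results reuse the same two products."""
    p0 = regs[rs] * regs[rt]
    p1 = regs[rs + 1] * regs[rt + 1]
    regs[rd], regs[rd + 1] = p0 - p1, p0 + p1


def compmul_corrected(regs, rd, rs, rt):
    """True complex multiply: four products are needed."""
    real = regs[rs] * regs[rt] - regs[rs + 1] * regs[rt + 1]
    imag = regs[rs] * regs[rt + 1] + regs[rs + 1] * regs[rt]
    regs[rd], regs[rd + 1] = real, imag


# (1 + 2i) in R[2], R[3] and (3 + 4i) in R[4], R[5]; result goes to R[6], R[7].
regs_spec = [0, 0, 1, 2, 3, 4, 0, 0]
regs_true = list(regs_spec)
compmul_as_specified(regs_spec, 6, 2, 4)
compmul_corrected(regs_true, 6, 2, 4)
print(regs_spec[6:8], regs_true[6:8])
```

The real parts agree, and only the imaginary part differs, which is why the buggy spec needs two multiplies instead of four.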

Problem 4a: (2 pts) How wide are microinstructions in the original datapath (answer in bits and show some work!)?

2 + 1 + 3 + 2 + 2 + 1 + 2 + 2 = 15 bits wide

The trickiest part of this computation is the PC Write field. We have to remember to represent the “do nothing” option, which means that there are actually three different values for the PC Write field.
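The bit count follows from encoding each microcode field in just enough bits to distinguish its possible values. The Python sketch below is an illustration only: pairing the eight widths in the sum with the eight column names of the microcode table, in order, is an assumption, and the helper name `field_bits` is made up.

```python
def field_bits(n_values):
    """Bits needed to encode n_values distinct options (at least 1 bit)."""
    return max(1, (n_values - 1).bit_length())


# Column names from the microcode table, paired in order with the widths in
# the sum above (this pairing is assumed, not stated in the solution).
FIELDS = ["ALU", "SRC1", "SRC2", "ALUDest", "Memory", "MemReg", "PCWrite", "Sequence"]
WIDTHS = [2, 1, 3, 2, 2, 1, 2, 2]

# PCWrite has three options: ALU, ALUoutCond, and "do nothing".
pcwrite_bits = field_bits(3)
total_bits = sum(WIDTHS)

for name, width in zip(FIELDS, WIDTHS):
    print(f"{name:9s}{width} bits")
print("total:", total_bits)
```

Forgetting the "do nothing" option would leave PCWrite with two values and one bit, giving 14 bits instead of 15.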

Problem 4b: (4 points) Draw a block diagram of a microcontroller that will support the new instructions (it will be slightly different than that required for the original instructions). Include sequencing hardware, the dispatch ROM, the microcode ROM, and decode blocks to turn the fields of the microcode into control signals. Make sure to show all of the control signals coming from somewhere. (Hint: The PCWr, PCWrCond, and PCSrc signals must come out of a block connected to the PCWrite field of the microinstruction.)

2 points were given for drawing a decent microcontroller for the old datapath. 1 point was given if the branching (exception) mechanism was implemented with a mux. Another point was given for showing some new control signals (EPCWrite is the most notable).

3) Expand PCSrc mux to take in 0x80000080.

mfc0: 4 points

1) 13 (01101) and 14 (01110) differ in their LSB, so just use a mux with the LSB of $rt as the selector to choose between Cause and EPC. Any other values of $rt are don't-cares.

2) Expand MemtoReg mux to take in the CauseOrEPC.

Alternatively, some students expanded SRC1 to be able to have the value of CauseOrEPC, but this has the disadvantage that you need to create a way for SRC to be forced to zero, and mfc0 would then require 4 instead of 3 microinstructions.
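The LSB trick for mfc0 amounts to a two-input mux. A minimal Python sketch (illustration only; the function name and the sample register values are made up):

```python
def cause_or_epc(rt, cause, epc):
    """Model of the CauseOrEPC mux: the LSB of the rt field is the select.
    rt = 13 (LSB 1) picks Cause, rt = 14 (LSB 0) picks EPC; any other rt
    value is a don't-care and simply follows its LSB."""
    return cause if rt & 1 else epc


CAUSE, EPC = 12, 0x00400020   # sample register contents for the demo
print(cause_or_epc(13, CAUSE, EPC), hex(cause_or_epc(14, CAUSE, EPC)))
```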

compmul: 4 points

Correction: The math in the original test was wrong. The spec given on the exam was:

compmul $rd, $rs, $rt ⇒
    R[rd] ← (R[rs]×R[rt]) - (R[rs+1]×R[rt+1])
    R[rd+1] ← (R[rs]×R[rt]) + (R[rs+1]×R[rt+1])
    PC ← PC + 4

This error makes the problem a bit simpler: with the buggy spec we need to calculate only two products instead of four, so this solution goes with the original instructions.

1) Add 32-bit multiplication capability. Either add the multiply operation to the ALU or put down a multiplier that takes in the same inputs as the ALU.

2) Add registers to store products.

You need at least two. Well, actually if a multiply-accumulate unit is used instead of a multiplier, you could go with just one, but that would make things complicated.

3) Expand ALUSelA and ALUSelB muxes to take in these products.

4) Add capability to read rs+1 and rt+1.

Some students did this with 5-bit adders and muxes. That’s fine, but you don’t need that much hardware because the registers are guaranteed to be even.

5) Add capability to read rd+1.
