











Main points of this past exam are: MIPS Rating, Yield Runtime, Floating Point, Lower-Performance Version, Original Processor, Original Cost, New Processor, Parallel Prefix, Possible Speedup
University of California, Berkeley
College of Engineering
Computer Science Division, EECS
Spring 2001, John Kubiatowicz
March 1, 2001
CS152 Computer Architecture and Engineering
Your Name:
SID Number:
Discussion Section:
Problem | Possible | Score
1       | 20       |
2       | 20       |
3       | 30       |
4       | 30       |
Total   | 100      |
Problem 1c: What is the CPI and MIPS rating of the new processor?
Problem 1d: What is the original cost per (working) processor?
Problem 1e: What is the new cost per (working) processor?
Problem 1f: Assume that we are considering the other direction of improving the original processor by increasing the speed of floating point. What is the best possible speedup that we could get, and what would the CPI and MIPS rating be of the new processor?
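The quantities asked for in Problems 1c through 1f follow from a few standard formulas: average CPI as a weighted sum over instruction classes, MIPS rating as clock rate divided by CPI, cost per working part as cost divided by yield, and the Amdahl's-law limit for the best possible speedup. The sketch below illustrates them. Every number in it (instruction mix, cycle counts, clock rate, cost, yield) is hypothetical and is not taken from the exam.

```python
def average_cpi(mix):
    """Average CPI from (fraction_of_instructions, cycles) pairs."""
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


def mips_rating(clock_mhz, cpi):
    """Millions of instructions per second = clock rate (MHz) / CPI."""
    return clock_mhz / cpi


def cost_per_working_part(cost_per_part, yield_fraction):
    """Spread the cost of all manufactured parts over the working ones."""
    return cost_per_part / yield_fraction


def best_possible_speedup(enhanced_time_fraction):
    """Amdahl's-law limit: the enhanced portion shrinks to zero time."""
    return 1.0 / (1.0 - enhanced_time_fraction)


# Hypothetical mix: 20% floating point at 6 cycles, 80% other at 2 cycles.
mix = [(0.2, 6.0), (0.8, 2.0)]
avg_cpi = average_cpi(mix)
rating = mips_rating(500.0, avg_cpi)      # hypothetical 500 MHz clock

# Fraction of execution TIME spent in floating point (not of instructions).
fp_time_fraction = (0.2 * 6.0) / avg_cpi
limit = best_possible_speedup(fp_time_fraction)

print(f"CPI = {avg_cpi:.2f}, MIPS = {rating:.1f}, best speedup = {limit:.2f}")
print(f"cost per working part = {cost_per_working_part(50.0, 0.8):.2f}")
```

Note that the Amdahl limit is computed from the fraction of execution time, which differs from the fraction of instructions whenever the instruction classes have different cycle counts.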
Assume the following characteristics for NAND gates:
Input load: 120 fF
Internal delay: TPlh = 0.3 ns, TPhl = 0.6 ns
Load-dependent delay: TPlhf = 0.0020 ns/fF, TPhlf = 0.0021 ns/fF
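As a reminder of how a linear delay model is used, the sketch below computes the delay of one NAND gate as internal delay plus load-dependent delay times load capacitance. It assumes the load-dependent terms are per femtofarad of load, and the load chosen (a single NAND input with no wire capacitance) is a hypothetical example rather than any circuit from the exam.

```python
# NAND parameters as stated in the problem; per-fF slopes are an assumption.
NAND_INPUT_LOAD_FF = 120.0
NAND_TPLH_NS = 0.3
NAND_TPHL_NS = 0.6
NAND_TPLHF_NS_PER_FF = 0.0020
NAND_TPHLF_NS_PER_FF = 0.0021


def gate_delay_ns(internal_ns, slope_ns_per_ff, load_ff):
    """Linear delay model: delay = internal + slope * capacitive load."""
    return internal_ns + slope_ns_per_ff * load_ff


# Hypothetical load: the output drives exactly one NAND input.
load_ff = NAND_INPUT_LOAD_FF
rise_ns = gate_delay_ns(NAND_TPLH_NS, NAND_TPLHF_NS_PER_FF, load_ff)
fall_ns = gate_delay_ns(NAND_TPHL_NS, NAND_TPHLF_NS_PER_FF, load_ff)

print(f"low-to-high: {rise_ns:.3f} ns, high-to-low: {fall_ns:.3f} ns")
```

For a multi-gate path, the same function would be applied stage by stage, with each stage's load set to the total capacitance it drives.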
Problem 2a: Suppose that we construct an XOR, as follows:
Compute the standard parameters for the linear delay models for this complex gate, assuming the parameters given above for the NAND gate. Assume that a wire doubles the input capacitance of the gate that it is attached to:
A Input Capacitance:
B Input Capacitance:
Load-dependent Delays: TPAYlhf: TPAYhlf: TPBYlhf: TPBYhlf:
Maximum Internal delays for A⇒Y: TPAYlh: TPAYhl:
Problem 2c: Now, put these 2-input blocks together to produce a 4-input block that takes I0, I1, I2, I3, and C_down and produces:
O0 = I0 ⊕ C_down
O1 = I1 ⊕ I0 ⊕ C_down
O2 = I2 ⊕ I1 ⊕ I0 ⊕ C_down
O3 = I3 ⊕ I2 ⊕ I1 ⊕ I0 ⊕ C_down
C_up = I3 ⊕ I2 ⊕ I1 ⊕ I0
Your goal is to minimize the output delay of each block.
Compute the input capacitance for each input:
Identify the critical path of your circuit and compute the unloaded delay for this path.
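The functional specification of the 4-input block can be checked with a small behavioral model. The sketch below is one possible grouping of the XORs, chosen for readability. It is not claimed to be the minimum-delay arrangement the problem asks for, and the function name `prefix_block4` is made up for this illustration.

```python
from itertools import product


def prefix_block4(i0, i1, i2, i3, c_down):
    """Behavioral model of the 4-input prefix-XOR block (bits are 0 or 1)."""
    p01 = i0 ^ i1            # shared partial result for inputs 0..1
    p23 = i2 ^ i3            # shared partial result for inputs 2..3
    c_up = p01 ^ p23         # XOR of all four inputs, independent of c_down
    o0 = i0 ^ c_down
    o1 = p01 ^ c_down
    o2 = i2 ^ o1
    o3 = c_up ^ c_down
    return (o0, o1, o2, o3), c_up


def check_block4():
    """Compare the model with the defining equations for all 32 input cases."""
    for i0, i1, i2, i3, c in product((0, 1), repeat=5):
        outs, c_up = prefix_block4(i0, i1, i2, i3, c)
        expected = (i0 ^ c, i1 ^ i0 ^ c, i2 ^ i1 ^ i0 ^ c, i3 ^ i2 ^ i1 ^ i0 ^ c)
        if outs != expected or c_up != (i3 ^ i2 ^ i1 ^ i0):
            return False
    return True


print(check_block4())
```

A model like this is only a correctness reference; the delay analysis still has to be done on the actual gate-level structure.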
Problem 2d: Finally, show how the 4-input prefix circuit can be used as a building block to produce a 16-element prefix circuit that minimizes gate reuse and which has minimal delay. What is the critical path and how many XOR gates are in it?
Hint: this is very similar to a carry-lookahead adder.
Problem 2e: How many XOR gates are in the critical path of a 64-bit parallel-prefix circuit?
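To build intuition for how prefix depth grows with width, the sketch below models one textbook structure: a recursive-halving (divide-and-conquer) prefix network over XOR with no C_down input. It reports the number of XOR levels on the longest path. This is an assumption-laden illustration, since the block-based up/down structure the exam has in mind may count gates differently, so the numbers here should not be read as the exam's expected answer.

```python
from itertools import accumulate
from operator import xor


def prefix_xor(bits):
    """Return (prefix XORs, XOR levels on the longest path) by recursive halving."""
    n = len(bits)
    if n == 1:
        return [bits[0]], 0
    mid = n // 2
    left, left_levels = prefix_xor(bits[:mid])
    right, right_levels = prefix_xor(bits[mid:])
    carry = left[-1]                       # XOR of the entire left half
    combined = left + [r ^ carry for r in right]
    return combined, max(left_levels, right_levels) + 1


bits16 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
values16, levels16 = prefix_xor(bits16)
_, levels64 = prefix_xor([0] * 64)

# The parallel result must match a simple left-to-right running XOR.
assert values16 == list(accumulate(bits16, xor))
print(levels16, levels64)
```

In this structure the level count grows as log2 of the width, at the price of a high fan-out on the `carry` signal at each level.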
Recall how division (in base 10) works. The following shows a division of 1 by 23:
Suppose we had a procedure that produced each of the digits (zeros) in the dividend, one at a time. Consider the remainders as integers from the current decimal point. So, for instance, we have the remainders 1, 10, 100, 80, 110, 180, etc. At each stage, we multiply by ten, add the incoming digit (zero in the example), then
This could be combined with the current remainder by multiplying the remainder by 10, adding the new digit (which is zero in this case), then dividing the result by the divisor to obtain the next quotient digit and the new remainder.
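The digit-at-a-time procedure described above can be sketched as a generator that consumes dividend digits one by one. The function name `divide_digit_stream` is made up for this illustration. For each incoming digit it yields the scaled remainder (remainder × 10 + digit), the quotient digit, and the new remainder.

```python
def divide_digit_stream(dividend_digits, divisor):
    """Long division one digit at a time, most significant digit first."""
    remainder = 0
    for digit in dividend_digits:
        scaled = remainder * 10 + digit           # shift in the next digit
        quotient_digit, remainder = divmod(scaled, divisor)
        yield scaled, quotient_digit, remainder


# 1 divided by 23: the dividend is the digit 1 followed by zeros.
steps = list(divide_digit_stream([1, 0, 0, 0, 0, 0, 0], 23))
scaled_values = [s for s, _, _ in steps]
quotient_digits = [q for _, q, _ in steps]

print(scaled_values)     # values divided at each stage
print(quotient_digits)   # digits of 1/23, starting with the units digit
```

The scaled values reproduce the sequence 1, 10, 100, 80, 110, 180 quoted in the text, and the quotient digits after the leading zero are the decimal expansion 0.043478...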
Here is complete pseudo code for computing one of the streams:
Stream (digitnum, incoming, oddnum, sign, xsquared, termID, maxtermID) {
    ARemainder = A_REMARRAY[termID];
    ARemainder = ARemainder × 10 + incoming;

    ; This is a quotient/remainder operation
    (ADigit, ARemainder) = ARemainder / xsquared;
    A_REMARRAY[termID] = ARemainder;

    BRemainder = B_REMARRAY[termID];
    BRemainder = BRemainder × 10 + ADigit;
    (BDigit, BRemainder) = BRemainder / oddnum;
    B_REMARRAY[termID] = BRemainder;

    AddInDigit (BDigit, digitnum, sign);

    If ((termID == maxtermID) && (ADigit != 0)) {
        A_REMARRAY[termID+1] = 0;
        maxtermID++;
    }

    If (termID < maxtermID) {
        Stream (digitnum, ADigit, (oddnum+2), -sign, xsquared, (termID+1), maxtermID);
    }
}
Problem 3a: Write MIPS assembly for this pseudo code. Make sure to adhere to MIPS conventions. Assume that A_REMARRAY[] and B_REMARRAY[] are word arrays that are addressed via constants (assume that you can use the la pseudo instruction to load their addresses into registers). Also, assume that there are 7 argument registers ($a0 - $a6) for the sake of this problem. Note that AddInDigit is a procedure call.
Problem 3c: Explain the initialization of the A_REMVALUE[] and B_REMVALUE[] arrays if we were going to compute 4 arctan. What is the purpose of the termID and maxtermID parameters?
Problem 3d: Explain the initialization of the FINALVALUE array:
Problem 3e: Write pseudo-code to compute 4 arctan using Stream(). Assume that the initializations in (3c) and (3d) have been accomplished.
In class, we made our multicycle machine support the following six MIPS instructions:
op | rs | rt | rd | shamt | funct = MEM[PC]
op | rs | rt | Imm16 = MEM[PC]

INST   Register Transfers
ADDU   R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU   R[rd] ← R[rs] - R[rt]; PC ← PC + 4
ORI    R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LW     R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
SW     MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
BEQ    if (R[rs] == R[rt]) then PC ← PC + 4 + sign_ext(Imm16) || 00
       else PC ← PC + 4

For your reference, here is the microcode for two of the 6 MIPS instructions:
Label    | ALU  | SRC1 | SRC2    | ALUDest | Memory | MemReg | PCWrite    | Sequence
Fetch    | Add  | PC   | 4       |         | ReadPC | IR     | ALU        | Seq
Dispatch | Add  | PC   | ExtShft |         |        |        |            | Dispatch
RType    | Func | rs   | rt      |         |        |        |            | Seq
         |      |      |         | rd-ALU  |        |        |            | Fetch
BEQ      | Sub  | rs   | rt      |         |        |        | ALUoutCond | Fetch
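The BEQ register transfer listed earlier combines sign extension, a shift by two (the "|| 00" concatenation), and an add. The sketch below spells that arithmetic out. The helper names are invented for this illustration, and values are kept to 32 bits with a mask.

```python
def sign_ext16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext16(imm16):
    """Zero extension simply keeps the low 16 bits."""
    return imm16 & 0xFFFF


def beq_next_pc(pc, rs_value, rt_value, imm16):
    """Next PC for BEQ: taken target is PC + 4 + (sign_ext(Imm16) << 2)."""
    if rs_value == rt_value:
        return (pc + 4 + (sign_ext16(imm16) << 2)) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF


print(hex(beq_next_pc(0x1000, 5, 5, 0x0003)))   # taken, forward branch
print(hex(beq_next_pc(0x1000, 5, 5, 0xFFFF)))   # taken, offset of -1 word
print(hex(beq_next_pc(0x1000, 5, 6, 0xFFFF)))   # not taken
```

An offset of -1 word branches back to the BEQ itself, because the offset is relative to PC + 4.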
jal
add $rd, $rs, $rt ⇒
    if (R[rs] + R[rt] doesn't overflow) then
        R[rd] ← R[rs] + R[rt]
        PC ← PC+
    else
        EPC ← PC
        Cause ← 12
        PC ← 0x
mfc0 $rd, $rt ⇒
    if ($rt == 13) then R[rd] ← Cause
    else if ($rt == 14) then R[rd] ← EPC
    PC ← PC+
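A behavioral model can make the overflow-trapping semantics of add and mfc0 concrete. The sketch below assumes 32-bit two's-complement registers and a PC increment of 4 on normal completion. The handler address `HANDLER_PC` is a hypothetical placeholder, because the address in the text above is truncated, and the dictionary-based machine state is likewise an invention for this illustration.

```python
HANDLER_PC = 0xDEAD0000   # hypothetical placeholder, NOT the exam's address


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def exec_add(state, rd, rs, rt):
    """add: write the sum unless signed overflow occurs, else take the trap."""
    total = to_signed32(state["R"][rs]) + to_signed32(state["R"][rt])
    if -(1 << 31) <= total < (1 << 31):
        state["R"][rd] = total & 0xFFFFFFFF
        state["PC"] += 4                      # assumed increment
    else:
        state["EPC"] = state["PC"]            # remember the faulting PC
        state["Cause"] = 12
        state["PC"] = HANDLER_PC


def exec_mfc0(state, rd, rt):
    """mfc0: copy Cause (register 13) or EPC (register 14) into R[rd]."""
    if rt == 13:
        state["R"][rd] = state["Cause"]
    elif rt == 14:
        state["R"][rd] = state["EPC"]
    state["PC"] += 4                          # assumed increment


state = {"R": [0] * 32, "PC": 0x400, "EPC": 0, "Cause": 0}
state["R"][1] = 0x7FFFFFFF                    # largest positive 32-bit value
state["R"][2] = 1
exec_add(state, 3, 1, 2)                      # overflows, so R[3] is untouched
exec_mfc0(state, 4, 13)                       # handler reads Cause
exec_mfc0(state, 5, 14)                       # handler reads EPC
print(state["R"][3], state["R"][4], hex(state["R"][5]))
```

The key property the microcode must preserve is visible here: on overflow the destination register is not written.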
compmul $rd, $rs, $rt ⇒
    R[rd] = (R[rs]×R[rt]) - (R[rs+1]×R[rt+1])
    R[rd+1] = (R[rs]×R[rt]) + (R[rs+1]×R[rt+1])
    PC ← PC+
This math was a typo. The real way to compute complex multiply is:
compmul $rd, $rs, $rt ⇒
    R[rd] = (R[rs]×R[rt]) - (R[rs+1]×R[rt+1])
    R[rd+1] = (R[rs]×R[rt+1]) + (R[rs+1]×R[rt])
    PC ← PC+
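The corrected compmul definition treats register pairs as (real, imaginary) and can be checked against ordinary complex multiplication. The sketch below models registers as a plain Python list of integers and ignores 32-bit wraparound, which is a simplification made for this illustration.

```python
def compmul(regs, rd, rs, rt):
    """Complex multiply on register pairs: (rs, rs+1) * (rt, rt+1) -> (rd, rd+1)."""
    a_re, a_im = regs[rs], regs[rs + 1]
    b_re, b_im = regs[rt], regs[rt + 1]
    # Compute both results before writing, so rd may overlap rs or rt.
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    regs[rd] = real
    regs[rd + 1] = imag


regs = [0] * 32
regs[2], regs[3] = 1, 2      # 1 + 2j
regs[4], regs[5] = 3, 4      # 3 + 4j
compmul(regs, 6, 2, 4)

expected = complex(1, 2) * complex(3, 4)
print(regs[6], regs[7], expected)
```

The instruction needs four multiplies and reads four registers while writing two, which is what forces the datapath and microcode changes asked about in the following problems.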
Problem 4a: How wide are microinstructions in the original datapath (answer in bits and show some work!)?
Problem 4b: Draw a block diagram of a microcontroller that will support the new instructions (it will be slightly different than that required for the original instructions). Include sequencing hardware, the dispatch ROM, the microcode ROM, and decode blocks to turn the fields of the microcode into control signals. Make sure to show all of the control signals coming from somewhere. (Hint: The PCWr, PCWrCond, and PCSrc signals must come out of a block connected to the PCWrite field of the microinstruction.)
Problem 4d: Describe changes to the microinstruction assembly language for these new instructions. How wide are your microinstructions now?
Problem 4e: Write complete microcode for the new instructions. Include the Fetch and Dispatch microinstructions. If any of the microcode for the original instructions must change, explain how. (Hint: since the original instructions did not use R[rd] as a register input, you must make sure that your changes do not mess up the original instructions.)