Computer Arithmetic in Scientific Computing - Assignment | CS 340 | Assignments Computer Science

CS 340 – Introduction to Scientific Computing

Homework Project # 4: Computer Arithmetic

Purpose of the Project:

This project was designed to make sure each student is comfortable with IEEE floating point numbers and computer arithmetic, as well as potential errors associated with floating point arithmetic in limited precision environments.

You are required to write your solutions to this homework in LaTeX. You are to submit the electronic portions of your homework (the .tex file, the .pdf file, and the graphics files) via WebCAT.

1. Suppose you are working with a machine that uses 6 bits to represent a number: 1 bit to represent

the sign, s, 2 bits to represent the exponent (characteristic), c, and 3 bits to represent the fraction

(mantissa), f.

(a) [4 points] Determine the value for “shift” here. Explain your reasoning.

(b) [6 points] Use a graphics package to draw a number line, and plot and label all of the positive

machine numbers representable with this machine on it. (Hint: you may want to color code

them for your own use, with one color for each c value.) Give the illustration here.

knew you needed to represent much smaller (in magnitude) numbers than what you’ve listed

above, give TWO ways in which you would change your number representation (in bits and/or

formula) to achieve this. Explain your reasoning.

2. Recall our Mini-Machine from class, that used 6 bits to represent a number: 1 bit to represent

the sign, s, 3 bits to represent the exponent (characteristic), c, and 2 bits to represent the fraction

(mantissa), f. In this case our floating point number was of the form

$(-1)^s \, 2^{c-3} \, (1 + f)$

Suppose that was the “short form” representation for machine numbers on the Mini-Machine, and

there’s also a “long form” available that utilizes twice as many bits to represent a number. With

the long form, the bits are distributed in the following fashion: 1 bit to represent the sign, 5 bits

to represent the exponent, and 6 bits to represent the fraction. The long form of the floating point

number is then given by

$(-1)^s \, 2^{c-15} \, (1 + f)$
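The two formulas above differ only in their field widths and shift, so a single decoder can cover both. The following is a minimal Python sketch, not part of the assignment: the function name `decode` and its parameters are hypothetical, and it assumes every exponent pattern denotes an ordinary normalized number (no patterns reserved for zero, subnormals, infinity, or NaN), since the problem statement gives no such exceptions.

```python
def decode(bits: str, exp_bits: int, frac_bits: int, shift: int) -> float:
    """Decode a sign/exponent/fraction bit string as (-1)^s * 2^(c - shift) * (1 + f).

    Assumption: all exponent patterns are treated as normalized numbers.
    """
    bits = bits.replace(" ", "")  # allow spaces between the fields for readability
    if len(bits) != 1 + exp_bits + frac_bits:
        raise ValueError("bit string has the wrong length for this format")
    s = int(bits[0], 2)                                 # sign bit
    c = int(bits[1:1 + exp_bits], 2)                    # exponent (characteristic) as an unsigned integer
    f = int(bits[1 + exp_bits:], 2) / 2 ** frac_bits    # fraction (mantissa) in [0, 1)
    return (-1) ** s * 2.0 ** (c - shift) * (1 + f)


# Short form: 1 sign bit, 3 exponent bits, 2 fraction bits, shift 3.
short_example = decode("0 100 10", exp_bits=3, frac_bits=2, shift=3)

# Long form: 1 sign bit, 5 exponent bits, 6 fraction bits, shift 15.
long_example = decode("0 10000 100000", exp_bits=5, frac_bits=6, shift=15)
```

Both example patterns have c - shift = 1 and f = 0.5, so each decodes to 2 × 1.5 = 3.0.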

(a) [5 points] Explain why all of the short form numbers can be represented in the long form.

(b) [5 points] Find machine epsilon, assuming the long format is our most precise representation of numbers.

form floating point numbers? Explain your reasoning. (You really don’t want to enumerate

these. There are at least two ways to come up with the answer without enumerating them.)

(d) The following machine number is given in the “long format”.

1 10111 011101

i. [3 points] Determine its base 10 (decimal) equivalent. Show your calculations.

ii. [4 points] Find the next larger and the next smaller machine numbers, and give each in “long format” along with its decimal equivalent. Label your answers accordingly.
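For comparison with the machine epsilon asked about in 2(b), the sketch below measures the gap between 1 and the next larger representable number for Python's built-in floats. It assumes those floats are IEEE double precision with round-to-nearest, which is a different format from the Mini-Machine long form, and it assumes the "gap above 1" definition of machine epsilon (some texts use half that gap instead). The function name `gap_above_one` is hypothetical.

```python
import sys


def gap_above_one() -> float:
    """Halve eps until adding half of it to 1.0 no longer changes the result."""
    eps = 1.0
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps


measured = gap_above_one()
# Under the stated assumptions this matches the value Python reports for its floats.
matches_reported = measured == sys.float_info.epsilon
```

The same halving idea applies by hand to any format of the form (1 + f): the gap above 1 is set by the weight of the last fraction bit.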
