Programming Assignment P3: CS270 Computer Organization

Programming Assignment P3
Floating Point in C

Programming due Tuesday, Sep. 13 at 10:00pm, late deadline at 11:59pm.

About The Assignment

This assignment is designed to teach you how to do several floating point operations in C without using either the float or double types. You will learn how to use the C language operators for binary and (&), binary or (|), and binary not (~). You will also use the C language bit shift operators (<< and >>). Finally, you will learn simple pointer operations using C's address-of operator (&) and dereference operator (*).

When you have completed this assignment you will understand how floating point values are stored in the computer, and how to perform several operations in the case where the underlying hardware/software does not provide floating point support. For example, the LC3 computer you will use later in this course has no floating point support.

First read the introductory sections below and then study the documentation for flt32.h in the Files tab to understand the details of the assignment.

Basic Bit Manipulation

The binary and operation (&) will be used to extract the value of a bit and to set a bit to 0. This relies on the fact that any bit anded with 1 results in the original bit. Also, any bit anded with 0 results in 0. The binary or operation (|) is used to set a bit to 1. This relies on the fact that any bit ored with 1 results in a 1.

You will create masks. A mask is a bit pattern that contains 0's and 1's in appropriate places so that when the binary and/or operation is performed, the result has extracted/modified the bits of interest. For example, suppose we want to extract bits 2, 3, and 4 from the 32-bit binary number 1001 0001 1010 1100 0010 1001 1111 0101 (0x91AC29F5) and clear all other bits. Remember that bit 0 is the rightmost bit. We can achieve this by anding this number with the mask below:

   value: 1001 0001 1010 1100 0010 1001 1111 0101 & (AND)
    mask: 0000 0000 0000 0000 0000 0000 0001 1100
          ---------------------------------------
  result: 0000 0000 0000 0000 0000 0000 0001 0100 = 0x14

Notice I designed this mask by placing ones in the bits I am interested in extracting while placing zeroes elsewhere. In C, this could be written as

  unsigned int value = 0x91AC29F5;
  unsigned int mask = 0x1C;
  unsigned int result = value & mask;
  printf("0x%X\n", result); // This will print 0x14

Now, suppose we want to clear bits 5, 6, and 7 from the same number while leaving the rest of the bits untouched. We can achieve this by anding this number with the mask below:

   value: 1001 0001 1010 1100 0010 1001 1111 0101 & (AND)
    mask: 1111 1111 1111 1111 1111 1111 0001 1111
          ---------------------------------------
  result: 1001 0001 1010 1100 0010 1001 0001 0101 = 0x91AC2915

Notice I designed the mask by placing zeroes in the bits I am interested in clearing while placing ones elsewhere. In C, this could be written as

  unsigned int value = 0x91AC29F5;
  unsigned int mask = 0xFFFFFF1F;
  unsigned int result = value & mask;
  printf("0x%X\n", result); // This will print 0x91AC2915

Now, suppose we want to set bits 28, 29, and 30 in the same number while leaving the rest of the bits untouched. We can achieve this by oring this number with the mask below:

   value: 1001 0001 1010 1100 0010 1001 1111 0101 | (OR)
    mask: 0111 0000 0000 0000 0000 0000 0000 0000
          ---------------------------------------
  result: 1111 0001 1010 1100 0010 1001 1111 0101 = 0xF1AC29F5

Notice I designed the mask by placing ones in the bits I am interested in setting while placing zeroes elsewhere. In C, this could be written as

  unsigned int value = 0x91AC29F5;
  unsigned int mask = 0x70000000;
  unsigned int result = value | mask;
  printf("0x%X\n", result); // This will print 0xF1AC29F5

Creating masks is an art that is mastered through practice. Sometimes, masks can be hard-coded (as above). Other times, masks depend on other variables. For example, you may not know in advance which bits you need to modify/extract. In these cases, you will need to create masks by using other operations such as negation, subtraction, and shifts (this is where creativity and hard work come in).

In this program, you will need to create masks to extract the sign/exponent/mantissa fields out of an IEEE-754-formatted number and use shift operations to convert them to values you can use. When you have computed the answer, you will use shift and bitwise operations to reassemble the parts into the correct format.

Getting Started

Perform the following steps:

1. Create a directory for this assignment.
2. Copy four files into this directory. It is easiest to right-click each link and choose Save Target As... (Save Link As... in some browsers).
   - flt32.c (complete this file)
   - flt32.h (do not modify)
   - testFlt32.c (do not modify)
   - Makefile (do not modify)
3. Open a terminal and make sure you are in the directory you created in step 1. The cd command can be used for this.
4. In the terminal, type the following command to build the executable.

    make

   You should see the following output:

    gcc -std=c11 -g -Wall -c flt32.c
    gcc -std=c11 -g -Wall -c testFlt32.c
    gcc -std=c11 -g -o testFlt32 flt32.o testFlt32.o

5. In the terminal, type ./testFlt32 and read how to run the program. There are 8 functions which you must implement. The testFlt32 program will call one of your functions depending on the arguments that you pass. For example, if you type ./testFlt32 sign -1.25, your flt32_get_sign(...) function will be called with -1.25 (represented as a 32-bit integer on which you can perform bitwise operations). This allows you to test one function at a time.
6. In the terminal, type ./testFlt32 bin -3.625 and you should see the output:
```
    dec: -1066926080  hex: 0xC0680000
    bin: 1100-0000-0110-1000-0000-0000-0000-0000    
```
What you are seeing is the internal bit pattern of the floating point value -3.625, expressed as a signed integer, in hex, and in binary.
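Decoding the three fields by hand confirms the value. Bit 31 is the sign (1), bits 30 through 23 hold the biased exponent 1000 0000 = 128, and bits 22 through 0 hold the fraction 1101 followed by zeros. Using the single-precision exponent bias of 127 and the implicit leading 1 of a normalized number:

$$
(-1)^{1} \times 1.1101_2 \times 2^{128 - 127} = -1.8125 \times 2^{1} = -3.625
$$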

You now have a functioning program. All the commands (abs, all, add, etc.) run, but only bin will produce correct results at this point.

Completing the Code

Before attempting to write any of the functions in flt32.c, study the documentation found in the Files tab. You will be especially interested in the documentation for flt32.h. Plan what you need to do before writing code.

The best way to complete the code is to follow a write/compile/test sequence. Do not attempt to write everything at once. Rather, choose one function and do the following steps.

1. Write some/all of one function in flt32.c using your favorite editor.
2. Save your changes and recompile using make. You will find it convenient to work with both a terminal and editor window at the same time.
3. Repeat steps 1 and 2 until there are no errors or warnings.
4. Test the function you have been working on. Do not attempt to move on until you complete and thoroughly test a function.
5. Repeat steps 1 through 4 for the remaining functions.

You should work on your functions in the following suggested order. The author's sample solution had approximately the line counts shown (including empty lines):

- flt32_get_sign() - 1 line of code
- flt32_get_exp() - 1 line of code
- flt32_get_val() - 1 line of code
- flt32_get_all() - 3 lines of code
- flt32_abs() - 1 line of code
- flt32_negate() - 3 lines of code
- flt32_add() - 60 lines of code
- flt32_sub() - 1 line of code

Your code may be a little longer, but apart from flt32_add(), these functions are quite simple. If any of your solutions is much longer than stated, you will want to rethink how you are approaching the problem.

Floating Point Addition

flt32_add() is the only complex function in this assignment. Many of its steps can be done by calling the support functions you have already written and thoroughly tested.

The general algorithm for floating point addition is as follows:

1. Extract the sign, exponent, and value from the 32-bit operands.
2. Adjust values so both operands have identical exponents.
3. Convert the signed magnitude operands to two's complement.
4. Do an integer addition of the values.
5. Convert the two's complement sum back to signed magnitude.
6. Normalize the result by adjusting the exponent and shifting the sum value.
7. Reassemble the sign, exponent, and value into a 32-bit value.

This is just the general outline. You may need to handle special cases. The following references will be useful:

- Explanation of functions in P3 (video)
- An example of floating point addition (PDF)
- Another example of floating point addition (video, parts 1, 2, and 3)

Important Notes

We will not consider negative zero (-0.0) to be a valid value. This would correspond to bit 31 equal to 1 and the rest of the bits equal to 0. Therefore, we will not pass this value to your functions. We also expect your flt32_abs, flt32_negate, flt32_add, and flt32_sub functions not to return a negative zero.

The only IEEE-754 special case we expect you to handle is 0.0 (when the bit pattern is all zeroes). For example, your code should be able to add/subtract any number and 0.0. It should also return 0.0 correctly when the result of the addition/subtraction is 0.0. You should also negate 0.0 correctly (the negation of 0.0 is not -0.0!).

Since flt32_add consists of multiple steps, you may want to make sure that the bit pattern you get after each step matches what you expect. For debugging purposes, we have provided you with the printBinary function. Call it with a single argument, and it will print that argument as a binary pattern. Make sure you remove all printBinary and printf calls (and any other function calls that print something to the screen) before turning in your solution. Otherwise, your program's output will not match the expected output in the auto-grader.

Grading Criteria

100 points for perfect submission.
0 points for no submission, code that will not compile, a submitted class file, etc.
Each test can make multiple calls to the function being tested, with different values.
Preliminary Tests
- testCompile: checks that program compiles. (5 points)
- test1: calls testFlt32 with exp to check flt32_get_exp. (5 points)
- test2: calls testFlt32 with sign to check flt32_get_sign (positive number). (5 points)
- test3: calls testFlt32 with sign to check flt32_get_sign (negative number). (5 points)
- test4: calls testFlt32 with val to check flt32_get_val. (5 points)
- test5: calls testFlt32 with all to check flt32_get_all. (5 points)
Final Tests
- test6: calls testFlt32 with abs to check flt32_abs (positive number). (5 points)
- test7: calls testFlt32 with abs to check flt32_abs (negative number). (5 points)
- test8: calls testFlt32 with neg to check flt32_negate (positive number). (5 points)
- test9: calls testFlt32 with neg to check flt32_negate (negative number). (5 points)
- test10: calls testFlt32 with add to check flt32_add, using operands with identical exponents. (5 points)
- test11: calls testFlt32 with sub to check flt32_sub, using operands with identical exponents. (5 points)
- test12: calls testFlt32 with add to check flt32_add, using operands with different exponents. (10 points)
- test13: calls testFlt32 with sub to check flt32_sub, using operands with different exponents. (10 points)
- test14: calls testFlt32 with add to check flt32_add, with arbitrary values. (5 points)
- test15: calls testFlt32 with add to check flt32_add, with arbitrary values. (5 points)
- test16: calls testFlt32 with sub to check flt32_sub, with arbitrary values. (5 points)
- test17: calls testFlt32 with sub to check flt32_sub, with arbitrary values. (5 points)
- Final tests will include the preliminary tests.

Submit the single file flt32.c to the Checkin tab on the course website, as you were shown in the recitation.
