cs270 Programming Assignment P3 - Floating Point in C
Programming Assignment P3: CS270 Computer Organization

Programming Assignment P3
Floating Point in C


Programming due Tuesday, Sep. 13 at 10:00pm, late deadline at 11:59pm.


About The Assignment

This assignment is designed to teach you how to perform several floating point operations in C without using either the float or double types. You will learn how to use the C language operators for binary and (&), binary or (|), and binary not (~). You will also use the C language bit shift operators (<< and >>) and learn simple pointer operations using C's address-of operator (&) and dereference operator (*).

When you have completed this assignment you will understand how floating point values are stored in the computer, and how to perform several operations in the case where the underlying hardware/software does not provide floating point support. For example, the LC3 computer you will use later in this course has no floating point support.

First read the introductory sections below and then study the documentation for flt32.h in the Files tab to understand the details of the assignment.


Basic Bit Manipulation

The binary and operation (&) will be used to extract the value of a bit and to set a bit to 0. This relies on the fact that any bit anded with 1 results in the original bit, while any bit anded with 0 results in 0. The binary or operation (|) is used to set a bit to 1. This relies on the fact that any bit ored with 1 results in a 1.

You will create masks. A mask is a bit pattern that contains 0's and 1's in appropriate places so that when the binary and/or operation is performed, the result has extracted/modified the bits of interest. For example, suppose we want to extract bits 2, 3, and 4 from the 32-bit binary number 1001 0001 1010 1100 0010 1001 1111 0101 (0x91AC29F5) and clear all other bits. Remember that bit 0 is the rightmost bit. We can achieve this by anding this number with the mask below:

   value: 1001 0001 1010 1100 0010 1001 1111 0101 & (AND)
    mask: 0000 0000 0000 0000 0000 0000 0001 1100
          ---------------------------------------
  result: 0000 0000 0000 0000 0000 0000 0001 0100 = 0x14

Notice I designed this mask by placing ones in the bits I am interested in extracting while placing zeroes elsewhere. In C, this could be written as

  unsigned int value = 0x91AC29F5;  // unsigned: this constant does not fit in a signed int
  unsigned int mask = 0x1C;
  unsigned int result = value & mask;
  printf("0x%X\n", result); // This will print 0x14

Now, suppose we want to clear bits 5, 6, and 7 from the same number while leaving the rest of the bits untouched. We can achieve this by anding this number with the mask below:

   value: 1001 0001 1010 1100 0010 1001 1111 0101 & (AND)
    mask: 1111 1111 1111 1111 1111 1111 0001 1111
          ---------------------------------------
  result: 1001 0001 1010 1100 0010 1001 0001 0101 = 0x91AC2915

Notice I designed the mask by placing zeroes in the bits I am interested in clearing while placing ones elsewhere. In C, this could be written as

  unsigned int value = 0x91AC29F5;  // unsigned: this constant does not fit in a signed int
  unsigned int mask = 0xFFFFFF1F;
  unsigned int result = value & mask;
  printf("0x%X\n", result); // This will print 0x91AC2915

Now, suppose we want to set bits 28, 29, and 30 in the same number while leaving the rest of the bits untouched. We can achieve this by oring this number with the mask below:

   value: 1001 0001 1010 1100 0010 1001 1111 0101 | (OR)
    mask: 0111 0000 0000 0000 0000 0000 0000 0000
          ---------------------------------------
  result: 1111 0001 1010 1100 0010 1001 1111 0101 = 0xF1AC29F5

Notice I designed the mask by placing ones in the bits I am interested in setting while placing zeroes elsewhere. In C, this could be written as

  unsigned int value = 0x91AC29F5;  // unsigned: this constant does not fit in a signed int
  unsigned int mask = 0x70000000;
  unsigned int result = value | mask;
  printf("0x%X\n", result); // This will print 0xF1AC29F5

Creating masks is an art that is mastered through practice. Sometimes masks can be hard-coded (as above). Other times, masks depend on other variables. For example, you may not know in advance which bits you need to modify/extract. In these cases, you will need to create masks by using other operations such as negation, subtraction, and shifts (this is where creativity and hard work come in).
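As one illustration of building masks from variables, the sketch below computes a run of ones with a shift and a subtraction, then uses ~ to turn it into a clearing mask. The helper names (make_mask, get_bits, clear_bits, set_bits) are hypothetical and are not part of the assignment files. The sketch assumes 1 <= count <= 31 and low + count <= 32.

```c
#include <stdint.h>

/* Hypothetical helper: a mask with `count` ones starting at bit `low`.
   Assumes 1 <= count <= 31 and low + count <= 32. */
uint32_t make_mask(unsigned low, unsigned count)
{
    uint32_t ones = (UINT32_C(1) << count) - 1;  /* count = 3 gives binary 111 */
    return ones << low;                          /* slide the ones into place */
}

/* Extract the field and shift it down so it starts at bit 0. */
uint32_t get_bits(uint32_t value, unsigned low, unsigned count)
{
    return (value & make_mask(low, count)) >> low;
}

/* Clear the field: ~ flips the mask so zeroes mark the bits to clear. */
uint32_t clear_bits(uint32_t value, unsigned low, unsigned count)
{
    return value & ~make_mask(low, count);
}

/* Set every bit of the field to 1. */
uint32_t set_bits(uint32_t value, unsigned low, unsigned count)
{
    return value | make_mask(low, count);
}
```

With these helpers, make_mask(2, 3) reproduces the hard-coded mask 0x1C, and clear_bits(value, 5, 3) applies the mask 0xFFFFFF1F without writing it out by hand.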

In this program, you will need to create masks to extract the sign/exponent/mantissa fields out of an IEEE-754-formatted number and use shift operations to convert them to values you can use. When you have computed the answer, you will use shift and bitwise operations to reassemble the parts into the correct format.
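To make the field layout concrete, here is a minimal sketch that assumes the standard IEEE-754 single-precision layout: sign in bit 31, an 8-bit exponent in bits 23 through 30, and a 23-bit mantissa in bits 0 through 22. The function names are hypothetical; the functions you must implement are described in flt32.h and may have different names and signatures.

```c
#include <stdint.h>

/* Hypothetical names; see flt32.h for the functions the assignment requires. */

/* Bit 31: 0 for positive, 1 for negative. */
uint32_t sign_of(uint32_t bits)
{
    return bits >> 31;
}

/* Bits 23..30: the exponent as stored (biased by 127). */
uint32_t exponent_of(uint32_t bits)
{
    return (bits >> 23) & 0xFF;
}

/* Bits 0..22: the stored mantissa, without the implicit leading 1. */
uint32_t fraction_of(uint32_t bits)
{
    return bits & 0x007FFFFF;
}

/* Reassemble the three fields into one 32-bit pattern. */
uint32_t pack(uint32_t sign, uint32_t exponent, uint32_t fraction)
{
    return (sign << 31) | ((exponent & 0xFF) << 23) | (fraction & 0x007FFFFF);
}
```

For the pattern 0xC0680000 (the value -3.625), these return sign 1, exponent 128, and mantissa 0x680000. The value is 1.1101 in binary times 2 to the power (128 - 127), which is 1.8125 * 2 = 3.625 with a negative sign.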


Getting Started

Perform the following steps:
  1. Create a directory for this assignment.
  2. Copy four files into this directory. It is easiest to right click on each link and choose Save Target As... for each of the files.
  3. Open a terminal and make sure you are in the directory you created in step 1. The cd command can be used for this.
  4. In the terminal type the following command to build the executable.
        make
    You should see the following output:
        gcc -std=c11 -g -Wall -c flt32.c
        gcc -std=c11 -g -Wall -c testFlt32.c
        gcc -std=c11 -g -o testFlt32 flt32.o testFlt32.o
  5. In the terminal type ./testFlt32 and read how to run the program. There are 8 functions which you must implement. The testFlt32 program will call one of your functions depending on the arguments that you pass. For example, if you type ./testFlt32 sign -1.25, your flt32_get_sign(...) function will be called with -1.25 (represented as a 32-bit integer on which you can perform bitwise operations). This allows you to test one function at a time.
  6. In the terminal type ./testFlt32 bin -3.625 and you should see the output:
        dec: -1066926080  hex: 0xC0680000
        bin: 1100-0000-0110-1000-0000-0000-0000-0000
    What you are seeing is the internal bit pattern of the floating point value -3.625 expressed as an integer, as hex, and as binary.
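The sketch below shows one way a floating point value can be reinterpreted as a 32-bit integer and formatted in the dashed binary style shown above. It is not the code of testFlt32 (whose source is not reproduced here); the names float_bits and format_binary are hypothetical, and the sketch assumes float is a 32-bit IEEE-754 single. It also illustrates the address-of (&) and dereference (*) operators.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 32 bits of a float into an integer without changing them.
   Assumes float is a 32-bit IEEE-754 single. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* & takes the address of each variable */
    return bits;
}

/* Write the bits as 8 groups of 4 separated by dashes.
   `out` must have room for 40 characters (32 digits, 7 dashes, 1 NUL). */
void format_binary(uint32_t bits, char *out)
{
    for (int i = 31; i >= 0; i--) {
        *out++ = ((bits >> i) & 1) ? '1' : '0';   /* * stores through the pointer */
        if (i % 4 == 0 && i != 0)
            *out++ = '-';
    }
    *out = '\0';
}
```

Calling format_binary(float_bits(-3.625f), buf) with a 40-character buffer fills buf with the same pattern as the bin: line above. Note that your own flt32.c functions must not use float or double; this sketch only illustrates where the integer pattern comes from.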

You now have a functioning program. All the commands (abs, all, add, etc.) do something. However, only bin will produce correct results at this point.


Completing the Code

Before attempting to write any of the functions in flt32.c, study the documentation found in the Files tab. You will be especially interested in the documentation for flt32.h. Plan what you need to do before writing code.

The best way to complete the code is to follow a write/compile/test sequence. Do not attempt to write everything at once. Rather, choose one function and do the following steps.

  1. Write some/all of one function in flt32.c using your favorite editor.
  2. Save your changes and recompile using make. You will find it convenient to work with both a terminal and editor window at the same time.
  3. Repeat steps 1 and 2 until there are no errors or warnings.
  4. Test the function you have been working on. Do not attempt to move on until you complete and thoroughly test a function.
  5. Repeat steps 1 through 4 for the remaining functions.

You should work on your functions in the following suggested order. A sample solution prepared by the author contained the shown approximate line counts (including empty lines):

Your code may be a little longer, but in every case, these methods are quite simple. If you find that any of your solutions is much longer than stated, you will want to think about how you are approaching the problem.

Floating Point Addition

The single function flt32_add() is the only complex function in this assignment. Many of the things you need to do can be done by calling the support methods you have already written and thoroughly tested.

The general algorithm for floating point addition is as follows:

  1. Extract the sign, exponent, and value from the 32-bit operands.
  2. Adjust values so both operands have identical exponents.
  3. Convert the signed magnitude operands to two's complement.
  4. Do an integer addition of the values.
  5. Convert the two's complement sum back to signed magnitude.
  6. Normalize the result by adjusting the exponent and shifting the sum value.
  7. Reassemble the sign, exponent and value into a 32-bit value.
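The seven steps above can be sketched as follows. This is a simplified illustration, not the sample solution: the name add_sketch is hypothetical (the required function is flt32_add(), documented in flt32.h), it handles only normalized values and 0.0, it truncates instead of rounding, and it assumes the result's exponent stays within the normalized range.

```c
#include <stdint.h>

/* Sketch only: normalized operands and 0.0, truncation, no overflow checks. */
uint32_t add_sketch(uint32_t x, uint32_t y)
{
    if (x == 0) return y;                      /* 0.0 + y is y */
    if (y == 0) return x;                      /* x + 0.0 is x */

    /* Step 1: extract fields and restore the implicit leading 1 (bit 23). */
    int32_t xe = (x >> 23) & 0xFF;
    int32_t ye = (y >> 23) & 0xFF;
    int32_t xv = (x & 0x007FFFFF) | 0x00800000;
    int32_t yv = (y & 0x007FFFFF) | 0x00800000;

    /* Step 2: shift the value with the smaller exponent right
       until both values share the larger exponent. */
    int32_t e = xe > ye ? xe : ye;
    int32_t dx = e - xe;
    int32_t dy = e - ye;
    xv = dx < 24 ? xv >> dx : 0;               /* 24 or more shifts lose every bit */
    yv = dy < 24 ? yv >> dy : 0;

    /* Step 3: signed magnitude to two's complement. */
    if (x >> 31) xv = -xv;
    if (y >> 31) yv = -yv;

    /* Step 4: plain integer addition. */
    int32_t sum = xv + yv;
    if (sum == 0) return 0;                    /* exact cancellation gives +0.0 */

    /* Step 5: two's complement back to sign and magnitude. */
    uint32_t sign = sum < 0;
    uint32_t mag = sign ? (uint32_t)(-sum) : (uint32_t)sum;

    /* Step 6: normalize so the leading 1 sits in bit 23 again. */
    while (mag >= 0x01000000) { mag >>= 1; e++; }
    while (mag <  0x00800000) { mag <<= 1; e--; }

    /* Step 7: reassemble, dropping the implicit leading 1. */
    return (sign << 31) | ((uint32_t)e << 23) | (mag & 0x007FFFFF);
}
```

For example, add_sketch(0xC0680000, 0x3FC00000) adds -3.625 and 1.5 and returns 0xC0080000, the pattern for -2.125.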
This is just the general outline. You may need to handle special cases. The following references will be useful:

Important Notes

We will not consider negative zero (-0.0) to be a valid value. This would correspond to bit 31 equal to 1 and the rest of the bits equal to 0. Therefore, we will not pass this value to your functions. We also expect your flt32_abs, flt32_negate, flt32_add, and flt32_sub functions not to return a negative zero.

The only IEEE-754 special case we expect you to handle is 0.0 (when the bit pattern is all zeroes). For example, your code should be able to add any number and 0.0, or subtract either from the other. It should also return 0.0 correctly when the result of the addition/subtraction is 0.0. You should also negate 0.0 correctly (the negation of 0.0 is not -0.0!).
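A minimal sketch of the zero rule for negation and absolute value follows. The names negate_sketch and abs_sketch are hypothetical stand-ins; the required functions are flt32_negate and flt32_abs, whose exact signatures are given in flt32.h. The sketch assumes the input is never the negative zero pattern, as stated above.

```c
#include <stdint.h>

#define SIGN_MASK UINT32_C(0x80000000)   /* only bit 31 set */

/* Flip the sign bit, except that 0.0 stays 0.0 (never -0.0). */
uint32_t negate_sketch(uint32_t bits)
{
    if (bits == 0)
        return 0;
    if (bits & SIGN_MASK)
        return bits & ~SIGN_MASK;        /* negative: clear bit 31 */
    return bits | SIGN_MASK;             /* positive: set bit 31 */
}

/* Clear the sign bit; 0.0 is unchanged because all its bits are already 0. */
uint32_t abs_sketch(uint32_t bits)
{
    return bits & ~SIGN_MASK;
}
```

Here negate_sketch(0) returns 0 rather than 0x80000000, and negate_sketch(0xC0680000) returns 0x40680000, the pattern for 3.625.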

Since flt32_add consists of multiple steps, you may want to make sure that the bit pattern you get after each step matches what you expect. For debugging purposes, we have provided you with the printBinary function. It takes a single argument and prints it as a binary pattern. Make sure you remove all printBinary and printf calls (and any other function calls that print something to the screen) before turning in your solution. Otherwise, your program will not match the expected output in the auto-grader.


Grading Criteria

Submit the single file flt32.c to the Checkin tab on the course website, as you were shown in the recitation.
© 2016 CS270 Colorado State University. All Rights Reserved.