CS 270
CS270 Programming Assignment - Symbol Table

Essentials

There are two submissions for this assignment: P4A and P4B.

The most challenging part of these assignments is becoming comfortable with the way the data structures are organized. The following resources from previous semesters will help you understand the big picture. Some of the details may be different.

See the class Progress page for due dates of these submissions.


P4A

P4A uses some basic commands that allow you to add and list entries in the symbol table. The following commands are tested in this part of the assignment. These commands require the implementation the following functions:

P4B

P4B uses all of the remaining commands, which will require implementing the rest of the functions in symbol.c. See below for the list of commands.

Commands

  init capacity    - create symbol table with given capacity, MUST be first action taken
  add name address - prints 1 on success, 0 on failure
  count            - prints count of names/addresses uses function pointers
  debug level      - turn debug on/off (0 is off)
  exit/quit        - terminates program
  get name         - prints NULL or name/address
  label address    - prints NULL or name associated with address
  list             - prints all names/addresses uses function pointers
  order option     - calls symbol_order and prints value(s), option is HASH|NAME|ADDR
  search name      - prints NULL or name/address and hash/index
  size             - call symbol_size()
  reset            - call symbol_reset()

Goals of Assignment

In this assignment you will build a symbol table. This assignment serves several purposes:

Overview

A symbol table is a data structure that relates an identifier to information about that identifier. For this assignment, the identifiers are the labels on lines of LC3 assembly language statements. The information associated with the label is its address. For more information on symbol tables see this wikipedia entry.

In this assignment, you will use a hash table with chaining. You will learn more about hash tables in your Data Structures course. The data structures you use are already defined for you in symbol.h/symbol.c.

The hash function you will use computes an integer value from a string. More than one string may hash to the same value. However, a good hash function will minimize this. You can think of the hash value as an "alias" for the string. Thus, when searching for a string, you first search for the hash value (using integer comparison, which is fast) and only when you find an equal hash value, use strcasecmp() (much slower than integer comparison) to verify you have found the string of interest.

The symbol table contains a hash_table which is an array. The index into this array is calculated as index = hash-value % size-of-hash-table. A given index in the hash table contains a pointer to the head of a singly linked list of nodes containing symbols whose hash values share the same index.

There is a second table in the symbol table structure. It is a dynamically allocated array of char* (C's version of strings), that allow a fast lookup of a symbols name given its address.

Here is a graphical representation of the data structure. Make sure you understand how the declarations relate to the drawing.


Getting Started

  1. Create a directory for this assignment and cd to it
  2. Copy the following files to the directory you created. It is easiest to right click on the link, and do a Save Target As.. for each of the files.
  3. Build the executable by typing make
  4. You now have a functional program, but it will not produce the correct results.
  5. Type ./testSymbol to start the program. Then type help to print the usage message. This program is like a shell in that you type commands, and the program performs the selected action (sometimes silently). Study the commands to understand how to test individual functions. NOTE The init must be the first command used if you want to test your code. Type exit to end the program.
  6. You are now ready to begin implementation. You have a number of functions to implement. In the reference implementation, approximately 110 line of code were added (including blank lines). Implement functions one at a time and test as you go.

Implementing symbol_init()

In this function, you must allocate space for your symbol table and initialize it appropriately. Compare malloc()/calloc(). You will also allocate and initialize addr_table[].

Here is a graphical representation of the data structure. Make sure you understand how the declarations relate to the drawing.


Implementing symbol_search()

This function will do a lot of the work needed for the functions symbol_add() and symbol_find_by_name(). The basic algorithm to search for a symbol by name is:
  1. compute the hash value for the name using symbol_hash()
  2. compute the index based on (hash-value % table-capacity)
  3. search the linked list to find the symbol. To speed thing up search first for the hash value. Only when the hash values match do you need to do a case independent string comparison.
  4. return the information if the name is found, or NULL otherwise
You can test implementation by using the command search in the test driver.

Implementing symbol_add()

This function adds a symbol to the symbol table if it does not already exist. If it does not exist, you must allocate memory for the appropriate structure, initialize the fields of the structure appropriately and store in at the correct index in the symbol table. The entry at a given index of the symbol table is a pointer to the head of a linked list of nodes. The symbol just created should become the new head of the list. You may test your implementation using the add command with the test driver.

A hash table is designed to spread out the entries, often sacrificing space for speed. To thoroughly test your function, you will need to force multiple names to occupy the same index in the hash table. Think about the pigeon hole principle from CS161, study the code and you should understand what to modify to test your code. Your code will be tested with a different values.

Also remember to store a char * that point to the symbols name in the appropriate location in the addr_table


Implementing the remaining functions

Implement the remaining functions one by one and test them as you go.
  1. The function symbol_iterate() is only a few (10) lines of code. It consists of nested loops with the outer loop varying over the symbol table and the inner loop varying over the linked list at the current index. Using a function pointer is really straight forward. The actual line of code you will use is:
    
        (*fnc)(symbol, data); 
    This is almost identical to calling a normal function named fnc() except that you would write:
    
        fnc(symbol, data);
    You can test your function using the commands count/list in the test driver
  2. The function symbol_find_by_name() is only a few lines of code. Most of the work is done by symbol_search(). Test by using the command get in the test driver.
  3. The function symbol_reset() consist of two nested loops as you wrote for symbol_iterate(). For each node you will need to free any allocated memory. Test it with the command reset in the test driver.
  4. The function symbol_term() may use symbol_reset to do most of the work. Be sure and deallocate the main symbol table as well. symbol_term will only be called once during testing of your code. This will happen when the test driver terminates.
  5. The function symbol_order() creates a dynamically allocated array that you will need to fill with pointers to all symbols in the Symbol Table. You are not responsible for freeing this array. For an extra challenge try passing a function to symbol_iterate() to fill the array. Once the array is filled with symbols it must be sorted based on the order parameter. order can have the values HASH, NAME or ADDR. For the sorting part of symbol_order(), you may want to use the C library function qsort(). Here is an example of using qsort.
  6. Implement the remaining functions found in symbol.c


Testing

Thoroughly test your code using the test driver. You may find it convenient to build a file containing a series of commands to execute, then use this file to test your program without having to type in commands over and over. Here is a basic example of a testFile. You will need to extend this file to fully test the functionality of your code.
  init 3
  add i 111
  add x 21
  add i 111
  add z 37
  add y 323
  list
To run the test driver with this file, assuming the file is called testFile and is in the same directory as the testSymbol program,use the command:
   ./testSymbol [-debug] < testFile
Once you have your code working use valgrind to ensure that there are no memory leaks or invalid memory accesses. valgrind will be used by the Checkin program to test your code. In general all malloc/calloc calls in your code will need to have a corresponding call to free for your code to not leak memory, the only exception is the array of pointers allocated in symbol_order(). You also need to ensure that you don't derefference a pointer after you have freed its associated memory. You can run valgrind with this command:
    valgrind [-debug] ./testSymbol < testFile 
   

Checking in Your Code

You will submit the single file symbol.c using the checkin program. To do this using the terminal use the command:
P4A

  ~cs270/bin/checkin P4A symbol.c 
    
P4B
  ~cs270/bin/checkin P4B symbol.c