Essentials
There are two submissions for this assignment:
P4A and
P4B.
The most challenging part of these assignments is becoming comfortable with the way the data structures are organized.
The following resources from previous semesters will help you understand the big picture.
Some of the details may be different.
- A video about symbol tables and hash tables.
- A video about data structure organization.
- A visualization of the data structure organization.
It also shows you the declarations of the C structures used in this assignment.
It is very important to understand how the declarations relate to the drawing.
You will find it useful to have this page open while you code.
See the class Progress page for due dates of these submissions.
P4A
P4A uses some basic commands that allow you to add and list entries in the symbol table.
The following commands are tested in this part of the assignment.
- init capacity
- add name address
- list
- exit/quit
These commands require the implementation the following functions:
- symbol_init()
- symbol_add()
- symbol_iterate()
- symbol_search()
P4B
P4B uses all of the remaining commands, which will require implementing the rest of the functions in
symbol.c. See below for the list of commands.
Commands
init capacity - create symbol table with given capacity, MUST be first action taken
add name address - prints 1 on success, 0 on failure
count - prints count of names/addresses uses function pointers
debug level - turn debug on/off (0 is off)
exit/quit - terminates program
get name - prints NULL or name/address
label address - prints NULL or name associated with address
list - prints all names/addresses uses function pointers
order option - calls symbol_order and prints value(s), option is HASH|NAME|ADDR
search name - prints NULL or name/address and hash/index
size - call symbol_size()
reset - call symbol_reset()
Goals of Assignment
In this assignment you will build a symbol table. This assignment serves
several purposes:
- Learn about C structures (struct) and how to embed one
structure within another.
- Learn how to use C pointers and the address-of operator
&
- Learn how to allocate and free memory using malloc()/calloc()/free()
- Learn a little bit about hash tables
- Learn about function pointers
- Learn to use a simple Debugging framework
- Build a module that will be reused in future assignment(s).
Overview
A symbol table is a data structure that relates an identifier to information
about that identifier. For this assignment, the identifiers are the labels on
lines of LC3 assembly language statements. The information associated with the
label is its address. For more information on symbol tables see this
wikipedia entry.
In this assignment, you will use a hash table with chaining. You will learn
more about hash tables in your Data Structures course. The data structures you
use are already defined for you in symbol.h/symbol.c.
The hash function you will use computes an integer value from a string. More
than one string may hash to the same value. However, a good hash
function will minimize this. You can think of the hash value as an "alias" for
the string. Thus, when searching for a string, you first search for the hash
value (using integer comparison, which is fast) and only when you find an equal
hash value, use strcasecmp() (much slower than integer comparison)
to verify you have found the string of interest.
The symbol table contains a hash_table which is an array. The
index into this array is calculated as index = hash-value % size-of-hash-table.
A given index in the hash table contains a pointer to the head of a singly
linked list of nodes containing symbols whose hash values share the same index.
There is a second table in the symbol table structure. It is a
dynamically allocated array of char* (C's version of strings),
that allow a fast lookup of a symbols name given its address.
Here is a graphical representation of the data
structure. Make sure you understand how the declarations relate to the drawing.
Getting Started
- Create a directory for this assignment and cd to it
- Copy the following files to the directory you created. It is easiest to
right click on the link, and do a Save Target As.. for each
of the files.
- Build the executable by typing make
- You now have a functional program, but it will not produce the
correct results.
- Type ./testSymbol to start the program. Then type help
to print the usage message. This program is like a shell in that you type
commands, and the program performs the selected action (sometimes silently).
Study the commands to understand how to test individual functions.
NOTE The init must be the first command used if you
want to test your code. Type exit to end the program.
- You are now ready to begin implementation. You have a number of functions to
implement. In the reference implementation, approximately 110 line of code
were added (including blank lines). Implement functions one at a time and
test as you go.
Implementing symbol_init()
In this function, you must allocate space for your symbol table and initialize
it appropriately. Compare
malloc()/calloc(). You will also
allocate and initialize
addr_table[].
Here is a graphical representation of the data
structure. Make sure you understand how the declarations relate to the drawing.
Implementing symbol_search()
This function will do a lot of the work needed for the functions
symbol_add() and
symbol_find_by_name(). The basic
algorithm to search for a symbol by name is:
- compute the hash value for the name using symbol_hash()
- compute the index based on (hash-value % table-capacity)
- search the linked list to find the symbol. To speed thing up search first
for the hash value. Only when the hash values match do you need to do a
case independent string comparison.
- return the information if the name is found, or NULL otherwise
You can test implementation by using the command
search in the
test driver.
Implementing symbol_add()
This function adds a symbol to the symbol table if it does not already exist.
If it does not exist, you must allocate memory for the appropriate structure,
initialize the fields of the structure appropriately and store in at the
correct index in the symbol table. The entry at a given index of the symbol
table is a pointer to the head of a linked list of nodes. The symbol just
created should become the new
head of the list. You may test your
implementation using the
add command with the test driver.
A hash table is designed to spread out the entries, often sacrificing space for
speed. To thoroughly test your function, you will need to force multiple names
to occupy the same index in the hash table. Think about the pigeon hole
principle from CS161, study the code and you should understand what to
modify to test your code. Your code will be tested with a different values.
Also remember to store a char * that point to the symbols name in the appropriate location in the addr_table
Implementing the remaining functions
Implement the remaining functions one by one and test them as you go.
- The function symbol_iterate() is only a few (10) lines of code.
It consists of nested loops with the outer loop varying over the symbol table
and the inner loop varying over the linked list at the current index. Using
a function pointer is really straight forward. The actual line of code you will
use is:
(*fnc)(symbol, data);
This is almost identical to calling a normal function named fnc()
except that you would write:
fnc(symbol, data);
You can test your function using the commands count/list in the test
driver
- The function symbol_find_by_name() is only a few lines of code.
Most of the work is done by symbol_search(). Test by using the
command get in the test driver.
- The function symbol_reset() consist of two nested loops as you
wrote for symbol_iterate(). For each node you will need to free
any allocated memory. Test it with the command reset in the test
driver.
- The function symbol_term() may use symbol_reset to
do most of the work. Be sure and deallocate the main symbol table as well. symbol_term will only be called once during testing of your code. This will happen when the test driver terminates.
- The function symbol_order() creates a dynamically allocated array that you will need to fill with pointers to all symbols in the Symbol Table. You are not responsible for freeing this array. For an extra challenge try passing a function to symbol_iterate() to fill the array. Once the array is filled with symbols it must be sorted based on the order parameter. order can have the values HASH, NAME or ADDR. For the sorting part of symbol_order(), you may want to use the C library function qsort(). Here is an example of using qsort.
- Implement the remaining functions found in symbol.c
Testing
Thoroughly test your code using the test driver. You may find it convenient to
build a file containing a series of commands to execute, then use this file to
test your program without having to type in commands over and over. Here is a basic example of a testFile. You will need to extend this file to fully test the functionality of your code.
init 3
add i 111
add x 21
add i 111
add z 37
add y 323
list
To run the test driver with this file, assuming the file is called testFile and is in the same directory as the
testSymbol program,use the command:
./testSymbol [-debug] < testFile
Once you have your code working use
valgrind to ensure that there are no memory leaks or invalid memory accesses.
valgrind will be used by the
Checkin program to test your code. In general all
malloc/calloc calls in your code will need to have a corresponding call to
free for your code to not leak memory, the only exception is the array of pointers allocated in
symbol_order(). You also need to ensure that you don't derefference a pointer after you have freed its associated memory. You can run
valgrind with this command:
valgrind [-debug] ./testSymbol < testFile
Checking in Your Code
You will submit the single file
symbol.c using the
checkin program. To do this using the terminal use the command:
P4A
~cs270/bin/checkin P4A symbol.c
P4B
~cs270/bin/checkin P4B symbol.c