Essentials
There are two submissions for this assignment:
P4A and
P4B.
The most challenging part of these assignments is becoming comfortable with the way the data structures are organized.
The following resources from previous semesters will help you understand the big picture.
Some of the details may be different.
- A video about symbol tables and hash tables.
- A video about data structure organization.
- A visualization of the data structure organization.
It also shows you the declarations of the C structures used in this assignment.
It is very important to understand how the declarations relate to the drawing.
You will find it useful to have this page open while you code.
See the class Progress page for due dates of these submissions.
P4A
P4A uses some basic commands that allow you to add and list entries in the symbol table.
The following commands are tested in this part of the assignment.
- init capacity
- add name address
- list
- exit/quit
These commands require the implementation of any functions used to perform these operations, such as:
- symbol_init()
- symbol_add()
- symbol_iterate()
You might find it useful to also implement
symbol_search() for use in the
symbol_add() function.
P4B
P4B uses all of the remaining commands.
You can run the program and type
help to view a list of all commands which will be tested.
This will require you to implement any functions in
symbol.c that were not implemented in P4A.
Goals of Assignment
In this assignment you will build a symbol table. This assignment serves
several purposes:
- Learn about C structures (struct) and how to embed one
structure within another.
- Learn how to use C pointers and the address-of operator
&
- Learn how to allocate and free memory using malloc()/calloc()/free()
- Learn a little bit about hash tables
- Learn about function pointers
- Learn to use a simple Debugging framework
- Build a module that will be reused in future assignment(s).
Overview
A symbol table is a data structure that relates an identifier to information
about that identifier. For this assignment, the identifiers are the labels on
lines of LC3 assembly language statements. The information associated with the
label is its address. For more information on symbol tables see this
wikipedia entry.
In this assignment, you will use a hash table with chaining. You will learn
more about hash tables in your Data Structures course. The data structures you
use are already defined for you in symbol.h/symbol.c.
The hash function you will use computes an integer value from a string. More
than one string may hash to the same value. However, a good hash
function will minimize this. You can think of the hash value as an "alias" for
the string. Thus, when searching for a string, you first search for the hash
value (using integer comparison, which is fast) and only when you find an equal
hash value, use strcasecmp() (much slower than integer comparison)
to verify you have found the string of interest.
The symbol table contains a hash_table which is an array. The
index into this array is calculated as index = hash-value % size-of-hash-table.
A given index in the hash table contains a pointer to the head of a singly
linked list of nodes containing symbols whose hash values share the same index.
There is a second table in the symbol table structure. It is a
dynamically allocated array of char* (C's version of strings),
that allow a fast lookup of a symbols name given its address.
Here is a graphical representation of the data
structure. Make sure you understand how the declarations relate to the drawing.
Getting Started
- Create a directory for this assignment and cd to it
- Copy the following files to the directory you created. It is easiest to
right click on the link, and do a Save Target As.. for each
of the files.
- Build the executable by typing make
- You now have a functional program, but it will not produce the
correct results.
- Type ./testSymbol to start the program. Then type help
to print the usage message. This program is like a shell in that you type
commands, and the program performs the selected action (sometimes silently).
Study the commands to understand how to test individual functions.
NOTE The init must be the first command used if you
want to test your code. Type exit to end the program.
- You are now ready to begin implementation. You have a number of functions to
implement. In the reference implementation, approximately 110 line of code
were added (including blank lines). Implement functions one at a time and
test as you go.
Implementing symbol_init()
In this function, you must allocate space for your symbol table and initialize
it appropriately. Compare
malloc()/calloc(). You will also
allocate and initialize
addr_table[].
Here is a graphical representation of the data
structure. Make sure you understand how the declarations relate to the drawing.
Implementing symbol_search()
This function will do a lot of the work needed for the functions
symbol_add() and
symbol_find_by_name(). The basic
algorithm to search for a symbol by name is:
- compute the hash value for the name using symbol_hash()
- compute the index based on (hash-value % table-capacity)
- search the linked list to find the symbol. To speed thing up search first
for the hash value. Only when the hash values match do you need to do a
case independent string comparison.
- return the information if the name is found, or NULL otherwise
You can test implementation by using the command
search in the
test driver.
Implementing symbol_add()
This function adds a symbol to the symbol table if it does not already exist.
If it does not exist, you must allocate memory for the appropriate structure,
initialize the fields of the structure appropriately and store in at the
correct index in the symbol table. The entry at a given index of the symbol
table is a pointer to the head of a linked list of nodes. The symbol just
created should become the new
head of the list. You may test your
implementation using the
add command with the test driver.
A hash table is designed to spread out the entries, often sacrificing space for
speed. To thoroughly test your function, you will need to force multiple names
to occupy the same index in the hash table. Think about the pigeon hole
principle from CS161, study the code and you should understand what to
modify to test your code. Your code will be tested with a different values.
Implementing the remaining functions
Implement the remaining functions one by one and test them as you go.
- The function symbol_iterate() is only a few (10) lines of code.
It consists of nested loops with the outer loop varying over the symbol table
and the inner loop varying over the linked list at the current index. Using
a function pointer is really straight forward. The actual line of code you will
use is:
(*fnc)(symbol, data);
This is almost identical to calling a normal function named fnc()
except that you would write:
fnc(symbol, data);
You can test your function using the commands count/list in the test
driver
- The function symbol_find_by_name() is only a few lines of code.
Most of the work is done by symbol_search(). Test by using the
command get in the test driver.
- The function symbol_reset() consist of two nested loops as you
wrote for symbol_iterate(). For each node you will need to free
any allocated memory. Test it with the command reset in the test
driver.
- The function symbol_term() may use symbol_reset to
do most of the work. Be sure and deallocate the main symbol table as well. This
will be tested when the test driver terminates.
- Implement the remaining functions defined in the TODO list.
Thoroughly test your code using the test driver. You may find it convenient to
build a file containing a series of commands to execute, then use this file to
test your program without having to type in commands over and over.
Execute
./testSymbol [-debug] < testFile
Checking in Your Code
You will submit the single file
symbol.c using the
checkin program. Use the key
SYMBOL. At the terminal
type:
~cs270/bin/checkin SYMBOL symbol.c
(if that doesn't work, try typing: ~cs270/bin/checkin P4A symbol.c)