My Project
|
struct
) and how to embed one
structure within another.&
malloc()/calloc()/free()
If we find errors in the assignment, we will post them here. Make sure you check this often.
symbol_init()
, we said "the addr_table
member of the sym_table_t
structure allocated previously must be set to point
to the start of the hash table." It should be "start of the address table" instead of
"start of the hash table."
In this assignment, you will use a hash table with chaining. You will learn
more about hash tables in your Data Structures course. The data structures you
use are already defined for you in symbol.h/symbol.c
.
The hash function you will use computes an integer value from a string. More
than one string may hash to the same value. However, a good hash
function will minimize this. You can think of the hash value as an "alias" for
the string. Thus, when searching for a string, you first search for the hash
value (using integer comparison, which is fast) and only when you find an equal
hash value, use strcasecmp()
(much slower than integer comparison)
to verify you have found the string of interest.
The symbol table contains a hash_table
which is an array. The
index into this array is calculated as index = hash-value % capacity-of-hash-table
.
A given index in the hash table contains a pointer to the head of a singly
linked list of nodes containing symbols whose hash values share the same index.
There is a second table in the symbol table structure called the address
table. It is a dynamically allocated array of char*
(C's
version of strings) that allows for a fast lookup of a symbol's name given its
address.
cd
to itSave Target As..
for each
of the files.
make
./testSymbol
. You will see a usage message which tells
you how to use the program. You may also use help
within the
program to reprint the message. The program takes only one argument: the
size of the hash table. All functions are invoked through commands. For
example (user input is shown in blue):
./testSymbol 3
add label1 12
OK
add label2 35
OK
quit
symbol_init()
(5 lines)symbol_add_unique()
(15 lines)symbol_find_by_addr()
(1 line)symbol_iterate()
(7 lines)The approximate line counts shown above are based on our solution and include empty lines and comments.
Please refer to the Files tab for details about each function (you are
mostly interested in the documentation for symbol.h
).
The most challenging part of this assignment is becoming comfortable with the way the data structures are organized. The following resources will help you understand the big picture:
symbol_search()
(13 lines)symbol_add()
(8 lines)symbol_find_by_name()
(9 lines)symbol_reset()
(20 lines)symbol_term()
(4 lines)The approximate line counts shown above are based on our solution and include empty lines and comments.
We encourage you to use Fritz Sieker's debugging framework. He has provided a
function called debug()
. You don't have to include any extra header
files. You can use this function like a printf. The difference is that the
debug()
function will print stuff only if you run your program with
the -debug
option as the first argument. This way, if you don't
want to see the debugging output, you don't have to remove the calls to
debug()
. You can simply remove the -debug
option from
the command line. We have provided you with an example of how to call the
debug()
function in the symbol_init
function. To see
the output, run the program as follows:
./testSymbol -debug 20
DEBUG symbol.c[59] symbol_init() symbol_init was called with table_size = 20
You don't have to remove the debug()
calls before submitting
your program because we will run it without the -debug
option.
However, if you choose not to use
this framework, make sure to remove all statements that print something before
submitting. In other words, if I were to run your program without the
-debug
option, I should not see any extra output. Otherwise, the
autograder won't be able to match your output to the expected output.
Valgrind is a popular program that allows you to track memory errors (such as segmentation faults) and memory leaks. To run your program through Valgrind and check for memory leaks, use the following command:
valgrind --leak-check=yes ./testSymbol size
Here is an example. I ran my program and got the following:
./testSymbol 20
add askbjs 22
Segmentation fault (core dumped)
A segmentation fault usually indicates that you tried to access memory you weren't supposed to access. I ran my program using Valgrind and got the following (only part of the output is shown):
valgrind --leak-check=yes ./testSymbol 20
==28069== Memcheck, a memory error detector
==28069== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28069== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28069== Command: ./testSymbol 20
==28069==
add askbjs 22
==28069== Invalid read of size 8
==28069== at 0x400C93: symbol_search (symbol.c:106)
==28069== by 0x400D0D: symbol_add (symbol.c:118)
==28069== by 0x401165: main (testSymbol.c:154)
==28069== Address 0x51fa0b8 is 16 bytes after a block of size 8 alloc'd
==28069== at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==28069== by 0x400A81: symbol_init (symbol.c:60)
==28069== by 0x4010C7: main (testSymbol.c:138)
==28069==
==28069== Invalid read of size 8
==28069== at 0x400B31: symbol_add_unique (symbol.c:75)
==28069== by 0x400D28: symbol_add (symbol.c:119)
==28069== by 0x401165: main (testSymbol.c:154)
==28069== Address 0x51fa0b8 is 16 bytes after a block of size 8 alloc'd
==28069== at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==28069== by 0x400A81: symbol_init (symbol.c:60)
==28069== by 0x4010C7: main (testSymbol.c:138)
==28069==
==28069== Invalid write of size 8
==28069== at 0x400B54: symbol_add_unique (symbol.c:78)
==28069== by 0x400D28: symbol_add (symbol.c:119)
==28069== by 0x401165: main (testSymbol.c:154)
==28069== Address 0x51fa0b8 is 16 bytes after a block of size 8 alloc'd
==28069== at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==28069== by 0x400A81: symbol_init (symbol.c:60)
==28069== by 0x4010C7: main (testSymbol.c:138)
==28069==
OK
(You can quit from Valgrind prematurely using Ctrl-C)
It gave me a bunch of invalid reads and writes. I started by looking into the
first invalid read. It shows me the stack trace at the point where the invalid
read occurred: main
called symbol_add
and
symbol_add
called symbol_search
. So, the invalid read
occurred on line 106 of my symbol.c file. It also shows me the cause for the
invalid read. It says: "Address 0x51fa0b8 is 16 bytes after a block of size 8
alloc'd." This is basically saying that I previously allocated a block of 8
bytes and I'm trying to access something out of bounds. It even tells me where
in my code I allocated that block: line 60 of symbol.c. I went to that line and,
sure enough, I realized that I hadn't allocated enough space for my hash table.
I corrected the error and re-ran with Valgrind:
valgrind --leak-check=yes ./testSymbol 20
==28136== Memcheck, a memory error detector
==28136== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28136== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28136== Command: ./testSymbol 20
==28136==
add askbjs 22
OK
quit
==28136==
==28136== HEAP SUMMARY:
==28136== in use at exit: 0 bytes in 0 blocks
==28136== total heap usage: 6 allocs, 6 frees, 525,535 bytes allocated
==28136==
==28136== All heap blocks were freed -- no leaks are possible
==28136==
==28136== For counts of detected and suppressed errors, rerun with: -v
==28136== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I no longer get memory errors. Also, note that it says "All heap blocks were
freed -- no leaks are possible." This means that my symbol_term
function freed all dynamically allocated memory correctly. It's important to
note that you have to try your program with different test cases. One single run
of Valgrind does not guarantee that your program is correct. Also, note that
the program must terminate before Valgrind shows you information about memory
leaks.
Valgrind will come in handy when you get a segmentation fault or when you
want to test your symbol_reset
and symbol_term
functions.
Here's what a run with memory leaks would look like:
valgrind --leak-check=yes ./testSymbol 20
==28299== Memcheck, a memory error detector
==28299== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28299== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28299== Command: ./testSymbol 20
==28299==
add test 1
OK
add test2 2
OK
quit
==28299==
==28299== HEAP SUMMARY:
==28299== in use at exit: 11 bytes in 2 blocks
==28299== total heap usage: 8 allocs, 6 frees, 525,571 bytes allocated
==28299==
==28299== 11 bytes in 2 blocks are definitely lost in loss record 1 of 1
==28299== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==28299== by 0x4EC0079: strdup (in /usr/lib64/libc-2.23.so)
==28299== by 0x400AFF: symbol_add_unique (symbol.c:72)
==28299== by 0x400D2B: symbol_add (symbol.c:119)
==28299== by 0x401158: main (testSymbol.c:154)
==28299==
==28299== LEAK SUMMARY:
==28299== definitely lost: 11 bytes in 2 blocks
==28299== indirectly lost: 0 bytes in 0 blocks
==28299== possibly lost: 0 bytes in 0 blocks
==28299== still reachable: 0 bytes in 0 blocks
==28299== suppressed: 0 bytes in 0 blocks
==28299==
==28299== For counts of detected and suppressed errors, rerun with: -v
==28299== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
This is telling me that I didn't free something that I allocated in line 72 of my symbol.c.
Thoroughly test your code using the test driver and Valgrind. You may find it convenient to build a file containing a series of commands to execute, then use this file to test your program without having to type in commands over and over. Execute
./testSymbol size < testFile
The <
operator in this context means that the contents of
testFile
are going to be redirected to the stdin
stream
of the program. You are effectively automating user input.
Preliminary testing (15 points):
symbol_add_unique()
and symbol_find_by_addr()
with 1 symbol (4 points)
symbol_add_unique()
and symbol_find_by_addr()
with multiple symbols (no duplicate names or addresses) and no hash table collisions (4 points)
symbol_add_unique()
and symbol_find_by_addr()
with multiple symbols (no duplicate names but with duplicate addresses) and no hash table collisions (4 points)
symbol_add_unique()
,
symbol_find_by_addr()
, and symbol_iterate()
.
Due to the nature of auto-grading, there is a potential for double
penalization. In other words, if you got symbol_add_unique()
wrong, then you probably will output the wrong thing for the other functions.
Therefore, test your program well.In part B, we will re-test part A. Hence, the total number of points for P4 is 150: 50 for part A which was due on 2/14, 50 for the re-test of part A, and 50 for the part B functions.
Preliminary testing (27 points):
symbol_add_unique()
and symbol_search()
with 1 symbol (4 points)
symbol_add_unique()
and symbol_search()
with multiple symbols (no duplicate names or addresses) and no hash table collisions (4 points)
symbol_add_unique()
and symbol_search()
with multiple symbols (no duplicate names but with duplicate addresses) and no hash table collisions (4 points)
Final testing includes all preliminary testing.
symbol.c
in the Checkin tab of the
course web page.