Hash Lab

In this lab, we will look at a simple hashing container. The files are in ~cs253/Lab/Hash. Copy them somewhere local. Compile like this: g++ -std=c++17 -Wall main.cc, which must not produce any warnings or errors.

You will create a tar file called results.tar, and turn it in for credit.

A Hash Template

In main.cc, we have a program that uses a templated container called hset (for hash set), which is implemented in the other files. An hset is much like a set—it takes a single (for now) template argument, that is, the type to be contained. An hset is not the standard C++ container unordered_set, though they are both hash tables.

Compile main.cc and run it. Observe that the order of the output is not the same as the order of the input, because the hash table stores the data in hashed order.

Understanding the code

Look at hset.h, hset_iter.h, and hasher.h, and understand the following:

class hset: a hash table implemented as a vector of lists.
class hset_iter: an iterator class used by hset, containing a reference to the vector, and an integer.
class Hasher: a class that only defines functors.

Exercises

An hset is a vector of lists, and the vector is stuck at size 5. Change that size to be an optional second template parameter (not a constructor argument), which defaults to 5. We don’t have dynamic rehashing, but at least the user can set a reasonable size for the vector.
Label your change with // Exercise 1.
Add .empty() to hset. Make it more efficient than just .size()==0, without adding a data member to keep track of the size. Keep in mind that .empty() doesn’t need to calculate the size—it just has to know if the container is empty.
Label your changes with // Exercise 2.
Add an hset<double> to main(), and insert the values (1.1, 1.2, 1.3, …, 1.8, 1.9). Observe that it works poorly, because it calls Hasher::operator(int), and the doubles just get truncated. Improve Hasher to hash doubles better. It must not put all of (1.1 … 1.9) in the same bucket.
Label your changes with // Exercise 3.
The design of hset_iter is horrible. All that it has is an index into the entire hash table, so, whenever hset_iter::operator*() is called, it has to walk the vector, looking for which list contains this item. The vector can be very large for a big hash table.
Change hset_iter so that, instead of having only a single index into the entire hash table, it has two indices: one that indexes into the vector, and another that indexes into the list. We can index into a vector quickly, using []. Indexing into a list is slow, but we can assume that each individual list is small. Both indices start at 0. operator++ increments the list index, and if that goes past the end of the list, change the vector index to next list and reset the list index to 0. What should you do if the next list is empty?
I suggest that you name the two indices vector_index and list_index. I can hear you thinking: “Those names are much too long, and, besides, I already I already have the variable num. I’ll just re-use num and add another index called something short, like n.” Really? Consider how much time you’ll spend trying to remember the difference between num and n, which is which? However, you saved nearly ten seconds on typing. Good work!
Make sure that your new iterator works for an hset that doesn’t have anything in the first bucket. Don’t just assume it—try it.
We will assume that this changes many places in hset_iter.h, so don’t bother labeling those. However, label your changes in hset.h with // Exercise 4.
Create the tar file results.tar. Type carefully, or copy & paste!

tar -cvf results.tar hasher.h hset.h hset_iter.h main.cc

How to submit your work:

In Canvas, check in the file results.tar to the assignment “Lab14”. It’s due 11:59ᴘᴍ MT Saturday, with a five-day late period.

How to receive negative points:

Turn in someone else’s work.

CS253: Software Development with C++