Computer Science C++ Programming

For this assignment, write a class named HashTable that implements a hash table using linear probing to resolve collisions. Submit your solution in its own header file named HashTable.h.

Your solution should include the following methods in its public interface:
• insert() – accepts a string as its only argument and adds it to the hash table, setting the mark of the element where it is stored to 2.
• remove() – accepts a string as its only argument and removes the string from the hash table (if found) by setting its mark to 1.
• isFull() – returns true if the hash table is full, false otherwise.
• isEmpty() – returns true if the hash table is empty, false otherwise.
• find() – accepts a string as its only argument. Returns true if it is found in the hash table, false otherwise.
• clear() – empties the hash table by setting each element’s mark to 0.
• print() – displays the contents of the hash table on the screen, that is, only the elements that are being used (marked with 2).
• constructor – accepts an integer as its only argument, creating a hash table that can store that many values (a dynamically allocated array of Elements).
• destructor – frees all memory used.

Other than the print() method, none of these methods should interact with the user in any way.

Here’s a list of private attributes:
• Element – a nested struct containing the following fields:
    key – stores the string passed to the object through the insert method.
    mark – a variable set to 2 if the element is being used, 1 if it was used but then was deleted, and 0 if it was never used.
• table – an Element pointer that holds the memory address of a dynamically allocated array of Elements.
• size – stores the size of the Element array.
• hash – the hash function. It accepts a string as its only argument and returns an index into the hash table. Design your hash function any way you wish. I gave a couple of suggestions in class. My solution will use the sum of ASCII values approach.
hash_table_algorithms.docx

Hash Table Algorithms
Introduction
A hash table is a data structure that implements an associative array data type.
An associative array is also known as a map, symbol table, or dictionary. These
structures map keys to data.
For example, you could have the following values: Mary, had, a, little,
lamb. Then, you could associate the integers 0, 1, 2, 3, 4 with them, one per
value, to create key, value pairs.
That would give us:
0, Mary
1, had
2, a
3, little
4, lamb
The hash table is the most common implementation of an associative array. It
uses a special function called a hash function that maps a “key” value onto the
location of some other data.
Each key is placed into something called a bucket. Buckets are stored in an
array. In a simple implementation, each element of the array is a bucket. So, the
combination of an array and the hash function is central to the implementation.
The cool thing about this data structure is that if the hash function is well written,
you can usually access data in constant time, O(1).
Another cool thing is you can choose almost anything as the key for a piece of data. Instead
of using integers, as in the above example, you could use strings. For example:
alpha, Mary
beta, had
charlie, a
delta, little
echo, lamb
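As a rough illustration, the C++ standard library's std::unordered_map is an associative array that is typically implemented as a hash table. The sketch below stores the string-keyed pairs from the example above. The function name makeRhyme is hypothetical and exists only for this illustration.

```cpp
#include <string>
#include <unordered_map>

// Builds the string-keyed example from the text.
// makeRhyme is a hypothetical helper name used only for this sketch.
std::unordered_map<std::string, std::string> makeRhyme()
{
    return {
        {"alpha", "Mary"},
        {"beta", "had"},
        {"charlie", "a"},
        {"delta", "little"},
        {"echo", "lamb"},
    };
}
```

Looking up makeRhyme().at("delta") yields "little".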
Hash Functions
The hash function is a special function that takes some value, called a key, as an
argument and then converts that into an integer. That integer is then used to
identify one of the buckets in which to store data. We can call this integer the
hash index.
So, assume a scenario where you want to store some data in a hash
table. To keep it simple, for this example we’ll assume we want to just store
the key itself. Assume the key we want to store is the string “Mary”. What we
have to do is use our hash function to select a bucket in which to store Mary.
So, our hash function has to do something like this:
“Mary” --> [HASH FUNCTION] --> 4
Mary is the input to the hash function, and we get an output of
4. This means that Mary would be stored at index 4, the fifth element (bucket) of the
array we are using to implement our hash table.
[  ?  |  ?  |  ?  |  ?  | “Mary” ]
   0     1     2     3      4
So, we use the hash function to map a key to a bucket. The hash function is
used when storing values, searching for values, or removing values from the
hash table.
The process of converting a hash key to a hash index is called hashing.
Writing Hash Functions
So, how could we write the hash function? If the key is a string, as in the above
example, there are many ways you could do it, but here’s one. What we could do
is sum up the ASCII codes of the characters in the string and then use modulus
division to identify a bucket.
So,
M =  77
a =  97
r = 114
y = 121
-------
    409
Then, we take that sum modulo the number of buckets to
map Mary to an array subscript:
409 % 5 = 4
So, Mary goes into bucket 4, the 5th element of the array.
This is just one example, and there are many ways of doing it. This
implementation is pretty slow, O(n) where n is the number of characters in the
key, since we have to add up each character’s ASCII value. The number of
additions varies with the length of the string. It is also not a very good hash
function, as it turns out, because strings made of the same characters in a
different order always sum to the same value, but it
works. Ideally, it should perform in O(1). More on this later.
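The sum-of-ASCII approach described above might look like the following minimal sketch. The name hashIndex and the bucketCount parameter are choices made for this illustration, not names from the notes, and the function assumes bucketCount is greater than zero.

```cpp
#include <cstddef>
#include <string>

// Sums the character codes of key and reduces the sum modulo bucketCount.
// Assumes bucketCount > 0. hashIndex is a name chosen for this sketch.
std::size_t hashIndex(const std::string& key, std::size_t bucketCount)
{
    std::size_t sum = 0;
    for (unsigned char ch : key) {   // one addition per character: O(n)
        sum += ch;
    }
    return sum % bucketCount;
}
```

With five buckets, hashIndex("Mary", 5) computes 409 % 5 and returns 4, matching the worked example.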
Collisions
What happens if after storing Mary we want to store another key in the table that
“hashes” to the same bucket? For example, what if we also wanted to store
Mart? Do the math, and you’ll see we end up with the same hash index as we
had for Mary. Whenever this happens, it’s called a collision.
Now, our array of buckets has 5 elements, but we’re only trying to add 2. Does
that mean we just replace Mary with Mart? Not necessarily. We can go through
a process known as collision resolution.
To resolve collisions, there are two general approaches: open addressing and
chaining.
Open addressing
In this approach, all the data we want to store in the hash table is stored in the
buckets themselves, as you probably expect. The big idea with this strategy is to
scan the array of buckets in some sequence until an “unused” bucket is found
where we can store the data.
This act of scanning is also known as probing, of which there are many
types. Let’s examine linear probing, quadratic probing, random probing, and
double hashing.
Linear probing
In this approach, we’ll just look at the following bucket to see if it’s unused. So, in
our previous example, we’d just move onto the next bucket, which would be at
element 0 (remember circular arrays from queues?). So, we’ll stick Mart there:
[ Mart |  ?  |  ?  |  ?  | Mary ]
   0      1     2     3     4
Now, if we need to add a third value, say Maro, we’d get another collision at hash
index 4. So, linear probing kicks in and we check the following bucket at index
0. Since that bucket is used, we then check the following bucket at index 1. That
one is free, so we stick Maro there:
[ Mart | Maro |  ?  |  ?  | Mary ]
   0      1      2     3     4
Problem solved!
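The walk-through above can be sketched in C++ as follows. This is only an illustration under a few assumptions: an empty string stands for an unused bucket, the hash is the sum-of-ASCII function from earlier, and the names hashIndex and probeInsert are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sum-of-ASCII hash from the text; assumes bucketCount > 0.
std::size_t hashIndex(const std::string& key, std::size_t bucketCount)
{
    std::size_t sum = 0;
    for (unsigned char ch : key) {
        sum += ch;
    }
    return sum % bucketCount;
}

// Stores key in the first unused bucket at or after its hash index,
// wrapping around to index 0 like a circular array. An empty string
// marks an unused bucket (an assumption of this sketch).
// Returns the index used, or table.size() if no bucket is free.
std::size_t probeInsert(std::vector<std::string>& table, const std::string& key)
{
    const std::size_t size = table.size();
    if (size == 0) {
        return size;                     // nothing to probe
    }
    std::size_t hi = hashIndex(key, size);
    for (std::size_t i = 0; i < size; ++i) {
        if (table[hi].empty()) {
            table[hi] = key;
            return hi;
        }
        hi = (hi + 1) % size;            // linear probing: try the next bucket
    }
    return size;                         // every bucket is in use
}
```

Inserting Mary, Mart, and Maro in that order into a five-bucket table places them at indexes 4, 0, and 1, as in the diagram.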
Linear probing does have a few problems though. First, consider the
performance implications. When we added Mary, that was a really fast operation
– all we had to do was assign Mary to bucket 4. A simple assignment is O(1),
constant time. But what about when we stored Maro? We had to go element by
element, searching for an empty space. Sound familiar? In that case, our
algorithm for resolving collisions degenerates into a linear search, O(n)!
The reason this happens is because we end up with something called a
cluster. A cluster is what you get when multiple values are stored adjacent to
each other in a hash table. As you can see, clusters result in severe
performance degradation as we ended up having to perform a linear search to
find an open bucket.
The more clusters there are and the bigger they get, the more performance
suffers. Combating clustering led to the other probing methods.
Quadratic Probing
Quadratic probing is a simple attempt at minimizing clusters. The big idea here
is instead of just moving to the next bucket, let’s move forward in different
increments. How far forward? Use some sort of a quadratic equation to
determine that.
Whereas with linear probing, if h is the original hash index, we have a sequence
like:
h + 0, h + 1, h + 2, h + 3, … , h + i
which is just adding 1 over and over,
we could have a sequence like this:
h + 0, h + 1, h + 4, h + 9, h + 16, … , h + i^2
In the second sequence, instead of just adding i, we’re adding i^2.
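As a small sketch of the second sequence, the function below lists the bucket indexes that quadratic probing would visit. It assumes each value wraps around with modulus division, as with the circular array used for linear probing. The name quadraticProbes and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns the first `count` bucket indexes visited by quadratic probing,
// starting from hash index h in a table of `size` buckets (size > 0).
std::vector<std::size_t> quadraticProbes(std::size_t h, std::size_t size,
                                         std::size_t count)
{
    std::vector<std::size_t> probes;
    for (std::size_t i = 0; i < count; ++i) {
        probes.push_back((h + i * i) % size);   // h + i^2, wrapped
    }
    return probes;
}
```

For h = 4 and a table of 11 buckets, the first five probes are 4, 5, 8, 2, 9.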
Random Probing
With random probing, instead of using a fixed value such as i or i^2, a
pseudorandom number generator is employed. By seeding the random number
generator with a predetermined seed value, you can produce a repeatable
sequence of numbers to break up the clusters.
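One way to picture this is the sketch below, which uses std::mt19937 as the pseudorandom number generator. The function name randomProbes, its parameters, and the choice to treat the first probe as the hash index itself are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Returns `count` probe indexes for a table of `size` buckets (size > 0).
// The first probe is the hash index h itself; each later probe adds the
// next number from a generator seeded with `seed`. The same seed always
// reproduces the same sequence.
std::vector<std::size_t> randomProbes(std::size_t h, std::size_t size,
                                      std::size_t count, unsigned seed)
{
    std::mt19937 gen(seed);
    std::vector<std::size_t> probes;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t offset = (i == 0) ? 0 : gen();
        probes.push_back((h + offset) % size);
    }
    return probes;
}
```

Because the sequence depends only on the seed, inserting and searching must use the same seed to follow the same path. Unlike linear probing, this simple version can revisit a bucket it has already examined.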
Double Hashing
In this method, a second hash function is also applied to the original
key. The results of both hashings are combined to determine the next bucket to
examine. For example, where k is the key and i is the number of the probe (0, 1, 2, …):
h(i, k) = (h1(k) + i * h2(k)) % array_size
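A minimal sketch of double hashing follows. The first hash is the sum-of-ASCII approach from earlier. The second hash, a multiply-by-31 rolling hash forced into the range 1 to size - 1 so the step is never zero, is purely an assumption for this example, as are the names h1, h2, and doubleHashProbe.

```cpp
#include <cstddef>
#include <string>

// First hash: sum of the character codes of the key.
std::size_t h1(const std::string& key)
{
    std::size_t sum = 0;
    for (unsigned char ch : key) {
        sum += ch;
    }
    return sum;
}

// Second hash: a hypothetical choice for this sketch. It produces a
// step between 1 and size - 1, so it is never zero. Requires size >= 2.
std::size_t h2(const std::string& key, std::size_t size)
{
    std::size_t h = 0;
    for (unsigned char ch : key) {
        h = h * 31 + ch;
    }
    return 1 + h % (size - 1);
}

// Index of the i-th probe for key in a table of `size` buckets (size >= 2).
std::size_t doubleHashProbe(const std::string& key, std::size_t i,
                            std::size_t size)
{
    return (h1(key) + i * h2(key, size)) % size;
}
```

For "Mary" in a five-bucket table, h1 gives 409 and this h2 gives a step of 4, so probes 0, 1, and 2 land on buckets 4, 3, and 2.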
Relatively Prime Requirement
All of these probing methods require that the value added to the original index and
the size of the array holding the buckets be relatively prime. Two numbers are
said to be relatively prime if they have no common factors other than
1. In other words, 1 is the only number that divides both of them evenly. Why?
Consider linear probing. Let’s say instead of adding 1 each time, we add 2 and
the array is exactly 6 elements long.
Instead of our probing giving us these values:
1 2 3 4 5 0
We would get:
2 4 0
See the problem? We have gaps in our table: buckets 1, 3, and 5 are never examined.
What if we add 5 each time? Now the array size and the amount
added each time are relatively prime again, and every bucket is reached:
5 4 3 2 1 0
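The effect can be checked with a short sketch. The function names bucketsReached and relativelyPrime are hypothetical, and std::gcd requires C++17.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Counts how many distinct buckets a fixed-step probe visits in a table
// of `size` buckets (size > 0) before it returns to a bucket it has seen.
std::size_t bucketsReached(std::size_t step, std::size_t size)
{
    std::vector<bool> seen(size, false);
    std::size_t index = 0;
    std::size_t count = 0;
    while (!seen[index]) {
        seen[index] = true;
        ++count;
        index = (index + step) % size;
    }
    return count;
}

// True when step and size share no common factor other than 1.
bool relativelyPrime(std::size_t step, std::size_t size)
{
    return std::gcd(step, size) == 1;
}
```

With 6 buckets, a step of 2 reaches only 3 of them, while steps of 1 and 5 reach all 6.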
Searching the Hash Table
Ok, so everything we have discussed so far works great for inserting values to
the hash table and resolving collisions. How would it work with searching for
values in the hash table? We can re-use everything we’ve learned so far about
hashing and collision resolution to help us search the table.
In this case, say we use linear probing. If you want to see if Maro is in the table,
then pass Maro to your hash function. The hash function will then tell you which
bucket (4) to look in first. If the value in that bucket is a match, return true, you
found it. Otherwise, use your collision resolution technique to check the next
bucket (0). Match? Return true. Otherwise, check the next
bucket (1). Match? Return true. Otherwise, move to the next bucket (2). Reach an unused
bucket? Return false.
[ Mart | Maro |  ?  |  ?  | Mary ]
   0      1      2     3     4
Simple enough, but there’s a problem. Suppose Mart was “deleted” before you
searched?
[  ?  | Maro |  ?  |  ?  | Mary ]
   0     1      2     3     4
That’s going to leave us a gap, isn’t it? We check bucket 4, no Maro. Move to
the next bucket, bucket 0. It’s empty now, so that means we didn’t find what
we’re looking for so Maro’s not in the table right? Obviously wrong.
A solution to this problem is to mark each bucket. If you think about it, there are
going to be 3 possibilities for each bucket. Either it is currently being used, it has
never been used, or it was once used but isn’t being used any more. Let’s refer
to each of these states as used, empty, or deleted.
Now, we can modify our search so that instead of stopping at a “deleted” bucket,
we’ll keep looking. So now, our search goes something like this: Check bucket
4. Used, but no Maro. Move to bucket 0. Deleted, so nothing here. Move on to
bucket 1. Used and Maro is here. Found it!
Problem solved.
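The modified search can be sketched as below. The Mark enumeration, the Bucket layout, and the names hashIndex and findKey are assumptions for this illustration, and the hash is the sum-of-ASCII function from earlier.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The three bucket states described in the text.
enum class Mark { Unused, Used, Deleted };

struct Bucket {
    std::string key;
    Mark mark = Mark::Unused;
};

// Sum-of-ASCII hash; assumes bucketCount > 0.
std::size_t hashIndex(const std::string& key, std::size_t bucketCount)
{
    std::size_t sum = 0;
    for (unsigned char ch : key) {
        sum += ch;
    }
    return sum % bucketCount;
}

// Linear-probing search that keeps going past deleted buckets and stops
// only at a never-used bucket or after checking every bucket once.
bool findKey(const std::vector<Bucket>& table, const std::string& key)
{
    const std::size_t size = table.size();
    if (size == 0) {
        return false;
    }
    std::size_t hi = hashIndex(key, size);
    for (std::size_t i = 0; i < size; ++i) {
        const Bucket& bucket = table[hi];
        if (bucket.mark == Mark::Unused) {
            return false;                 // never used: key cannot be further on
        }
        if (bucket.mark == Mark::Used && bucket.key == key) {
            return true;
        }
        hi = (hi + 1) % size;             // used by another key, or deleted
    }
    return false;
}
```

With Mart marked as deleted in bucket 0, a search for Maro checks bucket 4, skips bucket 0, and finds Maro in bucket 1.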
So, consider what an insert and remove algorithm must do.
For an insertion algorithm:
• It has to find the bucket to store the data in.
• Resolve any collisions using a probing technique.
• If a deleted or unused bucket is found, store the data and mark it used.
For a remove algorithm:
• It has to find a bucket containing the value to remove.
• Solve any collisions using a probing technique.
• If a bucket is found containing the value to remove, mark the bucket as
deleted.
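The steps above can be sketched in C++ as follows, using linear probing. This is one possible reading rather than a definitive implementation: the Mark enumeration, the Bucket layout, and the names hashIndex, insertKey, and removeKey are all assumptions, and the insert does not check whether the key is already present.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Mark { Unused, Used, Deleted };

struct Bucket {
    std::string key;
    Mark mark = Mark::Unused;
};

// Sum-of-ASCII hash; assumes bucketCount > 0.
std::size_t hashIndex(const std::string& key, std::size_t bucketCount)
{
    std::size_t sum = 0;
    for (unsigned char ch : key) {
        sum += ch;
    }
    return sum % bucketCount;
}

// Stores key in the first unused or deleted bucket on its probe path.
// Returns false when every bucket is in use (the table is full).
bool insertKey(std::vector<Bucket>& table, const std::string& key)
{
    const std::size_t size = table.size();
    if (size == 0) {
        return false;
    }
    std::size_t hi = hashIndex(key, size);
    for (std::size_t i = 0; i < size; ++i) {
        if (table[hi].mark != Mark::Used) {
            table[hi].key = key;
            table[hi].mark = Mark::Used;
            return true;
        }
        hi = (hi + 1) % size;             // linear probing
    }
    return false;
}

// Marks the bucket holding key as deleted. Returns false if key is absent.
bool removeKey(std::vector<Bucket>& table, const std::string& key)
{
    const std::size_t size = table.size();
    if (size == 0) {
        return false;
    }
    std::size_t hi = hashIndex(key, size);
    for (std::size_t i = 0; i < size; ++i) {
        if (table[hi].mark == Mark::Unused) {
            return false;                 // reached a never-used bucket
        }
        if (table[hi].mark == Mark::Used && table[hi].key == key) {
            table[hi].mark = Mark::Deleted;
            return true;
        }
        hi = (hi + 1) % size;             // linear probing
    }
    return false;
}
```

Removing Mart after inserting Mary, Mart, and Maro leaves bucket 0 marked as deleted, and a later insert that hashes to index 4 can reuse that bucket.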
Open Addressing Algorithms
Let’s define a bucket using a struct:
struct Bucket
{
    string key;
    int mark;
    // any other data you’d want to store
};
We could let:
0 = unused
1 = used
2 = deleted
You could use enumerated data types to represent these things, named
constants, or just plain old integer literals.
We could then define an array of buckets:
Bucket table[SIZE];
Be sure to initialize each element’s mark field to unused (0).
Assume a hash function named hash that accepts a key as its only argument
and returns a valid hash index. The definition of the hash function is left as an
exercise.
So, then…
Insert
Where table is the array of buckets, key is the key to store in the table, and size
is the number of buckets, we have:
Insert(table, key, size)
    IF NOT full:
        hi <- hash(key)
        WHILE table[hi].mark = 1:          // 1 = used
            hi <- (hi + 1) MOD size        // linear probing
        table[hi].key <- key               // that bucket has the key now
        table[hi].mark <- 1                // that bucket is used now
When is the table full? It should be obvious: if all buckets are marked used, the table is full.
Search
Same as above, where table is the array of buckets, key is the value to search for, and size is the number of buckets in the table. Then,
Search(table, key, size)
    found <- false                         // haven't found it yet
    IF NOT empty:
        hi <- hash(key)                    // start looking here
        i <- 0                             // counter to let us know when we've checked all buckets
        WHILE table[hi].mark != 0 AND NOT found AND i < size:   // keep looking
            IF table[hi].mark = 1 AND table[hi].key = key:      // found it
                found <- true
            hi <- (hi + 1) MOD size        // linear probing
            i <- i + 1                     // we've checked a bucket, increment i
    RETURN found                           // what's the result of the search?
When is a hash table empty? Again, it should be obvious: when all buckets are marked as either deleted or unused.
Remove
Remove is similar to the search algorithm. The only difference is that if we find the key we want to remove, we mark its bucket as deleted (2). Again, as above, where table is the array of buckets, key is the key we're hoping to remove, and size is the number of buckets in the array, then...
Remove(table, key, size)
    IF NOT empty:
        hi <- hash(key)                    // start looking here
        i <- 0                             // the counter again
        WHILE i < size AND table[hi].mark != 0:
            IF table[hi].mark = 1 AND table[hi].key = key:
                table[hi].mark <- 2        // deleted
                BREAK
            i <- i + 1                     // increment i
            hi <- (hi + 1) MOD size        // linear probing
Now, if you have a decent understanding of the algorithms and the data structure, then it should be easy to see how you would modify the above algorithms to use something other than linear probing to resolve collisions.
Chaining
Chaining, also known as separate chaining, takes a different approach.
In this approach, each bucket is completely independent of the others and uses some sort of list to maintain multiple entries that share the same hash index. In short, the big idea is this: an array of linked list objects.
With this approach, each element of the array (bucket) is either the head pointer for a linked list, or a linked list object. You still use the hash function as before, but instead of just storing the data in the bucket, you store it in the linked list attached to that bucket.
Let's say you re-use the MyList class you wrote earlier. Then, you could create an array of MyList objects:
MyList buckets[SIZE];
Then, whenever it's time to add something to the hash table, say the key "Mary", call your hash function as before:
hi = hash("Mary");
Then use the hash index returned to identify which MyList object to invoke append on:
buckets[hi].append("Mary");
That's it. No probing at all, just append the item to the list associated with the bucket.
Want to search for a value in the table? Use the hash function again and then call your search method:
hi = hash("Mary");
found = buckets[hi].search("Mary");
Performance Considerations
How well this method of collision resolution performs is going to depend on the hash function and the linked list implementation it's based on.
An ideal hash function would distribute keys to buckets uniformly (rarely is there such a thing as an ideal hash function). It would also execute in constant time, O(1). Appending the first value to a linked list is going to happen in constant time (it's empty). So an ideal hash function adding a key to an empty bucket would execute overall in constant time, O(1). Very fast.
Now, let's say the hash function isn't very good, and can't achieve a uniform distribution of keys. Let's also say there are more keys to store than buckets. Well, then, you're going to end up with multiple keys stored in a single linked list.
In that case, if the append algorithm is O(n), then inserting values into the hash table is going to degenerate into an O(n) operation.
How to Increase Performance
There's an idea that comes up when talking about hash tables known as "load factor". The load factor is the ratio of the number of keys you are storing in the table to the number of buckets in the table. A load factor of 1 is ideal. This means that you have, on average, one key per bucket. If the keys are spread evenly, searching, inserting, and removing keys will be very fast, O(1). So, anything you can do to get as close as possible to a load factor of 1 is what you want to do.
Here are a couple of ideas of what you could do to get closer to that ideal load factor. If you have 10 buckets and expect to store 20 keys, make your array twice as big. If your hash function is consistently storing keys in the same three buckets, you need a different hash function.
Here's another consideration. What do you expect to be the most common interaction with your hash table? Do you need to add keys to the table as quickly as possible? Instead of an array of linked lists, how about an array of stacks? If you speed up the structures attached to each bucket for a particular operation, you're going to speed up the overall performance of the data structure.
Dynamic Resizing
So, last topic. This technique can be useful for both types of collision resolution, open addressing and chaining. With this technique a certain threshold is identified: say the hash table is 75% full, or you've reached some other limit on the number of keys that are currently in the table, or maybe your load factor gets to 3. When that threshold is hit, we can dynamically allocate a new array of buckets that's maybe twice the size of the existing array. Then, we scan through the old bucket array and add its keys to the new bucket array, rehashing as we go. Then delete the old bucket array and keep using the new one. This has the benefit of spreading the keys around.
It should break up clusters that form when using linear probing and shorten the lengths of the linked lists when using chaining. That should bring your load factor back down. ...
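Chaining and dynamic resizing can be sketched together as shown below. This is only an illustration under stated assumptions: std::list stands in for the MyList class mentioned in the notes, the class name ChainedTable is hypothetical, and the policy of doubling the bucket array once the load factor exceeds 1.0 is one arbitrary choice of threshold.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// A chained hash table sketch. std::list stands in for MyList; the
// class name, threshold, and doubling policy are assumptions.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t bucketCount)
        : buckets_(bucketCount > 0 ? bucketCount : 1) {}

    // Appends key to the list for its bucket, then resizes if needed.
    void insert(const std::string& key)
    {
        buckets_[hash(key, buckets_.size())].push_back(key);
        ++count_;
        if (loadFactor() > kMaxLoad) {
            resize(buckets_.size() * 2);
        }
    }

    // Searches only the list attached to the key's bucket.
    bool find(const std::string& key) const
    {
        for (const std::string& stored : buckets_[hash(key, buckets_.size())]) {
            if (stored == key) {
                return true;
            }
        }
        return false;
    }

    // Number of keys stored divided by number of buckets.
    double loadFactor() const
    {
        return static_cast<double>(count_) / buckets_.size();
    }

    std::size_t bucketCount() const { return buckets_.size(); }

private:
    static constexpr double kMaxLoad = 1.0;   // assumed threshold

    // Sum-of-ASCII hash; n is the number of buckets (n > 0).
    static std::size_t hash(const std::string& key, std::size_t n)
    {
        std::size_t sum = 0;
        for (unsigned char ch : key) {
            sum += ch;
        }
        return sum % n;
    }

    // Allocates a larger bucket array and rehashes every key into it.
    void resize(std::size_t newCount)
    {
        std::vector<std::list<std::string>> bigger(newCount);
        for (const auto& chain : buckets_) {
            for (const std::string& key : chain) {
                bigger[hash(key, newCount)].push_back(key);
            }
        }
        buckets_.swap(bigger);                // old array is freed on return
    }

    std::vector<std::list<std::string>> buckets_;
    std::size_t count_ = 0;
};
```

Starting with 2 buckets, a third insert pushes the load factor to 1.5, which triggers a resize to 4 buckets and brings the load factor down to 0.75.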
