For this assignment, write a class named HashTable that implements a hash table using linear probing to resolve collisions. Submit your solution in its own header file named HashTable.h.

Your solution should include the following methods in its public interface:

insert() – accepts a string as its only argument, which is added to the hash table, setting the mark of the element where it is stored to 2.

remove() – accepts a string as its only argument, and removes the string from the hash table (if found) by setting its mark to 1.

isFull() – returns true if the hash table is full, false otherwise.

isEmpty() – returns true if the hash table is empty, false otherwise.

find() – accepts a string as its only argument. Returns true if it is found in the hash table, false otherwise.

clear() – empties the hash table by setting each element's mark to 0.

print() – displays the contents of the hash table to the screen. That is, only the elements that are being used (marked with 2).

constructor – accepts an integer as its only argument, creating a hash table that can store that many values (a dynamically allocated array of Elements).

destructor – frees all memory used.

Other than the print() method, none of these methods should interact with the user in any way.

Here's a list of private attributes:

Element – a nested struct containing the following fields:

key – stores the string passed to the object through the insert method.

mark – a variable set to 2 if the element is being used, 1 if it was used but then was deleted, and 0 if it was never used.

table – an Element pointer that holds the memory address of a dynamically allocated array of Elements.

size – stores the size of the Element array.

hash – the hash function. It accepts a string as its only argument and returns an index into the hash table. Design your hash function any way you wish; I gave a couple of suggestions in class. My solution will use the sum-of-ASCII-values approach.
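As a rough sketch of how the finished header could be shaped, here is one possible HashTable.h. The method bodies shown (ASCII-sum hashing, linear probing) are one approach among many, not the required one; treat this as a starting point under those assumptions.

```cpp
// HashTable.h -- a minimal sketch, assuming ASCII-sum hashing
// and the mark convention from the assignment:
// 2 = used, 1 = deleted, 0 = never used.
#ifndef HASHTABLE_H
#define HASHTABLE_H

#include <iostream>
#include <string>

class HashTable {
private:
    struct Element {
        std::string key;
        int mark = 0;            // 2 = used, 1 = deleted, 0 = never used
    };
    Element* table;              // dynamically allocated array of Elements
    int size;                    // number of Elements in the array

    int hash(const std::string& key) const {
        int sum = 0;
        for (char c : key)
            sum += static_cast<unsigned char>(c);  // sum of ASCII values
        return sum % size;
    }

public:
    explicit HashTable(int n) : table(new Element[n]), size(n) {}
    ~HashTable() { delete[] table; }

    bool isFull() const {
        for (int i = 0; i < size; ++i)
            if (table[i].mark != 2) return false;
        return true;
    }
    bool isEmpty() const {
        for (int i = 0; i < size; ++i)
            if (table[i].mark == 2) return false;
        return true;
    }
    void clear() {
        for (int i = 0; i < size; ++i) table[i].mark = 0;
    }
    void insert(const std::string& key) {
        if (isFull()) return;
        int hi = hash(key);
        while (table[hi].mark == 2)              // linear probing
            hi = (hi + 1) % size;
        table[hi].key = key;
        table[hi].mark = 2;                      // used
    }
    bool find(const std::string& key) const {
        int hi = hash(key);
        for (int i = 0; i < size && table[hi].mark != 0; ++i) {
            if (table[hi].mark == 2 && table[hi].key == key) return true;
            hi = (hi + 1) % size;                // probe past deleted buckets
        }
        return false;
    }
    void remove(const std::string& key) {
        int hi = hash(key);
        for (int i = 0; i < size && table[hi].mark != 0; ++i) {
            if (table[hi].mark == 2 && table[hi].key == key) {
                table[hi].mark = 1;              // deleted
                return;
            }
            hi = (hi + 1) % size;
        }
    }
    void print() const {
        for (int i = 0; i < size; ++i)
            if (table[i].mark == 2)
                std::cout << i << ": " << table[i].key << '\n';
    }
};

#endif
```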


Hash Table Algorithms

Introduction

A hash table is a data structure that implements an associative array data type.

An associative array is also known as a map, symbol table, or dictionary. These

things allow for data to be mapped to keys.

For example, you could have the following values: Mary, had, a, little,

lamb. Then, you could associate integers 0, 1, 2, 3, 4 with each one of them to

create a key, value pair.

That would give us:

0, Mary

1, had

2, a

3, little

4, lamb

The hash table is the most common implementation of an associative array. It

uses a special function called a hash function that maps a “key” value onto some

other data.

Each key gets dumped into something called a bucket. Buckets are stored in an

array. In a simple implementation, each element is a bucket. So, the

combination of an array and the hash function is key to the implementation.

The cool thing about this data structure is that if the hash function is well written,

it allows for you to usually access data in constant time, O(1).

Another cool thing is you can choose anything to access pieces of data. Instead

of using integers, as in the above example, you could use strings. For example:

alpha, Mary

beta, had

charlie, a

delta, little

echo, lamb

Hash Functions

The hash function is a special function that takes some value, called a key, as an

argument and then converts that into an integer. That integer is then used to

identify one of the buckets in which to store data. We can call this integer the

hash index.

So, assume a scenario where you want to store some data in a hash

function. To keep it simple, for this example we’ll assume we want to just store

the key itself. Assume the key we want to store is the string “Mary”. What we

have to do is use our hash function to select a bucket in which to store Mary.

So, our hash function has to do something like this:

“Mary” —>

[HASH FUNCTION]

—> 4

So that Mary is the input to the hash function, and then we get an output of

4. This means that Mary would be stored in element 4, the fifth bucket, of the

array we are using to implement our hash table.

[ ? | ? | ? | ? | “Mary” ]
  0   1   2   3      4

So, we use the hash function to map a key to a bucket. The hash function is

used when storing values, searching for values, or removing values from the

hash table.

The process of converting a hash key to a hash index is called hashing.

Writing Hash Functions

So, how could we write the hash function? If the key is a string, as in the above

example, there are many ways you could do it, but here’s one. What we could do

is sum up all the ascii codes of the string itself and then use modulus division to

help us identify a bucket.

So,

M = 77

a = 97

r = 114

y = 121

———
409

Then, what we’ll do is we’ll use that and modulus using the number of buckets to

map Mary to an array subscript:

409 % 5 = 4

So, then Mary will go into the 5th element (bucket).
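In C++, the ASCII-sum hash just described can be sketched as follows (hash_key is a hypothetical function name used here for illustration):

```cpp
#include <string>

// ASCII-sum hash: add up each character's code, then use modulus
// division by the number of buckets to get a hash index.
// hash_key is a hypothetical name, not part of any required interface.
int hash_key(const std::string& key, int num_buckets) {
    int sum = 0;
    for (char c : key)
        sum += static_cast<unsigned char>(c);  // 'M' is 77, 'a' is 97, ...
    return sum % num_buckets;                  // e.g. 409 % 5 = 4 for "Mary"
}
```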

This is just one example, and there are many ways of doing it. This

implementation is pretty slow, O(n), since we have to add up each character’s

ascii value. The number of additions is going to vary based on the number of

characters in the string. Not a very good hash function, as it turns out, but it

works. Ideally, it should perform in O(1). More on this later.

Collisions

What happens if after storing Mary we want to store another key in the table that

“hashes” to the same bucket? For example, what if we also wanted to store

Mart? Do the math, and you’ll see we end up with the same hash index as we

had for Mary. Whenever this happens, it’s called a collision.

Now, our array of buckets has 5 elements, but we’re only trying to add 2. Does

that mean we just replace Mary with Mart? Not necessarily. We can go through

a process known as collision resolution.

To resolve collisions, there are two general approaches: open addressing and

chaining.

Open addressing

In this approach, all the data we want to store in the hash table is stored in the

buckets themselves, as you probably expect. The big idea with this strategy is to

scan the array of buckets in some sequence until an “unused” bucket is found

where we can store the data.

This act of scanning is also known as probing, of which there are many

types. Let’s examine four: linear probing, quadratic probing, random probing, and

double hashing.

Linear probing

In this approach, we’ll just look at the following bucket to see if it’s unused. So, in

our previous example, we’d just move onto the next bucket, which would be at

element 0 (remember circular arrays from queues?). So, we’ll stick Mart there:

[ Mart | ? | ? | ? | Mary ]
   0     1   2   3    4

Now, if we need to add a third value, say Maro, we’d get another collision at hash

index 4. So, linear probing kicks in and we check the following bucket at index

0. Since that bucket is used, we then check the following bucket at index 1. That

one is free, so we stick Maro there:

[ Mart | Maro | ? | ? | Mary ]
   0      1     2   3    4

Problem solved!

Linear probing does have a few problems though. First, consider the

performance implications. When we added Mary, that was a really fast operation

– all we had to do was assign Mary to bucket 4. A simple assignment is O(1),

constant time. But what about when we stored Maro? We had to go element by

element, searching for an empty space. Sound familiar? In that case, our

algorithm for resolving collisions degenerates into a linear search, O(n)!

The reason this happens is because we end up with something called a

cluster. A cluster is what you get when multiple values are stored adjacent to

each other in a hash table. As you can see, clusters result in severe

performance degradation as we ended up having to perform a linear search to

find an open bucket.

The more clusters there are, and the bigger they grow, the more performance

suffers. Combating clustering led to the other probing methods.

Quadratic Probing

Quadratic probing is a simple attempt at minimizing clusters. The big idea here

is instead of just moving the next bucket, let’s move forward in different

increments. How far forward? Use some sort of a quadratic equation to

determine that.

Whereas with linear probing, if h is the original hash index, we have a sequence

like:

h + 0, h + 1, h + 2, h + 3, … , h + i

which is just adding 1 over and over,

we could have a sequence like this:

h + 0, h + 1, h + 4, h + 9, h + 16, … , h + i²

In the second sequence, instead of just adding i, we’re adding i².
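The i-th probe location under quadratic probing can be sketched in C++ like this (quadratic_probe is an illustrative name, not from the notes):

```cpp
// i-th quadratic probe from home index h in a table of num_buckets:
// visits h, h + 1, h + 4, h + 9, ... all taken modulo the table size.
int quadratic_probe(int h, int i, int num_buckets) {
    return (h + i * i) % num_buckets;
}
```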

Random Probing

With random probing, instead of using a fixed value such as i or i², a pseudorandom number generator is employed. By seeding the random number

generator with a predetermined seed value, you can produce a repeatable

sequence of numbers to break up the clusters.

Double Hashing

In this method, a second hash function is used to hash the original hash

key. The results of both hashings are combined to determine the next bucket to

examine. For example:

h(i, k) = (h1(k) + i * h2(k)) % array_size
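A minimal sketch of computing the i-th probe under double hashing, where h1 and h2 stand for the results of the two hash functions applied to the key (the function name is illustrative):

```cpp
// i-th probe index under double hashing: the second hash value scales
// the step size, so different keys trace different probe sequences.
int double_hash_probe(int h1, int h2, int i, int array_size) {
    return (h1 + i * h2) % array_size;
}
```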

Relatively Prime Requirement

All of these probing methods require that the value added to the original index and

the size of the array holding the buckets be relatively prime. Two numbers are

said to be relatively prime if they have no common factor other than 1. In other

words, no number bigger than 1 divides both of them evenly. Why?

Consider linear probing. Let’s say instead of adding 1 each time, we add 2 and

the array was exactly 6 elements long.

Instead of the results of our probing giving us these values:

1 2 3 4 5 0

We would get:

2 4 0

See the problem? We have gaps in our table.

What if we add 5 each time? Well, now the array size and the amount added

each time are relatively prime, so every bucket gets visited:

5 4 3 2 1 0

Searching the Hash Table

Ok, so everything we have discussed so far works great for inserting values to

the hash table and resolving collisions. How would it work with searching for

values in the hash table? We can re-use everything we’ve learned so far about

hashing and collision resolution to help us search the table.

In this case, say we use linear probing. If you want to see if Maro is in the table,

then pass Maro to your hash function. The hash function will then tell you which

bucket (4) to look in first. If the value in that bucket is a match, return true, you

found it. Otherwise, use your collision resolution technique to check the next

bucket (0). Match? Return true. Otherwise, check the next bucket (1). Match?

Return true. Move to the next bucket (2). Reach an unused bucket? Return false.

[ Mart | Maro | ? | ? | Mary ]
   0      1     2   3    4

Simple enough, but there’s a problem. Suppose Mart was “deleted” before you

searched?

[ ? | Maro | ? | ? | Mary ]
  0    1     2   3    4

That’s going to leave us a gap, isn’t it? We check bucket 4, no Maro. Move to

the next bucket, bucket 0. It’s empty now, so that means we didn’t find what

we’re looking for, so Maro’s not in the table, right? Obviously wrong.

A solution to this problem is to mark each bucket. If you think about it, there are

going to be 3 possibilities for each bucket. Either it is currently being used, it has

never been used, or it was once used but isn’t being used any more. Let’s refer

to each of these states as used, empty, or deleted.

Now, we can modify our search so that instead of stopping at a “deleted” bucket,

we’ll keep looking. So now, our search goes something like this: Check bucket

4. Used, but no Maro. Move to bucket 0. Deleted, so nothing here. Move on to

bucket 1. Used and Maro is here. Found it!

Problem solved.

So, consider what an insert and remove algorithm must do.

For an insertion algorithm:

It has to find the bucket to store the data in.

Resolve any collisions using a probing technique.

If a deleted or unused bucket is found, store the data and mark it used.

For a remove algorithm:

It has to find a bucket containing the value to remove.

Resolve any collisions using a probing technique.

If a bucket is found containing the value to remove, mark the bucket as

deleted.

Open Addressing Algorithms

Let’s define a bucket using a struct:

struct Bucket
{
    string key;
    int mark;
    // any other data you’d want to store
};

We could let:

0 = unused

1 = used

2 = deleted

You could use enumerated data types to represent these things, named

constants, or just plain old integer literals.

We could then define an array of buckets:

Bucket table[SIZE];

Be sure to initialize each element’s mark field to unused (0).

Assume a hash function named hash that accepts a key as its only argument

and returns a valid hash index. The definition of the hash function is left as an

exercise.

So, then…

Insert

Where table is the array of buckets, key is the key to store in the table, and size

is the number of buckets, we have:

Insert(table, key, size)
    IF NOT full:
        hi <- hash(key)
        WHILE table[hi].mark == 1:       // 1 = used
            hi <- (hi + 1) MOD size      // linear probing
        table[hi].key <- key             // that bucket has the key now
        table[hi].mark <- 1              // that bucket is used now
When is the table full? It should be obvious: if all buckets are marked used, the
table is full.
Search
Same as above, where table is the array of buckets, key is the value to search
for, and size is the number of buckets in the table. Then,
Search(table, key, size)
    found <- false                       // haven't found it yet
    IF NOT empty:
        hi <- hash(key)                  // start looking here
        i <- 0                           // counter to let us know when we've checked all buckets
        WHILE table[hi].mark != 0 AND NOT found AND i < size:   // keep looking
            IF table[hi].mark == 1 AND table[hi].key == key:    // found it
                found <- true
            hi <- (hi + 1) MOD size      // linear probing
            i <- i + 1                   // we've checked a bucket, increment i
    RETURN found                         // what's the result of the search?
When is a hash table empty? Again, should be obvious. When all buckets are
marked as either deleted or unused.
Remove
Remove is similar to the search algorithm. If we find the key we want to
remove, we simply mark its bucket as deleted (2).
Again, as above, where table is the array of buckets, key is the key we're hoping
to remove, and size is the number of buckets in the array, then...
Remove(table, key, size)
    IF NOT empty:
        hi <- hash(key)                  // start looking here
        i <- 0                           // the counter again
        WHILE i < size AND table[hi].mark != 0:
            IF table[hi].mark == 1 AND table[hi].key == key:
                table[hi].mark <- 2      // deleted
                BREAK
            i <- i + 1                   // increment i
            hi <- (hi + 1) MOD size      // linear probing
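Translated into C++, the three pseudocode algorithms above might look like the sketch below. std::vector stands in for the fixed bucket array, the function names are illustrative, and the mark convention matches the notes (0 = unused, 1 = used, 2 = deleted).

```cpp
#include <string>
#include <vector>

struct Bucket { std::string key; int mark = 0; };  // 0 unused, 1 used, 2 deleted

// ASCII-sum hash as in the earlier section.
int hash_key(const std::string& key, int num_buckets) {
    int sum = 0;
    for (char c : key) sum += static_cast<unsigned char>(c);
    return sum % num_buckets;
}

bool insert(std::vector<Bucket>& table, const std::string& key) {
    int size = static_cast<int>(table.size());
    int hi = hash_key(key, size);
    for (int i = 0; i < size; ++i) {
        if (table[hi].mark != 1) {       // unused (0) or deleted (2): take it
            table[hi].key = key;
            table[hi].mark = 1;          // used
            return true;
        }
        hi = (hi + 1) % size;            // linear probing
    }
    return false;                        // every bucket is marked used
}

bool search(const std::vector<Bucket>& table, const std::string& key) {
    int size = static_cast<int>(table.size());
    int hi = hash_key(key, size);
    for (int i = 0; i < size && table[hi].mark != 0; ++i) {
        if (table[hi].mark == 1 && table[hi].key == key)
            return true;
        hi = (hi + 1) % size;            // probe past used and deleted buckets
    }
    return false;                        // hit an unused bucket or wrapped around
}

bool remove_key(std::vector<Bucket>& table, const std::string& key) {
    int size = static_cast<int>(table.size());
    int hi = hash_key(key, size);
    for (int i = 0; i < size && table[hi].mark != 0; ++i) {
        if (table[hi].mark == 1 && table[hi].key == key) {
            table[hi].mark = 2;          // mark deleted; the gap stays probe-able
            return true;
        }
        hi = (hi + 1) % size;
    }
    return false;                        // key wasn't in the table
}
```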
Now, if you have a decent understanding of the algorithms and the data
structure, then it should be easy to see how you would modify the above
algorithms to use something other than linear probing to resolve collisions.
Chaining
Chaining, also known as Separate Chaining, takes a different approach. In this
approach, each bucket is completely independent of the others and uses some
sort of list to maintain multiple entries that share the same hash index.
In short, the big idea is this: an array of linked list objects.
With this approach, each element of the array (bucket) is either the head pointer
for a linked list, or a linked list object. You still use the hash function as before,
but instead of just storing the data in the bucket, you store it in the linked list
attached to that bucket.
Let's say you re-use the MyList class you wrote earlier. Then, you could create
an array of MyList objects:
MyList buckets[SIZE];
Then, whenever it's time to add something to the hash table, say the key "Mary",
call your hash function as before:
hi = hash("Mary");
Then use the hash index returned to identify which MyList object to invoke
append on:
buckets[hi].append("Mary");
That's it. No probing at all, just append the item to the list associated with the
bucket.
Want to search for a value in the table? Use the hash function again and then
call your search method:
hi = hash("Mary");
found = buckets[hi].search("Mary");
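A minimal chaining sketch, with std::list standing in for the MyList class from the notes (an assumption; your own list class works the same way) and illustrative names throughout:

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <vector>

// Separate chaining: each bucket owns a list of the keys that hash to it.
struct ChainedTable {
    std::vector<std::list<std::string>> buckets;
    explicit ChainedTable(int size) : buckets(size) {}

    int hash_key(const std::string& key) const {
        int sum = 0;
        for (char c : key) sum += static_cast<unsigned char>(c);
        return sum % static_cast<int>(buckets.size());
    }
    void insert(const std::string& key) {
        buckets[hash_key(key)].push_back(key);   // no probing at all
    }
    bool search(const std::string& key) const {
        const auto& chain = buckets[hash_key(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }
};
```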
Performance Considerations
How well this method of collision resolution performs is going to depend on the
hash function and the linked list implementation it's based on. An ideal hash
function would distribute keys to buckets uniformly (rarely is there such a thing as
an ideal hash function). It would also execute in constant time, O(1).
Appending the first value to a linked list is going to happen in constant time (it's
empty). So an ideal hash function adding a key to an empty bucket would
execute overall in constant time, O(1). Very fast.
Now, let's say the hash function isn't very good, and can't achieve a uniform
distribution of keys. Let's also say there are more keys to store than
buckets. Well, then, you're going to end up with multiple keys stored in a single
linked list. In that case, if the append algorithm is O(n), then inserting values into
the hash table is going to degenerate into a O(n) operation.
How to Increase Performance
There's an idea that comes up when talking about hash tables known as the "load factor". The
load factor is a ratio of the number of keys you are storing into the table to the
number of buckets in the table.
A load factor of 1 is ideal.
This means that you have exactly one key per bucket. This means that
searching, inserting, and removing keys will be very fast, O(1). So, anything you
can do to get as close as possible to a load factor of 1 is what you want to do.
Here are a couple of ideas of what you could do to get closer to that ideal load
factor. If you have 10 buckets and expect to store 20 keys, make your array
twice as big. If your hash function is consistently storing keys in the same three
buckets, you need a different hash function.
Here's another consideration. What do you expect to be the most common
interaction with your hash table? Do you need to add keys to the table as quickly
as possible? Instead of an array of linked lists, how about an array of stacks? If
you speed up the structures attached to each bucket for a particular operation,
you're going to speed up the overall performance of the data structure.
Dynamic Resizing
So, last topic. This technique can be useful for both types of collision resolution,
linear probing and chaining. With this technique a certain threshold is identified,
say the hash table is 75% full, or you've reached some other limit as to the
number of keys that are currently in the table, or maybe your load factor gets to
3.
When that threshold is hit, we can dynamically allocate a new array of
buckets that's maybe twice the size of the existing one. Then, we scan
through the old bucket array, add each key to the new bucket array,
rehashing as we go. Then delete the old bucket array and keep using the new
one.
This has the benefit of spreading the keys around. It should break up clusters
that form when using linear probing and shorten the lengths of the linked lists
when using chaining. That should move your load factor back to 1.
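The resizing step for the linear-probing table can be sketched like this (self-contained, with illustrative names; the mark convention is 0 = unused, 1 = used):

```cpp
#include <string>
#include <vector>

struct Bucket { std::string key; int mark = 0; };  // 0 unused, 1 used

int hash_key(const std::string& key, int num_buckets) {
    int sum = 0;
    for (char c : key) sum += static_cast<unsigned char>(c);
    return sum % num_buckets;
}

// Linear-probing insert; assumes the table is not completely full.
void insert(std::vector<Bucket>& table, const std::string& key) {
    int size = static_cast<int>(table.size());
    int hi = hash_key(key, size);
    while (table[hi].mark == 1)          // linear probing
        hi = (hi + 1) % size;
    table[hi].key = key;
    table[hi].mark = 1;
}

// When the load threshold is hit: allocate a bucket array twice as big,
// rehash every used key into it, then swap it in for the old one.
void rehash(std::vector<Bucket>& table) {
    std::vector<Bucket> bigger(table.size() * 2);
    for (const Bucket& b : table)
        if (b.mark == 1)
            insert(bigger, b.key);       // re-hashed against the new size
    table.swap(bigger);                  // the old array is freed automatically
}
```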
