Homework 5 - Cuckoo Hashing
Due: Friday, Oct 24th, at 10:00pm
- Starter code: cuckoo_hashing.zip
 - Upload solutions via Gradescope as: 
CuckooHashMap.kt 
Goals
This assignment is designed to help you with the following:
- practice with hashing fundamentals
 - practice extending basic data structures for more complex functionality
 
Collaboration policy
For this assignment, you may work alone or with a partner, but you must type up all of the code yourself. (It is therefore unexpected for two code submissions to be completely identical.) You may have at most one partner.
You may also discuss the assignment at a high level with other students. You can discuss the assignment with Anya, our prefect, or any course staff.
You should list any student or course staff with whom you discussed the assignment (you don’t have to include prefect sessions), and the manner of discussion (high level, partner, etc.) in comments at the top of your CuckooHashMap.kt file.  You also need to include your name.
If you work alone, you should say so instead.
Note on style:
The following style guidelines are expected moving forward:
- You submit only the requested file(s).
 - Variable names should be clear and easy to understand, should not start with a capital letter, and should only be a single letter when appropriate (e.g., 
iorjas indices,xandyas coordinates). - The indentation, spacing, and location of curly braces is consistent.
 - There are no extra print statements in your program.
 - There is no commented-out code that is no longer being used (e.g., debugging print statements).
 - Your program has a comment at the top of each function with at least 5 words describing what the function does.
 - There is good use of comments to explain the “why” of your code and empty lines to break up logical chunks.
 - The code doesn’t have “magic numbers” – these are numbers in the code by themselves, rather than stored in a variable.
 - The code doesn’t compute the right answer by doing extra work (e.g., having a computation in a loop when it could have been done only once, after the loop).
 - Helper functions are used to reduce code duplication.
 
Assessment
The core requirements for your submission are:
- complete Parts A, B, and E
    
- Basic Hashing
 - Basic Cuckoo Hashing
 - Reflection
 
 - not make use of any built-in hashing-related libraries
 - satisfy the code style expectations #0-#4 above
 - include a reflection in comments at the bottom of your 
CuckooHashMap.ktfile 
The advanced requirements for your submission are:
- satisfy the core requirements
 - complete all of Parts A-E
    
- Code Refactoring
 - Complete Cuckoo Hashing
 
 - satisfy all code style expectations listed above (#0-#9)
 - include your name and collaboration statement at the top of your 
CuckooHashMap.ktfile 
Assignment overview
Hashing in an array, also known as “open addressing”, is one of the main approaches to implementing a hashtable. It’s the one that Python uses for its built-in dictionaries. As we discussed in class, one primary disadvantage of open addressing is the possibility of collisions. This issue is typically resolved by some form of probing (e.g., linear or quadratic probing). That works, but it could result in large clusters that cause slow inserts and lookups.
Cuckoo hashing is an alternative approach to probing when implementing open addressing. It guarantees that you can find a value with a maximum of two lookups, no matter what, even if there were collisions when inserting. The tradeoff is that insertions might be dramatically slower than with probing if collisions occur. If one has an application where lookup time matters more than insertion time – which is commonly the case – then Cuckoo hashing is potentially a reasonable approach.
Cuckoo hashing is named after Cuckoo birds, some of which are known for kicking eggs out of their nests. The algorithm kicks entries from one table to another.
Getting started
Find your cs201 folder for assignments.  Maybe it contains an assignments folder inside of it.
Download this cuckoo_hashing.zip file to get the starter code.  Put it in the cs201 folder you just created and unzip it.  Rename the folder cuckoo_hashing.  Make sure that you actually unzip / extract the code, don’t just open up the .zip file in another window.
Now open your folder in VS Code. There are multiple ways to do this, so pick one:
- Open VS Code, then select 
File->Open Folderand navigate to thecuckoo_hashingfolder you just made. - Open VS Code, then drag and drop the folder (e.g., from Finder in MacOS or File Explorer in Windows) into VS Code to open it.
 
Either way, if you are asked, click that you trust the authors. If you have any issues getting started, ask the lab assistants for help – any of them should be able to help you get started, even if they aren’t super familiar with Kotlin.
Building your code with Maven
Like most of our assignments, this project structure uses Maven, so you can use the following commands (in the VS Code terminal) to execute your code (note that you must be in the top-level folder cuckoo_hashing for these commands to work):
- 
    
mvn testruns the tests that we provided (see below) - 
    
mvn compile exec:javacompiles and runs your code, as specified in yourmainfunction - 
    
mvn compilecompiles your code, but doesn’t run it (this is likekotlincby itself) 
If you want to see less output from Maven, you can add -q (for “quiet”) flag:
- 
    
mvn -q testrun tests, with less output - 
    
mvn -q compile exec:javaruns yourmainfunction, with less output (but still any print statements you have) 
If you run the tests with the starter code, you should have errors, as you haven’t written any code yet.
Cuckoo hashing algorithm
Here are part 1 and part 2 of a set of videos on Cuckoo hashing. You are encouraged to watch them to get an idea of how it works.
These are the most important details:
- 
    
There are two arrays (acting as individual hash tables) instead of just one. We’ll denote them as table 0 and table 1.
 - 
    
Each table has its own separate hash function that goes with it; we’ll call them h0 and h1.
 - 
    
You should implement the Cuckoo hashing algorithm within the
CuckooHashMap<K, V>class as two functions:fun set(key: K, value: V)– inserts a key-value pair into the hashmap (if the given key is already present in the hashmap, this just replaces the value with the new one)fun get(key: K): V?– looks up the key in the hashmap, returning its value if found ornullotherwise
 
Insertion
Here is an overview of the algorithm for inserting a key-value pair:
- 
    
Try to insert a new entry into the first table as you would in normal hashing. If this succeeds, then we’re done!
 - 
    
In the event of a collision, remove the entry that is there (but remember it), then insert the new one. Now you’re stuck holding an entry that was removed. Try to insert that entry into the second table. If this succeeds, then we’re done!
 - 
    
In the event of a collision in the second table, remove the entry that is there (but remember it), then insert the entry you were holding on to. Now again you are stuck holding an entry. Try to insert that back into the first table. If this succeeds, then we’re done!
 - 
    
In the event of a collision in the first table, remove the entry that is there (but remember it), then insert the entry you were holding on to. Now you are again stuck holding an entry. Try to insert that back into the second table. If this succeeds, then we’re done!
 - 
    
Keep repeating this process (including switching tables) until you eventually find room to insert something, or until you give up because a certain number of iterations have passed. (This number is configured in
CuckooSetup.kt.) - 
    
If you gave up in the repetition in the previous step because too many iterations passed, it likely means a cycle of insertions happened, and that you’ll never actually succeed. So, pick two completely new hash functions h0 and h1 for the two tables, then rehash everything in them so they land in completely new locations, and start the whole insertion process again back at Step 1. (If you make it back to Step 5, just keep rehashing and repeating…) Note that there is starter code that helps with this.
 
Note: Check out the code in CuckooSetup.kt.  You are expected to read through it before trying to start, so you at least know what is there.  You should not make any changes to this file, but you’ll want to use what you find.
Retrieval
The lookup process is fairly straightforward when using Cuckoo hashing, due to the guarantee that we only have to check each table once, and there isn’t any probing. Something is in its place or it’s not there.
Here is an overview of the algorithm for looking up the value for a key:
- 
    
Look the key up in the first table exactly as you would in normal hashing. If you find it, you’re done!
 - 
    
If you didn’t find it in the first table, try looking it up in the second table using the hash function for that table. If you find it, you’re done!
 - 
    
If you didn’t find the key in either table, then you know it isn’t present, so return
null. 
Generating lots of hash functions
As mentioned in Step 6 for insertion above, Cuckoo hashing requires that we can keep creating new hash functions as needed.  For this assignment, we’re going to leverage a pseudo-random number generator.  Specifically, we’ll use the fact that we can give it a seed, and have it generate for us a predictable random number (so it works for insertion and later retrieval).  Here’s the code we’ll use, which you’ll find in CuckooSetup.kt:
    var seed = 0
    fun cuckooHashCode(tableNum: Int, key: K): Int {
        val multiplier = Random(2 * seed + tableNum).nextInt()
        val newHashCode = multiplier * key.hashCode()
        return abs(newHashCode)
    }
The function cuckooHashCode takes in a table number (which should be 0 or 1) and a key.  It then generates a predictable-yet-random-seeming hash value.  Whenever we want to create a new set of hash functions, we just increment the seed.
Part A: Basic hashing functionality
To get started, you should implement set and get, but only working within a single table.  In other words, you can assume you live in a perfect world in which collisions never occur.  You don’t need to use the second hash table yet.
Here’s a simplified version of the algorithm for each function:
- 
    
Inserting a key-value pair:
- Try to insert a new entry into the first table as you would in normal hashing. This will succeed, because the world is perfect and nothing ever collides.
 
 - 
    
Looking up the value for a key:
- 
        
Look the key up in the first table exactly as you would in normal hashing. If you find it, you’re done!
 - 
        
If you didn’t find it in the first table, then you know it isn’t present, because the world is perfect and this is only place it would have been, so return
null. 
 - 
        
 
Once you have this basic functionality implemented, you should pass the oneTableNoRehashingNeeded test when you run mvn test.
Note: It is important that you do not use any built-in hashing-related libraries.  You have to do this yourself with the arrays of Entry<K, V>? in tables.
Part B: Basic Cuckoo hashing
As the next step, you’ll extend your work from the previous part to incorporate the second table. This means that collisions can occur, but you can assume that you’ll be able to use the second table without any additional complexity.
Here’s the updated subset of the algorithm for each function:
- 
    
Inserting a key-value pair:
- 
        
Try to insert a new entry into the first table as you would in normal hashing. If this succeeds, then we’re done!
 - 
        
In the event of a collision, remove the entry that is there (but remember it), then insert the new one. Now you’re stuck holding an entry that was removed. Try to insert that entry into the second table. This will succeed, because the world is perfect and nothing ever collides in the second table.
 
 - 
        
 - 
    
Looking up the value for a key:
- 
        
Look the key up in the first table exactly as you would in normal hashing. If you find it, you’re done!
 - 
        
If you didn’t find it in the first table, try looking it up in the second table using the hash function for that table. If you find it, you’re done!
 - 
        
If you didn’t find the key in either table, then you know it isn’t present, so return
null. 
 - 
        
 
Once you have this additional functionality implemented, you should pass both the oneTableNoRehashingNeeded and noRehashingNeeded tests in mvn test.
Note: Insertion is what makes Cuckoo hashing complex, so by now you have completed implementing get!
Part C: Code refactoring
It is important that you avoid duplicated code.  By now, you likely have similar code, for example, to look up a value in the first table and the second using their respective hash functions.  Look for any such repeated code, and write one or more helper functions to simplify your get and set implementations.  You will need to write some helper functions to earn Advanced on this assignment.
For example, you may add a function that adds or updates the key/value pair in the given table, and returns the entry that was evicted, or null if no entry was previously stored in that location.  This could be its signature:
    private fun addOrUpdate(key: K, value: V, tableNum: Int): Entry<K, V>?
Part D: Complete Cuckoo hashing
As a final step, you should complete the entire implementation for insertion using the Cuckoo hashing algorithm (see above).
Part E: Reflection
Were there any particular issues or challenges you dealt with in completing this assignment? How long did you spend on this assignment?
Write a brief discussion (a sentence or two is fine) at the bottom of your CuckooHashMap.kt file (in comments, so each line should start with // or be in a /* */ block).
Here are some examples:
// Reflection:
// I got really mixed up making sense of the two tables.
// I had trouble figuring out how to invoke the right hash function for table 0.
// I spent 6 hours on this assignment.
// Reflection:
// I started late, so I had to rush to make sure I knew how to test my code.
// It may be good to start early next time.
// I spent 5 hours on this assignment.
// Reflection:
// It went fine; I found what I needed in my notes.
// I spent 3 hours on this assignment.
Hints and suggestions
- 
    
There is a function named
displaythat will print out both hash tables. You should use it regularly by adding code toMain.ktand runningmvn compile exec:javato make sure thatsetandgetseem to be behaving correctly on simple cases you come up with. This will make debugging much, much easier. - 
    
You can choose how to implement
set, but here’s one approach you can take:- 
        
Create a private helper function named
insertthat does the real work of inserting the key and value into the hash tables. Ifinsertsucceeds, it returnsnull; if it fails because it loops too many times trying, it returns the key/value pair that was most recently kicked out while trying to insert. - 
        
Then, create a private helper function named
rehashthat also takes a key/value pair to insert. This function can be called when an insert or a previous rehash has failed. It just increments the seed, then tries to insert the key/value pair that was passed in. If it succeeds, then it loops over both tables looking for entries. Each time it finds an entry, it takes it out (setting the location in the table tonull) and then tries to reinsert it. At any time, ifinsertfails and kicks out an entry, therehashfunction also gives up and returns that entry. Ifrehashsucceeds, it returnsnull. - 
        
You can create a very short private helper function named
rehashUntilSuccessthat takes a key/value pair and callsrehashin a while loop, doing so over and over untilrehashsucceeds. - 
        
Finally, the
setfunction can just callinsertwith the new key/value pair, and if it fails, it can callrehashUntilSuccess. 
 -