Polyglot Dojo #3: Hashmap
Challenge
The challenge I tend to review in this post is called hash map. To be honest, I couldn't understand the requirement of the challenge from the description and had almost no idea what a hash map is. Even the test suite didn't help much in visualizing the data structure I needed. So I first need to learn what a hash map is.
The best resource I could find on the internet which helped me to clarify all my question, is this video:
It gradually answers following questions visually, which is more than enough to tackle this challenge:
What is a hash table?
Why we use hash tables?
What is the job of a hash function?
What is the hash map?
What is the solution to overcome hash collision?
Python Solution
The test suite provided for us is like this:
from nose.tools import assert_equal, assert_raises
class TestHashMap(object):
# TODO: It would be better if we had unit tests for each
# method in addition to the following end-to-end test
def test_end_to_end(self):
hash_table = HashTable(10)
print("Test: get on an empty hash table index")
assert_raises(KeyError, hash_table.get, 0)
print("Test: set on an empty hash table index")
hash_table.set(0, 'foo')
assert_equal(hash_table.get(0), 'foo')
hash_table.set(1, 'bar')
assert_equal(hash_table.get(1), 'bar')
print("Test: set on a non empty hash table index")
hash_table.set(10, 'foo2')
assert_equal(hash_table.get(0), 'foo')
assert_equal(hash_table.get(10), 'foo2')
print("Test: set on a key that already exists")
hash_table.set(10, 'foo3')
assert_equal(hash_table.get(0), 'foo')
assert_equal(hash_table.get(10), 'foo3')
print("Test: remove on a key that already exists")
hash_table.remove(10)
assert_equal(hash_table.get(0), 'foo')
assert_raises(KeyError, hash_table.get, 10)
print("Test: remove on a key that doesn't exist")
assert_raises(KeyError, hash_table.remove, -1)
print('Success: test_end_to_end')
def main():
test = TestHashMap()
test.test_end_to_end()
if __name__ == '__main__':
main()
It reveals some of the requirements with more details. But it's not enough by itself, since all of the tests would get passed by a hash table, while the pared solution template, asks us to store values as a map (hash map). You can see it by noting to the Item
constructor's signature (it requires key
and value
). This is the solution template:
class Item(object):
def __init__(self, key, value):
# TODO: Implement me
pass
class HashTable(object):
def __init__(self, size):
# TODO: Implement me
pass
def _hash_function(self, key):
# TODO: Implement me
pass
def set(self, key, value):
# TODO: Implement me
pass
def get(self, key):
# TODO: Implement me
pass
def remove(self, key):
# TODO: Implement me
pass
First thing I'm going to develop is the body of Item()
's constructor function, which would be as simple as this:
class Item(object):
def __init__(self, key, value):
self.key = key
self.value = value
Here we just store the values in the class body, and probably could have used a better data structure (like Enum
) instead of a class. But this would suffice for now.
The next logical step would be the _hash_function
itself. Well I had no idea about it. In application level (mostly web-development), I was almost always using Python's standard data structure for this kind of task, and never had a need for such task. So after trying couple of my lame ideas (including some hash methods from Python's standard library, but I knew it should be simpler ๐
), I gave up and looked at the solution page.
To my surprise, the hash function used was quite simple. Actually this simple:
def _hash_function(self, key):
return key % self.size
But this simplicity is also based on two of our main constraints:
The size of hash table should be static. This way, we can always calculate a unique (and valid) index number to place our Item in.
The keys will always be integer (to reduce the complexity of challenge. Other wise, we should have a hash function which minimizes collisions and our lookup process would not always be guaranteed to be $O(1)$).
One cool thing I learned here, is the property of modulo which this function relies on. The right side operand of the modulo, basically defines the maximum value the result of the calculation would be (no matter how big the left side operand is). Look at this:
result = set([n % 10 for n in range(1000)])
print(result)
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Cool! So no matter what the value of our numeric key, we would always be sure that we have a place for it in our hash map structure. With that explanation, the __init__
function's implementation is as straightforward as this:
class HashTable(object):
def __init__(self, size):
self.size = size
self.table = [[] for _ in range(size)]
We store the size and create a list of lists to act as our hash table's slots. So when we instantiate the HashTable
like HashTable(10)
the self.table
would contain 10 empty slots. With the main logic implemented, the set
, get
and remove
functions would be so easy to implement. Just using some iterations we would have our functions as:
def set(self, key, value):
idx = self._hash_function(key)
for item in self.table[idx]:
if item.key == key:
item.value = value
return
self.table[idx].append(Item(key, value))
def get(self, key):
idx = self._hash_function(key)
for item in self.table[idx]:
if item.key == key:
return item.value
raise KeyError
def remove(self, key):
idx = self._hash_function(key)
for index, item in enumerate(self.table[idx]):
if item.key == key:
del self.table[idx][index]
return
raise KeyError
I don't think we require any more explanation here and everything looks as clear as it can be. You can also experiment with the final implementation here: