Introduction to HashMaps
So what exactly are HashMaps? Well if you have ever used a simple key-value pair data structure then you've probably used a hashmap. HashMaps are a simple Key-Value pair data structure that allow you to store a key and map some value to it.
They can store arbitrary value that you want to store in them and can be fetched later whenever the developer demands
it. However, the core value they provide is not the simplicity but rather the ability to store and fetch data in a
constant time that is O(1)
(though not necessarily true, we will talk about this a little later). Quite fascinating
right you have a data structure that is not only simple to use but extremely fast!
Let's first see how are HashMaps defined across several languages and I bet you've seen these:
In JavaScript though there is not a direct HashMap object however there is a Map which behaves exactly like a HashMap
would. You can perform Add
, Update
, Remove
operations on it in a constant time with a predictable flow.
const hashmap = new Map();
hashmap.set("key", "value");
C++ exposes two containers called std::unordered_map
and std::map
the core difference being that std::map
runs on
a O(log n)
time complexity because they internally use Red-black trees
.
std::unordered_map<std::string, std::string> u = {
{"RED", "#FF0000"},
{"GREEN", "#00FF00"},
{"BLUE", "#0000FF"}
};
Go (which we'll be using today) has something similar which it calls map!
m := make(map[string]int)
m["route"] = 666
i, ok := m["route"]
Aside of the syntactic difference you'll observe that the idea behind them stays the same, they are a Key-Value pair that can be stored and fetched very quickly.
😶🌫️ The truth behind the pretty face!
Maps, HashMaps, HashTables — whatever you want to call them — are nothing but clever abstractions built atop some surprisingly simple and fundamental data structures, engineered to be fast and powerful.
If I were to write a recipe for a HashMap, it’d go something like this:
Ingredients to make a Tasty HashMap:
- A good hashing function 🍯
- An array of buckets 🍱
- Buckets that hold key-value pairs 🔑📦
- Plus a cup of coffee — never hurts :p ☕
Under the hood, a HashMap stores a collection of key-value pairs inside an array. But how they’re structured is what makes them interesting. Consider this basic struct:
type Entry struct {
Key string
Value int
}
The Key
represents the key ("route"
if we take the above Go example) you wanted to store a Value
against. And the
Value
is of course the value you wanted to store or map it to.
A simple HashMap might be something like this:
type HashMap struct {
table [][]Entry
}
That table
is our main array — and each index holds a bucket ([]Entry
), which is itself a slice of Entry's.
But wait, why would we need multiple entries in a single bucket?
[ <--stuff--, [{Key: "the_cutest", Value: "you_of_course"}], --stuff--> ]
Glad you asked 😏
Let’s walk through what happens when you want to store a key like "foo" with a value like "bar":
- The moment you insert "foo", the HashMap sends the key to the hash function.
- The hash function returns an unsigned integer — a hash of the key.
- Since this integer could be wildly large, we tame it by using modulo:
idx := int(Hash(KEY) % uint64(len(table)))
- Now idx is guaranteed to be within bounds — between 0 and n - 1, where n is the size of our array.
- Using this index, we find the bucket and insert our
{Key, Value}
pair there. - If there’s already something in that bucket (that is, we hit a collision), we either:
- Update the value if the key already exists.
- Append the new entry if it’s a different key entirely.
Boom! 🎉 Just like that, you've got yourself a working (and beautiful) little HashMap.
Could you believe it’s really that simple? Okay... maybe not that simple. But hey, the gist is all there 😉
💭 The idea of Hashing
According to WikiPedia, a Hash or a Hashing function is defined as follows:
A hash function is any function that can be used to map data of arbitrary size to fixed-size values, ... The values returned by a hash function are called hash values, hash codes, (hash/message) digests, or simply hashes.
Think of a function which given an arbitrary input, converts it into something more predictable or bounded. Or maybe a function that is capable of converting a huge object to a number for example:
fn(x any) -> int
That is a huge oversimplification, of course. But, how about this toy function in Go?
func Convert(str string) int {
sum := 0
for _, char := range str {
sum += int(char)
}
return sum
}
func main() {
something := Convert("Sourav is NOT a beta!")
fmt.Println(something) // Prints `1771`
something2 := Convert("Sourav is NOT a beta!")
fmt.Println(something2) // Prints `1771`
}
Observe that the function accepts a string str
and then loops over every rune in that string and converts each rune/char
into an integer int(char)
and then sums it up. It's a function given a string str
will convert that string and
return an int
. However, notice that no matter how many times I call it with the same input, it always returns the
same result — a pure function to be accurate. Being deterministic and pure is what defines a good hash function.
And now you know hashing, the above function for sure isn't a sophisticated hashing function, but it represents the idea. A Hash function can be used to map data of arbitrary size (our string) to a fixed-size value (integer).
💥 Collisions are inevitable
Now you might ask — "Alright, I get it... but what if two keys, like "foo" and "bar", end up producing the same hash?"
Great question! And yes, that's a very real possibility — in fact, it's guaranteed to happen at some point. But don’t worry: we’re ready for it.
Remember those “buckets” in our table? Each one is a []Entry — a slice capable of holding multiple entries. Here's what that might look like:
[
<--stuff--,
[
{Key: "hash-one", Value: 1},
{Key: "hash-two", Value: 2}
],
--stuff-->
]
Notice how both "hash-one"
and "hash-two"
live in the same bucket? This is how we handle collisions.
😨 But wait, Sourav... isn’t that bad?
Like… what if every single key ended up in the same bucket? Wouldn't that turn our glorious O(1)
into a sad, sluggish O(n)
?
Well… yes, in the worst case. But let’s break that down.
🤯 Common misconceptions about HashMap complexity
The nightmare scenario is that multiple keys hash to the same index, dumping everything into one giant bucket.
Suddenly, instead of a quick lookup, you’re scanning a long list — and that's O(n)
.
But the real problem isn’t the data structure itself.
It’s not the Entry
struct.
It’s not the 2D table.
It’s not even the bucket.
✨ It’s the hash function.
If you hard-code a predictable, easily abusable hash function — like this one:
h(x) = (798532978x + 5239023) mod 100000007
Then here's the problem if I know what or how your hash function works I can find out values for x
such that for all
the values in domain x0
to xn
the range of h(x)
will be equal K
and in this case all these values will be
dumped into one bucket! Thereby degrading the performance to O(n)
.
In other words, then anyone (or any pattern) that figures out your function can generate values that collide — on purpose. All those keys will funnel into a single bucket, and boom — linear time complexity.
💡 A Good Hash Functions
A good hash function tends to have:
- Avalanche effect: Small changes in input should result in drastically different output (What is Avalanche Effect)
- Purity: For a given input, the output must always be the same (i.e., deterministic)
For practical implementations, I often use FNV-1A, which is fast, simple, and distributes reasonably well for strings and small keys. You can also explore this great discussion on hashing algorithms for uniqueness and speed.
🤔 So… Is O(1) a Lie?
Not really. When we say "Lookups in a HashMap are O(1)", what we actually mean is:
"The expected runtime for lookups is constant time, assuming a well-behaved hash function and a balanced distribution."
That “1”
is not literal — it’s a small, consistent constant k
.
So realistically, your lookup is O(k)
— where k
is the number of entries in a bucket (often 1, maybe 2, rarely more).
So yes — collisions happen, and yes — they affect performance. But with the right hash function, they’re manageable, rare, and no reason to panic.