Building a Hash Table in JavaScript from Scratch
Hash tables are a cornerstone of computer science, providing fast lookups and efficient key-value storage. But how do they work, and what happens when two keys collide? By building a hash table from scratch, we can uncover its inner mechanics, address the challenge of collisions, and compare it to JavaScript's built-in Objects and Maps.
What Is a Hash Table?
A hash table is a data structure that maps keys to values using a hash function. It works by calculating a hash for each key and using it to determine where the value is stored in an internal array.
Key Characteristics:
- Efficient: Average case O(1) for lookups, inserts, and deletes.
- Collision-Prone: Multiple keys can hash to the same index, requiring special handling.
- Practical: Used in caching, databases, and high-performance applications.
Example:
const map = new Map()
map.set('key1', 'value1')
console.log(map.get('key1')) // Output: "value1"
How JavaScript’s Built-In Structures Handle Hashing
JavaScript provides two key-value structures: Object
and Map
. Let’s explore their differences before building our custom hash table.
Objects
Objects are often used as hash tables but have limitations:
- Keys are always converted to strings or symbols.
- They inherit properties and methods from their prototype, which can lead to unexpected key collisions unless you use
Object.create(null)
.
const obj = { key1: 'value1' }
console.log(obj.key1) // Output: "value1"
Maps
Map
is designed specifically as a hash table:
- Any key type: Supports objects, functions, and primitives as keys.
- Order: Maintains insertion order.
- Performance: Optimized for frequent lookups and inserts.
const map = new Map()
map.set({ id: 1 }, 'value1')
console.log(map.size) // Output: 1
Maps automatically handle collisions internally, ensuring consistent performance.
Building a Hash Table From Scratch
To better understand hash tables, let’s create one step by step, with careful attention to handling collisions.
Step 1: Define the Hash Function
The hash function is the backbone of any hash table. It determines where a key-value pair is stored by converting the key into a numeric index within the table's array.
What Makes a Good Hash Function?
A good hash function must satisfy these properties:
- Deterministic: The same input must always produce the same hash.
- Uniform Distribution: Keys should be distributed evenly across the array to minimize collisions.
- Efficient: The function should be quick to compute, even for large datasets.
- Minimize Collisions: Different keys should produce distinct hashes whenever possible.
How Hash Functions Work
The goal is to map a wide range of possible keys to a fixed range of indices (0 to tableSize - 1). Here's how:
-
Convert the Key to a Numeric Value:
- For strings, sum the Unicode values of the characters.
- For numbers, use the value directly (or modulo it to fit the table size).
-
Compress the Numeric Value:
- Use the modulo operator (
%
) to ensure the hash falls within the array bounds.
- Use the modulo operator (
JavaScript Hash Function Example
Here’s a simple hash function for strings:
function hash(key, tableSize) {
let hash = 0
for (let i = 0; i < key.length; i++) {
// Multiply the character's Unicode value by its position for variation
hash = (hash + key.charCodeAt(i) * i) % tableSize
}
return hash
}
This function:
- Loops through each character in the string.
- Uses the character's Unicode value (
key.charCodeAt(i)
) for the numeric conversion. - Multiplies by
i
(the position) to introduce variability. - Applies modulo (
% tableSize
) to fit the hash within the array size.
Potential Limitations of This Function
While simple, this hash function has some drawbacks:
- Collisions: Similar keys (e.g.,
"abc"
and"cab"
) may hash to the same index. - Poor Distribution: Patterns in the input keys can create clusters, reducing performance.
Improving the Hash Function
For better results, consider:
- Prime Numbers: Use a prime number for the array size to reduce clustering.
- Multiplicative Hashing: Multiply the hash by a constant factor (e.g., 31 or 37), known to spread values better.
- Advanced Techniques: Use established algorithms like FNV-1a or MurmurHash for more robust hashing.
Here’s an enhanced version of our hash function:
function improvedHash(key, tableSize) {
let hash = 0
const PRIME = 37 // A prime number to reduce collisions
for (let i = 0; i < key.length; i++) {
hash = (hash * PRIME + key.charCodeAt(i)) % tableSize
}
return hash
}
Key Takeaways
- A hash function must be tailored to the dataset and application to minimize collisions.
- Prime numbers and constants can improve uniformity.
- For most real-world use cases, libraries or built-in functions (e.g., JavaScript’s
Map
) are better than writing your own hash function.
⚠️ Warning: No matter how good the hash function is, collisions cannot be entirely eliminated, especially when the number of keys exceeds the table size. This is why collision resolution techniques, like separate chaining, are essential.
Step 2: Create the Hash Table Class
We start with a class that initializes a storage array and manages operations.
class HashTable {
constructor(size = 50) {
this.table = new Array(size)
this.size = size
}
}
Step 3: Handle Collisions with Separate Chaining
We resolve collisions using separate chaining, where each index holds a bucket (an array) for storing key-value pairs.
set(key, value) {
const index = hash(key, this.size);
if (!this.table[index]) {
this.table[index] = []; // Create a bucket for this index
}
const existing = this.table[index].find(([k]) => k === key);
if (existing) {
existing[1] = value; // Update the value if key already exists
} else {
this.table[index].push([key, value]); // Insert new key-value pair
}
}
⚠️ Warning: If too many keys collide and end up in the same bucket, performance degrades to O(n) for lookups and inserts.
Step 4: Retrieve Values
To retrieve a value, we look for the key in the appropriate bucket.
get(key) {
const index = hash(key, this.size);
const bucket = this.table[index];
if (bucket) {
const pair = bucket.find(([k]) => k === key);
return pair ? pair[1] : undefined;
}
return undefined;
}
Step 5: Delete Key-Value Pairs
To remove a key-value pair, we filter out the key from the bucket.
delete(key) {
const index = hash(key, this.size);
const bucket = this.table[index];
if (bucket) {
this.table[index] = bucket.filter(([k]) => k !== key);
}
}
Handling Collisions
Collisions are an inherent challenge in hash tables. Without proper handling, they can overwrite data or lead to incorrect results.
Separate Chaining
Our implementation uses separate chaining, where each index in the hash table holds a collection (bucket) of key-value pairs. This approach is simple and effective but relies on keeping the number of collisions low.
Open Addressing
Another approach is open addressing, where collisions are resolved by finding another slot in the table. This requires strategies like:
- Linear Probing: Look for the next empty slot sequentially.
- Double Hashing: Use a secondary hash function to determine the step size.
Resize the Table to Reduce Collisions
To maintain performance, resize the hash table when the load factor (number of elements / table size) exceeds a threshold, like 0.7.
resize() {
const newTable = new Array(this.size * 2);
this.table.forEach(bucket => {
if (bucket) {
bucket.forEach(([key, value]) => {
const index = hash(key, newTable.length);
if (!newTable[index]) {
newTable[index] = [];
}
newTable[index].push([key, value]);
});
}
});
this.table = newTable;
this.size = newTable.length;
}
Performance Analysis
Time Complexity
Operation | Average Time Complexity | Worst Case Complexity |
---|---|---|
set | O(1) | O(n) (with many collisions) |
get | O(1) | O(n) (with many collisions) |
delete | O(1) | O(n) (with many collisions) |
Space Complexity
- Custom Hash Table: Requires extra memory for buckets (in separate chaining).
- Built-In Maps: Optimized for memory and performance.
Key Takeaways
- Collisions Are Unavoidable: Even with a good hash function, collisions will occur.
- Separate Chaining Works Well: It allows for simple and effective collision resolution.
- Consider Built-In Options: JavaScript’s
Map
is highly optimized for key-value storage, handling collisions internally.
Building a hash table deepens your understanding of this essential data structure. While native Map
is often the best choice, knowing how to manage collisions equips you to tackle advanced challenges.