A Hash Table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found.
In Python, the built-in dictionary (dict) is a powerful and highly optimized implementation of a hash table.
The magic of a hash table is its ability to provide, on average, constant-time O(1) lookups, insertions, and deletions. Here’s a simplified breakdown of the process:
key:value pair, the hash table first takes the key (e.g., the string "name") and passes it through a hash function.value (e.g., "Alice") is then stored in the array-like structure at the index generated by the hash function.
// Creating a dictionary
student = {
"name": "Alice",
"id": 12345,
"major": "Computer Science"
}
// Accessing a value by key (O(1) on average)
print(student["name"]) # Output: Alice
// Adding a new key-value pair (O(1) on average)
student["gpa"] = 3.8
// Deleting a key-value pair (O(1) on average)
del student["major"]
print(student)
What happens if the hash function generates the same index for two different keys? This is called a collision.
For example, hash("key1") might result in index 5, and hash("key2") might also result in index 5.
Modern hash tables, including Python's dictionaries, solve this using a technique called chaining. Instead of storing a single value at each index, they store a small linked list of key:value pairs. When a collision occurs, the new pair is simply added to the linked list at that index.
Performance Note: While lookups are O(1) on average, in the worst-case scenario (where many keys hash to the same index), the lookup time can degrade to O(n) as Python has to search through the linked list at that index. However, due to Python's excellent hash function, this is extremely rare in practice.
What is the average time complexity for accessing a value in a Python dictionary?