Python Hash Tables

Python Hash Tables (Dictionaries)

A Hash Table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found.

In Python, the built-in dictionary (dict) is a powerful and highly optimized implementation of a hash table.


How Do Hash Tables Work?

The magic of a hash table is its ability to provide, on average, constant-time O(1) lookups, insertions, and deletions. Here’s a simplified breakdown of the process:

  1. Hashing the Key: When you want to store a key:value pair, the hash table first takes the key (e.g., the string "name") and passes it through a hash function.
  2. Hash Function: This function is a special algorithm that converts the key into an integer. This integer will be the index for where to store the value in memory. A good hash function always produces the same integer for the same key.
  3. Storing the Value: The value (e.g., "Alice") is then stored in the array-like structure at the index generated by the hash function.
  4. Retrieval: To get the value for a key, the same process is repeated. The key is hashed, the resulting index is located, and the value at that index is returned.

Python Dictionary (Hash Table) in Action

// Creating a dictionary
student = {
    "name": "Alice",
    "id": 12345,
    "major": "Computer Science"
}

// Accessing a value by key (O(1) on average) print(student["name"]) # Output: Alice

// Adding a new key-value pair (O(1) on average) student["gpa"] = 3.8

// Deleting a key-value pair (O(1) on average) del student["major"]

print(student)


Handling Collisions

What happens if the hash function generates the same index for two different keys? This is called a collision.

For example, hash("key1") might result in index 5, and hash("key2") might also result in index 5.

Modern hash tables, including Python's dictionaries, solve this using a technique called chaining. Instead of storing a single value at each index, they store a small linked list of key:value pairs. When a collision occurs, the new pair is simply added to the linked list at that index.

Performance Note: While lookups are O(1) on average, in the worst-case scenario (where many keys hash to the same index), the lookup time can degrade to O(n) as Python has to search through the linked list at that index. However, due to Python's excellent hash function, this is extremely rare in practice.


Key Properties of Dictionaries


Exercise

?

What is the average time complexity for accessing a value in a Python dictionary?