 # Hash Table in Data Structures

Amit Kumar Ghosh  33 min read
22 Sep 2023
397 Views

## Hash Table in Data Structures: An Overview

Have you ever wondered how computers efficiently store and retrieve large amounts of data? The answer lies in the use of an incredibly powerful yet deceptively simple tool known as a hash table. A hash table in data structure is widely used in computer science and data structures to effectively manage complex datasets and facilitate quick look-up operations. In this article, we’ll explore what hash tables are, how they work, their advantages over other storage techniques, and common applications where they’re found today. So read on to learn everything you need to know about these invaluable tools!

## Data structures play a crucial role in organizing and manipulating data in computer science. One data structure that stands out due to its efficiency in searching and insertion is the Hash table. A Hash table is a data structure that implements an associative array. Fast data retrieval and search are made possible by hash tables, which use a hash function to map keys to array indices. Because of their efficiency and suitability for managing huge datasets and enhancing application performance, hash functions must be of excellent quality.

## Hashing (Hash Function)

A hash function is a function that converts keys to array indices. To prevent collisions and guarantee speedy lookup times, the keys should be equally dispersed across the array using a good hash function.

Integer universe assumption: According to the integer universe assumption, the keys are taken to be integers inside a specific range. This makes it possible to use fundamental hashing operations like multiplication or division hashing.

Hashing by division: In this simple hashing method, the index is the value that remains after dividing the key by the size of the array. It works well when the array size is a prime integer and the keys are distributed equally.

Multiplication hashing: This simple hashing operation multiplies the key by a constant ranging from 0 to 1 before taking the fractional component of the result. The fractional component is then multiplied by the array size to produce the index. Additionally, it works well with the keys dispersed evenly.

## Hash Collision

• A hash collision refers to a situation where two different inputs produce the same hash value or hash code when processed by a hash function. In other words, it occurs when two distinct pieces of data result in an identical hash output.
• A hash function is a mathematical algorithm that takes an input or "message" and produces a fixed-size string of characters, which is the hash value or hash code. Hash functions are commonly used in various computer science applications, such as data structures like hash tables, digital signatures, password storage, and data integrity verification. Hash collisions can have practical implications depending on the context.
• For example, in hash tables, collisions can lead to degraded performance if not handled efficiently. Techniques like chaining or open addressing are employed to resolve collisions in hash table implementations.
• Cryptographic hash functions, used in areas like data security and digital signatures, have stricter requirements to prevent collisions. These functions aim to provide a high level of collision resistance, making it computationally infeasible to find two inputs that produce the same hash value. Cryptographic hash collisions are of particular concern because they can undermine the integrity and security of cryptographic systems. Researchers and cryptographers continually evaluate and analyze hash functions to ensure their collision resistance and security properties, as vulnerabilities or weaknesses discovered in hash functions could have significant implications for various applications that rely on them.

## Types of hash collision

In the context of hashing algorithms, a collision occurs when two different inputs produce the same hash value. There are primarily two types of hash collisions: accidental or random collisions and intentional or malicious collisions. Let's explore each type in more detail:

• Accidental Collisions:

Accidental collisions occur when two different inputs produce the same hash value due to the nature of hashing algorithms and the limited range of possible hash values. These collisions are unintended and usually considered rare and coincidental. The probability of accidental collisions depends on the quality of the hashing algorithm and the size of the hash space. Good hashing algorithms strive to minimize the chances of accidental collisions by distributing hash values uniformly across the output space.

• Intentional Collisions:

Intentional collisions occur when an attacker purposefully generates two or more different inputs that produce the same hash value. These collisions are often the result of exploiting vulnerabilities or weaknesses in the hashing algorithm. Intentional collisions can be a significant security concern in various scenarios, such as digital signatures, certificate authorities, password hashing, and data integrity checks.

There are two main categories of intentional collisions:

• Preimage Attacks: In a preimage attack, an attacker tries to find any input that matches a given hash value. This type of attack is aimed at breaking the one-way property of a hash function, which means that it is computationally infeasible to determine the original input from its hash value.
• Collision Attacks: In a collision attack, an attacker aims to find two different inputs that produce the same hash value. The goal is to break the collision resistance property of a hash function, which ensures that it is computationally difficult to find any two inputs with the same hash value

## How does the hash table work?

1. Initialize the Hash Table: Start with an empty array of a fixed size (let's say 10). Each index in the array represents a bucket in the hash table.
2. [empty, empty, empty, empty, empty, empty, empty, empty, empty, empty]
3. Insertion: When inserting a key-value pair into the hash table, the hash function is used to calculate the index where the pair should be stored. The hash function takes the key as input and returns an integer value.
4. Let's say we want to insert the key "apple" with the value "fruit" into the hash table. The hash function calculates the index as follows: hash("apple") = 5 Since the hash value is 5, we store the key-value pair in index 5 of the array: [empty, empty, empty, empty, empty, ("apple", "fruit"), empty, empty, empty, empty]
5. Retrieval: To retrieve a value associated with a given key, we use the same hash function to calculate the index. We then look for the key-value pair in that index.
6. For example, if we want to retrieve the value for the key "apple," the hash function calculates the index as 5. We check index 5 in the array and find the key-value pair ("apple", "fruit"): [empty, empty, empty, empty, empty, ("apple", "fruit"), empty, empty, empty, empty]
7. Collision: Hash tables can encounter collisions when two different keys map to the same index. This can happen if the hash function is not perfectly distributed or if the array size is relatively small compared to the number of keys.
8. For example, if we try to insert the key "banana" and the hash function calculates the index as 5, we encounter a collision. We search for the next available slot and find index 6 is empty. So, we store the key-value pair ("banana", "fruit") in index 6: [empty, empty, empty, empty, empty, ("apple", "fruit"), ("banana", "fruit"), empty, empty, empty] [empty, ("apple", "fruit"), ("banana", "fruit"), empty, empty, empty]

## Collision Resolution Hashing Techniques in Data Structure

Hash collisions occur when two different inputs produce the same hash value. Resolving hash collisions depends on the specific hashing algorithm being used. Here are a few common approaches to resolving hash collisions:

1. Collision resolution by separate Chaining: This technique involves maintaining a linked list of elements that hash to the same value. When a collision occurs, the new element is appended to the linked list associated with that hash value. This allows multiple elements to coexist at the same hash value.
2. Open Addressing: In this open addressing hashing method, when a collision occurs, the algorithm searches for the next available or unoccupied slot in the hash table and stores the element there. Common open-addressing strategies include three major ways, which are
• Linear probing- linear probing hash table is a technique used in hash tables to resolve collisions. A hash table is a data structure that stores key-value pairs and allows efficient retrieval of values based on their associated keys. When two different keys produce the same hash value (a collision), linear probing can be used to find an alternative slot in the hash table to store the new key-value pair.
• Quadratic probing- Quadratic probing hash table is a collision resolution technique used in hash tables or hash-based data structures to handle collisions that occur when two or more elements are mapped to the same hash value. It is an open addressing strategy where, instead of immediately placing the collided element in the next available slot, the algorithm looks for an empty slot by using a quadratic function to probe the table.
• Double hashing- Double hashing is a collision resolution technique used in hash tables to handle collisions when two different keys are mapped to the same hash value. It is an extension of the basic linear probing or open addressing method. In double hashing, an auxiliary hash function is used to calculate the number of steps to probe for the next available slot when a collision occurs. The auxiliary hash function produces a secondary hash value, which is then used to calculate the step size for probing. The step size is added to the current index to determine the next slot to check for availability.
3. Robin Hood Hashing: This technique is a variation of open addressing where elements are stored in an ordered manner based on their distance from their ideal hash position. When a collision occurs, the algorithm checks the displacement (distance from the ideal position) of the existing element. If the new element has a smaller displacement, it replaces the existing element and continues to probe for an empty slot.
4. Cuckoo Hashing: Cuckoo hashing uses multiple hash functions and multiple hash tables. Each element is assigned to one of the hash tables based on the output of the corresponding hash function. If a collision occurs, the algorithm tries to "kick out" the existing element by rehashing it and placing it in the other hash table. This process continues until there are no more collisions or a maximum number of rehashing attempts is reached.
5. Dynamic Resizing: Another approach is to dynamically resize the hash table when a certain threshold (load factor) is exceeded. This involves creating a larger hash table and rehashing all existing elements to distribute them more evenly. Dynamic resizing helps reduce the likelihood of collisions by providing a larger space for elements to be stored.

## Hash Table Data Structure Algorithm

1. Initialize an array of fixed size to serve as the underlying storage for the hash table. The size of the array depends on the expected number of key-value pairs to be stored.
2. Define a hash function that takes a key as input and generates a unique hash code. The hash code is an integer value that represents the index in the array where the key-value pair will be stored. The quality of the hash function is crucial for minimizing collisions (when different keys produce the same hash code).
3. To insert a key-value pair into the hash table, apply the hash function to the key to determine the index. If the calculated index is already occupied, handle collisions using a collision resolution strategy. Common strategies include chaining (using linked lists to store multiple values at the same index) or open addressing (finding the next available index based on a defined probing sequence).
4. To retrieve a value associated with a key, apply the hash function to the key to determine the index. If the index contains a value, compare the stored key with the given key to ensure a match. If the keys match, return the corresponding value. If the index is empty or the keys don't match, the key is not present in the hash table.
5. To remove a key-value pair from the hash table, apply the hash function to the key to determine the index. If the index contains a value, compare the stored key with the given key to ensure a match. If the keys match, remove the key-value pair from the table. If the index is empty or the keys don't match, the key is not present in the hash table

## Implementation of Hash Table Example

``````
def get(self, key):
index = self.hashFunction(key)
bucket = self.table[index]

for kvp in bucket:
if kvp == key:
return kvp

def remove(self, key):
index = self.hashFunction(key)
bucket = self.table[index]

for kvp in bucket:
if kvp == key:
bucket.remove(kvp)
return

# Main function
if __name__ == "__main__":
hashTable = HashTable(10)

# Insert key-value pairs
hashTable.insert(1, "Value 1")
hashTable.insert(11, "Value 11")
hashTable.insert(21, "Value 21")
hashTable.insert(2, "Value 2")

# Retrieve values
print(hashTable.get(1)) # Output: Value 1
print(hashTable.get(11)) # Output: Value 11
print(hashTable.get(21)) # Output: Value 21
print(hashTable.get(2)) # Output: Value 2

# Remove a key-value pair
hashTable.remove(11)

# Try to retrieve the removed value
``````
import java.util.ArrayList;
import java.util.List;

// Hash Table class
class HashTable {
// Bucket is a list of key-value pairs
private class KeyValuePair {
int key;
String value;

KeyValuePair(int key, String value) {
this.key = key;
this.value = value;
}
}

private int size;

// Constructor
HashTable(int tableSize) {
size = tableSize;
table = new ArrayList<>(size);
for (int i = 0; i < size; i++) {
}
}

// Hash function
private int hashFunction(int key) {
return key % size;
}

// Insert a key-value pair into the hash table
void insert(int key, String value) {
int index = hashFunction(key);

// Check if the key already exists in the bucket
for (KeyValuePair kvp : bucket) {
if (kvp.key == key) {
// Key found, update the value
kvp.value = value;
return;
}
}

}

// Retrieve the value associated with a given key
String get(int key) {
int index = hashFunction(key);

// Search for the key in the bucket
for (KeyValuePair kvp : bucket) {
if (kvp.key == key) {
// Key found, return the value
return kvp.value;
}
}

}

// Remove a key-value pair from the hash table
void remove(int key) {
int index = hashFunction(key);

// Search for the key in the bucket
for (KeyValuePair kvp : bucket) {
if (kvp.key == key) {
// Key found, remove the pair
bucket.remove(kvp);
return;
}
}
}
}

// Main function
public class Main {
public static void main(String[] args) {
HashTable hashTable = new HashTable(10);

// Insert key-value pairs
hashTable.insert(1, "Value 1");
hashTable.insert(11, "Value 11");
hashTable.insert(21, "Value 21");
hashTable.insert(2, "Value 2");

// Retrieve values
System.out.println(hashTable.get(1)); // Output: Value 1
System.out.println(hashTable.get(11)); // Output: Value 11
System.out.println(hashTable.get(21)); // Output: Value 21
System.out.println(hashTable.get(2)); // Output: Value 2

// Remove a key-value pair
hashTable.remove(11);

// Try to retrieve the removed value
}
}``````
``````
#include <iostream>
#include <vector>
#include <list>

// Hash Table class
class HashTable {
private:
// Bucket is a list of key-value pairs
typedef std::pair<int, std::string> KeyValuePair;
typedef std::list<KeyValuePair> Bucket;
std::vector<Bucket> table;
int size;

public:
// Constructor
HashTable(int tableSize) : size(tableSize) {
table.resize(size);
}

// Hash function
int hashFunction(int key) {
return key % size;
}

// Insert a key-value pair into the hash table
void insert(int key, const std::string& value) {
int index = hashFunction(key);
Bucket& bucket = table[index];

// Check if the key already exists in the bucket
for (auto& kvp : bucket) {
if (kvp.first == key) {
// Key found, update the value
kvp.second = value;
return;
}
}

bucket.push_back(std::make_pair(key, value));
}

// Retrieve the value associated with a given key
std::string get(int key) {
int index = hashFunction(key);
Bucket& bucket = table[index];

// Search for the key in the bucket
for (const auto& kvp : bucket) {
if (kvp.first == key) {
// Key found, return the value
return kvp.second;
}
}

}

// Remove a key-value pair from the hash table
void remove(int key) {
int index = hashFunction(key);
Bucket& bucket = table[index];

// Search for the key in the bucket
for (auto iter = bucket.begin(); iter != bucket.end(); ++iter) {
if (iter->first == key) {
// Key found, remove the pair
bucket.erase(iter);
return;
}
}
}
};

// Main function
int main() {
HashTable hashTable(10);

// Insert key-value pairs
hashTable.insert(1, "Value 1");
hashTable.insert(11, "Value 11");
hashTable.insert(21, "Value 21");
hashTable.insert(2, "Value 2");

// Retrieve values
std::cout << hashTable.get(1) << std::endl; // Output: Value 1
std::cout << hashTable.get(11) << std::endl; // Output: Value 11
std::cout << hashTable.get(21) << std::endl; // Output: Value 21
std::cout << hashTable.get(2) << std::endl; // Output: Value 2

// Remove a key-value pair
hashTable.remove(11);

// Try to retrieve the removed value

return 0;
}``````

Using methods for inserting key-value pairs, retrieving items by key, and removing key-value pairs, this code provides an easy hash table class (HashTable). By adding, locating, and deleting values from the hash table, it also shows how to use the class.

#### Output

``````Value 1
Value 11
Value 21
Value 2

## Characteristics of good Hash Function

Good hash functions have several important characteristics that make them suitable for various applications. Here are some properties of a good hash function:

1. Deterministic: A hash function should always produce the same hash value for the same input. This property ensures consistency and allows for easy verification and comparison.
2. Uniform distribution: The hash function should evenly distribute the hash values across the output space. This property helps to minimize collisions and ensures that different inputs have a low probability of producing the same hash value.
3. Fixed output size: A good hash function has a fixed output size, regardless of the input size. This property enables efficient storage and comparison of hash values.
4. Fast computation: Hash functions should be computationally efficient to handle large amounts of data quickly. Speed is crucial for applications that involve hashing large datasets or frequently computing hash values.
5. Avalanche effect: A small change in the input should result in a significant change in the output hash value. This property ensures that even a minor modification in the input leads to a completely different hash value, making it difficult to predict or reverse-engineer the original input.
6. Resistance to collisions: While it is impossible to completely eliminate collisions (different inputs mapping to the same hash value), a good hash function minimizes their occurrence. It should be difficult to find two different inputs that produce the same hash value.
7. Non-invertibility: Given a hash value, it should be computationally infeasible to determine the original input that generated it. This property provides security and helps protect sensitive information.
8. Well-defined behavior: The hash function should have a clear and unambiguous definition for all possible inputs. It should handle edge cases and corner cases gracefully, without producing unexpected results or errors.
9. Security properties: For cryptographic applications, a good hash function should exhibit additional security properties such as resistance to preimage attacks (finding an input given its hash value) and second preimage attacks (finding a different input with the same hash value).

Examples of widely-used hash functions that possess these characteristics include:

• (Secure Hash Algorithm 3)
• BLAKE2
• MD5 (although it is considered less secure due to vulnerabilities)

Note that the choice of a hash function depends on the specific requirements of the application, and different hash functions may be suitable for different use cases.

## Hash Table Applications in Data Structure

1. Dictionaries: Hash tables are commonly used to implement dictionary data structures, where words or terms are associated with their definitions or values. Hashing allows for quick lookup and retrieval of definitions based on the provided key.
2. Caching: Hash tables are employed in caching systems to store frequently accessed data. By using a hash table, the most recently used or important items can be stored, and retrieval can be performed in constant time, improving overall system performance.
3. Symbol tables: In compilers and interpreters, hash tables are used to implement symbol tables. Symbol tables store information about variables, functions, classes, and other symbols used in a program. Hashing enables efficient lookup and manipulation of symbols during compilation or interpretation.
4. Database indexing: Hash tables are used in database systems to implement indexing structures. Hash-based indexing allows for fast data retrieval based on a specific key or attribute, speeding up query processing in databases.
5. Set membership: Hash tables are utilized to implement sets, where membership operations like adding, removing, and checking for the presence of an element are performed efficiently. Hashing allows for constant-time lookup and insertion operations, making sets suitable for tasks such as duplicate removal and membership testing.
6. Caching of computed values: Hash tables are employed to store pre-computed results or expensive computations. By caching the results based on the input parameters, subsequent requests can be satisfied instantly, avoiding redundant calculations.
7. Spell checkers: Hash tables are used in spell-checking algorithms to store a dictionary of valid words. By hashing the words, the algorithm can quickly check if a given word is present in the dictionary, helping identify potential misspellings.
8. Cryptographic data structures: Hash tables are employed in cryptographic systems for tasks like password storage and digital signatures. Hash functions convert sensitive data into fixed-length hashes, and hash tables provide an efficient way to store and retrieve hashed values securely.

## FAQs

### 1. Why do we use a hash table in data structure?

Data structures use hash tables to quickly access values based on each value's distinct hash code. Hash tables are used to store and retrieve data efficiently using a key.

### 2. What are the types of hashing in data structure?

Open addressing (closed hashing) as well as separate chaining (open hashing) are the two primary types of hashing used in data structures.

### 3. What is the advantage of hashing?

The main benefit of hashing is that it is effective for data access because of its constant-time average complexity for insertion, retrieval, and deletion operations.

### 4. What is the formula for hashing?

Applying a hash function to a key results in a distinct index or hash code that links the key to a particular location in the hash table. This is the formula for hashing.

### 5. What is the concept of a hash table?

An effective way to retrieve and manipulate data based on its keys is to store key-value pairs in a hash table, which uses a hash function to determine the index where each pair should be kept.

##### Summary

In conclusion, the hash table in the data structure is a powerful and versatile algorithm that can be used to store and easily access data. It is a comprehensive search mechanism that allows for the efficient retrieval of values without wasting computing resources. By utilizing hash tables, organizations can quickly index large datasets and accomplish their tasks more efficiently. Additionally, it helps in categorizing the elements into appropriate buckets dependent on some calculated value before performing any search.

Similar Articles 