Hash Table in Data Structures

Hash Table in Data Structures

04 Mar 2024
Advanced
1.55K Views
25 min read
Learn via Video Course & by Doing Hands-on Labs

Data Structures & Algorithms Free Course

Hash Table in Data Structures: An Overview

In the previous tutorial, we saw what is hashing and how it works. We saw that a hash table is a data structure that stores data in an array format. The table maps keys to values using a hash function. In this DSA tutorial, we'll explore the hash table in a little detail like its working, implementation, types, etc.

To further enhance your understanding and application of hashing concepts, consider enrolling in the Dsa Course Online, to gain comprehensive insights into effective data structure utilization for improved problem-solving and time management.

What is a Hash Table in Data Structures?

A Hash table is a data structure used to insert, look up, and remove key-value pairs quickly. It implements an associative array. Here, each key is translated by a hash function into a distinct index in an array. The index functions as a storage location for the matching value.

How Does the Hash Table Work?

  1. Initialize the Hash Table: Start with an empty array of a fixed size (let's say 10). Each index in the array represents a bucket in the hash table.
  2. [empty, empty, empty, empty, empty, empty, empty, empty, empty, empty]
  3. Insertion: When inserting a key-value pair into the hash table, the hash function is used to calculate the index where the pair should be stored. The hash function takes the key as input and returns an integer value.

    Let's say we want to insert the key "apple" with the value "fruit" into the hash table. The hash function calculates the index as follows: hash("apple") = 5 Since the hash value is 5, we store the key-value pair in index 5 of the array: [empty, empty, empty, empty, empty, ("apple", "fruit"), empty, empty, empty, empty]

  4. Retrieval: To retrieve a value associated with a given key, we use the same hash function to calculate the index. We then look for the key-value pair in that index.

    For example, if we want to retrieve the value for the key "apple," the hash function calculates the index as 5. We check index 5 in the array and find the key-value pair ("apple", "fruit"): [empty, empty, empty, empty, empty, ("apple", "fruit"), empty, empty, empty, empty]

  5. Collision: Hash tables can encounter collisions when two different keys map to the same index. This can happen if the hash function is not perfectly distributed or if the array size is relatively small compared to the number of keys.

    For example, if we try to insert the key "banana" and the hash function calculates the index as 5, we encounter a collision. We search for the next available slot and find index 6 is empty. So, we store the key-value pair ("banana", "fruit") in index 6: [empty, empty, empty, empty, empty, ("apple", "fruit"), ("banana", "fruit"), empty, empty, empty] [empty, ("apple", "fruit"), ("banana", "fruit"), empty, empty, empty]

Read More - Data Structure Interview Questions and Answers

Hash Table Data Structure Algorithm

  1. Initialize an array of fixed size to serve as the underlying storage for the hash table. The size of the array depends on the expected number of key-value pairs to be stored.
  2. Define a hash function that takes a key as input and generates a unique hash code.
  3. To insert a key-value pair into the hash table, apply the hash function to the key to determine the index. If the calculated index is already occupied, handle collisions using a collision resolution strategy.
  4. To retrieve the value associated with a key, apply the hash function to the key to determine the index. If the index contains a value, compare the stored key with the given key to ensure a match. If the keys match, return the corresponding value. If the index is empty or the keys don't match, the key is not present in the hash table.
  5. To remove a key-value pair from the hash table, apply the hash function to the key to determine the index. If the index contains a value, compare the stored key with the given key to ensure a match. If the keys match, remove the key-value pair from the table. If the index is empty or the keys don't match, the key is not present in the hash table

Implementation of Hash Table in Different Programming Languages


  def get(self, key):
    index = self.hashFunction(key)
    bucket = self.table[index]

    for kvp in bucket:
      if kvp[0] == key:
        return kvp[1]

    return "Key not found"

  def remove(self, key):
    index = self.hashFunction(key)
    bucket = self.table[index]

    for kvp in bucket:
      if kvp[0] == key:
        bucket.remove(kvp)
        return

# Main function
if __name__ == "__main__":
  hashTable = HashTable(10)

  # Insert key-value pairs
  hashTable.insert(1, "Value 1")
  hashTable.insert(11, "Value 11")
  hashTable.insert(21, "Value 21")
  hashTable.insert(2, "Value 2")

  # Retrieve values
  print(hashTable.get(1)) # Output: Value 1
  print(hashTable.get(11)) # Output: Value 11
  print(hashTable.get(21)) # Output: Value 21
  print(hashTable.get(2)) # Output: Value 2
  print(hashTable.get(5)) # Output: Key not found

  # Remove a key-value pair
  hashTable.remove(11)

  # Try to retrieve the removed value
  print(hashTable.get(11)) # Output: Key not found

 import java.util.ArrayList;
 import java.util.LinkedList;
 import java.util.List;
 
 // Hash Table class
 class HashTable {
   // Bucket is a list of key-value pairs
   private class KeyValuePair {
     int key;
     String value;
 
     KeyValuePair(int key, String value) {
       this.key = key;
       this.value = value;
     }
   }
 
   private List<LinkedList<KeyValuePair>> table;
   private int size;
 
   // Constructor
   HashTable(int tableSize) {
     size = tableSize;
     table = new ArrayList<>(size);
     for (int i = 0; i < size; i++) {
       table.add(new LinkedList<>());
     }
   }
 
   // Hash function
   private int hashFunction(int key) {
     return key % size;
   }
 
   // Insert a key-value pair into the hash table
   void insert(int key, String value) {
     int index = hashFunction(key);
     LinkedList<KeyValuePair> bucket = table.get(index);
 
     // Check if the key already exists in the bucket
     for (KeyValuePair kvp : bucket) {
       if (kvp.key == key) {
         // Key found, update the value
         kvp.value = value;
         return;
       }
     }
 
     // Key not found, add a new key-value pair
     bucket.add(new KeyValuePair(key, value));
   }
 
   // Retrieve the value associated with a given key
   String get(int key) {
     int index = hashFunction(key);
     LinkedList<KeyValuePair> bucket = table.get(index);
 
     // Search for the key in the bucket
     for (KeyValuePair kvp : bucket) {
       if (kvp.key == key) {
         // Key found, return the value
         return kvp.value;
       }
     }
 
     // Key not found
     return "Key not found";
   }
 
   // Remove a key-value pair from the hash table
   void remove(int key) {
     int index = hashFunction(key);
     LinkedList<KeyValuePair> bucket = table.get(index);
 
     // Search for the key in the bucket
     for (KeyValuePair kvp : bucket) {
       if (kvp.key == key) {
         // Key found, remove the pair
         bucket.remove(kvp);
         return;
       }
     }
   }
 }
 
 // Main function
 class Main {
   public static void main(String[] args) {
     HashTable hashTable = new HashTable(10);
 
     // Insert key-value pairs
     hashTable.insert(1, "Value 1");
     hashTable.insert(11, "Value 11");
     hashTable.insert(21, "Value 21");
     hashTable.insert(2, "Value 2");
 
     // Retrieve values
     System.out.println(hashTable.get(1)); // Output: Value 1
     System.out.println(hashTable.get(11)); // Output: Value 11
     System.out.println(hashTable.get(21)); // Output: Value 21
     System.out.println(hashTable.get(2)); // Output: Value 2
     System.out.println(hashTable.get(5)); // Output: Key not found
 
     // Remove a key-value pair
     hashTable.remove(11);
 
     // Try to retrieve the removed value
     System.out.println(hashTable.get(11)); // Output: Key not found
   }
 }

 #include <iostream>
 #include <vector>
 #include <list>
 
 // Hash Table class
 class HashTable {
 private:
   // Bucket is a list of key-value pairs
   typedef std::pair<int, std::string> KeyValuePair;
   typedef std::list<KeyValuePair> Bucket;
   std::vector<Bucket> table;
   int size;
 
 public:
   // Constructor
   HashTable(int tableSize) : size(tableSize) {
     table.resize(size);
   }
 
   // Hash function
   int hashFunction(int key) {
     return key % size;
   }
 
   // Insert a key-value pair into the hash table
   void insert(int key, const std::string& value) {
     int index = hashFunction(key);
     Bucket& bucket = table[index];
 
     // Check if the key already exists in the bucket
     for (auto& kvp : bucket) {
       if (kvp.first == key) {
         // Key found, update the value
         kvp.second = value;
         return;
       }
     }
 
     // Key not found, add a new key-value pair
     bucket.push_back(std::make_pair(key, value));
   }
 
   // Retrieve the value associated with a given key
   std::string get(int key) {
     int index = hashFunction(key);
     Bucket& bucket = table[index];
 
     // Search for the key in the bucket
     for (const auto& kvp : bucket) {
       if (kvp.first == key) {
         // Key found, return the value
         return kvp.second;
       }
     }
 
     // Key not found
     return "Key not found";
   }
 
   // Remove a key-value pair from the hash table
   void remove(int key) {
     int index = hashFunction(key);
     Bucket& bucket = table[index];
 
     // Search for the key in the bucket
     for (auto iter = bucket.begin(); iter != bucket.end(); ++iter) {
       if (iter->first == key) {
         // Key found, remove the pair
         bucket.erase(iter);
         return;
       }
     }
   }
 };
 
 // Main function
 int main() {
   HashTable hashTable(10);
 
   // Insert key-value pairs
   hashTable.insert(1, "Value 1");
   hashTable.insert(11, "Value 11");
   hashTable.insert(21, "Value 21");
   hashTable.insert(2, "Value 2");
 
   // Retrieve values
   std::cout << hashTable.get(1) << std::endl; // Output: Value 1
   std::cout << hashTable.get(11) << std::endl; // Output: Value 11
   std::cout << hashTable.get(21) << std::endl; // Output: Value 21
   std::cout << hashTable.get(2) << std::endl; // Output: Value 2
   std::cout << hashTable.get(5) << std::endl; // Output: Key not found
 
   // Remove a key-value pair
   hashTable.remove(11);
 
   // Try to retrieve the removed value
   std::cout << hashTable.get(11) << std::endl; // Output: Key not found
 
   return 0;
 }

Output

Value 1
Value 11
Value 21
Value 2
Key not found
Key not found

Characteristics of a Good Hash Function

  1. Deterministic: A hash function should always produce the same hash value for the same input. This property ensures consistency and allows for easy verification and comparison.
  2. Uniform distribution: The hash function should evenly distribute the hash values across the output space. This property helps to minimize collisions and ensures that different inputs have a low probability of producing the same hash value.
  3. Fixed output size: A good hash function has a fixed output size, regardless of the input size. This property enables efficient storage and comparison of hash values.
  4. Fast computation: Hash functions should be computationally efficient to handle large amounts of data quickly. Speed is crucial for applications that involve hashing large datasets or frequently computing hash values.
  5. Avalanche effect: A small change in the input should result in a significant change in the output hash value. This property ensures that even a minor modification in the input leads to a completely different hash value, making it difficult to predict or reverse-engineer the original input.
  6. Resistance to collisions: While it is impossible to eliminate collisions (different inputs mapping to the same hash value), a good hash function minimizes their occurrence. It should be difficult to find two different inputs that produce the same hash value.
  7. Non-invertibility: Given a hash value, it should be computationally infeasible to determine the original input that generated it. This property provides security and helps protect sensitive information.
  8. Well-defined behavior: The hash function should have an unambiguous definition for all possible inputs. It should handle edge cases and corner cases gracefully, without producing unexpected results or errors.
  9. Security properties: For cryptographic applications, a good hash function should exhibit additional security properties such as resistance to preimage attacks (finding an input given its hash value) and second preimage attacks (finding a different input with the same hash value).

Examples of widely-used hash functions that possess these characteristics include:

  • (Secure Hash Algorithm 3)
  • BLAKE2
  • MD5 (although it is considered less secure due to vulnerabilities)

Note that the choice of a hash function depends on the specific requirements of the application, and different hash functions may be suitable for different use cases.

Hash Collision

  • A hash collision refers to a situation where two different inputs produce the same hash value or hash code when processed by a hash function. In other words, it occurs when two distinct pieces of data result in an identical hash output.
  • Hash collisions can have practical implications depending on the context, e.g. in hash tables, collisions can lead to degraded performance if not handled efficiently.
  • Cryptographic hash functions, used in areas like data security and digital signatures, have stricter requirements to prevent collisions. These functions aim to provide a high level of collision resistance, making it computationally infeasible to find two inputs that produce the same hash value.

    Cryptographic hash collisions are of particular concern because they can undermine the integrity and security of cryptographic systems. Researchers and cryptographers continually evaluate and analyze hash functions to ensure their collision resistance and security properties, as vulnerabilities or weaknesses discovered in hash functions could have significant implications for various applications that rely on them.

Types of Hash Collision

There are primarily two types of hash collisions: accidental or random collisions and intentional or malicious collisions. Let's explore each type in more detail:
  1. Accidental Collisions: Accidental collisions occur when two different inputs produce the same hash value due to the nature of hashing algorithms and the limited range of possible hash values. These collisions are unintended and usually considered rare and coincidental. The probability of accidental collisions depends on the quality of the hashing algorithm and the size of the hash space. Good hashing algorithms strive to minimize the chances of accidental collisions by distributing hash values uniformly across the output space.
  2. Intentional Collisions: Intentional collisions occur when an attacker purposefully generates two or more different inputs that produce the same hash value. These collisions are often the result of exploiting vulnerabilities or weaknesses in the hashing algorithm. Intentional collisions can be a significant security concern in various scenarios, such as digital signatures, certificate authorities, password hashing, and data integrity checks.

    There are two main categories of intentional collisions:

    1. Preimage Attacks: In a preimage attack, an attacker tries to find any input that matches a given hash value. This type of attack is aimed at breaking the one-way property of a hash function, which means that it is computationally infeasible to determine the original input from its hash value.
    2. Collision Attacks: In a collision attack, an attacker aims to find two different inputs that produce the same hash value. The goal is to break the collision resistance property of a hash function, which ensures that it is computationally difficult to find any two inputs with the same hash value

Hash Table Applications

  1. Dictionaries: Hash tables are commonly used to implement dictionary data structures, where words or terms are associated with their definitions or values. Hashing allows for quick lookup and retrieval of definitions based on the provided key.
  2. Caching: Hash tables are employed in caching systems to store frequently accessed data. By using a hash table, the most recently used or important items can be stored, and retrieval can be performed in constant time, improving overall system performance.
  3. Symbol tables: In compilers and interpreters, hash tables are used to implement symbol tables. Symbol tables store information about variables, functions, classes, and other symbols used in a program. Hashing enables efficient lookup and manipulation of symbols during compilation or interpretation.
  4. Database indexing: Hash tables are used in database systems to implement indexing structures. Hash-based indexing allows for fast data retrieval based on a specific key or attribute, speeding up query processing in databases.
  5. Set membership: Hash tables are utilized to implement sets, where membership operations like adding, removing, and checking for the presence of an element are performed efficiently. Hashing allows for constant-time lookup and insertion operations, making sets suitable for tasks such as duplicate removal and membership testing.
  6. Caching of computed values: Hash tables are employed to store pre-computed results or expensive computations. By caching the results based on the input parameters, subsequent requests can be satisfied instantly, avoiding redundant calculations.
  7. Spell checkers: Hash tables are used in spell-checking algorithms to store a dictionary of valid words. By hashing the words, the algorithm can quickly check if a given word is present in the dictionary, helping identify potential misspellings.
  8. Cryptographic data structures: Hash tables are employed in cryptographic systems for tasks like password storage and digital signatures. Hash functions convert sensitive data into fixed-length hashes, and hash tables provide an efficient way to store and retrieve hashed values securely.
Summary

The hash table in data structures is a powerful and versatile algorithm that can be used to store and easily access data. It is a comprehensive search mechanism that allows for the efficient retrieval of values without wasting computing resources. By utilizing hash tables, organizations can quickly index large datasets and accomplish their tasks more efficiently. Additionally, it helps in categorizing the elements into appropriate buckets dependent on some calculated value before performing any search.

FAQs

Q1. What is a Hash Table?

A Hash table is a data structure used to insert, look up, and remove key-value pairs quickly.

Q2. What is meant by Hash Collision?

A hash collision refers to a situation where two different inputs produce the same hash value or hash code when processed by a hash function.

Q3. What are the Types of Hash Collision?

There are primarily two types of hash collisions: accidental or random collisions and intentional or malicious collisions.
Share Article
Batches Schedule
About Author
Amit Kumar Ghosh (SDE and Mentor)

A passionate professional with over 6 years of experience in development and training. He is passionate about learning new technologies and sharing his experience with professionals. He is an expert in C/C++, Java, Python and DSA.
Self-paced Membership
  • 22+ Video Courses
  • 750+ Hands-On Labs
  • 300+ Quick Notes
  • 55+ Skill Tests
  • 45+ Interview Q&A Courses
  • 10+ Real-world Projects
  • Career Coaching Sessions
  • Email Support
Upto 60% OFF
Know More
Accept cookies & close this