Separate chaining for Collision Handling

What is a collision?

A hash function maps a key, which may be a large integer or a string, to a small number, so two different keys can end up with the same value. When a newly inserted key maps to a hash table slot that is already occupied, we have a collision, and it must be handled with a collision handling technique.

How likely are collisions with a large table?

Even with a large table to store the keys in, collisions are still very likely. The Birthday Paradox is an important observation here: with only 23 people, there is roughly a 50% chance that two of them share a birthday, even though there are 365 possible days.
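
The birthday figure can be checked with a few lines of arithmetic. The sketch below treats the 365 days as table slots and assumes every key is equally likely to land in any slot. The function name collision_probability is made up for this illustration.

```python
def collision_probability(num_keys, num_slots):
    """Probability that at least two of num_keys keys share a slot,
    assuming each key lands in a uniformly random slot."""
    p_all_distinct = 1.0
    for i in range(num_keys):
        # The (i + 1)-th key must avoid the i slots already taken.
        p_all_distinct *= (num_slots - i) / num_slots
    return 1.0 - p_all_distinct


# 23 people, 365 possible birthdays: just over one half.
print(round(collision_probability(23, 365), 3))  # 0.507
```

The same function shows why a big table does not help much: the probability depends on how the number of keys compares with the square root of the table size, not with the table size itself.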

Dealing with collisions

There are primarily two ways to deal with collisions:

  • Separate Chaining
  • Open Addressing

Only separate chaining is covered in this article. Open addressing will be covered in the next post.

Separate Chaining:

With separate chaining, each slot of the hash table array points to a chain, which is a linked list of the entries that hash to that slot. Separate chaining is one of the most popular and commonly used techniques for handling collisions.

This method is implemented using the linked list data structure. When several elements hash to the same slot index, they are inserted into a singly-linked list known as a chain, so each chain holds every entry that hashes to its slot. To search for a key K, we traverse the chain for K's slot linearly. If the key stored in an entry equals K, we have found our entry. If we reach the end of the chain without a match, the entry does not exist. In short, when two different entries have the same hash value, separate chaining stores them one after the other in the same linked list.
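
The insert, search, and delete steps described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class names Node and ChainedHashTable, the default of 7 slots, and the use of Python's built-in hash() are all assumptions made for the example.

```python
class Node:
    """One entry in a chain: a key, its value, and a link to the next entry."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None


class ChainedHashTable:
    def __init__(self, num_slots=7):
        # Each slot holds the head of a singly-linked chain (or None).
        self.slots = [None] * num_slots

    def _index(self, key):
        return hash(key) % len(self.slots)

    def insert(self, key, value):
        i = self._index(key)
        if self.slots[i] is None:
            self.slots[i] = Node(key, value)
            return
        node = self.slots[i]
        while True:
            if node.key == key:        # key already present: update it
                node.value = value
                return
            if node.next is None:      # reached the tail of the chain
                break
            node = node.next
        node.next = Node(key, value)   # append the new entry at the end

    def search(self, key):
        node = self.slots[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None  # end of chain reached: the key is absent

    def delete(self, key):
        i = self._index(key)
        prev, node = None, self.slots[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.slots[i] = node.next  # unlink the head
                else:
                    prev.next = node.next      # unlink a later node
                return True
            prev, node = node, node.next
        return False


table = ChainedHashTable()
table.insert(50, "a")
table.insert(85, "b")    # 50 and 85 both map to slot 1 when there are 7 slots
print(table.search(85))  # b
```

Appending at the tail matches the "one after the other" description. Inserting at the head of the chain would also work and avoids walking the chain when duplicate keys do not need to be detected.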

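Let's use "key mod 7" as our simple hash function with the following sequence of keys: 50, 700, 76, 85, 92, 73, 101. Keys 50, 85, and 92 all hash to slot 1, and keys 73 and 101 both hash to slot 3, so each of those slots holds a chain with more than one entry.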
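
The resulting table can be reproduced with a short Python sketch. For brevity, a Python list stands in for each linked list here; the variable names are chosen for this illustration only.

```python
keys = [50, 700, 76, 85, 92, 73, 101]
NUM_SLOTS = 7

# One chain per slot; keys are appended in arrival order.
table = [[] for _ in range(NUM_SLOTS)]
for key in keys:
    table[key % NUM_SLOTS].append(key)

for slot, chain in enumerate(table):
    print(slot, chain)
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]
```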


Advantages:

  • Easy to implement.
  • The hash table never fills up: we can always add more elements to a chain.
  • Less sensitive to the hash function and the load factor.
  • Typically used when it is not known in advance how many keys will be inserted or deleted, or how often.

Disadvantages:

  • Cache performance of chaining is poor because keys are stored in linked lists. Open addressing is more cache friendly because everything is stored in the same table.
  • Wasted space: some slots of the hash table are never used.
  • If a chain becomes long, search time can become O(n) in the worst case.
  • Extra space is needed for the links.

Performance of Chaining:

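The performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of the table (simple uniform hashing). With m slots and n keys, the load factor is α = n/m, which is also the expected chain length, so the expected time to search, insert, or delete is O(1 + α).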

Data Structures For Storing Chains:

  • Linked lists
    • Search: O(l) where l = length of linked list
    • Delete: O(l)
    • Insert: O(l)
    • Not cache friendly
  • Dynamic Sized Arrays (vector in C++, ArrayList in Java, list in Python)
    • Search: O(l) where l = length of array
    • Delete: O(l)
    • Insert: O(l)
    • Cache friendly
  • Self-Balancing BST (AVL Trees, Red-Black Trees)
    • Search: O(log(l))
    • Delete: O(log(l))
    • Insert: O(log(l))
    • Not cache friendly
    • Java 8 onwards uses this for HashMap when a chain grows long

Summary

  • To handle collisions, the separate chaining technique combines a hash table with linked lists.
  • It uses extra memory, outside the table itself, to store the chains.
  • Search and deletion in the hash table take O(n) time in the worst case, where n is the number of keys that hash to the same slot.
  • The load factor is used to decide when to grow the hash table and rehash the existing keys.
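
The role of the load factor in growing the table can be sketched as follows. This is only an illustration under stated assumptions: the class name ResizingChainedTable, the 0.75 threshold, the doubling policy, and the use of Python lists as chains are choices made for the example, not something prescribed by the article.

```python
class ResizingChainedTable:
    MAX_LOAD_FACTOR = 0.75  # assumed threshold, chosen for illustration

    def __init__(self, num_slots=4):
        self.buckets = [[] for _ in range(num_slots)]
        self.size = 0  # number of keys currently stored

    def load_factor(self):
        return self.size / len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:               # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        # Grow once the chains are, on average, getting too long.
        if self.load_factor() > self.MAX_LOAD_FACTOR:
            self._rehash(2 * len(self.buckets))

    def _rehash(self, new_num_slots):
        # Every key must be placed again, because its slot index
        # depends on the number of slots.
        old_entries = [entry for bucket in self.buckets for entry in bucket]
        self.buckets = [[] for _ in range(new_num_slots)]
        for k, v in old_entries:
            self.buckets[hash(k) % new_num_slots].append((k, v))

    def search(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None


table = ResizingChainedTable()
for key in range(10):
    table.insert(key, key * key)
print(len(table.buckets), table.load_factor())  # 16 0.625
```

Starting from 4 slots, the table doubles after the 4th and 7th insertions, so ten keys end up spread over 16 slots.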