Open and Closed Hashing in Java

In this article, we will learn about Open Hashing and Closed Hashing in the Java programming language. We will cover why these techniques are used, what their advantages and disadvantages are, and how Open Hashing and Closed Hashing differ.

Collision Handling Techniques:

If two different keys produce the same hash value in a hash table, it is known as a collision in hashing. A hash function maps a key, which may be a big integer or a string, to a small number, so it is quite possible for two keys to result in the same value. A collision is therefore the situation where a newly inserted key maps to a position in the hash table that is already occupied. These collisions are handled using techniques known as collision handling techniques, also called Collision Resolution techniques.
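As a small illustration of a collision, the sketch below reduces integer keys to a slot index with a modulo operation. The class name CollisionDemo, the helper indexFor() and the table size of 10 are assumptions made for this example only.

```java
class CollisionDemo {
    // Hypothetical hash function: reduce an integer key to a slot index.
    static int indexFor(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 10;
        System.out.println(indexFor(12, tableSize)); // prints 2
        System.out.println(indexFor(22, tableSize)); // prints 2, collides with key 12
        System.out.println(indexFor(35, tableSize)); // prints 5, no collision
    }
}
```

Keys 12 and 22 are different, yet both map to slot 2, so the second one needs a collision handling technique to find a place.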

The techniques that are used to handle or resolve the collisions are basically classified into two types. They are:

  • Open Hashing ( or ) Separate Chaining
  • Closed Hashing ( or ) Open Addressing

Open Hashing:

The first collision resolution technique, "Open Hashing", is popularly known as Separate Chaining. In this technique, each slot of the hash table array holds a linked list known as a chain. It is one of the techniques most used by programmers to handle collisions. When several elements hash to the index of a single slot, they are all inserted into that slot's singly linked list. This singly linked list is what we refer to as a chain in the Open Hashing technique.

To search for a key "K", we traverse the chain linearly. If K equals the key stored in an entry of the singly linked list, we have found our entry. If we reach the end of the list without a match, the entry we are trying to find does not exist.

So, when two keys compete for the same key position, that position is allotted to both keys or records. After the first key is placed, the position is extended with a linked list to hold the second. In the end, the hash table contains a chain wherever a collision has happened. That is the main reason for calling this the "Chaining technique".

Advantages of Open Hashing:

  1. The Separate chaining method is simple to implement and understand.
  2. The hash table never fills up, so we can always add new elements to a chain.
  3. Open Hashing is less sensitive to the load factors or hash function.
  4. It can be implemented when we do not know how frequently keys will be inserted or deleted.

Disadvantages of Open Hashing:

  1. The cache performance of the Separate Chaining method is poor as the keys are stored using a singly linked list.
  2. A lot of storage space is wasted as some parts of the hash table are never used.
  3. In the worst case, the search time can become " O ( n ) ". This happens only if the chain becomes too lengthy.
  4. Extra storage is used for storing links of the chain.

Data Structures used in Open Hashing:

  1. ArrayList
    • Search : O ( l ), where l is the number of elements stored in the bucket's array.
    • Delete : O ( l )
    • Insert : O ( l )
    • It is Cache friendly
  2. Linked List
    • Search : O ( l ), where l is the length of the Linked List.
    • Delete : O ( l )
    • Insert : O ( l )
    • It is not Cache friendly.
  3. Self-Balancing Binary Search Tree
    • Search : O ( log ( l )), where l is the number of elements stored in the bucket's tree.
    • Delete : O ( log ( l ))
    • Insert : O ( log ( l ))
    • It is not Cache friendly.
    • It is used by HashMap since Java 8 for buckets whose chains grow long.

A Program to implement Open Hashing:
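
A minimal sketch of such a program is given here, assuming a generic map built on an ArrayList of singly linked chains. The class names ChainNode, ChainedMap and ChainedMapDemo and the keys used in main() are assumptions made for illustration; the calls are chosen so that the run prints the output listed next.

```java
import java.util.ArrayList;
import java.util.Objects;

// One entry in a bucket's singly linked chain.
class ChainNode<K, V> {
    final K key;
    V value;
    ChainNode<K, V> next;
    ChainNode(K key, V value) { this.key = key; this.value = value; }
}

class ChainedMap<K, V> {
    private ArrayList<ChainNode<K, V>> buckets = new ArrayList<>();
    private int numBuckets = 10; // assumed initial number of buckets
    private int size = 0;        // number of key-value pairs stored

    ChainedMap() {
        for (int i = 0; i < numBuckets; i++) buckets.add(null);
    }

    int getSize() { return size; }
    boolean isEmpty() { return size == 0; }

    // Compress the JVM hash code into a bucket index.
    private int bucketIndex(K key) {
        return Math.floorMod(Objects.hashCode(key), numBuckets);
    }

    // Walk the chain; return the value, or null if the key is absent.
    V get(K key) {
        for (ChainNode<K, V> n = buckets.get(bucketIndex(key)); n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    // Unlink the node holding the key and return its value, or null.
    V remove(K key) {
        int index = bucketIndex(key);
        ChainNode<K, V> prev = null;
        for (ChainNode<K, V> n = buckets.get(index); n != null; prev = n, n = n.next) {
            if (Objects.equals(n.key, key)) {
                if (prev == null) buckets.set(index, n.next); // key at the head
                else prev.next = n.next;                      // key elsewhere
                size--;
                return n.value;
            }
        }
        return null;
    }

    // Update the value if the key exists, otherwise insert at the head.
    void add(K key, V value) {
        int index = bucketIndex(key);
        for (ChainNode<K, V> n = buckets.get(index); n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { n.value = value; return; }
        }
        ChainNode<K, V> node = new ChainNode<>(key, value);
        node.next = buckets.get(index);
        buckets.set(index, node);
        size++;
        if ((double) size / numBuckets > 0.7) resize();
    }

    // Double the bucket count and re-add every entry to get new indices.
    private void resize() {
        ArrayList<ChainNode<K, V>> old = buckets;
        numBuckets *= 2;
        buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) buckets.add(null);
        size = 0;
        for (ChainNode<K, V> head : old) {
            for (ChainNode<K, V> n = head; n != null; n = n.next) add(n.key, n.value);
        }
    }
}

class ChainedMapDemo {
    public static void main(String[] args) {
        ChainedMap<String, Integer> map = new ChainedMap<>();
        map.add("this", 1);
        map.add("coder", 2);
        map.add("this", 4); // same key again: the value is updated
        map.add("hi", 5);
        System.out.println(map.getSize());
        System.out.println(map.remove("this"));
        System.out.println(map.remove("this"));
        System.out.println(map.getSize());
        System.out.println(map.isEmpty());
    }
}
```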

Output:

3
4
null
2
false

The program relies on a handful of functions that together implement Separate Chaining. Let us see what these functions are and how they are useful in the process of Open Hashing.

Functions Used in Open Hashing:

  1. get(K key): If the key is present in the hash table (HT), get(K key) returns the value corresponding to the key.
  2. getSize(): It returns the number of key-value pairs stored in the hash table.
  3. add(): Adds a new key-value pair to the HT. If the key already exists, its value is updated instead.
  4. remove(): The remove function removes both the key and its value.
  5. isEmpty(): It returns the value "true" if the size is zero.

Now, we will look at the detailed implementation of these functions to get a clearer picture of the program.

Implementation of the Functions:

  1. get(): The function "get()" simply accepts a key as input and, if the key is available in the table, returns the matching value; otherwise, it returns null. The steps in the implementation of the get() function are:
    • Use the input key to compute the index in the HT.
    • Traverse the linked list stored at that index. If you traverse the whole list without returning, the key is not in the table and cannot be fetched, so return null.
  2. remove():
    • Use the helper function to retrieve the index that corresponds to the input key.
    • Traverse the linked list as get() does. This method must remove the key in addition to finding it, which creates two situations:
      • The key that has to be deleted is at the head of the linked list.
      • The key that has to be removed is not at the head but somewhere else.
  3. add(): The function "add()" is the most interesting and difficult part of the entire implementation, because when the load factor exceeds the figure we set, we must dynamically expand the size of our list.
    • The steps up to and including the traversal are the same as in get() and remove(), and the same two scenarios (insertion at the head or at a non-head position) apply.
    • If the load factor is more than 0.7 at the end, we double the size of the array list and then call add() again for every already-existing key. This is necessary because the array's size is used to compress the internal JVM hash code into an index, so the keys that are already present must obtain new indices.

Note: The load factor is the number of stored entries divided by the number of slots in the table. If "n" is the total number of slots, let's say 10, and 7 entries are currently stored, then the load factor is 7/10, i.e. 0.7.

So, this is the Open Hashing technique used in resolving a collision in a hash table. Now, let us see and understand the Closed Hashing technique.

Closed Hashing:

The second collision resolution technique, Closed Hashing, is another way of dealing with collisions and an alternative to Separate Chaining. It is also known as Open Addressing. In Open Addressing, all elements are stored in the hash table itself. The size of the table must therefore be greater than or equal to the total number of keys at all times (we can also increase the size of the table, by copying the existing data into a bigger one, whenever it is needed). Because no element is stored outside the table, this mechanism is referred to as Closed Hashing. The whole process of finding a slot is carried out by probing.

Functions Used in Closed Hashing:

  1. Insert( k ): Keep probing until an empty slot is found. Place the key "k" in the first empty slot you find.
  2. Search( k ): Probe each slot until a slot whose key equals k is found or until an empty slot is reached.
  3. Delete( k ): Deletion is the interesting operation. If we simply empty the slot of a removed key, a later search may fail, because the empty slot stops the probing before it reaches keys placed beyond it. The slots of deleted keys are therefore specially marked as "deleted". Insert may reuse such a slot, but search does not stop at it.

Several techniques to perform Implementation of Closed Hashing:

1. Linear Probing: In linear probing, the hash table is examined sequentially, starting from the slot given by the hash function. If the slot obtained from the calculation is already occupied, we look at the next one. The function used for rehashing is "rehash(key) = (n + 1) % table-size", where n is the slot that was just found occupied. The gap between two consecutive probes is generally 1.

Let "hash(a)" be the slot index computed by the hash function for a key a. If that slot is occupied, we try (hash(a) + 1) % table-size, then (hash(a) + 2) % table-size, and so on. Of the probing techniques discussed here, linear probing has the best cache performance, because consecutive probes touch neighbouring slots.
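
The Insert, Search and Delete operations can be sketched with linear probing as follows. This is a minimal sketch under stated assumptions: it stores int keys only, never grows the table, and uses key % table-size as the hash function. The class name LinearProbingSet and the EMPTY, OCCUPIED and DELETED markers are names chosen for this example.

```java
class LinearProbingSet {
    private static final int EMPTY = 0, OCCUPIED = 1, DELETED = 2;
    private final int[] keys;
    private final int[] state; // all slots start as EMPTY (0)

    LinearProbingSet(int capacity) {
        keys = new int[capacity];
        state = new int[capacity];
    }

    // Assumed hash function: the slot where probing for a key starts.
    private int home(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Returns the slot holding the key, or -1 if it is absent.
    int search(int key) {
        int n = home(key);
        for (int i = 0; i < keys.length; i++) {
            if (state[n] == EMPTY) return -1;          // a truly empty slot ends the search
            if (state[n] == OCCUPIED && keys[n] == key) return n;
            n = (n + 1) % keys.length;                 // skip other keys and "deleted" slots
        }
        return -1;
    }

    // Places the key in the first free slot; returns false if the table is full.
    boolean insert(int key) {
        if (search(key) != -1) return true;            // already present
        int n = home(key);
        for (int i = 0; i < keys.length; i++) {
            if (state[n] != OCCUPIED) {                // EMPTY or DELETED can be reused
                keys[n] = key;
                state[n] = OCCUPIED;
                return true;
            }
            n = (n + 1) % keys.length;
        }
        return false;
    }

    // Marks the slot as "deleted" instead of emptying it.
    boolean delete(int key) {
        int n = search(key);
        if (n == -1) return false;
        state[n] = DELETED;
        return true;
    }
}
```

With a table of 7 slots, the keys 10, 17 and 24 all start at slot 3 and end up in slots 3, 4 and 5. After delete(17), search(24) still returns 5 because slot 4 is marked "deleted" and not empty, and a later insert(31) reuses slot 4.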

Difficulties faced with Linear Probing:

  • Primary Clustering: Primary clustering is one of the major issues with the linear probing technique. Many consecutive elements form clusters, so it takes longer to find an empty slot or to search for an element.
  • Secondary Clustering: Secondary clustering is not as severe as primary clustering. Two records share a collision chain, also known as a probe sequence, only if they begin at the same location.
  • Clustering is the main problem with linear probing. If clustering can be reduced within this mechanism, it can be considered one of the best collision resolution techniques.

2. Quadratic Probing: In Quadratic Probing, the interval between successive probe positions grows, whereas in linear probing it is always 1. This largely removes the primary clustering seen in the technique above. When the current iteration is "i", the slot at offset i^2 from the key's hashed position, i.e. (hash(x) + i^2) % table-size, is considered as the key position for that key. Other slots are checked only when the key position we are trying is already occupied. (The name mid-square method is sometimes attached to this technique, but it properly refers to a way of computing hash values, not of resolving collisions.) Quadratic probing is an efficient method for a closed hash table. It has average cache performance and only a mild clustering problem.

Difficulties faced with Quadratic Probing:

It suffers from secondary clustering: two keys have the same probe sequence whenever they hash to the same key position.
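
The quadratic probe positions, and the secondary clustering they lead to, can be listed with a short sketch. It assumes key % table-size as the hash function and a table size of 11; the class name QuadraticProbeDemo and the helper probeSequence() are made up for this example.

```java
import java.util.Arrays;

class QuadraticProbeDemo {
    // Slot probed in iteration i: (hash(key) + i * i) % tableSize.
    static int[] probeSequence(int key, int tableSize, int attempts) {
        int home = Math.floorMod(key, tableSize); // assumed hash function
        int[] sequence = new int[attempts];
        for (int i = 0; i < attempts; i++) {
            sequence[i] = (home + i * i) % tableSize;
        }
        return sequence;
    }

    public static void main(String[] args) {
        // Keys 14 and 25 both hash to slot 3 in a table of size 11.
        System.out.println(Arrays.toString(probeSequence(14, 11, 5))); // prints [3, 4, 7, 1, 8]
        System.out.println(Arrays.toString(probeSequence(25, 11, 5))); // prints [3, 4, 7, 1, 8]
    }
}
```

Both keys follow exactly the same probe sequence because they start at the same slot, which is the secondary clustering described above.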

3. Double hashing

In this resolution technique, a second hash function is used, which is created especially for the double hashing mechanism. With it, the clustering that forms between the keys is handled efficiently and further reduced. The increment between key positions is taken from this second function: in iteration "i", the second hash value is multiplied by i, added to the first hash value, and the modulo operation with the table size is performed, i.e. (hash1(key) + i * hash2(key)) % table-size. The keys are placed in the positions calculated this way.
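
A short sketch of the probe positions under double hashing follows. The two hash functions used here, hash1(key) = key % table-size and hash2(key) = 1 + key % (table-size - 1), are one common textbook choice and an assumption of this example, as are the class name DoubleHashDemo and the table size of 11.

```java
import java.util.Arrays;

class DoubleHashDemo {
    // First hash function: the starting slot.
    static int hash1(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // Second hash function: the step size, never 0.
    static int hash2(int key, int tableSize) {
        return 1 + Math.floorMod(key, tableSize - 1);
    }

    // Slot probed in iteration i: (hash1(key) + i * hash2(key)) % tableSize.
    static int[] probeSequence(int key, int tableSize, int attempts) {
        int[] sequence = new int[attempts];
        for (int i = 0; i < attempts; i++) {
            sequence[i] = (hash1(key, tableSize) + i * hash2(key, tableSize)) % tableSize;
        }
        return sequence;
    }

    public static void main(String[] args) {
        // Keys 14 and 25 both start at slot 3 but use different step sizes.
        System.out.println(Arrays.toString(probeSequence(14, 11, 5))); // prints [3, 8, 2, 7, 1]
        System.out.println(Arrays.toString(probeSequence(25, 11, 5))); // prints [3, 9, 4, 10, 5]
    }
}
```

Although both keys collide at slot 3, their sequences separate from the second probe onwards, which is why double hashing avoids clustering.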

Difficulties in Double hashing:

Compared to the other techniques, double hashing has poor cache performance but practically no clustering issues. The poor cache performance comes from successive probes jumping to widely separated slots. It also needs more computation time, as two hash functions have to be evaluated. Other than this, there is no problem with double hashing.

A Program to implement Closed Hashing:

Output:

The key 246 is not found in the hash table. It has done 20 step sizes.
The key 246 is not removed as the element does not exist in a hash table.

Overview of Open hashing and Closed hashing:

Open hashing is mostly used to keep the implementation simple and complete the work in an easy way, whereas Closed hashing involves more complexity and computation. Open hashing places no bound on the number of elements, keys or records that can be inserted, so records can be added without limit. Closed hashing is bound to a limited number of records, and a record may find no key position left for its entry unless the table is enlarged. In Open hashing, extra storage is created outside the table, irrespective of the table size, and is used whenever a collision occurs. Closed hashing does no such thing: the table itself is the only place where records are stored, and no external storage is added.





