Open and Closed Hashing in Java
In this article, we will learn about Open Hashing and Closed Hashing in the Java programming language. We will cover why these techniques are used, their advantages and disadvantages, and the differences between Open Hashing and Closed Hashing.
Collision Handling Techniques:
A collision occurs when two different keys hash to the same index of a hash table. A hash function reduces a key, which may be a large integer or a string, to a small number, so two different keys can easily produce the same value. In other words, a collision is a situation where a newly inserted key maps to a position in the hash table that is already occupied. Collisions are handled using collision handling techniques, also known as collision resolution techniques.
The techniques used to handle or resolve collisions are broadly classified into two types: Open Hashing (Separate Chaining) and Closed Hashing (Open Addressing).
The first collision resolution technique, " Open Hashing ", is popularly known as Separate Chaining. In this technique, each slot of the hash table array holds a linked list, known as a chain. It is one of the techniques most commonly used by programmers to handle collisions. When several elements hash to the index of the same slot, they are all inserted into that slot's singly linked list.
To search for a key " K ", we compute its slot and traverse that slot's chain linearly. If K equals the key stored in an entry of the singly linked list, we have found our entry. If we reach the end of the list without a match, the entry does not exist in the table.
So, when two keys compete for the same position, that position is allotted to both of them: the first key is placed there, and the position is then extended with a linked list to hold the second. In the end, the hash table contains a chain wherever a collision has happened. That is the main reason this technique is called the " Chaining technique ".
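The chaining idea described above can be sketched in a few lines of Java. This is only a minimal illustration, not the article's full program: the class name ChainedHashMap, its method names, and the use of String keys with int values are all assumptions made for the example.

```java
// Minimal separate-chaining map; all names here are illustrative assumptions.
class ChainedHashMap {
    // One entry of a singly linked chain.
    private static final class Node {
        final String key;
        int value;
        Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets;
    private int size;

    ChainedHashMap(int capacity) {
        buckets = new Node[capacity];
    }

    // Maps a key to a slot; floorMod keeps the index non-negative.
    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Inserts a new key or updates an existing one; colliding keys share a chain.
    void put(String key, int value) {
        int i = indexFor(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value;
                return;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]); // prepend to the chain
        size++;
    }

    // Walks the chain linearly; returns null when the key is absent.
    Integer get(String key) {
        for (Node n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    // Unlinks the matching node and returns its value, or null when absent.
    Integer remove(String key) {
        int i = indexFor(key);
        Node prev = null;
        for (Node n = buckets[i]; n != null; prev = n, n = n.next) {
            if (n.key.equals(key)) {
                if (prev == null) {
                    buckets[i] = n.next;
                } else {
                    prev.next = n.next;
                }
                size--;
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        // A capacity of 1 forces every key into the same chain.
        ChainedHashMap map = new ChainedHashMap(1);
        map.put("this", 1);
        map.put("coder", 2);
        map.put("this", 4); // updates the existing entry
        System.out.println(map.size());
        System.out.println(map.get("this"));
        System.out.println(map.remove("this"));
        System.out.println(map.get("this"));
    }
}
```

Creating the map with a capacity of 1, as in main, makes every insertion a collision, which is a simple way to see that the chain alone keeps the entries apart.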
Advantages of Open Hashing:
Disadvantages of Open Hashing:
Data Structures used in Open Hashing:
A Program to implement Open Hashing:
Output:
3
4
null
2
false
The program above implements Separate Chaining through a handful of functions. Let us see what these functions are and how they are useful in Open Hashing.
Functions Used in Open Hashing:
Now, we will see the detailed implementation of these functions to get a better understanding of the program.
Implementation of the Functions:
Note: The load factor is the number of stored entries divided by the number of slots in the table. If " n " is the total number of slots, say 10, and 7 entries are currently stored, then the load factor is 7/10, i.e. 0.7.
So, this is the Open Hashing technique used to resolve collisions in a hash table. Now, let us understand the Closed Hashing technique.
The second collision resolution technique, Closed Hashing, is also known as Open Addressing. Like Separate Chaining, it is a way of dealing with collisions, but here the hash table alone stores all of its elements. The size of the table must therefore always be greater than or equal to the total number of keys ( the table can be enlarged when needed by copying the existing data into a bigger one ). Because no storage outside the table is used, this mechanism is referred to as Closed Hashing. The process of searching the table for a free slot is called probing.
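The arithmetic in the note can be checked with a short sketch. The class and method names are made up for the example, and the resize threshold of 0.75 is an assumption borrowed from the default load factor of java.util.HashMap, not something this article prescribes.

```java
// Illustrative load-factor helpers; the names and the threshold are assumptions.
class LoadFactorDemo {
    // Load factor = stored entries / number of slots.
    static double loadFactor(int entries, int slots) {
        return (double) entries / slots;
    }

    // Assumed policy: the table should grow once the load factor exceeds the threshold.
    static boolean needsResize(int entries, int slots, double threshold) {
        return loadFactor(entries, slots) > threshold;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(7, 10));        // 7 entries in 10 slots
        System.out.println(needsResize(7, 10, 0.75)); // 0.7 is still below 0.75
        System.out.println(needsResize(8, 10, 0.75)); // 0.8 is above 0.75
    }
}
```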
Functions Used in Closed Hashing:
Several techniques to implement Closed Hashing:
1. Linear Probing: In linear probing, the hash table is examined sequentially, starting from the position given by the hash function. If the slot obtained from the calculation is already occupied, we look at the next one. The rehashing function is " rehash(n) = (n + 1) % table-size ", where n is the slot that was just probed. The gap between two successive probes is 1.
Let S be the table size and " hash(a) " the slot index computed by the hash function. If slot hash(a) % S is occupied, we try (hash(a) + 1) % S, then (hash(a) + 2) % S, and so on until a free slot is found. Of the three probing techniques, linear probing has the best cache performance, because consecutive probes touch adjacent slots.
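A minimal sketch of linear probing for int keys follows. The class name LinearProbingTable, the modulo hash function, and the use of null to mark an empty slot are assumptions for illustration. Deletion is left out on purpose, because it needs extra bookkeeping.

```java
// Illustrative linear-probing table; names and the modulo hash are assumptions.
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingTable(int capacity) {
        slots = new Integer[capacity];
    }

    // Returns the slot where the key is stored, or -1 when the table is full.
    int insert(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (home + i) % slots.length; // step of 1 between probes
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    // Returns the slot holding the key, or -1 when the key is absent.
    int indexOf(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (home + i) % slots.length;
            if (slots[idx] == null) {
                return -1; // an empty slot ends the probe sequence
            }
            if (slots[idx] == key) {
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(7);
        int[] keys = {50, 700, 76, 85, 92, 73, 101};
        for (int key : keys) {
            System.out.println(key + " -> slot " + table.insert(key));
        }
    }
}
```

With a table of size 7, the keys 50, 85 and 92 all have home slot 1, so 85 and 92 slide into slots 2 and 3. This run of neighboring occupied slots is the clustering that linear probing is known for.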
Difficulties faced with Linear Probing:
2. Quadratic Probing: In Quadratic Probing, the interval between successive probe positions grows with each attempt instead of staying fixed as in linear probing. This reduces the clustering that arises in linear probing. Some sources also refer to this technique as the mid-square method, although that name more commonly denotes a hash function that squares the key and extracts its middle digits. On the i-th iteration, the slot (hash(x) + i^2) % S is probed, where S is the table size. Later positions are checked only when the position being tried is already occupied. It is regarded as an efficient method for closed hash tables. Its cache performance and clustering behavior lie between those of linear probing and double hashing.
Difficulties faced with Quadratic Probing:
It suffers from secondary clustering: two keys that hash to the same initial position follow the same probe sequence.
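The quadratic probe sequence and its secondary clustering can be illustrated with a short sketch. The class name QuadraticProbing, the modulo hash function, and the use of null for an empty slot are assumptions made for the example, and duplicate keys are not handled.

```java
// Illustrative quadratic-probing helpers; names and the modulo hash are assumptions.
class QuadraticProbing {
    // Slot examined on the i-th attempt: (hash(key) + i^2) % size.
    static int probe(int key, int i, int size) {
        return (Math.floorMod(key, size) + i * i) % size;
    }

    // Stores the key in the first free slot of its probe sequence (null = empty).
    // Returns the slot, or -1 if no free slot turned up within table.length probes,
    // which can happen with quadratic probing even when the table is not full.
    static int insert(Integer[] table, int key) {
        for (int i = 0; i < table.length; i++) {
            int idx = probe(key, i, table.length);
            if (table[idx] == null) {
                table[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];
        System.out.println(insert(table, 22)); // home slot 22 % 7 = 1
        System.out.println(insert(table, 30)); // home slot 30 % 7 = 2
        System.out.println(insert(table, 50)); // home slot 1 is taken; slots 2 and then 5 are tried
    }
}
```

Because 22 and 50 share the home slot 1, probe returns the same slot for both keys at every value of i, which is exactly the secondary clustering described above.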
3. Double Hashing: In this resolution technique, a second hash function is used to compute the step between probes. Clustering between keys is handled efficiently and is further reduced, because keys that share an initial position usually get different step sizes. On the i-th iteration, the second hash function is multiplied by the variable " i ", added to the first hash value, and the modulo operation is performed: (hash1(x) + i * hash2(x)) % S, where S is the table size.
Difficulties in Double hashing:
Compared to the other techniques, double hashing has poor cache performance but does not have clustering issues. Successive probes jump across the table instead of touching neighboring slots, which is what hurts cache performance. It also needs more computation time, because two hash functions have to be evaluated. Other than this, there is no problem with Double hashing.
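A small sketch can show how the second hash function changes the probe sequence. Everything named here is an assumption for illustration: the class DoubleHashing, the modulo function used as the first hash, and the common choice hash2(key) = PRIME - (key % PRIME) with a prime smaller than the table size.

```java
// Illustrative double-hashing helpers; names and both hash functions are assumptions.
class DoubleHashing {
    static final int PRIME = 5; // assumed prime smaller than the table size

    // First hash: the home slot of the key.
    static int hash1(int key, int size) {
        return Math.floorMod(key, size);
    }

    // Second hash: the step size, always between 1 and PRIME, so every probe moves.
    static int hash2(int key) {
        return PRIME - Math.floorMod(key, PRIME);
    }

    // Slot examined on the i-th attempt: (hash1(key) + i * hash2(key)) % size.
    static int probe(int key, int i, int size) {
        return (hash1(key, size) + i * hash2(key)) % size;
    }

    // Stores the key in the first free slot of its probe sequence (null = empty).
    // Returns the slot, or -1 if none was free within table.length probes.
    static int insert(Integer[] table, int key) {
        for (int i = 0; i < table.length; i++) {
            int idx = probe(key, i, table.length);
            if (table[idx] == null) {
                table[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];
        System.out.println(insert(table, 22)); // home slot 1
        System.out.println(insert(table, 50)); // home slot 1 is taken, step 5 leads to slot 6
        System.out.println(insert(table, 8));  // home slot 1 is taken, step 2 leads to slot 3
    }
}
```

The keys 22, 50 and 8 all start at slot 1, yet their step sizes are 3, 5 and 2, so their probe sequences separate after the first collision. Keeping the table size prime, as with 7 here, ensures that every step size eventually visits all slots.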
A Program to implement Closed Hashing:
Output:
The key 246 is not found in the hash table.
It has done 20 step sizes.
The key 246 is not removed as the element does not exist in a hash table.
Overview of Open hashing and Closed hashing:
Open hashing is mostly used to keep the implementation simple, whereas Closed hashing involves more complexity and computation. Open hashing places no upper bound on the number of keys or records that can be inserted, because the chains can keep growing. Closed hashing is limited to the number of slots in the table, so a record may find no position left for its entry. In Open hashing, extra storage is created outside the table, irrespective of the table size, and it is used whenever a collision occurs. Closed hashing has no such mechanism: the table itself is the only place where records can be stored, and no external storage is added.