Introduction to Hashing

Assume we want to create a system for storing employee records keyed by phone number, and we want the following queries to run quickly:

1. Search the record for a given phone number.
2. Insert a record for a new phone number.
3. Delete the record for a given phone number.
We can consider using the following data structures to store information about various phone numbers.
For arrays and linked lists, we must search in a linear fashion, which can be costly in practice. If we use an array and keep the data sorted, we can find a phone number with binary search in O(log n) time, but insert and delete operations become expensive because we must keep the data sorted. With a balanced binary search tree we get moderate search, insert, and delete times: all of these operations take O(log n) time.

Another option is a direct access table, in which we create a large array and use phone numbers as indexes. An array entry is NIL if the phone number is not present; otherwise it stores a pointer to the record for that phone number. In terms of time complexity, this is the best solution of the bunch: we can perform all operations in O(1) time. To insert a phone number, for example, we create a record with the phone number's details, use the phone number as an index, and store a pointer to the newly created record in the table. This solution has a number of practical drawbacks, however. The first is the amount of extra space required: if phone numbers have n digits, we need O(m * 10^n) table space, where m is the size of a pointer to a record. The second is that the built-in integer types of most programming languages cannot hold an n-digit phone number when n is large. Because of these limitations, a direct access table cannot always be used.

In practice, hashing is the solution that can be used in almost all such situations, and it outperforms the data structures above (array, linked list, and balanced BST). With hashing we get O(1) search time on average (under reasonable assumptions) and O(n) in the worst case. Let's break down what hashing is.

What exactly do you mean by hashing?

Hashing is a popular technique for quickly storing and retrieving data. Its main appeal is that search, insert, and delete can all be performed in O(1) time on average.

Why should you use Hashing?

If we search, insert, or delete an element in a balanced binary search tree, the time complexity is O(log n). There may be times when our applications need to perform the same operations in a faster, more optimised manner, and this is where hashing comes into play. In hashing, all of the above operations can be completed in O(1), or constant, time on average. It is critical to understand that hashing's worst-case time complexity remains O(n), but its average time complexity is O(1). Let us now look at some fundamental hashing operations.

Fundamental Operations:

- Insert: add a key and its associated record to the hash table.
- Delete: remove a key and its record from the hash table.
- Search: look up the record stored for a given key.
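To make these operations concrete, here is a minimal sketch using Python's built-in dict, which is itself a hash table. The phone numbers and record fields are made-up values for illustration only.

```python
# A hash table mapping phone numbers (keys) to employee records (values).
records = {}

# Insert: store a record under its phone number in O(1) average time.
records[9876543210] = {"name": "Alice", "department": "Engineering"}
records[9123456789] = {"name": "Bob", "department": "Sales"}

# Search: fetch the record for a given phone number in O(1) average time.
print(records.get(9876543210))   # {'name': 'Alice', 'department': 'Engineering'}
print(records.get(5550000000))   # None (phone number not present)

# Delete: remove a phone number and its record in O(1) average time.
del records[9123456789]
print(9123456789 in records)     # False
```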
Hash Function: A hash function maps a given key (here, a phone number) to an index in the hash table, where the corresponding record is stored and looked up.
Hash Table: A hash table is an array that stores key-value pairs at the indexes computed by the hash function.
Components of Hashing: the key, the hash function, and the hash table.
A good hash function should have the following characteristics:

- It should be efficiently computable.
- It should distribute the keys uniformly across the table (each table position should be equally likely for each key).
For phone numbers, for instance, a poor hash function would use the first three digits, because many numbers share the same prefix. Using the last three digits is a better choice. Note that even this may not be the best hash function; there could be better options.
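As a rough illustration of this point, the sketch below compares the two choices: hashing on the leading digits versus hashing on the trailing digits (taken here as the number modulo the table size). The table size of 1000 and the helper names are assumptions made for this example, not part of the article.

```python
TABLE_SIZE = 1000  # assumed table size, for illustration only

def hash_first_digits(phone: int) -> int:
    """Poor choice: keys sharing an area-code prefix all collide."""
    return int(str(phone)[:3]) % TABLE_SIZE

def hash_last_digits(phone: int) -> int:
    """Better choice: trailing digits vary more uniformly."""
    return phone % TABLE_SIZE

numbers = [9876543210, 9876501234, 9876598765]  # same prefix, different suffixes
print([hash_first_digits(n) for n in numbers])  # all collide: [987, 987, 987]
print([hash_last_digits(n) for n in numbers])   # spread out:  [210, 234, 765]
```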
Linear Probing

Sometimes the hash function produces an array index that is already being used to store a value. In that situation, linear probing searches sequentially from that index for the next empty cell and stores the value there. It is the simplest method of handling collisions in hash tables, and a key that collided can later be located with the same sequential search.

Double Hashing

The double hashing method uses two hash functions. When the first hash function results in a collision, the second hash function supplies an offset used to find an alternative slot in which to store the value. The formula for double hashing is:

(firstHash(key) + i * secondHash(key)) % sizeOfTable

Here i is the probe count; it is increased (i = 0, 1, 2, ...) until an empty slot is encountered.
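The sketch below is one possible way to implement both strategies using open addressing over a fixed-size table. The class name, table size, and the particular second hash function are illustrative assumptions, not anything prescribed above.

```python
class OpenAddressingHashTable:
    """A small hash table using open addressing.

    probe_mode="linear" steps by 1 on each collision (linear probing);
    probe_mode="double" steps by a second hash of the key (double hashing).
    """

    def __init__(self, size=11, probe_mode="linear"):
        self.size = size
        self.probe_mode = probe_mode
        self.slots = [None] * size  # each slot holds (key, value) or None

    def _first_hash(self, key):
        return key % self.size

    def _second_hash(self, key):
        # A common choice: based on a prime smaller than the table size.
        return 7 - (key % 7)

    def _step(self, key):
        return 1 if self.probe_mode == "linear" else self._second_hash(key)

    def insert(self, key, value):
        step = self._step(key)
        for i in range(self.size):
            index = (self._first_hash(key) + i * step) % self.size
            if self.slots[index] is None or self.slots[index][0] == key:
                self.slots[index] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def search(self, key):
        step = self._step(key)
        for i in range(self.size):
            index = (self._first_hash(key) + i * step) % self.size
            if self.slots[index] is None:
                return None            # reached an empty slot: key is absent
            if self.slots[index][0] == key:
                return self.slots[index][1]
        return None


# Usage: keys 22 and 33 both hash to index 0 when size is 11, forcing a probe.
table = OpenAddressingHashTable(size=11, probe_mode="double")
table.insert(22, "Alice")
table.insert(33, "Bob")
print(table.search(22), table.search(33))  # Alice Bob
```

With linear probing the step is always 1, so colliding keys pile up in adjacent cells; with double hashing the step depends on the key, which spreads colliding keys out and reduces that clustering.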