String Hashing using the Polynomial Rolling Hash Function

Introduction:

String matching algorithms have had a significant impact on computer science and play a fundamental role in solving practical problems across many domains. Their efficiency is especially evident in tasks that involve searching for a particular string within another. String matching techniques are used in areas such as database schema design and network systems, where they help optimize the performance of operations. This shows their versatility and relevance to real-world challenges.

Problem with string matching:

Time Complexity: Given two strings s1 and s2 of equal length n, comparing them directly (s1 == s2) takes O(n) time, because in the worst case every pair of characters must be checked.

Hash Function:

A hash function is a tool that transforms data of varying size into fixed-size values, commonly called hash values. Its primary purpose is to produce a compact identifier for a piece of data, of the same size regardless of how large or complex the original data is. Different inputs can occasionally share a hash value, as discussed in the section on collisions.

Solution using Hashing:

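Time Complexity: Given two strings s1 and s2 of equal length n whose hash values have already been computed, comparing the hash values takes O(1) time (in the ideal case, with no collisions). Computing each hash still costs O(n), but it is done only once per string, which pays off when the same strings are compared many times.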

String Hashing:

String -> Hash function -> Hash value/Key

The hash function takes the string as input and produces a value known as the hash value or key.


Example:

Suppose the strings s1, s2, and s3 are passed to the hash function and produce the hash values 109469, 236853, and 945739, respectively.

Now, instead of comparing the strings directly, which takes O(max(|s1|, |s2|)) time, we simply compare their hash values, which takes O(1) time.

Important points:

  1. Equal strings must always have the same hash value.
  2. Equal hash values mean the strings may be equal, but this is not guaranteed.

Two different strings can have the same hash value. When this happens, it is called a collision.

Polynomial Rolling Hash Function:

We want to compare strings efficiently. The idea is simple: convert strings into integers (hash values) and compare those integers.

To convert strings into integers, we use the polynomial rolling hash as the hash function. Identical strings must always produce the same hash value.

The polynomial rolling hash function is a hash function that uses only multiplications and additions. It is defined as follows:

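$$\text{hash}(s) = \left(s[0] + s[1]\cdot p + s[2]\cdot p^{2} + \cdots + s[n-1]\cdot p^{n-1}\right) \bmod m = \left(\sum_{i=0}^{n-1} s[i]\cdot p^{i}\right) \bmod m$$

where s[i] is the numeric value of the i-th character, n is the length of the string, and m is a large modulus.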

Here p is a prime number greater than or equal to the size of the character set.

For instance, hash("abc") = 1 + 2·5^1 + 3·5^2 = 1 + 10 + 75 = 86.

Here 'a' is mapped to 1, 'b' to 2, and so on, and p = 5, which is a prime number. This small p, below the 26-letter alphabet size, is chosen only to keep the arithmetic short.

Why should we use Modulo?

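Since the hash function is a polynomial in p, hash values grow exponentially with the length of the string and quickly overflow fixed-size integer types. For example, with p = 11:

Integer (32-bit int): overflows for strings of roughly 10 characters

Long Long int (64-bit): overflows for strings of roughly 20 characters

Taking the hash modulo a large number m keeps every hash value in the range [0, m).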

Why should p be greater than the size of the character set?

p should be larger than the number of distinct characters in order to reduce collisions. If we take smaller values, different strings are more likely to end up with the same hash value.

Simple code implementation of the polynomial rolling hash function:


Collisions in Polynomial Rolling Hash Function and its resolution:

The hash function outputs an integer in the range [0, m), so it can lead to collisions, where different strings produce the same hash value. For instance, with p = 37, m = 10^9 + 9, and the mapping a = 1, b = 2, and so on, the strings "answers" and "stead" produce the same hash value. A perfect one-to-one mapping is impossible, because there are far more possible strings than values in the range [0, m).

A larger m reduces the chance of collisions, but it also slows down the algorithm. Practical constraints, such as the fixed size of integer types in languages like C, C++, and Java, prevent increasing m beyond certain limits. To reduce the probability of collisions, one strategy is to compute a pair of hashes for each string using two different parameter pairs (p, m) and to treat two strings as equal only when both hashes match. This approach does not eliminate collisions entirely, but it significantly reduces their probability.

Conclusion:

String hashing with the polynomial rolling hash function transforms strings into integers for efficient comparison. The function relies only on multiplications and additions, which keeps it simple and effective. The choice of prime numbers as parameters has a significant impact on the hash values, and the modulo operation is crucial for keeping the exponentially growing hash values within a fixed range. When used with well-chosen parameters and collision-reducing strategies such as computing two hashes per string, the polynomial rolling hash function improves the efficiency and reliability of string-matching algorithms, making it a valuable tool in many computational applications.
