Pattern Matching Algorithm in C

Pattern matching is widely used in computer science and many other fields. Pattern matching algorithms search for occurrences of a pattern within a larger text or data set. One of the most popular is the Boyer-Moore algorithm, first published in 1977. In this article, we discuss several pattern matching algorithms, how they work, and how the Boyer-Moore algorithm can be implemented in C.

What is a Pattern Matching Algorithm?

A pattern matching algorithm finds occurrences of a pattern (for example, a word) inside a larger text or data set. It compares the pattern with the text to determine whether the pattern is present and, usually, at which positions it occurs. Efficient pattern matching algorithms matter because a naive search becomes slow as the text and the pattern grow.

Brute Force Pattern Matching Algorithm:

Brute Force Pattern Matching is the simplest pattern matching algorithm. It aligns the pattern with the beginning of the text and compares their characters one by one. If all the characters match, the algorithm returns the starting position of the pattern in the text. If not, it shifts the pattern one position to the right and repeats the comparison until a match is found or the end of the text is reached. The time complexity of the brute force algorithm is O(M × N), where M is the length of the text and N is the length of the pattern.
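A minimal sketch of the brute-force search described above, assuming both strings are NUL-terminated C strings and that only the first match is needed. The function name brute_force_search is our own, not part of any library.

```c
#include <string.h>

/* Hypothetical helper: index of the first occurrence of pattern in text,
   or -1 if the pattern does not occur. */
int brute_force_search(const char *text, const char *pattern)
{
    size_t m = strlen(text);     /* M: length of the text    */
    size_t n = strlen(pattern);  /* N: length of the pattern */

    /* Try every alignment s at which the pattern still fits inside the text. */
    for (size_t s = 0; s + n <= m; s++) {
        size_t j = 0;
        /* Compare pattern characters one by one at this alignment. */
        while (j < n && text[s + j] == pattern[j])
            j++;
        if (j == n)
            return (int) s;      /* every character matched */
        /* Otherwise shift the pattern one position to the right. */
    }
    return -1;                   /* reached the end of the text */
}
```

For example, brute_force_search("ABABDABACDABABCABAB", "ABABCABAB") returns 10.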

Naive Pattern Matching Algorithm:

The Naive Pattern Matching algorithm is closely related to the brute force algorithm, and many texts use the two names interchangeably. At each alignment it compares the pattern with the text from left to right and abandons the alignment at the first mismatching character; it then shifts the pattern one position to the right and compares again. It does not skip any alignments, so its worst-case time complexity is also O(M × N), for example when searching for "aaab" in a long run of 'a' characters. In practice it is often much faster than this bound, because on typical text most alignments fail after only one or two comparisons.
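To make the worst case visible, here is a small sketch that counts every occurrence of the pattern (overlapping matches included) and also reports how many character comparisons were performed. The name naive_count and its out-parameter are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of occurrences of pattern in text
   and stores the number of character comparisons made in *comparisons. */
size_t naive_count(const char *text, const char *pattern, size_t *comparisons)
{
    size_t m = strlen(text);
    size_t n = strlen(pattern);
    size_t matches = 0;

    *comparisons = 0;
    for (size_t s = 0; s + n <= m; s++) {   /* every alignment, none skipped */
        size_t j = 0;
        while (j < n) {
            (*comparisons)++;
            if (text[s + j] != pattern[j])
                break;                      /* stop at the first mismatch */
            j++;
        }
        if (j == n)
            matches++;
    }
    return matches;
}
```

Searching for "aaab" in a run of ten 'a' characters performs (10 - 4 + 1) × 4 = 28 comparisons without finding a match, which is the O(M × N) worst case, while searching for "xyz" in "abcdefgh" needs only 6 comparisons, one per alignment.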

Knuth-Morris-Pratt Algorithm:

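The Knuth-Morris-Pratt (KMP) algorithm is a more advanced pattern matching algorithm that never moves backwards in the text. It is based on the observation that, when a mismatch occurs, the characters already matched are known to equal a prefix of the pattern, so that information can be reused instead of comparing those characters again. Before searching, the algorithm precomputes a table (often called the failure function or LPS array) that stores, for each prefix of the pattern, the length of its longest proper prefix that is also a suffix. On a mismatch, this table tells the algorithm how far it can fall back within the pattern while the position in the text stays the same. The time complexity of the KMP algorithm is O(M + N): O(N) to build the table and O(M) to scan the text.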
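A minimal sketch of KMP in C, assuming NUL-terminated strings and that only the first match is needed. The names build_lps and kmp_search are our own; the table is the standard prefix (failure) function described above.

```c
#include <stdlib.h>
#include <string.h>

/* Failure table: lps[i] is the length of the longest proper prefix of
   pattern[0..i] that is also a suffix of pattern[0..i]. */
static void build_lps(const char *pattern, size_t n, size_t *lps)
{
    size_t len = 0;                  /* length of the current border */
    size_t i = 1;
    lps[0] = 0;
    while (i < n) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else if (len > 0) {
            len = lps[len - 1];      /* try the next shorter border; i stays put */
        } else {
            lps[i] = 0;
            i++;
        }
    }
}

/* Hypothetical helper: index of the first occurrence of pattern in text,
   or -1 if it does not occur (or if the table cannot be allocated). */
int kmp_search(const char *text, const char *pattern)
{
    size_t m = strlen(text), n = strlen(pattern);
    if (n == 0)
        return 0;

    size_t *lps = malloc(n * sizeof *lps);
    if (lps == NULL)
        return -1;
    build_lps(pattern, n, lps);

    int result = -1;
    size_t j = 0;                        /* pattern characters matched so far */
    for (size_t i = 0; i < m; i++) {     /* i never moves backwards */
        while (j > 0 && text[i] != pattern[j])
            j = lps[j - 1];              /* fall back within the pattern */
        if (text[i] == pattern[j])
            j++;
        if (j == n) {
            result = (int) (i + 1 - n);
            break;
        }
    }
    free(lps);
    return result;
}
```

For example, kmp_search("ABABDABACDABABCABAB", "ABABCABAB") returns 10. Because the text index i only ever increases, and each fallback of j is paid for by an earlier increment, the scan is O(M).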

The Boyer-Moore Algorithm:

One of the most popular pattern matching algorithms is the Boyer-Moore algorithm, first published in 1977 by Robert S. Boyer and J Strother Moore. Unlike most other pattern matching algorithms, which compare the pattern with the text from left to right, Boyer-Moore compares the pattern with the current window of the text from right to left.

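The Boyer-Moore algorithm has two main components: the bad character rule and the good suffix rule. Both are applied when a mismatch occurs while comparing the pattern, from right to left, with the current window of the text. The bad character rule looks at the mismatched text character and shifts the pattern to the right so that the last occurrence of that character in the pattern lines up with it (or by one position, if that occurrence lies to the right of the mismatch); if the character does not occur in the pattern at all, the pattern is shifted completely past it. The good suffix rule looks at the part of the pattern that has already matched, which is a suffix of the pattern, and shifts the pattern so that another occurrence of that suffix in the pattern, or failing that a prefix of the pattern that matches the end of it, lines up with the matched text. The algorithm then shifts by whichever rule gives the larger shift.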
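The two rules can be summarized as shift formulas. The notation here is our own sketch of the usual textbook formulation: the pattern P of length N is aligned at text position s, characters are compared from index N - 1 down to 0, and the first mismatch occurs at pattern index j, where the text holds the character c = T[s + j].

$$
\mathrm{last}(c) =
\begin{cases}
\max\{\, k : P[k] = c \,\} & \text{if } c \text{ occurs in } P,\\
-1 & \text{otherwise,}
\end{cases}
\qquad
\mathrm{shift}_{\mathrm{bc}} = \max\bigl(1,\; j - \mathrm{last}(c)\bigr)
$$

$$
s \leftarrow s + \max\bigl(\mathrm{shift}_{\mathrm{bc}},\; \mathrm{shift}_{\mathrm{gs}}(j)\bigr)
$$

Here shift_gs(j) is the good suffix shift for a mismatch at j after P[j+1..N-1] has matched. For example, when searching for P = "EXAMPLE" (N = 7) in T = "HERE IS A SIMPLE EXAMPLE", the first comparison at s = 0 checks P[6] = 'E' against T[6] = 'S'. Since 'S' does not occur in the pattern, last('S') = -1 and shift_bc = 6 - (-1) = 7, so the bad character rule alone lets the pattern move completely past the 'S'.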

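The Boyer-Moore algorithm is widely used in practice. Because it compares from the right and can skip several text characters after a mismatch, it often examines only a fraction of the text, and it tends to get faster as the pattern gets longer. This makes it one of the fastest practical algorithms for searching a single pattern, especially over large alphabets such as natural-language text.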

Implementing the Boyer-Moore algorithm in C:

To implement the Boyer-Moore algorithm in C, we can start with the bad character rule. We use an array that stores the index of the last occurrence of each character in the pattern. When a mismatch occurs, this array tells us how far we can move the pattern to the right.

Here is an example of how we can implement the bad character rule in C:

C code:

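#define NO_OF_CHARS 256  /* one table slot for every possible byte value */

/* Record, for every character, the index of its last occurrence in the
   pattern; characters that never occur keep the value -1. */
void bad_character_rule(const char *pattern, int pattern_length, int *bad_char)
{
   int i;
   for (i = 0; i < NO_OF_CHARS; i++)
      bad_char[i] = -1;
   for (i = 0; i < pattern_length; i++)
      /* plain char may be signed: cast so bytes above 127 never give a negative index */
      bad_char[(unsigned char) pattern[i]] = i;
}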

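In this example, we first set the entry for every character to -1, which marks characters that do not occur in the pattern. We then scan the pattern from left to right and store each character's index; because later occurrences overwrite earlier ones, each entry ends up holding the index of that character's last occurrence in the pattern.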
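Using only this table already gives a working, simplified Boyer-Moore search (bad character rule only, no good suffix rule). The sketch below repeats a bad_character_rule table builder so that it compiles on its own; bm_bad_char_count is a hypothetical name, and a 256-entry byte alphabet is assumed. It returns the number of (possibly overlapping) occurrences and treats an empty pattern as having none.

```c
#include <string.h>

#define NO_OF_CHARS 256

/* Last index of each byte in the pattern, or -1 if the byte never occurs. */
void bad_character_rule(const char *pattern, int pattern_length, int *bad_char)
{
   int i;
   for (i = 0; i < NO_OF_CHARS; i++)
      bad_char[i] = -1;
   for (i = 0; i < pattern_length; i++)
      bad_char[(unsigned char) pattern[i]] = i;
}

/* Hypothetical helper: counts occurrences of pattern in text using only
   the bad character rule. */
int bm_bad_char_count(const char *text, const char *pattern)
{
   int m = (int) strlen(text);
   int n = (int) strlen(pattern);
   int bad_char[NO_OF_CHARS];
   int count = 0;
   int s = 0;                          /* current alignment of the pattern */

   if (n == 0 || n > m)
      return 0;
   bad_character_rule(pattern, n, bad_char);

   while (s <= m - n) {
      int j = n - 1;
      /* Compare right to left. */
      while (j >= 0 && pattern[j] == text[s + j])
         j--;
      if (j < 0) {
         count++;
         /* Full match: line up the next text character with its last
            occurrence in the pattern, or shift by 1 at the end of the text. */
         s += (s + n < m) ? n - bad_char[(unsigned char) text[s + n]] : 1;
      } else {
         /* Mismatch at j: line up the last occurrence of the text character,
            but always move at least one position. */
         int shift = j - bad_char[(unsigned char) text[s + j]];
         s += (shift > 1) ? shift : 1;
      }
   }
   return count;
}
```

For example, bm_bad_char_count("AABAACAADAABAABA", "AABA") returns 3 (matches at indices 0, 9 and 12).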

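Next, we can implement the good suffix rule. Like the bad character table, the good suffix table is computed from the pattern alone, before the search starts. A common approach first builds an array that stores, for each position i, the length of the longest substring of the pattern ending at i that is also a suffix of the whole pattern. From this array we derive a shift table that tells us how far to move the pattern to the right when a mismatch occurs after part of the pattern (a suffix) has already matched the text.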
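A minimal sketch of the good suffix preprocessing and of a complete search that combines both rules. It follows a common textbook construction: first the suffix-length array suff[] (computed here with a simple O(N²) loop for clarity), then the shift table good_suffix[]. All function names are hypothetical, a 256-entry byte alphabet is assumed, and the function returns the number of (possibly overlapping) occurrences, or -1 if memory allocation fails.

```c
#include <stdlib.h>
#include <string.h>

#define NO_OF_CHARS 256   /* assumption: byte alphabet */

/* suff[i] = length of the longest substring of pattern ending at index i
   that is also a suffix of the whole pattern (simple O(N^2) loop). */
static void compute_suffixes(const char *pattern, int n, int *suff)
{
   for (int i = 0; i < n; i++) {
      int k = 0;
      while (k <= i && pattern[i - k] == pattern[n - 1 - k])
         k++;
      suff[i] = k;
   }
}

/* good_suffix[j] = shift to apply when pattern[j] mismatches after
   pattern[j+1..n-1] has matched; good_suffix[0] is also used after a full match. */
static void good_suffix_rule(const char *pattern, int n, int *suff, int *good_suffix)
{
   compute_suffixes(pattern, n, suff);
   for (int k = 0; k < n; k++)
      good_suffix[k] = n;              /* default: shift past the whole pattern */

   /* A prefix pattern[0..i] that is also a suffix allows a shift of n-1-i. */
   int j = 0;
   for (int i = n - 1; i >= 0; i--) {
      if (suff[i] == i + 1) {
         for (; j < n - 1 - i; j++) {
            if (good_suffix[j] == n)
               good_suffix[j] = n - 1 - i;
         }
      }
   }

   /* The matched suffix reappears ending at index i, preceded by a different
      character, so a mismatch at n-1-suff[i] can shift by n-1-i. */
   for (int i = 0; i <= n - 2; i++)
      good_suffix[n - 1 - suff[i]] = n - 1 - i;
}

/* Hypothetical helper: counts occurrences of pattern in text using both rules. */
int boyer_moore_count(const char *text, const char *pattern)
{
   int m = (int) strlen(text), n = (int) strlen(pattern);
   if (n == 0 || n > m)
      return 0;

   int bad_char[NO_OF_CHARS];
   for (int c = 0; c < NO_OF_CHARS; c++)
      bad_char[c] = -1;
   for (int i = 0; i < n; i++)
      bad_char[(unsigned char) pattern[i]] = i;

   int *suff = malloc(2 * (size_t) n * sizeof *suff);
   if (suff == NULL)
      return -1;
   int *good_suffix = suff + n;         /* second half of the same allocation */
   good_suffix_rule(pattern, n, suff, good_suffix);

   int count = 0, s = 0;
   while (s <= m - n) {
      int j = n - 1;
      while (j >= 0 && pattern[j] == text[s + j])   /* right to left */
         j--;
      if (j < 0) {
         count++;
         s += good_suffix[0];            /* full match: safe shift from the table */
      } else {
         int bc = j - bad_char[(unsigned char) text[s + j]];
         int gs = good_suffix[j];
         s += (bc > gs) ? bc : gs;       /* take the larger of the two shifts */
      }
   }
   free(suff);
   return count;
}
```

For example, boyer_moore_count("HERE IS A SIMPLE EXAMPLE", "EXAMPLE") returns 1, and the search tries only five alignments (s = 0, 7, 9, 15, 17) of the eighteen possible ones. Every good_suffix entry is at least 1, so taking the maximum of the two rules always moves the pattern forward.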
