Boyer Moore Algorithm for Pattern Searching in C++

Pattern searching is an important problem in computer science. Pattern searching algorithms produce the results we see when we search for a string in a text editor, word processor, browser, or database. The following is a typical problem statement:

Given a text text[0..n-1] and a pattern pattern[0..m-1], write a function search(char pattern[], char text[]) that outputs all occurrences of pattern[] in text[]. You can assume that n > m. Before we look at how to find occurrences of a given pattern in a string, let's first go over what strings are.

A string is a data type that holds a sequence of characters, and strings are among the most frequently used data types in any programming language. In C++, a string literal is written in double quotes (single quotes denote a single character), and a std::string object can be modified after it is created.

We can now apply the Boyer-Moore pattern searching method (or B-M algorithm) to find the pattern within the string. The concept is simple: the pattern is aligned with the start of the text, and its characters are compared with the text from right to left, starting at the last character of the pattern. When a character does not match, the algorithm shifts the pattern to the right by the larger of the shifts suggested by two rules. So, we may say that the Boyer-Moore algorithm is a hybrid of two techniques:

1. Bad Character Heuristic.

2. Good Suffix Heuristic.

Method 1: The Bad Character Heuristic

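The pattern is compared with the text from right to left. If every character matches, the pattern's starting index is returned. Otherwise, the text character that caused the mismatch is referred to as the bad character, and there are two possibilities:

When the pattern contains the mismatched character from the input text

In such instances, we move the pattern to the right until its last occurrence of the bad character lines up with the mismatched character in the text.

When the pattern does not contain the mismatched character

In such instances, we move the pattern completely past the bad character, because no alignment that overlaps it can produce a match.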
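The shift rule above can be sketched as a small lookup table. This is a minimal sketch, assuming 8-bit characters; the names buildLastOccurrence and badCharacterShift are hypothetical helpers introduced here for illustration.

```cpp
#include <array>
#include <string>

// Hypothetical helper: for every possible byte value, record the last index
// at which it occurs in the pattern, or -1 if it does not occur at all.
std::array<int, 256> buildLastOccurrence(const std::string& pattern) {
    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < static_cast<int>(pattern.size()); ++i) {
        last[static_cast<unsigned char>(pattern[i])] = i;
    }
    return last;
}

// Hypothetical helper: shift suggested by the bad character rule when
// pattern[j] fails to match the text character `bad`.
int badCharacterShift(const std::array<int, 256>& last, int j, char bad) {
    int shift = j - last[static_cast<unsigned char>(bad)];
    return shift > 0 ? shift : 1;  // never move the pattern backwards
}
```

For the pattern "WORLD", a mismatch at the last index (j = 4) against the text character O gives a shift of 3, against L a shift of 1, and against H (which is not in the pattern) a shift of 5, the full pattern length.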

Consider a situation where the pattern does not occur in the input text. Let's say the pattern string is "WORLD" and the input text is "HELLOHELLO". We will use the Boyer-Moore string searching technique to try to locate the pattern in the input text.

Example:

Pattern: WORLD

Input Text: HELLOHELLO

Now, let us use the Boyer-Moore technique to compare the pattern to the input text:

Step 1: Compare the last character of the pattern, D, with the text character under it, O (text index 4). They do not match.

The bad character O occurs in the pattern at index 1. Using the bad character rule, the pattern is shifted right by 3 positions so that its O lines up with the mismatched O in the text.

After the Shift:

Pattern: WORLD (now aligned with text indices 3 to 7)

Input Text: HELLOHELLO

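Step 2: Compare the last character of the pattern, D, with the text character under it, L (text index 7). They do not match.

The bad character L occurs in the pattern at index 3, so the pattern is shifted right by 1 position and the bad character rule is applied again.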

After the Shift:

Pattern: WORLD (now aligned with text indices 4 to 8)

Input Text: HELLOHELLO

This process is repeated until the pattern has been tested against the whole input text. No match is found, so the pattern "WORLD" does not occur in the input string "HELLOHELLO".

Filename: PatternMatch.cpp

Output:

```
The pattern present at the position = 5
```
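A search driven only by the bad character heuristic might look like the sketch below. It is an illustration under stated assumptions (8-bit characters, a hypothetical function name searchBadCharacter, results returned as a vector rather than printed) and is not necessarily the program that produced the output above.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// Sketch: returns the starting index of every occurrence of `pattern`
// in `text`, shifting with the bad character heuristic only.
std::vector<int> searchBadCharacter(const std::string& text,
                                    const std::string& pattern) {
    std::vector<int> matches;
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    if (m == 0 || m > n) return matches;

    // Last index of each byte value in the pattern, -1 if absent.
    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < m; ++i) {
        last[static_cast<unsigned char>(pattern[i])] = i;
    }

    int s = 0;  // current alignment of the pattern against the text
    while (s <= n - m) {
        int j = m - 1;
        // Compare from right to left.
        while (j >= 0 && pattern[j] == text[s + j]) --j;

        if (j < 0) {
            matches.push_back(s);
            // Line up the text character just past the window, if any.
            if (s + m < n) {
                s += m - last[static_cast<unsigned char>(text[s + m])];
            } else {
                s += 1;
            }
        } else {
            // Line up the bad character with its last occurrence in the
            // pattern, but always advance by at least one position.
            s += std::max(1, j - last[static_cast<unsigned char>(text[s + j])]);
        }
    }
    return matches;
}
```

Calling searchBadCharacter("HELLOWORLD", "WORLD") returns a vector containing only 5, while searchBadCharacter("HELLOHELLO", "WORLD") returns an empty vector, in line with the walkthrough above.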

Method 2: The Good Suffix Heuristic

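The good suffix heuristic applies when a suffix of the pattern has already matched the text before a mismatch occurs: the pattern is shifted so that another occurrence of that suffix within the pattern (or, failing that, a prefix of the pattern that matches the end of that suffix) lines up with the matched text. Let's look at an input string, "HELLOWORLD", with the pattern "WORLD", to demonstrate how the Boyer-Moore method works with the Good Suffix Heuristic.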

Pattern: WORLD

Input : HELLOWORLD

Once the pattern has been shifted to index 5 of the input text, it is compared with the text from right to left:

• Compare the last character of the pattern, D, with the text character under it, D. They match.
• Move one position to the left and compare the next two characters, L and L. They match.
• Keep going until all the characters in the pattern have been matched.
• Every character in the pattern corresponds to a character in the text, so we have found a match beginning at index 5 in the input string "HELLOWORLD".

Filename: PatternMatch2.cpp

Output:

```
The pattern found at position = 0
The pattern found at position = 10
```
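A search driven only by the good suffix heuristic can be sketched as follows. This is one common way to precompute the shifts (the "strong" good suffix rule with a border table); the function name searchGoodSuffix and the vector return type are assumptions made for illustration, and this is not necessarily the program that produced the output above.

```cpp
#include <string>
#include <vector>

// Sketch: returns the starting index of every occurrence of `pattern`
// in `text`, shifting with the good suffix heuristic only.
std::vector<int> searchGoodSuffix(const std::string& text,
                                  const std::string& pattern) {
    std::vector<int> matches;
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    if (m == 0 || m > n) return matches;

    // border[i]: start of the widest border of the suffix pattern[i..m-1].
    // shift[i]:  shift to apply when the mismatch is at index i - 1.
    std::vector<int> border(m + 1, 0), shift(m + 1, 0);

    // Case 1: the matched suffix occurs again elsewhere in the pattern.
    int i = m, j = m + 1;
    border[i] = j;
    while (i > 0) {
        while (j <= m && pattern[i - 1] != pattern[j - 1]) {
            if (shift[j] == 0) shift[j] = j - i;
            j = border[j];
        }
        --i;
        --j;
        border[i] = j;
    }

    // Case 2: only a prefix of the pattern matches the end of the suffix.
    j = border[0];
    for (i = 0; i <= m; ++i) {
        if (shift[i] == 0) shift[i] = j;
        if (i == j) j = border[j];
    }

    int s = 0;  // current alignment of the pattern against the text
    while (s <= n - m) {
        int k = m - 1;
        // Compare from right to left.
        while (k >= 0 && pattern[k] == text[s + k]) --k;

        if (k < 0) {
            matches.push_back(s);
            s += shift[0];       // full match: shift by the pattern's period
        } else {
            s += shift[k + 1];   // mismatch at k: suffix k+1..m-1 matched
        }
    }
    return matches;
}
```

For the pattern "WORLD", no suffix repeats inside the pattern, so the table suggests a shift of 1 when the very first comparison fails and a shift of 5 otherwise. Calling searchGoodSuffix("HELLOWORLD", "WORLD") returns a vector containing only 5. A full Boyer-Moore search would take the larger of this shift and the bad character shift at each mismatch.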