Longest Repeated Substring

The "Longest Repeated Substring" problem is a well-known computer science problem: find the longest substring that appears more than once in a given string. In other words, the substring must occur at least two times in the original string. The occurrences are allowed to overlap.

Examples:

Input String: "AABABCA" Longest Repeated Substring: "AB"

Input String: "ABCDABC" Longest Repeated Substring: "ABC"

Input String: "Mississippi" Longest Repeated Substring: "issi"

How to find the Longest Repeated Substring:

Naïve Approach:

Algorithm:

  • Generate all possible substrings of the given string.
  • Create a suffix array of the string.
  • Sort the suffix array.
  • Compare adjacent suffixes to find their longest common prefixes; each such prefix is a substring that occurs at least twice.
  • Return the longest common prefix found.

Example 1:

String = "banana"

Step 1: Generate every possible substring of the given string "banana".

The following are the substrings of "banana", grouped by starting position:

"b", "ba", "ban", "bana", "banan", "banana", "a", "an", "ana", "anan", "anana", "n", "na", "nan", "nana", "a", "an", "ana", "n", "na", "a".
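The enumeration in Step 1 can be sketched in a few lines of Java. This is a minimal illustration, not the article's original listing; the class name `AllSubstrings` and the method `allSubstrings` are assumptions chosen for the example. A string of length n has n(n+1)/2 non-empty substrings, so "banana" yields 21.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
final class AllSubstrings {

    // Returns every non-empty substring of s, grouped by starting index.
    static List<String> allSubstrings(String s) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                result.add(s.substring(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> subs = allSubstrings("banana");
        System.out.println(subs.size()); // prints 21
        System.out.println(subs);
    }
}
```

Because the number of substrings grows quadratically, this step alone rules out a near-linear running time.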

Step 2: Build the suffix array of the text "banana":

The suffix array is an array of the starting indices of the string's suffixes, arranged in lexicographic order of those suffixes. It looks as follows:

Suffix Array:

[5, 3, 1, 0, 4, 2]

The corresponding suffixes are: ["a", "ana", "anana", "banana", "na", "nana"].
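A suffix array like the one above can be produced by sorting the starting indices by the suffixes they point to. The following is a minimal sketch under that assumption; the class name `SuffixArrayDemo` and the method `suffixArray` are hypothetical. It creates a substring for every comparison, which is simple to read but wasteful for long inputs.

```java
import java.util.Arrays;

// Hypothetical demo class, named for illustration only.
final class SuffixArrayDemo {

    // Sorts the starting indices 0..n-1 by the suffix beginning at each index.
    static int[] suffixArray(String s) {
        Integer[] order = new Integer[s.length()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> s.substring(a).compareTo(s.substring(b)));
        int[] sa = new int[order.length];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = order[i];
        }
        return sa;
    }

    public static void main(String[] args) {
        String s = "banana";
        int[] sa = suffixArray(s);
        System.out.println(Arrays.toString(sa)); // prints [5, 3, 1, 0, 4, 2]
        for (int start : sa) {
            System.out.println(start + ": " + s.substring(start));
        }
    }
}
```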

Step 3: Sort the suffix array:

A suffix array is in lexicographic order by construction, so sorting leaves it unchanged:

Sorted suffix array: [5, 3, 1, 0, 4, 2]

Step 4: Compare adjacent suffixes to identify their common prefixes:

The longest common prefixes (LCP) are found by comparing adjacent suffixes in the sorted suffix array. The LCP value at position i is the length of the common prefix of the suffixes starting at SA[i] and SA[i+1], so a string with six suffixes has five LCP values.

LCP array: [1, 3, 0, 0, 2]

The first suffix ("a") and the second suffix ("ana") share a prefix of length 1 ("a"); the second suffix ("ana") and the third suffix ("anana") share a prefix of length 3 ("ana"); "anana" and "banana" share nothing, as do "banana" and "na"; and the last pair, "na" and "nana", shares a prefix of length 2 ("na").
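The adjacent-pair comparison can be sketched directly, assuming the suffix array is already available. The class name `AdjacentLcp` and both method names are hypothetical. For "banana" with the suffix array [5, 3, 1, 0, 4, 2], the sketch produces one value per adjacent pair of suffixes.

```java
import java.util.Arrays;

// Hypothetical demo class, named for illustration only.
final class AdjacentLcp {

    // Length of the longest common prefix of a and b.
    static int commonPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int len = 0;
        while (len < limit && a.charAt(len) == b.charAt(len)) {
            len++;
        }
        return len;
    }

    // lcp[i] is the common prefix length of the suffixes at sa[i] and sa[i + 1].
    static int[] lcpOfAdjacent(String s, int[] sa) {
        int[] lcp = new int[Math.max(0, sa.length - 1)];
        for (int i = 0; i < lcp.length; i++) {
            lcp[i] = commonPrefixLength(s.substring(sa[i]), s.substring(sa[i + 1]));
        }
        return lcp;
    }

    public static void main(String[] args) {
        int[] sa = {5, 3, 1, 0, 4, 2};
        System.out.println(Arrays.toString(lcpOfAdjacent("banana", sa))); // prints [1, 3, 0, 0, 2]
    }
}
```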

Step 5: Find the longest common prefix and return it:

The largest value in the LCP array is 3. It belongs to the prefix "ana", which is shared by the adjacent suffixes "ana" and "anana".

This means that "ana" is the longest repeated substring (LRS) in the string "banana".

Code in Java:
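A minimal sketch of the steps above, not a definitive listing. The class name `LongestRepeatedSubstringNaive` and the helper `commonPrefixLength` are assumptions made for illustration. The sketch sorts the suffixes as strings and leaves out the enumeration of all substrings, because the answer depends only on the comparison of adjacent sorted suffixes.

```java
import java.util.Arrays;

// Hypothetical class name, chosen for illustration only.
final class LongestRepeatedSubstringNaive {

    // Length of the longest common prefix of a and b.
    static int commonPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int len = 0;
        while (len < limit && a.charAt(len) == b.charAt(len)) {
            len++;
        }
        return len;
    }

    // Sorts all suffixes and returns the longest prefix shared by two neighbors.
    static String longestRepeatedSubstring(String s) {
        int n = s.length();
        String[] suffixes = new String[n];
        for (int i = 0; i < n; i++) {
            suffixes[i] = s.substring(i);
        }
        Arrays.sort(suffixes);

        String best = "";
        for (int i = 0; i + 1 < n; i++) {
            int len = commonPrefixLength(suffixes[i], suffixes[i + 1]);
            if (len > best.length()) {
                best = suffixes[i].substring(0, len);
            }
        }
        return best; // empty string when no substring repeats
    }

    public static void main(String[] args) {
        System.out.println("Longest Repeated Substring: " + longestRepeatedSubstring("banana"));
    }
}
```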

Output:

Longest Repeated Substring: ana

Following the steps above, the Java code determines the longest repeated substring (LRS) of a given string by building a sorted suffix array, computing the Longest Common Prefix (LCP) of each pair of adjacent suffixes, and returning the longest such prefix; the algorithm as listed also enumerates every substring first. For the input "banana" the output is "ana". This variant does not run in O(n log n) time: a string of length n has O(n^2) substrings, and sorting the suffixes with a comparison sort takes O(n log n) comparisons of up to n characters each, i.e. O(n^2 log n) in the worst case. It is adequate for short strings but slow for large ones.

Efficient Method for Solving Longest Repeated Substring:

Algorithm:

  • Create a suffix array of the given string.
  • Generate the LCP array, where LCP[i] represents the length of the longest common prefix between the suffixes starting at positions SA[i] and SA[i+1].
  • Iterate through the LCP array to find the maximum LCP value and the corresponding suffixes.
  • The substring corresponding to the maximum LCP value is the Longest Repeated Substring.

Example:

Step 1: Create the suffix array of the string "banana":

The suffix array is an array of the starting indices of the string's suffixes, arranged in lexicographic order of those suffixes.

The suffix array of "banana" is [5, 3, 1, 0, 4, 2].

Step 2: Generate the LCP array:

The LCP array stores the length of the longest common prefix of each pair of neighboring suffixes in the suffix array.

The LCP array for the string "banana" is [1, 3, 0, 0, 2].

Step 3: Find the Longest Repeated Substring:

After iterating through the LCP array, we find that the maximum value is 3, which is the common prefix length of the neighboring suffixes "ana" and "anana".

Step 4: Return the Longest Repeated Substring: "ana" has a length of three and is the Longest Repeated Substring.

Therefore, the longest repeated substring in the string "banana" is "ana."

Implementation:

In Java:
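A minimal sketch of the four steps, offered as one possible implementation rather than the definitive one. The method names `createSuffixArray` and `generateLCPArray` follow the description in this section; their signatures, the class name `LongestRepeatedSubstring`, and the helper `compareSuffixes` are assumptions. The suffix array is built with a plain comparison sort for readability, and the LCP array is computed with Kasai's algorithm, which runs in linear time once the suffix array exists.

```java
import java.util.Arrays;

// Class name and method signatures are assumptions made for illustration.
final class LongestRepeatedSubstring {

    // Compares the suffixes of s that start at a and b, character by character.
    static int compareSuffixes(String s, int a, int b) {
        int n = s.length();
        while (a < n && b < n) {
            if (s.charAt(a) != s.charAt(b)) {
                return Character.compare(s.charAt(a), s.charAt(b));
            }
            a++;
            b++;
        }
        // One suffix is a prefix of the other; the shorter one sorts first.
        return Integer.compare(b, a);
    }

    // Suffix array via comparison sort: simple, but O(n^2 log n) in the worst case.
    static int[] createSuffixArray(String s) {
        int n = s.length();
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> compareSuffixes(s, a, b));
        int[] sa = new int[n];
        for (int i = 0; i < n; i++) {
            sa[i] = order[i];
        }
        return sa;
    }

    // Kasai's algorithm: lcp[i] is the common prefix length of suffixes sa[i] and sa[i + 1].
    static int[] generateLCPArray(String s, int[] sa) {
        int n = s.length();
        int[] lcp = new int[Math.max(0, n - 1)];
        int[] rank = new int[n];
        for (int i = 0; i < n; i++) {
            rank[sa[i]] = i;
        }
        int h = 0;
        for (int i = 0; i < n; i++) {
            if (rank[i] == n - 1) { // last suffix in sorted order has no successor
                h = 0;
                continue;
            }
            int j = sa[rank[i] + 1];
            while (i + h < n && j + h < n && s.charAt(i + h) == s.charAt(j + h)) {
                h++;
            }
            lcp[rank[i]] = h;
            if (h > 0) {
                h--; // the next suffix shares at least h - 1 characters with its successor
            }
        }
        return lcp;
    }

    // Scans the LCP array for its maximum and returns the matching substring.
    static String longestRepeatedSubstring(String s) {
        int[] sa = createSuffixArray(s);
        int[] lcp = generateLCPArray(s, sa);
        int bestLen = 0;
        int bestStart = 0;
        for (int i = 0; i < lcp.length; i++) {
            if (lcp[i] > bestLen) {
                bestLen = lcp[i];
                bestStart = sa[i];
            }
        }
        return s.substring(bestStart, bestStart + bestLen);
    }

    public static void main(String[] args) {
        System.out.println("Longest Repeated Substring: " + longestRepeatedSubstring("banana"));
    }
}
```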

Output:

Longest Repeated Substring: ana

The Java code locates the longest repeated substring with a suffix array and an LCP array. The createSuffixArray method builds the suffix array, and the generateLCPArray method computes the LCP array for the input string. The longest repeated substring is then found by iterating through the LCP array and taking its largest value, i.e. the longest common prefix between neighboring suffixes. The result is "ana" for the input "banana". The running time depends on how the suffix array is built: sorting the suffixes with a plain comparison sort takes O(n^2 log n) in the worst case, whereas a prefix-doubling construction with radix sort, combined with a linear-time LCP computation such as Kasai's algorithm, gives O(n log n) overall, where n is the length of the input text. With such a construction the approach is suitable for large strings with repeated substrings.
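To illustrate how the suffix array can be built faster than by comparing whole suffixes, here is a hedged sketch of prefix doubling. The class name `PrefixDoublingSuffixArray` and the methods `build` and `comparePairs` are hypothetical. Each round sorts the suffixes by their first 2k characters, using the ranks from the previous round as O(1) comparison keys. Because this sketch uses a comparison sort in every round, it runs in O(n log^2 n); replacing that sort with a radix sort would give O(n log n).

```java
import java.util.Arrays;

// Hypothetical class, named for illustration only.
final class PrefixDoublingSuffixArray {

    // Orders suffixes a and b by (rank of first len chars, rank of the next len chars).
    static int comparePairs(int[] rank, int len, int a, int b) {
        if (rank[a] != rank[b]) {
            return Integer.compare(rank[a], rank[b]);
        }
        int ra = a + len < rank.length ? rank[a + len] : -1; // -1: suffix ends early
        int rb = b + len < rank.length ? rank[b + len] : -1;
        return Integer.compare(ra, rb);
    }

    static int[] build(String s) {
        int n = s.length();
        if (n == 0) {
            return new int[0];
        }
        Integer[] order = new Integer[n];
        int[] rank = new int[n];
        int[] next = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
            rank[i] = s.charAt(i); // round 0: rank by the first character
        }
        for (int len = 1; ; len *= 2) {
            final int[] r = rank;
            final int k = len;
            Arrays.sort(order, (a, b) -> comparePairs(r, k, a, b));
            // Assign new ranks; suffixes with equal key pairs keep equal ranks.
            next[order[0]] = 0;
            for (int i = 1; i < n; i++) {
                boolean differs = comparePairs(r, k, order[i - 1], order[i]) < 0;
                next[order[i]] = next[order[i - 1]] + (differs ? 1 : 0);
            }
            int[] tmp = rank;
            rank = next;
            next = tmp;
            if (rank[order[n - 1]] == n - 1) {
                break; // all ranks are distinct, so the order is final
            }
        }
        int[] sa = new int[n];
        for (int i = 0; i < n; i++) {
            sa[i] = order[i];
        }
        return sa;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build("banana"))); // prints [5, 3, 1, 0, 4, 2]
    }
}
```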

