Kasai's algorithm for construction of LCP array from Suffix array:

Suffix array:

A suffix array holds all the suffixes of a string in sorted order. The concept is closely related to the suffix tree, which is a compressed trie of all the suffixes of the text.

The suffix array is a fundamental data structure used by many string algorithms. It lists all suffixes of a given string in lexicographic order, normally stored as their starting indices rather than as copies of the suffixes. Simple constructions based on sorting with prefix doubling take O(n log n) time, where n is the length of the input text; linear-time constructions (for example SA-IS) also exist.
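To make the definition concrete, here is a minimal Python sketch that builds a suffix array by sorting the suffixes directly. The function name build_suffix_array is an assumption made for illustration, and the approach is deliberately naive: it is much slower than the O(n log n) bound quoted above and is only meant to show what the array contains.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the starting indices of all suffixes of `text` in sorted order.

    Naive illustration only: every comparison may scan up to n characters,
    so this is far slower than a real suffix array construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


print(build_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

The result lists the suffixes "a", "ana", "anana", "banana", "na", "nana" by their starting positions.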

LCP Array:

The Suffix Array of a given string S is accompanied by an array known as the LCP array (Longest Common Prefix array). It stores the lengths of the longest common prefixes between consecutive suffixes in sorted order.

Given a string S of length n, the Suffix Array is an array of integers sa[0..n-1], where sa[i] is the starting index of the suffix at position i when the suffixes of S are sorted in lexicographical order.

The LCP array is an array of integers lcp[0..n-1], the same size as the Suffix Array. The value lcp[i] is the length of the longest common prefix of the suffixes starting at indices sa[i] and sa[i+1]. Since no suffix follows the last one in sorted order, lcp[n-1] is not defined; this article stores 0 there.
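The definition of the LCP array can be turned into code almost word for word. The sketch below is a brute-force version (not Kasai's algorithm) that compares each pair of neighbouring suffixes from scratch; the name naive_lcp_array and the choice to leave the last entry at 0 are assumptions made for illustration.

```python
def naive_lcp_array(text: str, sa: list[int]) -> list[int]:
    """Compute lcp[i] = common prefix length of the suffixes at sa[i] and sa[i+1].

    Brute force: O(n^2) character comparisons in the worst case.
    """
    n = len(text)
    lcp = [0] * n  # lcp[n-1] has no successor; it is left at 0 (assumed convention)
    for i in range(n - 1):
        a, b = sa[i], sa[i + 1]
        k = 0
        # Extend the match one character at a time until a mismatch or end of text.
        while a + k < n and b + k < n and text[a + k] == text[b + k]:
            k += 1
        lcp[i] = k
    return lcp


print(naive_lcp_array("banana", [5, 3, 1, 0, 4, 2]))  # [1, 3, 0, 0, 2, 0]
```

This version is useful as a reference for checking a faster implementation on small inputs.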

Kasai Algorithm:

"Kasai's Algorithm" addresses the computation of longest common prefixes (LCP), a basic task in string processing and string algorithms. The task is related to, but distinct from, the longest common substring problem.

In its general form, the Longest Common Prefix (LCP) problem asks for the longest prefix shared by a group of strings. For instance, the longest common prefix of the strings "apple", "appetizer", "append", and "appreciate" is "app".
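As a small illustration of the general problem, the following Python sketch finds the longest common prefix of a list of strings by shrinking a candidate prefix until every word starts with it. The function name longest_common_prefix is an assumption made for this example.

```python
def longest_common_prefix(words: list[str]) -> str:
    """Return the longest prefix shared by every string in `words`."""
    if not words:
        return ""
    prefix = words[0]
    for word in words[1:]:
        # Drop trailing characters from the candidate until `word` starts with it.
        # The loop always ends, because every string starts with the empty prefix.
        while not word.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


print(longest_common_prefix(["apple", "appetizer", "append", "appreciate"]))  # app
```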

The "Kasai Algorithm" is the standard way to compute these values for the suffixes of a single text. Given the text and its suffix array, it builds the LCP array in linear time. It was introduced by Toru Kasai et al. in their 2001 paper titled "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications."

Algorithm:

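  • For the given string, create the suffix array sa.
  • Build the inverse suffix array inv, where inv[sa[r]] = r, so that inv[i] is the position in sorted order of the suffix starting at index i.
  • Visit the suffixes in text order (i = 0, 1, ..., n-1). For each one, look up the suffix that follows it in sorted order, sa[inv[i]+1], and extend the current match length k by comparing characters.
  • Store k in lcp[inv[i]], then decrease k by one (if it is positive) before moving on. The common prefix for the next suffix is at least k-1 long, so matched characters are never compared again, which gives O(n) total time.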
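A minimal Python sketch of Kasai's linear-time procedure follows, assuming the suffix array has already been built. The function name kasai_lcp and the convention of storing 0 in the last entry are assumptions made for illustration, not part of the original paper's notation.

```python
def kasai_lcp(text: str, sa: list[int]) -> list[int]:
    """Build the LCP array from `text` and its suffix array `sa` in O(n) time.

    lcp[r] is the common prefix length of the suffixes at sa[r] and sa[r+1];
    the last entry has no successor and is left at 0 (assumed convention).
    """
    n = len(text)

    # inv[i] = position of the suffix starting at i within the suffix array.
    inv = [0] * n
    for r, start in enumerate(sa):
        inv[start] = r

    lcp = [0] * n
    k = 0  # match length carried over from the previous text position
    for i in range(n):
        if inv[i] == n - 1:
            # Last suffix in sorted order: nothing to compare against.
            k = 0
            continue
        j = sa[inv[i] + 1]  # suffix that follows suffix i in sorted order
        # The first k characters are already known to match; extend from there.
        while i + k < n and j + k < n and text[i + k] == text[j + k]:
            k += 1
        lcp[inv[i]] = k
        if k > 0:
            k -= 1  # the next suffix drops one leading character
    return lcp


print(kasai_lcp("banana", [5, 3, 1, 0, 4, 2]))  # [1, 3, 0, 0, 2, 0]
```

The linear bound comes from k: it increases by at most n in total and decreases by at most one per iteration, so the inner loop runs O(n) times overall.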

Example:

Consider the string S = "banana".

  • For S, create the Suffix Array.

The string S = "banana" has the following Suffix Array:

Suffix Array (sa):

[5, 3, 1, 0, 4, 2]

  • Build the inverse suffix array.

The inverse suffix array inv holds the position of each suffix within the Suffix Array, so that inv[sa[i]] = i. For "banana", inv is:

Inverse Suffix Array (inv):

[3, 2, 5, 1, 4, 0]
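The inverse array can be filled in a single pass over the suffix array. A short Python sketch, with the function name inverse_suffix_array chosen here for illustration:

```python
def inverse_suffix_array(sa: list[int]) -> list[int]:
    """Return inv such that inv[sa[rank]] == rank for every rank."""
    inv = [0] * len(sa)
    for rank, start in enumerate(sa):
        inv[start] = rank  # the suffix starting at `start` is at position `rank`
    return inv


print(inverse_suffix_array([5, 3, 1, 0, 4, 2]))  # [3, 2, 5, 1, 4, 0]
```

For example, inv[0] = 3 because the suffix "banana" (starting at index 0) is the fourth suffix in sorted order.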

  • Initialize the LCP array lcp with all elements set to 0.

LCP Array (lcp):

[0, 0, 0, 0, 0, 0]

  • Apply Kasai's algorithm to compute the LCP array.

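The algorithm visits the suffixes in text order, i = 0 to 5, and carries the match length k from one step to the next. At each step the suffix starting at index i is compared with the suffix that follows it in sorted order, which starts at sa[inv[i] + 1], and the result is stored in lcp[inv[i]].

For i = 0:

inv[0] = 3, the next suffix starts at sa[4] = 4, k = 0.

Compare suffixes "banana" (starting at index 0) and "na" (starting at index 4). No common prefix was found.

Update lcp[3] = 0. k stays 0.

For i = 1:

inv[1] = 2, the next suffix starts at sa[3] = 0, k = 0.

Compare suffixes "anana" (starting at index 1) and "banana" (starting at index 0). No common prefix was found.

Update lcp[2] = 0. k stays 0.

For i = 2:

inv[2] = 5, k = 0.

The suffix "nana" (starting at index 2) is last in sorted order, so there is no following suffix to compare it with.

lcp[5] stays 0 and k is reset to 0.

For i = 3:

inv[3] = 1, the next suffix starts at sa[2] = 1, k = 0.

Compare suffixes "ana" (starting at index 3) and "anana" (starting at index 1). A common prefix of length 3 was found ("ana").

Update lcp[1] = 3, then decrease k to 2.

For i = 4:

inv[4] = 4, the next suffix starts at sa[5] = 2, k = 2.

Compare suffixes "na" (starting at index 4) and "nana" (starting at index 2). The first 2 characters are already known to match, and "na" ends there, so the common prefix has length 2 ("na").

Update lcp[4] = 2, then decrease k to 1.

For i = 5:

inv[5] = 0, the next suffix starts at sa[1] = 3, k = 1.

Compare suffixes "a" (starting at index 5) and "ana" (starting at index 3). The first character is already known to match, and "a" ends there, so the common prefix has length 1 ("a").

Update lcp[0] = 1, then decrease k to 0.

The final LCP array is [1, 3, 0, 0, 2, 0].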






