Suffix Array nLogn Algorithm:

A suffix array is an array of all the suffixes of a string, arranged in sorted order. The concept is comparable to the suffix tree, which is a compressed trie of all the text's suffixes.

The suffix array is a fundamental data structure used by numerous string algorithms. It lists all suffixes of a given string in lexicographic order, usually stored as their starting indices. Practical constructions typically take O(n log n) time, where n is the length of the input text, and linear-time algorithms also exist.

Construction of Suffix Array using nlogn Algorithm:

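Using the brute-force approach, which sorts the suffixes with an ordinary comparison sort, the time complexity is O(n^2 log n): the sort makes O(n log n) comparisons, and each comparison of two suffixes can take O(n) time.

An optimized approach reduces this to O(n log n) time.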
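The brute-force idea can be sketched in C as follows. This is a minimal illustration, not a tuned implementation; the names naive_suffix_array, cmp_suffix_start and g_text are hypothetical and chosen only for this example. It assumes the text is a NUL-terminated C string.

```c
#include <stdlib.h>
#include <string.h>

/* Text whose suffixes are being compared. qsort has no context argument,
   so the comparator reads the text from this file-scope pointer. */
static const char *g_text;

/* Compare two suffixes, given by their starting indices, with strcmp.
   Each call can inspect up to O(n) characters. */
static int cmp_suffix_start(const void *a, const void *b) {
    int i = *(const int *)a;
    int j = *(const int *)b;
    return strcmp(g_text + i, g_text + j);
}

/* Fill sa[0..n-1] with the starting indices of the suffixes of text
   in lexicographic order. */
void naive_suffix_array(const char *text, int *sa, int n) {
    for (int i = 0; i < n; i++)
        sa[i] = i;
    g_text = text;
    qsort(sa, (size_t)n, sizeof sa[0], cmp_suffix_start);
}
```

Calling naive_suffix_array("banana$", sa, 7) with an int sa[7] fills sa with the sorted starting indices. The file-scope pointer keeps the sketch short but makes the function non-reentrant.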

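The "Skew Algorithm", also known as DC3 (Difference Cover modulo 3), is a well-known method for building the suffix array. It runs in O(n) time, which is within the O(n log n) bound. Here is a general, simplified description that also borrows the type S and type L classification used by induced sorting: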

1. Preprocessing:

Create an array of integers from the given string, each one representing a character. Typically, this is accomplished by mapping each character to an integer value, such as its ASCII code.
Add a special sentinel character to the string's end. It must be smaller than every other character in the string. You could use '\0' (the null character), for instance, or another character with a very low ASCII value.

2. Construct Initial Suffix Array:

Sort the suffixes by a fixed-length prefix (DC3 uses the first three characters) using radix sort, which takes O(n) time per pass.

3. Induced Sorting:

Classify each suffix as type S (smaller than the suffix that follows it) or type L (larger than the suffix that follows it). As you iterate through the partially sorted suffix array, the position of each suffix of one type is induced from the already sorted suffixes of the other type.
Each such pass takes O(n) time.

4. Merge Step:

To create the final suffix array, combine the two sorted arrays that were obtained during the induced sorting stage.

Although it can be difficult to implement the skew method from scratch, there are open-source tools and implementations that you can utilize. SuffixArray (C++), SuffixArray.jl (Julia), and pysuffixarray (Python) are a few well-known libraries.

The SA-IS (Suffix Array Induced Sorting) approach is another algorithm for building a suffix array. Like the skew algorithm, it runs in O(n) time, which is within the O(n log n) bound; the skew algorithm is frequently favored for its relative simplicity.

Example:

Let's look at an example to better grasp how a suffix array for a given string is created. Take the word "banana" as an example. To indicate where the string ends, we append the special character "$" (which is smaller than every other character), giving "banana$". The steps outlined above are applied to this string as follows:

Step 1: Preprocessing

Using their ASCII values, the characters in the string "banana$" can be represented as integers:

b -> 98
a -> 97
n -> 110
a -> 97
n -> 110
a -> 97
$ -> 36
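The mapping above can be sketched in C as shown below. The function name to_int_text and its parameters are hypothetical. The sketch assumes the caller picks a sentinel smaller than every character in the text; it does not check this.

```c
#include <stdlib.h>
#include <string.h>

/* Convert text to an array of integer character codes and append the
   code of the sentinel character. Returns a malloc'd array holding
   strlen(text) + 1 ints (the caller frees it), or NULL if allocation
   fails. The array length is written to *out_len.
   ASSUMPTION: sentinel is smaller than every character in text. */
int *to_int_text(const char *text, char sentinel, int *out_len) {
    size_t n = strlen(text);
    int *codes = malloc((n + 1) * sizeof *codes);
    if (codes == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        codes[i] = (unsigned char)text[i];   /* character code, e.g. ASCII */
    codes[n] = (unsigned char)sentinel;      /* end-of-string marker */
    *out_len = (int)(n + 1);
    return codes;
}
```

On an ASCII system, to_int_text("banana", '$', &len) yields the seven codes listed above. Casting through unsigned char keeps the codes non-negative even where plain char is signed.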

Step 2: Create the initial suffix array

Using radix sort, order the string's suffixes. The suffixes are listed below, along with their initial positions:

Suffixes:

98 97 110 97 110 97 36 (Starting from index 0)
97 110 97 110 97 36 (Starting from index 1)
110 97 110 97 36 (Starting from index 2)
97 110 97 36 (Starting from index 3)
110 97 36 (Starting from index 4)
97 36 (Starting from index 5)
36 (Starting from index 6)

The suffixes are rearranged after sorting as follows:

36 (Starting from index 6)
97 36 (Starting from index 5)
97 110 97 36 (Starting from index 3)
97 110 97 110 97 36 (Starting from index 1)
98 97 110 97 110 97 36 (Starting from index 0)
110 97 36 (Starting from index 4)
110 97 110 97 36 (Starting from index 2)

Step 3: Induced Sorting

The suffixes are sorted according to their types (S and L). We distinguish between S-type and L-type positions in this stage. A position is S-type if the suffix starting there is smaller than the suffix starting at the next position, and L-type if it is larger. In practice this is decided by comparing the character with the one after it; if the two are equal, the position takes the type of the next position. By convention, the special character '$' is regarded as S-type.

36 (at index 6) and 97 (at index 1, 3) are S-type (S) characters.

98 (at index 0), 110 (at index 2, 4) and 97 (at index 5) are L-type (L) characters. The 97 at index 5 is L-type because it is followed by '$' (36), which is smaller.
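The classification can be computed in a single right-to-left scan. The following C sketch uses the standard suffix-based definition, under which the "a" at index 5 of "banana$" is L-type because it is followed by the smaller '$'. The function name classify_types is hypothetical, and the sketch assumes the last character of the text is the sentinel.

```c
/* Write 'S' or 'L' for every position of text[0..n-1] into types,
   followed by a terminating NUL, so types needs room for n + 1 chars.
   ASSUMPTION: text[n-1] is the sentinel, which is S-type by convention. */
void classify_types(const char *text, int n, char *types) {
    if (n <= 0) {
        types[0] = '\0';
        return;
    }
    types[n] = '\0';
    types[n - 1] = 'S';
    for (int i = n - 2; i >= 0; i--) {
        unsigned char cur = (unsigned char)text[i];
        unsigned char next = (unsigned char)text[i + 1];
        if (cur < next)
            types[i] = 'S';            /* suffix i is smaller than suffix i+1 */
        else if (cur > next)
            types[i] = 'L';            /* suffix i is larger than suffix i+1 */
        else
            types[i] = types[i + 1];   /* equal characters: inherit the type */
    }
}
```

For "banana$" the scan produces the type string "LSLSLLS", one letter per index from 0 to 6.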

We recursively sort the S-type and L-type suffixes beginning at the end. The following are the ordered suffixes as a result:

Suffixes:

36 (Starting from index 6)

97 36 (Starting from index 5)

97 110 97 36 (Starting from index 3)

97 110 97 110 97 36 (Starting from index 1)

98 97 110 97 110 97 36 (Starting from index 0)

110 97 36 (Starting from index 4)

110 97 110 97 36 (Starting from index 2)

Step 4: Merge Step

The two sorted arrays resulting from the induced sorting are combined in the last phase while accounting for the suffixes' initial positions.

Merged Suffixes:

6 (Starting from index 6)

5 6 (Starting from index 5)

3 4 5 6 (Starting from index 3)

1 2 3 4 5 6 (Starting from index 1)

0 1 2 3 4 5 6 (Starting from index 0)

4 5 6 (Starting from index 4)

2 3 4 5 6 (Starting from index 2)

The final suffix array of the string "banana$" is [6, 5, 3, 1, 0, 4, 2]. These numbers are the starting positions of the sorted suffixes in the original string. Notably, the suffix consisting only of the special character '$', which marks the string's end, is the smallest suffix and therefore comes first in the sorted array.
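A hand-computed suffix array is easy to get wrong, so it can help to check it mechanically. The C sketch below is one way to do that: it verifies that the array is a permutation of the positions and that neighbouring suffixes are in strictly increasing order. The function name is_valid_suffix_array is hypothetical, and the text is assumed to be a NUL-terminated string.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if sa[0..n-1] is a permutation of 0..n-1 whose suffixes of
   text appear in strictly increasing lexicographic order, else 0.
   Also returns 0 if the scratch buffer cannot be allocated. */
int is_valid_suffix_array(const char *text, const int *sa, int n) {
    char *seen = calloc((size_t)n + 1, 1);
    if (seen == NULL)
        return 0;
    int ok = 1;
    for (int i = 0; i < n && ok; i++) {
        if (sa[i] < 0 || sa[i] >= n || seen[sa[i]])
            ok = 0;                    /* out of range or duplicate index */
        else
            seen[sa[i]] = 1;
    }
    for (int i = 1; i < n && ok; i++) {
        if (strcmp(text + sa[i - 1], text + sa[i]) >= 0)
            ok = 0;                    /* neighbouring suffixes out of order */
    }
    free(seen);
    return ok;
}
```

For "banana$", the array [6, 5, 3, 1, 0, 4, 2] passes this check, because "banana$" sorts before "na$" and "nana$": 'b' (98) is smaller than 'n' (110).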

This is a simple illustration of how a suffix array for a given string is created. In actual use, the technique is effective even for very long strings.

Implementation of nlogn Algorithm:

A complete implementation of SA-IS or the skew algorithm in C is fairly long. The condensed C program below instead uses the simpler prefix-doubling technique: it sorts the suffixes by their first 2 characters, then by their first 4, 8, and so on, reusing the ranks from the previous round. Although this implementation is not as efficient as optimized libraries, it shows the overall structure of a suffix array construction.

C Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Structure to store information about suffixes
struct Suffix {
    int index; // Starting index of the suffix
    int rank[2];  // Rank and next rank (to be used in sorting)
};
// Comparator for qsort: order by current rank, then by the rank of the next half
int compare_suffixes(const void* a, const void* b) {
    const struct Suffix* sa = (const struct Suffix*)a;
    const struct Suffix* sb = (const struct Suffix*)b;
    return (sa->rank[0] == sb->rank[0]) ?
        (sa->rank[1] - sb->rank[1]) : (sa->rank[0] - sb->rank[0]);
}

// Function to construct the suffix array by prefix doubling
void build_suffix_array(const char* str, int* suffix_array, int size) {
    // Create an array of Suffix structures to store information about suffixes
    struct Suffix* suffixes = (struct Suffix*)malloc(sizeof(struct Suffix) * size);
    // index_array[p] = current position in suffixes[] of the suffix starting at p
    int* index_array = (int*)malloc(sizeof(int) * size);

    // Initial ranks come from the first two characters of each suffix
    for (int i = 0; i < size; i++) {
        suffixes[i].index = i;
        suffixes[i].rank[0] = (unsigned char)str[i];
        suffixes[i].rank[1] = (i + 1 < size) ? (unsigned char)str[i + 1] : -1;
    }

    // Sort the suffixes by their first two characters
    qsort(suffixes, size, sizeof(struct Suffix), compare_suffixes);

    // Each pass sorts the suffixes by their first k characters (k = 4, 8, ...)
    for (int k = 4; k < 2 * size; k *= 2) {
        // Assign new ranks: equal (rank, next rank) pairs share a rank
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        index_array[suffixes[0].index] = 0;

        for (int i = 1; i < size; i++) {
            if (suffixes[i].rank[0] == prev_rank &&
                suffixes[i].rank[1] == suffixes[i - 1].rank[1]) {
                suffixes[i].rank[0] = rank;
            } else {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            index_array[suffixes[i].index] = i;
        }

        // The next rank is the new rank of the suffix k/2 positions further on
        for (int i = 0; i < size; i++) {
            int next_index = suffixes[i].index + k / 2;
            suffixes[i].rank[1] = (next_index < size) ? suffixes[index_array[next_index]].rank[0] : -1;
        }

        // Re-sort the suffixes by the updated rank pairs
        qsort(suffixes, size, sizeof(struct Suffix), compare_suffixes);
    }

    // Store the sorted suffix indexes in the suffix array
    for (int i = 0; i < size; i++) {
        suffix_array[i] = suffixes[i].index;
    }

    free(suffixes);
    free(index_array);
}

// Function to print the suffix array
void print_suffix_array(const char* str, const int* suffix_array, int size) {
    printf("Suffix Array for the string \"%s\":\n", str);
    for (int i = 0; i < size; i++) {
        printf("%d: %s\n", suffix_array[i], &str[suffix_array[i]]);
    }
}

int main() {
    const char* str = "banana$";
    int size = strlen(str);

    int* suffix_array = (int*)malloc(sizeof(int) * size);

    build_suffix_array(str, suffix_array, size);
    print_suffix_array(str, suffix_array, size);

    free(suffix_array);
    return 0;
}

Output:

Suffix Array for the string "banana$":
6: $
5: a$
3: ana$
1: anana$
0: banana$
4: na$
2: nana$

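Time Complexity: O(n log^2 n) as written, because each of the O(log n) doubling rounds performs an O(n log n) comparison sort. Replacing the comparison sort with a radix sort on the rank pairs brings the total down to O(n log n).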
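To reach O(n log n), each doubling round has to be sorted in linear time. The C sketch below is one common way to do this: it uses counting sort on equivalence classes of cyclic shifts. It is a separate illustration, not a drop-in change to the program above. The name suffix_array_doubling is hypothetical, and the sketch assumes the text ends with a sentinel that is strictly smaller than every other character and occurs only once (as "$" does in "banana$").

```c
#include <stdlib.h>
#include <string.h>

#define ALPHABET 256

/* Build the suffix array of text[0..n-1] by prefix doubling with
   counting sort. ASSUMPTION: text[n-1] is a unique, smallest sentinel,
   so sorting cyclic shifts is the same as sorting suffixes.
   Returns 0 on success, -1 if memory allocation fails. */
int suffix_array_doubling(const char *text, int n, int *sa) {
    if (n <= 0)
        return 0;
    int buckets = (n > ALPHABET) ? n : ALPHABET;
    int *cls = malloc((size_t)n * sizeof *cls);       /* class of each start */
    int *tmp_cls = malloc((size_t)n * sizeof *tmp_cls);
    int *tmp_sa = malloc((size_t)n * sizeof *tmp_sa);
    int *cnt = malloc((size_t)buckets * sizeof *cnt);
    if (!cls || !tmp_cls || !tmp_sa || !cnt) {
        free(cls); free(tmp_cls); free(tmp_sa); free(cnt);
        return -1;
    }

    /* Round 0: counting sort by the first character. */
    memset(cnt, 0, ALPHABET * sizeof *cnt);
    for (int i = 0; i < n; i++) cnt[(unsigned char)text[i]]++;
    for (int c = 1; c < ALPHABET; c++) cnt[c] += cnt[c - 1];
    for (int i = n - 1; i >= 0; i--) sa[--cnt[(unsigned char)text[i]]] = i;
    int classes = 1;
    cls[sa[0]] = 0;
    for (int i = 1; i < n; i++) {
        if (text[sa[i]] != text[sa[i - 1]]) classes++;
        cls[sa[i]] = classes - 1;
    }

    /* Each round doubles the sorted prefix length from h to 2h. */
    for (int h = 1; h < n; h *= 2) {
        /* Stepping every start back by h orders the shifts by second half. */
        for (int i = 0; i < n; i++) {
            tmp_sa[i] = sa[i] - h;
            if (tmp_sa[i] < 0) tmp_sa[i] += n;
        }
        /* A stable counting sort by first-half class completes the round. */
        memset(cnt, 0, (size_t)classes * sizeof *cnt);
        for (int i = 0; i < n; i++) cnt[cls[tmp_sa[i]]]++;
        for (int c = 1; c < classes; c++) cnt[c] += cnt[c - 1];
        for (int i = n - 1; i >= 0; i--) sa[--cnt[cls[tmp_sa[i]]]] = tmp_sa[i];

        /* Recompute classes from (first half, second half) pairs. */
        classes = 1;
        tmp_cls[sa[0]] = 0;
        for (int i = 1; i < n; i++) {
            int same = cls[sa[i]] == cls[sa[i - 1]] &&
                       cls[(sa[i] + h) % n] == cls[(sa[i - 1] + h) % n];
            if (!same) classes++;
            tmp_cls[sa[i]] = classes - 1;
        }
        int *swap = cls; cls = tmp_cls; tmp_cls = swap;
        if (classes == n) break;       /* all suffixes already distinct */
    }

    free(cls); free(tmp_cls); free(tmp_sa); free(cnt);
    return 0;
}
```

Each round costs O(n) and there are at most O(log n) rounds. The early exit once every suffix has its own class is an optional shortcut and does not change the result.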
