Build a Linear-Time Suffix Array

Introduction

Suffix arrays are central to string processing because they give efficient solutions to many string problems, such as pattern matching and finding repeated substrings. To be useful on large inputs, the suffix array itself must be built efficiently. SA-IS (Suffix Array Induced Sorting) is a well-known algorithm that builds a suffix array in linear time. This post explains how a suffix array is constructed and walks through an implementation in the C programming language.

A suffix array is a data structure that lists the starting positions of all suffixes of a given string in lexicographical order. Building it therefore means sorting the suffixes. Doing this with a general-purpose comparison sort such as quicksort or mergesort can take O(n^2 * log n) time, because the sort performs O(n log n) comparisons and each comparison of two suffixes may scan up to n characters. Specialized algorithms like SA-IS avoid this cost and achieve linear time complexity, which makes them well suited to large datasets.

The Algorithm for SA-IS

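The SA-IS algorithm, created by Ge Nong, Sen Zhang, and Wai Hong Chan, is well known for its linear time complexity. It is based on induced sorting. Each suffix is classified as S-type (smaller than the suffix starting one position to its right) or L-type (larger). The leftmost S-type (LMS) suffixes are sorted first, recursing on a shorter string when their order is not yet unique, and the order of all remaining suffixes is then induced from the sorted LMS suffixes.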

Code

Output:


Code Explanation

Header Files

  • The code includes the header files stdio.h for standard input/output operations, stdlib.h for dynamic memory allocation and qsort, and string.h for string manipulation.

Suffix Structure

  • Each suffix is represented by a struct Suffix with two members: index, the suffix's starting position in the original string, and rank, an array of two integers used as the sort key.

Comparison Function

  • The compareSuffixes function is the comparison function passed to qsort to order the suffixes lexicographically. It compares the rank arrays of two suffixes: the first ranks decide the order, and the second ranks break ties.

Create the Suffix Array Function

  • The buildSuffixArray function takes the input string text and its length n as parameters.
  • An array of Suffix structures is allocated and initialized, one entry per suffix of the input string.
  • The suffixes are sorted by their ranks using the qsort function.
  • Sorting and rank updates are repeated until the final suffix array is obtained. This sort-and-rerank loop is the prefix-doubling technique; because every round calls qsort, it runs in roughly O(n log^2 n) time rather than the linear time of SA-IS.

Initialization of Rank

  • The starting ranks are taken from the characters of the input string and stored in the rank array of each Suffix structure.

Sorting and Ranking

  • The suffixes are first sorted using the qsort function. The lexicographical order of the results is then used to update the ranking.

Updating Ranks

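  • A loop walks through the sorted suffixes and assigns new ranks based on their position in the sorted order: a suffix keeps the same rank as its predecessor when their rank pairs are equal, and gets the next higher rank otherwise. The second rank of each suffix is then taken from the suffix that starts further along in the string.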

Second Sorting Phase

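  • After the ranks are updated, the suffixes are sorted again by their new rank pairs. This re-sort is what lets each round order the suffixes by a longer prefix than the round before. Note that a qsort-based round costs about O(n log n), so this stage does not by itself give linear time complexity.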

Suffix Array Output

  • The final suffix array is printed to the console. It lists the starting positions of the input string's suffixes in lexicographical order.

Main Function

  • The example string "banana" is used in the main function, and strlen is used to determine its length.
  • The input string and its length are passed to the buildSuffixArray function.

Printed Output

The final result printed to the console is the suffix array. For "banana" the suffix array is 5 3 1 0 4 2, which corresponds to the suffixes a, ana, anana, banana, na, and nana.





