Trie of all Suffixes in C++

In this article, you will learn about the trie of all suffixes in C++ with its history, implementation, applications, advantages, and disadvantages.

What is a Trie in C++?

A trie, also called a prefix tree, is a tree-like data structure used to store and search a dynamic set of strings efficiently. It appears in many advanced algorithms and underlies various technologies, especially those that work with text. Every node below the root corresponds to one character, and the path from the root to a node spells a prefix of one or more stored strings. Nodes at which a stored string ends are marked, so a string can end at an internal node as well as at a leaf (for example, "car" ends partway along the path of "cart"). Tries are especially helpful for autocomplete suggestions and spell checking over large vocabularies. In short, a trie is a powerful way to manage and retrieve information from a dynamic collection of words.

History of Trie:

René de la Briandais first described the trie for computer use in 1959, and Edward Fredkin coined the name "trie" in 1960, taking it from the middle letters of "retrieval" because of its use in information retrieval systems. The basic idea, however, is older than its use in computing. A trie makes it efficient to store and manipulate large sets of strings, such as the words of a dictionary. For this reason, the concept is widely used in computer science and programming for string-related tasks.

Example 1:

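Let us take an example that builds a trie of all suffixes (a suffix trie) in C++. For a text of length n, the suffix trie is obtained by inserting each of the n suffixes that start at positions 0, 1, ..., n - 1 into an ordinary trie; for "banana", these are "banana", "anana", "nana", "ana", "na", and "a".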

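#include <iostream>
#include <string>
#include <unordered_map>

using namespace std;

// TrieNode class
class TrieNode {
public:
    unordered_map<char, TrieNode*> children;
    bool isEndOfWord; // True if a suffix of the text ends at this node

    TrieNode() : isEndOfWord(false) {}

    // Each node owns its children, so deleting a node frees its whole subtree
    ~TrieNode() {
        for (auto& entry : children) {
            delete entry.second;
        }
    }

    // Copying would make two nodes own the same children, so it is disabled
    TrieNode(const TrieNode&) = delete;
    TrieNode& operator=(const TrieNode&) = delete;
};

// Trie class
class SuffixTrie {
private:
    TrieNode* root;

    // Helper function to insert a string into the trie
    void insertSuffix(TrieNode* node, const string& suffix) {
        for (char ch : suffix) {
            if (node->children.find(ch) == node->children.end()) {
                node->children[ch] = new TrieNode();
            }
            node = node->children[ch];
        }
        node->isEndOfWord = true;
    }

public:
    // Constructor
    SuffixTrie() : root(new TrieNode()) {}

    // Destructor: deleting the root frees every node in the trie
    ~SuffixTrie() { delete root; }

    // Copying is disabled for the same reason as in TrieNode
    SuffixTrie(const SuffixTrie&) = delete;
    SuffixTrie& operator=(const SuffixTrie&) = delete;

    // Function to build the suffix trie
    void buildSuffixTrie(const string& text) {
        for (size_t i = 0; i < text.length(); ++i) {
            insertSuffix(root, text.substr(i)); // Suffix starting at position i
        }
    }

    // Function to search for a suffix in the trie
    bool searchSuffix(const string& suffix) const {
        TrieNode* node = root;
        for (char ch : suffix) {
            auto it = node->children.find(ch);
            if (it == node->children.end()) {
                return false; // Suffix not found
            }
            node = it->second;
        }
        return node->isEndOfWord; // True only if a suffix ends exactly here
    }
};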

int main() {
    // Example usage
    string inputText = "banana";
    SuffixTrie suffixTrie;
    suffixTrie.buildSuffixTrie(inputText);

    // Search for suffixes
    cout << "Searching for 'ana' suffix: " << (suffixTrie.searchSuffix("ana") ? "Found" : "Not Found") << endl;
    cout << "Searching for 'xyz' suffix: " << (suffixTrie.searchSuffix("xyz") ? "Found" : "Not Found") << endl;

    return 0;
}

Output:

Searching for 'ana' suffix: Found
Searching for 'xyz' suffix: Not Found

Explanation:

TrieNode Class:
- It represents a node in the trie.
- children: An unordered map that stores child nodes for each character.
- isEndOfWord: It indicates whether a suffix of the text ends at the current node.
SuffixTrie Class:
- It manages the trie structure and related functions.
- root: The root node of the trie.
insertSuffix Function:
- A helper function to insert a suffix into the trie.
- It iterates through the characters of the suffix, and it creates nodes as needed.
buildSuffixTrie Function:
- It builds the suffix trie for a given input text.
- It calls insertSuffix for each suffix starting at different positions in the text.
searchSuffix Function:
- It searches for a specific suffix in the trie.
- It returns true only if the whole string can be followed from the root and ends at a node marked isEndOfWord; otherwise, it returns false. For example, "nan" occurs in "banana" but is not a suffix, so the search returns false.
main Function:
- Example usage of the SuffixTrie class.
- It creates a suffix trie for the input text "banana".
- It searches for the suffixes "ana" and "xyz" and prints whether they are found.

Example 2:

Let us take another example that extends the suffix trie with a counter in each node, so that it can report how many times a pattern occurs in the text.

#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>

using namespace std;

// TrieNode class with additional information
class TrieNode {
public:
    unordered_map<char, shared_ptr<TrieNode>> children;
    bool isEndOfWord;
    // Number of suffixes that pass through this node, which equals the number of
    // times the string spelled from the root to this node occurs in the text
    int occurrences;

    TrieNode() : isEndOfWord(false), occurrences(0) {}
};

// Trie class with advanced features
class SuffixTrie {
private:
    shared_ptr<TrieNode> root;

    // Helper function to insert a string into the trie
    void insertSuffix(shared_ptr<TrieNode> node, const string& suffix) {
        for (char ch : suffix) {
            if (node->children.find(ch) == node->children.end()) {
                node->children[ch] = make_shared<TrieNode>();
            }
            node = node->children[ch];
            node->occurrences++; // One more suffix passes through this node
        }
        node->isEndOfWord = true;
    }

public:
    // Constructor
    SuffixTrie() : root(make_shared<TrieNode>()) {}

    // Function to build the suffix trie
    void buildSuffixTrie(const string& text) {
        for (size_t i = 0; i < text.length(); ++i) {
            insertSuffix(root, text.substr(i)); // Suffix starting at position i
        }
    }

    // Function to count how many times a pattern occurs as a substring of the text
    int countOccurrences(const string& pattern) const {
        shared_ptr<TrieNode> node = root;
        for (char ch : pattern) {
            auto it = node->children.find(ch);
            if (it == node->children.end()) {
                return 0; // Pattern does not occur in the text
            }
            node = it->second;
        }
        return node->occurrences;
    }
};

int main() {
    // Example usage
    string inputText = "banana";
    SuffixTrie suffixTrie;
    suffixTrie.buildSuffixTrie(inputText);

    // Count occurrences of patterns in the text
    cout << "Occurrences of 'ana': " << suffixTrie.countOccurrences("ana") << endl;
    cout << "Occurrences of 'xyz': " << suffixTrie.countOccurrences("xyz") << endl;

    return 0;
}

Output:

Occurrences of 'ana': 2
Occurrences of 'xyz': 0

Explanation:

TrieNode Class:
- It represents a node in the trie.
- children: An unordered map storing child nodes for each character.
- isEndOfWord: It indicates whether the current node marks the end of a word.
- occurrences: It counts how many suffixes pass through the node, which equals the number of times the string spelled from the root to that node occurs in the text.
SuffixTrie Class:
- It manages the trie structure and related functions.
- root: It is the root node of the trie, stored as a smart pointer (shared_ptr).
insertSuffix Function:
- A helper function to insert a suffix into the trie.
- It iterates through the characters of the suffix, and it creates nodes as needed.
- It increments the occurrences counter of every node on the path, so each node counts the suffixes that pass through it.
buildSuffixTrie Function:
- It builds the suffix trie for a given input text.
- It calls insertSuffix for each suffix starting at different positions in the text.
countOccurrences Function:
- It counts how many times a pattern occurs as a substring of the text.
- If the whole pattern can be followed from the root, it returns the counter stored at the last node; otherwise, it returns 0. In "banana", "ana" starts at positions 1 and 3 (counting from 0), so the result is 2.
main Function:
- Example usage of the SuffixTrie class.
- It creates a suffix trie for the input text "banana".
- It counts how often "ana" and "xyz" occur in the text and prints the results.

Time and Space complexities:

Time Complexity:

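Let n be the length of the text and m the length of the string being searched. A search follows at most one edge per character, and each std::unordered_map lookup takes O(1) time on average, so searching takes O(m) time on average. Building the trie inserts n suffixes of lengths n, n - 1, ..., 1, which is n(n + 1)/2 character insertions in total (the substr() copies add work of the same order).

Building Suffix Trie: O(n^2)
Searching for a Suffix: O(m) on average

Space Complexity:

Every node corresponds to a distinct substring of the text, so the trie has one node per distinct substring plus the root. When all characters of the text are distinct, that is n(n + 1)/2 + 1 nodes, so the space grows quadratically, not linearly. A suffix tree, which compresses chains of single-child nodes into single edges, brings this down to O(n).

Trie Node Structure: O(n^2) nodes in the worst case
Edge and Node Storage: O(n^2) in the worst case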

Applications of Trie:

There are several applications of the Trie. Some main applications of the Trie are as follows:

Autocomplete: Tries are commonly used to implement autocomplete systems. By storing a dictionary of words in a trie, you can quickly predict and suggest words based on what the user has typed so far.
Spell Checking: Tries are also used in spell-checking algorithms, which traverse the trie along the characters of a misspelled word to find the dictionary words closest to it.
IP Routing and Longest Prefix Matching: In computer networking, tries are used for IP routing and longest prefix matching. They help in efficiently determining the next hop in a routing table based on the longest matching prefix.
Data Compression: Tries are used in data compression algorithms, such as Huffman coding. They help in efficiently representing variable-length codes for different characters.
Genomic Data Analysis: Tries are also used in bioinformatics for analysing genomic data. They can efficiently store and search for DNA sequences and patterns.
Substring Search: Tries are useful for substring search problems. Every substring of a text is a prefix of one of its suffixes, so after building a trie of all suffixes of the text, checking whether a pattern occurs takes time proportional to the length of the pattern.
Symbol Tables in Compilers: Tries are used in the construction of symbol tables in compilers. They provide a fast way to check the identifiers and keywords in source code.
Network Routing Protocols: Routing protocols such as Open Shortest Path First (OSPF) compute the routes, and routers commonly store the resulting tables in tries so that each route lookup stays fast.

Advantages of Trie:

There are several advantages of the Trie. Some main advantages of the Trie are as follows:

File System Indexing: File paths share long prefixes (their directory names), so a trie can index them compactly. A prefix-match search, such as listing every path under a directory, descends to the node for the prefix and visits only its subtree, and a complete-match search is the special case in which the whole key must match and end at a marked node.
Natural Language Processing (NLP): Tries are used in many NLP tasks. Some examples are storing a dictionary, retrieving the root form of a word, and counting prefixes to support word completion (autocomplete).
Phone Number Directory: Tries, also known as digital trees, can be used in phone number directories. Their search keys use only the digits 0-9, so each node has at most ten children, and a depth-first traversal that visits the digits in order lists the numbers in lexicographical order.
Substring Matching in DNA Sequences: Tries can be used to locate specific patterns and sequences and are efficient in substring matching in DNA sequences. It is often utilized in bioinformatics.
For Efficient String Search: You can use tries to look up keys in a dictionary. Inserting a string of length m takes O(m) time, and looking up a specific string in the stored set also takes O(m) time, where m is the length of the target string, regardless of how many strings are stored. While searching for a string, you also learn which of its prefixes are stored.
Prefix Matching: Tries are good at prefix matching. They can quickly identify all strings with a given prefix, so they can be used in autocomplete and predictive text systems.
Space Efficiency: Tries can be space-efficient, especially in situations where there is a lot of overlap in prefixes among stored strings. Common prefixes are shared among multiple branches in the trie, reducing the overall memory requirements.
Dynamic Set Operations: Tries support dynamic set operations efficiently. Inserting a new string, deleting a string, or checking for the existence of a string can be performed in O(m) time, where m is the length of the string.
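Ordered Output: When the children of each node are kept in character order (for example, in a fixed-size array or a std::map instead of the std::unordered_map used in the examples above), a depth-first traversal visits the stored strings in lexicographical order. This property can be used when ordered output is required, such as in dictionary applications or for alphabetical listings.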

Disadvantages of Trie:

There are several disadvantages of the Trie. Some main disadvantages of the Trie are as follows:

Space complexity: The main disadvantage of a trie is its memory overhead, especially when it stores many long strings that share few prefixes. Every node represents a single character but also carries its own child table and pointers, so a trie can use far more memory than the characters it stores; a suffix trie of a text of length n can need O(n^2) nodes.
Memory Fragmentation: Allocating a large number of small nodes separately can fragment the heap, particularly when the stored strings have very different lengths, which leads to poor memory utilization.
Complexity of Implementation: Implementing a trie by hand is more involved than using a ready-made hash table or balanced search tree. The insertion and deletion operations require careful management of pointers and memory allocation, and mistakes easily cause memory leaks or dangling pointers.
Not Efficient for Numeric Keys: Tries are designed for sequences of symbols such as strings, so they are usually a poor fit for plain numeric keys unless the numbers are split into digits or bits. A hash table or binary search tree is often a simpler choice.
Large Alphabets: If each node reserves one child slot per symbol, a large alphabet wastes most of those slots and the trie becomes inefficient. Storing the children in a hash map or sorted list saves memory but makes each step slower.
Time Complexity for Building Tries: The time complexity for building a trie can be higher than for some other data structures. Constructing a trie involves traversing each character of each string, leading to a time complexity proportional to the total length of all strings.