C++ program to group anagrams in the given stream of strings

An anagram is a word formed by rearranging the letters of another word, such as "listen" and "silent". To group anagrams in a given stream of strings, we collect all the strings that are anagrams of one another into the same group, so that they appear together in the result.
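The definition above can be checked directly in code. The following is a minimal sketch, not part of the original examples: the helper name isAnagram is an assumption made for illustration, and it compares characters exactly (case-sensitive, no special handling of spaces).

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (name chosen for illustration): returns true when the
// two strings contain exactly the same characters with the same counts.
// Both strings are taken by value so they can be sorted without touching
// the caller's data.
bool isAnagram(std::string a, std::string b) {
    // Strings of different lengths can never be anagrams.
    if (a.size() != b.size()) {
        return false;
    }
    // Two strings are anagrams exactly when their sorted forms are equal.
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

With this helper, isAnagram("listen", "silent") is true, while isAnagram("listen", "listens") is false. The sorted form used here is the same key that the grouping examples rely on.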

Example 1:

A code snippet in C++ that uses a hash table to group anagrams:

#include <iostream>
#include <algorithm>
#include <unordered_map>
#include <vector>
#include <string>

// Function to group anagrams
void groupAnagrams(std::vector<std::string>& strs) {
    std::unordered_map<std::string, std::vector<std::string>> hashTable;
    for (const auto& str : strs) {
        std::string temp = str;
        // Sort the string to form a key
        std::sort(temp.begin(), temp.end());
        hashTable[temp].push_back(str);
    }
    strs.clear();
    for (const auto& [key, value] : hashTable) {
        for (const auto& str : value) {
            strs.push_back(str);
        }
    }
}

// Main function to test the code
int main() {
    std::vector<std::string> strs = {"eat", "tea", "tan", "ate", "nat", "bat"};
    groupAnagrams(strs);
    for (const auto& str : strs) {
        std::cout << str << " ";
    }
    return 0;
}

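Output (the order of the groups may differ, because std::unordered_map does not guarantee any iteration order)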

bat tan nat eat tea ate

In this implementation, we first create an unordered map, hashTable, whose keys are sorted strings and whose values are vectors of the original strings that share that sorted form, that is, strings that are anagrams of each other. We then iterate through each string in the input vector strs, sort a copy of it to form the key, and append the original string to the vector stored under that key. Finally, we clear strs and iterate through the hash table, appending each anagram group back to strs so that anagrams end up next to each other. The structured binding in the second loop requires C++17.
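The function above flattens the groups back into a single vector, so the boundaries between groups are lost. As a rough sketch of one alternative, the variant below returns each group as its own vector. The function name collectAnagramGroups and the index-map design are assumptions made for illustration, not part of the original example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variant (name chosen for illustration): returns the anagram
// groups as separate vectors instead of flattening them into one.
std::vector<std::vector<std::string>> collectAnagramGroups(
        const std::vector<std::string>& strs) {
    // Maps each sorted key to the index of its group in `groups`.
    std::unordered_map<std::string, std::size_t> groupIndex;
    std::vector<std::vector<std::string>> groups;

    for (const auto& str : strs) {
        std::string key = str;
        std::sort(key.begin(), key.end());

        auto it = groupIndex.find(key);
        if (it == groupIndex.end()) {
            // First time this key is seen: open a new group at the end.
            it = groupIndex.emplace(key, groups.size()).first;
            groups.emplace_back();
        }
        groups[it->second].push_back(str);
    }
    return groups;
}
```

For the input {"eat", "tea", "tan", "ate", "nat", "bat"} this yields the groups {eat, tea, ate}, {tan, nat}, {bat}. Because the map stores only an index and the groups live in a vector, the groups come out in order of first appearance, independent of the hash table's iteration order.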

There are other approaches to grouping anagrams in a stream of strings in C++. One such approach is to use a vector of pairs to store the sorted string as the first element of the pair and the original string as the second element of the pair. After that, we can sort this vector of pairs based on the sorted string, and the anagrams will be adjacent to each other. Finally, we can extract the original strings and store them in a separate vector.

Example 2:

Here is an implementation of this approach:

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

// Function to group anagrams
void groupAnagrams(std::vector<std::string>& strs) {
    std::vector<std::pair<std::string, std::string>> sortedStrs;
    for (const auto& str : strs) {
        std::string sortedStr = str;
        std::sort(sortedStr.begin(), sortedStr.end());
        sortedStrs.push_back(std::make_pair(sortedStr, str));
    }
    std::sort(sortedStrs.begin(), sortedStrs.end());
    strs.clear();
    for (const auto& pair : sortedStrs) {
        strs.push_back(pair.second);
    }
}

// Main function to test the code
int main() {
    std::vector<std::string> strs = {"eat", "tea", "tan", "ate", "nat", "bat"};
    groupAnagrams(strs);
    for (const auto& str : strs) {
        std::cout << str << " ";
    }
    return 0;
}

Output

bat ate eat tea nat tan

In this implementation, we first create a vector of pairs, sortedStrs, where the first element of each pair is the sorted string and the second element is the original string. We iterate through each string in the input vector strs, sort a copy of it to form the sorted string, and add the pair to sortedStrs. We then sort sortedStrs with std::sort(), which compares pairs lexicographically: first by the sorted string and, when two sorted strings are equal, by the original string. As a result, anagrams end up adjacent to each other and each group is listed in alphabetical order. Finally, we clear strs and iterate through sortedStrs, appending the second element of each pair to strs.

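Building the keys takes O(N * M * log M) time, where N is the number of strings in the input vector and M is the length of the longest string. Sorting the N pairs takes O(N * log N) comparisons, each of which can cost up to O(M), so the overall time complexity is O(N * M * (log M + log N)). The space complexity is O(N * M), as we need to store the sorted string alongside each input string.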

Example 3:

Another way to implement this is to build the key from character counts, in the style of counting sort:

#include <iostream>
#include <vector>
#include <string>
#include <unordered_map>

// Function to group anagrams
// Note: assumes every string contains only lowercase letters 'a' to 'z';
// any other character would index outside the count array.
void groupAnagrams(std::vector<std::string>& strs) {
    std::unordered_map<std::string, std::vector<std::string>> map;
    for (const auto& str : strs) {
        // Count how often each letter occurs
        int count[26] = {0};
        for (const auto& c : str) {
            count[c - 'a']++;
        }
        // Build the key from the 26 counts, separated by '#'
        std::string key;
        for (int i = 0; i < 26; i++) {
            key += std::to_string(count[i]) + "#";
        }
        map[key].push_back(str);
    }
    strs.clear();
    for (const auto& pair : map) {
        for (const auto& str : pair.second) {
            strs.push_back(str);
        }
    }
}

// Main function to test the code
int main() {
    std::vector<std::string> strs = {"eat", "tea", "tan", "ate", "nat", "bat"};
    groupAnagrams(strs);
    for (const auto& str : strs) {
        std::cout << str << " ";
    }
    return 0;
}

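Output (the order of the groups may differ, because std::unordered_map does not guarantee any iteration order)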

bat tan nat eat tea ate

In this implementation, we first create an unordered map named map, where the key is a string that encodes how many times each letter occurs, and the value is a vector of the original strings that share those letter counts. We iterate through each string in the input vector strs, count the frequency of each character in the array count, and construct the key by concatenating the 26 counts, each followed by a "#" separator. The separator keeps multi-digit counts unambiguous: without it, the counts 1 and 11 would produce the same text as 11 and 1. We then append the original string to the vector stored under that key. Finally, we clear strs and iterate through map, appending the elements of each value vector to strs. Building a key takes O(M) time instead of O(M * log M), but the code assumes that the strings contain only the lowercase letters 'a' to 'z'.
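To see what the count-based key looks like on its own, the key construction can be pulled out into a small function. This is only a sketch: the name countKey is an assumption made for illustration, and rejecting characters outside 'a' to 'z' with an exception is a design choice that the original example does not make.

```cpp
#include <array>
#include <stdexcept>
#include <string>

// Hypothetical helper (name chosen for illustration): builds the count-based
// key for a string made of lowercase letters 'a' to 'z'.
std::string countKey(const std::string& str) {
    std::array<int, 26> count{};  // all 26 counters start at zero
    for (char c : str) {
        // Assumption of this sketch: reject anything outside 'a'..'z'
        // instead of indexing outside the array.
        if (c < 'a' || c > 'z') {
            throw std::invalid_argument("countKey expects lowercase a-z only");
        }
        ++count[c - 'a'];
    }
    std::string key;
    for (int n : count) {
        key += std::to_string(n);
        key += '#';  // separator keeps multi-digit counts unambiguous
    }
    return key;
}
```

For example, countKey("aab") starts with "2#1#" followed by "0#" for each of the remaining 24 letters, and countKey("eat") equals countKey("tea") because both strings have the same letter counts.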

There are several other approaches such as:

Trie approach: We can use a trie data structure to group anagrams based on their sorted strings. We sort each string, insert the sorted form into the trie character by character, and store the original string at the node where the sorted form ends. After that, we traverse the trie and collect the strings stored at each node as one anagram group. This approach has a time complexity of O(N * M * log M), where N is the number of strings in the input vector and M is the length of the longest string; the sorting step dominates.
Bucket sort approach: We can use buckets to group anagrams based on their sorted strings. We create an array of vectors, derive a bucket index from each sorted string (for example by hashing it), and append the original string to the vector at that index. After that, we concatenate the vectors in the array to extract the anagram groups. If each string is sorted with counting sort over a small fixed alphabet, and distinct keys rarely share a bucket, this approach has an expected time complexity of O(N * M), where N is the number of strings in the input vector and M is the length of the longest string.
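The trie approach described above can be sketched as follows. This is an illustrative outline under stated assumptions, not a definitive implementation: the names TrieNode, insertSorted, collectGroups and groupAnagramsWithTrie are hypothetical, and std::map is used for the children so that the traversal order is deterministic.

```cpp
#include <algorithm>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical trie node: children are keyed by character, and `words` holds
// the original strings whose sorted form ends at this node.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    std::vector<std::string> words;
};

// Sorts a copy of `word` and walks (or creates) the matching path in the trie.
void insertSorted(TrieNode& root, const std::string& word) {
    std::string key = word;
    std::sort(key.begin(), key.end());
    TrieNode* node = &root;
    for (char c : key) {
        auto& child = node->children[c];
        if (!child) {
            child = std::make_unique<TrieNode>();
        }
        node = child.get();
    }
    node->words.push_back(word);
}

// Pre-order traversal: every node with stored words is one anagram group.
void collectGroups(const TrieNode& node,
                   std::vector<std::vector<std::string>>& groups) {
    if (!node.words.empty()) {
        groups.push_back(node.words);
    }
    for (const auto& entry : node.children) {
        collectGroups(*entry.second, groups);
    }
}

std::vector<std::vector<std::string>> groupAnagramsWithTrie(
        const std::vector<std::string>& strs) {
    TrieNode root;
    for (const auto& str : strs) {
        insertSorted(root, str);
    }
    std::vector<std::vector<std::string>> groups;
    collectGroups(root, groups);
    return groups;
}
```

For the input {"eat", "tea", "tan", "ate", "nat", "bat"} the groups come out in the lexicographic order of their sorted keys: {bat}, {eat, tea, ate}, {tan, nat}. Anagram groups whose keys share a prefix also share trie nodes, which can save memory compared with storing every key in full.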

All these approaches have their own advantages and disadvantages, and the optimal approach depends on the size of the input vector, the length of the strings, and the available memory.
