Check if the Given String of Words can be Formed from Words Present in the Dictionary.

Checking whether a given string can be formed by concatenating words from a known dictionary is a common task in many natural language processing applications. Doing this efficiently for a fixed dictionary is the main challenge. This article examines three approaches in Python for determining whether a string can be constructed from words found in a dictionary.

First, we will look at a simple brute force approach that is easy to understand but scales poorly for large dictionaries and long inputs. Next, we can significantly improve performance by storing intermediate results in a dynamic programming table. Lastly, we will look at getting quicker lookup times by storing the dictionary in a map data structure rather than a list.

Each strategy is implemented in Python, and its time and space complexity is examined. Code samples illustrate the main distinctions and performance traits of the brute-force, dynamic programming, and map-based approaches. By the end, you will know the drawbacks of each approach and how to choose between them when checking whether a string can be formed from dictionary words in Python.

Approach 1: Brute Force Method

One straightforward way to determine if a given input string can be constructed using words from a dictionary is through a brute force algorithm. This simply involves generating all possible combinations of dictionary words and checking if any combination successfully produces the input string.

More specifically, the brute force approach iterates through the dictionary words individually, trying all possible combinations and permutations to find a match for the input string. It follows these steps:

Start with an empty combination array to store the current word combination.
Check if the end of the input string has been reached - if so, check if the current combination array forms the input string.
Loop through each word in the dictionary.
Add the current dictionary word to the combination array if the input string starts with the joined combination array plus the dictionary word.
Recursively call the algorithm to try longer combinations starting from the index after the newly added word.
If a valid combination is found (the recursive call returns true), return true.
Otherwise, remove the word from the combination array and try the next term.
If all words have been tried and no valid combination is found, return false.

This brute force approach tries all possible combinations, so it is guaranteed to find a valid combination if one exists. However, the number of combinations can grow exponentially with the length of the input string, and a larger dictionary adds more branches at every step, making this inefficient for all but the smallest inputs and dictionaries.

Nonetheless, the relative simplicity of the brute force method makes it an instructive introduction and baseline approach for solving this problem before exploring more sophisticated optimisation techniques.

def can_form_string(dictionary, input_string):
    def is_valid_combination(combination, input_string):
        # The combination is valid when its words concatenate to the whole input
        return ''.join(combination) == input_string

    def generate_combinations(start, current_combination):
        # start is the number of input characters already covered
        if start == len(input_string):
            return is_valid_combination(current_combination, input_string)

        for word in dictionary:
            # Extend the combination only if it still matches a prefix of the input
            if input_string.startswith(''.join(current_combination) + word):
                current_combination.append(word)
                if generate_combinations(start + len(word), current_combination):
                    return True
                current_combination.pop()  # backtrack and try the next word

        return False

    return generate_combinations(0, [])

# Example usage
dictionary = ["check", "if", "the", "given", "string", "words", "can", "be", "formed", "from", "present", "in", "the", "dictionary"]
input_string = "checkifgivenstringcanbeformedfromwordsinthedictionary"
result = can_form_string(dictionary, input_string)
print(result) 

Output:

True

Here is an explanation of the program to check if a string can be formed from words in a dictionary:

The can_form_string function takes the dictionary and input string as parameters.
The is_valid_combination function checks if a given combination of words forms the input string by joining the terms and comparing them to the input.
The generate_combinations function takes the current index and partially constructed combination.
It checks if the current index has reached the end of the input string. If so, it checks whether the current combination is valid using is_valid_combination.
It loops through each word in the dictionary:
It checks if the input string starts with the current combination + the dictionary word.
If so, it appends the word to the combination and recursively calls generate_combinations with updated index and combination.
If that returns True, it returns True. If not, it removes the word before moving to the next.
When all words are tried, and none lead to a valid combination, it returns False.
Initially, generate_combinations is called with index 0 and empty combination [].
The result of can_form_string is the result of calling generate_combinations.
This implements a backtracking algorithm that tries all combinations of dictionary words to find a valid combination that forms the input string.
It returns True if any valid combination is found and False otherwise.
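
To make the exponential growth concrete, the following sketch counts recursive calls on an input that cannot be formed. It is a simplified variant of the code above rather than the same function: the names count_brute_force_calls and search are illustrative, and it matches each word at the current position with str.startswith instead of rebuilding the joined combination.

```python
def count_brute_force_calls(dictionary, input_string):
    """Return (result, calls) for a prefix-matching backtracking search."""
    calls = 0

    def search(start):
        nonlocal calls
        calls += 1
        if start == len(input_string):
            return True
        for word in dictionary:
            # Only recurse when the word matches at the current position.
            if input_string.startswith(word, start):
                if search(start + len(word)):
                    return True
        return False

    return search(0), calls


words = ["a", "aa"]
for n in (5, 10, 15, 20):
    # n copies of "a" followed by a "b" that no dictionary word can cover
    result, calls = count_brute_force_calls(words, "a" * n + "b")
    print(n, result, calls)
```

With the dictionary ["a", "aa"], each extra "a" in the input multiplies the number of calls by roughly 1.6, because every position can be reached by many different word sequences and each of them is explored again.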

Approach 2: Dynamic Programming

The brute force method of generating all possible combinations has an exponential time complexity, which is inefficient for large dictionaries. We can optimise this using dynamic programming, which avoids recomputing the same subproblems by storing results in a table.

Dynamic programming is an optimisation technique for solving complex problems by breaking them into simpler subproblems. It works by storing the results of smaller subproblems and using those results to build up solutions to larger ones.

In this article, we use dynamic programming to check if a string can be formed from dictionary words. Rather than generating all possible word combinations through brute force, we build up a results table for smaller strings. Each cell stores whether a substring can be formed from the dictionary. Using previous results allows us to avoid recomputing the same subproblems.

The critical steps of the dynamic programming approach are:

Initialise a table dp of length n+1, where n is the length of the input string. dp[i] stores true if the first i characters of the input string can be formed from the dictionary.
Base case: dp[0] is true since the empty string can always be formed.
Iterate over prefix lengths i from 1 to n:
- Check each dictionary word:
  - If the word is no longer than i, dp[i - len(word)] is true, and the word matches the substring from i - len(word) to i, set dp[i] to true.
After iteration, if dp[n] is true, the whole string can be formed.
Return dp[n].

This builds up the dp table bottom-up by solving shorter subproblems first and iteratively combining solutions. Storing results in the table avoids recomputing duplicate subproblems.

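The complexity is reduced from exponential to polynomial: the two loops perform n * m checks for an input of length n and a dictionary of m words, and each check compares at most one word's worth of characters. The tradeoff is O(n) extra memory to store the dp table.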

Overall, dynamic programming provides an optimised solution by using a table to cache the results of subcomputations. Solving the shorter prefixes first and storing these intermediate results provides major efficiency gains compared to brute force methods.

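def can_form_string(dictionary, input_string):
    n = len(input_string)
    # dp[i] is True when the first i characters can be formed from dictionary words
    dp = [False] * (n + 1)
    dp[0] = True  # the empty prefix needs no words

    for i in range(1, n + 1):
        for word in dictionary:
            # The word must fit, end at position i, and follow a formable prefix
            if i >= len(word) and dp[i - len(word)] and input_string[i - len(word):i] == word:
                dp[i] = True
                break  # one valid split is enough for this prefix

    return dp[n]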

# Example usage
dictionary = ["check", "if", "the", "given", "string", "words", "can", "be", "formed", "from", "present", "in", "the", "dictionary"]
input_string = "checkifgivenstringcanbeformedfromwordsinthedictionary"
result = can_form_string(dictionary, input_string)
print(result)  

Output:

True
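
To see how the table fills in, it can help to print every entry for a much shorter input. The sketch below is an illustrative helper (dp_table is a name chosen for this example, not part of the code above) that returns the whole table instead of only its last entry.

```python
def dp_table(dictionary, input_string):
    """Return the full dp list; dp[i] says whether input_string[:i] can be formed."""
    n = len(input_string)
    dp = [False] * (n + 1)
    dp[0] = True  # the empty prefix is always formable
    for i in range(1, n + 1):
        for word in dictionary:
            start = i - len(word)
            if start >= 0 and dp[start] and input_string[start:i] == word:
                dp[i] = True
                break  # one successful split is enough for position i
    return dp


table = dp_table(["in", "the", "then"], "inthe")
for i, ok in enumerate(table):
    print(i, repr("inthe"[:i]), ok)
```

For this input only the entries for the prefixes '', 'in' and 'inthe' are true, which matches the single valid split "in" + "the".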

Here is an explanation of the dynamic programming approach for checking if a string can be formed from dictionary words:

Initialise a boolean array dp of length n+1, where n is the length of the input string. dp[i] will store if the string can be formed till index i.
Set dp[0] = True since the empty string is always possible.
Iterate over all indexes i from 1 to n:
For each dictionary word, check if:
- i is greater than or equal to the word length
- dp[i - len(word)] is True (the first i - len(word) characters can be formed)
- The word matches the substring from i - len(word) to i
If the above conditions are satisfied, set dp[i] = True
After iterating over all i, the complete string can be formed if dp[n] is True.
Return dp[n]
So, this fills the dp array bottom-up by breaking the string into subproblems.
We start from an empty string and iteratively check if adding a dictionary word leads to a solvable subproblem.
Storing the results in dp allows us to avoid recomputing the same subproblems.
Overall, this reduces the exponential complexity of brute force to polynomial time and space.
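
The table-filling code above is the bottom-up form of the idea. A top-down form, which caches the answer for each starting position the first time it is needed, might look like the following sketch. It assumes words may be reused freely, as in the table version; can_form_string_memo and can_form_from are illustrative names, and functools.lru_cache from the standard library provides the cache.

```python
from functools import lru_cache


def can_form_string_memo(dictionary, input_string):
    """Top-down variant: memoise whether the suffix starting at `start` can be formed."""
    words = set(dictionary)  # duplicates are irrelevant when words may be reused

    @lru_cache(maxsize=None)
    def can_form_from(start):
        if start == len(input_string):
            return True
        for word in words:
            # Skip empty words, which would recurse without making progress.
            if word and input_string.startswith(word, start) and can_form_from(start + len(word)):
                return True
        return False

    return can_form_from(0)


print(can_form_string_memo(["in", "the", "then"], "inthe"))
print(can_form_string_memo(["a", "aa"], "a" * 30 + "b"))
```

Each starting position is solved at most once, so the second call finishes quickly even though plain backtracking would revisit the same positions many times. Because it is recursive, very long inputs could hit Python's recursion limit, which the table version avoids.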

Approach 3: Using a Map Data Structure

Python has a built-in map() function, which applies a function to every item in an iterable without an explicit for loop and returns an iterator that can be converted to a list or used directly.

That function should not be confused with a map data structure, which associates keys with values. In Python, the built-in map data structure is the dict type, a hash table, and it is this structure that the approach relies on.

We can revisit the dictionary string formation problem using a map data structure instead of lists or arrays. This provides faster lookup times when checking word membership in the dictionary.
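
Since the word "map" is used in two senses here, a short sketch may help keep them apart. The sample word list is made up for illustration; the first part uses the built-in map() function and the second builds a map in the data-structure sense as a dict.

```python
dictionary = ["check", "if", "the", "the"]

# Built-in map(): lazily applies a function to each item of an iterable.
lengths = list(map(len, dictionary))
print(lengths)

# A "map" in the data-structure sense: a dict keyed by word.
word_map = {}
for word in dictionary:
    word_map[word] = word_map.get(word, 0) + 1
print(word_map)

# Membership tests on the dict are key lookups, not scans.
print("the" in word_map, "then" in word_map)
```

Only the dict is used by the approach in this section; map() plays no part in it.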

The key steps are:

Initialise a map from the dictionary where keys are words and values are word frequencies.
Define a recursive function to check if a string can be formed:
- Base case: empty string is valid.
- Loop over prefixes from 1 to string length.
- If the prefix exists in a map and has a non-zero count:
  - Decrement count to mark as used.
  - Recursively check the remainder of the string.
  - If true, return true.
  - Else, increment count back.
Call function on the original string.
Return result.

This implements backtracking with pruning using the word map. The map allows O(1) average-case lookup versus an O(N) search with lists, where N is the number of dictionary words.
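
The difference between the two kinds of lookup can be observed with a rough timing sketch. The word list below is synthetic and the sizes are arbitrary; absolute timings depend on the machine, so no expected figures are given.

```python
import timeit

# Synthetic vocabulary; the names and sizes are arbitrary choices for this sketch.
words_list = [f"word{i}" for i in range(20_000)]
words_map = dict.fromkeys(words_list, 1)
probe = "word19999"  # last item: the worst case for a list scan

list_time = timeit.timeit(lambda: probe in words_list, number=100)
map_time = timeit.timeit(lambda: probe in words_map, number=100)
print(f"list: {list_time:.4f}s  dict: {map_time:.6f}s")
```

The list test compares the probe against items one by one, while the dict test hashes the probe and jumps to its slot, so the gap is expected to widen as the vocabulary grows.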

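We recursively try all prefixes, decrementing the word count when a word is used and incrementing it back if that recursive call fails. This prunes branches whose prefix is not in the map or has no remaining count. Because of the counts, each word can be used at most as many times as it appears in the dictionary, which is a stricter rule than in the first two approaches, where a word may be reused freely.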

The tradeoff is higher base memory usage to store the entire map. But this allows backtracking without repeating expensive list operations.

The map-based approach speeds up the individual dictionary lookups of the backtracking search by exploiting fast key-based access. The search itself is still backtracking without stored intermediate results, so its worst case remains exponential in the length of the input string.

def can_form_string(dictionary, input_string):
    word_map = {}
    
    # Create a dictionary to map each word to its count in the dictionary
    for word in dictionary:
        if word in word_map:
            word_map[word] += 1
        else:
            word_map[word] = 1

    def can_form(input_string):
        if input_string == "":
            return True

        for i in range(1, len(input_string) + 1):
            prefix = input_string[:i]
            if prefix in word_map and word_map[prefix] > 0:
                word_map[prefix] -= 1
                if can_form(input_string[i:]):
                    return True
                word_map[prefix] += 1

        return False

    return can_form(input_string)

# Example usage
dictionary = ["check", "if", "the", "given", "string", "words", "can", "be", "formed", "from", "present", "in", "the", "dictionary"]
input_string = "checkifgivenstringcanbeformedfromwordsinthedictionary"
result = can_form_string(dictionary, input_string)
print(result)  

Output:

True

Explanation:

Create a word_map dictionary that maps each word to its frequency count in the dictionary.
Define a recursive can_form function to check if a string can be formed:
Base case: If string is empty, return True.
Loop over all prefixes of the string from 1 to the length of the string.
Check if the prefix exists in word_map and has a count > 0.
If yes, decrease the count to mark it as used.
Recursively call can_form on the remainder of the string after a prefix.
If it returns True, return True.
If not, increment the count back and try the next prefix.
Return False if no prefixes lead to a valid decomposition.
The primary can_form_string function populates the word_map and calls can_form on the input string.
It returns the result of can_form.

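This implements a recursive backtracking algorithm but uses a map to achieve an O(1) average lookup time instead of an O(N) list search, where N is the number of dictionary words.

The tradeoff is higher base memory usage to store the entire word_map. Each lookup is cheaper than in the brute force version, but the search does not store intermediate results, so its worst-case running time is still exponential and the dynamic programming approach remains the safer choice for long inputs.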
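
One behavioural point is easy to miss: because the map stores counts, this approach lets each word be used only as often as it is listed, whereas the table-based approach reuses words freely. The sketch below contrasts the two rules on a tiny input. Both functions are illustrative rewrites rather than the code above: can_form_limited uses collections.Counter for the word map, and can_form_unlimited is a compact table version with set lookups.

```python
from collections import Counter


def can_form_limited(dictionary, input_string):
    """Backtracking where each word may be used only as often as it is listed."""
    counts = Counter(dictionary)  # a missing key reads as 0 without being inserted

    def can_form(rest):
        if rest == "":
            return True
        for i in range(1, len(rest) + 1):
            prefix = rest[:i]
            if counts[prefix] > 0:
                counts[prefix] -= 1
                found = can_form(rest[i:])
                counts[prefix] += 1  # restore so the counts are unchanged afterwards
                if found:
                    return True
        return False

    return can_form(input_string)


def can_form_unlimited(dictionary, input_string):
    """Tabulated check where every word may be reused any number of times."""
    words = set(dictionary)
    n = len(input_string)
    dp = [True] + [False] * n
    for i in range(1, n + 1):
        dp[i] = any(dp[j] and input_string[j:i] in words for j in range(i))
    return dp[n]


print(can_form_limited(["the"], "thethe"))         # "the" is listed once
print(can_form_limited(["the", "the"], "thethe"))  # "the" is listed twice
print(can_form_unlimited(["the"], "thethe"))
```

The three calls print False, True and True, so the approaches answer slightly different questions whenever the input repeats a word more often than the dictionary lists it.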

Conclusion

This article explored various techniques to determine if an input string can be constructed using words from a known dictionary. We implemented and analysed brute force searching, dynamic programming, and map-based optimisation in Python.

The brute force approach generates all possible word combinations but has exponential complexity, making it intractable for larger dictionaries and longer inputs. Dynamic programming speeds this up by storing intermediate results in a table and avoiding the recomputation of overlapping subproblems. This reduces the time complexity to polynomial but requires O(n) space to keep the table, where n is the length of the input string.

Finally, we saw how using a map data structure for dictionary lookups gives O(1) average access instead of O(N) list searches. The tradeoff is higher base memory usage, and the backtracking search built on top of the map keeps an exponential worst case.

Each technique makes different performance tradeoffs between time and space complexity. While brute force searching is easy to implement, its exponential complexity makes dynamic programming, ideally with map-based lookups, preferable for most real-world applications. Choosing the correct algorithm requires analysing the dictionary size, input length, and available memory.

This exploration of optimising dictionary string formation in Python demonstrated principles like memoisation, tabulation, and space-time tradeoffs that can be applied to many problems. Mastering these performance optimisation techniques is critical for efficient algorithm design and interviewing.
