Longest Common Prefix in Python

In this article, you will learn several approaches to finding the longest common prefix in Python. Before discussing these approaches, it helps to define what the longest common prefix is.

What is the Longest Common Prefix?

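The longest common prefix (LCP) of a pair of strings string1 and string2 is the longest string that is a prefix of both. For example, the longest common prefix of "football" and "foobar" is "foo". The definition extends to a list of strings: the LCP is the longest string that is a prefix of every string in the list.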

The following Python code snippet demonstrates how to find the longest common prefix among a list of strings:

Code:
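
A minimal sketch of this approach, following the steps in the explanation below. The sample input list is an assumption chosen for illustration.

```python
def longest_common_prefix(strs):
    """Return the longest common prefix of the strings in strs."""
    # An empty list has no common prefix.
    if not strs:
        return ""

    # The prefix cannot be longer than the shortest string.
    min_len = min(len(s) for s in strs)

    for i in range(min_len):
        # Compare position i of every other string with the first string.
        for s in strs[1:]:
            if s[i] != strs[0][i]:
                # First mismatch: everything before position i is common.
                return strs[0][:i]

    # No mismatch: the whole shortest string is the common prefix.
    return strs[0][:min_len]


# Sample input (an assumption, not taken from the original listing).
print(longest_common_prefix(["foo", "foobar", "football"]))
```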

Output:

foo

Explanation:

  • A list of strings named "strs" is the input argument for the "longest_common_prefix" function.
  • First, we handle the special case where the list is empty by returning an empty string. An empty collection of strings has no prefix in common.
  • Next, we find the length of the shortest string in the list. We will use this value to loop through each character in the shortest string and compare it to the other strings in the list.
  • After that, we loop through each character in the shortest string, stopping at the first mismatch.
  • For each character position i, we check if all the other strings in the list have the same character at that position as the first string strs[0]. If there is a mismatch, we return the prefix up to that point.
  • If we get through the entire loop without finding a mismatch, we know that the entire shortest string is the common prefix.

This method has O(N*M) time complexity, where N is the number of strings in the list and M is the length of the shortest string. This is because, for each character position in the shortest string, we compare that character against every other string in the list.

Using the "zip" function

Another approach to finding the longest common prefix in Python uses the built-in zip function, which groups the characters of the strings by position.

Code:
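
A minimal sketch of the zip-based approach, following the steps in the explanation below. The sample input list is an assumption chosen for illustration.

```python
def longest_common_prefix(strs):
    """Return the longest common prefix of the strings in strs using zip."""
    # An empty list has no common prefix.
    if not strs:
        return ""

    # zip(*strs) yields one tuple of characters per position and stops
    # at the end of the shortest string.
    for i, chars in enumerate(zip(*strs)):
        # More than one distinct character means a mismatch at position i.
        if len(set(chars)) > 1:
            return strs[0][:i]

    # No mismatch: the shortest string is the common prefix.
    return min(strs, key=len)


# Sample input (an assumption, not taken from the original listing).
print(longest_common_prefix(["foo", "foobar", "football"]))
```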

Output:

foo

Explanation:

  • We first handle the special case where the list is empty, as in the previous implementation.
  • We use the zip function to group the characters of each string together by position. The * in zip(*strs) is used to unpack the list of strings as separate arguments to the zip function.
  • We loop through each group of characters, checking if there is more than one distinct character in the group.
  • If there is more than one distinct character in the group, we return the prefix up to that point.
  • If we get through the entire loop without finding a mismatch, the entire shortest string is the common prefix, because zip stops at the end of the shortest string. In that case, we return the shortest string in the list.

This algorithm has a time complexity of O(N*M), where N is the number of strings in the list and M is the length of the shortest string. The asymptotic cost is the same as in the previous implementation, although in CPython it may run faster in practice because zip and set are built-ins that replace the explicit inner Python loop.

Using the binary search approach

The basic idea of this approach is to perform a binary search on the length of the common prefix, starting with a range of [0, min_len], where min_len is the length of the shortest string in the list. At each step, we check if the prefix of length mid is a common prefix among all the strings. If it is, we can discard the lower half of the range since we know that prefixes shorter than mid are also common prefixes. If it is not, we can discard the upper half of the range since we know that prefixes longer than mid cannot be common prefixes.

Code:
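
A minimal sketch of the binary search approach, following the steps in the explanation below. The use of str.startswith for the prefix check and the sample input list are assumptions chosen for illustration.

```python
def longest_common_prefix(strs):
    """Return the longest common prefix using binary search on its length."""
    # An empty list has no common prefix.
    if not strs:
        return ""

    # The prefix length lies in the range [0, min_len].
    min_len = min(len(s) for s in strs)
    left, right = 0, min_len

    while left < right:
        # Round up so that the range always shrinks when left = mid.
        mid = (left + right + 1) // 2
        prefix = strs[0][:mid]
        if all(s.startswith(prefix) for s in strs):
            # A common prefix of length mid exists: search [mid, right].
            left = mid
        else:
            # No common prefix of length mid: search [left, mid - 1].
            right = mid - 1

    # left == right is the length of the longest common prefix.
    return strs[0][:left]


# Sample input (an assumption, not taken from the original listing).
print(longest_common_prefix(["foo", "foobar", "football"]))
```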

Output:

foo

Explanation:

  • We first handle the special case where the list is empty, as in the previous implementations.
  • We find the length of the shortest string in the list, as in the first implementation.
  • We set up the initial range for the binary search, which is [0, min_len]. To record the current range, we use left and right variables.
  • We enter a while loop that continues until the range is reduced to a single value, that is, until left equals right. At each step, we compute the midpoint of the range using integer division. We use (left + right + 1) // 2 instead of (left + right) // 2 so that the midpoint is rounded up instead of down. Because a successful check sets left to mid, rounding down would leave the range unchanged when right equals left + 1, and the loop would never terminate.
  • We determine whether the mid-length prefix is common to all the strings in the list. We do this using a generator expression with the all function. If the prefix is a common prefix, we update the range to [mid, right]. Otherwise, we update the range to [left, mid - 1].
  • We continue the loop until the range is reduced to a single value, which represents the length of the longest common prefix.
  • Finally, we return the longest common prefix by taking a slice of the first string in the list up to the computed length.

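This algorithm has a time complexity of O(N*M*log M), where N is the number of strings in the list and M is the length of the shortest string. Each of the O(log M) steps compares a prefix of up to M characters against all N strings. The binary search reduces the number of prefix checks, but not the number of character comparisons per check, so it is not asymptotically faster than the previous implementations, which run in O(N*M).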

Using the Trie-based approach

In this approach, we create a Trie data structure that stores all the strings in the list. After that, we traverse the Trie from the root for as long as the current node has exactly one child and does not mark the end of a word. The longest common prefix is represented by the path from the root to the node where the traversal stops.

Example:

Here is an example implementation of this approach in Python:
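
A minimal sketch, following the class and method names used in the explanation below. The module-level wrapper function and the sample input list are assumptions chosen for illustration.

```python
class TrieNode:
    """A node in the Trie."""

    def __init__(self):
        self.children = {}           # maps a character to its child node
        self.is_end_of_word = False  # True if a word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Insert word, adding new nodes as necessary."""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

    def longest_common_prefix(self):
        """Follow the single-child path from the root while no word ends."""
        node = self.root
        prefix = ""
        while len(node.children) == 1 and not node.is_end_of_word:
            char, child = next(iter(node.children.items()))
            prefix += char
            node = child
        return prefix


def longest_common_prefix(strs):
    # Hypothetical wrapper: an empty list has no common prefix.
    if not strs:
        return ""
    trie = Trie()
    for s in strs:
        trie.insert(s)
    return trie.longest_common_prefix()


# Sample input (an assumption, not taken from the original listing).
print(longest_common_prefix(["foo", "foobar", "football"]))
```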

Output:

foo

Explanation:

  • We define a TrieNode class to represent a node in the Trie. Each node has a children dictionary to store its child nodes, and a is_end_of_word flag to indicate if the node represents the end of a word.
  • We define a Trie class to represent the Trie data structure. The insert method inserts a word into the Trie by traversing the Trie from the root and adding new nodes as necessary. The longest_common_prefix method traverses the Trie from the root until it reaches the first node that does not have exactly one child or that marks the end of a word, and returns the path from the root to this node as the longest common prefix.
  • We create a new Trie instance and insert all the strings in the list into the Trie.
  • To determine the longest common prefix, we call the Trie instance's longest_common_prefix method.
  • The longest_common_prefix method starts at the root of the Trie and initializes an empty string to represent the prefix. After that, it iteratively traverses the Trie. At each step, it checks if the current node has exactly one child and is not the end of a word. If so, it adds the key of the child to the prefix and moves to the child node. If not, it stops and returns the current prefix as the longest common prefix.

This algorithm has a time complexity of O(N*M), where N is the number of strings in the list and M is the length of the longest string, because every character of every string is inserted into the Trie. The traversal that follows visits only as many nodes as the longest common prefix has characters. In the worst case, where the strings share no prefix, the Trie holds one node per character, so the space complexity is also O(N*M). Each node has at most one child per distinct character (for example, at most 26 for lowercase English letters). The actual space used by the Trie may be less than this in practice, because strings that share a prefix also share the corresponding nodes.