Edit Distance in Python
The "edit distance" between two strings is the minimum number of operations (insertions, deletions, and substitutions) required to transform one string into the other. This concept is used in applications such as spell correction and DNA sequence alignment.
For example, the edit distance between the strings "kitten" and "sitting" is 3, because at least 3 operations are needed to transform "kitten" into "sitting" (substitute "k" with "s", substitute "e" with "i", and insert "g" at the end).
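The three edits in this example can be checked by hand with string slicing. This is only an illustration of the example above (the variable name word is hypothetical), not a way of finding the minimum number of edits:

```python
# Apply the three edits from the example to "kitten", one at a time.
word = "kitten"

# 1. Substitute "k" (index 0) with "s".
word = "s" + word[1:]
print(word)  # sitten

# 2. Substitute "e" (index 4) with "i".
word = word[:4] + "i" + word[5:]
print(word)  # sittin

# 3. Insert "g" at the end.
word = word + "g"
print(word)  # sitting
```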
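In Python, we can calculate the edit distance between two strings with the levenshtein() function from the third-party pylev library (installed with pip install pylev). Calling pylev.levenshtein('kitten', 'sitting') returns 3; printed with a short message, the output is: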
The edit distance between 'kitten' and 'sitting' is: 3
Alternatively, you can implement the edit distance algorithm yourself using dynamic programming, as follows:
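A minimal sketch of such an implementation is shown below. It follows the table-based approach described in this section; the function name edit_distance and the exact loop layout are assumptions, since only the description of the original code survives:

```python
def edit_distance(str1: str, str2: str) -> int:
    """Return the edit (Levenshtein) distance between str1 and str2."""
    m, n = len(str1), len(str2)

    # dp[i][j] = edit distance between the first i chars of str1 and the first j chars of str2
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: transforming to or from an empty string
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters

    # Fill the rest of the table row by row
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                # Characters match: no extra operation needed
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],      # delete str1[i - 1]
                    dp[i][j - 1],      # insert str2[j - 1]
                    dp[i - 1][j - 1],  # substitute str1[i - 1] with str2[j - 1]
                )
    return dp[m][n]


str1, str2 = "kitten", "sitting"
print(f"The edit distance between '{str1}' and '{str2}' is: {edit_distance(str1, str2)}")
# The edit distance between 'kitten' and 'sitting' is: 3
```

Each cell is computed from neighbours that are already filled, so both the running time and the memory use are proportional to the product of the two string lengths.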
For str1 = "kitten" and str2 = "sitting", this code returns an edit distance of 3.
The dynamic programming implementation can be described as follows:
The approach involves creating a 2D table dp with m + 1 rows and n + 1 columns, where m and n are the lengths of the two strings str1 and str2. The table is then filled in using a small set of rules that depend on whether the current characters of str1 and str2 match. The final result, the edit distance between str1 and str2, is stored in dp[m][n].
The idea behind the dynamic programming approach is to use the 2D table dp to store intermediate results and avoid redundant calculations. The entry dp[i][j] stores the edit distance between the first i characters of str1 and the first j characters of str2.
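To see why storing intermediate results matters, here is a top-down variant that caches each subproblem with functools.lru_cache. This is a swapped-in formulation (memoized recursion), not the bottom-up table the text describes, and the name edit_distance_memo is hypothetical. Without the cache, the same (i, j) pairs would be recomputed an exponential number of times; with it, each pair is computed once:

```python
from functools import lru_cache


def edit_distance_memo(str1: str, str2: str) -> int:
    """Top-down edit distance; d(i, j) plays the role of dp[i][j]."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # Distance between the first i chars of str1 and the first j chars of str2
        if i == 0:
            return j  # insert j characters
        if j == 0:
            return i  # delete i characters
        if str1[i - 1] == str2[j - 1]:
            return d(i - 1, j - 1)  # match: no extra operation
        return 1 + min(d(i - 1, j), d(i, j - 1), d(i - 1, j - 1))

    return d(len(str1), len(str2))


print(edit_distance_memo("kitten", "sitting"))  # 3
```

Because each recursive call reduces i or j, the recursion can go about len(str1) + len(str2) levels deep, so for long strings Python's default recursion limit makes the table-based version the safer choice.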
The initialization step fills the first column and first row of the dp table with the number of operations required to transform a prefix of str1 into an empty string, or an empty string into a prefix of str2. That number is simply the length of the prefix: dp[i][0] = i (i deletions) and dp[0][j] = j (j insertions).
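The allocation and initialization step on its own might look like this for the example strings (a sketch; the variable names follow the text). The list comprehension matters: writing [[0] * (n + 1)] * (m + 1) instead would create m + 1 references to the same row list, so changing one cell would change that column in every row.

```python
m, n = len("kitten"), len("sitting")  # 6 and 7

# Allocate an (m + 1) x (n + 1) table; each row is a separate list.
dp = [[0] * (n + 1) for _ in range(m + 1)]

# First column: turning the first i characters of str1 into "" takes i deletions.
for i in range(m + 1):
    dp[i][0] = i

# First row: turning "" into the first j characters of str2 takes j insertions.
for j in range(n + 1):
    dp[0][j] = j

print(dp[0])                   # [0, 1, 2, 3, 4, 5, 6, 7]
print([row[0] for row in dp])  # [0, 1, 2, 3, 4, 5, 6]
```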
The remaining entries are filled by the main nested loop. If the current characters match (str1[i - 1] == str2[j - 1]), dp[i][j] equals the diagonal entry dp[i - 1][j - 1], because no additional operation is needed for this pair of characters.
If the characters do not match, dp[i][j] is 1 plus the minimum of the three neighbouring entries, one for each operation: dp[i - 1][j] corresponds to deleting str1[i - 1], dp[i][j - 1] corresponds to inserting str2[j - 1], and dp[i - 1][j - 1] corresponds to substituting str1[i - 1] with str2[j - 1]. The 1 in the formula 1 + min(...) represents the cost of the current operation. After the loop finishes, dp[m][n] contains the final result: the minimum number of operations required to transform str1 into str2.
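The same rules can be written as a single recurrence, using Python's 0-based string indexing as in the text (the last two cases apply for 1 <= i <= m and 1 <= j <= n), with the answer in dp[m][n]:

```latex
\mathit{dp}[i][j] =
\begin{cases}
i & \text{if } j = 0,\\
j & \text{if } i = 0,\\
\mathit{dp}[i-1][j-1] & \text{if } \mathtt{str1}[i-1] = \mathtt{str2}[j-1],\\
1 + \min\bigl(\mathit{dp}[i-1][j],\ \mathit{dp}[i][j-1],\ \mathit{dp}[i-1][j-1]\bigr) & \text{otherwise.}
\end{cases}
```

The three terms inside the minimum are, in order, deletion, insertion, and substitution.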
The edit distance described here is also known as the Levenshtein distance, and the pylev package computes it with a similar dynamic programming method. Note, however, that pylev is itself a pure Python implementation, so it is not inherently faster than the code above; when speed matters, a C-backed package such as Levenshtein (also distributed as python-Levenshtein) is typically much faster.
Consider the two strings "kitten" and "sitting". The dynamic programming table dp would look like this:
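The table can be produced with a short script that fills dp using the rules described above and prints it with the characters of "kitten" down the side and "sitting" across the top. This is a sketch; the helper name dp_table is hypothetical:

```python
def dp_table(str1: str, str2: str) -> list:
    """Fill and return the full (m + 1) x (n + 1) edit-distance table."""
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp


str1, str2 = "kitten", "sitting"
dp = dp_table(str1, str2)

# Header: two blank columns (row label and the empty-prefix column), then str2.
print(" " * 6 + "".join(f"{c:>3}" for c in str2))
# Each row: the current character of str1 (blank for the empty prefix), then the values.
for i, row in enumerate(dp):
    label = str1[i - 1] if i > 0 else ""
    print(f"{label:>3}" + "".join(f"{v:>3}" for v in row))

# Printed output:
#         s  i  t  t  i  n  g
#      0  1  2  3  4  5  6  7
#   k  1  1  2  3  4  5  6  7
#   i  2  2  1  2  3  4  5  6
#   t  3  3  2  1  2  3  4  5
#   t  4  4  3  2  1  2  3  4
#   e  5  5  4  3  2  2  3  4
#   n  6  6  5  4  3  3  2  3
```

The bottom-right cell, dp[6][7], holds the final answer of 3.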
Each dp[i][j] entry represents the minimum number of operations required to transform the first i characters of kitten into the first j characters of sitting.
For example, dp[2][2] = 1 means that it takes 1 operation (substituting "k" with "s") to transform the first 2 characters of "kitten" ("ki") into the first 2 characters of "sitting" ("si"). The dynamic programming approach fills in the dp table gradually, considering each possible operation at each step.
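Once the table is filled, the actual sequence of edits can be recovered by walking back from dp[m][n] to dp[0][0] and, at each cell, choosing a neighbour that explains its value. This traceback is a common extension rather than something described above; the helper name edit_operations is hypothetical, and when several moves tie it prefers a match or substitution, then a deletion, then an insertion, so other equally short edit sequences can exist:

```python
def edit_operations(str1: str, str2: str) -> list:
    """Return one minimal list of edits that turns str1 into str2."""
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])

    # Walk back from the bottom-right corner, recording the move behind each cell.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and str1[i - 1] == str2[j - 1]:
            i, j = i - 1, j - 1  # match: no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"substitute {str1[i - 1]!r} -> {str2[j - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {str1[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"insert {str2[j - 1]!r}")
            j -= 1
    ops.reverse()  # moves were collected from the end of the strings
    return ops


for op in edit_operations("kitten", "sitting"):
    print(op)
# substitute 'k' -> 's'
# substitute 'e' -> 'i'
# insert 'g'
```

For "kitten" and "sitting" this reproduces the three operations listed at the start of the article.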