Minimum Edit Distance is an algorithm to find how similar are two strings.

It measures the minimum cost for the application of the editing operations to be performed in order to align X to Y.

Edit operations are: insert, delete and substitute

Initialization: create a matrix with rows (n=|x|) and m+1 cols (in other words, a row and a column more than the length of the two words, with all zeros)

For each cell, compare the adjacent cells (up, left, diagonal left) + 1: take the minimum and write it down in the cell. If x and y are the same though, if the letters are the same then write 0, else copy cost.

def min_edit_dist(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0]*(n+1) for _ in range(m+1)]
 
    for i in range(m+1): dp[i][0] = i
    for j in range(n+1): dp[0][j] = j
 
    for i in range(1, m+1):
        for j in range(1, n+1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i][j-1], dp[i-1][j], dp[i-1][j-1])
 
    return dp[m][n]

Applications

  • hey vs hello
hello
012345
h101234
e210123
y321123
  • mela vs mamma
mamma
012345
m101234
e211234
l322234
a432333

Note that sometimes, the letters are the same, example a with a. If yes, you should copy the top left corner value (and cost = 0), else cost = 1.