Incomputer science, theWagner–Fischer algorithm is adynamic programming algorithm that computes theedit distance between two strings of characters.
The Wagner–Fischer algorithm has a history ofmultiple invention. Navarro lists the following inventors of it, with date of publication, and acknowledges that the list is incomplete:[1]: 43
The Wagner–Fischer algorithm computes edit distance based on the observation that if we reserve amatrix to hold the edit distances between allprefixes of the first string and all prefixes of the second, then we can compute the values in the matrix byflood filling the matrix, and thus find the distance between the two full strings as the last value computed.
A straightforward implementation, aspseudocode for a functionDistance that takes two strings,s of lengthm, andt of lengthn, and returns theLevenshtein distance between them, looks as follows. The input strings are one-indexed, while the matrixd is zero-indexed, and[i..k] is a closed range.
functionDistance(chars[1..m],chart[1..n]):// for all i and j, d[i,j] will hold the distance between// the first i characters of s and the first j characters of t// note that d has (m+1)*(n+1) valuesdeclareintd[0..m,0..n]seteachelementindtozero// source prefixes can be transformed into empty string by// dropping all charactersforifrom1tom:d[i,0]:=i// target prefixes can be reached from empty source prefix// by inserting every characterforjfrom1ton:d[0,j]:=jforjfrom1ton:forifrom1tom:ifs[i]=t[j]:substitutionCost:=0else:substitutionCost:=1d[i,j]:=minimum(d[i-1,j]+1,// deletiond[i,j-1]+1,// insertiond[i-1,j-1]+substitutionCost)// substitutionreturnd[m,n]
Two examples of the resulting matrix (hovering over an underlined number reveals the operation performed to get that number):
|
|
Theinvariant maintained throughout the algorithm is that we can transform the initial segments[1..i] intot[1..j] using a minimum ofd[i,j] operations. At the end, the bottom-right element of the array contains the answer.
As mentioned earlier, theinvariant is that we can transform the initial segments[1..i] intot[1..j] using a minimum ofd[i,j] operations. This invariant holds since:
s[1..i] can be transformed into the empty stringt[1..0] by simply dropping alli characters. Similarly, we can transforms[1..0] tot[1..j] by simply adding allj characters.s[i] = t[j], and we can transforms[1..i-1] tot[1..j-1] ink operations, then we can do the same tos[1..i] and just leave the last character alone, givingk operations.s[1..i] tot[1..j-1] ink operations, then we can simply addt[j] afterwards to gett[1..j] ink+1 operations (insertion).s[1..i-1] tot[1..j] ink operations, then we can removes[i] and then do the same transformation, for a total ofk+1 operations (deletion).s[1..i-1] tot[1..j-1] ink operations, then we can do the same tos[1..i], and exchange the originals[i] fort[j] afterwards, for a total ofk+1 operations (substitution).s[1..n] intot[1..m] is of course the number required to transform all ofs into all oft, and sod[n,m] holds our result.This proof fails to validate that the number placed ind[i,j] is in fact minimal; this is more difficult to show, and involves anargument by contradiction in which we assumed[i,j] is smaller than the minimum of the three, and use this to show one of the three is not minimal.
Possible modifications to this algorithm include:
j.[0,1].cost values can be computed in parallel, and the algorithm can be adapted to perform theminimum function in phases to eliminate dependencies.By initializing the first row of the matrix with zeros, we obtain a variant of the Wagner–Fischer algorithm that can be used forfuzzy string search of a string in a text.[1] This modification gives the end-position of matching substrings of the text. To determine the start-position of the matching substrings, the number of insertions and deletions can be stored separately and used to compute the start-position from the end-position.[4]
The resulting algorithm is by no means efficient, but was at the time of its publication (1980) one of the first algorithms that performed approximate search.[1]