Edit Distance
"Lazy-evaluation in a functional language is exploited to make the simple dynamic-programming algorithm for the edit-distance problem run quickly on similar strings: being lazy can be fast." [Inf. Proc. Lett., 43(4), pp.207-212, Sept' 1992].
a = acgtacgtacgt b = acatacttgtact a = acgtac gtacgt || ||| |||| | b = acatacttgtac t ^ ^^ ^ | || | | || delete | || | insert*2 | change d a b = 4
The algorithm below organises the edit distance "matrix" by diagonals, not by rows or columns. Its time-complexity is O(n×d) where n is the length of the sequences and d is the edit distance between them, i.e. it is fast if the sequences are similar, d<<n. The original program was in Lazy ML; it is given in Haskell 98 here:
module Edit_IPL_V43_N4 (d) where -- compute the edit distance of sequences a and b. d a b = let -- diagonal from the top-left element mainDiag = oneDiag a b (head uppers) ( -1 : (head lowers)) -- diagonals above the mainDiag uppers = eachDiag a b (mainDiag : uppers) -- diagonals below the mainDiag lowers = eachDiag b a (mainDiag : lowers) -- ! oneDiag a b diagAbove diagBelow = -- \ \ \ let -- \ \ \ doDiag [] b nw n w = [] -- \ nw n doDiag a [] nw n w = [] -- \ \ doDiag (a:as) (b:bs) nw n w = -- w me let me = if a==b then nw -- dynamic programming DPA else 1+min3 (head w) nw (head n) in me : (doDiag as bs me (tail n) (tail w)) firstelt = 1+(head diagBelow) thisdiag = firstelt:(doDiag a b firstelt diagAbove (tail diagBelow)) in thisdiag min3 x y z = -- see L. Allison, Lazy Dynamic-Programming can be Eager -- Inf. Proc. Letters 43(4) pp207-212, Sept' 1992 if x < y then x else min y z -- makes it O(|a|*D(a,b)) -- min x (min y z) -- makes it O(|a|*|b|) -- the fast one does not always evaluate all three values. eachDiag a [] diags = [] eachDiag a (b:bs) (lastDiag:diags) = let nextDiag = head(tail diags) in (oneDiag a bs nextDiag lastDiag):(eachDiag a bs diags) -- which is the diagonal containing the bottom R.H. elt? lab = (length a) - (length b) in last( if lab == 0 then mainDiag else if lab > 0 then lowers !! ( lab-1) else uppers !! (-lab-1) ) -- module under Gnu `copyleft' GPL General Public Licence.
The code above calculates the value of the edit distance between the sequences a and b only. The "matrix" does contain enough information to recover an alignment of a and b that achieves this value, but this is left as an exercise.
-
- -- Evaluating O(n×d) entries --