Longest common subsequence

From Wikipedia, the free encyclopedia

Algorithmic problem on pairs of sequences

Not to be confused with longest common substring.

Comparison of two revisions of an example file, based on their longest common subsequence (black)

A longest common subsequence (LCS) is the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The problem of computing longest common subsequences is a classic computer science problem. Because the problem for two sequences can be solved in polynomial time by an efficient algorithm, it is used to compare data and merge changes to files in programs such as the diff utility and revision control systems such as Git. It has similar applications in computational linguistics and bioinformatics.

For example, consider the sequences (ABCD) and (ACBAD). They have five length-2 common subsequences: (AB), (AC), (AD), (BD), and (CD); two length-3 common subsequences: (ABD) and (ACD); and no longer common subsequences. So (ABD) and (ACD) are their longest common subsequences.
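The counts in this example can be checked by brute force. The following is a minimal sketch, not an efficient method: it enumerates every subsequence of both inputs, which takes exponential time, and the function names `subsequences` and `common_subsequences_by_length` are chosen here purely for illustration.

```python
from itertools import combinations


def subsequences(seq):
    """Return the set of all non-empty subsequences of seq.

    Exponential in len(seq); only suitable for tiny inputs.
    """
    return {
        "".join(seq[i] for i in idx)
        for r in range(1, len(seq) + 1)
        for idx in combinations(range(len(seq)), r)
    }


def common_subsequences_by_length(a, b):
    """Group the common subsequences of a and b by their length."""
    by_length = {}
    for s in subsequences(a) & subsequences(b):
        by_length.setdefault(len(s), set()).add(s)
    return by_length


groups = common_subsequences_by_length("ABCD", "ACBAD")
longest = max(groups)
print(sorted(groups[2]))        # ['AB', 'AC', 'AD', 'BD', 'CD']
print(sorted(groups[longest]))  # ['ABD', 'ACD']
```

Because `combinations` yields index tuples in increasing order, each joined string keeps the original relative order of its elements, which is exactly the subsequence condition.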

	ε	A	G	C	A	T
ε	ε	ε	ε	ε	ε	ε
G	ε	$\overset{\uparrow}{\leftarrow}$ ε	$\nwarrow$ (G)	$\leftarrow$ (G)	$\leftarrow$ (G)	$\leftarrow$ (G)
A	ε
C	ε

	A	G	C	A	T
ε	0	0	0	0	0
G	$\overset{\uparrow}{\leftarrow}$ 0	$\nwarrow$ 1	$\leftarrow$ 1	$\leftarrow$ 1	$\leftarrow$ 1
A	$\nwarrow$ 1	$\overset{\uparrow}{\leftarrow}$ 1	$\overset{\uparrow}{\leftarrow}$ 1	$\nwarrow$ 2	$\leftarrow$ 2
C	$\uparrow$ 1	$\overset{\uparrow}{\leftarrow}$ 1	$\nwarrow$ 2	$\overset{\uparrow}{\leftarrow}$ 2	$\overset{\uparrow}{\leftarrow}$ 2
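The lengths and arrows in the table above can be reproduced with a short dynamic programming routine. This is a sketch under an assumed reading of the arrows: a diagonal arrow marks a match, a left or up arrow points to the larger neighbouring cell, and both arrows are shown on a tie. The function name `lcs_table_with_arrows` is illustrative only.

```python
def lcs_table_with_arrows(rows, cols):
    """Fill the LCS length table and record which neighbour each cell came from."""
    n, m = len(rows), len(cols)
    # Row 0 and column 0 stand for the empty prefix (ε) and stay at 0.
    length = [[0] * (m + 1) for _ in range(n + 1)]
    arrow = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if rows[i - 1] == cols[j - 1]:
                # Matching elements extend the LCS of the two shorter prefixes.
                length[i][j] = length[i - 1][j - 1] + 1
                arrow[i][j] = "↖"
            else:
                left, up = length[i][j - 1], length[i - 1][j]
                length[i][j] = max(left, up)
                arrow[i][j] = "←" if left > up else "↑" if up > left else "←↑"
    return length, arrow


length, arrow = lcs_table_with_arrows("GAC", "AGCAT")
for i, ch in enumerate("GAC", start=1):
    print(ch, "  ".join(f"{arrow[i][j]} {length[i][j]}" for j in range(1, 6)))
```

The bottom-right cell `length[3][5]` holds the LCS length of the full sequences, here 2.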

		0	1	2	3	4	5	6	7
		ε	M	Z	J	A	W	X	U
0	ε	0	0	0	0	0	0	0	0
1	X	0	0	0	0	0	0	1	1
2	M	0	1	1	1	1	1	1	1
3	J	0	1	1	2	2	2	2	2
4	Y	0	1	1	2	2	2	2	2
5	A	0	1	1	2	3	3	3	3
6	U	0	1	1	2	3	3	3	4
7	Z	0	1	2	2	3	3	3	4
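The table above, for the sequences XMJYAUZ (rows) and MZJAWXU (columns), can be recomputed and then walked backwards from the bottom-right cell to read out one LCS. The sketch below assumes a simple tie-breaking rule (prefer moving up when the upper and left cells are equal); other rules may return a different LCS of the same length when several exist. The helper names `lcs_lengths` and `read_one_lcs` are hypothetical.

```python
def lcs_lengths(x, y):
    """Return the (len(x)+1) by (len(y)+1) table of LCS lengths of all prefix pairs."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xc in enumerate(x, start=1):
        for j, yc in enumerate(y, start=1):
            if xc == yc:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def read_one_lcs(table, x, y):
    """Trace back from the bottom-right cell and collect one LCS."""
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # matched element belongs to the LCS
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1                # assumed tie-break: move up first
        else:
            j -= 1
    return "".join(reversed(out))


x, y = "XMJYAUZ", "MZJAWXU"
table = lcs_lengths(x, y)
print(table[-1][-1])              # 4
print(read_one_lcs(table, x, y))  # MJAU
```

Filling the table takes time and space proportional to the product of the two lengths; the traceback itself visits at most `len(x) + len(y)` cells.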
