Longest Common Subsequent DAA , Study notes of Design and Analysis of Algorithms

Typology: Study notes

2016/2017

Uploaded on 12/31/2017

gaurav-sharma-7
Longest Common Subsequence

• Definition: The longest common subsequence (LCS) of two strings S1 and S2 is the longest subsequence common to both strings.

S1: A  -- A  T  -- G  G  C  C  -- A  T  A     (n = 10)
S2: A  T  A  T  A  A  T  T  C  T  A  T  --    (m = 12)

An LCS is AATCAT. The length of the LCS is 6.
The solution is not necessarily unique. Consider the pair (ATTA, ATAT): the solutions are ATT and ATA. In general, an arbitrary pair of strings may have many solutions.

LCS Theorem

• The LCS can be found by a dynamic programming formulation. One can easily show:
  ◦ Theorem: With a score of 1 for each match and 0 for each mismatch or space, the matched characters in an alignment of maximum value form an LCS.
• Since this uses the general dynamic programming algorithm, its complexity is O(nm).
• The longest common substring problem, on the other hand, has an O(n+m) solution. Subsequences are much more complex than substrings.
• Can we do better for the LCS problem? We will see ...
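
The O(nm) dynamic programming formulation mentioned above can be sketched as follows. This is a minimal illustration, not code from the slides; the function name `lcs_length` and the table layout are assumptions chosen for the example.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of a longest common subsequence via an O(nm) table."""
    n, m = len(s1), len(s2)
    # table[i][j] = LCS length of the prefixes s1[:i] and s2[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                # A match scores 1 and extends the diagonal neighbour.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # A mismatch or space scores 0: keep the better neighbour.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


print(lcs_length("AATGGCCATA", "ATATAATTCTAT"))  # 6
print(lcs_length("ATTA", "ATAT"))                # 3
```

The two calls use the example strings from the slide; both nested loops run over every cell of the table, which is where the O(nm) bound comes from.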

S1: A  -- A  T  -- G  G  C  C  -- A  T  A     (n = 10)
S2: A  T  A  T  A  A  T  T  C  T  A  T  --    (m = 12)

• The optimal alignment is shown above. It contains three insert, one delete, and three substitution (replacement) operations, which gives an edit distance of 7.
• However, the 3 replacement operations can be realized by 3 insert and 3 delete operations, because a replacement is equivalent to first deleting the character and then inserting a character in its place:

S1: G  -- G  -- C  --
S2: -- A  -- T  -- T

• If we give a cost of 2 to the replace operation and a cost of 1 to both the insert and delete operations, the minimum edit distance D can be computed in terms of the length L of the LCS as:

D = m + n − 2L

• For the above example, n = 10, m = 12, L = 6. So D = 10 (6 inserts and 4 deletes).
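
The relation D = m + n − 2L can be checked numerically. The sketch below computes the edit distance directly with the stated costs (insert = delete = 1, replace = 2); the function name `weighted_edit_distance` is a hypothetical helper introduced only for this check.

```python
def weighted_edit_distance(s1: str, s2: str) -> int:
    """Edit distance with insert = delete = 1 and replace = 2."""
    n, m = len(s1), len(s2)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all of s1[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all of s2[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            replace_cost = 0 if s1[i - 1] == s2[j - 1] else 2
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # delete s1[i-1]
                dist[i][j - 1] + 1,                 # insert s2[j-1]
                dist[i - 1][j - 1] + replace_cost,  # match or replace
            )
    return dist[n][m]


s1, s2 = "AATGGCCATA", "ATATAATTCTAT"
print(weighted_edit_distance(s1, s2))  # 10, which equals 12 + 10 - 2 * 6
```

With these costs a replacement is never cheaper than a delete followed by an insert, which is why the distance depends only on n, m, and L.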

A Faster Algorithm for LCS

• An algorithm that is asymptotically better than O(nm) for determining the LCS.
• This implies that for special cases of edit distance, there exist more efficient algorithms.
• Definition:
  ◦ Let π be a list of n integers, not necessarily distinct.
• Definition:
  ◦ An increasing subsequence (IS) of π is a subsequence of π whose values are strictly increasing from left to right.
• Example: π = (5,3,4,4,9,6,2,1,8,7,10). IS = (3,4,6,8,10), (5,9,10)

• Definition:
  ◦ A longest increasing subsequence (LIS) of π is an IS of π of maximum length.
• Definition:
  ◦ A decreasing subsequence (DS) of π is a non-increasing subsequence of π.
• Example: DS = (5,4,4,2,1).

• Definition:
  ◦ A cover is a set of disjoint DS of π that together contain all elements of π. The size c of the cover equals the number of DS in the cover.
• Example: π = (5,3,4,9,6,2,1,8,7). Cover: {(5,3,2,1), (4), (9,6), (8,7)}. c = number of DS = 4.
• Definition:
  ◦ A smallest cover (SC) is a cover with a minimum value of c.

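Determine LIS and SC simultaneously in O(n log n)

• Lemma:
  ◦ If I is an IS of π with length equal to the size of a cover C of π, then I is an LIS of π and C is a smallest cover of π. (An IS can contain at most one element from each DS, so no IS can be longer than the size of any cover.)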

Example

• Greedy cover of π = (5,3,4,9,6,2,1,8,7,10): D1 = (5,3,2,1), D2 = (4), D3 = (9,6), D4 = (8,7), D5 = (10)
• The straightforward greedy algorithm has O(n²) complexity. We will present an O(n log n) algorithm.
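
The slide with the quadratic algorithm itself is not part of this text. The sketch below assumes it is the usual greedy rule: scan the existing decreasing sequences from left to right and append each number to the first one it can extend. The function name `greedy_cover_naive` is hypothetical.

```python
def greedy_cover_naive(pi):
    """Greedy cover, assuming the leftmost-fit rule described above."""
    cover = []
    for x in pi:
        for seq in cover:
            # A DS is non-increasing, so x fits if it is <= the last value.
            if seq[-1] >= x:
                seq.append(x)
                break
        else:
            # No existing sequence can take x: start a new one.
            cover.append([x])
    return cover


print(greedy_cover_naive([5, 3, 4, 9, 6, 2, 1, 8, 7, 10]))
# [[5, 3, 2, 1], [4], [9, 6], [8, 7], [10]]
```

The inner scan can visit every sequence for every number, which gives the O(n²) worst case.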

An Efficient Algorithm for Constructing the Cover

• We use a data structure which is a list containing the last number of each of the decreasing sequences being constructed.
• The list is always sorted in increasing order. An identifier indicating which sequence each number belongs to is also included.
• Procedure Decreasing Sequence Cover
• Input: π = (x1, x2, ..., xn), the list of input numbers.
• Output: the set of decreasing sequences Di constituting the cover.

O(n log n) Algorithm

• Initialize: i ← 1; Di = (x1); L = ((x1, i)); j ← 1
• For i = 2 to n do
  ◦ Search the x-fields of L to find the first x-value such that xi ≤ x (equal values may extend the same sequence, since a DS is non-increasing). This takes O(log n) time because L is sorted.
  ◦ If such a value exists, with identifier k, then insert xi at the end of the list Dk and set x ← xi in L. This step takes constant time.
  ◦ If such a value does not exist in L, then set j ← j + 1, insert in L a new element (xi, j), and start a new decreasing sequence Dj = (xi).
• End

• Lemma:
  ◦ At any point in the execution of the algorithm, the list L is sorted in increasing order with respect to the x-values as well as with respect to the identifier values.
• In fact, two separate lists are better from a practical implementation point of view.
• Theorem:
  ◦ The greedy cover can be constructed in O(n log n) time. A longest increasing subsequence and a smallest cover can thus be constructed in O(n log n) time.
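
A minimal sketch of the O(n log n) procedure, using binary search from Python's standard `bisect` module. The names `cover_and_lis`, `tails`, `last_index`, and `back` are assumptions made for this example. The back-pointers used to read off an LIS are a common technique and are not spelled out in the slides. Equal values are placed in the same sequence, so the extracted IS is strictly increasing.

```python
from bisect import bisect_left


def cover_and_lis(pi):
    """Return (greedy cover, one longest strictly increasing subsequence)."""
    tails = []       # tails[k]: last value of sequence k, always sorted
    last_index = []  # last_index[k]: position in pi of that last value
    cover = []       # cover[k]: the k-th decreasing sequence
    back = [None] * len(pi)  # back[i]: position of the predecessor of pi[i]
    for i, x in enumerate(pi):
        k = bisect_left(tails, x)  # first sequence whose last value is >= x
        if k == len(tails):
            tails.append(x)
            last_index.append(i)
            cover.append([x])
        else:
            tails[k] = x
            last_index[k] = i
            cover[k].append(x)
        if k > 0:
            # The current end of the previous sequence is smaller and earlier.
            back[i] = last_index[k - 1]
    # Walk the back-pointers from the end of the last sequence.
    lis = []
    i = last_index[-1] if last_index else None
    while i is not None:
        lis.append(pi[i])
        i = back[i]
    lis.reverse()
    return cover, lis


cover, lis = cover_and_lis([5, 3, 4, 9, 6, 2, 1, 8, 7, 10])
print(cover)  # [[5, 3, 2, 1], [4], [9, 6], [8, 7], [10]]
print(lis)    # [3, 4, 6, 7, 10]
```

Here `tails` and `last_index` play the role of the two separate lists; the index k doubles as the sequence identifier because new sequences are always appended at the end. The IS has length 5, equal to the cover size, so by the lemma it is an LIS and the cover is a smallest cover.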

• Definition:
  ◦ For each distinct character x in S1, define list(x) to be the positions of x in S2, in decreasing order.
• Example (for S1 = abacx and S2 = baabca): list(a) = (6,3,2); list(b) = (4,1); list(c) = (5); list(x) = φ (empty sequence).

• Definition: Let Π(S1,S2) be the sequence obtained by concatenating list(si) for i = 1, 2, …, n, where n is the length of S1 and si is the i-th symbol of S1.
• Example: Π(S1,S2) = (6,3,2,4,1,6,3,2,5).

• Theorem:
  ◦ Every increasing subsequence I of Π(S1,S2) specifies an equal-length common subsequence of S1 and S2, and vice versa. Thus a longest common subsequence of S1 and S2 corresponds to a longest increasing subsequence of Π(S1,S2).
• Example: Π(S1,S2) = (6,3,2,4,1,6,3,2,5) for S1 = abacx and S2 = baabca. Longest increasing subsequences, used as indices into S2, yield LCSs such as: (1,2,5) = bac, (2,3,5) = aac, (3,4,6) = aba.
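
The reduction from LCS to LIS can be put together in a few lines. This is a sketch under the definitions above, not the slides' own code: `lcs_via_lis` is a hypothetical name, positions in S2 are 1-based as in the example, and the LIS step uses binary search with back-pointers.

```python
from bisect import bisect_left


def lcs_via_lis(s1: str, s2: str):
    """Return (Pi, increasing index sequence, the LCS it spells in s2)."""
    # list(x): 1-based positions of x in s2, in decreasing order.
    positions = {}
    for j in range(len(s2), 0, -1):
        positions.setdefault(s2[j - 1], []).append(j)
    # Pi(S1, S2): concatenate list(s_i) for each symbol s_i of s1.
    pi = [j for ch in s1 for j in positions.get(ch, [])]

    # Longest strictly increasing subsequence of pi.
    tails, last_index = [], []
    back = [None] * len(pi)
    for i, x in enumerate(pi):
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            last_index.append(i)
        else:
            tails[k] = x
            last_index[k] = i
        if k > 0:
            back[i] = last_index[k - 1]
    indices = []
    i = last_index[-1] if last_index else None
    while i is not None:
        indices.append(pi[i])
        i = back[i]
    indices.reverse()
    # Use the increasing positions as indices into s2.
    return pi, indices, "".join(s2[j - 1] for j in indices)


pi, indices, lcs = lcs_via_lis("abacx", "baabca")
print(pi)       # [6, 3, 2, 4, 1, 6, 3, 2, 5]
print(indices)  # [1, 2, 5]
print(lcs)      # bac
```

Because each list(si) is in decreasing order, a strictly increasing subsequence can pick at most one position per symbol of S1. The running time depends on the length of Π, which can grow to nm when the two strings share many characters.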