



This document covers the concept of edit distance and its use in measuring the similarity between two text strings. It describes the five transformation operations and their associated costs, which are used to find a minimum-cost sequence of operations that transforms one string into another. It also discusses applications of edit distance to spelling correction and to characterizing the similarity of DNA or protein sequences.
Programming Assignment: Edit Distance

Target: Many word processors and keyword search engines have a spelling-correction feature. If you type in a misspelled word x, the word processor or search engine can suggest a correction y. The correction y should be a word that is close to x. One way to measure the similarity in spelling between two text strings is by "edit distance". The notion of edit distance is useful in other fields as well. For example, biologists use edit distance to characterize the similarity of DNA or protein sequences.

Background: The edit distance d(x, y) of two strings of text, x[1..m] and y[1..n], is defined to be the minimum possible cost of a sequence of "transformation operations" (defined below) that transforms string x[1..m] into string y[1..n]. To define the effect of the transformation operations, we use an auxiliary string z[1..s] that holds the intermediate results. At the beginning of the transformation sequence, s = m and z[1..s] = x[1..m] (i.e., we start with string x[1..m]). At the end of the transformation sequence, we should have s = n and z[1..s] = y[1..n] (i.e., our goal is to transform x into string y[1..n]). Throughout the transformation, we maintain the current length s of string z, as well as a cursor position i, i.e., an index into string z. The invariant 1 ≤ i ≤ s + 1 holds at all times during the transformation. (Notice that the cursor can move one space beyond the end of the string z in order to allow insertion at the end of the string.) Each transformation operation may alter the string z, the size s, and the cursor position i. Each transformation operation also has an associated cost. The cost of a sequence of transformation operations is the sum of the costs of the individual operations in the sequence. The goal of the edit-distance problem is to find a sequence of transformation operations of minimum cost that transforms x[1..m] into y[1..n].
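The cursor model above can be simulated directly, which is handy for checking that a proposed operation sequence really turns x into y and for recomputing its total cost. The Python sketch below keeps z as a list, keeps the cursor i 1-based as in the text, and checks the invariant 1 ≤ i ≤ s + 1 after every step. The operation semantics and costs follow the assignment's operation table; the function name apply_ops and the (name, character) encoding of operations are hypothetical, and charging an operation's full cost even when it "does nothing" is an assumption.

```python
from __future__ import annotations

# Costs from the assignment's operation table.
COSTS = {"left": 0, "right": 0, "replace": 4, "delete": 2, "insert": 3}


def apply_ops(x: str, ops: list[tuple[str, str | None]]) -> tuple[str, int]:
    """Apply (name, char) operations to x; return the final z and total cost.

    Assumption: an operation is charged its cost even when it "does nothing".
    """
    z = list(x)  # Python index k holds the text's z[k + 1]
    i = 1        # 1-based cursor, as in the text
    total = 0
    for name, c in ops:
        s = len(z)
        if name == "left":
            if i > 1:
                i -= 1
        elif name == "right":
            if i < s + 1:
                i += 1
        elif name == "replace":
            if i < s + 1:
                z[i - 1] = c    # overwrite the character under the cursor
                i += 1
        elif name == "delete":
            if i < s + 1:
                del z[i - 1]    # later characters shift left; i is unchanged
        elif name == "insert":
            z.insert(i - 1, c)  # c lands at position i; the rest shift right
            i += 1
        else:
            raise ValueError(f"unknown operation: {name!r}")
        total += COSTS[name]
        # Invariant from the text: 1 <= i <= s + 1.
        assert 1 <= i <= len(z) + 1
    return "".join(z), total


if __name__ == "__main__":
    print(apply_ops("cat", [("right", None), ("replace", "u"), ("right", None)]))
    # prints ('cut', 4)
```

Here "right" plays the role of keeping a matching character at no cost, while "replace" pays 4 to overwrite it.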
There are five transformation operations:

left (cost 0): If i = 1, do nothing. Otherwise, set i ← i − 1.
right (cost 0): If i = s + 1, do nothing. Otherwise, set i ← i + 1.
replace (cost 4): If i = s + 1, do nothing. Otherwise, replace the character under the cursor by another character c by setting z[i] ← c, and then incrementing i.
delete (cost 2): If i = s + 1, do nothing. Otherwise, delete the character under the cursor by setting z[i..s−1] ← z[i+1..s] and then decrementing s. The cursor position i does not change.
insert (cost 3): Insert the character c into string z by incrementing s, setting z[i+1..s] ← z[i..s−1], setting z[i] ← c, and then incrementing i.

As an example, one way to transform the source string "algorithm" to the target string
exhibits overlapping subproblems.

(e) Describe a dynamic-programming algorithm that computes the edit distance from x[1..m] to y[1..n]. (Do not use a memoized recursive algorithm. Your algorithm should be a classical, bottom-up, tabular algorithm.) Analyze the running time and space requirements of your algorithm.
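One possible bottom-up tabulation is sketched below in Python. It assumes, as is standard for this kind of cost model, that an optimal sequence can process x from left to right, so only right (keep a matching character), replace, delete, and insert are needed and left never is. With d[i][j] denoting the minimum cost of turning x[1..i] into y[1..j], the table is filled by d[i][0] = 2i, d[0][j] = 3j, and d[i][j] = min(d[i−1][j−1] + (0 if x[i] = y[j], else 4), d[i−1][j] + 2, d[i][j−1] + 3). Filling the (m + 1) × (n + 1) table takes Θ(mn) time and space, and the traceback that recovers one optimal sequence adds O(m + n) time. The names edit_distance and operations are illustrative, and the printed format is not claimed to match Table 1.

```python
from __future__ import annotations

# Costs from the assignment's operation table ("left" is never needed here).
COSTS = {"right": 0, "replace": 4, "delete": 2, "insert": 3}


def edit_distance(x: str, y: str) -> list[list[int]]:
    """Return table d where d[i][j] is the min cost to turn x[:i] into y[:j]."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # delete every character of x[:i]
        d[i][0] = i * COSTS["delete"]
    for j in range(1, n + 1):  # insert every character of y[:j]
        d[0][j] = j * COSTS["insert"]
    for i in range(1, m + 1):  # row by row, so every dependency is ready
        for j in range(1, n + 1):
            diag = 0 if x[i - 1] == y[j - 1] else COSTS["replace"]
            d[i][j] = min(
                d[i - 1][j - 1] + diag,         # right (match) or replace
                d[i - 1][j] + COSTS["delete"],  # delete x[i-1]
                d[i][j - 1] + COSTS["insert"],  # insert y[j-1]
            )
    return d


def operations(x: str, y: str, d: list[list[int]]) -> list[tuple[str, str | None]]:
    """Walk back from d[m][n] and return one optimal operation sequence."""
    ops: list[tuple[str, str | None]] = []
    i, j = len(x), len(y)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and x[i - 1] == y[j - 1] and d[i][j] == d[i - 1][j - 1]:
            ops.append(("right", None))  # keep x[i-1]; cursor moves past it
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + COSTS["replace"]:
            ops.append(("replace", y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + COSTS["delete"]:
            ops.append(("delete", None))
            i -= 1
        else:  # the minimum must have come from an insertion
            ops.append(("insert", y[j - 1]))
            j -= 1
    ops.reverse()  # the traceback runs right to left
    return ops


if __name__ == "__main__":
    x, y = "electrical engineering", "computer science"
    d = edit_distance(x, y)
    for name, c in operations(x, y, d):
        print(name if c is None else f"{name} {c!r}")
    print("edit distance:", d[len(x)][len(y)])
```

If only the distance were needed, keeping two rows of the table would reduce the space to O(n); the full table is kept here so that the operation sequence can be recovered.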
(f) Implement your algorithm as a computer program in any language you wish. Your program should calculate the edit distance d(x, y) between two strings x and y using dynamic programming and print out the corresponding sequence of transformation operations in the style of Table 1. Run your program on the strings x = "electrical engineering" and y = "computer science". Sample input and output text is provided in the data set to help you debug your program. These solutions are not necessarily unique: there may be other sequences of transformation operations that achieve the same cost. As usual, you may collaborate to solve this problem, but you must write the program by yourself.

(g) Run your program on the three input files provided. Each input file contains the following four lines: