














Methods for identifying previously unrecognized members of a protein family in computational biology. The approaches include model-based methods, such as motif-based and hidden Markov model (HMM)-based searches, and non-model-based methods, such as Family Pairwise Search (FPS) and cobbling. The document also compares the performance of these methods using several evaluation metrics.
Robert F. Murphy
Find previously unrecognized members of a family
Develop a model of a family
Motifs can be summarized and searched for using Position-Specific Scoring Matrices
Calculated from a multiple alignment of a conserved region for members of a family
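A minimal sketch of how a Position-Specific Scoring Matrix could be derived from an ungapped multiple alignment and then used to scan a sequence. The function names (`build_pssm`, `best_match`), the uniform background frequencies, and the pseudocount of 1 are assumptions made for illustration; they are not taken from the slides.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0, background=None):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumes an ungapped alignment over the 20 standard residues.
    """
    if background is None:
        # Assumption: uniform background; real tools use observed frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])
        pssm.append(column)
    return pssm

def best_match(pssm, sequence):
    """Slide the PSSM along the sequence; return (best score, start offset)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start

# Toy alignment of a three-residue conserved region (made-up data).
toy_pssm = build_pssm(["GKS", "GKT", "GKS", "GRS"])
toy_score, toy_start = best_match(toy_pssm, "AAAGKSAAA")
```

Here `toy_start` is the offset of the window that best matches the conserved region; a database search would apply `best_match` to every sequence and rank by score.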
Unsupervised learning methods can be used to find motifs in unaligned sequences
The best-characterized algorithm is MEME: T.L. Bailey & C. Elkan (1995) Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization. Machine Learning J. 21:51-
Pick “most representative” protein sequence from a family
Convert it to a profile by replacing each amino acid by the corresponding column from a similarity matrix
For each recognized “motif” in the family, replace the corresponding section of the profile with the profile of the motif
[Figure: cobbled profile. Labels: sequence of “most representative” family member; similarity scores for the sequence from the “most representative” family member; scores from profiles of conserved motifs]
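The cobbling steps can be sketched as follows. This is an illustration under stated assumptions: the toy four-letter alphabet and `TOY_SIMILARITY` matrix stand in for a real amino acid similarity matrix, the motif offsets are supplied by hand rather than found by a motif tool, and the function names are hypothetical.

```python
# Assumption: a toy alphabet and similarity matrix (match = 2, mismatch = -1).
ALPHABET = "ACDG"
TOY_SIMILARITY = {
    a: {b: (2 if a == b else -1) for b in ALPHABET} for a in ALPHABET
}

def sequence_to_profile(sequence, similarity):
    """Replace each residue by its column of similarity scores."""
    return [dict(similarity[residue]) for residue in sequence]

def cobble(sequence, similarity, motifs):
    """Build a cobbled profile.

    motifs is a list of (start, motif_profile) pairs, where start is the
    0-based offset of the motif in the sequence and motif_profile is a
    list of per-column {residue: score} dicts.
    """
    profile = sequence_to_profile(sequence, similarity)
    for start, motif_profile in motifs:
        end = start + len(motif_profile)
        if start < 0 or end > len(profile):
            raise ValueError("motif does not fit inside the sequence")
        # Overwrite the generic similarity columns with the motif's columns.
        profile[start:end] = [dict(column) for column in motif_profile]
    return profile

# One-column motif profile placed at offset 1 (made-up scores).
toy_motif = [{"A": 0, "C": 5, "D": 0, "G": 0}]
cobbled = cobble("ACDG", TOY_SIMILARITY, [(1, toy_motif)])
```

Positions outside the motif keep the scores from the similarity matrix, while the motif region carries the family-specific scores.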
For all known members of a family, calculate the pairwise similarity score against each sequence in the database (using BLAST), then sum those scores for each database sequence
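A small sketch of the summing step. The `shared_kmer_score` function is a toy stand-in for a BLAST score, used only so that the example runs on its own; all names and sequences here are hypothetical.

```python
def shared_kmer_score(a, b, k=3):
    """Toy stand-in for a BLAST score: number of distinct shared k-mers."""
    def kmers(s):
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return len(kmers(a) & kmers(b))

def family_pairwise_scores(family, database, pairwise_score):
    """Return {database id: sum of scores against every family member}."""
    return {
        db_id: sum(pairwise_score(member, db_seq) for member in family)
        for db_id, db_seq in database.items()
    }

def rank_database(totals):
    """Database ids sorted from highest to lowest summed score."""
    return sorted(totals, key=totals.get, reverse=True)

# Made-up family and database.
toy_family = ["ACDEFG", "CDEFGH"]
toy_database = {"hit": "XXCDEFXX", "miss": "WWWWWWWW"}
toy_totals = family_pairwise_scores(toy_family, toy_database, shared_kmer_score)
toy_ranking = rank_database(toy_totals)
```

Because every family member contributes to the total, a database sequence that resembles several members outranks one that resembles only a single member equally well.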
Compare BLAST using a randomly chosen family member, BLAST FPS, MEME, HMMER
W.N. Grundy (1998) Homology Detection via Family Pairwise Search. J. Comput. Biol. 5:479-
For each method
    For each known protein family
        Train with family members
        Search database for matches
        Rank by score from search
        Determine how many known family members are ranked highly
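One way to quantify "ranked highly" is a truncated ROC score such as the ROC50 used later in the evaluation metric. The sketch below assumes the common definition (area under the ROC curve up to the first n false positives, normalized to lie between 0 and 1). The padding rule for lists with fewer than n false positives is an assumption added so that short toy lists work.

```python
def roc_n(ranked_labels, n=50):
    """Truncated ROC score for a ranked result list.

    ranked_labels: booleans ordered best score first; True marks a known
    family member, False marks any other database sequence.
    """
    labels = list(ranked_labels)
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one known family member")
    true_seen = 0
    false_seen = 0
    area = 0
    for is_member in labels:
        if is_member:
            true_seen += 1
        else:
            false_seen += 1
            # Each false positive adds the true positives ranked above it.
            area += true_seen
            if false_seen == n:
                break
    # Assumption: if the list ran out before n false positives, treat the
    # missing false positives as ranked below everything seen so far.
    area += (n - false_seen) * true_seen
    return area / (n * total_positives)

perfect = roc_n([True, True, False, False], n=2)
worst = roc_n([False, False, True, True], n=2)
mixed = roc_n([True, False, True, False], n=2)
```

A score of 1 means every known family member was ranked above the first n false positives; a score of 0 means none were.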
Caution!
A true positive is defined as a sequence listed as a member of the family in the PROSITE compilation.
Some false positives could be actual family members that were missed during PROSITE compilation! (Should be a minor effect.)
[Figure: method comparison plots; surviving labels: BLAST FPS, BLAST, HMMER, MAST]
Compare BLAST, BLAST FPS, cobbled BLAST, cobbled BLAST FPS
W.N. Grundy and T.L. Bailey (1999) Family pairwise search with embedded motif models. Bioinformatics 15:463-
Evaluation metric
Rank sum
    Calculate the difference in ROC50 between the two methods for a given family
    Sort the families by the absolute value of the difference
    Sum the ranks of the families for which one method is better than the other
Bigger is better!
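The rank sum procedure can be sketched as below. The slide does not give the sort direction or the handling of ties, so this sketch assumes the conventions of the Wilcoxon signed-rank test: rank 1 goes to the smallest absolute difference, tied differences share the average rank, and families with zero difference are dropped. The function name and the ROC50 values are made up for illustration.

```python
def signed_rank_sums(roc_a, roc_b):
    """Return (rank sum where method A is better, rank sum where B is better).

    roc_a and roc_b map family name to that method's ROC50 score.
    """
    diffs = {fam: roc_a[fam] - roc_b[fam] for fam in roc_a if fam in roc_b}
    # Assumption: ascending sort, zero differences dropped.
    nonzero = sorted((abs(d), fam) for fam, d in diffs.items() if d != 0)
    ranks = {}
    i = 0
    while i < len(nonzero):
        j = i
        while j < len(nonzero) and nonzero[j][0] == nonzero[i][0]:
            j += 1
        # Tied entries occupy ranks i+1 .. j; give each the average rank.
        average_rank = (i + 1 + j) / 2
        for _, fam in nonzero[i:j]:
            ranks[fam] = average_rank
        i = j
    sum_a = sum(rank for fam, rank in ranks.items() if diffs[fam] > 0)
    sum_b = sum(rank for fam, rank in ranks.items() if diffs[fam] < 0)
    return sum_a, sum_b

# Made-up ROC50 scores for four families under two methods.
toy_a = {"f1": 0.75, "f2": 0.5, "f3": 0.5, "f4": 1.0}
toy_b = {"f1": 0.5, "f2": 0.625, "f3": 0.5, "f4": 0.875}
toy_sums = signed_rank_sums(toy_a, toy_b)
```

The method with the larger rank sum wins on more families, or wins by larger margins, than the other.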