Computational Biology: Family Pairwise Search and Cobbling, Slides of Computational Biology

These slides cover methods for identifying previously unrecognized members of a protein family. The approaches include model-based methods (motif-based and hidden Markov model-based) and non-model-based methods such as family pairwise search (FPS) and cobbling. The slides also compare the performance of these methods using several evaluation metrics.

Typology: Slides

2010/2011

Uploaded on 11/02/2011

Computational Biology, Part C
Family Pairwise Search and Cobbling
Robert F. Murphy

Overall Goals

• Find previously unrecognized members of a family

• Develop a model of a family

PSSMs

• Motifs can be summarized and searched for using Position-Specific Scoring Matrices (PSSMs)

• A PSSM is calculated from a multiple alignment of a conserved region across members of a family
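
As a rough illustration of the two points above, the following sketch builds a log-odds PSSM from a small ungapped alignment and slides it along a sequence. The uniform background frequency, the pseudocount of 1, and the function names `build_pssm`, `score_window`, and `best_hit` are assumptions made for this example; they are not taken from the slides.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length,
    ungapped aligned sequences."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies
    total = len(alignment) + pseudocount * len(ALPHABET)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score_window(pssm, window):
    """Sum the column scores for one window the same width as the PSSM."""
    return sum(col[aa] for col, aa in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring window in the sequence.
    Assumes the sequence is at least as long as the PSSM."""
    width = len(pssm)
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDE", "ACDF"]  # made-up conserved region
    pssm = build_pssm(toy_alignment)
    print(best_hit(pssm, "GGGACDEGGG"))
```

A real search would typically use observed background frequencies and a more careful pseudocount scheme; the sketch only shows the shape of the calculation.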

Learning PSSMs

• Unsupervised learning methods can be used to find motifs in unaligned sequences

• Best characterized algorithm is MEME
  - T.L. Bailey & C. Elkan (1995) Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization. Machine Learning J. 21:51-

Cobbling

• Pick “most representative” protein sequence from a family

• Convert it to a profile by replacing each amino acid with the corresponding column from a similarity matrix

Cobbling

• For each recognized “motif” in the family, replace the corresponding section of the profile with the profile of the motif
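
The two cobbling steps can be sketched as follows: first turn the representative sequence into a profile, then overwrite the motif regions with motif profiles. The match/mismatch scoring in `toy_similarity` is a hypothetical stand-in for a real similarity matrix, and the names `sequence_to_profile` and `cobble` are illustrative only.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def toy_similarity(a, b):
    # Hypothetical stand-in for a real amino acid similarity matrix.
    return 2 if a == b else -1


def sequence_to_profile(seq):
    """Replace each residue with its column from the similarity matrix."""
    return [{aa: toy_similarity(res, aa) for aa in ALPHABET} for res in seq]


def cobble(seq, motifs):
    """Embed motif profiles into the profile of a representative sequence.

    motifs: list of (start, motif_profile) pairs, where motif_profile is a
    list of per-column score dicts and start is its 0-based position in seq.
    """
    profile = sequence_to_profile(seq)
    for start, motif_profile in motifs:
        end = start + len(motif_profile)
        if start < 0 or end > len(profile):
            raise ValueError("motif does not fit inside the sequence")
        # Overwrite the matching section with copies of the motif columns.
        profile[start:end] = [dict(col) for col in motif_profile]
    return profile


if __name__ == "__main__":
    # Made-up two-column motif that strongly prefers tryptophan (W).
    motif = [{aa: (5 if aa == "W" else 0) for aa in ALPHABET} for _ in range(2)]
    cobbled = cobble("ACDEF", [(1, motif)])
    print(len(cobbled), cobbled[1]["W"], cobbled[3]["E"])
```

The resulting profile has one column per residue of the representative sequence, so it can be searched against a database in the same way as any other profile.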

Cobbler Illustration

[Figure: Cobbler illustration. Labels: scores from profiles of conserved motifs; similarity scores for sequence from “most representative” family member; sequence of “most representative” family member.]

Family Pairwise Search

• For all known members of a family, calculate a pairwise similarity score against each sequence in the database (using BLAST), then sum those scores for each database sequence
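
A minimal sketch of the summation step, assuming the pairwise scores are already available from some scoring function. Here `toy_pairwise_score` counts shared 3-mers and is only a hypothetical stand-in for a BLAST score; `family_pairwise_search` is an illustrative name.

```python
def toy_pairwise_score(query, target, k=3):
    """Hypothetical stand-in for a BLAST score: number of shared k-mers."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    t = {target[i:i + k] for i in range(len(target) - k + 1)}
    return len(q & t)


def family_pairwise_search(family, database, score=toy_pairwise_score):
    """Sum each database sequence's score against every family member.

    family: list of known member sequences.
    database: dict mapping sequence id -> sequence.
    Returns (sequence id, summed score) pairs, best first.
    """
    totals = {
        seq_id: sum(score(member, seq) for member in family)
        for seq_id, seq in database.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    family = ["ACDEFGH", "ACDEFGK"]  # made-up family members
    database = {"hit": "MMACDEFGMM", "miss": "WWWWWWWWW"}
    print(family_pairwise_search(family, database))
```

Because every family member contributes to the total, a database sequence that resembles several members ranks above one that resembles only a single member to the same degree.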

Which method is best?

• Compare BLAST using a randomly chosen family member, BLAST FPS, MEME, HMMER

• W.N. Grundy (1998) Homology Detection via Family Pairwise Search. J. Comput. Biol. 5:479-

Comparison Protocol

• For each method
  - For each known protein family
    - Train with family members
    - Search database for matches
    - Rank by score from search
    - Determine how many known family members are ranked highly
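
The protocol above is a pair of nested loops, which might be sketched as follows. Each method is modelled as a function that bundles training, searching, and ranking; that interface, the `top_n` cutoff used for "ranked highly", and the names `members_in_top` and `compare_methods` are assumptions for this example.

```python
def members_in_top(ranked_ids, true_members, top_n):
    """Count known family members among the top_n ranked database entries."""
    return sum(1 for seq_id in ranked_ids[:top_n] if seq_id in true_members)


def compare_methods(methods, families, database, top_n=10):
    """Run every method on every family and record how many members rank highly.

    methods: dict name -> function(training_seqs, database) returning
             database ids ranked from best to worst (hypothetical interface).
    families: dict name -> (training_seqs, set of true member ids).
    """
    results = {}
    for method_name, search in methods.items():
        for family_name, (training, members) in families.items():
            ranked = search(training, database)  # train, search, and rank
            results[(method_name, family_name)] = members_in_top(
                ranked, members, top_n
            )
    return results
```

The returned dictionary is keyed by (method, family), so per-family results for two methods can be compared directly.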

Comparison Protocol

• Caution!
  - A true positive is defined as a sequence listed as a member of the family in the PROSITE compilation
  - Some false positives could be actual family members that were missed during PROSITE compilation! (This should be a minor effect.)

Results

[Figure: comparison results. Recoverable labels: BLAST FPS, BLAST, HMMER, MAST.]

Which is best (part 2)?

• Compare BLAST, BLAST FPS, cobbled BLAST, cobbled BLAST FPS

• W.N. Grundy and T.L. Bailey (1999) Family pairwise search with embedded motif models. Bioinformatics 15:463-

Comparison Protocol

• Evaluation metric: rank sum
  - Calculate the difference in ROC50 for the two methods for a given family
  - Sort the families by the absolute value of the difference
  - Sum the ranks of the families for which one method is better than the other
  - Bigger is better!
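
The metric can be sketched in two parts. `roc_n` computes a normalized ROC50-style score (area under the ROC curve up to the first n false positives), and `signed_rank_sums` follows the rank-sum steps listed above. Three choices are assumptions of this sketch and not stated on the slides: false positives missing from a short list are treated as ranked last, families with zero difference are dropped, and ties in absolute difference are not given averaged ranks.

```python
def roc_n(labels, n=50):
    """Normalized area under the ROC curve up to the first n false positives.

    labels: booleans in ranked order, True for a known family member.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    tp = fp = area = 0
    for is_member in labels:
        if is_member:
            tp += 1
        else:
            fp += 1
            area += tp  # true positives ranked ahead of this false positive
            if fp == n:
                break
    # Assumption: if fewer than n false positives were seen, the remaining
    # ones are treated as ranked after everything in the list.
    area += tp * (n - fp)
    return area / (n * total_pos)


def signed_rank_sums(scores_a, scores_b):
    """Return (rank sum where A is better, rank sum where B is better).

    scores_a, scores_b: dicts mapping family name -> ROC50 for each method.
    """
    diffs = [scores_a[fam] - scores_b[fam] for fam in scores_a]
    diffs = [d for d in diffs if d != 0]  # assumption: drop exact ties
    diffs.sort(key=abs)  # rank 1 = smallest absolute difference
    sum_a = sum(rank for rank, d in enumerate(diffs, start=1) if d > 0)
    sum_b = sum(rank for rank, d in enumerate(diffs, start=1) if d < 0)
    return sum_a, sum_b


if __name__ == "__main__":
    a = {"f1": 0.9, "f2": 0.5, "f3": 0.7}  # made-up ROC50 values
    b = {"f1": 0.6, "f2": 0.6, "f3": 0.2}
    print(signed_rank_sums(a, b))
```

A method that wins by large margins on many families accumulates the high ranks, which is why the larger rank sum indicates the better method.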