











Instructions on how to use computer programs and online databases to analyze amino acid sequences and construct a phylogenetic tree. The example focuses on the evolutionary relationships between T. rex, chicken, rainbow trout, human, dog, cattle, toxodon, mastodon, salamander, and frog. Students will learn how to obtain protein sequences from GenBank, align sequences using MAB, and interpret the resulting phylogenetic tree. This activity helps students understand how animals are related and the importance of studying extinct species and publishing research findings.
[Written by: Baylee Goodwin, Dane Besser, and Stephen Ramsey]
Vocabulary Words: Phylogenetics, Taxa, Node, Speciation event, Most Recent Common Ancestor (MRCA), Descendants, Sister Clades, Outgroup
Background
During the Mass Spectrometry and BLAST activities, you were given amino acid sequences that had been recovered from a fossilized bone specimen from a Tyrannosaurus rex (as well as sequences from a Hadrosaur and a Mastodon). You learned how to input the T. rex amino acid sequence into BLAST to identify which present-day animals are most closely related to T. rex. In this activity you will learn how to use a computer to analyze related amino acid sequences from a variety of animals to gain insight into their evolutionary relationships.
Phylogenetics is the study of evolutionary relationships among a set of taxa, where taxa is another name for groups of organisms, such as plants and animals. In phylogenetics, evolutionary relationships are laid out on a phylogenetic tree (figures 1 and 3). The root of the tree is the start of the evolutionary lineage being depicted. As you move from left to right, you move forward in time. As time passes, you can see where a node splits into two branches. Each split is a speciation event, which is when a group of animals separates and evolves into two new groups of animals. The nodes also mark the most recent common ancestor (MRCA) of the groups that branch from them. For example, in figure 3, A and B are groups of animals that diverged from an MRCA found at the node that joins the two. The leaves at the tips of the tree mark the descendants of those ancestors. Phylogenetic trees are a useful way to compare how animals are related to one another. In figure 3, animals from groups A and B are more closely related to each other than they are to animals in group C. Therefore, A and B are considered sister clades, since they are each other's closest relatives. Group C is considered the outgroup, since it is the most distantly related.
Figure 1: This is an example of a phylogenetic tree showing how to read it.
Figure 2: This is an example of DNA sequences from multiple species lined up together. Species that share mutations that others do not have are more closely related. This is how molecular biology can help determine evolutionary relationships.
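The way a tree like the one in figure 3 is read can be sketched in a few lines of Python. This is only an illustration, assuming the tree is written as nested tuples in which each tuple is a node and each string is a leaf; the names `tree`, `leaves`, and `mrca` are made up for this example and are not part of the activity's software.

```python
# Toy tree for figure 3: A and B are sister clades, C is the outgroup.
# Each tuple is a node (a speciation event); each string is a leaf.
tree = (("A", "B"), "C")


def leaves(node):
    """Return the names of all leaves (descendants) below a node."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def mrca(node, x, y):
    """Return the smallest subtree that contains both leaf x and leaf y.

    The top of that subtree is the node of the most recent common ancestor.
    """
    if isinstance(node, str):
        return None
    # Look for a smaller subtree first, so the deepest match wins.
    for child in node:
        found = mrca(child, x, y)
        if found is not None:
            return found
    names = leaves(node)
    return node if x in names and y in names else None


print(mrca(tree, "A", "B"))  # ('A', 'B')
print(mrca(tree, "A", "C"))  # (('A', 'B'), 'C')
```

The MRCA of A and B is the inner node, while the MRCA of A and C is the root, which is why C is treated as the outgroup.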
There are two main ways to determine the evolutionary history of a set of taxa: morphology and molecular data. Morphology uses the physical features of animals to determine how they evolved, whereas molecular data uses DNA or amino acid sequences to determine where speciation events occurred. Molecular data is typically considered more accurate, since mutations in DNA are the underlying source of evolutionary change.
When mutations arise in the DNA sequence of an organism, they can result in changes to the translated amino acid sequence of a protein. For example, the original DNA sequence in a small portion of a gene might have read ATAAGT, but after the mutation it reads ATAACT (i.e., a G was replaced with a C). If this is the template strand (written 3' to 5'), the corresponding mRNA changes from UAUUCA to UAUUGA, so the second codon changes from serine (UCA) to a stop codon (UGA, signaling the end of the protein). As a result, the cell makes a shortened protein whose function may differ substantially from that of the original full-length protein. When a mutation is present in an organism's reproductive cells, it can be passed on from the organism to its offspring, which is how animals evolve on a molecular scale.
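The effect of a single-base change can be checked with a short Python sketch. It assumes that the six bases in the example are the template strand written 3' to 5', so the coding strand is their complement; under that reading the affected codon is TCA (serine), which becomes TGA (stop). The function names are illustrative only, and the table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def coding_from_template(template_3_to_5):
    """Complement a template strand (3' to 5') to get the coding strand (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def translate(coding_strand):
    """Translate codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        amino_acid = CODON_TABLE[coding_strand[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = coding_from_template("ATAAGT")  # TATTCA
mutated = coding_from_template("ATAACT")   # TATTGA
print(translate(original))  # YS (tyrosine, serine)
print(translate(mutated))   # Y  (translation stops at TGA)
```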
The genetic differences between two species, such as a bird and a lizard, represent the accumulation of billions of mutations over many millions of years. The differences in the DNA (or, as we will study today, protein) sequences among a set of representative species can be used to determine how the species are related. As we will discover, the more closely related organisms will have more similar protein sequences, and the more distantly related organisms will have more dissimilar protein sequences.
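One simple way to put a number on "more similar" is percent identity: the share of aligned positions at which two sequences have the same amino acid. The sketch below assumes the sequences are already aligned to the same length, with "-" marking gaps. The three short sequences are hypothetical toy data, not real collagen sequences, and `percent_identity` is a name chosen for this example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions where the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments from three made-up species.
species_a = "GPPGAPGPQG"
species_b = "GPPGAPGPAG"
species_c = "GPSGEPGPAG"

print(percent_identity(species_a, species_b))  # 90.0
print(percent_identity(species_a, species_c))  # 70.0
```

In this toy case, species A shares more positions with B than with C, so A and B would be grouped together before C.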
To create a phylogenetic tree, the first step is to obtain protein sequence data from the set of animal species that we want to compare. We will be searching for the "alpha-2 type 1 collagen" protein sequence, since that is what scientists were able to extract from the fossilized femur bone of the T. rex. Collagen is evolutionarily rather well-conserved across species, which is why it is a good choice for using amino acid sequences to build a phylogenetic tree. When a protein is "well-conserved", it means that the protein is found, with a similar sequence, in multiple species that are only distantly related. Collagen is a well-conserved protein found in all animals with true bone. To find the collagen sequence, you will conduct a search in an online database called GenBank. The alpha-2 type 1 collagen protein sequences have already been collected for you for most of the animals; however, you still need to collect the appropriate amino acid sequence for the T. rex.
Figure 3: This is an example of a basic phylogenetic tree. It highlights where the most recent common ancestors (MRCA) are found on the tree, and which animal groups evolved from the ancestor.
GenBank should then display a FASTA record page, like this:
"FASTA" (an abbreviation for "Fast-All") is a simple text-based file format that is often used to transmit DNA or amino acid sequences from one computer program to another. In a FASTA file, the DNA nucleotides or protein amino acids are represented by single-letter codes. Each record in a FASTA file begins with a ">" (greater than) character followed by a description, which is then followed by one or more lines of sequence data.
1. This program uses a common method for aligning sequences called MUSCLE (Multiple Sequence Comparison by Log-Expectation). This step is important because the algorithm lines up each peptide sequence so that corresponding positions can be compared, which makes it possible to identify where mutations occurred and, from them, how the animals evolved. If the sequences are not aligned, they cannot be used to generate a phylogenetic tree.
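To give a feel for what "aligning" means, here is a minimal sketch of a much simpler technique than MUSCLE: Needleman-Wunsch global alignment of just two sequences. It is not the algorithm the program uses, and the scoring values (match +1, mismatch -1, gap -1) and the two short peptides are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # pair the two residues
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Hypothetical peptides: the second is missing one residue.
aligned_a, aligned_b, best = needleman_wunsch("GPPGAPG", "GPGAPG")
print(aligned_a)
print(aligned_b)
print(best)
```

The gap character "-" inserted into the shorter peptide keeps the remaining residues lined up column by column, which is what allows position-by-position comparison afterwards.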
Analyzing results
Evaluating results
Trout
MLSFVDNRILLLLAVTSLLASCQSGGLKGPRGAKGPRGDRGPQGPNGRDGKAGLPGIAGPPGPPGLGG
NFAAQFDGGKGSDPGPGPMGLMGSRGPNGPPGAPGPQGFTGHAGEPGEPGQTGSIGARGPTGSAGKP
GEDGNNGRPGKPGDRGGPGTQGARGFPGTPGLPGMKGHRGYNGLDGRKGESGTAGAKGETGAHGA
Dog
MLSFVDTRTLLLLAVTSCLATCQSLQEATARKGPTGDRGPRGERGPPGPPGRDGDDGIPGPPGPPGPPG
PPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPPGASGAPGPQGFQGPAGEPGEPGQTGPAGARGPPG
PPGKAGEDGHPGKPGRPGERGVVGPQGARGFPGTPGLPGFKGIRGHNGLDGLKGQPGAPGVKGEPGA
PGENGTPGQTGARGLPGERGRVGAPGPAGARGSDGSVGPVGPAGPIGSAGPPGFPGAPGPKGEIGPVG
NPGPAGPAGPRGEVGLPGVSGPVGPPGNPGANGLTGAKGAAGLPGVAGAPGLPGPRGIPGPVGAAGAT
GARGIVGEPGPAGSKGESGNKGEPGSAGAQGPPGPSGEEGKRGPNGEAGSAGPSGPPGLRGSPGSRG
Frog
MLSFVDLRSVLLLAVTLYLVTCQEVRRGPRGDKGPPGEQGPPGIPGRDGEDGLPGLPGPPGVPGLGGNF
AAQYDPSKSAEPGQQGIMGPRGPPGPPGSPGSQGFQGLPGENGEPGQTGPVGSRGPSGAPGKAGEDG
HPGKSGRPGERGPVGPQGARGFPGTPGLPGFKGIRGHTGSDGQKGAPGAAGVKGENGANGDNGSPG
QAGARGLPGERGRIGPAGSAGSRGSDGSSGPVGPAGPIGSAGAPGLPGAPGAKGELGPAGNNGPTGA
AGGRGEPGPPGSLGPAGPPGNPGTNGVNGAKGTAGLPGVGGAPGLPGGRGIPGPAGPAGPSGARGLA
GDPGIAGGKGDTGSKGEPGSVGQQGPAGPSGEEGKRGPNGEAGSSGPSGNAGIRGVPGTRGLPGPD
GRAGGIGPAGSRGSSGPPGARGPNGDAGRPGEPGLLGARGLPGFSGSNGPQGKEGPAGPQGIEGRSG
AAGPAGARGEPGAIGFPGPKGPNGEPGKNGDKGNQGPSGNRGAPGPDGNNGAQGPAGLGGATGEKG
EQGPSGAPGFQGLPGPGGPPGEVGKPGERGAPGDFGPPGSAGTRGERGAPGESGGAGPHGPSGSRGP
SGAPGPDGQKGEPGAAGLNGGLGPSGPAGIPGERGTAGTPGTKGEKGDAGNSGDYGNPGRDGARGP
AGAAGAPGPAGGPGDRGESGPAGPSGVAGPRGAPGERGEAGPAGPTGFAGPPGAAGHTGAKGDRGA
Toxodon
GPMGIMGPRGPPGASGAPGPAGEPGEPGQTGPAGARGPPGPPGKAGEDGHPGKPGRPGERGVVGPQG
ARGFPGTPGIPGFKGIRGHNGIDGIKGQPGAPGVKGEPGAPGENGTPGQAGARGIPGERGRVGAPGPA
GARGSDGSVGPVGPAGPIGSAGPPGFPGAPGPKGEIGPVGNPGPAGPAGPRGEVGIPGVSGPVGPPGN
PGANGITGAKGAAGIPGVAGAPGIPGPRGIPGPVGAAGATGARGIVGEPGPAGSKGESGNKGEPGSAG
PQGPPGPAGEEGKRGPNGEAGSTGPTGPPGIRGSRGIPGADGGSRGATGPAGVRGDSGRPGEPGIMG
PRGFPGSPGNIGPAGKEGPVGIPGIDGRPGPTGPAGARGEPGNIGFPGPKGPTGDPGKNGDKGHAGIA
GARGPAGPPGFQGIPGPAGTAGEVGKPGERGIPGEFGIPGPAGARGERGPPGESGAVGPAGPIGSRGPS
GPPGPDGNKGEPGNIGAIGTAGPSGPSGIPGERGAAGIPGGKGEKGETGIRRGAPGAIGAPGPAGANG
DRGEAGPAGPAGPAGPRGSPGERGEVGPAGPNGFAGPAGAAGQPGAKGERGTKGPKGENGPVGPTGP
VGAAGPAGPNGPPGPAGSRGDGGPPGATGFPGAAGRTGPPGPAGITGPPGPPGAAGKEGIRGPRGDQ
GPVGRSGETGASGIPGFAGEKGPAGEPGTAGIPGTPGPQGIIGAPGIIGIPGSRGERGIPGVAGSIGEPG
PIGIAGPPGARGPPGAVGNPGVNGAPGEAGRHGNRGEPGPAGSVGPAGAVGPRGPSGPQGIRGDKGE
PGDKGPRGIPGIKGHNGIQGIPGIAGQHGDQGAPGAVGPAGPRGPAGPSGPAGKDGRIGHPGTVGPA