Reconstructing Phylogenetic Relationships: A Case Study with T. rex and Other Animals, Exercises of Phylogenetics

Instructions on how to use computer programs and online databases to analyze amino acid sequences and construct a phylogenetic tree. The example focuses on the evolutionary relationships between T. rex, chicken, rainbow trout, human, dog, cattle, toxodon, mastodon, salamander, and frog. Students will learn how to obtain protein sequences from GenBank, align sequences using MAB, and interpret the resulting phylogenetic tree. This activity helps students understand how animals are related and the importance of studying extinct species and publishing research findings.

What you will learn

  • Which pairs of animal species are considered sister species in the phylogenetic tree?
  • Why is it important to learn more about extinct animals?
  • Why is it important to understand evolutionary relationships among animals?
  • Why is it important for scientists to publish their findings, such as genetic sequences, in public databases?
  • Which animal species is T. rex most closely related to according to the phylogenetic tree?

Typology: Exercises

2021/2022

Uploaded on 09/12/2022

arlie

FROM FOSSILS TO PHYLOGENIES PART 3: HOW DINOSAURS FIT INTO THE EVOLUTIONARY TREE OF LIFE

[Written by: Baylee Goodwin, Dane Besser, and Stephen Ramsey]

Vocabulary Words

Phylogenetics
Taxa
Node
Speciation event
Most Recent Common Ancestor (MRCA)
Descendants
Sister Clades
Outgroup

Background

During the Mass Spectrometry and BLAST activities, you were given amino acid sequences that had been recovered from a fossilized bone specimen from a Tyrannosaurus rex (as well as sequences from a Hadrosaur and a Mastodon). You learned how to input the T. rex amino acid sequence into BLAST to identify which present-day animals are most closely related to T. rex. In this activity you will learn how to use a computer to analyze related amino acid sequences from a variety of animals to gain insight into their evolutionary relationships.

Phylogenetics is the study of evolutionary relationships among a set of taxa, where taxa is another name for groups of organisms, like plants and animals. In phylogenetics, evolutionary relationships are laid out on a phylogenetic tree (figures 1 and 3). The root of the tree is the start of the evolutionary lineage being depicted. As you move from left to right, you are moving forward in time. As time passes, you can see the tree split in two directions at each node. Each split is a speciation event, which is when a group of animals separates and evolves into two new groups of animals. Each node also marks where the most recent common ancestor (MRCA) is. For example, in figure 3, A and B are groups of animals that diverged from an MRCA found at the node that joins the two. The leaves at the end of the tree mark the descendants of the ancestors. Phylogenetic trees are a useful way to compare how animals are related to one another. In figure 3, animals from groups A and B are more closely related to each other than they are to animals in group C. Therefore, A and B are considered sister clades, since they are each other's closest relatives. Group C is considered the outgroup, since it is the most distantly related.

Figure 1: This is an example of a phylogenetic tree showing how to read it.

Figure 2: This is an example of DNA sequences from multiple species lined up together. Species that share mutations that others do not have are more closely related. This is how molecular biology can help determine evolutionary relationships.
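The vocabulary above can be made concrete with a small sketch. It assumes the tree in figure 3 has the shape ((A, B), C) and stores it as nested Python tuples; the function names are made up for this illustration and are not part of any tool used in this activity.

```python
# Toy tree assumed from figure 3, written as nested tuples: ((A, B), C).
TREE = (("A", "B"), "C")


def leaves(node):
    """Return the set of leaf names (descendants) under a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def mrca(node, names):
    """Return the smallest subtree that contains every name in `names`."""
    if isinstance(node, str):
        return node
    for child in node:
        if names <= leaves(child):
            return mrca(child, names)
    return node


def sister(node, name):
    """Return the sibling subtree of leaf `name`, or None if it is absent."""
    if isinstance(node, str):
        return None
    left, right = node
    if left == name:
        return right
    if right == name:
        return left
    return sister(left, name) or sister(right, name)


print(mrca(TREE, {"A", "B"}))  # ('A', 'B')
print(mrca(TREE, {"A", "C"}))  # (('A', 'B'), 'C')
print(sister(TREE, "A"))       # B
print(sister(TREE, "C"))       # ('A', 'B')
```

The MRCA of A and B is the node that joins only those two, while the MRCA of A and C is the root. The outgroup C has the whole (A, B) clade as its sister.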

There are two main approaches to determining how a set of taxa evolved: morphology and molecular data. Morphology uses the physical features of animals to infer how they evolved, whereas molecular data uses DNA or amino acid sequences to infer where speciation events occurred. Molecular data is typically more accurate, since mutations in DNA are the underlying source of the heritable changes that evolution acts on.

When mutations arise in the DNA sequence of an organism, they can result in changes to the translated amino acid sequence of a protein. For example, the original DNA sequence in a small portion of a gene might have read ATATTG, but after the mutation it reads ATATAG (i.e., a T was replaced with an A). This changes the second codon from TTG, which codes for leucine, to TAG, a stop codon (signaling the end of the protein), which results in the cell making a shortened protein whose function may substantially differ from the original full-length protein. When a mutation is present in an organism's reproductive cells, it can be passed on from the organism to its offspring, which is how animals evolve on a molecular scale.
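As a minimal sketch of how a single-letter change can shorten a protein, the code below translates a hypothetical six-letter coding-strand sequence before and after one substitution. The sequences are chosen only for illustration, and the codon table is deliberately partial: it holds just the three standard-genetic-code entries the example needs.

```python
# Partial codon table (DNA coding strand, read 5'->3').
# Only the codons used below are included, so other input raises KeyError.
CODONS = {"ATA": "Ile", "TTG": "Leu", "TAG": "Stop"}


def translate(dna):
    """Translate DNA codon by codon, halting at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODONS[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


original = "ATATTG"  # hypothetical sequence: Ile-Leu
mutated = "ATATAG"   # one T replaced with A: TTG (Leu) becomes TAG (stop)

print(translate(original))  # ['Ile', 'Leu']
print(translate(mutated))   # ['Ile']
```

The mutated sequence yields a shorter product because translation stops at the new stop codon.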

The genetic differences between two species, such as a bird and a lizard, represent the accumulation of billions of mutations over many millions of years. The differences in the DNA (or, as we will study today, protein) sequences among a set of representative species can be used to determine how the species are related. As we will discover, the more closely related organisms will have more similar protein sequences, and the more distantly related organisms will have more dissimilar protein sequences.
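One simple way to put a number on "more similar" is percent identity: the share of aligned positions where two sequences carry the same amino acid. The sketch below assumes the sequences are already aligned to equal length. The three ten-residue fragments and the labels X, Y, and Z are hypothetical and are not taken from the collagen data in this activity.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical, already-aligned fragments (not real collagen data).
species = {
    "X": "GPMGLMGPRG",
    "Y": "GPMGIMGPRG",
    "Z": "GAMGIMGSKG",
}

print(percent_identity(species["X"], species["Y"]))  # 90.0
print(percent_identity(species["X"], species["Z"]))  # 60.0
print(percent_identity(species["Y"], species["Z"]))  # 70.0
```

In this toy data, X and Y share the most positions, so they would be read as the most closely related pair, with Z as the most distant.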

In order to create a phylogenetic tree, the first step is to obtain protein sequence data from the set of animal species that we want to compare. We will be searching for the "alpha-2 type 1 collagen" protein sequence, since that is what scientists were able to extract from the fossilized femur bone of the T. rex. Collagen is evolutionarily rather well-conserved across species, which is why it is a good choice for using amino acid sequences to build a phylogenetic tree. When a protein is “well-conserved”, it means that the protein is found, with a similar sequence, in multiple species that are distantly related; collagen is a well-conserved protein found in all animals with true bone. In order to find the collagen sequence, you will conduct a search in an online database called GenBank. The alpha-2 type 1 collagen protein sequences have already been collected for you for most of the animals; however, you still need to collect the appropriate amino acid sequence for the T. rex.

Figure 3: This is an example of a basic phylogenetic tree. It highlights where the most recent common ancestors (MRCA) are found on the tree, and which animal groups evolved from the ancestor.

  1. Click on the blue Search button.
  2. It will provide a list with the top results relevant to your search. It should load 3 to 4 items; be sure to click on the result that says alpha-2(I) chain, not alpha-1(I). The results display every known protein sequence that matches the key words alpha 2, type 1, and T. rex. The list will include results that do not precisely match your search, so be sure to fully read the names of the results. If you were to broaden your search to “alpha 2 collagen”, it would return hundreds of matches, rather than only three or four.
  3. Once you select the correct result, it will open up this page (pictured below). In order to make sure that you have selected the correct result, look at the column on the left hand side of the page. The fourth item down should say “source organism”, and the organism should be Tyrannosaurus rex. If it does not, hit the back button and retype the search query exactly as shown in Step 3.
  1. Once you reach the correct protein record page, click on the FASTA button underneath the protein’s name in black bold writing.

GenBank should then display a FASTA record page, like this:

"FASTA" (an abbreviation for "Fast-All") is a simple text-based file format that is often used to transmit DNA or amino acid sequences from one computer program to another. In a FASTA file, the DNA nucleotides or protein amino acids are represented by individual letter codes. Each record in a FASTA file begins with a ">" (greater than) character followed by a description, which is then followed by lines of sequence data.
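To show how little structure the format has, here is a minimal sketch of a FASTA reader. It is an illustration only, not the parser that GenBank or MAB uses, and the record names and residues in the example text are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each description to its sequence."""
    records = {}
    description = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A ">" line starts a new record; the rest of the line is its description.
            description = line[1:].strip()
            records[description] = ""
        elif description is not None:
            # Sequence lines are joined together under the current description.
            records[description] += line
    return records


# Hypothetical two-record file; the residues are made up for the example.
example = """>Animal-1
GPMGLMGPRG
PPGASGAPGP
>Animal-2
GPMGIMGPRG
"""

records = parse_fasta(example)
print(records["Animal-1"])  # GPMGLMGPRGPPGASGAPGP
print(records["Animal-2"])  # GPMGIMGPRG
```

A sequence may span several lines, which is why the reader keeps appending until it meets the next ">".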

  1. This link will open up directly to “A la carte” mode. Under “Workflow Settings” insert a name for your analysis.
  2. Scroll to the bottom of the page and select “create workflow”. Do not change any of the settings; they are already set to the correct options for creating your phylogenetic tree.
  1. MAB should open up to the second browser tab, “data and settings”. This tab gives you the option to upload your file or paste the sequence. Copy and paste the T. rex sequence into the big text box below "Input Data" in the MAB browser window. You need to change the description to say "T-rex": delete “P0C2W4.1 RecName: Full=Collagen alpha-2(I) chain; AltName: Full=Alpha-2 type I collagen” and replace it with “T-rex”. (Be sure to leave the >; otherwise MAB will not recognize the format. This step is important because the tree will then show the name of the animal instead of the full protein name.)
  2. Now you will need to paste the sequences for all of the other animals being compared in your phylogenetic tree. Scroll down to the end of this PDF and copy everything under the heading “Collagen Sequence Data (Copy and past everything below, including the “>”):”. Paste all of the sequences into the text box below the T. rex sequence.
  3. Scroll down to the bottom and enter your email address if you wish to be emailed your tree; if not, simply select “submit”. Do not change any settings before hitting submit. After you click the Submit button, MAB will display a brief animation of a phylogenetic tree. During this time, MAB is aligning the sequences and then comparing them.^1
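The renaming step above is done by hand in the browser, but the same edit can be sketched in code. The function name below is hypothetical, the header text is the one quoted in the step above, and the residues are placeholders rather than the real T. rex sequence.

```python
def rename_first_header(fasta_text, new_name):
    """Replace the description on the first '>' line, keeping the '>' itself."""
    lines = fasta_text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(">"):
            lines[index] = ">" + new_name
            break
    return "\n".join(lines)


# Placeholder record: real header text, made-up residues.
record = (
    ">P0C2W4.1 RecName: Full=Collagen alpha-2(I) chain; "
    "AltName: Full=Alpha-2 type I collagen\n"
    "XXXXXXXXXX\n"
)

print(rename_first_header(record, "T-rex"))
```

Only the description changes; the ">" marker and the sequence lines are left untouched, which is what keeps the record valid FASTA.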

(^1) This program uses a common method for aligning the sequences, called MUSCLE (Multiple Sequence Comparison by Log-Expectation). This step is important because it uses an algorithm to align each peptide sequence in order to accurately predict where mutations occurred that signal how the animals evolved. If the sequences are not aligned, they cannot be used to generate a phylogenetic tree.
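To give a feel for what "aligning" means, the sketch below aligns just two sequences with the classic Needleman-Wunsch method. This is not MUSCLE, which aligns many sequences at once with a more elaborate procedure. The scoring values (+1 match, -1 mismatch, -1 gap) and the two short fragments are assumptions chosen for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell holds the best score for aligning a[:i] with b[:j].
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two residues
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical fragments: the second is the first with one residue missing.
aligned_a, aligned_b = global_align("GPMGLMGPRG", "GPMGMGPRG")
print(aligned_a)  # GPMGLMGPRG
print(aligned_b)  # GPMG-MGPRG
```

The "-" marks where a residue is missing in one sequence. Once the gap is in place, every other position lines up, so the remaining columns can be compared one by one.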

  1. The final setting that needs to be adjusted is under “Display”: change the setting from “Branch support values” to “none”.
  2. You now have your finished phylogenetic tree. It should look like this:
  3. If you want to save your tree, you can click on the "PNG" or "PDF" links underneath the tree:
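MAB builds the tree for you, and this handout does not say which tree-building method it applies. As a rough illustration of how pairwise differences can be turned into a tree at all, the sketch below uses average-linkage clustering (often called UPGMA), one of the simplest approaches and not necessarily the one MAB uses. The taxa X, Y, and Z and their distances are hypothetical.

```python
from itertools import combinations


def upgma(distances):
    """Build a nested-tuple tree by average-linkage clustering (UPGMA).

    `distances` maps frozenset({name_a, name_b}) to a pairwise distance.
    """
    names = sorted({name for pair in distances for name in pair})
    # Each current cluster (a name or a nested tuple) maps to its leaf names.
    members = {name: (name,) for name in names}

    def average_distance(c1, c2):
        pairs = [(a, b) for a in members[c1] for b in members[c2]]
        return sum(distances[frozenset(p)] for p in pairs) / len(pairs)

    while len(members) > 1:
        # Merge the two clusters with the smallest average distance.
        c1, c2 = min(combinations(members, 2), key=lambda p: average_distance(*p))
        merged_leaves = members.pop(c1) + members.pop(c2)
        members[(c1, c2)] = merged_leaves
    return next(iter(members))


# Hypothetical distances (percent of differing positions), not real data.
toy = {
    frozenset({"X", "Y"}): 10,
    frozenset({"X", "Z"}): 40,
    frozenset({"Y", "Z"}): 30,
}

tree = upgma(toy)
print(tree)  # ('Z', ('X', 'Y'))
```

X and Y are joined first because they differ the least, which makes them sister taxa, and Z joins last as the outgroup.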

Analyzing results

  1. Which of the species that you analyzed is the T. rex most closely related to? Does this match the BLAST results from Session 2?
  2. Which pairs of animal species are "sister species"? (i.e., which animals are most closely related?)
  3. What species is the "out-group" (i.e., the least related to the rest of the species) in this phylogenetic tree?
  4. Can you find anything puzzling about the relationships depicted in this phylogenetic tree? (Hint: look at dog.) Do you suppose this might reflect the fact that only a very short amino acid sequence from a single gene was analyzed?

Evaluating results

  1. Why is it important to understand evolutionary relationships among animals?
  2. Why is it important to learn more about extinct animals?
  3. Why is it important for scientists to publish their findings, such as genetic sequences, in public databases?
  4. What other questions could these same techniques be used to answer?

GLGGNFAAQYDPSKAADFGPGPMGLMGPRGPPGASGPPGPPGFQGVPGEPGEPGQTGPQGPRGPPGP

PGKAGEDGHPGKPGRPGERGVAGPQGARGFPGTPGLPGFKGIRGHNGLDGQKGQPGTPGTKGEPGAP

GENGTPGQPGARGLPGERGRIGAPGPAGARGSDGSAGPTGPAGPIGAAGPPGFPGAPGAKGEIGPAGN

VGPTGPAGPRGEIGLPGSSGPVGPPGNPGANGLPGAKGAAGLPGVAGAPGLPGPRGIPGPPGPAGPSG

ARGLVGEPGPAGAKGESGNKGEPGAAGPPGPPGPSGEEGKRGSNGEPGSAGPPGPAGLRGVPGSRGL

PGADGRAGVMGPAGNRGASGPVGAKGPNGDAGRPGEPGLMGPRGLPGQPGSPGPAGKEGPVGFPGA

DGRVGPIGPAGNRGEPGNIGFPGPKGPTGEPGKPGEKGNVGLAGPRGAPGPEGNNGAQGPPGVTGNQ

GAKGETGPAGPPGFQGLPGPSGPAGEAGKPGERGLHGEFGVPGPAGPRGERGLPGESGAVGPAGPIGS

RGPSGPPGPDGNKGEPGNVGPAGAPGPAGPGGIPGERGVAGVPGGKGEKGAPGLRGDTGATGRDGA

RGLPGAIGAPGPAGGAGDRGEGGPAGPAGPAGARGIPGERGEPGPVGPSGFAGPPGAAGQPGAKGER

GPKGPKGETGPTGAIGPIGASGPPGPVGAAGPAGPRGDAGPPGMTGFPGAAGRVGPPGPAGITGPPGP

PGPAGKDGPRGLRGDVGPVGRTGEQGIAGPPGFAGEKGPSGEAGAAGPPGTPGPQGILGAPGILGLPG

SRGERGLPGIAGATGEPGPLGVSGPPGARGPSGPVGSPGPNGAPGEAGRDGNPGNDGPPGRDGAPGF

KGERGAPGNPGPSGALGAPGPHGQVGPSGKPGNRGDPGPVGPVGPAGAFGPRGLAGPQGPRGEKGEP

GDKGHRGLPGLKGHNGLQGLPGLAGQHGDQGPPGNNGPAGPRGPPGPSGPPGKDGRNGLPGPIGPA

GVRGSHGSQGPAGPPGPPGPPGPPGPNGGGYEVGFDAEYYRADQPSLRPKDYEVDATLKTLNNQIETLL

TPEGSKKNPARTCRDLRLSHPEWSSGFYWIDPNQGCTADAIRAYC

DFATGETCIHASLEDIPTKTWYVSKNPKDKKHIWFGETINGGTQFEYNGEGVTTKDMATQLAFMRLLAN

HASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELRAEGNSRFTFSVLVDGCSKKNNKWGKTIIEY

RTNKPSRLPILDIAPLDIGGADQEFGLHIGPVCFK

>Trout

MLSFVDNRILLLLAVTSLLASCQSGGLKGPRGAKGPRGDRGPQGPNGRDGKAGLPGIAGPPGPPGLGG

NFAAQFDGGKGSDPGPGPMGLMGSRGPNGPPGAPGPQGFTGHAGEPGEPGQTGSIGARGPTGSAGKP

GEDGNNGRPGKPGDRGGPGTQGARGFPGTPGLPGMKGHRGYNGLDGRKGESGTAGAKGETGAHGA

NGSPGPAGSRGLNGERGRAGPAGPAGARGADGSTGPAGPAGPLGAAGPPGFPGAPGPKGEIGGAGSN

GPSGPQGGRGEPGINGAVGPVGPVGNPGNNGINGAKGAAGLPGVAGAPGFPGPRGGPGPQGPQGST

GARGLGGDPGPSGQKGDSGAKGEPGHSGVQGAAGPAGEEGKRGSTGEVGATGPAGLRGARGGAGTR

GLPGLEGRGGPIGMPGARGATGPGGIRGAPGDAGRAGESGLTGARGLPGNSGQGGPPGKEGPPGAAG

LDGRTGPPGPTGPRGQPGNIGFPGPKGPGGEAGKGGDKGPTGATGLRGGPGADGNNGAPGPAGVVG

NTGEKGEQGPAGAPGFQGLPGPAGPAGEAGKAGNQGMPGDQGLPGPAGVKGERGNSGPAGSAGSQG

AIGARGPAGTPGPDGGKGEPGSVGIVGAAGHQGPGGMPGERGAGGTPGPKGEKGEGGHRGLEGNMG

RDGARGGPGPSGPPGPSGANGEKGESGSFGPAGPAGLRGPSGERGEGGPAGLPGFAGPPGSDGQSGP

RGEKGPAGGKGDVGPAGPAGPSGQSGPSGASGPAGPPGGRGDAGPSGLTGFPGAAGRVGGPGPAGI

AGPPGSAGPAGKDGPRGLRGDPGPGGPQGEQGVVGPAGISGDKGPSGESGPPGAPGTAGPQGVLGPS

GFVGLPGSRGDKGLPGGPGAVGEPGRLGPAGASGPRGPAGNIGMPGMTGTQGEAGREGNSGNDGPP

GRPGAAGFKGDRGEPGSPGALGSSGQPGPNGPAGSAGRPGNRGESGPTGNGGPVGAVGARGAPGPA

GPRGEKGGAGEKGDRGMKGLRGHGGLQGMPGPNGPSGETGSAGITGPAGPRGPAGPHGPPGKDGRA

GGHGAIGPVGHRGSPGHLGPAGPPGSPGLPGPAGPAGGGYDQSGGYDEYRADQPSFRAKDYEVDATI

KSLNSQIENLLTPEGSKKNPARTCRDIRLSHPDWSSGFYWIDPNQGCIADAIKAYCDFSTGHTCIHPHPE

SIARKNWYRSSENKKHVWFGETINGGTEFAYNDETLSPQSMATQLAFMRLLANQATQNITYHCKNSVA

YMDGENGNLKKAVLLQGSNDVELRAEGNSRFTFNVLEDGCTRHTGQWSKTVIEYRTNKPSRLPILDIAP

LDIGEADQEFGLDIGPVCFK

>Dog

MLSFVDTRTLLLLAVTSCLATCQSLQEATARKGPTGDRGPRGERGPPGPPGRDGDDGIPGPPGPPGPPG

PPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPPGASGAPGPQGFQGPAGEPGEPGQTGPAGARGPPG

PPGKAGEDGHPGKPGRPGERGVVGPQGARGFPGTPGLPGFKGIRGHNGLDGLKGQPGAPGVKGEPGA

PGENGTPGQTGARGLPGERGRVGAPGPAGARGSDGSVGPVGPAGPIGSAGPPGFPGAPGPKGEIGPVG

NPGPAGPAGPRGEVGLPGVSGPVGPPGNPGANGLTGAKGAAGLPGVAGAPGLPGPRGIPGPVGAAGAT

GARGIVGEPGPAGSKGESGNKGEPGSAGAQGPPGPSGEEGKRGPNGEAGSAGPSGPPGLRGSPGSRG

KGEQGPAGPPGFQGLPGPAGTAGEAGKPGERGIPGEFGLPGPAGARGERGPPGESGAAGPTGPIGSRG

PSGPPGPDGNKGEPGVVGAPGTAGPSGPSGLPGERGAAGIPGGKGEKGETGLRGDIGSPGRDGARGA

PGAIGAPGPAGANGDRGEAGPAGPAGPAGPRGSPGERGEVGPAGPNGFAGPAGAAGQPGAKGERGTK

GPKGENGPVGPTGPVGAAGPSGPNGPPGPAGSRGDGGPPGATGFPGAAGRTGPPGPSGISGPPGPPGP

AGKEGLRGPRGDQGPVGRSGETGASGPPGFVGEKGPSGEPGTAGPPGTPGPQGLLGAPGFLGLPGSRG

ERGLPGVAGSVGEPGPLGIAGPPGARGPPGNVGNPGVNGAPGEAGRDGNPGNDGPPGRDGQPGHKG

ERGYPGNAGPVGAAGAPGPQGPVGPVGKHGNRGEPGPAGAVGPAGAVGPRGPSGPQGIRGDKGEPG

DKGPRGLPGLKGHNGLQGLPGLAGHHGDQGAPGAVGPAGPRGPAGPSGPAGKDGRIGQPGAVGPAGI

RGSQGSQGPAGPPGPPGPPGPPGPSGGGYEFGFDGDFYRADQPRSPTSLRPKDYEVDATLKSLNNQIE

TLLTPEGSRKNPARTCRDLRLSHPEWSSGYYWIDPNQGCTMDAIKVYCDFSTGETCIRAQPEDIPVKNW

YRNSKAKKHVWVGETINGGTQFEYNVEGVTTKEMATQLAFMRLLANHASQNITYHCKNSIAYMDEETG

NLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKKTNEWQKTIIEYKTNKPSRLPILDIAPLDIGGADQ

EIRLNIGPVCFK

>Frog

MLSFVDLRSVLLLAVTLYLVTCQEVRRGPRGDKGPPGEQGPPGIPGRDGEDGLPGLPGPPGVPGLGGNF

AAQYDPSKSAEPGQQGIMGPRGPPGPPGSPGSQGFQGLPGENGEPGQTGPVGSRGPSGAPGKAGEDG

HPGKSGRPGERGPVGPQGARGFPGTPGLPGFKGIRGHTGSDGQKGAPGAAGVKGENGANGDNGSPG

QAGARGLPGERGRIGPAGSAGSRGSDGSSGPVGPAGPIGSAGAPGLPGAPGAKGELGPAGNNGPTGA

AGGRGEPGPPGSLGPAGPPGNPGTNGVNGAKGTAGLPGVGGAPGLPGGRGIPGPAGPAGPSGARGLA

GDPGIAGGKGDTGSKGEPGSVGQQGPAGPSGEEGKRGPNGEAGSSGPSGNAGIRGVPGTRGLPGPD

GRAGGIGPAGSRGSSGPPGARGPNGDAGRPGEPGLLGARGLPGFSGSNGPQGKEGPAGPQGIEGRSG

AAGPAGARGEPGAIGFPGPKGPNGEPGKNGDKGNQGPSGNRGAPGPDGNNGAQGPAGLGGATGEKG

EQGPSGAPGFQGLPGPGGPPGEVGKPGERGAPGDFGPPGSAGTRGERGAPGESGGAGPHGPSGSRGP

SGAPGPDGQKGEPGAAGLNGGLGPSGPAGIPGERGTAGTPGTKGEKGDAGNSGDYGNPGRDGARGP

AGAAGAPGPAGGPGDRGESGPAGPSGVAGPRGAPGERGEAGPAGPTGFAGPPGAAGHTGAKGDRGA

KGPKGEAGSPGPLGAHGSAGPAGPNGPAGSTGARGDAGPSGATGFPGPAGRAGAPGPPGNVGPSGPT

GHPGKDGSRGPRGDSGPVGRPGEQGQHGPVGLAGDKGPSGEAGPAGPPGAAGPSGVLGARGILGLP

GTRGERGLPGGPGSNGEPGPSGLAGSSGPRGPPGSVGSPGPVGHSGEAGRDGHPGNDGPPGRDGLP

GAKGERGYPGNTGPSGLAGAPGPAGSAGPAGKSGNRGEGGPSGPAGITGPSGPRGPAGPQGVRGDKG

EAGERGARGLDGRKGHNGLSGLPGPSGTPGETGPSGSVGPVGPRGPSGPSGPPGKEGRSGHPGAMGP

VGPRGPAGFTGPAGPPGPPGPPGHAGPSGGGYDGGDGGEYYRADQPERKPKDYEVDATLKSLNQQIEV

ILTPEGSRKNPARTCRDLRLSHPEWTSGFYWIDPNQGCTSDAIRVFCDFSSGETCIHANPDEITQKNWY

INTSNKDKKHLWFGEILNGGTQFEYHDEGLTAKDMATQLAFMRLLANQASQNITYHCKNSIAYMDEET

GNLKKAVILQGSNDVELRAEGNTRFTYSVLEDGCTKHTGEWGKTVIEYRTNKPSRLPI

LDIAPLDIGGHDQEIGFEIGPVCFK

>Toxodon

GPMGIMGPRGPPGASGAPGPAGEPGEPGQTGPAGARGPPGPPGKAGEDGHPGKPGRPGERGVVGPQG

ARGFPGTPGIPGFKGIRGHNGIDGIKGQPGAPGVKGEPGAPGENGTPGQAGARGIPGERGRVGAPGPA

GARGSDGSVGPVGPAGPIGSAGPPGFPGAPGPKGEIGPVGNPGPAGPAGPRGEVGIPGVSGPVGPPGN

PGANGITGAKGAAGIPGVAGAPGIPGPRGIPGPVGAAGATGARGIVGEPGPAGSKGESGNKGEPGSAG

PQGPPGPAGEEGKRGPNGEAGSTGPTGPPGIRGSRGIPGADGGSRGATGPAGVRGDSGRPGEPGIMG

PRGFPGSPGNIGPAGKEGPVGIPGIDGRPGPTGPAGARGEPGNIGFPGPKGPTGDPGKNGDKGHAGIA

GARGPAGPPGFQGIPGPAGTAGEVGKPGERGIPGEFGIPGPAGARGERGPPGESGAVGPAGPIGSRGPS

GPPGPDGNKGEPGNIGAIGTAGPSGPSGIPGERGAAGIPGGKGEKGETGIRRGAPGAIGAPGPAGANG

DRGEAGPAGPAGPAGPRGSPGERGEVGPAGPNGFAGPAGAAGQPGAKGERGTKGPKGENGPVGPTGP

VGAAGPAGPNGPPGPAGSRGDGGPPGATGFPGAAGRTGPPGPAGITGPPGPPGAAGKEGIRGPRGDQ

GPVGRSGETGASGIPGFAGEKGPAGEPGTAGIPGTPGPQGIIGAPGIIGIPGSRGERGIPGVAGSIGEPG

PIGIAGPPGARGPPGAVGNPGVNGAPGEAGRHGNRGEPGPAGSVGPAGAVGPRGPSGPQGIRGDKGE

PGDKGPRGIPGIKGHNGIQGIPGIAGQHGDQGAPGAVGPAGPRGPAGPSGPAGKDGRIGHPGTVGPA

TRGLPGPDGRAGGMGPPGSRGSSGPAGVRGPSGDAGRPGEPGLLGQRGLPGFPGNTGPVGKEGPAGP

AGIEGRTGAAGPTGARGEPGSIGFPGPKGPGGEPGKNGDKGSAGPSGARGAPGPDGNNGAQGPPGVV

GNTGEKGEQGPAGAPGFQGLPGPGGAAGEAGKVGDRGMPGDFGPPGPAGVRGERGAPGESGSAGPL

GPVGSRGPSGPPGPDGTKGEPGVAGLAGAVGPSGSGGSPGERGGAGTPGPKGEKGEAGNRGEYGNQ

GRDGARGPAGASGAPGPSGGPGDRGESGPSGPAGPAGSRGAPGERGEHGPGGPTGFGGPPGAAGHT

GVKGERGEKGPKGELGPQGPVGASGASGPAGPNGPAGAPGSRGEVGPAGATGFPGPAGRTGGPGPAG

MGGPPGPSGHAGKDGPRGPRGDSGPVGRPGEQGGLGPQGISGEKGPSGEPGTAGPPGSSGPSGVLG

ARGILGLPGTRGERGLPGGPGGNGEPGATGPTGTAGSRGAPGPVGSAGMNGPAGEAGRDGNPGNDGP

PGRDGQAGAKGERGYPGNTGGVGHAGAPGPHGSVGPAGKSGNRGEPGPSGSQGPAGLPGARGPAGP

AGSRGDKGESGEKGGRGLDGRKGHNGLQGLPGLPGTSGEAGSAGPSGPSGPRGPAGPSGPPGKDGH

SGQPGPVGPAGVRGSPGHQGPAGPPGSPGAPGPAGPSGGGYDGGFEGGEFYRADQPSLRPKDYEVDS

TLKTLNNQIETLLTPEGSRKNPARTCRDLRLSHPEWSSGFYWIDPNQGCTADAIRVYCDFSTGETCIHSN

PETISAKTSYVNKNPKDKKHVWVGEVLNGGTQFEYNEEGVTTKDMATQFAFMRLLANHASQNITYHCK

NSIAYMDGETGNLKKAVLLQGSNDVELRAEGNSRFTFSVLEDSCTKHTGEWGRTVMEYRTNKPSRLPIL

DIAPMDIGGAEQEFRVDIGPVCFK