





This document from Stanford University's CS273 course discusses the importance of protein structure in determining function and introduces the concepts of primary, secondary, tertiary, and quaternary structure. It also covers the identification of sequence motifs and domains, focusing on secondary structures such as alpha helices and beta sheets. Various algorithms and programs for predicting these structures are mentioned, including NewCoils, PairCoil, BetaWrap, PSIPRED, and TRILOGY.
CS273: Algorithms for Structure and Motion in Biology
Handout #5
Stanford University
Tuesday, 13 April 2004

Lecture #5: 13 April 2004
Topics: Sequence motif identification
Scribe: Samantha Chui
A protein's function is determined by its 3-D structure, which in turn is determined by its specific amino acid sequence. It is this sequence that directs the folding of the protein into its major conformation. The goal, then, is to predict the 3-D folding of a protein given its amino acid sequence. This is done by looking for structural motifs and the positions of specific secondary structures. Several programs have been written to predict these structures, including NewCoils, PairCoil, BetaWrap, PSIPRED, and TRILOGY.
The primary structure of a protein is its amino acid sequence. The secondary structure is the initial folding of the sequence into alpha helices and beta sheets. The tertiary structure is a more complex folding of the protein upon itself. The quaternary structure is the combination of two or more folded polypeptide chains (subunits). Finally, the supramolecular structure is the assembly of several different protein subunits into a larger complex. It is this 3-D structure that determines the function of the protein, whether that function is signaling, transport, catalysis, movement, structure, or regulation.
A protein is a type of polymer made up of a series of monomers, the amino acids. There are twenty different kinds of amino acids. They can be categorized into three groups: 1) hydrophilic, 2) hydrophobic, and 3) special (Figure 1). The hydrophilic amino acids can be further categorized as either basic or acidic. Basic hydrophilic amino acids are positively charged whereas acidic hydrophilic amino acids are negatively charged, according to the charge carried by the R group.
These twenty amino acids have distinct shapes and properties. They are joined in a sequence via peptide bonds. Peptide bonds are formed through a condensation (dehydration) reaction between the carboxyl group of one amino acid and the amino group of another, releasing a molecule of water.
Figure 1: Categories of amino acids
The secondary structure is the initial folding of the amino acid sequence into alpha helices and beta sheets (Figure 2).
The amino acids in an alpha helix are arranged in a helical structure about 5 angstroms in diameter. Each amino acid results in a 100 degree turn in the helix; thus there are 3.6 amino acids per turn. As shown in Figure 2, the R side chains are located on the exterior of the helix. Hydrogen bonds (indicated by the dotted lines) form between the
Figure 4: Ribbon representation of tertiary structure
The tertiary structure of a protein is the arrangement of secondary structure elements to form an overall three-dimensional structure (Figure 4). These tertiary structures are generally compact with hydrophilic side chains on the outer surface and the hydrophobic side chains buried in the interior. As in the secondary structure, hydrogen bonds help to stabilize the tertiary structure in its major conformation. In addition, cysteine bridges may form through disulfide bonds between cysteine amino acids (see special amino acids Figure 1). These covalent bonds help further stabilize the protein.
Many proteins consist of two or more different polypeptide chains. These are termed oligomeric proteins. The tertiary subunits are held together in the quaternary structure through hydrogen bonding, salt bridges, and disulfide bonds. The main stabilizing force, however, is hydrophobic interaction between the different subunits. When a single polypeptide folds into a tertiary structure with its hydrophilic side chains exposed and its hydrophobic side chains shielded inside, there may still be some hydrophobic sections on the outside. In such cases, two or more subunits will assemble themselves so that their exposed hydrophobic sections are in contact, thus placing the hydrophobic sections in their combined interior. An example of an oligomeric protein is hemoglobin, which is a tetramer consisting of two α and two β subunits. Other assemblies may be even more complex; for example, ribosomes and replisomes are macromolecular complexes made up of many different subunits.
Figure 5: Examples of structural motifs
Motifs are basic building blocks that occur repeatedly in different proteins. Some examples of motifs are the helix-loop-helix motif, the zinc-finger motif, the coiled coil motif, the beta barrel, and the beta helix (Figure 5). The coiled coil motif is found in fibrous proteins and consists of two alpha helices wrapped around each other (see Figure 6). Its sequence follows a seven-residue (heptad) repeat whose positions are labeled a through g. The residues at positions a and d are buried in the core of the coiled coil, so these residues are typically hydrophobic. Hydrophilic residues are likely to be found at positions e and g. Another interesting motif is the beta helix, which is found only in prokaryotic proteins and not in eukaryotic ones. This may be an example of divergent evolution.
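The a/d rule for coiled coils can be turned into a small toy check. The sketch below is only an illustration of the heptad idea and is not the NewCoils or PairCoil algorithm; the set of residues treated as hydrophobic and the function names are assumptions made for this example.

```python
# Toy illustration of the heptad (a-g) register of a coiled coil.
# Assumption: this residue set stands in for "hydrophobic"; textbooks differ.
HYDROPHOBIC = set("AVILMFWC")
HEPTAD = "abcdefg"


def core_hydrophobic_fraction(seq, offset):
    """Fraction of residues at heptad positions a and d that are hydrophobic,
    when seq[0] is assigned heptad position HEPTAD[offset]."""
    core = [res for i, res in enumerate(seq)
            if HEPTAD[(i + offset) % 7] in "ad"]
    if not core:
        return 0.0
    return sum(res in HYDROPHOBIC for res in core) / len(core)


def best_register(seq):
    """Try all seven registers and return (offset, fraction) for the register
    whose a and d positions are most hydrophobic."""
    scored = [(off, core_hydrophobic_fraction(seq, off)) for off in range(7)]
    return max(scored, key=lambda pair: pair[1])
```

For a synthetic sequence such as `"LEELKKK" * 4`, `best_register` picks offset 0, where every a and d position holds a leucine. Real predictors use position-specific statistics rather than a single hydrophobic set.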
A domain is a region of a protein's amino acid sequence that has evolutionary, structural, or functional significance. Domains determine the binding sites of proteins. An example of a domain is the Epidermal Growth Factor (EGF) domain. EGF is generated by proteolytic cleavage of a precursor protein containing multiple EGF domains and a membrane-spanning domain. Another example of protein domains is the globular domain and fibrous domain in viral membrane proteins.
Given a small protein consisting of about one hundred amino acids, assume that there are three conformations per peptide bond. This gives us 3^100 ≈ 5 × 10^47 conformations. If it takes 10^-15 seconds to sample each conformation, sampling all the conformations would take 5 × 10^32 seconds. This is equivalent to 1.6 × 10^25 years! But each protein folds quickly into a single stable native conformation. How does this happen?
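The arithmetic behind this estimate (often called Levinthal's paradox) is easy to reproduce. The short sketch below simply recomputes the numbers quoted above; the variable names and the 365.25-day year are choices made for this example.

```python
# Recompute the exhaustive-search estimate from the text.
residues = 100                 # "about one hundred amino acids"
conformations_per_bond = 3     # assumed in the text
seconds_per_sample = 1e-15     # time to sample one conformation
seconds_per_year = 365.25 * 24 * 3600  # assumption: Julian year

conformations = conformations_per_bond ** residues
total_seconds = conformations * seconds_per_sample
total_years = total_seconds / seconds_per_year

print(f"conformations: {conformations:.1e}")
print(f"seconds:       {total_seconds:.1e}")
print(f"years:         {total_years:.1e}")
```

The three printed values come out near 5 × 10^47, 5 × 10^32, and 1.6 × 10^25, matching the figures in the paragraph.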
Proteins can fold quickly into a stable conformation because energy and kinematics determine the major configuration. In general, the protein will fold to its minimum energy
Figure 6: Top view of coiled coil motif
Figure 7: Beta helix motif
The beta helix is a prokaryotic motif made up of three beta sheets (Figure 7). Currently, there is only one algorithm for predicting beta helices, called BetaWrap. Predicting beta helices is computationally difficult because there are very few solved structures, and those structures differ greatly from one another. The algorithm has three stages:
Rungs subproblem: Given the location of a T2 turn of a rung, find location of T2 turn of next rung.
Multiple rungs: Find multiple initial B2-T2-B3 rungs and find "optimal wrap" using dynamic programming.
Completing the parse: Find B1 strands by locally optimizing their location.
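The "optimal wrap" stage is described only as dynamic programming, so the following is a generic sketch of that idea and not BetaWrap's actual scoring or search. It assumes a hypothetical list of candidate T2 turn positions and a hypothetical `pair_score` function that rates how well two consecutive rungs stack; the gap limits `min_gap` and `max_gap` are likewise placeholders.

```python
def best_wrap(candidates, pair_score, n_rungs, min_gap=15, max_gap=30):
    """Choose n_rungs increasing positions from candidates that maximize the
    summed pair_score over consecutive rungs. Returns (score, positions),
    or None if no chain of that length satisfies the gap limits."""
    candidates = sorted(candidates)
    # best[k][j] = (score, previous index) for a chain of k+1 rungs
    # ending at candidates[j].
    best = [{j: (0.0, None) for j in range(len(candidates))}]
    for k in range(1, n_rungs):
        layer = {}
        for j, pos in enumerate(candidates):
            options = [
                (best[k - 1][i][0] + pair_score(candidates[i], pos), i)
                for i in best[k - 1]
                if min_gap <= pos - candidates[i] <= max_gap
            ]
            if options:
                layer[j] = max(options)
        best.append(layer)
    if not best[-1]:
        return None
    # Trace back from the best-scoring final rung.
    j = max(best[-1], key=lambda idx: best[-1][idx][0])
    score = best[-1][j][0]
    chain = []
    for k in range(n_rungs - 1, -1, -1):
        chain.append(candidates[j])
        j = best[k][j][1]
    return score, chain[::-1]
```

As a usage example, with candidates `[10, 32, 54, 61, 80]` and a toy score that prefers a spacing of 22 residues, `pair_score = lambda a, b: -abs((b - a) - 22)`, a three-rung search returns the evenly spaced chain `[10, 32, 54]`.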
The goal of secondary structure prediction is to classify the positions of a given amino acid sequence into alpha helices, beta strands, and loops. In general this problem is harder than motif identification. The best-performing methods for this problem are neural networks. One such algorithm for classifying secondary structures is PSIPRED. The basic steps in this algorithm are:
PSIPRED has a 76% classification accuracy.
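The notes do not say how this accuracy is measured. A common convention for three-state prediction is the per-residue accuracy, the fraction of positions whose predicted class (helix, strand, or loop) matches the observed one. The sketch below computes that quantity under this assumption; the function name and the H/E/C labels are illustrative.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted secondary structure label
    equals the observed label. Labels are single characters, e.g. H (helix),
    E (strand), C (loop); this labeling is an assumption of the example."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

For example, `per_residue_accuracy("HHHECC", "HHHCCC")` gives 5/6, since one of the six positions is misclassified.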
BLAST stands for Basic Local Alignment Search Tool. First it constructs a dictionary of all the words of length k in the query sequence. Then it initiates a local alignment for each word match between the words in the query and the words in the database. Each alignment is grown by ungapped extension in both directions until the score drops below a statistical threshold. The output of BLAST is all the alignments with scores above that threshold. PSI-BLAST is an extension of the original BLAST algorithm that implements an iterative approach. The basic steps are as follows:
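Returning to the original BLAST procedure (not the PSI-BLAST iteration), the word dictionary and ungapped extension can be sketched as below. This is a simplified illustration and not the BLAST implementation: it seeds only on exact word matches, uses a flat match/mismatch score instead of a substitution matrix, and replaces the statistical threshold with a fixed drop-off value and a fixed `min_score`. All function and parameter names are assumptions of this example.

```python
def build_word_index(query, k):
    """Map every k-long word in the query to its start positions."""
    index = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    return index


def extend_ungapped(query, subject, qi, si, k, match=1, mismatch=-1, drop=2):
    """Extend an exact k-long seed in both directions without gaps.
    Extension in a direction stops once the running score falls more than
    `drop` below the best score seen in that direction.
    Returns (score, query_start, subject_start, length)."""
    def walk(q, s, step):
        best = score = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > drop:
                break
            q += step
            s += step
        return best, best_len

    right_score, right_len = walk(qi + k, si + k, +1)
    left_score, left_len = walk(qi - 1, si - 1, -1)
    score = k * match + left_score + right_score
    return score, qi - left_len, si - left_len, left_len + k + right_len


def seed_and_extend(query, subject, k=3, min_score=5):
    """Find word matches between query and subject, extend each one, and
    report the distinct alignments scoring at least min_score."""
    index = build_word_index(query, k)
    hits = set()
    for si in range(len(subject) - k + 1):
        for qi in index.get(subject[si:si + k], []):
            hit = extend_ungapped(query, subject, qi, si, k)
            if hit[0] >= min_score:
                hits.add(hit)
    return sorted(hits, reverse=True)
```

With the query `"ACDEFGHIK"` and subject `"WWACDEFGHWW"`, every seed inside the shared stretch `ACDEFGH` extends to the same seven-residue alignment, so a single hit is reported.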
TRILOGY is a program that identifies sequence-structure patterns in proteins. A pattern object in TRILOGY consists of a sequence pattern that specifies the spacing and amino acid type of the three residues and a structure pattern that specifies the three-dimensional arrangement and orientation of the residues. A residue triplet that matches a particular structure pattern must have Cα-Cα distances and Cα-Cβ vectors that agree