
CS273: Algorithms for Structure and Motion in Biology, Handout #5
Stanford University, Tuesday, 13 April 2004

Lecture #5: 13 April 2004
Topics: Sequence motif identification
Scribe: Samantha Chui

1 Introduction

A protein's function is determined by its 3-D structure, which in turn is determined by its specific amino acid sequence. It is this sequence that directs the folding of the protein into its major configuration. The goal, then, is to predict the 3-D fold of a protein given its amino acid sequence. This is done by looking for structural motifs and for the positions of specific secondary structures. Several programs have been written to predict these structures, including NewCoils, PairCoil, BetaWrap, PSIPRED, and TRILOGY.

2 Protein Hierarchy

The primary structure of a protein is its amino acid sequence. The secondary structure is the initial folding of the sequence into alpha helices and beta sheets. The tertiary structure is the more complex folding of the protein chain upon itself. The quaternary structure is the association of two or more polypeptide chains. Finally, the supramolecular structure is the combination of several different protein subunits. It is this 3-D structure that determines the protein's function, whether in signaling, transport, catalysis, movement, structure, or regulation.

2.1 Primary Structure

A protein is a polymer made up of a series of monomers, the amino acids. There are twenty different kinds of amino acids, which can be categorized into three groups: 1) hydrophilic, 2) hydrophobic, and 3) special (Figure 1).

The hydrophilic amino acids can be further categorized as either basic or acidic. Basic hydrophilic amino acids are positively charged, whereas acidic hydrophilic amino acids are negatively charged, depending on the charge carried by the R group (side chain).

These twenty amino acids have distinct shapes and properties. They are joined in sequence by peptide bonds, which form through a condensation (dehydration) reaction between the carboxyl group of one amino acid and the amino group of the next, releasing one molecule of water. Hydrolysis is the reverse reaction, which breaks peptide bonds.
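Each peptide bond releases one water molecule. The mass of a linear peptide is therefore the sum of the free amino-acid masses minus one water per bond. The short Python sketch below illustrates this bookkeeping. It uses approximate average molar masses, and the table is deliberately limited to three amino acids for illustration.

```python
# Approximate average molar masses (g/mol) of free amino acids.
# Only three residues are listed; this table is illustrative, not complete.
FREE_AA_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
}
WATER_MASS = 18.02  # g/mol, released by each condensation that forms a peptide bond


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: free amino-acid masses minus one water per peptide bond."""
    if not sequence:
        raise ValueError("empty sequence")
    total = sum(FREE_AA_MASS[aa] for aa in sequence)
    n_bonds = len(sequence) - 1  # n residues are joined by n - 1 peptide bonds
    return total - n_bonds * WATER_MASS


print(round(peptide_mass("AG"), 2))  # alanyl-glycine; prints 146.14
```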


Figure 1: Categories of amino acids

2.2 Secondary Structure

The secondary structure is the initial folding of the amino acid sequence into alpha helices and beta sheets (Figure 2).

The amino acids in an alpha helix are arranged in a helical structure about 5 angstroms in diameter. Each amino acid residue contributes a 100-degree rotation about the helix axis, so there are 360/100 = 3.6 residues per turn. As shown in Figure 2, the R side chains are located on the exterior of the helix. Hydrogen bonds (indicated by the dotted lines) form between the

Figure 4: Ribbon representation of tertiary structure

2.3 Tertiary Structure

The tertiary structure of a protein is the arrangement of its secondary structure elements into an overall three-dimensional structure (Figure 4). Tertiary structures are generally compact, with hydrophilic side chains on the outer surface and hydrophobic side chains buried in the interior. As in the secondary structure, hydrogen bonds help to stabilize the tertiary structure in its major conformation. In addition, disulfide bonds may form between pairs of cysteine residues (see the special amino acids in Figure 1). These covalent cysteine bridges help further stabilize the protein.

2.4 Quaternary Structure and Supramolecular Assemblies

Many proteins consist of two or more polypeptide chains; these are termed oligomeric proteins. The tertiary subunits are held together in the quaternary structure through hydrogen bonding, salt bridges, and disulfide bonds. The main stabilizing force, however, is the hydrophobic interaction between the subunits. When a single polypeptide folds into a tertiary structure with its hydrophilic side chains exposed and its hydrophobic side chains shielded inside, some hydrophobic patches may still remain on the surface. In such cases, two or more subunits assemble so that their exposed hydrophobic patches are in contact, placing those patches in the combined interior. An example of an oligomeric protein is hemoglobin, a tetramer consisting of two α and two β subunits. Other proteins are even more complex; for example, ribosomes and replisomes are large macromolecular assemblies made up of many different protein subunits.

3 Motifs and Domains

3.1 Second-and-a-half-ary Structures: Motifs

Motifs are basic building blocks that occur repeatedly in different proteins. Some examples of motifs are the helix-loop-helix motif, the zinc-finger motif, the coiled coil motif, the beta barrel, and the beta helix (Figure 5). The coiled coil motif is found in fibrous proteins and consists of two alpha helices wrapped around each other (see Figure 6). Its sequence follows a repeating unit of seven residues (a heptad), whose positions are labeled a through g. The residues at positions a and d are buried in the core of the coiled coil, so these residues are typically hydrophobic. Hydrophilic residues are likely to be found at positions e and g. Another interesting motif is the beta helix, which is found only in prokaryotic proteins and not in eukaryotic ones. This may be an example of divergent evolution.

Figure 5: Examples of structural motifs
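Coiled-coil sequences are conventionally described by a heptad repeat. Seven consecutive residues, labeled a through g, cover roughly two helical turns (7 × 100 = 700 degrees, close to 720). The Python sketch below assigns heptad positions to a sequence. For each of the seven possible registers, it then scores what fraction of the a and d positions are hydrophobic. This is a toy illustration only, not the scoring used by NewCoils or PairCoil. The hydrophobic residue set and the function names are assumptions.

```python
# Residues treated as hydrophobic here; this set is an assumption for illustration.
HYDROPHOBIC = set("AVILMFWC")
HEPTAD = "abcdefg"


def heptad_positions(sequence: str, offset: int = 0) -> list[tuple[str, str]]:
    """Pair each residue with its heptad position, with residue 0 at HEPTAD[offset]."""
    return [(aa, HEPTAD[(i + offset) % 7]) for i, aa in enumerate(sequence)]


def core_hydrophobic_fraction(sequence: str, offset: int = 0) -> float:
    """Fraction of residues at the core positions a and d that are hydrophobic."""
    core = [aa for aa, pos in heptad_positions(sequence, offset) if pos in "ad"]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


def best_register(sequence: str) -> int:
    """Try all seven registers and return the offset with the most hydrophobic core."""
    return max(range(7), key=lambda off: core_hydrophobic_fraction(sequence, off))


print(best_register("LEKLEKE" * 3))  # prints 0: the leucines fall on a and d
```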

3.2 Second-and-2/3-ary Structures: Domains

A domain is a region of a protein's amino acid sequence that has evolutionary, structural, or functional significance. Domains often determine a protein's binding sites. An example of a domain is the Epidermal Growth Factor (EGF) domain. EGF is generated by proteolytic cleavage of a precursor protein containing multiple EGF domains and a membrane-spanning domain. Other examples are the globular and fibrous domains of viral membrane proteins.

4 The Levinthal Paradox

Given a small protein consisting of about one hundred amino acids, assume that there are three conformations per peptide bond. This gives us $3^{100} \approx 5 \times 10^{47}$ conformations. If it takes $10^{-15}$ seconds to sample each conformation, sampling all the conformations would take about $5 \times 10^{32}$ seconds. This is equivalent to $1.6 \times 10^{25}$ years! But each protein folds quickly into a single stable native conformation. How does this happen?
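The arithmetic can be checked directly. This Python snippet follows the numbers in the text; the only added assumption is a year of 365.25 days.

```python
# Levinthal estimate with the numbers from the text:
# 3 conformations per peptide bond, exponent 100, 1e-15 s per sampled conformation.
EXPONENT = 100
CONFORMATIONS_PER_BOND = 3
SECONDS_PER_SAMPLE = 1e-15
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

n_conformations = CONFORMATIONS_PER_BOND ** EXPONENT  # exact integer
total_seconds = n_conformations * SECONDS_PER_SAMPLE
total_years = total_seconds / SECONDS_PER_YEAR

print(f"{n_conformations:.2e} conformations")  # 5.15e+47 conformations
print(f"{total_seconds:.2e} s")                # 5.15e+32 s
print(f"{total_years:.2e} years")              # 1.63e+25 years
```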

4.1 Energy and Kinematics

Proteins can fold quickly into a stable conformation because energy and kinematics determine the major configuration. In general, the protein will fold to its minimum energy

Figure 6: Top view of coiled coil motif

Figure 7: Beta helix motif

5.2 Ex: Beta helices

The beta helix is a prokaryotic motif made up of three beta sheets (Figure 7). Currently, there is only one algorithm for predicting beta helices, called BetaWrap. Predicting beta helices is computationally difficult because there are very few solved structures, and these differ greatly from one another. The algorithm has three stages. In them, B1, B2, and B3 denote the beta strands of a rung, and T2 denotes the turn between B2 and B3:

Rungs subproblem: Given the location of the T2 turn of one rung, find the location of the T2 turn of the next rung.

Multiple rungs: Find multiple initial B2-T2-B3 rungs and find the "optimal wrap" using dynamic programming.

Completing the parse: Find the B1 strands by locally optimizing their location.
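The "optimal wrap" stage chains candidate rungs with dynamic programming. The Python sketch below is a deliberately simplified, hypothetical version. Each candidate T2 position carries a precomputed score, and consecutive rungs must be separated by a distance within an assumed range. The program then returns the highest-scoring chain. BetaWrap's actual scoring function and spacing constraints are not reproduced here. The names `optimal_wrap`, `min_gap`, and `max_gap` are illustrative.

```python
def optimal_wrap(candidates: list[tuple[int, float]], min_gap: int, max_gap: int) -> tuple[float, list[int]]:
    """Pick the chain of candidate rungs with the highest total score.

    candidates: (position of a T2 turn, score) pairs; hypothetical precomputed inputs.
    Consecutive rungs in the chain must be min_gap..max_gap residues apart.
    Returns (best total score, chosen T2 positions in sequence order).
    """
    cands = sorted(candidates)
    if not cands:
        return 0.0, []
    n = len(cands)
    best = [score for _, score in cands]  # best[i]: best chain ending at candidate i
    prev = [-1] * n                        # back-pointer to the previous rung in that chain
    for i, (pos_i, score_i) in enumerate(cands):
        for j in range(i):
            gap = pos_i - cands[j][0]
            if min_gap <= gap <= max_gap and best[j] + score_i > best[i]:
                best[i] = best[j] + score_i
                prev[i] = j
    end = max(range(n), key=best.__getitem__)  # end of the highest-scoring chain
    total = best[end]
    chain = []
    k = end
    while k != -1:  # follow back-pointers to recover the chain
        chain.append(cands[k][0])
        k = prev[k]
    return total, chain[::-1]


print(optimal_wrap([(10, 2.0), (32, 3.0), (40, 1.0), (55, 2.5)], min_gap=18, max_gap=26))
# prints (7.5, [10, 32, 55])
```

The candidate at position 40 is skipped because it is too close to both neighbors under the assumed spacing limits.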

6 Secondary Structure Prediction

The goal of secondary structure prediction is to classify each position of a given amino acid sequence as alpha helix, beta strand, or loop. In general, this problem is harder than motif identification. The most successful methods for this problem are based on neural networks. One such algorithm for classifying secondary structures is PSIPRED. The basic steps of this algorithm are:

Given a sequence x, generate its profile using PSI-BLAST (see below).
Pass the profile to a pre-trained neural network.
Output classification: alpha helix/beta strand/loop.
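The profile-to-network step classifies each position from the profile values in a window around it. The Python sketch below shows only that data flow. A single linear layer with hypothetical, untrained weights stands in for PSIPRED's trained networks. The window size, weight values, and function names are assumptions for illustration.

```python
import math

LABELS = "HEC"  # alpha helix, beta strand, loop (coil)


def windows(profile: list[list[float]], half_width: int) -> list[list[float]]:
    """Concatenate the profile rows in a window centred on each residue.

    profile: one list of scores per residue (e.g. the values of a PSI-BLAST profile).
    Positions beyond either end of the sequence are padded with zeros.
    """
    n_cols = len(profile[0])
    pad = [[0.0] * n_cols for _ in range(half_width)]
    padded = pad + profile + pad
    width = 2 * half_width + 1
    return [[v for row in padded[i:i + width] for v in row] for i in range(len(profile))]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(profile, weights, biases, half_width):
    """Label each residue H/E/C with one linear layer plus softmax (toy stand-in for a trained network)."""
    labels = []
    for x in windows(profile, half_width):
        logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        probs = softmax(logits)
        labels.append(LABELS[probs.index(max(probs))])
    return "".join(labels)


toy_profile = [[1.0], [0.0], [1.0]]              # one made-up score per residue
toy_weights = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]  # one row each for H, E, C
toy_biases = [0.0, 0.0, 0.5]
print(classify(toy_profile, toy_weights, toy_biases, half_width=1))  # prints HEH
```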

PSIPRED has a 76% classification accuracy.
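Assume the reported figure is the usual per-residue three-state accuracy, often called Q3. It is then the fraction of positions whose predicted label matches the observed one, as in this minimal Python sketch (the function name is hypothetical):

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy: fraction of positions where H/E/C labels agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


print(q3_accuracy("HHHECCC", "HHHEECC"))  # 6 of 7 positions agree
```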

6.1 BLAST vs. PSI-BLAST

BLAST stands for Basic Local Alignment Search Tool. First, it constructs a dictionary of all the k-long words in the query sequence. Then it initiates a local alignment for each match between a word in the query and a word in a database sequence. Each alignment is extended without gaps in both directions until its score drops too far below the best score seen so far. The output of BLAST is all the alignments whose scores exceed a statistical significance threshold. PSI-BLAST is an extension of the original BLAST algorithm that takes an iterative approach. The basic steps are as follows:

Find all pairwise alignments of the query x to sequences in the database D.
Collect all matches of x to database sequences y with some minimum significance.
Construct a position-specific scoring matrix M from these matches.
Using M, search D for more matches.
Iterate until convergence.
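A toy version of this loop can be written in Python under several simplifying assumptions. Hits are ungapped and the same length as the query, so each hit is already aligned column by column. The background residue distribution is uniform, and each column gets a pseudocount of 1. The first round builds M from the query alone, whereas PSI-BLAST starts with an ordinary BLAST search using a substitution matrix such as BLOSUM62. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds score per (position, residue) from equal-length, ungapped aligned hits."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background: an assumption
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):  # one tuple of residues per query position
        counts = Counter(column)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def pssm_score(pssm: list[dict[str, float]], seq: str) -> float:
    """Ungapped score of seq against the matrix, position by position."""
    return sum(col[aa] for col, aa in zip(pssm, seq))


def psi_search(query: str, database: list[str], threshold: float, max_iter: int = 10) -> set[str]:
    """Build M from the current hits, rescan the database with M, stop when hits stop changing."""
    hits = {query}
    for _ in range(max_iter):
        matrix = build_pssm(sorted(hits))
        new_hits = {query} | {s for s in database
                              if len(s) == len(query) and pssm_score(matrix, s) >= threshold}
        if new_hits == hits:
            break  # converged
        hits = new_hits
    return hits


print(sorted(psi_search("ACDE", ["ACDE", "ACDF", "ACGF", "WWWW"], threshold=2.0)))
# prints ['ACDE', 'ACDF', 'ACGF']
```

With these toy inputs, `ACGF` scores below the threshold in the first round. It is picked up in the second round, once M reflects the two residues seen at the last position.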

7 TRILOGY

TRILOGY is a program that identifies sequence-structure patterns in proteins. A pattern object in TRILOGY consists of a sequence pattern, which specifies the spacing and amino acid types of three residues, and a structure pattern, which specifies the three-dimensional arrangement and orientation of those residues. A residue triplet that matches a particular structure pattern must have Cα-Cα distances and Cα-Cβ vectors that agree
