Stanford CS 374 - Lecture 19 - Protein Structure Alignment


Protein Structure Alignment
CS374 Fall 2006, Lecture 19, 11/14/04
Lecturer: Ramji Srinivasan
Scribe: Omkar Mate

Based on the following papers:
1. R. Kolodny, N. Linial. "Approximate protein structure alignment in polynomial time". Foundations of Computer Science, 1999. 40th Annual Symposium on. 512-521.
2. J. Xu, F. Jiao, B. Berger. "A Parameterized Algorithm for Protein Structure Alignment". A. Apostolico et al. (Eds.): RECOMB 2006, LNBI 3909, pp. 488–499, 2006.

1. Motivation:

Over the course of evolution, protein structure is conserved more strongly than protein sequence. Structural similarity between two proteins may therefore indicate functional similarity. If we know the structures of proteins A and B and the function of protein A, then detecting structural similarity between A and B by producing an alignment between them can help us infer the function of B. The problem of protein structure alignment is closely related to sequence alignment; having to align structures in 3-d adds to the complexity of the problem.

2. Preliminaries and Definitions:

Each protein is a chain of atoms in 3-d. Let A be a protein of n atoms, A = (a_1, a_2, …, a_n).

To define a k-long subchain, use an index sequence P = (p_1, p_2, …, p_k), where the p_i are in ascending order. The subchain is then A(P) = (a_{p_1}, a_{p_2}, …, a_{p_k}).

A gap is a pair of consecutive indices p_i, p_{i+1} such that p_i + 1 < p_{i+1}.

Consider two proteins A and B with subchains P and Q respectively. A correspondence between A and B is defined by two subchains P and Q of equal length; it pairs up the atoms of the two proteins that appear in the same position in their respective subchains. Let G_{P,Q} denote the number of gaps in a correspondence.
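The definitions above can be made concrete with a minimal Python sketch. The function names (`subchain`, `count_gaps`, `correspondence_gaps`) are hypothetical, indices are 1-based as in the notes, and counting G_{P,Q} as the sum of the gaps in P and in Q is an assumption made here for illustration.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float]  # 3-d coordinates of one atom


def subchain(chain: Sequence[Atom], positions: Sequence[int]) -> List[Atom]:
    """Return A(P): the atoms of `chain` at the 1-based indices in `positions`."""
    if any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("indices must be strictly ascending")
    return [chain[p - 1] for p in positions]


def count_gaps(positions: Sequence[int]) -> int:
    """Count consecutive index pairs (p_i, p_{i+1}) with p_i + 1 < p_{i+1}."""
    return sum(1 for a, b in zip(positions, positions[1:]) if a + 1 < b)


def correspondence_gaps(P: Sequence[int], Q: Sequence[int]) -> int:
    """G_{P,Q}, assumed here to be the gaps in P plus the gaps in Q."""
    if len(P) != len(Q):
        raise ValueError("a correspondence needs subchains of equal length")
    return count_gaps(P) + count_gaps(Q)
```

For example, P = (1, 2, 4, 5, 9) skips index 3 and indices 6 to 8, so it contains two gaps.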
To superimpose the corresponding atoms, we can fix A and apply a rigid transformation (a rotation combined with a translation) to B.

3. Problem Definition and Algorithm:

The structural alignment problem is to find subchains P and Q of the same length over the two given proteins A and B such that
1. A(P) and B(Q) are similar
2. The correspondence length is maximal

Similarity measures: To determine similarity, we can use the following measures: [the list of measures is not preserved in this preview]

Conditions for polynomial time complexity: The proposed algorithm for protein structure alignment runs in polynomial time under the following conditions:
1. The number of rigid transformations under consideration is bounded by a polynomial.
2. Given a rigid transformation, the correspondence is found in polynomial time.

Finding correspondence: The best correspondence for a given transformation can be found by dynamic programming. We divide the entire structure into substructures. An optimal correspondence for the entire structure, restricted to a given substructure, is also optimal for that substructure. Moreover, the number of relevant substructures is bounded by a polynomial. Hence the optimal correspondence can be found in polynomial time and space, O(n^2), using dynamic programming.

Rigid transformations: The hard part of the algorithm is bounding the number of rigid transformations. We need to show that the number of rigid transformations that must be considered is bounded by a polynomial. A rigid transformation consists of a translation and a rotation, each of which can be parameterized by a vector.

Score: Henceforth, the term "score" means the structural alignment score.

Scoring function: Assume a correspondence between subchain P of A and Q of B, with A fixed.
Apply rigid transformations to B and, for each such transformation, compute the CDS (Correspondence Dependent Scoring) function from the distances in space between corresponding atom pairs. There are exponentially many correspondences and scoring functions, but we want only those functions that give the best or near-best correspondence scores. Lipschitz conditions are imposed on such functions so that we can evaluate the function at only a few points and still find a near-optimal value.

For such functions, a small perturbation of the transformation changes the score by at most a bounded amount, controlled by a user-defined constant. For CDS functions that satisfy the Lipschitz conditions, it therefore suffices to consider only a finite set of translations and rotations. Once we have this small set of rigid transformations, we evaluate the function on them to get a good alignment score. Furthermore, for any specified transformation, this small set contains a transformation whose score is within a pre-specified bound of the score of the specified transformation.

Let G be the set of rigid transformations. For a protein with n residues, the total size of G is given by a formula [not preserved in this preview] in which C_r and C_t are coefficients related to the Lipschitz conditions. The size of the set grows as epsilon decreases. Epsilon also determines the margin of error allowed in the alignment scores. Hence there is a trade-off: more accurate results require a larger set of transformations.

It can be proved that for any given rigid transformation, there exists a transformation in G that produces a score within epsilon of the score produced by the given transformation. For details of the proof, please refer to the notes attached.
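The two ingredients described above, a dynamic program for the best correspondence under a fixed transformation and a search over a finite set of transformations, can be sketched roughly as follows. This is an illustration under several assumptions that do not come from the notes: the pair score 20 / (1 + d^2 / 5) is chosen only as an example of a distance-based score, gap penalties are omitted, and the sampled set contains only rotations about the z-axis rather than a full net over rotations and translations. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def transform(b: Point, angle: float, shift: Point) -> Point:
    """Rotate b about the z-axis by `angle` (radians), then translate by `shift`."""
    x, y, z = b
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


def pair_score(a: Point, b: Point) -> float:
    """Illustrative distance-based score for one matched atom pair (an assumption)."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return 20.0 / (1.0 + d2 / 5.0)


def best_correspondence_score(A: Sequence[Point], B: Sequence[Point]) -> float:
    """O(n*m) dynamic program over order-preserving correspondences.

    S[i][j] is the best total score using the first i atoms of A and the
    first j atoms of B. Gap penalties are left out to keep the sketch short.
    """
    n, m = len(A), len(B)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + pair_score(A[i - 1], B[j - 1]),  # match a_i with b_j
                S[i - 1][j],  # leave a_i unmatched
                S[i][j - 1],  # leave b_j unmatched
            )
    return S[n][m]


def best_over_rotation_grid(
    A: Sequence[Point], B: Sequence[Point], steps: int
) -> Tuple[float, float]:
    """Evaluate the DP score at `steps` evenly spaced z-rotations of B.

    Returns (best score, angle achieving it). A finer grid plays the role of
    a smaller epsilon: more transformations, smaller possible loss in score.
    """
    best_score, best_angle = -math.inf, 0.0
    for k in range(steps):
        angle = 2.0 * math.pi * k / steps
        moved: List[Point] = [transform(b, angle, (0.0, 0.0, 0.0)) for b in B]
        score = best_correspondence_score(A, moved)
        if score > best_score:
            best_score, best_angle = score, angle
    return best_score, best_angle
```

With A on the x-axis and B equal to A rotated a quarter turn, a grid of four rotations already contains the transformation that superimposes them exactly; in general the grid only guarantees a score close to the optimum, which is the trade-off between set size and accuracy described above.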

Algorithm to find all the epsilon-maximal points: By points, we mean points in the space of rigid transformations, so a point refers to a rigid transformation (a rotation together with a translation). We define an epsilon-maximal point as a transformation whose score is within epsilon of the global maximum score. It is guaranteed that there exists at least one transformation in G that will produce a score within

