DOC PREVIEW
Stanford CS 374 - Simple and Fast Inverse Alignment

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

IntroductionInverse Alignment and Its VariationsReduction to Linear ProgrammingSolving the Linear ProgramApplication to Global AlignmentComputational ResultsConclusionSimple and Fast Inverse AlignmentJohn Kececioglu and Eagu KimDepartment of Computer Science,The University of Arizona, Tucson, AZ 85721, USA{kece, egkim}@cs.arizona.eduAbstract. For as long as biologists have been computing alignments ofsequences, the question of what values to use for scoring substitutionsand gaps has persisted. While some choices for substitution scores arenow common, largely due to convention, there is no standard for choosinggap penalties. An objective way to resolve this question is to learn theappropriate values by solving the Inverse String Alignment Problem:given examples of correct alignments, find parameter values that makethe examples be optimal-scoring alignments of their strings.We present a new polynomial-time algorithm for Inverse String Align-ment that is simple to implement, fast in practice, and for the first timecan learn hundreds of parameters simultaneously. The approach is alsoflexible: minor modifications allow us to solve inverse unique alignment(find parameter values that make the examples be the unique optimalalignments of their strings), and inverse near-optimal alignment (find pa-rameter values that make the example alignments be as close to optimalas possible). Computational results with an implementation for globalalignment show that, for the first time, we can find best-possible valuesfor all 212 parameters of the standard protein-sequence scoring-modelfrom hundreds of alignments in a few minutes of computation.Keywords: Sequence analysis, parametric sequence alignment, substi-tution score matrices, affine gap penalties, supervised learning, linearprogramming, cutting plane algorithms.1 IntroductionPerhaps the most studied problem in computational biology is the alignment ofbiological sequences with substitutions, insertions, and deletions. The standardformulations of string alignment optimize a sum of scores for each type of op-eration, often giving a penalty for a run of insertions or deletions, called a gap,that is linear in the length of the gap. When performing sequence alignment inpractice, the question of what weights and penalties to use inevitably arises. Aninteresting attack on this question is parametric sequence alignment,whereforagiven pair of strings, the alignment problem is solved for effectively all possiblechoices of the scoring parameters, thereby eliminating the need to specify anyweights and penalties. The problem with this approach is that it in effect defersResearch supported by US National Science Foundation Grant DBI-0317498.A. Apostolico et al. (Eds.): RECOMB 2006, LNBI 3909, pp. 441–455, 2006.c Springer-Verlag Berlin Heidelberg 2006442 J. Kececioglu and E. Kimthe question, since eventually a user must choose one of the solutions, and onwhat basis should this be done? Essentially one needs an example of a correctalignment to discriminate among the welter of parametric solutions. When onedoes solve the parametric problem and knows a biologically correct alignmentof the sequences, this alignment is used to decide what region of the parameterspace makes the correct alignment have optimal score.This optimal parameter choice could be found much more directly, however, bysolving inverse parametric alignment. In this problem, the input is an alignmentof a pair of strings, and the output is a choice of parameters that makes the inputalignment be an optimal-scoring alignment of its strings. We present a simple andfast algorithm for inverse parametric alignment that for the first time is capableof determining all substitution weights and gap penalties simultaneously. Suchan algorithm, applied to a collection of benchmark protein-sequence alignmentsthat are constructed by aligning their three-dimensional structures, could providethe first rigorous way of determining substitution scores and gap penalties forcharacterized classes of proteins.Related Work. The first algorithms for parametric alignment of two sequenceswere discovered in the early 1990’s by Waterman, Eggert and Lander [16] andGusfield, Balasubramanian and Naor [7]. These algorithms handled two pa-rameters, usually the gap open and extension penalties with fixed substitu-tion scores. Zimmer and Lengauer [17] addressed numerical stability. Gusfieldet al. [7] also bounded the number of regions in the decomposition of the pa-rameter space, and constructed the decomposition with one optimal alignmentcomputation per region. Fern´andez-Baca, Sepp¨al¨ainen and Slutzki [4] showedthese bounds are asymptotically tight, and initiated the study of parametricmultiple-sequence alignment. Gusfield and Stelling [8] released a software im-plementation called XPARAL, and were the first to consider inverse parametricalignment, for which they gave a heuristic that attempted to avoid computing adecomposition of the entire parameter space. Pachter and Sturmfels [13] exploredthe relation of algebraic statistics to parametric sequence alignment.Recently, Sun, Fern´andez-Baca and Yu [15] gave the first direct algorithmfor inverse parametric alignment. While they consider three parameters, theirsolution effectively fixes one parameter value at zero. For two strings of length n,their algorithm runs in O(n2log n) time. Their approach is involved, and doesnot appear to have been implemented.In contrast to prior work, our algorithm for inverse parametric alignment issimple to implement, does not compute a decomposition of the entire parameterspace, solves both the near-optimal and unique-optimal inverse alignment prob-lems, handles a set of input alignments, and for the first time can quickly solveproblems with hundreds of free parameters.Overview. In the next section we give a precise statement of the inverse para-metric alignment problem and two variations: inverse near-optimal and unique-optimal alignment. Section 3 reduces all three variations to linear programming.Section 4 explains how the resulting linear programs, even though they have anexponential number of inequalities, can be solved in polynomial time. Section 5Simple and Fast Inverse Alignment 443then discusses a version of this approach, called a cutting-plane algorithm, thatis highly effective in practice. Finally Section 6 presents experimental resultswith an implementation for inverse optimal and near-optimal


View Full Document

Stanford CS 374 - Simple and Fast Inverse Alignment

Documents in this Course
Probcons

Probcons

42 pages

ProtoMap

ProtoMap

19 pages

Lecture 3

Lecture 3

16 pages

Load more
Download Simple and Fast Inverse Alignment
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Simple and Fast Inverse Alignment and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Simple and Fast Inverse Alignment 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?