DOC PREVIEW
Stanford CS 262 - Sequence Alignment

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Scribed By: Brian TranCS 262: Lecture 2 – Sequence AlignmentI. MotivationA. DNA SequencingOver the past few years, many entire genomes have become available and have been found at an increasing speed. More than 300 complete genomes have been sequenced, mainly bacteria because they can be easily sequenced. Mammals however have also been sequenced (3 billion bases) to be compared with humans. Sequencing an entire genome is expensive. For example, the human genome cost a few billion dollars, however costs have been dropping exponentially to the 10 to 30 million dollar range.B. Sequencing to Study EvolutionSequencing genomes will help explain how organisms have evolved. Also, the human genome can be better understood by examining the rates of evolution within different parts of the genome1. Evolution at the DNA Level. Evolution at the DNA level can be divided into two types:1) sequence edits – here small changes occur such as mutations where a base is transformed into another, or insertions or deletions of a few bases. Insertions occur in areas with more repetition2) rearrangements - here large changes occur in the range of 50 to 100 bases to entire chromosomes. Inversions are when parts are rotated 180 degrees. Translocations are when two parts are exchanged. Duplications are when parts copy itself in tandem. An important note is that to find recent duplications, one need only find very similar duplications, since this infers that other types of changes have not had a chance to occur yet.Different parts of the genome evolve at different rates, however on a large scale evolution occurs at a constant rate. With this, we can tell how recently two regions have diverged from one another. Additionally, more important regions tend to evolve at a much slower rate, since changes to an important region would cause the organism to easily die off. Conversely, we can search for regions that have evolved slowly to find important regions of the genome. 2. Conservation implies functionConsider the diagram (Figure 1) shown on the next page. The graphs show the differences in the genomes between different species. The red line indicates areas of high conservation across the different species. These areas can be considered areas of high importance because they are shared by all species. The graphs can also be used to show how similar two species are.Figure 1: Wiggle plot showing areas of conservation (red line) in the genomes of different species. The pie chart below shows how many genes of the rat genome are conserved to a specific type of organism. As can be seen, much of the conservation is not specific to just rodents or mammals, but in more general organisms. This shows that much of the conservation occurs in areas representing the basic functions of life. Figure 2: Pie chart showing that much of the conservation in the rat genome is specific to more than just mammalian organisms.In conclusion, sequence alignment is key to finding important regions, determining function, and uncovering the evolutionary forces. II. Sequence AlignmentSequence alignment deals with lining up two or more sequences to see how similar they are to each other. Though there is no elegant definition, it can be defined as follows:Given two sequences x = x = x1x2...xM and y = y1y2…yN, an alignment is an assignment of gaps to positions 0,…, N in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gap in the other sequence. In other words, we add gaps to either sequence to make the two sequences of equal length and as “similar” to each other as possible. Also, no gap can line up with another gap. An example alignment of two sequences is shown below.A. Finding the “goodness” of an alignment1. Scoring FunctionThere are many different metrics to measure how good an alignment is. The scoring system is generally based on the number of matches, mismatches and gaps present in the alignment. Intuitively, these three components represent the sequence edits occurring during evolution. We can provide scores to each of these occurrences: +m for matches, -s for mismatches, -d for gaps that contribute to the overall score of the alignment. The values for these variables are determined by the user, and are normally based on the frequency or probability of true events. The overall score of the alignment is the weighted sum of the matches, mismatches and gaps:Score F = (# matches) x m – (# mismatches) x s – (# gaps) x dAn alternative (classical) definition of alignment is the minimal edit distance. Given two strings x and y, the minimal edit distance is the minimum number of edits (insertions, deletions, or mutations) needed to transform one string to the other. 2. Finding the best alignmentWe can find the best alignment of two sequences by constructing a matrix as shown in Figure 3. In this matrix, a row of the matrix represents a letter in one sequence, whereas a column represents a letter in the other sequence. A possible alignment for the two sequences can be generated by constructing a path through the matrix, starting from the top left corner of the matrix to the bottom right corner. Traversal through the matrix can be done in three ways:1) diagonal path - representing a match in both letters of sequences2) right path – representing a gap in row sequence3) down path – representing a gap in the column sequence-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGACAGGCTATCACCTGACCTCCAGGCCGATGCCCTAGCTATCACGACCGCGGTCGATTTGCCCGACFigure 3: A possible alignment (blue) for the two sequences.As can be seen, there are an exponential amount of possible alignments (>> 2N) and scoring each alignment would be too expensive. 3. Alignment is additiveLuckily, the score of aligning two sequences is additive and thus, can be broken into different parts. For example, the two strings x1x2...xM and y1y2…yNcan be broken into x1…..xi and xi+1…...xM and y1……yj and yj+1……yNand the scores for aligning each part can be added to find the total score. The equation is as follows:F(x[1:M], y[1:N]) = F(x[1:i], y[1:j]) + F(x[i+1:M], y[j+1:N])This concept will be used when introducing dynamic programming.B. Dynamic Programming1. BasicsDynamic programming is a common technique used to simplify problems where we wish to find an optimal solution from a seemingly exponential number of possibilities. We know we can use dynamic


View Full Document

Stanford CS 262 - Sequence Alignment

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Sequence Alignment
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Sequence Alignment and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Sequence Alignment 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?