Stanford CS 374 - Genomic Sequence Alignment

Genomic Sequence Alignment

1. Introduction
2. Basics of Sequence Alignment
3. Dynamic Programming Approach for Sequence Alignment
4. Index Based Local Alignment
5. Global Alignment with Chaining Approach
6. Multiple Alignment
7. Rearrangement in Genomic Sequences
8. References

Comparative Genomics CS374 Fall 2005 Lecture 2, 9/29/05
Lecturer: Serafim Batzoglou
Scribe: Vignesh Ganapathy

The topics covered in this lecture are:
1. Introduction to Genomic Sequence Alignment
2. Dynamic Programming and Needleman-Wunsch Algorithm
3. Local Alignment – BLAST
4. Fast Global Alignment
5. Multiple Sequence Alignment
6. Rearrangements in Genomic Sequences

1. Introduction

DNA (deoxyribonucleic acid) and proteins are biological macromolecules built as long linear chains of chemical components. In the case of DNA these components are nucleotides, of which there are four different ones, each denoted by one of the letters A, C, G and T. Proteins are made up of 20 different amino acids (or "residues"), which are denoted by 20 different letters of the alphabet.

DNA plays a fundamental role in the processes of life in two respects:
- First, it contains the templates for the synthesis of proteins, which are essential molecules for any organism. (Fig 1: DNA to Proteins)
- Second, it is the medium that transmits information (namely the building plans for proteins) from generation to generation.

A genome refers to the total genetic information of a particular organism. A gene is a sequence of DNA that represents a fundamental unit of heredity. Most genes encode proteins, but some code for RNA molecules.

Comparative genomics is the study of relationships between the genomes of different species. Alignment of genomic sequences is useful in the following ways:

1. Finding important regions: One important application of genomic sequence alignment is finding genes in a genome database. Genes and other functional elements of an organism generally mutate at a slower rate than the rest of the genome, because mutations in functional elements are more likely to negatively impact the organism than mutations elsewhere. This property allows genes to be identified by comparing the genomes of related species and detecting the conserved regions. (Fig 2: Evolutionary Rates)

2. Determining function: Since the structure of DNA and proteins determines the functions they perform, sequence alignment helps to determine function by finding similarity between different genomic sequences. Proteins that have a significant biological relationship to one another often share only isolated regions of sequence similarity. For identifying relationships of this nature, the ability to find local regions of optimal similarity is very useful.

3. Uncovering the evolutionary forces: Homology is the presence of a similar feature because of descent from a common ancestor. It is an inference, a conclusion drawn from observed similarity. Significantly similar molecular sequences are very unlikely to arise by chance. If the primary structures of two DNA sequences are strongly similar, they are very likely to have similar secondary and tertiary structures, and probably similar function too. So global and local alignment, that is, comparing the relatedness of DNA sequences, can help determine whether the sequences came from a common ancestor, i.e. whether they are evolutionarily related.

2. Basics of Sequence Alignment

Sequence alignment can be defined as follows: given two strings $x = (x_1, \ldots, x_M)$ and $y = (y_1, \ldots, y_N)$, an alignment is an assignment of gaps to positions $0, 1, \ldots, M$ in x and $0, 1, \ldots, N$ in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.

Given two sequences GAATTCAGTTA and GGATCGA, a possible alignment between them is:

    G A A T T C A G T T A
    |   |   | |   |     |
    G G A _ T C _ G _ _ A

Since multiple alignments are possible between two sequences, we need to find the best one: the alignment in which the maximum number of letters match and the number of mismatches and gaps is minimal. This is done by considering sequence edits, that is, by finding the minimum number of sequence edits required to align the two sequences.

Edit distance: A measure of the difference, or distance, between two strings. It is measured as the minimum cost needed to transform one string into the other by a series of edit operations on individual characters. The permitted edit operations are insertion (I) of a character into the first string, deletion (D) of a character from the first string, and substitution or replacement (R) of a character in the first string with a character from the second string. If each insertion, deletion or substitution counts as a single operation, the edit distance is the minimum number of operations needed to accomplish the transformation.

To give a value to an alignment, we can define a score depending on the number of letters matched, the number of mismatches and the number of gaps.
This score value is then optimized by the different sequence alignment algorithms:

$$F = (\#\,\text{matches}) \times m - (\#\,\text{mismatches}) \times s - (\#\,\text{gaps}) \times d$$

where m is the value assigned to a match, -s the value assigned to a mismatch and -d the value assigned to a gap.

Global Alignment: Aligns the entire string X with the entire string Y; spaces may be inserted either into or at the ends of X and Y.

Local Alignment: In many biological applications, two DNA sequences may not be highly similar over their entire length, but may contain regions that are highly similar, because only some internal sections of the strings are related. When comparing such DNA sequences, local alignment becomes critical. Formally, given two strings X of length M and Y of length N, local alignment finds substrings a and b of X and Y, respectively, whose similarity is maximum over all pairs of substrings from X and Y.

Since there are an exponential number of ways to align two sequences (2^N), we use a dynamic programming approach to solve this problem to
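The edit distance defined above can be computed with a small dynamic program. The following is a minimal sketch, assuming unit cost for every insertion, deletion and substitution; the function name `edit_distance` is hypothetical and not taken from the lecture.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Assumes every edit operation costs 1 (an illustrative choice).
    """
    # prev[j] = distance between the empty prefix of a and the first j letters of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i letters of a into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute ca by cb (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("GAATTCAGTTA", "GGATCGA"))
```

For the two example sequences this gives 5, which agrees with the one mismatch and four gaps in the alignment shown earlier. Only two rows of the table are kept at a time, so memory use is proportional to the length of the second string.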

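To make the alignment score concrete, here is a small sketch that scores an alignment that has already been written out as two equal-length rows. It reads the score as F = (# matches) × m − (# mismatches) × s − (# gaps) × d. The function name, the `_` gap character and the default values m = s = d = 1 are assumptions chosen for illustration, not values given in the lecture.

```python
def alignment_score(row_x, row_y, m=1, s=1, d=1, gap="_"):
    """Score a given alignment as matches*m - mismatches*s - gaps*d.

    row_x and row_y are the two aligned rows, with `gap` marking gap positions.
    Returns the score together with the (matches, mismatches, gaps) counts.
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_x, row_y):
        if a == gap or b == gap:
            gaps += 1          # a letter lined up with a gap
        elif a == b:
            matches += 1       # identical letters
        else:
            mismatches += 1    # two different letters
    return matches * m - mismatches * s - gaps * d, (matches, mismatches, gaps)


if __name__ == "__main__":
    score, counts = alignment_score("GAATTCAGTTA", "GGA_TC_G__A")
    print(counts, score)
```

With these illustrative parameters the example alignment has 6 matches, 1 mismatch and 4 gaps, so its score is 6 − 1 − 4 = 1.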

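The lecture outline names the Needleman-Wunsch algorithm as the dynamic programming method for global alignment. The sketch below is one common formulation of it with a linear gap penalty, written under the assumption that the match, mismatch and gap values m, s and d are used as in the score F; it is not necessarily the exact variant presented in the lecture. The function name and the default parameter values are hypothetical.

```python
def needleman_wunsch_score(x, y, m=1, s=1, d=1):
    """Best global alignment score of x and y under a linear gap penalty.

    A match adds m, a mismatch subtracts s, and each gap subtracts d.
    """
    rows, cols = len(x) + 1, len(y) + 1
    # F[i][j] = best score for aligning the first i letters of x
    #           with the first j letters of y
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = -i * d           # x prefix aligned entirely against gaps
    for j in range(1, cols):
        F[0][j] = -j * d           # y prefix aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (m if x[i - 1] == y[j - 1] else -s)
            up = F[i - 1][j] - d   # x[i-1] aligned with a gap
            left = F[i][j - 1] - d # y[j-1] aligned with a gap
            F[i][j] = max(diag, up, left)
    return F[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GAATTCAGTTA", "GGATCGA"))
```

The table has (M + 1) × (N + 1) cells and each cell is filled in constant time, which is what makes dynamic programming practical compared with enumerating an exponential number of alignments. Recovering the alignment itself would additionally require a traceback through the table, which is omitted here to keep the sketch short.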