Stanford CS 374 - Lecture 2 - Genomic Sequence Alignment



Slide titles

Genomic Sequence Alignment
Overview
Biology in One Slide – Twentieth Century
Complete DNA Sequences
Evolution
Evolution at the DNA level
Evolutionary Rates
Sequence conservation implies function
Sequence Alignment
What is a good alignment?
Scoring Function
How do we compute the best alignment?
Dynamic Programming
Dynamic Programming (cont’d)
Example
The Needleman-Wunsch Algorithm
Alignment on a Large Scale
Index-based Local Alignment
Index-based Local Alignment: BLAST
Gapped BLAST
Efficient global alignment
Global alignment with the chaining approach
LAGAN: 3. Restricted DP
Multiple Alignment
Definition
Scoring Function: Sum Of Pairs
A Profile Representation
Multiple Sequence Alignments
Multidimensional DP
Progressive Alignment
Some useful sites
Local & Global Alignment
Glocal Alignment Problem
SLAGAN Example: Chromosome 20
SLAGAN example: HOX cluster
Examples of shuffled regions


Genomic Sequence Alignment

Overview
• Dynamic programming & the Needleman-Wunsch algorithm
• Local alignment: BLAST
• Fast global alignment
• Multiple sequence alignment
• Rearrangements in genomic sequences

Biology in One Slide – Twentieth Century
…ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGTCTAGCTAGACTACGTTTTATATATATATACGTCGTCGTACTGATGACTAGATTACAGACTGATTTAGATACCTGACTGATTTTAAAAAAATATT…
…and today

Complete DNA Sequences
About 300 complete genomes have been sequenced

Evolution

Evolution at the DNA level
…ACGGTGCAGTTACCA…
…AC----CAGTCCACCA…
SEQUENCE EDITS: Mutation, Deletion
REARRANGEMENTS: Inversion, Translocation, Duplication

Evolutionary Rates
[Figure: changes passed on to the next generation, labeled "OK", "X", and "Still OK?"]

Sequence conservation implies function
Alignment is the key to
• Finding important regions
• Determining function
• Uncovering the evolutionary forces

Sequence Alignment
  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition
Given two strings x = x_1 x_2 … x_M, y = y_1 y_2 … y_N, an alignment is an assignment of gaps to positions 0, …, M in x, and 0, …, N in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence
  AGGCTATCACCTGACCTCCAGGCCGATGCCC
  TAGCTATCACGACCGCGGTCGATTTGCCCGAC

What is a good alignment?
Alignment: The “best” way to match the letters of one sequence with those of the other
How do we define “best”?
Alignment: A hypothesis that the two sequences come from a common ancestor through sequence edits
Parsimonious explanation: Find the minimum number of edits that transform one sequence into the other

Scoring Function
• Sequence edits:  AGGCCTC
    Mutations      AGGACTC
    Insertions     AGGGCCTC
    Deletions      AGG.CTC
Scoring Function:
    Match:    +m
    Mismatch: -s
    Gap:      -d
Score F = (# matches) × m - (# mismatches) × s - (# gaps) × d

How do we compute the best alignment?
  AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
  AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC
Too many possible alignments: O(2^N)

Dynamic Programming
• Given two sequences x = x_1 … x_M and y = y_1 … y_N
• Let F(i, j) = score of the best alignment of x_1 … x_i to y_1 … y_j
• Then F(M, N) = score of the best alignment
Idea: Compute F(i, j) for all i and j
Do this by using F(i-1, j), F(i, j-1), F(i-1, j-1)

Dynamic Programming (cont’d)
Notice three possible cases:
1. x_i aligns to y_j
     x_1 … x_(i-1)  x_i
     y_1 … y_(j-1)  y_j
   F(i, j) = F(i-1, j-1) + m, if x_i = y_j
   F(i, j) = F(i-1, j-1) - s, if not
2. x_i aligns to a gap
     x_1 … x_(i-1)  x_i
     y_1 … y_j      -
   F(i, j) = F(i-1, j) - d
3. y_j aligns to a gap
     x_1 … x_i      -
     y_1 … y_(j-1)  y_j
   F(i, j) = F(i, j-1) - d

Dynamic Programming (cont’d)
• How do we know which case is correct?
Inductive assumption: F(i, j-1), F(i-1, j), F(i-1, j-1) are optimal
Then,
                  F(i-1, j-1) + s(x_i, y_j)
   F(i, j) = max  F(i-1, j) - d
                  F(i, j-1) - d
where s(x_i, y_j) = m, if x_i = y_j; -s, if not
[Figure: cell (i, j) is filled in from cells (i-1, j-1), (i-1, j), and (i, j-1)]

Example
x = AGTA    m = 1
y = ATA     s = 1  (a mismatch scores -1)
            d = 1  (a gap scores -1)

F(i, j)    i =  0    1    2    3    4
                     A    G    T    A
j = 0           0   -1   -2   -3   -4
j = 1    A     -1    1    0   -1   -2
j = 2    T     -2    0    0    1    0
j = 3    A     -3   -1   -1    0    2

Optimal Alignment: F(4, 3) = 2
  AGTA
  A-TA

The Needleman-Wunsch Algorithm
1. Initialization.
   a. F(0, 0) = 0
   b. F(0, j) = -j × d
   c. F(i, 0) = -i × d
2. Main Iteration. Filling in partial alignments
   a. For each i = 1 … M
        For each j = 1 … N
                           F(i-1, j) - d               [case 1]
           F(i, j) = max   F(i, j-1) - d               [case 2]
                           F(i-1, j-1) + s(x_i, y_j)   [case 3]

                           UP,   if [case 1]
           Ptr(i, j) =     LEFT, if [case 2]
                           DIAG, if [case 3]
3. Termination. F(M, N) is the optimal score, and from Ptr(M, N) we can trace back the optimal alignment

Alignment on a Large Scale
• Given a gene that we care about, how can we compare it to all existing DNA?
• Assume we use Dynamic Programming:
    gene of interest: ~10^5
    the entire genomic database: ~10^11

Index-based Local Alignment
Main idea:
1. Construct a dictionary of all the words in the query
2. Initiate a local alignment for each word match between query and DB
Running Time:
  Theoretical worst case: O(MN)
  Fast in practice
[Figure: query compared against DB]

Index-based Local Alignment: BLAST
Dictionary: All words of length k (~11)
  Alignment initiated between exact-matching words (more generally, between words of alignment score ≥ T)
Alignment: Ungapped extensions until score below statistical threshold
Output: All local alignments with score > statistical threshold
[Figure: query scanned against DB]

Index-based Local Alignment: BLAST
  A C G A A G T A A G G T C C A G T
  C C C T T C C T G G A T T G C G A
Example: k = 4, T = 4
The matching word GGTC initiates an alignment
Extension to the left and right with no gaps until the alignment falls below 50%
Output:
  GTAAGGTCC
  GTTAGGTCC

Gapped BLAST
  A C G A A G T A A G G T C C A G T
  C T G A T C C T G G A T T G C G A
Added features:
• Pairs of words can initiate alignment
• Nearby alignments are merged
• Extensions with gaps until score < T below best score so far
Output:
  GTAAGGTCCAGT
  GTTAGGTC-AGT

Example
Query: gattacaccccgattacaccccgattaca (29 letters) [2 mins]
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
          1,726,556 sequences; 8,074,398,388 total letters
>gi|28570323|gb|AC108906.9| Oryza sativa chromosome 3 BAC OSJNBa0087C10 genomic sequence, complete sequence
  Length = 144487
  Score = 34.2 bits (17), Expect = 4.5
  Identities = 20/21 (95%)
  Strand = Plus / Plus
Query: 4      tacaccccgattacaccccga 24
              ||||||| |||||||||||||
Sbjct: 125138
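
Sketch: Needleman-Wunsch in code
The three-step Needleman-Wunsch procedure from the slides (initialization, main iteration, termination with traceback) can be sketched in Python as below. This is a minimal illustration, not the lecture's own code: the function name needleman_wunsch and the helper sub are made up for this sketch, m, s, and d are the match reward, mismatch penalty, and gap penalty from the Scoring Function slide, and instead of storing Ptr(i, j) the traceback re-checks which case produced each cell.

```python
def needleman_wunsch(x, y, m=1, s=1, d=1):
    """Global alignment of x and y; returns (score, aligned_x, aligned_y).

    m = match reward, s = mismatch penalty, d = gap penalty (all positive).
    """
    M, N = len(x), len(y)

    # 1. Initialization: F[i][0] = -i*d, F[0][j] = -j*d, F[0][0] = 0.
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0] = -i * d
    for j in range(1, N + 1):
        F[0][j] = -j * d

    def sub(a, b):
        # s(x_i, y_j) from the slides: +m on a match, -s otherwise.
        return m if a == b else -s

    # 2. Main iteration: each cell is the best of the three cases.
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),  # x_i aligns to y_j
                F[i - 1][j] - d,                            # x_i aligns to a gap
                F[i][j - 1] - d,                            # y_j aligns to a gap
            )

    # 3. Termination: walk back from (M, N), re-deriving the winning case
    #    (an assumption of this sketch; the slides store Ptr(i, j) instead).
    ax, ay = [], []
    i, j = M, N
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return F[M][N], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = needleman_wunsch("AGTA", "ATA")
print(score)  # 2
print(ax)     # AGTA
print(ay)     # A-TA
```

With x = AGTA and y = ATA and unit scores, this reproduces the worked example from the slides: F(4, 3) = 2 and the alignment AGTA / A-TA. Ties between cases are broken in favor of the diagonal move, which is one arbitrary choice among equally good alignments.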
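
Sketch: index-based seeding and ungapped extension
The index-based idea from the BLAST slides (build a dictionary of the query's k-letter words, scan the database for exact word matches, then extend each match without gaps) might look roughly like this in Python. Everything named here is hypothetical: build_index, find_seeds, and extend_ungapped are illustrative helpers, the db string is invented so that it contains the slide's output pair, and the stopping rule (give up once the running score falls more than `drop` below the best seen) is an assumed stand-in for the statistical threshold the slides mention. Real BLAST differs in many details.

```python
from collections import defaultdict


def build_index(query, k):
    """Dictionary: every length-k word of the query -> its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def find_seeds(query, db, k):
    """Scan db and return (query_pos, db_pos) for every exact k-word match."""
    index = build_index(query, k)
    seeds = []
    for j in range(len(db) - k + 1):
        for i in index.get(db[j:j + k], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, db, i, j, k, match=1, mismatch=1, drop=2):
    """Extend the seed at (i, j) left and right with no gaps.

    Each direction keeps the best-scoring extension and stops once the
    running score is more than `drop` below that best (assumed rule).
    """
    def extend(qi, dj, step):
        best = score = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= dj < len(db):
            score += match if query[qi] == db[dj] else -mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > drop:
                break
            qi += step
            dj += step
        return best, best_len

    right_score, right_len = extend(i + k, j + k, +1)
    left_score, left_len = extend(i - 1, j - 1, -1)
    total = k * match + left_score + right_score
    q0, d0 = i - left_len, j - left_len
    span = left_len + k + right_len
    return total, query[q0:q0 + span], db[d0:d0 + span]


query = "ACGAAGTAAGGTCCAGT"      # query from the slide
db = "CCCTTGTTAGGTCCGCGA"        # hypothetical database string
seeds = find_seeds(query, db, k=4)
print(seeds)                     # [(8, 8), (9, 9), (10, 10)]
print(extend_ungapped(query, db, 9, 9, k=4))
# (7, 'GTAAGGTCC', 'GTTAGGTCC')
```

The seed at (9, 9) is the word GGTC; extending it recovers the ungapped pair GTAAGGTCC / GTTAGGTCC shown on the slide, with 8 matches and 1 mismatch. Only the query is indexed, so the scan over the database is a single pass with one dictionary lookup per position.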

