DOC PREVIEW
Stanford CS 262 - Computational Genomics

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Biology in One Slide Computational Genomics Lecture 1 Tuesday April 1 2003 High Throughput Biology Lecture 1 Tuesday April 1 2003 High Throughput Biology 1 DNA Sequencing 2 Sequencing of expressed genes EST sequencing ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT protein sequence mRNA sequence Lecture 1 Tuesday April 1 2003 Lecture 1 Tuesday April 1 2003 High Throughput Biology High Throughput Biology 3 Gene Expression Microarrays Lecture 1 Tuesday April 1 2003 4 Gene Regulation CH IP Lecture 1 Tuesday April 1 2003 1 The goals of genomics The role of CS in Biology Study organisms at the DNA level Essential DNA sequencing and assembly Microarray analysis Protein 3D reconstruction Identify parts genes etc Figure out connections between parts Study evolution at the DNA level Complementary Gene finding genome annotation Protein fold prediction Phylogeny comparative genomics Compare organisms Uncover evolutionary history Lecture 1 Tuesday April 1 2003 Lecture 1 Tuesday April 1 2003 Syllabus Course responsibilities Tools 80 Final 20 Alignment algorithms Hidden Markov models Statistical algorithms 4 challenging problem sets 4 5 problems pset Collaboration allowed 5 late days total Televised students required to do 75 Takehome 1 day Collaboration not allowed Easy Applications Homeworks DNA sequencing and assembly Sequence analysis comparison annotation Microarray analysis Evolutionary analysis Lecture 1 Tuesday April 1 2003 Scribing Mandatory Grade replaces lowest 2 problems Due one week after the lecture Lecture 1 Tuesday April 1 2003 Reading material Books Biological sequence analysis Eddy Krogh Mitchinson by Durbin Chapters 1 4 6 7 8 9 10 Topic 1 Sequence Alignment Algorithms on strings trees and sequences by Gusfield Chapters 5 7 11 12 13 14 17 Papers Lecture notes Lecture 1 Tuesday April 1 2003 2 Complete genomes Evolution Lecture 1 Tuesday April 1 2003 Lecture 1 Tuesday April 1 2003 Evolution at the DNA level Evolutionary Rates C next generation ACGGTGCAGTCACCA OK OK OK ACGTTGCAGTCCACCA X X SEQUENCE EDITS REARRANGEMENTS Still OK Lecture 1 Tuesday April 1 2003 Lecture 1 Tuesday April 1 2003 Sequence conservation implies function Sequence Alignment AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC AGGCTATCACCTGACCTCCAGGCCGA TGCCC TAG CTATCAC GACCGC GGTCGATTTGCCCGAC Definition Given two strings Interleukin region in human and mouse Lecture 1 Tuesday April 1 2003 x x1x2 xM y y1y2 yN an alignment is an assignment of gaps to positions 0 N in x and 0 N in y so as to line up each letter in one sequence with either a letter or a gap in the other sequence Lecture 1 Tuesday April 1 2003 3 What is a good alignment Scoring Function Alignment The best way to match the letters of one sequence with those of the other Sequence edits AGGCCTC Mutations AGGA CTC Insertions How do we define best Alignment A hypothesis that the two sequences come from a common ancestor through sequence edits Parsimonious explanation Find the minimum number of edits that transform one sequence into the other AGGGCCTC Deletions AGG CTC Scoring Function Match m Mismatch s Gap d Score F matches m mismatches s gaps d Lecture 1 Tuesday April 1 2003 Lecture 1 Tuesday April 1 2003 How do we compute the best alignment Alignment is additive Observation The score of aligning AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC Too many possible alignments O 2M N x1 xM y1 yN is additive Say that aligns to x1 xi y1 yj xi 1 xM yj 1 yN The two scores add up F x 1 M y 1 N F x 1 i y 1 j F x i 1 M y j 1 N Lecture 1 Tuesday April 1 2003 Lecture 1 Tuesday April 1 2003 Dynamic Programming Dynamic Programming cont d We will now describe a dynamic programming algorithm Suppose we wish to align x1 xM y1 yN Let F i j optimal score of aligning x1 xi y1 yj Lecture 1 Tuesday April 1 2003 Notice three possible cases 1 2 3 xi aligns to yj x1 xi 1 xi y1 yj 1 yj xi aligns to a gap x1 xi 1 xi y1 yj yj aligns to a gap x1 xi y1 yj 1 yj F i j F i 1 j 1 m if xi yj s if not F i j F i 1 j d F i j F i j 1 d Lecture 1 Tuesday April 1 2003 4 Dynamic Programming cont d Example How do we know which case is correct x AGTA y ATA Inductive assumption F i j 1 F i 1 j F i 1 j 1 F i j i 0 1 A G T A j 0 0 1 2 3 4 are optimal Then F i 1 j 1 s xi yj F i 1 j d F i j 1 d F i j max Where s xi yj m if xi y j s if not 2 3 4 m 1 s 1 d 1 Optimal Alignment 1 A 1 1 0 1 2 F 4 3 2 2 T 2 0 0 1 0 AGTA A TA 3 A 3 1 1 0 2 Lecture 1 Tuesday April 1 2003 Lecture 1 Tuesday April 1 2003 The Needleman Wunsch Matrix The Needleman Wunsch Algorithm y1 yN x1 x M Lecture 1 Tuesday April 1 2003 1 Every nondecreasing path corresponds to an alignment of the two sequences Can think of it as a divide and conquer algorithm Performance j d i d Main Iteration Filling in partial alignments a For each i 1 M For each j 1 N F i 1 j d F i j max F i j 1 d F i 1 j 1 s x i y j Ptr i j 3 UP LEFT DIAG case 1 case 2 case 3 if case 1 if case 2 if case 3 Termination F M N is the optimal score and from Ptr M N can trace back optimal alignment Lecture 1 Tuesday April 1 2003 A variant of the basic algorithm Time O NM Space O NM Later we will cover more efficient methods Lecture 1 Tuesday April 1 2003 2 from 0 0 to M N Initialization a F 0 0 0 b F 0 j c F i 0 Maybe it is OK to have an unlimited of gaps in the beginning and end CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC GCGAGTTCATCTATCAC GACCGC GGTCG Then we don t want to penalize gaps in the ends Lecture 1 Tuesday April 1 2003 5 Different types of overlaps The Overlap Detection variant y1 yN x1 x M Lecture 1 Tuesday April 1 2003 Changes 1 Initialization For all i j F i 0 0 F 0 j 0 2 Termination FOPT max maxi F i N maxj F M j Lecture 1 Tuesday April 1 2003 Next Lecture Local alignment More elaborate scoring function Memory efficient algorithms Reading Durbin Chapter 2 Gusfield Chapter 11 Lecture 1 Tuesday April 1 2003 6


View Full Document

Stanford CS 262 - Computational Genomics

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Computational Genomics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Computational Genomics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Computational Genomics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?