U of I CS 498 - Dynamic Programming


Contents: Dynamic Programming (cont’d); Previous lecture cont’d; Affine Gap Penalties; Accounting for Gaps; Affine gap penalty in DP; Affine Gap Penalty Recurrences; Reading assignment: Section 6.10 (J & P), Multiple Alignment; Gene Prediction; Gene Prediction: Computational Challenge; The Genetic Code; Codons; Great Discovery Provoking Wrong Assumption; Exons and Introns; Central Dogma and Splicing; Gene prediction; Gene prediction: broadly speaking; Statistical approaches; Splicing Signals; Consensus splice sites; Splicing and gene prediction; Open Reading Frames (ORFs); ORFs; Long vs. Short ORFs; Codon usage; Codon Usage in Human Genome; Promoter Structure in Prokaryotes (E. coli); Ribosomal Binding Site; Statistical approaches: summary; Similarity based approaches; The basic approach; Exon chaining; Exon Chaining Problem; Exon Chaining Problem: Graph Representation; Assumptions; Exon Chaining Algorithm; Not very helpful; Spliced Alignment; Spliced Alignment Problem: Formulation; The DAG; Dynamic programming; Idea; Recurrence

Dynamic Programming (cont’d)
CS 498 SS
Saurabh Sinha

Previous lecture cont’d

Affine Gap Penalties
• In nature, a series of k indels often comes as a single event rather than as a series of k single-nucleotide events.
[Figure: two alignments, labeled "This is more likely." and "This is less likely."; normal scoring would give the same score for both alignments.]

Accounting for Gaps
• Gap: a contiguous sequence of spaces in one of the rows
• Score for a gap of length x is -(ρ + σx), where
  – ρ > 0 is the penalty for introducing a gap (gap opening penalty)
  – σ is the gap extension penalty
• ρ will be large relative to σ, because you do not want to add too much of a penalty for extending the gap

Affine gap penalty in DP
• Computed directly, $s_{i,j}$ requires looking at $s_{i,j-1}, s_{i,j-2}, s_{i,j-3}, \ldots$ and $s_{i-1,j}, s_{i-2,j}, \ldots$, because a gap of any length may end at cell (i, j)
• Each cell needs $O(n)$ time for its update
• There are $O(n^2)$ cells
• Therefore, this gives an $O(n^3)$ algorithm
• We can still do this in $O(n^2)$ time

Affine Gap Penalty Recurrences
Three tables are kept: $s^{\downarrow}$ for alignments ending with a deletion (a gap in w), $s^{\rightarrow}$ for alignments ending with an insertion (a gap in v), and the middle table $s$. Opening a gap costs ρ + σ and each further space costs σ, so a gap of length x scores -(ρ + σx), as defined above.

$$s^{\downarrow}_{i,j} = \max\begin{cases} s^{\downarrow}_{i-1,j} - \sigma & \text{continue gap in } w \text{ (deletion)}\\ s_{i-1,j} - (\rho + \sigma) & \text{start gap in } w \text{ (deletion): from middle} \end{cases}$$

$$s^{\rightarrow}_{i,j} = \max\begin{cases} s^{\rightarrow}_{i,j-1} - \sigma & \text{continue gap in } v \text{ (insertion)}\\ s_{i,j-1} - (\rho + \sigma) & \text{start gap in } v \text{ (insertion): from middle} \end{cases}$$

$$s_{i,j} = \max\begin{cases} s_{i-1,j-1} + \delta(v_i, w_j) & \text{match or mismatch}\\ s^{\downarrow}_{i,j} & \text{end deletion: from top}\\ s^{\rightarrow}_{i,j} & \text{end insertion: from bottom} \end{cases}$$

Reading assignment: Section 6.10 (J & P), Multiple Alignment

Gene Prediction
• Gene: a sequence of nucleotides coding for a protein
• Gene Prediction Problem: determine the beginning and end positions of genes in a genome

Gene Prediction: Computational Challenge

The Genetic Code (source: http://www.bioscience.org/atlases/genecode/genecode.htm)
• In 1961, Sydney Brenner and Francis Crick discovered frameshift mutations
• They systematically deleted nucleotides from DNA
  – Single and double deletions dramatically altered the protein product
  – Effects of triple deletions were minor
  – Conclusion: every triplet of nucleotides, each codon, codes for exactly one amino acid in a protein

Codons
• In 1964, Charles Yanofsky and Sydney Brenner proved collinearity in the order of codons with respect to amino acids in proteins
• As a result, it was incorrectly assumed that the triplets encoding amino acid sequences form contiguous strips of information.
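Returning to the Affine Gap Penalty Recurrences from the Dynamic Programming part of the lecture, the three tables can be filled in as sketched below. This is a minimal sketch under stated assumptions, not the course's code: it computes a global alignment score only (no traceback), uses a hypothetical +1/-1 match/mismatch scoring for δ, and the names `affine_global_score`, `down`, `right`, and `mid` are our own. Each cell is filled in constant time, which is how the recurrences reach O(n²) (O(nm) for sequences of lengths n and m).

```python
NEG_INF = float("-inf")


def affine_global_score(v: str, w: str, rho: float, sigma: float,
                        match: float = 1.0, mismatch: float = -1.0) -> float:
    """Global alignment score where a gap of length x costs rho + sigma * x.

    Hypothetical helper. Tables follow the recurrences:
      down[i][j]  - best score for v[:i] vs w[:j] ending in a deletion (gap in w)
      right[i][j] - best score ending in an insertion (gap in v)
      mid[i][j]   - best score overall (the "middle" table)
    """
    n, m = len(v), len(w)
    down = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    right = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    mid = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    mid[0][0] = 0.0

    # Boundaries: a leading gap of length x costs rho + sigma * x.
    for i in range(1, n + 1):
        down[i][0] = -(rho + sigma * i)
        mid[i][0] = down[i][0]
    for j in range(1, m + 1):
        right[0][j] = -(rho + sigma * j)
        mid[0][j] = right[0][j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Continue a gap in w, or start one from the middle table.
            down[i][j] = max(down[i - 1][j] - sigma, mid[i - 1][j] - (rho + sigma))
            # Continue a gap in v, or start one from the middle table.
            right[i][j] = max(right[i][j - 1] - sigma, mid[i][j - 1] - (rho + sigma))
            # Match/mismatch, or end a deletion/insertion.
            delta = match if v[i - 1] == w[j - 1] else mismatch
            mid[i][j] = max(mid[i - 1][j - 1] + delta, down[i][j], right[i][j])
    return mid[n][m]
```

For example, with ρ = 2 and σ = 1, `affine_global_score("AAGGTT", "AATT", 2, 1)` returns 0.0: four matches (+4) and one gap of length 2 (-(2 + 2·1) = -4). Two separate length-1 gaps would instead cost 2 × (2 + 1) = 6, which is the preference for single gap events the slides motivate.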
Great Discovery Provoking Wrong Assumption

Exons and Introns
• In eukaryotes, the gene is a combination of coding segments (exons) that are interrupted by non-coding segments (introns)
• This makes computational gene prediction in eukaryotes even more difficult
• Prokaryotes don’t have introns: genes in prokaryotes are continuous

Central Dogma and Splicing
[Figure: a gene laid out as exon1, intron1, exon2, intron2, exon3, going through transcription, splicing, and translation; exon = coding, intron = non-coding. (Batzoglou)]

Gene prediction
• More difficult in eukaryotes than in prokaryotes (due to introns)
• In the human genome, ~3% of the DNA sequence is genes
• There is a lot of “junk” DNA between genes, and even inside genes (between exons)
• Gene prediction must deal with this

Gene prediction: broadly speaking
• Statistical approaches: look for features that appear frequently in genes and infrequently elsewhere
• Similarity-based approaches: a newly sequenced gene may be similar to a known gene
  – Even this is not so simple: the exon structures may differ between otherwise similar genes

Statistical approaches

Splicing Signals
• Exons are interspersed with introns; introns typically begin with GT and end with AG

Splice site detection: donor site (5’ → 3’), percentage of each base by position
Position   -8   …   -2   -1    0    1    2   …   17
A          26   …   60    9    0    1   54   …   21
C          26   …   15    5    0    1    2   …   27
G          25   …   12   78   99    0   41   …   27
T          23   …   13    8    1   98    3   …   25
From lectures by Serafim Batzoglou (Stanford)

Consensus splice sites
• Donor: 7.9 bits
• Acceptor: 9.4 bits

Splicing and gene prediction
• Using splice sites (profiles) to predict genes?
• Limited scope, too many false predictions
• Detect potential coding regions by looking at ORFs
  – A region of length n consists of n/3 codons
  – Stop codons break the genome into segments between consecutive stop codons
  – The subsegments of these that start from the start codon (ATG) are ORFs

Open Reading Frames (ORFs)
[Figure: genomic sequence with an open reading frame running from ATG to TGA]

ORFs
• 6 reading frames in any given sequence
  – 6 ways to map the DNA sequence to a codon sequence (+1, +2, +3, -1, -2, -3)
  – 3 on either strand
• Look at all 6 reading frames for ORFs
• Long open reading frames may be genes
  – At random, we should expect one stop codon every 64/3 ≈ 21 codons, since 3 of the 64 codons are stops
  – However, genes are usually much longer than this
• A basic approach is to scan for ORFs whose length exceeds a certain threshold
  – This is naïve because some genes (e.g., some neural and immune system genes) are relatively short

Long vs. Short ORFs

Codon usage
• In a given sequence (e.g., an ORF), compute the frequency distribution of codons (a 64-element array): the codon usage array
• The codon usage array for coding sequences is different from that for non-coding sequences
• If the codon usage array for an ORF is much more similar to that of coding sequences than to that of non-coding sequences, the ORF could be a gene

Codon usage
• Codons coding for “Arg” in human:
  – CGU: 37%, CGC: 38%, CGA: 7%, CGG: 10%, AGA: 5%, AGG: 3%
  – In a coding sequence, codon CGC is about 12 times more likely than codon AGG (38% vs. 3%)
  – An ORF preferring CGC over AGG is likely to be a gene

Codon Usage in Human Genome

Codon usage
• One way to test whether an ORF is a gene is to compute
  – Pr(ORF sequence under a coding sequence model)
  – Pr(ORF sequence under a non-coding model)
  – The ratio of the two
• These methods work best in prokaryotes
• The
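The donor-site table above can be turned into a position weight matrix and slid along a sequence, which is one way to use splice-site profiles for prediction. This is an illustrative sketch under stated assumptions, not the lecture's method: it uses only the five columns printed in full (positions -2 to 2), adds a hypothetical pseudocount of 0.5 so the 0% entries do not produce log(0), and scores each base as log2(p / 0.25) against a uniform background. `DONOR_FREQ`, `log_odds_matrix`, `score_window`, and `scan_donor_sites` are hypothetical names.

```python
import math

# Base frequencies (%) at donor-site positions -2, -1, 0, 1, 2,
# copied from the fully printed columns of the table.
DONOR_FREQ = {
    "A": [60, 9, 0, 1, 54],
    "C": [15, 5, 0, 1, 2],
    "G": [12, 78, 99, 0, 41],
    "T": [13, 8, 1, 98, 3],
}


def log_odds_matrix(freq, pseudocount=0.5, background=0.25):
    """Convert percentage columns into log2(p / background) weights."""
    width = len(next(iter(freq.values())))
    matrix = {base: [] for base in freq}
    for k in range(width):
        total = sum(freq[base][k] + pseudocount for base in freq)
        for base in freq:
            p = (freq[base][k] + pseudocount) / total
            matrix[base].append(math.log2(p / background))
    return matrix


def score_window(window, matrix):
    """Sum the per-position weights of an uppercase ACGT window."""
    return sum(matrix[base][k] for k, base in enumerate(window))


def scan_donor_sites(seq, matrix, threshold=0.0):
    """Return (offset, score) for windows scoring above threshold.

    The offset is the index of position -2, so the putative GT sits at
    offset + 2 and offset + 3.
    """
    width = len(matrix["A"])
    hits = []
    for start in range(len(seq) - width + 1):
        score = score_window(seq[start:start + width], matrix)
        if score > threshold:
            hits.append((start, score))
    return hits
```

A window such as AGGTA, which takes the most frequent base in every column, scores strongly positive, while TCACG scores far below zero. Because every window above the threshold is reported, a scan like this tends to return many candidates on real sequence, which is the "too many false predictions" limitation the slides mention.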

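The six-frame ORF scan described in the ORFs slides can be sketched as follows. Assumptions: the input is a DNA string over A, C, G, T; an ORF runs from the first ATG after a stop codon through the next in-frame stop (TAA, TAG, TGA), following the "subsegments that start from the start codon" definition; ORFs that run off the end of the sequence are ignored; and the default threshold of 100 codons is a hypothetical choice. Since random sequence hits a stop roughly every 64/3 ≈ 21 codons, a threshold well above that filters out chance ORFs, at the cost of missing short genes. `find_orfs` and its helpers are hypothetical names.

```python
# Stop codons of the standard genetic code, and the usual start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, offset: int, min_codons: int):
    """Yield (start, end) for ORFs in one frame of seq.

    An ORF runs from the first ATG after a stop codon (or after the frame
    start) through the next in-frame stop codon, inclusive. Its length in
    codons counts the stop codon. ORFs without a stop are not reported.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            start = i
        elif start is not None and codon in STOP_CODONS:
            if (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def find_orfs(seq: str, min_codons: int = 100):
    """Scan all six reading frames; return (strand, frame, start, end) tuples.

    Frames 1-3 on "+" start at offsets 0-2 of seq; frames 1-3 on "-" use
    offsets 0-2 of the reverse complement, and their coordinates refer to
    the reverse-complement string.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset, min_codons):
                hits.append((strand, offset + 1, start, end))
    return hits
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=4)` returns `[("+", 3, 2, 14)]`, the ORF ATG AAA TTT TAG in frame +3; raising `min_codons` to 5 drops it.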

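The coding vs. non-coding likelihood ratio from the last Codon usage slide can be sketched like this, under a simplifying assumption the slides do not state: codons are treated as independent draws from a 64-element codon usage array, so Pr(ORF | model) is a product of per-codon frequencies and the log of the ratio becomes a sum. The pseudocount of 1 and the function names `codon_usage` and `log_likelihood_ratio` are hypothetical; in practice the two arrays would be estimated from known coding and non-coding sequence.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(seq: str, pseudocount: float = 1.0) -> dict:
    """64-element codon usage array (as frequencies) from an in-frame sequence."""
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    total = sum(counts[c] for c in CODONS) + 64 * pseudocount
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def log_likelihood_ratio(orf: str, coding: dict, noncoding: dict) -> float:
    """log2 Pr(ORF | coding) - log2 Pr(ORF | non-coding), codons independent.

    Assumes an uppercase ACGT sequence read in frame from its first base.
    Positive values favor the coding model.
    """
    score = 0.0
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        score += math.log2(coding[codon] / noncoding[codon])
    return score
```

As a toy usage example with made-up training data, take `coding = codon_usage("CGC" * 20 + "ATG" * 5)` and a uniform non-coding model `{c: 1 / 64 for c in CODONS}`: the ORF CGCCGCCGC then gets a positive score and AGGAGGAGG a negative one, mirroring the CGC-versus-AGG example above.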