DOC PREVIEW
U of I CS 466 - Greedy Algorithms

This preview shows page 1-2-3-23-24-25-26-47-48-49 out of 49 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Greedy AlgorithmsCS 466Saurabh SinhaA greedy approach to themotif finding problem• Given t sequences of length n each, to finda profile matrix of length l.• Enumerative approach O(l nt)– Impractical?• Instead consider a more practical algorithmcalled “GREEDYMOTIFSEARCH”Chapter 5.5Greedy Motif Search• Find two closest l-mers in sequences 1 and 2 and form 2 x l alignment matrix with Score(s,2,DNA)• At each of the following t-2 iterations, finds a “best” l-mer in sequence ifrom the perspective of the already constructed (i-1) x l alignmentmatrix for the first (i-1) sequences• In other words, it finds an l-mer in sequence i maximizing Score(s,i,DNA) under the assumption that the first (i-1) l-mers have been alreadychosen• Sacrifices optimal solution for speed: in fact the bulk of the time isactually spent locating the first 2 l-mersGreedy Motif Search pseudocode• GREEDYMOTIFSEARCH (DNA, t, n, l)• bestMotif := (1,…,1)• s := (1,…,1)• for s1=1 to n-l+1 for s2 = 1 to n-l+1 if (Score(s,2,DNA) > Score(bestMotif,2,DNA) bestMotif1 := s1 bestMotif2 := s2• s1 := bestMotif1; s2 := bestMotif2• for i = 3 to t for si = 1 to n-l+1 if (Score(s,i,DNA) > Score(bestMotif,i,DNA) bestMotifi := si si := bestMotifi• Return bestMotifIssues with the “score”• Score of a profile matrix looks only atthe “majority” base in each column, notat the entire distribution• The issue of non-uniform “background”frequencies of bases in the genome• A better “score” of a profile matrix ?Information Content• First convert a “profile matrix” to a “position weightmatrix” or PWM– Convert frequencies to probabilities• PWM W: Wβk = frequency of base β at position k• qβ = frequency of base β by chance• Information content of W:! W"klogW"kq""#{A,C,G,T }$k$Information Content• If Wβk is always equal to qβ, i.e., if W issimilar to random sequence, informationcontent of W is 0.• If W is different from q, informationcontent is high.Greedy Motif Search• Can be trivially modified to use “Information Content” as the score• Use statistical criteria to evaluate significance of Information Content• At each step, instead of choosing the top (1) partial motif, keep the top kpartial motifs– “Beam search”• The program “CONSENSUS” from Stormo lab.• Further Reading: Hertz, Hartzell & Stormo, CABIOS (1990)http://bioinf.kvl.dk/~gorodkin/teach/bioinf2004/hertz90.pdfA new problem:Genome RearrangementsGenome Rearrangements• Most mouse genes have human orthologs(i.e., share common evolutionary ancestor)• The order of genes in the mouse genome isnot exactly the same as in human• However, there are subsets of genes withpreserved order between human-mouse (“insynteny”)Genome Rearrangements• The mouse genome can be cut into ~300 (not necessarilyequal) pieces and joined pasted together in a differentorder (“rearranged”) to obtain the gene order of thehuman genome• Each such piece is called a “synteny block”• Synteny blocks from different chromosomes in mouse aretogether on the same chromosome in humanComparative Genomic Architectures:Mouse vs Human Genome• Humans and micehave similar genomes,but their genes areordered differently• ~245 rearrangements– Reversals– Fusions– Fissions– TranslocationA type of rearrangement:Reversal132410568971, 2, 3, 4, 5, 6, 7, 8, 9, 10132410568971, 2, 3, -8, -7, -6, -5, -4, 9, 10A type of rearrangement:Reversal132410568971, 2, 3, -8, -7, -6, -5, -4, 9, 10The reversal introduced two breakpointsBreakpointsFour types of RearrangementsReversal1 2 3 4 5 6 1 2 -5 -4 -3 6Translocation1 2 344 5 61 2 64 5 31 2 3 4 5 61 2 3 4 5 6FusionFissionTurnip vs Cabbage: Almost IdenticalmtDNA gene sequences• In 1980s Jeffrey Palmer studied evolution of plantorganelles by comparing mitochondrial genomes ofthe cabbage and turnip• 99% similarity between genes• These surprisingly identical gene sequences differedin gene order• This study helped pave the way to analyzing genomerearrangements in molecular evolutionTransforming Cabbage intoTurnipGenome Rearrangement• Consider reversals only.– These are most common. “… genome rearrangementsis a common mode of molecular evolution inmitochondrial, chloroplast, viral and bacterial DNA”[Hannenhalli & Pevzner 1999].• How to transform one genome (i.e., geneordering) to another, using the least number ofreversals ?Reversals: Example π = 1 2 3 4 5 6 7 8 ρ(3,5) 1 2 5 4 3 6 7 8Reversals: Example π = 1 2 3 4 5 6 7 8 ρ(3,5) 1 2 5 4 3 6 7 8ρ(5,6)1 2 5 4 6 3 7 8Reversals and Gene Orders• Gene order is represented by a permutation π:π = π 1 ------ π i-1 π i π i+1 ------ π j-1 π j π j+1 ----- π n π 1 ------ π i-1 π j π j-1 ------ π i+1 π i π j+1 ----- πn Reversal ρ ( i, j ) reverses (flips) theelements from i to j in πρ(ι,j)Reversal Distance Problem• Goal: Given two permutations, find the shortest series ofreversals that transforms one into another• Input: Permutations π and σ• Output: A series of reversals ρ1,…ρt transforming π into σ, suchthat t is minimum• t - reversal distance between π and σDifferent way of stating the sameproblem: Sorting By Reversals• Goal: Given a permutation, find a shortest seriesof reversals that transforms it into the identitypermutation (1 2 … n )• Input: Permutation π• Output: A series of reversals ρ1, … ρttransforming π into the identity permutation suchthat t is minimumSorting By Reversals: A Greedy Algorithm• If sorting permutation π = 1 2 3 6 4 5, thefirst three elements are already in order soit does not make any sense to break them.• The length of the already sorted prefix of πis denoted prefix(π)– prefix(π) = 3• This results in an idea for a greedyalgorithm: increase prefix(π) at every step• Doing so, π can be sorted1 2 3 6 4 5 1 2 3 4 6 5 1 2 3 4 5 6• Number of steps to sort permutation of lengthn is at most (n – 1)Greedy&Algorithm:&An&ExampleGreedy Algorithm:PseudocodeSimpleReversalSort(π)1 for i  1 to n – 12 j  position of element i in π (i.e., πj = i)3 if j ≠i4 π  π * ρ(i, j)5 output π6 if π is the identity permutation7 returnAnalyzing SimpleReversalSort• SimpleReversalSort does not guaranteethe smallest number of reversals andtakes five steps on π = 6 1 2 3 4 5 :• Step 1: 1 6 2 3 4 5• Step 2: 1 2 6 3 4 5• Step 3: 1 2 3 6 4 5• Step 4:


View Full Document

U of I CS 466 - Greedy Algorithms

Download Greedy Algorithms
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Greedy Algorithms and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Greedy Algorithms 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?