U of I CS 498 - DNA Sequencing


Slide titles:
DNA Sequencing
Outline
The Basic Shotgun Sequencing Strategy Step 1: Fragment Sequencing
Step 1: Generating Read (Sanger Method)
Step 2: Shortest Superstring Problem
Shortest Superstring Problem: Example
A Greedy Algorithm for SSP
A Special Case of SSP
Sequencing By Hybridization
Hybridization on DNA Array
l-mer composition
Different sequences – the same spectrum
The SBH Problem
How do we solve the SBH problem efficiently?
Detour… Graph Algorithms
The Bridge Obsession Problem
Formalization of Königsberg Bridge Problem: Graph & Eulerian Cycle
Hamiltonian Cycle Problem
Balanced Graphs
Euler Theorem
Euler Theorem: Proof
Algorithm for Constructing an Eulerian Cycle
Algorithm for Constructing an Eulerian Cycle (cont’d)
Euler Theorem: Extension
End of Detour… Let’s see how graph algorithms can help DNA sequencing…
Reducing SSP to TSP (Traveling Salesman Problem)
Reducing SSP to TSP (cont’d)
SSP to TSP: An Example
SBH: Hamiltonian Path Approach
SBH: Eulerian Path Approach
Some Difficulties with SBH
The Problem of Repeats
What You Should Know

DNA Sequencing
(Lecture for CS498-CXZ Algorithms in Bioinformatics)
Sept. 8, 2005
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
Many slides are taken/adapted from http://www.bioalgorithms.info/slides.htm

Outline
• The Basic Shotgun Sequencing Strategy
• The shortest superstring problem
  – Graph algorithms
• Sequencing by Hybridization

The Basic Shotgun Sequencing Strategy
Step 1: Fragment Sequencing
• Cut the genomic segment many times at random (shotgun)
• Get one or two reads (~500 bp each) from each segment

The Basic Shotgun Sequencing Strategy
Step 2: Fragment Assembly
• Cover the region with ~7-fold redundancy
• Overlap the reads and extend them to reconstruct the original genomic region

Step 1: Generating Read (Sanger Method)
1. Start at primer (restriction site)
2. Grow DNA chain
3. Include ddNTPs
4. Stop the reaction at all possible points
5. Separate products by length, using gel electrophoresis

Step 2: Shortest Superstring Problem
• Problem: Given a set of strings, find a shortest string that contains all of them
• Input: Strings s1, s2, …, sn
• Output: A string s that contains all strings s1, s2, …, sn as substrings, such that the length of s is minimized
• Complexity: NP-complete
How likely is it that the found s is indeed the original genome? s approaches the genome as n → ∞, provided there are no sequencing errors and the fragmentation is random.

Shortest Superstring Problem: Example
How do we solve such a problem efficiently?
- Greedy algorithms (approximation)
- Efficient algorithms exist for special cases and are related to “graph algorithms”

A Greedy Algorithm for SSP
• For each pair of (segment) strings, compute an overlap score
• Merge the pair with the highest score
• Repeat until no more strings can be merged
• If multiple strings are left, any concatenation of them would be a solution
Think about an example where this algorithm is not optimal…

A Special Case of SSP
• When each segment is an L-mer (L-gram), a linear-time algorithm exists!
• This makes it attractive to do “sequencing by hybridization”…

Sequencing By Hybridization
• Attach all possible DNA probes of length l (e.g., l = 8) to a flat surface, each probe at a distinct and known location. This set of probes is called the DNA array.
• Apply a solution containing a fluorescently labeled DNA fragment to the array.
• The DNA fragment hybridizes with those probes that are complementary to substrings of length l of the fragment, allowing us to see which l-mers match the DNA fragment.

Hybridization on DNA Array

l-mer composition
• Define Spectrum(s, l) as the unordered multiset of all (n - l + 1) l-mers in a string s of length n
• The order of individual elements in Spectrum(s, l) does not matter

l-mer composition
• For example, for s = TATGGTGC all of the following are equivalent representations of Spectrum(s, 3):
  {TAT, ATG, TGG, GGT, GTG, TGC}
  {ATG, GGT, GTG, TAT, TGC, TGG}
  {TGG, TGC, TAT, GTG, GGT, ATG}
We usually choose the lexicographically maximal representation as the canonical one.

Different sequences – the same spectrum
• Different sequences may have the same spectrum:
  Spectrum(GTATCT, 2) = Spectrum(GTCTAT, 2) = {AT, CT, GT, TA, TC}

The SBH Problem
• Goal: Reconstruct a string from its l-mer composition
• Input: A set S, representing all l-mers from an (unknown) string s
• Output: String s such that Spectrum(s, l) = S and the length of s is minimum
How likely is it that the found s is indeed the original genome? s approaches the genome as l increases, provided there are no sequencing errors.

How do we solve the SBH problem efficiently?
The solution is related to graph algorithms…

Detour… Graph Algorithms

The Bridge Obsession Problem
Bridges of Königsberg
Find a tour crossing every bridge just once
Leonhard Euler, 1735

Formalization of Königsberg Bridge Problem: Graph & Eulerian Cycle
• Graph G = (V, E)
  – V = vertices; E = edges
• Eulerian cycle: A cycle that visits every edge exactly once
• A linear-time algorithm exists
More complicated Königsberg

Hamiltonian Cycle Problem
• Find a cycle that visits every vertex exactly once
• NP-complete
Game invented by Sir William Hamilton in 1857

Balanced Graphs
• A graph is balanced if for every vertex the number of incoming edges equals the number of outgoing edges: in(v) = out(v)

Euler Theorem
• Theorem: A connected graph is Eulerian if and only if each of its vertices is balanced.

Euler Theorem: Proof
• Eulerian ⇒ balanced: for every edge entering v (incoming edge) there exists an edge leaving v (outgoing edge). Therefore in(v) = out(v)
• balanced ⇒ Eulerian ???

Algorithm for Constructing an Eulerian Cycle
a. Start with an arbitrary vertex v and form an arbitrary cycle with unused edges until a dead end is reached. Since the graph is Eulerian, this dead end is necessarily the starting point, i.e., vertex v.

Algorithm for Constructing an Eulerian Cycle (cont’d)
b. If cycle from (a) above is not an Eulerian cycle, it must contain a…
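
The cycle construction begun in step (a) can be sketched in code. This is only a minimal illustration, assuming a connected, balanced directed graph stored as a dict of successor lists. The function name `eulerian_cycle` and this graph representation are hypothetical and do not come from the slides. Instead of explicitly splicing cycles together, the sketch uses the standard iterative, stack-based formulation (Hierholzer's algorithm), which walks along unused edges until a dead end is reached and then backtracks.

```python
from collections import defaultdict


def eulerian_cycle(graph):
    """Return an Eulerian cycle of a connected, balanced directed graph.

    `graph` maps each vertex to a list of its successors (a hypothetical
    representation chosen for this sketch). The result is a list of
    vertices whose first and last entries coincide.
    """
    if not graph:
        return []

    # Check the balance condition in(v) = out(v) from Euler's theorem.
    indeg = defaultdict(int)
    for succs in graph.values():
        for w in succs:
            indeg[w] += 1
    for v in set(graph) | set(indeg):
        if indeg[v] != len(graph.get(v, [])):
            raise ValueError(f"vertex {v!r} is not balanced")

    # Copy the adjacency lists so edges can be consumed as they are used.
    unused = {v: list(succs) for v, succs in graph.items()}
    start = next(iter(graph))
    stack, cycle = [start], []
    while stack:
        v = stack[-1]
        if unused.get(v):
            # Keep walking along an unused edge out of v.
            stack.append(unused[v].pop())
        else:
            # Dead end: every edge out of v is used, so v is finalised.
            cycle.append(stack.pop())
    cycle.reverse()

    # If the graph was not connected, some edges were never reached.
    n_edges = sum(len(succs) for succs in graph.values())
    if len(cycle) != n_edges + 1:
        raise ValueError("graph is not connected")
    return cycle


if __name__ == "__main__":
    example = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": ["B"]}
    print(eulerian_cycle(example))
```

Each edge is pushed and popped exactly once, so the running time is linear in the number of edges, which matches the slides' remark that a linear-time algorithm exists.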

