UCSD CSE 182 - Lecture

This preview shows page 1-2-3-21-22-23-43-44-45 out of 45 pages.

Slide titles:
Whole Genome Assembly Microarray analysis
Mate Pairs
Super-contigs are quite large
Problem 3: Repeats
Repeats & Chimerisms
Repeat detection
Detecting Repeat Contigs 1: Read Density
Detecting Chimeric reads
Contig assembly
Creating Super Contigs
Supercontig assembly
Supercontig merging
Repeat Supercontigs
Filling gaps in Supercontigs
Consensus Derivation & Assembly
Summary
Biol. Data analysis: Review
Other static analysis is possible
A Static picture of the cell is insufficient
Micro-array analysis
The Biological Problem
PowerPoint Presentation
Gene Expression Data
Three types of analysis problems
Supervised Classification: Basics
Basics
Non-linear separability
Formalizing of the classification problem for micro-arrays
Formalizing Classification
Basic geometry
Dot Product
Hyperplane
Points on the hyperplane
Hyperplane properties
Separating by a hyperplane
Error in classification
Gradient Descent
Rosenblatt’s perceptron learning algorithm
Classification based on perceptron learning
Perceptron learning
Linear Discriminant analysis
LDA Cont’d
Maximum Likelihood discrimination
ML discrimination
ML discrimination recipe

Whole Genome Assembly
Microarray analysis

Mate Pairs
• Mate-pairs allow you to merge islands (contigs) into super-contigs.

Super-contigs are quite large
• Make clones of truly predictable length. EX: 3 sets can be used: 2Kb, 10Kb and 50Kb. The variance in these lengths should be small.
• Use the mate-pairs to order and orient the contigs, and make super-contigs.

Problem 3: Repeats

Repeats & Chimerisms
• 40-50% of the human genome is made up of repetitive elements.
• Repeats can cause great problems in the assembly!
• Chimerism causes a clone to be from two different parts of the genome. This can again give a completely wrong assembly.

Repeat detection
• Lander-Waterman strikes again!
• The expected number of clones in a repeat-containing island is MUCH larger than in a non-repeat-containing island (contig).
• Thus, every contig can be marked as unique or non-unique. In the first step, throw away the non-unique islands.
[Figure label: Repeat]

Detecting Repeat Contigs 1: Read Density
• Compute the log-odds ratio of two hypotheses:
  – H1: The contig is from a unique region of the genome.
  – H2: The contig is from a region that is repeated at least twice.

Detecting Chimeric reads
• Chimeric reads: reads that contain sequence from two genomic locations.
• Good overlaps: G(a,b) if a and b overlap with a high score.
• Transitive overlap: T(a,c) if G(a,b) and G(b,c).
• Find a point x across which only transitive overlaps occur. x is a point of chimerism.

Contig assembly
• Reads are merged into contigs up to repeat boundaries.
• If (a,b) and (a,c) overlap, then (b,c) should overlap as well. Also,
  – shift(a,c) = shift(a,b) + shift(b,c)
• Most of the contigs are unique pieces of the genome, and end at some repeat boundary.
• Some contigs might be entirely within repeats. These must be detected.

Creating Super Contigs

Supercontig assembly
• Supercontigs are built incrementally.
• Initially, each contig is a supercontig.
• In each round, a pair of supercontigs is merged, until no more merges can be performed.
• Create a priority queue with a score for every pair of ‘mergeable supercontigs’.
  – The score has two terms:
    • A reward for multiple mate-pair links.
    • A penalty for distance between the links.

Supercontig merging
• Remove the top-scoring pair (S1,S2) from the priority queue Q.
• Merge (S1,S2) to form supercontig T.
• Remove all pairs in Q containing S1 or S2.
• Find all supercontigs W that share mate-pair links with T and insert (T,W) into the priority queue.
• Detect repeated supercontigs and remove them.

Repeat Supercontigs
• If the distance between two supercontigs is not correct, they are marked as repeated.
• If transitivity is not maintained, then there is a repeat.

Filling gaps in Supercontigs

Consensus Derivation & Assembly
• Summary
  – Do an “all pairs” prefix-suffix alignment (speedup using k-mer hashing).
  – Construct a graph of overlapping alignments.
  – Break the graph into “unique” regions (number of clones similar to the prediction using LW) and “repeat/chimeric” regions. Each such “unique” region is called a contig.
  – Merge contigs into super-contigs using mate-pair links.
  – For each contig, construct a multiple alignment and a consensus sequence.
  – Pad the consensus sequence using NNs.

Summary
• Once controversial, whole genome shotgun is now routine:
  – Human, Mouse, Rat, Dog, Chimpanzee...
  – Many prokaryotes (one can be sequenced in a day)
  – Plant genomes: Arabidopsis, Rice
  – Model organisms: Worm, Fly, Yeast
• WGS must be followed up with a finishing effort.
• A lot is not known about genome structure, organization and function.
  – Comparative genomics offers low-hanging fruit.

Biol. Data analysis: Review
[Diagram labels: Protein Sequence Analysis; Sequence Analysis/DNA signals; Gene Finding; Assembly]

Other static analysis is possible
[Diagram labels: Protein Sequence Analysis; Sequence Analysis; Gene Finding; Assembly; ncRNA; Genomic Analysis/Pop. Genetics]

A Static picture of the cell is insufficient
• Each cell is continuously active:
  – Genes are being transcribed into RNA.
  – RNA is translated into proteins.
  – Proteins are post-translationally (PT) modified and transported.
  – Proteins perform various cellular functions.
• Can we probe the cell dynamically?
  – Which transcripts are active?
  – Which proteins are active?
  – Which proteins interact?
[Diagram labels: Gene Regulation; Proteomic profiling; Transcript profiling]

Micro-array analysis

The Biological Problem
• Two conditions that need to be differentiated (they have different treatments).
• EX: ALL (Acute Lymphocytic Leukemia) & AML (Acute Myelogenous Leukemia)
• Possibly, the sets of over-expressed genes are different in the two conditions.

Supplementary fig. 2. Expression levels of predictive genes in independent dataset. The expression levels of the 50 genes most highly correlated with the ALL-AML distinction in the initial dataset were determined in the independent dataset. Each row corresponds to a gene, with the columns corresponding to expression levels in different samples. The expression level of each gene in the independent dataset is shown relative to the mean of expression levels for that gene in the initial dataset. Expression levels greater than the mean are shaded in red, and those below the mean are shaded in blue. The scale indicates standard deviations above or below the mean. The top panel shows genes highly expressed in ALL, the bottom panel shows genes more highly expressed in AML.

Gene Expression Data
• Gene Expression data:
  – Each row corresponds to a gene
  – Each column corresponds to an expression value
• Can we
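
The supercontig merging loop from the assembly slides (a priority queue of mergeable pairs, merge the top-scoring pair, then re-score the merged supercontig against its neighbours) might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the lecture's implementation: the names `pair_score` and `merge_supercontigs`, the `min_links` threshold, the score weights, and the `(contig_a, contig_b, gap_estimate)` input format are all hypothetical. The sketch ignores ordering, orientation and repeat detection, and it skips stale queue entries when they are popped instead of removing them from Q eagerly.

```python
import heapq
from itertools import count


def pair_score(gaps, link_reward=1.0, distance_penalty=0.001):
    """Hypothetical score: reward each mate-pair link, penalize the mean gap."""
    mean_gap = sum(abs(g) for g in gaps) / len(gaps)
    return link_reward * len(gaps) - distance_penalty * mean_gap


def merge_supercontigs(contigs, mate_links, min_links=2):
    """Greedily merge supercontigs joined by at least `min_links` mate pairs.

    contigs    -- iterable of contig ids
    mate_links -- iterable of (contig_a, contig_b, gap_estimate) tuples
    Returns a set of frozensets, one per final supercontig.
    """
    # Initially, each contig is its own supercontig.
    alive = {frozenset([c]) for c in contigs}
    # links[S][W] holds the gap estimates of mate pairs joining S and W.
    links = {s: {} for s in alive}
    for a, b, gap in mate_links:
        sa, sb = frozenset([a]), frozenset([b])
        if sa != sb:
            links[sa].setdefault(sb, []).append(gap)
            links[sb].setdefault(sa, []).append(gap)

    heap, tiebreak = [], count()

    def push(s, w):
        gaps = links[s][w]
        if len(gaps) >= min_links:  # only "mergeable" pairs enter the queue
            # heapq is a min-heap, so store the negated score.
            heapq.heappush(heap, (-pair_score(gaps), next(tiebreak), s, w))

    seen = set()
    for s in alive:
        for w in links[s]:
            if (w, s) not in seen:
                seen.add((s, w))
                push(s, w)

    while heap:
        _, _, s1, s2 = heapq.heappop(heap)
        if s1 not in alive or s2 not in alive:
            continue  # stale pair: one side was already merged away
        t = s1 | s2
        alive -= {s1, s2}
        alive.add(t)
        # Pool the links of S1 and S2 to every other supercontig W.
        merged = {}
        for old in (s1, s2):
            for w, gaps in links.pop(old).items():
                if w in (s1, s2):
                    continue
                merged.setdefault(w, []).extend(gaps)
                links[w].pop(old, None)
        links[t] = merged
        for w, gaps in merged.items():
            links[w][t] = gaps
            push(t, w)  # insert (T, W) into the priority queue
    return alive
```

For example, given contigs c1 to c4 with two links between c1 and c2, one link each from c1 and c2 to c3, and a single link from c3 to c4, the sketch first merges c1 and c2; the pooled links then make the merged supercontig and c3 mergeable, while c4 stays separate because it is supported by only one link.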

