CMU BSC 03510 - Clustering and Unmixing of Subcellular Patterns - D472856

School name Carnegie Mellon University

Course BSC 03510

Pages 25

Slide titles:
- Computational Biology, Part 24: Clustering and Unmixing of Subcellular Patterns
- Unsupervised Learning to Identify High-Resolution Protein Patterns
- Location Proteomics
- Slides 4-12
- Decomposing (unmixing) complex patterns
- Decomposing mixture patterns
- Object type determination
- Cluster Number Selection
- Example of Object Types
- Unmixing: Learning strategy
- Slide 19
- Two-stage Strategy for unmixing unknown image
- Test samples
- Slides 22-24
- Pattern unmixing results

Computational Biology, Part 24
Clustering and Unmixing of Subcellular Patterns
Robert F. Murphy
Copyright © 1996, 1999, 2000-2009. All rights reserved.

Unsupervised Learning to Identify High-Resolution Protein Patterns

Location Proteomics
- Tag many proteins
  - cDNA tagging
    - Put individual cDNAs into a GFP tagging vector (puts GFP coding at end)
    - Transfect individual clones with each tagged cDNA
  - CD-tagging (developed by Jonathan Jarvik and Peter Berget):
    - Infect a population of cells with a retrovirus carrying a DNA sequence that will "tag" a random gene in each cell
    - Isolate separate clones, each of which expresses one tagged protein
    - Use RT-PCR to identify the tagged gene in each clone
- Collect many live cell images for each clone using spinning disk confocal fluorescence microscopy or automated high-throughput microscopy

[Figure: Images of CD-tagged 3T3 cells]

- SLF features can be used to measure similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping the two proteins whose patterns are most similar, then keep adding branches for less and less similar patterns
Chen et al 2003; Chen and Murphy 2005

[Figure: Subcellular Location Tree. Labels: protein name, human description, from databases. http://murphylab.web.cmu.edu/services/PSLID/tree.html]

[Figure: Subcellular Location Tree. Top: automated grouping and assignment. Bottom: visual assignment to "known" locations. Labels: protein name; Nucleolar Proteins; Punctate Nuclear Proteins; Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining; Nuclear and Cytoplasmic Proteins with Some Punctate Staining; Uniform. http://murphylab.web.cmu.edu/services/PSLID/tree.html]

Decomposing (unmixing) complex patterns

Decomposing mixture patterns
- Clustering or classifying whole cell patterns will consider each combination of two or more "basic" patterns as a unique new pattern
- Desirable to have a way to decompose mixtures instead
- One approach would be to assume that each basic pattern has a recognizable combination of different types of objects

Object type determination
- Rather than specifying object types, we can choose to learn them from the data
- Use a subset of SLFs to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate goodness of clustering using the Akaike Information Criterion
- Choose the k that gives the lowest AIC

Cluster Number Selection
- Akaike Information Criterion (AIC) = 2k - 2 ln(L)
- k = number of clusters
- L = likelihood of model given data

Example of Object Types
[Figure: example objects of Type A, Type B, Type C, Type D]

Unmixing: Learning strategy
- Once object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence for each object type
- Learn a probability model for these vectors for each class
- Mixed images can then be represented using mixture fractions times the probability distribution of objects for each class

[Figure: bar charts of amount of fluorescence (Amt fluor.) versus object type (1-8) for the nuclear, lysosomal, and Golgi classes. Panels: Pure Golgi Pattern; Pure Lysosomal Pattern; 50% mix of each]

Two-stage Strategy for unmixing unknown image
- Find objects in the unknown (test) image, classify each object into one of the object types using the learned object type classifier built with all objects from training images
- For each test image,

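The object type determination steps above (k-means over a range of k, then picking the k with the lowest AIC = 2k - 2 ln(L)) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the slides do not say which likelihood model was used, so the sketch assumes spherical Gaussian clusters with one shared variance; the function names and the synthetic two-feature "objects" are hypothetical stand-ins for real SLF-based object features. Under this hard-assignment likelihood the criterion may favor a larger k than a full mixture-model likelihood would, so the selected value should not be read as reproducing the original results.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, residual sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    rss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, rss


def log_likelihood(rss, n, d):
    """ASSUMPTION (not from the slides): spherical Gaussian clusters with one
    shared variance, evaluated at its maximum-likelihood value rss / (n * d)."""
    var = max(rss / (n * d), 1e-12)  # guard against log(0)
    return -0.5 * n * d * (math.log(2 * math.pi * var) + 1)


def aic(k, log_l):
    """AIC = 2k - 2 ln(L), with k the number of clusters as on the slide."""
    return 2 * k - 2 * log_l


def select_k(points, k_values):
    """Return (best_k, {k: AIC}) where best_k has the lowest AIC."""
    n, d = len(points), len(points[0])
    scores = {}
    for k in k_values:
        _, rss = kmeans(points, k)
        scores[k] = aic(k, log_likelihood(rss, n, d))
    return min(scores, key=scores.get), scores


# Hypothetical data: 60 synthetic "objects" with two features each.
rng = random.Random(1)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
objects = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
           for cx, cy in centers for _ in range(20)]
best_k, scores = select_k(objects, range(2, 7))
print(best_k, scores)
```

The slides scan k from 2 to 40; the demo uses a much smaller range only to keep the example quick to run.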
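The learning strategy described above (represent each pure-class cell as a vector of fluorescence per object type, then express a mixed image as mixture fractions times the per-class object distributions) can be illustrated with a small sketch. Several things here are assumptions rather than the method from the slides: the per-class "probability model" is reduced to the mean fraction of fluorescence in each object type, the mixture fractions are fitted with an expectation-maximization update for mixing weights, and the class names, four object types, and training numbers are all made up for the demonstration.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def learn_class_profiles(training):
    """training maps class name -> list of per-cell vectors, each giving the
    amount of fluorescence in every object type. Returns, per class, the mean
    fraction of fluorescence per object type (a simplified probability model)."""
    profiles = {}
    for name, cells in training.items():
        fractions = [normalize(cell) for cell in cells]
        profiles[name] = [sum(col) / len(fractions) for col in zip(*fractions)]
    return profiles


def unmix(cell, profiles, iters=200):
    """Estimate mixture fractions alpha so that
    normalize(cell)[t] is approximately sum_c alpha[c] * profiles[c][t].
    ASSUMPTION: fitted by EM for mixing weights with fixed class profiles."""
    names = list(profiles)
    observed = normalize(cell)
    alpha = {name: 1.0 / len(names) for name in names}  # uniform start
    for _ in range(iters):
        updated = {name: 0.0 for name in names}
        for t, weight in enumerate(observed):
            mix = sum(alpha[name] * profiles[name][t] for name in names)
            if mix <= 0:
                continue  # no class explains this object type
            # Share this object type's fluorescence among the classes.
            for name in names:
                updated[name] += weight * alpha[name] * profiles[name][t] / mix
        total = sum(updated.values())
        alpha = {name: value / total for name, value in updated.items()}
    return alpha


# Hypothetical training set: two pure classes, four object types.
training = {
    "golgi": [[8, 1, 1, 0], [7, 2, 1, 0]],
    "lysosomal": [[0, 1, 2, 7], [0, 1, 3, 6]],
}
profiles = learn_class_profiles(training)

# Synthetic test cell built as a 30% Golgi / 70% lysosomal mixture.
mixed_cell = [0.3 * g + 0.7 * ly
              for g, ly in zip(profiles["golgi"], profiles["lysosomal"])]
estimate = unmix(mixed_cell, profiles)
print(estimate)
```

Because the synthetic cell is an exact mixture of the two learned profiles, the estimated fractions should come back close to 0.3 and 0.7; real images would add noise and would need the object-finding and object-classification stage first.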