Computational Biology, Part 24
Clustering and Unmixing of Subcellular Patterns
Robert F. Murphy
Copyright 1996, 1999, 2000-2009. All rights reserved.

Unsupervised Learning to Identify High-Resolution Protein Patterns

Location Proteomics
- Tag many proteins
  - cDNA tagging
    - Put individual cDNAs into a GFP tagging vector (puts GFP coding at end)
    - Transfect individual clones with each tagged cDNA
  - CD-tagging (developed by Jonathan Jarvik and Peter Berget):
    - Infect a population of cells with a retrovirus carrying a DNA sequence that will "tag" a random gene in each cell
- Isolate separate clones, each of which expresses one tagged protein
- Use RT-PCR to identify the tagged gene in each clone
- Collect many live cell images for each clone using spinning disk confocal fluorescence microscopy or automated high-throughput microscopy

[Figure: Images of CD-tagged 3T3 cells]

- SLF features can be used to measure similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping the two proteins whose patterns are most similar, then keep adding branches for less and less similar patterns
Chen et al 2003; Chen and Murphy 2005

[Figure: Subcellular Location Tree. Labels: Protein name; Human description; From databases. http://murphylab.web.cmu.edu/services/PSLID/tree.html]

[Figure: Subcellular Location Tree, detail. Groups: Nucleolar Proteins; Punctate Nuclear Proteins; Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining; Nuclear and Cytoplasmic Proteins with Some Punctate Staining; Uniform. Bottom: Visual Assignment to "known" locations. Top: Automated Grouping and Assignment. Label: Protein name. http://murphylab.web.cmu.edu/services/PSLID/tree.html]

Decomposing (unmixing) complex patterns

Decomposing mixture patterns
- Clustering or classifying whole cell patterns will consider each combination of two or more "basic" patterns as a unique new pattern
- Desirable to have a way to decompose mixtures instead
- One approach would be to assume that each basic pattern has a recognizable combination of different types of objects

Object type determination
- Rather than specifying object types, we can choose to learn them from the data
- Use a subset of SLFs to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate goodness of clustering using the Akaike Information Criterion
- Choose the k that gives the lowest AIC

Cluster Number Selection
- Akaike Information Criterion (AIC) = 2k - 2 ln(L)
- k = number of clusters
- L = likelihood of model given data

Example of Object Types
[Figure: example objects of Type A, Type B, Type C, Type D]

Unmixing: Learning strategy
- Once object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence for each object type
- Learn a probability model for these vectors for each class
- Mixed images can then be represented using mixture fractions times the probability distribution of objects for each class

[Figure: bar charts of amount of fluorescence (Amt fluor.) by object type (1 to 8) for the Nuclear, Lysosomal, and Golgi classes. Panels: Pure Golgi Pattern; Pure Lysosomal Pattern; 50% mix of each]

Two-stage Strategy for unmixing unknown image
- Find objects in the unknown (test) image, then classify each object into one of the object types using the learned object type classifier built with all objects from the training images
- For each test image,
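
The cluster-number selection described under "Object type determination" and "Cluster Number Selection" can be sketched roughly as follows. This is a minimal illustration, not the method used to produce the slides. It scores each k with AIC = 2k - 2 ln(L) as stated on the slide. The slides do not say which likelihood model was used, so the likelihood here (hard assignments, spherical Gaussian clusters with one shared variance, cluster proportions as mixing weights) is an assumption. The function names, the farthest-point initialization, and the toy two-feature "objects" are all hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def clustering_aic(points, labels, centers):
    """AIC = 2k - 2 ln(L), with k = number of clusters as on the slide.

    ASSUMPTION: L is a hard-assignment likelihood with spherical Gaussian
    clusters, one shared variance, and cluster proportions as weights.
    """
    n, d, k = len(points), len(points[0]), len(centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    var = max(sse / (n * d), 1e-12)  # maximum-likelihood shared variance
    counts = [labels.count(j) for j in range(k)]
    log_mix = sum(c * math.log(c / n) for c in counts if c > 0)
    log_lik = log_mix - 0.5 * n * d * (math.log(2 * math.pi * var) + 1.0)
    return 2 * k - 2 * log_lik


def select_k(points, k_values):
    """Run k-means for each k and return the k with the lowest AIC."""
    scores = {}
    for k in k_values:
        labels, centers = kmeans(points, k)
        scores[k] = clustering_aic(points, labels, centers)
    return min(scores, key=scores.get), scores


# Toy data (hypothetical): three well-separated groups of "objects",
# each described by two made-up features.
rng = random.Random(0)
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blobs for _ in range(100)]

best_k, scores = select_k(points, range(2, 7))
print(best_k, {k: round(v, 1) for k, v in scores.items()})
```

For the range on the slide, the call would be `select_k(points, range(2, 41))`. Which k wins depends strongly on the assumed likelihood model, so the selected value should be treated as a property of the model as well as of the data.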
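
The idea under "Unmixing: Learning strategy", that a mixed image is represented by mixture fractions times each class's distribution over object types, can be illustrated with a small sketch. Assume each pure class is summarized only by its mean fraction of fluorescence per object type. That is a simplification of the per-class probability model mentioned on the slide. The fractions are then estimated by expectation-maximization for a multinomial mixture, which is one possible estimator and not necessarily the one used in the lecture. The profile values, class names, and the function `unmix` are hypothetical.

```python
def unmix(observed, class_profiles, n_iter=200):
    """Estimate mixture fractions for one cell by EM.

    observed: amount of fluorescence per object type for the test cell.
    class_profiles: dict mapping class name -> fraction of fluorescence per
        object type for that pure class (each list sums to 1).
    Returns a dict mapping class name -> estimated mixture fraction.
    """
    total = sum(observed)
    y = [v / total for v in observed]  # normalize to fractions
    names = list(class_profiles)
    alpha = {c: 1.0 / len(names) for c in names}  # start from equal fractions
    for _ in range(n_iter):
        new_alpha = dict.fromkeys(names, 0.0)
        for t, y_t in enumerate(y):
            # Fluorescence the current fractions predict for object type t.
            mix_t = sum(alpha[c] * class_profiles[c][t] for c in names)
            if y_t > 0 and mix_t > 0:
                for c in names:
                    # Share of type-t fluorescence attributed to class c.
                    new_alpha[c] += y_t * alpha[c] * class_profiles[c][t] / mix_t
        norm = sum(new_alpha.values())
        alpha = {c: v / norm for c, v in new_alpha.items()}
    return alpha


# Hypothetical pure-class profiles over 8 object types (illustrative values,
# not read from the slide's bar charts).
profiles = {
    "nuclear":   [0.5, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    "lysosomal": [0.0, 0.1, 0.4, 0.3, 0.2, 0.0, 0.0, 0.0],
    "golgi":     [0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.2, 0.1],
}

# Synthetic test cell: a 50% mix of the lysosomal and Golgi profiles.
mixed = [0.5 * l + 0.5 * g
         for l, g in zip(profiles["lysosomal"], profiles["golgi"])]

fractions = unmix(mixed, profiles)
print({c: round(f, 3) for c, f in fractions.items()})
```

The multiplicative EM update keeps every fraction non-negative and the fractions summing to one, which avoids a separate constrained solver. A least-squares fit with the same constraints would be a reasonable alternative.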