Inferring Cellular Processes from Coexpressing Genes
Daniel Korenblum
November 26, 2001

Motivation for Clustering
● High throughput experiments
● Reduce complexity by coarse graining:
    ● Extract essential features
● Visualize data matrix entries with efficient display
● Obtain similarities that reflect biological properties

1998: Eisen, Spellman, Brown, & Botstein
● Average Linkage Clustering of Time Courses
● Correlation measures similarity (scale invariant)
● Fixed offset:
    ● Genes assumed symmetric with respect to changes from reference state
● Reorder genes:
    ● Permute rows of expression data matrix
    ● Proximity corresponds to similarity

What Determines the Patterns
● Assess the significance of the clusters
● Could results be statistical artifacts?
● Swap matrix elements
● Apply clustering algorithm:
    ● See different patterns
    ● No prolonged correlations
● Signal from different conditions counteracts noise from single observations and cDNA variations
● Biologically interpretable implies significant

Gene Shaving
● Avoids a single reordering for all genes
● Different genes may require different measures of similarity
● Use the principal component of a set of genes (eigengene) as a reference state
● Select genes with high covariance with the eigengene

Gene Shaving, Cont'd
● High variation across samples
● Strong correlation across genes (coherence)
● Hierarchical methods address variations over samples
● Supervising affects average gene effects to select strong contributions on predictive abilities

Conclusions
● Change in methodology over the past few years
● Array data holds comprehensive picture of cellular
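
Sketch: clustering and shuffling control

The reordering idea from the 1998 slide and the element-swapping check from the "What determines the Patterns" slide can be illustrated with a small pure-Python sketch. It is not the original authors' implementation. It assumes a fixed offset of 0 as the reference state, uses the textbook average linkage rule (mean pairwise similarity between clusters), and shuffles within each row as one possible way to swap matrix elements. All function names and the toy data are hypothetical.

```python
import math
import random


def correlation(x, y, offset=0.0):
    """Correlation of two profiles measured from a fixed offset.

    Assumption: offset=0.0 stands in for the reference state, so
    changes above and below it are treated symmetrically.
    """
    sxy = sum((a - offset) * (b - offset) for a, b in zip(x, y))
    sxx = sum((a - offset) ** 2 for a in x)
    syy = sum((b - offset) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def average_linkage_order(rows, offset=0.0):
    """Return a row ordering in which clustered genes sit next to each other.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity; the final leaf order is the concatenation of the merges.
    """
    n = len(rows)
    sim = [[correlation(rows[i], rows[j], offset) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = len(clusters[a]) * len(clusters[b])
                s = sum(sim[i][j] for i in clusters[a]
                        for j in clusters[b]) / pairs
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


def mean_adjacent_similarity(rows, order, offset=0.0):
    """Average similarity between neighbouring rows in the given order."""
    sims = [correlation(rows[i], rows[j], offset)
            for i, j in zip(order, order[1:])]
    return sum(sims) / len(sims)


def shuffle_within_rows(rows, seed=0):
    """Control data: permute the entries of each row independently."""
    rng = random.Random(seed)
    return [rng.sample(row, len(row)) for row in rows]


# Toy expression matrix (rows = genes, columns = time points).
genes = [
    [1.0, 2.0, 3.0, 4.0],
    [-1.0, -2.0, -3.0, -4.0],
    [2.0, 4.0, 6.0, 8.0],
    [-2.0, -4.0, -6.0, -9.0],
]
order = average_linkage_order(genes)
control = shuffle_within_rows(genes, seed=1)
control_order = average_linkage_order(control)
print(order)
print(mean_adjacent_similarity(genes, order))
print(mean_adjacent_similarity(control, control_order))
```

On the toy matrix, the two rising profiles end up adjacent, as do the two falling ones. Comparing the adjacent-row similarity of the real and shuffled matrices is one simple way to ask whether the patterns could be statistical artifacts. A four-gene example is far too small to draw any such conclusion.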