Unsupervised Learning


• Unsupervised Learning with Random Forest Predictors: Applied to Tissue Microarray Data
• Contents
• Acknowledgements
• Tissue Microarray Data
• Tissue Microarray / DNA Microarray
• Tissue Array Section
• Ki-67 Expression in Kidney Cancer
• Multiple measurements per patient: Several spots per tumor sample and several "scores" per spot
• Properties of TMA Data
• Thresholding methods for tumor marker expressions
• Tumor class discovery (Keywords: unsupervised learning, clustering)
• Tumor Class Discovery
• Tumor Class Discovery using DNA Microarray Data
• Clusters involving TMA data may have unconventional shapes: Low risk prostate cancer patients are colored in black.
• Unconventional shape of a clinically meaningful patient cluster
• How to cluster patients on the basis of Tissue Microarray Data?
• A dissimilarity measure is an essential input for tumor class discovery
• Challenge
• We have found that a random forest (Breiman 2001) dissimilarity can work well in the unsupervised analysis of TMA data. Shi et al 2004, Seligson et al 2005. http://www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm
• Kidney cancer: Comparing PAM clusters that result from using the RF dissimilarity vs. the Euclidean distance
• The RF dissimilarity is determined by dependent tumor markers
• The RF cluster can be described using a thresholding rule involving the most dependent markers
• Random Forest Predictors. Breiman L. Random forests. Machine Learning 2001;45(1):5-32. http://stat-www.berkeley.edu/users/breiman/RandomForests/
• Tree predictors are the basic unit of random forest predictors
• An example of CART
• CART Construction
• RF Construction
• Random Forest (RF)
• Prediction by plurality voting
• Random forest predictors give rise to a dissimilarity measure
• Intrinsic Similarity Measure
• Unsupervised problem as a Supervised problem (RF implementation)
• Two standard ways of generating synthetic covariates
• RF clustering
• Understanding RF Clustering (Theoretical Studies). Shi, T. and Horvath, S. (2005) "Unsupervised learning using random forest predictors" J. Comp. Graph. Stat.
• Abstract: Random forest dissimilarity
• Geometric interpretation of RF clusters
• RF clustering is not rotationally invariant
• Simulated Example ExRule: contrast RF dissimilarity with Euclidean distance
• Simulated Cluster structure
• Example ExRule
• The clustering results for example ExRule
• Typical Addcl2 Example
• Nature of Addcl2 RF clustering
• RF dissimilarity vs. Euclidean distance (DNA Microarray Data)
• Theoretical reasons for using an RF dissimilarity for TMA data
• Applications to prostate tissue microarray data. Seligson DB, Horvath S, Shi T, Yu H, Tze S, Grunstein M, Kurdistani SK (2005) Global histone modification patterns predict risk of prostate cancer recurrence. Nature
• Analysis Outline
• Cluster Analysis of Low Gleason Score Prostate Samples (UCLA data)
• 1) Construct a tumor marker rule for predicting RF cluster membership. 2) Validate the rule predictions in an independent data set
• Discussion Prostate TMA Data
• Summary
• References RF clustering
• Applications to renal cell carcinoma tissue microarray data. Shi T, Seligson D, Belldegrun AS, Palotie A, Horvath S (2005) Tumor Classification by Tissue Microarray Profiling: Random Forest Clustering Applied to Renal Cell Carcinoma. Mod Pathol. 2005 Apr;18(4):547-57.
• TMA Data
• MDS Plot of All the RCC Patients
• Interpreting the clusters in terms of survival
• Hierarchical clustering with Euclidean distance leads to less satisfactory results
• Molecular grouping is superior to pathological grouping
• Identify "irregular" patients
• 'Regular' Clear Cell Patients
• 'Regular' Clear Cell Patients (cont.)
• Detect novel cancer subtypes
• Results TMA clustering
• THE END
• Appendix
• Casting an unsupervised problem into a supervised problem
• RF variable importance vs. Average Corr and Cox p value
• Which multi-dimensional scaling method to use?
• The random forest dissimilarity. L. Breiman: RF manual. Technical Report: Shi and Horvath 2005. http://www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm
• Frequency plot of the same tumor marker in 2 independent data sets

Unsupervised Learning with Random Forest Predictors: Applied to Tissue Microarray Data
Steve Horvath
Biostatistics and Human Genetics
University of California, LA

Contents
• Tissue Microarray Data
• Random forest (RF) predictors
• Understanding RF clustering
  – Shi, T. and Horvath, S. (2006) "Unsupervised learning using random forest predictors" J. Comp. Graph. Stat.
• Applications to Tissue Microarray Data:
  – Shi et al (2004) "Tumor Profiling of Renal Cell Carcinoma Tissue Microarray Data" Modern Pathology
  – Seligson DB et al (2005) Global histone modification patterns predict risk of prostate cancer recurrence. Nature

Acknowledgements
• Former students & Postdocs for TMA
  – Tao Shi, PhD
  – Tuyen Hoang, PhD
  – Yunda Huang, PhD
  – Xueli Liu, PhD
• UCLA Tissue Microarray Core
  – David Seligson, MD
  – Aarno Palotie, MD
  – Arie Belldegrun, MD
  – Robert Figlin, MD
  – Lee Goodglick, MD
  – David Chia, MD
  – Siavash Kurdistani, MD

Tissue Microarray Data
[Figure panels: Tissue Microarray; DNA Microarray]

Tissue Array Section
[Figure: tissue array section with ~700 tissue samples; scale labels 0.6 mm and 0.2 mm]

Ki-67 Expression in Kidney Cancer
[Figure panels: High Grade; Low Grade]
Message: brown staining is related to tumor grade.

Multiple measurements per patient: Several spots per tumor sample and several "scores" per spot
• Maximum intensity = Max
• Percent of cells staining = Pos
• Spots have a spot grade: NL, 1, 2, ...
• Each patient (tumor sample) is usually represented by multiple spots:
  – 3 tumor spots
  – 1 matched normal spot

Properties of TMA Data
• Highly skewed, non-normal, semi-continuous.
  – Often a good idea to model as ordinal variables with many levels.
• Staining scores of the same markers are highly correlated.
[Figure: Histograms of tumor marker expression scores for P53, CA9 and EpCam: Percent of Cells Staining (POS, 0 to 100) and Maximum Intensity (MAX, 0 to 3)]

Thresholding methods for tumor marker expressions
• Since clinicians and pathologists prefer thresholding tumor marker expressions, it is natural to use statistical methods that are based on thresholding covariates, e.g. regression trees, survival trees, rpart, forest predictors, etc.
• Dichotomized marker expressions are often fitted in a Cox (or alternative) regression model.
  – Danger: over-fitting when the "optimal" cut-off is chosen on the same data that are used to fit and test the model.
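The over-fitting danger of an optimal cut-off can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not from the slides: a binary outcome instead of a survival time, a pooled two-proportion z statistic instead of a Cox model, and a marker that is, by construction, unrelated to the outcome. Searching over every observed cut-off and keeping the most significant one yields "significant" splits far more often than a single pre-specified cut-off (here the median). All function names are hypothetical.

```python
import math
import random


def two_proportion_z(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z statistic; returns 0.0 when it is undefined."""
    if n_a == 0 or n_b == 0:
        return 0.0
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (events_a / n_a - events_b / n_b) / se


def z_at_cutoff(marker, outcome, cutoff):
    """Dichotomize the marker at `cutoff` and compare outcome rates (high vs. low)."""
    high = [y for x, y in zip(marker, outcome) if x > cutoff]
    low = [y for x, y in zip(marker, outcome) if x <= cutoff]
    return two_proportion_z(sum(high), len(high), sum(low), len(low))


def best_cutoff_z(marker, outcome, min_group=10):
    """Try every observed value as a cut-off and keep the largest |z|."""
    n = len(marker)
    best = 0.0
    for cutoff in sorted(set(marker)):
        n_high = sum(1 for x in marker if x > cutoff)
        if min_group <= n_high <= n - min_group:  # skip tiny groups
            best = max(best, abs(z_at_cutoff(marker, outcome, cutoff)))
    return best


def simulate(n_datasets=200, n_patients=100, seed=1):
    """Fraction of null data sets declared significant (|z| > 1.96)."""
    rng = random.Random(seed)
    fixed_hits = optimal_hits = 0
    for _ in range(n_datasets):
        # Marker and outcome are generated independently: no true association.
        marker = [rng.randint(0, 100) for _ in range(n_patients)]
        outcome = [1 if rng.random() < 0.3 else 0 for _ in range(n_patients)]
        median = sorted(marker)[n_patients // 2]
        if abs(z_at_cutoff(marker, outcome, median)) > 1.96:
            fixed_hits += 1
        if best_cutoff_z(marker, outcome) > 1.96:
            optimal_hits += 1
    return fixed_hits / n_datasets, optimal_hits / n_datasets


if __name__ == "__main__":
    fixed_rate, optimal_rate = simulate()
    print(f"false-positive rate, median cut-off:  {fixed_rate:.2f}")
    print(f"false-positive rate, optimal cut-off: {optimal_rate:.2f}")
```

Because the marker carries no signal, every rejection is a false positive; the gap between the two rates is the optimism introduced by selecting the cut-off on the data used to test it. Validating a selected cut-off in an independent data set guards against reporting this optimism as a finding.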

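The "Properties of TMA Data" slide advises treating skewed, semi-continuous staining scores as ordinal variables and notes that scores of the same marker (POS and MAX) are highly correlated. A rank-based view is one way to make both points concrete; it is an assumption of this sketch, not necessarily what the talk used. The code replaces raw scores by average ranks, which keep only their ordering, and measures the POS/MAX association with the Spearman rank correlation (the Pearson correlation of the ranks). The scores are made-up illustrative values, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return ranks 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` to the last position of the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores for one marker on 8 spots (not data from the slides).
    pos = [0, 0, 5, 10, 40, 80, 95, 100]  # percent of cells staining, 0 to 100
    max_int = [0, 0, 1, 1, 2, 3, 3, 3]    # maximum intensity, 0 to 3
    print("POS ranks:", average_ranks(pos))
    print("Spearman(POS, MAX):", round(spearman(pos, max_int), 3))
```

Ranks are unaffected by the long right tail of a skewed score, and tied scores (common for semi-continuous values such as many 0% spots) receive equal ranks.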

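Several slide titles listed above ("Unsupervised problem as a Supervised problem (RF implementation)", "Two standard ways of generating synthetic covariates", "Random forest predictors give rise to a dissimilarity measure") refer to the construction described by Shi and Horvath: the observed data are labeled class 1, a synthetic data set is labeled class 2, a random forest is trained to separate them, and two observed samples count as similar when they often land in the same terminal node. The sketch below shows the two synthetic-data schemes (Addcl1: sample each covariate independently from its observed values; Addcl2: sample each covariate uniformly over its observed range) and the dissimilarity sqrt(1 - proximity), where proximity is the fraction of trees in which two observed samples share a terminal node. To stay short and dependency-free it does not grow the forest; the leaf assignments are an assumed input. With scikit-learn, for example, a fitted RandomForestClassifier's apply method returns a samples-by-trees array of leaf indices, which would need transposing to the trees-by-samples layout used here. Function names are hypothetical.

```python
import math
import random


def addcl1(data, rng):
    """Synthetic covariates: sample each column independently from its observed
    values, i.e. from the product of the empirical marginal distributions."""
    p = len(data[0])
    columns = [[row[j] for row in data] for j in range(p)]
    return [[rng.choice(columns[j]) for j in range(p)] for _ in data]


def addcl2(data, rng):
    """Synthetic covariates: sample each column uniformly over its observed range."""
    p = len(data[0])
    lows = [min(row[j] for row in data) for j in range(p)]
    highs = [max(row[j] for row in data) for j in range(p)]
    return [[rng.uniform(lows[j], highs[j]) for j in range(p)] for _ in data]


def supervised_setup(data, rng, method=addcl1):
    """Stack observed (class 1) and synthetic (class 2) rows for a classifier."""
    synthetic = method(data, rng)
    X = data + synthetic
    y = [1] * len(data) + [2] * len(synthetic)
    return X, y


def rf_dissimilarity(leaves):
    """leaves[t][i] is the terminal node of observed sample i in tree t.
    Proximity = fraction of trees in which i and j share a terminal node;
    dissimilarity = sqrt(1 - proximity)."""
    n_trees, n = len(leaves), len(leaves[0])
    dis = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(1 for tree in leaves if tree[i] == tree[j])
            dis[i][j] = dis[j][i] = math.sqrt(1 - shared / n_trees)
    return dis


if __name__ == "__main__":
    rng = random.Random(0)
    observed = [[10.0, 1.0], [80.0, 3.0], [5.0, 0.0], [95.0, 3.0]]  # (POS, MAX)
    X, y = supervised_setup(observed, rng)
    print(len(X), y)
    leaves = [[0, 0, 1, 1], [5, 5, 6, 7]]  # 2 hypothetical trees x 4 samples
    for row in rf_dissimilarity(leaves):
        print([round(d, 3) for d in row])
```

The resulting dissimilarity matrix is what a clustering method such as PAM or a multi-dimensional scaling plot would take as input.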