BREAST CANCER CLASSIFICATION USING ANN
Ross Nordstrom
Intro to Artificial Neural Networks
Yu Hen Hu
December 14, 2010

Overview
Learning Methods
• K-Nearest Neighbor
• Maximum Likelihood (using negative log)
Training Methods
• Leave One Out
• Random grouping over ~5%-95% of data set used for training
• Used 10+ different sets for each ratio

About
~ Leading Global Cause of Death in Women [1]
~ 1.302M cases
~ 460K deaths
~ 30% in developed
~ 45% in 3rd world
~ Treatment based on the stage of the cancer
~ Not easily detectable. Early signs may be hidden or attributed to hormones
~ Mammograms detect 85-90%

Existing Approaches
Basic
~ 97.5% [2] - 1 separating plane in 3D space of worst area/smoothness
* Has correctly diagnosed 176 consecutive new patients
Diagnosis
~ 95.6% [4] - Parallel hyperplanes
~ 93.7% [4] - 1 nearest neighbor
Prognosis
~ 86.3% [2] - Multisurface Method-Tree in 4D space

Maximum Likelihood
Leave-One-Out Training
[Figure: three panels, one per data set. Blue areas are accurate classifications.]
~ Basic Data - C-rate = 90.6296%
~ Diagnostic Data - C-rate = 87.8735%
~ Prognostic Data - C-rate = 76.2887%

Maximum Likelihood
Randomly Distributed Training Set (2%-98%)
[Figure: Classification Rate vs Training:Testing Ratio for Basic, Diagnostic and Prognostic Data. X-axis: % of data used for training. Series: Ave, Best.]
Panel summaries (in extracted order):
~ Peak: 100%, Best avg: 80%, Avg: 64%
~ Peak: 100%, Best avg: 91%, Avg: 86%
~ Peak: 100%, Best avg: 91%, Avg: 89%

K-Nearest Neighbor
Leave-One-Out Training, 1-15 Neighbors
[Figure: Accuracy vs Neighbor Size for each data set. X-axis: k-nearest neighbors. Y-axis label: % classification error.]
~ Basic Data - Best is 1-nn: 85.5051%
~ Diagnostic Data - Best is 11-nn: 88.5764%
~ Prognostic Data - Best is 11-nn: 75.2577%

          Basic   Diag   Prog
Mean:     81%     86%    68%
Median:   80%     87%    69%
Std:      2.25    3.40   4.79

K-Nearest Neighbor
Randomly Distributed Training Set (2%-98%), 1-10 Neighbors
[Figure: Classification Rate vs K-Neighbors for Basic, Diagnostic and Prognostic Data. X-axis: Number (K) of Neighbors. Y-axis: Classification accuracy (%). Series: Best of Best, Best of Avg, Mean of Avg.]
[Figure: Classification Rate vs Training:Testing Ratio for Diagnostic, Basic and Prognostic Data. X-axis: % of data used for training. Y-axis: classification rate (%). Series: K=1 through K=10, shown for Average and Best.]

Conclusions
~ Existing results are reachable
~ Prognosis is the most difficult
~ Better accuracy on groups than anecdotes
~ Plane Separation > Max Likelihood > K-Nearest

References
[1] Garcia M, Jemal A, Ward EM, Center MM, Hao Y, Siegel RL, Thun MJ. Global Cancer Facts & Figures 2007. Atlanta, GA: American Cancer Society, 2007.
[2] Wolberg, Dr. William H. "Breast Cancer Wisconsin (Diagnostic) Data Set." UCI Machine Learning Repository (1995): n. pag. Web. 30 Nov 2010.
[3] Zielinski, Jerzy, Nidhal Bouaynaya, and Dan Schonfeld. "Two-dimensional ARMA modeling for breast cancer detection and classification." Engineering Village (2010): n. pag. Web. 30 Nov 2010.
[4] Zwitter, Matjaz, and Milan Soklic. "Breast Cancer Data Set." UCI Machine Learning Repository (1988): n. pag. Web. 20 Nov
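
Appendix sketch: leave-one-out evaluation of the two learning methods

The slides name K-Nearest Neighbor and Maximum Likelihood (using negative log) as learning methods and Leave One Out as a training method, but they do not show an implementation. The following is a minimal sketch of how those pieces could fit together, not the author's code. Several choices are assumptions made for illustration: Euclidean distance and majority voting for k-NN, a per-class Gaussian with independent features for the likelihood model, the function names, and the six-point toy data set (it is not drawn from the UCI data sets).

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to query.

    Assumption: Euclidean distance; vote ties go to the label seen first
    among the nearest neighbors.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def ml_predict(train_x, train_y, query):
    """Pick the class whose model gives query the smallest negative log-likelihood.

    Assumption: each class is modeled as a Gaussian with independent features.
    A small constant keeps variances away from zero.
    """
    scores = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        nll = 0.0
        for q, column in zip(query, zip(*rows)):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            nll += 0.5 * math.log(2 * math.pi * var) + (q - mean) ** 2 / (2 * var)
        scores[label] = nll
    return min(scores, key=scores.get)


def leave_one_out_rate(xs, ys, predict):
    """Hold out each sample once, classify it from the rest, return % correct."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if predict(rest_x, rest_y, xs[i]) == ys[i]:
            correct += 1
    return 100.0 * correct / len(xs)


# Hypothetical toy data: two well-separated clusters of three points each.
toy_x = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
         (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
toy_y = ["benign"] * 3 + ["malignant"] * 3

for k in (1, 3, 5):
    rate = leave_one_out_rate(
        toy_x, toy_y, lambda tx, ty, q, k=k: knn_predict(tx, ty, q, k))
    print(f"{k}-nn leave-one-out rate: {rate:.1f}%")
print(f"max-likelihood leave-one-out rate: "
      f"{leave_one_out_rate(toy_x, toy_y, ml_predict):.1f}%")
```

On this toy set the 5-nn vote always fails, because holding out one point leaves only two neighbors of its own class against three of the other. That illustrates why the slides sweep over a range of neighbor counts rather than fixing one.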