Contents

- Data Mining: Classification
- Classification and Prediction
- Classification vs. Prediction
- Classification—A Two-Step Process
- Classification Process (1): Model Construction
- Classification Process (2): Use the Model in Prediction
- Supervised vs. Unsupervised Learning
- Issues (1): Data Preparation
- Issues (2): Evaluating Classification Methods
- Classification by Decision Tree Induction
- Training Dataset
- Output: A Decision Tree for "buys_computer"
- Algorithm for Decision Tree Induction
- Attribute Selection Measure
- Information Gain (ID3/C4.5)
- Information Gain in Decision Tree Induction
- Attribute Selection by Information Gain Computation
- Gini Index (IBM IntelligentMiner)
- Extracting Classification Rules from Trees
- Avoid Overfitting in Classification
- Approaches to Determine the Final Tree Size
- Enhancements to basic decision tree induction
- Classification in Large Databases
- Scalable Decision Tree Induction Methods in Data Mining Studies
- Data Cube-Based Decision-Tree Induction
- Presentation of Classification Results
- Bayesian Classification: Why?
- Bayesian Theorem
- Bayesian classification
- Estimating a-posteriori probabilities
- Naïve Bayesian Classification
- Play-tennis example: estimating P(xi|C)
- Play-tennis example: classifying X
- The independence hypothesis…
- Bayesian Belief Networks (I)
- Bayesian Belief Networks (II)
- Neural Networks
- A Neuron
- Network Training
- Multi-Layer Perceptron
- Association-Based Classification
- Other Classification Methods
- Instance-Based Methods
- The k-Nearest Neighbor Algorithm
- Discussion on the k-NN Algorithm
- Case-Based Reasoning
- Remarks on Lazy vs. Eager Learning
- Genetic Algorithms
- Rough Set Approach
- Fuzzy Sets
- What Is Prediction?
- Predictive Modeling in Databases
- Regression Analysis and Log-Linear Models in Prediction
- Prediction: Numerical Data
- Prediction: Categorical Data
- Classification Accuracy: Estimating Error Rates
- Boosting and Bagging
- Boosting Technique (II) — Algorithm
- Summary
- References (I)
- References (II)

Data Mining: Classification

Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Classification by backpropagation
- Classification based on concepts from association rule mining
- Other classification methods
- Prediction
- Classification accuracy
- Summary

Classification vs. Prediction
- Classification: predicts categorical class labels; classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
- Prediction: models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications: credit approval, target marketing, medical diagnosis, treatment effectiveness analysis
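The classification/prediction contrast can be sketched in a few lines of Python. Both functions and their data are hypothetical illustrations (not from the slides): a classifier maps a tuple to a categorical label, while a predictor fits a continuous-valued function, here an ordinary least-squares line.

```python
def approve_credit(income, debt):
    """Classification: the output is a categorical label (toy rule)."""
    return 'approved' if income - debt > 20 else 'rejected'

def fit_line(xs, ys):
    """Prediction: fit a continuous-valued function (least-squares line)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx          # slope, intercept

print(approve_credit(50, 10))              # 'approved'
slope, intercept = fit_line([1, 2, 3], [2, 4, 6])
print(slope * 4 + intercept)               # predicts 8.0 for x = 4
```

The point of the contrast: the classifier's output comes from a finite label set, while the fitted line can produce any real value, which is why the slides treat the two as distinct tasks.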
Classification—A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model: the known label of each test sample is compared with the model's classification
  - The accuracy rate is the percentage of test set samples correctly classified by the model
  - The test set must be independent of the training set; otherwise over-fitting will occur

Classification Process (1): Model Construction
Training data:

  NAME  RANK            YEARS  TENURED
  Mike  Assistant Prof  3      no
  Mary  Assistant Prof  7      yes
  Bill  Professor       2      yes
  Jim   Associate Prof  7      yes
  Dave  Assistant Prof  6      no
  Anne  Associate Prof  3      no

A classification algorithm produces the classifier (model):

  IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
Testing data:

  NAME     RANK            YEARS  TENURED
  Tom      Assistant Prof  2      no
  Merlisa  Associate Prof  7      no
  George   Professor       5      yes
  Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured?

Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
  - New data are classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
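The two-step process can be made concrete with the slides' own example. The sketch below (a minimal illustration, not the slides' code) takes the rule produced in step 1 as the classifier, estimates its accuracy rate on the independent test set, and then labels the unseen tuple (Jeff, Professor, 4).

```python
def classify(rank, years):
    """Step 1's output: the rule learned from the training set."""
    return 'yes' if rank == 'Professor' or years > 6 else 'no'

# Independent test set from the slide: (name, rank, years, actual label).
test_set = [
    ('Tom',     'Assistant Prof', 2, 'no'),
    ('Merlisa', 'Associate Prof', 7, 'no'),
    ('George',  'Professor',      5, 'yes'),
    ('Joseph',  'Assistant Prof', 7, 'yes'),
]

# Accuracy rate: percentage of test samples the model labels correctly.
correct = sum(classify(rank, years) == label
              for _, rank, years, label in test_set)
accuracy = correct / len(test_set)
print(f'accuracy = {accuracy:.0%}')   # 75%: Merlisa (7 years) is misclassified

# Step 2 proper: classify an unseen tuple.
print(classify('Professor', 4))       # Jeff -> 'yes'
```

Note that the 75% estimate is honest only because the test set is independent of the training set, exactly as the slide warns.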
Issues (1): Data Preparation
- Data cleaning: preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection): remove irrelevant or redundant attributes
- Data transformation: generalize and/or normalize data

Issues (2): Evaluating Classification Methods
- Predictive accuracy
- Speed and scalability: time to construct the model; time to use the model
- Robustness: handling noise and missing values
- Scalability: efficiency on disk-resident databases
- Interpretability: understanding and insight provided by the model
- Goodness of rules: decision tree size; compactness of classification rules

Classification by Decision Tree Induction
- Decision tree: a flow-chart-like tree structure
  - An internal node denotes a test on an attribute
  - A branch represents an outcome of the test
  - Leaf nodes represent class labels or class distributions
- Decision tree generation consists of two phases
  - Tree construction: at the start, all the training examples are at the root; partition the examples recursively based on selected attributes
  - Tree pruning: identify and remove branches that reflect noise or outliers
- Use of decision tree: classifying an unknown sample
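A minimal sketch of the construction phase described above, assuming an ID3-style information-gain selection criterion (covered later in the deck) and assuming the numeric years attribute has been pre-discretized into a years > 6 test; real inducers such as C4.5 choose numeric split points automatically. On the tenure training set this recursive partitioning recovers the rule from the earlier slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy from partitioning the rows on attr."""
    n = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """All examples start at the root; partition recursively on the
    attribute with the highest information gain."""
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attrs:                             # no attributes left -> majority
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {'attr': best, 'branches': {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node['branches'][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best])
    return node

def predict(tree, row):
    """Classify an unknown sample by testing its attribute values
    against the tree, root to leaf."""
    while isinstance(tree, dict):
        tree = tree['branches'][row[tree['attr']]]
    return tree

# Training set from the slides; 'years' pre-discretized (assumption).
rows = [
    {'rank': 'Assistant Prof', 'years>6': False},  # Mike, 3
    {'rank': 'Assistant Prof', 'years>6': True},   # Mary, 7
    {'rank': 'Professor',      'years>6': False},  # Bill, 2
    {'rank': 'Associate Prof', 'years>6': True},   # Jim, 7
    {'rank': 'Assistant Prof', 'years>6': False},  # Dave, 6
    {'rank': 'Associate Prof', 'years>6': False},  # Anne, 3
]
labels = ['no', 'yes', 'yes', 'yes', 'no', 'no']

tree = build_tree(rows, labels, ['rank', 'years>6'])
print(predict(tree, {'rank': 'Professor', 'years>6': False}))  # 'yes'
```

The sketch omits the pruning phase; pruning would revisit the grown tree and collapse branches whose splits reflect noise rather than signal.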