NYU CSCI-GA 3033 - Data Mining - Classification


Slide titles

Data Mining: Classification
Classification and Prediction
Classification vs. Prediction
Classification: A Two-Step Process
Classification Process (1): Model Construction
Classification Process (2): Use the Model in Prediction
Supervised vs. Unsupervised Learning
Issues (1): Data Preparation
Issues (2): Evaluating Classification Methods
Classification by Decision Tree Induction
Training Dataset
Output: A Decision Tree for “buys_computer”
Algorithm for Decision Tree Induction
Attribute Selection Measure
Information Gain (ID3/C4.5)
Information Gain in Decision Tree Induction
Attribute Selection by Information Gain Computation
Gini Index (IBM IntelligentMiner)
Extracting Classification Rules from Trees
Avoid Overfitting in Classification
Approaches to Determine the Final Tree Size
Enhancements to basic decision tree induction
Classification in Large Databases
Scalable Decision Tree Induction Methods in Data Mining Studies
Data Cube-Based Decision-Tree Induction
Presentation of Classification Results
Bayesian Classification: Why?
Bayesian Theorem
Bayesian classification
Estimating a-posteriori probabilities
Naïve Bayesian Classification
Play-tennis example: estimating P(xi|C)
Play-tennis example: classifying X
The independence hypothesis…
Bayesian Belief Networks (I)
Bayesian Belief Networks (II)
Neural Networks
A Neuron
Network Training
Multi-Layer Perceptron
Association-Based Classification
Other Classification Methods
Instance-Based Methods
The k-Nearest Neighbor Algorithm
Discussion on the k-NN Algorithm
Case-Based Reasoning
Remarks on Lazy vs. Eager Learning
Genetic Algorithms
Rough Set Approach
Fuzzy Sets
What Is Prediction?
Predictive Modeling in Databases
Regression Analysis and Log-Linear Models in Prediction
Prediction: Numerical Data
Prediction: Categorical Data
Classification Accuracy: Estimating Error Rates
Boosting and Bagging
Boosting Technique (II): Algorithm
Summary
References (I)
References (II)

Data Mining: Classification

Classification and Prediction
  - What is classification? What is prediction?
  - Issues regarding classification and prediction
  - Classification by decision tree induction
  - Bayesian Classification
  - Classification by backpropagation
  - Classification based on concepts from association rule mining
  - Other Classification Methods
  - Prediction
  - Classification accuracy
  - Summary

Classification vs. Prediction
  - Classification:
      - predicts categorical class labels
      - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
  - Prediction:
      - models continuous-valued functions, i.e., predicts unknown or missing values
  - Typical applications:
      - credit approval
      - target marketing
      - medical diagnosis
      - treatment effectiveness analysis

Classification: A Two-Step Process
  - Model construction: describing a set of predetermined classes
      - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
      - The set of tuples used for model construction is the training set
      - The model is represented as classification rules, decision trees, or mathematical formulae
  - Model usage: classifying future or unknown objects
      - Estimate the accuracy of the model
          - The known label of each test sample is compared with the classified result from the model
          - The accuracy rate is the percentage of test set samples that are correctly classified by the model
          - The test set must be independent of the training set; otherwise over-fitting will inflate the accuracy estimate

Classification Process (1): Model Construction

Training data:

  NAME   RANK            YEARS  TENURED
  Mike   Assistant Prof  3      no
  Mary   Assistant Prof  7      yes
  Bill   Professor       2      yes
  Jim    Associate Prof  7      yes
  Dave   Assistant Prof  6      no
  Anne   Associate Prof  3      no

A classification algorithm turns the training data into a classifier (model):

  IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction

Testing data:

  NAME     RANK            YEARS  TENURED
  Tom      Assistant Prof  2      no
  Merlisa  Associate Prof  7      no
  George   Professor       5      yes
  Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured?

Supervised vs. Unsupervised Learning
  - Supervised learning (classification)
      - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
      - New data is classified based on the training set
  - Unsupervised learning (clustering)
      - The class labels of the training data are unknown
      - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
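The two-step process above (model construction, then model usage) can be illustrated with the tenure example. The following is a minimal Python sketch, not part of the original slides: the tuples and the rule are copied from the slides, while the names `classify` and `accuracy` are hypothetical helpers introduced here for illustration.

```python
# Tuples copied from the slides: (name, rank, years, tenured).
TRAINING = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
TESTING = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def classify(rank, years):
    """Rule from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


def accuracy(samples):
    """Fraction of samples whose known label matches the rule's output."""
    correct = sum(classify(rank, years) == label for _, rank, years, label in samples)
    return correct / len(samples)


print("training accuracy:", accuracy(TRAINING))
print("test accuracy:", accuracy(TESTING))
print("Jeff (Professor, 4) tenured?", classify("Professor", 4))
```

On these tuples the rule fits every training sample but misclassifies Merlisa in the test set, which is why accuracy is estimated on data that was not used to build the model.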
What is prediction?Issues regarding classification and predictionClassification by decision tree inductionBayesian ClassificationClassification by backpropagationClassification based on concepts from association rule miningOther Classification MethodsPredictionClassification accuracySummaryIssues (1): Data PreparationData cleaningPreprocess data in order to reduce noise and handle missing valuesRelevance analysis (feature selection)Remove the irrelevant or redundant attributesData transformationGeneralize and/or normalize dataIssues (2): Evaluating Classification MethodsPredictive accuracySpeed and scalabilitytime to construct the modeltime to use the modelRobustnesshandling noise and missing valuesScalabilityefficiency in disk-resident databases Interpretability: understanding and insight provded by the modelGoodness of rulesdecision tree sizecompactness of classification rulesClassification and PredictionWhat is classification? What is prediction?Issues regarding classification and predictionClassification by decision tree inductionBayesian ClassificationClassification by backpropagationClassification based on concepts from association rule miningOther Classification MethodsPredictionClassification accuracySummaryClassification by Decision Tree InductionDecision tree A flow-chart-like tree structureInternal node denotes a test on an attributeBranch represents an outcome of the testLeaf nodes represent class labels or class distributionDecision tree generation consists of two phasesTree constructionAt start, all the training examples are at the rootPartition examples recursively based on selected attributesTree pruningIdentify and remove branches that reflect noise or outliersUse of decision tree:
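As a rough illustration of the tree structure and the construction phase described above, here is a small Python sketch. It rests on several assumptions that are not in the visible text: information gain is used as the attribute selection measure (the slide titles name it, but its definition is not shown here), the numeric `years` attribute is reduced to a hypothetical boolean `years_gt_6`, and the pruning phase is omitted. The data are the six tenure tuples from the model-construction slide; all function names are made up for this sketch.

```python
from collections import Counter
from math import log2

# Tenure training tuples from the slides: (name, rank, years, tenured).
RAW = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
# Assumption: discretize years into a boolean so every attribute is categorical.
ROWS = [
    {"rank": rank, "years_gt_6": years > 6, "tenured": label}
    for _, rank, years, label in RAW
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr):
    """Entropy reduction obtained by partitioning rows on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r["tenured"] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r["tenured"] for r in rows]) - remainder


def build_tree(rows, attrs):
    """Start with all examples at the root and partition them recursively."""
    labels = [r["tenured"] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf node: a class label
    best = max(attrs, key=lambda a: information_gain(rows, a))
    rest = [a for a in attrs if a != best]
    branches = {
        value: build_tree([r for r in rows if r[best] == value], rest)
        for value in {r[best] for r in rows}
    }
    # Internal node: a test on `best`, with one branch per observed outcome.
    return {"attr": best, "branches": branches, "default": majority}


def classify(tree, sample):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(sample[tree["attr"]], tree["default"])
    return tree


tree = build_tree(ROWS, ["rank", "years_gt_6"])
print(tree)
print(classify(tree, {"rank": "Professor", "years_gt_6": False}))
```

On these six tuples the sketch tests `years_gt_6` at the root and `rank` below it. The `default` entry is a design choice of this sketch: it falls back to the majority class when a sample carries an attribute value that never appeared in training.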

