GSU CSC 2010 - Chapter 6 (28 pages)


Chapter 6




Lecture Notes


Pages: 28
School: Georgia State University
Course: CSC 2010 - Principles of Computer Science


Chapter 6: Classification and Prediction

Dr. Bernard Chen, Ph.D.
University of Central Arkansas
Fall 2010

Classification and Prediction

Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. For example:

- Decide whether bank loan applicants are safe or risky.
- Guess whether a customer will buy a new computer.
- Analyze cancer data to predict which one of three specific treatments should apply.

Classification

Classification is a two-step process:

- Learning step: classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data.
- Prediction step: predicts categorical class labels (discrete or nominal).

Learning step (Model Construction)

Training data:

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

The training data is passed to a classification algorithm, which produces the classifier (model):

IF rank = professor OR years > 6 THEN tenured = yes

Learning step: model construction describes a set of predetermined classes.

- Each tuple (sample) is assumed to belong to a predefined class, as determined by the class label attribute.
- The set of tuples used for model construction is the training set.
- The model is represented as classification rules, decision trees, or mathematical formulae.

Prediction step (Using the Model in Prediction)

Testing data:

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen data: (Jeff, Professor, 4). Tenured?

Prediction step: estimate the accuracy of the model.

- The known label of each test sample is compared with the classified result from the model.
- The accuracy rate is the percentage of test set samples that are correctly classified by the model.
- The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is too optimistic.

N-fold cross-validation

In order to solve the over-fitting problem, n-fold ...
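The tenure example can be sketched in a few lines of Python to show how the accuracy rate is obtained. This is only an illustration under stated assumptions: the comparison operators in the rule are missing from the extracted text, so the sketch assumes it reads "rank = professor OR years > 6" (which agrees with every training tuple), and the names TRAIN, TEST, classify, and accuracy are hypothetical helpers, not part of the notes.

```python
# Tuples transcribed from the notes: (name, rank, years, tenured).
TRAIN = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
TEST = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def classify(rank, years):
    """Assumed form of the rule: IF rank = professor OR years > 6 THEN tenured = yes."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(rows):
    """Fraction of rows whose known label matches the label predicted by the rule."""
    correct = sum(1 for _, rank, years, label in rows if classify(rank, years) == label)
    return correct / len(rows)


if __name__ == "__main__":
    print("training accuracy:", accuracy(TRAIN))
    print("test accuracy:", accuracy(TEST))
    print("Jeff (Professor, 4) tenured?", classify("Professor", 4))
```

Under this assumed rule, every training tuple is classified correctly, while one of the four test tuples (Merlisa) is not, which is why accuracy is measured on a test set that is independent of the training set.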


