SBU CSE 634 - Classification Lecture Notes (2)


Classification Lecture Notes (2) (cse634)
Review, Training, Testing, Predictive Accuracy
Professor Anita Wasilewska

Contents:
• Classification Data
• Classification (Training) Data with objects
• Characteristic Descriptions (Review)
• Characteristic Formula (Review)
• Characteristic Rule (Review)
• Discrimination (Review)
• Discriminant Formula
• Discriminant Rule (Definition)
• Characteristic and discriminant rules
• Descriptive DM Classification Goal (1), (2)
• Supervised Learning Classification
• A full set of discriminant rules for our training dataset (obtained by the Decision Tree algorithm)
• Rules testing
• Example of a TEST data for our TRAINING set
• Test dataset and Predictive Accuracy
• Correctly and not correctly classified records
• Exercise 1, Exercise 2
• TEST DATA for Example 2
• Predictive Accuracy
• Classification and Classifiers
• Training: a classifier construction
• Testing and prediction (use of a trained classifier)
• Classifiers' Predictive Accuracy
• Predictive Accuracy Evaluation

Classification Data
• Data format: a data table with the key attribute removed.
• A special attribute, the class attribute (here buys_computer), must be distinguished.

Classification (Training) Data with objects

rec  age    income  student  credit_rating  buys_computer (CLASS)
r1   <=30   high    no       fair           no
r2   <=30   high    no       excellent      no
r3   31…40  high    no       fair           yes
r4   >40    medium  no       fair           yes
r5   >40    low     yes      fair           yes
r6   >40    low     yes      excellent      no
r7   31…40  low     yes      excellent      yes
r8   <=30   medium  no       fair           no
r9   <=30   low     yes      fair           yes
r10  >40    medium  yes      fair           yes
r11  <=30   medium  yes      excellent      yes
r12  31…40  medium  no       excellent      yes
r13  31…40  high    yes      fair           yes
r14  >40    medium  no       excellent      no

CHARACTERISTIC DESCRIPTIONS (Review)
Example:
• Some of the characteristic descriptions of the concept C with description buys_computer = no are:
• age = <=30 & income = high & student = no & credit_rating = fair
• age = >40 & income = medium & student = no & credit_rating = excellent
• age = >40 & income = medium
• age = <=30
• student = no & credit_rating = excellent

Characteristic Formula (Review)
Any formula (of a proper language) of the form
  IF concept description THEN characteristics
is called a characteristic formula.
Example:
• IF buys_computer = no THEN income = low & student = yes & credit = excellent
• IF buys_computer = no THEN income = low & credit = fair

Characteristic Rule (Review)
• A characteristic formula
  IF concept description THEN characteristics
is called a characteristic rule (for a given database) if and only if it is TRUE in the given database, i.e.
  {r: concept description} ∩ {r: characteristics} ≠ ∅

Characteristic Rule (Review)
EXAMPLE: The formula
• IF buys_computer = no THEN income = low & student = yes & credit = excellent
is a characteristic rule for our database because
  {r: buys_computer = no} = {r1, r2, r6, r8, r14},
  {r: income = low & student = yes & credit = excellent} = {r6, r7}, and
  {r1, r2, r6, r8, r14} ∩ {r6, r7} = {r6} ≠ ∅

Characteristic Rule (Review)
EXAMPLE: The formula
• IF buys_computer = no THEN income = low & credit = fair
is NOT a characteristic rule for our database because
  {r: buys_computer = no} = {r1, r2, r6, r8, r14},
  {r: income = low & credit = fair} = {r5, r9}, and
  {r1, r2, r6, r8, r14} ∩ {r5, r9} = ∅

Discrimination (Review)
• Discrimination is the process whose aim is to find rules that allow us to discriminate the objects (records) belonging to a given concept (one class) from the rest of the records (classes).
• DISCRIMINANT RULES have the form:
  IF characteristics THEN concept
Example:
• IF age = <=30 & income = high & student = no & credit_rating = fair THEN buys_computer = no

Discriminant Formula
A discriminant formula is any formula of the form
  IF characteristics THEN concept
Example:
• IF age = >40 & income = low THEN buys_computer = no

Discriminant Rule (Definition)
• A discriminant formula
  IF characteristics THEN concept
is a DISCRIMINANT RULE (in a given database) iff
  {r: characteristics} ⊆ {r: concept}

Discriminant Rule
Example:
• The discriminant formula
  IF age = >40 & income = low THEN buys_computer = no
is NOT a discriminant rule in our database because
  {r: age = >40 & income = low} = {r5, r6}
is not a subset of the set
  {r: buys_computer = no} = {r1, r2, r6, r8, r14}

Characteristic and discriminant rules
• The inverse implication of a characteristic rule is usually NOT a discriminant rule.
• Example: the inverse implication of our characteristic rule
  IF buys_computer = no THEN income = low & student = yes & credit = excellent
is
  IF income = low & student = yes & credit = excellent THEN buys_computer = no
• The above rule is NOT a discriminant rule, as it cannot discriminate between the concepts with descriptions buys_computer = no and buys_computer = yes
  (see records r6 and r7 in our training dataset).

Descriptive DM Classification Goal (1)
• Given a data set and a concept (class) c defined in
this dataset, FIND a minimal (or as small as possible) set of characteristic and/or discriminant rules, or other descriptions, for the concept c, or class, or classes.

Descriptive DM Classification Goal (2)
• We also want these rules to involve as few attributes as possible, i.e. we want the rules to have descriptions that are as short as possible.
• In the supervised LEARNING process the rules must also be TESTED on independent test data.

Supervised Learning Classification
• The process of creating discriminant and/or characteristic rules and TESTING them is called a learning process; when it is finished we say that the concept (class) has been learned (and tested) from examples (the records in the dataset that form the TRAINING set).
• It is called supervised learning because we know the concept description for all examples in the training and test sets.

A FULL SET OF DISCRIMINANT RULES for our Training Dataset (Obtained by the Decision Tree Algorithm)
• The rules are:
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = "<=30" AND
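The set conditions in the rule definitions above can be checked mechanically. The following sketch (not part of the notes; names such as `records` and `is_characteristic_rule` are my own) encodes the 14-record training table and tests the two definitions: non-empty intersection for a characteristic rule, and subset containment for a discriminant rule.

```python
# Sketch: verifying characteristic and discriminant rules on the training table.
# Record id -> (age, income, student, credit_rating, buys_computer)
DATA = {
    "r1":  ("<=30",  "high",   "no",  "fair",      "no"),
    "r2":  ("<=30",  "high",   "no",  "excellent", "no"),
    "r3":  ("31…40", "high",   "no",  "fair",      "yes"),
    "r4":  (">40",   "medium", "no",  "fair",      "yes"),
    "r5":  (">40",   "low",    "yes", "fair",      "yes"),
    "r6":  (">40",   "low",    "yes", "excellent", "no"),
    "r7":  ("31…40", "low",    "yes", "excellent", "yes"),
    "r8":  ("<=30",  "medium", "no",  "fair",      "no"),
    "r9":  ("<=30",  "low",    "yes", "fair",      "yes"),
    "r10": (">40",   "medium", "yes", "fair",      "yes"),
    "r11": ("<=30",  "medium", "yes", "excellent", "yes"),
    "r12": ("31…40", "medium", "no",  "excellent", "yes"),
    "r13": ("31…40", "high",   "yes", "fair",      "yes"),
    "r14": (">40",   "medium", "no",  "excellent", "no"),
}
ATTRS = ("age", "income", "student", "credit_rating", "buys_computer")

def records(**conditions):
    """Return {r: conditions}: the set of record ids satisfying all conditions."""
    return {rid for rid, vals in DATA.items()
            if all(vals[ATTRS.index(a)] == v for a, v in conditions.items())}

def is_characteristic_rule(concept, characteristics):
    # IF concept THEN characteristics is a characteristic rule iff
    # {r: concept} ∩ {r: characteristics} ≠ ∅
    return bool(records(**concept) & records(**characteristics))

def is_discriminant_rule(characteristics, concept):
    # IF characteristics THEN concept is a discriminant rule iff
    # {r: characteristics} ⊆ {r: concept}
    return records(**characteristics) <= records(**concept)

no = {"buys_computer": "no"}
print(records(**no))  # {r1, r2, r6, r8, r14}, as in the examples above
print(is_characteristic_rule(no, {"income": "low", "student": "yes",
                                  "credit_rating": "excellent"}))   # True
print(is_characteristic_rule(no, {"income": "low",
                                  "credit_rating": "fair"}))        # False
print(is_discriminant_rule({"age": ">40", "income": "low"}, no))    # False
```

The three printed results match the worked examples in the notes: the first formula is a characteristic rule, the second is not, and the discriminant formula fails because {r5, r6} is not a subset of {r1, r2, r6, r8, r14}.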
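The preview cuts off before the testing slides, but the title and contents point to predictive accuracy: the fraction of test records that the learned rules classify correctly. A minimal sketch of that computation, assuming a hypothetical classifier built from the one complete rule shown above (with a made-up default class of "yes" for uncovered records) and a hypothetical four-record test set that is not from the notes:

```python
# Sketch: predictive accuracy on an independent test set.
# The classifier and test records below are illustrative assumptions,
# not data from the lecture notes.

def predictive_accuracy(classifier, test_set):
    """Fraction of test records whose predicted class equals the actual class."""
    correct = sum(1 for record, actual in test_set if classifier(record) == actual)
    return correct / len(test_set)

def classify(record):
    # Hypothetical classifier: the one complete rule from the slide above,
    # plus an assumed default of "yes" for records the rule does not cover.
    age, income, student, credit_rating = record
    if age == "<=30" and student == "no":
        return "no"
    return "yes"

# Hypothetical test records: ((age, income, student, credit_rating), actual class)
test_set = [
    (("<=30",  "high",   "no",  "fair"),      "no"),
    ((">40",   "medium", "no",  "excellent"), "no"),
    (("31…40", "low",    "yes", "fair"),      "yes"),
    (("<=30",  "medium", "no",  "excellent"), "no"),
]
print(predictive_accuracy(classify, test_set))  # 0.75: 3 of 4 records correct
```

Here the second test record is misclassified (the rule does not cover it, and the default "yes" is wrong), giving an accuracy of 3/4.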

