Lecture outlineEager vs Lazy learnersk-nearest neighbor classifiersk-nearest neighbor classificationCharacteristics of nearest-neighbor classifiersBayes TheoremBayes Theorem for ClassificationBuilding the ClassifierSlide 9Computing posterior probabilitiesNaïve Bayes ClassifierNaïve Bayes ClassifierConditional probabilities for categorical attributesEstimating conditional probabilities for continuous attributes?Naïve Bayes Classifier: ExampleNaïve Bayes Classifier: ExampleCharacteristics of Naïve Bayes ClassifierLecture outline•Classification•Naïve Bayes classifier•Nearest-neighbor classifierEager vs Lazy learners•Eager learners: learn the model as soon as the training data becomes available•Lazy learners: delay model-building until testing data needs to be classified–Rote classifier: memorizes the entire training datak-nearest neighbor classifiersXXX(a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor k-nearest neighbors of a record x are data points that have the k smallest distance to xk-nearest neighbor classification•Given a data record x find its k closest points–Closeness: Euclidean, Hamming, Jaccard distance •Determine the class of x based on the classes in the neighbor list–Majority vote–Weigh the vote according to distance• e.g., weight factor, w = 1/d2–Probabilistic votingCharacteristics of nearest-neighbor classifiers•Instance of instance-based learning•No model building (lazy learners)–Lazy learners: computational time in classification–Eager learners: computational time in model building•Decision trees try to find global models, k-NN take into account local information•K-NN classifiers depend a lot on the choice of proximity measureBayes Theorem•X, Y random variables•Joint probability: Pr(X=x,Y=y)•Conditional probability: Pr(Y=y | X=x)•Relationship between joint and conditional probability distributions•Bayes Theorem:)Pr()|Pr()Pr()|Pr(),Pr( XXYYYXYX )Pr()Pr()|Pr()|Pr(XYYXXY Bayes Theorem for Classification•X: attribute set•Y: class variable•Y depends on X in a non-determininstic way•We can capture this dependence using Pr(Y|X) : Posterior probabilityvsPr(Y): Prior probabilityBuilding the Classifier•Training phase:–Learning the posterior probabilities Pr(Y|X) for every combination of X and Y based on training data•Test phase:–For test record X’, compute the class Y’ that maximizes the posterior probabilityPr(Y’|X’)Bayes Classification: ExampleX’=(Home Owner=No, Marital Status=Married, AnnualIncome=120K)Compute: Pr(Yes|X’), Pr(No|X’) pick No or Yes with max Prob. How can we compute these probabilities??Computing posterior probabilities•Bayes Theorem•P(X) is constant and can be ignored•P(Y): estimated from training data; compute the fraction of training records in each class•P(X|Y)?)Pr()Pr()|Pr()|Pr(XYYXXY Naïve Bayes Classifier•Attribute set X = {X1,…,Xd} consists of d attributes•Conditional independence:–X conditionally independent of Y, given X: Pr(X|Y,Z) = Pr(X|Z)–Pr(X,Y|Z) = Pr(X|Z)xPr(Y|Z)diiyYXyYX1)|Pr()|Pr(Naïve Bayes Classifier•Attribute set X = {X1,…,Xd} consists of d attributesdiiyYXyYX1)|Pr()|Pr()Pr()|Pr()Pr()|Pr(1XYXYXYdiiConditional probabilities for categorical attributes•Categorical attribute Xi•Pr(Xi = xi|Y=y): fraction of training instances in class y that take value xi on the i-th attributePr(homeOwner = yes|No) = 3/7Pr(MaritalStatus = Single| Yes) = 2/3Estimating conditional probabilities for continuous attributes?•Discretization?•How can we discretize?Naïve Bayes Classifier: Example•X’ = (HomeOwner = No, MaritalStatus = Married, Income=120K)•Need to compute Pr(Y|X’) or Pr(Y)xPr(X’|Y)•But Pr(X’|Y) is–Y = No:•Pr(HO=No|No)xPr(MS=Married|No)xPr(Inc=120K|No) = 4/7x4/7x0.0072 = 0.0024–Y=Yes:•Pr(HO=No|Yes)xPr(MS=Married|Yes)xPr(Inc=120K|Yes) = 1x0x1.2x10-9 = 0Naïve Bayes Classifier: Example•X’ = (HomeOwner = No, MaritalStatus = Married, Income=120K)•Need to compute Pr(Y|X’) or Pr(Y)xPr(X’|Y)•But Pr(X’|Y = Yes) is 0?•Correction process:mnm pnyYxXcjii )|Pr(nc: number of training examples from class yj that take value xin: total number of instances from class yj m: equivalent sample size (balance between prior and posterior)p: user-specified parameter (prior probability)Characteristics of Naïve Bayes Classifier•Robust to isolated noise points –noise points are averaged out•Handles missing values –Ignoring missing-value examples•Robust to irrelevant attributes–If Xi is irrelevant, P(Xi|Y) becomes almost uniform•Correlated attributes degrade the performance of NB
View Full Document