Berkeley COMPSCI 188 - Perceptrons

CS 188: Artificial Intelligence, Fall 2007
Lecture 24: Perceptrons
11/20/2007
Dan Klein – UC Berkeley

Outline: General Naïve Bayes; Example: Spam Filtering; Example: OCR; Example: Overfitting; Generalization and Overfitting; Estimation: Smoothing; Estimation: Laplace Smoothing; Estimation: Linear Interpolation; Real NB: Smoothing; Tuning on Held-Out Data; Baselines; Confidences from a Classifier; Generative vs. Discriminative; Errors, and What to Do; Features; Feature Extractors; Some (Vague) Biology; The Binary Perceptron; Example: Spam; Binary Decision Rule; The Multiclass Perceptron; The Perceptron Update Rule; Examples: Perceptron; Mistake-Driven Classification; Properties of Perceptrons; Issues with Perceptrons; Linear Separators; Support Vector Machines; Summary; Similarity Functions; Case-Based Reasoning; Parametric / Non-parametric; Nearest-Neighbor Classification; Basic Similarity; Invariant Metrics; Rotation Invariant Metrics; Tangent Families; Template Deformation

General Naïve Bayes

A general naive Bayes model with class C and features E1, ..., En:
- We only specify how each feature depends on the class: P(C, E1, ..., En) = P(C) * prod_i P(Ei | C)
- Total number of parameters is linear in n: |C| parameters for the prior P(C), plus n * |E| * |C| parameters for the conditionals P(Ei | C), versus |C| * |E|^n parameters for the full joint.

Example: Spam Filtering

Model: P(C, W1, ..., Wn) = P(C) * prod_i P(Wi | C)
Parameters:
- P(C): ham 0.66, spam 0.33
- P(W | ham): the 0.016, to 0.015, and 0.012, ..., free 0.001, click 0.001, ..., morally 0.001, nicely 0.001, ...
- P(W | spam): the 0.021, to 0.013, and 0.011, ..., free 0.005, click 0.004, ..., screens 0.000, minute 0.000, ...

Example: OCR

Parameters for digit classification: a uniform prior over the ten digit classes, plus conditionals for each pixel feature (two example pixels shown):

  digit   P(Y)   P(F_a = on | Y)   P(F_b = on | Y)
    1     0.1         0.01              0.05
    2     0.1         0.05              0.01
    3     0.1         0.05              0.90
    4     0.1         0.30              0.80
    5     0.1         0.80              0.90
    6     0.1         0.90              0.90
    7     0.1         0.05              0.25
    8     0.1         0.60              0.85
    9     0.1         0.50              0.60
    0     0.1         0.80              0.80

Example: Overfitting

2 wins!! (with unsmoothed relative-frequency estimates, a few quirky features can make one class dominate the posterior)

Generalization and Overfitting

- Relative frequency parameters will overfit the training data!
- Unlikely that every occurrence of "minute" is 100% spam
- Unlikely that every occurrence of "seriously" is 100% ham
- What about all the words that don't occur in the training set?
- In general, we can't go around giving unseen events
  zero probability.
- As an extreme case, imagine using the entire email as the only feature:
  - Would get the training data perfect (if labeling is deterministic)
  - Wouldn't generalize at all
- Just making the bag-of-words assumption gives us some generalization, but it isn't enough
- To generalize better, we need to smooth or regularize the estimates

Estimation: Smoothing

Problems with maximum likelihood estimates:
- If I flip a coin once, and it's heads, what's the estimate for P(heads)?
- What if I flip 10 times with 8 heads?
- What if I flip 10M times with 8M heads?

Basic idea:
- We have some prior expectation about parameters (here, the probability of heads)
- Given little evidence, we should skew towards our prior
- Given a lot of evidence, we should listen to the data

Relative frequencies are the maximum likelihood estimates:

  P_ML(x) = c(x) / N

In Bayesian statistics, we think of the parameters as just another random variable, with its own distribution.

Estimation: Laplace Smoothing

Laplace's estimate: pretend you saw every outcome once more than you actually did:

  P_LAP(x) = (c(x) + 1) / (N + |X|)

For the observations H, H, T this gives P_LAP(H) = 3/5 and P_LAP(T) = 2/5. Can derive this as a MAP estimate with Dirichlet priors (see cs281a).

Laplace's estimate (extended): pretend you saw every outcome k extra times:

  P_LAP,k(x) = (c(x) + k) / (N + k|X|)

- What's Laplace with k = 0? (the maximum likelihood estimate)
- k is the strength of the prior

Laplace for conditionals: smooth each condition independently:

  P_LAP,k(x | y) = (c(x, y) + k) / (c(y) + k|X|)

Estimation: Linear Interpolation

In practice, Laplace often performs poorly for P(X|Y):
- when |X| is very large
- when |Y| is very large

Another option: linear interpolation:
- Also get P(X) from the data
- Make sure the estimate of P(X|Y) isn't too different from P(X):

  P_LIN(x | y) = α P̂(x | y) + (1 − α) P̂(x)

- What if α is 0?
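The smoothing estimates above can be checked with a short sketch (a minimal illustration in plain Python with toy data, not code from the lecture):

```python
from collections import Counter

def laplace(observations, outcomes, k=1):
    """Laplace-smoothed estimate: P(x) = (c(x) + k) / (N + k*|X|).

    k = 0 recovers the maximum likelihood (relative frequency) estimate.
    """
    counts = Counter(observations)
    n = len(observations)
    return {x: (counts[x] + k) / (n + k * len(outcomes)) for x in outcomes}

def interpolate(p_x_given_y, p_x, alpha):
    """Linear interpolation: alpha * P(x|y) + (1 - alpha) * P(x)."""
    return {x: alpha * p_x_given_y[x] + (1 - alpha) * p_x[x] for x in p_x}

# The coin example from the slides: observe H, H, T.
print(laplace(["H", "H", "T"], ["H", "T"]))        # {'H': 0.6, 'T': 0.4}
print(laplace(["H", "H", "T"], ["H", "T"], k=0))   # ML estimate: 2/3 and 1/3

# alpha = 0 ignores the conditional entirely; alpha = 1 ignores the prior P(x).
cond, marg = {"a": 0.9, "b": 0.1}, {"a": 0.5, "b": 0.5}
print(interpolate(cond, marg, 0.0))  # equals marg
print(interpolate(cond, marg, 1.0))  # equals cond
```

Note how the two edge cases of α answer the slide's rhetorical question: α = 0 throws away the conditional evidence, α = 1 does no smoothing at all.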
- What if α is 1?
- For even better ways to estimate parameters, as well as details of the math, see cs281a and cs294

Real NB: Smoothing

For real classification problems, smoothing is critical. New odds ratios after smoothing:

- ham-indicative: helvetica 11.4, seems 10.8, group 10.2, ago 8.4, areas 8.3, ...
- spam-indicative: verdana 28.8, Credit 28.4, ORDER 27.2, <FONT> 26.9, money 26.5, ...

Do these make more sense?

Tuning on Held-Out Data

Now we've got two kinds of unknowns:
- Parameters: the probabilities P(X|Y), P(Y)
- Hyperparameters, like the amount of smoothing to do: k, α

Where to learn?
- Learn parameters from training data
- Must tune hyperparameters on different data (why?)
- For each value of the hyperparameters, train and test on the held-out data
- Choose the best value and do a final test on the test data

Baselines

- First task: get a baseline
- Baselines are very simple "straw man" procedures
- They help determine how hard the task is, and what a "good" accuracy is

Weak baseline: the most-frequent-label classifier
- Gives all test instances whatever label was most common in the training set
- E.g. for spam filtering, might label everything as ham
- Accuracy might be very high if the problem is skewed

For real research, usually use previous work as a (strong) baseline.

Confidences from a Classifier

The confidence of a probabilistic classifier:
- The posterior over the top label
- Represents how sure the classifier is of the classification
- Any probabilistic model will have confidences
- No guarantee the confidence is correct

Calibration:
- Weak calibration: higher confidences mean higher accuracy
- Strong calibration: confidence predicts accuracy rate
- What's the value of calibration?

Generative vs. Discriminative

Generative classifiers:
- E.g. naïve Bayes
- We build a causal model of the variables
- We then query that model for causes, given evidence

Discriminative classifiers:
- E.g.
  the perceptron (next)
- No causal model, no Bayes rule, often no probabilities
- Try to predict the output directly
- Loosely: mistake-driven rather than model-driven

Errors, and What to Do

Examples of errors:

"Dear GlobalSCAPE Customer, GlobalSCAPE has partnered with ScanSoft to offer you the latest version of OmniPage Pro, for just $99.99* - the regular list price is $499! The most common question we've received about this offer is - Is this genuine? We would like to assure you that this offer is authorized by ScanSoft, is genuine and valid. You can get the . . ."

". . . To receive your $30 Amazon.com promotional certificate, click through to http://www.amazon.com/apparel and see the prominent link for the $30 offer. All details are there. We hope you enjoyed receiving this message. However, if you'd . . ."
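The preview omits the perceptron slides themselves, but the outline names the binary perceptron and its update rule. As a hedged sketch of the standard mistake-driven algorithm (on each error, w ← w + y·f(x)), using dict-based feature vectors and toy data of my own:

```python
def predict(weights, features):
    """Binary decision rule: positive class iff w . f(x) >= 0."""
    activation = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if activation >= 0 else -1

def train(data, epochs=10):
    """Mistake-driven updates: only change w when a prediction is wrong."""
    weights = {}
    for _ in range(epochs):
        for features, label in data:
            if predict(weights, features) != label:
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights

# Tiny illustrative problem (made-up data, not from the slides):
# label +1 = ham, -1 = spam.
data = [
    ({"free": 1, "click": 1}, -1),
    ({"hello": 1, "meeting": 1}, 1),
]
w = train(data)
print(predict(w, {"free": 1}))     # -1
print(predict(w, {"meeting": 1}))  # 1
```

Unlike naive Bayes, nothing here is a probability: the weights are just accumulated corrections, which is what "mistake-driven rather than model-driven" means above.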