CS 188: Artificial Intelligence
Fall 2007
Lecture 24: Perceptrons
11/20/2007
Dan Klein – UC Berkeley

Outline: General Naïve Bayes; Example: Spam Filtering; Example: OCR; Example: Overfitting; Generalization and Overfitting; Estimation: Smoothing; Estimation: Laplace Smoothing; Estimation: Linear Interpolation; Real NB: Smoothing; Tuning on Held-Out Data; Baselines; Confidences from a Classifier; Generative vs. Discriminative; Errors, and What to Do; What to Do About Errors?; Features; Feature Extractors; Some (Vague) Biology; The Binary Perceptron; Example: Spam; Binary Decision Rule; The Multiclass Perceptron; Example; The Perceptron Update Rule; Examples: Perceptron; Mistake-Driven Classification; Properties of Perceptrons; Issues with Perceptrons; Linear Separators; Support Vector Machines; Summary; Similarity Functions; Case-Based Reasoning; Parametric / Non-parametric; Nearest-Neighbor Classification; Basic Similarity; Invariant Metrics; Rotation Invariant Metrics; Tangent Families; Template Deformation

General Naïve Bayes

A general naive Bayes model:
- We only specify how each feature depends on the class
- Total number of parameters is linear in n

[Figure: graphical model with class node C and feature nodes E1, E2, ..., En]
- P(C): |C| parameters
- P(Ei|C), one table per feature: n x |E| x |C| parameters
- (compare: a full joint table over C, E1, ..., En would need |C| x |E|^n parameters)

Example: Spam Filtering

Model: P(C, W1, ..., Wn) = P(C) ∏i P(Wi|C)

Parameters:

  P(C):       ham: 0.66, spam: 0.33

  P(W|spam):  the: 0.016, to: 0.015, and: 0.012, ...
              free: 0.001, click: 0.001, ...
              morally: 0.001, nicely: 0.001, ...

  P(W|ham):   the: 0.021, to: 0.013, and: 0.011, ...
              free: 0.005, click: 0.004, ...
              screens: 0.000, minute: 0.000, ...

Example: OCR

  digit   P(Y)   P(pixel 1 on | Y)   P(pixel 2 on | Y)
    1     0.1         0.01                0.05
    2     0.1         0.05                0.01
    3     0.1         0.05                0.90
    4     0.1         0.30                0.80
    5     0.1         0.80                0.90
    6     0.1         0.90                0.90
    7     0.1         0.05                0.25
    8     0.1         0.60                0.85
    9     0.1         0.50                0.60
    0     0.1         0.80                0.80

Example: Overfitting

[Figure: posteriors for a test digit image under relative-frequency estimates]
2 wins!!

Generalization and Overfitting

- Relative-frequency parameters will overfit the training data!
  - Unlikely that every occurrence of "minute" is 100% spam
  - Unlikely that every occurrence of "seriously" is 100% ham
  - What about all the words that don't occur in the training set?
  - In general, we can't go around giving unseen events zero probability
- As an extreme case, imagine using the entire email as the only feature
  - Would get the training data perfect (if deterministic labeling)
  - Wouldn't generalize at all
- Just making the bag-of-words assumption gives us some generalization, but it isn't enough
- To generalize better: we need to smooth or regularize the estimates

Estimation: Smoothing

- Problems with maximum likelihood estimates:
  - If I flip a coin once, and it's heads, what's the estimate for P(heads)?
  - What if I flip 10 times with 8 heads?
  - What if I flip 10M times with 8M heads?
- Basic idea:
  - We have some prior expectation about parameters (here, the probability of heads)
  - Given little evidence, we should skew towards our prior
  - Given a lot of evidence, we should listen to the data

Estimation: Smoothing (continued)

- Relative frequencies are the maximum likelihood estimates:
    θ_ML = argmax_θ P(X | θ),  which gives  P_ML(x) = c(x) / N
  (here c(x) is the count of x and N is the total number of observations)
- In Bayesian statistics, we think of the parameters as just another random variable, with its own distribution:
    θ_MAP = argmax_θ P(θ | X)

Estimation: Laplace Smoothing

- Laplace's estimate: pretend you saw every outcome once more than you actually did:
    P_LAP(x) = (c(x) + 1) / (N + |X|)
- Example (observations H H T): P_ML(H) = 2/3, but P_LAP(H) = 3/5
- Can derive this as a MAP estimate with Dirichlet priors (see cs281a)

Estimation: Laplace Smoothing (continued)

- Laplace's estimate (extended): pretend you saw every outcome k extra times:
    P_LAP,k(x) = (c(x) + k) / (N + k|X|)
  - What's Laplace with k = 0? (the maximum likelihood estimate)
  - k is the strength of the prior
- Laplace for conditionals: smooth each condition independently:
    P_LAP,k(x|y) = (c(x, y) + k) / (c(y) + k|X|)

Estimation: Linear Interpolation

- In practice, Laplace often performs poorly for P(X|Y):
  - When |X| is very large
  - When |Y| is very large
- Another option: linear interpolation
  - Also get P(X) from the data
  - Make sure the estimate of P(X|Y) isn't too different from P(X):
      P_LIN(x|y) = α · P_ML(x|y) + (1 − α) · P_ML(x)
  - What if α is 0?
- What if α is 1?
- For even better ways to estimate parameters, as well as details of the math, see cs281a and cs294

Real NB: Smoothing

- For real classification problems, smoothing is critical
- New odds ratios:

  ham-indicative:   helvetica: 11.4, seems: 10.8, group: 10.2, ago: 8.4, areas: 8.3, ...
  spam-indicative:  verdana: 28.8, Credit: 28.4, ORDER: 27.2, <FONT>: 26.9, money: 26.5, ...

- Do these make more sense?

Tuning on Held-Out Data

- Now we've got two kinds of unknowns:
  - Parameters: the probabilities P(X|Y), P(Y)
  - Hyperparameters, like the amount of smoothing to do: k, α
- Where to learn?
  - Learn parameters from training data
  - Must tune hyperparameters on different data (why?)
  - For each value of the hyperparameters, train and test on the held-out data
  - Choose the best value and do a final test on the test data

Baselines

- First task: get a baseline
  - Baselines are very simple "straw man" procedures
  - Help determine how hard the task is
  - Help know what a "good" accuracy is
- Weak baseline: most-frequent-label classifier
  - Gives all test instances whatever label was most common in the training set
  - E.g. for spam filtering, might label everything as ham
  - Accuracy might be very high if the problem is skewed
- For real research, usually use previous work as a (strong) baseline

Confidences from a Classifier

- The confidence of a probabilistic classifier:
  - Posterior probability of the top label
  - Represents how sure the classifier is of the classification
  - Any probabilistic model will have confidences
  - No guarantee the confidence is correct
- Calibration
  - Weak calibration: higher confidences mean higher accuracy
  - Strong calibration: confidence predicts accuracy rate
  - What's the value of calibration?

Generative vs. Discriminative

- Generative classifiers:
  - E.g. naïve Bayes
  - We build a causal model of the variables
  - We then query that model for causes, given evidence
- Discriminative classifiers:
  - E.g. perceptron (next)
  - No causal model, no Bayes rule, often no probabilities
  - Try to predict the output directly
  - Loosely: mistake-driven rather than model-driven

Errors, and What to Do

Examples of errors:

  "Dear GlobalSCAPE Customer, GlobalSCAPE has partnered with ScanSoft to offer you the latest version of OmniPage Pro, for just $99.99* - the regular list price is $499! The most common question we've received about this offer is - Is this genuine? We would like to assure you that this offer is authorized by ScanSoft, is genuine and valid. You can get the . . ."

  ". . . To receive your $30 Amazon.com promotional certificate, click through to http://www.amazon.com/apparel and see the prominent link for the $30 offer. All details are there. We hope you enjoyed receiving this message. However, if you'd"
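The Laplace smoothing described above can be sketched in a few lines of Python. This is an illustrative sketch, not course code; the function name is mine:

```python
from collections import Counter

def laplace_estimate(observations, outcomes, k=1):
    """Laplace smoothing: pretend every outcome was seen k extra times.
    P_LAP,k(x) = (c(x) + k) / (N + k * |X|)."""
    counts = Counter(observations)
    n = len(observations)
    denom = n + k * len(outcomes)
    return {x: (counts[x] + k) / denom for x in outcomes}

# The coin example from the slides: observations H H T.
flips = ["H", "H", "T"]
p_ml = laplace_estimate(flips, ["H", "T"], k=0)   # k = 0 recovers max likelihood
p_lap = laplace_estimate(flips, ["H", "T"], k=1)
print(p_ml["H"])   # 2/3
print(p_lap["H"])  # (2 + 1) / (3 + 2) = 0.6
```

Note that with k = 0 the denominator collapses to N and the estimate is exactly the relative frequency, matching the "strength of the prior" reading of k.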
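Linear interpolation is likewise a one-liner; below is a minimal sketch with hypothetical relative-frequency inputs (the word probabilities are made up for illustration):

```python
def interpolate(p_ml_cond, p_ml_marginal, alpha):
    """P_LIN(x|y) = alpha * P_ML(x|y) + (1 - alpha) * P_ML(x).
    alpha = 1 keeps the conditional; alpha = 0 backs off fully to P(x)."""
    return {x: alpha * p_ml_cond[x] + (1 - alpha) * p_ml_marginal[x]
            for x in p_ml_marginal}

# Hypothetical estimates: "free" was never seen in ham, so P_ML(free|ham) = 0.
p_word_given_ham = {"the": 0.7, "free": 0.0, "click": 0.3}
p_word = {"the": 0.6, "free": 0.2, "click": 0.2}
smoothed = interpolate(p_word_given_ham, p_word, alpha=0.5)
print(smoothed["free"])  # 0.5 * 0.0 + 0.5 * 0.2 = 0.1, no longer zero
```

Because both inputs are proper distributions, any convex combination of them still sums to one, so the interpolated estimate remains a valid distribution.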
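The held-out tuning recipe from the lecture can be sketched as a simple grid search. Everything here is a stand-in: `train_fn`, `accuracy_fn`, and the score table are hypothetical, not part of the course materials:

```python
def tune_k(train_fn, accuracy_fn, train_data, held_out_data, candidates):
    """For each hyperparameter value, train on the training data and score on
    held-out data; return the best value. A final number should then be
    reported once on a separate, untouched test set."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        model = train_fn(train_data, k)
        acc = accuracy_fn(model, held_out_data)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k

# Toy stand-ins: the "model" is just the smoothing value itself, and the
# held-out accuracies come from a fabricated score table that peaks at k = 1.
scores = {0: 0.80, 1: 0.93, 10: 0.88, 100: 0.75}
best = tune_k(lambda data, k: k, lambda m, d: scores[m],
              None, None, [0, 1, 10, 100])
print(best)  # 1
```

The design point the slides make is visible in the signature: the data used to pick k is disjoint from both the training set (which fits the parameters) and the test set (which is touched only once, at the end).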