CORNELL CS 4700 - Study Notes - D102708

School name: Cornell University

Course: CS 4700 - Foundations of Artificial Intelligence

Pages: 23

This preview shows pages 1, 2, 22, and 23 out of 23 pages.

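Perceptrons and Optimal Hyperplanes

Example: Majority-Vote Function
• Definition: Majority-Vote Function $f_{majority}$
  – N binary attributes, i.e. $\vec{x} \in \{0,1\}^N$
  – If more than N/2 attributes in $\vec{x}$ are true, then $f_{majority}(\vec{x}) = 1$, else $f_{majority}(\vec{x}) = -1$.
• How can we represent this function as a decision tree?
  – Huge and awkward tree!
• Is there an "easier" representation of $f_{majority}$?

Example: Spam Filtering
• Instance Space X:
  – Feature vector of word occurrences => binary features
  – N features (N typically > 50000)
• Target Concept c:
  – Spam (+1) / Ham (-1)
• Type of function to learn:
  – Set of Spam words S, Set of Ham words H
  – Classify as Spam (+1) if the example contains more Spam words than Ham words.
[Table: word-occurrence vectors with columns viagra, learning, the, dating, lottery, spam?]

Example: Spam Filtering (continued)
• Use weight vector $\vec{w} = (+1, -1, 0, +1, +1)$
  – Compute $\mathrm{sign}(\vec{w} \cdot \vec{x})$
• More generally, we can use real-valued weights to express the "spamminess" of a word.
• $\vec{w} = (+10, -1, -0.3, +1, +5)$
• Which vector is most likely to be spam with this weighting? A = $\vec{x}_1$, B = $\vec{x}_2$, C = $\vec{x}_3$
[Table: word-occurrence vectors with columns viagra, learning, the, dating, lottery, spam?]

Linear Classification Rules
• Hypotheses of the form
  – unbiased: $h_{\vec{w}}(\vec{x}) = \mathrm{sign}(\vec{w} \cdot \vec{x})$
  – biased: $h_{\vec{w},b}(\vec{x}) = \mathrm{sign}(\vec{w} \cdot \vec{x} + b)$
  – Parameter vector $\vec{w}$, scalar $b$
• Hypothesis space H
  – $H_{unbiased} = \{ h_{\vec{w}} : \vec{w} \in \mathbb{R}^N \}$
  – $H_{biased} = \{ h_{\vec{w},b} : \vec{w} \in \mathbb{R}^N, b \in \mathbb{R} \}$
• Notation
  – $\vec{w} \cdot \vec{x} = \sum_{i=1}^{N} w_i x_i$

Geometry of Hyperplane Classifiers
• Linear classifiers divide the instance space with a hyperplane.
• One side is positive, the other side negative.

Homogeneous Coordinates
• Original: $\vec{x} = (x_1, x_2)$, parameters $(w_1, w_2, b)$
• Homogeneous: $\vec{x} = (x_1, x_2, 1)$, $\vec{w} = (w_1, w_2, w_3)$ with $w_3 = b$, so the bias becomes an ordinary weight and $\vec{w} \cdot \vec{x} = w_1 x_1 + w_2 x_2 + b$.

(Batch) Perceptron Algorithm
• Initialize $\vec{w}_0 = \vec{0}$ and choose a learning rate $\eta > 0$.
• Training epoch: for each training example $(\vec{x}_i, y_i)$:
  – if $y_i (\vec{w}_t \cdot \vec{x}_i) \le 0$ (mistake), update $\vec{w}_{t+1} = \vec{w}_t + \eta \, y_i \vec{x}_i$
• Repeat training epochs until an epoch makes no mistakes.

Example: Perceptron
• Training data (as used in the updates below): $\vec{x}_1 = (1,2), y_1 = +1$; $\vec{x}_2 = (3,1), y_2 = +1$; $\vec{x}_3 = (-1,-1), y_3 = -1$; $\vec{x}_4 = (-1,1), y_4 = -1$
• Updates to weight vector:
  – Init: $\vec{w}_0 = \vec{0}$, $\eta = 1$
  – $\vec{w}_0 \cdot \vec{x}_1 = 0$ → incorrect: $\vec{w}_1 = \vec{w}_0 + \eta y_1 \vec{x}_1 = \vec{0} + 1 \cdot 1 \cdot (1,2) = (1,2)$
    Check: $\vec{w}_1 \cdot \vec{x}_1 = (\vec{w}_0 + 1 \cdot 1 \cdot \vec{x}_1) \cdot \vec{x}_1 = \vec{w}_0 \cdot \vec{x}_1 + 1 \cdot 1 \cdot (\vec{x}_1 \cdot \vec{x}_1) = 0 + 5 = 5$
  – $\vec{w}_1 \cdot \vec{x}_2 = (1,2) \cdot (3,1) = 5$ → correct
  – $\vec{w}_1 \cdot \vec{x}_3 = (1,2) \cdot (-1,-1) = -3$ → correct
  – $\vec{w}_1 \cdot \vec{x}_4 = (1,2) \cdot (-1,1) = 1$ → incorrect: $\vec{w}_2 = \vec{w}_1 + \eta y_4 \vec{x}_4 = (1,2) - (-1,1) = (2,1)$
    Check: $\vec{w}_2 \cdot \vec{x}_4 = (\vec{w}_1 + 1 \cdot (-1) \cdot \vec{x}_4) \cdot \vec{x}_4 = \vec{w}_1 \cdot \vec{x}_4 + 1 \cdot (-1) \cdot (\vec{x}_4 \cdot \vec{x}_4) = 1 - 2 = -1$

Example: Reuters Text Classification
[Figure: Reuters text classification, "optimal hyperplane"]

Optimal Hyperplanes
• Assumption: Training examples are linearly separable.

Hard-Margin Separation
• Goal: Find the hyperplane with the largest distance to the closest training examples.
• Support Vectors: the examples with minimal distance to the hyperplane (this distance is the margin δ).
• Optimization Problem (Primal):
  $\min_{\vec{w}, b} \; \frac{1}{2} \vec{w} \cdot \vec{w}$ s.t. $y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1$ for all $i = 1, \dots, n$

Why min ½w·w?
• Maximizing δ while constraining $\vec{w}$ is equivalent to constraining δ and minimizing $\vec{w}$.
  – We want the maximum margin δ; we don't care about $\vec{w}$ itself.
  – But the functional margin $y_i (\vec{w} \cdot \vec{x}_i + b)$ grows with $\|\vec{w}\|$, so just requiring maximum δ would yield an arbitrarily large $\vec{w}$.
  – So we ask for maximum δ but constrain $\vec{w}$.
  – This is equivalent to fixing the functional margin at 1 and minimizing $\vec{w} \cdot \vec{w}$, which gives margin $\delta = 1 / \|\vec{w}\|$.

Non-Separable Training Data
• Limitations of the hard-margin formulation:
  – For some training data, there is no separating hyperplane.
  – Complete separation (i.e. zero training error) can lead to suboptimal prediction error.

Soft-Margin Separation
• Idea: Maximize margin and minimize training error.
• Soft-Margin OP (Primal):
  $\min_{\vec{w}, b, \vec{\xi}} \; \frac{1}{2} \vec{w} \cdot \vec{w} + C \sum_{i=1}^{n} \xi_i$ s.t. $y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$
• Hard-Margin OP (Primal):
  $\min_{\vec{w}, b} \; \frac{1}{2} \vec{w} \cdot \vec{w}$ s.t. $y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1$ for all $i$
• Slack variable $\xi_i$ measures by how much $(\vec{x}_i, y_i)$ fails to achieve margin δ.
• $\sum_i \xi_i$ is an upper bound on the number of training errors (a misclassified example has $\xi_i \ge 1$).
• C is a parameter that controls the trade-off between margin and training error.

Soft-Margin OP (Primal)
[Figure: two classifiers, A and B]
• Which of these two classifiers was produced using a larger value of C?

Controlling Soft-Margin Separation
• $\sum_i \xi_i$ is an upper bound on the number of training errors.
• C is a parameter that controls the trade-off between margin and training error.
• Soft-Margin OP (Primal): as above.

Example: Reuters "acq": Varying C
[Figure: results on the Reuters "acq" category for varying C]

Example: Margin in High-Dimension

Training examples:
             x1  x2  x3  x4  x5  x6  x7    y
Example 1     1   0   0   1   0   0   0    1
Example 2     1   0   0   0   1   0   0    1
Example 3     0   1   0   0   0   1   0   -1
Example 4     0   1   0   0   0   0   1   -1

Hyperplanes:
                w1     w2     w3    w4     w5     w6     w7     b
Hyperplane 1    1      1      0     0      0      0      0      2
Hyperplane 2    0      0      0     1      1     -1     -1      0
Hyperplane 3    1     -1      1     0      0      0      0      0
Hyperplane 4    1     -1      0     0      0      0      0      0
Hyperplane 5    0.95  -0.95   0     0.05   0.05  -0.05  -0.05   …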
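The majority-vote slide asks whether there is an "easier" representation of $f_{majority}$ than a huge decision tree. One candidate is a single biased linear rule. The sketch below assumes $\vec{w} = (1, \dots, 1)$ and $b = -N/2$ (our choice of parameters, not stated in the slides) and checks by brute force over all of $\{0,1\}^N$ for small N that it matches the definition. The function names are hypothetical.

```python
from itertools import product


def majority_vote(x):
    """f_majority as defined in the notes: +1 if more than N/2 attributes are 1, else -1."""
    return 1 if sum(x) > len(x) / 2 else -1


def linear_majority(x):
    """Same function as a biased linear rule with w = (1, ..., 1) and b = -N/2.

    A score of exactly 0 (a tie, only possible when N is even) is mapped to -1,
    matching the 'else -1' branch of the definition.
    """
    n = len(x)
    w = [1.0] * n
    b = -n / 2
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


def agree_on_all_inputs(n):
    """Compare both representations on every x in {0,1}^n."""
    return all(majority_vote(x) == linear_majority(x) for x in product((0, 1), repeat=n))


for n in range(1, 9):
    print(n, agree_on_all_inputs(n))
```

The linear rule needs only N + 1 parameters, whereas a decision tree for the same function has to distinguish many combinations of attribute values.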
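The spam-filtering slides move from a set-based rule (more Spam words than Ham words) to $\mathrm{sign}(\vec{w} \cdot \vec{x})$. A small Python sketch, assuming the vocabulary order viagra, learning, the, dating, lottery from the slides and treating a score of 0 as Ham, checks that $\vec{w} = (+1, -1, 0, +1, +1)$ reproduces the set rule with S = {viagra, dating, lottery} and H = {learning}. It then scores one message that is purely hypothetical (the slide's vectors A, B, C are not recoverable from this preview) under both weight vectors.

```python
from itertools import product

# Vocabulary order taken from the spam-filtering slides.
VOCAB = ("viagra", "learning", "the", "dating", "lottery")
SPAM_WORDS = {"viagra", "dating", "lottery"}  # S: words with weight +1
HAM_WORDS = {"learning"}                      # H: words with weight -1


def set_rule(x):
    """Spam (+1) iff the message contains more Spam words than Ham words, else Ham (-1)."""
    present = {word for word, xi in zip(VOCAB, x) if xi}
    return 1 if len(present & SPAM_WORDS) > len(present & HAM_WORDS) else -1


def linear_rule(w, x):
    """Linear classifier: +1 if w.x > 0, else -1 (a score of exactly 0 counts as Ham)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else -1


W_BINARY = (+1, -1, 0, +1, +1)
W_REAL = (+10, -1, -0.3, +1, +5)

# The +1/-1/0 weights reproduce the set-based rule on all 2^5 binary inputs.
assert all(set_rule(x) == linear_rule(W_BINARY, x) for x in product((0, 1), repeat=5))

# Hypothetical message (not from the notes) containing "learning", "the" and "lottery".
msg = (0, 1, 1, 0, 1)
print(linear_rule(W_BINARY, msg), linear_rule(W_REAL, msg))
```

With binary weights this message is a tie (one Spam word, one Ham word) and is classified as Ham. With the real-valued weights the large weight on "lottery" outweighs "learning" and "the", so it is classified as Spam.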
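The perceptron example can be replayed in code. This is a minimal sketch assuming the unbiased update used in the worked example (on a mistake $y_i(\vec{w} \cdot \vec{x}_i) \le 0$, set $\vec{w} \leftarrow \vec{w} + \eta \, y_i \vec{x}_i$), $\eta = 1$, and the four training points as reconstructed from those updates. The names `perceptron` and `dot` are hypothetical helpers.

```python
def dot(u, v):
    """Inner product w . x."""
    return sum(ui * vi for ui, vi in zip(u, v))


def perceptron(data, eta=1.0, max_epochs=100):
    """Unbiased perceptron: sweep the data in epochs and update on every mistake.

    A mistake is y_i * (w . x_i) <= 0, so a score of exactly 0 counts as incorrect,
    as in the worked example. Returns the final weights and the weights after each update.
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    history = []
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if y * dot(w, x) <= 0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                history.append(tuple(w))
                mistakes += 1
        if mistakes == 0:  # a full epoch without mistakes: the data is separated
            break
    return tuple(w), history


# Training data as reconstructed from the worked example: (x_i, y_i).
DATA = [((1, 2), +1), ((3, 1), +1), ((-1, -1), -1), ((-1, 1), -1)]

w, history = perceptron(DATA)
print(history)  # [(1.0, 2.0), (2.0, 1.0)]
print(w)        # (2.0, 1.0)
```

The first epoch performs exactly the two updates from the slide, giving $\vec{w}_1 = (1,2)$ and $\vec{w}_2 = (2,1)$. The second epoch makes no mistakes, so training stops with $\vec{w} = (2,1)$.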
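For a fixed hyperplane $(\vec{w}, b)$, the smallest slack that satisfies the soft-margin constraints is $\xi_i = \max(0, 1 - y_i(\vec{w} \cdot \vec{x}_i + b))$. The sketch below evaluates the slacks and the primal soft-margin objective for a given $(\vec{w}, b)$. It does not solve the optimization problem. It uses a made-up, non-separable toy dataset (not from the notes) and hypothetical helper names to illustrate that $\sum_i \xi_i$ upper-bounds the number of training errors.

```python
def dot(u, v):
    """Inner product w . x."""
    return sum(ui * vi for ui, vi in zip(u, v))


def slacks(w, b, data):
    """xi_i = max(0, 1 - y_i (w . x_i + b)): the smallest xi_i >= 0 satisfying
    y_i (w . x_i + b) >= 1 - xi_i for this fixed (w, b)."""
    return [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in data]


def soft_margin_objective(w, b, data, C):
    """Primal soft-margin objective 1/2 w.w + C * sum_i xi_i at a fixed (w, b)."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, data))


def training_errors(w, b, data):
    """Number of examples with y_i (w . x_i + b) <= 0 (a score of 0 counts as an error)."""
    return sum(1 for x, y in data if y * (dot(w, x) + b) <= 0)


# Hypothetical toy data: all points lie on the x1-axis with labels -, +, -, + in
# increasing order of x1, so no hyperplane separates them.
DATA = [((2.0, 0.0), +1), ((-2.0, 0.0), -1), ((0.5, 0.0), -1), ((-0.5, 0.0), +1)]
w, b = (1.0, 0.0), 0.0

xi = slacks(w, b, DATA)
print(xi)                                        # [0.0, 0.0, 1.5, 1.5]
print(training_errors(w, b, DATA), sum(xi))      # 2 3.0
print(soft_margin_objective(w, b, DATA, C=1.0))  # 3.5
```

Each misclassified point has $y_i(\vec{w} \cdot \vec{x}_i + b) \le 0$ and therefore $\xi_i \ge 1$, which is why the sum of slacks can never fall below the error count. Increasing C weights this slack term more heavily relative to $\frac{1}{2}\vec{w} \cdot \vec{w}$.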
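The "Margin in High-Dimension" table can be checked numerically. This sketch computes the geometric margin $\min_i y_i(\vec{w} \cdot \vec{x}_i + b) / \|\vec{w}\|$ for hyperplanes 1 to 4, using the values copied from the table. Hyperplane 5 is omitted because its bias $b$ is not visible in this preview. The helper name `geometric_margin` is hypothetical.

```python
import math

# Training examples (x, y) and hyperplanes (w, b) copied from the table.
EXAMPLES = [
    ((1, 0, 0, 1, 0, 0, 0), +1),
    ((1, 0, 0, 0, 1, 0, 0), +1),
    ((0, 1, 0, 0, 0, 1, 0), -1),
    ((0, 1, 0, 0, 0, 0, 1), -1),
]
HYPERPLANES = {
    1: ((1, 1, 0, 0, 0, 0, 0), 2),
    2: ((0, 0, 0, 1, 1, -1, -1), 0),
    3: ((1, -1, 1, 0, 0, 0, 0), 0),
    4: ((1, -1, 0, 0, 0, 0, 0), 0),
}


def geometric_margin(w, b, data):
    """min_i y_i (w . x_i + b) / ||w||; positive iff the hyperplane separates the data."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, y in data) / norm


margins = {k: geometric_margin(w, b, EXAMPLES) for k, (w, b) in HYPERPLANES.items()}
for k, m in margins.items():
    print(f"Hyperplane {k}: margin {m:+.3f}")
```

Under this computation, hyperplane 1 misclassifies examples 3 and 4 (negative margin), while hyperplanes 2, 3 and 4 all separate the data, with margins $1/2$, $1/\sqrt{3}$ and $1/\sqrt{2}$ respectively. Hyperplane 4 therefore has the largest margin among hyperplanes 1 to 4.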
