CMU CS 15381 - 042607clustering

Artificial Intelligence: Representation and Problem Solving
15-381, April 26, 2007

Clustering
(including k-nearest neighbor classification, k-means clustering, cross-validation, and EM, with a brief foray into dimensionality reduction with PCA)

Michael S. Lewicki, Carnegie Mellon


A different approach to classification
• Nearby points are likely to be members of the same class.
• What if we used the points themselves to classify? Classify x in class C_k if x is "similar" to a point we already know is in C_k.
• E.g.: an unclassified point x is more similar to Class 2 than to Class 1.
• Issue: how do we define "similar"? The simplest choice is Euclidean distance:
    d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}
• Could define other metrics depending on the application, e.g. for text documents, images, etc.
[Figure: nearest neighbor classification on the iris dataset; x1 (petal length) vs. x2 (petal width), with points from Class 1, Class 2, and Class 3 and an unclassified point x.]

Potential advantages:
• don't need an explicit model
• the more examples the better
• might handle more complex classes
• easy to implement
• "no brain on the part of the designer"

A complex, non-parametric decision boundary
• How do we control the complexity of this model? Difficult.
• How many parameters? Every data point is a parameter!
• This is an example of a non-parametric model, i.e. one where the model is defined by the data. (Also called instance-based.)
• Can get very complex decision boundaries.
[Example from Martial Hebert.]

Example: Handwritten digits
• Use Euclidean distance to see which known digit is closest to the example to be classified.
• But not all neighbors are the same: "k-nearest neighbors" looks at the k nearest neighbors and chooses the most frequent class among them.
• Caution: it can get expensive to find the neighbors.
[Examples from LeCun et al., 1998. Digit data available at: http://yann.lecun.com/exdb/mnist/]
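To make the k-nearest-neighbor rule concrete, here is a minimal sketch of the classifier described above, assuming plain NumPy; the toy training points, their labels, and the choice k = 3 are invented for illustration and are not from the slides.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """Classify x by a majority vote among its k nearest training points,
    using Euclidean distance as the similarity measure."""
    # Euclidean distance from x to every training point
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over their class labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D data in the spirit of the iris example (values are made up)
X_train = np.array([[1.4, 0.2], [1.5, 0.3],   # class 0
                    [4.5, 1.5], [4.7, 1.4],   # class 1
                    [6.0, 2.5], [5.9, 2.1]])  # class 2
y_train = np.array([0, 0, 1, 1, 2, 2])
print(knn_classify(np.array([4.6, 1.3]), X_train, y_train, k=3))  # -> 1
```

With k = 1 this reduces to plain nearest-neighbor classification; larger k smooths the decision at the cost of more distance computations per query.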
Error Bounds for NN
• Amazing fact: asymptotically, err(1-NN) < 2 err(Bayes). With M classes,
    e_B \le e_{1NN} \le 2 e_B - \frac{M}{M-1} e_B^2
  This is a tight upper bound, achieved in the "zero-information" case when the classes have identical densities.
• For K-NN there are also bounds, e.g. for two classes and odd K:
    e_B \le e_{KNN} \le \sum_{i=0}^{(K-1)/2} \binom{K}{i} \left[ e_B^{i+1} (1 - e_B)^{K-i} + e_B^{K-i} (1 - e_B)^{i+1} \right]
• For more on these bounds, see the book A Probabilistic Theory of Pattern Recognition by L. Devroye, L. Gyorfi & G. Lugosi (1996).

Example: USPS Digits
• Take 16x16 grayscale images (8-bit) of handwritten digits; each digit is just represented as a vector of pixel values.
• Use Euclidean distance in raw pixel space (dumb!) and 7-NN.
• Classification error (leave-one-out): 4.85%.
[Figure: an example digit and its 7 nearest neighbours.]

Nonparametric (Instance-Based) Models
• Q: What are the parameters in K-NN? What is the complexity?
  A: the scalar K and the entire training set. Models which need the entire training set at test time but (hopefully) have very few other parameters are known as nonparametric, instance-based, or case-based.
• What if we want a classifier that uses only a small number of parameters at test time (e.g. for speed or memory reasons)?
  Idea 1: a single linear boundary, of arbitrary orientation.
  Idea 2: many boundaries, but axis-parallel and tree-structured.
[Figure: a single linear boundary in (x1, x2), and an axis-parallel, tree-structured partition with thresholds t1-t5 and regions A-F.]

Linear Classification for Binary Output
• Goal: find the line (or hyperplane) which best separates two classes:
    c(x) = \mathrm{sign}(x^\top w - w_0)
  where w is the weight vector and w_0 is the threshold.
• w is a vector perpendicular to the decision boundary.
• This is the opposite of non-parametric: only d + 1 parameters!
• Typically we augment x with a constant term ±1 (a "bias unit") and then absorb w_0 into w, so we don't have to treat it specially.
[Figure: a linear decision boundary in (x1, x2); example slides from Sam Roweis.]

The problem of using templates (i.e. Euclidean distance)
• Which of these is more like the example, A or B?
• Euclidean distance only cares about how many pixels overlap.
• Could try to define a distance metric that is insensitive to small deviations in position, scale, rotation, etc.
• Digit example: 60,000 training images, 10,000 test images, no "preprocessing".
[Figure: an example digit and two candidates A and B; from Simard et al., 1998.]

Performance results of various classifiers on the MNIST digits (from http://yann.lecun.com/exdb/mnist/):

  Classifier                                       Error rate on test data (%)
  linear                                           12.0
  k = 3 nearest neighbor (Euclidean distance)      5.0
  2-layer neural network (300 hidden units)        4.7
  nearest neighbor (Euclidean distance)            3.1
  k-nearest neighbor (improved distance metric)    1.1
  convolutional neural net                         0.95
  best (the conv. net with elastic distortions)    0.4
  humans                                           0.2 - 2.5

Clustering

Clustering: classification without labels
• In many situations we don't have labeled training data, only unlabeled data.
• E.g., in the iris data set, what if we were just starting out and didn't know any of the classes?
[Figure: the iris data, petal length (cm) vs. petal width (cm), shown without class labels.]

Types of learning
• Supervised: the model parameters {θ_1, ..., θ_n} are fit from the world (or data) together with a desired output {y_1, ..., y_n}.
• Unsupervised: the model parameters {θ_1, ..., θ_n} are fit from the world (or data) alone.
• Reinforcement: the model produces an output and receives a reinforcement signal from the world (next week).

A real example: clustering electrical signals from neurons
• An application of PCA: spike sorting.
• Recording chain: electrode -> filter -> amplifier -> A/D -> oscilloscope and software analysis.
[From "Principal Component Analysis", Apr 23, 2001, Michael S. Lewicki, CMU.]

An extracellular waveform with many different neural "spikes"
[Figure: an extracellular waveform, 0-25 msec, containing many spikes of different shapes.]
• How do we sort the different spikes?
• Basic problem: the only information available is the signal itself; the true classes are always unknown.

Sorting with level detection
[Figure: spike waveforms extracted by a level (threshold) detector, aligned from -0.5 to 1.5 msec.]
• Level detection doesn't ...
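As a rough illustration of the level-detection step just described, here is a minimal sketch that thresholds a one-dimensional voltage trace and cuts out candidate spike windows; the synthetic signal, threshold value, and window length are assumptions made for the example, not taken from the slides.

```python
import numpy as np

def detect_spikes(v, threshold, window=32):
    """Level detection: cut a fixed-length window around each upward
    threshold crossing of the voltage trace v (a crude event detector)."""
    above = v > threshold
    # Samples where the trace crosses from below to above the threshold
    crossings = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    half = window // 2
    # Keep only crossings with a full window inside the trace
    crossings = crossings[(crossings >= half) & (crossings + half < len(v))]
    if len(crossings) == 0:
        return np.empty((0, window))
    return np.stack([v[i - half:i + half] for i in crossings])

# Synthetic trace: low-amplitude noise plus a few sharp "spikes" (made up)
rng = np.random.default_rng(0)
v = 0.05 * rng.standard_normal(5000)
for t in (800, 2300, 4100):
    v[t:t + 5] += 1.0
spikes = detect_spikes(v, threshold=0.5)
print(spikes.shape)  # expected: (3, 32)
```

Windows extracted this way are the raw material that the spike-sorting pipeline above would then reduce (e.g. with PCA) and cluster.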

