UCLA STAT 231 - Lecture12-Percept


Lecture notes for Stat 231: Pattern Recognition and Machine Learning

Contents
1. Stat 231. A.L. Yuille. Fall 2004.
2. Linear Separation
3. Linear Separation
4. Perceptron Rule
5. Perceptron Convergence
6. Perceptron Convergence
7. Perceptron Capacity
8. Generalization and Capacity
9. Perceptron Capacity
10. Capacity and Generalization
11. Multi-Layer Perceptrons
12. Multilayer Perceptrons
13. Multilayer Perceptrons
Summary

1. Stat 231. A.L. Yuille. Fall 2004.
Perceptron rule and convergence proof.
Capacity of Perceptrons.
Multi-layer Perceptrons.
Read sections 5.4, 5.5 and 9.6.8 of Duda, Hart, and Stork.

2. Linear Separation
We are given N samples in feature space, each labelled as belonging to one of two classes. Can we find a hyperplane in feature space, through the origin, that separates the two types of samples?

3. Linear Separation
For the two-class case, simplify by replacing every sample of the second class by its negative. Then find a weight vector w such that w . x_i > 0 for every sample x_i. The weight vector is almost never unique. Determine the weight vector that has the biggest margin m (> 0), i.e. w . x_i >= m for all i (next lecture). The approach is discriminative: no attempt is made to model the probability distributions. Recall that the decision boundary is a hyperplane if the class distributions are Gaussian with identical covariance.

4. Perceptron Rule
Assume there is a hyperplane separating the two classes. How can we find it?
Single-sample Perceptron rule: order the samples x_1, ..., x_N, set the initial weight vector to w = 0, and loop over j; if x_j is misclassified (w . x_j <= 0), set w <- w + x_j. Repeat until all samples are classified correctly. (A code sketch of this rule is given after slide 9.)

5. Perceptron Convergence
Novikoff's theorem: the single-sample Perceptron rule will converge to a solution weight vector, if one exists.
Proof. Suppose w* is a separating weight vector. Then the squared distance ||w - a w*||^2, for a suitable scaling a, decreases by at least a fixed positive amount for each misclassified sample. Initialize the weight vector at 0. Then the number of weight changes is bounded (the bound is computed on the next slide).

6. Perceptron Convergence
Proof of claim. If x_j is misclassified then w(k) . x_j <= 0. Using the update w(k+1) = w(k) + x_j,
||w(k+1) - a w*||^2 = ||w(k) - a w*||^2 + 2 (w(k) - a w*) . x_j + ||x_j||^2
                   <= ||w(k) - a w*||^2 - 2 a m + R^2,
where m = min_i w* . x_i > 0 and R^2 = max_i ||x_i||^2. Choosing a = R^2 / m makes the decrease at least R^2 per correction, so starting from w(0) = 0 the number of corrections is at most ||a w*||^2 / R^2 = R^2 ||w*||^2 / m^2.

7. Perceptron Capacity
The Perceptron was very influential, and unrealistic claims were made about its abilities (1950s, early 1960s). The model is an idealized model of neurons. An entire book describing the limited capacity of Perceptrons was published in the late 1960s (Minsky and Papert). Some classifications, such as exclusive-or, cannot be performed by linear separation. But, from learning theory, limited capacity is good.

8. Generalization and Capacity
The Perceptron is useful precisely because it has finite capacity and so cannot represent all classifications. The amount of training data required to ensure generalization needs to be larger than the capacity; infinite capacity would require infinite data. A full definition of Perceptron capacity must wait until we introduce the Vapnik-Chervonenkis (VC) dimension, but the following result (Cover) gives the basic idea.

9. Perceptron Capacity
Suppose we have n sample points in a d-dimensional feature space. Assume that these points are in general position: no subset of (d+1) points lies in a (d-1)-dimensional subspace. Let f(n,d) be the fraction of the 2^n dichotomies of the n points which can be expressed by linear separation. It can be shown (Duda, Hart, and Stork) that f(n,d) = 1 for n <= d+1; for larger n it is given by Cover's function-counting formula (a sketch follows below). There is a critical value n = 2(d+1): f(n,d) is close to 1 for n << 2(d+1), close to 0 for n >> 2(d+1), and the transition is rapid for large d.
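
A short Python sketch of the capacity fraction f(n,d) from slide 9, assuming the standard statement of Cover's function-counting result as given in Duda, Hart, and Stork: f(n,d) = 1 for n <= d+1, and f(n,d) = 2^(1-n) * sum_{i=0}^{d} C(n-1, i) otherwise. The formula used here is that standard statement, and the function name is illustrative, rather than a quotation from the slide. The sketch shows the rapid transition around the critical value n = 2(d+1), where f equals exactly 1/2.

    from math import comb

    def f(n, d):
        """Fraction of the 2^n dichotomies of n points in general position in
        d dimensions that are linearly separable (Cover's formula, assumed)."""
        if n <= d + 1:
            return 1.0
        return 2.0 ** (1 - n) * sum(comb(n - 1, i) for i in range(d + 1))

    d = 25
    for n in (d + 1, 2 * (d + 1) - 10, 2 * (d + 1), 2 * (d + 1) + 10, 4 * (d + 1)):
        # f is close to 1 well below the critical value 2(d+1), exactly 1/2 at it,
        # and close to 0 well above it; the transition sharpens as d grows.
        print(f"n = {n:4d}, 2(d+1) = {2 * (d + 1):4d}, f(n,d) = {f(n, d):.3f}")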
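
A minimal Python sketch of the single-sample Perceptron rule from slide 4, together with a check of the convergence bound from slide 6. The toy Gaussian data, the random seed, and the use of the generating vector w_star as the separating weight w* in the bound are assumptions made for this illustration, not content from the original notes.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linearly separable data. Samples of the second class are taken to have
    # already been replaced by their negatives (slide 3), so the goal is w . x > 0
    # for every sample x.
    w_star = np.array([1.0, 2.0, -1.0])
    X = rng.normal(size=(500, 3))
    X = X[np.abs(X @ w_star) > 1.0]            # keep points with a clear margin
    X *= np.sign(X @ w_star)[:, None]          # flip samples so w_star separates them

    w = np.zeros(3)                            # initialize the weight vector at 0 (slide 5)
    updates = 0
    while True:
        misclassified = [x for x in X if w @ x <= 0]
        if not misclassified:
            break                              # all samples are classified correctly
        w = w + misclassified[0]               # fixed-increment update on one mistake
        updates += 1

    # Bound from slide 6: number of corrections <= R^2 ||w*||^2 / m^2.
    R2 = np.max(np.sum(X ** 2, axis=1))
    m = np.min(X @ w_star)
    bound = R2 * np.sum(w_star ** 2) / m ** 2
    print(f"updates = {updates}, bound = {bound:.1f}")
    assert updates <= bound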

10. Capacity and Generalization
The Perceptron capacity is d+1. The probability of finding a separating hyperplane by chance alignment of the samples decreases rapidly for n > 2(d+1).

11. Multi-Layer Perceptrons
Multilayer Perceptrons were introduced in the 1980s to increase capacity. They were motivated by biological arguments (dubious).
Key idea: replace the binary decision rule by a sigmoid function of the form 1/(1 + exp(-a/T)), which tends to a step function as T tends to 0.
The architecture consists of input units, hidden units, and output units, with weights connecting the input units to the hidden units, and the hidden units to the output units.

12. Multilayer Perceptrons
Multilayer perceptrons can represent any function provided there is a sufficient number of hidden units, but the number of hidden units required may be enormous. The ability to represent any function may also be bad, because of generalization/memorization. Multilayer perceptrons are difficult to analyze; they are like "black boxes", and when they are successful there is often a simpler, more transparent alternative. The neuronal plausibility of multilayer perceptrons is unclear.

13. Multilayer Perceptrons
Train the multilayer perceptron using training data. Define an error function for each sample (e.g. the squared difference between the network output and the target). Minimize the error function for each sample by steepest descent on the weights: the backpropagation algorithm (propagation of errors). (A sketch appears after the summary.)

Summary
Perceptron and linear separability.
Perceptron rule and convergence proof.
Capacity of Perceptrons.
Multi-layer Perceptrons.
Next lecture: Support Vector Machines for Linear
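
A minimal sketch relating to slides 11-13: a one-hidden-layer perceptron with sigmoid units, trained by steepest descent on the summed per-sample squared error via backpropagation, applied to the exclusive-or problem from slide 7 (which no single linear unit can represent). The layer sizes, learning rate, bias handling, random seed, and NumPy implementation are assumptions made for this illustration, not content from the original notes.

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(a, T=1.0):
        # Smooth replacement for the binary decision rule; approaches a step function as T -> 0 (slide 11).
        return 1.0 / (1.0 + np.exp(-a / T))

    # Exclusive-or inputs (with a constant bias input appended) and targets.
    X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    t = np.array([[0.0], [1.0], [1.0], [0.0]])

    # Weights connecting the input units to the hidden units, and the hidden units
    # (plus a hidden bias unit) to the output unit.
    W1 = rng.normal(scale=1.0, size=(3, 8))
    W2 = rng.normal(scale=1.0, size=(9, 1))
    eta = 0.5                                   # steepest-descent step size

    for step in range(20000):
        h = sigmoid(X @ W1)                     # hidden-unit activities
        h_b = np.hstack([h, np.ones((4, 1))])   # append the hidden bias unit
        y = sigmoid(h_b @ W2)                   # output-unit activity
        err = y - t                             # derivative of the 0.5*sum((y-t)^2) error
        # Backpropagation: send the output error back through the weights.
        delta_out = err * y * (1 - y)
        delta_hid = (delta_out @ W2[:-1].T) * h * (1 - h)
        W2 -= eta * h_b.T @ delta_out
        W1 -= eta * X.T @ delta_hid

    # With enough hidden units the outputs typically approach the XOR targets 0, 1, 1, 0.
    print("outputs:", y.ravel().round(2), "error:", float(0.5 * np.sum((y - t) ** 2)))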

