CMU CS 10-601 - Review


10-601 Review
Tom Mitchell and Aarti Singh
Machine Learning 10-601, Dec 8, 2011

Machine Learning Algorithm

Goal: learn a rule Z -> f(Z) that optimizes some objective, loss(f(Z)).
Z can be X or (X, Y), modeled as a random variable, and we optimize E_Z[loss(f(Z))].
Training data D = {Z_i}_{i=1}^n goes into the learning algorithm, which outputs a rule f_n.
Why do we need training data?

Modeling Distributions

Parametric: P_θ(Z)
- Gaussian (θ = μ, σ): continuous random variables
- Bernoulli (θ): binary/boolean random variable
- Binomial (n, θ): sum of binary/boolean random variables
- Multinomial (n, θ_1, θ_2, …, θ_k): sum of k-ary random variables
- Beta, Dirichlet (conjugate priors for the binomial and multinomial), Poisson, …

If θ is itself a random variable, P_θ(Z) = P(Z | θ) is the likelihood, and Bayes rule gives the posterior:
P(θ | Z) = P(Z | θ) P(θ) / P(Z)

Modeling Distributions (continued)

Conditional independence assumptions for joint distributions:
- Markov models: p(X) = ∏_{i=1}^n p(X_i | X_{i-1})
- Hidden Markov models: p(X, Z) = ∏_{i=1}^n p(X_i | Z_i) ∏_{i=1}^n p(Z_i | Z_{i-1})
- Bayes nets / graphical models: p(X) = ∏_{i=1}^n p(X_i | pa(X_i))

Machine Learning Problems

Broad categories:
- Unsupervised learning: density estimation, clustering, dimensionality reduction
- Supervised learning: classification, regression
- Semi-supervised learning
- Active learning
- Many more …

Unsupervised & Supervised Learning

Unsupervised learning is learning without a teacher: documents alone go into the learning algorithm, which outputs a model for the word distribution or a clustering of similar documents.
Supervised learning is learning with a teacher: (document, topic) pairs go into the learning algorithm, which outputs a mapping between documents and topics.

Semi-supervised & Active Learning

Semi-supervised learning: randomly labeled examples; the algorithm sees unlabeled documents plus some (document, topic) pairs.
Active learning: selectively labeled examples; the learner chooses which documents get labeled.

Unsupervised Learning

Density estimation:
- Parametric (MLE, MAP)
- Nonparametric (histogram, kernel)

Dimensionality reduction:
- Feature selection
- Principal Component Analysis (PCA)
- Laplacian eigenmaps

Clustering:
- Gaussian mixture models
- k-means
- Spectral

Supervised Learning: Regression

Continuous labels; mean square error loss.
Optimal estimation rule: f*(X) = E[Y | X], which is the MLE under P(Y | X) = N(f*(X), σ²).
- Linear regression: f(X) = Xw, X = [x_1, x_2, …, x_d]
- Polynomial regression: X = [x_1², x_1 x_2, x_2², …]
- Basis regression: X = [φ_1(x), φ_2(x), …, φ_d(x)]
- Regularized versions (MAP)
- Neural networks: f(X) is a nonlinear combination of multiple logistic units
- Kernel (locally weighted) regression: weighted mean square error

Supervised Learning: Classification

Discrete labels; probability-of-error loss.
Bayes optimal classification rule: f*(X) = argmax_Y P(Y | X); plug in the MLE or MAP estimate of the distribution model (a plug-in sketch follows below).
- Naïve Bayes
- Decision trees
- Logistic regression
- k-nearest neighbor
- SVM
- Boosting
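The plug-in recipe above (fit the distribution model by MLE, then classify with f*(X) = argmax_Y P(Y | X)) is easy to see in code. Here is a minimal Gaussian Naïve Bayes sketch, assuming NumPy; the function names and toy data are illustrative, not from the slides:

```python
import numpy as np

def fit_gnb(X, y):
    """Plug-in MLE for Gaussian Naive Bayes: per-class prior, means, variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (
            len(Xc) / len(X),       # prior P(Y = c): MLE is the class frequency
            Xc.mean(axis=0),        # per-feature mean (MLE)
            Xc.var(axis=0) + 1e-9,  # per-feature variance, small floor for stability
        )
    return params

def predict_gnb(params, x):
    """Bayes rule: argmax_y P(y) * prod_j N(x_j; mu_{y,j}, var_{y,j})."""
    def log_joint(c):
        prior, mu, var = params[c]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(prior) + log_lik
    return max(params, key=log_joint)

# Toy data: two 2-D Gaussian classes centered at 0 and 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(predict_gnb(fit_gnb(X, y), np.array([2.5, 3.0])))  # expect class 1
```

The "naïve" conditional independence assumption shows up as the per-feature product inside log_joint: the class-conditional is modeled as a diagonal Gaussian.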
Comparison Chart for Classification

Classifiers compared: decision trees, k-NN, Gaussian Naïve Bayes, logistic regression, neural networks, SVM, boosting (before midterm); HMM, Bayes net (after midterm).
Dimensions to compare: generative/discriminative, loss function, decision boundary, output, algorithm, model complexity, relation to others, parametric/nonparametric.

After the Midterm: Summary

Hidden Markov models
- time-series / sequential modeling
- representation, parameters
- evaluate the probability of an output sequence
- decode hidden states
- learn the parameters

Neural networks
- nonlinear classifier
- layers of multiple logistic units
- training by backpropagation
- local minima

Dimensionality reduction
- feature selection
- PCA: linear; directions of maximum variance; SVD
- Laplacian eigenmaps: nonlinear

Clustering
- k-means: isotropic, convex clusters
- spectral: connectivity based

Nonparametric methods
- histogram, kernel density estimation
- kernel regression
- k-NN classifier

Support vector machines
- hard margin, soft margin
- support vectors
- dual formulation, kernel trick

Boosting
- weak base classifiers trained on re-weighted data
- AdaBoost algorithm, exponential loss

Four Fundamentals for ML

1. Learning is an optimization problem
- many algorithms are best understood as optimization algorithms
- what objective do they optimize, and how? Local minima?
- gradient descent/ascent as a general fallback approach (a sketch appears after the estimator example below)

2. Learning is a parameter estimation problem
- the more training data, the more accurate the statistical estimates
- MLE, MAP, M(Conditional)LE, … (an MLE vs. MAP sketch appears below)
- to measure the accuracy of a learned model, we must use test (not training) data

3. Error arises from three sources
- unavoidable error, bias, variance
- PAC learning theory: a probabilistic bound on overfitting, error_true - error_train

Bias and Variance of Estimators

Given some estimator Y for some parameter θ, note that Y is a random variable (why? because it is computed from randomly drawn training data).
- bias of estimator Y: E[Y] - θ
- variance of estimator Y: E[(Y - E[Y])²]

Consider when:
- θ is the probability of "heads" for my coin, and Y is the proportion of heads observed in 3 flips

Consider when:
- θ is the vector of correct parameters for the learner, and Y is the vector of parameters output by the learning algorithm
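The coin example above can be checked by simulation: with θ the probability of heads and Y the proportion of heads in 3 flips, Y is unbiased (E[Y] = θ) but has variance θ(1 - θ)/3. A small sketch, assuming NumPy; the particular θ and trial count are illustrative:

```python
import numpy as np

# Estimator: Y = proportion of heads in n = 3 flips of a coin with P(heads) = theta.
rng = np.random.default_rng(0)
theta, n_flips, n_trials = 0.7, 3, 100_000

# Each trial is one draw of the "training data" (3 flips); Y varies across trials,
# which is exactly why Y is a random variable.
Y = rng.binomial(n_flips, theta, size=n_trials) / n_flips

print("bias     E[Y] - theta      ~", Y.mean() - theta)   # near 0: unbiased
print("variance E[(Y - E[Y])^2]   ~", Y.var())            # near theta*(1-theta)/3
print("theory   theta*(1-theta)/3 =", theta * (1 - theta) / 3)
```

With only 3 flips the variance is large (0.07 here), which is the slide's point: an unbiased estimator can still be far off on any single training set.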

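Fundamental 1 names gradient descent/ascent as the general fallback optimizer. A minimal sketch of gradient ascent on the conditional log-likelihood of logistic regression (the M(Conditional)LE from Fundamental 2), assuming NumPy; the step size, iteration count, and toy data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_mcle(X, y, lr=0.1, steps=500):
    """Gradient ascent on the conditional log-likelihood sum_i log P(y_i | x_i, w)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (y - sigmoid(X @ w))  # gradient of the log-likelihood in w
        w += lr * grad / len(y)            # ascend: we are maximizing the objective
    return w

# Toy 1-D data with an intercept column; labels follow the sign of the feature.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(0, 1, 100)])
y = (X[:, 1] + 0.3 * rng.normal(size=100) > 0).astype(float)
print("learned weights:", logistic_mcle(X, y))  # slope should be clearly positive
```

This particular objective is concave, so gradient ascent finds the global maximum; the "local minima?" question from the slide bites for nonconvex objectives such as neural network training.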

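Fundamental 2 contrasts MLE and MAP, and the Modeling Distributions slide notes that the Beta is the conjugate prior for the binomial. A sketch of both estimates for a coin's heads probability, assuming NumPy; the flip sequence and Beta(α, β) hyperparameters are illustrative:

```python
import numpy as np

# Observed flips: 1 = heads, 0 = tails (8 flips, 6 heads).
flips = np.array([1, 1, 0, 1, 1, 1, 0, 1])
n, heads = len(flips), flips.sum()

# MLE under a Bernoulli model: the empirical frequency of heads.
theta_mle = heads / n

# MAP under a Beta(alpha, beta) prior (conjugate to the binomial):
# the posterior is Beta(alpha + heads, beta + n - heads); its mode is the MAP estimate.
alpha, beta = 2.0, 2.0  # illustrative prior pseudo-counts
theta_map = (heads + alpha - 1) / (n + alpha + beta - 2)

print("MLE:", theta_mle)  # 0.75
print("MAP:", theta_map)  # 0.70, pulled toward the prior mean 0.5
```

As n grows the prior's pull vanishes and MAP approaches MLE, matching the slide's "the more training data, the more accurate the statistical estimates."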