10-601 Review
Tom Mitchell and Aarti Singh
Machine Learning 10-601, Dec 8, 2011

Machine Learning Algorithm
Goal: learn a rule Z → f(Z) that optimizes some objective, loss(f(Z)).
Z can be X or (X, Y), modeled as a random variable, and we optimize E_Z[loss(f(Z))].
Training data D = {Z_i}_{i=1}^n is fed to a learning algorithm, which outputs a rule f̂_n.
Why do we need training data?

Modeling Distributions
Parametric: P_θ(Z)
• Gaussian – continuous random variables (θ = µ, σ)
• Bernoulli – binary/boolean random variable (θ)
• Binomial – sum of binary/boolean random variables (n, θ)
• Multinomial – sum of k-ary random variables (n, θ_1, θ_2, …, θ_k)
• Beta, Dirichlet (conjugate priors for binomial, multinomial), Poisson, …
If θ is a random variable, P_θ(Z) = P(Z|θ) is the likelihood.
Bayes Rule: P(θ|Z) = P(Z|θ) P(θ) / P(Z) is the posterior.

Modeling Distributions
Conditional independence assumptions for joint distributions:
• Markov Models: p(X) = ∏_{i=1}^n p(X_i | X_{i−1})
• Hidden Markov Models: p(X, Z) = ∏_{i=1}^n p(X_i | Z_i) ∏_{i=1}^n p(Z_i | Z_{i−1})
• Bayes Nets / Graphical models: p(X) = ∏_{i=1}^n p(X_i | pa(X_i))

Machine Learning Problems
Broad categories:
• Unsupervised learning – density estimation, clustering, dimensionality reduction
• Supervised learning – classification, regression
• Semi-supervised learning
• Active learning
• Many more …

Unsupervised & Supervised Learning
Unsupervised Learning – learning without a teacher: documents go into the learning algorithm, which outputs a model for the word distribution or a clustering of similar documents.
Supervised Learning – learning with a teacher: documents and their topics go into the learning algorithm, which outputs a mapping between documents and topics.

Semi-supervised & Active Learning
Semi-Supervised Learning – randomly labeled examples: the learning algorithm sees labeled (document, topic) pairs plus unlabeled documents, and outputs a mapping between documents and topics.
Active Learning – selectively labeled examples: the learner chooses which documents get labeled.

Unsupervised Learning
Density estimation: parametric (MLE, MAP); nonparametric (histogram, kernel)
Dimensionality reduction: feature selection; Principal Component Analysis (PCA); Laplacian Eigenmaps
Clustering: Gaussian mixture models; k-means; spectral

Supervised Learning
Regression (continuous labels, mean square error):
• Optimal estimation rule: f*(X) = E[Y|X], which is the MLE under P(Y|X) = N(f*(X), σ²)
• Linear Regression: f(X) = Xw, X = [x_1, x_2, …, x_d]
• Polynomial Regression: X = [x_1², x_1 x_2, x_2², …]
• Basis Regression: X = [φ_1(x), φ_2(x), …, φ_d(x)]
• Regularized versions (MAP)
• Neural Networks: f(X) = nonlinear combination of multiple logistic units
• Kernel (locally-weighted) regression – weighted mean square error

Supervised Learning
Classification (discrete labels, probability of error):
• Bayes optimal classification rule: f*(X) = arg max_Y P(Y|X)
• Plug in the MLE or MAP estimate of the distribution model
• Naïve Bayes, Decision Trees, Logistic Regression, k-nearest neighbor, SVM, Boosting
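As a concrete illustration of the plug-in approach above (not part of the lecture slides; the function names and toy data below are my own), here is a minimal sketch of a Gaussian Naïve Bayes classifier: estimate class priors and per-feature Gaussian parameters by MLE, then label a point with arg max_Y P(Y|X).

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """MLE of class priors and per-feature Gaussian means/variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),        # MLE of P(Y = c)
            "mean": Xc.mean(axis=0),          # MLE of per-feature means
            "var": Xc.var(axis=0) + 1e-9,     # MLE of variances (jitter for stability)
        }
    return params

def predict(params, x):
    """Plug-in Bayes rule: argmax_c log P(c) + sum_j log N(x_j; mu_jc, var_jc)."""
    scores = {}
    for c, p in params.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * p["var"]) + (x - p["mean"]) ** 2 / p["var"])
        scores[c] = np.log(p["prior"]) + log_lik
    return max(scores, key=scores.get)

# Toy 2-class data: class 0 centered at (0, 0), class 1 at (2, 2)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_gaussian_nb(X, y)
print(predict(params, np.array([1.8, 2.1])))   # expected: 1
```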
Comparison Chart for Classification
Classifiers covered before the Midterm (Decision Trees, K-NN, Gaussian Naïve Bayes, Logistic regression, Neural Networks) and after the Midterm (HMM, Bayes Net, SVM, Boosting), compared along: generative/discriminative, loss functions, decision boundary, output, algorithm, model, complexity, relation to others, parametric/nonparametric.

Hidden Markov Models
• time-series/sequential modeling
• representation, parameters
• evaluate probability of an output sequence
• decode hidden states
• learning parameters

Neural Networks
• nonlinear classifier
• layers of multiple logistic units
• training – backpropagation
• local minima

Dimensionality reduction
• feature selection
• PCA – linear, directions of max variance, SVD
• Laplacian Eigenmaps – nonlinear

Clustering
• k-means – isotropic, convex clusters
• spectral – connectivity based

Nonparametric methods
• histogram, kernel density estimation
• kernel regression
• k-NN classifier

Support Vector Machines
• hard-margin, soft-margin
• support vectors
• dual formulation, kernel trick

Boosting
• weak base classifiers trained on re-weighted data
• AdaBoost algorithm, exponential loss

Four Fundamentals for ML
1. Learning is an optimization problem
   – many algorithms are best understood as optimization algorithms
   – what objective do they optimize, and how? Local minima?
   – gradient descent/ascent as a general fallback approach
2. Learning is a parameter estimation problem
   – the more training data, the more accurate the statistical estimates
   – MLE, MAP, M(Conditional)LE, …
   – to measure accuracy of the learned model, we must use test (not train) data
3. Error arises from three sources
   – unavoidable error, bias, variance
   – PAC learning theory: probabilistic bound on overfitting, error_true − error_train

Bias and Variance of Estimators
Given some estimator Y for some parameter θ, we note Y is a random variable (why?)
• the bias of estimator Y: E[Y] − θ
• the variance of estimator Y: E[(Y − E[Y])²]
Consider when
• θ is the probability of "heads" for my coin
• Y = proportion of heads observed from 3 flips
Consider when
• θ is the vector of correct parameters for the learner
• Y = parameters output by the learning algorithm
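The coin example above can be checked numerically. The sketch below is my own illustration (not lecture code): it simulates Y = proportion of heads in 3 flips for an assumed true θ = 0.7, and compares the empirical bias E[Y] − θ and variance E[(Y − E[Y])²] to the exact values 0 and θ(1 − θ)/3.

```python
import numpy as np

theta = 0.7          # assumed true probability of heads
n_flips = 3          # estimator Y = proportion of heads in 3 flips
n_trials = 200_000   # Monte Carlo repetitions

rng = np.random.default_rng(0)
# Each row is one experiment of 3 flips; Y is the fraction of heads in that experiment
Y = rng.binomial(1, theta, size=(n_trials, n_flips)).mean(axis=1)

bias = Y.mean() - theta    # ~0: the sample proportion is an unbiased estimator of theta
variance = Y.var()         # ~theta * (1 - theta) / 3 = 0.07
print(f"empirical bias     = {bias:+.4f}   (exact: 0)")
print(f"empirical variance = {variance:.4f}   (exact: {theta * (1 - theta) / n_flips:.4f})")
```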
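One more worked sketch, tied to the dimensionality-reduction review above (again my own illustrative code and toy data, not from the lecture): PCA computed via the SVD of centered data, keeping the directions of maximum variance.

```python
import numpy as np

def pca_svd(X, k):
    """Project X onto its top-k principal components (directions of max variance) via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions; singular values give the variance ordering
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                            # top-k directions of maximum variance
    explained_var = (S[:k] ** 2) / (len(X) - 1)    # variance captured by each direction
    return X_centered @ components.T, components, explained_var

# Toy data: 3-D points that mostly vary along one direction
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 1))
X = np.hstack([Z, 0.5 * Z, 0.1 * rng.normal(size=(200, 1))]) + rng.normal(scale=0.05, size=(200, 3))
proj, comps, var = pca_svd(X, k=1)
print("first principal direction:", np.round(comps[0], 2))
print("variance explained:", np.round(var[0], 3))
```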