CS 559: Machine Learning Fundamentals and Applications (Midterm Recap)
Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Midterm
• October 21 (first part of the class)
• Open books/notes
• No graphing calculators
(Pattern Classification, Chapter 2)

Outline
• Probability theory
• Bayes decision theory
• Maximum-likelihood and Bayesian parameter estimation
• Expectation maximization
• Non-parametric techniques
• Hidden Markov models
• Principal component analysis

Pairs of Discrete Random Variables
• Let x and y be two discrete random variables
• For each possible pair of values, we can define a joint probability p_ij = Pr[x = x_i, y = y_j]
• We can also define a joint probability mass function P(x, y), which offers a complete characterization of the pair of random variables
• Marginal distributions:
  P_x(x) = Σ_y P(x, y)
  P_y(y) = Σ_x P(x, y)
• Note that P_x and P_y are different functions

Conditional Probability
• When two random variables are not independent, knowing one allows a better estimate of the other (e.g. outside temperature and season)
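The joint, marginal, and conditional definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not from the slides; the pmf values are made up and chosen only so the table sums to 1:

```python
import numpy as np

# Hypothetical joint pmf: rows index values of x, columns index values of y.
P_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

# Marginals: P_x(x) = sum_y P(x, y) and P_y(y) = sum_x P(x, y)
P_x = P_xy.sum(axis=1)   # sum over columns (y) -> [0.30, 0.70]
P_y = P_xy.sum(axis=0)   # sum over rows (x)    -> [0.40, 0.60]

# Conditional: Pr[x = x_i | y = y_j] = Pr[x = x_i, y = y_j] / Pr[y = y_j]
# Broadcasting divides each column j of the joint table by P_y[j].
P_x_given_y = P_xy / P_y

print(P_x, P_y)
print(P_x_given_y)
```

Each column of `P_x_given_y` sums to 1, as a conditional distribution over x must for every fixed y.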
  Pr[x = x_i | y = y_j] = Pr[x = x_i, y = y_j] / Pr[y = y_j]
• If independent, P(x | y) = P(x)

Law of Total Probability
• If an event A can occur in m different ways, and these m different ways are mutually exclusive, then the probability of A occurring is the sum of the probabilities of the sub-events:
  P(X = x_i) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)

Bayes Rule
  P(x | y) = P(y | x) P(x) / P(y)
  posterior = likelihood × prior / evidence
• x is the unknown cause
• y is the observed evidence
• Denominator often omitted (maximum a posteriori solution)
• Bayes rule shows how the probability of x changes after we have observed y

Normal (Gaussian) Distribution
• Central Limit Theorem: under various conditions, the distribution of the sum of d independent random variables approaches a limiting form known as the normal distribution
  p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)),  i.e. x ~ N(μ, σ²)

Bayes Decision Theory
• Probability distributions for the categories are known
• Do not need training data
• Can design optimal classifier
• Very rare in real life

Decision Rules
• Decision rule with only the prior information
  – Decide ω_1 if P(ω_1) > P(ω_2); otherwise decide ω_2
  – Prior comes from prior knowledge; no data have been seen yet
  – If there is a reliable source of prior knowledge, it should be used
• Use of the class-conditional information: p(x | ω_1)
and p(x | ω_2) describe the difference in lightness between populations

Decision Using Posteriors
• Decision given the posterior probabilities: x is an observation for which
  if P(ω_1 | x) > P(ω_2 | x), true state of nature = ω_1
  if P(ω_1 | x) < P(ω_2 | x), true state of nature = ω_2
• Therefore, whenever we observe a particular x, the probability of error is:
  P(error | x) = P(ω_1 | x) if we decide ω_2
  P(error | x) = P(ω_2 | x) if we decide ω_1

Minimizing Risk
• Let {ω_1, ω_2, …, ω_c} be the set of c states of nature (or "categories")
• Let {α_1, α_2, …, α_a} be the set of possible actions
• Let λ(α_i | ω_j) be the loss incurred for taking action α_i when the state of nature is ω_j

Overall Risk
• R is the expected loss associated with a given decision rule
• Minimizing R: minimize R(α_i | x) for i = 1, …, a (select the action that minimizes the risk as a function of x)
  R(α_i | x) = Σ_{j=1}^{c} λ(α_i | ω_j) P(ω_j | x),  for i = 1, …, a
• Select the action α_i for which R(α_i | x) is minimum; R in this case is called the Bayes risk = best performance that can be achieved

Conditional Risk
• Two-category classification
  α_1: decide ω_1
  α_2: decide ω_2
  λ_ij = λ(α_i | ω_j): loss incurred for deciding ω_i when the true state of nature is ω_j
• Conditional risk:
  R(α_1 | x) = λ_11 P(ω_1 | x) + λ_12 P(ω_2 | x)
  R(α_2 | x) = λ_21 P(ω_1 | x) + λ_22 P(ω_2 | x)

Decision Rule
• Our rule is the following:
  if R(α_1 | x) < R(α_2 | x), take action α_1: decide ω_1
• This results in the equivalent rule: decide ω_1 if
  (λ_21 − λ_11) P(x | ω_1) P(ω_1) > (λ_12 − λ_22) P(x | ω_2) P(ω_2)
  and decide ω_2 otherwise
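The two-category minimum-risk rule above can be sketched directly from the conditional-risk formulas. This is a hypothetical example, not from the slides; the loss matrix and posterior values are made up to show how an asymmetric loss can overturn the max-posterior choice:

```python
import numpy as np

# Hypothetical loss matrix: lam[i, j] = loss for taking action alpha_i
# when the true state of nature is omega_j (zero loss on correct decisions).
lam = np.array([[0.0, 2.0],   # alpha_1 (decide omega_1): costly if omega_2 is true
                [1.0, 0.0]])  # alpha_2 (decide omega_2): cheaper mistake

def min_risk_action(posteriors):
    """Conditional risks R(alpha_i | x) = sum_j lam[i, j] * P(omega_j | x);
    return the index of the action with minimum risk, plus the risks."""
    risks = lam @ posteriors
    return int(np.argmin(risks)), risks

# Made-up posteriors P(omega_1 | x), P(omega_2 | x) for some observation x.
action, risks = min_risk_action(np.array([0.6, 0.4]))
# R(alpha_1 | x) = 0*0.6 + 2*0.4 = 0.8
# R(alpha_2 | x) = 1*0.6 + 0*0.4 = 0.6
# Minimum risk picks alpha_2 (decide omega_2) even though P(omega_1 | x) > P(omega_2 | x).
```

The example shows why the loss differences (λ_21 − λ_11) and (λ_12 − λ_22) appear in the equivalent decision rule: they weight the two posteriors before comparison.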
Likelihood Ratio
• The preceding rule is equivalent to the following rule: decide ω_1 if
  P(x | ω_1) / P(x | ω_2) > [(λ_12 − λ_22) / (λ_21 − λ_11)] · P(ω_2) / P(ω_1)
• Then take action α_1 (decide ω_1); otherwise take action α_2 (decide ω_2)

The Zero-One Loss Function
• Zero-one loss function:
  λ(α_i | ω_j) = 0 if i = j, 1 if i ≠ j,  for i, j = 1, …, c
• Therefore, the conditional risk is:
  R(α_i | x) = Σ_j λ(α_i | ω_j) P(ω_j | x) = Σ_{j≠i} P(ω_j | x) = 1 − P(ω_i | x)
• The risk corresponding to this loss function is the average probability of error
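The zero-one loss identity R(α_i | x) = 1 − P(ω_i | x) means that minimizing risk reduces to picking the class with the largest posterior. A minimal sketch, with a made-up posterior vector and three classes for illustration:

```python
import numpy as np

c = 3  # number of classes (illustrative)

# Zero-one loss: 0 on the diagonal (correct decision), 1 off the diagonal.
lam = np.ones((c, c)) - np.eye(c)

# Hypothetical posteriors P(omega_j | x) for some observation x.
posteriors = np.array([0.2, 0.5, 0.3])

# Conditional risks under zero-one loss: R(alpha_i | x) = 1 - P(omega_i | x).
risks = lam @ posteriors

# Minimizing the risk is the same as maximizing the posterior.
best = int(np.argmin(risks))
```

Here `best` coincides with `np.argmax(posteriors)`, the maximum-a-posteriori decision, which is why zero-one loss yields the minimum-error-rate classifier.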