
CS 559: Machine Learning Fundamentals and Applications
2nd Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Logistics: Project

• Class webpage: http://www.cs.stevens.edu/~mordohai/classes/cs559_s10.html
• Project types:
  – Application project: pick an application of interest, apply a learning algorithm to it
  – Algorithmic project: develop an algorithm (or a variant) and solve some problem with it

Logistics: Project

• Topics: pick one before March 23 (Tue. after Spring break)
  – Related to your research, but it has to be extended
  – Based on class material
  – Brainstorm with me
• Has to be approved by me before March 23
• Present for 5 minutes on March 25
• Do actual work
• Present project in class on April 29
• Submit brief report by May 5 (midnight)

Note: Independent vs. Uncorrelated Random Variables

• Two events A and B are independent if and only if Pr(A and B) = Pr(A)Pr(B).
• Two random variables X and Y are independent if and only if for any numbers a and b the events {X ≤ a} (the outcomes where X is less than or equal to a) and {Y ≤ b} are independent events as defined above.

Note: Independent vs. Uncorrelated Random Variables – Example

• X uniform in [-1, 1]
• Y = X^2
• X and Y are uncorrelated [prove], but not independent (see the sketch below)
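The proof is left as an exercise on the slide; analytically, Cov(X, Y) = E[X·X^2] − E[X]·E[X^2] = E[X^3] − 0 = 0, since odd moments of a distribution symmetric about zero vanish, yet knowing X determines Y exactly. As an illustration only (this sketch is not part of the original notes; the sample size and thresholds are arbitrary), a quick simulation in Python:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=1_000_000)   # X ~ Uniform(-1, 1)
    y = x ** 2                                    # Y = X^2

    # Uncorrelated: the sample correlation is essentially zero.
    print("corr(X, Y) =", np.corrcoef(x, y)[0, 1])

    # Not independent: conditioning on X changes the distribution of Y.
    print("E[Y | |X| < 0.1] =", y[np.abs(x) < 0.1].mean())   # close to 0
    print("E[Y | |X| > 0.9] =", y[np.abs(x) > 0.9].mean())   # close to 0.9

The two conditional means differ sharply, which would be impossible if X and Y were independent.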
Overview

• Bayesian Decision Theory – Continuous Features
• Discriminant Functions for the Normal Density
• Bayesian Decision Theory – Discrete Features

Chapter 2 (Part 1): Bayesian Decision Theory (Sections 2.1-2.2)
Bayesian Decision Theory – Continuous Features

Bayes Rule – Intuition

"The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine new data with their existing knowledge or expertise."
From the Economist (2000)

Bayes Rule – Intuition

"The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child's degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise."
From the Economist (2000)
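The marble bookkeeping in the quote is Laplace's rule of succession: after n observed sunrises the bag holds n+1 white marbles out of n+2 total, so the degree of belief in the next sunrise is (n+1)/(n+2), giving the 1/2, 2/3, 3/4, ... sequence. A minimal sketch reproducing it (illustrative, not from the notes):

    # Laplace's rule of succession: one white and one black marble to
    # start; add a white marble after each observed sunrise.
    white, total = 1, 2
    for sunrises in range(4):
        print(f"after {sunrises} sunrise(s): belief = {white}/{total} = {white / total:.3f}")
        white, total = white + 1, total + 1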


Bayes' Rule

posterior = likelihood × prior / evidence

P(ωj | x) = p(x | ωj) P(ωj) / p(x),   where p(x) = Σj p(x | ωj) P(ωj) and Σj P(ωj) = 1

A leading proponent, I.J. Good, argued persuasively that "the subjectivist (i.e. Bayesian) states his judgments, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science".

The Origin of Bayes' Rule

• A simple consequence of using probability to represent degrees of belief
• For any two random variables:
  p(A & B) = p(A) p(B | A)
  p(A & B) = p(B) p(A | B)
  so p(B) p(A | B) = p(A) p(B | A), and therefore
  p(A | B) = p(A) p(B | A) / p(B)
  (a numerical sanity check follows below)
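As a sanity check on the derivation (the joint distribution below is made up purely for illustration), one can verify numerically that p(A | B) = p(A) p(B | A) / p(B):

    # A made-up joint distribution over two binary variables A and B,
    # used only to check the algebra above.
    p_joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

    p_A = sum(p for (a, b), p in p_joint.items() if a == 1)   # p(A)
    p_B = sum(p for (a, b), p in p_joint.items() if b == 1)   # p(B)
    p_B_given_A = p_joint[(1, 1)] / p_A                       # p(B | A)
    p_A_given_B = p_joint[(1, 1)] / p_B                       # p(A | B)

    # Bayes' rule recovers p(A | B) from the other three quantities.
    assert abs(p_A_given_B - p_A * p_B_given_A / p_B) < 1e-12
    print("p(A | B) =", p_A_given_B)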

Bayesian Decision Theory

• Know the probability distribution of the categories
• Do not even need training data
• Can design the optimal classifier
• Very rare in real life

Prior

• The prior comes from prior knowledge; no data have been seen yet
• If there is a reliable source of prior knowledge, it should be used
• Some problems cannot even be solved reliably without a good prior
• However, the prior alone is not enough; we still need the likelihood

Decision Rules

• Decision rule with only the prior information:
  – Decide ω1 if P(ω1) > P(ω2), otherwise decide ω2 (see the sketch below)
• Use of the class-conditional information:
  – p(x | ω1) and p(x | ω2) describe the difference in lightness between the populations of sea bass and salmon
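A minimal sketch of the prior-only rule (the class names and prior values are invented for illustration; the notes give no numbers):

    # Prior-only decision rule: with no observation, always pick the
    # class with the larger prior probability.
    priors = {"omega_1 (sea bass)": 0.4, "omega_2 (salmon)": 0.6}
    decision = max(priors, key=priors.get)
    print("with no observation, always decide:", decision)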

Class-conditional Density vs. Likelihood

• Class-conditional densities are probability density functions p(x | ω) when the class ω is fixed (i.e., functions of x)
• Likelihoods are the values of p(x | ω) for a given x (i.e., viewed as a function of ω)
• This is a subtle point. Think about it.

Posterior, Likelihood, Evidence

• P(ωj | x) = P(x | ωj) P(ωj) / P(x)
• In the case of two categories: P(x) = Σj=1..2 P(x | ωj) P(ωj)
• Posterior = (Likelihood × Prior) / Evidence
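A hedged sketch of the computation (the Gaussian class-conditional densities, priors, and observation x are all invented for illustration; the notes do not specify them):

    import math

    def class_conditional(x, mu, sigma):
        """Gaussian class-conditional density p(x | omega)."""
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    priors = [0.6, 0.4]                 # P(omega_1), P(omega_2); invented
    params = [(4.0, 1.0), (7.0, 1.5)]   # (mu, sigma) per class; invented

    x = 5.0                             # an observed feature value
    likelihoods = [class_conditional(x, mu, s) for mu, s in params]
    evidence = sum(L * P for L, P in zip(likelihoods, priors))        # P(x)
    posteriors = [L * P / evidence for L, P in zip(likelihoods, priors)]
    print("posteriors:", posteriors)    # non-negative, sum to 1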
Decision using Posteriors

• Decision given the posterior probabilities: for an observation x,
  if P(ω1 | x) > P(ω2 | x), decide that the true state of nature is ω1;
  if P(ω2 | x) > P(ω1 | x), decide that the true state of nature is ω2
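The comparison itself is a one-liner; a generic sketch (illustrative posterior values):

    def decide(posteriors):
        """Bayes decision rule: pick the class with the largest posterior."""
        return 1 + max(range(len(posteriors)), key=lambda j: posteriors[j])

    print("decide omega_%d" % decide([0.77, 0.23]))   # -> decide omega_1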