10-601 Recitation
Wednesday, September 28th, 2011
Will Bishop

Announcements
- The recitation time has been permanently set to 6-7 PM on Wednesdays in Wean 7500 (this room).
- HW2 should be out soon.

Topics for Today
- Conjugate priors
- MAP estimators (example derivation)
- Naïve Bayes

Decoders
Motivated through a real-world example from brain-computer interfaces (BCI): how would you design a decoder for this?
[Figure: BCI pipeline showing acquisition of neural signals, feature extraction, classification of user intent, a high-level signal sent to the interface device, and visual feedback to the user over time.]

Decoders we know about so far: decision trees and naïve Bayes.

Notation
Assume we have U neurons. Y_i is the label for trial i, and X_{i,j} is the observed spike count for neuron j on trial i.

Review of Naïve Bayes
In general we would like P(Y_i | X_{i,1}, ..., X_{i,U}), the probability of the target given the observed data.

Let's assume we know:
1. P(X_{i,1}, ..., X_{i,U} | Y_i) for Y_i = 0 and Y_i = 1
2. P(Y_i) for Y_i = 0 and Y_i = 1

How do we get P(Y_i | X_{i,1}, ..., X_{i,U})? Bayes' rule:

P(Y_i \mid X_{i,1}, \dots, X_{i,U}) = \frac{P(X_{i,1}, \dots, X_{i,U} \mid Y_i) \, P(Y_i)}{P(X_{i,1}, \dots, X_{i,U})}

- Likelihood term, P(X_{i,1}, ..., X_{i,U} | Y_i): we assume we know this.
- Prior term, P(Y_i): we assume we know this too.
- Normalizing term, P(X_{i,1}, ..., X_{i,U}): we can calculate this, though in practice we often don't if all we care about is finding the class with the highest posterior probability.

So if we know P(X_{i,1}, ..., X_{i,U} | Y_i) and P(Y_i), we can easily calculate the probabilities we need to decode with. But how do we learn P(X_{i,1}, ..., X_{i,U} | Y_i)? Let's assume each X_{i,u} can take 10 different values. If U = 10 and we try to learn this using the truth-table approach, how many parameters must we fit? 10^10 - 1 parameters.

With naïve Bayes we assume

P(X_{i,1}, \dots, X_{i,U} \mid Y_i) = \prod_{u=1}^{U} P(X_{i,u} \mid Y_i)

This means we can fit U separate truth tables, so how many parameters do we need now? (10 - 1) * 10 = 90 parameters. Of course, we have to do this for both possible values of Y_i, so we actually need 180 parameters to fit P(X_{i,1}, ..., X_{i,U} | Y_i = 0) and P(X_{i,1}, ..., X_{i,U} | Y_i = 1).

Motivating Example
In practice we don't use a truth table for P(X_i | Y_i), but instead assume it is a Poisson distribution (a short code sketch of a decoder built from these per-neuron Poisson models appears at the end of this section):

P(X = x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}

So, given a set of N observed counts for neuron j, X_{1,j}, ..., X_{N,j}, recorded while the subject was reaching for target Y_i = 1, how can we learn the appropriate value of λ for P(X_{i,j} | Y_i = 1)?
1. Maximum likelihood estimator (covered last recitation)
2. Maximum a posteriori (MAP) estimator (covered today)

MAP Estimators
Given a set of N observations X_1, ..., X_N, we are after

P(\lambda \mid X_1, \dots, X_N) = \frac{P(X_1, \dots, X_N \mid \lambda) \, P(\lambda)}{P(X_1, \dots, X_N)}

where P(X_1, ..., X_N | λ) is the likelihood term and P(λ) is the prior. How do we choose the prior?

Considerations when selecting the prior:
- The prior encodes your initial beliefs, before you've seen any data, about the parameter values.
- Often we select the prior so things work out nicely mathematically.
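Before moving on to conjugate priors, here is a minimal sketch (not from the recitation slides) of how the pieces above fit together in code: a two-target naïve Bayes decoder with per-neuron Poisson likelihoods. The function name, the rate matrix lam, and the class prior are made-up illustrations, not values from the BCI example.

```python
import numpy as np
from scipy.stats import poisson

def decode(x, lam, prior):
    """Poisson naive Bayes decoder for one trial.

    x     : length-U array of spike counts X_{i,1..U} for trial i
    lam   : (2, U) array, lam[y, u] = assumed Poisson rate for neuron u given Y_i = y
    prior : length-2 array with P(Y_i = 0) and P(Y_i = 1)

    Returns the posterior P(Y_i | X_{i,1}, ..., X_{i,U}) over both targets.
    """
    # Work in log space: log P(Y_i = y) + sum_u log P(X_{i,u} | Y_i = y)
    log_post = np.log(prior) + poisson.logpmf(x, lam).sum(axis=1)
    # Normalizing term P(X_{i,1}, ..., X_{i,U}); only needed if we want actual
    # probabilities rather than just the most likely target.
    log_post -= np.logaddexp.reduce(log_post)
    return np.exp(log_post)

# Made-up rates for U = 3 neurons, a uniform class prior, and one observed trial.
lam = np.array([[2.0, 5.0, 1.0],   # rates given Y_i = 0
                [6.0, 1.5, 3.0]])  # rates given Y_i = 1
prior = np.array([0.5, 0.5])
print(decode(np.array([7, 2, 4]), lam, prior))  # posterior over the two targets
```

Working in log space avoids underflow when U is large, and, as noted above, the normalizing term can be skipped entirely if all we care about is picking the class with the highest posterior probability.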
Conjugate Priors
A prior is conjugate to the distribution we are using for our likelihood term if, when we multiply the prior by the likelihood term and divide by the normalizing constant in Bayes' equation, the resulting probability distribution is in the same family as the prior. It makes the math easy.

MAP Estimator: An Example
Assume we have observations X_1, ..., X_N from a Poisson distribution with λ unknown. Let's find a MAP estimator for λ.

Assume our prior belief about λ is a Gamma distribution; in other words, λ ~ Gamma(α, β). The pdf for a Gamma distribution is

P(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} = \frac{1}{C} \lambda^{\alpha - 1} e^{-\beta \lambda}

where 1/C is just a normalizing constant.

Let's write out the likelihood for our data:

P(x_1, \dots, x_N \mid \lambda) = \prod_{n=1}^{N} P(x_n \mid \lambda)
                                = \prod_{n=1}^{N} \frac{\lambda^{x_n} e^{-\lambda}}{x_n!}
                                = \frac{e^{-N\lambda} \, \lambda^{x_1 + x_2 + \cdots + x_N}}{\prod_{n=1}^{N} x_n!}
                                = \frac{e^{-N\lambda} \, \lambda^{\sum_{n=1}^{N} x_n}}{\prod_{n=1}^{N} x_n!}
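A minimal numerical sketch of where this leads, assuming the standard Gamma-Poisson conjugate-update result (not derived in the excerpt above) that multiplying this likelihood by the Gamma(α, β) prior gives a Gamma(α + Σ x_n, β + N) posterior, whose mode is the MAP estimate of λ. The counts and the hyperparameters α, β below are made-up values for illustration.

```python
import numpy as np

# Made-up spike counts for one neuron over N trials, plus made-up
# Gamma prior hyperparameters (alpha, beta are assumptions, not from the slides).
x = np.array([3, 5, 4, 6, 2, 4, 5, 3])
alpha, beta = 2.0, 1.0
N, S = len(x), x.sum()

# Assumed conjugate update: posterior over lambda is Gamma(alpha + S, beta + N),
# so the MAP estimate is that Gamma's mode (valid when alpha + S >= 1).
lam_mle = S / N                         # maximum likelihood estimate, for comparison
lam_map = (alpha + S - 1) / (beta + N)  # posterior mode

print(f"MLE: {lam_mle:.3f}, MAP: {lam_map:.3f}")
```

The MAP estimate pulls the MLE toward the prior; as N grows, the data term dominates and the two estimates converge, which is one reason the conjugate Gamma prior is a convenient choice here.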