Pitt CS 2710 - Learning probability distributions

CS 2710 Foundations of AI
Lecture 24: Learning probability distributions
Milos Hauskrecht
[email protected]
5329 Sennott Square

Unsupervised learning

• Data: D = {D_1, D_2, ..., D_n}, where each example D_i = x_i is a vector of attribute values
  – e.g. the description of a patient
  – no specific target attribute we want to predict (no output y)
• Objective: learn (describe) relations between attributes and examples

Types of problems:
• Clustering: group together "similar" examples
• Density estimation: model probabilistically the population of examples

Density estimation

Data: D = {D_1, D_2, ..., D_n}, where D_i = x_i is a vector of attribute values.
Attributes: modeled by random variables X = {X_1, X_2, ..., X_d} with
  – continuous values, or
  – discrete values.
E.g. blood pressure with numerical values, or chest pain with discrete values [no-pain, mild, moderate, strong].
Underlying true probability distribution: p(X).

Objective: estimate the underlying true probability distribution p(X) over the variables X using the n samples in D:

    D = \{D_1, D_2, \ldots, D_n\} \;\text{(drawn from the true } p(\mathbf{X})\text{)} \;\longrightarrow\; \text{estimate } \hat{p}(\mathbf{X})

Standard (iid) assumptions: the samples
• are independent of each other,
• come from the same (identical) distribution (a fixed p(X)).

Learning via parameter estimation

In this lecture we consider parametric density estimation. Basic settings:
• A set of random variables X = {X_1, X_2, ..., X_d}
• A model of the distribution over the variables in X, with parameters Θ
• Data D = {D_1, D_2, ..., D_n}

Objective: find the parameters Θ̂ that fit the data best, or in other words reduce the misfit between the data and the model.
• What is the best set of parameters? There are various criteria one can apply here.

Parameter estimation. Basic criteria.

• Maximum likelihood (ML) criterion: maximize the likelihood of the data,

    \Theta_{ML} = \arg\max_{\Theta} \, p(D \mid \Theta, \xi)

  where ξ represents prior (background) knowledge.
• Maximum a posteriori probability (MAP) criterion: maximize the posterior probability,

    \arg\max_{\Theta} \, p(\Theta \mid D, \xi), \quad \text{where} \quad p(\Theta \mid D, \xi) = \frac{p(D \mid \Theta, \xi) \, p(\Theta \mid \xi)}{p(D \mid \xi)}

  MAP selects the mode of the posterior.

Parameter estimation. Coin example.

Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: D is a sequence of outcomes x_i such that x_i = 1 for a head and x_i = 0 for a tail.
Model: θ is the probability of a head, (1 - θ) the probability of a tail.
Objective: we would like to estimate the probability θ̂ of a head from the data.

Example: assume the unknown and possibly biased coin, with head probability θ.
Data: H H T T H H T H T H T T T H T H H H H T H H H H T
  – Heads: 15
  – Tails: 10
What would be your estimate of the probability of a head?
Solution: use the frequencies of occurrences:

    \tilde{\theta} = \frac{15}{25} = 0.6

This is the maximum likelihood estimate of the parameter θ.
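To make the frequency estimate concrete, here is a minimal Python sketch of the computation (the variable names are mine, not from the lecture; the flip sequence is the one from the example above):

```python
# ML estimate of the head probability of a (possibly biased) coin:
# simply the fraction of heads in the observed sequence.
data = "HHTTHHTHTHTTTHTHHHHTHHHHT"  # the 25 flips from the example (H = head, T = tail)

n_heads = data.count("H")  # N1 = 15
n_tails = data.count("T")  # N2 = 10

theta_ml = n_heads / (n_heads + n_tails)
print(theta_ml)  # 0.6
```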
Probability of an outcome

Data: D is a sequence of outcomes x_i (x_i = 1 for a head, x_i = 0 for a tail). Model: θ is the probability of a head, (1 - θ) the probability of a tail. Assume we know the probability θ. The probability of a single coin-flip outcome x_i follows the Bernoulli distribution:

    P(x_i \mid \theta) = \theta^{x_i} (1-\theta)^{1-x_i}

It combines the probabilities of a head and a tail so that x_i picks its correct probability:
• gives θ for x_i = 1,
• gives (1 - θ) for x_i = 0.

Probability of a sequence of outcomes

Assume a sequence of independent coin flips D = H H T H T H (encoded as D = 110101). The probability of observing the data sequence D is the likelihood of the data:

    P(D \mid \theta) = \theta \, \theta \, (1-\theta) \, \theta \, (1-\theta) \, \theta

Using the Bernoulli distribution, this can be rewritten as

    P(D \mid \theta) = \prod_{i=1}^{6} \theta^{x_i} (1-\theta)^{1-x_i}

The goodness of fit to the data

Learning: we do not know the value of the parameter θ. Our learning goal: find the parameter θ that fits the data D the best. One solution to "best": maximize the likelihood

    P(D \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}

Intuition: the more likely the data are given the model, the better the fit. Note: instead of an error function that measures how badly the data fit the model, we have a measure that tells us how well the data fit:

    Error(D, \theta) = -P(D \mid \theta)

Maximum likelihood (ML) estimate.

Let N_1 be the number of heads seen and N_2 the number of tails seen. The maximum likelihood estimate is

    \theta_{ML} = \arg\max_{\theta} \, P(D \mid \theta, \xi)

with the likelihood of the data

    P(D \mid \theta, \xi) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}

Optimize the log-likelihood (it has the same maximizer as the likelihood):

    l(D, \theta) = \log P(D \mid \theta, \xi)
                 = \log \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}
                 = \sum_{i=1}^{n} \left[ x_i \log\theta + (1-x_i)\log(1-\theta) \right]
                 = \log\theta \sum_{i=1}^{n} x_i + \log(1-\theta) \sum_{i=1}^{n} (1-x_i)

ML solution: in terms of the counts, the log-likelihood is

    l(D, \theta) = N_1 \log\theta + N_2 \log(1-\theta)

Set the derivative to zero:

    \frac{\partial}{\partial\theta} \, l(D, \theta) = \frac{N_1}{\theta} - \frac{N_2}{1-\theta} = 0

Solving gives

    \theta = \frac{N_1}{N_1 + N_2}, \quad \text{so} \quad \theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}
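The closed-form solution can be sanity-checked numerically. The sketch below (my own illustration, not part of the lecture) evaluates the log-likelihood l(D, θ) = N_1 log θ + N_2 log(1 - θ) on a grid of θ values and confirms that the maximum sits at N_1 / (N_1 + N_2):

```python
import math

def log_likelihood(theta, n_heads, n_tails):
    # l(D, theta) = N1 * log(theta) + N2 * log(1 - theta)
    return n_heads * math.log(theta) + n_tails * math.log(1 - theta)

n_heads, n_tails = 15, 10  # the coin example: N1 = 15 heads, N2 = 10 tails

# Grid search over theta in (0, 1), excluding the endpoints where log blows up.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: log_likelihood(t, n_heads, n_tails))

print(best)                           # 0.6 -- the grid maximizer
print(n_heads / (n_heads + n_tails))  # 0.6 -- the closed-form ML estimate
```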

