Pitt CS 2710 - Learning probability distributions

CS 2710 Foundations of AI
Lecture 25: Learning probability distributions
Milos Hauskrecht, [email protected], Sennott Square

Density estimation

Data: $D = \{D_1, D_2, \ldots, D_n\}$, where each $D_i = \mathbf{x}_i$ is a vector of attribute values.
Attributes: modeled by random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$ with:
• continuous values
• discrete values
E.g., blood pressure with numerical values, or chest pain with discrete values [no-pain, mild, moderate, strong].
Underlying true probability distribution: $p(\mathbf{X})$.

Density estimation

Objective: estimate the underlying true probability distribution $p(\mathbf{X})$ over the variables $\mathbf{X}$ using the examples in $D$:

  true distribution $p(\mathbf{X})$ → $n$ samples $D = \{D_1, D_2, \ldots, D_n\}$ → estimate $\hat{p}(\mathbf{X})$

Standard (iid) assumptions: the samples
• are independent of each other;
• come from the same (identical) distribution (a fixed $p(\mathbf{X})$).

Learning via parameter estimation

In this lecture we consider parametric density estimation. Basic settings:
• a set of random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$;
• a model of the distribution over the variables in $\mathbf{X}$ with parameters $\Theta$;
• data $D = \{D_1, D_2, \ldots, D_n\}$.
Objective: find the parameters $\hat{\Theta}$ that fit the data best, in other words that minimize the misfit between the data and the model.
• What is the best set of parameters? There are various criteria one can apply here.

Parameter estimation. Basic criteria.

• Maximum likelihood (ML) criterion: maximize the likelihood of the data,
  $\arg\max_{\Theta} p(D \mid \Theta, \xi)$,
  where $\xi$ represents prior (background) knowledge.
• Maximum a posteriori probability (MAP) criterion: maximize the posterior probability,
  $\arg\max_{\Theta} p(\Theta \mid D, \xi)$, where
  $p(\Theta \mid D, \xi) = \frac{p(D \mid \Theta, \xi)\, p(\Theta \mid \xi)}{p(D \mid \xi)}$.
  MAP selects the mode of the posterior.

Parameter estimation. Coin example.

Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that
• head: $x_i = 1$
• tail: $x_i = 0$
Model: the probability of a head is $\theta$, the probability of a tail is $(1 - \theta)$.
Objective: estimate the probability of a head, $\hat{\theta}$, from the data.

Maximum a posteriori probability

The maximum a posteriori estimate selects the mode of the posterior distribution. With $N_1$ heads, $N_2$ tails, and a $Beta(\theta \mid \alpha_1, \alpha_2)$ prior, the posterior is

  $p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, Beta(\theta \mid \alpha_1, \alpha_2)}{P(D \mid \xi)} = Beta(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2) = \frac{\Gamma(N_1 + N_2 + \alpha_1 + \alpha_2)}{\Gamma(N_1 + \alpha_1)\, \Gamma(N_2 + \alpha_2)}\, \theta^{N_1 + \alpha_1 - 1} (1 - \theta)^{N_2 + \alpha_2 - 1}$

MAP solution:

  $\theta_{MAP} = \frac{N_1 + \alpha_1 - 1}{N_1 + N_2 + \alpha_1 + \alpha_2 - 2}$

Notice that the parameters of the prior act like counts of heads and tails (they are sometimes also referred to as prior counts).

MAP estimate example

• Note that the prior and the data fit (data likelihood) are combined.
• The MAP estimate can be biased by large prior counts; it is hard to overturn them with a smaller sample size.
• Data: H H T T H H T H T H T T T H T H H H H T H H H H T
  – Heads: 15
  – Tails: 10
• Assume $p(\theta \mid \xi) = Beta(\theta \mid 5, 5)$: then $\theta_{MAP} = 19/33 \approx 0.58$.
• Assume $p(\theta \mid \xi) = Beta(\theta \mid 5, 20)$: then $\theta_{MAP} = 19/48 \approx 0.40$.

Multinomial distribution

Example: multi-way coin toss, roll of a die.
Data: a set of $N$ outcomes (a multi-set); $N_i$ is the number of times outcome $i$ has been seen.
Model parameters: $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)$ s.t. $\sum_{i=1}^{k} \theta_i = 1$, where $\theta_i$ is the probability of outcome $i$.
Probability of the data (likelihood), the multinomial distribution:

  $P(N_1, N_2, \ldots, N_k \mid \boldsymbol{\theta}, \xi) = \frac{N!}{N_1!\, N_2! \cdots N_k!}\, \theta_1^{N_1} \theta_2^{N_2} \cdots \theta_k^{N_k}$

ML estimate: $\theta_{i,ML} = \frac{N_i}{N}$.

MAP estimate

Choice of prior: the Dirichlet distribution,

  $Dir(\boldsymbol{\theta} \mid \alpha_1, \ldots, \alpha_k) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}\, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}$

The Dirichlet is the conjugate choice for the multinomial, so the posterior distribution is again a Dirichlet:

  $p(\boldsymbol{\theta} \mid D, \xi) = \frac{P(D \mid \boldsymbol{\theta}, \xi)\, Dir(\boldsymbol{\theta} \mid \alpha_1, \ldots, \alpha_k)}{P(D \mid \xi)} = Dir(\boldsymbol{\theta} \mid N_1 + \alpha_1, \ldots, N_k + \alpha_k)$

where the likelihood $P(D \mid \boldsymbol{\theta}, \xi)$ is the multinomial above. MAP estimate:

  $\theta_{i,MAP} = \frac{N_i + \alpha_i - 1}{N + \sum_{i=1}^{k} (\alpha_i - 1)}$
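
To make these estimates concrete, here is a minimal Python sketch (my own illustration, not part of the lecture) that reproduces the coin example's ML and MAP numbers and applies the Dirichlet-MAP formula; the function names and the die-roll data are assumptions.

```python
from collections import Counter

# Coin data from the MAP example above: 15 heads, 10 tails.
tosses = "H H T T H H T H T H T T T H T H H H H T H H H H T".split()
n1, n2 = tosses.count("H"), tosses.count("T")  # N1 = 15, N2 = 10

def coin_ml(n1, n2):
    """ML estimate of the head probability: N1 / (N1 + N2)."""
    return n1 / (n1 + n2)

def coin_map(n1, n2, a1, a2):
    """MAP estimate (posterior mode) under a Beta(a1, a2) prior."""
    return (n1 + a1 - 1) / (n1 + n2 + a1 + a2 - 2)

print(coin_ml(n1, n2))          # 0.6 (= 15/25)
print(coin_map(n1, n2, 5, 5))   # 19/33 ≈ 0.576
print(coin_map(n1, n2, 5, 20))  # 19/48 ≈ 0.396

def multinomial_map(counts, alphas):
    """MAP estimates under a Dirichlet(alphas) prior:
    (N_i + a_i - 1) / (N + sum_i (a_i - 1))."""
    denom = sum(counts) + sum(a - 1 for a in alphas)
    return [(n_i + a_i - 1) / denom for n_i, a_i in zip(counts, alphas)]

# Hypothetical die rolls (not from the lecture), uniform Dirichlet(2, ..., 2) prior.
rolls = [1, 3, 3, 6, 2, 3, 5, 1]
counts = [Counter(rolls)[face] for face in range(1, 7)]
print(multinomial_map(counts, [2] * 6))  # the estimates sum to 1
```

Setting all $\alpha_i = 1$ makes the MAP estimate collapse to the ML estimate $N_i / N$, which matches the reading of the prior parameters as pseudo-counts.
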
Learning complex distributions

• The problem of learning a complex distribution can sometimes be reduced to the problem of learning a number of simpler distributions.
• Such a decomposition occurs, for example, in Bayesian networks: it builds upon the independences encoded in the network.
• Why learn Bayesian belief networks (BBNs)? Large databases are available, so we can uncover important probabilistic dependencies from data and use them in inference tasks.

Learning of BBN parameters

Learning proceeds in two steps:
– learning of the network structure;
– learning of the parameters of the conditional probabilities.
Variables:
– Observable: values present in every data sample.
– Hidden: values never present in the sample.
– Missing values: values sometimes present, sometimes not.
Here we learn the parameters for a fixed graph structure, and all variables are observed in the dataset.

Learning of BBN parameters. Example.

Example network: Pneumonia is the parent of Fever, Cough, Paleness, and High WBC (white blood cell count). The parameters to learn are the tables P(Pneumonia), P(Fever | Pneum), P(Cough | Pneum), P(Palen | Pneum), and P(HWBC | Pneum); all of their entries are initially unknown (?).

Data D (different patient cases):

Pal  Fev  Cou  HWB  Pneu
T    T    T    T    F
T    F    F    F    F
F    F    T    T    T
F    F    T    F    T
F    T    T    T    T
T    F    T    F    F
F    F    F    F    F
T    T    F    F    F
T    T    T    T    T
F    T    F    T    T
T    F    F    T    F
F    T    F    F    F

Estimates of the parameters of a BBN

• Much like multiple coin tosses or rolls of a die.
• A "smaller" learning problem corresponds to the learning of exactly one conditional distribution, e.g. $P(Fever \mid Pneumonia = T)$.
• Problem: how do we pick the data to learn from?

Learning of BBN parameters. Example.

How to estimate $P(Fever \mid Pneumonia = T)$ from the data D above?
Step 1: Select the data points with Pneumonia = T.
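
This selection-and-counting step reduces to a few lines of code. The following Python sketch is my own, not the lecture's; the preview cuts off after Step 1, so the final count here follows the coin-style ML recipe from earlier in the lecture rather than the slide itself.

```python
# Patient cases from the data table above, as (Pal, Fev, Cou, HWB, Pneu) tuples.
rows = [
    ("T","T","T","T","F"), ("T","F","F","F","F"), ("F","F","T","T","T"),
    ("F","F","T","F","T"), ("F","T","T","T","T"), ("T","F","T","F","F"),
    ("F","F","F","F","F"), ("T","T","F","F","F"), ("T","T","T","T","T"),
    ("F","T","F","T","T"), ("T","F","F","T","F"), ("F","T","F","F","F"),
]
PAL, FEV, COU, HWB, PNEU = range(5)  # column indices

# Step 1: select the data points with Pneumonia = T.
pneu_true = [r for r in rows if r[PNEU] == "T"]

# Count Fever = T among them; the ML estimate is the fraction, as in the coin example.
n_fever = sum(r[FEV] == "T" for r in pneu_true)
print(len(pneu_true), n_fever, n_fever / len(pneu_true))  # 5 cases, 3 with fever -> 0.6
```

Every other table in the network is filled in the same way: fix the parent assignment (here Pneumonia = T or F), select the matching rows, and count the child variable's values.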

