Pitt CS 2710 - Learning probability distributions

CS 2710 Foundations of AI
Lecture 25: Learning probability distributions
Milos Hauskrecht, [email protected], Sennott Square

Density estimation

Data: $D = \{D_1, D_2, \ldots, D_n\}$, where each $D_i = \mathbf{x}_i$ is a vector of attribute values.
Attributes: modeled by random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$ with:
• continuous values
• discrete values
E.g., blood pressure with numerical values, or chest pain with discrete values [no-pain, mild, moderate, strong].
Underlying true probability distribution: $p(\mathbf{X})$.

Density estimation

Objective: estimate the underlying true probability distribution $p(\mathbf{X})$ over the variables $\mathbf{X}$ using the examples in $D$:

  true distribution $p(\mathbf{X})$ → $n$ samples $D = \{D_1, D_2, \ldots, D_n\}$ → estimate $\hat{p}(\mathbf{X})$

Standard (iid) assumptions: the samples
• are independent of each other;
• come from the same (identical) distribution (a fixed $p(\mathbf{X})$).

Learning via parameter estimation

In this lecture we consider parametric density estimation. Basic settings:
• a set of random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$;
• a model of the distribution over the variables in $\mathbf{X}$ with parameters $\Theta$;
• data $D = \{D_1, D_2, \ldots, D_n\}$.
Objective: find the parameters $\hat{\Theta}$ that fit the data best, in other words that minimize the misfit between the data and the model.
• What is the best set of parameters? There are various criteria one can apply here.

Parameter estimation. Basic criteria.

• Maximum likelihood (ML) criterion: maximize the likelihood of the data,
  $\arg\max_{\Theta} p(D \mid \Theta, \xi)$,
  where $\xi$ represents prior (background) knowledge.
• Maximum a posteriori probability (MAP) criterion: maximize the posterior probability,
  $\arg\max_{\Theta} p(\Theta \mid D, \xi)$, where
  $p(\Theta \mid D, \xi) = \frac{p(D \mid \Theta, \xi)\, p(\Theta \mid \xi)}{p(D \mid \xi)}$.
  MAP selects the mode of the posterior.

Parameter estimation. Coin example.

Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that
• head: $x_i = 1$
• tail: $x_i = 0$
Model: the probability of a head is $\theta$, the probability of a tail is $(1 - \theta)$.
Objective: estimate the probability of a head, $\hat{\theta}$, from the data.

Maximum a posteriori probability

The maximum a posteriori estimate selects the mode of the posterior distribution. With $N_1$ heads, $N_2$ tails, and a $Beta(\theta \mid \alpha_1, \alpha_2)$ prior, the posterior is

  $p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, Beta(\theta \mid \alpha_1, \alpha_2)}{P(D \mid \xi)} = Beta(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2) = \frac{\Gamma(N_1 + N_2 + \alpha_1 + \alpha_2)}{\Gamma(N_1 + \alpha_1)\, \Gamma(N_2 + \alpha_2)}\, \theta^{N_1 + \alpha_1 - 1} (1 - \theta)^{N_2 + \alpha_2 - 1}$

MAP solution:

  $\theta_{MAP} = \frac{N_1 + \alpha_1 - 1}{N_1 + N_2 + \alpha_1 + \alpha_2 - 2}$

Notice that the parameters of the prior act like counts of heads and tails (they are sometimes also referred to as prior counts).

MAP estimate example

• Note that the prior and the data fit (data likelihood) are combined.
• The MAP estimate can be biased by large prior counts; it is hard to overturn them with a smaller sample size.
• Data: H H T T H H T H T H T T T H T H H H H T H H H H T
  – Heads: 15
  – Tails: 10
• Assume $p(\theta \mid \xi) = Beta(\theta \mid 5, 5)$: then $\theta_{MAP} = 19/33 \approx 0.58$.
• Assume $p(\theta \mid \xi) = Beta(\theta \mid 5, 20)$: then $\theta_{MAP} = 19/48 \approx 0.40$.

Multinomial distribution

Example: multi-way coin toss, roll of a die.
Data: a set of $N$ outcomes (a multi-set); $N_i$ is the number of times outcome $i$ has been seen.
Model parameters: $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)$ s.t. $\sum_{i=1}^{k} \theta_i = 1$, where $\theta_i$ is the probability of outcome $i$.
Probability of the data (likelihood), the multinomial distribution:

  $P(N_1, N_2, \ldots, N_k \mid \boldsymbol{\theta}, \xi) = \frac{N!}{N_1!\, N_2! \cdots N_k!}\, \theta_1^{N_1} \theta_2^{N_2} \cdots \theta_k^{N_k}$

ML estimate: $\theta_{i,ML} = \frac{N_i}{N}$.

MAP estimate

Choice of prior: the Dirichlet distribution,

  $Dir(\boldsymbol{\theta} \mid \alpha_1, \ldots, \alpha_k) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}\, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}$

The Dirichlet is the conjugate choice for the multinomial, so the posterior distribution is again a Dirichlet:

  $p(\boldsymbol{\theta} \mid D, \xi) = \frac{P(D \mid \boldsymbol{\theta}, \xi)\, Dir(\boldsymbol{\theta} \mid \alpha_1, \ldots, \alpha_k)}{P(D \mid \xi)} = Dir(\boldsymbol{\theta} \mid N_1 + \alpha_1, \ldots, N_k + \alpha_k)$

where the likelihood $P(D \mid \boldsymbol{\theta}, \xi)$ is the multinomial above. MAP estimate:

  $\theta_{i,MAP} = \frac{N_i + \alpha_i - 1}{N + \sum_{i=1}^{k} (\alpha_i - 1)}$
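
To make these estimates concrete, here is a minimal Python sketch (my own illustration, not part of the lecture) that reproduces the coin example's ML and MAP numbers and applies the Dirichlet-MAP formula; the function names and the die-roll data are assumptions.

```python
from collections import Counter

# Coin data from the MAP example above: 15 heads, 10 tails.
tosses = "H H T T H H T H T H T T T H T H H H H T H H H H T".split()
n1, n2 = tosses.count("H"), tosses.count("T")  # N1 = 15, N2 = 10

def coin_ml(n1, n2):
    """ML estimate of the head probability: N1 / (N1 + N2)."""
    return n1 / (n1 + n2)

def coin_map(n1, n2, a1, a2):
    """MAP estimate (posterior mode) under a Beta(a1, a2) prior."""
    return (n1 + a1 - 1) / (n1 + n2 + a1 + a2 - 2)

print(coin_ml(n1, n2))          # 0.6 (= 15/25)
print(coin_map(n1, n2, 5, 5))   # 19/33 ≈ 0.576
print(coin_map(n1, n2, 5, 20))  # 19/48 ≈ 0.396

def multinomial_map(counts, alphas):
    """MAP estimates under a Dirichlet(alphas) prior:
    (N_i + a_i - 1) / (N + sum_i (a_i - 1))."""
    denom = sum(counts) + sum(a - 1 for a in alphas)
    return [(n_i + a_i - 1) / denom for n_i, a_i in zip(counts, alphas)]

# Hypothetical die rolls (not from the lecture), uniform Dirichlet(2, ..., 2) prior.
rolls = [1, 3, 3, 6, 2, 3, 5, 1]
counts = [Counter(rolls)[face] for face in range(1, 7)]
print(multinomial_map(counts, [2] * 6))  # the estimates sum to 1
```

Setting all $\alpha_i = 1$ makes the MAP estimate collapse to the ML estimate $N_i / N$, which matches the reading of the prior parameters as pseudo-counts.
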
Learning complex distributions

• The problem of learning a complex distribution can sometimes be reduced to the problem of learning a number of simpler distributions.
• Such a decomposition occurs, for example, in Bayesian networks: it builds upon the independences encoded in the network.
• Why learn Bayesian belief networks (BBNs)? Large databases are available, so we can uncover important probabilistic dependencies from data and use them in inference tasks.

Learning of BBN parameters

Learning proceeds in two steps:
– learning of the network structure;
– learning of the parameters of the conditional probabilities.
Variables:
– Observable: values present in every data sample.
– Hidden: values never present in the sample.
– Missing values: values sometimes present, sometimes not.
Here we learn the parameters for a fixed graph structure, and all variables are observed in the dataset.

Learning of BBN parameters. Example.

Example network: Pneumonia is the parent of Fever, Cough, Paleness, and High WBC (white blood cell count). The parameters to learn are the tables P(Pneumonia), P(Fever | Pneum), P(Cough | Pneum), P(Palen | Pneum), and P(HWBC | Pneum); all of their entries are initially unknown (?).

Data D (different patient cases):

Pal  Fev  Cou  HWB  Pneu
T    T    T    T    F
T    F    F    F    F
F    F    T    T    T
F    F    T    F    T
F    T    T    T    T
T    F    T    F    F
F    F    F    F    F
T    T    F    F    F
T    T    T    T    T
F    T    F    T    T
T    F    F    T    F
F    T    F    F    F

Estimates of the parameters of a BBN

• Much like multiple coin tosses or rolls of a die.
• A "smaller" learning problem corresponds to the learning of exactly one conditional distribution, e.g. $P(Fever \mid Pneumonia = T)$.
• Problem: how do we pick the data to learn from?

Learning of BBN parameters. Example.

How to estimate $P(Fever \mid Pneumonia = T)$ from the data D above?
Step 1: Select the data points with Pneumonia = T.
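
This selection-and-counting step reduces to a few lines of code. The following Python sketch is my own, not the lecture's; the preview cuts off after Step 1, so the final count here follows the coin-style ML recipe from earlier in the lecture rather than the slide itself.

```python
# Patient cases from the data table above, as (Pal, Fev, Cou, HWB, Pneu) tuples.
rows = [
    ("T","T","T","T","F"), ("T","F","F","F","F"), ("F","F","T","T","T"),
    ("F","F","T","F","T"), ("F","T","T","T","T"), ("T","F","T","F","F"),
    ("F","F","F","F","F"), ("T","T","F","F","F"), ("T","T","T","T","T"),
    ("F","T","F","T","T"), ("T","F","F","T","F"), ("F","T","F","F","F"),
]
PAL, FEV, COU, HWB, PNEU = range(5)  # column indices

# Step 1: select the data points with Pneumonia = T.
pneu_true = [r for r in rows if r[PNEU] == "T"]

# Count Fever = T among them; the ML estimate is the fraction, as in the coin example.
n_fever = sum(r[FEV] == "T" for r in pneu_true)
print(len(pneu_true), n_fever, n_fever / len(pneu_true))  # 5 cases, 3 with fever -> 0.6
```

Every other table in the network is filled in the same way: fix the parent assignment (here Pneumonia = T or F), select the matching rows, and count the child variable's values.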

