1Eric Xing, ML/CMU1Machine LearningMachine Learning1010--701/15701/15--781, Fall 2006781, Fall 2006Tutorial on Basic ProbabilityTutorial on Basic ProbabilityEric XingEric XingLecture 2, September 14, 2006Reading: Chap. 1&2, CB & Chap 5,6, TMµµxxff((xx))µµxxff((xx))Eric Xing, ML/CMU2What is this?z Classical AI and ML research ignored this phenomena z The Problem (an example): z you want to catch a flight at 10:00am from Pitt to SF, can I make it if I leave at 7am and take a 28X at CMU?z partial observability (road state, other drivers' plans, etc.)z noisy sensors (radio traffic reports)z uncertainty in action outcomes (flat tire, etc.)z immense complexity of modeling and predicting trafficz Reasoning under uncertainty!2Eric Xing, ML/CMU3Basic Probability Conceptsz A sample spaceSis the set of all possible outcomes of a conceptual or physical, repeatable experiment. (Scan be finite or infinite.)z E.g., Smay be the set of all possible outcomes of a dice roll: z E.g., Smay be the set of all possible nucleotides of a DNA site: z E.g., Smay be the set of all possible positions time-space positions of a aircraft on a radar screen: z An eventAis the any subsetS :z Seeing "1" or "6" in a roll; observing a "G" at a site; UA007 in space-time interval Xz An event spaceEis the possible worlds the outcomes can happenz All dice-rolls, reading a genome, monitoring the radar signal{}GC,T,A,≡S{}61,2,3,4,5,≡S},{},{},{omax+∞××≡ 036000RSEric Xing, ML/CMU4Visualizing Probability Spacez A probability space is a sample space of which, for every subset s∈S, there is an assignment P(s)∈Ssuch that:z 0≤P(s) ≤1z Σs∈SP(s)=1zP(s) is called the probability (or probability mass) of sWorlds in which A is trueWorlds in which A is falseP(a)is the area of the ovalEvent space of all possible worlds. Its area is 13Eric Xing, ML/CMU5Kolmogorov Axiomsz All probabilities are between 0 and 1z 0 ≤ P(X) ≤ 1z P(true) = 1z regardless of the event, my outcome is truez P(false)=0 z no event makes my outcome truez The probability of a disjunction is given byz P(A ∨B) = P(A) + P(B) − P(A ∧ B)ABA∧B¬A∧¬BA∨B ?Eric Xing, ML/CMU6Why use probability?z There have been attempts to develop different methodologies for uncertainty:z Fuzzy logicz Qualitative reasoning (Qualitative physics)z …z“Probability theory is nothing but common sense reduced to calculation”z — Pierre Laplace, 1812.zIn 1931, de Finetti proved that it is irrational to have beliefs that violate these axioms, in the following sense:z If you bet in accordance with your beliefs, but your beliefs violate the axioms, then you can be guaranteed to lose money to an opponent whose beliefs more accurately reflect the true state of the world. (Here, “betting” and “money” are proxies for “decision making” and “utilities”.)zWhat if you refuse to bet? This is like refusing to allow time to pass: every action (including inaction) is a bet4Eric Xing, ML/CMU7Random Variablez A random variable is a function that associates a unique numerical value (a token) with every outcome of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated) z Discrete r.v.:z The outcome of a dice-rollz The outcome of reading a nt at site i: z Binary event and indicator variable:z Seeing an "A" at a site ⇒ X=1, o/wX=0. z This describes the true or false outcome a random event.z Can we describe richer outcomes in the same way? (i.e., X=1, 2, 3, 4, for being A, C, G, T) --- think about what would happen if we take average of X.z Unit-Base Random vectorXi=[Xi A, Xi T, Xi G, Xi C]', Xi=[0,0,1,0]' ⇒ seeing a "G" at site i z Continuous r.v.:z The outcome of recording the true location of an aircraft: z The outcome of observing the measured location of an aircraftωωSSX(ω)iXtrueXobsXXEric Xing, ML/CMU8Discrete Prob. Distributionz (In the discrete case), a probability distribution P on S(and hence on the domain of X ) is an assignment of a non-negative real number P(s) to each s∈S(or each valid value of x) such that Σs∈SP(s)=1. (0≤P(s) ≤1)z intuitively, P(s) corresponds to the frequency (or the likelihood) of getting sin the experiments, if repeated many timesz call θs= P(s) the parameters in a discrete probability distributionz A probability distribution on a sample space is sometimes calleda probability model, in particular if several different distributions are under considerationz write models as M1, M2, probabilities as P(X|M1), P(X|M2)z e.g., M1may be the appropriate prob. dist. if X is from "fair dice", M2is for the "loaded dice". z M is usually a two-tuple of {dist. family, dist. parameters}5Eric Xing, ML/CMU9z Bernoulli distribution: Ber(p)z Multinomial distribution: Mult(1,θ)z Multinomial (indicator) variable: . 1 , w.p.1 1 and ],1,0[ where , [1,...,6][1,...,6]654321====⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡=∑∑∈∈jjjjjjjXXXXXXXXXXθθ()xkxkxTxGxCxAjjkTGCAjXPjxpθθθθθθθ==×××====∏}face-dice index the where,1{))((Discrete Distributions⎩⎨⎧==−=101xpxpxPfor for )(xxppxP−−=11 )()(⇒Eric Xing, ML/CMU10z Multinomial distribution: Mult(n,θ)z Count variable:Discrete Distributions∑=⎥⎥⎥⎦⎤⎢⎢⎢⎣⎡=jjKnxxxX where , M1xKxKxxKxxxnxxxnxpKθθθθ!!!!!!!!)(LLL21212121==6Eric Xing, ML/CMU11Continuous Prob. Distributionz A continuous random variableXcan assume any value in an interval on the real line or in a region in a high dimensional spacez X usually corresponds to a real-valued measurements of some property, e.g., length, position, …z It is not possible to talk about the probability of the random variable assuming a particular value --- P(x) = 0z Instead, we talk about the probability of the random variable assuming a value within a given interval, or half intervalzzArbitrary Boolean combination of basic propositions [](), ,21xxXP∈()[]()xXPxXP,∞−∈=<Eric Xing, ML/CMU12Continuous Prob. Distributionz The probability of the random variable assuming a value within some given interval from x1to x2is defined to be the area under the graph of the probability density function betweenx1andx2.z Probability mass: note that z Cumulative distribution function (CDF):z Probability density function (PDF): ()∫∞−=<=xdxxpxXPxP')'()([](), )( ,∫=∈2121xxdxxpxxXP()xPdxdxp=)(. 1 )( =∫+∞∞−dxxpCar flow on Liberty Bridge (cooked up!)xxpdxxp∀>=∫+∞∞−,)( ; 1 )( 07Eric Xing, ML/CMU13What is the
View Full Document