Probability

Probability
• Random variables
• Atomic events
• Sample space

RVs: variables whose values are (potentially) uncertain
  • tomorrow's weather (rain/sun), change in AAPL stock price (up/same/dn), grade on HW1 (0..100)
  • discrete for now
atomic event: setting for *all* rvs of interest
  • w=rainy & AAPL=down & HW1=93
sample space: Omega = set of all atomic events

Probability
• Events
• Combining events

events: weather = rainy; grade = 93/100; grade >= 90
  • an event is a set of atomic events
combining: and, or, not = intersection, union, set difference
  • w = rainy, AAPL != dn
  • (note: "," means AND)

Probability
• Measure: fn mu from 2^Omega -> R+ (subsets of sample space to reals >= 0)
  [note: R++ means +ve reals]
• disjoint union: additive: for disjoint events e1, e2, ..., e_k: mu(union e_i) = sum(mu(e_i))
  • implies mu(empty-set) = 0
• e.g.: counting (mu(S) = |S|)
• interpretation: "size" of set
• Distribution: a measure under which Omega measures 1
• interpretation: probability of set
• e.g.: uniform (1/|Omega| on each singleton)

Example

Weather   AAPL price
          up     same   down
sun       0.09   0.15   0.06
rain      0.21   0.35   0.14

note that they sum to 1
note we only need to list atomic events
work out P(sun & ~down) = .09 + .15 = .24
  • used disjoint union

    >> [.3; .7] * [.3 .5 .2]
        0.0900    0.1500    0.0600
        0.2100    0.3500    0.1400

Bigger example
(one table per location: LAX, PIT)

Weather   AAPL price
          up     same   down
sun       0.03   0.05   0.02
rain      0.07   0.12   0.05

Weather   AAPL price
          up     same   down
sun       0.14   0.23   0.09
rain      0.06   0.10   0.04

calculate P(up) = .03 + .07 + .14 + .06 = .3
P(down & sun) = .02 + .09 = .11

    >> [.3; .7] * [.3 .5 .2] * (1/3)
    ans =
        0.0300    0.0500    0.0200
        0.0700    0.1167    0.0467
    >> [.7; .3] * [.3 .5 .2] * (2/3)
    ans =
        0.1400    0.2333    0.0933
        0.0600    0.1000    0.0400

Notation
• X=x: event that r.v. X is realized as value x
• P(X=x) means probability of event X=x
• if clear from context, may omit "X="
  • instead of P(Weather=rain), just P(rain)
• complex events too: e.g., P(X=x, Y≠y)
• P(X) means a function: x → P(X=x)

P: under some distribution understood from context -- may write P_theta if there are parameters theta

Functions of RVs
• Extend definition: any deterministic function of RVs is also an RV
• E.g., 3[sunny] + 5[same]:

Weather   AAPL price
          up   same   down
sun       3    8      3
rain      0    5      0

note bracket notation: *indicator* of event

Sample v. population
• Suppose we watch for 100 days and count up our observations

Distribution:

Weather   AAPL price
          up     same   down
sun       0.09   0.15   0.06
rain      0.21   0.35   0.14

Observed counts (actual matlab-generated sample):

Weather   AAPL price
          up   same   down
sun       7    12     3
rain      22   41     15

note: if we normalize, get similar but not same dist'n as we started with

Law of large numbers
• If we take a sample of size N from distribution P, count up frequencies of atomic events, and normalize (divide by N) to get a distribution P~
• Then P~ → P as N → ∞

this and related properties are what allow learning from samples

Working w/ distributions
• Marginals
• Joint

marginal: get rid of an rv, get dist'n as if it weren't there
joint: before marginalization (to distinguish)

Marginals

Weather   AAPL price
          up     same   down
sun       0.09   0.15   0.06
rain      0.21   0.35   0.14

marginals: [.3 .7] (Weather) and [.3 .5 .2] (AAPL)
notation: P(Weather) or P(AAPL)

Marginals
(one table per location: LAX, PIT)

Weather   AAPL price
          up     same   down
sun       0.03   0.05   0.02
rain      0.07   0.12   0.05

Weather   AAPL price
          up     same   down
sun       0.14   0.23   0.09
rain      0.06   0.10   0.04

marginalize out location, then AAPL:
    0.17    0.28    0.11
    0.13    0.22    0.09
then [.56 .44]

if we had marginalized location then weather:
    0.30    0.50    0.20

Law of total probability
• Two RVs, X and Y
• Y has values y1, y2, …, yk
• P(X) = P(X, Y=y1) + P(X, Y=y2) + … + P(X, Y=yk)

Working w/ distributions
• Conditional:
  • Observation
  • Consistency
  • Renormalization
• Notation:

Weather   Coin
          H      T
sun       0.15   0.15
rain      0.35   0.35

observation: an event that happened, or that we imagine happened -- e.g., coin H
consistency: zero out impossibilities
  • note: every atomic event is either perfectly consistent or completely inconsistent w/ observed event
renorm: makes a distribution again
notation: P(Weather | Coin=H) or P(sun | H)
  • conditioning bar -- read as "given"

Conditionals in the literature
"When you have eliminated the impossible, whatever remains, however improbable, must be the truth."
-- Sir Arthur Conan Doyle, as Sherlock Holmes

Conditionals
(one table per location: LAX, PIT)

Weather   AAPL price
          up     same   down
sun       0.03   0.05   0.02
rain      0.07   0.12   0.05

Weather   AAPL price
          up     same   down
sun       0.14   0.23   0.09
rain      0.06   0.10   0.04

condition on sun: P(sun) = .56

    >> [.03 .05 .02; .14 .23 .09] / .56
    ans = (table of location by AAPL)
        0.0536    0.0893    0.0357
        0.2500    0.4107    0.1607

now condition on AAPL=up: normalize the "up" column, .0536 / (.0536 + .25) and .25 / (.0536 + .25)
location: 3/17, 14/17 (about 0.18, 0.82)

In general
• Zero out all but some slice of high-D table
  • or an irregular set of entries
• Throw away zeros
  • unless irregular structure makes it inconvenient
• Renormalize
  • normalizer for P(. | event) is P(event)

Conditionals
• Thought experiment: what happens if we condition on an event of zero probability?

answer: undefined
  • Not useful to ask what happens in an impossible situation, so NaN is not a problem.

Notation
• P(X | Y) is a function: x, y → P(X=x | Y=y)
• As is standard, expressions are evaluated separately for each realization:
  • P(X | Y) P(Y) means the function x, y → P(X=x | Y=y) P(Y=y)

Exercise
Monty Hall paradox

prize behind one door, other 2 empty (uniform)
say we pick #1; 3 cases: T1, T2, T3 (1/3 each)
  • T1: O2 or O3, equally
  • T2: O3
  • T3: O2
observe
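
One way to work the exercise is with the zero-out-and-renormalize recipe from the conditioning slides. The sketch below is in Python rather than the MATLAB used in the notes. The observation is cut off in the notes, so the code assumes it is "host opens door 3" (O3). The names condition, joint, and p_prize are hypothetical helpers for illustration, not part of the original deck.

```python
from fractions import Fraction


def condition(joint, is_consistent):
    """Drop atomic events inconsistent with the observation, then renormalize."""
    kept = {event: p for event, p in joint.items() if is_consistent(event)}
    normalizer = sum(kept.values())  # this is P(observation)
    if normalizer == 0:
        # Conditioning on a zero-probability event is undefined.
        raise ValueError("cannot condition on an event of probability zero")
    return {event: p / normalizer for event, p in kept.items()}


# Atomic events are pairs (T, O): prize behind door T, host opens door O,
# given that we picked door 1. Probabilities follow the cases in the notes.
third = Fraction(1, 3)
joint = {
    (1, 2): third * Fraction(1, 2),  # T1: host opens 2 or 3, equally
    (1, 3): third * Fraction(1, 2),
    (2, 3): third,                   # T2: host must open 3
    (3, 2): third,                   # T3: host must open 2
}

# ASSUMPTION: the observation is that the host opens door 3.
posterior = condition(joint, lambda event: event[1] == 3)

# Marginalize out O to get the distribution over the prize door T.
p_prize = {}
for (t, _o), p in posterior.items():
    p_prize[t] = p_prize.get(t, 0) + p

print(p_prize)
```

Under that assumed observation the normalizer is P(O3) = 1/2, and the result is P(T1 | O3) = 1/3 and P(T2 | O3) = 2/3.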