10-701 Fall 2011 Recitation
Probability Review
Suyash Shringarpure
09/13/11

What we will cover
- Basic probability
- Definitions and Axioms
- Random Variables
- PDF and CDF
- Joint distributions
- Some common distributions
- Independence
- Conditional distributions
- Information theory basics
- Application to decision trees
- Overfitting and pruning

Probability
- Real world: full of uncertainty
  - E.g., I have to reach home by 7:30 pm. Can I take the 7:15 pm 61C at CMU and reach in time?
  - How much time will the bus take after I get on it (possible delays due to traffic, roads, etc.)?
  - What if the bus arrives late?
- Probability: a mechanism for decision making in the presence of uncertainty
- Probability is a way of using information about a population to learn about a sample

Why use probability?
- There have been attempts to develop different methodologies for uncertainty:
  - Fuzzy logic
  - Qualitative reasoning
  - Qualitative physics
- In 1931, de Finetti proved that if you gamble using probability, you can't be unfairly exploited by an opponent using some other system.

Basic Concepts
- A sample space S is the set of all possible outcomes of a conceptual or physical, repeatable experiment. (S can be finite or infinite.)
  - E.g., S may be the set of all possible outcomes of a dice roll.
- An event A is any subset of S.
  - E.g., A = event that the dice roll is 3.

Probability
- A probability P(A) is a function that maps an event A onto the interval [0, 1]. P(A) is also called the probability measure or probability mass of A.

[Figure: the sample space of all possible worlds, called E, drawn with area 1. An oval inside it holds the worlds in which A is true; the rest are the worlds in which A is false. P(A) is the area of the oval.]

Kolmogorov Axioms
1. All probabilities are non-negative: $P(A) \ge 0$ for all $A$
2. $P(E) = 1$
3. $P(A_1 \cup A_2 \cup \dots) = \sum_{i=1}^{\infty} P(A_i)$ if the $A_i$ are pairwise disjoint, i.e. $A_i \cap A_j = \emptyset$ for all $i \neq j$

All other results about probability derive from these axioms.

Consequences of Axioms
- $P(\emptyset) = 0$
- $P(A^c) = 1 - P(A)$
- $P(A) \le P(B)$ if $A$ is a subset of $B$
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

[Figure: diagram of events A and B]

Random Variable

A random variable is a function that
associates a unique number with every outcome of an experiment.
- Discrete r.v.:
  - The outcome of a dice roll: D = {1, 2, 3, 4, 5, 6}
  - Binary event and indicator variable: seeing a "6" on a toss gives X = 1, otherwise X = 0. This describes the true or false outcome of a random event.
- Continuous r.v.:
  - The outcome of observing the measured location of an aircraft, $X_{obs}$

Probability distributions
- For each value that r.v. X can take, assign a number in [0, 1] (like the probability measure defined earlier).
- Suppose X takes values $v_1, \dots, v_n$. Then $P(X = v_1) + \dots + P(X = v_n) = 1$.
- Intuitively, the probability of X taking value $v_i$ is the frequency of getting the outcome represented by $v_i$.

Discrete Distributions
- Bernoulli distribution: Ber(p)
  $P(x) = 1 - p$ for $x = 0$, and $P(x) = p$ for $x = 1$
  Equivalently, $P(x) = p^x (1 - p)^{1 - x}$
- Binomial distribution: Bin(n, p)
  - Suppose a coin with head probability p is tossed n times. What is the probability of getting k heads?
  - How many ways can you get k heads in a sequence of k heads and n - k tails?
  - There are $\binom{n}{k}$ such sequences, each with probability $p^k (1 - p)^{n - k}$, so $P(k) = \binom{n}{k} p^k (1 - p)^{n - k}$.

More distributions
- Multinomial
  - Consider a K-sided die. It is similar to a coin, but with more possible outcomes.
  - The die is tossed n times. What is the probability of getting $x_1$ ones, $x_2$ twos, ..., $x_K$ K's?
  - Let $x = (x_1, x_2, \dots, x_K)$ and let $\theta_k$ be the probability of face k. Then
    $p(x) = \frac{n!}{x_1! \, x_2! \cdots x_K!} \, \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_K^{x_K} = \frac{n!}{x_1! \, x_2! \cdots x_K!} \, \theta^{x}$

Continuous Prob. Distribution
- A continuous random variable X is defined on a continuous sample space: an interval on the real line, a region in a high-dimensional space, etc.
  - X usually corresponds to a real-valued measurement of some property, e.g., length or position.
  - It is meaningless to talk about the probability of the random variable assuming a particular value: $P(x) = 0$.
  - Instead, we talk about the probability of the random variable assuming a value within a given interval, half-interval, or arbitrary Boolean combination of basic propositions:
    - $P(X \in [x_1, x_2])$
    - $P(X < x) = P(X \in [-\infty, x])$
    - $P(X \in [x_1, x_2] \cup [x_3, x_4])$

Probability Density
- If the probability of x falling into $[x, x + dx]$ is given by $p(x)\,dx$ for $dx \to 0$, then $p(x)$ is called the probability density function over x.
- The probability of the random variable assuming a value within some given interval from $x_1$ to $x_2$ is equivalent to the area under the
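graph of the probability density function between $x_1$ and $x_2$.
- Probability mass: $P(X \in [x_1, x_2]) = \int_{x_1}^{x_2} p(x)\,dx$, with $\int_{-\infty}^{\infty} p(x)\,dx = 1$

[Figure: Gaussian distribution]

Continuous Distributions
- Uniform Density Function
  $p(x) = \frac{1}{b - a}$ for $a \le x \le b$, and $p(x) = 0$ elsewhere
- Normal (Gaussian) Density Function
  $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-(x - \mu)^2 / 2\sigma^2}$
  - The distribution is symmetric and is often illustrated as a bell-shaped curve.
  - Two parameters, $\mu$ (mean) and $\sigma$ (standard deviation), determine the location and shape of the distribution.
  - The highest point on the normal curve is at the mean, which is also the median and mode.

[Figure: normal density curve, axes x and f(x)]

Back to RVs: CDF
- Cumulative Distribution Function
- In a single dice roll, what is the probability of the number rolled being at most 4?
  $P(x \le 4) = P(x = 1 \text{ OR } x = 2 \text{ OR } x = 3 \text{ OR } x = 4)$
- But that is the same as $P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4)$, because the four outcomes are mutually exclusive.
- A function to represent this quantity is called the Cumulative Distribution Function:
  $F_X(x) = P(X \le x)$

CDF details
- Definition for a continuous probability function:
  $F_X(x) = P(X \le x) = \int_{-\infty}^{x} p(x')\,dx'$
- Property of a continuous CDF:
  $p(x) = \frac{d}{dx} F_X(x)$
- Does it have any monotonicity property?

Statistical Characterizations
- Expectation: the centre of mass, mean, first moment
  $E(X) = \sum_{i \in S} x_i \, p(x_i)$ (discrete)
  $E(X) = \int_{-\infty}^{\infty} x \, p(x)\,dx$ (continuous)
  - Sample mean
- Variance: the spread
  $Var(X) = \sum_{x_i \in S} [x_i - E(X)]^2 \, p(x_i)$ (discrete)
  $Var(X) = \int_{-\infty}^{\infty} [x - E(X)]^2 \, p(x)\,dx$ (continuous)
  - Sample variance

Elementary manipulations of probabilities
- Set probability of a multi-valued r.v.:
  $P(\{x = \text{Odd}\}) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2$
  $P(X = x_1 \vee X = x_2 \vee \dots \vee X = x_i) = \sum_{j=1}^{i} P(X = x_j)$
- Multi-variate distribution:
  - Joint probability: $P(X = \text{true} \wedge Y = \text{true})$
    $P\big(Y \wedge (X = x_1 \vee X = x_2 \vee \dots \vee X = x_i)\big) = \sum_{j=1}^{i} P(Y \wedge X = x_j)$
  - Marginal probability: $P(Y) = \sum_{j \in S} P(Y \wedge X = x_j)$

[Figure: diagram of overlapping events X and Y]

Joint Probability
- A joint probability distribution for a set of RVs gives the probability of every atomic event (sample point).
- P(Flu, DrinkBeer) = a 2 × 2 matrix of values:

          B        ¬B
   F    0.005    0.02
  ¬F    0.195    0.78

- P(Flu) = P(Flu, DrinkBeer) + P(Flu, ¬DrinkBeer). How? 0.005 + 0.02 = 0.025 …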
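The marginalization step for the Flu/DrinkBeer joint distribution given above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original recitation: the names `joint`, `marginal`, `p_flu`, and `p_beer` are hypothetical, and the only inputs are the four joint probabilities from the 2 × 2 matrix.

```python
# Joint distribution P(Flu, DrinkBeer), keyed by (flu, drink_beer) truth values.
# The four numbers are the entries of the 2 x 2 matrix in the text.
joint = {
    (True, True): 0.005,
    (True, False): 0.02,
    (False, True): 0.195,
    (False, False): 0.78,
}


def marginal(joint, index):
    """Sum out every variable except the one at position `index`.

    Hypothetical helper for illustration: it applies
    P(Y) = sum_j P(Y and X = x_j) to a joint table stored as a dict.
    """
    result = {}
    for outcome, prob in joint.items():
        value = outcome[index]
        result[value] = result.get(value, 0.0) + prob
    return result


p_flu = marginal(joint, 0)   # marginal over Flu
p_beer = marginal(joint, 1)  # marginal over DrinkBeer

print(round(p_flu[True], 3))   # P(Flu) = 0.005 + 0.02
print(round(p_beer[True], 3))  # P(DrinkBeer) = 0.005 + 0.195
```

Storing the joint as a dictionary keyed by outcome tuples keeps the sketch close to the definition of an atomic event; the same helper works for any number of variables as long as every key has the same length.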