Math 19b: Linear Algebra with Probability
Oliver Knill, Spring 2011

Lecture 23: Chebychev theorem

In this lecture we look at more probability distributions and prove the fantastically useful Chebychev's theorem.

Remember that a continuous probability density is a nonnegative function $f$ such that $\int_{\mathbb{R}} f(x)\,dx = 1$. A random variable $X$ has this probability density if
$$ P[X \in [a,b]] = \int_a^b f(x)\,dx $$
for all intervals $[a,b]$.

If we know the probability density of a random variable, we can compute all the important quantities like the expectation or the variance.

If $X$ has the probability density $f$, then $m = E[X] = \int x f(x)\,dx$ and $\mathrm{Var}[X] = \int (x-m)^2 f(x)\,dx$.

The distribution function of a random variable with probability density $f$ is defined as
$$ F(s) = \int_{-\infty}^{s} f(x)\,dx = P[X \le s]. $$
By definition $F$ is a monotone function: $F(b) \ge F(a)$ for $b \ge a$. One abbreviates the probability density function with PDF and the distribution function with CDF, which abbreviates cumulative distribution function.

1 The most important distribution on the real line is the normal distribution
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}. $$
It has mean $m$ and standard deviation $\sigma$. This is a probability density because after a change of variables $y = (x-m)/(\sqrt{2}\sigma)$, the integral $\int_{-\infty}^{\infty} f(x)\,dx$ becomes $\frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2}\,dy = 1$.

2 The most important distribution on the positive real line is the exponential distribution
$$ f(x) = \lambda e^{-\lambda x}. $$
Let's compute its mean:
$$ m = \int_0^{\infty} x f(x)\,dx = \frac{1}{\lambda}. $$
From $\lambda \int_0^{\infty} x^2 \exp(-\lambda x)\,dx = 2/\lambda^2$, we get the variance $2/\lambda^2 - 1/\lambda^2 = 1/\lambda^2$ and the standard deviation $1/\lambda$.

3 The most important distribution on a finite interval $[a,b]$ is the uniform distribution
$$ f(x) = 1_{[a,b]}(x)\, \frac{1}{b-a}, $$
where $1_I$ is the characteristic function
$$ 1_I(x) = \begin{cases} 1 & x \in I \\ 0 & x \notin I. \end{cases} $$

The following theorem is very important for estimation purposes. Despite the simplicity of its proof it has a lot of applications:

Chebychev theorem. If $X$ is a random variable with finite variance, then for every $c > 0$
$$ P[|X - E[X]| \ge c] \le \frac{\mathrm{Var}[X]}{c^2}. $$

Proof. The random variable $Y = X - E[X]$ has zero mean and the same variance.
We need only to show $P[|Y| \ge c] \le \mathrm{Var}[Y]/c^2$. Taking the expectation of the inequality
$$ c^2\, 1_{\{|Y| \ge c\}} \le Y^2 $$
gives
$$ c^2\, P[|Y| \ge c] \le E[Y^2] = \mathrm{Var}[Y], $$
finishing the proof.

The theorem also gives more meaning to the notion "variance" as a measure for the deviation from the mean. The following example is similar to the one in section 11.6 of Cliff's notes:

4 A die is rolled 144 times. What is the probability that the number 6 shows up 50 or more times? Let $X$ be the random variable which counts the number of times the number 6 appears. This random variable has a binomial distribution with $p = 1/6$ and $n = 144$. It has the expectation $E[X] = np = 144/6 = 24$ and the variance $\mathrm{Var}[X] = np(1-p) = 20$. Setting $c = 50 - 24 = 26$ in Chebychev, we get $P[|X - 24| \ge 26] \le 20/26^2 \approx 0.0296$. The chance is smaller than 3 percent. The actual value
$$ \sum_{k=50}^{144} \binom{144}{k} p^k (1-p)^{144-k} \approx 1.17 \cdot 10^{-7} $$
is much smaller. Chebychev does not necessarily give good estimates, but it is a handy and universal "rule of thumb".

Finally, let's look at a practical application of the use of the cumulative distribution function. It is the task to generate random variables with a given distribution:

5 Assume we want to generate random variables $X$ with a given distribution function $F$. Then $Y = F(X)$ has the uniform distribution on $[0,1]$. We can reverse this. If we want to produce random variables with a distribution function $F$, just take a random variable $Y$ with uniform distribution on $[0,1]$ and define $X = F^{-1}(Y)$. This random variable has the distribution function $F$ because
$$ P[X \in [a,b]] = P[F^{-1}(Y) \in [a,b]] = P[Y \in F([a,b])] = P[Y \in [F(a), F(b)]] = F(b) - F(a). $$
We see that we need only to have a random number generator which produces uniformly distributed random variables in $[0,1]$ to get a random number generator for a given continuous distribution. A computer scientist implementing random processes on the computer only needs to have access to a random number generator producing uniformly distributed random numbers.
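As a minimal sketch of the recipe $X = F^{-1}(Y)$ from example 5, the following Python code (not part of the original notes) assumes the exponential distribution of example 2, for which $F(x) = 1 - e^{-\lambda x}$ and hence $F^{-1}(y) = -\ln(1-y)/\lambda$. The function names, the seed and the values $\lambda = 2$, $c = 1.5$ are illustrative assumptions. The sketch also compares the observed frequency of $|X - 1/\lambda| \ge c$ with the Chebychev bound $\mathrm{Var}[X]/c^2 = 1/(\lambda^2 c^2)$.

```python
import math
import random


def exponential_inverse_cdf(y, lam):
    """Inverse of F(x) = 1 - exp(-lam * x), valid for 0 <= y < 1."""
    return -math.log(1.0 - y) / lam


def sample_exponential(n, lam, rng):
    """Draw n samples via X = F^{-1}(Y) with Y uniform on [0, 1)."""
    return [exponential_inverse_cdf(rng.random(), lam) for _ in range(n)]


def tail_fraction(samples, mean, c):
    """Fraction of samples x with |x - mean| >= c."""
    return sum(1 for x in samples if abs(x - mean) >= c) / len(samples)


if __name__ == "__main__":
    lam = 2.0                # illustrative rate parameter (assumption)
    c = 1.5                  # illustrative deviation threshold (assumption)
    rng = random.Random(0)   # fixed seed, chosen only for reproducibility
    xs = sample_exponential(100_000, lam, rng)

    sample_mean = sum(xs) / len(xs)
    print("sample mean:", sample_mean, "theoretical mean:", 1 / lam)
    print("empirical P[|X - m| >= c]:", tail_fraction(xs, 1 / lam, c))
    print("Chebychev bound Var[X]/c^2:", (1 / lam**2) / c**2)
```

For $\lambda = 2$ and $c = 1.5$ the exact probability is $P[X \ge 2] = e^{-4} \approx 0.018$, well below the Chebychev bound $1/9 \approx 0.111$, which again illustrates that Chebychev is a safe but rough estimate.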
Uniform random number generators are provided in any programming language which deserves this name.

To generate random variables with cumulative distribution function $F$, we produce random variables $X$ with uniform distribution in $[0,1]$ and form $Y = F^{-1}(X)$.

With computer algebra systems

1) In Mathematica, you can generate random variables with a certain distribution with a command like in the following example:

    (* RandomVariate is the current name; older versions used Random[...] *)
    X := RandomVariate[NormalDistribution[0, 1]]
    ListPlot[Table[X, {10000}], PlotRange -> All]

2) Here is how to access the probability density function (PDF):

    f = PDF[CauchyDistribution[0, 1]];
    S = Plot[f[x], {x, -10, 10}, PlotRange -> All, Filling -> Axis]

3) And the cumulative probability distribution (CDF):

    f = CDF[ChiSquareDistribution[1]];
    S = Plot[f[x], {x, 0, 10}, PlotRange -> All, Filling -> Bottom]

[Figures: Random numbers; The PDF; The CDF]

Homework due March 30, 2011

1 The random variable $X$ has a normal distribution with standard deviation 2 and mean 5. Estimate the probability that $|X - 5| > 3$.

2 Estimate the probability of the event $X > 10$ for a Poisson distributed random variable $X$ with mean 4.

3 a) Verify that $\phi(x) = \tan(x\pi)$ maps the interval $[0,1]$ onto the real line so that its inverse $F(y) = \arctan(y)/\pi$ is a map from $\mathbb{R}$ to $[0,1]$.
b) Show that $f = F'(y) = \frac{1}{\pi} \frac{1}{1+y^2}$.
c) Assume we have random numbers in $[0,1]$ handy and want to generate random variables which have the probability density $f$. How do we achieve this?
d) The mean $\int_{-\infty}^{\infty} x f(x)\,dx$ does not exist as an improper integral but can be assigned the value 0 by taking the limit $\int_{-R}^{R} x f(x)\,dx = 0$ for $R \to \infty$. Is it possible to assign a value to the variance $\int_{-\infty}^{\infty} x^2 f(x)\,dx$?

The probability distribution with density
$$ \frac{1}{\pi} \frac{1}{1+y^2} $$
which appeared in this homework problem is called the Cauchy distribution. Physicists call it the Cauchy-Lorentz distribution.

Why is the Cauchy distribution natural?
As one can deduce from the homework, if you choose a random point $P$ on the unit circle, then the slope of the line $OP$ has a Cauchy distribution. Instead of the circle, we can take a rotationally symmetric probability distribution like the Gaussian with probability measure $P[A] = \iint_A e^{-x^2-y^2}/\pi \, dx\,dy$ on the plane. Random points can be written as $(X, Y)$ where both $X, Y$ have the normal