Random variables
9.07
2/19/2004

A few notes on the homework
• If you work together, tell us who you’re working with.
  – You should still be generating your own homework solutions. Don’t just copy from your partner. We want to see your own words.
• Turn in your MATLAB code (this helps us give you partial credit).
• Label your graphs:
  – xlabel('text')
  – ylabel('text')
  – title('text')

More homework notes
• Population vs. sample
  – The population to which the researcher wants to generalize can be considerably broader than the narrow sample might imply.
    • High school students who take the SAT
    • High school students
    • Anyone who wants to succeed
    • Anyone

More homework notes
• MATLAB:
  – If you can’t figure out something in MATLAB, find or email a TA, or track down one of the zillions of fine web tutorials.
  – Some specifics…

MATLAB
• Hint: MATLAB works best if you can think of your problem as an operation on a matrix. Do this instead of “for” loops, when possible.
  – E.g. the coin-flip example without for loops:
      x = rand(5,10000);           % 10000 experiments (columns) of 5 flips each
      coinflip = x>0.5;            % 1 = heads, 0 = tails
      numheads = sum(coinflip);    % num H in 5 flips, one count per column

MATLAB
• randn(N) -> NxN matrix!
• randn(1,N) -> 1xN matrix
• sum(x) (for a matrix, sums down each column) vs. sum(x,2) (sums across each row)
• hist(data, 1:10) (bins centered at 1, 2, …, 10) vs. hist(data, 10) (10 equally spaced bins)
• plot(hist(data)) (counts against bin number) vs. [n,x]=hist(data); plot(x,n) (counts against bin centers)

A few more comments
• Expected value can tell you whether or not you want to play a game even once.
  – It tells you if the “game” is in your favor.
• In our example of testing positive for a disease, P(D) is the prior probability that you have the disease: what was the probability of you having the disease before you got tested? If you are from a risky population, P(D) may be higher than 0.001.
  In that case your probability of having the disease was higher before you took the test, so after you test positive, your probability of having the disease, P(D|+), will be higher than 1/20.

Random Variables
• Variables that take numerical values associated with events in an experiment
  – Either discrete or continuous
• Integral (not sum) in equations below for continuous r.v.
  – Mean, µ, of a random variable is the sum of each possible value multiplied by its probability:
      µ = ∑ x_i P(x_i) ≡ E(x)
    • Note relation to “expected value” from last time.
  – Variance is the sum of squared deviations from the mean, each multiplied by the probability of its value:
      σ² = ∑ (x_i - µ)² P(x_i) ≡ E((x-µ)²)

We’ve already talked about a few special cases
• Normal r.v.’s (with normal distributions)
• Uniform r.v.’s (with distributions like this:)
  [Figure: uniform density, p plotted against x]
• Etc.

Random variables
• Can be made out of functions of other random variables.
• X r.v., Y r.v. -> Z = X + Y r.v.
                    Z = sqrt(X) + 5Y + 2 r.v.

Linear combinations of random variables
• We talked about this in lecture 2. Here’s a review, with new E() notation.
• Assume:
  – E(x) = µ
  – E((x-µ)²) = E(x² - 2µx + µ²) = σ²
• E(x+5) = E(x) + E(5) = E(x) + 5 = µ + 5 = µ’
• E((x+5-µ’)²) = E(x² + 2(5-µ’)x + (5-µ’)²) = E(x² - 2µx + µ²) = σ² = (σ’)²
Adding a constant to x adds that constant to µ, but leaves σ unchanged.

Linear combinations of random variables
• E(2x) = 2E(x) = 2µ = µ’
• E((2x-µ’)²) = E(4x² - 8xµ + 4µ²) = 4σ² = (σ’)²
  σ’ = 2σ
Scaling x by a constant scales both µ and σ by that constant. But…

Multiplying by a negative constant
• E(-2x) = -2E(x) = -2µ = µ’
• E((-2x-µ’)²) = E(4x² + 2(2x)(-2µ) + (-2µ)²) = E(4x² - 8xµ + 4µ²) = 4σ² = (σ’)²
  σ’ = 2σ
Scaling by a negative number multiplies the mean by that number, but multiplies the standard deviation by -(the number). (Standard deviation is never negative.)

What happens to z-scores when you apply a transformation?
• Changes in scale or shift do not change “standard units,” i.e. z-scores. (A negative scale factor flips their sign.)
  – When you transform to z-scores, you’re already subtracting off the mean and dividing by the standard deviation. If a shift or scaling changes the mean or standard deviation, the new mean just gets subtracted off, and the new standard deviation just gets divided out.

Special case: Normal random variables
• Can use z-tables to figure out the area under part of a normal curve.

An example of using the table
[Figure: normal curve with both tails shaded, below z = -0.75 and above z = 0.75. What % is here and here?]
• P(-0.75 < z < 0.75) = 0.5467
• P(z < -0.75 or z > 0.75) = 1 - 0.5467 ≈ 0.45
• That’s our answer.

  z      Height (%)   Area (%)
  …      …            …
  0.70   31.23        51.61
  0.75   30.11        54.67
  0.80   28.97        57.63
  …      …            …

Another way to use the z-tables
• Mean SAT score = 500, std. deviation = 100
• Assuming that the distribution of scores is normal, what is the score such that 95% of the scores are below that value?
[Figure: normal curve with 95% of the area below z = ? and 5% above]

Using z-tables to find the 95th percentile point
[Figure: normal curve with 90% of the area in the middle and 5% in each tail]
• From the tables:

  z      Height (%)   Area (%)
  1.65   10.23        90.11

  The area column is the area between -z and z, so about 90% in the middle leaves about 5% in each tail, and about 95% of the area lies below z = 1.65.
• z = 1.65 -> x = ? Mean = 500, s.d. = 100
• 1.65 = (x-500)/100; x = 165 + 500 = 665

Normal distributions
• A lot of data is approximately normally distributed because of the central limit theorem from last time.
  – Data that are influenced by (i.e. the “sum” of) many small and unrelated random effects tend to be approximately normally distributed.
  – E.g. weight (I’m making up these numbers)
    • Overall average = 120 lbs for adult women
    • Women add about 1 lb/year after age 29
    • Illness subtracts an average of 5 lbs
    • Genetics can make you heavier or thinner
    • A given “sample” of weight is influenced by being an adult woman, age, health, genetics, …

Non-normal distributions
• For data that is approximately normally distributed, we can use the normal approximation to get useful information about the percent of area under some part of the distribution.
• For non-normal data, what do we do?

Non-normal distributions
• E.g. income distributions tend to be very skewed
• Can use percentiles, much like in the last z-table example (except without the tables)
  – What’s the 10th percentile point?
    The 25th percentile point?

Percentiles & interquartile range
• Divide data into 4 groups, see how far apart the extreme groups are.
  – Median = 50th percentile
  – Q1 = median of the lower half = 25th percentile
  – Q3 = median of the upper half = 75th percentile
• Q3 - Q1 = IQR = 75th percentile - 25th percentile

What do you do for other percentiles?
• Median = point such that 50% of the data lies below that point
• Similarly, 10th percentile = point such that 10% of the data lies below that point.

What do you do for other percentiles?
• If you have a theory for the
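
The percentile and IQR definitions above can be tried out directly on a small data set. The sketch below is only an illustration under stated assumptions: the income values are made up, the names income and pctl are hypothetical, and the "nearest-rank" rule it uses is just one of several common conventions for picking a percentile point from a finite sample (it is not necessarily the one used in this course, and it can differ slightly from MATLAB's built-in median for even-sized samples). It uses only base MATLAB functions.

```matlab
% Hypothetical skewed sample (made-up numbers, e.g. income in thousands).
income = [12 15 18 20 22 25 27 30 34 40 55 90];

sorted = sort(income);      % percentiles are read off the sorted data
n = numel(sorted);

% Nearest-rank rule (an assumed convention): the p-th percentile is the
% smallest data value with at least p% of the data at or below it.
pctl = @(p) sorted(max(1, ceil(p/100 * n)));

q1  = pctl(25);             % 25th percentile
med = pctl(50);             % 50th percentile (median under this rule)
q3  = pctl(75);             % 75th percentile
iqrange = q3 - q1;          % interquartile range, Q3 - Q1

fprintf('Q1 = %g, median = %g, Q3 = %g, IQR = %g\n', q1, med, q3, iqrange);
```

With these made-up numbers the rule gives Q1 = 18, median = 25, Q3 = 34 and IQR = 16. The single large value (90) does not affect any of them, which is one reason percentiles are handy for skewed data such as incomes. Calling pctl(10) gives the 10th percentile point in the same way.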