UIUC STAT 420 - Chp0

Some Basic Results in Probability & Statistics

• Linear Algebra
• Probability
• Random Variables
• Common Statistical Distributions
• Statistical Estimation
• Statistical Inference about Normal Distributions

Linear Algebra

• Summation and product operators:
  $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$; $\prod_{i=1}^{n} Y_i = Y_1 \cdot Y_2 \cdots Y_n$
  $\sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij} = \sum_{i=1}^{n} \{x_{i1} + \cdots + x_{ip}\} = x_{11} + \cdots + x_{1p} + \cdots + x_{n1} + \cdots + x_{np}$
• Matrix: a rectangular display and organization of data. You can treat a matrix as data with two subscripts, e.g. $x_{ij}$, where the first subscript is the row index and the second is the column index. We write the matrix as $X_{n \times p} = (x_{ij})$ and call it an $n$ by $p$ matrix.

Matrix Operations

• Transpose: reverse the row and column indices, so $t(X)_{ij} = x_{ji}$.
• Summation: element-wise summation.
• Product: for $X_{n \times p} = (x_{ij})$ and $B_{p \times m} = (\beta_{jk})$, their product $Y = XB = (y_{ik})$ is an $n$ by $m$ matrix with $y_{ik} = \sum_{j=1}^{p} x_{ij} \beta_{jk}$.
• Identity matrix $I$: square ($n = p$), with diagonal entries equal to 1 and 0 elsewhere.
• Inverse: the product of a matrix $X$ and its inverse $X^{-1}$ is the identity matrix.
• Trace: for a square matrix $X_{n \times n}$, $\mathrm{tr}(X) = \sum_{i=1}^{n} x_{ii}$.

Some Notes about Matrices

• When forming the matrix product $XB$, always make sure the number of columns of $X$ equals the number of rows of $B$.
• Matrix multiplication is order-dependent: $XB$ and $BX$ are different in general. For the inverse we have $XX^{-1} = X^{-1}X = I$, so only square matrices have inverses.
• Only square matrices have a trace, and $\mathrm{tr}(XB) = \mathrm{tr}(BX)$.
• If $X^{-1} = t(X)$, we call $X$ an orthogonal matrix.
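As a quick illustration of these matrix rules, here is a minimal sketch in Python with numpy (the slides themselves contain no code, and the matrices X, B, S, and Q below are made-up examples). It checks the dimension rule for products, the order-dependence of $XB$ versus $BX$, the identity $\mathrm{tr}(XB) = \mathrm{tr}(BX)$, and the defining property of an orthogonal matrix.

```python
import numpy as np

# A 3x2 matrix X and a 2x3 matrix B: the product XB is defined
# because X has 2 columns and B has 2 rows.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # X is 3 x 2
B = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])     # B is 2 x 3

XB = X @ B                          # 3 x 3
BX = B @ X                          # 2 x 2: XB and BX differ in general

# tr(XB) = tr(BX) even though the two products have different shapes.
print(np.trace(XB), np.trace(BX))   # both 14.0

# Transpose: t(X)_{ij} = x_{ji}
print(X.T.shape)                    # (2, 3)

# Inverse (square matrices only): S S^{-1} = S^{-1} S = I
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
S_inv = np.linalg.inv(S)
print(np.allclose(S @ S_inv, np.eye(2)))   # True

# An orthogonal matrix: inverse equals transpose (here a 2-D rotation).
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(np.linalg.inv(Q), Q.T))  # True
```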
Probability

• Sample space $\Omega$; events (sets) $A$, $B$.
• Basic rules:
  $\Pr(\Omega) = 1$; $\Pr(\emptyset) = 0$
  $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
  $\Pr(A \cap B) = \Pr(A)\Pr(B \mid A) = \Pr(B)\Pr(A \mid B)$
• Complementary events: $\Pr(\bar{A}) = 1 - \Pr(A)$.

Random Variables

• A random variable is a mapping (function) $Y$ from the sample space to $\mathbb{R}^1$. For continuous random variables, the distribution and density functions are defined as $F(y) = \Pr(Y \le y)$ and $f(y) = \lim_{\epsilon \to 0} \{F(y + \epsilon) - F(y)\}/\epsilon$.
• Joint, marginal, and conditional probability distributions:
  $\Pr(y_i) = \sum_j \Pr(y_i, z_j)$; $\Pr(y_i \mid z_j) = \Pr(y_i, z_j)/\Pr(z_j)$
• Expectation: $E(Y) = \sum_i y_i \Pr(y_i)$ (discrete) $= \int y f(y)\,dy$ (continuous).
• Variance: $\mathrm{Var}(Y) = E[Y - E(Y)]^2 = E(Y^2) - [E(Y)]^2$.

Random Variables, Contd.

• Covariance: $\mathrm{Cov}(Y, Z) = E\{[Y - E(Y)][Z - E(Z)]\} = E(YZ) - E(Y)E(Z)$.
• Correlation: $\rho(Y, Z) = \mathrm{Cov}(Y, Z)/\sqrt{\mathrm{Var}(Y)\,\mathrm{Var}(Z)}$.
• Independent random variables: $Y$ and $Z$ are independent $\Leftrightarrow \Pr(y_i, z_j) = \Pr(y_i)\Pr(z_j)$, which implies $\mathrm{Cov}(Y, Z) = 0$.
• Central Limit Theorem: if $Y_1, \ldots, Y_n$ are iid (independent and identically distributed) random variables with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{Y} = \sum_{i=1}^{n} Y_i / n$ is approximately $N(\mu, \sigma^2/n)$ when the sample size $n$ is reasonably large.

Common Statistical Distributions

• Normal distribution $N(\mu, \sigma^2)$: density $\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\}$, where $\mu$ and $\sigma^2$ are the mean and variance of $Y$. We have $E(Y) = \mu$, $E(Y-\mu)^2 = \sigma^2$, and $E(Y-\mu)^4 = 3\sigma^4$. More generally,
  $E(Y-\mu)^{2k-1} = 0$; $E(Y-\mu)^{2k} = \sigma^{2k}(2k-1)!!$
  where $(2k-1)!! = (2k-1) \times (2k-3) \times \cdots \times 3 \times 1$.
• Linear functions of normal random variables are still normal. $(Y-\mu)/\sigma$ is standard normal with mean 0 and variance 1. $\phi(\cdot)$ and $\Phi(\cdot)$ are commonly used to denote the standard normal density and distribution functions.

Common Statistical Distributions, Contd.

• $\chi^2$ random variable: $\chi^2(n) = \sum_{i=1}^{n} z_i^2$, where the $z_i$ are iid standard normal random variables and $n$ is called the degrees of freedom. We have $E(\chi^2(n)) = n$ and $\mathrm{Var}(\chi^2(n)) = 2n$.
• $t$ random variable: $t(n) = z/\sqrt{\chi^2(n)/n}$, where $z$ is standard normal and independent of $\chi^2(n)$.
• $F$ random variable: $F(n, m) = \frac{\chi^2(n)/n}{\chi^2(m)/m}$, where $\chi^2(n)$ and $\chi^2(m)$ are two independent $\chi^2$ random variables.

Common Distribution Densities

[Figure: density curves for the standard normal, $\chi^2(4)$, $t(4)$, and $F(10, 4)$ distributions.]

Statistical Estimation

• Estimator properties: an estimator $\hat{\theta}$ is a function of the sample observations $(y_1, \ldots, y_n)$ that estimates some parameter $\theta$ associated with the distribution of $Y$.
• Estimation techniques:
  – Maximum likelihood estimation
  – Least squares estimation
  – Many others

Estimator Properties

• Unbiasedness: $E(\hat{\theta}) = \theta$.
• Consistency: $\lim_{n \to \infty} \Pr(|\hat{\theta} - \theta| \ge \epsilon) = 0$ for all $\epsilon > 0$.
• Sufficiency: $\Pr(y_1, \ldots, y_n \mid \hat{\theta})$ does not depend on $\theta$.
• Minimum variance estimator: $\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\tilde{\theta})$ for every competing estimator $\tilde{\theta}$.

Maximum Likelihood Estimators (MLE)

Maximum likelihood is a general method of finding estimators. Suppose $(y_1, \ldots, y_n)$ are $n$ iid samples from a distribution $f(y; \theta)$ with parameter $\theta$. The "probability of observing these samples" is
  $L(\theta) = \prod_{i=1}^{n} f(y_i; \theta)$,
which is called the likelihood function. Maximizing $L(\theta)$ with respect to $\theta$ yields the MLE $\hat{\theta} = \arg\max_\theta L(\theta)$. Under very general conditions, MLEs are consistent and sufficient.

MLE for Normal Distributions

Suppose $(y_1, \ldots, y_n)$ are iid samples from the normal distribution $N(\mu, \sigma^2)$. What are the MLEs of the parameters $\mu$ and $\sigma^2$?
  $L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y_i - \mu)^2}{2\sigma^2}\right\}$
Maximizing $L(\mu, \sigma^2)$ is equivalent to maximizing $\log L(\mu, \sigma^2)$, the "log-likelihood", and we can easily obtain the following MLEs (a numerical check appears at the end of these notes):
  $\hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n}$; $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n}$

Least Squares Estimators (LS)

LS is another general method of finding estimators. The sample observations are assumed to be of the form $y_i = f_i(\theta) + \epsilon_i$, $i = 1, \ldots, n$, where $f_i(\theta)$ is a known function of the parameter $\theta$ and the $\epsilon_i$ are random variables, usually assumed to have expectation $E(\epsilon_i) = 0$. LS estimators are obtained by minimizing the sum of squares
  $Q = \sum_{i=1}^{n} (y_i - f_i(\theta))^2$
Here the $L_2$ distance is used; more generally, the $L_q$ distance can be considered.

Hypothesis Testing

Hypothesis testing is concerned with the state of a population, which is usually characterized by some parameters; for example, we may be interested in testing the mean and variance of a normal distribution. A test has several components:

• Null hypothesis $H_0$: the postulated "default" state (value).
• Alternative hypothesis $H_a$: the "abnormal" state.
• Test statistic: the empirical information from the observed data (usually some function of the data).
• Rejection rules, characterized by the Type-I error $\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ true})$ and the Type-II error $\beta = \Pr(\text{don't reject } H_0 \mid H_0 \text{ false})$; the power of the test is $1 - \beta$.

P-value

The P-value for a hypothesis test is defined as the probability that the sample outcome is at least as extreme as the observed one when $H_0$ is true. Large P-values support $H_0$, while small P-values support $H_a$. A test can be carried out by comparing the P-value with the specified Type-I error $\alpha$: if P-value $< \alpha$, then $H_0$ is rejected. Note that the calculation of the P-value depends on the rejection region.
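To make this recipe concrete, here is a minimal sketch in Python (not part of the original slides; numpy and scipy are assumed available, and the data are simulated). It tests $H_0: \mu = 0$ against $H_a: \mu \neq 0$ for a normal mean using the one-sample $t$ statistic and compares the resulting P-value with $\alpha$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(420)

# Hypothetical data: n = 30 observations with true mean 0.5, so H0 below is false.
y = rng.normal(loc=0.5, scale=1.0, size=30)

# Test H0: mu = 0 against Ha: mu != 0 with the one-sample t statistic
# t = (ybar - mu0) / (s / sqrt(n)), which has a t(n-1) distribution under H0.
n = len(y)
ybar, s = y.mean(), y.std(ddof=1)
t_stat = (ybar - 0.0) / (s / np.sqrt(n))

# Two-sided P-value: probability of a |t| at least this extreme under H0.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

alpha = 0.05
print(f"t = {t_stat:.3f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")

# scipy's built-in one-sample t-test gives the same two-sided P-value.
t_check, p_check = stats.ttest_1samp(y, popmean=0.0)
print(np.isclose(p_value, p_check))   # True
```

Because the data are generated with true mean 0.5, the test will usually, though not always, reject $H_0$ at $\alpha = 0.05$.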
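Finally, the numerical check of the normal MLE formulas promised earlier. This sketch (again assuming numpy and scipy, with a made-up simulated sample) compares the closed-form estimates $\hat{\mu}$ and $\hat{\sigma}^2$ with the result of numerically minimizing the negative log-likelihood; the two should agree closely.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical sample from N(mu = 2, sigma^2 = 9).
y = rng.normal(loc=2.0, scale=3.0, size=500)
n = len(y)

# Closed-form MLEs from the slides: divide by n, not n - 1.
mu_hat = y.mean()
sigma2_hat = ((y - mu_hat) ** 2).sum() / n

# Numerical check: minimize the negative log-likelihood directly,
# optimizing log(sigma^2) so that sigma^2 stays positive.
def neg_log_lik(params):
    mu, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + ((y - mu) ** 2).sum() / (2 * sigma2)

res = minimize(neg_log_lik, x0=[0.0, 0.0])
mu_num, sigma2_num = res.x[0], np.exp(res.x[1])

print(mu_hat, sigma2_hat)   # closed-form MLEs
print(mu_num, sigma2_num)   # numerical optimum agrees closely
```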

