BYU BIO 465 - Essential Probability and Statistics

Essential Probability & Statistics
(Lecture for CS397-CXZ Algorithms in Bioinformatics), Jan. 23, 2004
ChengXiang Zhai, Department of Computer Science, University of Illinois, Urbana-Champaign

Contents: Purpose of Prob. & Statistics; Basic Concepts in Probability; Basic Concepts of Prob. (cont.); Interpretation of Bayes' Rule; Random Variable; An Example; Probability Distributions; Parameter Estimation; Maximum Likelihood Estimator; Maximum Likelihood vs. Bayesian; Bayesian Estimator; Dirichlet Prior Smoothing (cont.); Illustration of Bayesian Estimation; Basic Concepts in Information Theory; Entropy; Interpretations of H(X); Cross Entropy H(p,q); Kullback-Leibler Divergence D(p||q); Cross Entropy, KL-Div, and Likelihood; Mutual Information I(X;Y); What You Should Know

Purpose of Prob. & Statistics
• Deductive vs. plausible reasoning
• Incomplete knowledge -> uncertainty
• How do we quantify inference under uncertainty?
  – Probability: models of random processes/experiments (how data are generated)
  – Statistics: draw conclusions about the whole population based on samples (inference on data)

Basic Concepts in Probability
• Sample space: all possible outcomes, e.g.,
  – tossing 2 coins, S = {HH, HT, TH, TT}
• Event: E ⊆ S; E happens iff the outcome is in E, e.g.,
  – E = {HH} (all heads)
  – E = {HH, TT} (same face)
• Probability of an event: 0 ≤ P(E) ≤ 1, s.t.
  – P(S) = 1 (the outcome is always in S)
  – P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅

Basic Concepts of Prob. (cont.)
• Conditional probability: P(B|A) = P(A ∩ B)/P(A)
  – P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)
  – So, P(A|B) = P(B|A)P(A)/P(B)
  – For independent events, P(A ∩ B) = P(A)P(B), so P(A|B) = P(A)
• Total probability: if A1, …, An form a partition of S, then
  – P(B) = P(B ∩ S) = P(B ∩ A1) + … + P(B ∩ An)
  – So, P(Ai|B) = P(B|Ai)P(Ai)/P(B) (Bayes' Rule)

Interpretation of Bayes' Rule
Hypothesis space H = {H1, …, Hn}; evidence E:
  P(Hi|E) = P(E|Hi) P(Hi) / P(E)
  – P(Hi|E): posterior probability of Hi
  – P(Hi): prior probability of Hi
  – P(E|Hi): likelihood of the data/evidence if Hi is true
If we want to pick the most likely hypothesis H*, we can drop P(E):
  P(Hi|E) ∝ P(E|Hi) P(Hi)

Random Variable
• X: S → R (a "measure" of the outcome)
• Events can be defined according to X
  – E(X = a) = {si | X(si) = a}
  – E(X ≥ a) = {si | X(si) ≥ a}
• So, probabilities can be defined on X
  – P(X = a) = P(E(X = a))
  – P(X ≥ a) = P(E(X ≥ a))  (F(a) = P(X ≤ a): cumulative distribution function)
• Discrete vs. continuous random variables (think of "partitioning the sample space")

An Example
• Think of a DNA sequence as the result of tossing a 4-face die many times independently
• P(AATGC) = p(A)p(A)p(T)p(G)p(C)
• A model specifies {p(A), p(C), p(G), p(T)}, e.g., all 0.25 (random model M0)
• P(AATGC|M0) = 0.25 * 0.25 * 0.25 * 0.25 * 0.25
• Comparing 2 models
  – M1: coding regions
  – M2: non-coding regions
  – Decide whether AATGC is more likely a coding region

Probability Distributions
• Binomial: number of successes out of N trials
  p(k|N) = \binom{N}{k} p^k (1-p)^{N-k}
• Gaussian: sum of N independent R.V.'s
  f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
• Multinomial: getting n_i occurrences of outcome i
  p(n_1, \ldots, n_k \mid N) = \frac{N!}{n_1! \cdots n_k!} \prod_{i=1}^{k} p_i^{n_i}

Parameter Estimation
• General setting:
  – Given a (hypothesized & probabilistic) model that governs the random experiment
  – The model gives a probability p(D|θ) of any data D, which depends on the parameter θ
  – Now, given actual sample data X = {x1, …, xn}, what can we say about the value of θ?
• Intuitively, take your best guess of θ -- "best" means "best explaining/fitting the data"
• Generally an optimization problem
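The die-tossing example above is easy to make concrete. The following sketch (Python; not from the lecture) computes P(AATGC|M0) and compares two competing models with a log-likelihood ratio. The nucleotide frequencies assumed for M1 (coding) and M2 (non-coding) are invented purely for illustration; in practice they would be estimated from annotated training sequences.

```python
import math

seq = "AATGC"  # the example sequence from the slides

# Random model M0: all four nucleotides equally likely
M0 = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical nucleotide frequencies for the two competing models
# (made-up numbers, for illustration only)
M1_coding = {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20}
M2_noncoding = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}

def log_likelihood(sequence, model):
    """log P(sequence | model) under the independent die-tossing assumption."""
    return sum(math.log(model[base]) for base in sequence)

print("P(AATGC | M0) =", math.exp(log_likelihood(seq, M0)))  # 0.25**5

# Log-likelihood ratio: positive means the sequence is better explained
# by the coding model M1 than by the non-coding model M2.
llr = log_likelihood(seq, M1_coding) - log_likelihood(seq, M2_noncoding)
print("log-likelihood ratio (M1 vs. M2) =", llr)
```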
Maximum Likelihood Estimator
Data: a sequence d with counts c(w_1), …, c(w_N), and length |d|
Model: multinomial M with parameters {p(w_i)}
Likelihood: p(d|M)
Maximum likelihood estimator: M* = \arg\max_M p(d|M)

We'll tune the p(w_i) to maximize l(d|M). Writing θ_i = p(w_i):
  p(d|M) = \binom{|d|}{c(w_1) \cdots c(w_N)} \prod_{i=1}^{N} \theta_i^{c(w_i)} \propto \prod_{i=1}^{N} \theta_i^{c(w_i)}
  l(d|M) = \log p(d|M) = \sum_{i=1}^{N} c(w_i) \log \theta_i
Use the Lagrange multiplier approach:
  l'(d|M) = \sum_{i=1}^{N} c(w_i) \log \theta_i + \lambda \Big( \sum_{i=1}^{N} \theta_i - 1 \Big)
Set the partial derivatives to zero:
  \frac{\partial l'}{\partial \theta_i} = \frac{c(w_i)}{\theta_i} + \lambda = 0 \;\Rightarrow\; \theta_i = -\frac{c(w_i)}{\lambda}
Since \sum_{i=1}^{N} \theta_i = 1 and \sum_{i=1}^{N} c(w_i) = |d|, we get \lambda = -|d|. So the ML estimate is
  \theta_i = p(w_i) = \frac{c(w_i)}{|d|}

Maximum Likelihood vs. Bayesian
• Maximum likelihood estimation
  – "Best" means "the data likelihood reaches its maximum": \hat{\theta} = \arg\max_\theta P(X|\theta)
  – Problem: small samples
• Bayesian estimation
  – "Best" means being consistent with our "prior" knowledge and explaining the data well:
    \hat{\theta} = \arg\max_\theta P(\theta|X) = \arg\max_\theta P(X|\theta) P(\theta)
  – Problem: how to define the prior?

Bayesian Estimator
• ML estimator: M* = \arg\max_M p(d|M)
• Bayesian estimator:
  – First consider the posterior: p(M|d) = p(d|M) p(M) / p(d)
  – Then consider the mean or mode of the posterior distribution
• p(d|M): sampling distribution (of the data)
• p(M) = p(θ_1, …, θ_N): our prior on the model parameters
• Conjugate prior: the prior can be interpreted as "extra"/"pseudo" data
• The Dirichlet distribution is a conjugate prior for the multinomial sampling distribution:
  Dir(\theta \mid \alpha_1, \ldots, \alpha_N) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_N)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_N)} \prod_{i=1}^{N} \theta_i^{\alpha_i - 1}
  with "extra"/"pseudo" counts \alpha_i, e.g., \alpha_i = \mu\, p(w_i|REF)

Dirichlet Prior Smoothing (cont.)
Posterior distribution of the parameters:
  p(\theta|d) = Dir(\theta \mid c(w_1) + \alpha_1, \ldots, c(w_N) + \alpha_N)
Property: if \theta \sim Dir(\theta|\alpha), then E(\theta_i) = \alpha_i / \sum_{j=1}^{N} \alpha_j
The predictive distribution is the same as the mean:
  p(w_i|\hat{\theta}) = \int p(w_i|\theta)\, p(\theta|d)\, d\theta = \frac{c(w_i) + \alpha_i}{|d| + \sum_{j=1}^{N} \alpha_j} = \frac{c(w_i) + \mu\, p(w_i|REF)}{|d| + \mu}
This is the Bayesian estimate (what happens as |d| → ∞?)

Illustration of Bayesian Estimation
[Figure: the prior p(θ), the likelihood p(D|θ) with D = (c_1, …, c_N), and the posterior p(θ|D) ∝ p(D|θ) p(θ), with the prior mode, the ML estimate θ_ml, and the posterior mode marked.]

Basic Concepts in Information Theory
• Entropy: measuring the uncertainty of a random variable
• Kullback-Leibler divergence: comparing two distributions
• Mutual information: measuring the correlation of two random variables

Entropy
Entropy H(X) measures the average uncertainty of the random variable X:
  H(X) = H(p) = -\sum_{x \in \Omega} p(x) \log p(x), where \Omega is the set of all possible values
  Define 0 \log 0 = 0 and \log = \log_2
Example (coin flipping):
  H(X) = 1 for a fair coin, p(H) = 0.5
  H(X) is between 0 and 1 for a biased coin, e.g., p(H) = 0.8
  H(X) = 0 for a completely biased coin, p(H) = 1
Properties: H(X) ≥ 0; min = 0; max = log M, where M is the total number of values

Interpretations of H(X)
• Measures the "amount of information" in X
  – Think of each value of X as a "message"
  – Think of X as a random experiment (20 questions)
• Minimum average number of bits to compress values of X
  – The …
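To make the two estimators concrete, here is a small sketch (Python; not from the lecture) of the ML estimate c(w_i)/|d| and the Dirichlet-smoothed posterior-mean estimate (c(w_i) + μ p(w_i|REF)) / (|d| + μ) for a toy sequence. The reference distribution p(w|REF) and the pseudo-count weight μ are arbitrary illustrative choices.

```python
from collections import Counter

def ml_estimate(d):
    """Maximum likelihood estimate: p(w) = c(w) / |d|."""
    counts = Counter(d)
    return {w: c / len(d) for w, c in counts.items()}

def dirichlet_smoothed_estimate(d, p_ref, mu):
    """Posterior mean under a Dirichlet prior with alpha_w = mu * p_ref(w):
    p(w) = (c(w) + mu * p_ref(w)) / (|d| + mu)."""
    counts = Counter(d)
    return {w: (counts[w] + mu * p_ref[w]) / (len(d) + mu) for w in p_ref}

d = "AATGCAAA"                                        # toy "document"
p_ref = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # reference (prior) distribution
mu = 4.0                                              # total pseudo-count weight

print("ML:      ", ml_estimate(d))                             # A dominates; C, G, T rare
print("Smoothed:", dirichlet_smoothed_estimate(d, p_ref, mu))  # pulled toward p_ref
```

Similarly, a minimal sketch of the information-theoretic quantities named above: entropy H(p) with the convention 0 log 0 = 0, and KL divergence D(p||q) for comparing two distributions. The coin distributions mirror the slide's entropy example; the pairing used for the KL computation is just an illustration.

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), with 0 * log 0 taken as 0."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log2(p(x) / q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

fair = {"H": 0.5, "T": 0.5}
biased = {"H": 0.8, "T": 0.2}
completely_biased = {"H": 1.0, "T": 0.0}

print(entropy(fair))                # 1.0
print(entropy(biased))              # ~0.72, between 0 and 1
print(entropy(completely_biased))   # 0.0
print(kl_divergence(biased, fair))  # > 0; equals 0 only when the distributions match
```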

