CMSC828G Principles of Data Mining — Lecture #7

• Announcements:
  – Eiman's office hours changed to T 12:30-1:30PM, Th 9:30-10:30AM
  – Matlab review today 2PM, AVW 3228
• Today's Reading:
  – HMS, chapter 4
• Today's Lecture:
  – Student presentations
  – Parameter Estimation
• Upcoming Due Dates:
  – H1 due 2/21

Recall
• Independence: P(X,Y) = P(X)P(Y)
• Conditional independence: P(X,Y|Z) = P(X|Z)P(Y|Z)
• Example 1: I(X,Y|∅) and not I(X,Y|Z)
• Example 2: I(X,Y|Z) and not I(X,Y|∅)
• Conclusion: independence does not imply conditional independence, nor vice versa!

Importance of Independence
• Independence and conditional independence are important properties to identify in a distribution.
• They provide one criterion by which a joint distribution can be factored into a product of simpler marginal and/or conditional distributions.

Sequential Data
• A sequence of values x1, ..., xn.
• A common assumption is that the next value in the sequence is independent of all past values given the current value:
    P(Xj | X1, ..., Xj-1) = P(Xj | Xj-1)
• This is the first-order Markov assumption.
• It allows factorization of the joint into simpler products:
    p(x1, ..., xn) = p(x1) ∏_{j=2..n} p(xj | xj-1)

Samples
• In data mining, we sometimes work with the entire population of interest; other times we work with a sample from the population.
• Even if the entire data set is available, we may work with a sample:
  – entirely legitimate if we are interested in learning a model
  – may be less appropriate when we are looking for patterns of anomalous behavior

Statistical Inference
• Statistical inference: inferring properties of an unknown distribution from data generated by that distribution.
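The first-order Markov factorization above can be sketched in a few lines of Python. The two-state weather chain and its probabilities below are invented purely for illustration:

```python
# Sketch: joint probability of a sequence under the first-order Markov
# assumption  p(x1, ..., xn) = p(x1) * prod_{j=2..n} p(xj | xj-1).
# The states and probabilities here are a made-up example.

initial = {"sun": 0.7, "rain": 0.3}          # p(x1)
transition = {                               # p(xj | xj-1)
    "sun":  {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.4, "rain": 0.6},
}

def markov_joint(seq):
    """Probability of the whole sequence via the Markov factorization."""
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= transition[prev][cur]
    return p

print(markov_joint(["sun", "sun", "rain"]))  # 0.7 * 0.8 * 0.2
```

Note that the full joint over n binary-like variables would need exponentially many entries, while the Markov factorization needs only the initial distribution and one transition table.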
• Most common approach: approximate the unknown distribution by choosing a distribution from a restricted family of distributions.
• Estimate the parameters of the model from the data.
• (If we had the entire population, we would simply calculate the parameters.)
  [Diagram: probability runs from Model to Data; statistical inference runs from Data back to Model.]

Statistical Inference, cont.
• Assumes that the sample has been drawn from the population at random.
• The model specifies the distribution for the population: the probability that a particular value for the variable will appear in the sample.
• If we have a model M for the data, we can compute the probability that a random sampling process will lead to the data D = {x(1), ..., x(n)}.
• If we assume the probability of each data point is independent, or 'drawn at random', then
    p(D | θ, M) = ∏_{i=1..n} p(x(i) | θ, M)

Statistical Inference, cont.
• Based on this value, we can decide how realistic the assumed model is.
• If the assumed model is unlikely to have generated the data, we might reject the model; this is the principle behind hypothesis testing.
• We can also estimate population values for the parameters from
    p(D | θ, M) = ∏_{i=1..n} p(x(i) | θ, M)

Desirable properties of estimators
• Let θ̂ be an estimate for a population parameter θ.
• The value we compute for θ̂ depends on the data; for different samples we will have different estimates.
• In other words, θ̂ is a random variable: it has a distribution, a mean E(θ̂), and a variance Var(θ̂).

Bias
• The bias of an estimator is defined:
    Bias(θ̂) = E(θ̂) − θ
• An estimator is unbiased if E(θ̂) = θ, i.e., there is no systematic departure from the true parameter on average.

Desirable estimator properties, cont.
• The variance of an estimator is another measure of quality:
    Var(θ̂) = E[(θ̂ − E(θ̂))²]
• It measures how sensitive the estimator is to individual data sets.
• Choose between estimators that have the same bias by picking the one with minimum variance.
• Unbiased estimators with minimum variance are called best unbiased estimators.

Example
• Suppose we ignore the data D and simply say that θ̂ is 1.0.
• Then Var(θ̂) is 0;
however, in most cases this estimator will have a bias that is nonzero and large.

Bias-Variance Decomposition of MSE
• The mean squared error (MSE) of θ̂ is
    E[(θ̂ − θ)²] = E[(θ̂ − E(θ̂) + E(θ̂) − θ)²]
                = E[(θ̂ − E(θ̂))²] + (E(θ̂) − θ)²
                = Variance + Bias²
• MSE is a useful criterion, since it measures both the systematic bias and the random variance between the estimate and the true value.

Bias-Variance Tradeoff
• Unfortunately, bias and variance often work in different directions; reducing an estimator's bias tends to increase its variance, and vice versa.

More desirable properties
• Let θ̂_n1, θ̂_n2, ..., θ̂_nm be a sequence of estimators based on increasing sample sizes n1, n2, ..., nm.
• The sequence is consistent if, for every ε > 0,
    lim_{n→∞} Pr(|θ̂_n − θ| > ε) = 0

Parameter estimation
• Maximum Likelihood Estimation
• Bayesian Estimation

Likelihood Function
• Let D = {x(1), ..., x(n)} be independently sampled from the same distribution p(x|θ): 'independent and identically distributed', iid.
• The likelihood function L(θ | x(1), ..., x(n)) captures the probability of the data as a function of θ:
    L(θ, D) = L(θ | x(1), ..., x(n)) = p(x(1), ..., x(n) | θ) = ∏_{i=1..n} p(x(i) | θ)

Maximum Likelihood Estimation (MLE)
• The most widely used method of parameter estimation.
• Choose the value θ̂_MLE that maximizes the likelihood function L(θ | x(1), ..., x(n)).

Example
• Database of customer purchases; we want to estimate the probability that a randomly chosen customer buys milk.
• Suppose we have a random sample of 1000 customers who either do or do not buy milk, D = {x(1), ..., x(1000)}.
• Assume a simple Binomial model where θ is the probability that milk is purchased. With r milk purchasers among the 1000:
    L(θ | x(1), ..., x(1000)) = ∏_{i=1..1000} θ^x(i) (1 − θ)^(1 − x(i)) = θ^r (1 − θ)^(1000 − r)
• Take logs:
    l(θ) = r log θ + (1000 − r) log(1 − θ)
• Differentiate and set to zero:
    r/θ − (1000 − r)/(1 − θ) = 0
• Solving gives θ̂_MLE = r/1000.
  [Figure: Binomial likelihood function]

Next Time
• Reading:
  – HMS, chapter 4 cont.
• Topic:
• Due:
  – homework #1

References
• Principles of Data Mining. Hand, Mannila, Smyth.
  MIT Press, 2001.
• http://www.cc.gatech.edu/classes/cs6751_97_winter/Topics/stat-meas/probHist.html
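As a numerical sanity check on the Binomial MLE example, a coarse grid search over θ should land on the closed-form answer θ̂_MLE = r/1000. The count r = 412 below is invented for illustration:

```python
# Sketch: verify the closed-form Binomial MLE theta_hat = r/n numerically
# with a grid search over theta. The count r below is hypothetical.
import math

n, r = 1000, 412     # hypothetical: 412 of 1000 customers bought milk

def log_likelihood(theta):
    # l(theta) = r*log(theta) + (n - r)*log(1 - theta)
    return r * math.log(theta) + (n - r) * math.log(1 - theta)

grid = [i / 1000 for i in range(1, 1000)]    # theta in (0, 1), step 0.001
theta_grid = max(grid, key=log_likelihood)   # argmax over the grid
theta_closed = r / n                         # the calculus answer

print(theta_grid, theta_closed)
```

Because the log-likelihood is concave with its stationary point exactly at r/n (which lies on the grid), the grid argmax and the closed-form solution agree.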

