CMSC828G Principles of Data Mining
Lecture #5

• Announcements:
– Eiman’s office hours changed to T 12:30-1:30, Th 9:30-10:30
– Book errata: if you find any mistakes, send email to [email protected]
• Today’s Reading:
– HMS, chapter 4
• Today’s Lecture:
– Dealing with Uncertainty
– Probability Review
– Parameter Estimation
• Upcoming Due Dates:
– P1 writeup due 2/13, in-class presentation 2/14
– H1 due 2/21

Uncertainty
• ubiquitous

Methods for Dealing with Uncertainty
• fuzzy logic
• possibility theory
• rough sets
• certainty factors
• nonmonotonic logic
• PROBABILITY

A Brief History of Probability
• 1654: Blaise Pascal and Pierre de Fermat developed the first mathematical theory of probability to analyze a game of dice.
• 1812: Pierre-Simon Laplace applied probabilistic ideas to many scientific and practical problems and introduced a host of new ideas and mathematical techniques in his book Théorie Analytique des Probabilités. Until this point, probability theory had been concerned solely with the mathematical analysis of games of chance.
• 1933: Andrei Kolmogorov stated the axioms of modern probability theory.

Probability
• Probability Theory
– interpretation of statements
• Probability Calculus
– manipulation of mathematical representations

Probability Theory
• different perspectives on the meaning of probability
• many debates and controversies
• two important, differing views: Frequentist vs. Bayesian

Frequentist View
• probability is an objective concept
• the long-term relative frequency of an event occurring under many repeated "similar" trials
– e.g., the number of times we observe heads in repeated coin tosses
• the dominant perspective throughout most of the 20th century

Bayesian Viewpoint
• probability is a "degree of belief" or "degree of uncertainty".
• To the Bayesian, probability lies subjectively in the mind and can, with validity, differ between people who have different information.
– e.g., the probability that Bush will be reelected in 2004.
• In contrast, to the frequentist, probability lies objectively in the external world.
• The Bayesian viewpoint has been gaining popularity in the past decade, largely because increased computational power has made feasible many calculations that were previously intractable.

Bayesian Statistics
• Central tenet: explicit characterization of all forms of uncertainty in the problem:
– uncertainty about any parameters
– uncertainty about the model
– uncertainty about forecasts
• Subjective probability: the internal state of an individual
• Fortunately, if we adopt the tenets of rational behavior, the set of axioms underlying subjective probability is the same as for the frequentist viewpoint.
• Thus, while the interpretations may differ, the calculus is the same.

Things that Bayesians can do that Frequentists can’t
• Calculate the probability of something that occurred in the past but whose outcome is not known. To the frequentist, this is meaningless.
– e.g., What is the probability that fossil A predates fossil B?
• Calculate the probability of an event resulting from a trial that can occur only once. Frequentists cannot.
• Compute the probability of an HIV test giving a false negative when selecting someone from the general population. The question is Bayesian because we are estimating the probability that someone already selected (a past event) is HIV-positive. From a frequentist point of view, the person either is or is not HIV-positive (their status would be an unknown parameter), so we cannot make probability statements about it.
• Here, we begin with a degree-of-belief probability statement about the person's HIV status in the absence of a blood test, and then update our probability estimate by conditioning on the data: the result of the blood test.

Bayesian Inference
• Begin with "prior probability" estimates based on whatever expert information is available, then update those estimates by conditioning on observed data.

Inference Differences
• The fundamental difference between Bayesian inference and frequentist inference is how they define probability. The appropriate interpretations of their inferential results differ as well. Frequentists, for example, may say, "this interval is the result of a procedure that had a 95% chance of creating an interval that would contain the mean." Bayesians may say, "this interval has a 95% chance of containing the mean."
• Regarding tests of significance, frequentists examine the probability of data (statistics) given models (hypotheses). Bayesians examine the probability of different models given data.
• The red flag that pops up in people's minds when they first learn about Bayesian methods is the word "subjective". "Statistical analysis should be objective!" they cry. And many (but not all) Bayesians agree: statistical analysis should be as objective as possible. But there is no such thing as total objectivity, and subjectivity creeps into frequentist methods as well.
• The nature of Bayesian inference: "Extraordinary claims require extraordinary evidence."

Random Variables
• Intuition: When the outcomes of an event that produces random results are numerical, the numbers obtained are called random variables. The sample space for the event is just a list containing all possible values of the random variable.
This can be defined more precisely in terms of measure theory; however, that will not be necessary for our data mining purposes.
• Random variable: the outcome of a random phenomenon.
• A discrete random variable X has a finite number of possible values, x ∈ {x1, …, xm}.
• The probability distribution of X is written P(X = x).
• P(X) = {p(x1), …, p(xm)} is the probability mass function; p(x) denotes the probability that X takes on the particular value x.
• The cumulative distribution P(x) is the probability that X takes on a value less than or equal to x (if values are ordered):

$$P(x) = \sum_{x_i \le x} p(x_i)$$

Random Variables, cont.
• A continuous random variable takes all values in an interval of numbers.
• The behavior of a continuous random variable X is described by a continuous curve f(x), or p(x), the probability density function.
• To make probability statements about X, we have to consider the probability that X falls in an interval (a, b), that is:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

Expected Value
• If X is a discrete random variable with probability mass function p(x), $E[X] = \sum_i x_i\,p(x_i)$
• If X is a continuous random variable with probability density function f(x), $E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$
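
The discrete definitions on these slides (probability mass function p(x), cumulative distribution P(x) as the sum of p(x_i) over all x_i ≤ x, and the expected value as the probability-weighted sum of the values) can be checked on a small case. The sketch below assumes a fair six-sided die as a hypothetical example. The helper names `pmf`, `cdf` and `expected_value` are illustrative and not from the lecture. Exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical example (not from the lecture): a fair six-sided die, X in {1, ..., 6}.
# The probability mass function maps each value x_i to p(x_i).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(pmf, x):
    """Cumulative distribution P(x) = P(X <= x): sum of p(x_i) over values x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))


def expected_value(pmf):
    """E[X] = sum over i of x_i * p(x_i)."""
    return sum((xi * p for xi, p in pmf.items()), Fraction(0))


# A valid probability mass function sums to 1.
assert sum(pmf.values()) == 1

print(cdf(pmf, 3))          # 1/2
print(expected_value(pmf))  # 7/2
```

The same functions work for any finite PMF stored as a value-to-probability dictionary, provided the values can be ordered (the slide's "if values are ordered" condition).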
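
For a continuous random variable, probabilities come from integrating the density f(x) over an interval, and the expected value is the integral of x f(x). The sketch below approximates both integrals numerically. It assumes a hypothetical exponential density f(x) = λe^(−λx) with λ = 0.5 (`LAM`), chosen only because its interval probability has a closed form to compare against. It also swaps in the composite midpoint rule (the hypothetical helper `integrate`) as a simple stand-in for exact integration.

```python
import math

# Hypothetical density (not from the lecture): exponential with rate LAM, defined for x >= 0.
LAM = 0.5


def f(x):
    """Probability density f(x) = LAM * exp(-LAM * x) for x >= 0, and 0 otherwise."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def integrate(g, a, b, n=100_000):
    """Composite midpoint rule: approximates the integral of g over [a, b] with n slices."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# P(a <= X <= b) is the integral of f from a to b.
a, b = 1.0, 3.0
p_interval = integrate(f, a, b)
exact = math.exp(-LAM * a) - math.exp(-LAM * b)  # closed form, valid for the exponential only
print(p_interval, exact)

# E[X] is the integral of x * f(x); stopping at 100 drops a negligible tail here.
mean = integrate(lambda x: x * f(x), 0.0, 100.0)
print(mean)  # approximately 1 / LAM = 2.0
```

In practice a library integrator or a closed-form CDF would be preferred; the midpoint rule is used here only to keep the example self-contained and close to the integral on the slide.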
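
The frequentist reading of probability from earlier in the lecture, as the long-term relative frequency of heads over many repeated "similar" coin tosses, can be illustrated with a simulated coin. The simulation itself has to assume a true heads probability (0.5 here). The function name `relative_frequency_of_heads` and the fixed seed are illustrative choices, not part of the lecture.

```python
import random


def relative_frequency_of_heads(n_tosses, p_heads=0.5, seed=0):
    """Toss a simulated coin n_tosses times and return the fraction that came up heads."""
    rng = random.Random(seed)  # seeded so repeated runs give the same sequence
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses


# The relative frequency tends to settle near p_heads as the number of trials grows.
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

With few tosses the fraction can be far from 0.5. With many tosses it stays close to 0.5, which is the sense in which the frequentist identifies probability with a long-run frequency.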
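
The HIV-test bullet earlier in the lecture (start from a degree-of-belief prior about a person's status, then condition on the blood-test result) is an application of Bayes' rule. The sketch below assumes that the test is described by its sensitivity P(positive test | HIV-positive) and specificity P(negative test | HIV-negative). The prior, sensitivity and specificity values are hypothetical and are chosen only for illustration; they are not from the lecture. With a negative result, the posterior P(HIV-positive | negative test) is the probability that this negative result is a false negative.

```python
def posterior_hiv_positive(prior, sensitivity, specificity, test_positive):
    """P(HIV-positive | test result) by Bayes' rule.

    prior:       P(HIV-positive) before the test (degree of belief)
    sensitivity: P(test positive | HIV-positive)
    specificity: P(test negative | HIV-negative)
    """
    if test_positive:
        like_pos = sensitivity        # P(result | HIV-positive)
        like_neg = 1.0 - specificity  # P(result | HIV-negative): a false positive
    else:
        like_pos = 1.0 - sensitivity  # P(result | HIV-positive): a false negative
        like_neg = specificity        # P(result | HIV-negative)
    evidence = like_pos * prior + like_neg * (1.0 - prior)  # P(result), total probability
    return like_pos * prior / evidence


# Hypothetical numbers, for illustration only.
PRIOR, SENS, SPEC = 0.01, 0.99, 0.98
print(posterior_hiv_positive(PRIOR, SENS, SPEC, test_positive=True))   # roughly 0.333
print(posterior_hiv_positive(PRIOR, SENS, SPEC, test_positive=False))  # roughly 0.0001
```

Even with an accurate test, a low prior keeps the posterior after a positive result near 1/3 under these assumed numbers. This shows how strongly the prior degree of belief shapes the updated estimate.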

UMD CMSC 828G - Principles of Data Mining