CMU CS 10601 - Machine Learning

Machine Learning 10-701
Tom M. Mitchell, Machine Learning Department, Carnegie Mellon University
September 15, 2011

Today:
• The Big Picture
• Overfitting
• Review: probability

Readings:
Decision trees, overfitting
• Mitchell, Chapter 3
Probability review
• Bishop, Ch. 1 thru 1.2.3
• Bishop, Ch. 2 thru 2.2
• Andrew Moore's online tutorial

Function Approximation: Problem Setting
• Set of possible instances X
• Unknown target function f : X → Y
• Set of function hypotheses H = { h | h : X → Y }
Input:
• Training examples {<x(i), y(i)>} of unknown target function f
Output:
• Hypothesis h ∈ H that best approximates target function f

Function Approximation: Decision Tree Learning
Problem Setting:
• Set of possible instances X
  – each instance x in X is a feature vector x = <x1, x2, ..., xn>
• Unknown target function f : X → Y
  – Y is discrete valued
• Set of function hypotheses H = { h | h : X → Y }
  – each hypothesis h is a decision tree
Input:
• Training examples {<x(i), y(i)>} of unknown target function f
Output:
• Hypothesis h ∈ H that best approximates target function f

[ID3, C4.5; Quinlan] — greedy top-down tree induction, starting from node = Root

Information Gain
• Information Gain (also called mutual information) between input attribute A and target variable Y
• Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A
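To make the information-gain criterion concrete, here is a small illustrative sketch in Python. It is not from the slides; the function names and the toy data are my own, but the computation is the standard one: Gain(S, A) = H(Y) minus the expected entropy of Y after sorting the sample S on attribute A.

from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) for a list of discrete labels: -sum_y p(y) * log2 p(y)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Expected reduction in entropy of Y from sorting sample S on attribute A.

    values: attribute A's value for each example in S
    labels: the target Y for each example in S
    """
    n = len(labels)
    h_y = entropy(labels)
    # expected entropy of Y after splitting S by the value of A
    h_y_given_a = 0.0
    for v in set(values):
        subset = [y for a, y in zip(values, labels) if a == v]
        h_y_given_a += (len(subset) / n) * entropy(subset)
    return h_y - h_y_given_a

# toy (made-up) sample: a 'Wind' attribute against a boolean target
wind = ["weak", "strong", "weak", "weak", "strong", "strong"]
play = [True,   False,    True,   True,   False,    True]
print(information_gain(wind, play))  # higher gain = more informative split

ID3 simply evaluates this quantity for every remaining attribute at a node and splits on the one with the largest gain.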
Function approximation as Search for the best hypothesis
• ID3 performs heuristic search through the space of decision trees

Function Approximation: The Big Picture

Which Tree Should We Output?
• ID3 performs heuristic search through the space of decision trees
• It stops at the smallest acceptable tree. Why?
Occam's razor: prefer the simplest hypothesis that fits the data

Why Prefer Short Hypotheses? (Occam's Razor)
Argument in favor:
• Fewer short hypotheses than long ones → a short hypothesis that fits the data is less likely to be a statistical coincidence
Argument opposed:
• There are also fewer hypotheses containing a prime number of nodes and attributes beginning with "Z"
• What's so special about "short" hypotheses, instead of "prime number of nodes"?

Overfitting
Consider a hypothesis h and its
• Error rate over training data: error_train(h)
• True error rate over all data: error_true(h)
We say h overfits the training data if error_train(h) < error_true(h)
Amount of overfitting = error_true(h) − error_train(h)

Avoiding overfitting:
• Split data into a training set and a validation set
• Create a tree that classifies the training set correctly

What you should know:
• Well-posed function approximation problems:
  – Instance space, X
  – Sample of labeled training data { <x(i), y(i)> }
  – Hypothesis space, H = { f : X → Y }
• Learning is a search/optimization problem over H
  – Various objective functions
    • minimize training error (0-1 loss)
    • among hypotheses that minimize training error, select smallest (?)
  – But inductive learning without some bias is futile!
• Decision tree learning
  – Greedy top-down learning of decision trees (ID3, C4.5, ...)
  – Overfitting and tree post-pruning
  – Extensions...

Extra slides: extensions to decision tree learning

Questions to think about (1)
• ID3 and C4.5 are heuristic algorithms that search through the space of decision trees. Why not just do an exhaustive search?

Questions to think about (2)
• Consider target function f : <x1, x2> → y, where x1 and x2 are real-valued and y is boolean. What is the set of decision surfaces describable with decision trees that use each attribute at most once?

Questions to think about (3)
• Why use Information Gain to select attributes in decision trees? What other criteria seem reasonable, and what are the tradeoffs in making this choice?

Machine Learning 10-701
Tom M. Mitchell, Machine Learning Department, Carnegie Mellon University
September 15, 2011

Today:
• Review: probability

Readings:
Probability review
• Bishop, Ch. 1 thru 1.2.3
• Bishop, Ch. 2 thru 2.2
• Andrew Moore's online tutorial
Many of these slides are derived from William Cohen, Andrew Moore, Aarti Singh, Eric Xing. Thanks!

Probability Overview
• Events
  – discrete random variables, continuous random variables, compound events
• Axioms of probability
  – what defines a reasonable theory of uncertainty
• Independent events
• Conditional probabilities
• Bayes rule and beliefs
• Joint probability distribution
• Expectations
• Independence, conditional independence

Random Variables
• Informally, A is a random variable if
  – A denotes something about which we are uncertain
  – perhaps the outcome of a randomized experiment
• Examples
  – A = True if a randomly drawn person from our class is female
  – A = the hometown of a randomly drawn person from our class
  – A = True if two randomly drawn persons from our class have the same birthday
• Define P(A) as "the fraction of possible worlds in which A is true" or "the fraction of times A holds, in repeated runs of the random experiment"
  – the set of possible worlds is called the sample space, S
  – a random variable A is a function defined over S: A : S → {0, 1}

A little formalism
More formally, we have
• a sample space S (e.g., the set of students in our class)
  – aka the set of possible worlds
• a random variable is a function defined over the sample space
  – Gender: S → { m, f }
  – Height: S → Reals
• an event is a subset of S
  – e.g., the subset of S for which Gender = f
  – e.g., the subset of S for which (Gender = m) AND (eyeColor = blue)
• we're often interested in probabilities of specific events, and of specific events conditioned on other specific events

Visualizing A
[Figure: the sample space of all possible worlds, drawn with total area 1 and divided into worlds in which A is true and worlds in which A is false; P(A) = the area of the region where A is true]

The Axioms of Probability
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
[de Finetti, 1931]: when gambling based on "uncertainty formalism A" you can be exploited by an opponent iff your uncertainty formalism A violates these axioms

Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• ...
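As a quick complement to the "fraction of possible worlds" view of P(A) and the axioms above, the following sketch estimates event probabilities by sampling worlds from a finite sample space and checks the fourth axiom (inclusion-exclusion). It is my own illustration, not part of the slides; the world representation and the event definitions are made up for the example.

import random

# A toy sample space S: each "possible world" is one randomly drawn student,
# described by (gender, eye_color). The distribution here is invented.
def draw_world(rng):
    gender = rng.choice(["m", "f"])
    eye_color = rng.choice(["blue", "brown", "green"])
    return gender, eye_color

# Events are subsets of S, represented here as predicates over worlds.
def A(world):  # Gender = f
    return world[0] == "f"

def B(world):  # eyeColor = blue
    return world[1] == "blue"

def estimate(event, n=100_000, seed=0):
    """P(event) as the fraction of sampled worlds in which the event holds."""
    rng = random.Random(seed)
    return sum(event(draw_world(rng)) for _ in range(n)) / n

p_a = estimate(A)
p_b = estimate(B)
p_a_and_b = estimate(lambda w: A(w) and B(w))
p_a_or_b = estimate(lambda w: A(w) or B(w))

# Fourth axiom: P(A or B) = P(A) + P(B) - P(A and B)
print(p_a_or_b, p_a + p_b - p_a_and_b)  # the two numbers agree

Because every estimate is computed over the same fixed sample of worlds, the identity holds exactly here: it is just counting. The axioms are what keep such "area" or "fraction of worlds" assignments mutually consistent, which is the content of the de Finetti result quoted above.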