Machine Learning, Decision Trees, Overfitting

Machine Learning 10-701
Tom M. Mitchell
Center for Automated Learning and Discovery
Carnegie Mellon University
September 13, 2005

Recommended reading: Mitchell, Chapter 3

Machine Learning: the study of algorithms that
• improve their performance
• at some task
• with experience

Learning to Predict Emergency C-Sections
9714 patient records, each with 215 features [Sims et al., 2000]

Object Detection
Example training images for each orientation (Prof. H. Schneiderman)

Text Classification
Company home page vs. personal home page vs. university home page vs. …

Reading a Noun (vs. a Verb)
[Rustandi et al., 2005]

Growth of Machine Learning
• Machine learning is the preferred approach to:
– speech recognition, natural language processing
– computer vision
– medical outcomes analysis
– robot control
– …
• This trend is accelerating because of:
– improved machine learning algorithms
– improved data capture, networking, and faster computers
– software too complex to write by hand
– new sensors / IO devices
– demand for self-customization to the user and environment

Decision Tree Learning
• Each internal node: tests one attribute Xi
• Each branch from a node: selects one value for Xi
• Each leaf node: predicts Y (or P(Y | X ∈ leaf))

How would you represent AB ∨ CD(¬E)?

Top-Down Induction of Decision Trees [ID3, C4.5, …]
node = Root
Main loop:
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort training examples to the leaf nodes
5. If the training examples are perfectly classified, stop; else iterate over the new leaf nodes

Entropy
Entropy H(X) of a random variable X:
H(X) = − Σi P(X = i) log2 P(X = i)
H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code). Why? Information theory:
• the most efficient code assigns −log2 P(X = i) bits to encode the message X = i
• so the expected number of bits to encode one value is Σi −P(X = i) log2 P(X = i)

Sample Entropy
Assume the X values are known, so only the labels Y need to be encoded. For a sample S of training examples in which a fraction p⊕ of the labels are positive and p⊖ are negative:
H(S) = −p⊕ log2 p⊕ − p⊖ log2 p⊖

What you should know:
• Well-posed function approximation problems:
– instance space, X
– sample of labeled training data, D = { <xi, yi> }
– hypothesis space, H = { f : X → Y }
• Learning is a search/optimization problem over H
– various objective functions
– today: minimize training error (0-1 loss)
• Decision tree learning
– greedy top-down learning of decision trees (ID3, C4.5, …)
– overfitting and tree/rule post-pruning
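
One possible answer to the question posed above ("How would you represent AB ∨ CD(¬E)?"), sketched in Python as nested single-attribute tests; the function names f and cde are illustrative, not from the lecture. Note that the CD(¬E) subtree appears under both outcomes of the root test on A, which is why decision trees for some boolean formulas grow large:

    def f(A, B, C, D, E):
        # Root tests A; every node tests exactly one attribute.
        if A:
            if B:
                return True       # the AB disjunct is satisfied
            return cde(C, D, E)   # A true but B false: fall back to CD(not E)
        return cde(C, D, E)       # A false: only CD(not E) can succeed

    def cde(C, D, E):
        # Shared subtree testing C, then D, then E in turn.
        if C:
            if D:
                if not E:
                    return True
        return False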
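
A minimal sketch of sample entropy and the information-gain split criterion used by ID3-style learners, assuming discrete labels and examples given as dicts of attribute values; the names sample_entropy and information_gain are illustrative:

    import math
    from collections import Counter

    def sample_entropy(labels):
        # H(S) = -sum over label values v of p_v * log2(p_v), for a nonempty sample.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(examples, labels, attribute):
        # Expected reduction in entropy from splitting the sample on one attribute.
        n = len(labels)
        partitions = {}
        for x, y in zip(examples, labels):
            partitions.setdefault(x[attribute], []).append(y)
        remainder = sum((len(part) / n) * sample_entropy(part)
                        for part in partitions.values())
        return sample_entropy(labels) - remainder

For example, sample_entropy(['+', '+', '-', '-']) is 1.0, the entropy of a fair coin flip.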
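
And a sketch of the greedy top-down loop itself, in the style of ID3 but without the post-pruning mentioned in the summary; it reuses sample_entropy and information_gain from the sketch above, and returns either a label (leaf) or a nested dict keyed by the tested attribute:

    from collections import Counter

    def id3(examples, labels, attributes):
        # Base cases: a pure sample, or no attributes left to test.
        if len(set(labels)) == 1:
            return labels[0]
        if not attributes:
            return Counter(labels).most_common(1)[0][0]
        # Greedy step: test the attribute with the highest information gain.
        best = max(attributes, key=lambda a: information_gain(examples, labels, a))
        partitions = {}
        for x, y in zip(examples, labels):
            partitions.setdefault(x[best], []).append((x, y))
        # One branch (recursive subtree) per observed value of that attribute.
        remaining = [a for a in attributes if a != best]
        return {best: {value: id3([x for x, _ in part],
                                  [y for _, y in part],
                                  remaining)
                       for value, part in partitions.items()}}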