MSU CSE 802 - Decision Trees

Decision Trees (Sections 8.1-8.4)
• Non-metric methods
• CART (Classification & Regression Trees)
• Number of splits
• Query selection & node impurity
• Multiway splits
• When to stop splitting?
• Pruning
• Assignment of leaf node labels
• Feature choice
• Multivariate decision trees
• Missing attributes

Data Type and Scale
• Data type refers to the degree of quantization in the data
  – binary feature: exactly two values (yes-no response)
  – discrete feature: a finite, usually small, number of possible values (gray values in a digital image)
  – continuous feature: any real value in a fixed range of values
• Data scale indicates the relative significance of the numbers
  – qualitative scales
    • Nominal (categorical): not really a scale, since numerical values are simply used as names; e.g., a (yes, no) response can be coded as (0, 1), (1, 0), or (50, 100); the numerical values are meaningless in a quantitative sense
    • Ordinal: numbers have meaning only in relation to one another (e.g., one value is larger than another); the scales (1, 2, 3), (10, 20, 30), and (1, 20, 300) are all equivalent from an ordinal viewpoint (military rank)
  – quantitative scales
    • Interval: the separation between numbers has meaning; a unit of measurement exists, and the interpretation of the numbers depends on this unit (Fahrenheit temperature: equal differences on the scale represent equal differences in temperature, but 30 degrees is not twice as warm as 15 degrees)
    • Ratio: numbers have an absolute meaning; an absolute zero exists along with a unit of measurement, so the ratio between two numbers is meaningful (height)

Decision Trees
• Also known as
  – hierarchical classifiers
  – tree classifiers
  – multistage classification
  – divide & conquer strategy
• A single-stage classifier assigns a test pattern X to one of C classes in a single step: compute the posterior probability for each class and choose the class with the maximum posterior
• Limitations of a single-stage classifier
  – A common set of features is used to distinguish all the classes; for a large number of classes, this common feature set may not be the best for specific pairs of classes
  – Requires a large number of features when the number of classes is large
  – For each test pattern, C posterior probabilities need to be computed
  – Does not perform well when classes have multimodal distributions
  – Not easy to handle nominal data (discrete features without any natural notion of similarity or even ordering)

Decision Trees
• Most pattern recognition methods address problems where feature vectors are real-valued and there exists some notion of metric
• Suppose the classification involves nominal data: attributes that are discrete and without any natural notion of similarity or even ordering
• Describe patterns by a list of attributes rather than by vectors of real numbers
• A fruit can be described by {color, texture, taste, size}, e.g., {red, shiny, sweet, small}
• How can we learn categories from such non-metric data?

Decision Trees
• Classify a pattern through a sequence of questions (as in the 20-questions game); the next question asked depends on the answer to the current question
• This approach is particularly useful for non-metric data; questions can be asked in a "yes-no" or "true-false" style that requires no notion of metric
• The sequence of questions is displayed in a directed decision tree
• Root node, links or branches, leaf or terminal nodes
• Classification of a pattern begins at the root node and proceeds until a leaf node is reached; the pattern is assigned the category of that leaf node
• Benefits of decision trees
  – Interpretability: a tree can be expressed as a logical expression
  – Rapid classification: a sequence of simple queries
  – Higher accuracy & speed

Decision Tree
• Seven-class, 4-feature classification problem
• Apple = (green AND medium) OR (red AND medium) = (medium AND NOT yellow)
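As a small illustration of the interpretability point (a tree can be read off as a logical expression), the Apple rule above can be written directly as a predicate over nominal attributes. The attribute names color and size and their value sets are assumptions made for this sketch; the preview shows only the rule itself.

def is_apple(color, size):
    # Logical-expression form of the tree's decision for the 'apple' class:
    # (green AND medium) OR (red AND medium); per the slide, this is
    # equivalent to (medium AND NOT yellow) in the 7-class fruit problem.
    # Attribute names and values are illustrative assumptions.
    return size == "medium" and color in ("green", "red")

print(is_apple("green", "medium"))   # True
print(is_apple("yellow", "medium"))  # False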
How to Grow a Tree?
• Given a set D of labeled training samples and a set of features
• How should the tests be organized into a tree? Each test or question involves a single feature or a subset of features
• A decision tree progressively splits the training set into smaller and smaller subsets
• Pure node: all the samples at that node have the same class label; there is no need to split a pure node further
• Recursive tree-growing process: given the data at a node, either declare the node a leaf or find another feature with which to split the data (a minimal sketch of this recursion follows this list)
• This generic procedure is called CART (Classification and Regression Trees)
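The sketch below shows the generic recursive tree-growing procedure in Python, assuming each training sample is a dictionary of nominal attributes paired with a class label. The choice of which feature to test is deliberately naive here (the first remaining feature), since query selection is exactly the issue taken up in the next slides; this is an illustration of the recursion, not the full CART algorithm.

from collections import Counter

def grow_tree(data, features):
    # data: list of (attribute_dict, label) pairs; features: names still available to test.
    labels = [label for _, label in data]
    # Pure node (or no features left): make a leaf labeled with the majority class.
    if len(set(labels)) == 1 or not features:
        return {"leaf": True, "label": Counter(labels).most_common(1)[0][0]}
    # Pick a feature to test at this node; a real CART implementation would
    # instead pick the query that maximizes the drop in impurity.
    f = features[0]
    children = {}
    for value in set(sample[f] for sample, _ in data):
        subset = [(s, y) for s, y in data if s[f] == value]
        children[value] = grow_tree(subset, features[1:])
    return {"leaf": False, "feature": f, "children": children}

# Tiny usage example with nominal attributes
data = [({"color": "green",  "size": "medium"}, "apple"),
        ({"color": "red",    "size": "medium"}, "apple"),
        ({"color": "yellow", "size": "medium"}, "lemon"),
        ({"color": "red",    "size": "small"},  "cherry")]
print(grow_tree(data, ["size", "color"]))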
Classification and Regression Trees (CART)
• Six types of questions
  – Should the attributes (answers to questions) be binary or multivalued? In other words, how many splits should be made at a node?
  – Which feature or feature combination should be tested at a node?
  – When should a node be declared a leaf node?
  – If the tree becomes "too large", can it be pruned to make it smaller and simpler?
  – If a leaf node is impure, how should a category label be assigned to it?
  – How should missing data be handled?

Number of Splits
• Binary tree: every decision can be represented using only binary outcomes; the tree of Fig. 8.1 can be equivalently rewritten as a binary tree

Query Selection & Node Impurity
• Which attribute test or query should be performed at each node?
• Fundamental principle of simplicity: obtain simple, compact trees with few nodes
• Seek a query T at node N that makes the descendent nodes as pure as possible
• A query of the form x_i <= x_is leads to hyperplanar decision boundaries that are perpendicular to the coordinate axes (a monothetic tree: one feature per node)

Query Selection and Node Impurity
• P(ω_j): fraction of patterns at node N in category ω_j
• Node impurity is 0 when all patterns at the node are of the same category
• Impurity is maximum when all the classes at node N are equally likely
• Entropy impurity
• Gini impurity: i(N) = \sum_{i \neq j} P(\omega_i) P(\omega_j) = \frac{1}{2}\left[1 - \sum_j P^2(\omega_j)\right]   (3)
  – Expected error rate at node N if the category label is selected randomly from the class distribution present at N
• Misclassification impurity
  – Minimum probability that a training pattern will be misclassified at N

Query Selection and Node Impurity
• Given a partial tree down to node N, what value s should be chosen for the property test T?
• Choose the query at node N that decreases the impurity as much as possible
• The drop in impurity is the parent impurity minus the weighted impurities of the descendant nodes (see the reference equations below); the best query value s is the choice for test T that maximizes this drop in impurity
• Optimization in Eq. …
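For reference, the impurity measures named above and the drop in impurity have the following standard forms (as in Duda, Hart & Stork, Ch. 8; the slides' own equation images are not visible in this preview). Here P(ω_j) is the fraction of node-N patterns in class ω_j, and P_L is the fraction of samples sent to the left descendant N_L by the candidate split:

\begin{aligned}
\text{entropy impurity: } & i(N) = -\sum_j P(\omega_j)\,\log_2 P(\omega_j) \\
\text{Gini impurity: } & i(N) = \sum_{i \neq j} P(\omega_i)\,P(\omega_j) = \tfrac{1}{2}\Big[1 - \sum_j P^2(\omega_j)\Big] \\
\text{misclassification impurity: } & i(N) = 1 - \max_j P(\omega_j) \\
\text{drop in impurity: } & \Delta i(N) = i(N) - P_L\, i(N_L) - (1 - P_L)\, i(N_R)
\end{aligned}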
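The following is a short sketch of query selection for a monothetic binary split on one continuous feature (a test of the form x <= s), using entropy impurity and evaluating candidate thresholds at midpoints between consecutive sorted values. It is an illustrative implementation of "maximize the drop in impurity" under those assumptions, not code from the slides.

import math
from collections import Counter

def entropy_impurity(labels):
    # i(N) = -sum_j P(w_j) log2 P(w_j), computed from the class fractions at the node.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(xs, labels):
    # Return the split value s (test x <= s) that maximizes
    # Delta i(N) = i(N) - P_L * i(N_L) - (1 - P_L) * i(N_R) for one continuous feature.
    parent = entropy_impurity(labels)
    pairs = sorted(zip(xs, labels))
    best_s, best_drop = None, 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no valid threshold between identical feature values
        s = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for x, y in pairs if x <= s]
        right = [y for x, y in pairs if x > s]
        p_left = len(left) / len(pairs)
        drop = parent - p_left * entropy_impurity(left) - (1 - p_left) * entropy_impurity(right)
        if drop > best_drop:
            best_s, best_drop = s, drop
    return best_s, best_drop

# Example: one feature, two classes; the best split falls between 3.0 and 8.0
xs = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(xs, labels))   # (5.5, 1.0): threshold and impurity drop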

