# UT Knoxville STAT 201 - Decision Trees (22 pages)

Previewing pages 1, 2, 21, 22 of 22 page document

## Decision Trees



Pages: 22

School: University of Tennessee

Course: Stat 201 - Introduction to Statistics
### A World of Data

- Companies have been collecting information on variables of interest for years, creating huge data sets.
- Baseball teams have data on players and prospective players.
- Grocery stores have data on the buying habits of their consumers.
- Colleges have data on their students.
- If you're in business, you're in the business of data.

### Why Collect Data?

- Oftentimes there is a question we wish to answer:
  - Will a player be successful in the majors?
  - How can we increase the average sale in our grocery store?
  - Why do students fail out of college?
- The answers to these questions, and more, might be contained within data.

### What Is a Decision Tree?

- A decision tree is a graphical display of data being segmented.
- After multiple segmentations are created, the graph begins to somewhat resemble a tree.
- Computers use complex algorithms to find the best splits in the data.
- Understanding and interpreting these splits is the job of the statistician.

### Decision Trees Continued

- Decision trees can handle both categorical and quantitative data.
- Identifier variables should not be used in decision trees. They have too many levels.
- Some categorical variables act similarly to identifier variables if they have a lot of levels. These should also be excluded from the decision tree.

### Sleuthing Through the Data

- Statistical software contains powerful tools to mine through data and find possible relationships.
- The more data we collect, the more tests we can run.
- The more tests we run, the more likely we are to find results.
- The more results we find, the more decisions we will make.
- More decisions means a higher chance of a Type I or Type II error.

### Creating the Decision Tree

- Our y variable is the variable of interest that we wish to explain (the response variable).
- Our x variables are the explanatory variables.
- The decision tree allows us to enter multiple x variables to try to explain the one y variable.
- Variables are entered one by one as we create splits.

### Decision Trees in JMP

- The partitioning tool is used to …
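
To make the idea of software searching for "the best splits" more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm JMP's partitioning tool uses: it considers a single quantitative x variable, tries the midpoint between each pair of adjacent x values as a cut point, and keeps the cut that leaves the smallest total within-group sum of squared errors in y. The function names (`sse`, `best_split`) and the data are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best split 'x < threshold'.

    Returns None when no split is possible (all x values are tied).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # tied x values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Made-up data: hours studied per week (x) and exam score (y).
hours = [1, 2, 3, 4, 10, 11, 12, 13]
scores = [52, 55, 50, 53, 88, 90, 85, 91]

threshold, total = best_split(hours, scores)
print(f"Best split: hours < {threshold} (total SSE after split: {total})")
```

With several x variables, a tree-building routine would repeat this search for each variable, split on the best one, and then repeat the process within each resulting group. The sketch also suggests why identifier variables are excluded: a variable with a distinct value for nearly every row can always carve out tiny groups that fit the sample well without explaining anything general.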
