UT Knoxville STAT 201 - Decision Trees

A World of Data

Companies have been collecting information on variables of interest for years, creating huge data sets. Baseball teams have data on players and prospective players. Grocery stores have data on the buying habits of their customers. Colleges have data on their students. If you're in business, you're in the business of data.

Why Collect Data?

Often there is a question we wish to answer. Will a player be successful in the majors? How can we increase the average sale in our grocery store? Why do students fail out of college? The answers to these questions, and more, might be contained within data.

What Is a Decision Tree?

A decision tree is a graphical display of data being segmented. After multiple segmentations are created, the graph begins to resemble a tree. Computers use complex algorithms to find the best splits in the data. Understanding and interpreting those splits is the job of the statistician.

Decision Trees Continued

Decision trees can handle both categorical and quantitative data. Identifier variables (for example, a student ID) should not be used in decision trees because they have too many levels: nearly every row has its own value. Some categorical variables with a lot of levels act like identifier variables, and these should also be excluded from the decision tree.

Sleuthing Through the Data

Statistical software contains powerful tools to mine data and find possible relationships. The more data we collect, the more tests we can run. The more tests we run, the more likely we are to find results. The more results we find, the more decisions we make. Every decision carries some chance of a Type I or Type II error, so more decisions mean a higher chance that at least one of them is wrong.

Creating the Decision Tree

Our y variable (the response variable) is the variable of interest that we wish to explain. Our x variables are the explanatory variables. A decision tree lets us enter multiple x variables to try to explain the one y variable. Variables enter the tree one at a time as we create splits.

Decision Trees in JMP

The partitioning tool is used to
make decision trees in JMP. It can be found under Analyze > Modeling > Partition.

Next, we add our variables. The picture below shows a basic decision tree trying to describe gender.

The decision tree creates partitions in the data, dividing it on the most significant explanatory variable. The data show there are 730 people in our sample, and slightly over half are female.

The First Split

The first split is always on the most significant explanatory variable, and it will often explain the greatest percentage of the variation in our y variable. Here, height greater than or equal to 70 inches (5 feet 10 inches) explains 43.7% of the variation in gender.

The End of the Tree

There are two ways a decision tree can end:

1. After enough splits, we run out of the data required to create another split.
2. We decide to stop creating splits.

As mentioned before, the more splits we create, the less likely they are to involve truly significant explanatory variables. The picture to the right shows the 15th and 18th splits trying to predict gender. Do you think views on tailgating and "Gangnam Style" are truly significant?

Additional Splits

Each additional split usually explains less and less of the variation in our y variable. Two reasons account for this: there is less total variation left to explain in y, and the remaining variables are usually less significant. Look at the R² value on the right as we create more splits.

Checking the R²

We have to use our own judgment to decide when to stop. When the increases in R² become tiny, we should stop making splits.

Interpreting the Tree

The interpretation is very similar to our regression interpretation: "[R²] of the variation in [y] is explained by [the variables in the tree]." R² describes the variation in y explained by the xs, where y is the response variable and the xs are the explanatory variables. This interpretation can get very long if there are a lot of variables in the model.

Interpreting the Splits

Under the red arrow (the red triangle menu), click Leaf Report. The report shows us the percentage and
count within each split, and each split is labeled with its details. For example, 79.98% of people who are between 67 and 70 inches tall and own jorts (jean shorts) are female.

Saving Predictions from the Tree

We can use the decision tree to save predictions for the y variable. For a quantitative y variable, the tree saves a predicted value. For a categorical y variable, it saves a probability for each level of y.

Things to Consider

Creating a good model starts with collecting the right data. New splits are contained within old splits, so the original split creates a path for the rest of the tree to follow. Sometimes it is best to create multiple trees with different first splits to get an idea of what truly governs your variable of interest.

Example in Business

Many businesses have massive data sets. The data set to the left has 1,152 variables. It is financial data from a bank, and it has been coded. A decision tree allows us to find which x variable explains the most variation in y.

What Leads to Deposits?

We want to explain the variation in total deposits. In the past, statisticians had to run each test individually. Here, in moments, we've found a variable that explains 60.5% of the variation in total deposits.

Examining the R²

Examining the increase in R² can help us decide how many splits we need. The increase after the third split is very small, so the third split seems like the one to stop on.

Using the Model

Output for our final model is displayed below.

What does R² tell us? Why would this model be useful to a bank? How could we create a better model?
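
The warning in "Sleuthing Through the Data" can be made concrete with a short calculation. This is a minimal sketch that assumes the tests are independent and that each is run at the same significance level alpha. The value alpha = 0.05 is an illustrative choice and does not come from the slides. Under those assumptions, the chance of at least one Type I error across k tests is 1 - (1 - alpha)^k. The sketch covers Type I errors only, because Type II error rates depend on effect sizes that the slides do not specify.

```python
def prob_at_least_one_type_i_error(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across num_tests tests.

    Assumes the tests are independent and each uses significance level alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    # P(no false positives) = (1 - alpha)^k, so take the complement.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(f"{k:>3} tests: {prob_at_least_one_type_i_error(0.05, k):.3f}")
```

With alpha = 0.05, the chance rises from 0.05 for a single test to roughly 0.64 for 20 tests and above 0.99 for 100 tests.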

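The idea behind "The First Split" (the computer tries possible splits and keeps the one that explains the most variation in y) can be sketched as a brute-force search. This is a simplified illustration and not JMP's algorithm. It codes a two-level y as 0/1 and scores each candidate cut "x >= c" by the share of squared-error variation it removes, 1 - SSE/SST. JMP reports its own fit statistics for categorical responses, and those may differ. The functions best_split and best_first_split and the toy data are hypothetical.

```python
from collections.abc import Sequence


def sum_sq(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (cut, R^2) for the cut 'x >= cut' that explains the most variation in y.

    R^2 here is 1 - SSE_after_split / SST, a simplified variance-reduction score.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    total = sum_sq(y)
    if total == 0:
        raise ValueError("y has no variation to explain")
    best_cut, best_r2 = None, -1.0
    # Candidate cuts are the distinct x values, skipping the smallest
    # (cutting there would leave the left group empty).
    for cut in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < cut]
        right = [yi for xi, yi in zip(x, y) if xi >= cut]
        r2 = 1.0 - (sum_sq(left) + sum_sq(right)) / total
        if r2 > best_r2:
            best_cut, best_r2 = cut, r2
    if best_cut is None:
        raise ValueError("x has only one distinct value; no split possible")
    return best_cut, best_r2


def best_first_split(columns: dict[str, Sequence[float]], y: Sequence[float]) -> tuple[str, float, float]:
    """Try every quantitative x column and keep the one whose best cut has the highest R^2."""
    results = []
    for name, x in columns.items():
        try:
            cut, r2 = best_split(x, y)
        except ValueError:
            continue  # skip columns that cannot be split
        results.append((r2, name, cut))
    if not results:
        raise ValueError("no column can be split")
    r2, name, cut = max(results)
    return name, cut, r2


if __name__ == "__main__":
    # Made-up toy data: heights in inches, shoe sizes, and a 0/1 code (1 = female).
    heights = [62, 64, 66, 68, 70, 72, 74]
    shoe_sizes = [7, 9, 7, 9, 7, 9, 7]
    female = [1, 1, 1, 1, 0, 0, 0]
    print(best_first_split({"height": heights, "shoe": shoe_sizes}, female))
    # prints ('height', 70, 1.0)
```

Real trees repeat this search inside each new branch. That repetition is why new splits are contained within old ones.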

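"Checking the R²" and "Examining the R²" leave the stopping decision to the analyst's judgment. The sketch below shows one way to turn that judgment into an explicit rule: keep adding splits as long as each one raises R² by at least min_gain. The cutoff of 0.01 and the R² values in the example are made up for illustration. The function name splits_to_keep is hypothetical.

```python
from collections.abc import Sequence


def splits_to_keep(r2_by_split: Sequence[float], min_gain: float = 0.01) -> int:
    """Return how many splits to keep, given R^2 after split 1, 2, 3, ...

    A split is kept only if it raises R^2 by at least min_gain over the
    previous split; the first split that falls short ends the search.
    """
    kept = 0
    previous = 0.0  # R^2 of the tree before any split
    for r2 in r2_by_split:
        if r2 - previous < min_gain:
            break  # this split adds too little; stop here
        kept += 1
        previous = r2
    return kept


if __name__ == "__main__":
    # Made-up R^2 values after splits 1 through 5.
    history = [0.40, 0.52, 0.58, 0.585, 0.587]
    print(splits_to_keep(history))  # prints 3
```

A fixed cutoff is only a starting point. As the slides stress, the analyst still has to judge whether the later splits make sense.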

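The Leaf Report described under "Interpreting the Splits" gives, for each leaf, a count and the percentage of each level of y. The same numbers can be tabulated by hand. This sketch assumes each row has already been assigned to a leaf. It does not read JMP output, and the leaf labels and rows in the demo are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Iterable


def leaf_report(rows: Iterable[tuple[str, Hashable]]) -> dict[str, dict]:
    """Summarize each leaf: its row count and the percentage of each y level.

    rows: (leaf_label, y_value) pairs, one per observation.
    """
    counts_by_leaf = defaultdict(Counter)
    for leaf, y in rows:
        counts_by_leaf[leaf][y] += 1

    report = {}
    for leaf, counts in counts_by_leaf.items():
        n = sum(counts.values())
        report[leaf] = {
            "count": n,
            "percent": {level: 100.0 * c / n for level, c in counts.items()},
        }
    return report


if __name__ == "__main__":
    # Hypothetical rows: (leaf label, gender) for five people.
    rows = [
        ("Height<70 & Jorts(Yes)", "F"),
        ("Height<70 & Jorts(Yes)", "F"),
        ("Height<70 & Jorts(Yes)", "M"),
        ("Height>=70", "M"),
        ("Height>=70", "M"),
    ]
    for leaf, summary in leaf_report(rows).items():
        print(leaf, summary)
```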

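The rule under "Saving Predictions from the Tree" can be sketched the same way, again assuming each row's leaf is already known. For a quantitative y, a row's saved prediction is the mean response in its leaf. For a categorical y, each level gets a probability equal to that level's share of the leaf. Leaf means and leaf proportions are the usual choice for regression and classification trees. However, save_predictions is a hypothetical helper, and JMP's saved columns may be named and formatted differently.

```python
from collections import Counter, defaultdict
from collections.abc import Sequence
from statistics import mean


def save_predictions(leaves: Sequence[str], y: Sequence) -> list:
    """Return one saved prediction per row, based on the row's leaf.

    Quantitative y (all int/float): the leaf's mean response.
    Categorical y: a dict mapping each level to its share of the leaf.
    """
    if len(leaves) != len(y):
        raise ValueError("leaves and y must have the same length")
    groups = defaultdict(list)
    for leaf, value in zip(leaves, y):
        groups[leaf].append(value)

    # Treat y as quantitative only if every value is a number (bools excluded).
    quantitative = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in y
    )
    if quantitative:
        per_leaf = {leaf: mean(vals) for leaf, vals in groups.items()}
    else:
        levels = sorted(set(y), key=str)
        per_leaf = {}
        for leaf, vals in groups.items():
            counts = Counter(vals)
            per_leaf[leaf] = {lvl: counts[lvl] / len(vals) for lvl in levels}

    # Every row in the same leaf receives that leaf's prediction.
    return [per_leaf[leaf] for leaf in leaves]


if __name__ == "__main__":
    leaves = ["left", "left", "right"]  # hypothetical leaf labels
    print(save_predictions(leaves, [10.0, 20.0, 7.0]))  # [15.0, 15.0, 7.0]
    print(save_predictions(leaves, ["F", "M", "M"]))
```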