
CS 348: Introduction to Artificial Intelligence
Lab 3: Decision Trees

Contents: INPUT FILE FORMAT, THE EXECUTABLE, OUTPUT FILE FORMAT, QUESTIONS TO ANSWER, WHAT TO HAND IN, HOW TO HAND IT IN

Due by the start of class on Wednesday, May 31, 2006.

This lab will introduce you to machine learning using decision trees. Decision tree induction has been described in class and is covered in section 18.3 of the textbook. Decision tree induction is a machine learning approach to approximating a function f, given a set of examples. An example is a tuple <x1, x2, …, xn, f(x1, x2, …, xn)> consisting of values for the n inputs to the function f and the output of f given those values. For this lab, you will construct a binary decision tree learner, examine its performance on a variety of binary classification problems, and report the results. The following sections describe the file format for examples, the kind of executable to create, the questions to answer, and what needs to be handed in.

INPUT FILE FORMAT

The input file format is simple. Input files are text files. The first line of each file contains a list of attribute names. Each attribute name is separated from the following attribute by one or more blank characters (spaces and tabs). Each additional line is an example. Each example line contains n+1 binary values, where n is the number of attributes in the first line. Binary values are encoded as the lower-case words “true” and “false.” The ith value in each example line is the value of the ith attribute in that example. The final value in each example line is the categorization of that example.
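The format above can be read with a few lines of code. The following is a minimal sketch of such a reader (the class and method names, ExampleReader and parse, are our own choices, not part of the assignment): it splits the first line into attribute names and every later line into n+1 boolean values.

```java
import java.util.*;

// Sketch of a reader for the input format described above. The first
// whitespace-separated line holds the n attribute names; every following
// line holds n+1 lower-case "true"/"false" tokens, the last one being the
// example's categorization.
public class ExampleReader {
    public static List<boolean[]> parse(List<String> lines, List<String> attributes) {
        // Attribute names are separated by one or more blanks (spaces/tabs).
        attributes.addAll(Arrays.asList(lines.get(0).trim().split("\\s+")));
        int n = attributes.size();
        List<boolean[]> examples = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String[] tokens = lines.get(i).trim().split("\\s+");
            boolean[] row = new boolean[n + 1]; // n attribute values + categorization
            for (int j = 0; j <= n; j++) {
                row[j] = tokens[j].equals("true");
            }
            examples.add(row);
        }
        return examples;
    }
}
```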
The task for a machine learner is to learn how to categorize examples, using only the values specified for the attributes, so that the machine’s categorization matches the categorization specified in the file. The following is an example of the input file format for a function of three binary attributes.

ivy_school good_gpa good_letters
true true true true
true true true false
true false true false
false true true false
true false true false
true true true true
false true true true
true false false true
false false false false
true true false false
false false false true
false false false true

THE EXECUTABLE

Your program must be written in C, C++, Java, or Lisp. The executable requirements for the various languages are outlined below.

If your program is written in C, C++, or Java: Your executable must run in Windows XP and must be callable from the command line. It must be named dtree.exe (in the case of a native Windows executable) or dtree.jar (in the case of a Java byte code executable). The executable must accept the three parameters shown below, in the order shown below.

dtree.exe <file name> <training set size> <number of trials>

The previous line is for a Windows XP executable, compiled from C or C++. Your Windows executable must conform to this specification.
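To illustrate the command-line contract, here is one possible shape for the entry point. This is only a sketch under our own assumptions (the class name Dtree and the helper parseCounts are hypothetical); it does nothing beyond validating the three required parameters.

```java
// A minimal sketch of the required entry point; it only validates the three
// command-line parameters before the real work would begin.
public class Dtree {
    // Parse <training set size> and <number of trials>; both must be positive.
    static int[] parseCounts(String sizeArg, String trialsArg) {
        int trainingSetSize = Integer.parseInt(sizeArg);
        int numberOfTrials = Integer.parseInt(trialsArg);
        if (trainingSetSize <= 0 || numberOfTrials <= 0) {
            throw new IllegalArgumentException("sizes must be positive integers");
        }
        return new int[] { trainingSetSize, numberOfTrials };
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: dtree <file name> <training set size> <number of trials>");
            System.exit(1);
        }
        String fileName = args[0];
        int[] counts = parseCounts(args[1], args[2]);
        // ... read fileName, run counts[1] trials of counts[0] training examples each ...
    }
}
```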
In this specification, <file name> is the name of the text file containing the examples, <training set size> is an integer specifying the number of examples to include in the training set, and <number of trials> is the number of times a training set will be selected to create a decision tree.

If you have chosen to create your program in Java, we require that you create an executable .jar file so that we may call the file using the following syntax.

java -jar dtree.jar <file name> <training set size> <number of trials>

If you do not know how to create a .jar file, there is an excellent tutorial available at the following URL.

http://java.sun.com/docs/books/tutorial/jar/

For Java code, your class must be runnable on a Windows machine with Java 1.4.X or later, and it should require no change to the CLASSPATH. You can test this by trying your code on other machines with Java and making sure you aren't forgetting any dependencies. If you have questions on this, please email Sara, the TA.

If your program is written in Lisp: Your code should be written in a file called dtree.lisp. Within this file, you should include a function called “dtree” which takes three parameters as mentioned above (file name, training set size, and number of trials). Your code will be tested using “Allegro CL,” so you should make sure that it runs in that environment.

When run, your executable must perform the following steps.

1) Read in the text file containing the examples.
2) Divide the set of examples into a training set and a testing set by randomly selecting the number of examples for the training set specified in the command-line input <training set size>.
Use the remainder for the testing set.
3) Estimate the expected probability of TRUE and FALSE classifications, based on the examples in the training set.
4) Construct a decision tree, based on the training set, using the approach described in section 18.3 of the text.
5) Classify the examples in the testing set using the decision tree built in step 4.
6) Classify the examples in the testing set using the prior probabilities from step 3.
7) Determine the proportion of correct classifications made in steps 5 and 6 by comparing the classifications to the correct answers.
8) Steps 2 through 7 constitute a trial. Repeat steps 2 through 7 until the number of trials is equal to the value specified in the command-line input <number of trials>.
9) Print the results for each trial to an output file called output.txt. The format of output.txt is specified in the following section.

OUTPUT FILE FORMAT

Each run of your decision tree program should create an output text file that contains the following information:
- The input file name
- The training set size
- The number of trials

In addition, you must provide the following information for each trial:
- The number of the trial
- The set of examples in the training set
- The set of examples in the testing set
- The classification returned by the decision tree for each member of the testing set
- The classification returned by applying prior probability to each member of the testing set
- The proportion of correct classifications returned by the decision tree
- The proportion of correct classifications returned by applying prior probability

If there are multiple trials, then this information should be in the output file for EACH AND EVERY trial. We will not require that a particular layout be used. That said, if we have ANY
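The per-trial bookkeeping described in steps 3, 6, and 7 can be sketched in a few small helpers, shown below. This is only an illustration under our own naming (the class Trial and its methods are hypothetical, and the decision-tree construction of step 4 is elided); the entropy helper is the kind of function used when choosing split attributes in the section 18.3 approach.

```java
import java.util.*;

// Sketch of per-trial helpers: the prior-probability baseline classifier
// (steps 3 and 6) and the proportion-correct measure (step 7).
public class Trial {
    // Step 3: estimate P(TRUE) from the training set's categorizations.
    static double priorTrue(List<Boolean> trainingLabels) {
        int t = 0;
        for (boolean label : trainingLabels) if (label) t++;
        return (double) t / trainingLabels.size();
    }

    // Step 6: classify every testing example with the more probable class.
    static boolean priorClassify(double pTrue) {
        return pTrue >= 0.5;
    }

    // Step 7: proportion of predictions that match the correct answers.
    static double proportionCorrect(List<Boolean> predicted, List<Boolean> actual) {
        int correct = 0;
        for (int i = 0; i < actual.size(); i++)
            if (predicted.get(i).equals(actual.get(i))) correct++;
        return (double) correct / actual.size();
    }

    // Entropy of a boolean variable with P(TRUE) = p, in bits; this is the
    // quantity minimized when picking split attributes in step 4.
    static double entropy(double p) {
        if (p == 0.0 || p == 1.0) return 0.0;
        return -(p * Math.log(p) + (1 - p) * Math.log(1 - p)) / Math.log(2);
    }
}
```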

