CS536 Machine Learning, Spring 2007
Assignment 1
Due date: March 7th, 2007

Submission instructions: submit your work in the TA's mailbox. Any questions related to Weka should be directed to the TA.

[Q1 – 10 points] Consider the following set of training examples:

Instance   A1   A2   Classification
1          T    T    +
2          T    T    +
3          T    F    -
4          F    F    +
5          F    T    -
6          F    T    -

a. What is the entropy of this collection of training examples with respect to the target function?
b. What is the information gain of each of the attributes A1 and A2 relative to these training examples?
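As a reminder of the quantities involved: Entropy(S) = -p+ log2(p+) - p- log2(p-) for a Boolean-labeled collection S, and Gain(S, A) = Entropy(S) - sum over values v of A of (|S_v|/|S|) * Entropy(S_v). The following is a minimal sketch of these two definitions only; the class name InfoGain and the array encoding of the six examples are illustrative, not part of the assignment.

```java
// Sketch of the ID3 entropy and information-gain definitions for Q1.
public class InfoGain {

    // Entropy of a collection with pos positive and neg negative examples.
    static double entropy(int pos, int neg) {
        int total = pos + neg;
        double e = 0.0;
        for (double p : new double[]{(double) pos / total, (double) neg / total}) {
            if (p > 0) e -= p * (Math.log(p) / Math.log(2));  // log base 2
        }
        return e;
    }

    // Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v),
    // for a Boolean attribute with values T and F.
    static double gain(boolean[] attr, boolean[] label) {
        int n = attr.length;
        int pos = 0, neg = 0, posT = 0, negT = 0, posF = 0, negF = 0;
        for (int i = 0; i < n; i++) {
            if (label[i]) pos++; else neg++;
            if (attr[i]) { if (label[i]) posT++; else negT++; }
            else         { if (label[i]) posF++; else negF++; }
        }
        double remainder = (posT + negT) / (double) n * entropy(posT, negT)
                         + (posF + negF) / (double) n * entropy(posF, negF);
        return entropy(pos, neg) - remainder;
    }

    public static void main(String[] args) {
        // The six examples from the table, in order (true = T for attributes, + for the class).
        boolean[] a1  = {true, true, true, false, false, false};
        boolean[] a2  = {true, true, false, false, true, true};
        boolean[] cls = {true, true, false, true, false, false};
        System.out.println("Entropy(S)  = " + entropy(3, 3));
        System.out.println("Gain(S, A1) = " + gain(a1, cls));
        System.out.println("Gain(S, A2) = " + gain(a2, cls));
    }
}
```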
[Q2 – 10 points] As discussed in class, any joint probability distribution can be decomposed using the chain rule as follows:

P(x1, x2, …, xn) = P(x1) ∏_{i=2}^{n} P(xi | x1, …, x_{i-1})

Using the chain rule and the independence and conditional independence assumptions made in the Bayesian network shown below, prove that

P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)

[Figure: Bayesian network over F, A, S, H, N with arcs F → S, A → S, S → H, and S → N]

[Q3 – 10 points] True or False: If a decision tree D2 is an elaboration of tree D1, then D1 is more-general-than D2. Assume D1 and D2 are decision trees representing arbitrary Boolean functions, and that D2 is an elaboration of D1 if ID3 could extend D1 into D2. If true, give a proof; if false, give a counterexample.

[Q4 – 10 points] Consider the space of linear "hinges", each consisting of two line segments joined at a point. The drawing below shows a linear hinge separating positive and negative examples.

[Figure: a linear hinge (two joined line segments) separating + examples from - examples]

a. What is the VC dimension of linear hinges in 2 dimensions? Explain why (with diagrams if you like).
b. What is the minimum number of parameters needed to specify a general linear hinge in 2 dimensions? Explain your answer.

[Q5 – 40 points, USING WEKA] Learning decision trees. For this question you need to submit only answers for parts 6 and 8.

1. Download the two datasets, trndata and tstdata, from http://www.cs.rutgers.edu/~elgammal/classes/cs536/HW1data/. We want to build a decision tree such that, given someone's age, work class, occupation, etc., we can tell whether he or she earns more than 50k per year. This is a binary classification where class = {>50, <=50}. The attr.txt file describes the variables used in these datasets.
2. Generate ARFF files for both of the files. Please read the tutorial that comes with Weka to learn how to convert txt/xls files to ARFF. Name the dataset (i.e. the Relation) 'income'.
3. Open trndata.arff in the Explorer window of Weka. Skip the preprocessing for now and choose the Classify tab. Choose the J48 tree classifier and set tstdata.arff as the test set. Press the Start button and observe the output.
4. In this step, we will generate random subsets of the training data (a scripted sketch of this resampling appears after this question).
   a. Go to the Preprocess section of the Explorer window and choose the supervised instance filter Resample.
   b. By right-clicking on the filter text box, set sampleSizePercent to 10.
   c. Click Apply and save the data subset (i.e. the relation) as trndata_10_0.arff. Open this file in any text editor (e.g. Wordpad) and change the name of the relation to 'income'.
   d. Click again on Open file, and select trndata.arff.
   e. Repeat steps a~d for sample size percentages m = 20, 40, 50, 70, 90; save the results as trndata_m_0.arff and rename all datasets to 'income'.
5. Repeat step 4, but this time set the biasToUniformClass parameter of the Resample filter to 1.0 (click on the text box which displays the name of the filter, then click the More button on the dialog box that appears to learn about this biased sampling). Save the resulting datasets as trndata_m_1.arff, where m = 10, 20, 40, 50, 70, 90. Again, change the names of all the relations to 'income'.
6. Use these 12 datasets as training sets for the J48 decision tree classifier (select tstdata.arff as the test set). As we know, six of them were generated by sampling that tried to retain the distribution of the 'class' variable; in the other six, the sampling tried to make the 'class' distribution uniform. Plot two curves of the percentage of correctly classified instances for these two types of training sets. Can you explain the graph? Hint: look at the distribution of the class variable in the training and test sets. Please explain why the two curves are the same or different.
7. The J48 tree classifier uses rule post-pruning by default. Turn off this feature by clicking on the text box and setting the unpruned option to true, then press the Start button to run C4.5 without pruning.
8. Now set reducedErrorPruning to true and observe the output. Briefly state the difference between these pruning schemes, using the trees generated (unpruned, pruned by the rule-post scheme, and pruned by reduced-error pruning). A scripted sketch of these J48 runs also appears after this question. Note: learn the decision tree on the original datasets, trndata and tstdata (NOT on the subsets created in steps 4 and 5).
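The resampling in steps 4 and 5 can also be scripted rather than driven through the Explorer. Below is a minimal sketch using Weka's Java API; the filter and its sampleSizePercent and biasToUniformClass options are the ones named above, while the class name MakeSubsets, the assumption that 'class' is the last attribute, the assumption of a reasonably recent Weka version, and the use of the API instead of the GUI are mine.

```java
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;
import java.io.File;

// Sketch of steps 4-5: resample trndata.arff at several sample sizes,
// once distribution-preserving (bias 0.0) and once biased toward a
// uniform class distribution (bias 1.0).
public class MakeSubsets {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("trndata.arff");
        data.setClassIndex(data.numAttributes() - 1);  // assumes 'class' is last

        int[] sizes = {10, 20, 40, 50, 70, 90};
        for (double bias : new double[]{0.0, 1.0}) {
            for (int m : sizes) {
                Resample filter = new Resample();
                filter.setSampleSizePercent(m);        // sampleSizePercent, as in step 4b
                filter.setBiasToUniformClass(bias);    // biasToUniformClass, as in step 5
                filter.setInputFormat(data);
                Instances subset = Filter.useFilter(data, filter);
                subset.setRelationName("income");      // rename the relation, as in step 4c

                ArffSaver saver = new ArffSaver();
                saver.setInstances(subset);
                saver.setFile(new File("trndata_" + m + "_" + (int) bias + ".arff"));
                saver.writeBatch();
            }
        }
    }
}
```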
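Likewise, a minimal sketch of the J48 runs in steps 3, 7, and 8; unpruned and reducedErrorPruning are the same options named above, and the rest (class name RunJ48, last-attribute class index) is illustrative, not part of the required submission.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch of steps 3, 7 and 8: train J48 on trndata.arff and evaluate on
// tstdata.arff in its default (pruned), unpruned, and
// reduced-error-pruning configurations.
public class RunJ48 {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("trndata.arff");
        Instances test  = DataSource.read("tstdata.arff");
        train.setClassIndex(train.numAttributes() - 1);  // assumes 'class' is last
        test.setClassIndex(test.numAttributes() - 1);

        J48 pruned = new J48();                    // default pruning (step 3)

        J48 unpruned = new J48();                  // step 7: C4.5 without pruning
        unpruned.setUnpruned(true);

        J48 rep = new J48();                       // step 8: reduced-error pruning
        rep.setReducedErrorPruning(true);

        for (J48 tree : new J48[]{pruned, unpruned, rep}) {
            tree.buildClassifier(train);
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);
            System.out.println(eval.pctCorrect() + "% correctly classified");
        }
    }
}
```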

