CS536 Machine Learning, Spring 2007
Assignment 1
Due date: March 7th, 2007

Submission instructions: submit your work in the TA's mailbox. Any questions related to Weka should be directed to the TA.

[Q1 – 10 points] Consider the following set of training examples:

Instance   A1   A2   Classification
1          T    T    +
2          T    T    +
3          T    F    -
4          F    F    +
5          F    T    -
6          F    T    -

a. What is the entropy of this collection of training examples with respect to the target function?
b. What is the information gain of each of the attributes A1 and A2 relative to these training examples?
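As a reminder of the quantities involved: Entropy(S) = -p+ log2(p+) - p- log2(p-) for a Boolean-labeled collection S, and Gain(S, A) = Entropy(S) - sum over values v of A of (|S_v|/|S|) * Entropy(S_v). The following is a minimal sketch of these two definitions only; the class name InfoGain and the array encoding of the six examples are illustrative, not part of the assignment.

```java
// Sketch of the ID3 entropy and information-gain definitions for Q1.
public class InfoGain {

    // Entropy of a collection with pos positive and neg negative examples.
    static double entropy(int pos, int neg) {
        int total = pos + neg;
        double e = 0.0;
        for (double p : new double[]{(double) pos / total, (double) neg / total}) {
            if (p > 0) e -= p * (Math.log(p) / Math.log(2));  // log base 2
        }
        return e;
    }

    // Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v),
    // for a Boolean attribute with values T and F.
    static double gain(boolean[] attr, boolean[] label) {
        int n = attr.length;
        int pos = 0, neg = 0, posT = 0, negT = 0, posF = 0, negF = 0;
        for (int i = 0; i < n; i++) {
            if (label[i]) pos++; else neg++;
            if (attr[i]) { if (label[i]) posT++; else negT++; }
            else         { if (label[i]) posF++; else negF++; }
        }
        double remainder = (posT + negT) / (double) n * entropy(posT, negT)
                         + (posF + negF) / (double) n * entropy(posF, negF);
        return entropy(pos, neg) - remainder;
    }

    public static void main(String[] args) {
        // The six examples from the table, in order (true = T for attributes, + for the class).
        boolean[] a1  = {true, true, true, false, false, false};
        boolean[] a2  = {true, true, false, false, true, true};
        boolean[] cls = {true, true, false, true, false, false};
        System.out.println("Entropy(S)  = " + entropy(3, 3));
        System.out.println("Gain(S, A1) = " + gain(a1, cls));
        System.out.println("Gain(S, A2) = " + gain(a2, cls));
    }
}
```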
[Q2 – 10 points] As discussed in class, any joint probability distribution can be decomposed using the chain rule as follows:

P(x1, x2, …, xn) = P(x1) ∏_{i=2}^{n} P(xi | x1, …, x_{i-1})

Using the chain rule and the independence and conditional independence assumptions made in the Bayesian network shown below, prove that

P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)

[Figure: Bayesian network over F, A, S, H, N with arcs F → S, A → S, S → H, and S → N]

[Q3 – 10 points] True or False: If a decision tree D2 is an elaboration of tree D1, then D1 is more-general-than D2. Assume D1 and D2 are decision trees representing arbitrary Boolean functions, and that D2 is an elaboration of D1 if ID3 could extend D1 into D2. If true, give a proof; if false, give a counterexample.

[Q4 – 10 points] Consider the space of linear "hinges", each consisting of two line segments joined at a point. The drawing below shows a linear hinge separating positive and negative examples.

[Figure: a linear hinge (two joined line segments) separating + examples from - examples]

a. What is the VC dimension of linear hinges in 2 dimensions? Explain why (with diagrams if you like).
b. What is the minimum number of parameters needed to specify a general linear hinge in 2 dimensions? Explain your answer.

[Q5 – 40 points, USING WEKA] Learning decision trees. For this question you need to submit only answers for parts 6 and 8.

1. Download the two datasets, trndata and tstdata, from http://www.cs.rutgers.edu/~elgammal/classes/cs536/HW1data/. We want to build a decision tree such that, given someone's age, work class, occupation, etc., we can tell whether he or she earns more than 50k per year. This is a binary classification where class = {>50, <=50}. The attr.txt file describes the variables used in these datasets.
2. Generate ARFF files for both of the files. Please read the tutorial that comes with Weka to learn how to convert txt/xls files to ARFF. Name the dataset (i.e. the Relation) 'income'.
3. Open trndata.arff in the Explorer window of Weka. Skip the preprocessing for now and choose the Classify tab. Choose the J48 tree classifier and set tstdata.arff as the test set. Press the Start button and observe the output.
4. In this step, we will generate random subsets of the training data (a scripted sketch of this resampling appears after this question).
   a. Go to the Preprocess section of the Explorer window and choose the supervised instance filter Resample.
   b. By right-clicking on the filter text box, set sampleSizePercent to 10.
   c. Click Apply and save the data subset (i.e. the relation) as trndata_10_0.arff. Open this file in any text editor (e.g. Wordpad) and change the name of the relation to 'income'.
   d. Click again on Open file, and select trndata.arff.
   e. Repeat steps a~d for sample size percentages m = 20, 40, 50, 70, 90; save the results as trndata_m_0.arff and rename all datasets to 'income'.
5. Repeat step 4, but this time set the biasToUniformClass parameter of the Resample filter to 1.0 (click on the text box which displays the name of the filter, then click the More button on the dialog box that appears to learn about this biased sampling). Save the resulting datasets as trndata_m_1.arff, where m = 10, 20, 40, 50, 70, 90. Again, change the names of all the relations to 'income'.
6. Use these 12 datasets as training sets for the J48 decision tree classifier (select tstdata.arff as the test set). As we know, six of them were generated by sampling that tried to retain the distribution of the 'class' variable; in the other six, the sampling tried to make the 'class' distribution uniform. Plot two curves of the percentage of correctly classified instances for these two types of training sets. Can you explain the graph? Hint: look at the distribution of the class variable in the training and test sets. Please explain why the two curves are the same or different.
7. The J48 tree classifier uses rule post-pruning by default. Turn off this feature by clicking on the text box and setting the unpruned option to true, then press the Start button to run C4.5 without pruning.
8. Now set reducedErrorPruning to true and observe the output. Briefly state the difference between these pruning schemes, using the trees generated (unpruned, pruned by the rule-post scheme, and pruned by reduced-error pruning). A scripted sketch of these J48 runs also appears after this question. Note: learn the decision tree on the original datasets, trndata and tstdata (NOT on the subsets created in steps 4 and 5).
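The resampling in steps 4 and 5 can also be scripted rather than driven through the Explorer. Below is a minimal sketch using Weka's Java API; the filter and its sampleSizePercent and biasToUniformClass options are the ones named above, while the class name MakeSubsets, the assumption that 'class' is the last attribute, the assumption of a reasonably recent Weka version, and the use of the API instead of the GUI are mine.

```java
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;
import java.io.File;

// Sketch of steps 4-5: resample trndata.arff at several sample sizes,
// once distribution-preserving (bias 0.0) and once biased toward a
// uniform class distribution (bias 1.0).
public class MakeSubsets {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("trndata.arff");
        data.setClassIndex(data.numAttributes() - 1);  // assumes 'class' is last

        int[] sizes = {10, 20, 40, 50, 70, 90};
        for (double bias : new double[]{0.0, 1.0}) {
            for (int m : sizes) {
                Resample filter = new Resample();
                filter.setSampleSizePercent(m);        // sampleSizePercent, as in step 4b
                filter.setBiasToUniformClass(bias);    // biasToUniformClass, as in step 5
                filter.setInputFormat(data);
                Instances subset = Filter.useFilter(data, filter);
                subset.setRelationName("income");      // rename the relation, as in step 4c

                ArffSaver saver = new ArffSaver();
                saver.setInstances(subset);
                saver.setFile(new File("trndata_" + m + "_" + (int) bias + ".arff"));
                saver.writeBatch();
            }
        }
    }
}
```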
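Likewise, a minimal sketch of the J48 runs in steps 3, 7, and 8; unpruned and reducedErrorPruning are the same options named above, and the rest (class name RunJ48, last-attribute class index) is illustrative, not part of the required submission.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch of steps 3, 7 and 8: train J48 on trndata.arff and evaluate on
// tstdata.arff in its default (pruned), unpruned, and
// reduced-error-pruning configurations.
public class RunJ48 {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("trndata.arff");
        Instances test  = DataSource.read("tstdata.arff");
        train.setClassIndex(train.numAttributes() - 1);  // assumes 'class' is last
        test.setClassIndex(test.numAttributes() - 1);

        J48 pruned = new J48();                    // default pruning (step 3)

        J48 unpruned = new J48();                  // step 7: C4.5 without pruning
        unpruned.setUnpruned(true);

        J48 rep = new J48();                       // step 8: reduced-error pruning
        rep.setReducedErrorPruning(true);

        for (J48 tree : new J48[]{pruned, unpruned, rep}) {
            tree.buildClassifier(train);
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);
            System.out.println(eval.pctCorrect() + "% correctly classified");
        }
    }
}
```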

