Experiment Design for Computer Scientists
Marie desJardins ([email protected])
CMSC 691B
March 9, 2004

Contents:
- Sources
- Experiment design
- Provable Claims
- More Provable Claims
- One More
- Measurable, Meaningful Criteria
- Measurable Criteria
- Meaningful Criteria
- Example 1: CISC
- Example 2: MYCIN
- MYCIN Study 2
- MYCIN Study 3
- MYCIN Results
- MYCIN Lessons Learned
- Reasonable Baselines
- Baseline: Point of Comparison
- Poor Baselines
- Establish a Need
- Test Alternative Explanations
- Is CHC Better than Random HC?
- Statistically Valid Results
- Look at Your Data
- Anscombe Datasets Plotted
- Look at Your Data, Again
- Closer analysis reveals…
- Statistical Methods

Sources

- Paul Cohen, Empirical Methods in Artificial Intelligence, MIT Press, 1995.
- Tom Dietterich, CS 591 class slides, Oregon State University.
- Rob Holte, "Experimental Methodology," presented at the ICML 2003 Minitutorial on Research, 'Riting, and Reviews.

Experiment design

Experiment design criteria:
- Claims should be provable
- Contributing factors should be isolated and controlled for
- Evaluation criteria should be measurable and meaningful
- Data should be gathered on a convincing domain/problem
- Baselines should be reasonable
- Results should be shown to be statistically valid

Provable Claims

Many research goals start out vague:
- Build a better planner
- Learn preference functions
Eventually, these claims need to be made provable:
- Concrete
- Quantitative
- Measurable
Provable claims:
- My planner can solve large, real-world planning problems under conditions of uncertainty, in polynomial time, with few execution-time repairs.
- My learning system can learn to rank objects, producing rankings that are consistent with user preferences, as measured by the probability of retrieving desired objects.

More Provable Claims

More vague claims:
- Render painterly drawings
- Design a better interface
Provable claims:
- My system can convert input images into drawings in the style of Matisse, with high user approval and with measurably similar characteristics to actual Matisse drawings (color, texture, and contrast distributions).
- My interface can be learned by novice users in less time than it takes to learn Matlab; task performance has equal quality but takes significantly less time than using Matlab.

One More

Vague claim:
- Visualize relational data
Provable claim:
- My system can load and draw layouts for relational datasets of up to 2M items in less than 5 seconds; the resulting drawings exhibit efficient screen utilization and few edge crossings; and users are able to manually infer important relationships in less time than when viewing the same datasets with MicroViz.

Measurable, Meaningful Criteria

Measurable Criteria

Ideally, your evaluation criteria should be:
- Easy to measure
- Reliable (i.e., replicable)
- Valid (i.e., measuring the right thing)
- Applicable early in the design process
- Convincing
Typical criteria:
- CPU time / clock time
- Cycles per instruction
- Number of [iterations, search states, disk seeks, ...]
- Percentage of correct classifications
- Number of [interface flaws, user interventions, necessary modifications, ...]
(Adapted with permission from Tom Dietterich's CS 519 (Oregon State University) course slides.)
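Two of these criteria, CPU time and percentage of correct classifications, are easy to instrument directly. Below is a minimal Python sketch, assuming a hypothetical classifier object with a predict method and a labeled test set; neither is from the slides.

    import time

    def evaluate(classifier, test_set):
        # test_set is a list of (example, label) pairs; classifier.predict is
        # a hypothetical stand-in for whatever system is being measured.
        start = time.process_time()  # CPU time, as opposed to wall-clock time
        predictions = [classifier.predict(x) for x, _ in test_set]
        cpu_seconds = time.process_time() - start
        correct = sum(p == y for p, (_, y) in zip(predictions, test_set))
        accuracy = correct / len(test_set)  # fraction classified correctly
        return cpu_seconds, accuracy

A single timing is rarely replicable; repeating the measurement and reporting the mean and variance addresses the "reliable" criterion above.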
Meaningful Criteria

Evaluation criteria must address the claim you are trying to make; there must be a clear relationship between the claim/goals and the evaluation criteria.
Good criteria:
- Your system scores well iff it meets your stated goal
Bad criteria:
- Your system can score well even though it doesn't meet the stated goal
- Your system can score badly even though it does meet the stated goal

Example 1: CISC

True goals:
- Efficiency (low instruction fetch, few page faults)
- Cost-effectiveness (low memory cost)
- Ease of programming
Early metrics:
- Code size (in bytes)
- Entropy of the op-code field
- Orthogonality (can all modes be combined?)
Efficient execution of the resulting programs was not being directly considered. RISC showed that the connection between these criteria and the true goals was no longer strong → the metrics were not appropriate!
(Adapted with permission from Tom Dietterich's CS 519 (Oregon State University) course slides.)

Example 2: MYCIN

MYCIN: an expert system for diagnosing bacterial infections in the blood.
Study 1 evaluation criteria were expert ratings of program traces:
- Did the patient need treatment?
- Were the isolated organisms significant?
- Was the system able to select an appropriate therapy?
- What was the overall quality of MYCIN's diagnosis?
Problems:
- Overly subjective data
- Assumed that experts were ideal diagnosticians
- Experts may have been biased against the computer
- Required too much expert time
- Limited set of experts (all from Stanford Hospital)
(Adapted with permission from Tom Dietterich's CS 519 (Oregon State University) course slides.)

MYCIN Study 2

Evaluation criteria:
- Expert ratings of the treatment plan
- Multiple-choice rating system for MYCIN's recommendations
- Experts from several different hospitals
Comparison to Study 1:
+ Objective ratings
+ More diverse experts
− Still assumes that the experts are right
− Still has possible anti-computer bias
− Still takes a lot of time
(Adapted with permission from Tom Dietterich's CS 519 (Oregon State University) course slides.)

MYCIN Study 3

Evaluation criteria: multiple-choice ratings in a blind evaluation setting, covering:
- MYCIN recommendations
- Novice recommendations
- Intermediate recommendations
- Expert recommendations
Comparison to Study 2:
+ No more anti-computer bias
− Still assumes expert ratings are correct
− Still time-consuming (maybe even more so!)
(Adapted with permission from Tom Dietterich's CS 519 (Oregon State University) course slides.)
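The key design move in Study 3 is the blinding itself: raters see pooled, shuffled recommendations with the source labels stripped, and labels are re-attached only at analysis time. A minimal sketch of that setup, with hypothetical recommendation lists standing in for the real case data:

    import random

    # Hypothetical recommendations keyed by source; in Study 3 the sources were
    # MYCIN plus human prescribers at novice, intermediate, and expert levels.
    recommendations = {
        "mycin": ["rec_m1", "rec_m2"],
        "novice": ["rec_n1", "rec_n2"],
        "intermediate": ["rec_i1", "rec_i2"],
        "expert": ["rec_e1", "rec_e2"],
    }

    # Pool everything, then shuffle so raters cannot tell which recommendations
    # came from the computer (removing the anti-computer bias of Studies 1-2).
    pooled = [(src, rec) for src, recs in recommendations.items() for rec in recs]
    random.shuffle(pooled)

    blinded = [rec for _, rec in pooled]     # what the raters actually see
    key = {rec: src for src, rec in pooled}  # hidden until analysis time

Because ratings are mapped back to their sources only after they are collected, the anti-computer bias disappears; the other weaknesses (expert ratings treated as ground truth, rater time) remain.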