CIS 830: Advanced Topics in Artificial Intelligence
Kansas State University, Department of Computing and Information Sciences
Lecture 7: Analytical Learning Discussion (3 of 4): Learning and Knowledge
Wednesday, February 2, 2000
William H. Hsu, Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chown and Dietterich

Lecture Outline
• Paper
 – Paper: “A Divide-and-Conquer Approach to Learning from Prior Knowledge”
 – Authors: E. Chown and T. G. Dietterich
• Overview
 – Using prior knowledge as an aid to learning
  • Model calibration problem
  • Role of prior knowledge in analytical and inductive learning
 – Hierarchical learning system: MAPSS
  • Analytical learning to decompose prediction learning problem sequentially
  • Idea: choose hypothesis language (parameters), examples for subproblems
• Topics to Discuss
 – How to choose prediction target(s)?
 – Local versus global optimization: how can knowledge make a difference?
 – How does hierarchical decomposition implement bias shift (search for H)?
 – Empirical improvements using prior knowledge? Ramifications for KDD?
• Next Paper: Towell, Shavlik, and Noordewier, 1990 (KBANN)

Background AI and Machine Learning Material
• Parameter Estimation
 – Russell and Norvig
  • Chapter 18: inductive learning (version spaces, decision trees)
  • Chapter 21: learning with prior knowledge
 – Mitchell
  • Chapter 2: inductive learning (basics, inductive bias, version spaces)
  • Chapter 6: Bayesian learning
• Topics to Discuss
 – Muddiest points
  • Inductive learning: learning as search (in H)
  • Data preprocessing in KDD
  • Model calibration: parameter estimation (inductive learning application)
  • Local versus global optimization
 – Example questions to ask when writing reviews and presentations
  • How is knowledge represented?
  • Exactly how is prior knowledge applied to improve learning?

MAPSS: Issues Brought Up by Students in Paper Reviews
• Key MAPSS-Specific Questions
 – How to choose prediction target(s)? (prefilter using “relevance knowledge”)
 – Learning by local vs. global optimization (a minimal annealing sketch follows this slide)
  • Global (e.g., simulated annealing): “no prior assumptions” about P(h)
  • Role of knowledge? (preference, representation bias)
 – How does hierarchical decomposition implement bias shift (search for H)?
  • Bias shift: change of representation (aspect of inductive bias)
  • References: [Fu and Buchanan, 1985; Jordan et al., 1991; Ronco et al., 1995]
 – Empirical improvements using prior knowledge? (better convergence in training)
 – Ramifications for KDD? (better parametric models for prediction; scalability)
• Key General Questions
 – How is knowledge base (KB) represented? (programmatic classification model)
 – Exactly how is prior knowledge applied to improve learning? (prefiltering D)
• Important Question: What Kind of Analytical/Inductive Hybrid Is This?
• Applications to KDD (Model Calibration in Simulators, etc.)
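To make the “global optimization (e.g., simulated annealing)” bullet concrete, here is a minimal, generic simulated-annealing calibration loop. It is only a sketch of the general technique: the toy simulator, loss function, parameter bounds, and observations are hypothetical placeholders, not the MAPSS model or its data.

```python
import math
import random

def calibrate_sa(loss, init_params, bounds, n_iters=10000,
                 temp0=1.0, cooling=0.999, step=0.1, seed=0):
    """Generic simulated annealing over a real-valued parameter vector.

    loss        -- maps a parameter vector to a scalar error
                   (e.g., squared error of a simulator against observations)
    init_params -- starting parameter vector
    bounds      -- list of (low, high) pairs, one per parameter
    """
    rng = random.Random(seed)
    current = list(init_params)
    current_loss = loss(current)
    best, best_loss = list(current), current_loss
    temp = temp0

    for _ in range(n_iters):
        # Propose a neighbor: perturb one randomly chosen parameter.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        lo, hi = bounds[i]
        candidate[i] = min(hi, max(lo, candidate[i] + rng.gauss(0.0, step * (hi - lo))))

        cand_loss = loss(candidate)
        # Accept improvements always; accept worse moves with
        # probability exp(-delta / temp) (Metropolis criterion).
        delta = cand_loss - current_loss
        if delta <= 0 or rng.random() < math.exp(-delta / max(temp, 1e-12)):
            current, current_loss = candidate, cand_loss
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss

        temp *= cooling  # geometric cooling schedule

    return best, best_loss

# Toy usage: fit two parameters of a made-up "simulator" to observations.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.2]                        # invented observations
simulate = lambda p, x: p[0] + p[1] * x           # stand-in for the real model
sse = lambda p: sum((simulate(p, x) - y) ** 2 for x, y in zip(xs, ys))
params, err = calibrate_sa(sse, [0.0, 0.0], [(-10, 10), (-10, 10)])
```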
Key Strengths of MAPSS Learning Technique
• Strengths
 – Prior knowledge led to training convergence
  • Previously, could only calibrate 12 of 20 parameters of the model (Section 2.2)
  • Prior knowledge made it possible to calibrate the rest (Section 3.3)
 – Idea: analysis of code to produce prior knowledge
  • Knowledge-based software engineering (KBSE) concept
  • Implement classification model as a program
  • Use partial evaluation of the program to find x ∈ D for which few I are unknown (see the tracing sketch at the end of these notes)
 – Idea: bootstrapped (interleaved inductive, analytical) learning
  • Training: “short runs” of global optimization, interleaved with prefiltering of D
  • Produces filter models and one example per model (batch of 40)
 – Idea: decomposing problems into locally relevant sets of parameters
  • Scalability (through divide-and-conquer): relative to I (65 attributes)
  • Partitioning problem by partitioning attributes [Hsu, Ray, and Wilkins, 2000]
• Applications to KDD
 – Can express many KBs as programs: simulators, classification systems
 – Methods for estimating (e.g., EM) missing values in data
 – Breaking problem into more tractable pieces (more in Paper 8!)

Key Weaknesses of MAPSS Learning Technique
• Weaknesses
 – Still took 3+ months (even using prior knowledge)!
  • 750K evaluations took 6 CPU weeks (SPARC 2)
  • 1.5M evaluations in final version
 – Generality not well established
  • Under what conditions can we express prediction rules in the imperative programming language used?
  • Ramifications for general-case learning applications (e.g., KDD?)
 – Typos in Section 3.2?
• Unclear Points
 – What form of partial evaluation is appropriate for the prediction task?
 – How to choose the right architecture of committee machine? (e.g., filter models)
 – Can the technique scale up calibration of a broad class of scientific models?
 – How to use prior relevance knowledge in KDD?
  • Acquisition (automatic relevance determination, aka ARD) – “20 important I”
  • Automatic application (stay tuned…)
 – How to apply other forms of prior knowledge (constraints, etc.)? – Paper 4

Data Gathering Algorithm
• Committee Machine
 – See
  • Chapter 7, Haykin
  • Chapter 7, Mitchell
  • Lectures 21-22, CIS798 (http://ringil.cis.ksu.edu/Courses/Fall-1999/CIS798)
 – Idea
  • Use experts to preprocess (filter) D or combine predictions
  • In this case, 40 experts prefilter D to get n = 40 examples; need 32-36 to agree (see the voting sketch below)
• Intuitive Idea
 – Want to use prior knowledge (in form of imperative program) to speed up learning
  • Analyze program: perform partial evaluation using current …
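The Data Gathering Algorithm slide describes a committee of 40 filter models that prefilter D, keeping examples on which 32-36 of the experts agree. Below is a minimal, generic voting filter sketching that criterion; the expert callables, the agreement band, and the toy usage are assumptions made for illustration, not the paper's actual filter models.

```python
from collections import Counter

def committee_prefilter(candidates, experts, lo=32, hi=36):
    """Keep candidate examples on which committee agreement falls in [lo, hi].

    candidates -- iterable of inputs x
    experts    -- list of filter models, each a callable x -> label
    Returns (x, majority_label, agreement_count) for every retained x.

    Schematic reading of the slide's "32-36 of 40 experts must agree"
    criterion, not the exact MAPSS procedure.
    """
    kept = []
    for x in candidates:
        votes = Counter(expert(x) for expert in experts)
        label, count = votes.most_common(1)[0]
        if lo <= count <= hi:
            kept.append((x, label, count))
    return kept

# Toy usage: 40 hypothetical threshold experts voting on scalar inputs.
experts = [(lambda t: (lambda x: x > t))(t / 40.0) for t in range(40)]
examples = committee_prefilter([0.1 * i for i in range(11)], experts)
```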


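Both the Key Strengths and Data Gathering slides credit partial evaluation of the programmatic classification model with supplying relevance knowledge (which inputs a prediction actually depends on). The sketch below is a much looser stand-in for that idea: rather than specializing the program, it simply traces which attributes a classifier-as-program reads while classifying one example. The classifier classify_site and its attribute names are invented for illustration and are not MAPSS code.

```python
class AccessTracer(dict):
    """Dict wrapper that records which attribute names are read."""
    def __init__(self, data):
        super().__init__(data)
        self.read = set()

    def __getitem__(self, key):
        self.read.add(key)
        return super().__getitem__(key)

def classify_site(x):
    # Hypothetical programmatic classifier standing in for the kind of
    # vegetation model MAPSS calibrates; not the paper's actual code.
    if x["mean_temp"] < 0:
        return "tundra"
    if x["annual_precip"] < 250:
        return "desert"
    return "forest" if x["soil_depth"] > 1.0 else "grassland"

def relevant_attributes(classifier, example):
    """Run the classifier and report which inputs its prediction used."""
    traced = AccessTracer(example)
    label = classifier(traced)
    return label, traced.read

# For a cold site only 'mean_temp' is read, so the remaining inputs are
# irrelevant to this prediction and could safely be unknown.
label, used = relevant_attributes(
    classify_site,
    {"mean_temp": -5.0, "annual_precip": 400.0, "soil_depth": 0.5})
```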
