Kansas State University
Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence

Lecture 28
KDD for Science Data Analysis
Data Mining and KDD Presentation (1 of 4)

Wednesday, March 29, 2000
Arul Elumalai
Department of Computing and Information Sciences, [email protected]

[...] of the day:
"KDD for Science Data Analysis: Issues and Examples" - Fayyad, Haussler and Stolorz


Presentation preview

• Introduction
• Concept of data mining
• Fundamentals of data analysis
• Case studies
• Issues and Challenges
• Article critique


Introduction

Objective:
Application of KDD in creative data analysis for theory formation
Scope:
Analysis of scientific data
Scenario:
- Modern scientific instruments & data collection
- Data abundance
Issues:
- Gap between data collection and data analysis
- Large size and dimensionality of available data


Handling massive data

Data Reduction:
Reducing data to an analyzable size and simplicity
Questions:
1. Is the reduced data representative of the complete phenomenon?
2. Is it only the redundant data that has been removed?
3. What strategy is to be deployed for data reduction?

Automated Analysis:
Mechanization of data analysis using intelligent agents
Question:
1. Is it as efficient and foolproof as manual analysis?


Data in its many forms

• Image data
+ Predefined display format
- Mining image data is difficult
- Mapping from pixel to feature is noisy
• Time series and sequential data
- Rate of measurement may be random
- Nonstationary characteristics
• Numerical vs. categorical measurement
- The concept of "difference" is not defined for categorical measurements


Data in its many forms (contd.)

• Structured and sparse data
- Measured attributes may vary
- Dimensionally complex (no available algorithms)
• Reliability of data (sensor vs. model)
- Needs translation from sensor level


KDD and Data mining

Knowledge Discovery in Databases (KDD) refers to the overall process of discovering useful knowledge from data.
Data mining refers to the application of algorithms for extracting patterns from data.
Data mining consists of five major elements:
• Extraction
• Storage
• Access/retrieval
• Analysis (by application software)
• Presentation


Case 1: Stellar classification

Problem statement:
Classification of stellar objects based on image data
Scale:
3 terabytes of data; 2 billion objects; 3000 images; 40 attributes per object
Strategy deployed:
- Dimensionality reduction (40 to 8 attributes)
- Tree learning algorithm
Achievement:
- Accommodated fainter images
- Achieved 94% prediction accuracy
Limitation:
- Inclusion of supervised learning


Case 2: Volcanoes on Venus

Problem statement:
Finding volcanoes on Venus using high-resolution global maps
Scale:
30,000 images; 100 CDs of storage
Strategy deployed:
- Training via examples
Achievement:
- Detection of over 1 million volcanoes
- Flexible approach that allows reuse
Limitation:
- High false detection rate
- Sensitive to image illumination, scale and angle


Case 3: Extraction of genetic code

Problem statement:
Extraction of genetic code from stored values in databases
Scale:
400 million tokens (GENBANK); 200,000 sequences; inaccurate pattern-finding algorithms
Strategy deployed:
- Identification of a statistical model (HMM)
- Template structure for search is not provided and must be discovered
Achievement:
- Identification of new relations
Limitation:
- Slow database queries and computational overhead
- Prerequisite biological knowledge and lab experimentation


Case 4: Earth Geophysics

Problem statement:
Measurement of tectonic motion based on images before and after quakes
Scale:
Lack of precision in resolution
Strategy deployed:
- Repeated registration of local images to sub-pixel precision
- Construction of systems that can work on massive data sets
Achievement:
- Not only measured known faults but also detected novel patterns
Limitation:
- Required "similar enough" images for comparison


Case 5: Atmospheric science

Problem statement:
Analysis of weather patterns and other spatio-temporal patterns
Scale:
Several gigabytes of data per model; complex queries; large number of attributes
Strategy deployed:
- Use of parallel test beds
- Development of learning algorithms that identified novel patterns
- Content-based indexing to increase query performance
Status quo:
- State of infancy


Issues and Challenges

• Feature extraction from raw data
• Attention to minority classes
• Demand for a high degree of confidence and accuracy
• Basis for selection of data mining task
• Translation of derived models into useful knowledge
• Harnessing domain knowledge
• Scalable machine knowledge and algorithms
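

Illustration: Case 1 strategy as a sketch

The Case 1 strategy (reduce 40 attributes to 8, then apply a tree learning algorithm) can be sketched on synthetic data. This is only an illustration under stated assumptions, not the method used in the article: the slides do not say how the reduction was done, so the sketch substitutes a simple attribute ranking by class-mean gap, and the tree is a small greedy Gini tree. All names (make_objects, select_attributes, build_tree, predict) and the synthetic data are hypothetical.

```python
import random

def make_objects(n, n_attrs=40, n_informative=8, seed=0):
    """Synthetic two-class objects; only the first n_informative attributes carry signal."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        rows.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                     for j in range(n_attrs)])
        labels.append(label)
    return rows, labels

def select_attributes(rows, labels, k=8):
    """Assumed reduction step: keep the k attributes with the largest class-mean gap."""
    def mean(values):
        return sum(values) / len(values)
    scores = []
    for j in range(len(rows[0])):
        m0 = mean([r[j] for r, y in zip(rows, labels) if y == 0])
        m1 = mean([r[j] for r, y in zip(rows, labels) if y == 1])
        scores.append((abs(m1 - m0), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, labels, attrs, depth=3):
    """Greedy binary tree: split on the (attribute, threshold) with lowest weighted Gini."""
    majority = int(2 * sum(labels) >= len(labels))
    if depth == 0 or gini(labels) == 0.0:
        return majority
    best = None
    for j in attrs:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return majority
    _, j, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], attrs, depth - 1),
            build_tree([r for r, _ in hi], [y for _, y in hi], attrs, depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf; leaves are plain 0/1 class labels."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

train_x, train_y = make_objects(300, seed=1)
test_x, test_y = make_objects(200, seed=2)
attrs = select_attributes(train_x, train_y, k=8)   # 40 attributes -> 8
tree = build_tree(train_x, train_y, attrs)
accuracy = sum(predict(tree, r) == y for r, y in zip(test_x, test_y)) / len(test_y)
print(len(attrs), round(accuracy, 2))
```

The tree needs labeled training objects, which is the supervised learning listed as the limitation of Case 1. Reducing the attribute set first also shrinks the split search, which matters at the scale quoted on that slide.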