Mt Holyoke CS 341 - Syllabus


Data Mining
CS 341, Spring 2007
Prof. Xiaoyan Li
Visiting Assistant Professor of Computer Science
Mount Holyoke College
© Prentice Hall

Course Information
- Instructor: Xiaoyan Li
- Lecture: Mon & Wed, 2:40pm-3:55pm
  - Room: Kendade Hall 107
- Office hour: Tu/Th, 10:00am-11:00am (or by appointment)
  - Office: Clapp 227
  - Email: [email protected] (mtholyoke.edu)
- Textbook
  - Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, ISBN 0-13-088892-3
- Topics
  - Related Concepts & Basic Techniques
  - Core Topics: Classification, Clustering and Association Rules
  - Advanced Topics: Web Mining, Spatial Mining & Temporal Mining

Course Structure
- The course is divided into 3 parts
  - Related concepts and basic techniques
  - Core topics: classification, clustering, association rules
  - Perl programming language, final projects
- The first 2/3 are lectures; the remaining 1/3 are seminars.

Tentative schedule: CS-341 Data Mining

Grading
- Class participation: 20%
- Four homework assignments: 20%
- One midterm: 20%
- One final project: 40%

DATA MINING
Introductory and Advanced Topics
Part I
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Companion slides for the text by Dr. M. H. Dunham, Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
Some slides are adopted from:

Introduction Outline
- Define data mining
- Basic data mining tasks
- Data mining vs. database & KDD
- Data mining development
- Data mining issues
Goal: Provide an overview of data mining.

Introduction
- Data is growing at a phenomenal rate
- Users expect more sophisticated information
- How? UNCOVER HIDDEN INFORMATION: DATA MINING

Data Mining Definition
- Finding hidden information in a database
- Similar terms
  - Exploratory data analysis
  - Data driven discovery
  - Deductive learning

Example 1.1
- Credit card company must determine whether to authorize credit card purchases.
- Four classes:
  - 1) Authorize
  - 2) Ask for further identification before authorization
  - 3) Do not authorize
  - 4) Do not authorize but contact police
- How to classify a purchase?
  - Examine historical data and determine how the data fit into the four classes.
  - Apply the model to new purchases.

Data Mining Algorithm
- Purpose: Fit Data to a Model
- Preference: Criteria to choose the best model
- Search: Technique to search the data

Data Mining Models
- Predictive:
  - A predictive model makes a prediction about values of data using known results found from different data.
- Descriptive:
  - A descriptive model identifies patterns or relationships in data.

[Figure: Data Mining Models and Tasks]

Basic Data Mining Tasks
- Classification maps data into predefined groups or classes
  - Pattern recognition
- Example 1.1 is a general classification problem
- Example 1.2 is an example of pattern recognition
  - Airport screening is used to determine whether passengers are potential terrorists or criminals
  - Basic patterns: distance between eyes, size and shape of mouth, etc.
- Regression is used to map a data item to a real-valued prediction variable.
  - Assume some known type of function (e.g. linear) and select the best one.
- Example 1.3
  - A college professor wishes to reach a certain level
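
A minimal sketch of the regression task described above: assume a known type of function (here linear) and select the best one. "Best" is taken to mean the least-squares fit, which is an assumption; the slides do not name a criterion. The function names fit_line and predict and the sample data are hypothetical and do not come from the course materials.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means the slope is undefined.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Sum of cross-deviations of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Map a data item x to a real-valued prediction using the fitted line."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical data lying exactly on y = 2x + 1.
history_x = [1, 2, 3, 4]
history_y = [3, 5, 7, 9]

model = fit_line(history_x, history_y)
print(predict(model, 5))
```

Fitting on known results and then predicting for a new item mirrors the predictive-model idea on the earlier slide: the model is built from one set of data and applied to different data.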

