A Data Mining Course for Computer Science: Primary Sources and Implementations

David R. Musicant
Carleton College
Department of Mathematics and Computer Science
One North College Street
Northfield, MN
[email protected]

This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published as: SIGCSE’06, March 1–5, 2006, Houston, Texas, USA. Copyright ACM, 2006.

ABSTRACT
An undergraduate elective course in data mining provides a strong opportunity for students to learn research skills, practice data structures, and enhance their understanding of algorithms. I have developed a data mining course built around the idea of using research-level papers as the primary reading material for the course, and implementing data mining algorithms for the assignments. Such a course is accessible to students with no prerequisites beyond the traditional data structures course, and allows students to experience both applied and theoretical work in a discipline that straddles multiple areas of computer science. This paper provides detailed descriptions of the readings and assignments that one could use to build a similar course.

Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning—concept learning, induction; I.5.2 [Pattern Recognition]: Design Methodology—classifier design and evaluation; I.5.3 [Pattern Recognition]: Clustering—algorithms, similarity measures; K.3.2 [Computers and Education]: Computer and Information Science Education—computer science education.

General Terms
Algorithms, measurement, design, experimentation.

Keywords
Data mining, machine learning, course design.

1. INTRODUCTION
Data mining is an exciting and relatively new area of computer science that lies at the intersection of artificial intelligence and database systems. Defined as the “nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” [11], data mining concerns itself with how to automatically find, simplify, and summarize patterns within large sets of data. Machine learning, said to be “concerned with the question of how to construct computer programs that automatically improve with experience” [16], overlaps heavily with data mining in that many of its algorithms learn from data. A course in machine learning and data mining (hereafter simplified to just “data mining”) is a wonderful elective class to offer to undergraduates.

A data mining elective has been offered twice at Carleton College. This course has turned out to be a marvelous opportunity for students to use theoretical computer science ideas to solve practical “real-world” problems. Data mining requires a variety of ideas from data structures and algorithms, which gives students the opportunity to see these concepts in practice. It should therefore be pointed out that this paper actually serves a dual role: readers of this paper might find that some of the concepts or assignments contained herein would be useful examples in an advanced data structures class. There are also significant issues with privacy and ethics in data mining, and this provides an opportunity to link computer science with wider affairs. Because students can choose their own datasets to analyze, they get a personal sense of ownership in the work that they do because they can choose data from some application area that interests them. Data mining is a new field, and so most of the seminal work has been written within the last ten years. This adds to the motivational aspects of the course, since the students are actually learning something new to everyone. Finally, I should admit my biases up front: my research is in data mining, and thus I wished to offer my liberal arts students a chance to see how engaging these ideas are.

Why should the fields of machine learning and data mining be taught together in one course? The areas of machine learning and data mining have a very large intersection, which could perhaps be described very simply as “learning from data.” There are areas of machine learning that do not interact much with data mining (such as reinforcement learning), and there are areas of data mining that do not seem to capture the flavor of machine learning (such as how to make data analysis algorithms scale gracefully), but the central idea of learning from data is common to both fields. Material found in machine learning books and in data mining books is quite similar. The first time that I offered my course at Carleton, I actually just called it “Data Mining.” Students indicated in post-course surveys that the name “Machine Learning” was considerably more attractive to them, and students would be more likely to take the course if both names were in the title.

Another question that might be proposed is “Why offer such a course at all? If some of this material is worthwhile, why not merely split up the material between artificial intelligence and database courses?” Bits of these ideas do end up in some of our other courses. I do cover a healthy dose of machine learning in my AI course, and data mining at least gets half a class of discussion in my database course. But the coherent area of data mining is worthy of study in and of itself, and easily can span a semester. Artificial intelligence courses tend to survey AI, and so the amount of time that one can spend on machine learning is constrained. Database courses need to spend significant amounts of time on the functioning of database systems themselves, and thus algorithms for learning from data are hard to fit in.

The course that I have constructed and taught is designed to appeal to computer science students and to reinforce computer science ideas that they have seen elsewhere. Because we are a small program and our courses do not run all that often, it helps to boost enrollments if prerequisites are minimal. Therefore, the only prerequisite that I require is our data structures course. One of the challenges in teaching data mining to undergraduate computer scientists is its high overlap with statistics, which can require significant background by students. Therefore, the course that I have put together is based on two pivotal elements: reading research papers as primary source material, and implementing data mining algorithms via programming. Textbooks on data mining for a course such as this are quite limited. Most data mining textbooks are either not

