Data Mining
CS 341, Spring 2007
Project Discussion

Outline:
- Final project
- An Example Query
- An Example Document
- Training and Testing
- Questions to Be Considered
- Evaluation
- Data Mining Techniques
- Fuzzy c-means clustering
- Next Class (Wednesday April 11th)

Final project

Goal: Apply available data mining techniques to solve real world problem.
Requirement:
- Apply two techniques/algorithm and implement at least one algorithm. (Find existing codes online or team up with your classmates)
Problem: Relevant sentence retrieval
- Retrieve the set of relevant sentences given a query and a collection of documents

An Example Query:

<title> India and Pakistan Nuclear Tests
<desc> Description:
On May 11 and 13, 1998 India conducted five nuclear tests; Pakistan responded by detonating six nuclear tests on May 28 and 30th. This nuclear testing was condemned by the international community.
<narr> Narrative:
Relevant documents mention the nuclear testing conducted in May 1998 by both India and Pakistan. Historical information about the antagonism and rivalry between the two countries is not relevant. Mention of the furor created around the world by these detonations is relevant.

An Example Document

<DOCNO>XIE19980529.0045-1</DOCNO>
<TEXT>XIE19980529.0045</TEXT>
<DOCNO>XIE19980529.0045-2</DOCNO>
<TEXT>1998-05-29</TEXT>
<DOCNO>XIE19980529.0045-3</DOCNO>
<TEXT>France, Canada Condemn Pakistani Nuclear Tests</TEXT>
<DOCNO>XIE19980529.0045-4</DOCNO>
<TEXT>NEW YORK, May 28 (Xinhua) -- More countries have come out to condemn Pakistan's nuclear tests.</TEXT>
<DOCNO>XIE19980529.0045-5</DOCNO>
<TEXT>The French Foreign Ministry issued a communique Thursday to deplore and condemn the nuclear tests conducted by Pakistan on the same day.</TEXT>
<DOCNO>XIE19980529.0045-6</DOCNO>
<TEXT>France calls on both India and Pakistan not to conduct any more nuclear tests but to sign the Comprehensive Test Ban Treaty and join talks on the banning of production of fissile materials that can be used to produce nuclear arms, said the communique.</TEXT>
<DOCNO>XIE19980529.0045-7</DOCNO>
<TEXT>Canadian Foreign Affairs Minister Lloyd Axworthy said in a statement released Thursday, "We continue to urge Pakistan and India to renounce their nuclear weapons programs and to sign the Nuclear Non-Proliferation Treaty and the Comprehensive Test Ban Treaty."</TEXT>
<DOCNO>XIE19980529.0045-8</DOCNO>
<TEXT>He also announced a series of sanctions against Pakistan, which he said are consistent with those imposed on India after its nuclear tests.</TEXT>

Training and Testing

Training set
- 25 queries
- Collections of documents
- Relevance judgment of each sentence
Testing set
- Another 25 queries
- Collections of documents

Questions to Be Considered:

- How do you represent queries and sentences?
- What features may be considered for the task?
- What data mining techniques can be used and how?
- How do you evaluate the performance of your system?

Evaluation:

Precision and Recall
The F measure
- F = 2*Precision*Recall/(Precision + Recall)
Average Precision for query
- Calculate by averaging precision as recall increases.
Mean Average Precision



Mt Holyoke CS 341 - Data Mining
