Stanford CS 374 - Mining Medical Literature




Slide titles (59 slides):
1. Mining Medical Literature
2. Outline
3. Slide 3
4. Introduction
5. The Problem
6. What is Data Mining?
7. Example Data!
8. Amazon.com
9. Google News
10. More Applications
11. Information Retrieval (IR)
12. Simple flow of Retrieval Process
13. IR System Evaluation
14. Precision and Recall
15. Problems with Precision and Recall
16. Sensitivity and Specificity
17. Slide 17
18. Slide 18
19. HOVERGEN: a Database of Homologous Vertebrate Genes
20. Why identify functional gene groups?
21. Existing Approaches
22. Statistical NLP approach
23. Neighbor Divergence Approach
24. Challenges in the Problem
25. Neighbor Divergence Intuition
26. Neighbor Divergence Algorithm
27. ND: Article Representation
28. ND: Identifying Semantic Neighbors
29. ND: Scoring articles
30. ND: Difference in Distributions
31. Observed and Expected Distribution of Article Scores
32. Results
33. Other methods
34. Other methods
35. Evaluation
36. Corrupting Functional Groups
37. Slide 37
38. Slide 38
39. Advantage
40. Existing approaches
41. Information Extraction and Machine Learning
42. ML techniques
43. Approach Used here
44. Unsupervised Learning: Contextual Similarity
45. Slide 45
46. Contextual Similarity
47. Partially supervised Learning: Snowball
48. Snowball
49. Supervised Learning: Text classification
50. Hand Crafted Extraction System: GPE system
51. Combined System
52. Final parameters used for the different systems
53. Running Times
54. Results and Evaluation
55. Slide 55
56. Slide 56
57. Conclusion and Future Work
58. Slide 58
59. Slide 59

Slide 1: Mining Medical Literature
Vignesh Ganapathy
(CS 374: Algorithms in Biology)
(Fall 2005)

Slide 2: Outline
- Introduction and Background
- Mining Technique 1: Identifying Functionally Coherent Gene Groups
- Mining Technique 2: Extracting Synonymous Gene and Protein Terms
- Conclusions

Slide 3: Outline (same as slide 2)

Slide 4: Introduction
- Medical literature holds vast amounts of knowledge and information
- PubMed Central (PMC), the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature
- Amedeo.com (The Medical Literature Guide)
- Journals such as Science, Nature, Cell, EMBO, Cell Biology, PNAS (and many more)

Slide 5: The Problem
- The major task is finding ways to extract useful information from these resources.

Slide 6: What is Data Mining?
"Data Mining is the Process of discovering meaningful, new correlation patterns and trends by sifting through large amount of data stored in repositories, using pattern recognition techniques as well as statistical and mathematical techniques."

Slide 7: Example Data!
- Large amounts of data but no information
- Daily transactions at a supermarket
- Daily website visit histories
- Books and videos rented at a library
- Newspaper and journal archives

Slide 8: Amazon.com

Slide 9: Google News
- Clustering news items (Google News)

Slide 10: More Applications
- Improving sales strategy
- Finding items that sell together (a common example relates beer and diapers: a supermarket found that 50% of the time beer was purchased with diapers)
- Anomaly detection, and many more

Slide 11: Information Retrieval (IR)
- Collecting information from text data (unstructured data)
- Applications: searching web documents, natural language processing
- The term also extends to multimedia and other forms of unstructured data

Slide 12: Simple flow of Retrieval Process

Slide 13: IR System Evaluation
Some measures are:
- Precision
- Recall
- F1 measure: a combined measure, the weighted harmonic mean of precision and recall
- Sensitivity
- Specificity

Slide 14: Precision and Recall
- How are precision and recall related?

Slide 15: Problems with Precision and Recall
- Deciding which documents are relevant and which are not is not easy
- For recall, it is difficult to measure the number of relevant documents in the database
- Creating a pool of relevant records is one solution
- In practice, these are still good measures

Slide 16: Sensitivity and Specificity
- Sensitivity: the probability that a positive example is identified as positive
- Specificity: the probability that a negative example is identified as negative
- What is the relation between sensitivity, specificity, precision and recall?

Slide 17: Outline (same as slide 2)

Slide 18: Introduction
- Analysis is shifting from single genes to families of genes
- Examples of these are:
  - Sequence data
  - Gene expression clustering
  - Deletion phenotypes
  - Yeast-2-Hybrid screens

Slide 19: HOVERGEN: a Database of Homologous Vertebrate Genes
- Useful for comparative sequence analysis or molecular evolution studies
- 10 biggest gene families

Slide 20: Why identify functional gene groups?
- It is interesting to know the functionally relevant groups within large sets of gene groups
- Helps to assess the significance of experimentally derived gene sets
- Refine gene groups to find more functionally relevant groups
- Existing algorithms can make use of this information in finding gene groups

Slide 21: Existing Approaches
- Use the co-occurrence of gene names in abstracts to create networks of related genes automatically
- Use an existing vocabulary of gene functions and assigned genes to decide whether a group is functionally relevant (Gene Ontology (GO) consortium and Munich Information Center for Protein Sequences (MIPS))

Slide 22: Statistical NLP approach
- Used for annotating individual genes
- Determining gene and protein interactions
- Assigning keywords to genes or groups of genes

Slide 23: Neighbor Divergence Approach
- A statistical NLP technique
- Will always be up to date if provided with a current literature base
- Cannot specify what the actual function is!

Slide 24: Challenges in the Problem
- Large number of genes
- Genes have multiple functions
- Some genes have been extensively studied, others only recently discovered
- So the literature about genes reflects these differences

Slide 25: Neighbor Divergence Intuition

Slide 26: Neighbor Divergence Algorithm
- Representation of articles
- Identifying semantic neighbors for corpus articles
- Scoring articles relative to the gene group
- Calculating a theoretical distribution of scores
- Calculating the difference between the empirical and theoretical distributions

Slide 27: ND: Article Representation
Words in articles are weighted by their inverse document frequency (to reduce the impact of common words):

W_{i,j} = (1 + \log_2 tf_{i,j}) \cdot \log_2 (N / df_i)   if tf_{i,j} > 0
W_{i,j} = 0                                               if tf_{i,j} = 0

where
- W_{i,j}: the weighted count of word i in document j
- tf_{i,j}: the number of times word i occurs in document j
- df_i: the number of documents containing word i
- N: the total number of documents

Slide 28: ND: Identifying Semantic Neighbors
- For each article, K most similar articles
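
The article representation on slide 27 can be illustrated with a minimal sketch. It assumes the weight is grouped as (1 + log2 tf) * log2(N / df), so a word that appears in every document gets weight 0. The function name `tfidf_weights` and the three toy documents are hypothetical and not taken from the slides.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Weight each word of each document as on slide 27.

    docs is a list of token lists. Assumed grouping of the formula:
    W[i][j] = (1 + log2(tf)) * log2(N / df) when tf > 0.
    Words with tf == 0 are simply absent from the result (weight 0).
    """
    n_docs = len(docs)
    # df: number of documents that contain each word at least once
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weighted = []
    for tokens in docs:
        tf = Counter(tokens)  # tf: occurrences of each word in this document
        weighted.append({
            word: (1.0 + math.log2(count)) * math.log2(n_docs / df[word])
            for word, count in tf.items()
        })
    return weighted


# Hypothetical toy corpus, already tokenized.
docs = [
    ["gene", "kinase", "kinase", "binds"],
    ["gene", "expression", "yeast"],
    ["gene", "kinase", "pathway", "pathway", "pathway", "pathway"],
]
weights = tfidf_weights(docs)
for j, doc_weights in enumerate(weights):
    print(j, {w: round(v, 3) for w, v in sorted(doc_weights.items())})
```

In this toy corpus "gene" occurs in all three documents, so its weight is 0 everywhere, which is the intended effect of reducing the impact of common words.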
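
For the evaluation measures listed on slides 13 to 16, the following sketch computes them from the four cells of a confusion matrix using the standard definitions. The function name `retrieval_metrics` and the counts are hypothetical, and F1 is computed here as the equally weighted harmonic mean of precision and recall.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Compute standard evaluation measures from confusion-matrix counts.

    tp: relevant documents retrieved      fp: non-relevant documents retrieved
    fn: relevant documents missed         tn: non-relevant documents not retrieved
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,  # sensitivity is the same quantity as recall
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for a collection of 1000 documents, 50 of them relevant.
m = retrieval_metrics(tp=30, fp=10, fn=20, tn=940)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and recall are the same ratio, TP / (TP + FN). Precision, TP / (TP + FP), has no counterpart among sensitivity and specificity, because it conditions on what was retrieved and not on what is truly relevant.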

