Stanford CS 374 - Mining Medical Literature - D2643111

School: Stanford University

Course: CS 374 - Algorithms in Biology

Mining Medical Literature CS374 Fall 2004 Lecture 6, 10/14/04
Lecturer: Chirag Bhatt Scribe: Samuel Pearlman

Mining Medical Literature

Based on the following papers:
1. Raychaudhuri S, Schutze H, Altman RB. “Using Text Analysis to Identify Functionally Coherent Gene Groups”, Genome Research, 12:1582-1590, 2002.
2. Yu H, Agichtein E. “Extracting synonymous gene and protein terms from biological literature”, Bioinformatics, 19(1):i340-i349, 2003.

1 Motivation

This lecture introduced what it means to “mine” medical literature, why one would do it, and the different paradigms and methods for carrying out such mining. It also explored the related topic of finding synonyms in medical literature, which sheds more light on how literature can be mined for underlying meaning.

2 Basics of Mining Medical Literature

To understand the principles of mining the literature, we must first understand the goals of such an endeavor. Three general categories of goals were given: ‘medical, genomics, and proteomics research’, ‘finding causal links between symptoms or diseases and drugs or chemicals’, and ‘gene comparison’. These and other related data mining goals can be split into two basic categories: Search and Discover.

2.1 Search vs. Discover

A fairly clear distinction can be drawn between mining goals that fall under the ‘search’ category and those that fall under ‘discover.’ Search is a goal-oriented approach: you have an idea of what you are looking for, perhaps all genes related to a particular function, and wish to mine the available data sources for relevant information (the actual methods of mining are explored later). Discover is a more open-ended approach: as the name implies, you examine a large amount of data or text to look for patterns, implied relationships that have not yet been explored, or other correlations that were not previously suspected.

The data being mined can be broadly categorized as either structured or unstructured, though there are certainly shades of gray in this area. Structured data has been broken down into distinguishable component parts, so that the different parts are separately searchable and recognizable. Databases are the standard form of structured data. Unstructured data is generally in the form of text, found in papers, articles, and other scientific publications.
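
As a rough illustration of why the structured/unstructured distinction matters for a goal-oriented search, the sketch below runs the same query ("which genes are related to a particular function?") against both kinds of data. Everything in it is a hypothetical toy: the gene names, the records, the sentences, and the helper functions `search_structured` and `search_unstructured` are assumptions made for illustration and do not come from the lecture or the cited papers.

```python
# Hypothetical toy data: none of these records or sentences come from the lecture.
structured_rows = [
    {"gene": "GENE_A", "function": "amino acid metabolism"},
    {"gene": "GENE_B", "function": "stress response"},
    {"gene": "GENE_C", "function": "amino acid metabolism"},
]

unstructured_docs = [
    "We report that GENE_A participates in amino acid metabolism in yeast.",
    "Expression of GENE_B rises sharply under heat stress.",
]


def search_structured(rows, field, value):
    """Search structured data: each field is separately addressable."""
    return [row["gene"] for row in rows if row[field] == value]


def search_unstructured(docs, keyword):
    """Search free text: return indices of documents containing the keyword."""
    keyword = keyword.lower()
    return [i for i, doc in enumerate(docs) if keyword in doc.lower()]


genes = search_structured(structured_rows, "function", "amino acid metabolism")
hits = search_unstructured(unstructured_docs, "amino acid metabolism")
print(genes)  # gene names read directly from the matching records
print(hits)   # positions of documents that merely mention the phrase
```

With structured data the answer (a list of gene names) comes straight out of a named field. With text, a simple keyword match only locates documents that mention the phrase; identifying which gene the sentence is about would take further processing.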
Now that most publications are available online, it has become much easier to access the text of such articles in a form that computers can search and manipulate.

The following chart shows the actions performed by both Search and Discover methods, on structured data as well as on unstructured data:

                              Search (goal oriented)    Discover (opportunistic)
Structured data (database)    Data retrieval            Data mining
Unstructured data (text)      Information retrieval     Text mining

2.2 Examples of the 4 method/data form pairs

To illustrate the differences between Search and Discover, and how each is applied to both structured and unstructured data sources, the following examples were shown:

Data Retrieval: Searching a company database of customer and product inventory records. This is done in order to retrieve a piece or pieces of stored information, such as “What is the address of Client A” or “How many widgets are currently in stock.” Typical examples of this type of search are SQL queries against DB2 or Oracle database systems.

Information Retrieval: Web search engines such as Google are a contemporary, visible example. They are goal-driven by the user’s query, but search unstructured data (text in web pages and documents in various formats such as HTML, PDF, etc.) for appropriate matches. How this is done is explored later.

Data Mining: Like Data Retrieval, this operates on a structured data set. In this case, however, a large amount of (usually historical) data is retrieved and examined in the hope of finding previously unrecognized patterns or relationships. This is an opportunistic approach, and the classic example of what it can find is the “beer and diaper” case. Mining the sales data of a large market chain revealed a high number of sales that involved both beer and diapers.
This led to the two items being placed closer together in the market.

Information Mining (listed as “Text mining” in the chart above): This is the attempt to use patterns, trends, and/or domain knowledge to find previously unknown relationships and patterns, potential “gems” of information. All of this must be obtained from unstructured data sources, such as local documents, scientific publications or abstracts, or web pages in different forms. Approximately 90% of the world’s data is held in unstructured or semi-structured formats, from which it is inherently harder to obtain relevant information. Thus, much effort has been expended on improving the information content extracted from these types of sources.

3 Medical Literature Mining and Finding Functionally Coherent Genes

To define a term used often in the exploration of this topic, “Functionally Coherent Genes” are groups of genes that exhibit similar experimental features. Examples might be genes involved in amino acid metabolism, in various body stress responses, or in low-level electron transport.

Why is so much effort expended on the text mining of medical literature? It has become plain in the scientific community that there are many multi-functional genes (genes which are involved in more than one biological process), as well as many instances where families of genes exist that are functionally coherent or similar. There has been a shift from studying individual genes to studying whole families of genes together. The complexity of the relationships that can be explored is increasing, and manually searching through text sources is becoming less and less feasible. However, the large and increasing availability of online document sources begs for automated methods