Uniqueness of Medical Data Mining

To appear in Artificial Intelligence in Medicine journal, 2002

Krzysztof J. Cios¹,²,³,⁴ and G. William Moore⁵,⁶,⁷

¹University of Colorado at Denver; ²University of Colorado at Boulder; ³University of Colorado Health Sciences Center, Denver, CO; ⁴4cData LLC, Golden, CO; ⁵Baltimore Veterans Affairs Medical Center, Baltimore, MD; ⁶University of Maryland School of Medicine, Baltimore, MD; ⁷The Johns Hopkins University School of Medicine, Baltimore, MD

Keywords: medical data mining; unique features of medical data mining and knowledge discovery; ethical, security and legal aspects of medical data mining

0 Introduction

This article emphasizes the uniqueness of medical data mining. This is a position paper, in which the authors' intent, based on their medical and data mining experience, is to alert the data mining community to the unique features of medical data mining. The reason for writing the paper is that researchers who perform data mining in other fields may not be aware of the constraints and difficulties of mining the privacy-sensitive, heterogeneous data of medicine. We discuss ethical, security and legal aspects of medical data mining. In addition, we pose several questions that must be answered by the community, so that both the patients on whom the data are collected and the data miners can benefit.

Human medical data are at once the most rewarding and the most difficult of all biological data to mine and analyze. Humans are the most closely watched species on earth. Human subjects can provide observations that cannot easily be gained from animal studies, such as visual and auditory sensations, the perception of pain, discomfort, hallucinations, and recollection of possibly relevant prior traumas and exposures. Most animal studies are short-term, and therefore cannot track long-term disease processes of medical interest, such as preneoplasia or atherosclerosis.
With human data, there is no issue of having to extrapolate animal observations to the human species. Some three-quarters of a billion persons living in North America, Europe, and Asia have at least some of their medical information collected in electronic form, at least transiently. These subjects generate volumes of data that an animal experimentalist can only dream of. On the other hand, there are ethical, legal, and social constraints on data collection and distribution that do not apply to non-human species, and that limit the scientific conclusions that may be drawn.

The major points of uniqueness of medical data may be organized under four general headings:

• Heterogeneity of medical data
• Ethical, legal, and social issues
• Statistical philosophy
• Special status of medicine

1 Heterogeneity of medical data

Raw medical data are voluminous and heterogeneous. Medical data may be collected from various images, interviews with the patient, laboratory data, and the physician’s observations and interpretations. All these components may bear upon the diagnosis, prognosis, and treatment of the patient, and cannot be ignored. The major areas of heterogeneity of medical data may be organized under these headings:

• Volume and complexity of medical data
• Physician’s interpretation
• Sensitivity and specificity analysis
• Poor mathematical characterization
• Canonical form

1.1 Volume and complexity of medical data

Raw medical data are voluminous and heterogeneous. Medical data may be collected from various images, interviews with the patient, and physician’s notes and interpretations. All these data-elements may bear upon the diagnosis, prognosis, and treatment of the patient, and must be taken into account in data mining research. More and more medical procedures employ imaging as a preferred diagnostic tool.
Thus, there is a need to develop methods for efficient mining in databases of images, which is more difficult than mining in purely numerical databases. As an example, imaging techniques such as SPECT, MRI, and PET, together with the collection of ECG or EEG signals, can generate gigabytes of data per day. A single cardiac SPECT procedure on one patient may contain dozens of two-dimensional images. In addition, an image of the patient’s organ will almost always be accompanied by other clinical information, as well as the physician’s interpretation (clinical impression, diagnosis). This heterogeneity requires high-capacity data storage devices and new tools to analyze such data. It is obviously very difficult for an unaided human to process gigabytes of records, although dealing with images is relatively easier for humans because we are able to recognize patterns, grasp basic trends in data, and formulate rational decisions. The stored information becomes less useful if it is not available in an easily comprehensible format. Visualization techniques will play an increasing role in this setting, since images are the easiest for humans to comprehend, and they can provide a great deal of information in a single snapshot of the results.

1.2 Importance of physician’s interpretation

The physician’s interpretation of images, signals, or any other clinical data is written in unstructured free-text English, which is very difficult to standardize and thus difficult to mine. Even specialists from the same discipline cannot agree on unambiguous terms to be used in describing a patient’s condition. Not only do they use different names (synonyms) to describe the same disease, but they render the task even more daunting by using different grammatical constructions to describe relationships among medical entities. It has been suggested that computer translation may hold part of the solution for processing the physician’s interpretation (Manning and Schuetze, 2000; Ceusters, 2000).
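The synonym problem described above can be illustrated with a minimal sketch. It is not a method proposed in this paper: the synonym table, the function name canonical_terms, and the sample notes are all hypothetical, and a real system would draw on a curated medical vocabulary rather than a hand-written dictionary.

```python
import re

# Hypothetical synonym table, for illustration only. The entries are not
# taken from any standard medical vocabulary.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "htn": "hypertension",
    "hypertension": "hypertension",
}

# Try longer phrases first so that multi-word names win over shorter keys.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def canonical_terms(note):
    """Return the set of canonical disease names mentioned in a free-text note."""
    return {SYNONYMS[m.group(1).lower()] for m in _PATTERN.finditer(note)}


# Two differently worded notes map to the same canonical terms.
note_a = "Pt with HTN, prior heart attack in 1998."
note_b = "History of high blood pressure and myocardial infarction."
print(canonical_terms(note_a) == canonical_terms(note_b))
```

A lookup of this kind addresses only the different names for the same disease. It ignores negation ("MI ruled out" would still be counted as a mention) and the varied grammatical constructions noted above, which is why translation-style analysis has been suggested.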
Principles of computer translation may be summarized as follows (Nagao, 1992):

• “Machine translation is typically composed of the following three steps: analysis of a source language sentence; transfer ... from one language to another; and generation of a target language sentence.”
• “Natural language can be regarded as a huge set of exceptional expressions ... as many expressions as possible must be collected in the dictionary ... It is an endless job.”
• “One of the difficulties of translation ... is that the translation of an input sentence is not unique” (see 1.5 Canonical form).
• “Current

