Automatic Code Assignment to Medical Text

Koby Crammer and Mark Dredze and Kuzman Ganchev and Partha Pratim Talukdar
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA
{crammer, mdredze, kuzman, partha}@seas.upenn.edu

Steven Carroll
Division of Oncology, The Children's Hospital of Philadelphia, Philadelphia, PA
carroll@genome.chop.edu

Abstract

Code assignment is important for handling large amounts of electronic medical data in the modern hospital. However, only expert annotators with extensive training can assign codes. We present a system for the assignment of ICD-9-CM clinical codes to free text radiology reports. Our system assigns a code configuration, predicting one or more codes for each document. We combine three coding systems into a single learning system for higher accuracy. We compare our system on a real world medical dataset with both human annotators and other automated systems, achieving nearly the maximum score on the Computational Medicine Center's challenge.

1 Introduction

The modern hospital generates tremendous amounts of data: medical records, lab reports, doctor notes, and numerous other sources of information. As hospitals move towards fully electronic record keeping, the volume of this data only increases. While many medical systems encourage the use of structured information, including assigning standardized codes, most medical data, and often times the most important information, is stored as unstructured text.

This daunting amount of medical text creates exciting opportunities for applications of learning methods, such as search, document classification, data mining, information extraction, and relation extraction (Shortliffe and Cimino, 2006). These applications have the potential for considerable benefit to the medical community, as they can leverage information collected by hospitals and provide incentives for electronic record storage. Much of the data generated by medical personnel is unused past the clinical visit, often times because there is no way to simply and quickly apply the wealth of information. Medical NLP holds the promise of both greater care for individual patients and enhanced knowledge about health care.

In this work we explore the assignment of ICD-9-CM codes to clinical reports. We focus on this practical problem since it is representative of the type of task faced by medical personnel on a daily basis. Many hospitals organize and code documents for later retrieval using different coding standards. Often times, these standards are extremely complex and only trained expert coders can properly perform the task, making the process of coding documents both expensive and unreliable, since a coder must select from thousands of codes a small number for a given report. An accurate automated system would reduce costs, simplify the task for coders, and create a greater consensus and standardization of hospital data.

This paper addresses some of the challenges associated with ICD-9-CM code assignment to clinical free text, as well as general issues facing applications of NLP to medical text. We present our automated system for code assignment developed for the Computational Medicine Center's challenge. Our approach uses several classification systems, each with the goal of predicting the exact code configuration for a medical report. We then use a learning system to combine our predictions for superior performance.

This paper is organized as follows. First, we explain our task and difficulties in detail. Next, we describe our three automated systems and features. We combine the three approaches to create a single superior system. We evaluate our system on clinical reports and show accuracy approaching human performance and the challenge's best score.

2 Task Overview

The health care system employs a large number of categorization and classification systems to assist data management for a variety of tasks, including patient care, record storage and retrieval, statistical analysis, insurance, and billing. One of these systems is the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), which is the official system of assigning codes to diagnoses and procedures associated with hospital utilization in the United States.¹ The coding system is based on World Health Organization guidelines.

An ICD-9-CM code indicates a classification of a disease, symptom, procedure, injury, or information from the personal history. Codes are organized hierarchically, where top level entries are general groupings (e.g., "diseases of the respiratory system") and bottom level codes indicate specific symptoms or diseases and their location (e.g., "pneumonia in aspergillosis"). Each specific, low level code consists of 4 or 5 digits, with a decimal after the third. Higher level codes typically include only 3 digits. Overall, there are thousands of codes that cover a broad range of medical conditions.

Codes are assigned to medical reports by doctors, nurses, and other trained experts based on complex coding guidelines (National Center for Health Statistics, 2006). A particular medical report can be assigned any number of relevant codes. For example, if a patient exhibits a cough, fever, and wheezing, all three codes should be assigned.

In addition to finding appropriate codes for each condition, complex rules guide code assignment. For example, a diagnosis code should always be assigned if a diagnosis is reached, a diagnosis code should never be assigned when the diagnosis is unclear, a symptom should never be assigned when a diagnosis is present, and the most specific code is preferred. This means that codes that seem appropriate to a report should be omitted in specific cases. For example, a patient with hallucinations should be coded 780.1 (hallucinations), but for visual hallucinations the correct code is 368.16. The large number of codes and complexity of assignment rules make this a difficult problem for humans (inter-annotator agreement is low). Therefore, an automated system that suggested or assigned codes could make medical data more consistent.

These complexities make the problem difficult for NLP systems. Consider the task as multi-class, multi-label. For a given document, many codes may seem appropriate, but it may not be clear to the algorithm how many to assign. Furthermore, the codes are not independent, and different labels can interact to either increase or decrease the likelihood of the other. Consider a report that says, "patient reports cough and fever." The presence of the words cough and fever indicate codes 786
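
The code format described in Section 2 (three digits for higher level codes, a decimal after the third digit, four or five digits for specific codes) and the stated preference for the most specific code can be illustrated with a short sketch. This is not the authors' system. The function names are hypothetical, only purely numeric codes are handled, and "more specific" is approximated by a shared prefix within one category. That approximation would not capture cases such as 780.1 versus 368.16, which depend on the coding guidelines rather than on the code string.

```python
import re

# Numeric ICD-9-CM codes as described in the text: three digits, optionally
# followed by a decimal point and one or two more digits (e.g. 780.1, 368.16).
# Assumption: non-numeric code families are out of scope for this sketch.
_CODE_RE = re.compile(r"(\d{3})(?:\.(\d{1,2}))?")


def parse_code(code):
    """Split a code into (category, extension); extension is '' for 3-digit codes."""
    match = _CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a numeric ICD-9-CM code: {code!r}")
    return match.group(1), match.group(2) or ""


def is_ancestor(general, specific):
    """True if `general` is a strictly less specific code on the same branch."""
    g_cat, g_ext = parse_code(general)
    s_cat, s_ext = parse_code(specific)
    return g_cat == s_cat and len(g_ext) < len(s_ext) and s_ext.startswith(g_ext)


def keep_most_specific(codes):
    """Drop any candidate that has a more specific descendant among the candidates."""
    unique = sorted(set(codes))
    return [c for c in unique if not any(is_ancestor(c, other) for other in unique)]
```

For instance, calling `keep_most_specific(["780", "780.1", "368.16"])` would drop the three-digit category 780 because the more specific 780.1 is also a candidate, while leaving 368.16 untouched since it belongs to a different category.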

