Infrastructure and Methods to Support Real Time Biosurveillance
Kenneth D. Mandl, MD, MPH
Children’s Hospital Boston
Harvard Medical School
Harvard-MIT Division of Health Sciences and Technology
HST.950J: Medical Computing

Category A agents
- Anthrax (Bacillus anthracis)
- Botulism (Clostridium botulinum toxin)
- Plague (Yersinia pestis)
- Smallpox (Variola major)
- Tularemia (Francisella tularensis)
- Viral hemorrhagic fevers (filoviruses [e.g., Ebola, Marburg] and arenaviruses [e.g., Lassa])

Natural history: Anthrax
- Incubation is 1-6 days
- Flu-like symptoms followed in 2 days by acute phase, including breathing difficulty and shock
- Death within 24 hours of acute phase
- Treatment must be initiated within 24 hours of symptoms

Attack scenario: Anthrax
- State-sponsored terrorist attack
- Release of anthrax, NYC subway
- No notification by perpetrators
- 1% of the passengers exposed during rush hour will contract the disease

Need for early detection
[Figure: disease detection timeline over the incubation period (hours, 0-168), marking initial symptoms, Phase I, Phase II (acute illness), and the effective treatment period; early detection gains 2 days over traditional disease detection.]

But . . .
- Until now, there has been no real time surveillance for any diseases
- The threat of bioterrorism has focused interest on and brought funding to this problem

Where can real time information have a beneficial effect?
- Diagnosis
  - Decision support
- Response
  - Coordination
  - Communication
- Surveillance
  - Detection
  - Monitoring

Surveillance of what?
- Environment
  - Biological sensors
- Citizenry
  - Health related behaviors
  - Biological markers
- Patient populations
  - Patterns of health services use
  - Biological markers

Syndromic surveillance
- Use patterns of behavior or health care use for early warning
- Example: influenza-like illness
- Really should be called “prodromic surveillance”

Early implementations
- Drop-in surveillance
  - Paper based
  - Computer based
- Automated surveillance
  - Health care data
  - “Non-traditional” data sources

Syndromes tracked at WTC 2001
Source: Syndromic Surveillance for Bioterrorism Following the Attacks on the World Trade Center --- New York City, 2001. MMWR. 2002;51(Special Issue):13-15.

Health care data sources
- Patient demographic information
- Emergency department chief complaints
- International Classification of Diseases (ICD)
- Text-based notes
- Laboratory data
- Radiological reports
- Physician reports (not automated)
- New processes for data collection?

“Non-traditional” data sources
- Pharmacy data
- 911 operators
- Call triage centers
- School absenteeism
- Animal surveillance
- Agricultural data

Data integration
- Technical challenges
- Security issues
- Political barriers
- Privacy concerns

Data issues
- Data often collected for other purposes
- Data formats are nonstandard
- Data may not be available in a timely fashion
- Syndrome definitions may be problematic

Data quality
- Data often collected for other purposes
  - What do the data represent?
  - Who is entering them?
  - When are they entered?
  - How are they entered? Electronic vs. paper

Measured quality/value of data

                       sens [95% CI]    spec [95% CI]
  CC: all resp         .49 [.40-.58]    .98 [.95-.99]
  ICD: upper resp      .67 [.57-.76]    .99 [.97-.99]
  ICD: lower resp      .96 [.80-.99]    .99 [.98-.99]
  CC or ICD: all resp  .76 [.68-.83]    .98 [.95-.99]

Syndrome definition
- May be imprecise
- Sensitivity/specificity tradeoff
- Expert-guided vs. machine-guided?

Modeling the data
- Establishing baseline
- Developing forecasting methods
- Detecting temporal signal
- Detecting spatial signal

Baseline
- Are data available to establish baseline?
  - Periodic variations
    - Day
    - Month
    - Season
    - Year
    - Special days
  - Variations in patient locations
    - Secular trends in population
    - Shifting referral patterns
    - Seasonal effects

Boston data
- Syndromic surveillance
- Influenza-like illness
- Time and space

Forecasting

Components of ED volume
[Figure: ED volume by component: RESP, SICK, GI, PAIN, INJURY, SKIN, OTHER]

Principal Fourier component analysis
[Figure: components labeled 1 week, .5 week, 1 year, 1/3 year]

ARIMA modeling

Forecasting performance
- Overall ED volume
  - Average visits: 137
  - ARMA(1,2) model
  - Average error: 7.8%
- Respiratory ED volume
  - Average visits: 17
  - ARMA(1,1) model
  - Average error: 20.5%

GIS

Seasonal distributions

A curve fit to the cumulative distribution

A simulated outbreak

The cluster
[Figure: percent (0-14) versus distance (0-78), with fitted curve: Beta (Theta=-.02 Scale=95.5 a=1.44 b=5.57)]

Major issues
- Will this work at all???
- Can we get better data?
- How do we tune for a particular attack?
- What to do without training data?
- What do we do with all the information?
- How do we set alarm thresholds?
- How do we protect patient privacy?

Will this work at all?
- A syndromic surveillance system operating in the metro DC area failed to pick up the 2001 anthrax mailings
- Is syndromic surveillance therefore a worthless technology?
- Need to consider the parameters of what will be detectable
- Do not ignore the monitoring role

Getting better data
- Approaches to standardizing data collection
  - DEEDS
  - Frontlines of Medicine project
  - National Electronic Disease Surveillance System (NEDSS)

Tuning for a particular attack
- Attacks may have different “shapes” in the data
- Different methods may be better suited to detect each particular shape
- If we use multiple methods at once, how do we deal with multiple testing?

No training data
- Need to rely on simulation
- Imprint an attack onto our data set, taking into account regional peculiarities
  - Artificial signal on probabilistic noise
  - Artificial signal on real noise
  - Real signal (from different data) on real noise

What do we do with all of this information?
- Signals from same data using multiple methods?
- Signals from overlapping geographical regions?
- Signals from remote geographical regions?
  - Note: This highlights the important issue of
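
The "artificial signal on real noise" idea from the "No training data" slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method used in the lecture: the daily counts, the outbreak shape, the 7-day window, and the mean-plus-k-standard-deviations alarm rule are all hypothetical choices, and `inject_outbreak` and `first_alarm` are names invented for this example.

```python
from statistics import mean, stdev


def inject_outbreak(baseline, start, shape):
    """Return a copy of `baseline` with the extra visits in `shape`
    added day by day, beginning at index `start`."""
    out = list(baseline)
    for offset, extra in enumerate(shape):
        day = start + offset
        if day < len(out):
            out[day] += extra
    return out


def first_alarm(counts, window=7, k=3.0):
    """Return the index of the first day whose count exceeds
    mean + k * sd of the preceding `window` days, or None."""
    for day in range(window, len(counts)):
        history = counts[day - window:day]
        threshold = mean(history) + k * stdev(history)
        if counts[day] > threshold:
            return day
    return None


# Hypothetical daily respiratory visit counts (not the Boston data).
baseline = [17, 15, 19, 16, 18, 14, 20, 17, 16, 18, 15,
            19, 17, 16, 18, 17, 15, 19, 16, 18, 17]

# Assumed outbreak "shape": extra visits per day after the release.
shape = [2, 5, 12, 20, 12, 5]

attacked = inject_outbreak(baseline, start=14, shape=shape)

print("alarm on clean data:   ", first_alarm(baseline))
print("alarm on attacked data:", first_alarm(attacked))
```

Rerunning the same detector with different shapes, start days, and values of `k` gives a rough sense of which attacks would be detectable and how the alarm threshold trades detection delay against false alarms.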
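
The slides ask how to handle signals from several methods run on the same data, and how to deal with multiple testing. One standard and conservative option is a Bonferroni correction, sketched below. The lecture does not name this technique; the detector names, p-values, and the `bonferroni_alarms` helper are hypothetical and serve only as illustration.

```python
def bonferroni_alarms(p_values, alpha=0.05):
    """Return the names of detectors whose p-value falls below
    alpha divided by the number of detectors tested."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values from three detection methods on the same day.
p_values = {"temporal": 0.030, "spatial": 0.004, "spatiotemporal": 0.200}

print(bonferroni_alarms(p_values))
```

Because methods applied to the same data produce correlated results, Bonferroni tends to be stricter than necessary in this setting. It limits false alarms at the cost of some sensitivity.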