Reidentification of Individuals in Chicago's Homicide Database
A Technical and Legal Study

Salvador Ochoa, Jamie Rasmussen, Christine Robson, Michael Salib
Collective address: [email protected]

Government agencies, hospitals, and other organizations collect personal data of a sensitive nature. Often, these groups would like to release their data for statistical analysis by the scientific community, but do not want to cause the subjects of the data embarrassment or harassment. To resolve this conflict between privacy and progress, data is often deidentified before publication. In short, personally identifying information such as names, home addresses, and social security numbers is stripped from the data. We analyzed one such deidentified data set containing information about Chicago homicide victims over a span of three decades. By comparing the records in the Chicago data set with records in the Social Security Death Index, we were able to associate names with, or reidentify, 35% of the victims. This study details the reidentification method and results, and includes a legal review of U.S. regulations related to reidentification. Based on the findings of our project, we recommend removal of these databases from their online locations, and the establishment of national deidentification regulations.

Table of Contents

Table of Figures
Introduction
Reidentification theory
Key Terms and Concepts
Growth of Public Data
Privacy Concerns
Access Policies
Usefulness of Data
Deidentification
Reidentification
Reasons for Reidentification
Database Selection Criteria
Chicago Homicide Data
Structure
Statistics
SSDI
Structure
Statistics
Joining the Databases
Initial Approach
Revised Approach
Technical Specifications
Tools
Validation of Matches
Anonymizing the Chicago Homicide Data Set
Other Deidentified Data Sets
AIDS Patients
Outpatient Data
Malpractice
Chicago Robberies
Juvenile Court Records
Other Control Data Sets
Voting Records
Birth/Death/Marriage/Divorce Records
Legal Analysis
Are we breaking the law?
What if we tried to use the information?
As a company, would this be breaking the law?
As the government, would this be breaking the law?
Looking at US Laws
Privacy Act vs. FOIA conflict
Southern Illinoisan vs. DPH
Privacy Act
Criminal Protections
Medical Protections
Proposed Legislation: Medical
A German Reidentification Law
Legal Recommendations
Technical Recommendations
Suggestions for Further Work
Conclusion
References
Acknowledgements
Appendix A: Obtaining the SSDI Records
Appendix B: SQL Queries
Query 1:
Query 2:
Query 3:
Query 4: