I’m in the Database, But Nobody Knows
Many Threats to Privacy of Electronic Data
This Talk: Privacy-Preserving Data Analysis
“Pure” Privacy Problem
Typical Suggestions
AOL Search History Release (2006)
William Weld’s Medical Record [Sweeney02]
GWAS Membership [Homer et al. ’08]
Anonymized Social Networks [BackstromDK07]
Definitional Failures
Parable: How Tall is Pamela Jones (Groklaw)?
Differential Privacy [Dwork-McSherry-Nissim-Smith 2006]
Differential Privacy
Snow 1854
https://h1n1.cloudapp.net/Privacy.aspx
Mission Creep
Pan-Private Streaming Algorithms [DNPRY10]
DiffeP: Limitations and Challenges
Utility Implies Exposure to Harm
Pause
Which Ad(s) Am I Charged For?
More Subtle Attack
Wall Street Journal 4/4/2010
Work in Progress
Thank You!

I’m in the Database, But Nobody Knows
Cynthia Dwork, Microsoft Research

Many Threats to Privacy of Electronic Data
  Theft
  Phishing
  Viruses
  Cryptanalysis
  Changing Privacy Policies
  …

This Talk: Privacy-Preserving Data Analysis
“First Tier” Motivating Examples
  Analysis of Census Data, Medical Outcomes Data, GWAS data, Epidemiology, Analysis of Vehicle Braking Records
“Second Tier” Examples
  Training an advertising classifier, Recommendation System, Netflix Challenge

“Pure” Privacy Problem
Difficult Even if
  Curator is Angel
  Data are in Vault

Typical Suggestions
“Large Set” Queries
  How many MSFT employees have Sickle Cell Trait (CST)?
  How many MSFT employees who are not female Distinguished Scientists with very curly hair have the CST?
Add Random Noise to True Answer
  Average of responses to repeated queries converges to true answer
  Can’t simply detect repetition (undecidable)
Detect When Answering is Unsafe
  Refusal can be disclosive

A Litany

AOL Search History Release (2006)
  Name: Thelma Arnold
  Age: 62
  Widow
  Residence: Lilburn, GA

William Weld’s Medical Record [Sweeney02]
  Voter registration data: name, address, date registered, party affiliation, last voted
  HMO data: ethnicity, visit date, diagnosis, procedure, medication, total charge
  In both: ZIP, birthdate, sex

GWAS Membership
[Homer et al. ’08]
SNP: Single Nucleotide (A,C,G,T) polymorphism
Reference Population
  Major Allele (C): 94%
  Minor Allele (T): 6%
Genome-Wide Association Study
  Allele frequencies for many thousands of SNPs

Anonymized Social Networks [BackstromDK07]
Magic Step
  Isolate lightly linked-in subgraphs from rest of graph
  Special structure of subgraph permits finding A, B

Definitional Failures
Failure to Cope with Auxiliary Information
  Existing and future databases, newspaper reports, Flickr, literature, etc.
Definitions are Syntactic and Ad Hoc
Dalenius’s Ad Omnia Guarantee (1977): Anything that can be learned about a respondent from the statistical database can be learned without access to the database
  Unachievable

Parable: How Tall is Pamela Jones (Groklaw)?
An Admittedly Unreasonable Impossibility Proof
  Database teaches average heights of population subgroups
  “PJ is 2 inches shorter than avg Swedish woman”
  PJ’s height learnable with the DB, not learnable without.
  PJ loses privacy whether or not she is in the database
Suggests new notion of privacy: risk incurred by joining DB

The outcome of any analysis is essentially equally likely, independent of whether any individual joins or refrains from joining the dataset.
(The likelihood is over the choices made by the algorithm.)
Neutralizes all linkage attacks.
Composes unconditionally and automatically: $\sum_i \varepsilon_i$
http://research.microsoft.com/en-us/projects/DatabasePrivacy/

Differential Privacy [Dwork-McSherry-Nissim-Smith 2006]
[Figure: Pr[response] plotted over possible responses, with bad responses marked X; ratio bounded]
M gives $\varepsilon$-differential privacy if for all adjacent $D_1$ and $D_2$, and all $C \subseteq \mathrm{range}(M)$: $\Pr[M(D_1) \in C] \le e^{\varepsilon} \Pr[M(D_2) \in C]$

Differential Privacy
Resilience to All Auxiliary Information
  Past, present, future data sources and algorithms
Low-error high-privacy DP techniques exist for many problems
  datamining tasks (association rules, decision trees, clustering, …), contingency tables, histograms, synthetic data sets for query logs, machine learning (boosting, statistical queries learning model, SVMs, logistic regression), various statistical estimators, network trace analysis, recommendation systems, …
Programming Platforms
  http://research.microsoft.com/en-us/projects/PINQ/
  http://userweb.cs.utexas.edu/~shmat/shmat_nsdi10.pdf
Download today!

Can we store and share your information with health officials and researchers?
“…This information can be very helpful in monitoring regional health conditions, plan flu response, and conduct health research.
By allowing the responses to the survey questions to be used for public health, education and research purposes, you can help your community.”

Snow 1854
[Map labels: suspected pump; cholera cases]

https://h1n1.cloudapp.net/Privacy.aspx
“Microsoft may also disclose information if required to do so by law or in the good faith belief that such action is necessary to (a) conform to the edicts of the law or comply with legal process served on Microsoft or the Site; (b) protect and defend the rights or property of Microsoft and our family of Web sites; or (c) act in urgent circumstances to protect the personal safety of users of Microsoft products or members of the public.”

Mission Creep?
“Think of the children!”
Never store the data!

Pan-Private Streaming Algorithms [DNPRY10]
Private “inside and out”
  Completely hide the pattern of appearances of any individual
  Presence, absence, frequency, etc.
Protect against mission creep, subpoena, intrusion

DiffeP: Limitations and Challenges
Can’t study outliers
Privacy erosion over multiple analyses is cumulative
Privacy erosion over multiple databases is cumulative
Compare real world to one in which my info is everywhere deleted, looking at a lifetime of exposure against worst-case adversary/information/collection of databases
  Formally capture “reasonable” worlds?
What are the right questions to ask about social networks?
  Removing one person can affect data of many other people

Utility Implies Exposure to Harm
Database teaches that smoking causes cancer.
  Smoker S’s insurance premiums rise.
  But learning that smoking causes cancer is the whole point.
  Smoker S enrolls in a smoking cessation program.
May be fine for “First-Tier” Uses, but what about “Second Tier”?
Who decides, and
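
Looking back at the ε-differential privacy definition and the “Add Random Noise to True Answer” suggestion from earlier in the talk, the following is a minimal Python sketch of how the two fit together. It assumes the Laplace mechanism for a counting query, which the slides do not name; the employee data, the function names, and the value of epsilon are all hypothetical and chosen only for illustration. The sketch checks the bounded-ratio condition numerically for two adjacent databases, and shows that repeating one query both pulls the average toward the true answer and adds up the privacy cost as the sum of the individual epsilons.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, epsilon, rng):
    """Answer a counting query with Laplace noise of scale 1/epsilon.

    Adding or removing one row changes a count by at most 1, so this
    noise scale is the standard choice for epsilon-differential privacy.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def laplace_density(x, center, scale):
    """Density of Laplace(center, scale) at x."""
    return math.exp(-abs(x - center) / scale) / (2.0 * scale)


def max_density_ratio(count_d1, count_d2, epsilon, outputs):
    """Largest ratio of output densities over a grid of possible responses."""
    scale = 1.0 / epsilon
    return max(
        laplace_density(y, count_d1, scale) / laplace_density(y, count_d2, scale)
        for y in outputs
    )


rng = random.Random(0)
# Hypothetical database: 200 rows, 20 of which have the trait.
employees = [{"has_trait": i % 10 == 0} for i in range(200)]
epsilon = 0.5


def has_trait(row):
    return row["has_trait"]


one_answer = noisy_count(employees, has_trait, epsilon, rng)

# Adjacent databases: true counts 20 and 21 (one person joins).
grid = [y / 10.0 for y in range(0, 400)]
ratio = max_density_ratio(20, 21, epsilon, grid)  # stays at or below e^epsilon

# Repeating the query k times: the average approaches the true count,
# and the total privacy cost is the sum of the per-query epsilons.
k = 2000
average = sum(noisy_count(employees, has_trait, epsilon, rng) for _ in range(k)) / k
total_epsilon = k * epsilon
```

A single answer is noisy, but the average over many repetitions lands close to the true count of 20. This is why the cumulative budget, here `total_epsilon`, is the quantity to track rather than the epsilon of any one query.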