UT CS 380S - Privacy-Preserving Data Mining

Contents: Privacy-Preserving Data Mining; Reading Assignment; Input Perturbation; Output Perturbation; Concepts of Privacy; Kullback-Leibler Distance; Privacy of Input Perturbation; Input Perturbation Example; Privacy Definitions; Example; Some Properties; Privacy Breaches; Privacy Breach: Definitions; Transition Probabilities; Amplification; Amplification: Example; Output Perturbation Redux; Formally…; Privacy Definition Revisited; Limits of Output Perturbation; The SuLQ Algorithm; Computing with SuLQ; k-Means Clustering; Computing k-Means with SuLQ; ID3 Classifiers; Computing ID3 with SuLQ

slide 1: Privacy-Preserving Data Mining
Vitaly Shmatikov
CS 380S

slide 2: Reading Assignment
• Evfimievski, Gehrke, Srikant. "Limiting Privacy Breaches in Privacy-Preserving Data Mining" (PODS 2003).
• Blum, Dwork, McSherry, Nissim. "Practical Privacy: The SuLQ Framework" (PODS 2005).

slide 3: Input Perturbation
• Reveal the entire database, but randomize the entries: the database holds x1 … xn, and the user sees x1+δ1 … xn+δn.
• Add random noise δi to each database entry xi.
• For example, if the distribution of the noise has mean 0, the user can compute the average of the xi.

slide 4: Output Perturbation
• Randomize the response to each query (S, f), where S is a set of rows and f is a function on rows.
• True response: Σ_{i ∈ S} f(xi), computed over the database x1 … xn.
• Add random noise ν to the true response; the user receives Σ_{i ∈ S} f(xi) + ν.

slide 5: Concepts of Privacy
• Weak: no single database entry has been revealed.
• Stronger: no single piece of information is revealed (what's the difference from the "weak" version?).
• Strongest: the adversary's beliefs about the data have not changed.

slide 6: Kullback-Leibler Distance
• Measures the "difference" between two probability distributions: KL(P || Q) = Σ_x P(x) log( P(x) / Q(x) ).

slide 7: Privacy of Input Perturbation
• X is a random variable, R is the randomization operator, Y = R(X) is the perturbed database.
• Naïve: measure the mutual information between the original and randomized databases.
  – Average KL distance between (1) the distribution of X and (2) the distribution of X conditioned on Y = y: E_y[ KL(P_{X|Y=y} || P_X) ].
  – Intuition: if this distance is small, then Y leaks little information about the actual values of X.
• Why is this definition problematic?

slide 8: Input Perturbation Example
• Name: Age database: Gladys: 85, Doris: 90, Beryl: 82.
• Age is an integer between 0 and 90.
• Randomize the database entries by adding random integers between -20 and 20. The randomization operator has to be public (why?).
• Randomized database: Gladys: 72, Doris: 110, Beryl: 85.
• Doris's age is 90!! (Her randomized value is 110, the added noise is at most +20, and ages are at most 90, so her true age must be exactly 90.)

slide 9: Privacy Definitions
• Mutual information can be small on average, but an individual randomized value can still leak a lot of information about the original value.
• Better: consider some property Q(x).
  – The adversary has an a priori probability Pi that Q(xi) is true.
• Privacy breach if revealing yi = R(xi) significantly changes the adversary's probability that Q(xi) is true.
  – Intuition: the adversary learned something about entry xi (namely, the likelihood of property Q holding for this entry).

slide 10: Example
• Data: 0 ≤ x ≤ 1000, p(x = 0) = 0.01, p(x = k) = 0.00099 for each k ≠ 0.
• Reveal y = R(x).
• Three possible randomization operators R:
  – R1(x) = x with prob. 20%; uniform on {0, ..., 1000} with prob. 80%.
  – R2(x) = x + ξ mod 1001, with ξ uniform in [-100, 100].
  – R3(x) = R2(x) with prob. 50%; uniform on {0, ..., 1000} with prob. 50%.
• Which randomization operator is better?
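The three operators above are easy to simulate. Below is a minimal Python sketch (not from the slides; the function names, trial count, and Monte Carlo approach are choices made here) that implements R1, R2, and R3 and estimates the adversary's posterior belief after observing y = R(x) = 0 for the two properties Q1 (x = 0) and Q2 (x ∉ {200, ..., 800}) defined on the "Some Properties" slide that follows. The estimates should land close to the percentages quoted there.

```python
import random

DOMAIN = range(1001)  # x takes integer values 0..1000

def sample_x():
    """Sample x from the prior: P(x=0) = 0.01, P(x=k) = 0.00099 for k != 0."""
    return 0 if random.random() < 0.01 else random.randint(1, 1000)

def R1(x):
    """Keep x with prob. 20%; otherwise output a uniform value."""
    return x if random.random() < 0.20 else random.choice(DOMAIN)

def R2(x):
    """Add uniform noise in [-100, 100], wrapping around mod 1001."""
    return (x + random.randint(-100, 100)) % 1001

def R3(x):
    """Apply R2 with prob. 50%; otherwise output a uniform value."""
    return R2(x) if random.random() < 0.50 else random.choice(DOMAIN)

def posteriors_given_zero(R, trials=1_000_000):
    """Monte Carlo estimates of P(Q1 | R(x)=0) and P(Q2 | R(x)=0)."""
    hits = q1 = q2 = 0
    for _ in range(trials):
        x = sample_x()
        if R(x) == 0:                      # condition on observing y = 0
            hits += 1
            q1 += (x == 0)                 # Q1: x = 0
            q2 += not (200 <= x <= 800)    # Q2: x not in {200, ..., 800}
    if hits == 0:
        return float("nan"), float("nan")
    return q1 / hits, q2 / hits

if __name__ == "__main__":
    for name, R in [("R1", R1), ("R2", R2), ("R3", R3)]:
        p1, p2 = posteriors_given_zero(R)
        print(f"{name}: P(Q1 | y=0) ~ {p1:.3f}   P(Q2 | y=0) ~ {p2:.3f}")
```

For R1, for instance, the estimate for Q1 jumps from the 1% prior to roughly 72%, which is exactly the jump the next slides call a privacy breach.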
slide 11: Some Properties
• Q1(x): x = 0; Q2(x): x ∉ {200, ..., 800}.
• What are the a priori probabilities, for a given x, that these properties hold?
  – Q1(x): 1%, Q2(x): 40.5%.
• Now suppose the adversary learned that y = R(x) = 0. What are the probabilities of Q1(x) and Q2(x)?
  – If R = R1 then Q1(x): 71.6%, Q2(x): 83%.
  – If R = R2 then Q1(x): 4.8%, Q2(x): 100%.
  – If R = R3 then Q1(x): 2.9%, Q2(x): 70.8%.

slide 12: Privacy Breaches
• R1(x) leaks information about property Q1(x).
  – Before seeing R1(x), the adversary thinks that the probability of x = 0 is only 1%, but after noticing that R1(x) = 0, the probability that x = 0 is 72%.
• R2(x) leaks information about property Q2(x).
  – Before seeing R2(x), the adversary thinks that the probability of x ∉ {200, ..., 800} is 41%, but after noticing that R2(x) = 0, the probability that x ∉ {200, ..., 800} is 100%.
• The randomization operator should be such that the posterior distribution is close to the prior distribution for any property.

slide 13: Privacy Breach: Definitions [Evfimievski et al.]
• Q(x) is some property; ρ1, ρ2 are probabilities, ρ1 ≈ "very unlikely", ρ2 ≈ "very likely".
• Straight privacy breach: P(Q(x)) ≤ ρ1, but P(Q(x) | R(x) = y) ≥ ρ2.
  – Q(x) is unlikely a priori, but likely after seeing the randomized value of x.
• Inverse privacy breach: P(Q(x)) ≥ ρ2, but P(Q(x) | R(x) = y) ≤ ρ1.
  – Q(x) is likely a priori, but unlikely after seeing the randomized value of x.

slide 14: Transition Probabilities
• How to ensure that the randomization operator hides every property?
  – There are 2^|X| properties.
  – Often the randomization operator has to be selected even before the distribution P_X is known (why?).
• Idea: look at the operator's transition probabilities.
  – How likely is xi to be mapped to a given y?
  – Intuition: if all possible values of xi are equally likely to be randomized to a given y, then revealing y = R(xi) will not reveal much about the actual value of xi.

slide 15: Amplification [Evfimievski et al.]
• The randomization operator is γ-amplifying for y if
    ∀ x1, x2 ∈ V_x: p(x1 → y) ≤ γ · p(x2 → y)
• For given ρ1, ρ2, no straight or inverse privacy breaches occur if
    ρ2 (1 - ρ1) / (ρ1 (1 - ρ2)) > γ

slide 16: Amplification: Example
• For example, for randomization operator R3:
    p(x → y) = ½ (1/201 + 1/1001)   if y ∈ [x-100, x+100] (mod 1001)
             = 1/2002               otherwise
• Fractional difference = 1 + 1001/201 < 6 (= γ).
• Therefore, no straight or inverse privacy breaches will occur with ρ1 = 14%, ρ2 = 50%.

slide 17: Output Perturbation Redux
• Randomize the response to each query (S, f), where S is a set of rows and f is a function on rows.
• True response: Σ_{i ∈ S} f(xi), computed over the database x1 … xn.
• Add random noise ν to the true response; the user receives Σ_{i ∈ S} f(xi) + ν.

slide 18: Formally…
• The database is an n-tuple D = (d1, d2, …, dn).
  – Elements are not random; the adversary may have a priori beliefs about their distribution or specific values.
• For any predicate f: D → {0,1}, p_{i,f}(n) is the probability that f(di) = 1, given the answers to n queries as well as all other entries dj for j ≠ i.
  – p_{i,f}(0) = a priori belief; p_{i,f}(t) = belief after t answers.
  – Why is the adversary given all entries except di?
• conf(p) = log( p / (1 - p) )
  – From raw probability to "belief".

slide 19: Privacy Definition Revisited
• Idea: after each query, the adversary's gain in knowledge about any individual database entry should be small.
  – Gain in knowledge about di as the result of the (n+1)st …
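To make the output-perturbation interface above concrete, here is a minimal Python sketch of a server that answers a query (S, f) with the noisy sum Σ_{i ∈ S} f(xi) + ν. This is only an illustration of the interface on the slides, not the SuLQ algorithm itself: the class name, the noise_sigma parameter, the Gaussian noise distribution, and the extra database rows in the demo are assumptions made for this example.

```python
import random

class NoisySumServer:
    """Answer queries (S, f) with the noisy sum sum_{i in S} f(x_i) + noise.

    Illustrative sketch of the output-perturbation interface on the slides,
    not the SuLQ algorithm; Gaussian noise and the noise_sigma parameter are
    assumptions made for this example.
    """

    def __init__(self, rows, noise_sigma=5.0):
        self.rows = list(rows)          # database entries x_1 ... x_n
        self.noise_sigma = noise_sigma

    def query(self, S, f):
        """S: iterable of row indices; f: function mapping a row to a number."""
        true_response = sum(f(self.rows[i]) for i in S)
        nu = random.gauss(0.0, self.noise_sigma)  # random noise added to the true response
        return true_response + nu

if __name__ == "__main__":
    # The first three ages come from the Input Perturbation Example slide;
    # the remaining rows are made up to pad out the demo database.
    db = NoisySumServer(rows=[85, 90, 82, 45, 67, 88, 73, 91], noise_sigma=2.0)
    # Noisy count of people over 80: f maps a row to 1 if age > 80, else 0.
    answer = db.query(S=range(len(db.rows)), f=lambda age: 1 if age > 80 else 0)
    print(f"noisy count of entries over 80: {answer:.2f}")
```

Because mean-zero noise averages out over repeated identical queries, unlimited answers of this form would eventually reveal the exact sums; that is roughly the issue the later slides in the outline ("Limits of Output Perturbation", "The SuLQ Algorithm") address by bounding the number of queries and calibrating the noise.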

