UT CS 395T - Privacy Preserving Data Mining

Privacy Preserving Data Mining
Yehuda Lindell and Benny Pinkas
Presenter: Justin Brickell

Contents: Mining Joint Databases; Unnecessary Information; Simulation; More Simulation Details; The Semi-Honest Model; General Secure Two Party Computation; Yao's Protocol (Basically); Garbled Wire Values; Garbled Gates; Yao's Protocol; Cost of circuit with n inputs and m gates; Classification by Decision Tree Learning; A Database…; … and its Decision Tree; The ID3 Algorithm: Definitions; The ID3 Algorithm; The Best Predicting Attribute; Why can we do better than Yao?; How do we do it?; Private x ln x; Private x ln x: some intuition; Using the x ln x protocol; Shares of Relative Entropy; A Technical Detail; Complexity for each node; Conclusion

Mining Joint Databases
• Parties P1 and P2 own databases D1 and D2
• f is a data mining algorithm
• Goal: compute f(D1 ∪ D2) without revealing "unnecessary information"

Unnecessary Information
• Intuitively, the protocol should function as if a trusted third party computed the output
• [Diagram: P1 sends D1 and P2 sends D2 to a TTP, which returns f(D1 ∪ D2) to each party]

Simulation
• Let msg(P2) be P2's messages
• If a simulator S1 can simulate msg(P2) to P1 given only P1's input and the protocol output, then msg(P2) must not contain unnecessary information (and vice versa)
• Formally, S1(D1, f(D1, D2)) ≡_C msg(P2), i.e., the simulated messages are computationally indistinguishable from the real ones

More Simulation Details
• The simulator S1 can also recover r1, the internal coin tosses of P1
• The definition can be extended to allow distinct outputs f1(x, y) and f2(x, y) for the two parties
  – this complicates the definition
  – it is not necessary for data mining applications

The Semi-Honest Model
• A malicious adversary can alter his input
  – e.g., f(∅ ∪ D2) = f(D2)!
• A semi-honest adversary
  – adheres to the protocol
  – tries to learn extra information from the message transcript

General Secure Two Party Computation
• Any algorithm can be made private (in the semi-honest model)
  – Yao's protocol
• So, why write this paper?
  – Yao's protocol is inefficient
  – This paper privately computes a particular algorithm more efficiently

Yao's Protocol (Basically)
• Convert the algorithm to a circuit
• P1 hard-codes his input into the circuit
• P1 transforms each gate so that it takes garbled inputs to garbled outputs
• Using 1-out-of-2 oblivious transfer, P1 sends P2 garbled versions of P2's input bits

Garbled Wire Values
• P1 assigns to each wire i two random values (W_i^0, W_i^1)
  – long enough to seed a pseudo-random function F
• P1 assigns to each wire i a random permutation over {0,1}, π_i: b_i → c_i
• ⟨W_i^{b_i}, c_i⟩ is the "garbled value" of wire i

Garbled Gates
• Gate g computes b_k = g(b_i, b_j)
• The garbled gate is a table T_g mapping ⟨W_i^{b_i}, c_i⟩, ⟨W_j^{b_j}, c_j⟩ → ⟨W_k^{b_k}, c_k⟩
• T_g has four entries, indexed by (c_i, c_j):
  ⟨W_k^{g(b_i,b_j)}, c_k⟩ ⊕ F[W_i^{b_i}](c_j) ⊕ F[W_j^{b_j}](c_i)

Yao's Protocol
• P1 sends
  – P2's garbled input bits (via 1-out-of-2 oblivious transfer)
  – the T_g tables
  – a table mapping garbled output values to output bits
• P2 can compute the output values, but P1's input and all intermediate values appear random to him

Cost of circuit with n inputs and m gates
• Communication: m gate tables
  – 4m · (length of the pseudo-random function output)
• Computation: n oblivious transfers
  – typically much more expensive than the m pseudo-random function applications
• Too expensive for data mining
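To make the garbling step above concrete, here is a minimal sketch of garbled-gate construction and evaluation. It is an illustration, not the paper's exact construction: HMAC-SHA256 stands in for the pseudo-random function F, the helper names (garble_wire, garble_gate, eval_gate) are invented for this sketch, and the oblivious transfer and full-circuit machinery are omitted.

```python
# Minimal sketch of Yao-style gate garbling. Assumptions: HMAC-SHA256
# stands in for the pseudo-random function F; helper names are invented
# for this sketch; oblivious transfer and circuit evaluation are omitted.
import hmac, hashlib, os

KEY_LEN = 16  # length of a wire value W, in bytes

def garble_wire():
    """One wire: two random values (W^0, W^1) and a random permutation
    bit pi, so the masked bit of a real bit b is c = b XOR pi."""
    return (os.urandom(KEY_LEN), os.urandom(KEY_LEN)), os.urandom(1)[0] & 1

def F(W, c):
    """F[W](c): PRF keyed by wire value W, applied to masked bit c,
    truncated to the length of a table entry (KEY_LEN + 1 bytes)."""
    return hmac.new(W, bytes([c]), hashlib.sha256).digest()[:KEY_LEN + 1]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def garble_gate(g, wire_i, wire_j, wire_k):
    """Four-entry table T_g for a gate b_k = g(b_i, b_j); entry (c_i, c_j)
    is <W_k^{g(b_i,b_j)}, c_k> XOR F[W_i^{b_i}](c_j) XOR F[W_j^{b_j}](c_i)."""
    (Wi, pi_i), (Wj, pi_j), (Wk, pi_k) = wire_i, wire_j, wire_k
    table = {}
    for bi in (0, 1):
        for bj in (0, 1):
            bk = g(bi, bj)
            plain = Wk[bk] + bytes([bk ^ pi_k])   # <W_k^{b_k}, c_k>
            table[(bi ^ pi_i, bj ^ pi_j)] = xor(
                xor(plain, F(Wi[bi], bj ^ pi_j)), F(Wj[bj], bi ^ pi_i))
    return table

def eval_gate(table, garbled_i, garbled_j):
    """P2's view: from two garbled inputs <W, c>, recover <W_k^{b_k}, c_k>
    without learning any real bit."""
    (Wi, ci), (Wj, cj) = garbled_i, garbled_j
    out = xor(xor(table[(ci, cj)], F(Wi, cj)), F(Wj, ci))
    return out[:KEY_LEN], out[KEY_LEN]

# Example: garble an AND gate and evaluate it on b_i = 1, b_j = 1.
wi, wj, wk = garble_wire(), garble_wire(), garble_wire()
Tg = garble_gate(lambda x, y: x & y, wi, wj, wk)
Wk_out, ck = eval_gate(Tg, (wi[0][1], 1 ^ wi[1]), (wj[0][1], 1 ^ wj[1]))
assert (Wk_out, ck) == (wk[0][1], 1 ^ wk[1])  # garbled value of b_k = 1
```

In the full protocol, P1 builds such a table for every gate, sends the tables along with a mapping from garbled output values to real output bits, and P2 obtains the garbled values of its own input bits via 1-out-of-2 oblivious transfer.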
Classification by Decision Tree Learning
• A classic machine learning / data mining problem
• Develop rules for when a transaction belongs to a class, based on its attribute values
• Smaller decision trees are better
• ID3 is one particular algorithm

A Database…

Outlook   Temp  Humidity  Wind    Play Tennis
Sunny     Hot   High      Weak    No
Sunny     Hot   High      Strong  No
Overcast  Mild  High      Weak    Yes
Rain      Mild  High      Weak    Yes
Rain      Cool  Normal    Weak    Yes
Rain      Cool  Normal    Strong  No
Overcast  Cool  Normal    Strong  Yes
Sunny     Mild  High      Weak    No
Sunny     Cool  Normal    Weak    Yes
Rain      Mild  Normal    Weak    Yes
Sunny     Mild  Normal    Strong  Yes
Overcast  Mild  High      Strong  Yes
Overcast  Hot   Normal    Weak    Yes
Rain      Mild  High      Strong  No

… and its Decision Tree

Outlook
├─ Sunny → Humidity
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
    ├─ Strong → No
    └─ Weak → Yes

The ID3 Algorithm: Definitions
• R: the set of attributes
  – Outlook, Temperature, Humidity, Wind
• C: the class attribute
  – Play Tennis
• T: the set of transactions
  – the 14 database entries

The ID3 Algorithm
ID3(R, C, T)
• If R is empty, return a leaf node with the most common class value in T
• If all transactions in T have the same class value c, return the leaf node c
• Otherwise:
  – determine the attribute A that best classifies T
  – create a tree node labeled A and recur to compute the child trees; the edge for value a_i leads to the tree ID3(R − {A}, C, T(a_i))

The Best Predicting Attribute
• Entropy! With ℓ class values c_i and m values a_j of attribute A:
  $H_C(T) = -\sum_{i=1}^{\ell} \frac{|T(c_i)|}{|T|} \log \frac{|T(c_i)|}{|T|}$
  $H_C(T|A) = \sum_{j=1}^{m} \frac{|T(a_j)|}{|T|} \, H_C(T(a_j))$
• Gain(A) =_def H_C(T) − H_C(T|A)
• Find the A with maximum gain (a plain, non-private sketch of this computation appears at the end of this summary)

Why can we do better than Yao?
• Normally, private protocols must hide intermediate values
• In this protocol, the assignment of attributes to nodes is part of the output and may be revealed
  – the H values are not revealed, just the identity of the attribute with the greatest gain
• This allows genuine recursion

How do we do it?
• Rather than maximize Gain(A), minimize
  $H'_C(T|A) =_{def} H_C(T|A) \cdot |T| \cdot \ln 2$
• This has the simple formula
  $H'_C(T|A) = -\sum_{j=1}^{m}\sum_{i=1}^{\ell} |T(a_j, c_i)| \cdot \ln|T(a_j, c_i)| + \sum_{j=1}^{m} |T(a_j)| \cdot \ln|T(a_j)|$
• Each term has the form (v1 + v2) · ln(v1 + v2)
  – P1 knows v1 and P2 knows v2: each count splits into the rows each party owns

Private x ln x
• Input: P1's value v1, P2's value v2
• Auxiliary input: a large field F
• Output: P1 obtains w1 ∈ F and P2 obtains w2 ∈ F, such that
  – w1 + w2 ≈ (v1 + v2) · ln(v1 + v2)
  – w1 and w2 are uniformly distributed in F

Private x ln x: some intuition
• Compute shares of x and of ln x, then privately multiply
• Shares of ln x are actually shares of n and ε, where x = 2^n(1 + ε)
  – −1/2 ≤ ε ≤ 1/2
  – ln x = n · ln 2 + ln(1 + ε), and ln(1 + ε) is approximated by its Taylor expansion (see the numeric sketch at the end of this summary)

Using the x ln x protocol
• For every attribute A, every attribute value a_j ∈ A, and every class value c_i ∈ C, the parties obtain shares
  – w_{A,1}(a_j), w_{A,2}(a_j), w_{A,1}(a_j, c_i), w_{A,2}(a_j, c_i)
  – w_{A,1}(a_j) + w_{A,2}(a_j) ≈ |T(a_j)| · ln|T(a_j)|
  – w_{A,1}(a_j, c_i) + w_{A,2}(a_j, c_i) ≈ |T(a_j, c_i)| · ln|T(a_j, c_i)|

Shares of Relative Entropy
• From these, P1 and P2 can locally compute shares S_{A,1} + S_{A,2} ≈ H'_C(T|A)
• Now, use the Yao protocol to find the A with minimum relative entropy!

A Technical Detail
• The logarithms are only approximate
  – this yields the ID3δ algorithm
  – it does not distinguish relative entropies within δ of each other

Complexity for each node
• For |R| attributes, m attribute values, and ℓ class values
  – the x ln x protocol is invoked O(m · ℓ · |R|) times
  – each invocation requires O(log|T|) oblivious transfers and bandwidth O(k · log|T| · |S|) bits, where k depends logarithmically on δ
• The cost depends only logarithmically on |T|
• Only k · |S| worse …
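As promised above, here is a short single-party Python sketch of the ID3 recursion and the entropy/gain computation on the Play Tennis table. It is for intuition only: this is the plain, non-private algorithm, not the Lindell-Pinkas protocol, and the function names (entropy, gain, id3) are invented for this sketch.

```python
# Plain (non-private) ID3 for intuition; not the privacy-preserving
# protocol. Works on the Play Tennis table above, one dict per row.
from collections import Counter
from math import log2

def entropy(rows, cls):
    """H_C(T): entropy of the class attribute over transactions `rows`."""
    counts = Counter(r[cls] for r in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(rows, attr, cls):
    """Gain(A) = H_C(T) - H_C(T|A)."""
    total = len(rows)
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr], []).append(r)
    cond = sum(len(sub) / total * entropy(sub, cls)
               for sub in by_value.values())
    return entropy(rows, cls) - cond

def id3(rows, attrs, cls):
    classes = [r[cls] for r in rows]
    if not attrs:                      # R empty: most common class value
        return Counter(classes).most_common(1)[0][0]
    if len(set(classes)) == 1:         # all transactions share one class
        return classes[0]
    best = max(attrs, key=lambda a: gain(rows, a, cls))
    return (best, {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best], cls)
                   for v in set(r[best] for r in rows)})
```

Run on the 14 transactions above, with attrs = ['Outlook', 'Temp', 'Humidity', 'Wind'] and cls = 'Play Tennis', this selects Outlook at the root and reproduces the tree shown earlier.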
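The "Private x ln x: some intuition" slide compresses a lot, so the numeric sketch below shows just the approximation being computed, with no cryptography: write x = 2^n(1 + ε) with −1/2 ≤ ε ≤ 1/2, so that ln x = n·ln 2 + ln(1 + ε), and truncate the Taylor series of ln(1 + ε). The choice n = round(log2 x) and the term count k are illustrative assumptions; in the protocol the parties hold only shares of these quantities and never see them in the clear.

```python
# Numeric intuition for the private x ln x protocol's log approximation.
# No crypto here: in the real protocol P1 and P2 hold additive *shares*
# of n and eps and never see x, n, or eps in the clear.
from math import log, log2

def approx_x_ln_x(x, k=10):
    """Approximate x * ln(x) for an integer x >= 1 via x = 2^n (1 + eps),
    ln x = n ln 2 + ln(1 + eps), truncating the Taylor series at k terms."""
    n = round(log2(x))            # chosen so that -1/2 <= eps <= 1/2
    eps = x / 2**n - 1
    ln1p = sum((-1) ** (t - 1) * eps ** t / t for t in range(1, k + 1))
    return x * (n * log(2) + ln1p)

# The truncation error shrinks rapidly with k, since |eps| <= 1/2:
for x in (3, 14, 1000):
    print(x, approx_x_ln_x(x), x * log(x))
```

This truncation is exactly why the δ parameter appears above: the logarithms, and hence the relative entropies, are only approximate, which is the ID3δ relaxation the slides describe.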

