Privacy
What is Privacy
Some Privacy Concerns
Data Mining as a Threat to Privacy
Some Privacy Problems and Potential Solutions
Privacy Constraint Processing
Architecture for Privacy Constraint Processing
Semantic Model for Privacy Control
Privacy Preserving Data Mining
Cryptographic Approaches for Privacy Preserving Data Mining
Cryptographic Approaches for Privacy Preserving Data Mining
Perturbation Based Approaches for Privacy Preserving Data Mining
Slide 13
Perturbation Based Approaches for Privacy Preserving Data Mining
Slide 15
CPT: Confidentiality, Privacy and Trust
Platform for Privacy Preferences (P3P): What is it?
Platform for Privacy Preferences (P3P): Organizations
Platform for Privacy Preferences (P3P): Specifications
P3P and Legal Issues
Privacy for Assured Information Sharing
Privacy Preserving Surveillance
Directions: Foundations of Privacy Preserving Data Mining
Directions: Testbed Development and Application Scenarios
Key Points
Application Specific Privacy?
Data Mining and Privacy: Friends or Foes?

Privacy
Prof. Bhavani Thuraisingham
The University of Texas at Dallas
July 2011

What is Privacy

Medical community
- Privacy is about a patient determining what patient/medical information the doctor should release about him/her
Financial community
- A bank customer determines what financial information the bank should release about him/her
Government community
- The FBI would collect information about US citizens.
  However, the FBI determines what information about a US citizen it can release to, say, the CIA

Some Privacy Concerns

Medical and healthcare
- Employers, marketers, or others knowing of private medical concerns
Security
- Allowing access to an individual’s travel and spending data
- Allowing access to web surfing behavior
Marketing, sales, and finance
- Allowing access to an individual’s purchases

Data Mining as a Threat to Privacy

Data mining gives us “facts” that are not obvious to human analysts of the data
Can general trends across individuals be determined without revealing information about individuals?
Possible threats:
- Combine collections of data and infer information that is private
  - Disease information from prescription data
  - Military action from pizza delivery to the Pentagon
Need to protect the associations and correlations between the data that are sensitive or private

Some Privacy Problems and Potential Solutions

Problem: Privacy violations that result from data mining
- Potential solution: Privacy-preserving data mining
Problem: Privacy violations that result from the inference problem
- Inference is the process of deducing sensitive information from the legitimate responses received to user queries
- Potential solution: Privacy constraint processing
Problem: Privacy violations due to unencrypted data
- Potential solution: Encryption at different levels
Problem: Privacy violations due to poor system design
- Potential solution: Develop a methodology for designing privacy-enhanced systems

Privacy Constraint Processing

Privacy constraint processing
- Based on prior research in security constraint processing
- Simple constraint: an attribute of a document is private
- Content-based constraint: if a document contains information about X, then it is private
- Association-based constraint: two or more documents taken together are private; individually each document is public
- Release constraint: after X is released, Y becomes private
Augment a database system with a privacy controller for constraint
processing

Architecture for Privacy Constraint Processing

[Diagram components]
- User Interface Manager
- Constraint Manager
- Privacy Constraints
- Query Processor: constraints during query and release operations
- Update Processor: constraints during update operation
- Database Design Tool: constraints during database design operation
- Database
- DBMS

Semantic Model for Privacy Control

[Diagram labels: Patient John; Cancer; Influenza; Has disease; Travels frequently; England; address; John’s address]
Dark lines/boxes contain private information

Privacy Preserving Data Mining

Prevent useful results from mining
- Introduce “cover stories” to give “false” results
- Only make a sample of data available so that an adversary is unable to come up with useful rules and predictive functions
Randomization
- Introduce random values into the data and/or results
- The challenge is to introduce random values without significantly affecting the data mining results
- Give a range of values for results instead of exact values
Secure multi-party computation
- Each party knows its own inputs; encryption techniques are used to compute the final results

Cryptographic Approaches for Privacy Preserving Data Mining

Secure Multi-party Computation (SMC) for PPDM
- Mainly used for distributed data mining
- Provably secure under some assumptions
- Learned models are accurate
- Efficient/specific cryptographic solutions have been developed for many distributed data mining problems
- Mainly semi-honest assumption (i.e., parties follow the protocols)
- The malicious model has also been explored recently (e.g., Kantarcioglu and Kardes paper in this workshop)
- Many SMC-based PPDM algorithms share common sub-protocols (e.g., dot product, summation)

Cryptographic Approaches for Privacy Preserving Data Mining

Drawbacks:
- Still not efficient enough for very large datasets (e.g.,
  petabyte-sized datasets?)
- Semi-honest model may not be realistic
- Malicious model is even slower
Possible new directions
- New models that offer a better trade-off between efficiency and security
- Game-theoretic / incentive issues in PPDM
- Combining anonymization and cryptographic techniques for PPDM

Perturbation Based Approaches for Privacy Preserving Data Mining

Goal: Distort data while still preserving some properties for data mining purposes
- Additive based
- Multiplicative based
- Condensation based
- Decomposition
- Data swapping

Perturbation Based Approaches for Privacy Preserving Data Mining

Goal: Achieve high data mining accuracy with maximum privacy protection

Perturbation Based Approaches for Privacy Preserving Data Mining

Privacy is a personal choice, so it should be individually adaptable (Liu, Kantarcioglu and Thuraisingham, ICDM’06)

Perturbation Based Approaches for Privacy Preserving Data Mining

The trend is to make PPDM approaches fit reality
We investigated perturbation-based approaches with real-world data sets
We give an applicability study of the current approaches
- Liu, Kantarcioglu and Thuraisingham, DKE 07
We found that
- Reconstruction of the original distribution may not work well with real-world data sets
- Distribution is a
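
Illustration (not from the slides): a minimal sketch of the additive perturbation approach listed above, assuming independent zero-mean Gaussian noise and synthetic data. The function name `additive_perturb`, the noise level, and the "ages" data are hypothetical choices made for this example and are not taken from the cited papers. It shows the basic trade-off: individual values are distorted, while aggregate properties such as the mean and (after subtracting the known noise variance) the variance can still be estimated.

```python
import random
import statistics


def additive_perturb(values, noise_std, seed=None):
    """Return a new list with independent zero-mean Gaussian noise added to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_std) for v in values]


# Hypothetical private attribute: 10,000 synthetic ages drawn uniformly from [20, 80].
NOISE_STD = 15.0
data_rng = random.Random(0)
ages = [data_rng.uniform(20, 80) for _ in range(10_000)]

# Only the perturbed values would be released to the data miner.
perturbed = additive_perturb(ages, noise_std=NOISE_STD, seed=1)

# Aggregate statistics survive the perturbation approximately.
mean_original = statistics.fmean(ages)
mean_perturbed = statistics.fmean(perturbed)

# Because the noise is independent of the data, Var(perturbed) is roughly
# Var(original) + NOISE_STD**2, so the miner can subtract the known noise variance.
variance_original = statistics.pvariance(ages)
variance_estimate = statistics.pvariance(perturbed) - NOISE_STD ** 2

# Individual records, by contrast, are noticeably distorted.
mean_abs_distortion = statistics.fmean(abs(p - a) for p, a in zip(perturbed, ages))

print(f"mean: original={mean_original:.2f}, perturbed={mean_perturbed:.2f}")
print(f"variance: original={variance_original:.2f}, estimated={variance_estimate:.2f}")
print(f"average per-record distortion: {mean_abs_distortion:.2f}")
```

Recovering the full original distribution, rather than only its mean and variance, requires a separate reconstruction step that this sketch does not attempt.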