Unformatted text preview:

UT DALLAS Erik Jonsson School of Engineering Computer Science Knowledge Management Semantic Web and Social Networking Security and Privacy in Online Social Networks Murat Kantarcioglu Bhavani Thuraisingham Thanks to Raymond Heatherly and Barbara Carminati for helping in slide preparations FEARLESS engineering Outline Introduction to Social Networks Properties of Social Networks Social Network Analysis Basics Data Privacy Basics Privacy and Social Networks Access control issues for Online Social Networks FEARLESS engineering Social Networks Social networks have important implications for our daily lives Spread of Information Spread of Disease Economics Marketing Social network analysis could be used for many activities related to information and security informatics Terrorist network analysis FEARLESS engineering Enron Social Graph http jheer org enron FEARLESS engineering Romantic Relations at Jefferson High School FEARLESS engineering Emergence of Online Social Networks Online Social networks become increasingly popular Example Facebook Facebook has more than 200 million active users More than 100 million users log on to Facebook at least once each day More than two thirds of Facebook users are outside of college The fastest growing demographic is those 35 years old and older http www facebook com press info php statistics FEARLESS engineering Properties of Social Networks Small world phenomenon Milgram asked participants to pass a letter to one of their close contacts in order to get it to an assigned individual Most of the letters are lost 75 of the letters The letters who reached their destination have passed through only about six people Origins of six degree Mean geodesic distance l of graphs grows logarithmically or even slower with the network size dij is the shortest distance between node i and j l FEARLESS engineering 2 d i j ij n n 1 Small World Example Six Degrees of Kevin Bacon FEARLESS engineering Properties of Social Networks Degree Distribution Clustering Other important properties Community Structure Assortativity Clustering Patterns Homomiphly Many of these properties could be used for analyzing social networks FEARLESS engineering Social Network Mining Social network data is represented a graph Individuals are represented as nodes Nodes may have attributes to represent personal traits Relationships are represented as edges Edges may have attributes to represent relationship types Edges may be directed Common Social Network Mining tasks Node classification Link Prediction FEARLESS engineering Data Privacy Basics How to share data without violating privacy Meaning of privacy Identity disclosure Sensitive Attribute disclosure Current techniques for structured data K anonymity L diversity Secure multi party computation Problem Publishing private data while at the same time protecting individual privacy Challenges How to quantify privacy protection How to maximize the usefulness of published data How to minimize the risk of disclosure FEARLESS engineering Sanitization and Anonymization Automated de identification of private data with certain privacy guarantees Opposed to formal determination by statisticians requirement of HIPAA Two major research directions 1 Perturbation e g random noise addition 2 Anonymization e g k anonymization Removing unique identifiers is not sufficient Quasi identifier QI Maximal set of attributes that could help identify individuals Assumed to be publicly available e g voter registration lists As a process 1 Remove all unique identifiers 2 Identify QI attributes model adversary s background knowledge 3 Enforce some privacy definition e g k anonymity FEARLESS engineering Re identifying anonymous data Sweeney 01 37 US states mandate collection of information She purchased the voter registration list for Cambridge Massachusetts 54 805 people 69 unique on postal code and birth date 87 US wide with all three FEARLESS engineering Solution k anonymity Any combination of values appears at least k times Developed systems that guarantee kanonymity Minimize distortion of results k Anonymity Each released record should be indistinguishable from at least k 1 others on its QI attributes Alternatively cardinality of any query result on released data should be at least k k anonymity is the first one of many privacy definitions in this line of work l diversity t closeness m invariance delta presence Complementary Release Attack Different releases can be linked together to compromise k anonymity Solution Consider all of the released tables before release the new one and try to avoid linking Other data holders may release some data that can be used in this kind of attack Generally this kind of attack is hard to be prohibited completely FEARLESS engineering L diversity principles L diversity principle A q block is l diverse if contains at least l well represented values for the sensitive attribute S A table is ldiverse if every q block is l diverse l diversity may be difficult and unnecessary to achieve A single sensitive attribute Two values HIV positive 1 and HIV negative 99 Very different degrees of sensitivity l diversity is unnecessary to achieve 2 diversity is unnecessary for an equivalence class that contains only negative records l diversity is difficult to achieve Suppose there are 10000 records in total To have distinct 2 diversity there can be at most 10000 1 100 equivalence classes FEARLESS engineering Privacy Preserving Distributed Data Mining Goal of data mining is summary results Association rules Classifiers Clusters The results alone need not violate privacy Contain no individually identifiable values Reflect overall results not individual organizations The problem is computing the results without access to the data Data needed for data mining maybe distributed among parties Credit card fraud data Inability to share data due to privacy reasons HIPPAA Even partial results may need to be kept private FEARLESS engineering Secure Multi Party Computation SMC The goal is computing a function without revealing xi Semi Honest Model f x1 x 2 x n Parties follow the protocol Malicious Model Parties may or may not follow the protocol We cannot do better then the existence of the third trusted party situation Generic SMC is too inefficient for PPDDM Enhancements being explored FEARLESS engineering Graph Model Lindamood et al 09 Heatherly et al 09 Graph represented by a set of homogenous vertices and a set of homogenous edges Each node also has a set of


View Full Document

UTD CS 7301 - LECTURE NOTES

Documents in this Course
Load more
Loading Unlocking...
Login

Join to view LECTURE NOTES and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view LECTURE NOTES and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?