Purdue CS 42600 - De-anonymizing Social Networks

De-anonymizing Social Networks
Arvind Narayanan and Vitaly Shmatikov
The University of Texas at Austin

Abstract

Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc.

We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.

Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy "sybil" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.

1. Introduction

Social networks have been studied for a century [66] and are a staple of research in disciplines such as epidemiology [8], sociology [73], [28], [11], economics [29], and many others [19], [9], [32]. The recent proliferation of online social networks such as MySpace, Facebook, Twitter, and so on has attracted the attention of computer scientists as well [40].

Even in the few online networks that are completely open, there is a disconnect between users' willingness to share information and their reaction to unintended parties viewing or using this information [13]. Most operators thus provide at least some privacy controls. Many online and virtually all offline networks (e.g., telephone calls, email and instant messages, etc.) restrict access to the information about individual members and their relationships.

Network owners often share this information with advertising partners and other third parties. Such sharing is the foundation of the business case for many online social-network operators. Some networks are even published for research purposes. To alleviate privacy concerns, the networks are anonymized, i.e., names and demographic information associated with individual nodes are suppressed. Such suppression is often misinterpreted as removal of "personally identifiable information" (PII), even though PII may include much more than names and identifiers. For example, the EU privacy directive defines "personal data" as "any information relating to an identified or identifiable natural person [...]; an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity" [22].
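In practice, the suppression described above keeps the graph's wiring while discarding who the nodes are. As a purely illustrative sketch in Python (the function naive_anonymize and the toy graph are invented for this preview, not taken from the paper or any operator's actual pipeline):

    import random

    # Hypothetical sketch: drop identities, keep edges. Nodes are re-keyed
    # to random integer pseudonyms; names and attributes simply disappear.
    # Graphs are dicts mapping each node to its set of neighbors.
    def naive_anonymize(graph, seed=None):
        rng = random.Random(seed)
        nodes = list(graph)
        pseudonyms = dict(zip(nodes, rng.sample(range(len(nodes)), len(nodes))))
        return {pseudonyms[u]: {pseudonyms[v] for v in graph[u]} for u in graph}

    social = {"alice": {"bob"}, "bob": {"alice", "carol"}, "carol": {"bob"}}
    print(naive_anonymize(social, seed=0))
    # e.g. {1: {2}, 2: {1, 0}, 0: {2}} -- the "who knows whom" shape survives.

The surviving shape is exactly what the paper's attack exploits: the topology around a node can be as distinctive as a name.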
Anonymity has been unquestioningly interpreted as equivalent to privacy in several high-profile cases of data sharing. After a New York court ruling ordering Google to hand over viewing data of over 100 million YouTube users to Viacom and the subsequent protests from privacy advocates, a revised agreement was struck under which Google would anonymize the data before handing it over [71]. The CEO of NebuAd, a U.S. company that offers targeted advertising based on browsing histories gathered from ISPs, dismissed privacy concerns by saying that "We don't have any raw data on the identifiable individual. Everything is anonymous" [15]. Phorm, a U.K. company with a similar business model, aims to collect the data on Web-surfing habits of 70% of British broadband users; the only privacy protection is that user identities are mapped to random identifiers [69]. In social networks, too, user anonymity has been used as the answer to all privacy concerns (see Section 2).

Our contributions. This is the first paper to demonstrate feasibility of large-scale, passive de-anonymization of real-world social networks.

First, we survey the current state of data sharing in social networks, the intended purpose of each type of sharing, the resulting privacy risks, and the wide availability of auxiliary information which can aid the attacker in de-anonymization.

Second, we formally define privacy in social networks and relate it to node anonymity. We identify several categories of attacks, differentiated by attackers' resources and auxiliary information. We also give a methodology for measuring the extent of privacy breaches in social networks, which is an interesting problem in its own right.

Third, we develop a generic re-identification algorithm for anonymized social networks. The algorithm uses only the network structure, does not make any a priori assumptions about membership overlap between multiple networks, and defeats all known defenses.
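The preview ends before the algorithm itself is presented, so purely as an illustration of what "uses only the network structure" can mean, here is a minimal Python sketch of the general seed-and-propagate idea (hypothetical: propagate, aux, target, and min_score are invented names, and this is not the paper's actual algorithm). Given a few known "seed" correspondences between an auxiliary graph and the anonymized target graph (both in the dict-of-neighbor-sets form used above), it repeatedly matches the node whose already-identified neighbors best line up with some target node's neighborhood:

    def propagate(aux, target, seeds, min_score=0.3):
        """Greedily extend a seed mapping {aux node -> target node}.

        An unmapped aux node is matched to the unused target node whose
        neighbor set has the highest Jaccard overlap with the target-side
        images of the aux node's already-mapped neighbors, provided the
        overlap clears min_score (a guard against spurious matches).
        """
        mapping = dict(seeds)
        used = set(mapping.values())
        changed = True
        while changed:
            changed = False
            for a in aux:
                if a in mapping:
                    continue
                images = {mapping[n] for n in aux[a] if n in mapping}
                if not images:
                    continue  # no identified neighbors to anchor on yet
                best, best_score = None, min_score
                for t in target:
                    if t in used:
                        continue
                    score = len(images & target[t]) / len(images | target[t])
                    if score > best_score:
                        best, best_score = t, score
                if best is not None:
                    mapping[a] = best
                    used.add(best)
                    changed = True
        return mapping

    # Toy demo: a 4-node path a-b-c-d and its anonymized copy 1-2-3-4.
    aux = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    target = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(propagate(aux, target, seeds={"b": 2}))
    # -> {'b': 2, 'a': 1, 'c': 3, 'd': 4}

Even on this toy path, a single seed recovers the whole mapping. On real, noisy graphs such naive greedy scoring would misbind structurally similar nodes, and a robust attack needs machinery well beyond this sketch; it is meant only to convey why bare topology plus a few seeds can be enough to start unraveling an "anonymous" graph.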
Fourth, we give a concrete demonstration of how our de-anonymization algorithm works by applying it to Flickr and Twitter, two large, real-world online social networks. We show that a third of the users who are verifiable members of both Flickr and Twitter can be recognized in the completely anonymous Twitter graph with only a 12% error rate, even though the overlap in the relationships for these members is less than 15%!

Sharing of anonymized social-network data is widespread and the auxiliary information needed for our attack is commonly available. We argue that our work calls for a substantial re-evaluation of business practices surrounding the sharing of social-network data.

2. State of the Union

The attacks described in this paper target anonymized, sanitized versions of social networks, using partial auxiliary information about a subset of their members. To show that both anonymized networks and auxiliary information are widely available, we survey real-world examples of social-network data sharing, most of which involve releasing more information than needed for our attack.

Academic and government data-mining. Social networks used for published data-mining research include the mobile-phone call graphs of, respectively, 7 million [56], 3 million [53], and 2.5 million [42] customers, as well as the land-line phone graph