Relationship Identification for Social Network Discovery

Christopher P. Diehl
Applied Physics Laboratory
Johns Hopkins University
Laurel, MD 20723

Galileo Namata and Lise Getoor∗
Computer Science Department / UMIACS
University of Maryland
College Park, MD 20742

Abstract

In recent years, informal, online communication has transformed the ways in which we connect and collaborate with friends and colleagues. With millions of individuals communicating online each day, we have a unique opportunity to observe the formation and evolution of roles and relationships in networked groups and organizations. Yet a number of challenges arise when attempting to infer the underlying social network from data that is often ambiguous, incomplete and context-dependent. In this paper, we consider the problem of collaborative network discovery from domains such as intelligence analysis and litigation support where the analyst is attempting to construct a validated representation of the social network. We specifically address the challenge of relationship identification where the objective is to identify relevant communications that substantiate a given social relationship type. We propose a supervised ranking approach to the problem and assess its performance on a manager-subordinate relationship identification task using the Enron email corpus. By exploiting message content, the ranker routinely cues the analyst to relevant communications relationships and message traffic that are indicative of the social relationship.

Introduction

The Internet provides an increasing number of avenues for communication and collaboration. From instant messaging and email to wikis and blogs, millions of individuals are generating content daily that reflects their relationships with others in the world, both online and offline. Now that storage has become vast and inexpensive, much of this data will be archived for years to come. This provides new opportunities and new challenges.
As networked groups and organizations increasingly leverage online means of communication and collaboration, there is an opportunity to observe the formation and evolution of roles and relationships from the communications archives. Such data provides a rich collection of evidence from which to infer the structure, attributes and dynamics of the underlying social network. Yet numerous challenges emerge as one contends with data that is often ambiguous, incomplete and context-dependent.

∗ This work was supported by NSF Grant #0423845.
Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

If we wish to analyze the underlying social network that is at least partially represented by a collection of informal, online communications, it is important to think carefully about the data transformations required prior to conducting any type of analysis. At the highest level, we are fundamentally interested in discovering entities and the types of relationships they share. This implies that we must do more than simply adopt the communications (hyper)graph as a surrogate for the social network. Entities can and often do use more than one account online and not all communications relationships are equivalent. In fact, the social network can be thought of as a collection of networks with different relationship types (e.g. friendship, trust, advice, management). Human relations are multi-faceted and context-dependent. Therefore it is important to tease the communications apart and understand what types of relationships are being expressed among the entities.

We view the network discovery process of identifying the entities and their relationships as being inherently a collaborative process between human and machine. In this paper, we consider the scenario from domains such as intelligence analysis and litigation support where an analyst is attempting to reconstruct a representation of the social network from the data with minimal context.
This involves mapping the communications graph, which represents communication events among network references (email addresses, telephone numbers, etc.), to a validated social network expressing typed relationships among the known entities that the analyst believes are substantiated by the data. Within this process, there are two distinct tasks: entity resolution and relationship identification. Entity resolution refers to the mapping of network references to their corresponding entities. Relationship identification refers to the identification of relevant communications that are indicative of a given relationship type.

In this paper, we propose a supervised ranking approach to address the relationship identification problem. Our goal is to focus the analyst’s attention on relevant communications relationships that express a given social relationship along with relevant message traffic that supports this association. We begin the discussion in the following section with a formal definition of the problem. We discuss our approach to learning a relationship ranker from traffic statistics and message content and present an evaluation of these methods on a manager-subordinate relationship identification task in email. We then review related work and conclude with thoughts on future directions.

Problem Definition

Informal, online communications such as instant messaging, text messaging and email are composed of structured and unstructured data. At the most basic level, this includes the network references corresponding to the sender and one or more recipients, the date and time of the communication and the message content. We will define a communications archive $C$ as a set of observed messages exchanged among a set of network references $N$:

\[ C = \{ m_k = (n^s_k, N^r_k, d_k, b_k) : n^s_k \in N, \; N^r_k \subseteq N \}. \tag{1} \]

For each message $m_k$, $n^s_k$ is the sender’s network reference, $N^r_k$ is the set of recipient network references, $d_k$ is the date and time and $b_k$ is the body of the message.
Every archive has a corresponding communications graph $C_g = \{N, L\}$ that represents the message data as a set of dyadic communication relationships

\[ L = \{ l_{ij} = (n^s_i, n^r_j, M_{ij}) : n^s_i, n^r_j \in N, \; M_{ij} \subseteq C \} \tag{2} \]

among the network references $N$. For each directed relationship $l_{ij}$, $n^s_i$ is the sender’s network reference, $n^r_j$ is the recipient’s network reference and $M_{ij}$ is the set of messages sent by $n^s_i$ that include $n^r_j$ as one of the recipients.

The task of relationship identification involves identifying a mapping from the dyadic communications relationships $L$ to one or
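
As an illustration of the definitions in Equations (1) and (2), the following is a minimal Python sketch of how an archive could be grouped into directed dyadic relationships. The names `Message` and `build_communications_graph`, the field names, and the example addresses are assumptions made for this sketch and do not come from the paper. Two simplifications are also assumed: the set of network references is derived from the observed messages rather than given in advance, and each message set is stored as a list to preserve order.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    """One message m_k = (n^s_k, N^r_k, d_k, b_k), as in Equation (1)."""
    sender: str            # n^s_k, the sender's network reference
    recipients: frozenset  # N^r_k, the set of recipient network references
    sent_at: datetime      # d_k, the date and time
    body: str              # b_k, the message body


def build_communications_graph(archive):
    """Group an archive C into the dyadic relationships L of Equation (2).

    Returns (references, links). `links` maps each directed pair
    (sender, recipient) to the messages sent by that sender which
    include that recipient (M_ij in the text).
    """
    references = set()  # assumption: N is inferred from the observed messages
    links = defaultdict(list)
    for message in archive:
        references.add(message.sender)
        references.update(message.recipients)
        # A message with several recipients contributes to several dyads.
        for recipient in message.recipients:
            links[(message.sender, recipient)].append(message)
    return references, dict(links)


# Hypothetical three-message archive with made-up addresses.
archive = [
    Message("a@example.com", frozenset({"b@example.com", "c@example.com"}),
            datetime(2001, 5, 1, 9, 0), "Status report attached."),
    Message("a@example.com", frozenset({"b@example.com"}),
            datetime(2001, 5, 2, 9, 0), "Please review by Friday."),
    Message("b@example.com", frozenset({"a@example.com"}),
            datetime(2001, 5, 2, 10, 0), "Will do."),
]

references, links = build_communications_graph(archive)
for (sender, recipient), messages in sorted(links.items()):
    print(sender, "->", recipient, len(messages))
```

Because the relationships are directed, the pair (a, b) and the pair (b, a) are kept as separate entries, which matches the sender and recipient roles in Equation (2).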

