
Relationship Identification for Social Network Discovery




Christopher P. Diehl, Galileo Namata, and Lise Getoor
Applied Physics Laboratory, Johns Hopkins University, Laurel, MD 20723
Computer Science Department / UMIACS, University of Maryland, College Park, MD 20742

Abstract

In recent years, informal online communication has transformed the ways in which we connect and collaborate with friends and colleagues. With millions of individuals communicating online each day, we have a unique opportunity to observe the formation and evolution of roles and relationships in networked groups and organizations. Yet a number of challenges arise when attempting to infer the underlying social network from data that is often ambiguous, incomplete, and context-dependent. In this paper, we consider the problem of collaborative network discovery in domains such as intelligence analysis and litigation support, where the analyst is attempting to construct a validated representation of the social network. We specifically address the challenge of relationship identification, where the objective is to identify relevant communications that substantiate a given social relationship type. We propose a supervised ranking approach to the problem and assess its performance on a manager-subordinate relationship identification task using the Enron email corpus. By exploiting message content, the ranker routinely cues the analyst to relevant communications relationships and message traffic that are indicative of the social relationship.

Introduction

The Internet provides an increasing number of avenues for communication and collaboration. From instant messaging and email to wikis and blogs, millions of individuals are generating content daily that reflects their relationships with others in the world, both online and offline. Now that storage has become vast and inexpensive, much of this data will be archived for years to come. This provides new opportunities and new challenges. As networked groups and organizations increasingly
leverage online means of communication and collaboration, there is an opportunity to observe the formation and evolution of roles and relationships from the communications archives. Such data provides a rich collection of evidence from which to infer the structure, attributes, and dynamics of the underlying social network. Yet numerous challenges emerge as one contends with data that is often ambiguous, incomplete, and context-dependent.

This work was supported by NSF Grant 0423845. Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

If we wish to analyze the underlying social network that is at least partially represented by a collection of informal online communications, it is important to think carefully about the data transformations required prior to conducting any type of analysis. At the highest level, we are fundamentally interested in discovering entities and the types of relationships they share. This implies that we must do more than simply adopt the communications hypergraph as a surrogate for the social network: entities can, and often do, use more than one account online, and not all communications relationships are equivalent. In fact, the social network can be thought of as a collection of networks with different relationship types (e.g., friendship, trust, advice, management). Human relations are multifaceted and context-dependent. Therefore, it is important to tease the communications apart and understand what types of relationships are being expressed among the entities.

We view the network discovery process of identifying the entities and their relationships as inherently a collaborative process between human and machine. In this paper, we consider the scenario, arising in domains such as intelligence analysis and litigation support, in which an analyst is attempting to reconstruct a representation of the social network from the data with minimal context. This involves mapping the communications graph, which represents
communication events among network references (email addresses, telephone numbers, etc.), to a validated social network expressing typed relationships among the known entities that the analyst believes are substantiated by the data. Within this process there are two distinct tasks: entity resolution and relationship identification. Entity resolution refers to the mapping of network references to their corresponding entities. Relationship identification refers to the identification of relevant communications that are indicative of a given relationship type.

In this paper, we propose a supervised ranking approach to address the relationship identification problem. Our goal is to focus the analyst's attention on relevant communications relationships that express a given social relationship, along with relevant message traffic that supports this association. We begin the discussion in the following section with a formal definition of the problem. Next, we discuss our approach to learning a relationship ranker from traffic statistics and message content, and present an evaluation of these methods on a manager-subordinate relationship identification task in email. We then review related work and conclude with thoughts on future directions.

Problem Definition

Informal online communications such as instant messaging, text messaging, and email are composed of structured and unstructured data. At the most basic level, this includes the network references corresponding to the sender and one or more recipients, the date and time of the communication, and the message content. We define a communications archive $C$ as a set of observed messages exchanged among a set of network references $N$:

$$C = \{\, m_k = \langle n^s_k, N^r_k, d_k, b_k \rangle : n^s_k \in N,\ N^r_k \subseteq N \,\} \tag{1}$$

For each message $m_k$, $n^s_k$ is the sender's network reference, $N^r_k$ is the set of recipient network references, $d_k$ is the date and time, and $b_k$ is the body of the message. Every archive has a corresponding communications graph $C_g = (N, L)$ that represents the message data as a set of dyadic
communication relationships

$$L = \{\, l_{ij} = \langle n^s_i, n^r_j, M_{ij} \rangle : n^s_i, n^r_j \in N,\ M_{ij} \subseteq C \,\} \tag{2}$$

among the network references $N$. For each directed relationship $l_{ij}$, $n^s_i$ is the sender's network reference, $n^r_j$ is the recipient's network reference, and $M_{ij}$ is the set of messages sent by $n^s_i$ that include $n^r_j$ as one of the recipients. The task of relationship identification involves identifying a mapping from the dyadic communications relationships $L$ to one or more social relationships from a predefined set $S$. To emphasize the collaborative nature of our
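To make the definitions of the archive $C$ and its communications graph $C_g = (N, L)$ concrete, the following Python sketch builds $L$ from a list of messages by grouping each message under every (sender, recipient) pair it induces, so that each key $(n^s_i, n^r_j)$ maps to $M_{ij}$. It is a minimal illustration under stated assumptions, not the authors' implementation: the names `Message` and `build_communications_graph` are hypothetical, recipients are stored as a `frozenset`, and $N$ is assumed to be exactly the set of references observed in the archive (the text treats $N$ as given).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    """Hypothetical container for one m_k = <n^s_k, N^r_k, d_k, b_k> in C."""
    sender: str                 # n^s_k: the sender's network reference
    recipients: frozenset[str]  # N^r_k: set of recipient network references
    date: datetime              # d_k: date and time of the message
    body: str                   # b_k: message body


def build_communications_graph(archive):
    """Return (N, L) for the communications graph C_g.

    Assumption: N is derived from the references seen in the archive.
    L maps each directed pair (n^s_i, n^r_j) to M_ij, the list of messages
    sent by n^s_i that include n^r_j among their recipients.
    """
    nodes = set()
    links = defaultdict(list)
    for msg in archive:
        nodes.add(msg.sender)
        nodes.update(msg.recipients)
        # One message contributes to one dyadic relationship per recipient.
        for recipient in msg.recipients:
            links[(msg.sender, recipient)].append(msg)
    return nodes, dict(links)


# Usage with a small, made-up archive of two messages.
archive = [
    Message("alice@example.com",
            frozenset({"bob@example.com", "carol@example.com"}),
            datetime(2001, 5, 1, 9, 30), "Please send the Q2 report."),
    Message("bob@example.com", frozenset({"alice@example.com"}),
            datetime(2001, 5, 1, 10, 5), "Report attached."),
]
nodes, links = build_communications_graph(archive)
```

In this toy archive, the first message yields two directed relationships (alice to bob and alice to carol) and the reply yields a third (bob to alice), each with a one-message $M_{ij}$. A message with several recipients thus appears in several $M_{ij}$ sets, which is why $L$ is a dyadic view of what is naturally a hypergraph.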

