
Representing Degree Distributions, Clustering, and Homophily in Social Networks With Latent Cluster Random Effects Models

Pavel N. Krivitsky, Mark S. Handcock, Adrian E. Raftery, Peter D. Hoff¹
University of Washington

Technical Report no. 517
Department of Statistics
University of Washington
August 7, 2007

¹ Pavel N. Krivitsky is Graduate Student, Mark S. Handcock is Professor and Chair, Adrian E. Raftery is Blumstein-Jordan Professor of Statistics and Sociology, and Peter D. Hoff is Associate Professor, all at the Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322. The research of Krivitsky and Handcock was supported by NIDA Grant DA012831 and NICHD Grant HD041877. The research of Krivitsky and Raftery was supported by NIH grant 8 R01EB 002137–02. Raftery thanks Miroslav Kárný and the Institute of Information Theory and Automation, Prague, as well as Gilles Celeux and INRIA, France, for hospitality during the preparation of this paper.

Abstract

Social network data often involve transitivity, homophily on observed attributes, clustering, and heterogeneity of actors. We propose the latent cluster random effects model to take account of all of these features, and we describe a Bayesian estimation method. The model fits two real datasets well. We show by simulation that networks with the same degree distribution can have very different clustering behaviors. This suggests that scale-free and small-world network models may not be adequate for all types of network, while our model recovers both the clustering and the degree distribution.

1 Introduction

Social network data consist of data about pairs of actors or nodes. Often these data represent the presence, absence or value of a relationship between pairs of actors, such as liking, respect, familial relationship, shared membership in a group of individuals, or volume of trade for collectivities such as countries or companies.
Here we consider binary social network data, representing presence or absence of a relationship.

Much social network data share a number of features. One of these is transitivity, for example the fact that if actor A relates to actor B and actor B relates to actor C, then actor A is more likely to relate to actor C. Another is homophily on observed attributes, according to which actors with similar characteristics are more likely to relate. A third feature is clustering, according to which actors cluster into unobserved groups, within which links are more likely. This can be due to social self-organization or to homophily on unobserved attributes (such as, for example, interest in the same sport, about which the analyst might not have information). A fourth feature is heterogeneity, namely the tendency of some actors to send and/or receive links more than others.

Hoff, Raftery, and Handcock (2002) proposed the latent space model for social networks. This postulates an unobserved Euclidean social space in which each actor has a position. The probability of a link between pairs of actors depends on the distance between them in the space and on their observed characteristics. Estimation of the model involves estimating both the latent positions and the parameters of the model specifying how the probability of a link depends on distance and observed attributes. This accounts for transitivity automatically through the latent space and is flexible enough to include the other common features of social network data also. This was extended by Handcock, Raftery, and Tantrum (2007), hereafter HRT, to include model-based clustering of the latent space positions, giving a way to detect groups of actors. Separately, Hoff (2005) added random sender and receiver effects to model inhomogeneity of the actors, similarly to those in the $p_2$ model (van Duijn, Snijders, and Zijlstra, 2004).

No model so far proposed has modeled all the four common features of social network data that we mentioned above.
In this paper we propose the Latent Cluster Random Effects Model, which explicitly models all four features by adding the random sender and receiver effects as proposed by Hoff (2005) to the latent position cluster model of HRT.

Heterogeneity of actors in social networks has often been modeled via the degree distribution and assuming the network is scale free. These models assume that all networks with the same degree distribution are equally likely. We show through a number of simulated examples that the latent cluster random effects model can model such heterogeneity effectively. We also show that networks with the same degree distribution can have very different clustering behavior, suggesting that scale-free and small-world networks are not adequate to model all networks with the same degree distribution.

In Section 2 we introduce the latent cluster random effects model and in Section 3 we describe our Bayesian method for estimating it using Markov chain Monte Carlo. In Section 4 we illustrate the method using two social network datasets and two simulated datasets with the same degree distribution but different clustering behavior.

2 The Latent Cluster Random Effects Model for Social Networks

We first review the latent position cluster model of HRT and then expand it to allow for actor-specific random effects. The data we model consist of $y_{i,j}$, the value of the relation from actor $i$ to actor $j$ for each dyad consisting of two of the $n$ actors. These form the elements of the $n \times n$ sociomatrix $Y$. There may also be dyadic-level covariate information represented by $p$ matrices $X = \{X_k\}_{k=1}^{p} \in \mathbb{R}^{n \times n \times p}$. We focus on binary-valued relations, although the methods in this paper can be extended to more general relational data. Both directed and undirected relations can be analyzed with our methods, although the models are slightly different in the two cases.

The model posits that each actor $i$ has an unobserved position, $Z_i$, in a $d$-dimensional Euclidean latent social space, as in Hoff et al.
(2002) and HRT. We then assume that the tie values are stochastically independent given the distances between the actors' positions. Specifically:

\[
\operatorname{logit}\bigl(p(Y_{i,j} = 1 \mid Z, X, \beta)\bigr) = \eta_{i,j} = \sum_{k=1}^{p} \beta_k X_{k,i,j} - \lVert Z_i - Z_j \rVert, \qquad (1)
\]

where $\operatorname{logit}(p) = \log(p/(1 - p))$ and $\beta$ denotes the parameters to be estimated. The model accounts for transitivity, through the latent space, as well as homophily on the observed attributes $X$. As in HRT, we allow for any clustering in the $Z_i$ via a finite spherical multivariate normal mixture:

\[
Z_i \overset{\text{i.i.d.}}{\sim} \sum_{g=1}^{G} \lambda_g \, \mathrm{MVN}_d(\mu_g, \sigma_g^2 I_d), \qquad i = 1, \ldots, n,
\]

where $\lambda_g$ is the probability
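
As an illustration of how Equation (1) and the mixture distribution for the positions fit together, the following is a minimal simulation sketch, not the authors' implementation. It assumes NumPy; the function name `simulate_latent_cluster_network`, the covariate layout `(p, n, n)` (rather than the paper's $n \times n \times p$), and all parameter values are hypothetical choices made for the example. Ties are drawn independently in each direction, so the result is a directed network.

```python
import numpy as np

def simulate_latent_cluster_network(n, d, lam, mu, sigma, beta, X, rng):
    """Draw latent positions from a spherical normal mixture, then ties from Equation (1).

    Assumed (hypothetical) layout: X has shape (p, n, n), beta has shape (p,),
    mu has shape (G, d), and sigma holds the G standard deviations sigma_g.
    """
    lam = np.asarray(lam, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Cluster label of each actor, drawn with mixture probabilities lam.
    g = rng.choice(len(lam), size=n, p=lam)
    # Z_i ~ MVN_d(mu_g, sigma_g^2 I_d) for the actor's cluster g.
    Z = mu[g] + sigma[g, None] * rng.standard_normal((n, d))
    # Pairwise Euclidean distances ||Z_i - Z_j||.
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    # Linear predictor eta_{i,j} = sum_k beta_k X_{k,i,j} - ||Z_i - Z_j||.
    eta = np.einsum("k,kij->ij", beta, X) - dist
    # Inverse logit gives the tie probabilities.
    prob = 1.0 / (1.0 + np.exp(-eta))
    # Ties are conditionally independent given the positions.
    Y = (rng.random((n, n)) < prob).astype(int)
    np.fill_diagonal(Y, 0)  # no self-ties
    return Z, g, prob, Y

rng = np.random.default_rng(0)
n, d = 20, 2
X = np.ones((1, n, n))  # a single constant covariate acting as an intercept
Z, g, prob, Y = simulate_latent_cluster_network(
    n, d,
    lam=[0.5, 0.5],
    mu=[[-2.0, 0.0], [2.0, 0.0]],
    sigma=[0.5, 0.5],
    beta=np.array([1.0]),
    X=X,
    rng=rng,
)
```

Because the distance enters with a negative sign, actors placed close together in the latent space have higher tie probabilities, which is how a model of this form can produce both transitivity and clustering.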

