Link PredictionHossam S. ShararaWalaa Eldin M. MoustafaOutlineOutlineOverviewOverview Link Prediction VariantsDeterministic MethodsDeterministic Methods Probabilistic MethodsSupervised Learning ApproachesSupervised Learning Approaches Feature ConstructionProblem DefinitionProblem DefinitionGiven a snapshot of a social network at time t (or networkGiven a snapshot of a social network at time t (or network evolution between t1and t2), we seek to accurately predict the edges that will be added to the network during the interval from time t (or t2) to a given future time t'.ApplicationsApplications Identifying the structure of a criminal networkyg Predicting missing links in a criminal network using incomplete data.ApplicationsApplications Overcoming the data-sparsity problem in gpyprecommender systems using collaborative filtering (Huang et al, 2005).ApplicationsApplicationsAccelerating a mutually beneficial professional-orAccelerating a mutually beneficial professionalor academic connection that would have taken longer to form serendipitously (Farrell et al, 2005).Picture from www.businessinnovationinsider.comApplicationsApplicationsTo analyze users'navigation history to generateTo analyze users navigation history to generate tools that increase navigational efficiency (Zhu 2003) ie. Predictive server prefetchingPicture from corbis.comApplicationsApplicationsMonitoring and controlling computer viruses thatMonitoring and controlling computer viruses that use email as a vector (Lim et al, 2005).Picture from www robocup dePicture from www.robocup.deLink CompletionLink CompletionGoldenberg et al, 2003Goldenberg et al, 2003 Links can be incomplete.Links can link more than two entitiesLinks can link more than two entities. Given a node (or nodes) that is (are) known to have a link, the task is to determine to which other nodea link, the task is to determine to which other node the link is attached.Link CompletionLink CompletionExampleExample When a user buys five books online and the name of one book is corrupted in transfer. A link completion algorithm could infer the name of the missing book based on the user's name and the hbkhb hother books she bought.Link CompletionLink CompletionExampleExample Alice, Bob, and a third person attended a meeting. Given people’s previous co-occurrences, who is the pp p,wthird person?Simple SolutionsSimple SolutionsEvery entity is assigned a score.Every entity is assigned a score. Co-occurrences:score(A)=∑number of previous co-occurrences ofscore(A) ∑number of previous cooccurrences of A and other members of the link. Popular Personp score(A) = number of occurrences of A in other links.Anomalous Link DiscoveryAnomalous Link DiscoveryRattigan et al, 2005.Rattigan et al, 2005. Link Prediction: Number of Dyads to be evaluated increases quadratically.qy Networks are sparse Æ extremely few positive cases. Focus on discovering surprising links in the existing ones. Very few common neighbors, or too distant apart.Link PredictionLink Predictionttttiti+1tj-1tjtki< j< kMethods for Link PredictionMethods for Link PredictionAll the methods assign a connection weightscore(x,All the methods assign a connection weight score(x, y) to pairs of nodes x, y, based on the input graph, and then produce a ranked list in decreasing order of score(x, y). Can be viewed as computing a measure of proximity bddor “similarity” between nodes x and y.Shortest PathShortest PathNegated length of shortest path between x and y.Negated length of shortest path between x and y. All nodes that share one neighbor will be linked.Common NeighborsCommon Neighbors Newman 2001: The probability of scientists ll b i i i h h b f hcollaborating increases with the number of other collaborators they have in common.Jaccard SimilarityJaccard SimilarityMay be they have common neighbors because eachMay be they have common neighbors because each one has a lot of neighbors, not because they are strongly related to each others.gyAdamic/AdarAdamic/AdarAdamic et al 2003In LinksContactsAdamic et al 2003In LinksContactsUser 1User 2?Out LinksTextUser 1User 2?Out LinksTextAdamic/AdarAdamic/AdarAdamic/AdarAdamic/Adar This gives more weight to neighbours that that are not gggshared with many others.Thi t ll t i hb b tThis actually counts common neighbors but: Neighbors who are linked with only 2 nodes are given the weight 1/log(2) = 1.4 Neighbors who are linked with 5 nodes their weight drops down to 1/log(5) = 0.62Preferential AttachmentPreferential Attachment Newman 2001: The probability of co-authorship of x and y is correlated with the product of the numberx and y is correlated with the product of the number of collaborators of x and yKatz (1953)Katz (1953)A very small β yields predictions much like common neighbors since paths of length three or more contributeneighbors, since paths of length three or more contribute very little to the summationCl d FClosed Form:Hitting TimeHitting TimeRooted PageRankRooted PageRankStationary distribution weight ofyunder theStationary distribution weight of yunder the following random walk:With probability α, jump to xpy,j p With probability 1 – α, go to random neighbor of current node.SimRankSimRank Jeh 2002Only directed graphsOnly directed graphs Can use pruningLarge GraphsLarge GraphsRepresent the adjacency matrixMwith a lower rankRepresent the adjacency matrix Mwith a lower rank matrix Mk. How to map the original graph to the reduced pggpgraph?Unseen BigramsUnseen BigramsRed links: Strong iilitz1similarityBlack Links: Graph edgesxz2yUnseen BigramsUnseen BigramsShould we incorporate score(x,y) into the equation?Should we incorporate score(x,y) into the equation? Can we do that iteratively?ClusteringClustering Compute score(u,v) for all edges in Eold.p(,) gold Remove k% edges with low score. Calculate score(x,y) for all node pairs.(y) pClusteringClustering Compute score(u,v) for all edges in Eold.p(,) gold Remove k% edges with low score. Calculate score(x,y) for all node pairs.(y) pClusteringClustering Compute score(u,v) for all edges in Eold.p(,) gold Remove k% edges with low score. Calculate score(x,y) for all node pairs.(y) pClusteringClusteringQuestionsQuestions Removing edges can decrease the score of nodes connected to these edges.
View Full Document