Graph mining CMU 15-826Jure Leskovec ([email protected]) 1CMU 15-826 Jure Leskovec ([email protected]) 1Graph mining (CS 15-826)Jure Leskovechttp://www.cs.cmu.edu/jureCMU 15-826 Jure Leskovec ([email protected]) 2Outline Patterns in real-world graphs Modeling the evolution and generation of real graphs Graph mining at work: Influence propagation Patterns (bonus material)CMU 15-826 Jure Leskovec ([email protected]) 3Modeling Graphs over Time Joint work with:Jon Kleinberg, CornellDeepay Chakrabarti, YahooChristos Faloutsos, CMUGraph mining CMU 15-826Jure Leskovec ([email protected]) 2CMU 15-826 Jure Leskovec ([email protected]) 4 What can we do with graphs? What patterns or “laws” hold for most real-world graphs? How do the graphs evolve over time? Can we generate synthetic but “realistic” graphs?“Needle exchange”networks of drug usersIntroductionCMU 15-826 Jure Leskovec ([email protected]) 5Evolution of the Graphs How do graphs evolve over time? Conventional Wisdom: Constant average degree: the number of edges grows linearly with the number of nodes Slowly growing diameter: as the network grows the distances between nodes grow Our findings: Densification Power Law: networks are becoming denser over time Shrinking Diameter: diameter is decreasing as the network growsCMU 15-826 Jure Leskovec ([email protected]) 6Outline General patterns and generators Graph evolution – Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Forest Fire Model Proposed graph generation model Kronecker Graphs Conclusion and Open questionsGraph mining CMU 15-826Jure Leskovec ([email protected]) 3CMU 15-826 Jure Leskovec ([email protected]) 7Outline General patterns and generators Graph evolution – Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Forest Fire Model Proposed graph generation model Kronecker Graphs ConclusionCMU 15-826 Jure Leskovec ([email protected]) 8Static Graph Patterns (1) Power Lawlog(Count) vs. log(Degree)Many low-degree nodesFew high-degree nodesInternet in December 1998Y=a*XbCMU 15-826 Jure Leskovec ([email protected]) 9Static Graph Patterns (2) Small-world [Watts, Strogatz]++ 6 degrees of separation Small diameter Effective diameter: Distance at which 90% of pairs of nodes are reachableHops# Reachable pairsEffective DiameterEpinions who-trusts-whom social networkGraph mining CMU 15-826Jure Leskovec ([email protected]) 4CMU 15-826 Jure Leskovec ([email protected]) 10Patterns Hold in Many Graphs All these patterns can be observed in many real life graphs: World wide web [Barabasi] On-line communities [Holme, Edling, Liljeros] Who call whom telephone networks [Cortes] Autonomous systems [Faloutsos, Faloutsos, Faloutsos] Internet backbone – routers [Faloutsos, Faloutsos, Faloutsos] Movie – actors [Barabasi] Science citations [Leskovec, Kleinberg, Faloutsos] Co-authorship [Leskovec, Kleinberg, Faloutsos] Sexual relationships [Liljeros] Click-streams [Chakrabarti]CMU 15-826 Jure Leskovec ([email protected]) 11Graph models: Random Graphs Question: How can we generate a realistic graph? given the number of nodes N and edges E Random graph [Erdos & Renyi, 60s]: Pick 2 nodes at random and link them Nice and simple model Does not obey Power laws No community structureCMU 15-826 Jure Leskovec ([email protected]) 12Graph models: Preferential attachment Preferential attachment [Albert & Barabasi, 99]: Add a new node, create M out-links Probability of linking a node is proportional to its degree Examples: Citations: new citations of a paper are proportional to the number it already has “Rich get richer” phenomena Explains power-law degree distributions But, all nodes have equal (constant) out-degreeGraph mining CMU 15-826Jure Leskovec ([email protected]) 5CMU 15-826 Jure Leskovec ([email protected]) 13Graph models: Copying model Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]: Add a node and choose the number of edges to add Choose a random vertex and “copy” its links (neighbors) Generates power-law degree distributions Generates communitiesCMU 15-826 Jure Leskovec ([email protected]) 14Why is all this important? Gives insight into the graph formation process: Anomaly detection – abnormal behavior, evolution Predictions – predicting future from the past Simulations of new algorithms where real graphs are hard/impossible to collect Graph sampling – many real world graphs are too large to deal with “What if” scenariosCMU 15-826 Jure Leskovec ([email protected]) 15Outline General patterns and generators Graph evolution – Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Forest Fire Model Proposed graph generation model Kronecker Graphs ConclusionGraph mining CMU 15-826Jure Leskovec ([email protected]) 6CMU 15-826 Jure Leskovec ([email protected]) 16Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose thatN(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the Densification Power LawCMU 15-826 Jure Leskovec ([email protected]) 17Temporal Evolution of the Graphs Densification Power Law networks are becoming denser over time the number of edges grows faster than the number of nodes – average degree is increasinga … densification exponent: 1 ≤ a ≤ 2: a=1: linear growth – constant out-degree (assumed in the literature so far) a=2: quadratic growth – cliqueorequivalentlyCMU 15-826 Jure Leskovec ([email protected]) 18Densification – Physics Citations Citations among physics papers 1992: 1,293 papers,2,717 citations 2003: 29,555 papers, 352,807 citations For each month M, create a graph of all citations up to month MN(t)E(t)1.69Graph mining CMU 15-826Jure Leskovec ([email protected]) 7CMU 15-826 Jure Leskovec ([email protected]) 19Graph Densification – Summary The traditional constant out-degree assumption does not hold Instead: the number of edges grows faster than the number of nodes – average degree is increasingCMU 15-826 Jure Leskovec ([email protected]) 20Outline General patterns and generators Graph
View Full Document