C. Faloutsos 15-781/10-7011CMU SCSGraph analysis: laws & toolsChristos FaloutsosCarnegie Mellon University15-781/10-701 (c) 2008, C. Faloutsos3CMU SCSOverall Outline• Laws (mainly, power laws)• Generators and • Tools15-781/10-701 (c) 2008, C. Faloutsos4CMU SCSOutline• Problem definition / Motivation• Static & dynamic laws; generators• Tools: CenterPiece graphs; Tensors• Other projects (Virus propagation, e-bay fraud detection)• ConclusionsC. Faloutsos 15-781/10-701215-781/10-701 (c) 2008, C. Faloutsos5CMU SCSMotivationData mining: ~ find patterns (rules, outliers)• Problem#1: How do real graphs look like?• Problem#2: How do they evolve?• Problem#3: How to generate realistic graphsTOOLS• Problem#4: Who is the ‘master-mind’?• Problem#5: Track communities over time15-781/10-701 (c) 2008, C. Faloutsos6CMU SCSProblem#1: Joint work withDr. Deepayan Chakrabarti (CMU/Yahoo R.L.)15-781/10-701 (c) 2008, C. Faloutsos7CMU SCSGraphs - why should we care?• web: hyper-text graph • IR: bi-partite graphs (doc-terms)• ... and more:D1DNT1TM......C. Faloutsos 15-781/10-701315-781/10-701 (c) 2008, C. Faloutsos8CMU SCSGraphs - why should we care?Internet Map [lumeta.com]Food Web [Martinez ’91]Protein Interactions [genomebiology.com]Friendship Network [Moody ’01]15-781/10-701 (c) 2008, C. Faloutsos9CMU SCSGraphs - why should we care?• network of companies & board-of-directors members• ‘viral’ marketing• web-log (‘blog’) news propagation• computer network security: email/IP traffic and anomaly detection• ....15-781/10-701 (c) 2008, C. Faloutsos10CMU SCSProblem #1 - network and graph mining• How does the Internet look like?• How does the web look like?• What is ‘normal’/‘abnormal’?• which patterns/laws hold?C. Faloutsos 15-781/10-701415-781/10-701 (c) 2008, C. Faloutsos11CMU SCSGraph mining• Are real graphs random?15-781/10-701 (c) 2008, C. Faloutsos12CMU SCSLaws and patterns• Are real graphs random?• A: NO!!– Diameter– in- and out- degree distributions– other (surprising) patterns15-781/10-701 (c) 2008, C. Faloutsos13CMU SCSSolution#1• Power law in the degree distribution [SIGCOMM99]log(rank)log(degree)-0.82internet domainsatt.comibm.comC. Faloutsos 15-781/10-701515-781/10-701 (c) 2008, C. Faloutsos14CMU SCSSolution#1’: Eigen Exponent E• A2: power law in the eigenvalues of the adjacency matrixE = -0.48Exponent = slopeEigenvalueRank of decreasing eigenvalueMay 200115-781/10-701 (c) 2008, C. Faloutsos15CMU SCSBut:How about graphs from other domains?15-781/10-701 (c) 2008, C. Faloutsos16CMU SCSWeb• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]log indegree- log(freq)from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]C. Faloutsos 15-781/10-701615-781/10-701 (c) 2008, C. Faloutsos17CMU SCSWeb• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]log indegreelog(freq)from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]15-781/10-701 (c) 2008, C. Faloutsos18CMU SCSThe Peer-to-Peer Topology• Frequency versus degree • Number of adjacent peers follows a power-law[Jovanovic+]15-781/10-701 (c) 2008, C. Faloutsos19CMU SCSMore power laws:citation counts: (citeseer.nj.nec.com 6/2001)110100100 1000 10000log countlog # citations’cited.pdf’log(#citations)log(count)UllmanC. Faloutsos 15-781/10-701715-781/10-701 (c) 2008, C. Faloutsos20CMU SCSSwedish sex-webNodes: people (Females; Males)Links: sexual relationshipsLiljeros et al. Nature 20014781 Swedes; 18-74; 59% response rate.Albert Laszlo Barabasihttp://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt15-781/10-701 (c) 2008, C. Faloutsos21CMU SCSSwedish sex-webNodes: people (Females; Males)Links: sexual relationshipsLiljeros et al. Nature 20014781 Swedes; 18-74; 59% response rate.Albert Laszlo Barabasihttp://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt15-781/10-701 (c) 2008, C. Faloutsos22CMU SCSMore power laws:• web hit counts [w/ A. Montgomery]Web Site Trafficlog(in-degree)log(count)Zipfuserssites``ebay’’C. Faloutsos 15-781/10-701815-781/10-701 (c) 2008, C. Faloutsos23CMU SCSepinions.com• who-trusts-whom [Richardson + Domingos, KDD 2001](out) degreecounttrusts-2000-people user15-781/10-701 (c) 2008, C. Faloutsos24CMU SCSOutline• Problem definition / Motivation• Static & dynamic laws; generators• Tools: CenterPiece graphs; Tensors• Other projects (Virus propagation, e-bay fraud detection)• Conclusions15-781/10-701 (c) 2008, C. Faloutsos25CMU SCSMotivationData mining: ~ find patterns (rules, outliers)• Problem#1: How do real graphs look like?• Problem#2: How do they evolve?• Problem#3: How to generate realistic graphsTOOLS• Problem#4: Who is the ‘master-mind’?• Problem#5: Track communities over timeC. Faloutsos 15-781/10-701915-781/10-701 (c) 2008, C. Faloutsos26CMU SCSProblem#2: Time evolution• with Jure Leskovec (CMU/MLD)• and Jon Kleinberg (Cornell –sabb. @ CMU)15-781/10-701 (c) 2008, C. Faloutsos27CMU SCSEvolution of the Diameter• Prior work on Power Law graphs hints at slowly growing diameter:– diameter ~ O(log N)– diameter ~ O(log log N)• What is happening in real data?15-781/10-701 (c) 2008, C. Faloutsos28CMU SCSEvolution of the Diameter• Prior work on Power Law graphs hints at slowly growing diameter:– diameter ~ O(log N)– diameter ~ O(log log N)• What is happening in real data?• Diameter shrinks over timeC. Faloutsos 15-781/10-7011015-781/10-701 (c) 2008, C. Faloutsos29CMU SCSDiameter – ArXiv citation graph• Citations among physics papers • 1992 –2003• One graph per year• 2003:– 29,555 papers,– 352,807 citationstime [years]diameter15-781/10-701 (c) 2008, C. Faloutsos30CMU SCSDiameter – “Autonomous Systems”• Graph of Internet• One graph per day • 1997 – 2000• 2000– 6,000 nodes– 26,000 edgesnumber of nodesdiameter15-781/10-701 (c) 2008, C. Faloutsos31CMU SCSDiameter – “Affiliation Network”• Graph of collaborations in physics – authors linked to papers• 10 years of data• 2002– 60,000 nodes• 20,000 authors• 38,000 papers– 133,000 edgestime [years]diameterC. Faloutsos 15-781/10-7011115-781/10-701 (c) 2008, C. Faloutsos32CMU SCSDiameter – “Patents”• Patent citation network• 25 years of data• 1999– 2.9 million nodes– 16.5 million edgestime [years]diameter15-781/10-701 (c) 2008, C.
View Full Document