4/8/20081SCS CMUProximity on Large GraphsSpeaker:Hanghang Tong2008-4-1015-826 Guest LectureSCS CMUGraphs are everywhere!2SCS CMUGraph Mining: the big picture Graph/Global LevelSubgraph/Community Level3Node LevelWe are here!4/8/20082SCS CMUProximity on Graph: What?A BH1 1I J11 14D1 1EFG1 11a.k.a Relevance, Closeness, ‘Similarity’…SCS CMUProximity is the main tool behind…• Link prediction [Liben-Nowell+], [Tong+]• Ranking [Haveliwala], [Chakrabarti+]• Email Management [Minkov+]•Image caption [Pan+]5gp[ ]• Neighborhooh Formulation [Sun+]• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]• Pattern match [Tong+]• Collaborative Filtering [Fouss+]• Many more…Will return to this laterSCS CMURoadmap• Motivation• Part I: Definitions• Part II: Fast Solutions•Basic: RWR• Variants• Asymmetry of Prox.• Group Prox•Prox w/ Attributes6• Part III: Applications• ConclusionProx w/ Attributes•Prox w/ Time4/8/20083SCS CMUWhy not shortest path?A BD1 1A BD1 1E F11 1Some ``bad’’ proximities7‘pizza delivery guy’ problemA BD1 1EFG1 11A BD1 1‘multi-facet’ relationshipSCS CMUA BD1 1Why not max. netflow?Some ``bad’’ proximities8A BD1 11ENo punishment on long pathsSCS CMUWhat is a ``good’’ Proximity?A BH1 1I J11 19D1 1EFG1 11• Multiple Connections• Quality of connection•Direct & In-directed Conns•Length, Degree, Weight……4/8/20084SCS CMU1291081112Random walk with restart104356711SCS CMURandom walk with restartNode 4Node 1Node 2Node 3Node 4Node 5Node 60.130.100.130.220.130.05132910811120.130.100.130.080.040.020.040.0311Node 7Node 8Node 9Node 10Node 11Node 120.050.080.040.030.040.0245670.130.050.05Ranking vector More red, more relevantNearby nodes, higher scores4rSCS CMUWhy RWR is a good score?2c3cc...W2W3W12ccall paths from ito j with length 1all paths from ito j with length 2all paths from ito j with length 34/8/20085SCS CMURoadmap• Motivation• Part I: Definitions• Part II: Fast Solutions•Basic: RWR• Variants• Asymmetry of Prox.• Group Prox•Prox w/ Attributes13• Part III: Applications• ConclusionProx w/ Attributes•Prox w/ TimeSCS CMUVariant: escape probability• Define Random Walk (RW) on the graph• Esc_Prob(AÆB)– Prob (starting at A, reaches B before returning to A)14Esc_Prob = Pr (smile before cry)ABthe remaining graphSCS CMUOther Variants• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]•Equivalence of Random Walks15Equivalence of Random Walks– Electric Networks: • EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] – String Systems• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]4/8/20086SCS CMUOther Variants• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]•Equivalence of Random WalksAll are related to, or similar to 16Equivalence of Random Walks– Electric Networks: • EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] – String Systems• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+],random walk with restart!SCS CMURoadmap• Motivation• Part I: Definitions• Part II: Fast Solutions•Basic: RWR• Variants• Asymmetry of Prox.• Group Prox•Prox w/ Attributes17• Part III: Applications• ConclusionProx w/ Attributes•Prox w/ TimeSCS CMUAsymmetry of Proximity [Tong+ KDD07 a]A B1 11A B1 10.5181111110.51What is Prox from A to B?What is Prox from B to A?What is Prox between A and B?4/8/20087SCS CMUAsymmetry also exists in un-directed graphs• Hanghang’s most important conf. is KDD• The most important author in KDD is ...19So is love…HanghangKDDSCS CMURoadmap• Motivation• Part I: Definitions• Part II: Fast Solutions•Basic: RWR• Variants• Asymmetry of Prox.• Group Prox•Prox w/ Attributes20• Part III: Applications• ConclusionProx w/ Attributes•Prox w/ TimeSCS CMUGroup Proximity [Tong+ 2007]• Q: How close are Accountants to SECs?21• A: Prob (starting at any RED, reaches anyGREEN before touching any RED again)4/8/20088SCS CMUProximity on Attribute Graphs22What is the proximity from node 7 to 10?If we know that…SCS CMUSol: Augmented graphs23SCS CMUAttributes on nodes/edges (ER graph) [Chakrabarti+ WWW07]skipWorks24Wrote Sent ReceivedIn-Replied-toCited4/8/20089SCS CMUProximity w/ Time• Sol #1: treat time an categorical attr. [Minkov+]Sl#2 t li ti [T ]25•Sol #2: aggregate slice matrices [Tong+]Time•Global aggregation•Slide window•Exponential emphasisSCS CMUSummary of Part I• Goal: Summarize multiple … relationships • Solutions26–Basic: Random Walk with Restart– Property: Asymmetry– Variants: Esc_Prob and many others.– Generalization: Group Prox.; w/ Attr.; w/ TimeSCS CMU• Motivation• Part I: Definitions• Part II: Fast SolutionsRoadmap• B_Lin: RWR• FastAllDAP: Esc_Prob• BB_Lin: Skewed BGs27• Part III: Applications• Conclusion• FastUpdate: Time-Evolving4/8/200810SCS CMUPreliminary: Sherman–Morrison LemmaTvuAA=If:28111111()1TTTAuvAAAuv AvAu−−−−−−⋅⋅=+⋅ = −+⋅ ⋅Then:SCS CMUSM Lemma: Applications•RLS – and almost any algorithm in time series!• Leave-one-out cross validation for LS• Kalman filtering• Incremental matrix decomposition• … and all the fast sols we will introduce!29SCS CMUComputing RWR910120.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 00.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0 0.13⎛⎞⎜⎟⎜⎟⎜⎟⎜⎟ 01/3 1/3 0 1/3 0 0 0 0 0 0 0 0⎛⎞⎜⎜⎜⎜0.13 00.10 00.13 0⎛⎞ ⎛⎞⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟Ranking vectorStarting vectorAdjacent matrix(1 )ii ircWr ce=+− Restart p301432567811120.220.130.050.90.050.080.040.030.040.02⎜⎟⎜⎟⎜⎟⎜⎟⎜⎟=×⎜⎟⎜⎟⎜⎟⎜⎟⎜⎟⎜⎟⎜⎟⎜⎟⎜⎟⎝⎠1/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0 0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 0 0 0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 1/3 1/3 0⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝⎠0.220.13 00.05 00.10.05
View Full Document