Unformatted text preview:

11GraphGraph--based Learningbased LearningLarry HolderLarry HolderComputer Science and EngineeringComputer Science and EngineeringUniversity of Texas at ArlingtonUniversity of Texas at Arlington22GraphGraph--based Learningbased LearningMultiMulti--relational data mining and learningrelational data mining and learningSUBDUE graphSUBDUE graph--based relational learnerbased relational learnerDiscoveryDiscoveryClusteringClusteringGraph grammar learningGraph grammar learningSupervised learningSupervised learning33MultiMulti--Relational Data MiningRelational Data MiningLooking for patterns involving multiple Looking for patterns involving multiple tables (relations) in a relational databasetables (relations) in a relational databaseIDIDLastLastFirstFirstAgeAgeIncomeIncomeP1P1P2P2DoeDoeJohnJohn3030P3P3DoeDoeSallySally292980000800009000090000SmithSmithRobertRobert3535100000100000PersonPerson1Person1Person2Person2P1P1P2P2P3P3P7P7MarriedRichCouple(X,Y) Æ Person(X,LastX,FirstX,AgeX,IncX) &Person(Y,LastY,FirstY,AgeY,IncY) & Married(X,Y) &(IncX + IncY) > 150000.44MultiMulti--Relational Data MiningRelational Data MiningApproachesApproachesTransform to nonTransform to non--relational problemrelational problemFirstFirst--order logic basedorder logic basedInductive Logic Programming (ILP)Inductive Logic Programming (ILP)Graph basedGraph based55GraphGraph--based Data Miningbased Data MiningFinding all Finding all subgraphssubgraphsggwithin a set of within a set of graph transactions graph transactions GGsuch thatsuch thatwhere where ttis the minimum supporttGgfreq>||)(is the minimum support66GraphGraph--based Data Miningbased Data MiningSystemsSystemsAprioriApriori--based Graph Mining (AGM)based Graph Mining (AGM)InokuchiInokuchi, , WashioWashioand and MotodaMotoda, 2003, 2003Frequent SubFrequent Sub--Graph discovery (FSG)Graph discovery (FSG)KuramochiKuramochiand and KarypisKarypis, 2001, 2001GraphGraph--based Substructure pattern mining based Substructure pattern mining ((gSpangSpan))YanYanand Han, 2002and Han, 2002Focus on pruning and fast, codeFocus on pruning and fast, code--based based graph matchinggraph matching77GraphGraph--based Relational Learningbased Relational LearningFinding patterns in Finding patterns in graph(sgraph(s))DiscoveryDiscoveryClusteringClusteringSupervised learningSupervised learningPersonDoe John8000030Last FirstAge IncomePersonDoe Sally9000029Last FirstAge IncomePersonSmith Robert10000035Last FirstAge IncomeMarriedMarried88GraphGraph--based Relational Learningbased Relational LearningGraphGraph--Based Induction (GBI)Based Induction (GBI)Yoshida, Yoshida, MotodaMotodaand and IndurkhyaIndurkhya, 1994, 1994SUBstructureSUBstructureDiscovery Using Examples Discovery Using Examples (SUBDUE)(SUBDUE)Cook and Holder, 1994Cook and Holder, 1994Focus on efficient Focus on efficient subgraphsubgraphgeneration generation and compressionand compression--based heuristic searchbased heuristic search99SUBDUE GraphSUBDUE Graph--based Discoverybased DiscoveryGraph representationGraph representationGraph compression and MDLGraph compression and MDLDiscovery algorithmDiscovery algorithmInexact graph matchInexact graph matchBackground knowledgeBackground knowledgeParallel/distributed discoveryParallel/distributed discovery1010Graph RepresentationGraph RepresentationInput is a labeled (vertices and edges) directed graphInput is a labeled (vertices and edges) directed graphA A substructuresubstructureis a connected is a connected subgraphsubgraphAn An instanceinstanceof a substructure is an isomorphic of a substructure is an isomorphic subgraphsubgraphof the input graphof the input graphInput graph compressed by replacing instances with Input graph compressed by replacing instances with vertex representing substructurevertex representing substructureR1C1T1S1T2S2T3S3T4S4Input Database Substructure S1(graph form)Compressed DatabaseobjecttriangleR1C1objectsquareonshapeshapeS1S1S1S1S1S1S1S1S1S1S1S11111Graph RepresentationGraph RepresentationS1S1S1S1S1S2S2S21212Graph Compression and MDLGraph Compression and MDLMinimum Description Length (MDL) Minimum Description Length (MDL) principleprincipleBest theory minimizes description length of Best theory minimizes description length of theory and the data given theorytheory and the data given theoryBest substructure Best substructure SSminimizes description minimizes description length of substructure definition length of substructure definition DL(S)DL(S)and and compressed graph compressed graph DL(G|S)DL(G|S)))|()((min SGDLSDLS+1313Discovery AlgorithmDiscovery Algorithm1.1.Create substructure for each unique Create substructure for each unique vertex labelvertex labelSubstructures:triangle (4), square (4),circle (1), rectangle (1)circlerectangletrianglesquareonontrianglesquareonontrianglesquareonontrianglesquareononon1414Discovery AlgorithmDiscovery Algorithm2.2.Expand best substructures by an edge or Expand best substructures by an edge or edge+neighboring vertexedge+neighboring vertexSubstructures:trianglesquareoncirclerectanglesquareonrectangletriangleoncirclerectangletrianglesquareonontrianglesquareonontrianglesquareonontrianglesquareonononrectangleon1515Discovery AlgorithmDiscovery Algorithm3.3.Keep only best Keep only best beambeam--widthwidthsubstructures on queuesubstructures on queue4.4.Terminate when queue is empty or Terminate when queue is empty or #discovered substructures > #discovered substructures > limitlimit5.5.Compress graph and repeat to generate Compress graph and repeat to generate hierarchical descriptionhierarchical description1616DNA ExampleDNA Example1717Sample SUBDUE InputSample SUBDUE Inputsample.g:e 1 11 shapee 2 12 shapee 3 13 shapee 4 14 shapee 5 15 shapee 6 16 shapee 7 17 shapee 8 18 shapee 9 19 shapee 10 20 shapee 1 5 one 2 6 one 3 7 one 4 8 one 5 10 one 9 10 one 10 2 one 10 3 one 10 4 onv 1 objectv 2 objectv 3 objectv 4 objectv 5 objectv 6 objectv 7 objectv 8 objectv 9 objectv 10 objectv 11 trianglev 12 trianglev 13 trianglev 14 trianglev 15 squarev 16 squarev 17 squarev 18 squarev 19 circlev 20 rectangleR1C1T1S1T2S2T3S3T4S41818Inexact Graph MatchInexact Graph MatchSome variations may occur between Some variations may occur between instancesinstancesWant to abstract over minor differencesWant to abstract over minor differencesDifference = cost of transforming one Difference = cost of


View Full Document

WSU CSE 6363 - Graph-based Learning

Download Graph-based Learning
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Graph-based Learning and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Graph-based Learning 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?