New version page

Large Graph Analysis in the GMine System

Upgrade to remove ads

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View Full Document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Upgrade to remove ads
Unformatted text preview:

1Large Graph Analysis in the GMine SystemJose F. Rodrigues Jr., † Hanghang Tong, +Jia-Yu Pan, Agma J. M. Traina, Caetano Traina Jr.,*Christos FaloutsosAbstract—Current applications have produced graphs on the order of hundreds of thousands of nodes and millions of edges. Totake advantage of such graphs, one must be able to find patterns, outliers and communities. These tasks are better performed inan interactive environment, where human expertise can guide the process. For large graphs, though, there are some challenges:the excessive processing requirements are prohibitive, and drawing hundred-thousand nodes results in cluttered images hardto comprehend. To cope with these problems, we propose an innovative framework suited for any kind of tree-like graph visualdesign. GMine integrates (a) a representation for graphs organized as hierarchies of partitions - the concepts of SuperGraph andGraph-Tree; and (b) a graph summarization methodology - CEPS. Our graph representation deals with the problem of tracingthe connection aspects of a graph hierarchy with sublinear complexity, allowing one to grasp the neighborhood of a single nodeor of a group of nodes in a single click. As a proof of concept, the visual environment of GMine is instantiated as a system inwhich large graphs can be investigated globally and locally.Index Terms—Graph Analysis System, Graph Representation, Data Structures, Graph Mining, Graph VisualizationF1 INTRODUCTIONLarge graphs are common in real-life settings: webgraphs, computer communication graphs, recommen-dation systems, social networks, bipartite graphs ofweb-logs, to name a few. To find patterns in a largegraph, it is desirable to compute, visualize, interactand mine it. However, dealing with graphs on the or-der of hundreds of thousands of nodes and millions ofedges brings some problems: the excessive processingrequirements are prohibitive, and drawing hundred-thousand nodes results in cluttered images that arehard to comprehend.In former works, the large graph problem has beentreated through graph hierarchies, according to whicha graph is recursively broken to define a tree of setsof partitions. However, previous efforts on this matterfail on the task of integrating the information frommultiple partitions, disregarding mining techniquesto fine inspect each subgraph. Conversely, for under-standing a graph hierarchy, it is worthwhile to havesystems that provide aids for answering the followingquestions:• Hierarchical navigation: What is the relation be-tween arbitrary groups (partitions) of nodes?• Representation and processing: What are the adja-cencies of a given graph node considering the entiregraph, and not only its particular partition?Inst. de Ciˆencias Matem´aticas e de Computa¸c˜ao - Universidade de S˜aoPaulo - CP 668 - 13560-970 S˜ao Carlos, SP, Brazil - {junio, agma,caetano}@icmc.usp.br† IBM T.J. Watson Research - 19 Skyline Dr. - Hawthorne NY, 10532 [email protected]*School of Computer Science - Carnegie Mellon University - 5000 ForbesAve - 15213-3891, USA - {[email protected]+Google Inc., Pittsburgh - 4720 Forbes Ave., Lower Level - Pittsburgh,PA 15213 - [email protected]• Mining: Given a subset of nodes in the graph, whatis the induced subgraph that best summarizes therelationships of this subset?• Visualization: How do we see through the levels ofthe graph hierarchy?• Interaction: How do we perform all these tasks effi-ciently and intuitively?It is our contention that a system that presents theoriginal graph concomitant to its hierarchical versionmust meet all these requirements. Therefore, we seekfor a new representation for graph hierarchies, differ-ent from previous works in which the graph hierarchyis “stagnant” and cannot answer questions about therelationships between nodes at different groups, andneither between groups at different partitions of thehierarchy. These are serious limitations because agraph is, essentially, a model for representing relation-ships.Another concern is that even at the deepest levelof a graph hierarchy – at the leaves, it is possibleto find subgraphs complex enough to surpass theanalytical capacity. In this situation, one should beable to summarize the subgraph achieving a small,yet representative, fraction of it; an operation thatanswers for a deeper level of insight over hierarchicalpartitionings.The contribution of this work is the integration ofmethodologies that address the problems discussedabove. We introduce a novel representation for graphhierarchies that extends those of previous works, lead-ing to a model more suitable for presentation andcomputation. Our methodology also counts on thepossibility of graph summarization at the subgraphs(leaves) in a graph hierarchy. The result of our effortsis GMine [32], a system that allows browsing andmining of large graphs in a rich visual environment2[26].The paper is organized as follows: section 2 reviewsrelated works for this paper. Section 3 introduces theSuperGraph/Graph-Tree methodology and section 4explains the CEPS graph summarization. Section 5presents experiments on the Graph-Tree performanceand section 6 presents accuracy measures for CEPS.As a proof of concept, section 7 demonstrates theGMine system. Section 8 concludes the paper.2 RELATED WORKThe interest on large graph analysis has increasedin the recent years. This research area includespattern mining [10], influence propagation [18]and community mining [15], among others. Suchthemes can benefit from tools that enable the visualinspection of large graphs.Graph Hierarchical PresentationAlthough many works implicitly define the hierarchi-cal clustering of graphs – as in the work of Eadesand Feng [11], most of them do not touch the issueof how such arrangements deal with scalability andprocessing by means of a well-defined data structure.Batagelj et al. [6], for instance, generalizes on theconcept of X-graph of Y-graphs to define a properties-oriented hierarchical clustering of graphs not provid-ing details nor performance evaluation of the im-plicit data arrangement that supports their processing.Archambault et. al [4] define an ingenious dynamicmodification of the graph hierarchy in light of a singlenode of interest; their system requires the user toreset her/his referential locus at every new choiceof a node with a strictly linear complexity on thebasis of seconds delay. Gansner et a. [16] present afish-eye visualization built over a graph layout


Download Large Graph Analysis in the GMine System
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Large Graph Analysis in the GMine System and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Large Graph Analysis in the GMine System 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?