Advanced Shortest Paths Algorithms on a Massively-Multithreaded Architecture

Home> Academic Documents> Advanced Shortest Paths Algorithms on a Massively-Multithreaded Architecture

DOC PREVIEW

This preview shows page 1-2-3 out of 8 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Advanced Shortest Paths Algorithms on a Massively-Multithreaded ArchitectureJoseph R. Crobak1, Jonathan W. Berry2, Kamesh Madduri3, and David A. Bader31Rutgers University2Sandia National LaboratoriesDept. of Computer Science Albuquerque, NM USAPiscataway, NJ 08854 USA [email protected]@cs.rutgers.edu3Georgia Institute of TechnologyCollege of ComputingAtlanta, GA 30332 USA{kamesh,bader}@cc.gatech.eduAbstractWe present a study of multithreaded implementations ofThorup’s algorithm for solving the Single Source ShortestPath (SSSP) problem for undirected graphs. Our implemen-tations leverage the fledgling MultiThreaded Graph Library(MTGL) to perform operations such as finding connectedcomponents and extracting induced subgraphs. To achievegood parallel performance from this algorithm, we devi-ate from several theoretically optimal algorithmic steps. Inthis paper, we present simplifications that perform better inpractice, and we describe details of the multithreaded im-plementation that were necessary for scalability.We study synthetic graphs that model unstructured net-works, such as social networks and economic transactionnetworks. Most of the recent progress in shortest path algo-rithms relies on structure that these networks do not have.In this work, we take a step back and explore the synergy be-tween an elegant theoretical algorithm and an elegant com-puter architecture. Finally, we conclude with a predictionthat this work will become relevant to shortest path compu-tation on structured networks.1. IntroductionThorup’s algorithm [15] solves the SSSP problem forundirected graphs with positive integer weights in lineartime. To accomplish this, Thorup’s algorithm encapsulates1-4244-0910-1/07/$20.00c!2007 IEEE.information about the input graph in a data structure calledthe Component Hierarchy (CH). Based upon informationin the CH, Thorup’s algorithm identifies vertices that canbe settled in arbitrary order. This strategy is well suited toa shared-memory environment since the component hierar-chy can be constructed only once, then shared by multipleconcurrent SSSP computations.Thorup’s SSSP algorithm and the data structures thatit uses are complex. The algorithm has been generalizedto run on directed graphs in O(n + m log w) time [8](where w is word-length in bits) and in the pointer-additionmodel of computation in O(mα(m, n) + n log log r ) time[13] (where α(m, n) is Tarjan’s inverse-Ackermann func-tion and r is the ratio of the maximum-to-minimum edgelength).In this paper, we perform an experimental study of Tho-rup’s original algorithm. In order to achieve good perfor-mance, our implementation uses simple data structures anddeviates from some theoretically optimal algorithmic strate-gies. Thorup’s SSSP algorithm is complex, and we directthe reader to his original paper for a complete explanation.In the following section, we summarize related work anddescribe in detail the Component Hierarchy and Thorup’salgorithm. Next, we discuss the details of our multithreadedimplementation of Thorup’s algorithm and detail the exper-imental setup. Finally, we present experimental results andplans for future work.2. Background and Related WorkThe Cray MTA-2 and its successor, the XMT [4], aremassively multithreaded machines that provide elaborate1hardware support for latency tolerance, as opposed to la-tency mitigation. Specifically, a large amount of chip spaceis devoted to supporting many thread contexts in hardwarerather than providing cache memory and its associated com-plexity. This architecture is ideal for graph algorithms, asthey tend to be dominated by latency and to benefit littlefrom cache.We are interested in leveraging such architectures tosolve large shortest paths problems of various types. Mad-duri, et al. [11] demonstrate that for certain inputs, delta-stepping [12], a parallel Dijkstra variant, can achieve rel-ative speedups of roughly 30 in 40-processor runs on theMTA-2. This performance is achieved while finding single-source shortest paths on an unstructured graph of roughlyone billion edges in roughly 10 seconds. However, theirstudy showed that there is not enough parallelism in smallerunstructured instances to keep the MTA-2 busy. In particu-lar, similar instances of roughly one million edges yieldedrelative speedups of only about 3 on 40 processors of theMTA-2. Furthermore, structured instances with large diam-eter, such as road networks, prove to be very difficult forparallel delta stepping regardless of instance size.Finding shortest paths in these structured road networkinstances has become an active research area recently [1, 9].When geographical information is available, precomputa-tions to identify “transit nodes” [1] make subsequent s-tshortest path queries extremely fast. However, dependingon the parameters of the algorithms, serial precomputationtimes range from 1 to 11 hours on modern 3Ghz worksta-tions. We know of no work to parallelize these precompu-tations.Although we do not explicitly address that challenge inthis paper, we do note that the precomputations tend toconsist of Dijkstra-like searches through hierarchical data.These serial searches could be batched trivially into paral-lel runs, but we conjecture that this process could be accel-erated even further by the basic idea of allowing multiplesearches to share a common component hierarchy. In thispaper, we explore the utility of this basic idea.2.1. The Component HierarchyThe Component Hierarchy (CH) is a tree structure thatencapsulates information about a graph G. The CH ofan undirected graph with positive edge weights can becomputed directly, but preprocessing is needed if G con-tains zero-weight edges. Each CH-node represents a sub-graph of G called a component, which is identified by avertex v and a level i. Component(v,i) is the subgraphof G composed of vertex v, the set S of vertices reach-able from v when traversing edges with weight < 2i,and all edges adjacent to {v} ∪ S of weight less than 2i.Note that if w ∈Component(v,i), then Component(v,i) =v w55555510Comp(v,3)Comp(w,3)Comp(v,4)Figure 1. An example component hierarchy.Component(v,4), the root of this hierarchy, rep-resents the entire graph.Component(w,i).The root CH-node of the CH is a component containingthe entire graph, and each leaf represents a singleton ver-tices. The children of Component(v,i) in the CH are com-ponents representing the connected components formedwhen removing all the edges with


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3 out of 8 pages.

Please select your school