DOC PREVIEW
UCSB CS 240A - Scalable Parallel Primitives

This preview shows page 1-2-3-19-20-38-39-40 out of 40 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1Scalable Parallel Primitives for Massive Graph Computation !"#$% BuluçUniversity of California, Santa BarbaraSources of Massive Graphs(WWW snapshot, courtesy Y. Hyun) (Yeast protein interaction network, courtesy H. Jeong)Graphs naturally arise from the internet and social interactionsMany scientific (biological, chemical,cosmological, ecological, etc) datasets are modeled as graphs.Examples:- Centrality- Shortest paths- Network flows- Strongly Connected ComponentsExamples:- Loop and multi edge removal- Triangle/Rectangle enumeration3Types of Graph Computations Fuzzy intersectionExamples: Clustering,Algebraic MultigridTool: GraphTraversalTool: Map/Reduce1342576Tightly coupledFiltering basedmap map mapreduce reduceTightly Coupled Computations& Many graph mining algorithms are computationally intensive. (e.g. graph clustering, centrality)& Some computations are inherently latency-bound. (e.g. finding shortest paths)& Interesting graphs are sparse, typically |edges| = O(|vertices|)4Huge GraphsExpensive Kernels+!High Performance and Massive Parallelism Sparse Graphs/DataSparse Data Structures (Matrices)!Tightly Coupled Computationson Sparse Graphs555Software for Graph Computation'((()"*)+,%*-.%-/01,.%*+2345*spending ten years of my life on the TeX project is that software is hard. It's harder than anything 4/14*6784*4845*9+#*3.*#.:666Software for Graph Computation'((()"*)+,%*-.%-/01,.%*+2345*spending ten years of my life on the TeX project is that software is hard. It's harder than anything 4/14*6784*4845*9+#*3.*#.:Dealing with software is hard !777Software for Graph Computation'((()"*)+,%*-.%-/01,.%*+2345*spending ten years of my life on the TeX project is that software is hard. It's harder than anything 4/14*6784*4845*9+#*3.*#.:Dealing with software is hard !High performance computing (HPC) software is harder !888Software for Graph Computation'((()"*)+,%*-.%-/01,.%*+2345*spending ten years of my life on the TeX project is that software is hard. It's harder than anything 4/14*6784*4845*9+#*3.*#.:Dealing with software is hard !High performance computing (HPC) software is harder !Deal with parallel HPC software?Outline& The Case for Primitives& The Case for Sparse Matrices& Parallel Sparse Matrix-Matrix Multiplication& Software Design of the Combinatorial BLAS& An Application in Social Network Analysis& Other Work& Future Directions910& Input: ;,54-34#*<5+=9*>,39*'-.131:*.%*4#<41& Find least-cost paths between all reachable vertex pairs& Classical algorithm: Floyd-Warshall& Case study of implementation on multicore architecture:? graphics processing unit (GPU)All-Pairs Shortest Pathsfor k=1:n // the induction sequencefor i = 1:nfor j = 1:nif(w(i!k)+w(k!") < w(i!"))w(i!"):= w(i!k) + w(k!")1 52 3 415234k = 1 case11GPU characteristics Powerful: two Nvidia 8800s > 1 TFLOPSInexpensive: $500 each! !"##"$%&'()*+,*-.."/,(.+01&2((3/1("/4'*%$'"+/(4'*1-.(0*"514(6(-*"'7.1'"$(% /"'4! 81*#+*.-/$1("4($+%/'1*"/'%"'"51(-/0(#*-,"&1291.+*:(-$$144()-''1*/(7-4(4%;'&1(1##1$'4(+/($+4'((! <='*1.1&:( 1-4:('+(%/01*%'"&">1('71(015"$12!+"/,("'(?*+/,(1-4"&:($+4'4(@AA=("/('".1t1t3t2t4t6t5t7t9t8t10t12t11t13t14t16t15But:12Recursive All-Pairs Shortest PathsA BC DABDCA = A*; % recursive callB = AB; C = CA; D = D + CB;D = D*; % recursive callB = BD; C = DC;A = A + BC;+ ,1*'),%:@ ! ,1*'+##:Based on R-Kleene algorithmWell suited for GPU architecture:& Fast matrix-multiply kernel& In-place computation => low memory bandwidth& Few, large MatMul calls => low GPU dispatch overhead& Recursion stack on host CPU, not on multicore GPU& Careful tuning of GPU codeThe Case for Primitives13480xLifting Floyd-Warshallto GPUThe right primitive !Conclusions:High performance is achievable but not simpleCarefully chosen and optimized primitives will be keyUnorthodox R-Kleene algorithmAPSP: Experiments and observationsOutline& The Case for Primitives& The Case for Sparse Matrices& Parallel Sparse Matrix-Matrix Multiplication& Software Design of the Combinatorial BLAS& An Application in Social Network Analysis& Other Work& Future Directions1415Sparse Adjacency Matrix and Graph& Every graph is a sparse matrix and vice-versa& Adjacency matrix: sparse array w/ nonzeros for graph edges& Storage-efficient implementation from sparse data structures!1234765"#!The Case for Sparse Matrices& Many irregular applications contain su!cient coarse-grained parallelism that can ONLY be exploited using abstractions at proper level.16Traditional graph computationsGraphs in the language of linear algebraData driven. Unpredictable communication.Fixed communication patterns. Irregular and unstructured. Poor locality of referenceOperations on matrix blocks. Exploits memory hierarchyFine grained data accesses. Dominated by latencyCoarse grained parallelism. Bandwidth limitedThe Case for Sparse MatricesIdentification of PrimitivesSparse matrix-matrix Multiplication (SpGEMM)Element-wise operations17Linear Algebraic Primitivesx!"#$%&'()*+)(',%$%+-( .)'/-/)0!.)12.)0"+3.)*$2.)01.),%+2Sparse matrix-vector multiplicationSparse Matrix Indexingx.*Why focus on SpGEMM?& Graph clustering (Markov, peer pressure)& Subgraph / submatrix indexing& Shortest path calculations & Betweenness centrality& Graph contraction& Cycle detection& Multigrid interpolation & restriction& Colored intersection searching& Applying constraints in finite element computations& Context-free parsing ...18Applications of Sparse GEMM11111xxOutline& The Case for Primitives& The Case for Sparse Matrices& Parallel Sparse Matrix-Matrix Multiplication& Software Design of the Combinatorial BLAS& An Application in Social Network Analysis& Other Work& Future Directions1920Two Versions of Sparse GEMMA1A2A3A4A7A6A5A8B1B2B3B4B7B6B5B8C1C2C3C4C7C6C5C8jx=ikkCijCij+= AikBkjCi= Ci+ A Bix=1D block-column distributionCheckerboard (2D block) distributionComparative Speedup of Sparse 1D & 2D21!"#$%&'()'*+#,-#&./0%)(123# 1&4*#(1*#$0(*"()&. (0#3'&.*+#)5#)2$.*2*"(*6#'0%%*'(.78##94*%.&$$)"/#'022:")'&()0"+#&"6#2&)"(&)")"/#.0&6#;&.&"'*#&%*#'%:')&.8#Projected performances of Sparse 1D & 2DNP1D2DNP& Stores entries in column-major order& Dense collection of #$%&'$()*+,-./$0& Uses storage.22Compressed Sparse Columns (CSC):A Standard Layout!"#$%&'(")&*+,-./*/8102 3,"0)&.&!&%/*,)1'0)*2'&&3&"&3+,"+-&0 2 3 4 0 1 5 7 3 4 5 4 5 6 7041112131617 nnznO)( "23Submatrices are 'hypersparse0 (i.e. nnz << n)blocksblocksTotal Storage: Average of c nonzeros per column& A data structure or algorithm that depends on the


View Full Document

UCSB CS 240A - Scalable Parallel Primitives

Download Scalable Parallel Primitives
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Scalable Parallel Primitives and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Scalable Parallel Primitives 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?