Unformatted text preview:

Efficient Parallel Access Patterns for Sparse Matrices Kevin Waugh kwaugh andrew Abstract A notable characteristic of the scientific computing and machine learning problem domains is the large amount of data to be analyzed and manipulated Here carefully crafted parallel algorithms have the potential to make a massive impact on the size of a problem that is deemed tractable Sparse matrices are a data structure frequently encountered within these domains Typically these sparse matrices have a non uniform structure which can make the task of designing efficient parallel algorithms for sparse matrices much more complex than for their dense counterparts In this report we consider multiple ways of parallelizing the most computationally intensive section of a typical sparse matrix algorithm its inner loop We compare the theoretical trade offs between each of the approaches and empirically test performance on the Netflix Prize data set 4 5 a large collaborative filtering task that seeks to learn movie preferences from a sparse collection of ratings 1 Introduction Large scale scientific computing and machine learning have rapidly become popular research areas in the last decade The advances in these areas have led to many practical applications that have dramatically impacted society in a generally beneficial way For example real time weather simulation financial market analysis web indexing and searching voice and face recognition product recommendation intrusion and fraud detection and control automation are tasks that computers were rarely used for or trusted with fifteen years ago Though these applications would not be possible without the staggering advancements made by researchers in these areas it is safe to say that the upward trend in processor performance and memory capacity has been pivotal to this progression As of recent the rate at which single processor computation is increasing is stagnating Instead computer architects have shifted focus towards multi core designs to keep up with the ever growing computational demands This paradigm shift will in require the efficient design of parallel algorithms to effectively make use of the newest computing resources A sparse matrix is a frequently reoccuring data structure in scientific computing and machine learning Unlike a dense matrix most of the entries in a sparse matrix are zero and are not explicitly stored in memory For example a web search engine might use a matrix of word frequency counts for a set of web pages Here a web page typically uses a small subset of the words in the dictionary which leads to most of the matrix containing zeros Similarly a social networking graph where each edge indicates some type of relationship when represented as an adjacency matrix will likely be sparse In these cases a machine learning algorithm for creating a search index or inferring relationships must be equipped to handle this explicit sparse structure to gain traction on any reasonably sized data set Let us denote an m by n sparse matrix as a pair A S where i j S is a non zero entry in the matrix and A i j is its value A reoccuring pattern in sparse matrix algorithms is then to repeatedly evaluate functions f and g on the non zero entries to compute X X u f i j A i j and v g i j A i j 1 i j S i j S 1 where u Rm and v Rn are row and column vectors respectively This model encompasses a large class of algorithms including the vast array centered around matrix vector products which have been well studied and optimized 2 6 For example solution techniques for linear systems including solving a positive definite system with the conjugate gradient method a variety of matrix factorization algorithms including ones for computing eigenvector and singular value decompositions and optimization algorithms like interior point methods for linear and quadratic programming Despite its simple structure no one approach to parallelizing this family of algorithms is obviously best Usually determining how to effectively distribute this computation will depend on the nonuniform pattern of sparseness in the data in addition to the functions f and g In this report we will examine a variety of schemes for parallelizing algorithms in this family We will compare and contrast these schemes empirically on both shared memory machines and a chip multiprocessor as well as theoretically in terms of both computational and memory resource overhead in addition to cache performance This report is sectioned as follows First we will introduce the Netflix Prize data set and its collaborative filtering task which we will use for our empirical study Then we will describe one possible approach for collaborative filtering the pseudo singular value decomposition learned using gradient descent whose inner loop follows the noted sparse access pattern Finally we will investigate and contrast both theoretically and empirically a variety of ways to parallelize this approach 2 Collaborative Filtering and the Netflix Prize Dataset A collaborative filtering task is most easily described as a recommendation system Take for example an online store selling a number of related items The store might wish to infer the buying habits of a user to strategically advertise goods that are mostlikely to be purchased Similarly an online movie or music retailer might allow a user to rate music that they like and dislike The system could then recommend similar music or movies More formally a collaborative filtering system takes as input a set of k user item rating tuples and gives as output a function that predicts any unobserved tuple The quality of such a system is usually measured by considering the root mean squared error of the predictions on withheld test set Good collaborative filtering systems consider groups of users or perhaps them all at once when learning preferences This is because typically the observation vector for a single user is sparsely populated and little can be inferred from the single user s ratings in isolation That is one typically assumes that though there are many distinct users of the system many of them share similar tastes These similar tastes can be inferred by learning relationships between the users through their ratings Thus a sparse matrix of ratings is a natural way to represent the input data Though here the unrecorded entries in the matrix denote an unknown observation as opposed to a known zero In 2006 the online movie rental company Netflix posed a million dollar


View Full Document

CMU CS 15740 - Efficient Parallel Access Patterns for Sparse Matrices

Documents in this Course
leecture

leecture

17 pages

Lecture

Lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

9 pages

Lecture

Lecture

13 pages

lecture

lecture

25 pages

lect17

lect17

7 pages

Lecture

Lecture

65 pages

Lecture

Lecture

28 pages

lect07

lect07

24 pages

lect07

lect07

12 pages

lect03

lect03

3 pages

lecture

lecture

11 pages

lecture

lecture

20 pages

lecture

lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

10 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

18 pages

lecture

lecture

63 pages

lecture

lecture

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

18 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

lecture

lecture

34 pages

lecture

lecture

47 pages

lecture

lecture

7 pages

Lecture

Lecture

18 pages

Lecture

Lecture

7 pages

Lecture

Lecture

21 pages

Lecture

Lecture

10 pages

Lecture

Lecture

39 pages

Lecture

Lecture

11 pages

lect04

lect04

40 pages

Load more
Loading Unlocking...
Login

Join to view Efficient Parallel Access Patterns for Sparse Matrices and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Efficient Parallel Access Patterns for Sparse Matrices and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?