Efficient Parallel Access Patterns for Sparse Matrices

Kevin Waugh
kwaugh@andrew

Abstract

A notable characteristic of the scientific computing and machine learning problem domains is the large amount of data to be analyzed and manipulated. Here, carefully crafted parallel algorithms have the potential to make a massive impact on the size of a problem that is deemed tractable. Sparse matrices are a data structure frequently encountered within these domains. Typically, these sparse matrices have a non-uniform structure, which can make the task of designing efficient parallel algorithms for sparse matrices much more complex than for their dense counterparts. In this report, we consider multiple ways of parallelizing the most computationally intensive section of a typical sparse matrix algorithm: its inner loop. We compare the theoretical trade-offs between the approaches and empirically test performance on the Netflix Prize data set [4, 5], a large collaborative filtering task that seeks to learn movie preferences from a sparse collection of ratings.

1 Introduction

Large-scale scientific computing and machine learning have rapidly become popular research areas in the last decade. The advances in these areas have led to many practical applications that have dramatically impacted society in a generally beneficial way. For example, real-time weather simulation, financial market analysis, web indexing and searching, voice and face recognition, product recommendation, intrusion and fraud detection, and control automation are tasks that computers were rarely used for or trusted with fifteen years ago.
Though these applications would not be possible without the staggering advancements made by researchers in these areas, it is safe to say that the upward trend in processor performance and memory capacity has been pivotal to this progression. Recently, the rate at which single-processor performance improves has stagnated. Instead, computer architects have shifted focus towards multi-core designs to keep up with the ever-growing computational demands. This paradigm shift will require the efficient design of parallel algorithms to effectively make use of the newest computing resources.

A sparse matrix is a frequently recurring data structure in scientific computing and machine learning. Unlike a dense matrix, most of the entries in a sparse matrix are zero and are not explicitly stored in memory. For example, a web search engine might use a matrix of word frequency counts for a set of web pages. Here, a web page typically uses a small subset of the words in the dictionary, which leads to most of the matrix containing zeros. Similarly, a social networking graph, where each edge indicates some type of relationship, will likely be sparse when represented as an adjacency matrix. In these cases, a machine learning algorithm for creating a search index or inferring relationships must be equipped to handle this explicit sparse structure to gain traction on any reasonably sized data set.

Let us denote an m by n sparse matrix as a pair (A, S), where (i, j) ∈ S is a non-zero entry in the matrix and A(i, j) is its value. A recurring pattern in sparse matrix algorithms is, then, to repeatedly evaluate functions f and g on the non-zero entries to compute

    u = \sum_{(i,j) \in S} f(i, j, A(i, j))   and   v = \sum_{(i,j) \in S} g(i, j, A(i, j))    (1)

where u ∈ R^m and v ∈ R^n are row and column vectors, respectively. This model encompasses a large class of algorithms, including the vast array centered around matrix-vector products, which have been well studied and optimized [2, 6].
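The access pattern in Equation (1) can be made concrete with a short sketch. The following is a minimal illustration only, assuming a triplet (coordinate-list) representation of S; the names `sparse_accumulate` and `entries` are ours, not the report's:

```python
def sparse_accumulate(m, n, entries, f, g):
    """Accumulate u[i] += f(i, j, a) and v[j] += g(i, j, a) over all
    non-zero entries (i, j, a) of an m-by-n sparse matrix."""
    u = [0.0] * m
    v = [0.0] * n
    for (i, j, a) in entries:  # the inner loop this report parallelizes
        u[i] += f(i, j, a)
        v[j] += g(i, j, a)
    return u, v

# Example: a sparse matrix-vector product y = A x is the special case
# f(i, j, a) = a * x[j], with g contributing nothing.
entries = [(0, 1, 2.0), (1, 0, 3.0), (1, 2, -1.0)]
x = [1.0, 2.0, 3.0]
y, v = sparse_accumulate(2, 3, entries,
                         f=lambda i, j, a: a * x[j],
                         g=lambda i, j, a: 0.0)
# y == [4.0, 0.0]
```

Note that u is written only through row indices and v only through column indices; it is exactly these per-entry writes that the parallelization schemes in this report must coordinate.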
Examples include solution techniques for linear systems, such as solving a positive definite system with the conjugate gradient method; a variety of matrix factorization algorithms, including those for computing eigenvector and singular value decompositions; and optimization algorithms such as interior-point methods for linear and quadratic programming.

Despite its simple structure, no one approach to parallelizing this family of algorithms is obviously best. Usually, determining how to effectively distribute this computation will depend on the non-uniform pattern of sparseness in the data in addition to the functions f and g. In this report, we will examine a variety of schemes for parallelizing algorithms in this family. We will compare and contrast these schemes empirically on both shared-memory machines and a chip multiprocessor, as well as theoretically in terms of computational and memory resource overhead in addition to cache performance.

This report is organized as follows. First, we will introduce the Netflix Prize data set and its collaborative filtering task, which we will use for our empirical study. Then, we will describe one possible approach for collaborative filtering, the pseudo-singular value decomposition learned using gradient descent, whose inner loop follows the noted sparse access pattern. Finally, we will investigate and contrast, both theoretically and empirically, a variety of ways to parallelize this approach.

2 Collaborative Filtering and the Netflix Prize Dataset

A collaborative filtering task is most easily described as a recommendation system. Take, for example, an online store selling a number of related items. The store might wish to infer the buying habits of a user to strategically advertise goods that are most likely to be purchased. Similarly, an online movie or music retailer might allow a user to rate music that they like and dislike. The system could then recommend similar music or movies.
More formally, a collaborative filtering system takes as input a set of k user/item/rating tuples and gives as output a function that predicts any unobserved tuple. The quality of such a system is usually measured by the root mean squared error of its predictions on a withheld test set.

Good collaborative filtering systems consider groups of users, or perhaps all of them, at once when learning preferences. This is because the observation vector for a single user is typically sparsely populated, and little can be inferred from a single user's ratings in isolation. That is, one typically assumes that, though there are many distinct users of the system, many of them share similar tastes.
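As a concrete illustration of the evaluation criterion just described, here is a minimal sketch of root mean squared error over a withheld test set; the sample data and the constant predictor are hypothetical, chosen only to show the shape of the computation:

```python
import math

def rmse(test_set, predict):
    """Root mean squared error of predict(user, item) against the
    ratings in test_set, an iterable of (user, item, rating) tuples."""
    errors = [(rating - predict(user, item)) ** 2
              for (user, item, rating) in test_set]
    return math.sqrt(sum(errors) / len(errors))

# Toy withheld test set and a baseline that always predicts 3 stars.
test = [("u1", "m1", 4.0), ("u1", "m2", 2.0), ("u2", "m1", 3.0)]
error = rmse(test, lambda user, item: 3.0)  # sqrt((1 + 1 + 0) / 3) ~ 0.816
```

A learned predictor, such as the gradient-descent pseudo-SVD described later in the report, is evaluated the same way: it is fit on the training tuples and scored by this error on the withheld ones.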