Berkeley COMPSCI C267 - Automatic Performance Tuning and Sparse-Matrix-Vector-Multiplication

Automatic Performance Tuning and Sparse-Matrix-Vector-Multiplication (SpMV)
Berkeley Benchmarking and OPtimization (BeBOP)
Outline
Motivation for Automatic Performance Tuning
Examples of Automatic Performance Tuning (1)
Examples of Automatic Performance Tuning (2)
Tuning Register Tile Sizes (Dense Matrix Multiply)
Example: Select a Matmul Implementation
Example: Support Vector Classification
Examples of Automatic Performance Tuning (3)
Slide 11
Linear Programming Matrix
A Sparse Matrix You Encounter Every Day
SpMV with Compressed Sparse Row (CSR) Storage
Example: The Difficulty of Tuning
Slide 16
Taking advantage of block structure in SpMV
Speedups on Itanium 2: The Need for Search
Register Profile: Itanium 2
SpMV Performance (Matrix #2): Generation 2
Register Profiles: Sun and Intel x86
SpMV Performance (Matrix #2): Generation 1
Register Profiles: IBM and Intel IA-64
Another example of tuning challenges
Zoom in to top corner
3x3 blocks look natural, but…
Extra Work Can Improve Efficiency!
Automatic Register Block Size Selection
Accurate and Efficient Adaptive Fill Estimation
Accuracy of the Tuning Heuristics (1/4)
Accuracy of the Tuning Heuristics (2/4)
Slide 32
Upper Bounds on Performance for blocked SpMV
Example: L2 Misses on Itanium 2
Example: Bounds on Itanium 2
Slide 36
Slide 37
Summary of Other Performance Optimizations
Example: Sparse Triangular Factor
Cache Optimizations for AA^T*x
Example: Combining Optimizations (1/2)
Example: Combining Optimizations (2/2)
Potential Impact on Applications: Omega3P
Slide 44
Slide 45
Slide 46
Slide 47
Slide 48
Slide 49
Optimized Sparse Kernel Interface - OSKI
How the OSKI Tunes (Overview)
Slide 52
Optimizations in OSKI, so far
How to Call OSKI: Basic Usage
Slide 55
Slide 56
How to Call OSKI: Tune with Explicit Hints
How the User Calls OSKI: Implicit Tuning
Quick-and-dirty Parallelism: OSKI-PETSc
OSKI-PETSc Proof-of-Concept Results
Accelerator Cavity Matrix
OSKI-PETSc Performance: Accel. Cavity
Slide 63
OSKI-PETSc Performance: LP Matrix
Tuning Higher Level Algorithms than SpMV
Slide 66
Slide 67
Slide 68
Slide 69
Slide 70
Slide 71
Slide 72
Remotely Dependent Entries for [x, Ax, A^2x, A^3x], 2D Laplacian
Slide 74
Slide 75
Performance Results
Slide 77
Optimizing Communication Complexity of Sparse Solvers
Minimizing Communication
Slide 80
Slide 81
Slide 82
Slide 83
Slide 84
Slide 85
Slide 86
Extensions
Design Space for [x, Ax, …, A^kx] (1/3)
Design Space for [x, Ax, …, A^kx] (2/3)
Design Space for [x, Ax, …, A^kx] (3/3)
Summary
Possible Class Projects
Extra Slides
Tuning Higher Level Algorithms
Krylov Subspace Methods for Solving Ax=b
Example: Standard GMRES
Example: Computing [Ax, A^2x, A^3x, …, A^kx] for A tridiagonal
Slide 98
Slide 99
Example: Computing [Ax, A^2x, A^3x, …, A^kx] for Laplacian
Latency-Avoiding GMRES (1)
Latency-Avoiding GMRES (2)
Numerical Example (1)
Numerical Example (2)
Operation Counts for [Ax, A^2x, A^3x, …, A^kx] on p procs
Summary and Future Work
Slide 107
Slide 108
What about the Google Matrix?
Current Work
Summary of High-Level Themes
Related Work
Future Directions (A Bag of Flaky Ideas)
Possible Future Work
Review of Tuning by Illustration
Splitting for Variable Blocks and Diagonals
Example: Variable Block Row (Matrix #12)
Example: Row-Segmented Diagonals
Mixed Diagonal and Block Structure
Summary
Slide 121
Slide 122
Problem Context
Key Questions, Ideas, Conclusions
Road Map
Compressed Sparse Row (CSR) Storage
Slide 127
Historical Trends in SpMV Performance
SpMV Historical Trends: Mflop/s
Slide 130
Still More Surprises
Slide 132
Historical Trends: Mixed News
Slide 134
SPARSITY: Framework for Tuning SpMV
Slide 136
Slide 137
Slide 138
Accuracy of the Tuning Heuristics (3/4)
Accuracy of the Tuning Heuristics (4/4)
Slide 141
Motivation for Upper Bounds Model
Upper Bounds on Performance: Blocked SpMV
Slide 144
Slide 145
Slide 146
Fraction of Upper Bound Across Platforms
Achieved Performance and Machine Balance
Slide 149
Where Does the Time Go?
Execution Time Breakdown: Matrix 40
Speedups with Increasing Line Size
Summary: Performance Upper Bounds
Slide 154
Statistical Models for Automatic Tuning
Slide 156
Slide 157
Slide 158
Slide 159
Slide 160
Slide 161
Acknowledgements
TSP-based Reordering: Before
TSP-based Reordering: After
Slide 165
Example: Distribution of Blocked Non-Zeros
Slide 167
Slide 168
Slide 169
Slide 170
Sparse/Dense Partitioning for SpTS
SpTS Performance: Power3
Slide 173
Summary of SpTS and AA^T*x Results
Register Blocking: Speedup
Register Blocking: Performance
Register Blocking: Fraction of Peak
Example: Confidence Interval Estimation
Costs of Tuning
Splitting + UBCSR: Pentium III
Splitting + UBCSR: Power4
Splitting + UBCSR Storage: Power4
Slide 183
Example: Variable Block Row (Matrix #13)
Dense Tuning is Hard, Too
Slide 186
Slide 187
Preliminary Results (Matrix Set 2): Itanium 2
Multiple Vector Performance
Slide 190
Slide 191
Slide 192
MAPS Benchmark Example: Power4
MAPS Benchmark Example: Itanium 2
Saavedra-Barrera Example: Ultra 2i
Slide 196
Summary of Results: Pentium III
Summary of Results: Pentium III (3/3)
Execution Time Breakdown (PAPI): Matrix 40
Preliminary Results (Matrix Set 1): Itanium 2
Tuning Sparse Triangular Solve (SpTS)
Sparse Kernels and Optimizations
Cache Blocked SpMV on LSI Matrix: Ultra 2i
Cache Blocking on LSI Matrix: Pentium 4
Cache Blocked SpMV on LSI Matrix: Itanium
Cache Blocked SpMV on LSI Matrix: Itanium 2
Inter-Iteration Sparse Tiling (1/3)
Inter-Iteration Sparse Tiling (2/3)
Inter-Iteration Sparse Tiling (3/3)
Inter-Iteration Sparse Tiling: Issues
Summary and Questions
Exploiting Matrix Structure
Symmetric SpMV Performance: Pentium 4
SpMV with Split Matrices: Ultra 2i
Cache Blocking on Random Matrices: Itanium
Slide 216
Register Blocked SpMV: Pentium III
Register Blocked SpMV: Ultra 2i
Register Blocked SpMV: Power3
Register Blocked SpMV: Itanium
Possible Optimization Techniques
Multiple Vector Performance: Itanium
Slide 223
SpTS Performance: Itanium
Slide 225
Optimizing AA^T*x
Optimized AA^T*x Performance: Pentium III
Current Directions
Slide 229
More Related Work
Context: Creating High-Performance Libraries
Slide 232
Sustainable Memory Bandwidth
Multiple Vector Performance: Pentium 4
Slide 235
Slide 236
Optimized AA^T*x Performance: Ultra 2i
Optimized AA^T*x Performance: Pentium 4
Tuning Pays Off - PHiPAC
Tuning Pays Off - ATLAS
Register Tile Sizes (Dense Matrix Multiply)
High Precision GEMV (XBLAS)
High Precision Algorithms (XBLAS)
More Extra Slides
Slide 245
Awards
Slide 247
Can Match DGEMV Performance
Slide 249
Slide 250
Slide 251
Slide 252
Slide 253
Slide 254
Slide 255
Slide 256
Evaluating algorithms and machines for SpMV
Tuning Dense BLAS - PHiPAC
Tuning Dense BLAS - ATLAS
Slide 260
Slide 261
SpMV Historical Trends: Fraction of Peak
Motivation

