UW-Madison ECE 734 - Implementation of DWT using SSE Instruction Set - D3032082

School: University of Wisconsin, Madison

Course: ECE 734 - VLSI Array Structures for Digital Signal Processing

Pages: 8

Implementation of DWT using SSE Instruction Set
  Mehta, Ami
  Muller, Gilles

Lifting based 2D-DWT
  Lifting
  1D horizontal lifting
  1D vertical lifting
  Fixed point
  (9,7) tap biorthogonal filter
  Lossy compression
  High compression levels

2D DWT Matrices layout
  Mallat strategy
  Uses an auxiliary matrix to store the results of the horizontal filtering.
  No memory scattering: horizontal high and low frequency components are not interleaved in memory. It allows a better exploitation of the SIMD parallelism.

Optimizations
  Cache
  The 2 matrices are aligned on the cache row size (128 bits = 16 B) to allow data fetching in one cycle.
  Input and output matrices are juxtaposed in memory to prevent conflicts in a direct-mapped cache (associativity conflict).
  [Figure: cache layout without alignment vs. cache layout with alignment]

Optimizations …
  SIMD code
  Using SSE2
  Computes 4 pixels in parallel using fixed-point arithmetic.
  Profiling the C code showed that the column transform and cache access caused the main bottleneck.
  In the DWT, intermediate values are reused; instead of recalculating them, we keep the intermediate computations.

Results
  Image size of 1024 x 1024
  Profiling results obtained using VTune Analyzer©
  Cycles per uop improves from 3.38 to 2.28
  Improvement of 32.5%

Results …

Thank
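
The slides describe an SSE2 column transform that computes 4 pixels in parallel using fixed-point arithmetic, but the preview does not show the code. The following is only a minimal sketch of what one vertical lifting step (the first predict step of the (9,7) filter) could look like under stated assumptions: samples are stored as 32-bit integers with magnitude below 2^15, the coefficient alpha = -1.586134342 is scaled to Q14, and the row width is a multiple of 4. The names `predict_rows_scalar`, `predict_rows_simd`, `mullo_epi32_sse2`, `FIX_SHIFT` and `ALPHA_FIX` are hypothetical and are not taken from the authors' implementation. Because SSE2 has no 32-bit low multiply, the sketch emulates one with `_mm_mul_epu32`; a scalar fallback is used when SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Assumed Q14 scaling: round(-1.586134342 * 2^14) = -25987. */
#define FIX_SHIFT 14
#define ALPHA_FIX (-25987)

/* Scalar reference: odd[x] += alpha * (above[x] + below[x]).
   Relies on >> being an arithmetic shift for negative values
   (true for GCC and Clang; implementation-defined in ISO C). */
void predict_rows_scalar(const int32_t *above, int32_t *odd,
                         const int32_t *below, size_t width)
{
    for (size_t x = 0; x < width; ++x)
        odd[x] += (ALPHA_FIX * (above[x] + below[x])) >> FIX_SHIFT;
}

#if defined(__SSE2__)
/* Low 32 bits of a 4-lane 32-bit multiply, built from SSE2 only.
   _mm_mul_epu32 multiplies lanes 0 and 2, so lanes 1 and 3 are
   shifted down and multiplied separately, then re-interleaved. */
static __m128i mullo_epi32_sse2(__m128i a, __m128i b)
{
    __m128i even = _mm_mul_epu32(a, b);
    __m128i odd  = _mm_mul_epu32(_mm_srli_si128(a, 4),
                                 _mm_srli_si128(b, 4));
    return _mm_unpacklo_epi32(
        _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
        _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0)));
}
#endif

/* Same step, 4 columns per iteration. Unaligned loads keep the
   sketch safe for any pointer; with rows aligned on 16 bytes, as
   the slides describe, _mm_load_si128/_mm_store_si128 could be
   used instead. */
void predict_rows_simd(const int32_t *above, int32_t *odd,
                       const int32_t *below, size_t width)
{
    assert(width % 4 == 0);
#if defined(__SSE2__)
    const __m128i alpha = _mm_set1_epi32(ALPHA_FIX);
    for (size_t x = 0; x < width; x += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *)(above + x));
        __m128i b = _mm_loadu_si128((const __m128i *)(below + x));
        __m128i d = _mm_loadu_si128((const __m128i *)(odd + x));
        __m128i t = mullo_epi32_sse2(alpha, _mm_add_epi32(a, b));
        d = _mm_add_epi32(d, _mm_srai_epi32(t, FIX_SHIFT));
        _mm_storeu_si128((__m128i *)(odd + x), d);
    }
#else
    predict_rows_scalar(above, odd, below, width);
#endif
}
```

In this sketch the vertical step vectorizes naturally because the 4 pixels in one register come from 4 adjacent columns of the same row, so no shuffling of neighbouring samples is needed. A caller would pass the even row above, the odd row being updated, and the even row below, for every odd row of the matrix.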
