UW-Madison ECE 734 - Implementation of DWT using SSE Instruction Set

Implementation of DWT using SSE Instruction Set
Mehta, Ami; Muller, Gilles

Lifting-based 2D DWT
Lifting:
  1D horizontal lifting
  1D vertical lifting
Fixed point
(9,7)-tap biorthogonal filter
Lossy compression
High compression levels

2D DWT matrix layout
Mallat strategy: uses an auxiliary matrix to store the results of the horizontal filtering.
No memory scattering: the horizontal high- and low-frequency components are not interleaved in memory, which allows better exploitation of SIMD parallelism.

Optimizations
Cache:
  The two matrices are aligned on the cache row size (128 bits = 16 B) so that data can be fetched in one cycle.
  The input and output matrices are juxtaposed in memory to prevent associativity (conflict) misses in a direct-mapped cache.
[Figure: cache layout without alignment vs. cache layout with alignment]

Optimizations (continued)
SIMD code using SSE2: computes 4 pixels in parallel using fixed-point arithmetic.
Profiling the C code showed that the column transform and cache accesses were the main bottlenecks.
In the DWT, intermediate values are reused: instead of recalculating them, we keep the intermediate computations.

Results
Image size: 1024 x 1024
Profiling done with VTune Analyzer©
The cycles-per-uop ratio improves from 3.38 to 2.28, an improvement of 32.5%.

Thank you
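The lifting scheme on the "Lifting-based 2D DWT" slide can be sketched in C as a forward 1D (9,7) transform in fixed point. This is a minimal sketch, not the authors' code: the Q12 format (12 fractional bits), the rounding, the symmetric border extension, the names `dwt97_1d`, `lift_pass` and `fx_mul`, and the final scaling (low band by 1/K, high band by K; normalizations differ between implementations) are all assumptions. The coefficients are the standard CDF 9/7 lifting constants. Each update pass reads the odd samples produced by the preceding predict pass, which is one way to read the remark about keeping intermediate computations instead of recalculating them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed Q12 fixed-point format: each constant is round(c * 4096). */
#define FRAC_BITS 12
#define ROUND_HALF (1 << (FRAC_BITS - 1))

enum {
    ALPHA = -6497, /* -1.586134342 : first predict  */
    BETA  = -217,  /* -0.052980119 : first update   */
    GAMMA = 3616,  /*  0.882911076 : second predict */
    DELTA = 1817,  /*  0.443506852 : second update  */
    INV_K = 3330,  /*  1/K, with K = 1.230174105    */
    K_Q12 = 5039   /*  K                            */
};

/* Fixed-point multiply with rounding. Assumes >> on a negative int is an
   arithmetic shift (GCC/Clang behavior; implementation-defined in ISO C). */
static int32_t fx_mul(int32_t x, int32_t c) {
    return (x * c + ROUND_HALF) >> FRAC_BITS;
}

/* One predict pass (first = 1, odd samples) or update pass (first = 0,
   even samples) over x[0..n), with whole-sample symmetric extension. */
static void lift_pass(int32_t *x, int n, int first, int32_t coef) {
    for (int i = first; i < n; i += 2) {
        int32_t left  = (i > 0)     ? x[i - 1] : x[i + 1];
        int32_t right = (i + 1 < n) ? x[i + 1] : x[i - 1];
        x[i] += fx_mul(left + right, coef);
    }
}

/* Forward 1D (9,7) lifting on n samples read from `in` with `stride`
   (stride = row length reads a column). Output is in Mallat order:
   out[0..n/2) = low-pass, out[n/2..n) = high-pass. Returns 0 on success. */
int dwt97_1d(const int32_t *in, int stride, int n, int32_t *out) {
    assert(n >= 2 && n % 2 == 0);
    int32_t *x = malloc((size_t)n * sizeof *x);
    if (!x) return -1;
    for (int i = 0; i < n; i++) x[i] = in[(size_t)i * stride];

    lift_pass(x, n, 1, ALPHA); /* predict 1 */
    lift_pass(x, n, 0, BETA);  /* update 1: reuses the predicted odd values */
    lift_pass(x, n, 1, GAMMA); /* predict 2 */
    lift_pass(x, n, 0, DELTA); /* update 2 */

    for (int i = 0; i < n / 2; i++) {
        out[i]         = fx_mul(x[2 * i], INV_K);     /* low-pass  */
        out[n / 2 + i] = fx_mul(x[2 * i + 1], K_Q12); /* high-pass */
    }
    free(x);
    return 0;
}
```

For a constant input, the high-pass outputs are (up to rounding) zero and the low-pass outputs reproduce the constant, which is a quick sanity check of the coefficients.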
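The Mallat layout on the "2D DWT matrix layout" slide can be illustrated with a one-level 2D transform that writes the horizontal results into an auxiliary matrix as [low | high] per row, then filters the columns of that matrix into the output. To keep the block short, an integer Haar lifting step stands in for the (9,7) filter; the names `haar_1d` and `dwt2d_mallat` are hypothetical, and only the memory layout reflects the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in 1D transform: integer Haar lifting (NOT the (9,7) filter).
   Reads n samples with `stride`, writes low-pass to positions [0, n/2)
   and high-pass to [n/2, n) with the same stride (non-interleaved). */
static void haar_1d(const int32_t *in, int32_t *out, int n, int stride) {
    for (int i = 0; i < n / 2; i++) {
        int32_t even = in[(size_t)(2 * i) * stride];
        int32_t odd  = in[(size_t)(2 * i + 1) * stride];
        int32_t d = odd - even;      /* predict */
        int32_t s = even + (d >> 1); /* update (arithmetic >> assumed) */
        out[(size_t)i * stride]           = s;
        out[(size_t)(n / 2 + i) * stride] = d;
    }
}

/* One level of the 2D DWT on a rows x cols row-major image.
   Horizontal pass: each row of `in` -> the same row of `aux` as [L | H].
   Vertical pass: each column of `aux` -> the same column of `out`.
   The four subbands end up as quadrants of `out` (LL top-left)
   without any reshuffling. */
void dwt2d_mallat(const int32_t *in, int32_t *aux, int32_t *out,
                  int rows, int cols) {
    assert(rows % 2 == 0 && cols % 2 == 0);
    for (int r = 0; r < rows; r++)
        haar_1d(in + (size_t)r * cols, aux + (size_t)r * cols, cols, 1);
    for (int c = 0; c < cols; c++)
        haar_1d(aux + c, out + c, rows, cols);
}
```

Because low and high coefficients of each row are stored in separate contiguous halves, the vertical pass can process runs of adjacent columns that all belong to the same band, which is what makes this layout SIMD-friendly.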
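The cache layout on the first "Optimizations" slide can be approximated with a single 16-byte-aligned allocation that holds the input matrix followed immediately by the output matrix. This is a sketch under assumptions: `dwt_buffers` and `dwt_buffers_alloc` are hypothetical names, each row is padded to a multiple of 16 bytes so every row starts aligned, and C11 `aligned_alloc` is used (`posix_memalign` or `_mm_malloc` are alternatives where it is missing). Whether contiguity actually removes direct-mapped conflict misses depends on the cache size and the matrix dimensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 16 /* 128 bits = 16 bytes */

/* Hypothetical container: input and output matrices share ONE
   contiguous, 16-byte-aligned block, so they are juxtaposed in memory. */
typedef struct {
    int32_t *base;   /* start of the block; release with free(base) */
    int32_t *input;  /* rows x stride elements */
    int32_t *output; /* starts right after the input matrix */
    size_t stride;   /* row length in elements, padded to 16 bytes */
} dwt_buffers;

int dwt_buffers_alloc(dwt_buffers *b, size_t rows, size_t cols) {
    assert(rows > 0 && cols > 0);
    const size_t per_line = SIMD_ALIGN / sizeof(int32_t);    /* 4 elements */
    b->stride = (cols + per_line - 1) / per_line * per_line; /* pad rows */
    size_t bytes = 2 * rows * b->stride * sizeof(int32_t);
    /* aligned_alloc requires size to be a multiple of the alignment;
       this holds because every padded row is a multiple of 16 bytes. */
    b->base = aligned_alloc(SIMD_ALIGN, bytes);
    if (!b->base) return -1;
    b->input  = b->base;
    b->output = b->base + rows * b->stride;
    return 0;
}
```

With every row starting on a 16-byte boundary, the SIMD loops can use aligned loads and stores such as `_mm_load_si128` and `_mm_store_si128` instead of their unaligned variants.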
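The SSE2 code that "computes 4 pixels in parallel using fixed-point arithmetic" can be sketched as a vertical lifting step that updates four adjacent columns of a row at once, which targets the column transform that profiling flagged. Assumptions: 32-bit lanes (4 pixels per 128-bit register), the same hypothetical Q12 format, the names `lift_row` and `lift_row_scalar`, and a `_mm_madd_epi16` multiply trick that is only exact when each `above[j] + below[j]` fits in a signed 16-bit integer. A scalar fallback keeps the sketch compilable on non-x86 targets.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define FRAC_BITS 12
#define ROUND_HALF (1 << (FRAC_BITS - 1))

/* Scalar reference for one vertical lifting step on a row of n columns:
   row[j] += round(coef * (above[j] + below[j]) / 2^12).
   Assumes arithmetic >> on negative values (GCC/Clang behavior). */
static void lift_row_scalar(int32_t *row, const int32_t *above,
                            const int32_t *below, int n, int16_t coef) {
    for (int j = 0; j < n; j++)
        row[j] += ((above[j] + below[j]) * coef + ROUND_HALF) >> FRAC_BITS;
}

/* Same step, 4 columns per SSE2 instruction. SSE2 has no 32-bit lane
   multiply (_mm_mullo_epi32 is SSE4.1), so _mm_madd_epi16 is used: with
   the coefficient in the low 16 bits of each lane and 0 in the high 16
   bits, each lane yields low16(sum) * coef, exact if sum fits in int16. */
void lift_row(int32_t *row, const int32_t *above, const int32_t *below,
              int n, int16_t coef) {
    int j = 0;
#if defined(__SSE2__)
    const __m128i c = _mm_set1_epi32((int32_t)(uint16_t)coef);
    const __m128i half = _mm_set1_epi32(ROUND_HALF);
    for (; j + 4 <= n; j += 4) {
        /* Unaligned loads work anywhere; 16-byte-aligned rows would
           allow _mm_load_si128 instead. */
        __m128i a = _mm_loadu_si128((const __m128i *)(above + j));
        __m128i b = _mm_loadu_si128((const __m128i *)(below + j));
        __m128i r = _mm_loadu_si128((const __m128i *)(row + j));
        __m128i prod = _mm_madd_epi16(_mm_add_epi32(a, b), c);
        prod = _mm_srai_epi32(_mm_add_epi32(prod, half), FRAC_BITS);
        _mm_storeu_si128((__m128i *)(row + j), _mm_add_epi32(r, prod));
    }
#endif
    /* Scalar tail, or the whole row when SSE2 is unavailable. */
    lift_row_scalar(row + j, above + j, below + j, n - j, coef);
}
```

On SSE4.1 hardware, `_mm_mullo_epi32` would remove the 16-bit restriction on the sums.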
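The 32.5% figure on the "Results" slide is consistent with the relative reduction in cycles per uop:

```latex
\[
\frac{3.38 - 2.28}{3.38} = \frac{1.10}{3.38} \approx 0.325 = 32.5\,\%
\]
```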

