Hardware Acceleration of the Lifting Based DWT
Vidhu Niti Singh
Pritam Kulkarni

Background: The Lifting Scheme
- lambda: low-frequency components (scaling coefficients)
- gamma: high-frequency components (wavelet coefficients)

Advantages
- Does not rely on the Fourier transform, hence can be made even more efficient.
- Integer-to-integer transformation: maps integers to integers, eliminating floating-point operations. The process is reversible and lossless.
- Good for hardware implementation: a large amount of parallelism is available.
- All computations are done in place, i.e. no extra registers are needed to store the input. Registers are still needed to store the filter/lifting coefficients.

Motivation
- Parallel operations in predicting one
- Parallel operations in updating from one
- Reusing data in predicting one
- Reusing data in updating from one
- Parallel execution of 1-D predict and update phases
- Reusing data in 1-D predict and update phases

Parallelizing Predict Stage (1)
- All lambdas and filter coefficients are known beforehand, so the entire operation can be done in parallel.
- Lambdas used for consecutive stages are stored in a buffer to reduce the number of memory accesses.
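The buffering idea in the predict stage can be sketched in software. The example below is a minimal illustration, not the authors' hardware: it assumes a 4-tap integer predictor with Deslauriers-Dubuc weights (-1, 9, 9, -1)/16 and clamped borders, neither of which is stated in the slides, and the names `PREDICT_TAPS`, `predict_stage`, and `memory_reads` are hypothetical. The four products for each gamma are independent of one another, and a sliding window of lambdas means only one new lambda is fetched per gamma.

```python
from collections import deque

# Assumed 4-tap predict filter, scaled by 16 (the slides do not name the filter).
PREDICT_TAPS = (-1, 9, 9, -1)
SHIFT = 4  # divide by 16 with a right shift


def predict_stage(signal):
    """One integer predict step on an even-length list of integers.

    Returns (lambdas, gammas, memory_reads), where memory_reads counts how
    many lambdas were fetched into the sliding window.
    """
    evens = signal[0::2]  # lambdas (low-frequency samples)
    odds = signal[1::2]   # samples to be predicted
    n = len(evens)

    def even_at(i):
        # Clamp out-of-range indices to the border (assumed boundary rule).
        return evens[min(max(i, 0), n - 1)]

    # Preload three lambdas; the fourth arrives inside the loop.
    window = deque((even_at(i) for i in range(-1, 2)), maxlen=4)
    memory_reads = 3
    gammas = []
    for k, odd in enumerate(odds):
        window.append(even_at(k + 2))  # the only new fetch for this gamma
        memory_reads += 1
        # Four independent multiplications: no product depends on another,
        # so hardware could issue them in the same clock cycle.
        products = [c * x for c, x in zip(PREDICT_TAPS, window)]
        prediction = (sum(products) + (1 << (SHIFT - 1))) >> SHIFT
        gammas.append(odd - prediction)
    return evens, gammas, memory_reads
```

On a linear ramp such as `list(range(16))` the interior gammas come out as zero, since this predictor reproduces linear signals exactly; only the clamped right border leaves a nonzero residual.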

Parallelizing Predict Stage (2)
- The filter coefficients can be stored in RAM/registers in such a manner that they can be accessed in parallel.

Predict Module

Accelerating the update stage
- In the update stage we know all the lambdas and lifting coefficients. Hence there is no data dependency within one cycle, and the entire cycle can be done in parallel.

Update Module

Pipelining the predict and update stages
- A FIFO buffer is needed to accommodate the different rates of production of gammas.

Work Done
- Wrote the code (MATLAB and C) for the predict and update modules for both the forward and the inverse wavelet transform based on the lifting scheme.
- Designed the hardware shown above to help in the parallel implementation of the DWT.
- Completed coding of the modules in Verilog HDL.
- Synthesized the modules using Synopsys Design Compiler.

Results
Predict stage on a sequential machine needs:
- 16 memory accesses
- 4 multiplications
- 4 additions/subtractions
The new scheme needs:
- 1 memory access
- 1 clock cycle for all the multiplications
- 1 clock cycle for all the additions/subtractions
Exactly the same results hold for the update stage.
Apart from parallelism within the stages, this scheme allows pipelining of the two stages.
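The reversibility of the integer lifting scheme, and the way the update stage consumes the gammas produced by the predict stage, can be illustrated with a short round trip. This is a sketch only: it swaps in the well-known reversible 5/3 lifting steps as an assumed filter (the slides do not specify one), uses clamped borders, and expects an even-length input. The function names `forward_lifting` and `inverse_lifting` are hypothetical.

```python
def forward_lifting(signal):
    """One level of an integer lifting transform: predict, then update."""
    evens = list(signal[0::2])
    odds = list(signal[1::2])
    n = len(evens)
    # Predict: every gamma depends only on input lambdas, so all of them
    # could be computed in parallel.
    gammas = [
        odds[k] - ((evens[k] + evens[min(k + 1, n - 1)]) >> 1)
        for k in range(n)
    ]
    # Update: every new lambda depends only on already-produced gammas,
    # which is why a buffer between the two stages allows pipelining.
    lambdas = [
        evens[k] + ((gammas[max(k - 1, 0)] + gammas[k] + 2) >> 2)
        for k in range(n)
    ]
    return lambdas, gammas


def inverse_lifting(lambdas, gammas):
    """Undo forward_lifting by running the same steps in reverse order."""
    n = len(lambdas)
    # Undo update: subtract exactly what the forward update added.
    evens = [
        lambdas[k] - ((gammas[max(k - 1, 0)] + gammas[k] + 2) >> 2)
        for k in range(n)
    ]
    # Undo predict: add back exactly what the forward predict subtracted.
    odds = [
        gammas[k] + ((evens[k] + evens[min(k + 1, n - 1)]) >> 1)
        for k in range(n)
    ]
    signal = [0] * (2 * n)
    signal[0::2] = evens
    signal[1::2] = odds
    return signal
```

Because the inverse recomputes the same rounded integer terms and applies them with the opposite sign, the reconstruction is exact for any integer input, which is the lossless property listed under Advantages.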