UW-Madison ECE 734 - Implementation of JPEG 2000 Component Algorithm—DWT in TI TMS32060

This preview shows page 1-2-3-4-5 out of 14 pages.



Slide titles

  ECE 734 VLSI Array Structures for Digital Signal Processing
  Agenda
  Abstract
  C code Implementation
  Assembly Code without any optimization
  Assembly Code with speed optimization
  Speed optimized code analysis
  Assembly Code with pipeline optimization
  Pipeline optimized code design
  Comparison
  Slide 11
  Slide 12
  Slide 13
  Thanks!


ECE 734 VLSI Array Structures for Digital Signal Processing

Topic: Implementation of JPEG 2000 component algorithm (DWT) in TI TMS32060
Team Members: Peng Zhang and Xun Zhang
Advisor: Yu Hen Hu
Spring 2004


Agenda

  Abstract
  DWT C implementation
  DWT TMS320 C62 Assembly Code
    Without optimization
    Speed optimization
    Pipeline optimization (by us)
  Result comparison
  JPEG 2000 and DWT (if we have free time)


Abstract

In this project we implement and optimize the DWT algorithm, a key algorithm in JPEG 2000, on the TI TMS320C62 platform.

  Step 1: we implemented the 2D DWT algorithm in C.
  Step 2: we built the 2D DWT for the TI TMS320C62 platform twice, once without any optimization and once with the fastest speed optimization.
  Step 3: we applied advanced optimization to the assembly code, mainly pipelining.
  Step 4: we compared the performance before and after our optimization.


C code Implementation

    ...
    #define S(i) a[x*(i)*2]
    ...
    void dwt_deinterleave(int *a, int n, int x) {
        int dn, sn, i;
        int *b;
        dn=n/2;
        sn=(n+1)/2;
        b=(int*)malloc(n*sizeof(int));
        for (i=0; i<sn; i++)
            b[i]=a[2*i*x];
        ...
    }

    /// Forward wavelet transform in 1-D.
    void dwt_encode_1(int *a, int n, int x)
    {
        ...
        dwt_deinterleave(a, n, x);
    }

    /// Forward wavelet transform in 2-D.
    void dwt_encode(int *a, int w, int h, int l)
    {
        int i, j, rw, rh;
        for (i=0; i<l; i++) {
            rw=int_ceildivpow2(w, i);
            rh=int_ceildivpow2(h, i);
            for (j=0; j<rw; j++)
                dwt_encode_1(a+j, rh, w);
            ...
        }
    }

    void main()
    {
        ...
        dwt_encode(image[0], 200, 165, 8);
        ...
    }


Assembly Code without any optimization

    ;----------------------------------------------------------------------
    ;  24 | void dwt_deinterleave(int *a, int n, int x)
    ;----------------------------------------------------------------------
    _dwt_deinterleave:
    ;** --------------------------------------------------------------------------*
    ...
    ;----------------------------------------------------------------------
    ;  31 | for (i=0; i<sn; i++)
    ;----------------------------------------------------------------------
               ZERO    .D2     B4                ; |31|
               STW     .D2T2   B4,*+SP(24)       ; |31|
               LDW     .D2T2   *+SP(24),B5       ; |31|
               LDW     .D2T2   *+SP(20),B4       ; |31|
               NOP             4
               CMPLT   .L2     B5,B4,B0          ; |31|
       [!B0]   B       .S1     L2                ; |31|
               NOP             5
               ; BRANCH OCCURS                   ; |31|
    L1:
               .line   9
    ;  32 | b[i]=a[2*i*x];
    ;----------------------------------------------------------------------
               LDW     .D2T2   *+SP(24),B4       ; |32|
               LDW     .D2T2   *+SP(12),B5       ; |32|
               LDW     .D2T2   *+SP(4),B6        ; |32|
               NOP             2
               ADD     .D2     B4,B4,B4
               MPYLH   .M2     B5,B4,B8          ; |32|
               MPYLH   .M2     B4,B5,B7          ; |32|
               MPYU    .M2     B5,B4,B5          ; |32|
               ADD     .D2     B8,B7,B4          ; |32|
               SHL     .S2     B4,16,B4          ; |32|
               ADD     .S2     B5,B4,B4          ; |32|
    ||         LDW     .D2T2   *+SP(28),B7       ; |32|
               LDW     .D2T2   *+B6[B4],B4       ; |32|
               LDW     .D2T2   *+SP(24),B5       ; |32|
               NOP             4
               STW     .D2T2   B4,*+B7[B5]       ; |32|
               LDW     .D2T2   *+SP(24),B4       ; |32|
               NOP             4
               ADD     .D2     1,B4,B4           ; |32|
               STW     .D2T2   B4,*+SP(24)       ; |32|
               LDW     .D2T2   *+SP(24),B5       ; |32|
               LDW     .D2T2   *+SP(20),B4       ; |32|
               NOP             4
               CMPLT   .L2     B5,B4,B0          ; |32|
       [ B0]   B       .S1     L1                ; |32|
               NOP             5
               ; BRANCH OCCURS                   ; |32|
    ;----------------------------------------------------------------------
    ...


Assembly Code with speed optimization

    _dwt_deinterleave:
    ...
    ;** ------------------------------------------------------------------------
    ||         MV      .D2     B4,B11
               .line   5
               MV      .D2     B11,B0            ; |28|
               SHRU    .S2     B0,31,B4          ; |28|
               ADD     .D2     B4,B0,B4          ; |28|
               SHR     .S2     B4,1,B0           ; |28|
               MV      .D2     B0,B12            ; |28|
               .line   6
               ADD     .D2     1,B11,B10         ; |29|
               SHRU    .S2     B10,31,B4         ; |29|
               ADD     .D2     B4,B10,B4         ; |29|
               SHR     .S2     B4,1,B4           ; |29|
               MV      .S1X    B4,A12            ; |29|
               .line   7
               B       .S1     _malloc           ; |30|
               MVKL    .S2     RL0,B3            ; |30|
               SHL     .S1X    B11,2,A4          ; |30|
               MVKH    .S2     RL0,B3            ; |30|
               NOP             2
    RL0:       ; CALL OCCURS                     ; |30|
               .line   8
               CMPLT   .L2     B10,2,B0
       [ B0]   B       .S1     L2                ; |31|
               MV      .D2     B10,B4
       [!B0]   MV      .D1     A4,A3
       [!B0]   MV      .S1     A10,A0
               NOP             2
               ; BRANCH OCCURS                   ; |31|
    ;** --------------------------------------------------------------------------*
    ;**    -----------------------    U$22 = a;
    ;**    -----------------------    U$25 = b;
    ;** 32 -----------------------    L$1 = K$7>>1;
    ;**    -----------------------    X$4 = x<<3;
    ;**    -----------------------    #pragma MUST_ITERATE(1, 1073741823, 1)
               .line   9
               SHR     .S2     B4,1,B0           ; |32|
    ||         SHL     .S1     A11,3,A6
    ;**    -----------------------g3:
    ;** 32 -----------------------    *U$25++ = *U$22;
    ;** 32 -----------------------    U$22 += X$4;
    ;** 32 -----------------------    if ( --L$1 ) goto g3;
               SUB     .D2     B0,1,B0           ; |32|
    L1:
       [ B0]   B       .S1     L1                ; |32|
    ||         LDW     .D1T1   *A0,A5            ; |32|
               ADD     .S1     A6,A0,A0          ; |32|
       [ B0]   SUB     .D2     B0,1,B0           ; |32|
               NOP             2
               STW     .D1T1   A5,*A3++          ; |32|
               ; BRANCH OCCURS                   ; |32|
    ;** -----------------------------------------------------------------------*
    ...


Speed optimized code analysis

    for (i=0; i<sn; i++)

