Berkeley COMPSCI 252 - A Comparison of the VIRAM-1 and Embedded VLIW architectures



A Comparison of the VIRAM-1 and Embedded VLIW architectures for use on SVD
CS 252, Spring 2000
Jeff Herman, John Loo, Xiaoyi Tang

Motivation
• SVD Applications
  – Smart antennas
  – Image processing
  – Medical imaging
• VLIW
  – Trend in high performance embedded computing
• Vector
  – Out of favor
  – Flynn bottleneck is a limiting factor in parallelism
  – Known for linear algebra performance

C67 Architecture (mapped)
[Figure: block diagram. Labels: Instruction RAM (cache optional); Data RAM (>4 banks); Decode Logic (8-way); A Register File; B Register File; functional units L1, S1, M1, D1, D2, M2, S2, L2]

C67 Architecture
• Split Register Files
  – 16 registers per register file
  – One cross path per register file
• Instruction Latencies
  – Branches: 6 cycles
  – Load: 5 cycles
  – FP add/multiply: 4 cycles

TM 1100 VLIW Processor Core Architecture
• 5-issue VLIW
• 2 FP adders/multipliers
• 2 Load/Store Units
• 128 general-purpose 32-bit registers
• 16 KB data cache, 32 KB instruction cache
• Instruction Latencies
  – 3 cycles for Branches, Load, FP add/multiply

VIRAM-1 Microarchitecture
• 2-way-issue superscalar MIPS IV core
• Asynchronous vector unit
  – Communication to scalar core through queue
  – 32 general-purpose vector and flag registers
  – 32 scalar and control registers
  – 2 VAFU, 2 FFU, 1 VMFU
  – 4-lane standard configuration

VIRAM-1 Microarchitecture
[Figure: content not recoverable]

Testing Conditions
• SVD routine from CLAPACK
• Random test matrices with a rank of 10
• Matrix dimension ratio of 10
• Sizes range from 100x10 to 300x30
• Suboptimal parameters used
  – Trends should still hold
• Assumed 200 MHz clock rate

Columns vs. Cycles
[Figure: x-axis: Columns (10 to 30); y-axis: Cycles in millions (0 to 35). Series: TI 'C67 Ideal, TI 'C6711 Cache, TM1100 Cache, TM1100 Ideal, IRAM (4-lane), IRAM (16-lane)]

Ideal 'C67 and TM 1100 Performance Gap
• Same memory bottlenecks in both processors
• Programming model
• C67
  – Assembly coded kernels
  – 1700 lines
• TM 1100
  – Only C level optimizations

VIRAM-1 Vector Core Scalability
[Figure: x-axis: Lane Count (1, 2, 4, 8, 16); y-axis: Gain vs. standard MIPS Core (0 to 10). Series: 100X10, 150X15, 200X20, 250X25, 300X30]

Utilization of Vector Core
[Figure: x-axis: Lane Count (1, 2, 4, 8, 16); y-axis: sustained/peak bandwidth (0 to 1). Series: 100X10, 150X15, 200X20, 250X25, 300X30]

VIRAM Performance Summary
• Gains from vector unit limited by Amdahl's law.
  – Vector instructions comprise only ~15% of total code.
  – Not much else of SVD can be vectorized.
  – Gains limited by what cannot be vectorized.
  – Perhaps streamline LAPACK or handcode assembly?
• Sub-linear scalability.
  – Scaling IRAM is cheap but gains diminish.
  – Efficiency and scalability increase with size of data set.

Concluding Remarks
• The two architectures have different limitations
  – VIRAM: Scalar core
  – VLIW: Memory bandwidth
• VLIW cannot match performance of VIRAM when computing SVD.
• VLIW with vector
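
The Amdahl's law point in the VIRAM Performance Summary can be illustrated with a short calculation. The sketch below is not taken from the slides. It assumes that a fraction of scalar run time is vectorizable and that the vector unit speeds that fraction up by a factor equal to the lane count, which is an idealization. The function names and the sample fractions are hypothetical. Note that the slides' ~15% figure counts instructions rather than run time, so it should not be plugged in directly as the fraction.

```python
def amdahl_speedup(vector_fraction: float, vector_gain: float) -> float:
    """Overall speedup when `vector_fraction` of the original run time
    is accelerated by a factor of `vector_gain` (Amdahl's law)."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if vector_gain <= 0:
        raise ValueError("vector_gain must be positive")
    serial_part = 1.0 - vector_fraction
    return 1.0 / (serial_part + vector_fraction / vector_gain)


def amdahl_limit(vector_fraction: float) -> float:
    """Upper bound on the speedup as vector_gain grows without bound."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if vector_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - vector_fraction)


if __name__ == "__main__":
    # ASSUMPTION: illustrative run-time fractions, not measurements from the slides.
    # ASSUMPTION: vector gain equals the lane count (ideal linear scaling).
    for fraction in (0.50, 0.75, 0.90):
        per_lane = ", ".join(
            f"{lanes} lanes: {amdahl_speedup(fraction, lanes):.2f}x"
            for lanes in (1, 2, 4, 8, 16)
        )
        print(f"vectorizable fraction {fraction:.2f} -> {per_lane} "
              f"(limit {amdahl_limit(fraction):.2f}x)")
```

Under these assumptions, each doubling of the lane count yields a smaller improvement than the previous one, and the speedup never exceeds 1 / (1 - fraction). That is one way to read the "sub-linear scalability" and "gains limited by what cannot be vectorized" bullets, although the measured curves also depend on memory bandwidth and vector length, which this model ignores.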

