CSUN COMP 546 - Architectures for Low Power

Lecture 12
Architectures for Low Power: Transmeta's Crusoe Processor

Motivation
- Exponential performance increase at a low cost.
- However, for some application areas low power consumption is more important than performance:
  - Mobile communications
  - Mobile computing
  - Wireless Internet
  - Medical implants
  - Deep space applications
- Battery lifetime.

Designing for Low Power: Approaches
- Trading area/performance for power
  - Trading performance for power: power can be reduced by decreasing the supply voltage and allowing the performance to degrade.
  - Trading area for power: but these techniques incur an area penalty.
- Avoiding waste
  - Waste to avoid: clocking modules when they are idle; glitching.
  - Using dedicated rather than programmable hardware.
  - Reducing control overhead by using regular algorithms and architectures.
  - Designing systems to meet performance requirements.
- Exploiting locality
  - Global operations inherently consume a lot of power: data must be transferred from one part of the chip to another at the expense of switching large bus capacitances.
  - A design partitioned to exploit locality of reference can minimize the amount of expensive global communication in favor of much less costly local interconnect networks.

Crusoe Family of Processors from Transmeta
- Introduction
- Software: Code Morphing
- Hardware: VLIW core
- Performance
- Applications

The Idea – David Ditzel
- Champion of simple chip architecture.
- 1995: Chief Technical Officer of Sun Microsystems Inc.'s SPARC business.
- Working on emulation of x86 software on SPARC processors.
- Early 1995: left Sun and worked on his own idea.
- Was not happy with the complexity of recent architectures.
- Some new ideas mixed with some old ideas to build a simple and fast architecture capable of running x86 code.
- Software-hardware hybrid.

The Company - Transmeta
- Ditzel and Colin Hunter chose the name Transmeta, and the company was formed in summer of 1995.
- Used contacts in the industry to recruit top brains for the ideas.
- Design started in the living rooms of the founders' homes.
- Now employs many people.

Innovation
Transmeta Crusoe chip:
- x86 emulation
- Very Long Instruction Word (VLIW)
- Code Morphing
- Simple architecture
- LongRun technology
- Virtual devices
- Low power

Introducing a Software Layer

Software: Code Morphing
- Performs dynamic binary translation.
- Compiles instructions from one instruction set architecture (ISA) to another ISA.

Code Morphing
x86 binary code -> Code Morphing Software -> VLIW binary code

Decoding and Scheduling
- Code morphing translates an entire group of x86 instructions at once and stores the translation in a translation cache for future reference.
- Conventional x86 superscalar processors fetch binary instructions and decode them into separate micro-operations, which the hardware then reorders and executes in parallel.
- The translation step introduces many opportunities:
  - Because code is re-executed at high rates, translations are frequently reused from the translation cache, which reduces translation overhead.
  - Can use much more sophisticated scheduling algorithms.
  - Much lower power consumption because translation is all in software.
  - Can optimize generated code and, by 'learning' which parts are executed often, can change the level of optimization dynamically.

Instruction Set Emulation
- Emulation is traditionally slow because of the way different ISAs handle condition codes and exceptions.
- Crusoe uses specific registers to emulate the setting of condition codes by the processor (a .c suffix after the instruction shows that condition codes need to be set).
- Exceptions are handled by using shadow registers and a procedure called "commit and rollback".

Translation by code morphing software

Translation Step 1
Original x86 code:
    Addl %eax, (%esp)
    Addl %ebx, (%esp)
    Movl %esi, (%ebp)
    Subl %ecx, 5
Native VLIW code:
    Ld %r30, [%esp]
    Add.c %eax, %eax, %r30
    Ld %r31, [%esp]
    Add.c %ebx, %ebx, %r31
    Ld %esi, [%ebp]
    Sub.c %ecx, %ecx, 5

Translation Step 2
Optimization: elimination of redundant atoms and of unneeded condition code settings. The input is the native VLIW code from Step 1.
Optimized native VLIW code:
    Ld %r30, [%esp]
    Add %eax, %eax, %r30
    Add %ebx, %ebx, %r30
    Ld %esi, [%ebp]
    Sub.c %ecx, %ecx, 5

Translation Step 3
Scheduling: the remaining atoms are packed into molecules using a large window. The input is the optimized native VLIW code from Step 2.
Scheduled native VLIW code:
    1. Ld %r30, [%esp]; Sub.c %ecx, %ecx, 5
    2. Ld %esi, [%ebp]; Add %eax, %eax, %r30; Add %ebx, %ebx, %r30

Software's Edge
- Molecules explicitly encode the instruction-level parallelism, so they can be executed by a simple VLIW engine. The hardware doesn't need to perform complex instruction reordering.
- Simplicity means a fast, low-power design.
- Processor upgrades are simplified.
  - The software layer means that software developers don't have to recompile programs.
  - A new hardware architecture only needs new code morphing software from Transmeta.
- Code morphing software can be upgraded independently in flash ROM.
- The software layer helps the debugging process: there are different ways to perform the same function, so the software can be changed during debugging.
- The software layer increases performance.
  - Timing of critical paths is improved.
  - Optimization is applied to remove unnecessary instructions.
  - Reordering in software can be done much better than in hardware by looking at a bigger window of instructions and applying more complicated algorithms.
- Several ISAs: instruction sets can be mixed with ease because they are all emulated by the software.

Hardware

Chip Simplifications
- No superscalar decode, grouping or issue logic.
- No register renaming or segmentation hardware.
- No floating point stack hardware.
- No front end memory management.
- Less interlock and bypassing logic.

Hardware Specifications
- 128-bit high-performance VLIW engine
- 2 integer units (ALUs)
- Floating point unit
- Memory unit
- Branch unit

Code Morphing Hardware Support
- Handling exceptions by shadowing.
- Commit and rollback.
- Gated store buffer.
- Aliasing hardware.
- Protection for self-modifying code.
- LongRun technology.

TM5400

