
Scheduling Reusable Instructions for Power Reduction





J. S. Hu, N. Vijaykrishnan, S. Kim, M. Kandemir, and M. J. Irwin
Microsystems Design Lab
The Pennsylvania State University
[email protected]
4/4/2004

Power: StrongARM SA-110

- Power dissipation
    ICache         27%
    IBox           18%
    DCache         16%
    Clock          10%
    IMMU            9%
    EBox            8%
    DMMU            8%
    Write Buffer    2%
    Bus Ctrl        2%
    PLL           < 1%

[Figure: Die of DEC StrongARM]

Related Work

- Stage-skip pipeline
    A small decoded instruction buffer [1][2]
- Loop caches
    Dynamic/preloaded/hybrid loop caches [3][4][5]
- Filter cache
    Filter dcache [6], decode filter cache [7]

[1] M. Hiraki et al. Stage-skip pipeline: A low power processor architecture using a decoded instruction buffer. In Proc. International Symposium on Low Power Electronics and Design, 1996.
[2] R. S. Bajwa et al. Instruction buffering to reduce power in processors for signal processing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 5(4):417–424, December 1997.
[3] L. H. Lee, B. Moyer, and J. Arends. Instruction fetch energy reduction using loop caches for embedded applications with small tight loops. In Proc. International Symposium on Low Power Electronics and Design, 1999.
[4] T. Anderson and S. Agarwala. Effective hardware-based two-way loop cache for high performance low power processors. In IEEE Int’l Conf. on Computer Design, 2000.
[5] A. Gordon-Ross, S. Cotterell, and F. Vahid. Exploiting fixed programs in embedded systems: A loop cache example. IEEE Computer Architecture Letters, 2002.
[6] J. Kin et al. The filter cache: An energy efficient memory structure. In Proc. International Symposium on Microarchitecture, 1997.
[7] W. Tang, R. Gupta, and A. Nicolau. Power savings in embedded processors through decode filter cache. In Proc. Design, Automation and Test in Europe Conference, 2002.

Our Proposed Approach

- Scheduling reusable loop instructions within the issue queue
    No need for an additional instruction buffer
    Utilizes the existing issue queue resources
    Able to gate the front end of the pipeline
    Automatically unrolls loops in the issue queue
    No ISA modification

Embedded Processor based on MIPS Core

[Figure: (a) datapath block diagram; labels include Inst. Cache, Inst. Decoder, Register Map, Issue Queue, Reorder Buffer (ROB), Load/Store Queue, Register File, Int Function Units, FP Function Units, Data Cache, Add calc, Resource. (b) pipeline stages; labels include Fetch, Decode, Rename, Queue, Issue, Reg Read, Execute, DcacheAcc, WriteBack, Commit.]

Schedule Reusable Instructions

[Figure: MIPS assembly for a nested loop, with the innermost loop marked bufferable and the outer loop marked non-bufferable. Instructions appearing in the figure:
    slti r2, r24, 499
    addiu r24, r24, 1
    addiu r5, r5, 2000
    addiu r6, r6, 2000
    slti r2, r22, 499
    addiu r3, r3, 4
    addiu r4, r4, 4
    addiu r22, r22, 1
    sw r2, 0(r4)
    subu r2, r24, 422
    sw r2, 0(r3)
    addu r3, r0, r5
    addu r4, r0, r6
    beq r20, r0, 0x4002e8
    addiu r20, r0, 499
    addu r22, r0, r0
    bne r2, r0, 0x4002a0
    addu r2, r24, r22
    bne r2, r0, 0x400278]

- Array-intensive embedded applications
- Utilizing issue queue
- Reusable instructions – innermost loops
- Self-streaming issue queue
- Gate front-end of the datapath
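
The idea of reusing innermost-loop instructions from the issue queue while the front end is gated can be illustrated with a small trace-driven model. This is only a sketch under stated assumptions, not the hardware mechanism from the slides: the function name `simulate`, the three states, and the loop-detection rule (a taken backward branch whose body fits in the queue) are all hypothetical, and loop unrolling inside the queue is not modeled.

```python
def simulate(trace, queue_size, inst_bytes=4):
    """Count front-end fetches vs. issue-queue reuses for a dynamic PC trace.

    Hypothetical model (an assumption, not the authors' design):
      NORMAL    - instructions come from the instruction cache
      BUFFERING - a candidate loop was seen; its next iteration is still
                  fetched normally while being retained in the issue queue
      REUSE     - the loop is supplied from the issue queue; the front end
                  is considered gated, so nothing is fetched
    """
    state = "NORMAL"
    loop_head = loop_tail = None
    fetched = reused = 0

    for i, pc in enumerate(trace):
        # Leaving the captured loop body ends buffering/reuse.
        if state != "NORMAL" and not (loop_head <= pc <= loop_tail):
            state = "NORMAL"

        if state == "REUSE":
            reused += 1      # supplied by the issue queue
        else:
            fetched += 1     # supplied by the front end

        nxt = trace[i + 1] if i + 1 < len(trace) else None
        if nxt is not None and nxt < pc:  # taken backward branch
            if state == "NORMAL":
                body_len = (pc - nxt) // inst_bytes + 1
                if body_len <= queue_size:  # loop fits in the queue
                    loop_head, loop_tail = nxt, pc
                    state = "BUFFERING"
            elif state == "BUFFERING" and pc == loop_tail and nxt == loop_head:
                state = "REUSE"

    return fetched, reused


# A 4-instruction loop executed 5 times, followed by one exit instruction.
loop_body = [0x100, 0x104, 0x108, 0x10C]
demo_trace = loop_body * 5 + [0x110]
print(simulate(demo_trace, queue_size=8))
```

In this toy model the first iteration detects the loop, the second fills the queue, and the remaining iterations are counted as reused. A loop body larger than `queue_size` is never captured, which corresponds to the non-bufferable case.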





