The CAE Architecture: Decoupled Program Control for Complexity-Effective Performance.

Ronny Krashinsky and Mike Sung
6.893 Project Report (checkpoint 1)
MIT Laboratory for Computer Science, Cambridge, MA 02139
fronny,[email protected]

Superscalar architectures will not scale into the next era of computer architecture. Their design is based on structures with a high degree of connectivity that will not be available in future chips in which a clock cycle covers a tiny fraction of the area. Their performance is based on reckless speculation that will not be tolerated in future complexity-effective designs. The processor of the future is composed of many decoupled elements working independently but in collaboration. As went the supercomputers, so will the superprocessors.

A promising next-generation architecture has been demonstrated in decoupled access/execute machines. These processors have split apart the memory access and execution portions of a program, and thus have immediately exposed a large amount of ILP. By allowing these streams to slip relative to each other, these machines enjoy the benefits of out-of-order execution and memory latency hiding with very little overhead. Additionally, the queues which connect these decoupled elements together provide the benefits of register renaming without the complexity required in superscalar architectures.

This work presents decoupled control flow, the next step which will enable processors of the future to reach new levels of performance. In a decoupled control/access/execute (CAE) machine, a control processor runs ahead and feeds directives to the memory access processor and the main execution processor; the directives are in the form of commands to execute basic blocks. The execution engine is then responsible for processing streams of valid instructions and data values, obtained without the overhead of speculation.
This is a fundamental departure from the model in which an execution engine must actively fetch instructions and data values, or speculate to hide latency. As a result, new levels of performance are obtainable.

1 Introduction

2 CAE Architecture (TRS)
2.1 Queue Communication
2.2 Control Processor
2.3 Access Processor
2.4 Execute Processor
2.4.1 Caches/Queues for Streaming Instructions
2.4.2 Fast Streaming Engines

3 CAE Programming

4 CAE Performance
4.1 Livermore Loops
4.2 Streaming Media

5 CAE Analysis
5.1 Complexity
5.2 Comparison to Superscalar
5.3 Comparison to DSPs

6 CAE Extendibility
6.1 Tiled CAE processors

7 Conclusion

References

[1] Wm. A. Wulf. Evaluation of the WM computer architecture. journal, 0.
[2] Wm. A. Wulf. The WM computer architecture. Computer Architecture News, 16(1):???, March 1988.
[3] E. Rotenberg et al. A study of control independence in superscalar processors. journal, 0.
[4] E. Rotenberg et al. Trace processors. journal, 0.
[5] James E. Smith et al. The Astronautics ZS-1 processor. journal, 0.
[6] M. Farrens, P. Ng, and P. Nico. A comparison of superscalar and decoupled access/execute architectures. journal, 0.
[7] L. Gwennap. MIPS R10000 uses decoupled architecture. journal, 0.
[8] P. T. Hulina, L. Kurian, E. B. John, and L. D. Coraor. Design and VLSI implementation of an access processor for a decoupled architecture. journal, 0.
[9] L. K. John, A. Subramanian, P. T. Hulina, and L. D. Coraor. Improving the parallelism and concurrency in decoupled architectures. journal, 0.
[10] L. Kurian, P. T. Hulina, and L. D. Coraor. Memory latency effects in decoupled architectures. journal, 0.
[11] J. E. Smith. Dynamic instruction scheduling and the Astronautics ZS-1. IEEE Computer, 22(7):21–35, July 1989.
[12] James E. Smith. Decoupled access/execute computer architecture. In ISCA 9, 1982.
[13] J. Tubella and A. Gonzalez. Control speculation in multithreaded processors through dynamic loop detection. journal, 0.
[14] G. Tyson and M. Farrens. Code scheduling for multiple instruction stream architectures. journal, 0.
[15] G. Tyson, M. Farrens, and A. Pleszkun. MISC: A multiple instruction stream computer.
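
As an informal illustration of the control/access/execute organization outlined in the abstract, the following Python sketch models the three processors as functions connected by FIFO queues. It is a toy model under stated assumptions, not the report's design: the function names, the "BODY"/"EXIT" directive encoding, and the dot-product workload are all hypothetical, and each processor is simply run to completion in turn, which corresponds to the extreme case where the control processor has slipped arbitrarily far ahead.

```python
from collections import deque

# Toy model of a decoupled control/access/execute pipeline.
# All names and the directive format are hypothetical illustrations.

def control_processor(n, access_q, execute_q):
    """Resolve the loop's control flow and emit one directive per basic block."""
    for i in range(n):
        access_q.append(("BODY", i))  # which iteration's loads to issue
        execute_q.append("BODY")      # which basic block to execute
    access_q.append(("EXIT", None))
    execute_q.append("EXIT")

def access_processor(a, b, access_q, data_q):
    """Follow directives and stream loaded operands into the data queue."""
    while True:
        block, i = access_q.popleft()
        if block == "EXIT":
            break
        data_q.append(a[i])
        data_q.append(b[i])

def execute_processor(execute_q, data_q):
    """Consume directives and operands; no address generation, no branching on data."""
    acc = 0
    while True:
        block = execute_q.popleft()
        if block == "EXIT":
            return acc
        x = data_q.popleft()
        y = data_q.popleft()
        acc += x * y

def run_dot_product(a, b):
    """Run control, then access, then execute, communicating only through queues."""
    access_q, execute_q, data_q = deque(), deque(), deque()
    control_processor(len(a), access_q, execute_q)
    access_processor(a, b, access_q, data_q)
    return execute_processor(execute_q, data_q)

if __name__ == "__main__":
    print(run_dot_product([1, 2, 3], [4, 5, 6]))
```

In this sketch the execute function never computes an address or evaluates a loop condition; it only dequeues directives and operands, which is the property the abstract attributes to the execution engine. A more faithful model would bound the queue depths and interleave the three processors cycle by cycle.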