Stanford EE 392C - Lecture #7 - Polymorphic Architectures I

EE392C: Advanced Topics in Computer Architecture
Polymorphic Architectures I
Lecture #7: Tuesday, April 22, 2003
Lecturer: Jing Jiang and Honggo Wijaya
Scribe: Chi Ho Yue and Rohit Gupta
Stanford University

We are entering an era of ubiquitous computing. As technology scales, more and more applications demand ever-growing performance, yet design complexity grows as well. In addition, high non-recurring fabrication costs and manufacturing delays demand that chips be sold in large volumes, and thus target a larger market, to be cost-effective. How can we achieve performance comparable to customized solutions in a single chip design? The answer is polymorphous architectures.

One thing is for sure: interconnect is going to be a big issue, and therefore any such architecture needs to be scalable in terms of wires. We will look at two particular solutions today, Smart Memories and TRIPS.

1 Paper 1: The TRIPS Multiprocessor

The TRIPS processor consists of four out-of-order, 16-wide-issue Grid Processor cores, which can be partitioned to exploit different types of parallelism. It uses a software scheduler to optimize for point-to-point communication.

TRIPS is a block-oriented system in all modes of operation, built around hyperblocks. Programs are compiled into large blocks of instructions with a single entry point, no internal loops, and possibly multiple exit points. Each block has a set of state inputs and a potentially variable set of state outputs that depend upon the exit point taken from the block. The compiler is responsible for statically scheduling each block of instructions onto the computation engine.

Each node of the grid processor consists of an integer ALU, a floating-point unit, and a set of reservation stations. Each node can forward a result to any of the operands in the local or remote reservation stations within the grid.

The TRIPS processor has the following resources to achieve configurability. First is the frame space: the reservation stations with the same index across all nodes. Next are the register file banks, which are used for speculation, multithreading, etc., depending on the mode of operation. The block sequencing controls have various policies for the different modes; for example, the deallocation logic may be configured to allow a block to execute more than once, as is useful in streaming applications. Finally, the memory tiles can be configured as scratchpad memory, synchronization buffers, etc.

The strength of this paper is that the processor can deal with a mixed load of parallelism, or at least it claims to. However, the performance numbers assume a perfect memory, which is not usually the case in the real world. The overhead of the speculation hardware cannot be underestimated either.
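To make the hyperblock abstraction concrete, the following is a minimal C sketch of the kind of data structure a compiler might use to represent a block and statically place its instructions onto the grid. The struct fields, grid size, and round-robin placement policy are illustrative assumptions, not details from the paper; the real scheduler places instructions so as to minimize point-to-point operand routing.

    #include <stdio.h>

    /* Hypothetical, simplified model of a TRIPS-style hyperblock: a
     * single-entry, multiple-exit block whose instructions are statically
     * placed onto a grid of ALU nodes by the compiler. Names and sizes
     * are illustrative, not taken from the paper. */

    #define GRID_DIM    4   /* 4x4 grid of ALU nodes (illustrative) */
    #define MAX_INSTRS 64

    typedef struct {
        int opcode;        /* what the instruction does (abstracted away) */
        int row, col;      /* grid node the compiler assigned it to */
        int frame_slot;    /* reservation-station index ("frame space") */
    } Instr;

    typedef struct {
        Instr instrs[MAX_INSTRS];
        int   n_instrs;
        int   n_exits;     /* hyperblocks may have multiple exit points */
        int   n_state_in;  /* live-in state for the block */
        int   n_state_out; /* live-outs; may vary with the exit taken */
    } Hyperblock;

    /* Toy static scheduler: place instructions onto grid nodes round-robin.
     * A real compiler would instead minimize the routing distance between
     * producer and consumer nodes. */
    static void schedule_block(Hyperblock *hb)
    {
        for (int i = 0; i < hb->n_instrs; i++) {
            hb->instrs[i].row = (i / GRID_DIM) % GRID_DIM;
            hb->instrs[i].col = i % GRID_DIM;
            hb->instrs[i].frame_slot = i / (GRID_DIM * GRID_DIM);
        }
    }

    int main(void)
    {
        Hyperblock hb = { .n_instrs = 20, .n_exits = 2,
                          .n_state_in = 4, .n_state_out = 3 };
        schedule_block(&hb);
        for (int i = 0; i < hb.n_instrs; i++)
            printf("instr %2d -> node (%d,%d), frame slot %d\n", i,
                   hb.instrs[i].row, hb.instrs[i].col, hb.instrs[i].frame_slot);
        return 0;
    }

Because placement is fixed at compile time, the hardware needs no dynamic renaming or scheduling logic for instructions within a block; configurability comes from how the frame space, register banks, and sequencing policies are assigned across blocks.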
2 Paper 2: Smart Memories

2.1 Summary

This paper proposes Smart Memories, a partitioned, explicitly parallel, reconfigurable architecture for use as a future universal computing element. With Smart Memories, the appearance of the on-chip memory, the interconnection network, and the processing elements can be tailored to better match the application's requirements.

Smart Memories contains an array of processor tiles and on-die DRAM memories connected by a packet-based, dynamically routed network. In order to get more computation power than what is contained in a single processing tile, four processor tiles are clustered together into a "quad", and a low-overhead, intra-quad interconnection network is provided. Grouping the tiles into quads also makes the global interconnection network more efficient by reducing the number of global network interfaces.

A Smart Memories tile consists of a reconfigurable memory system, a crossbar interconnection network, a processor core, and a quad network interface. Having a reconfigurable memory system is important since different applications have different memory access patterns. The crossbar interconnection is used to connect the memory mats to the processors or to the quad interface port. The processor itself contains integer and floating-point clusters, local register files, and a shared FP register file to provide the necessary bandwidth. Each tile can sustain up to two independent threads. Smart Memories also allows for reconfigurable instruction formats and decode. (A sketch of how such mat reconfiguration might look appears at the end of these notes.)

2.2 Results

Smart Memories mapped well onto two machines at far ends of the architectural spectrum, which require very different memory systems and arrangements of compute resources. The first machine is Imagine, a highly tuned SIMD/vector machine optimized for media applications with large amounts of data parallelism. The second is Hydra, a speculative multiprocessor that supports applications with irregular accesses and communication patterns.

2.3 Critique

Smart Memories is a modular architecture that can address the scalability issues caused by wiring delay. Its reconfigurability allowed Smart Memories to map two machines at opposite ends of the architectural spectrum with only modest performance degradation. The main critique is that the paper does not describe how complex it is to map a given architecture onto Smart Memories.

2.4 Future Work

The future work of this paper is to create a more complete simulation environment, to look at the overall performance of some complete applications, and to investigate the architecture for inter-tile interactions.

3 Class Discussion

What grain of configurability is optimal? Fine-grained configuration is good for a limited number of applications. The extra flexibility comes at the cost of extra configuration and complexity overhead. Deciding on the granularity is essentially a tradeoff between these two factors.

Is there a specific example of which architecture is better? Both coarse- and fine-grain approaches can exploit all three kinds of parallelism (ILP, DLP, and TLP), i.e., there is no obvious reason to choose any given architecture. To take advantage of random ILP, you need wide-issue VLIW processors (TRIPS attempts to do this). Smart Memories is geared toward DLP. The next two sections discuss the TRIPS and Smart Memories designs in more detail.

3.1 TRIPS

One of the most notable points of this paper is the assumption of a "magic" memory block that guarantees perfect memory fetches and handles all memory disambiguation. Obviously, this is an unrealistic assumption: often, one of the major performance bottlenecks in a modern system is memory latency. Further, the paper claims that the TRIPS processor is capable of exploiting all three kinds of parallelism (ILP, TLP, and DLP).
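As a concrete illustration of the reconfigurable memory system described in Section 2.1, here is a minimal C sketch of how the memory mats of a single tile might be switched between a Hydra-style cache configuration and an Imagine-style streaming configuration. The mat count, mode names, and mappings are illustrative assumptions; the actual design configures each mat at a much lower level.

    #include <stdio.h>

    /* Illustrative model of a Smart Memories tile's reconfigurable memory
     * mats. The mat count and the mode set are assumptions made for this
     * sketch, not the paper's actual configuration interface. */

    #define MATS_PER_TILE 16

    typedef enum {
        MAT_CACHE_DATA,   /* acts as cache data storage */
        MAT_CACHE_TAG,    /* holds tags and state bits for cache lines */
        MAT_SCRATCHPAD,   /* software-managed local store */
        MAT_STREAM_FIFO   /* FIFO buffer for streaming producer/consumer */
    } MatMode;

    typedef struct {
        MatMode mode[MATS_PER_TILE];
    } TileMemConfig;

    /* Hydra-like mapping: mostly cache data, a few tag mats. */
    static TileMemConfig config_speculative_cmp(void)
    {
        TileMemConfig c;
        for (int i = 0; i < MATS_PER_TILE; i++)
            c.mode[i] = (i < 12) ? MAT_CACHE_DATA : MAT_CACHE_TAG;
        return c;
    }

    /* Imagine-like mapping: scratchpad plus stream FIFOs, no caches. */
    static TileMemConfig config_streaming(void)
    {
        TileMemConfig c;
        for (int i = 0; i < MATS_PER_TILE; i++)
            c.mode[i] = (i < 10) ? MAT_SCRATCHPAD : MAT_STREAM_FIFO;
        return c;
    }

    static void print_config(const char *name, TileMemConfig c)
    {
        static const char *names[] = { "cache-data", "cache-tag",
                                       "scratchpad", "stream-fifo" };
        printf("%s:\n", name);
        for (int i = 0; i < MATS_PER_TILE; i++)
            printf("  mat %2d -> %s\n", i, names[c.mode[i]]);
    }

    int main(void)
    {
        /* Same silicon, two very different memory systems. */
        print_config("Hydra-style", config_speculative_cmp());
        print_config("Imagine-style", config_streaming());
        return 0;
    }

The point of the sketch is that the tile's storage is a pool of mats whose roles are set by configuration state rather than fixed at design time, which is what lets one substrate mimic machines as different as Imagine and Hydra.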

