MULTIPROCESSORS ON A CHIP
Leon Gu, Dipti Motiani
15-740 Computer Architecture, Fall 2003

Papers
• K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh, D. Burger, S. W. Keckler, C. R. Moore, "Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture," in ISCA 2003.
• Paramjit S. Oberoi and Gurindar S. Sohi, "Out-of-Order Instruction Fetch using Multiple Sequencers," in the 2002 International Conference on Parallel Processing (ICPP-31), Aug. 18-21, 2002.

Paper 1: Motivation
• Increasingly specialized architectures: processor fragility
• Growing on-chip communication latencies
• Choosing the processor granularity
• Different types of parallelism:
    - Instruction-level (ILP)
    - Thread-level (TLP)
    - Data-level / streaming (DLP)

Processor Granularity
[Figure: processor granularity viewed logically and physically; labels ILP, TLP, Polymorphous]

Polymorphous Architecture
[Figure]

TRIPS Architecture
[Figure]

Polymorphous Resources
• Frame space: manages the reservation stations
• Register file banks: extra registers are used in different ways
• Block sequencing controls: policies for allocating the processor to blocks
• Memory tiles: tiles close to the ALUs provide special high-bandwidth memory

Modes of Execution
[Table: S-morph, D-morph, and T-morph compared by their use of frames, registers, block control, and memory tiles]

Discussion
• Granularity
• Stress on the compiler and OS
• When and how to initiate reconfiguration
• Proposed to be built by 2005

Paper 2: Motivation
• Previous work:
    - Fetching multiple discontinuous I-cache lines
    - Trace caches
• Instruction parallelism in traces:
    - Only a small fraction of the instructions are executed immediately
    - Parallelism exists between several traces
• Applications require fetching from multiple threads

Multiple Sequencers
• Fetch contiguous instructions from multiple points in a program
• Multiple trace-granularity sequencers combine:
    - the fetch bandwidth of a trace cache
    - the storage efficiency of an instruction cache

Design Details
• Trace selection: a trace is terminated at a call, return, or indirect branch, or when it grows too long
• Returns and indirect branches: return address stack (RAS)
• Trace prediction: a hash function of the trace identifier
• Out-of-order renaming

Trace Reuse
[Figure: instructions fetched, normalized with respect to instructions executed]

Sequencer Width
• Too many sequencers lead to incorrect predictions, and hence to a loss in performance

Scaling
• Multiple sequencers (MS) are more tolerant of cache misses

Discussion
• Trace cache vs. multiple sequencers:
    - Performance
    - Storage efficiency
    - Implementation
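The trace-selection rule on the Design Details slide (end a trace at a call, return, or indirect branch, or when it grows too long) can be sketched as a simple pass over a dynamic instruction stream. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Instr` record, the instruction-kind strings, and the length cap `MAX_TRACE_LEN = 16` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical instruction kinds; the slide only names calls, returns,
# and indirect branches as trace terminators.
TERMINATORS = {"call", "return", "indirect"}
MAX_TRACE_LEN = 16  # assumed cap on trace length, not taken from the paper


@dataclass(frozen=True)
class Instr:
    pc: int
    kind: str  # e.g. "alu", "branch", "call", "return", "indirect"


def select_traces(stream: Iterable[Instr], max_len: int = MAX_TRACE_LEN) -> List[List[Instr]]:
    """Split a dynamic instruction stream into traces.

    A trace ends after a call, return, or indirect branch,
    or once it reaches max_len instructions.
    """
    traces: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in stream:
        current.append(instr)
        if instr.kind in TERMINATORS or len(current) >= max_len:
            traces.append(current)
            current = []
    if current:  # flush a partial trace left at the end of the stream
        traces.append(current)
    return traces


# Usage: a call ends the first trace, a return ends the second.
stream = [Instr(0x100, "alu"), Instr(0x104, "call"), Instr(0x200, "alu"),
          Instr(0x204, "alu"), Instr(0x208, "return")]
print([len(t) for t in select_traces(stream)])  # [2, 3]
```

Ending traces at control transfers whose targets are not fixed keeps each trace's successor predictable by the other mechanisms on the slide (the RAS and the trace predictor), while the length cap bounds the storage a single trace can occupy.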
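The Design Details slide lists a return address stack (RAS) for handling returns. The sketch below is a generic fixed-depth RAS of the kind commonly used for return prediction, assuming a circular buffer that overwrites its oldest entry on overflow; the class name and the default depth of 8 are hypothetical, and the sketch covers only returns, not the target prediction of other indirect branches.

```python
from typing import List, Optional


class ReturnAddressStack:
    """Hypothetical fixed-depth RAS for predicting return targets.

    On overflow the oldest entry is overwritten (circular buffer);
    on underflow the prediction is None.
    """

    def __init__(self, depth: int = 8) -> None:  # assumed depth
        self.depth = depth
        self.entries: List[int] = [0] * depth
        self.top = 0    # index of the next free slot (mod depth)
        self.count = 0  # number of valid entries, capped at depth

    def push(self, return_addr: int) -> None:
        # A call was fetched: remember the address to return to.
        self.entries[self.top] = return_addr
        self.top = (self.top + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def pop(self) -> Optional[int]:
        # A return was fetched: predict its target, if one is stored.
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.depth
        self.count -= 1
        return self.entries[self.top]


# Usage: with depth 2, the third push overwrites the oldest entry (0x104).
ras = ReturnAddressStack(depth=2)
for addr in (0x104, 0x208, 0x30C):
    ras.push(addr)
print(hex(ras.pop()), hex(ras.pop()), ras.pop())  # 0x30c 0x208 None
```

The circular-buffer choice keeps the structure fixed-size as hardware would be; the cost is that deep call chains silently lose their oldest return addresses, which then surface as mispredictions.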
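The Design Details slide states only that trace prediction uses a hash function of the trace identifier. The sketch below assumes a trace identifier made of the start PC plus the trace's conditional-branch outcomes, and a direct-mapped table that maps the hash of the current trace to the predicted next trace. The identifier layout, the hash (`trace_hash`), the table size, and the `TracePredictor` class are all hypothetical illustrations, not the paper's design.

```python
from typing import List, Optional, Tuple

# Assumed trace identifier: (start PC, outcomes of conditional branches in the trace).
TraceId = Tuple[int, Tuple[bool, ...]]


def trace_hash(trace_id: TraceId, index_bits: int) -> int:
    """Hypothetical hash: fold the word-aligned start PC with the outcome bits."""
    start_pc, outcomes = trace_id
    bits = 0
    for taken in outcomes:
        bits = (bits << 1) | int(taken)
    mask = (1 << index_bits) - 1
    return ((start_pc >> 2) ^ bits) & mask


class TracePredictor:
    """Hypothetical direct-mapped, untagged table: hash(current trace) -> next trace."""

    def __init__(self, index_bits: int = 10) -> None:
        self.index_bits = index_bits
        self.table: List[Optional[TraceId]] = [None] * (1 << index_bits)

    def predict(self, current: TraceId) -> Optional[TraceId]:
        return self.table[trace_hash(current, self.index_bits)]

    def update(self, current: TraceId, actual_next: TraceId) -> None:
        # Train with the trace that actually followed `current`.
        self.table[trace_hash(current, self.index_bits)] = actual_next


# Usage: after one training step, the predictor recalls the successor trace.
p = TracePredictor()
a: TraceId = (0x100, (True, False))
b: TraceId = (0x200, (False,))
print(p.predict(a))        # None
p.update(a, b)
print(p.predict(a) == b)   # True
```

Because the table has no tags, two traces whose identifiers hash to the same index share an entry and can evict each other's predictions; adding a tag check, or mixing in a history of preceding traces, are common ways to reduce such aliasing.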