Increasing IPC
15-740 Computer Architecture
Hendricks, Malayeri
Nov 2002

Multiple Sequencers
Problem: need a large instruction window
  - Many independent instructions (no data dependencies)
  - Keep more function units busy (higher n-way issue)
But:
  - Branches are taken every 8 instructions on average
  - Instruction cache latency is 1 clock cycle, but only for contiguous accesses

Proposed Solutions
1. Change the branch predictor to predict multiple branches per cycle, and the I-cache to supply multiple discontinuous lines per cycle.
   Problem: this is complex and potentially increases cycle time.
2. Use a trace cache: store instructions in dynamic execution order in the cache.
   Problem: inefficient use of space, potentially increasing miss rates.

Using Multiple Sequencers
  - Fetch a few contiguous instructions from multiple points in the program
  - This provides little improvement over trace caches if cache space is cheap

Extra Instructions Fetched

Performance

Scaling

Performance History
70s:
  - Wider datapaths
  - Hardware support for memory management
80s:
  - Integration (single chip)
  - Memory hierarchies
  - Superscalar
  - Speculation
90s:
  - Most (80%) of performance gains came from faster clocks: 33 MHz in 1990 to 2 GHz in 2001

Performance History
Our last talk: the end of the road
  - Pipelines can't get much deeper
  - Performance will scale with technology (12-19%)
  - Wire length is now a limiting factor
    - Previously logic was reused to save silicon; now logic is duplicated to shorten delays
    - 2 stages in the Pentium 4 are for wire delays
  - But even in 1991, many expected a dead end in 1994

RAW Machines
  - RAW is a grid of compute elements

RAW: Baring it all to software
  - Make hardware available to software

The RAW Processor

Processor Details
  - RAW uses a 2D forwarding network
    - Early processors used the register file for communication
    - Pipelines led to simple bypass forwarding
    - RAW extends this concept
  - Commits are tougher
    - RAW punts: done at the execution stage
  - Registers 24-27 are mapped to the network
    - The network is a first-class resource (more later)

Tile Floor Plan
  - Note: network
  - Lots of space
  - 3 cycles ALU to ALU

Static / Dynamic Networks
Static networks:
  - Scalar values only
  - Fetch instructions from SRAM for routes
  - Single cycle per hop
  - 3-cycle latency ALU to ALU
Dynamic networks:
  - Up to 31 words
  - Longer, complicated routing
  - Deadlock avoidance / recovery
2 values each in each direction per cycle (8 total)
Both are mapped to registers

Grid Processor Architecture
  - Execute on the diagram from left to right
  - One instruction for each black box (compute unit)

Grid Processor Architecture
Block-atomic mapping:
  - One block (instruction group) at a time
  - No internal transfers
  - Could be a basic block, a predicated hyperblock (execute-all policy), or a run-time trace
  - Taken branches go to the next group
  - Group inputs, group temps, group outputs

Group Execution
  - Data driven
  - Destination encoded in the instruction
    - 3 destinations max (split the instruction for more)
  - No sources
    - Data gets pushed
    - Register files push data to instructions
  - Like RAW, but more preset forwarding
  - 30-90% fewer register file writes

Performance vs. Ideal 8-way Superscalar

Questions
  - Multiple Sequencer: performance not stellar, but it scales (we promise)
  - Can anyone but RAW designers program RAW?
  - In fairness, instruction cache is getting expensive
  - Do many apps really map to RAW? Java?
  - Same problems ExoKernel faced
  - Can compilers target RAW effectively?
  - Changing ISA between tile revisions? Could stay the same
  - In fairness, hoped some tiles would be used for translation
  - GPA: Oh, so if my program is one big basic block with minimal data hazards, it can run quickly
  - GPA: How about those forwarding wires? Well, duh. Either too few or too long.
  - They use an express wire to wrap
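
Sketch: data-driven group execution
The Group Execution slide describes instructions that name only destinations (3 at most) and fire when pushed data arrives. The Python sketch below is one possible way to model that idea, not the GPA implementation. The names Instr, run_group, the "out:" prefix for group outputs, and the operand-slot numbering are all hypothetical, chosen for illustration only.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

MAX_DESTS = 3  # from the slides: at most 3 destinations per instruction

Dest = Tuple[str, int]  # (target name, operand slot); hypothetical encoding


@dataclass
class Instr:
    """One instruction in a group: it lists destinations, never sources."""
    op: Callable[..., int]   # applied once every operand has arrived
    arity: int               # number of operands to wait for
    dests: List[Dest]        # where the result gets pushed
    operands: Dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.dests) > MAX_DESTS:
            raise ValueError("more than 3 destinations: split the instruction")


def run_group(instrs: Dict[str, Instr],
              inputs: List[Tuple[int, List[Dest]]]) -> Dict[str, int]:
    """Push group inputs into the instructions and collect group outputs."""
    outputs: Dict[str, int] = {}
    # Pending deliveries: (target, slot, value). Register files push first.
    work = [(t, s, v) for v, dests in inputs for t, s in dests]
    while work:
        target, slot, value = work.pop()
        if target.startswith("out:"):      # assumed marker for a group output
            outputs[target[4:]] = value
            continue
        ins = instrs[target]
        ins.operands[slot] = value
        if len(ins.operands) == ins.arity:  # all operands present: fire
            result = ins.op(*(ins.operands[i] for i in range(ins.arity)))
            work.extend((t, s, result) for t, s in ins.dests)
    return outputs


# Example group computing (a + b) * (a - b) with a = 7, b = 3.
instrs = {
    "add": Instr(operator.add, 2, [("mul", 0)]),
    "sub": Instr(operator.sub, 2, [("mul", 1)]),
    "mul": Instr(operator.mul, 2, [("out:r1", 0)]),
}
inputs = [
    (7, [("add", 0), ("sub", 0)]),  # a is pushed to both consumers
    (3, [("add", 1), ("sub", 1)]),  # b likewise
]
result = run_group(instrs, inputs)
print(result)  # {'r1': 40}
```

In this model the intermediate values (the add and sub results) never touch a register file; only the group inputs and the single output do. That matches the spirit of the "fewer register file writes" bullet, though the sketch makes no attempt to reproduce the reported numbers.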