
Increasing IPC
15-740 Computer Architecture
Hendricks, Malayeri
Nov 2002


Multiple Sequencers

Problem: need a large instruction window
  - Many independent instructions (no data dependencies)
  - Keep more function units busy (higher n-way issue)
But:
  - Branches are taken every 8 instructions on average
  - Instruction cache latency is 1 clock cycle, but only for contiguous accesses


Proposed Solutions

Solutions:
  1. Change the branch predictor to predict multiple branches per cycle, and the I-cache to supply multiple discontinuous lines per cycle
     Problem: this is complex and potentially increases cycle time
  2. Use a trace cache: store instructions in dynamic execution order in the cache
     Problem: inefficient use of space, potentially increasing miss rates


Using Multiple Sequencers

  - Fetch a few contiguous instructions from multiple points in the program
  - This provides little improvement over trace caches if cache space is cheap


Extra Instructions Fetched

[Figure: extra instructions fetched]


Performance

[Figure: performance]


Scaling

[Figure: scaling]


Performance History

70s:
  - Wider datapaths
  - Hardware support for memory management
80s:
  - Integration (single chip)
  - Memory hierarchies
  - Superscalar
  - Speculation
90s:
  - Most (80%) of performance gains came from faster clocks
  - 33 MHz in 1990 to 2 GHz in 2001


Performance History

Our last talk: the end of the road
  - Pipelines can't get much deeper
  - Performance will scale with technology (12-19%)
  - Wire length is now a limiting factor
  - Previously logic was reused to save silicon; now logic is duplicated to shorten delays
  - 2 stages in Pentium 4 for wire delays
But even in 1991, many expected a dead end in 1994


RAW Machines

  - RAW is a grid of compute elements


RAW: Baring it all to software

  - Make hardware available to software


The RAW Processor

[Figure: the RAW processor]


Processor Details

  - RAW uses a 2D forwarding network
      - Early processors used the reg file for communication
      - Pipelines led to simple bypass forwarding
      - RAW extends this concept
  - Commits are tougher; RAW punts: done at execution stage
  - Registers 24-27 are mapped to the network
  - Network is a 1st class resource (more later)


Tile Floor Plan

[Figure: tile floor plan]
Note:
  - Network
  - Lots of space
  - 3 cycles ALU to ALU


Static / Dynamic Networks

Static Networks
  - Scalar values only
  - Fetch instructions from SRAM for routes
  - Single cycle per hop
  - 3 cycle latency ALU to ALU
Dynamic Networks
  - Up to 31 words
  - Longer, complicated routing
  - Deadlock avoidance / recovery
2 values each in each direction per cycle (8 total)
Both are mapped to registers


Grid Processor Architecture

[Figure: grid processor diagram]
  - Execute on diagram from left to right
  - One instruction for each black box (compute unit)


Grid Processor Architecture: Block Atomic Mapping

  - One block (instruction group) at a time
  - No internal transfers
  - Could be a basic block, a predicated hyperblock (execute-all policy), or a run-time trace
  - Taken branches go to the next group
  - Group Inputs, Group Temps, Group Outputs


Group Execution

  - Data driven
  - Destination encoded in instruction
      - 3 destinations max (split instruction for more)
  - No sources: data gets pushed
  - Register files push data to instructions
  - Like RAW, but more preset forwarding
  - 30-90% fewer register file writes


Performance vs. Ideal 8-way Superscalar

[Figure: performance vs. ideal 8-way superscalar]


Questions

  - Multiple Sequencer: performance not stellar, but it scales (we promise)
  - Can anyone but RAW designers program RAW?
  - In fairness, instruction cache is getting expensive
  - Do many apps really map to RAW? Java?
  - Same problems ExoKernel faced
  - Can compilers target RAW effectively?
  - Changing ISA between tile revisions?
      - Could stay the same
      - In fairness, hoped some tiles would be used for translation
  - GPA: Oh, so if my program is one big basic block with minimal data hazards, it can run quickly?
  - GPA: How about those forwarding wires? Well, duh. Either too few or too long
      - They use an express wire to wrap
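
The data-driven group execution described under Grid Processor Architecture (destinations encoded in the instruction, no source fields, data pushed to consumers) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the GPA hardware or its ISA: the names Instr, run_group, and build_example, the two-operand restriction, and the tuple encoding of destinations are all hypothetical choices made for illustration.

```python
import operator
from dataclasses import dataclass, field

MAX_DESTS = 3  # from the slides: 3 destinations max per instruction


@dataclass
class Instr:
    """Hypothetical two-operand instruction that names only its destinations."""
    op: object                  # callable taking two operands
    dests: list                 # ("instr", index, slot) or ("out", name, None)
    operands: dict = field(default_factory=dict)

    def __post_init__(self):
        # An instruction needing more fan-out would have to be split.
        assert len(self.dests) <= MAX_DESTS


def run_group(instrs, inputs):
    """Execute one group; inputs are (value, dest) pairs pushed by the register file."""
    outputs = {}
    ready = []

    def push(value, dest):
        kind, target, slot = dest
        if kind == "out":
            outputs[target] = value          # group output
            return
        ins = instrs[target]
        ins.operands[slot] = value           # data arrives at the consumer
        if len(ins.operands) == 2:
            ready.append(target)             # fires once both operands are present

    for value, dest in inputs:
        push(value, dest)
    while ready:
        ins = instrs[ready.pop()]
        result = ins.op(ins.operands[0], ins.operands[1])
        for dest in ins.dests:
            push(result, dest)               # forward the result, no register read
    return outputs


def build_example():
    """Group computing (a + b) * (a - b) with a = 5, b = 3."""
    instrs = [
        Instr(operator.add, [("instr", 2, 0)]),
        Instr(operator.sub, [("instr", 2, 1)]),
        Instr(operator.mul, [("out", "r1", None)]),
    ]
    a, b = 5, 3
    inputs = [
        (a, ("instr", 0, 0)), (a, ("instr", 1, 0)),
        (b, ("instr", 0, 1)), (b, ("instr", 1, 1)),
    ]
    return instrs, inputs
```

Calling run_group(*build_example()) yields a single group output, r1 = 16. The two intermediate results travel directly from producer to consumer and never touch a register, which is the effect behind the "fewer register file writes" bullet.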



CMU CS 15740 - Lecture
