CS6810, School of Computing, University of Utah

Static Scheduling, VLIW, EPIC & Speculation

Today’s topics:
  HW support for better compiler scheduling
  VLIW/EPIC idea & IPF example

Beating the IPC=1 Asymptote
• Superscalar
  ➢ static/compiler scheduled
    » common in embedded space: MIPS & ARM
  ➢ dynamically scheduled
    » HW scheduling via scoreboard/Tomasulo approach
• VLIW
  ➢ long instruction word contains a set of independent ops
  ➢ key – compiler does the scheduling and hazard detection (εδ adv.)
    » each slot goes to a particular type of XU
      • similar to reservation station role
  ➢ problem in high-performance practice
    » need to be conservative w.r.t. run-time activities
      • data-dependent branch predicate
    » fix – add some HW so a less conservative but probable choice can be made

VLIW History
• As usual it’s not new
  ➢ late 60s, early 70s – microcode
    » same idea, different granularity
  ➢ 80s (textbook inaccurate on this)
    » Cydrome Cydra-5 (Rau/UIUC) & Multiflow (Fisher/Yale)
      • mini-super segment (Cray-like performance on a budget)
      • killer micro ate them and the companies cratered
      • both Rau and Fisher go to HP to develop PA-WW
        – note both were compiler types (Fisher inspired by dataflow geeks)
  ➢ 90s
    » HP wants out of the process business, Intel wants a server line
    » HP & Intel jointly develop and produce Itanium
      • 2001 first release of “Merced” & IA-64
  ➢ Now
    » AMD shocks x86 land w/ 64-bit architecture at MPF 2000
    » poor IA-64 integer performance forces Intel to follow suit
    » IA-64 → IPF still happening
      • now all Intel but “Itanic” problems persist

“Itanic”
• Interesting quotes:
  ➢ John Dvorak (journalist) article
    » “How the Itanium killed the Computer Industry”
  ➢ Ashlee Vance (tech columnist)
    » underperformance + product delays
      • “turned the product into a joke in the semiconductor industry”
  ➢ Donald Knuth
    » “supposed to be terrific – until it turned out that the wished-for compilers were basically impossible to write”
• However
  ➢ illustrates some interesting architectural tactics
    » approach highly valued in the embedded space
  ➢ Tukwila (4-core IPF)
    » “what rhymes with Godzilla and has enough cache to take out Tokyo?”
    » 4 FB-DIMM channels
      • a move to dominate data-center (now called “Cloud”) apps

Tukwila
(annotations from the slide’s diagram)
• QPI is Intel’s response to AMD HyperTransport
• 2 FBDs missing on RHS
• 2 threads/core
• target delivery “real soon now”; original target was 2007. OUCH
• 34 GB/s memory b/w
• 96 GB/s skt-skt b/w
• 30MB L2$ total

VLIW Achilles’ Heel
• Code compatibility
  ➢ backwards compatibility
    » always a bit of a boat anchor
  ➢ compiler schedules, but what if the machine changes and you don’t have the source code?
  ➢ oops
• Solutions
  ➢ Transmeta approach
    » dynamic object code translation
      • not wildly different than VM + dynamic issue
  ➢ IPF approach
    » don’t be devout about VLIW
      • add some hardware support to allow some dynamic information

Itanium Example
• Registers
  ➢ 32 64-bit GPRs, each with a poison-bit flag
  ➢ 128 82-bit FPRs
    » 2 extra exponent bits over IEEE 754 80-bit standard
  ➢ 64 1-bit predication flags (single register)
  ➢ 8 64-bit indirect branch registers
  ➢ large set of special-purpose regs
    » I/O, system, memory map, OS interface
    » rich set of performance counters
• Register stack
  ➢ 128 architected registers
    » 0-31 are the GPRs
    » 32-127 are on the stack (cached or not)
      • special HW handles overflow and underflow
    » special instructions manipulate stack frame save and restore

IPF Instructions and Slots
• Instruction types
  ➢ A = int ALU
  ➢ I = shifts, bit-tests, moves
  ➢ M = memory access
  ➢ F = floats
  ➢ B = branches
  ➢ L+X = extended immediates, stop, nops
• Instruction word slots
  ➢ I = A or I types
  ➢ M = A or M types
  ➢ F = F types
  ➢ B = B types
  ➢ L+X = L+X types

IPF Groups and Bundles
• Instruction group
  ➢ set of parallel instructions
  ➢ arbitrary length w/ explicit stop bit
• Instruction bundle = 128 bits
  ➢ a subset of a group that gets executed each cycle
  ➢ contains pre-decode tag
    » 5 bits indicate what the bundle contains and in what order
      • permutations (5,3) = 20 → 5 bits
    » 3 41-bit instructions in the bundle
  ➢ 2 bundles decoded and executed per cycle
    » on Merced and McKinley
    » key:
      • compiler generates the group and organizes code into bundles
      • HW decides decode and issue rate

IPF Predication and Speculation
• Most instructions predicated on a predicate flag
  ➢ 10 compare types
    » result goes to 2 predication flags (dual-rail encoding)
• Speculation
  ➢ GPRs have a poison bit (indicating data validity)
    » Intel calls them NaTs (Not a Thing)
  ➢ FPRs indicate poison by NaTVal
    » mantissa=0, exponent outside legal range
      • hence the extra exponent bits
    » interesting choice
  ➢ advanced loads
    » loads promoted over stores
      • return value to ALAT table (value, dest. reg, and mem. addr)
    » if a store that precedes the load in program order, but executes after it, matches the mem addr
      • ALAT invalidated, and register poisoned
    » interesting wrinkle on the more common write buffer

IPF Pipe
• XUs
  ➢ 2 each of I, M, F; 3 B; 1 L+X
• Issue 2 bundles = 6 instructions max
• Pipe – 10 macro stages
  ➢ IPG – prefetch 2 bundles
  ➢ Fetch – decode
  ➢ Rotate – rotate bundle to align the stops
  ➢ EXP – hand instructions to the XUs – issue
  ➢ REN – rename registers
  ➢ WLD – bypass and access reg. file
  ➢ REG – checks register scoreboard dependencies (dynamic stall if not cleared)
  ➢ EXE – execute
  ➢ DET – detect
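
The bundle arithmetic from the "IPF Groups and Bundles" slide (one 5-bit pre-decode tag plus three 41-bit instructions = 128 bits) can be checked with a small sketch. This is only an illustration: the function names `pack_bundle` and `unpack_bundle` are hypothetical, and placing the tag in the low-order bits with the slots above it is an assumption made here for the example, not something the slides state.

```python
# Illustrative sketch of a 128-bit bundle: 5-bit tag + three 41-bit slots.
# ASSUMPTION: tag occupies the low 5 bits, slots follow in order above it.
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
BUNDLE_BITS = TEMPLATE_BITS + SLOTS_PER_BUNDLE * SLOT_BITS  # 5 + 3*41 = 128


def pack_bundle(template, slots):
    """Pack a tag and three slot encodings into one integer bundle word."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three instructions")
    word = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each instruction must fit in 41 bits")
        word |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return word


def unpack_bundle(word):
    """Split a bundle word back into (template, [slot0, slot1, slot2])."""
    template = word & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (word >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots
```

Packing the largest legal tag and slot values fills all 128 bits exactly, which is a quick way to see that nothing in the bundle is left over for other fields.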
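
The dual-rail predication idea from the "IPF Predication and Speculation" slide (one compare writes two complementary predicate flags, and later instructions only take effect when their flag is set) can be sketched as follows. This is a rough software model under stated assumptions, not IPF code: `compare_lt`, `predicated`, and `abs_diff` are hypothetical names, and a squashed instruction is modeled simply as keeping the old destination value.

```python
# Illustrative model of dual-rail predication replacing a branch.
# ASSUMPTION: a predicated-off operation leaves its destination unchanged.


def compare_lt(a, b):
    """Model a compare that writes two complementary predicate flags."""
    taken = a < b
    return taken, not taken


def predicated(pred, op, old_value):
    """Apply op only when pred is true; otherwise keep the old value."""
    return op() if pred else old_value


def abs_diff(a, b):
    """Compute |a - b| with both arms issued and no branch taken."""
    p1, p2 = compare_lt(a, b)                 # p1 = (a < b), p2 = not p1
    r = 0
    r = predicated(p1, lambda: b - a, r)      # takes effect only if a < b
    r = predicated(p2, lambda: a - b, r)      # takes effect only if a >= b
    return r
```

Both arms are always present in the instruction stream; exactly one of them takes effect, which is what lets the compiler schedule them side by side instead of guessing the branch direction.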

