Berkeley COMPSCI 152 - Lecture Notes

CS 152 Computer Architecture and Engineering
Lecture 16: Vector Computers

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152

March 18, 2010 CS152, Spring 2010

Slide titles:
• Last Time Lecture 15: VLIW
• Intel EPIC IA-64
• Quad Core Itanium “Tukwila” [Intel 2008]
• IA-64 Instruction Format
• IA-64 Registers
• IA-64 Predicated Execution
• Fully Bypassed Datapath
• IA-64 Speculative Execution
• IA-64 Data Speculation
• Limits of Static Scheduling
• Supercomputers
• Supercomputer Applications
• Vector Supercomputers
• Cray-1 (1976)
• Vector Code Example
• Vector Instruction Set Advantages
• Vector Instruction Execution
• Vector Unit Structure
• T0 Vector Microprocessor (UCB/ICSI, 1995)
• Vector Instruction Parallelism
• CS152 Administrivia
• Vector Chaining
• Vector Chaining Advantage
• Vector Startup
• Dead Time and Short Vectors
• Vector Memory-Memory versus Vector Register Machines
• Vector Memory-Memory vs. Vector Register Machines
• Acknowledgements


Last Time Lecture 15: VLIW

• In a classic VLIW, compiler is responsible for avoiding all hazards -> simple hardware, complex compiler. Later VLIWs added more dynamic hardware interlocks
• Use loop unrolling and software pipelining for loops, trace scheduling for more irregular code
• Static scheduling difficult in presence of unpredictable branches and variable latency memory


Intel EPIC IA-64

• EPIC is the style of architecture (cf. CISC, RISC)
  – Explicitly Parallel Instruction Computing
• IA-64 is Intel’s chosen ISA (cf. x86, MIPS)
  – IA-64 = Intel Architecture 64-bit
  – An object-code compatible VLIW
• Itanium (aka Merced) is first implementation (cf. 8086)
  – First customer shipment expected 1997 (actually 2001)
  – McKinley, second implementation shipped in 2002
  – Recent version, Tukwila 2008, quad-cores, 65nm (not shipping until 2010?)


Quad Core Itanium “Tukwila” [Intel 2008]

• 4 cores
• 6MB $/core, 24MB $ total
• ~2.0 GHz
• 698mm2 in 65nm CMOS!!!!!
• 170W
• Over 2 billion transistors


IA-64 Instruction Format

128-bit instruction bundle: Instruction 2 | Instruction 1 | Instruction 0 | Template

• Template bits describe grouping of these instructions with others in adjacent bundles
• Each group contains instructions that can execute in parallel

[Figure: groups i-1, i, i+1, i+2 overlaid on bundles j-1, j, j+1, j+2]


IA-64 Registers

• 128 General Purpose 64-bit Integer Registers
• 128 General Purpose 64/80-bit Floating Point Registers
• 64 1-bit Predicate Registers
• GPRs rotate to reduce code size for software pipelined loops


IA-64 Predicated Execution

Problem: Mispredicted branches limit ILP
Solution: Eliminate hard to predict branches with predicated execution
  – Almost all IA-64 instructions can be executed conditionally under predicate
  – Instruction becomes NOP if predicate register false

Four basic blocks:
  b0: (if)    Inst 1
              Inst 2
              br a==b, b2
  b1: (else)  Inst 3
              Inst 4
              br b3
  b2: (then)  Inst 5
              Inst 6
  b3:         Inst 7
              Inst 8

Predication, one basic block:
  Inst 1
  Inst 2
  p1,p2 <- cmp(a==b)
  (p1) Inst 3 || (p2) Inst 5
  (p1) Inst 4 || (p2) Inst 6
  Inst 7
  Inst 8

Mahlke et al, ISCA95: On average >50% branches removed


Fully Bypassed Datapath

[Figure: fully bypassed pipelined datapath with PC, instruction memory, GPRs, immediate extend, ALU and data memory; stages D, E, M, W; stall and nop insertion; PC for JAL]

Where does predication fit in?


IA-64 Speculative Execution

Problem: Branches restrict compiler code motion
Solution: Speculative operations that don’t cause exceptions

Original code:
  Inst 1
  Inst 2
  br a==b, b2
  Load r1
  Use r1
  Inst 3

Can’t move load above branch because might cause spurious exception

With speculative load:
  Load.s r1
  Inst 1
  Inst 2
  br a==b, b2
  Chk.s r1
  Use r1
  Inst 3

• Speculative load never causes exception, but sets “poison” bit on destination register
• Check for exception in original home block jumps to fixup code if exception detected
• Particularly useful for scheduling long latency loads early


IA-64 Data Speculation

Problem: Possible memory hazards limit code scheduling
Solution: Hardware to check pointer hazards

Original code:
  Inst 1
  Inst 2
  Store
  Load r1
  Use r1
  Inst 3

Can’t move load above store because store might be to same address

With data speculative load:
  Load.a r1
  Inst 1
  Inst 2
  Store
  Load.c
  Use r1
  Inst 3

• Data speculative load adds address to address check table
• Store invalidates any matching loads in address check table
• Check if load invalid (or missing), jump to fixup code if so
• Requires associative hardware in address check table


Limits of Static Scheduling

• Unpredictable branches
• Variable memory latency (unpredictable cache misses)
• Code size explosion
• Compiler complexity

Despite several attempts, VLIW has failed in general-purpose computing arena.
Successful in embedded DSP market.


Supercomputers

Definition of a supercomputer:
• Fastest machine in world at given task
• A device to turn a compute-bound problem into an I/O bound problem
• Any machine costing $30M+
• Any machine designed by Seymour Cray

CDC6600 (Cray, 1964) regarded as first supercomputer


Supercomputer Applications

Typical application areas
• Military research (nuclear weapons, cryptography)
• Scientific research
• Weather forecasting
• Oil exploration
• Industrial design (car crash simulation)
• Bioinformatics
• Cryptography

All involve huge computations on large data sets

In 70s-80s, Supercomputer ≡ Vector Machine


Vector Supercomputers

Epitomized by Cray-1, 1976:
• Scalar Unit
  – Load/Store Architecture
• Vector Extension
  – Vector Registers
  – Vector Instructions
• Implementation
  – Hardwired Control
  – Highly Pipelined Functional Units
  – Interleaved Memory System
  – No Data Caches
  – No Virtual Memory


Cray-1 (1976)

[Figure: Cray-1 block diagram]

• Single Port Memory: 16 banks of 64-bit words + 8-bit SECDED
  – 80MW/sec data load/store
  – 320MW/sec instruction buffer refill
• 4 Instruction Buffers (64-bit x 16); instruction registers NIP, LIP, CIP
• Scalar registers S0-S7, backed by 64 T Regs
• Address registers A0-A7, backed by 64 B Regs
• Vector registers V0-V7 (64 Element Vector Registers), with V. Mask and V. Length
• Functional units: FP Add, FP Mul, FP Recip, Int Add, Int Logic, Int Shift, Pop Cnt, Addr Add, Addr Mul
• memory bank cycle 50 ns, processor cycle 12.5 ns (80MHz)
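
The address check table on the IA-64 Data Speculation slide can be illustrated with a toy model. The sketch below is only an approximation under stated assumptions: the class and method names (AddressCheckTable, advanced_load, check_load) are hypothetical, one table entry is kept per destination register, and "fixup code" is reduced to simply redoing the load. It is not a description of the real IA-64 hardware.

```python
class AddressCheckTable:
    """Toy model of the address check table (names are hypothetical)."""

    def __init__(self):
        # destination register -> address read by its speculative load
        self.entries = {}

    def advanced_load(self, reg, addr, memory, regs):
        # Load.a: do the load early and record the address it read.
        regs[reg] = memory[addr]
        self.entries[reg] = addr

    def store(self, addr, value, memory):
        # A store invalidates every entry whose address matches.
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, reg, addr, memory, regs):
        # Load.c: if the entry is invalid or missing, run the fixup
        # (here: just redo the load). Returns True if speculation held.
        if self.entries.get(reg) != addr:
            regs[reg] = memory[addr]
            return False
        return True


def run(store_addr):
    """Hoist a load of 0x100 above a store to store_addr, then check it."""
    memory = {0x100: 1, 0x200: 2}
    regs = {}
    table = AddressCheckTable()
    table.advanced_load("r1", 0x100, memory, regs)  # moved above the store
    table.store(store_addr, 99, memory)
    held = table.check_load("r1", 0x100, memory, regs)
    return held, regs["r1"]


if __name__ == "__main__":
    print(run(0x200))  # store to a different address: speculation holds
    print(run(0x100))  # store to the same address: fixup reloads the value
```

When the store hits a different address, the early-loaded value is used as is; when it hits the same address, the check fails and the register picks up the stored value instead.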

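The Cray-1 timing figures quoted above are mutually consistent, which a few lines of arithmetic can confirm. This is a back-of-the-envelope sketch, not a model of the machine: the variable names are made up for illustration, and the peak-bandwidth step assumes accesses are spread evenly across all 16 banks so that no bank is revisited while it is still busy.

```python
# Figures taken from the Cray-1 slide; times in picoseconds to stay in integers.
PROCESSOR_CYCLE_PS = 12_500   # 12.5 ns
BANK_CYCLE_PS = 50_000        # 50 ns
NUM_BANKS = 16
WORD_BYTES = 8                # 64-bit words

# Clock frequency implied by the processor cycle time.
clock_hz = 10**12 // PROCESSOR_CYCLE_PS

# How many processor cycles a bank stays busy after an access.
bank_busy_cycles = BANK_CYCLE_PS // PROCESSOR_CYCLE_PS

# Assumption: perfectly interleaved accesses, so each bank is reused
# only after it has recovered. Peak words delivered per processor cycle:
peak_words_per_cycle = NUM_BANKS // bank_busy_cycles

# Bandwidths in millions of words per second (MW/sec).
data_mw_per_sec = 1 * clock_hz // 10**6                      # one word per cycle
refill_mw_per_sec = peak_words_per_cycle * clock_hz // 10**6

# Data load/store bandwidth expressed in bytes per second.
data_bytes_per_sec = data_mw_per_sec * 10**6 * WORD_BYTES

if __name__ == "__main__":
    print(clock_hz, bank_busy_cycles, peak_words_per_cycle)
    print(data_mw_per_sec, refill_mw_per_sec, data_bytes_per_sec)
```

Under these assumptions, one word per cycle at 80 MHz gives the slide's 80 MW/sec for data, and 16 banks that are each busy for 4 cycles sustain 4 words per cycle, matching the 320 MW/sec instruction buffer refill rate.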