Berkeley COMPSCI 152 - Lecture 17 Vector Computers

CS 152 Computer Architecture and Engineering
Lecture 17: Vector Computers

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152
4/9/2009, CS152-Spring’09

Outline
• Recap: VLIW
• Supercomputers
• Supercomputer Applications
• Vector Supercomputers
• Cray-1 (1976)
• Vector Code Example
• Vector Instruction Set Advantages
• Vector Instruction Execution
• Vector Unit Structure
• T0 Vector Microprocessor (UCB/ICSI, 1995)
• Vector Instruction Parallelism
• CS152 Administrivia
• Vector Chaining
• Vector Chaining Advantage
• Vector Startup
• Dead Time and Short Vectors
• Vector Memory-Memory versus Vector Register Machines
• Vector Memory-Memory vs. Vector Register Machines
• Automatic Code Vectorization
• Vector Stripmining
• Vector Conditional Execution
• Masked Vector Instructions
• Vector Reductions
• Vector Scatter/Gather
• Compress/Expand Operations
• A Modern Vector Super: NEC SX-9 (2008)
• Multimedia Extensions (aka SIMD extensions)
• Multimedia Extensions versus Vectors
• Acknowledgements

Recap: VLIW
• In a classic VLIW, compiler is responsible for avoiding all hazards -> simple hardware, complex compiler.
  Later VLIWs added more dynamic hardware interlocks
• Use loop unrolling and software pipelining for loops, trace scheduling for more irregular code
• Static scheduling difficult in presence of unpredictable branches and variable latency memory
• VLIWs somewhat successful in embedded computing, no clear success in general-purpose computing despite several attempts
• Static scheduling compiler techniques also useful for superscalar processors

Supercomputers
Definition of a supercomputer:
• Fastest machine in world at given task
• A device to turn a compute-bound problem into an I/O bound problem
• Any machine costing $30M+
• Any machine designed by Seymour Cray
CDC6600 (Cray, 1964) regarded as first supercomputer

Supercomputer Applications
Typical application areas
• Military research (nuclear weapons, cryptography)
• Scientific research
• Weather forecasting
• Oil exploration
• Industrial design (car crash simulation)
• Bioinformatics
• Cryptography
All involve huge computations on large data sets
In 70s-80s, Supercomputer ≡ Vector Machine

Vector Supercomputers
Epitomized by Cray-1, 1976:
• Scalar Unit
  – Load/Store Architecture
• Vector Extension
  – Vector Registers
  – Vector Instructions
• Implementation
  – Hardwired Control
  – Highly Pipelined Functional Units
  – Interleaved Memory System
  – No Data Caches
  – No Virtual Memory

Cray-1 (1976)
[Figure: Cray-1 block diagram. Recoverable labels:]
• Single-port memory: 16 banks of 64-bit words + 8-bit SECDED
• 80 MW/sec data load/store, 320 MW/sec instruction buffer refill
• 4 instruction buffers (64-bit x 16); NIP, LIP, CIP
• 64 T registers, 64 B registers
• Scalar registers S0-S7, address registers A0-A7
• Functional units: FP Add, FP Mul, FP Recip, Int Add, Int Logic, Int Shift, Pop Cnt, Addr Add, Addr Mul
• Vector registers V0-V7, 64 elements each; V. Mask and V. Length registers
• Memory bank cycle 50 ns, processor cycle 12.5 ns (80 MHz)

Vector Programming Model
[Figure: vector programming model. Recoverable labels:]
• Scalar registers r0-r15
• Vector registers v0-v15, each with elements [0], [1], [2], ..., [VLRMAX-1]
• Vector Length Register (VLR)
• Vector arithmetic instructions, e.g. ADDV v3, v1, v2: element-wise add over elements [0], [1], ..., [VLR-1]
• Vector load and store instructions, e.g. LV v1, r1, r2: base in r1, stride in r2, moving data between memory and a vector register

Vector Code Example

# C code
for (i=0; i<64; i++)
  C[i] = A[i] + B[i];

# Scalar Code
      LI R4, 64
loop: L.D F0, 0(R1)
      L.D F2, 0(R2)
      ADD.D F4, F2, F0
      S.D F4, 0(R3)
      DADDIU R1, 8
      DADDIU R2, 8
      DADDIU R3, 8
      DSUBIU R4, 1
      BNEZ R4, loop

# Vector Code
      LI VLR, 64
      LV V1, R1
      LV V2, R2
      ADDV.D V3, V1, V2
      SV V3, R3

Vector Instruction Set Advantages
• Compact
  – one short instruction encodes N operations
• Expressive, tells hardware that these N operations:
  – are independent
  – use the same functional unit
  – access disjoint registers
  – access registers in same pattern as previous instructions
  – access a contiguous block of memory (unit-stride load/store)
  – access memory in a known pattern (strided load/store)
• Scalable
  – can run same code on more parallel pipelines (lanes)

Vector Arithmetic Execution
• Use deep pipeline (=> fast clock) to execute element operations
• Simplifies control of deep pipeline because elements in vector are independent (=> no hazards!)
[Figure: six-stage multiply pipeline computing V3 <- V1 * V2]

Vector Instruction Execution
ADDV C,A,B
[Figure: execution using one pipelined functional unit (one element per cycle: C[0], C[1], C[2] complete while A[3]/B[3] through A[6]/B[6] are in flight), and execution using four pipelined functional units (four elements per cycle: C[0] through C[11] complete while A[12]/B[12] through A[27]/B[27] are in flight)]

Vector Memory System
[Figure: address generator adding Base + Stride, feeding memory banks 0-F, which connect to the vector registers]
Cray-1, 16 banks, 4 cycle bank busy time, 12 cycle latency
• Bank busy time: Time before bank ready to accept next request

Vector Unit Structure
[Figure: four lanes, each containing functional unit pipelines and a slice of the vector registers, connected to the memory subsystem]
Vector register elements are divided among the lanes:
• Elements 0, 4, 8, …
• Elements 1, 5, 9, …
• Elements 2, 6, 10, …
• Elements 3, 7, 11, …

T0 Vector Microprocessor (UCB/ICSI, 1995)
[Figure: T0 Vector Microprocessor with one lane marked]
Vector register elements striped over lanes:
  [0] [8]  [16] [24]
  [1] [9]  [17] [25]
  [2] [10] [18] [26]
  [3] [11] [19] [27]
  [4] [12] [20] [28]
  [5] [13] [21] [29]
  [6] [14] [22] [30]
  [7] [15] [23] [31]

Vector Instruction Parallelism
Can overlap execution of multiple vector instructions
– example machine has 32 elements per vector register and 8 lanes
[Figure: load, mul and add instructions overlapping in time across the Load Unit, Multiply Unit and Add Unit, with instruction issue marked]
Complete 24 operations/cycle while issuing 1 short instruction/cycle

CS152 Administrivia
• Quiz 5, Thursday April 23

Vector Chaining
• Vector version of register bypassing
  – introduced with Cray-1
[Figure: Memory -> Load Unit -> V1; V1 and V2 -> Mult. -> V3 (chain); V3 and V4 -> Add -> V5 (chain)]
      LV v1
      MULV v3,v1,v2
      ADDV v5, v3, v4

Vector Chaining Advantage
[Figure: Load, Mul and Add timelines with and without chaining]
• With chaining, can start dependent instruction as soon as first result appears
• Without chaining, must wait for last element of result to be written before starting

