Berkeley COMPSCI 152 - Lecture Notes




CS 152 Computer Architecture and Engineering
Lecture 17: Vector Computers

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152

4/9/2009, CS152 Spring '09

Recap: VLIW
• In a classic VLIW, compiler is responsible for avoiding all hazards -> simple hardware, complex compiler. Later VLIWs added more dynamic hardware interlocks
• Use loop unrolling and software pipelining for loops, trace scheduling for more irregular code
• Static scheduling difficult in presence of unpredictable branches and variable latency memory
• VLIWs somewhat successful in embedded computing, no clear success in general-purpose computing despite several attempts
• Static scheduling compiler techniques also useful for superscalar processors

Supercomputers
Definition of a supercomputer:
• Fastest machine in world at given task
• A device to turn a compute-bound problem into an I/O-bound problem
• Any machine costing $30M+
• Any machine designed by Seymour Cray
CDC6600 (Cray, 1964) regarded as first supercomputer

Supercomputer Applications
Typical application areas
• Military research (nuclear weapons, cryptography)
• Scientific research
• Weather forecasting
• Oil exploration
• Industrial design (car crash simulation)
• Bioinformatics
• Cryptography
All involve huge computations on large data sets
In 70s-80s, Supercomputer ≡ Vector Machine

Vector Supercomputers
Epitomized by Cray-1, 1976:
• Scalar Unit
– Load/Store Architecture
• Vector Extension
– Vector Registers
– Vector Instructions
• Implementation
– Hardwired Control
– Highly Pipelined Functional Units
– Interleaved Memory System
– No Data Caches
– No Virtual Memory

Cray-1 (1976)
[Figure: Cray-1 block diagram. Labels recoverable from the diagram:]
• Single Port Memory: 16 banks of 64-bit words + 8-bit SECDED
• 80 MW/sec data load/store, 320 MW/sec instruction buffer refill
• 4 Instruction Buffers (64-bit x 16); NIP, LIP, CIP
• Scalar registers S0-S7, backed by 64 T Regs
• Address registers A0-A7, backed by 64 B Regs
• Vector registers V0-V7 (64 Element Vector Registers), V. Mask, V. Length
• Functional units: FP Add, FP Mul, FP Recip, Int Add, Int Logic, Int Shift, Pop Cnt, Addr Add, Addr Mul
• Memory bank cycle 50 ns, processor cycle 12.5 ns (80 MHz)

Vector Programming Model
[Figure: vector programming model. Labels recoverable from the diagram:]
• Scalar Registers r0 to r15
• Vector Registers v0 to v15, each holding elements [0], [1], [2], ..., [VLRMAX-1]
• Vector Length Register (VLR)
• Vector Arithmetic Instructions: ADDV v3, v1, v2 adds elements [0] through [VLR-1] of v1 and v2 into v3
• Vector Load and Store Instructions: LV v1, r1, r2 moves data between memory and a vector register (Base in r1, Stride in r2)

Vector Code Example

# C code
    for (i=0; i<64; i++)
        C[i] = A[i] + B[i];

# Scalar Code
          LI      R4, 64
    loop: L.D     F0, 0(R1)
          L.D     F2, 0(R2)
          ADD.D   F4, F2, F0
          S.D     F4, 0(R3)
          DADDIU  R1, 8
          DADDIU  R2, 8
          DADDIU  R3, 8
          DSUBIU  R4, 1
          BNEZ    R4, loop

# Vector Code
          LI      VLR, 64
          LV      V1, R1
          LV      V2, R2
          ADDV.D  V3, V1, V2
          SV      V3, R3

Vector Instruction Set Advantages
• Compact
– one short instruction encodes N operations
• Expressive, tells hardware that these N operations:
– are independent
– use the same functional unit
– access disjoint registers
– access registers in same pattern as previous instructions
– access a contiguous block of memory (unit-stride load/store)
– access memory in a known pattern (strided load/store)
• Scalable
– can run same code on more parallel pipelines (lanes)

Vector Arithmetic Execution
• Use deep pipeline (=> fast clock) to execute element operations
• Simplifies control of deep pipeline because elements in vector are independent (=> no hazards!)
[Figure: six stage multiply pipeline computing V3 <- v1 * v2]

Vector Instruction Execution
ADDV C,A,B
[Figure: execution using one pipelined functional unit (one element per cycle: C[0] to C[2] complete, A[3]+B[3] through A[6]+B[6] in the pipeline) versus execution using four pipelined functional units (four elements per cycle: C[0] to C[11] complete, elements 12 through 27 in the pipelines)]

Vector Memory System
[Figure: address generator adds Base and Stride to send vector register accesses to memory banks 0 to F]
Cray-1, 16 banks, 4 cycle bank busy time, 12 cycle latency
• Bank busy time: Time before bank ready to accept next request

Vector Unit Structure
[Figure: four lanes, each containing functional unit pipelines and a slice of the vector registers, connected to the memory subsystem. The lanes hold elements 0, 4, 8, …; elements 1, 5, 9, …; elements 2, 6, 10, …; and elements 3, 7, 11, …]

T0 Vector Microprocessor (UCB/ICSI, 1995)
[Figure: die photo with lanes marked. Vector register elements striped over lanes: [0][8][16][24], [1][9][17][25], [2][10][18][26], [3][11][19][27], [4][12][20][28], [5][13][21][29], [6][14][22][30], [7][15][23][31]]

Vector Instruction Parallelism
Can overlap execution of multiple vector instructions
– example machine has 32 elements per vector register and 8 lanes
[Figure: load, mul, and add instructions overlapping in time on the Load Unit, Multiply Unit, and Add Unit, one instruction issued per cycle]
Complete 24 operations/cycle while issuing 1 short instruction/cycle

CS152 Administrivia
• Quiz 5, Thursday April 23

Vector Chaining
• Vector version of register bypassing
– introduced with Cray-1

    LV   v1
    MULV v3, v1, v2
    ADDV v5, v3, v4

[Figure: Memory feeds the Load Unit, which writes V1; V1 and V2 feed Mult., which writes V3 (chain); V3 and V4 feed Add, which writes V5 (chain)]

Vector Chaining Advantage
• With chaining, can start dependent instruction as soon as first result appears
• Without chaining, must wait for last element of result to be written before starting dependent instruction
[Figure: Load, Mul, Add timelines, back to back without chaining and overlapped with chaining]

Vector Startup
Two components of vector startup penalty
– functional unit latency (time through pipeline)
– dead time or recovery time (time before another vector instruction can start down pipeline)
[Figure: pipeline diagram of R X X X W stages for the First Vector Instruction and the Second Vector Instruction, marking Functional Unit Latency and Dead Time]

Dead Time and Short Vectors
Cray C90, Two lanes
• 4 cycle dead time
• Maximum efficiency 94% with 128 element vectors (64 cycles active, 4 cycles dead time)
T0, Eight lanes
• No dead time
• 100% efficiency with 8 element vectors

Vector Memory-Memory versus Vector Register Machines
• Vector memory-memory instructions hold all vector operands in main memory
• The first vector machines, CDC Star-100 ('73) and TI ASC ('71), were memory-memory
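
The efficiency figures quoted on the Dead Time and Short Vectors slide can be checked with a few lines of arithmetic. The sketch below is not from the lecture: the function names `active_cycles` and `vector_efficiency` are made up for illustration, and it assumes a simple model in which a vector instruction occupies a functional unit for ceil(elements / lanes) cycles and is followed by a fixed number of dead cycles, so efficiency = active / (active + dead).

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): cycles a vector instruction
 * keeps a functional unit busy when n_elems elements are spread over
 * `lanes` parallel lanes, rounded up when n_elems is not a multiple
 * of the lane count. */
unsigned active_cycles(unsigned n_elems, unsigned lanes)
{
    assert(lanes > 0);
    return (n_elems + lanes - 1) / lanes;
}

/* Hypothetical helper: fraction of functional-unit cycles doing useful
 * work when every vector instruction is followed by `dead_cycles` idle
 * cycles before the next one can start down the pipeline.
 * Assumed model: efficiency = active / (active + dead). */
double vector_efficiency(unsigned n_elems, unsigned lanes,
                         unsigned dead_cycles)
{
    assert(n_elems > 0);
    unsigned active = active_cycles(n_elems, lanes);
    return (double)active / (double)(active + dead_cycles);
}
```

Under this model, `vector_efficiency(128, 2, 4)` is 64 / 68, about 94%, matching the Cray C90 figure, and `vector_efficiency(8, 8, 0)` is 1 / 1, matching the T0 figure. The same function shows why dead time hurts short vectors: with the C90 parameters, shrinking the vector to 8 elements leaves 4 active cycles out of 8.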

