Berkeley COMPSCI 152 - Lecture Notes - D889047

School name University of California, Berkeley

Course Compsci 152- Computer Architecture and Engineering

Pages 17

This preview shows page 1-2-3-4-5-6 out of 17 pages.

CS 152 Computer Architecture and Engineering
Lecture 17: Vector Computers

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152

4/9/2009, CS152-Spring'09

Recap: VLIW
• In a classic VLIW, compiler is responsible for avoiding all hazards -> simple hardware, complex compiler. Later VLIWs added more dynamic hardware interlocks
• Use loop unrolling and software pipelining for loops, trace scheduling for more irregular code
• Static scheduling difficult in presence of unpredictable branches and variable latency memory
• VLIWs somewhat successful in embedded computing, no clear success in general-purpose computing despite several attempts
• Static scheduling compiler techniques also useful for superscalar processors

Supercomputers
Definition of a supercomputer:
• Fastest machine in world at given task
• A device to turn a compute-bound problem into an I/O bound problem
• Any machine costing $30M+
• Any machine designed by Seymour Cray
CDC6600 (Cray, 1964) regarded as first supercomputer

Supercomputer Applications
Typical application areas
• Military research (nuclear weapons, cryptography)
• Scientific research
• Weather forecasting
• Oil exploration
• Industrial design (car crash simulation)
• Bioinformatics
• Cryptography
All involve huge computations on large data sets
In 70s-80s, Supercomputer ≡ Vector Machine

Vector Supercomputers
Epitomized by Cray-1, 1976:
• Scalar Unit
  – Load/Store Architecture
• Vector Extension
  – Vector Registers
  – Vector Instructions
• Implementation
  – Hardwired Control
  – Highly Pipelined Functional Units
  – Interleaved Memory System
  – No Data Caches
  – No Virtual Memory

Cray-1 (1976)
[Figure: Cray-1 block diagram. Single-port memory: 16 banks of 64-bit words + 8-bit SECDED; 80 MW/sec data load/store; 320 MW/sec instruction buffer refill. 4 instruction buffers (64-bit x 16) feeding NIP, LIP, CIP. 64 T registers and 64 B registers. Scalar registers S0-S7, address registers A0-A7, vector registers V0-V7 (64 elements each), vector mask and vector length registers. Functional units: FP Add, FP Mul, FP Recip, Int Add, Int Logic, Int Shift, Pop Cnt, Addr Add, Addr Mul.]
memory bank cycle 50 ns, processor cycle 12.5 ns (80 MHz)

Vector Programming Model
• Scalar Registers: r0 to r15
• Vector Registers: v0 to v15, each holding elements [0], [1], [2], ..., [VLRMAX-1]
• Vector Length Register: VLR
• Vector Arithmetic Instructions: ADDV v3, v1, v2 adds elements [0], [1], ..., [VLR-1] of v1 and v2 into v3
• Vector Load and Store Instructions: LV v1, r1, r2 moves data between memory and a vector register, with Base in r1 and Stride in r2

Vector Code Example

# Scalar Code
      LI     R4, 64
loop: L.D    F0, 0(R1)
      L.D    F2, 0(R2)
      ADD.D  F4, F2, F0
      S.D    F4, 0(R3)
      DADDIU R1, 8
      DADDIU R2, 8
      DADDIU R3, 8
      DSUBIU R4, 1
      BNEZ   R4, loop

# Vector Code
      LI     VLR, 64
      LV     V1, R1
      LV     V2, R2
      ADDV.D V3, V1, V2
      SV     V3, R3

# C code
for (i=0; i<64; i++)
  C[i] = A[i] + B[i];

Vector Instruction Set Advantages
• Compact
  – one short instruction encodes N operations
• Expressive, tells hardware that these N operations:
  – are independent
  – use the same functional unit
  – access disjoint registers
  – access registers in same pattern as previous instructions
  – access a contiguous block of memory (unit-stride load/store)
  – access memory in a known pattern (strided load/store)
• Scalable
  – can run same code on more parallel pipelines (lanes)

Vector Arithmetic Execution
• Use deep pipeline (=> fast clock) to execute element operations
• Simplifies control of deep pipeline because elements in vector are independent (=> no hazards!)
[Figure: six stage multiply pipeline computing V3 <- v1 * v2]

Vector Instruction Execution
ADDV C,A,B
[Figure: execution using one pipelined functional unit (one element pair enters per cycle: C[0], C[1], C[2] completing while A[3] B[3] through A[6] B[6] follow) versus execution using four pipelined functional units (four element pairs enter per cycle; unit k handles elements k, k+4, k+8, ...)]

Vector Memory System
[Figure: address generator adds Base and Stride to spread vector register loads and stores across memory banks 0 to F]
Cray-1, 16 banks, 4 cycle bank busy time, 12 cycle latency
• Bank busy time: Time before bank ready to accept next request

Vector Unit Structure
[Figure: four lanes, each containing functional units and a slice of the vector registers, all connected to the memory subsystem. Lane 0 holds elements 0, 4, 8, ...; lane 1 holds elements 1, 5, 9, ...; lane 2 holds elements 2, 6, 10, ...; lane 3 holds elements 3, 7, 11, ...]

T0 Vector Microprocessor (UCB/ICSI, 1995)
[Figure: die photo with eight lanes. Vector register elements striped over lanes: lane 0 holds [0][8][16][24], lane 1 holds [1][9][17][25], ..., lane 7 holds [7][15][23][31]]

Vector Instruction Parallelism
Can overlap execution of multiple vector instructions
  – example machine has 32 elements per vector register and 8 lanes
[Figure: load, mul, and add instructions overlapping in time on the Load Unit, Multiply Unit, and Add Unit as instructions issue]
Complete 24 operations/cycle while issuing 1 short instruction/cycle

CS152 Administrivia
• Quiz 5, Thursday April 23

Vector Chaining
• Vector version of register bypassing
  – introduced with Cray-1

      LV   v1
      MULV v3, v1, v2
      ADDV v5, v3, v4

[Figure: Memory feeds the Load Unit, which writes V1; V1 chains into the multiplier (with V2), which writes V3; V3 chains into the adder (with V4), which writes V5]

Vector Chaining Advantage
• With chaining, can start dependent instruction as soon as first result appears
• Without chaining, must wait for last element of result to be written before starting dependent instruction
[Figure: Load, Mul, Add timelines, serialized without chaining and overlapped with chaining]

Vector Startup
Two components of vector startup penalty
  – functional unit latency (time through pipeline)
  – dead time or recovery time (time before another vector instruction can start down pipeline)
[Figure: pipeline diagram with stages R X X X W per element, showing the functional unit latency of the first vector instruction and the dead time before the second vector instruction starts]

Dead Time and Short Vectors
• Cray C90, two lanes: 4 cycle dead time. Maximum efficiency 94% with 128 element vectors (64 cycles active, 4 cycles dead time)
• T0, eight lanes: no dead time. 100% efficiency with 8 element vectors

Vector Memory-Memory versus Vector Register Machines
• Vector memory-memory instructions hold all vector operands in main memory
• The first vector machines, CDC Star-100 ('73) and TI ASC ('71), were memory-memory
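
Sketch: the vector code example in plain C

The Vector Code Example slide maps each vector instruction onto a whole loop of element operations. A minimal sketch of that mapping follows, assuming a software model of the vector registers. The names vreg_t, lv, addv, sv, vector_add, and the constant VLRMAX are hypothetical and chosen for illustration only; strides are counted in elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

#define VLRMAX 64 /* assumed maximum vector length, matching the 64-element example */

/* Hypothetical software model of one vector register. */
typedef struct {
    double e[VLRMAX];
} vreg_t;

/* LV: load vlr elements from memory, starting at base, stride in elements. */
void lv(vreg_t *v, const double *base, ptrdiff_t stride, size_t vlr) {
    assert(vlr <= VLRMAX);
    for (size_t i = 0; i < vlr; i++)
        v->e[i] = base[(ptrdiff_t)i * stride];
}

/* ADDV.D: element-wise add of the first vlr elements; iterations are independent. */
void addv(vreg_t *dst, const vreg_t *a, const vreg_t *b, size_t vlr) {
    assert(vlr <= VLRMAX);
    for (size_t i = 0; i < vlr; i++)
        dst->e[i] = a->e[i] + b->e[i];
}

/* SV: store vlr elements to memory, starting at base, stride in elements. */
void sv(const vreg_t *v, double *base, ptrdiff_t stride, size_t vlr) {
    assert(vlr <= VLRMAX);
    for (size_t i = 0; i < vlr; i++)
        base[(ptrdiff_t)i * stride] = v->e[i];
}

/* C[i] = A[i] + B[i] for i in [0, n), written as the four vector instructions.
   Requires n <= VLRMAX, as in the slide where VLR is set to 64. */
void vector_add(double *c, const double *a, const double *b, size_t n) {
    vreg_t v1, v2, v3;
    lv(&v1, a, 1, n);       /* LV V1, R1 */
    lv(&v2, b, 1, n);       /* LV V2, R2 */
    addv(&v3, &v1, &v2, n); /* ADDV.D V3, V1, V2 */
    sv(&v3, c, 1, n);       /* SV V3, R3 */
}
```

Each helper is a loop with no cross-iteration dependence, which is the property the "Vector Instruction Set Advantages" slide says a vector instruction communicates to the hardware.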
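
Sketch: bank conflicts for strided accesses

The Vector Memory System slide gives the Cray-1 figures of 16 banks and a 4 cycle bank busy time. The sketch below counts stall cycles for a strided access stream under a deliberately simplified model that is an assumption, not a description of the real machine: word-interleaved banks (bank = word address mod 16), one request issued per cycle in order, and a request that finds its bank busy waits until the bank is free. The function name strided_stall_cycles and the constants NUM_BANKS and BANK_BUSY are hypothetical.

```c
#include <assert.h>

enum { NUM_BANKS = 16, BANK_BUSY = 4 }; /* Cray-1 figures from the slide */

/* Count stall cycles when issuing n word accesses at base, base+stride, ...
   Simplified model (assumption): in-order issue, one request per cycle,
   bank index is the word address modulo NUM_BANKS. */
unsigned long strided_stall_cycles(unsigned base, unsigned stride, unsigned n) {
    unsigned long free_at[NUM_BANKS] = {0}; /* first cycle each bank accepts a request */
    unsigned long cycle = 0;
    unsigned long stalls = 0;

    for (unsigned i = 0; i < n; i++) {
        unsigned bank = (base + i * stride) % NUM_BANKS;
        if (free_at[bank] > cycle) {
            /* Bank still busy from an earlier request: wait for it. */
            stalls += free_at[bank] - cycle;
            cycle = free_at[bank];
        }
        free_at[bank] = cycle + BANK_BUSY;
        cycle++; /* the next request can issue in the following cycle */
    }
    return stalls;
}
```

Under this model a unit-stride stream never stalls, because a bank is revisited only every 16 cycles, while a stride of 16 sends every request to the same bank and pays 3 stall cycles per access after the first.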
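
Sketch: dead time and efficiency

The numbers on the Dead Time and Short Vectors slide follow from a simple ratio: a vector instruction keeps the pipeline active for (elements / lanes) cycles, and efficiency is active cycles divided by active plus dead cycles. The sketch below assumes exactly that formula; the function name vector_efficiency is hypothetical, and rounding the active cycle count up for vectors that do not divide evenly across lanes is an added assumption.

```c
#include <assert.h>

/* Fraction of cycles doing useful work for back-to-back vector instructions.
   active = ceil(elements / lanes); efficiency = active / (active + dead). */
double vector_efficiency(unsigned elements, unsigned lanes, unsigned dead_cycles) {
    assert(elements > 0 && lanes > 0);
    unsigned active = (elements + lanes - 1) / lanes; /* round up */
    return (double)active / (double)(active + dead_cycles);
}
```

For the Cray C90 case, vector_efficiency(128, 2, 4) evaluates 64 / 68, which is about 94%. For T0, vector_efficiency(8, 8, 0) is 1 / 1, so even an 8 element vector reaches 100%.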
