CS6810 School of Computing University of Utah

Big Iron
Today's topics:
• Vector Processors and Supercomputers
  » VPs came first; they now exist as GPGPUs (figure source: text Appendix F)
• Supercomputers
  » lots of microprocessors with a fancy interconnect; a look at the top500
• Datacenter "cloud" computing
  » lots of blades w/ a fancy interconnect AND fancy storage systems (this is not DRAM!)

Review
• Roadblocks to parallelism: wide issue & deep pipelines
  » dynamic OOO issue
    • huge # of instructions on the fly
    • quadratic circuit complexity to keep track of everything
      – forwarding, ROB size, # of registers
    • power density kills you
    • performance still limited by the ILP in the program
  » VLIW
    • compiler does most of the scheduling work
    • still a huge # of instructions on the fly
    • power density is still a problem
      – this will continue to be a common theme
    • performance also limited by ILP
• Enhancing parallelism: multi- (threads, cores, sockets)
  » the main game today
  » might be easier to build than to program

1st Supercomputers
• Vector machines: often attributed to Seymour Cray, but he says
  "I'm certainly not inventing vector processors. There are three kinds that I know of existing today. They are represented by the Illiac-IV, the (CDC) Star processor, and the TI (ASC) processor. Those three were all pioneering processors. . . . One of the problems of being a pioneer is you always make mistakes and I never, never want to be a pioneer. It's always best to come second when you can look at the mistakes the pioneers made."
  talk at LLNL, 1976, on the introduction of the CRAY-1
• Alternative programming model: two data types
  » scalar and vector
    • not wildly dissimilar to map-reduce (a Google reinvention)
      – map sub-problems to some set of resources
      – reduce/combine sub-problem results into the final answer
  » APL (Iverson's 1969 book)
    • +/(1, 2, 3) = 6

Replace Loops w/ Vector Instructions
• Vector-vector add
  conventional:
  » 2 pointers to the heads of the two vectors
  » offset with the loop variable
    • A[i] + B[i] for all i
  vector model:
  » Vadd A, B (1 instruction does a lot of work)
  » no loop or instruction-decode overhead
  » hazard checking only required between vector instructions
• Issues
  » each vector has to be contiguous
  » the machine has a native vector length
    • 64 was common
    • pad if the actual vector length is not a multiple of 64
  » scientific programmers embraced the vector model
    • but how do you write a web browser?

2001 Vector Odyssey
• Vector machines out of fashion
• 2002: Japan's Earth Simulator announced
  » a virtual planet
    • predict the impact of environmental change on world climate
  » leads the top500 list
    • widespread US panic @ the government level
      – strategic leadership lost?
      – oh woe is us, or U.S.
    • spurs supercomputer development
      – including new vector machines from Cray
• Now
  » wide SIMD alive and well in GPGPUs
  » short SIMD alive and well in CPUs (SIMD = short vector)
    • same issues apply

Basic Vector Architecture
• 2 parts
  scalar unit
  » similar to a normal CPU
    • OOO: NEC SX/5
    • VLIW: Fujitsu VPP5000
  vector unit
  » multiple FUs (both int & float)
    • deeply pipelined for high clock frequencies
    • particularly true for the FPUs
      – the primary focus for the scientific-computing folks
• 2 basic architecture types
  memory-memory vector processors
  » the early CDC machines
  vector-register processors (vector RISC)
  » everything since about 1980
    • CRAY 1, 2, X-MP, Y-MP, C90, T90, SV1, X1
    • NEC SX/2-SX/8, Fujitsu VP200-VPP5000, Hitachi S820 and S8300
    • Convex C-1 through C-4

Top Level Vector-Register VMIPS
• 64-element Vregs
• 2 read ports, 1 write port: is it enough?

Snippet of Real Machines

VMIPS ISA Snippet 1

VMIPS ISA Snippet 2

DAXPY: MIPS vs. VMIPS
• IC = 6 vs. 600

Performance
• Vector execution time
  f(vector length, structural hazards, data hazards)
  » initiation rate: # of operands consumed or produced per cycle
  » multi-lane architecture
    • each vector lane can carry n values per cycle
      – often 2 or more
    • # of vector lanes * lane width = initiation rate
  » also dependent on pipeline fill and spill
• Convoy (a made-up term)
  a set of independent vector instructions
  » similar to an EPIC VLIW bundle
• Chime
  the time it takes to execute 1 convoy
• Start-up time
  the time it takes to load the vector registers and fill the pipe
• All contribute to execution time

Vector Memory Systems
• Lots of bandwidth required to feed lots of XUs
  » very wide data bus
  » banked memory
    • each bank independently addressed (not interleaved)
    • multiple loads and stores issued per cycle
    • each bank serves a particular load or store
      – assuming no bank conflicts
      – the compiler tries hard to avoid conflicts
    • latency can be high for DRAM-based memory
      – but bandwidth can be quite good
      – early CRAY machines used SRAMs; too expensive today
  » addressing? where are the bank-select bits?

Vector Length Control
• Vec.reg.length != operand.vec.size (OVS)
  MVL = vec.reg.length
  enter VLR
  » specifies the operand vector size for a vector instruction
    • the actual vector size is often not known until run time
    • it may even change based on a call parameter
    • APL: rho(V) gives the length (or the structure, if a vector of vectors of ...)
  » controls the XUs and the Vector_Ld_Store unit
  » VLR value <= vector.reg.length
    • hence not known until run time
    • statically known then compiler can issue
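The vector-length control just described can be sketched as strip mining: an arbitrary operand vector size n is split into one odd-sized chunk of n mod MVL elements followed by full MVL-length chunks, with VLR reset before each chunk. This is a minimal Python sketch, not machine code; the function name and the choice of a vector-vector add as the example operation are illustrative assumptions, and MVL = 64 is the common register length mentioned earlier.

```python
MVL = 64  # maximum vector length: the native vector register size

def vadd_strip_mined(a, b):
    """Element-wise add of two equal-length vectors, processed in
    strips the way a vector machine would set VLR per chunk."""
    n = len(a)
    out = [0.0] * n
    low = 0
    vlr = n % MVL              # first strip handles the odd-sized remainder
    for _ in range(n // MVL + 1):
        # one "vector instruction" operating on vlr <= MVL elements
        out[low:low + vlr] = [a[i] + b[i] for i in range(low, low + vlr)]
        low += vlr
        vlr = MVL              # every remaining strip is full length
    return out
```

Note that when n is an exact multiple of MVL the first strip is zero-length, matching the classic strip-mine loop structure: the loop always runs n // MVL + 1 times.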
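For reference, the DAXPY kernel behind the MIPS vs. VMIPS instruction-count comparison (IC = 6 vs. 600) computes y = a*x + y over a 64-element vector. Below is a minimal Python sketch of the computation only, not of either ISA; the roughly-9-instructions-per-element figure for the scalar loop is an approximation consistent with the ~600 total, while VMIPS expresses the whole kernel in 6 instructions (a scalar load of a, two vector loads, a vector-scalar multiply, a vector-vector add, and a vector store).

```python
def daxpy(a, x, y):
    """DAXPY: y = a*x + y, the classic vector benchmark kernel.
    A scalar loop re-executes its body once per element; a vector
    machine issues a handful of vector instructions for all 64."""
    return [a * xi + yi for xi, yi in zip(x, y)]
```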