U of U CS 7810 - Lecture 17 - On-Chip Networks


Slide 1: Lecture 17: On-Chip Networks
• Today: background wrap-up and innovations

Slide 2: Router Pipeline
• Four typical stages:
  RC (routing computation): the head flit indicates the VC it belongs to, the VC state is updated, the headers are examined, and the next output channel is computed (note: this is done for all head flits arriving on the various input channels)
  VA (virtual-channel allocation): the head flits compete for the available virtual channels on their computed output channels
  SA (switch allocation): a flit competes for access to its output physical channel
  ST (switch traversal): the flit is transmitted on the output channel
• A head flit goes through all four stages; the other flits do nothing in the first two stages (this is an in-order pipeline and flits cannot jump ahead); a tail flit also de-allocates the VC

Slide 3: Speculative Pipelines
• Perform VA and SA in parallel
• Note that SA only requires knowledge of the output physical channel, not the VC
• If VA fails, the successfully allocated channel goes un-utilized

  Baseline pipeline for a four-flit packet:
  Cycle:        1    2    3    4    5    6    7
  Head flit:    RC   VA   SA   ST
  Body flit 1:            --   SA   ST
  Body flit 2:                 --   SA   ST
  Tail flit:                        --   SA   ST

• Perform VA, SA, and ST in parallel (can cause collisions and re-tries)
• Typically, VA is the critical path – can possibly perform SA and ST sequentially
• Router pipeline latency is a greater bottleneck when there is little contention
• When there is little contention, speculation will likely work well!
• Single-stage pipeline?

  Speculative pipeline (VA and SA in parallel):
  Cycle:        1    2      3    4    5    6
  Head flit:    RC   VA+SA  ST
  Body flit 1:              SA   ST
  Body flit 2:                   SA   ST
  Tail flit:                          SA   ST

Slide 4: Alpha 21364 Pipeline
  Stages: RC, T, DW, SA1, WrQ, RE, SA2, ST1, ST2, ECC
  RC: routing computation
  T: transport / wire delay
  DW: update of input unit state
  WrQ: write to input queues
  SA1: switch allocation – local
  SA2: switch allocation – global
  ST1, ST2: switch traversal
  ECC: append ECC information

Slide 5: Recent Intel Router
• Source: Partha Kundu, "On-Die Interconnects for Next-Generation CMPs", talk at On-Chip Interconnection Networks Workshop, Dec 2006
• Used for a 6x6 mesh
• 16 B, > 3 GHz
• Wormhole
with VC flow control

Slide 6: Recent Intel Router (figure; source as above)

Slide 7: Recent Intel Router (figure; source as above)

Slide 8: Data Points
• On-chip network's power contribution:
  in the RAW (tiled) processor: 36%
  in a network of compute-bound elements (Intel): 20%
  in a network of storage elements (Intel): 36%
  bus-based coherence (Kumar et al. '05): ~12%
  Polaris (Intel) network: 28%
  SCC (Intel) network: 10%
• Power contributors:
  RAW: links 39%; buffers 31%; crossbar 30%
  TRIPS: links 31%; buffers 35%; crossbar 33%
  Intel: links 18%; buffers 38%; crossbar 29%; clock 13%

Slide 9: Network Power
• Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO'03, Princeton
• Energy for a flit:
    E_flit = E_R · H + E_wire · D = (E_buf + E_xbar + E_arb) · H + E_wire · D
  where:
    E_R = router energy               H = number of hops
    E_wire = wire transmission energy D = physical Manhattan distance
    E_buf = router buffer energy, E_xbar = router crossbar energy, E_arb = router arbiter energy
• This paper assumes that E_wire ·
D is the ideal network energy (assuming no change to the application and how it is mapped to physical nodes)

Slide 10: Segmented Crossbar
• By segmenting the row and column lines, parts of these lines need not switch → less switching capacitance (especially if the output and input ports are close to the bottom-left in the figure)
• Need a few additional control signals to activate the tri-state buffers
• Overall crossbar power savings: ~15-30%

Slide 11: Cut-Through Crossbar
• Attempts to optimize the common case: in dimension-order routing, flits make at most one turn and usually travel straight
• 2/3 the number of tri-state buffers and 1/2 the number of data wires
• "Straight" traffic does not go through tri-state buffers
• Some combinations of turns are not allowed, such as E → N and N → W (note that such a combination cannot happen with dimension-order routing)
• Crossbar energy savings of 39-52%

Slide 12: Write-Through Input Buffer
• Input flits must be buffered in case there is a conflict in a later pipeline stage
• If the queue is empty, the input flit can move straight to the next stage: this helps avoid the buffer read
• To reduce the datapaths, the write bitlines can serve as the bypass path
• Power savings are a function of read/write energy ratios and the probability of finding an empty queue

Slide 13: Express Channels
• Express channels connect non-adjacent nodes – flits traveling a long distance can use express channels for most of the way and navigate on local channels near the source/destination (like taking the freeway)
• Helps reduce the number of hops
• The router in each express node is much bigger now

Slide 14: Express Channels
• Routing: in a ring, there are 5 possible routes and the best is chosen; in a torus, there are 17 possible routes
• A large express interval results in fewer savings because fewer messages exercise the express channels

Slide 15: Express Virtual Channels
• To a large extent, maintain the same physical structure as a conventional network (changes to be explained shortly)
• Some virtual channels are treated
differently: they go through a different router pipeline and can effectively avoid most router overheads

Slide 16: Router Pipelines
• If a Normal VC (NVC): at every router, the flit must compete for the next VC and for the switch; it will get buffered in case there is a conflict for VA/SA
• If an EVC (at an intermediate bypass router):
  need not compete for a VC (an EVC is a VC reserved across multiple routers)
  similarly, the EVC is also guaranteed the switch (only 1 EVC can compete for an output physical channel)
  since VA/SA are guaranteed to succeed, there is no need for buffering
  simple router pipeline: the incoming flit moves directly to the ST stage
• If an EVC (at an EVC source/sink router):
  must compete for VC/SA as in a conventional pipeline
  before moving on, must confirm a free buffer at the next EVC router
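To make the stage timing concrete, the baseline four-stage pipeline from the Router Pipeline and Speculative Pipelines slides can be reproduced with a small script. This is a minimal sketch assuming a contention-free router; the function name and dictionary representation are mine, not the lecture's:

```python
# Sketch of the baseline 4-stage router pipeline (RC, VA, SA, ST).
# Only the head flit performs RC and VA; each later flit stalls ("--"),
# then proceeds one cycle behind its predecessor through SA and ST.

def pipeline_schedule(num_flits):
    """Return {flit_index: {cycle: stage}} for a contention-free packet."""
    schedule = {}
    for i in range(num_flits):
        if i == 0:
            # Head flit: RC in cycle 1, VA in 2, SA in 3, ST in 4
            schedule[i] = {1: "RC", 2: "VA", 3: "SA", 4: "ST"}
        else:
            # Flit i stalls once, then follows one cycle behind flit i-1
            schedule[i] = {2 + i: "--", 3 + i: "SA", 4 + i: "ST"}
    return schedule

sched = pipeline_schedule(4)          # head, 2 body flits, tail
assert sched[0] == {1: "RC", 2: "VA", 3: "SA", 4: "ST"}
assert sched[1] == {3: "--", 4: "SA", 5: "ST"}
assert sched[3][7] == "ST"            # tail flit traverses the switch in cycle 7
```

The final assertion reproduces the cycle table on the Speculative Pipelines slide: a four-flit packet finishes switch traversal in cycle 7 under the baseline pipeline.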
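The flit-energy formula on the Network Power slide translates directly into code. Only the formula comes from the slide; the per-event energy values in the example are made-up placeholders:

```python
# Flit energy per the MICRO'03 formula quoted above:
#   E_flit = (E_buf + E_xbar + E_arb) * H + E_wire * D

def flit_energy(hops, manhattan_dist, e_buf, e_xbar, e_arb, e_wire):
    # E_R = E_buf + E_xbar + E_arb (energy spent at each router)
    e_router = e_buf + e_xbar + e_arb
    # Router energy scales with hop count H; wire energy with distance D
    return e_router * hops + e_wire * manhattan_dist

# Hypothetical 3-hop route covering a Manhattan distance of 3.0 (units arbitrary)
e = flit_energy(hops=3, manhattan_dist=3.0,
                e_buf=1.0, e_xbar=1.2, e_arb=0.3, e_wire=2.0)
assert abs(e - 13.5) < 1e-9   # (1.0 + 1.2 + 0.3) * 3 + 2.0 * 3.0
```

Setting the router terms to zero leaves only the E_wire · D term, which is the "ideal network energy" baseline the paper assumes.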
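The cut-through crossbar (slide 11) relies on a property of dimension-order routing: a flit turns at most once, and turn pairs like E → N followed by N → W never occur. A minimal sketch of XY routing on a mesh illustrates this; the coordinate convention and function name are illustrative assumptions:

```python
# Dimension-order (XY) routing: route fully in X first, then in Y.
# Consequence: at most one turn per route, which is what lets the
# cut-through crossbar drop tri-state buffers and wires.

def xy_route(src, dst):
    """Return the hop directions from src to dst on a mesh."""
    (sx, sy), (dx, dy) = src, dst
    hops = []
    hops += ["E" if dx > sx else "W"] * abs(dx - sx)   # X dimension first
    hops += ["N" if dy > sy else "S"] * abs(dy - sy)   # then Y dimension
    return hops

route = xy_route((0, 0), (3, 2))
assert route == ["E", "E", "E", "N", "N"]
# Count turns: adjacent hops in different directions
turns = sum(1 for a, b in zip(route, route[1:]) if a != b)
assert turns <= 1
```

Because the X leg always precedes the Y leg, a turn out of the Y dimension (e.g., N → W) can never arise, matching the disallowed-turn combinations noted on the slide.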
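The hop-count savings of express channels (slides 13-14) can be illustrated on a 1-D network. The express interval k and the greedy "local, then express, then local" route below are illustrative assumptions, not the lecture's routing algorithm:

```python
# Express channels on a 1-D network: express links connect every k-th
# node, so a long trip uses local links to reach the nearest express
# node, rides express links, and finishes on local links (the freeway).

def hops_local(src, dst):
    return abs(dst - src)

def hops_express(src, dst, k):
    lo, hi = sorted((src, dst))
    first_exp = -(-lo // k) * k        # nearest express stop at or after lo
    last_exp = (hi // k) * k           # nearest express stop at or before hi
    if first_exp >= last_exp:          # trip too short to use the freeway
        return hops_local(src, dst)
    return (first_exp - lo) + (last_exp - first_exp) // k + (hi - last_exp)

assert hops_local(1, 14) == 13
assert hops_express(1, 14, k=4) == 3 + 2 + 2  # 1->4 local, 4->12 express, 12->14 local
assert hops_express(1, 14, k=8) == 13         # interval too large: no express segment fits
```

The last assertion matches the slide's observation: a large express interval yields fewer savings, because fewer messages find a usable express segment along their route.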

