Berkeley COMPSCI 250 - Lecture 10: Patterns for Processing Units and Communication Links



Lecture 10, Processor Patterns
CS250, UC Berkeley, Fall 2011

CS250 VLSI Systems Design
Lecture 10: Patterns for Processing Units and Communication Links
John Wawrzynek, Krste Asanovic, with John Lazzaro and Brian Zimmer (TA)
UC Berkeley, Fall 2011


Unit-Transaction Level (UTL)

A UTL design’s functionality is specified as sequences of atomic transactions performed at each unit, affecting only local state and I/O of the unit, i.e., serializable: can reach any legal state by single-stepping the entire system, one transaction at a time.

High-level UTL spec admits various mappings into RTL with various cycle timings and overlap of transactions’ executions.

[Figure: several units, a memory, and a network, with transactions T1 to T5]


Transactional Specification of Unit

- Each transaction has a combinational guard function defined over local state and state of I/O indicating when it can fire, e.g., only fire when head of input queue is present and of a certain type
- Transaction mutates local state and performs I/O when it fires
- Scheduler is a combinational function that picks the next ready transaction to fire

[Figure: a unit containing architectural state, transactions, and a scheduler, connected to network and memory]


Architectural State

The architectural state of a unit is that which is visible from outside the unit through I/O operations, i.e., architectural state is part of the spec (this is the target for “black-box” testing).

When a unit is refined into RTL, there will usually be additional microarchitectural state that is not visible from outside (this is the target for “white-box” testing):
- Intra-transaction sequencing logic
- Pipeline registers
- Internal caches and/or buffers


UTL Example: Route Lookup

Transactions in decreasing scheduler priority:
- Table_Write (request on table access queue)
    - Writes a given 12-bit value to a given 12-bit address
- Table_Read (request on table access queue)
    - Reads a 12-bit value given a 12-bit address, puts response on table reply queue
- Route (request on packet input queue)
    - Looks up header in table and places routed packet on correct output queue

This level of detail is all the information we really need to understand what the unit is supposed to do! Everything else is implementation.

[Figure: packet input queue, packet output queues, lookup table, table access queue, and table reply queue, with the Table_Write, Table_Read, and Route transactions and the scheduler]


Refining Route Lookup to RTL

The reorder buffer, the trie lookup pipeline’s registers, and any control state are microarchitectural state that should not affect function as viewed from outside.

Implementation must ensure atomicity of UTL transactions:
- Reorder buffer ensures packets flow through unit in order
- Must also ensure table write doesn’t appear to happen in middle of packet lookup, e.g., wait for pipeline to drain before performing write

[Figure: packet input queue, packet output queues, lookup RAM, table access queue, table reply queue, reorder buffer, trie lookup pipeline, and control]


System Design Goal: Rate Balancing

System performance is limited by application requirements, on-chip performance, off-chip I/O, or power/energy.

Want to balance throughput of all units (processing, memory, networks) so none is too fast or too slow.

[Figure: on-chip memory, network, and off-chip memory]


Rate-Balancing Patterns

To make a unit faster, use parallelism:
- Unrolling (for processing units)
- Banking (for memories)
- Multiporting (for memories)
- Widen links (for networks)
I.e., use more resources by expanding in space, shrinking in time.

To make a unit slower, use time-multiplexing:
- Replace dedicated links with a shared bus (for networks)
- Replace dedicated memories with a common memory
- Replace multiport memory with multiple cycles on single port
- Multithread computations onto a common pipeline
- Schedule a dataflow graph onto a single ALU
I.e., use less resources by shrinking in space, expanding in time.


Stateless Stream Unit Unrolling
(Stream is an ordered sequence)

Problem: A stateless unit processing a single input stream of requests has insufficient throughput.

Solution: Replicate the unit and stripe requests across the parallel units. Aggregate the results from the units to form a response stream.

Applicability: Stream unit does not communicate values between independent requests.

Consequences: Requires additional hardware for replicated units, plus networks to route requests and collect responses. Latency and energy for each individual request increases due to additional interconnect cost.

[Figure: Stateless Stream Unit Unrolling. Distribute and collect networks around replicated units; timelines of transactions T1 to T4 before and after unrolling]


Variable-Latency Stateless Stream Unit Unrolling

Problem: A stateless stream unit processing a single input stream of requests has insufficient throughput, and each request takes a variable amount of time to process.

Solution: Replicate the unit. Allocate space in output reorder buffer in stream order, then dispatch request to next available unit. Unit writes result to allocated slot in output reorder buffer when completed (possibly out-of-order), but results can only be removed in stream order.

Applicability: Stream unit does not communicate values between independent requests.

Consequences: Additional hardware for replicated units plus added scheduler, buffer, and interconnects. Need scheduler to find next free unit and possibly an arbiter for reorder buffer write ports. Latency and energy for each individual request increases due to additional buffers and interconnect.

[Figure: Variable-Latency Stateless Stream Unit Unrolling. Dispatch, scheduler, arbiter, and reorder buffer around replicated units; timelines of transactions T1 to T4 before and after unrolling]


Time Multiplexing

Problem: Too much hardware used by several units processing independent transactions.

Solution: Provide only a single unit and time-multiplex hardware within unit to process independent transactions.

Applicability: Original units have similar functionality and required throughput is low.

Consequences: Combined unit has to provide superset of functionality of original units. Combined unit has to provide architectural
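
As a rough illustration of the UTL Route Lookup example from earlier in the lecture, the sketch below models the unit in Python as three guarded, atomic transactions plus a fixed-priority scheduler. It is not the course's reference model. The class name `RouteLookupUnit`, the tuple encodings of queue entries, the default of 0 for unwritten table entries, and the rule that maps a table value to an output port (value modulo the number of ports) are all assumptions made for illustration. Back-pressure from full output queues is not modeled.

```python
from collections import deque


class RouteLookupUnit:
    """Hypothetical UTL-style model: guards, atomic transactions, priority scheduler."""

    def __init__(self, num_outputs=4):
        self.table = {}                # architectural state: the lookup table
        self.table_access = deque()    # ("write", addr, value) or ("read", addr)
        self.table_replies = deque()   # values returned by Table_Read
        self.packet_in = deque()       # (header, payload)
        self.packet_out = [deque() for _ in range(num_outputs)]

    # Guards: pure functions of local state and queue heads.
    def _can_table_write(self):
        return bool(self.table_access) and self.table_access[0][0] == "write"

    def _can_table_read(self):
        return bool(self.table_access) and self.table_access[0][0] == "read"

    def _can_route(self):
        return bool(self.packet_in)

    # Transactions: each one mutates local state and performs I/O atomically.
    def _table_write(self):
        _, addr, value = self.table_access.popleft()
        self.table[addr & 0xFFF] = value & 0xFFF   # 12-bit address and value

    def _table_read(self):
        _, addr = self.table_access.popleft()
        # Assumption: entries never written read back as 0.
        self.table_replies.append(self.table.get(addr & 0xFFF, 0))

    def _route(self):
        header, payload = self.packet_in.popleft()
        # Assumption: the table value selects the port, modulo the port count.
        port = self.table.get(header & 0xFFF, 0) % len(self.packet_out)
        self.packet_out[port].append((header, payload))

    def step(self):
        """Fire the highest-priority ready transaction; return its name or None."""
        candidates = (
            ("Table_Write", self._can_table_write, self._table_write),
            ("Table_Read", self._can_table_read, self._table_read),
            ("Route", self._can_route, self._route),
        )
        for name, guard, fire in candidates:
            if guard():
                fire()
                return name
        return None
```

Calling `step()` repeatedly single-steps the unit one transaction at a time, which is the serializable behavior the UTL spec describes. An RTL refinement may overlap transactions as long as the result is indistinguishable from some such sequence.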

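
The reorder buffer in the Variable-Latency Stateless Stream Unit Unrolling pattern can be sketched the same way. The minimal Python model below allocates slots in stream order, accepts results in any order, and releases them only in stream order. The class `ReorderBuffer` and its method names are hypothetical, and the circular-buffer organization is one plausible choice rather than something the slides specify.

```python
class ReorderBuffer:
    """Hypothetical circular reorder buffer: in-order allocate, in-order drain."""

    def __init__(self, size):
        self.slots = [None] * size
        self.valid = [False] * size
        self.head = 0    # oldest allocated slot
        self.tail = 0    # next slot to allocate
        self.count = 0   # number of allocated slots

    def allocate(self):
        """Reserve the next slot in stream order; return its index, or None if full."""
        if self.count == len(self.slots):
            return None
        idx = self.tail
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return idx

    def write(self, idx, result):
        """Called by a replicated unit when it finishes, possibly out of order."""
        self.slots[idx] = result
        self.valid[idx] = True

    def drain(self):
        """Remove and return every completed result at the head, in stream order."""
        out = []
        while self.count > 0 and self.valid[self.head]:
            out.append(self.slots[self.head])
            self.slots[self.head] = None
            self.valid[self.head] = False
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return out
```

If three requests are allocated and the third finishes first, `drain()` returns nothing until the first result is written. This is why the pattern adds latency for individual requests even as it raises throughput.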
