Asynchronous Pipelines: Concurrency Issues
Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
February 20, 2009 http://csg.csail.mit.edu/6.375 L08-1

Synchronous vs Asynchronous Pipelines
• In a synchronous pipeline:
  - typically only one rule; the designer controls precisely which activities go on in parallel
  - downside: the rule can get too complicated, which makes it easy to make a mistake and difficult to make changes
• In an asynchronous pipeline:
  - several smaller rules, each easy to write and easier to change
  - downside: sometimes rules do not fire concurrently when they should

Two-stage Asynchronous Pipeline
  // instr, predIa and it are defined outside these rules (not shown on this slide)
  rule fetch_and_decode (!stallFunc(instr, bu));
    bu.enq(newIt(instr, rf));
    pc <= predIa;
  endrule

  rule execute (True);
    case (it) matches
      tagged EAdd {dst:.rd, src1:.va, src2:.vb}:
        begin rf.upd(rd, va + vb); bu.deq(); end
      tagged EBz {cond:.cv, addr:.av}:
        if (cv == 0) then
          begin pc <= av; bu.clear(); end
        else bu.deq();
      tagged ELoad {dst:.rd, addr:.av}:
        begin rf.upd(rd, dMem.read(av)); bu.deq(); end
      tagged EStore {value:.vv, addr:.av}:
        begin dMem.write(av, vv); bu.deq(); end
    endcase
  endrule
[Figure: CPU containing pc, rf, and the fetch & decode and execute stages connected by bu]
Can these rules fire concurrently? Does it matter?

The tension
• If the two rules never fire in the same cycle, then the machine can hardly be called a pipelined machine.
• If both rules fire in parallel every cycle whenever they are enabled, then wrong results would be produced.

The compiler issue
• Can the compiler detect all the conflicting conditions? Yes.
  - Important for correctness
• Does the compiler detect conflicts that do not exist in reality? Yes.
  - False positives lower the performance.
  - The main reason is that sometimes the compiler cannot detect under what conditions the two rules are mutually exclusive or conflict-free.
• What can the user specify easily?
  - Rule priorities to resolve nondeterministic choice
In many situations the correctness of the design is not enough; the design is not done unless the performance goals are met.

Some insight into concurrent rule firing
[Figure: rules Ri, Rj, Rk as separate rule steps in the rule semantics vs. clock cycles in the HW]
• There are more intermediate states in the rule semantics (a state after each rule step).
• In the HW, states change only at clock edges.

Parallel execution reorders reads and writes
[Figure: the reads and writes of each rule step in the rule semantics vs. the reads and writes within each HW clock]
• In the rule semantics, each rule sees (reads) the effects (writes) of previous rules.
• In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks.

Correctness
[Figure: rule steps Ri, Rj, Rk vs. HW clocks]
• Rules are allowed to fire in parallel only if the net state change is equivalent to sequential
  rule execution.
• Consequence: the HW can never reach a state unexpected in the rule semantics.

Executing Multiple Rules Per Cycle: Conflict-free rules
  rule ra (z > 10);
    x <= x + 1;
  endrule

  rule rb (z > 20);
    y <= y + 2;
  endrule
Parallel execution behaves like ra < rb or, equivalently, rb < ra.
Rules $R_a$ and $R_b$ are conflict-free if $\forall s.\ \pi_a(s) \wedge \pi_b(s) \Rightarrow$
  1. $\pi_a(\delta_b(s)) \wedge \pi_b(\delta_a(s))$
  2. $\delta_a(\delta_b(s)) = \delta_b(\delta_a(s))$
where $\pi_X$ is the guard (predicate) of rule X and $\delta_X$ is its state transformation.

Mutually Exclusive Rules
Rules $R_a$ and $R_b$ are mutually exclusive if they can never be enabled simultaneously:
  $\forall s.\ \pi_a(s) \Rightarrow \neg\pi_b(s)$
Mutually exclusive rules are conflict-free by definition.

Executing Multiple Rules Per Cycle: Sequentially Composable rules
  rule ra (z > 10);
    x <= y + 1;
  endrule

  rule rb (z > 20);
    y <= y + 2;
  endrule
Parallel execution behaves like ra < rb.
Rules $R_a$ and $R_b$ are sequentially composable if $\forall s.\ \pi_a(s) \wedge \pi_b(s) \Rightarrow$
  1. $\pi_b(\delta_a(s))$
  2. $\mathrm{Prj}_{R(R_b)}(\delta_b(s)) = \mathrm{Prj}_{R(R_b)}(\delta_b(\delta_a(s)))$
- $R(R_b)$ is the range of rule $R_b$
- $\mathrm{Prj}_{st}$ is the projection selecting $st$ from the total state

Compiler determines if two rules can be executed in parallel
• The sequentially composable (SC) and conflict-free (CF) properties defined above can be determined by examining the domains and ranges of the rules in a pairwise manner. With $D(X)$ the domain of rule X (the state it reads) and $R(X)$ its range (the state it writes):
  - CF: $D(R_a) \cap R(R_b) = \emptyset$, $D(R_b) \cap R(R_a) = \emptyset$, $R(R_a) \cap R(R_b) = \emptyset$
  - SC ($R_a < R_b$): $D(R_b) \cap R(R_a) = \emptyset$
  These conditions are sufficient but not necessary.
• Parallel execution of CF and SC rules does not increase the critical path delay.

Muxing structure
• Muxing logic requires determining, for each register (action method), the rules that update it and under what conditions.
[Figure: Conflict-free/mutually exclusive: two AND gates, (π1, δ1) and (π2, δ2), feeding an OR. Sequentially composable: the same, except that rule 1's AND gate also takes ~π2.]
• If two CF rules update the same element, then they must be mutually exclusive ($\pi_1 \Rightarrow \neg\pi_2$).

Scheduling and control logic
Modules (Current
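The conflict-free and mutually-exclusive definitions quantify over every state $s$, so a small model can sanity-check them by brute force. The sketch below is plain Python, not BSV. It assumes a rule can be modeled as a `(guard, action)` pair: `guard(s)` plays the role of $\pi$, and `action(s)` returns the rule's writes, so applying those writes gives $\delta(s)$. The state dictionary, the helper names, the extra rules `rc`/`rd` and the finite state sample are all hypothetical. A finite sample can refute these properties but cannot prove them for all states.

```python
from itertools import product

# Hypothetical model: a rule is (guard, action); guard(s) ~ pi(s),
# action(s) returns a dict of register writes.
def apply(action, s):
    """delta(s): a copy of state s with the rule's writes applied."""
    return {**s, **action(s)}

def conflict_free(a, b, states):
    """Brute-force check of the CF definition over the given states."""
    (pa, da), (pb, db) = a, b
    for s in states:
        if not (pa(s) and pb(s)):
            continue  # the definition only constrains states enabling both rules
        sa, sb = apply(da, s), apply(db, s)
        if not (pa(sb) and pb(sa)):         # condition 1: both stay enabled
            return False
        if apply(da, sb) != apply(db, sa):  # condition 2: both orders agree
            return False
    return True

def mutually_exclusive(a, b, states):
    """True if no sampled state enables both guards."""
    return not any(a[0](s) and b[0](s) for s in states)

# Rules from the conflict-free slide.
ra = (lambda s: s["z"] > 10, lambda s: {"x": s["x"] + 1})
rb = (lambda s: s["z"] > 20, lambda s: {"y": s["y"] + 2})
# Hypothetical pair: both write x, but are never enabled together.
rc = (lambda s: s["z"] > 20, lambda s: {"x": s["x"] + 1})
rd = (lambda s: s["z"] <= 10, lambda s: {"x": 0})

# Small finite sample of states (the real definition ranges over all s).
STATES = [{"x": x, "y": y, "z": z}
          for x, y, z in product(range(4), range(4), range(0, 40, 5))]

print(conflict_free(ra, rb, STATES))       # True
print(mutually_exclusive(rc, rd, STATES))  # True
print(conflict_free(rc, rd, STATES))       # True (vacuously)
```

The `rc`/`rd` pair writes the same register `x`, yet `conflict_free` still returns `True` because no sampled state enables both guards. This is the sense in which mutually exclusive rules are conflict-free by definition.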
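The sequentially composable definition can be checked the same way. This hypothetical Python sketch keeps the `(guard, action)` modeling assumption and takes $R(R_b)$ as an explicit set of state names. On a finite sample, it confirms that the slide's `ra`/`rb` pair satisfies `ra < rb` but not the reverse order.

```python
from itertools import product

def apply(action, s):
    """delta(s): a copy of state s with the rule's writes applied."""
    return {**s, **action(s)}

def seq_composable(a, b, range_b, states):
    """Brute-force check that firing a and b in parallel behaves like a < b.

    range_b stands for R(Rb), the set of state elements rule b writes."""
    (pa, da), (pb, db) = a, b

    def proj(t):  # Prj_{R(Rb)}: keep only the elements rule b writes
        return {k: t[k] for k in range_b}

    for s in states:
        if not (pa(s) and pb(s)):
            continue
        sa = apply(da, s)
        if not pb(sa):                                 # condition 1
            return False
        if proj(apply(db, s)) != proj(apply(db, sa)):  # condition 2
            return False
    return True

# Rules from the sequentially-composable slide.
ra = (lambda s: s["z"] > 10, lambda s: {"x": s["y"] + 1})
rb = (lambda s: s["z"] > 20, lambda s: {"y": s["y"] + 2})

STATES = [{"x": x, "y": y, "z": z}
          for x, y, z in product(range(4), range(4), range(0, 40, 5))]

print(seq_composable(ra, rb, {"y"}, STATES))  # True: ra < rb
print(seq_composable(rb, ra, {"x"}, STATES))  # False: rb < ra does not hold
```

The reverse order fails because `ra` reads `y`. If `rb` fires first, `ra` sees the new `y` in the rule semantics, but in the same clock the hardware would hand it the old one.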
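The compiler's pairwise test does not evaluate rules on states; it only compares domains and ranges. Below is a sketch of that sufficient test. It assumes each rule's read set $D$ and write set $R$ are supplied by hand. The `RuleInfo` type and the helper names are hypothetical and are not Bluespec compiler APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RuleInfo:
    name: str
    reads: frozenset   # D(R): state the rule reads (guard and body)
    writes: frozenset  # R(R): state the rule writes

def is_cf(a, b):
    """Sufficient for CF: neither rule reads what the other writes,
    and the two write disjoint state."""
    return not (a.reads & b.writes or b.reads & a.writes or a.writes & b.writes)

def is_sc(a, b):
    """Sufficient for SC with a < b: b reads nothing that a writes."""
    return not (b.reads & a.writes)

def classify(a, b):
    if is_cf(a, b):
        return "CF"
    if is_sc(a, b):
        return f"SC ({a.name} < {b.name})"
    if is_sc(b, a):
        return f"SC ({b.name} < {a.name})"
    return "conflict"

def rule(name, reads, writes):
    return RuleInfo(name, frozenset(reads), frozenset(writes))

# Rules from the CF and SC slides.
cf_a = rule("ra", {"x", "z"}, {"x"})  # x <= x + 1 when z > 10
cf_b = rule("rb", {"y", "z"}, {"y"})  # y <= y + 2 when z > 20
sc_a = rule("ra", {"y", "z"}, {"x"})  # x <= y + 1 when z > 10
sc_b = rule("rb", {"y", "z"}, {"y"})  # y <= y + 2 when z > 20

# Coarse, hypothetical read/write sets for the two-stage pipeline rules,
# treating pc, rf, bu and dMem as single indivisible state elements.
fetch = rule("fetch_and_decode", {"pc", "rf", "bu"}, {"pc", "bu"})
execute = rule("execute", {"bu", "dMem"}, {"rf", "pc", "bu", "dMem"})

print(classify(cf_a, cf_b))       # CF
print(classify(sc_a, sc_b))       # SC (ra < rb)
print(classify(fetch, execute))   # conflict
```

With the coarse view of `fetch_and_decode` and `execute`, the test reports a conflict. That is the kind of conservative answer the compiler-issue slide warns about. Because the conditions are sufficient but not necessary, a `conflict` result here does not prove that the rules actually conflict.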
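Two small interpreters make the difference between rule steps and clock cycles concrete. One fires rules one after another. The other lets every enabled rule read the old state and commits all writes at the clock edge. This is a hypothetical Python model, not generated hardware. `hw_cycle` resolves writes to the same register by letting the later rule in the list win, which mirrors the sequentially composable muxing.

```python
def rule_steps(rules, s):
    """Rule semantics: fire rules one at a time in the given order; each
    rule sees (reads) the writes of the rules before it."""
    for guard, action in rules:
        if guard(s):
            s = {**s, **action(s)}
    return s

def hw_cycle(rules, s):
    """One clock: every enabled rule reads the OLD state s, and all writes
    land together at the edge; for a shared register the later rule wins."""
    writes = [action(s) for guard, action in rules if guard(s)]
    nxt = dict(s)
    for w in writes:
        nxt.update(w)
    return nxt

# Rules from the sequentially-composable slide.
ra = (lambda s: s["z"] > 10, lambda s: {"x": s["y"] + 1})
rb = (lambda s: s["z"] > 20, lambda s: {"y": s["y"] + 2})

s0 = {"x": 0, "y": 1, "z": 25}
print(hw_cycle([ra, rb], s0), rule_steps([ra, rb], s0))
# {'x': 2, 'y': 3, 'z': 25} {'x': 2, 'y': 3, 'z': 25}  -> same as ra < rb
print(rule_steps([rb, ra], s0))
# {'x': 4, 'y': 3, 'z': 25}  -> parallel firing does not behave like rb < ra
```

In the `rb < ra` rule order, `ra` reads the `y` that `rb` just wrote. In the hardware cycle, it reads the value from the previous clock. This is exactly the read/write reordering the earlier slides describe.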
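The AND/OR muxing structures can be mimicked at the bit level. In this hypothetical sketch, each guard is fanned out to a mask across an assumed 8-bit data width and ANDed with that rule's data. The names `WIDTH`, `fanout`, `mux_cf` and `mux_sc` are illustrative. The register's write-enable logic is not shown on the slide and is left out here as well.

```python
WIDTH = 8                 # assumed data width
MASK = (1 << WIDTH) - 1

def fanout(bit):
    """Replicate a 1-bit select across the data width (one AND-gate input)."""
    return MASK if bit else 0

def mux_cf(p1, d1, p2, d2):
    """CF / mutually exclusive structure: (p1 AND d1) OR (p2 AND d2)."""
    return (fanout(p1) & d1) | (fanout(p2) & d2)

def mux_sc(p1, d1, p2, d2):
    """SC structure for rule 1 < rule 2: rule 1's data is also gated by
    NOT p2, so rule 2's value wins when both rules fire."""
    return (fanout(p1 and not p2) & d1) | (fanout(p2) & d2)

print(mux_cf(True, 5, False, 9))  # 5
print(mux_sc(True, 5, True, 9))   # 9: the later rule's value wins
print(mux_cf(True, 5, True, 9))   # 13 (= 5 | 9): garbage if both fire
```

The last call shows why two conflict-free rules that update the same register must also be mutually exclusive. With the plain AND/OR structure, both values would be ORed together.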