DOC PREVIEW
Berkeley COMPSCI 150 - Lecture Notes

This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 150 - Spring 2007 – Lec #16 – Retiming - 1State Machine Timing! Retiming" Slosh logic between registers to balance latencies andimprove clock timings" Accelerate or retard cycle in which outputs are asserted! Parallelism" Doing more than one thing at a time! Pipelining" Splitting computations into overlapped, smaller time stepsCS 150 - Spring 2007 – Lec #16 – Retiming - 2SynchronizerCircuitry atInputs andOutputsOutput LogicOutput LogicOutput LogicDDDDSTATE STATE STATEQQQA AA'A'Qƒƒ'ƒƒƒ'ARecall: Synchronous Mealy MachineDiscussion! Placement of flipflops before and after the output logic changes the timingof when the output signals are asserted …CS 150 - Spring 2007 – Lec #16 – Retiming - 3Recall: Synchronous Mealy Machine withSynchronizers Following OutputsCase III: Synchronized OutputsA asserted during Cycle 0, ƒ' asserted in next cycleEffect of ƒ delayed one cyclecycle 0cycle 1 cycle 2CLKAƒƒ'S0S1A/ƒSignal goesinto effect onecycle laterCS 150 - Spring 2007 – Lec #16 – Retiming - 4Vending Machine State Machine! Moore machine" outputs associated with state0¢[0]10¢[0]5¢[0]15¢[1]N’ D’ + ResetDDNN+DNN’ D’Reset’N’ D’N’ D’ResetMealy machineoutputs associated with transitions0¢10¢5¢15¢(N’ D’ + Reset)/0D/0D/1N/0N+D/1N/0N’ D’/0Reset’/1N’ D’/0N’ D’/0Reset/0CS 150 - Spring 2007 – Lec #16 – Retiming - 5Open asserted only whenin state 15State Machine Retiming! Moore vs. (Async) Mealy Machine" Vending Machine ExampleOpen asserted when lastcoin inserted leading tostate 15CS 150 - Spring 2007 – Lec #16 – Retiming - 6State Machine Retiming! Retiming the Moore Machine: Faster generation of outputs! Synchronizing the Mealy Machine: Add a FF, delaying the output! These two implementations have identical timing behaviorPush the AND gate through theState FFs and synchronize withan output FFLike computing open in the priorstate and delaying it one state timeCS 150 - Spring 2007 – Lec #16 – Retiming - 7Out calcPlus set-upNOTE: overlaps withNext State calculationState Machine Retiming! Effect on timing of Open Signal (Moore Case)ClkFF propdelayStateOpenOut propdelayRetimedOpenOpenCalculationCS 150 - Spring 2007 – Lec #16 – Retiming - 8State Machine Retiming! Timing behavior is the same, but are theimplementations really identical?FF input in synchronous MealyimplementationFF input in retimed MooreimplementationOnly differencein don’t care caseof nickel and dimeat the same timeCS 150 - Spring 2007 – Lec #16 – Retiming - 9Parallelism! Example, Student final grade calculation:read mt1, mt2, mt3, project;grade = 0.2 ! mt1 + 0.2 ! mt2+ 0.2 ! mt3 + 0.4 ! project;write grade;! High performance hardware implementation:As many operations as possible are done in parallelDoing more than one thing at a time: optimization in h/w often involves using parallelism to trade between cost and performancexx xx+++0.2mt10.2 mt2 0.4 proj0.2 mt3gradeCS 150 - Spring 2007 – Lec #16 – Retiming - 10Parallelism! Is there a lower cost hardware implementation?Different tree organization?! Can factor out multiply by 0.2:! How about sharing operators (multipliers and adders)?x++0.2mt1 mt20.4 projmt3gradex+CS 150 - Spring 2007 – Lec #16 – Retiming - 11Time Multiplexing! Reuse single ALU for alladds and multiplies" Lower hardware cost, longer latency" BUT must add muxes/registers/control! Consider the combinational hardware circuit diagramas an abstract computation-graph:acc1 = mt1 + mt2;acc1 = acc1 + mt3;acc1 = 0.2 x acc1;acc2 = 0.4 x proj;grade = acc1 + acc2;controllerALUmt1 mt1mt3 projacc1acc2xx xx+++0.2mt10.2 mt2 0.4 proj0.2 mt3grade+A BCxAlternativebuilding blocksxx+A BCDCS 150 - Spring 2007 – Lec #16 – Retiming - 12Time Multiplexingxx xx+++0.2mt10.2 mt2 0.4 proj0.2 mt3gradex+0.2 mt1x+0.4 Proj0x+0.2 mt3x+0.2 mt2x+0.20.4mt1mt2mt3proj0Time-multiplexing “covers” the computation graph by performing the action of each node one at a timeCS 150 - Spring 2007 – Lec #16 – Retiming - 13Time Multiplexingxx xx+++0.2mt10.2 mt2 0.4 proj0.2 mt3gradex0.2 mt3x+0.4 Projx0.2 mt1x+0.2 mt2x1x+1x0.21mt3mt1x+0.40.21Projmt2CS 150 - Spring 2007 – Lec #16 – Retiming - 14Pipelining Principle! Pipelining review from CS61C:Analog to washing clothes:step 1: wash (20 minutes)step 2: dry (20 minutes)step 3: fold (20 minutes) 60 minutes x 4 loads " 4 hourswash load1 load2 load3 load4dry load1 load2 load3 load4fold load1 load2 load3 load4 20 minoverlapped " 2 hoursCS 150 - Spring 2007 – Lec #16 – Retiming - 15Pipeliningwash load1 load2 load3 load4dry load1 load2 load3 load4fold load1 load2 load3 load4• Increase number of loads, average time per load approaches 20 minutes•Latency (time from start to end) for one load = 60 min•Throughput = 3 loads/hour• Pipelined throughput $ # of pipe stages x un-pipelined throughputCS 150 - Spring 2007 – Lec #16 – Retiming - 16Pipelining! General principle:! Cut the CL block into pieces (stages) and separate with registers:T’ = 4 ns + 1 ns + 4 ns +1 ns = 10 nsF = 1/(4 ns +1 ns) = 200 MHz! CL block produces a new result every 5 ns instead of every 9 nsCLOUTINTCL1OUTINT'CL2T1 T2Assume T = 8 nsTFF(setup +clk#q) = 1 nsF = 1/9 ns = 111 MHzAssume T1 = T2 = 4 nsCS 150 - Spring 2007 – Lec #16 – Retiming - 17Limits on Pipelining! Without FF overhead, throughput improvement proportional to # of stages• After many stages are added. FF overhead begins to dominate:• Other limiters to effective pipelining:• Clock skew contributes to clock overhead• Unequal stages• FFs dominate cost• Clock distribution power consumption•feedback (dependencies between loop iterations)1 2 3 4 5 6 7 8500# of stagesthroughput(1/T)idealrealhalf the clock periodin FF overheadFF “overhead”is the setup and clk to Q times.CS 150 - Spring 2007 – Lec #16 – Retiming - 18Pipelining Example! F(x) = yi = a xi2 + b xi + c! x and y are assumed to be“streams”! Divide into 3 (nearly) equalstages.! Insert pipeline registers atdashed lines.! Can we pipeline basic operators?! Computation graph:F(x )x yxx+abcxx+yCS 150 - Spring 2007 – Lec #16 – Retiming - 19Example: Pipelined AdderFAs3a3b3FAs2a2b2FAs1a1b1FAs0a0b0c0FAs3a3b3FAs2a2b2FAs1a1b1FAs0a0b0c0FFreg regFF FF! Possible, but usuallynot done …(arithmetic units canoften be madesufficiently fast withoutinternal pipelining)CS 150 - Spring 2007 – Lec #16 – Retiming -


View Full Document

Berkeley COMPSCI 150 - Lecture Notes

Documents in this Course
Lab 2

Lab 2

9 pages

Debugging

Debugging

28 pages

Lab 1

Lab 1

15 pages

Memory

Memory

13 pages

Lecture 7

Lecture 7

11 pages

SPDIF

SPDIF

18 pages

Memory

Memory

27 pages

Exam III

Exam III

15 pages

Quiz

Quiz

6 pages

Problem

Problem

3 pages

Memory

Memory

26 pages

Lab 1

Lab 1

9 pages

Memory

Memory

5 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?