Lecture 2, Design Representations
CS250, UC Berkeley, Fall 2009

CS250 VLSI Systems Design
L2: Design Representations
John Wawrzynek, Krste Asanovic, with John Lazzaro and Yunsup Lee (TA)

Engineering Challenge
[Figure labels: Physics, Application]
‣ The gap is usually too large to bridge in one step, but there are exceptions...

Magnetic Compass
[Figure labels: Physics, Application]

Design Abstraction Stack
‣ Physics (figure labels: Valence Band, Conduction Band, Eg)
‣ Devices (Transistors)
‣ Circuits
‣ Gates
‣ Register-Transfer Level (RTL)
‣ Unit-Transaction Level (UTL)
‣ Application

Properties of a Useful Abstraction
‣ Hides less important details
  – e.g., for RTL, don't worry about how combinational logic is decomposed into logic gates
‣ Allows control of more important details
  – e.g., the RTL designer still controls how much logic is performed between any two registers
‣ If done right, provides portable efficiency
  – i.e., the same RTL can be implemented as custom logic, standard cells, FPGA, or even vacuum tube logic, with reasonably good results

CS250 Design Abstractions
‣ Physics
‣ Devices (Transistors)
‣ Circuits
‣ Gates
‣ Register-Transfer Level (RTL)
‣ Unit-Transaction Level (UTL)
‣ Application
[Figure annotations: Primary Design Abstractions; Interface to Technology; (UCB EE130/230); (UCB EE141/241)]

CS250 Design Refinement
‣ Application (C/C++)
  – Architecture Design (Manual)
‣ UTL (C/C++)
  – Micro(µ)-Architecture Design (Manual)
‣ RTL (Verilog)
  – Synthesis + Place&Route (Automated)
‣ Gates (Stdcell Library)

Course Prerequisites
‣ B+ in CS150 for UCB undergrads, or equivalent for incoming grad students
‣ This means you should have seen RTL and Verilog/VHDL before
‣ We won't be covering Verilog coding details in lecture, but there will be some coverage in section and handouts

RTL Representation
‣ When writing Verilog, be sure to separate RTL code into pure state and pure logic
[Figure labels: Combinational Logic, Clock, Combinational Logic]

Application to RTL in One Step?
Modern hardware systems have complex functionality (graphics chips, video encoders, wireless communication channels), but sometimes designers try to map directly to an RTL cycle-level µarchitecture in one step.
‣ Requires detailed cycle-level design of each sub-unit
  – Significant design effort is required before it is clear whether the design will meet its goals
‣ Interactions between units become unclear if arbitrary circuit connections are allowed between units, with possible cycle-level timing dependencies
  – Increases complexity of unit specifications
‣ Removes degrees of freedom for unit designers
  – Reduces possible space for architecture exploration
‣ Difficult to document intended operation, therefore difficult to verify

Example Difficult Design Problem
‣ The humble shift register
‣ (For today's lecture, we'll assume clock distribution is not an issue)

First Complication: Output Stall
‣ The shift register should only move data to the right if the output is ready to accept the next item
‣ What complication does this introduce?
  – Need to fan out the Ready signal to the enable input of each flop
[Figure label: Ready]

Stall Fan-Out Example
‣ 200 bits per shift register stage, 16 stages
  – 3200 flip-flops
‣ How many fanout-of-four gate delays to buffer up the ready signal?
  – log4(3200) ≈ 5.82
‣ This doesn't include any penalty for driving the enable signal wiring!
[Figure labels: Ready, Enable]

Loops Prevent Arbitrary Resizing
‣ We could increase the size of the gates in the ready logic block to reduce the fan-out required to drive the ready signal to the flop enables…
‣ BUT this increases the load on the flops, so they have to get bigger: a vicious circle
[Figure labels: Shift Register Module, Receiving Module, Ready Logic, Ready]

Second Complication: Bubbles
‣ The sender doesn't have valid data every clock cycle, so empty "bubbles" are inserted into the pipeline
‣ Would like to "squeeze" bubbles out of the pipeline
[Figure labels: Stage 1, Stage 2, Stage 3, Stage 4, Time, Valid, Ready, ~Ready, ~Valid]

Logic to Squeeze Bubbles
‣ Data can move one stage to the right if Ready is asserted, or if there is any bubble in the stages to the right of the current stage
‣ The fan-in of valid signals grows with the number of pipeline stages
‣ The fan-out of each stage's valid signal also grows with the number of pipeline stages
‣ Results in slow combinational paths as the number of pipeline stages grows
[Figure labels: Ready?, Valid, Enable?, Valid?]

Decoupled Design Discipline
‣ The shift register is a simple example that illustrates the control complexity problems of any large synchronous pipeline
  – Usually, there are even more complex interactions between stages
‣ To avoid these problems (and many others), designers use a decoupled design discipline, where moderate-size synchronous units (~10-100K gates) are connected by decoupling FIFOs or channels
[Figure labels: Combinational Logic, Clock, Combinational Logic]

Decoupled Architectures and Unit-Transaction Level Design

CS250 Design Refinement
Application (C/C++) → Architecture Design (Manual) → UTL (C/C++) → µArchitecture Design (Manual) → RTL (Verilog) → Synthesis + Place&Route (Automated) → Gates (Stdcell Library)

Unit-Transaction Level Design
‣ Model the design as messages flowing through FIFO buffers between units containing architectural state
‣ Each unit can independently perform an operation, or transaction, that may consume messages, update local state, and send further messages
‣ A transaction and/or communication might take many cycles
  – Have to design the RTL of the unit microarchitecture during design refinement
[Figure labels: Unit 1, Unit 2, Unit 3, Arch. State (three instances), Shared Memory Unit]

Unit Architectural State
‣ Architectural
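The unit-transaction view described above (units holding architectural state, exchanging messages through FIFO buffers, each firing a transaction when it can) might be sketched roughly as follows. This is only an illustrative model, not the course's UTL code: the class names `Fifo`, `Source`, and `Accumulator`, the `try_transaction` method, and the FIFO capacities are all hypothetical, and Python is used here purely for brevity even though the slides list C/C++ for UTL models.

```python
from collections import deque


class Fifo:
    """Bounded FIFO channel between two units (capacity is an assumed parameter)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def can_push(self):
        return len(self.items) < self.capacity

    def can_pop(self):
        return len(self.items) > 0

    def push(self, msg):
        assert self.can_push()
        self.items.append(msg)

    def pop(self):
        return self.items.popleft()


class Source:
    """Hypothetical unit whose architectural state is the values still to send."""

    def __init__(self, values, out):
        self.pending = deque(values)
        self.out = out

    def try_transaction(self):
        # Fire only if there is something to send and the channel has room.
        if self.pending and self.out.can_push():
            self.out.push(self.pending.popleft())
            return True
        return False


class Accumulator:
    """Hypothetical unit whose architectural state is a running total."""

    def __init__(self, inp, out):
        self.total = 0
        self.inp = inp
        self.out = out

    def try_transaction(self):
        # Consume one message, update local state, send one further message.
        if self.inp.can_pop() and self.out.can_push():
            self.total += self.inp.pop()
            self.out.push(self.total)
            return True
        return False


def run(units):
    """Keep firing transactions until no unit can make progress."""
    progress = True
    while progress:
        progress = False
        for unit in units:
            if unit.try_transaction():
                progress = True


a_to_b = Fifo(capacity=2)
b_out = Fifo(capacity=8)
source = Source([1, 2, 3, 4], a_to_b)
acc = Accumulator(a_to_b, b_out)
run([acc, source])  # the scheduling order of the units is arbitrary
results = list(b_out.items)
print(results)
```

Nothing in the model says how many clock cycles a transaction takes; in this sketch that decision is left to the later RTL refinement of each unit, which is the point of the abstraction.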
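As a quick check on the shift-register slides, the fan-out estimate and the bubble-squeezing enable condition can be written out in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, index 0 is taken to be the leftmost stage with data moving toward higher indices, and the buffer-depth formula is the idealized logarithm from the slide with no wire penalty.

```python
import math


def fo4_buffer_depth(num_loads, fanout=4):
    """Idealized number of fanout-of-`fanout` gate delays to drive `num_loads` loads."""
    return math.log(num_loads, fanout)


def stage_enables(valid, ready):
    """Enable per stage: shift right if Ready, or if any stage to the right is a bubble.

    valid[i] is True when stage i holds data; index 0 is the leftmost stage (assumed).
    """
    enables = []
    for i in range(len(valid)):
        bubble_to_right = any(not v for v in valid[i + 1:])
        enables.append(ready or bubble_to_right)
    return enables


def enable_fan_in(num_stages):
    """Inputs feeding each stage's enable: Ready plus one valid bit per stage to its right."""
    return [num_stages - i for i in range(num_stages)]


flops = 200 * 16                     # 200 bits per stage, 16 stages
depth = fo4_buffer_depth(flops)      # roughly 5.82 FO4 delays
stalled = stage_enables([True, False, True, True], ready=False)
flowing = stage_enables([True, False, True, True], ready=True)
fan_in = enable_fan_in(4)
print(flops, round(depth, 2), stalled, flowing, fan_in)
```

With the output stalled, only the stage to the left of the bubble is enabled, and the leftmost stage's enable needs the most inputs, which is the fan-in growth the slide warns about.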