Spring 2009 EECS150 - Lec24-hdl1

EECS150 - Digital Design
Lecture 23 - High-Level Design (Part 1)
April 21, 2009
John Wawrzynek

Spring 2005 EECS150 - Lec25-hld1

Introduction
• High-level design specifies:
  – How data is moved around and operated on.
  – The architecture (sometimes called micro-architecture):
    • The organization of state elements and combinational logic blocks
    • Functional specification of combinational logic blocks
• Optimization
  – Deals with the task of modifying an architecture and data movement procedure to meet some particular design requirement:
    • performance, cost, power, or some combination.
• Most designers spend most of their time on high-level organization and optimization
  – Modern CAD tools help fill in the low-level details and optimization
    • gate-level minimization, state-assignment, etc.
  – A great deal of the leverage for affecting performance, cost, and power comes at the high level.

One Standard High-level Organization
• Controller
  – Accepts external and control input, generates control and external output, and sequences the movement of data in the datapath.
• Datapath
  – Is responsible for data manipulation. Usually includes a limited amount of storage.
• Memory
  – Optional block used for long-term storage of data structures.
• Standard model for CPUs, micro-controllers, many other digital sub-systems.
• Usually not nested.
• Sometimes cascaded:

Register Transfer Language Descriptions
• A standard high-level language representation for describing systems.
• It follows from the fact that all synchronous digital systems can be described as a set of state elements connected by combinational logic (CL) blocks:
• RTL comprises a set of register transfers with optional operators as part of the transfer.
• Example:
    regA ← regB
    regC ← regA + regB
    if (start==1) regA ← regC
• My personal style:
  – Use “;” to separate transfers that occur on separate cycles.
  – Use “,” to separate transfers that occur on the same cycle.
• Example (2 cycles):
    regA ← regB, regB ← 0;
    regC ← regA;

Example of Using RTL
    ACC ← ACC + R0, R1 ← R0;
    ACC ← ACC + R1, R0 ← R1;
    R0 ← ACC;
    ...
• In this case, the RTL description is used to sequence the operations on the datapath (dp).
• It becomes the high-level specification for the controller.
• Design of the FSM controller follows directly from the RTL sequence. The FSM controls movement of data by controlling the multiplexor control signals.

Example of Using RTL
• Sometimes RTL is used as a starting point for designing both the dp and the control:
• Example:
    regA ← IN;
    regB ← IN;
    regC ← regA + regB;
    regB ← regC;
• From this we can deduce:
  – IN must fan out to both regA and regB
  – regA and regB must output to an adder
  – the adder must output to regC
  – regB must take its input from a mux that selects between IN and regC
• What does the datapath look like:
• The controller:

List Processor Example
• RTL gives us a framework for making high-level optimizations.
• General design procedure outline:
  1. Problem, Constraints, and Component Library Spec.
  2. “Algorithm” Selection
  3. Micro-architecture Specification
  4. Analysis of Cost, Performance, Power
  5. Optimizations, Variations
  6. Detailed Design

1. Problem Specification
• Design a circuit that forms the sum of all the 2's complement integers stored in a linked-list structure starting at memory address 0:
• All integers and pointers are 8-bit. The linked list is stored in a memory block with an 8-bit address port and 8-bit data port, as shown below. The pointer from the last element in the list is 0. There is at least one node in the list.
I/Os:
  – START resets to head of list and starts addition process.
  – DONE signals completion.
  – R, bus that holds the final result.

1. Other Specifications
• Design Constraints:
  – Usually the design specification puts a restriction on cost, performance, power, or all. We will leave this unspecified for now and return to it later.
• Component Library:

    component                 delay
    simple logic gates        0.5ns
    n-bit register            clk-to-Q=0.5ns, setup=0.5ns
    n-bit 2-1 multiplexor     1ns
    n-bit adder               (2 log(n) + 2)ns
    memory (single ported)    10ns read (asynchronous read)
    zero compare              (0.5 log(n))ns

Are these reasonable?

Review of Register with “Load Enable”
• Register with Load Enable:
• Allows the register either to be loaded on a selected clock posedge or to retain its previous value.
• Assume both data and LD require setup time = 0.5ns.
• Assume no reset input.
Functional description only. Transistor level circuit has lower input delay.

2. Algorithm Specification
• In this case the memory only allows one access per cycle, so the algorithm is limited to sequential execution. If in another case more input data is available at once, then a more parallel solution may be possible.
• Assume datapath state registers NEXT and SUM.
  – NEXT holds a pointer to the node in memory.
  – SUM holds the result of adding the node values to this point.

    If (START==1) NEXT ← 0, SUM ← 0;
    repeat {
      SUM ← SUM + Memory[NEXT+1];
      NEXT ← Memory[NEXT];
    } until (NEXT==0);
    R ← SUM, DONE ← 1;

3. Architecture #1
Direct implementation of RTL description:
[Figure: Datapath, Controller]

4. Analysis of Cost, Performance, and Power
• Skip Power for now.
• Cost:
  – How do we measure it? # of transistors? # of gates? # of CLBs?
  – Depends on implementation technology. Often we are just interested in comparing the relative cost of two competing implementations. (Save this for later)
• Performance:
  – 2 clock cycles per number added.
  – What is the minimum clock period?
  – The controller might be on the critical path. Therefore we need to know the implementation, and controller input and output delay.

Possible Controller Implementation
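
As a sanity check on the list-sum RTL and on the "2 clock cycles per number added" figure, here is a minimal behavioral sketch in Python. It is not the lecture's datapath or controller. It assumes the node layout implied by the RTL (next pointer at Memory[p], value at Memory[p+1]), assumes the 8-bit SUM wraps modulo 256, and counts one cycle per memory access because the memory is single ported. The names list_sum, build_memory, and to_signed8 are hypothetical helpers introduced only for this illustration.

```python
# Behavioral sketch of the list-sum RTL; not the lecture's datapath or controller.
# Assumed layout (implied by the RTL): a node at address p keeps its next
# pointer at Memory[p] and its value at Memory[p+1].


def to_signed8(x):
    """Interpret the low 8 bits of x as a 2's complement integer."""
    x &= 0xFF
    return x - 256 if x & 0x80 else x


def list_sum(memory):
    """Run the RTL sequence; return (R as a signed value, loop cycles used)."""
    # If (START==1) NEXT <- 0, SUM <- 0;
    next_ptr, total = 0, 0
    cycles = 0
    while True:
        # SUM <- SUM + Memory[NEXT+1];  one memory access, one cycle
        # Assumption: the 8-bit SUM register wraps modulo 256.
        total = (total + memory[(next_ptr + 1) & 0xFF]) & 0xFF
        cycles += 1
        # NEXT <- Memory[NEXT];         one memory access, one cycle
        next_ptr = memory[next_ptr] & 0xFF
        cycles += 1
        if next_ptr == 0:  # until (NEXT==0)
            break
    # R <- SUM, DONE <- 1;
    return to_signed8(total), cycles


def build_memory(nodes):
    """nodes: list of (address, value, next_address); returns a 256-entry memory."""
    memory = [0] * 256
    for addr, value, nxt in nodes:
        memory[addr] = nxt & 0xFF        # pointer word
        memory[addr + 1] = value & 0xFF  # value word, stored as 8-bit 2's complement
    return memory


# Hypothetical three-node list at addresses 0, 4, and 10 holding 5, -3, and 20.
example_memory = build_memory([(0, 5, 4), (4, -3, 10), (10, 20, 0)])
result, cycles = list_sum(example_memory)
print(result, cycles)  # prints "22 6"
```

Three numbers take six loop cycles, which matches the two-cycles-per-number count in the performance analysis. The sketch says nothing about the minimum clock period, which depends on the datapath and controller delays.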