DOC PREVIEW
Berkeley COMPSCI 150 - RTL Design Optimization

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1EECS 150 - Components and Design Techniques for Digital SystemsLec 21 – RTL Design Optimization11/16/2004David CullerElectrical Engineering and Computer SciencesUniversity of California, Berkeleyhttp://www.eecs.berkeley.edu/~cullerhttp://www-inst.eecs.berkeley.edu/~cs1502Recall: Register Trans. Level Descriptions• A standard high-level representation for describing systems.• It follows from the fact that all synchronous digital system can be described as a set of state elements connected by combination logic (CL) blocks:• RTL comprises a set of register transfers with optional operators as part of the transfer.• Example:regA ←←←← regBregC ←←←← regA + regBif (start==1) regA ←←←← regC• Personal style:– use “;” to separate transfers that occur on separate cycles.– Use “,” to separate transfers that occur on the same cycle.• Example (2 cycles):regA ←←←← regB, regB ←←←← 0;regC ←←←← regA;reg regCL CLclock inputoutputoption feedbackinputoutput3Components of the data path• Storage– Flip-flops– Registers– Register Files– SRAM• Arithmetic Units– Adders, subtraters, ALUs (built out of FAs or gates)– Comparators– Counters• Interconnect– Wires– Busses– Tri-state Buffers– MUX4Arithmetic Circuit Design (MT revisited)• Full Adder• Adder• Relationship of positional notation and operations on it to arithmetic circuitsFAA B CinCo S5Recall: Computer Organization• Computer design as an application of digital logic design procedures• Computer = processing unit + memory system• Processing unit = control + datapath• Control = finite state machine– Inputs = machine instruction, datapath conditions– Outputs = register transfer control signals, ALU operation codes– Instruction interpretation = instruction fetch, decode, execute• Datapath = functional units + registers + interconnect– Functional units = ALU, multipliers, dividers, etc.– Registers = program counter, shifters, storage registers– Interconenct = busses and wires• Instruction Interpreter vs Fixed Function Device6Datapath vs Control• Datapath: Storage, FU, interconnect sufficient to perform the desired functions– Inputs are Control Points– Outputs are signals• Controller: State machine to orchestrate operation on the data path– Based on desired function and signalsDatapathControllerControl Pointssignals7Vector Op Example (more MT)8List Processor Example• RTL gives us a framework for making high-level optimizations.– Fixed function unit– Approach extends to instruction interpreters• General design procedure outline:1. Problem, Constraints, and Component Library Spec.2. “Algorithm” Selection3. Micro-architecture Specification4. Analysis of Cost, Performance, Power5. Optimizations, Variations6. Detailed Design91. Problem Specification• Design a circuit that forms the sum of all the 2's complements integers stored in a linked-list structure starting at memory address 0:• All integers and pointers are 8-bit. The link-list is stored in a memory block with an 8-bit address port and 8-bit data port, as shown below. The pointer from the last element in the list is 0.At least one node in list.I/Os:– START resets to head of list and starts addition process.– DONE signals completion– R, Bus that holds the final result101. Other Specifications• Design Constraints:– Usually the design specification puts a restriction on cost, performance, power or all. We will leave this unspecified for now and return to it later.• Component Library:component delaysimple logic gates 0.5nsn-bit register clk-to-Q=0.5nssetup=0.5ns (data and LD)n-bit 2-1 multiplexor 1nsn-bit adder (2 log(n) + 2)nsmemory 10ns read (asynchronous read)zero compare 0.5 log(n)(single ported memory)Are these reasonable?112. Algorithm Specification• In this case the memory only allows one access per cycle, so thealgorithm is limited to sequential execution. If in another casemore input data is available at once, then a more parallel solution may be possible.• Assume datapath state registers NEXT and SUM.– NEXT holds a pointer to the node in memory.– SUM holds the result of adding the node values to this point.If (START==1) NEXT0, SUM0;repeat {SUMSUM + Memory[NEXT+1];NEXTMemory[NEXT];} until (NEXT==0);RSUM, DONE1; 12A_SEL01NEXT01+MemoryDA==0+01SUMNEXT_SELLD_NEXTNEXT_ZEROSUM_SELLD_SUM0103. Architecture #1Direct implementation of RTL description:DatapathControllerIf (START==1) NEXT0, SUM0;repeat {SUMSUM + Memory[NEXT+1];NEXTMemory[NEXT];} until (NEXT==0);RSUM, DONE1;134. Analysis of Cost, Performance, and Power• Skip Power for now.• Cost:– How do we measure it? # of transistors? # of gates? # of CLBs?– Depends on implementation technology. Usually we are interested in comparing the relative cost of two competing implementations. (Save this for later)• Performance:– 2 clock cycles per number added.– What is the minimum clock period?– The controller might be on the critical path. Therefore we need to know the implementation, and controller input and output delay.14Possible Controller ImplementationSTARTCOMPSUMGETNEXTDONELD_SUMSUM_SELLD_NEXTNEXT_SELDONEA_SELSTARTSTARTSTARTNEXT_ZERO• Based on this, what is the controller input and output delay?15MT again…• Longest path from any reg out to any reg input164. Analysis of PerformanceCLK-Q MUX8-bit addmemory 15-bit addMUXsetup.5 811010 1.531nsCLKNEXTCLKA_SELMUXcontrol output delaymemoryMUX==0control input delay.5 1011.511.515.5nsCOMPUTE_SUM stateGET_NEXT stateNEXT_ZERO17Critical pathsA_SEL01NEXT01+MemoryDA==0+01SUMNEXT_SELLD_NEXTNEXT_ZEROSUM_SELLD_SUM010184. Analysis of Performance• Detailed timing:clock period (T) = max (clock period for each state)T > 31ns, F < 32 MHz• Observation:COMPUTE_SUM state does most of the work. Most of the componentsare inactive in GET_NEXT state.GET_NEXT does: Memory access + …COMPUTE_SUM does: 8-bit add, memory access, 15-bit add + …• Conclusion:Move one of the adds to GET_NEXT.195. Optimization• Add new register named NUMA, for address of number to add.• Update RTL to reflect our change (note still 2 cycles per iteration):If (START==1) NEXT0, SUM0, NUMA1;repeat {SUMSUM + Memory[NUMA];NUMAMemory[NEXT] + 1,NEXTMemory[NEXT] ;} until


View Full Document

Berkeley COMPSCI 150 - RTL Design Optimization

Documents in this Course
Lab 2

Lab 2

9 pages

Debugging

Debugging

28 pages

Lab 1

Lab 1

15 pages

Memory

Memory

13 pages

Lecture 7

Lecture 7

11 pages

SPDIF

SPDIF

18 pages

Memory

Memory

27 pages

Exam III

Exam III

15 pages

Quiz

Quiz

6 pages

Problem

Problem

3 pages

Memory

Memory

26 pages

Lab 1

Lab 1

9 pages

Memory

Memory

5 pages

Load more
Download RTL Design Optimization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view RTL Design Optimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view RTL Design Optimization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?