ECE 274 - Digital Logic, Lecture 16

1. Lecture 16 - RTL Design
- RTL Examples
- RTL Design Pitfalls and Good Practices
- Control and Data Dominated RTL Design

2. RTL Example: Video Compression - Sum of Absolute Differences
- Video is a series of frames (e.g., 30 per second)
- Most frames are similar to the previous frame
- Compression idea: just send the difference from the previous frame
[Figure: (a) digitized frame 1 and digitized frame 2 are each sent in full, 1 Mbyte each. (b) digitized frame 1 is sent in full (1 Mbyte); for frame 2, just the difference of 2 from 1 is sent (0.01 Mbyte). Only difference between the frames: ball moving.]

3. RTL Example: Video Compression - Sum of Absolute Differences
- Need to quickly determine whether two frames are similar enough to just send the difference for the second frame
  - Compare corresponding 16x16 "blocks"
    - Treat each 16x16 block as a 256-byte array
  - Compute the absolute value of the difference of each pair of array items
  - Sum those differences: if above a threshold, send the complete frame for the second frame; if below, can use the difference method (using another technique, not described)
[Figure: a block of Frame 1 compared with the corresponding block of Frame 2. Each element is a pixel, assumed to be represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for the intensity of the red, green, and blue components of the pixel).]

4. RTL Example: Video Compression - Sum of Absolute Differences
- Want a fast sum-of-absolute-differences (SAD) component
  - When go=1, sums the absolute differences of element pairs in arrays A and B, and outputs that sum
[Figure: SAD component with inputs A (256-byte array), B (256-byte array), and go; output sad (integer).]

5. RTL Example: Video Compression - Sum of Absolute Differences
- S0: wait for go
- S1: initialize sum and index
- S2: check if done (i>=256)
- S3: add difference to sum, increment index
- S4: done, write to output sad_reg
High-level state machine:
  Inputs: A, B (256 byte memory); go (bit)
  Outputs: sad (32 bits)
  Local registers: sum, sad_reg (32 bits); i (9 bits)
  S0: stay in S0 while !go; go to S1 on go
  S1: sum = 0; i = 0
  S2: go to S3 if i<256; go to S4 if (i<256)'
  S3: sum = sum + abs(A[i]-B[i]); i = i + 1
  S4: sad_reg = sum

6. RTL Example: Video Compression - Sum of Absolute Differences
- Step 2: Create datapath
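As a reference point for the datapath, the high-level state machine of slide 5 can be modeled in software. The sketch below is a minimal Python model under stated assumptions: the function name `sad_hlsm` and the cycle counter are illustrative and not part of the lecture, `go` is assumed to be 1 already (so the model starts in S1), and one loop iteration stands for one clock cycle.

```python
def sad_hlsm(A, B):
    """Behavioral model of the SAD high-level state machine (states S1..S4).

    Assumes go=1 has already been seen in S0. Returns (sad_reg, cycles),
    where cycles counts one clock cycle per state visited.
    """
    assert len(A) == 256 and len(B) == 256
    state = "S1"
    total = 0      # models local register `sum`
    i = 0          # models local register `i`
    cycles = 0
    while True:
        cycles += 1
        if state == "S1":            # initialize sum and index
            total, i = 0, 0
            state = "S2"
        elif state == "S2":          # check if done
            state = "S3" if i < 256 else "S4"
        elif state == "S3":          # accumulate and advance
            total += abs(A[i] - B[i])
            i += 1
            state = "S2"
        else:                        # S4: write the result register
            return total, cycles


if __name__ == "__main__":
    A = list(range(256))
    B = [0] * 256
    sad, cycles = sad_hlsm(A, B)
    # 2 cycles (S2, S3) per element plus S1, the final S2 check, and S4
    print(sad, cycles)  # prints: 32640 515
```

The loop spends two states (S2 and S3) on each array element, so the cycle count is dominated by 2 * 256 = 512; the remaining three cycles are S1, the final S2 check, and S4.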
[Figure: datapath, shown next to the high-level state machine of slide 5. Components: register i (9 bits), "<256" comparator, subtractor, abs, adder, registers sum and sad_reg (32 bits). Control inputs: i_inc, i_clr, sum_ld, sum_clr, sad_reg_ld. Control output: i_lt_256. Memory interface: AB_addr, A_data, B_data (8 bits each). Output: sad (32 bits).]

7. RTL Example: Video Compression - Sum of Absolute Differences
- Step 3: Connect to controller
- Step 4: Replace high-level state machine by FSM
[Figure: controller connected to the datapath through i_lt_256, i_inc, i_clr, sum_ld, sum_clr, and sad_reg_ld; the controller also drives AB_rd.]
FSM (high-level action and the control signals that replace it):
  State  High-level action                       FSM
  S0     wait for go                             go' stays in S0; go moves to S1
  S1     sum=0; i=0                              sum_clr=1; i_clr=1
  S2     i<256 ?                                 tests i_lt_256
  S3     sum=sum+abs(A[i]-B[i]); i=i+1           sum_ld=1; AB_rd=1; i_inc=1
  S4     sad_reg=sum                             sad_reg_ld=1

8. RTL Example: Video Compression - Sum of Absolute Differences
- Comparing software and custom circuit SAD
  - Circuit: two states (S2 & S3) for each i, 256 i's → 512 clock cycles
  - Software: loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, and increment i; say about 6 cycles per array item → 256*6 = 1536 cycles
  - Circuit is about 3 times as fast (1536 / 512 = 3)
  - Later, we'll see how to build a SAD circuit that is even faster
[Figure: the S2/S3 loop of the state machine. S2 tests i<256; S3 performs sum=sum+abs(A[i]-B[i]) and i=i+1.]

9. RTL Design Pitfalls and Good Practice
- Common pitfall: assuming a register is updated in the state in which it is written
- Final value of Q? Final state?
- Answers may surprise you
  - Value of Q is unknown
  - Final state is C, not D
- Why?
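One way to see why is to trace the state machine cycle by cycle. The sketch below is a minimal Python model, assuming (as in the slide's figure) that state A performs R=99 and Q=R, state B performs R=R+1, and B branches to C when R<100 and to D otherwise. `None` stands in for an unknown register value; the names `step` and `run_pitfall` are illustrative only.

```python
def step(state, regs):
    """Advance one clock cycle.

    All right-hand sides and branch conditions read the CURRENT register
    values; the new values are committed together at the clock edge.
    """
    nxt = dict(regs)
    R = regs["R"]
    if state == "A":
        nxt["R"] = 99
        nxt["Q"] = R                 # reads the old R, which is still unknown
        next_state = "B"
    elif state == "B":
        nxt["R"] = R + 1
        # The condition sees the old R (99), not the R+1 being loaded.
        next_state = "C" if R < 100 else "D"
    else:
        next_state = state           # C and D just hold in this model
    return next_state, nxt


def run_pitfall():
    state, regs = "A", {"R": None, "Q": None}
    trace = [(state, dict(regs))]
    for _ in range(2):               # leave A, then leave B
        state, regs = step(state, regs)
        trace.append((state, dict(regs)))
    return state, regs, trace


if __name__ == "__main__":
    final_state, final_regs, trace = run_pitfall()
    for s, r in trace:
        print(s, r)
    # Ends in state C with R == 100 and Q still None (unknown).
```

In this model Q never receives 99, because Q=R is evaluated in the same cycle in which R is first being loaded, and the branch out of B is decided before R becomes 100.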
  - State A: R=99 and Q=R happen simultaneously
  - State B: R is not updated with R+1 until the next clock cycle, simultaneously with the state register being updated
[Figure: (a) state machine with local registers R, Q (8 bits): state A (R=99; Q=R), state B (R=R+1), then C if R<100 or D if R>=100. (b) timing diagram with clk, R, and Q: R is ?, then 99 in state B, then 100 in state C; Q stays ?; R<100 holds in state B.]

10. RTL Design Pitfalls and Good Practice
- Solutions
  - Read register in following state (Q=R)
  - Insert extra state so that conditions use updated value
  - Other solutions are possible, depending on the example
[Figure: (a) state machine with local registers R, Q (8 bits): state A (R=99), state B (R=R+1; Q=R, moved here from state A), extra state B2, then C if R<100 or D if R>=100. (b) timing diagram with clk, R, and Q: R is ?, 99, 100, 100; Q is ?, ?, 99, 99; states A, B, B2, D.]

11. RTL Design Pitfalls and Good Practice
- Common pitfall: reading outputs
  - Outputs can only be written
  - Solution: introduce an additional register, which can be written and read
[Figure: (a) Inputs: A, B (8 bits); Outputs: P (8 bits); state S: P=A; state T: P=P+B. (b) Inputs: A, B (8 bits); Outputs: P (8 bits); Local register: R (8 bits); state S: R=A, P=A; state T: P=R+B.]

12. RTL Design Pitfalls and Good Practice
- Good practice: register all data outputs
  - In fig (a), output P would show spurious values as the addition computes
    - Furthermore, the longest register-to-register path, which determines the clock period, is not known until that output is connected to another component
  - In fig (b), spurious outputs are reduced, and the longest register-to-register path is clear
[Figure: (a) register R and input B feed an adder that drives output P directly. (b) the adder output is stored in register Preg, which drives P.]

13. Control vs. Data Dominated RTL Design
- Designs are often categorized as control-dominated or data-dominated
  - Control-dominated design: controller contains most of the complexity
  - Data-dominated design: datapath contains most of the complexity
  - General, descriptive terms: no hard rule separates the two types of designs
- Laser-based distance measurer: control dominated
- Bus interface, SAD circuit: mix of control and data
- Now let's do a data dominated design

14. Data Dominated RTL Design Example: FIR Filter
- Filter concept
  - Suppose X is data from a temperature sensor, and a particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
  - That 240 is probably wrong!
    - Could be electrical noise
  - Filter should remove such noise in its output Y
  - Simple filter: output the average of the last N values
    - Small N: less filtering
    - Large N: more filtering, but less sharp output
[Figure: digital filter with input X (12 bits), clk, and output Y (12 bits).]

15. Data Dominated RTL Design Example: FIR Filter
- FIR filter
  - "Finite Impulse Response"
  - Simply a configurable weighted sum of past input values
    - y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
      - Above known as "3 tap"
      - Tens of taps more common
      - Very general filter: user sets the constants (c0, c1, c2) to define a specific filter
- RTL design
  - Step 1: Create high-level state machine
    - But there really is none! Data dominated indeed.
  - Go straight to step 2

16. Data Dominated RTL Design Example: FIR Filter
- Step 2: Create datapath
  - Begin by creating
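The 3-tap equation from slide 15 can be sketched in software to show what the datapath has to compute each clock cycle. This is a minimal Python model under stated assumptions: the name `fir3` and the variables `xt1` and `xt2` (standing for registers holding x(t-1) and x(t-2)) are illustrative, and the two delay registers are assumed to start at 0, which the slides do not specify.

```python
def fir3(xs, c0, c1, c2):
    """Compute y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) for each input sample.

    xt1 and xt2 model registers holding the previous two inputs; both are
    assumed to be 0 before the first sample arrives.
    """
    xt1 = 0
    xt2 = 0
    ys = []
    for x in xs:                      # one iteration per clock cycle
        ys.append(c0 * x + c1 * xt1 + c2 * xt2)
        xt2, xt1 = xt1, x             # shift the delay line at the clock edge
    return ys


if __name__ == "__main__":
    samples = [180, 180, 181, 240, 180, 181]   # sequence from slide 14
    sums = fir3(samples, 1, 1, 1)              # c0=c1=c2=1 gives a running sum
    print(sums)                   # prints: [180, 360, 541, 601, 601, 601]
    print([s / 3 for s in sums])  # dividing by 3 gives the 3-sample average
```

With all three constants equal, the filter is the simple averaging filter of slide 14 with N = 3 (up to the division by 3). The first two outputs are low only because of the assumed zero initial register values.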