Purdue CS 63600 - Implementation Models

Unformatted text preview:

Ramana Kompella Lecture 2: Implementation Models CS 636 Internetworking Spring 2009 1CS 636 Internetworking Spring 2009  Before we can play with improvements, we must know “the rules of the game”.  Algorithmics uses four separate areas: protocols, architecture, OS and algorithms  Question: How can a logic designer understand protocols and how can an algorithm designer understand hardware ◦ Use simple models that have explanatory and predictive power ◦ Leave details to the specialist 2CS 636 Internetworking Spring 2009  Networking (protocols)  Hardware  Architecture  Operating systems 3CS 636 Internetworking Spring 2009  Interfaces ◦ Used by applications or higher level protocols  Packet (message) format ◦ Examples: IP,TCP, UDP  Typical protocol operations ◦ Data manipulation ◦ Demultiplexing ◦ Looking up state ◦ Set timers and receive alarms 4CS 636 Internetworking Spring 2009  Throughput: number of jobs per second ◦ Focus of industry, ISPs wish to maximize  Latency: time to complete a job ◦ E.g. time for a packet to go from one end to another (RTT) ◦ Ideal interactive time 100ms 5CS 636 Internetworking Spring 2009  Backbone link speeds: ◦ Optical carrier (OC)-n  n x 51.8Mbits ◦ OC48 ~ 2.5 Gbps, OC192~10Gbps  TCP and web dominance : Web (70%) and TCP (90%)  Small transfers: 50% of accessed files < 50Kbytes  Latencies large, e.g. 100 ms for wide area  ~ half of packets 40 bytes, many 1500 bytes ◦ Takes 8 ns to send 40 bytes at 40 Gbps 6CS 636 Internetworking Spring 2009  Networking (protocols)  Hardware  Architecture  Operating systems 7CS 636 Internetworking Spring 2009  Combinatorial logic ◦ From transistors to gates ◦ Timing and power ◦ High level building blocks (standard cells)  Memories ◦ Registers, SRAM, DRAM ◦ Fancy memory tricks (page mode, interleaved) 8CS 636 Internetworking Spring 2009  A function from digital inputs to outputs ◦ Wires, transistors, resistors, capacitors ◦ Gates: AND, NAND, NOT, etc. (60 ps) ◦ More complex blocks: adders, multiplexers  Different designs for same boolean function offer different tradeoffs ◦ Time for the output to stabilize ◦ Number of transistors ◦ Power dissipation P=CV2F 9 Input I and outputs O are N-bit vectors such that ◦ O[j] = 1 iff I[j] = 1 and I[k] = 0, for all k<j ◦ Find first one, represent in one-hot notation  Design1: Implement using N AND gates where each gate takes all 1 to N inputs. ◦ Space - O(N2) since each AND gate needs O(N) transistors ◦ Time - O(1) CS 636 Internetworking Spring 2009 10 Design 2: Every output bit requires the AND of complment of first j-1 bits ◦ Construct recursively P[j] = P[j-1]. ~I[j] ◦ Using two N two input AND gates in series ◦ O(N) transistors but takes O(N)  Design 3: Tradeoff design. Use balanced binary tree to compute P[j] and combine partial results. CS 636 Internetworking Spring 2009 11 Build four input mux from 2 input muxes. CS 636 Internetworking Spring 2009 12 PPE: like a priority encoder but with rotating priority ◦ Find first 1 beyond P  Arises in switch arbitration  Design 1: ◦ Use a barrel shifter to shift I to the left by P bits. ◦ Apply PE ◦ Shift back by P bits.  Two shifts + PE ~ 3 log N gate delays. ◦ Each barrel shift using tree of 2-muxes ~ log (N) CS 636 Internetworking Spring 2009 13 Smarter scheme due to Gupta/McKeown  First check if any bit set at or after P  If yes, apply PE on bits after P  If no, apply PE on bits before P  Used in Tiny Tera and Abrizio  Tested using TI libraries ◦ 2x fast and 3x less area ◦ For a 32 port router. CS 636 Internetworking Spring 2009 14 Router forwarding done using logic but packets stored in memories  Memory are significant bottlenecks ◦ Slower than logic delays  Different subsystems require different types ◦ 200ms worth of packet buffering (DRAM) ◦ <1Mbyte of fast SRAM for forwarding tables  Need different models CS 636 Internetworking Spring 2009 15 Need to store bit such that it stays indefinitely (absence of writes/power failures)  Register is ordered collection of flip-flops  Access from logic to a register ~ 0.5-1ns CS 636 Internetworking Spring 2009 16 N registers addressed by log N bits Addresses  Self refreshing.  Needs a decoder to translate A to unary so add decode delay.  On chip SRAM 1-2ns,  Off-chip 5-10ns. CS 636 Internetworking Spring 2009 17 SRAM flip-flop needs 5 transistors.  DRAM needs one transistor plus capacitor (so more dense)  Leakage fixed by refresh.  Slower because output not driven by the power supply as in SRAM  40-60ns CS 636 Internetworking Spring 2009 18CS 636 Internetworking Spring 2009  Page mode DRAM (latency 40-60ns)  Direct decoding hard: use divide and conquer ◦ Decode in stages, row first and column next ◦ Reduces decoder from O(N) to O(√N) gates  Accesses within rows do not require Row address strobe. 19CS 636 Internetworking Spring 2009  Increase DRAM throughput (same latency)  While Bank 1 works, send addresses to Bank 2, etc.  SDRAM uses 2 banks, RDRAM uses 16 banks.  If consecutive accesses to different banks, bandwidth increased by a factor of B 20CS 636 Internetworking Spring 2009  Example: Router with five 10GBps links  Overall buffering – 200ms * 50Gbps  Want to use DRAM ◦ Bw required 2 * 50 = 100 Gbits/sec + overhead ~ 200 Gbps  Use single RDRAM with 16 banks ◦ Peak bw of 1.6 Gbytes or 13Gbps  Accessing each RDRAM requires 64 pins for data, 25 for address/control ~90 ◦ 200 Gbps memory requires 16 RDRAMs ~ 1440 pins  Chips have pin limtiations (typically <1000pins)  Thus requires atleast one more chip. 21 Box architect partitions functions between chips  Design teams write specifications, logic designers write RTL using verilog  Synthesize gates to generate circuits.  Timing checks to change design.  Finally chip tapes out, manufactured, yield inspected  Easy to add


View Full Document

Purdue CS 63600 - Implementation Models

Download Implementation Models
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Implementation Models and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Implementation Models 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?