Purdue CS 63600 - Lecture notes - D491088

Home> Schools> Purdue University> Computer Sciences (CS) > CS 63600> Lecture notes

Purdue CS 63600 - Lecture notes

Pages 28

Download Save

Unformatted text preview:

CS 636 InternetworkingRamana KompellaLecture 3: Implementation ModelsCS 636 Internetworking 1A rather small set of key concepts is enough. Only by learning the essence of each topic, and by carrying along the least amount of mental baggage at each step, will the student emerge with a good overall understanding of the subject!- Carver Mead and Lynn ConwayCS 636 Internetworking 2CS 636 InternetworkingGoals of Models Before we can play with improvements, we must know “the rules of the game”. Algorithmics uses four separate areas: protocols, architecture, OS and algorithms Question: How can a logic designer understand protocols and how can an algorithm designer understand hardware◦ Use simple models that have explanatory and predictivepower◦ Leave details to the specialist3CS 636 InternetworkingModels Networking (protocols) Hardware  Architecture Operating systems4Transport Models Presents illusion of two shared queues in each direction Connection between local ports at IP addressesCS 636 Internetworking 5Routing Protocols Forwarding: router determines next hop based on forwarding table Routing: routers work together to populate forwarding table◦ RIP (Exchange distance estimates)◦ OSPF (Link state)◦ BGP (Path vector)CS 636 Internetworking 6CS 636 InternetworkingProtocols Interfaces◦ Used by applications or higher level protocols Packet (message) format◦ Examples: IP,TCP, UDP Typical protocol operations◦ Data manipulation◦ Demultiplexing◦ Looking up state◦ Set timers and receive alarms7CS 636 InternetworkingMeasures Throughput: number of jobs per second◦ Focus of industry, ISPs wish to maximize Latency: time to complete a job ◦ E.g. time for a packet to go from one end to another (RTT)◦ Ideal interactive time 100ms8CS 636 InternetworkingPerformance facts Backbone link speeds: ◦ Optical carrier (OC)-n  n x 51.8Mbits◦ OC48 ~ 2.5 Gbps, OC192~10Gbps TCP and web dominance : Web (70%) and TCP (90%) Small transfers: 50% of accessed files < 50Kbytes Latencies large, e.g. 100 ms for wide area ~ half of packets 40 bytes, many 1500 bytes◦ Takes 8 ns to send 40 bytes at 40 Gbps9Case study: iSCSI Large data centers connect disks and computers with a SAN Single SCSI command can READ 10Mbytes from a remote disk iSCSI over TCP/IP: must implement TCP in hardware, add iSCSI header with length. SCSI messages can be issued out of order, but TCP does not allow!!!CS 636 Internetworking 10CS 636 InternetworkingModels Networking (protocols) Hardware Architecture Operating systems11CS 636 InternetworkingHardware Combinatorial logic ◦ From transistors to gates◦ Timing and power◦ High level building blocks (standard cells) Memories◦ Registers, SRAM, DRAM◦ Fancy memory tricks (page mode, interleaved)12CS 636 InternetworkingCombinatorial logic A function from digital inputs to outputs◦ Wires, transistors, resistors, capacitors◦ Gates: AND, NAND, NOT, etc. (60 ps)◦ More complex blocks: adders, multiplexers Different designs for same booleanfunction offer different tradeoffs◦ Time for the output to stabilize◦ Number of transistors◦ Power dissipation P=CV2F13Priority encoder timing Input I and outputs O are N-bit vectors such that◦ O[j] = 1 iff I[j] = 1 and I[k] = 0, for all k<j◦ Find first one, represent in one-hot notation Design1: Implement using N AND gates where each gate takes all 1-N inputs.◦ O(N2) appears to take O(1) time.CS 636 Internetworking 140][]1[]...1[][  jjIjIIjOPriority encoder timing Design 2: Every output bit requires the AND of complment of first j-1 bits◦ Construct recursively P[j] = P[j-1]. ~I[j]◦ Using two N two input AND gates in series◦ O(N) transistors but takes O(N) Design 3: Tradeoff design. Use balanced binary tree to compute P[j] and combine partial results.CS 636 Internetworking 15Reduction using standard cells Build four input mux from 2 input muxes.CS 636 Internetworking 16Crossbar scheduler PPE: like a priority encoder but with rotating priority◦ Find first 1 beyond P Arises in switch arbitration Design 1: ◦ Use a barrel shifter to shift I to the left by P bits.◦ Apply PE◦ Shift back by P bits. Two shifts + PE ~ 3 log N gate delays. ◦ Each barrel shift using tree of 2-muxes ~ log (N)CS 636 Internetworking 17Crossbar scheduler Smarter scheme due to Gupta/McKeown First check if any bit set at or after P  If yes, apply PE on bits after P If no, apply PE on bits before P Used in Tiny Tera and Abrizio Tested using TI libraries ◦ 2x fast and 3x less area ◦ For a 32 port router.CS 636 Internetworking 18Memories Router forwarding done using logic but packets stored in memories Memory are significant bottlenecks ◦ Slower than logic delays Different subsystems require different types◦ 200ms worth of packet buffering (DRAM)◦ <1Mbyte of fast SRAM for forwarding tables Need different modelsCS 636 Internetworking 19Registers Need to store bit such that it stays indefinitely (absence of writes/power failures) Register is ordered collection of flip-flops Access from logic to a register ~ 0.5-1nsCS 636 Internetworking 20SRAM N registers addressed by log N bits Addresses Self refreshing. Needs a decoder to translate A to unary so add decode delay.  On chip SRAM 1-2ns, Off-chip 5-10ns.CS 636 Internetworking 21DRAM SRAM flip-flop needs 5 transistors.  DRAM needs one transistor plus capacitor (so more dense) Leakage fixed by refresh. Slower because output not driven by the power supply as in SRAM 40-60nsCS 636 Internetworking 22CS 636 InternetworkingMemories Page mode DRAM (latency 40-60ns) Direct decoding hard: use divide and conquer◦ Decode in stages, row first and column next◦ Reduces decoder from O(N) to O(√N) gates Accesses within rows do not require Row address strobe.23CS 636 InternetworkingInterleaved DRAMs Increase DRAM throughput (same latency) While Bank 1 works, send addresses to Bank 2, etc. SDRAM uses 2 banks, RDRAM uses 16 banks. If consecutive accesses to different banks, bandwidth increased by a factor of B24Example: Pipelined flowid lookup Lookup 1M 96-bit packet ID for accounting using binary tree in 128nsec (OC-48) SRAM infeasible. DRAM too slow (20 accesses) Use interleaved memory + pipeline Direct RDRAM runs at 800MHz, read access 60ns 216

View Full Document


School:
Email:
New Password:
Confirm Password:

Purdue CS 63600 - Lecture notes

Sign up for free to view:

Please select your school