Ramana Kompella
Lecture 3: Principles
CS 636 Internetworking

Posted on the web page:
http://www.cs.purdue.edu/homes/kompella/teaching/sp09/cs636/restricted/projects.html
If you have not formed a group yet, please do so immediately.

Systems Principles: 1-5
◦ System is constructed from subsystems
◦ System-wide view versus black box
Retain modularity: 6-10
◦ Improve performance while retaining modularity
Speeding it up: 11-15
◦ Techniques for speeding up individual routines

Amazingly, many of these principles have been used for years by Chef Charlie at the Greasy Spoon restaurant!

P1: Avoid obvious waste in common sequences of operations
Common sub-expression elimination in compilers
◦ S1: i = (5.1*n + 2) and S2: j = (5.1*n + 2)*4
Example in our context:
◦ Copy avoidance
Chef Charlie: separate trips to the pantry to get the ice-cream maker and the pie dish while making pie a la mode

P2: Shift computation in time
◦ P2a: Pre-compute: Chef Charlie prepares crushed garlic in advance; pre-computed TCP headers
◦ P2b: Evaluate lazily: dishwashing, copy-on-write in Mach, byte orders in packet data
◦ P2c: Share expenses: several pies in one oven, timing wheels
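
The timing wheel mentioned under "Share expenses" can be sketched in a few lines. This is a minimal illustration and not taken from the lecture: the class name `TimingWheel`, its methods, and the restriction that a delay must fit within one rotation of the wheel are all assumptions made for the example. The point it shows is that every timer due at the same tick shares the cost of a single slot visit.

```python
class TimingWheel:
    """Hypothetical single-level timing wheel (illustrative sketch only)."""

    def __init__(self, size):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one bucket of callbacks per tick
        self.now = 0                            # index of the current slot

    def schedule(self, delay, callback):
        # Assumption: delays are limited to one rotation of the wheel.
        if not 1 <= delay <= self.size:
            raise ValueError("delay must be between 1 and the wheel size")
        self.slots[(self.now + delay) % self.size].append(callback)

    def tick(self):
        """Advance one tick, fire every timer in that slot, return how many fired."""
        self.now = (self.now + 1) % self.size
        due, self.slots[self.now] = self.slots[self.now], []
        for callback in due:
            callback()
        return len(due)
```

Starting and expiring a timer both take constant time here, because the fixed range of delays lets an array index stand in for a sorted queue.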
P3: Relax system requirements
P3a: Trading certainty for time:
◦ Ethernet, approximate statistics counters
P3b: Trading accuracy for time:
◦ MPEG, lossy compression, scheduling algorithms, replacing divides with shifts
P3c: Shifting computation in space:
◦ Path MTU discovery

[Figure: Subsystem 1 and Subsystem 2; labels: Property P, Property Q, Spec S, Weaker Spec W]

P4: Leverage other system components
P4a: Exploit local access costs:
◦ Disk algorithms like B-trees
◦ IP lookups using multibit tries
P4b: Trade memory for speed:
◦ Lookup tables
◦ Compress to fit in cache
P4c: Exploit hardware features:
◦ Strength reduction in compilers (i*4 → i+4)
◦ Fast IP checksum computation

P5: Add hardware to improve performance
◦ Microwave for Charlie
P5a: Memory interleaving and pipelining
◦ IP lookups use multiple banks, etc.
P5b: Use wide word parallelism
◦ DRAM page mode, Lucent bit vector scheme
P5c: Combine SRAM and DRAM
◦ Statistics counters

P6: Consider replacing unwieldy general-purpose routines with more efficient specialized ones
Database caching schemes:
◦ General-purpose caches use LRU
◦ But query processing in a loop requires MRU

P7: Question the need for excessive generality
◦ If restrictions provide big gains, consider living with the restrictions
RISC: multiplications done in firmware
Fbufs provide a specialized virtual memory service that allows efficient copies

P8: Consider alternatives to reference implementations found in specifications, as long as they produce the same results
Charlie knows that when a recipe asks him to cut beans and then cut carrots, he can probably interchange the steps without danger.
Upcalls: lower layers can call an upper layer for data or advice, seemingly violating the rules of modular design.
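
The database caching point above (LRU as the general-purpose policy, MRU for a query that loops over its data) can be checked with a small simulation. This is a sketch under stated assumptions: the function `count_hits`, the five-page loop, and the four-entry cache are invented for illustration and do not come from the lecture.

```python
from collections import OrderedDict


def count_hits(accesses, capacity, policy):
    """Count cache hits for a page access sequence under "LRU" or "MRU" eviction."""
    if policy not in ("LRU", "MRU"):
        raise ValueError("policy must be 'LRU' or 'MRU'")
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                # last=True evicts the most recently used entry (MRU),
                # last=False evicts the least recently used entry (LRU).
                cache.popitem(last=(policy == "MRU"))
            cache[page] = True
    return hits


# Hypothetical workload: a loop scanning 5 pages three times, with room for 4.
loop_scan = list(range(5)) * 3
lru_hits = count_hits(loop_scan, 4, "LRU")  # 0 hits: each page is evicted just before reuse
mru_hits = count_hits(loop_scan, 4, "MRU")  # 8 hits out of 15 accesses
```

When the loop is slightly larger than the cache, LRU always evicts the page that will be needed soonest, which is why a specialized policy wins in this case.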
P9: Consider passing information (that can optimize performance) between organizational layers while preserving structure
Alto file system: pointers to the next file block (a hint), checked against the file block number.

P10: Consider changing protocols to pass information (that can optimize performance)
Like "Attn: Ms. Harper" on fax messages once you know who to correspond with
Tag switching: tags are carried in packets to aid fast forwarding.

P11: Optimize the expected case
P11a: Use caches
◦ Paging: worst case 4 accesses, 0 using caches
◦ Header prediction
Question: How do we determine the common cases?

P12: Add or exploit state to increase speed
Use of secondary indices in databases
P12a: Compute incrementally
◦ Strength reduction in loop indexes
◦ IP checksums

P13: Optimize the degrees of freedom
TCAM puzzle
◦ Maintaining gaps between adjacent levels
Multibit tries
◦ Number of bits at each level can be different

P14: Use special techniques for finite universes
Use bucket sorting, array lookup, and bitmaps whenever possible
Page lookup using TLBs (table lookup instead of hashing)
Timing wheels (fixed range using a circular array)

P15: Use algorithmic ideas
Consider using efficient data structures that have helped protocol implementations
Charlie's recipe books have elaborate indices and crosslinks for speedy navigation

The principles are meant not to eliminate creativity but to foster it.
The set of principles is necessarily incomplete.
These principles are not design oriented but implementation oriented.

"Performance problems cannot be solved only through the use of Zen Meditation."
(Jeff Mogul, HP Labs)
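
The "compute incrementally" example for IP checksums (P12a) can be illustrated as follows. This is a sketch and not the lecture's code: the function names are invented, the header words are a commonly used sample IPv4 header, and the update rule is the one's-complement identity HC' = ~(~HC + ~m + m'), where m is the old 16-bit word and m' the new one. It shows that when a router changes one word (here the TTL/protocol word), the checksum can be patched without re-summing the whole header.

```python
def fold(total):
    """Fold carries back into 16 bits (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(words):
    """One's-complement checksum over a list of 16-bit words."""
    return ~fold(sum(words)) & 0xFFFF


def incremental_checksum(old_checksum, old_word, new_word):
    """Update a checksum after one 16-bit word changes: ~(~HC + ~m + m')."""
    total = (~old_checksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold(total) & 0xFFFF


# Sample header as 16-bit words, with the checksum field (index 5) set to zero.
header = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
          0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
old = full_checksum(header)                              # 0xB861

# Decrement the TTL: word 4 goes from 0x4011 to 0x3F11.
patched = incremental_checksum(old, 0x4011, 0x3F11)      # 0xB961
header[4] = 0x3F11
recomputed = full_checksum(header)                       # also 0xB961
```

The incremental version touches three values regardless of header length, which is the state-exploiting shortcut the principle describes.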
Reducing page download times
◦ Web pages are requested via a GET message
◦ Images are often downloaded via separate GET messages
Can't we inline the images to reduce total latency?
◦ TCP effects: ACKs need to be sent anyway.
◦ Client caching

Speeding up signature-based intrusion detection
◦ String searches are common (e.g., perl.exe)
◦ Snort searches for each string separately using the Boyer-Moore method.
◦ The worst case occurs when 310 rules are matched (30% overhead observed using gprof)
Can't we reduce this?
◦ Idea: use a multiset Boyer-Moore that works on multiple string patterns simultaneously.

Nice in theory, but does not work in practice.
Two observed problems:
◦ Multiple string matching is not a bottleneck for the traces used.
◦ Cache effects (the data structure does not fit well in cache once the number of strings exceeds 100)
The design was redone with a small number of multiset Boyer-Moore matchers that fit well in cache.
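
The idea of searching for several strings in one pass can be sketched with a simplified set-wise Horspool search, a close relative of Boyer-Moore. This is not Snort's implementation and not the design evaluated in the case study: the function names, the sample patterns, and the naive per-window verification loop are assumptions made to keep the example short. What it shows is one shared bad-character shift table, built over the shortest pattern length, driving a single scan of the text.

```python
def build_shift_table(patterns):
    """Shared bad-character shifts (Horspool style) for a set of patterns."""
    if not patterns or any(len(p) == 0 for p in patterns):
        raise ValueError("need at least one pattern, and none may be empty")
    m = min(len(p) for p in patterns)  # window length = shortest pattern
    shift = {}
    for p in patterns:
        # Only the first m characters of each pattern take part in the window;
        # the last window position is excluded, as in Horspool's algorithm.
        for j in range(m - 1):
            dist = m - 1 - j
            if dist < shift.get(p[j], m):
                shift[p[j]] = dist
    return m, shift


def find_all(text, patterns):
    """Return (position, pattern) pairs for every occurrence, in one pass."""
    m, shift = build_shift_table(patterns)
    hits = []
    pos = 0
    while pos + m <= len(text):
        # Naive verification step; a real matcher would use a hash or trie here.
        for p in patterns:
            if text.startswith(p, pos):
                hits.append((pos, p))
        # Skip ahead based on the last character of the current window.
        pos += shift.get(text[pos + m - 1], m)
    return hits


# Hypothetical payload and signature strings.
payload = "GET /scripts/perl.exe?cmd.exe HTTP/1.0 root cmd.exe"
signatures = ["perl.exe", "cmd.exe", "root"]
matches = find_all(payload, signatures)
```

The shift table grows with the alphabet and the pattern set, so a large set of strings enlarges the working data, which is consistent with the cache effects reported above.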