1Structured and Unstructured Overlaysfor Distributed DataArvind KrishnamurthyFall 2004Evolution of Overlays in P2P SystemsAd-hoc Overlays(Freenet, Gnutella)Structured Overlays(Chord, Pastry, CAN)Unstructured Overlays(Narada, NICE)Implicit routingProbabilistic load-balanceSmall routing stateHighly scalable systemsExplicit routing protocolMedium-scale systemsComplex QueriesDistribute with locality?Performance-sensitivityDynamic changes in quality?2System Requirements for Structured Overlays System should be able to store complex objects Be capable of supporting similarity searches, range queries Should minimize: Storage overhead Routing state Routing costs Should balance: Data distribution Routing state Query/routing costsSkipIndex: Overview Use some hierarchical tree data structure which is used in centralized settings to manage high-dimensional data Distribute tree while retaining locality Nearby tree-nodes maintained by nearby processors Processors manage sub-trees Do not navigate the tree in a top-down manner Instead jump from one part of the tree to another part of the tree using “long” pointers Employ a load-balancing algorithm to ensure that data distribution is balanced3Motivating Reasons for Unstructured Overlays Overlay networks: Construct a virtual network on top of the physical network Can communicate from lambda.cs.yale.edu to comix.cs.berkeley.edu: Either directly through the internet path Or through an intermediate node at bolle.cs.princeton.edu In which case, it enters Yale’s ISP, traverses Internet, reaches Princeton, is retransmitted through Princeton’s ISP, traverses Internet and reaches Berkeley Three primary reasons: Inefficient Internet routes Avoid failures in Internet routes Provide functionality not currently available in the InternetInefficient Internet Routing Internet routing is inefficient: Does not always pick the lowest latency paths Does not always pick paths with low drop rates Experimental evidence with 43 nodes: (Detour project at Washington) 50% of routes had a faster alternate one-hop route 80% of routes had an alternate route with lower loss rate4Reasons for path inflation #1 Using AS-hop-count as routing metric: Using actual latency or router-hop-count would be betterAS8AS7AS5AS4AS2AS1AS9sourcedestinationShortest AS pathShortest router pathReasons for path inflation #2 Policy routing might result in picking a longer path No-valley routing policy: An AS does not provide transit between any two of its providers or peers. Prefer Customer routing policy: Prefer the free of charge customer route over the peer or provider route. Early exit strategy: AS might try to get rid of a packet as soon as possible and minimize its intra-domain trafficAS8AS7AS6AS5AS4AS3AS2AS1sourcedestination5Avoid Internet Path Outages When a path fails try to find an alternate path if possible Internet: Path outages are reasonably common Recovery time could be substantial 5% of faults last more than 2.75 hoursChandra 01 10% of routes available < 95% of the time 65% of routes available < 99.9% of the time 3-min minimum detection+recovery time; often 15 mins 40% of outages took 30+ mins to repairLabovitz 97-00 3.3% of all routes had serious problemsPaxson95-97Advanced Routing Mechanisms Internet routers rarely support advanced protocols such as IP multicast Ideally, routers should have intelligence to form multicast trees, maintain membership information, and split flows More advanced applications that could benefit from network-embedded intelligence include wide-area file systemsMulticast source6Solution: Overlay Networks Route around faults Use an “overlay path” that comprises of two or more physical connections through the internet Route around inefficiencies Intelligence (for multicasting, wide-area file-systems, etc.) is pushed to the edge of the internetCMUBerkeleyMITYaleOverlay Multicast Multicast performed through multiple unicasts Edge-hosts serve as forwarding agents (results in a distribution tree)CMUBerkeleyUCLAStanfordYaleCMUYaleBerkeleyUCLA Stanford7Optimizing Unstructured Overlays Explicit routing mechanisms are expensive Link-state, distance-vector protocols scale as O(n3) Routing protocols exchange routing tables: Routing table size is O(n) in fully connected mesh Tables exchanged between every pair of neighbors in a fully connected mesh Designers of Resilient Overlay Networks (RON) do not expect suchsystems to scale beyond 50-100 nodesUnstructured Overlay NetworksOverlay networks tend tobe complete graphsTo reduce complexity ofcommunication and routing,you want a pruned graph8Problem Statement High performance Best links with respect to latency, bandwidth, and/or loss-rate Multiple paths Multipath transport Fault tolerance Performance tolerance Self organizing Exploit network information Incremental improvement Minimal use of central authorityWhat should apruned graphlook like??General Strategies K best links Each node find “k” best peers K-random links Each node find “k” random peers Short-long K/2 best links and K/2 random links Connect-improve Select random and carefully improve (for example, Narada)9Find K-Minimum Spanning TreesAlgorithm:Find K-Minimum Spanning TreesAlgorithm:10Find K-Minimum Spanning TreesAlgorithm:Find K-Minimum Spanning TreesAlgorithm:11Find K-Minimum Spanning TreesAlgorithm:Motivation for K-MST Approach Theory: Algorithm by Khuller and Vishkin gives the best approximation for finding a minimum-weight k-connected subgraph Approximation finds K directed trees of minimum cumulative weight Related work by Roskind and Tarjan finds K disjoint trees of minimum weight; might use more graph edges than previous work K-MST is an approximation to the Roskind-Tarjan work Systems: Many systems use MST for multicasting data-intensive streams Some systems augment a single multicast tree with additional links to find “shortcut” edges12Protocol ComponentsMesh Construction:Based on algorithm by Gallagher Humblet, and Spira (GHS), which computes a Minimum Spanning Tree in a fully distributed fashion..We use K instances of GHS to compute K trees. Mesh Repair:Maintain Minimum Spanning Tree in a dynamic topology with failures. .Mesh ImprovementImprove
View Full Document