Yale CPSC 424 - Structured and Unstructured Overlays for Distributed Data

Unformatted text preview:

1Structured and Unstructured Overlaysfor Distributed DataArvind KrishnamurthyFall 2004Evolution of Overlays in P2P SystemsAd-hoc Overlays(Freenet, Gnutella)Structured Overlays(Chord, Pastry, CAN)Unstructured Overlays(Narada, NICE)Implicit routingProbabilistic load-balanceSmall routing stateHighly scalable systemsExplicit routing protocolMedium-scale systemsComplex QueriesDistribute with locality?Performance-sensitivityDynamic changes in quality?2System Requirements for Structured Overlays System should be able to store complex objects Be capable of supporting similarity searches, range queries Should minimize: Storage overhead Routing state Routing costs Should balance: Data distribution Routing state Query/routing costsSkipIndex: Overview Use some hierarchical tree data structure which is used in centralized settings to manage high-dimensional data Distribute tree while retaining locality Nearby tree-nodes maintained by nearby processors Processors manage sub-trees Do not navigate the tree in a top-down manner Instead jump from one part of the tree to another part of the tree using “long” pointers Employ a load-balancing algorithm to ensure that data distribution is balanced3Motivating Reasons for Unstructured Overlays Overlay networks: Construct a virtual network on top of the physical network Can communicate from lambda.cs.yale.edu to comix.cs.berkeley.edu: Either directly through the internet path Or through an intermediate node at bolle.cs.princeton.edu In which case, it enters Yale’s ISP, traverses Internet, reaches Princeton, is retransmitted through Princeton’s ISP, traverses Internet and reaches Berkeley Three primary reasons: Inefficient Internet routes Avoid failures in Internet routes Provide functionality not currently available in the InternetInefficient Internet Routing Internet routing is inefficient: Does not always pick the lowest latency paths Does not always pick paths with low drop rates Experimental evidence with 43 nodes: (Detour project at Washington) 50% of routes had a faster alternate one-hop route 80% of routes had an alternate route with lower loss rate4Reasons for path inflation #1 Using AS-hop-count as routing metric: Using actual latency or router-hop-count would be betterAS8AS7AS5AS4AS2AS1AS9sourcedestinationShortest AS pathShortest router pathReasons for path inflation #2 Policy routing might result in picking a longer path No-valley routing policy: An AS does not provide transit between any two of its providers or peers. Prefer Customer routing policy: Prefer the free of charge customer route over the peer or provider route. Early exit strategy: AS might try to get rid of a packet as soon as possible and minimize its intra-domain trafficAS8AS7AS6AS5AS4AS3AS2AS1sourcedestination5Avoid Internet Path Outages When a path fails try to find an alternate path if possible Internet: Path outages are reasonably common Recovery time could be substantial  5% of faults last more than 2.75 hoursChandra 01 10% of routes available < 95% of the time 65% of routes available < 99.9% of the time 3-min minimum detection+recovery time; often 15 mins 40% of outages took 30+ mins to repairLabovitz 97-00 3.3% of all routes had serious problemsPaxson95-97Advanced Routing Mechanisms Internet routers rarely support advanced protocols such as IP multicast Ideally, routers should have intelligence to form multicast trees, maintain membership information, and split flows More advanced applications that could benefit from network-embedded intelligence include wide-area file systemsMulticast source6Solution: Overlay Networks Route around faults Use an “overlay path” that comprises of two or more physical connections through the internet Route around inefficiencies Intelligence (for multicasting, wide-area file-systems, etc.) is pushed to the edge of the internetCMUBerkeleyMITYaleOverlay Multicast Multicast performed through multiple unicasts Edge-hosts serve as forwarding agents (results in a distribution tree)CMUBerkeleyUCLAStanfordYaleCMUYaleBerkeleyUCLA Stanford7Optimizing Unstructured Overlays Explicit routing mechanisms are expensive Link-state, distance-vector protocols scale as O(n3) Routing protocols exchange routing tables: Routing table size is O(n) in fully connected mesh Tables exchanged between every pair of neighbors in a fully connected mesh Designers of Resilient Overlay Networks (RON) do not expect suchsystems to scale beyond 50-100 nodesUnstructured Overlay NetworksOverlay networks tend tobe complete graphsTo reduce complexity ofcommunication and routing,you want a pruned graph8Problem Statement High performance Best links with respect to latency, bandwidth, and/or loss-rate Multiple paths Multipath transport Fault tolerance Performance tolerance Self organizing Exploit network information Incremental improvement Minimal use of central authorityWhat should apruned graphlook like??General Strategies K best links Each node find “k” best peers K-random links Each node find “k” random peers Short-long K/2 best links and K/2 random links Connect-improve Select random and carefully improve (for example, Narada)9Find K-Minimum Spanning TreesAlgorithm:Find K-Minimum Spanning TreesAlgorithm:10Find K-Minimum Spanning TreesAlgorithm:Find K-Minimum Spanning TreesAlgorithm:11Find K-Minimum Spanning TreesAlgorithm:Motivation for K-MST Approach Theory: Algorithm by Khuller and Vishkin gives the best approximation for finding a minimum-weight k-connected subgraph Approximation finds K directed trees of minimum cumulative weight Related work by Roskind and Tarjan finds K disjoint trees of minimum weight; might use more graph edges than previous work K-MST is an approximation to the Roskind-Tarjan work Systems: Many systems use MST for multicasting data-intensive streams Some systems augment a single multicast tree with additional links to find “shortcut” edges12Protocol ComponentsMesh Construction:Based on algorithm by Gallagher Humblet, and Spira (GHS), which computes a Minimum Spanning Tree in a fully distributed fashion..We use K instances of GHS to compute K trees. Mesh Repair:Maintain Minimum Spanning Tree in a dynamic topology with failures. .Mesh ImprovementImprove


View Full Document

Yale CPSC 424 - Structured and Unstructured Overlays for Distributed Data

Download Structured and Unstructured Overlays for Distributed Data
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Structured and Unstructured Overlays for Distributed Data and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Structured and Unstructured Overlays for Distributed Data 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?