Toronto ECE 1770 - Scalable Content- Addressable Networks - D2092302

Home> Schools> University of Toronto> (ECE) > ECE 1770> Scalable Content- Addressable Networks

Toronto ECE 1770 - Scalable Content- Addressable Networks

Course Ece 1770- Trends in Middleware Systems

Pages 16

Download Save

Unformatted text preview:

Scalable Content-Addressable NetworksHigh-Level OverviewHash Tables and CANWhat CAN would do for usBasic Operations Performed On CANsCAN DesignCAN Design (2)Routing in a CANConstruction of a CAN OverlayMaintenance of a CAN OverlayDesign ImprovementsDesign Improvements (2)Design Improvements (3)Related SystemsDiscussionReferencesScalable Content-Addressable NetworksPrepared byKuhan ParamsothyMarch 5, 2007ECE 1770 – Content-Addressable NetworksHigh-Level OverviewHash tables (map keys to values) are heavily used in building software applicationsThe concept of a Content-Addressable Network (CAN) provides hash table-like functionality on Internet-like scales.CAN is:ScalableRobust/Fault-tolerantSelf-organizingLow-latencyECE 1770 – Content-Addressable NetworksHash Tables and CANA data structure that efficiently maps keys onto valuesCANs are a form of distributed, Internet-scale hash tables.ECE 1770 – Content-Addressable NetworksWhat CAN would do for usCAN would improve peer-to-peer systemsNapster: the process of locating a file is centralizedExpensive to scale the central repository, single point of failureGnutella: decentralized the file location process (network self-organizes into an application layer mesh)Requests for files are done through flooding, not scalable, may not find contentConclusion: P2P systems need a scalable indexing mechanismCAN would improve large data repositoriesThese systems need efficient insertion and retrievalCAN would create large-scale name resolution services that don’t use a naming scheme (ie. Not DNS)No more location-dependent naming schemesECE 1770 – Content-Addressable NetworksBasic Operations Performed On CANsBasic OperationsInsertion (of key,value pairs)Lookup (of key,value pairs)Deletion (of key,value pairs)Each CAN stores1. A piece (called a zone) of the entire hash table2. Holds information about a small number of adjacent zones in the tableRouting in a CANDone by intermediate CAN nodes towards the CAN node whose zone contains that keyCAN Design isDistributed (requires no centralized control or coordination)Scalable (nodes hold only a small about of information that doesn’t grow with the network)Fault-tolerant (nodes can route around failures)Doesn’t require a naming hierarchyIs entirely Application LayerECE 1770 – Content-Addressable NetworksCAN DesignCenters around a virtual d-dimensional Cartesian coordinate space on a d-torusAt any time, the entire coordinate space is dynamically partitioned among all the nodes in the systemEach node owns a distinct zoneECE 1770 – Content-Addressable NetworksCAN Design (2)1. To store a pair, key K1 is mapped to P via a uniform hash function2. The pair is then stored at the node that owns the zone where P lies3. To retrieve an entry corresponding to K1, any node can apply the same hash function to map K1 to P and get the corresponding valueA node learns and maintains the IP addresses of those nodes that hold adjoining coordinate zonesEfficient routing is critical to a useful CANECE 1770 – Content-Addressable NetworksRouting in a CANRouting in a Content Addressable Networks works by following the straight line path through the Cartesian space from source to destination coordinates.A CAN node maintains a coordinate routing table that holds the IP address and virtual coordinate zone of each of its immediate neighbors in the coordinate space.Average Path Length = (d/4)(n1/d)Individual Nodes Have 2d NeighborsAverage Path Length Grows As O(n1/d)ECE 1770 – Content-Addressable NetworksConstruction of a CAN OverlayThe entire CAN space is divided amongst the nodes currently in the systemIncremental construction process takes three stepsThe new node finds a node already in the CANUsing the CAN routing mechanisms, finds a node whose zone will be splitThe neighbors of the split zone must be notified so that routing can include the new nodeBootstrapping: There are CAN bootstrap nodes associated to a DNS domain nameNode Insertion Affects Only O(number of dimensions) existing nodesECE 1770 – Content-Addressable NetworksMaintenance of a CAN OverlayNode Graceful Departure: node explicitly hands over its zone and the associated (key,value) database to one of its neighborsNode Abrupt Disappearance: An immediate takeover algorithm ensures one of the “failed” node’s neighbors takes over the zoneUnder normal conditions, a node sends periodic update messages to each of its neighbors and a list of neighbors and their zone coordinates.Prolonged absence of an update message from a neighbor signals it’s failureECE 1770 – Content-Addressable NetworksDesign ImprovementsBasic CAN algorithm providesLow per-node state (O(d) for a d-dimensional space)Short path lengths (O(dn1/d) hops for d dimensions and n nodes)The problem is that there are application-layer hops, not IP-layer hopsLatency of each hop might be substantialECE 1770 – Content-Addressable NetworksDesign Improvements (2)Improvement: Multi-dimensioned Coordinate SpacesIncreasing the dimensions of the CAN coordinate space reduces the routing path length and path latency for a small increase in the size of the coordinate routing tablePath Length scales as O(d(n1/d))Fault-tolerance improvesImprovement: Multiple Coordinate Spaces (a.k.a. Multiple Realities)Maintain multiple independent coordinate spaces with each node in the system being assigned a different zone in the coordinate space (each coordinate space is a reality)Fault-tolerance improvesLow per-node state (O(d) for a d-dimensional space)Short path lengths (O(dn1/d) hops for d dimensions and n nodes)Which is better?Increasing the dimensionsECE 1770 – Content-Addressable NetworksDesign Improvements (3)Improvement: Better CAN Routing MetricsHave each node measure the network-level round-trip-time RTT to each of its neighbors. Then route messages accordingly.Favors lower latency paths and avoids unnecessarily long hopsImprovement: Caching and ReplicationA CAN node can maintain a cache of the data keys it recently accessedA CAN node can replicate the data key at each of its neighboring nodesBoth schemes need an associated time-to-live field, to eventually expire from the cacheECE 1770 – Content-Addressable NetworksRelated SystemsDomain Name SystemCANs are more general than the

View Full Document


School:
Email:
New Password:
Confirm Password:

Toronto ECE 1770 - Scalable Content- Addressable Networks

Sign up for free to view:

Please select your school