Reliable Distributed Systems
Overlay Networks

Resilient Overlay Networks
- A hot new idea from MIT
- Shorthand name: RON
- Today:
  - What’s a RON?
  - Are these a good idea, or just antisocial?
  - What next: Underlay networks

What is an overlay network?
- A network whose links are based on (end-to-end) IP hops
- And that has multi-hop forwarding
  - I.e. a path through the network will traverse multiple IP “links”
- As a “network” service (i.e. not part of an application per se)
  - I.e. we are not including Netnews, IRC, or caching CDNs in our definition

Why do we have overlay networks?
- Historically (late 80’s, early 90’s):
  - to get functionality that IP doesn’t provide
  - as a way of transitioning a new technology into the router infrastructure
  - mbone: IP multicast overlay
  - 6bone: IPv6 overlay

Why do we have overlay networks?
- More recently (mid-90’s):
  - to overcome scaling and other limitations of “infrastructure-based” networks (overlay or otherwise)
  - Yoid and End-system Multicast
    - Two “end-system” or (nowadays) “peer-to-peer” multicast overlays
  - Customer-based VPNs are kind-of overlay networks
    - If they do multi-hop routing, which most probably don’t

Why do we have overlay networks?
- Still more recently (late-90’s, early 00’s):
  - to improve the performance and reliability of native IP!!!
  - RON (Resilient Overlay Network)
    - Work from MIT (Andersen, Balakrishnan)
    - Based on results of Detour measurements of Savage et al., Univ of Washington, Seattle

End-to-end effects of Internet Path Selection
- Savage et al., Univ of Washington, Seattle
- Compared the path found by Internet routing with alternates
- Alternates composed by gluing together two Internet-routed paths
- Round-trip time, loss rate, bandwidth
- Data sets: Paxson, plus new ones from UW
- Found improvements in all metrics with these “Detour” routes

BGP and other studies
- Paxson (ACIR), Labovitz (U Mich), Chandra (U Texas)
- Show that outages are frequent (>1%)
- BGP can take minutes to recover

RON Rationale
- BGP cannot respond to congestion
- Because of information hiding and policy, BGP cannot always find the best path
  - Private peering links often cannot be discovered
- BGP cannot respond quickly to link failures
- However, a small dynamic overlay network can overcome these limitations

BGP lack of response to congestion
- Very hard for routing algorithms to respond to congestion (route around it)
- Problem is oscillations
  - Traffic is moved from a congested link to a lightly loaded link, then the lightly loaded link becomes congested, etc.
- ARPANET (~70 node network) struggled with this for years
- Khanna and Zinky finally solved this (SIGCOMM ’89)
  - Heavy damping of responsiveness

BGP information hiding
[Figure: ISP1, ISP2, Site1, Site2, and the Internet; prefixes 20.1/16, 20.1.5/24, 30.1/16, 30.1.3/24; a private peering link joins Site1 and Site2]
- Private peering link. Site1 and Site2 can exchange traffic, but Site2 cannot receive Internet traffic via ISP1 (even if policy allows it).

Acceptable Use Policies
- Why might Cornell hide a link?
  - Perhaps Cornell has a great link to the Arecibo telescope in Puerto Rico but doesn’t want all the traffic to that island routing via Cornell
  - E.g. we pay for it, and need it for scientific research
  - But any Cornell traffic to Puerto Rico routes on our dedicated link
- This is an example of an “AUP”
  - “Cornell traffic to 123.45.00.00/16 can go via link x, but non-Cornell traffic is prohibited”

BGP information hiding
[Figure: same topology (ISP1, ISP2, Site1, Site2, Internet; prefixes 20.1/16, 30.1/16, 20.1.5/24, 30.1.3/24), with an X marked on the diagram]

RON can bypass BGP information hiding
[Figure: same topology with the X, plus overlay nodes RON1, RON2, RON3 and host addresses 30.1.3.5 and 20.1.5.7]
- …but in doing so may violate AUP

RON test network had private peering links

BGP link failure response
- BGP cannot respond quickly to changes in AS path
  - Hold down to prevent flapping
  - Policy limitations
- But BGP can respond locally to link failures
- And, local topology can be engineered for redundancy

Local router/link redundancy
[Figure: two ISPs, AS1 and AS2, each drawn with four routers (R)]
- Intra-domain routing (e.g. OSPF) can respond to internal ISP failures
- eBGP and/or iBGP can respond to peering failure without requiring an AS path change
- AS path responsiveness is not strictly necessary to build robust internets with BGP.
- Note: the telephone signalling network (SS7, a data network) is built this way.

Goals of RON
- Small group of hosts cooperate to find better-than-native-IP paths
  - ~50 hosts max, though working to improve
- Multiple criteria, application selectable per packet
  - Latency, loss rate, throughput
  - Better reliability too
- Fast response to outages or performance changes
  - 10-20 seconds
- Policy routing
  - Avoid paths that violate the AUP (Acceptable Use Policy) of the underlying IP network
- General-purpose library that many applications may use
  - C++

Some envisioned RON applications
- Multi-media conference
- Customer-provided VPN
- High-performance ISP

Basic approach
- Small group of hosts
- All ping each other, a lot
  - On the order of every 10 seconds
  - 50 nodes produce 33 kbps of traffic per node!
- Run a simplified link-state algorithm over the N^2 mesh to find best paths
  - Metric and policy based
- Route over best path with specialized metric- and policy-tagged header
- Use hysteresis to prevent route flapping

Major results (tested with 12 and 16 node RONs)
- Recover from most complete outages and all periods of sustained high loss rates of >30%
- 18 sec average to route around failures
- Routes around throughput failures,
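
The "Basic approach" slide (probe every pair, pick the best path over the mesh, damp switching with hysteresis) can be illustrated with a small sketch. This is not the RON library's code. It assumes a latency-only metric, considers the direct path plus at most one intermediate overlay node (in the spirit of the "gluing together two Internet-routed paths" idea from Detour), and uses a hypothetical 5% hysteresis margin. The names `best_path`, `path_cost`, `choose_with_hysteresis`, and `margin` are invented for illustration.

```python
import math


def best_path(latency, src, dst):
    """Return (cost, path) for the cheapest src->dst route that uses either
    the direct link or exactly one intermediate overlay node.

    latency[i][j] is the latest probe estimate for the link i->j
    (math.inf marks a link currently considered down).
    """
    best_cost, best = latency[src][dst], [src, dst]
    for via in range(len(latency)):
        if via in (src, dst):
            continue
        cost = latency[src][via] + latency[via][dst]
        if cost < best_cost:
            best_cost, best = cost, [src, via, dst]
    return best_cost, best


def path_cost(latency, path):
    """Sum the per-link estimates along a path given as a list of node ids."""
    return sum(latency[a][b] for a, b in zip(path, path[1:]))


def choose_with_hysteresis(latency, src, dst, current, margin=0.05):
    """Keep the current path unless the new best path is better by more than
    `margin` (a hypothetical 5% threshold), which damps route flapping."""
    new_cost, new_path = best_path(latency, src, dst)
    if current is None:
        return new_path
    if new_cost < path_cost(latency, current) * (1.0 - margin):
        return new_path
    return current


# Three-node overlay: the direct 0->1 link is slow, so node 2 is a better relay.
latency = [
    [0, 100, 20],
    [100, 0, 30],
    [20, 30, 0],
]
route = choose_with_hysteresis(latency, 0, 1, current=None)
```

With the matrix above, the relayed path through node 2 costs 20 + 30 = 50 against 100 for the direct link, so it is selected. A later probe that shows only a marginal improvement on the direct link would not trigger a switch, while an outage (an infinite estimate on the current path) would.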