Balajee Vamanan et al.

Deadline-Aware Datacenter TCP (D2TCP)
Balajee Vamanan, Jahangir Hasan, and T. N. Vijaykumar

Datacenters and OLDIs
- OLDI = OnLine Data Intensive applications
  - e.g., Web search, retail, advertisements
- An important class of datacenter applications
- Vital to many Internet companies
OLDIs are critical datacenter applications

Challenges Posed by OLDIs
Two important properties:
1) Deadline bound (e.g., 300 ms)
  - Missed deadlines affect revenue
2) Fan-in bursts
  - Large data, 1000s of servers
  - Tree-like structure (high fan-in)
  - Fan-in bursts ⇒ long "tail latency"
  - Network shared with many apps (OLDI and non-OLDI)
Network must meet deadlines & handle fan-in bursts

Current Approaches
TCP: deadline agnostic, long tail latency
- Congestion detected via timeouts (slow) or ECN (coarse)
Datacenter TCP (DCTCP) [SIGCOMM '10]
- First to comprehensively address tail latency
- Finely varies sending rate based on extent of congestion
- Shortens tail latency, but is not deadline-aware
- ~25% missed deadlines at high fan-in & tight deadlines
DCTCP handles fan-in bursts, but is not deadline-aware

Current Approaches (cont.)
Deadline-Driven Delivery protocol (D3) [SIGCOMM '11]
- First deadline-aware flow scheduling
- Proactive & centralized
  - No per-flow state ⇒ FCFS
  - Many deadline priority inversions at fan-in bursts
- Other practical shortcomings
  - Cannot coexist with TCP, requires custom silicon
D3 is deadline-aware, but does not handle fan-in bursts well; suffers from other practical shortcomings

D2TCP's Contributions
1) Deadline-aware and handles fan-in bursts
  - Elegant gamma-correction for congestion avoidance
    - far-deadline ⇒ back off more
    - near-deadline ⇒ back off less
  - Reactive, decentralized, state kept at end hosts
2) Does not hinder long-lived (non-deadline) flows
3) Coexists with TCP ⇒ incrementally deployable
4) No change to switch hardware ⇒ deployable today
D2TCP achieves 75% and 50% fewer missed deadlines than DCTCP and D3, respectively

Outline
- Introduction
- OLDIs
- D2TCP
- Results: Small Scale Real Implementation
- Results: At-Scale Simulation
- Conclusion

OLDIs
- OLDI = OnLine Data Intensive applications
- Deadline bound, handle large data
- Partition-aggregate
  - Tree-like structure
  - Root node sends query
  - Leaf nodes respond with data
- Deadline budget split among nodes and network
  - E.g., total = 300 ms, parent-leaf RPC = 50 ms
- Missed deadlines ⇒ incomplete responses ⇒ affect user experience & revenue

Long Tail Latency in OLDIs
- Large data ⇒ high fan-in degree
- Fan-in bursts
  - Children respond around the same time
  - Packet drops: increase tail latency
  - Hard to absorb in buffers
  - Cause many missed deadlines
- Current solutions either
  - Over-provision the network ⇒ high cost
  - Increase network budget ⇒ less compute time
Current solutions are insufficient

D2TCP
Deadline-aware and handles fan-in bursts
Key idea: Vary sending rate based on both deadline and extent of congestion
- Built on top of DCTCP
- Distributed: uses per-flow state at end hosts
- Reactive: senders react to congestion
  - No knowledge of other flows

D2TCP: Congestion Avoidance
A D2TCP sender varies sending window (W) based on both extent of congestion and deadline
W := W * (1 - p / 2)
p is our gamma correction function
Note: Larger p ⇒ smaller window. p = 1 ⇒ W/2. p = 0 ⇒ no back-off (the formula leaves W unchanged).

D2TCP: Gamma Correction Function
Gamma correction (p) is a function of congestion & deadlines
p = α^d
- α: extent of congestion, same as DCTCP's α (0 ≤ α ≤ 1)
- d: deadline imminence factor
  - "completion time with window (W)" ÷ "deadline remaining"
  - d < 1 for far-deadline flows, d > 1 for near-deadline flows

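The window update from the Congestion Avoidance and Gamma Correction Function slides can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' kernel code: the function names `gamma_correction` and `backoff_window` are hypothetical, the window is treated as a plain float, and only the back-off step W := W * (1 - p / 2) with p = α^d is modeled.

```python
def gamma_correction(alpha: float, d: float) -> float:
    """Return the penalty p = alpha ** d from the slides.

    alpha: extent of congestion, 0 <= alpha <= 1.
    d: deadline imminence factor, d > 0 (d = 1 for flows without a deadline).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if d <= 0.0:
        raise ValueError("d must be positive")
    return alpha ** d


def backoff_window(window: float, alpha: float, d: float) -> float:
    """Apply W := W * (1 - p / 2) once (hypothetical helper name)."""
    p = gamma_correction(alpha, d)
    return window * (1.0 - p / 2.0)


if __name__ == "__main__":
    # Same congestion level, three different deadline situations.
    for label, d in (("far deadline", 0.5), ("no deadline", 1.0), ("near deadline", 2.0)):
        print(label, backoff_window(100.0, 0.25, d))
```

With α = 0.25 and W = 100, the far-deadline flow (d = 0.5) shrinks to about 75, the d = 1 flow to 87.5, and the near-deadline flow (d = 2) only to about 96.9. Because α ≤ 1, raising it to a power d > 1 makes p smaller, which is why near-deadline flows give up less of their window.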
Gamma Correction Function (cont.)
Key insight: Near-deadline flows back off less while far-deadline flows back off more
- d < 1 for far-deadline flows ⇒ p large ⇒ shrink window
- d > 1 for near-deadline flows ⇒ p small ⇒ retain window
- Long-lived flows ⇒ d = 1 ⇒ DCTCP behavior
[Figure: p = α^d plotted against α, both axes from 0 to 1.0, with curves for d < 1 (far deadline), d = 1, and d > 1 (near deadline)]
W := W * (1 - p / 2)
Gamma correction elegantly combines congestion and deadlines

Gamma Correction Function (cont.)
- α is calculated by aggregating ECN (like DCTCP)
  - Switches mark packets if queue_length > threshold
  - ECN-enabled switches are common
  - Sender computes the fraction of marked packets averaged over time
[Figure: queue with marking threshold]

Gamma Correction Function (cont.)
- The deadline imminence factor (d): "completion time with window (W)" ÷ "deadline remaining" (d = Tc / D)
- B: data remaining, W: current window size
- Avg. window size ≈ 3/4 * W ⇒ Tc ≈ B / (3/4 * W)
- A more precise analysis is in the paper!
[Figure: window size over time, oscillating between W/2 and W; labels Tc and L]

D2TCP: Stability and Convergence
p = α^d, W := W * (1 - p / 2)
- D2TCP's control loop is stable
  - A poor estimate of d is corrected in subsequent RTTs
- When flows have tight deadlines (d >> 1)
  1. d is capped at 2.0 ⇒ flows are not over-aggressive
  2. As α (and hence p) approaches 1, D2TCP defaults to TCP ⇒ D2TCP avoids congestive collapse

D2TCP: Practicality
- Does not hinder background, long-lived flows
- Coexists with TCP
  - Incrementally deployable
- Needs no hardware changes
  - ECN support is commonly available
D2TCP is deadline-aware, handles fan-in bursts, and is deployable today

Methodology
1) Real implementation
  - Small-scale runs
2) Simulation
  - Evaluate production-like workloads
  - At-scale runs
  - Validated against real implementation

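The two inputs to the gamma correction described in the slides above, α and d, can be estimated roughly as follows. This is a sketch under several assumptions that the slides do not state. The "averaged over time" step is modeled as an exponentially weighted moving average, and the gain of 1/16 is an assumed value. B and W are measured in the same unit (for example segments, with W per RTT), so Tc and D are both expressed in RTTs. A flow whose deadline has already passed is simply given the capped value of d. The names `update_alpha`, `deadline_imminence`, and `D_CAP` are hypothetical.

```python
from typing import Optional

D_CAP = 2.0  # upper cap on d, as stated on the stability slide


def update_alpha(alpha: float, marked: int, total: int, gain: float = 1.0 / 16.0) -> float:
    """Fold the latest fraction of ECN-marked packets into a running alpha.

    The moving-average form and the gain value are assumptions for this sketch.
    """
    if total <= 0:
        return alpha  # nothing acknowledged in this window; keep the old estimate
    fraction = marked / total
    return (1.0 - gain) * alpha + gain * fraction


def deadline_imminence(data_remaining: float, window: float,
                       deadline_remaining: Optional[float]) -> float:
    """Return d = Tc / D with Tc ~= B / (3/4 * W), capped at D_CAP.

    deadline_remaining=None models a long-lived flow without a deadline (d = 1).
    """
    if window <= 0.0:
        raise ValueError("window must be positive")
    if deadline_remaining is None:
        return 1.0
    if deadline_remaining <= 0.0:
        return D_CAP  # assumption: an expired deadline is treated as most urgent
    completion_time = data_remaining / (0.75 * window)
    return min(completion_time / deadline_remaining, D_CAP)


if __name__ == "__main__":
    # 30 segments left, window of 10 segments per RTT: Tc = 30 / 7.5 = 4 RTTs.
    print(deadline_imminence(30.0, 10.0, 8.0))   # far deadline, d < 1
    print(deadline_imminence(30.0, 10.0, 1.0))   # near deadline, capped
    print(update_alpha(0.0, 8, 16))
```

Because d is recomputed from the current B, W, and D each time, a poor early estimate is replaced as the flow progresses, which matches the slide's point that errors in d are corrected in subsequent RTTs.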
Real Implementation
- 16 machines connected to ToR
  - 24x 10Gbps ports
  - 4 MB shared packet buffer
- Publicly available DCTCP code
- D2TCP ⇒ ~100 lines of code over DCTCP
All parameters match DCTCP paper
D3 requires custom hardware ⇒ comparison with D3 only in simulation
[Figure: rack of servers connected to a ToR switch]

D2TCP: Deadline-aware