
XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms
Presented by Wei Dai

Reasons for Congestion in Cloud
- Cloud operators use virtualization to consolidate thousands of VMs on shared hardware platforms due to cost concerns
- Most VMs host service-oriented applications that are inherently communication intensive
- Cloud computing infrastructures consist of large data center clusters using commodity servers and networking hardware
  Pros:
  - Cheap
  - Easy to install and manage
  - Can be shared by a wide range of network services and protocols
  Cons:
  - Higher latency
  - Smaller, lower-performance packet buffers
  - Switch buffers can easily become overwhelmed by high-throughput traffic that can be bursty and synchronized, leading to significant packet losses

Types of Congestion
- TCP throughput collapse, also known as Incast: a well-known example of congestion experienced by barrier-synchronized traffic, e.g. synchronous reads in networked storage
- Congestion caused by non-TCP traffic, e.g. UDP
- Congestion caused by traffic that is not TCP friendly: voice/video over IP and peer-to-peer traffic
- Congestion caused by a large number of short TCP sessions

How to Solve the Problem
- Root cause: transient overload of buffers within switches
- Hardware and software mechanisms are hard to deploy at scale
- Ethernet flow control in IEEE 802.3x helps in low-end edge switches but is counterproductive in backbone switches
- Current industry practice:
  - Add higher-capacity network switches
  - Multi-port network cards
  - Physically separate networks for data and control traffic
- Drawback: increased cost and complexity without addressing the root cause

XCo: Explicit Coordination
- Coordinate network transmissions from multiple VMs to avoid throughput collapse and increase network utilization
- Advantages: simple, effective, feasible, and independent of switch-level hardware support; transparent implementation without modifying any applications, standard protocols, network switches, or VMs

Central Controller
- Resides in the same switched network as other nodes
- Takes as input:
  - Switch interconnection topology and link capacities
  - Location of VMs on physical nodes
  - Current traffic matrix of the network
  - Administrative policies
- Whenever it detects congestion buildup at any link, computes and sends transmission directives to the local coordinators at each end host that is contributing to the congestion

Local Coordinator
- Intercepts and regulates the outgoing traffic aggregates (VM-to-VM flows) from all VMs within the corresponding end host according to transmission directives
- Provides traffic feedback to the central controller
- The specific regulation pattern is dictated by transmission directives

Transmission Directives
- Explicit instructions for transmission
- Various forms:
  - Explicit timeslice scheduling: which V2V flow transmits when and for how long
  - Explicit rate limiting: at what rate a V2V flow should transmit for the next N ms
  - Combination of the above two, or other forms

Explicit Timeslice Scheduling (title-only slide)

Work Conservation
- Some nodes may finish early with their timeslice
- Local coordinators return the remaining part of the timeslice back to the central controller
- The central controller then permits another node to transmit
- Local coordinators introduce a small hysteresis delay before returning the timeslice, in case more packets arrive during the delay

Evaluation slides (titles only)
- Experimental Setups
- Impact of Ethernet Congestion
- Performance Evaluation of XCo
- Live VM Migration
- Fairness among V2V Flows
- Work Conservation

Reference
Vijay Shankar Rajanna, Smit Shah, Anand Jahagirdar, Christopher Lemoine, and Kartik Gopalan. "XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms." Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, June 2010.
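
Sketch: congestion detection at the central controller

The Central Controller slide says the controller takes link capacities and the current traffic matrix as input and sends directives to the end hosts contributing to a congested link. The slides do not say how congestion is detected, so the sketch below assumes the simplest possible rule: a link is congested when the summed rate of the flows crossing it exceeds its capacity. The function name, the link labels, and the flow tuple layout are all hypothetical and are not taken from the XCo paper.

```python
def find_congested_links(link_capacity, flows):
    """Return {link: [hosts contributing traffic]} for every overloaded link.

    link_capacity: dict mapping a link label to its capacity in Mbps.
    flows:         list of (source_host, path_links, rate_mbps) tuples,
                   a stand-in for the traffic matrix plus topology.
    Assumption: "congested" means offered load > capacity.
    """
    # Sum the offered load on every link that any flow crosses.
    load = {}
    for _src, path, rate in flows:
        for link in path:
            load[link] = load.get(link, 0.0) + rate

    congested = {}
    for link, capacity in link_capacity.items():
        if load.get(link, 0.0) > capacity:
            # These are the end hosts whose local coordinators
            # would receive a transmission directive.
            hosts = sorted({src for src, path, _rate in flows if link in path})
            congested[link] = hosts
    return congested
```

For example, two 1000 Mbps access links feeding one 1000 Mbps link toward a receiver with flows of 800 and 700 Mbps would report only the shared link, with both senders listed as contributors.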
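
Sketch: explicit timeslice scheduling with work conservation

The Transmission Directives and Work Conservation slides describe flows taking turns, early finishers handing the rest of their timeslice back, and a small hysteresis delay before the handback. A minimal simulation of that idea follows, assuming round-robin ordering and a fixed maximum slice length. Neither assumption comes from the slides, and `schedule_timeslices` and its parameters are hypothetical names used only for illustration.

```python
from collections import deque


def schedule_timeslices(demands_ms, slice_ms, hysteresis_ms=0):
    """Simulate round-robin timeslices with work conservation.

    demands_ms:    dict mapping a V2V flow label to the transmit time (ms) it needs.
    slice_ms:      maximum length of one timeslice.
    hysteresis_ms: how long a coordinator holds an unfinished slice before
                   returning it (capped at what is left of the slice).
    Returns a list of (start_ms, flow, transmit_ms) grants in order.
    """
    queue = deque((flow, need) for flow, need in demands_ms.items() if need > 0)
    clock = 0
    grants = []
    while queue:
        flow, need = queue.popleft()
        used = min(need, slice_ms)  # an early finisher uses less than a full slice
        grants.append((clock, flow, used))
        clock += used
        if used < slice_ms:
            # Hold the slice briefly in case more packets arrive, then return it.
            clock += min(hysteresis_ms, slice_ms - used)
        if need > used:
            queue.append((flow, need - used))  # still has data, wait for next turn
    return grants
```

With demands of 25 ms and 5 ms and a 10 ms slice, the second flow finishes after 5 ms and the first flow resumes at 15 ms instead of waiting until 20 ms. With `hysteresis_ms=1` the resume time moves to 16 ms.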
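
Sketch: explicit rate limiting at a local coordinator

The Transmission Directives slide lists explicit rate limiting, i.e. the rate at which a V2V flow should transmit for the next N ms. The slides do not say how a local coordinator enforces that rate. The sketch below swaps in a standard token bucket as one plausible mechanism. The class name, its parameters, and the choice to ignore directive expiry are all assumptions made for illustration.

```python
class TokenBucketLimiter:
    """Token-bucket limiter for one V2V flow (illustrative, not XCo's code)."""

    def __init__(self, rate_bytes_per_ms, burst_bytes):
        self.rate = rate_bytes_per_ms  # rate carried by the directive
        self.burst = burst_bytes       # largest burst allowed at once
        self.tokens = burst_bytes      # start with a full bucket
        self.last_ms = 0.0             # time of the previous refill

    def try_send(self, now_ms, packet_bytes):
        """Return True if the packet may be sent at time now_ms."""
        # Refill in proportion to elapsed time, never above the burst size.
        elapsed = now_ms - self.last_ms
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_ms = now_ms
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False  # caller would queue or delay the packet
```

A limiter created with 1000 bytes/ms and a 1500-byte burst admits one 1500-byte packet at time 0, refuses a second one at 0.5 ms, and admits it at 1.5 ms once the bucket has refilled.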



UCF CDA 5532 - XCo - Explicit Coordination to Prevent Network Fabric Congestion
