Slide 1: Enterprise Networks under Stress

Slide 2: [Figure] = 60% growth/year! (Vern Paxson, ICIR, “Measuring Adversaries”)

Slide 3: [Figure] = 596% growth/year! (Vern Paxson, ICIR, “Measuring Adversaries”)
“Background” Radiation: dominates traffic in many of today’s networks

Slide 4: Some Observations
• Internet reasonably robust to point problems like link and router failures (“fail stop”)
• Successfully operates under a wide range of loading conditions and over diverse technologies
• During 9/11/01, Internet worked well, under heavy traffic conditions and with some major facilities failures in Lower Manhattan

Slide 5: The Problem
• Networks awash in illegitimate traffic: port scans, propagating worms, p2p file swapping
  – Legitimate traffic starved for bandwidth
  – Essential network services (e.g., DNS, NFS) compromised
• Needed: better network management of services/applications to achieve good performance and resilience even in the face of network stress
  – Self-aware network environment
  – Observing and responding to traffic changes
  – While sustaining the ability to control the network

Slide 6: From the Frontlines
• Berkeley Campus Network
  – Unanticipated traffic surges render the network unmanageable (and may cause routers to fail)
  – Denial of service attacks, latest worm, or the newest file sharing protocol largely indistinguishable
  – In-band control channel is starved, making it difficult to manage and recover the network
• Berkeley EECS Department Network (12/04)
  – Suspected denial-of-service attack against DNS
  – Poorly implemented/configured spam appliance adds to DNS overload
  – Traffic surges render it impossible to access Web or mount file systems
• Network problems contribute to brittleness of distributed systems

Slide 7: Why and How Networks Fail
• Complex phenomenology of failure
• Traffic surges break enterprise networks
• “Unexpected” traffic as deadly as high net utilization
  – Cisco Express Forwarding: random IP addresses --> flood route cache --> force traffic thru slow path --> high
    CPU utilization --> dropped router table updates
  – Route Summarization: powerful misconfigured peer overwhelms weaker peer with too many router table entries
  – SNMP DoS attack: overwhelm SNMP ports on routers
  – DNS attack: response-response loops in DNS queries generate traffic overload

Slide 8: Technology Trends
• Integration of servers, storage, switching, and routing
  – Blade Servers, Stateful Routers, Inspection-and-Action Boxes (iBoxes)
• Packet flow manipulations at L4-L7
  – Inspection/segregation/accounting of traffic
  – Packet marking/annotating
• Building blocks for network protection
  – Pervasive observation and statistics collection
  – Analysis, model extraction, statistical correlation and causality testing
  – Actions for load balancing and traffic shaping
[Figure labels: Load Balancing; Traffic Shaping]

Slide 9: Scenario: Traffic Surge Inhibiting Network Services
[Network diagram labels: Distribution Tier; Internet Edge, Access Edge, Server Edge; II, IA, IS; Spam Appliance; Primary & Secondary DNS Servers; Mail Server]
• DNS Server swamped by excessive request traffic
  – Observe: DNS time outs, Web access traffic slowed, but also higher than normal mail delivery latency implying busy server edge (correlation between Mail Server and DNS Server utilization?)
  – Root Cause: High DNS request rates generated by Spam Appliance triggered by mail surge

Slide 10: Scenario Continued
• How Diagnosed?
  – I-S detects high link utilization but abnormally high DNS traffic
  – Stats from I-I: high mail traffic, low outgoing web traffic, in traffic high but link utilization not high
  – Stats from I-A: lower web traffic, no unusual mail origination
  – Problem localized to Server edge, but visibility limited
[Network diagram repeated from the scenario slide]

Slide 11: Scenario Continued
• Possible Action Responses
  – Experiment: Redirect local DNS requests to Secondary DNS server: if these complete, can infer the server is the problem, not the network
  – Throttle: Due to MS-DNS correlation, block/slow email traffic at Server Edge: should expect reduced DNS server utilization
[Network diagram repeated from the scenario slide]

Slide 12: Scenario
[Network diagram labels: Internet Edge; Access Edge; Distribution Tier; Server Edge; PC; MS; FS; Spam Filter; DNS]

Slide 13: Observed Operational Problems
• User visible services:
  – NFS mount operations time out
  – Web access also fails intermittently due to time outs
• Failure causes:
  – Independent or correlated failures?
  – Problem in access, server, or Internet edge?
  – File server failure?
  – Internet denial of service attack?

Slide 14: Network Dashboard
[Dashboard panels, each plotted against time; legend: In Web, Out Web, Email]
• b/w consumed: gentle rise in ingress b/w
• FS CPU utilization: no unusual pattern
• MS CPU utilization: mail traffic growing
• DNS CPU utilization: unusual step jump / DNS xact rates
• Access Edge b/w consumed: decline in access edge b/w

Slide 15: Network Dashboard
[Same panels and annotations as Slide 14, with an added callout]
CERT Advisory! DNS Attack!
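The dashboard panels can also be compared numerically. The following is a minimal sketch, assuming each panel is sampled at a fixed interval into a list. The sample values are invented for illustration (they are not measurements from the Berkeley network), and the function and variable names are hypothetical. It computes the Pearson correlation between mail traffic and the other series, one simple form of the "statistical correlation" the slides list as a building block.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical samples, one per measurement interval (not real data).
mail_msgs_per_s = [10, 12, 15, 30, 55, 80, 95, 100]
dns_cpu_pct = [20, 22, 24, 45, 70, 88, 95, 97]
fs_cpu_pct = [30, 31, 29, 30, 32, 30, 31, 30]
access_bw_mbps = [90, 88, 87, 70, 50, 35, 28, 25]

for name, series in [
    ("DNS CPU", dns_cpu_pct),
    ("FS CPU", fs_cpu_pct),
    ("Access edge b/w", access_bw_mbps),
]:
    r = pearson(mail_msgs_per_s, series)
    print(f"mail traffic vs {name}: r = {r:+.2f}")
```

A strong correlation only flags a candidate relationship. It does not show that mail traffic causes the DNS load, so a controlled experiment is still needed to test causality.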
Slide 16: Observed Correlations
• Mail traffic up
• MS CPU utilization up
  – Service time up, service load up, service queue longer, latency longer
• DNS CPU utilization up
  – Service time up, request rate up, latency up
• Access edge b/w down
Causality no surprise!
How does mail traffic cause DNS load?

Slide 17: Run Experiment: Shape Mail Traffic
[Dashboard panels, each plotted against time; legend: In Web, Out Web, Email]
• MS CPU utilization: mail traffic limited
• DNS CPU utilization: DNS down
• Access Edge b/w consumed: access edge b/w returns
Root cause: Spam appliance --> DNS lookups to verify sender domains; Spam attack hammers internal DNS, degrading other services: NFS, Web

Slide 18: Policies and Actions Restore the Network
• Shape mail traffic
  – Mail delay acceptable to users?
  – Can’t do this forever unless mail is filtered at the Internet edge
• Load
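Shaping mail traffic, as in the experiment on the slides, can be sketched with a token bucket. The slides do not say which shaping mechanism was used. A token bucket is a common choice and is substituted here purely for illustration. The class name `TokenBucket`, its parameters, and the surge numbers are all hypothetical.

```python
class TokenBucket:
    """Token-bucket shaper: admits about `rate` units/s, with bursts up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        if rate <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = float(now)

    def allow(self, now, cost=1.0):
        """Return True if a message arriving at time `now` may pass."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would queue or defer the message


# Hypothetical surge: 100 mail messages arrive evenly over one second,
# shaped to 10 messages/s with a burst allowance of 5.
shaper = TokenBucket(rate=10, burst=5)
arrivals = [i / 100 for i in range(100)]
passed = sum(shaper.allow(t) for t in arrivals)
print(f"{passed} of {len(arrivals)} messages forwarded during the surge")
```

Messages that are not admitted would be queued rather than dropped, which is why the slide asks whether the resulting mail delay is acceptable to users.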