Berkeley COMPSCI 268 - Experiences with X-Trace: an end-to-end, datapath tracing framework


Outline (slide titles):
Experiences with X-Trace: an end-to-end, datapath tracing framework
X-Trace
Example: Wikipedia
Well Known Problem
Coral CDN
A Coral Request
Adding X-Trace
Slide 8
Slide 9
X-Trace Mechanisms
Trace collection and storage
Explicit Path Tracing
Talk Roadmap
Use Cases
Coral Deployment
Initial Findings
Time of HTTP Responses
Slide 18
Enterprise: IEEE 802.1X
802.1X Overview
802.1X and X-Trace: how to read a trace of a successful request
Approach
Root cause determination
Root test 1: insufficient timeout setting
Root test 2: UDP packet loss (reverse path)
Inferring network failures with application traces
Slide 27
Apache Hadoop
Slide 29
Hadoop-specific trace display
1. Detecting suboptimal default configuration with behavior anomalies
After optimizing configuration
2. Detecting fault underlying servers with performance anomalies
Function Contingency Table
Hadoop Summary
Slide 36
D2aiquiri
D3: Declarative Distributed Debugging
Conclusions / Status
Slide 40
Software Bug Detection
Slide 42
Slide 43
Probabilistic Modeling
Slide 45
Instrumentation Challenges
Slide 47
Open Question
Ongoing work
Oasis Anycast Service
Oasis DNS Request
Very little trouble...
...but some trouble
Oasis PUT Bug
Anomaly Detection
Time-sync correction
X-Trace Logging API
API (continued)
High level view

Slide 1: Experiences with X-Trace: an end-to-end, datapath tracing framework
George Porter, Rodrigo Fonseca, Matei Zaharia, Andy Konwinski, Randy H. Katz, Scott Shenker, Ion Stoica
UC Berkeley
Sun Microsystems Laboratories
January 30, 2008

Slide 2: X-Trace
• Framework for capturing causality of events in a distributed system
  – Coherent logging: events are placed in a causal graph
    • Capture causality, concurrency
    • Across layers, applications, administrative boundaries
• Audience
  – Developers: debug complex distributed applications
  – Operators: pinpoint causes of failures
  – Users: report on anomalous executions

Slide 3: Example: Wikipedia
[Figure: Wikipedia request path: DNS Round-Robin, 33 Web Caches, 4 Load Balancers, 105 HTTP + App Servers, 14 Database Servers]
• Many servers, 4 worldwide sites
• A user gets a stale page. What went wrong?
• Combinatorial explosion of possible paths through the app

Slide 4: Well Known Problem
- Disconnected logs in different components
- Multiple problems with the same symptoms
- Execution paths are ephemeral
• Troubleshooting distributed systems is hard

Slide 5: Coral CDN
• Open, distributed content distribution network
  – Distributed cache, uses a self-organizing, locality-aware DHT
  – Usage is simple: append .nyud.net to the domain in a URL
  – 25M+ requests/day, runs on more than 250 PlanetLab nodes
  – Built with libasync

Slide 6: A Coral Request
[Figure: request for www.cnn.com.nyud.net/somepage.html travelling via DNS, Coral nodes (HTTP, RPC, RPC), and HTTP to cnn.com]
• Interesting case for tracing
  – Involves recursive DNS, HTTP, RPC (and ICMP)
  – We trace DNS processing, HTTP (incl. recursive), and RPC

Slide 7: Adding X-Trace
• Capture events within application
  – Logging API
  – Capture abstraction
  – Capture parallelism

Slide 8: Adding X-Trace
• Capture events on different layers
  – e.g. HTTP and RPC

Slide 9: Adding X-Trace
• Correlate events
  – Across different machines
  – Across different layers

Slide 10: X-Trace Mechanisms
• Each Task gets a unique TaskId
• Each Event within a task gets a unique EventId
• When logging, each event must record an “edge”: previous EventId → new EventId
• <TaskId, last EventId> propagated with execution
  – Within runtime environment (X-Trace libraries)
  – Through API calls (augmented APIs)
  – In protocol messages (as header metadata)

Slide 11: Trace collection and storage
[Figure: per-host front-ends feeding a back-end]
• Trace data buffered and distributed across instrumented hosts
• Collection process
  – Orthogonal
  – Minimize collection overhead via buffering and compression

Slide 12: Explicit Path Tracing
• Advantages
  – Deterministic causality and concurrency
  – Handle on specific executions (name the needles)
  – Does not depend on time synchronization
  – Correlated logging
    • Meaningful sampling (random, biased, triggered...)
• Disadvantages
  – Modify applications and protocols (some)

Slide 13: Talk Roadmap
• X-Trace motivation and mechanism
• Use cases:
  1. Wide-area: Coral Content Distribution Network
  2. Enterprise: 802.1X network authentication
  3. Datacenter: Hadoop Map/Reduce
• Future work within the RAD Lab
  – Debugging and performance
  – Clustering and analysis of relationships between traces
  – Applying tracing to energy conservation and datacenter management

Slide 14: Use Cases
1. Wide-area: Coral CDN
2. Enterprise: 802.1X Network Authentication
3. Datacenter: Hadoop Map/Reduce

Slide 15: Coral Deployment
• Running on production Coral network since Christmas
• 253 machines
• Sampling: tracing 0.1% of requests

Slide 16: Initial Findings
• Found at least 5 bugs :-)
  • “Wrong timeout value for proxy”
  • “Some paths when server fetch fail may not kill client connection”
  • “Does Coral lookup even when likely to be over cache size”
  • “Forwarding internal state to client”
  • “Revalidation always goes to origin servers, not to peers”
• Some timing issues
  – Very slow HTTP responses, investigating cause

Slide 17: Time of HTTP Responses
[Figure: plot of HTTP response times with numbered points; only the point labels 4 and 3 survive in the extracted text]
• 1. Client timeout, large object, slow node, 1 block
• 2. Very slow link to client
• 3. Very slow Coral node
• 4. Failure to connect to origin server, specific 189s timeout
• Can look at graph for each specific point, e.g.

Slide 18: Use Cases
1. Wide-area: Coral CDN
2. Enterprise: 802.1X Network Authentication
3. Datacenter: Hadoop Map/Reduce

Slide 19: Enterprise: IEEE 802.1X
• Controls user access to network resources
  – Wireless access points
  – VPN endpoints
  – Wired ports
• User-specific admission criteria
• Audit for compliance purposes
• Complex protocol
  – Distributed for scalability and reliability: no central point
  – Multi-protocol
  – Spans administrative domains
  – Multi-vendor

Slide 20: 802.1X Overview
[Figure: 802.1X message flow with steps labeled 1 to 8]
1. Client sends credentials to authenticator
2. Authenticator forwards credentials to authentication server with RADIUS
3. Authentication server queries identity store with LDAP
4. Identity store processes query
5. Identity store responds with success or failure using LDAP
6. Authentication server makes decision; sends RADIUS response to authenticator
7. Authenticator receives response
8. Access is granted or denied

Slide 21: 802.1X and X-Trace: how to read a trace of a successful request
[Figure: trace graph with nodes labeled A, B, C, D, E]

Slide 22: Approach
• Collect application traces with X-Trace
• Determine when a fault is occurring
• Localize the fault
• Determine the root cause, if possible
• Report problem and root cause (if known) to network operator

Slide 23: Root cause determination
• Why these tests?
  – Occur in customer deployments
  – Based on conversation with support technicians

Slide 24: Root test 1: insufficient timeout setting
• Timeouts spread throughout
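
Returning to the X-Trace Mechanisms slide (slide 10): the <TaskId, last EventId> pair travels with the execution, and every logged event records an edge from the previous EventId to a new one. The following is a minimal Python sketch of that idea, not the X-Trace library's actual API. The names `Metadata`, `TraceLog`, `new_task`, the `taskid:eventid` wire format, and the `X-Trace` header key are all assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metadata:
    """The <TaskId, last EventId> pair carried along with the execution."""
    task_id: str
    event_id: str

    def to_header(self) -> str:
        # Hypothetical wire format: "taskid:eventid".
        return f"{self.task_id}:{self.event_id}"

    @staticmethod
    def from_header(value: str) -> "Metadata":
        task_id, event_id = value.split(":")
        return Metadata(task_id, event_id)


@dataclass
class TraceLog:
    """In-memory stand-in for one host's buffered event reports."""
    edges: list = field(default_factory=list)

    def log_event(self, md: Metadata, label: str) -> Metadata:
        # Record the edge previous EventId -> new EventId, then return
        # metadata that now points at the new event.
        new_id = secrets.token_hex(4)
        self.edges.append((md.task_id, md.event_id, new_id, label))
        return Metadata(md.task_id, new_id)


def new_task() -> Metadata:
    """Start a task: a fresh TaskId plus a root EventId."""
    return Metadata(secrets.token_hex(8), secrets.token_hex(4))


# Host A handles a request and forwards it with the metadata in a header.
log_a, log_b = TraceLog(), TraceLog()
md = new_task()
md = log_a.log_event(md, "proxy: received HTTP request")
headers = {"X-Trace": md.to_header()}  # header name is an assumption

# Host B extracts the metadata and continues the same causal chain.
md_b = Metadata.from_header(headers["X-Trace"])
md_b = log_b.log_event(md_b, "origin: served request")
```

Because host B's edge starts at the EventId that host A logged last, the two separate logs can later be joined on TaskId into one causal graph without relying on synchronized clocks.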

