U of I CS 525 - Debugging Deployed


D3S: Debugging Deployed Distributed Systems
Xuezheng Liu, Zhenyu Guo, Xi Wang, Feibo Chen, Xiaochen Lian, Jian Tang, Ming Wu, M. Frans Kaashoek, Zheng Zhang
NSDI 2008
Presented by: Pooja Agarwal
CS 525 Class Presentation, UIUC

Distributed Debugging
• How do we generally debug a program?
  ▫ Mostly iterative
• Reproducing bugs is hard in distributed systems
  ▫ Large-scale systems
  ▫ Network/machine failures
• Example: distributed reader-writer locks
  ▫ Lock modes: exclusive, shared
  ▫ Invariant: only one client can hold a lock in exclusive mode, and while it does, no other client may hold that lock at all

Each Step in Distributed Debugging Is a Challenge
• Step 1: What to record? How much to record?
  ▫ The states worth recording change over time; there is a risk of recording too little or too much
• Step 2: How to record?
  ▫ Log-based vs. online monitoring
• Step 3: How to order records?
  ▫ The problem of global consistent snapshots
• Step 4: How to verify?
  ▫ Designing efficient predicates; single vs. multiple verifiers
• Processes/nodes under debugging can fail
  ▫ Need to approximate global consistent snapshots
• Debugger nodes themselves can fail
  ▫ Need to keep running with few false positives and false negatives

Current Approaches
• Log analysis
• Large-scale parallel applications
• Model checking
• Online monitoring
• Replay-based predicate checking

D3S: Debugging Deployed Distributed Systems
• A simple model for writing distributed predicates
• Programmers can change what is being checked on the fly
• Run-time checking that scales to large systems
• Failure-tolerant consistent snapshots for predicate checking
• Evaluation with five real-world applications

D3S Workflow
[Figure: predicates (states + logic) and symbol info yield a State Exposer (SE) and Checking Logic (CL). The SE is dynamically injected into the app processes; the CL runs on verifiers, which output violation reports and the sequence of states leading to a conflict.]

Writing Predicates
    // Computation graph
    V0: exposer -> { ( client: ClientID, lock: LockID, mode: LockMode ) }
    V1: V0 -> { ( conflict: LockID ) } as final
    after (ClientNode::OnLockAcquired) addtuple ($0->m_NodeID, $1, $2)
    after (ClientNode::OnLockReleased) deltuple ($0->m_NodeID, $1, $2)

In the addtuple/deltuple statements, $0 refers to the ClientNode object and $1, $2 to the callback's LockID and LockMode arguments.

[Figure: computation graph V0 -> V1. V0 produces tuples of (C, L, M), i.e., client, lock, mode; V1 produces Conflict (L).]

    // Source code from the example app
    class ClientNode {
        ClientID m_NodeID;
        void OnLockAcquired( LockID, LockMode );
        void OnLockReleased( LockID, LockMode );
    };

• Reuse of application code
• Binary instrumentation

    // C++ code for the predicate
    class LockVerifier : public vertex<V1> {
        // Verify the predicate in the required snapshots; output conflicts
        virtual void Execute( const V0::Collection & snapshot );
        // Map states to the key space
        static Key Mapping( const V0::tuple & t );
    };

• Wait for the snapshot to complete
• Mapping()
• More complex computation graphs

Partitioning and Parallelism in D3S
[Figure: exposed tuples {C1,L0,E}, {C1,L4,S}, {C2,L1,E}, {C2,L4,S}, and {C8,L4,S} are partitioned over the lock key space: one verifier checks L0~L3 (receiving the L0 and L1 tuples), another checks L4~L7 (receiving the L4 tuples).]
• Dynamic assignment of key spaces to verifiers by a central master
• Pipelining
• Fault tolerance

Further Optimizations
• Buffering of exposed states at V0
  ▫ Handles verifier failures
• Incremental checking [ExecuteChange()]
  ▫ Increases efficiency
• Sampling the key space or timestamps
  ▫ Reduces overhead

Global Snapshots
• Predicates are defined over a finite number of consecutive snapshots
• Each node uses a Lamport logical clock
• Liveness issues

Consistent Snapshots
[Figure: timeline of nodes A and B reporting to a checker. M(t) is the membership at time t, S_X(t) is node X's state at time t, and ?? marks a state not yet known.
 Reports: { (A, L0, S) }, ts=2; { (B, L1, E) }, ts=6; { }, ts=10; ts=12; { (A, L1, E) }, ts=16. States: S_A(2), S_B(6), S_A(10), S_A(16).
 Checker: M(2)={A,B}, S_B(2)=??; M(6)={A,B}, S_A(6)=??; S_A(6)=S_A(2), check(6); failure detected; S_B(10)=S_B(6), check(10); M(16)={A}, check(16).]
• Assumptions: reliable network; messages are received in FIFO order
• Membership: an external service or built-in heartbeats
• A snapshot is correct as long as membership is correct
• When no state is being exposed, an app node should report its timestamp periodically

Experiments
• Three major expectations
  ▫ Help applications find bugs
  ▫ Predicates need to be simple to write
  ▫ Checking overhead needs to be low

Case Study: PacificA
• Predicate
  ▫ There is at most one primary replica in each group of replica nodes
• Deployment
  ▫ 8 machines
  ▫ Test scenario: database app with random I/O
  ▫ Randomly crash and restart processes
  ▫ D3S tuples: <Slice_identifier, MachineID, Primary/Secondary>
• Debugging
  ▫ 3 checkers, partitioned by replica group
  ▫ Time to trigger a violation: several hours

PacificA: Architecture & Bug Trace
[Figure: a Meta Server and several slice servers. One slice server holds Sid=2,S; Sid=1,S, another holds Sid=2,S; Sid=1,P, and a further slice server is marked P. The verifier catches the violation and reports the timestamp, node, and event sequence.]
• The coordinator crashed and forgot its previous answer
• It must write to disk synchronously!

Results
[Table 1: Results for 5 applications, grouped into data-center apps and wide-area apps]

Performance
• Each thread (client) sends 1,000 requests
• Overhead is less than 8%, and in most cases less than 4%
• I/O overhead < 0.5%
• Overhead in Chord and Paxos is negligible; in BitTorrent and web search it is < 2%

Discussion
• Can D3S be used for large-scale applications built from different collaborative systems?
  ▫ How to build predicates across various systems?
  ▫ Which system to check in the event of faults?
• How easy is it to use D3S?
  ▫ One needs to know how the application
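The lock-conflict check from the Writing Predicates slide, together with the key-space split from the Partitioning and Parallelism slide, can be sketched in plain C++. This is a minimal illustration under stated assumptions, not D3S's actual vertex API: `LockTuple`, `checkSnapshot`, and `mapToVerifier` are hypothetical names, lock IDs are assumed to be non-negative integers, and each verifier is assumed to own a fixed contiguous range of lock IDs (in D3S a central master assigns key ranges dynamically). A lock counts as a conflict when an exclusive holder coexists with any other holder.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical mirror of the (client, lock, mode) tuples exposed at V0.
enum class LockMode { Shared, Exclusive };

struct LockTuple {
    int client;
    int lock;
    LockMode mode;
};

// Check one complete snapshot and return the IDs of locks that violate the
// reader-writer invariant: an exclusive holder together with any other holder.
std::set<int> checkSnapshot(const std::vector<LockTuple>& snapshot) {
    std::map<int, int> holders;    // lock -> number of holders (any mode)
    std::map<int, int> exclusive;  // lock -> number of exclusive holders
    for (const auto& t : snapshot) {
        ++holders[t.lock];
        if (t.mode == LockMode::Exclusive) ++exclusive[t.lock];
    }
    std::set<int> conflicts;
    for (const auto& entry : exclusive) {
        // Every lock in `exclusive` has at least one exclusive holder,
        // so any second holder (in either mode) is a conflict.
        if (holders[entry.first] > 1) conflicts.insert(entry.first);
    }
    return conflicts;
}

// Hypothetical static key mapping: verifier i owns lock IDs [i*range, (i+1)*range).
int mapToVerifier(const LockTuple& t, int range) {
    assert(range > 0 && t.lock >= 0);
    return t.lock / range;
}
```

With `range = 4`, the L0 and L1 tuples from the partitioning slide map to verifier 0 and the three L4 tuples map to verifier 1, mirroring the L0~L3 / L4~L7 split. None of those snapshots conflicts, but adding a shared holder of L1 while C2 holds it exclusively makes `checkSnapshot` return {1}.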

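The incremental-checking optimization on the Further Optimizations slide can be illustrated by keeping per-lock counters and updating them from each addtuple/deltuple delta instead of re-scanning the whole snapshot. This sketch is a simplification under our own assumptions, not D3S's `ExecuteChange()`: `IncrementalLockChecker` and `apply` are hypothetical names, client IDs are dropped (only per-mode holder counts are kept), and deltas are assumed to arrive in snapshot order.

```cpp
#include <cassert>
#include <map>
#include <set>

enum class Mode { Shared, Exclusive };

// Hypothetical incremental checker: instead of re-scanning the whole snapshot,
// it applies each addtuple/deltuple delta to per-lock counters and
// re-evaluates only the lock that changed.
class IncrementalLockChecker {
public:
    // Apply one delta (added = addtuple, !added = deltuple).
    // Returns true if `lock` is in conflict after the change.
    bool apply(int lock, Mode mode, bool added) {
        Counts& c = counts_[lock];
        int& slot = (mode == Mode::Exclusive) ? c.exclusive : c.shared;
        slot += added ? 1 : -1;
        assert(slot >= 0 && "deltuple without a matching addtuple");
        bool conflict = c.exclusive > 1 || (c.exclusive == 1 && c.shared > 0);
        if (conflict) {
            conflicts_.insert(lock);
        } else {
            conflicts_.erase(lock);
        }
        if (c.exclusive == 0 && c.shared == 0) counts_.erase(lock);  // drop idle locks
        return conflict;
    }

    // Locks currently in conflict.
    const std::set<int>& conflicts() const { return conflicts_; }

private:
    struct Counts {
        int shared = 0;
        int exclusive = 0;
    };
    std::map<int, Counts> counts_;
    std::set<int> conflicts_;
};
```

For example, an exclusive acquire of lock 1 followed by a shared acquire of lock 1 flags a conflict, and releasing the shared hold clears it. Each delta costs a constant number of map operations, which is the kind of saving the slide's "increases efficiency" refers to, compared with re-checking the full snapshot.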

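The Global Snapshots and Consistent Snapshots slides can be made concrete with a small checker-side sketch. It assumes, as the slide does, reliable FIFO channels, so each node's reports arrive in increasing timestamp order and its state at logical time t is the last state it exposed with a timestamp <= t. That state is only known once the node has reported some timestamp >= t, or has been declared failed by the membership service. `LamportClock`, `SnapshotAssembler`, `expose`, `heartbeat`, `markFailed`, and `snapshotAt` are hypothetical names; this is a simplification, not D3S's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Standard Lamport clock rules: increment on every local event (including a
// send); on receive, move past the larger of the local and remote clocks.
struct LamportClock {
    std::uint64_t time = 0;
    std::uint64_t tick() { return ++time; }
    std::uint64_t onReceive(std::uint64_t remote) {
        time = std::max(time, remote) + 1;
        return time;
    }
};

// Hypothetical exposed state: a set of tuples encoded as strings, e.g. "A,L0,S".
using State = std::set<std::string>;

// Checker-side sketch that assembles the global state at logical time t.
// Assumes reliable FIFO channels, so each node's reports arrive in
// increasing timestamp order.
class SnapshotAssembler {
public:
    // A node exposes a new (possibly empty) state at timestamp ts.
    void expose(const std::string& node, std::uint64_t ts, State s) {
        history_[node][ts] = std::move(s);
        heartbeat(node, ts);
    }
    // Timestamp-only report, sent periodically when no state is exposed.
    void heartbeat(const std::string& node, std::uint64_t ts) {
        latest_[node] = std::max(latest_[node], ts);
    }
    // The membership service declares a node failed: its last state is final.
    void markFailed(const std::string& node) { failed_.insert(node); }

    // Snapshot at time t over `members`, or nullopt while some member's
    // state at t is still unknown (it has not yet reported any ts >= t).
    std::optional<std::map<std::string, State>>
    snapshotAt(std::uint64_t t, const std::set<std::string>& members) const {
        std::map<std::string, State> snap;
        for (const auto& node : members) {
            auto it = latest_.find(node);
            bool known = failed_.count(node) > 0 ||
                         (it != latest_.end() && it->second >= t);
            if (!known) return std::nullopt;
            snap[node] = stateAt(node, t);
        }
        return snap;
    }

private:
    // Last state the node exposed with timestamp <= t (empty if none yet).
    State stateAt(const std::string& node, std::uint64_t t) const {
        auto h = history_.find(node);
        if (h == history_.end()) return {};
        auto next = h->second.upper_bound(t);  // first report with ts > t
        if (next == h->second.begin()) return {};
        return std::prev(next)->second;
    }

    std::map<std::string, std::map<std::uint64_t, State>> history_;
    std::map<std::string, std::uint64_t> latest_;
    std::set<std::string> failed_;
};
```

Replaying the slide's timeline: after A exposes its state at ts=2, the snapshot at 2 waits on B until B reports at ts=6; the snapshot at 6 becomes available only once A reports ts=10, with S_A(6)=S_A(2); and after B is marked failed, the snapshot at 10 uses S_B(10)=S_B(6). Liveness depends on nodes reporting timestamps even when idle, which is why the slide asks app nodes to report their timestamp periodically.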