U of I CS 525 - Debugging Deployed Distributed Systems

Slide outline (D3S, slides 1-18): Slide 1; Debugging distributed systems is difficult; State of the Art; Problems; D3S Contribution; D3S Workflow; Glance at D3S Predicate; D3S Parallel Predicate Checker; Summary of Checking Language; Snapshots; Consistent Snapshot; Experimental Method; Chord Overlay; Summary of Results; Overhead (PacificA); Related Work; Conclusion; Thank You

Slide outline (PNUTS, slides 19-49): Slide 19; What is the Problem; Web Application Model; PNUTS – DB in the Cloud; Basic Concepts; A view from 10,000-ft; PNUTS Storage Architecture; Geographic Replication; In-region Load Balance; Data and Query Models; Record Assignment; Single Point Update; Range Query; Relaxed Consistency (six slides); Membership Management; ZooKeeper; ZooKeeper: High Availability; ZooKeeper: Services; ZooKeeper Example: Lock; ZooKeeper Is Powerful; Experimental Setup; Scalability; Sensitivity to R/W Ratio; Sensitivity to Request Dist.; Related Work; Discussion

D3S: Debugging Deployed Distributed Systems
Xuezheng Liu et al., Microsoft Research, NSDI 2008
Presenter: Shuo Tang, CS525@UIUC

Debugging distributed systems is difficult
• Bugs are difficult to reproduce
  – Many machines executing concurrently
  – Machines/network may fail
• Consistent snapshots are not easy to get
• Current approaches
  – Multi-threaded debugging
  – Model-checking
  – Runtime-checking

State of the Art
• Example
  – Distributed reader-writer locks
• Log-based debugging
  – Step 1: Add logs
      void ClientNode::OnLockAcquired(…) {
          …
          print_log( m_NodeID, lock, mode);
      }
  – Step 2: Collect logs
  – Step 3: Write checking scripts

Problems
• Too much manual effort
• Difficult to anticipate what to log
  – Too much?
  – Too little?
• Checking a large system is challenging
  – A central checker cannot keep up
  – Snapshots must be consistent

D3S Contribution
• A simple language for writing distributed predicates
• Programmers can change what is being checked on-the-fly
• Failure-tolerant consistent snapshots for predicate checking
• Evaluation with five real-world applications

D3S Workflow
[Figure: checkers evaluate the predicate "no conflict locks" over the state exposed by the running processes; a detected conflict is reported as a violation.]
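Step 3 of the log-based approach (writing a checking script) might look like the following C++ sketch. It replays collected lock events in timestamp order and reports every lock that was held exclusively by one client while another client also held it. The slides show only the `print_log( m_NodeID, lock, mode)` call and no log format, so the record layout `LogRecord`, its `timestamp` and `acquired` fields, and the function `FindConflicts` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log record; the slides do not specify a log format.
enum class LockMode { Shared, Exclusive };

struct LogRecord {
    std::uint64_t timestamp;  // assumed to give a consistent global order
    std::string node;         // the client's m_NodeID
    std::string lock;         // lock identifier
    LockMode mode;
    bool acquired;            // true = lock acquired, false = lock released
};

// Replays the merged log and returns the locks that were ever held in
// conflict: an exclusive holder together with at least one other holder.
std::set<std::string> FindConflicts(std::vector<LogRecord> log) {
    std::stable_sort(log.begin(), log.end(),
                     [](const LogRecord& a, const LogRecord& b) {
                         return a.timestamp < b.timestamp;
                     });
    // lock -> (node -> mode) for the locks currently held
    std::map<std::string, std::map<std::string, LockMode>> held;
    std::set<std::string> conflicts;
    for (const LogRecord& r : log) {
        std::map<std::string, LockMode>& holders = held[r.lock];
        if (!r.acquired) {
            holders.erase(r.node);
            continue;
        }
        holders[r.node] = r.mode;
        if (holders.size() > 1) {
            bool any_exclusive = std::any_of(
                holders.begin(), holders.end(),
                [](const std::pair<const std::string, LockMode>& h) {
                    return h.second == LockMode::Exclusive;
                });
            if (any_exclusive) conflicts.insert(r.lock);
        }
    }
    return conflicts;
}
```

The sketch depends on the merged timestamps forming a consistent global order, which is exactly what the Problems slide identifies as hard to obtain; it also runs on a single central checker, the other limitation listed there.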
Glance at D3S Predicate
V0: exposer → { ( client: ClientID, lock: LockID, mode: LockMode ) }
V1: V0 → { ( conflict: LockID ) } as final
after (ClientNode::OnLockAcquired) addtuple ($0->m_NodeID, $1, $2)
after (ClientNode::OnLockReleased) deltuple ($0->m_NodeID, $1, $2)
class MyChecker : vertex<V1> {
    virtual void Execute( const V0::Snapshot & snapshot ) {
        …. // Invariant logic, written in sequential style
    }
    static int64 Mapping( const V0::tuple & t ) ; // guidance for partitioning
};

D3S Parallel Predicate Checker
[Figure: lock clients expose states individually; checkers reconstruct snapshots SN1, SN2, … The exposed states (C1, L1, E), (C2, L3, S), (C5, L1, S), … are partitioned by the key LockID, so the two L1 tuples (C1, L1, E) and (C5, L1, S) are grouped together, separately from (C2, L3, S).]

Summary of Checking Language
• Predicate
  – Any property calculated from a finite number of consecutive state snapshots
• Highlights
  – Sequential programs (w/ mapping)
  – Reuse app types in the script and C++ code
      • Binary instrumentation
  – Support for reducing the overhead (in the paper)
      • Incremental checking
      • Sampling the time or snapshots

Snapshots
• Use Lamport clock
  – Instrument network library
  – 1000 logical clock ticks per second
• Problem: how does the checker know whether it has received all necessary states for a snapshot?

Consistent Snapshot
• Membership
• What if a process does not have state to expose for a long time?
• What if a checker fails?
[Figure: timeline of processes A and B and the checker. Exposed states: { (A, L0, S) }, ts=2; { (B, L1, E) }, ts=6; { }, ts=10; ts=12; { (A, L1, E) }, ts=16. Checker trace: M(2)={A,B}, SB(2)=??; M(6)={A,B}, SA(6)=??; M(10)={A,B}, SA(6)=SA(2), check(6); detect failure; SB(10)=SB(6), check(10); M(16)={A}, check(16). Snapshot labels: SA(2), SB(6), SA(10), SA(16).]
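One possible reading of the checker trace in the Consistent Snapshot figure is sketched below in C++. It treats a process's state at logical time t as its most recent exposed state with a timestamp of at most t, and lets the checker evaluate the predicate at t only once every member of M(t) has been heard from at some timestamp at or after t. This is an interpretation of the slide, not the D3S implementation: `SnapshotBuilder`, `OnExpose`, `Ready`, and `StateAt` are hypothetical names, states are reduced to strings, and failure detection and membership changes are left out.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch; names and types are not taken from D3S.
class SnapshotBuilder {
public:
    explicit SnapshotBuilder(std::set<std::string> members)
        : members_(std::move(members)) {}

    // Records that `process` exposed `state` at logical time `ts`.
    void OnExpose(const std::string& process, std::uint64_t ts,
                  const std::string& state) {
        history_[process][ts] = state;
    }

    // True once every member has been heard from at a time >= t, so no
    // exposure with a smaller timestamp can still be missing (assuming
    // each process delivers its exposures in timestamp order).
    bool Ready(std::uint64_t t) const {
        for (const std::string& p : members_) {
            auto it = history_.find(p);
            if (it == history_.end() || it->second.empty()) return false;
            if (it->second.rbegin()->first < t) return false;
        }
        return true;
    }

    // State of `process` at time t: the latest exposure with timestamp <= t,
    // or the empty string if the process had exposed nothing by then.
    std::string StateAt(const std::string& process, std::uint64_t t) const {
        auto it = history_.find(process);
        if (it == history_.end()) return "";
        auto next = it->second.upper_bound(t);  // first entry with ts > t
        if (next == it->second.begin()) return "";
        return std::prev(next)->second;
    }

private:
    std::set<std::string> members_;
    // process -> (timestamp -> exposed state)
    std::map<std::string, std::map<std::uint64_t, std::string>> history_;
};
```

Replaying the figure's first events with this sketch, `Ready(6)` stays false after B exposes its state at ts=6 and becomes true only once A is heard from at ts=10, at which point `StateAt("A", 6)` returns the state A exposed at ts=2. That mirrors the trace entries SA(6)=??, then SA(6)=SA(2), check(6).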

Experimental Method
• Debugging five real systems
  – Can D3S help developers find bugs?
  – Are predicates simple to write?
  – Is the checking overhead acceptable?
• Case: Chord implementation – i3
  – Using predecessors and successors list to stabilize
  – “holes” and overlap

Chord Overlay
Perfect Ring:
• No overlap, no hole
• Aggregated key coverage is 100%
[Figure: key range coverage ratio (0% to 200%) versus time (0 to 80,000 seconds), for 3 predecessors and 8 predecessors.]
Consistency vs. Availability: cannot get both
• Global measure on the factors
• See the tradeoff quantitatively for performance tuning
• Capable of checking detailed key coverage
[Figure: # of hit of chord nodes (0 to 4) versus key serial (0 to 256), for 3 predecessors and 8 predecessors.]

Summary of Results
Application | LoC | Predicates | LoP | Results
PacificA (Structured data storage) | 67,263 | membership consistency; leader election; consistency among replicas | 118 | 3 correctness bugs
Paxos implementation | 6,993 | consistency in consensus outputs; leader election | 50 | 2 correctness bugs
Web search engine | 26,036 | unbalanced response time of indexing servers | 81 | 1 performance problem
Chord (DHT) | 7,640 | aggregate key range coverage; conflict key holders | 72 | tradeoff bw/ availability & consistency
BitTorrent client | 36,117 | Health in neighbor set; distribution of downloaded pieces; peer contribution rank | 210 | 2 performance bugs; free riders
Row group labels on the slide: Data center apps; Wide area apps

Overhead (PacificA)
[Figure: time to complete (seconds, 0 to 180) versus # of clients (2 to 10), each sending 10,000 requests, without and with D3S; the extracted text also carries the value 39.72.]
• Less than 8%, in most cases less than 4%.
• I/O overhead < 0.5%
• Overhead is negligible in other checked systems

Related Work
• Log analysis
  – Magpie [OSDI’04], Pip [NSDI’06], X-Trace [NSDI’07]
• Predicate checking at replay time
  – WiDS Checker [NSDI’07], Friday [NSDI’07]
• P2-based online monitoring
  – P2-monitor [EuroSys’06]
• Model checking
  – MaceMC [NSDI’07], CMC [OSDI’04]

Conclusion
• Predicate checking is effective for debugging deployed & large-scale distributed systems
• D3S enables:
  – Changing what is monitored on-the-fly
  – Checking with multiple checkers
  – Specifying predicates in a sequential & centralized manner

Thank You
• Thanks to the authors for providing some of the slides

PNUTS
Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper et al. @ Yahoo! Research
Presented by Ying-Yi Liang
* Some slides come from the authors’ version

What is the Problem
The web era: web applications
Users are picky – low latency; high availability
Enterprises are greedy – high scalability
Things go fast – new ideas expire very soon
Two ways of developing a cool web application
Making your

