Unformatted text preview:

CIS 700/003: Distributed Systems tS ilN t kmeet Social NetworksByzantine behavioryFebruary 18, 2010© 2010 Andreas Haeberlen1University of PennsylvaniaWh ?Where are we?MtdlifMADtMeasurement and analysis of MAD systems Building MAD systems Faults and Misbehavior Rational behavior Byzantine behavior DefensesToday Internet crime Privacy and confidentialityyy Novel opportunitiesExperience© 2010 Andreas HaeberlenExperience2University of PennsylvaniaMdi i itMy discussion points Is BFT sufficient to build a trustworthy service? y Are the assumptions realistic? Are the guarantees strong enough? Overheads How general is this? Can it be applied to all systems? How common are Byzantine faults?How widely do you think this technique is used?How widely do you think this technique is used? How does this compare to software verification? "We omit discussion of how nodes recover from faults" Are there classes of faults BFT does not help against? Relation to FLP? Theory and practice? Fairness of their evaluation? Performance with vs. without faultsRelationship to previous work© 2010 Andreas HaeberlenRelationship to previous work Can we really rule out correlated faults? Do we have to?3University of PennsylvaniaSfdiiitSome of your discussion points NFS as the only application to evaluate? Low overhead Old hat - we've seen this again and again Relation to FLP Formalization Recovering faulty machines View change not explained; seems difficult Expensive voting procedurepgp 'Frivolous' votes Can't you do it with 2f+1 nodes?© 2010 Andreas Haeberleny4University of PennsylvaniaPltdtBFTPapers related to BFTPBFT (OSDI 1999)PBFT (OSDI 1999) Q/U (SOSP 2005)BAR (SOSP 2005)BAR (SOSP 2005) HQ (OSDI 2006) BFT2F (NSDI 2007)Zyzzyva (SOSP 2007)Zyzzyva (SOSP 2007) Shepherd (SOSP 2007)A2M (SOSP 2007)A2M (SOSP 2007) Aardvark (NSDI 2009)© 2010 Andreas Haeberlen UpRight (SOSP 2009)5University of PennsylvaniaAl t il tAn almost magical guarantee At a very high level, a BFT-replicated service appears correct to clients even if some ppfraction of the replicas fail or domains misbehave in some (almost) arbitrary way© 2010 Andreas Haeberlen() yy Wow!6University of PennsylvaniaRThBtidlRecap: The Byzantine modelAllByzantineRationalAllRationalCrashAssumption: Faulty nodes can behave arbitrarilyAssumption: Faulty nodes can behave arbitrarily Subsumes all the other models we've discussed so farExamples: Faulty nodes can crash deviate from the protocolExamples: Faulty nodes can crash, deviate from the protocol, act selfishly, lie, equivocate, attack other nodes, ... Usually it is assumed that the faulty nodes still have limited bl ( dd l f l b© 2010 Andreas Haeberlencapabilities (cannot suddenly factor large numbers or cut network links between other nodes)7University of PennsylvaniaRThBtidlRecap: The Byzantine model What does this buy us? Extremely high resilience Fewer assumptions  Fewer chances to get them wrong Better robustness against intentional attacks This comes at a price, howeverHigh complexity of the BFT protocol itselfHigh complexity of the BFT protocol itself High cost: Four machines (or more) to do the job of one Message complexity; propagation delays; determinismgpy;ppg y; May have to rewrite the software from scratch Need to ensure that no more than 33% of the nodes can be f lt t th ti© 2010 Andreas Haeberlenfaulty at the same time How do we do that?8University of PennsylvaniaSlfBtifltSome examples of Byzantine faults© 2010 Andreas Haeberlen9University of Pennsylvaniahttp://consumerist.com/2007/08/lax-meltdown-caused-by-a-single-network-interface-card.htmlSlfBtifltSome examples of Byzantine faults© 2010 Andreas Haeberlen10University of Pennsylvaniahttp://status.aws.amazon.com/s3-20080720.htmlSlfBtifltSome examples of Byzantine faults© 2010 Andreas Haeberlen11University of Pennsylvaniahttp://www.wired.com/epicenter/2009/01/magnolia-suffer/SlfBtifltSome examples of Byzantine faults© 2010 Andreas Haeberlen12University of Pennsylvaniahttp://groups.google.com/group/google-appengine/msg/ba95ded980c8c179BFT i tiBFT in practiceVery few real systems actually use it Why?Very few real systems actually use it. Why? Getting BFT to work is hardl d f l iff h d f Prevent correlated faults: Different hardware, software, ... Correct configuration: Install all the cryptographic keys, etc.Correct implementation: BFT protocols are highly complexCorrect implementation: BFT protocols are highly complex But Byzantine faults are relatively rareItitdfltiYh!'ZKInvestigated faults in Yahoo!'s ZooKeeper service, which had been running for >1 year9 bl ll5dt i fi ti 2dt9 problems overall; 5 due to misconfiguration, 2 due to application bugs, 2 to server bugs None would have been tolerated by BFTIs ZooKeeper a goodexample to study?© 2010 Andreas Haeberleny13University of PennsylvaniaSong, Junqueira, Reed: "BFT for the skeptics", BFTW3 workshop, Elche, 2009example to study?BAR F lt T lBAR Fault ToleranceObservation: Nodes can depart from the protocol forObservation: Nodes can depart from the protocol for different reasonsBroken nodes: Hardware fault, malicious operator, break-in...Broken nodes: Hardware fault, malicious operator, breakin... Selfish nodes: Want to increase their utility In BFT, the selfish nodes would be counted as Byzantineddlh lfd Idea: Model more than one class of nodes Altruistic: Never depart from protocolSelfish: Will depart from protocol if it benefits themSelfish: Will depart from protocol if it benefits them Byzantine: May behave arbitrarily, independent of benefitApproach: Carefully construct a BFT protocol such pp y pthat it is in each node's best interest to follow it Result: Safety and liveness as long as <1/3 of the nodes are Byzantine; the other nodes can all be rational© 2010 Andreas HaeberlenByzantine; the other nodes can all be rational14University of PennsylvaniaAiyer et al.: "BAR Fault Tolerance for Cooperative Services", SOSP 2005TkitTake-away pointsBtifltdlitl tiByzantine fault model is extremly conservative (Almost) no assumptions about faulty machinesSubsumes all the fault models we've discussed so farSubsumes all the fault models we've discussed so far It is possible to build systems that are resilient eentoB


View Full Document

Penn CIS 700 - Byzantine behavior

Documents in this Course
Lists

Lists

19 pages

Actors

Actors

30 pages

Load more
Download Byzantine behavior
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Byzantine behavior and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Byzantine behavior 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?