Highly Available ACID Memory
Vijayshankar Raman

Introduction

Why ACID memory?
• Non-database apps:
  – want updates to critical data to be atomic and persistent
  – synchronization is useful when multiple threads access critical data
• Databases:
  – concurrency control and recovery logic runs through most of the database code
  – extremely complicated, and hard to get right
  – bugs lead to data loss -- disastrous!

Project goal

• Take recovery logic out of apps
• Build a simple user-level library that provides recoverable, transactional memory
  – all the logic in one place => easy to debug and maintain
  – easy to make use of hardware advances
• Use replication and persistent memory for recovery, instead of writing logs
  + simpler to implement
  + simpler for applications to use ??

Questions to answer

• Program simplicity vs. performance
  – how much do we lose by replicating instead of logging?
• On a cluster, can we use replication directly for availability?
  – traditionally, availability is handled on top of the recovery system

Outline

• Introduction
• Acid Memory API
• Single-node design & implementation
• Evaluation
• High availability: multiple-node design and implementation
• Evaluation
• Conclusion

Acid Memory API

• Transaction manager interface
  – TransactionManager(database name, acid memory area)
• Transaction interface
  – beginTransaction()
  – getLock(memory region1, READ/WRITE)
  – getLock(memory region2, READ/WRITE)
  – ...
    (memory region = virtual address prefix)
  – commit/abort() -- all locks released
• Combines concurrency control with recovery
  – recovery is done on write-locked regions
• Supports fine-granularity locking => cannot use VM for recovery
• Applications can modify data directly

Implementation

• Assume non-volatile memory (NVRAM, battery backup)
• Assume persistent file cache
• Acid memory area is mmap'd from a file
  – persistence => writes are permanent
• getLock(WRITE): copy the region onto the mirror area
• Transaction abort / system crash: undo changes on all write-locked regions using the copy in the mirror area
• The only overhead of recovery is a memcpy on each write lock

[Figure labels: disk file, master copy, mirror, acid memory area, mmap]

Evaluation

Overhead of acid memory
• read lock: 35 usec (lock manager overhead)
• write lock: 35 usec + 5.5 usec/KB (memcpy cost)
• much lower than methods that write a log to disk

Ease of programming
• an application only needs to acquire locks to become recoverable
• it can manipulate the data directly -- no special function call on every update

Example: suppose I want to transfer $1M from A's account to B's

With ACID memory

    /* a points to A's account */
    /* b points to B's account */
    trans = new Transaction(transMgr);
    trans->getLock(a, WRITE);
    trans->getLock(b, WRITE);
    *a = *a - 1000000; *b = *b + 1000000;
    trans->commit();

Using logging

    BeginTransaction();
    getLock(A's account, WRITE);
    getLock(B's account, WRITE);
    read(A's account, a);
    read(B's account, b);
    a = a - 1000000; b = b + 1000000;
    Update(A's account, a);
    Update(B's account, b);
    commit();

(Update() creates the needed logs)

Performance comparison: acid memory vs. logging

• Consider a transaction updating integers in a 1KB data structure
• Logging each individual update is a bit faster, up to a point
• Acid memory gives acceptable performance with very easy programmability

[Plot: time (in microseconds) vs. number of integer writes. Acid memory: write-lock the data structure. Logging: write-lock the structure and update each integer separately.]

Replication for availability

• Traditionally, availability has been handled in a separate layer, above recovery
• Can we handle both recovery and availability via the same mechanism?

[Figure labels: transaction processing monitor, DBMS, DBMS, DBMS, replicate]

Architecture

• Transactions are run by the transaction handler
• All lock requests must go to the owner
• Data in all replicas must be kept in sync
• Balance load by partitioning data
  – different owner for each partition
• Failure model
  – fail-stop: nodes never send incorrect messages to others
  – failed nodes never recover data after a crash
  – network never fails

[Figure labels: owner (data, lock manager), replicas (data), transaction handler, client]

• Reads: client gets data from a random replica
• Writes: must update all replicas
  – on commit, the transaction sends the new data to the owner
  – the owner propagates the update atomically to all replicas
    (3-phase non-blocking commit protocol: always ensure that there is someone to take over the propagation if you crash)
  – if the owner crashes, fail over to a replica

Evaluation

+ very fast recovery -- 424 usecs
+ get fast transactions without non-volatile memory
- writes are slower
  – 4n messages at commit if there are n replicas
  – still, this is faster than logging to disk
- homogeneous software: susceptible to bugs

Conclusions

• Acid memory is easier to use
• Performance relative to logging is not too bad
• Replication gives fast recovery

Future work
• using cache for replication
• when/how much to replicate?

Additional Slides

Evaluation, w.r.t. logging based approach

Ease of implementation
• very little to code, mostly lock manager stuff
• whereas a traditional DBMS needs:
  – specialized buffer manager
  – log manager
  – complex recovery mechanism

How to make file cache persistent

• Rio (Chen et al., 1996)
  – place file cache in non-volatile memory
  – protect it against OS crashes using VM protection
  – flush pages in file cache to disk files on
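
The single-node mechanism from the Implementation slide (copy the region to a mirror on a write lock, modify data in place, restore from the mirror on abort) can be sketched roughly as follows. This is only an illustration under stated assumptions: MirrorTxn, writeLock, and transfer are hypothetical names, not the library's API; ordinary heap memory stands in for the mmap'd persistent file, so crash recovery is not covered; and the lock manager is omitted.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical illustration of mirror-based undo; not the deck's Transaction class.
class MirrorTxn {
    struct Saved {
        unsigned char* addr;                // start of the write-locked region
        std::vector<unsigned char> mirror;  // bytes as they were before modification
    };
    std::vector<Saved> saved_;

public:
    // "Write lock": save the region's current bytes before the caller modifies it.
    // (Assumption: real locking against other threads is handled elsewhere.)
    void writeLock(void* region, std::size_t len) {
        Saved s;
        s.addr = static_cast<unsigned char*>(region);
        s.mirror.assign(s.addr, s.addr + len);  // the per-write-lock copy cost
        saved_.push_back(std::move(s));
    }

    // Commit: the in-place updates already are the new state; drop the mirrors.
    void commit() { saved_.clear(); }

    // Abort: restore every write-locked region from its mirror, newest first.
    void abort() {
        for (auto it = saved_.rbegin(); it != saved_.rend(); ++it)
            std::memcpy(it->addr, it->mirror.data(), it->mirror.size());
        saved_.clear();
    }
};

// Move `amount` from *a to *b. Aborting on a negative balance is an
// assumption added here purely to exercise the undo path.
bool transfer(long* a, long* b, long amount) {
    MirrorTxn txn;
    txn.writeLock(a, sizeof *a);
    txn.writeLock(b, sizeof *b);
    *a -= amount;  // direct, in-place updates
    *b += amount;
    if (*a < 0) {
        txn.abort();
        return false;
    }
    txn.commit();
    return true;
}
```

With long a = 100 and b = 5, transfer(&a, &b, 30) leaves a == 70 and b == 35; a following transfer(&a, &b, 1000) aborts and leaves both values unchanged.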
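
The costs quoted in the Evaluation slides (35 usec per lock, plus 5.5 usec/KB of memcpy for a write lock, and 4n messages at commit with n replicas) can be turned into a back-of-the-envelope calculator. The function names below are hypothetical, and the constants are simply the figures reported in the slides, so they would only hold for that measurement setup.

```cpp
// Constants taken from the slides' measurements (assumed valid only for that setup).
constexpr double kLockOverheadMicros = 35.0;  // lock manager overhead per lock
constexpr double kCopyMicrosPerKB = 5.5;      // memcpy cost per KB on a write lock

// Estimated cost of a read lock, in microseconds.
double readLockMicros() { return kLockOverheadMicros; }

// Estimated cost of write-locking a region of `regionKB` kilobytes, in microseconds.
double writeLockMicros(double regionKB) {
    return kLockOverheadMicros + kCopyMicrosPerKB * regionKB;
}

// Messages exchanged at commit when the data has `replicas` replicas (4n).
int commitMessages(int replicas) { return 4 * replicas; }
```

For the 1KB data structure used in the performance comparison, writeLockMicros(1.0) gives 40.5 usec, and a commit with 3 replicas would take 12 messages.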