MIT 6 033 - Lecture Notes

MIT OpenCourseWare
http://ocw.mit.edu
6.033 Computer System Engineering, Spring 2009
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms

6.033 Lecture 17 -- Logging
Sam Madden

Last time: introduced atomicity. Two key ideas:
- all or nothing
- isolation

Saw transactions, which are a powerful way to ensure atomicity:

    begin
    xfer(A,B,10)
    debit(B,10)
    commit (or abort)

Started talking about all-or-nothing without isolation (isolation next time).

Saw the idea of shadow copies, where you write changes into a secondary copy, and then atomically install that copy (show slide). This gets at the key idea of atomicity -- the golden rule: never overwrite the only copy.

Most atomicity schemes work roughly like this: you make changes to some separate copy of the data, and then have a way to switch over to that new copy when needed. If you crash midway through making changes, the old copy is still there. Typically when you crash, because the transaction was incomplete, you just revert to the old copy.

The shadow-copy approach has some limitations; what are they?
- it only works for one file at a time (might be able to fix this using, e.g., shadow directories)
- it requires us to make a whole copy of the file

Shadow copies are used in many places -- e.g., text editors (emacs).

Today we are going to learn about a general method for all-or-nothing atomicity that addresses these limitations: logging. The basic idea is that after every change, you record the before and after value of the object you changed.
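As a concrete aside, the shadow-copy recipe above (write a secondary copy, then atomically install it) can be sketched in Python. The function name and the use of POSIX atomic rename are my illustration, not part of the notes:

```python
import os
import tempfile

def update_file_atomically(path, new_contents):
    """Shadow-copy update: never overwrite the only copy."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, shadow = tempfile.mkstemp(dir=dirname)   # the shadow copy
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_contents)
            f.flush()
            os.fsync(f.fileno())      # force the shadow copy to disk first
        os.replace(shadow, path)      # atomic install (rename)
    except BaseException:
        os.remove(shadow)             # on failure, the old copy is untouched
        raise
```

If the process dies before `os.replace`, the original file is intact -- exactly the golden rule. The cost is a full copy per update, one of the limitations logging addresses.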
Let's work with bank account balances; suppose we have the following table in memory:

    account id    balance
    A             100
    B             50

Suppose I do:

    begin
    debit(A,10)
    debit(B,20)
    end

    begin
    deposit(A,50)

What kinds of things do I keep in the log?
- transaction begin / commit / abort records, each with a transaction id (tid)
- update records: tid, variable updated, before (undo) / after (redo) state

log:

    begin 1
    update 1 A before: 100; after: 90
    update 1 B before: 50; after: 30
    commit 1
    begin 2
    update 2 A before: 90; after: 140
    crash!

This log now records everything I need to determine the value of an object:

    read(var, log):
        cset = {}
        for r in log[len-1 ... 0]:
            if r is commit:
                cset = cset U {r.TID}
            if r is update:
                if r.TID in cset and r.var == var:
                    return r.after

What is the commit point? (when the commit record goes to the log)

But this is really slow, since I have to scan the log on every read. (Writes, however, are fast, since I just append to the log, probably sequentially.)

How to make this faster? Keep two copies: "cell storage" -- e.g., the actual table contents -- in addition to the log on disk. Reads can just read the current value from cell storage, and writes go to both places:

    read(var):
        return read(cell-storage, var)

    write(TID, var, newval):
        append(log, TID, var, cell-storage(var), newval)
        write(cell-storage, var, newval)

Let's see what happens:

    state: A: 100 --> 90 --> 140
           B: 50 --> 30 --> 60 --> 30 --> 40

    program:           log:
    begin              begin T1
    debit(A,10)        update(T1, A, 100, 90)
    debit(B,20)        update(T1, B, 50, 30)
    commit             commit T1

    begin              begin T2
    deposit(B,30)      update(T2, B, 30, 60)
    abort              abort T2

    begin              begin T3
    debit(B,20)        update(T3, B, 30, 40)
    commit             commit T3

    begin              begin T4
    deposit(A,50)      update(T4, A, 90, 140)
    crash!

After the crash, cell storage has the wrong value for A in it. What should we do? We need to add a recover procedure, and we need to do a backwards pass on the log to undo the effects of uncommitted transactions. (undo the "losers")

Anything else we need to do?
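The write path and the crash scenario above can be sketched as runnable Python (the data structures and names here are my illustration; the notes give only pseudocode):

```python
log = []                      # append-only log (on "disk")
cell = {"A": 100, "B": 50}    # cell storage: the actual table contents

def begin(tid):  log.append(("begin", tid))
def commit(tid): log.append(("commit", tid))

def abort(tid):
    # undo this transaction's updates in cell storage, newest first
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] == tid:
            cell[rec[2]] = rec[3]     # restore the before-value
    log.append(("abort", tid))

def write(tid, var, newval):
    log.append(("update", tid, var, cell[var], newval))  # log first
    cell[var] = newval                                   # then cell storage

begin(1); write(1, "A", 90); write(1, "B", 30); commit(1)  # debit A, debit B
begin(2); write(2, "B", 60); abort(2)                      # aborted deposit
begin(3); write(3, "B", 40); commit(3)                     # debit B
begin(4); write(4, "A", 140)                               # deposit... crash!

print(cell)   # {'A': 140, 'B': 40} -- A is wrong: T4 never committed
```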
Also need to do a forward pass on the log, to redo the effects of cell-storage writes, in case we crashed after appending to the log but before writing cell storage. (redo the "winners")

    recover(log):
        cset = {}
        for each record r in log[len(log)-1 ... 0]:    // UNDO
            if r is commit:
                cset = cset U {r.TID}
            if r is update and r.TID not in cset:
                write(cell-storage, r.var, r.before)
        for each record r in log[0 ... len(log)-1]:    // REDO
            if r is update and r.TID in cset:
                write(cell-storage, r.var, r.after)

Why a backwards pass for UNDO? We have to do a scan to determine cset anyway; if we do that scan backwards, we can combine it with UNDO.

Why a forwards pass for REDO? We want cell storage to show the results of the most recent committed transaction.

Other variants are possible -- e.g., redo then undo, or a forward pass for undo (with a previous scan to determine cset), etc.

What if I crash during recovery? That's OK -- recovery is idempotent.

Example: after the crash, A = 140 and B = 40.

    cset = {1, 3}
    UNDO: A -> 90, B -> 30
    REDO: A -> 90, B -> 30, B -> 40

T1 and T3 committed, and in that order.

Optimization 1: don't REDO updates already in cell storage.

It's possible to optimize somewhat by not REDOing updates that are already reflected in cell storage. The simplest way to do this is to record on each cell the log record ID of the most recent log record reflected in that cell.

Diagram: UPDATE(tid, logid, var, before, after)

(In our example, all we actually needed to do was the UNDO A -> 90.)

Q: Why did I write the log record before updating cell storage? What would have happened if I had done these in the opposite order?
A: Cell storage would have the value, but it might not be in the log, so recovery wouldn't properly undo the effects of updates.

This is called "write-ahead logging" -- always write the log before you update the data.

Optimization 2: defer cell-storage updates.

We can keep a cache of cell storage in memory, and flush whenever we want.
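Here is the recover procedure as runnable Python, applied to the crash state from the example (the log encoding follows the sketch above and is my illustration):

```python
# Log as found on disk after the crash: T1 and T3 committed, T2 aborted
# (its update was already undone in cell storage), T4 still in flight.
log = [
    ("begin", 1), ("update", 1, "A", 100, 90), ("update", 1, "B", 50, 30),
    ("commit", 1),
    ("begin", 2), ("update", 2, "B", 30, 60), ("abort", 2),
    ("begin", 3), ("update", 3, "B", 30, 40), ("commit", 3),
    ("begin", 4), ("update", 4, "A", 90, 140),               # crash!
]
cell = {"A": 140, "B": 40}    # cell storage as found after the crash

def recover(log, cell):
    cset = set()
    for rec in reversed(log):                  # backward pass: UNDO losers
        if rec[0] == "commit":
            cset.add(rec[1])
        elif rec[0] == "update" and rec[1] not in cset:
            cell[rec[2]] = rec[3]              # write back the before-value
    for rec in log:                            # forward pass: REDO winners
        if rec[0] == "update" and rec[1] in cset:
            cell[rec[2]] = rec[4]              # write the after-value

recover(log, cell)
print(cell)           # {'A': 90, 'B': 40}
recover(log, cell)    # recovery is idempotent: safe to crash and rerun
print(cell)           # {'A': 90, 'B': 40}
```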
It doesn't really matter when we flush, as long as the log record is written first, since logging ensures that we REDO any cell-storage updates for committed transactions.

    read(var):
        if var in cache:
            return cache[var]
        else:
            val = read(cell-storage, var)
            add (var, val) to cache
            return val

    write(TID, var, newval):
        append(log, TID, var, cell-storage(var), newval)
        add (var, newval) to cache

Optimization 3: truncate the log.

If the log grows forever, recovery takes a long time, and disk space is wasted.

Q: What prefix of the log can be discarded?
A: Any part about completed transactions whose changes are definitely reflected in cell storage (because we will never need to reapply those changes).

Idea: checkpoint.
- Write a checkpoint record.
- Flush cell storage: write any outstanding cell-storage updates to disk. The state on disk now definitely reflects all updates in log records prior to the checkpoint.
- Truncate the log prior to the checkpoint.

Most databases implement truncation by segmenting the log into a number of files that are chained together; truncation involves deleting files that are no longer used.

Diagram:

Logging:
- good read performance -- cell storage, plus in-memory cache
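Optimizations 2 and 3 can be sketched together in Python. This is a deliberately simplified illustration: it assumes no transactions are active at checkpoint time, a case real systems must handle:

```python
log = []
cell_disk = {"A": 100, "B": 50}   # cell storage on "disk"
cache = {}                        # in-memory cache of cell storage

def read(var):
    if var not in cache:
        cache[var] = cell_disk[var]
    return cache[var]

def write(tid, var, newval):
    log.append(("update", tid, var, read(var), newval))  # log record first
    cache[var] = newval              # cell-storage write is deferred

def checkpoint():
    log.append(("checkpoint",))
    cell_disk.update(cache)          # flush outstanding updates to disk
    del log[:-1]                     # truncate everything before checkpoint

write(1, "A", 90)
log.append(("commit", 1))
checkpoint()
print(cell_disk["A"], len(log))   # 90 1 -- only the checkpoint record remains
```

Note that `write` takes the before-value from the cache (the current logical value), since with deferred flushes the cache, not the disk copy, holds the latest state.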

