CS 372: Operating Systems
Mike Dahlin

Lecture #22: Transactions and reliability

********************************* Review -- 1 min *********************************
Naming – which blocks belong to which files? Which files belong to which names?
Directories – regular files containing name→fileID mappings

********************************* Outline - 1 min *********************************
Transactions
    ACID: atomicity, consistency, isolation, durability
    logging, LFS
Reliability
    disk reliability
    RAIDs

********************************* Preview - 1 min *********************************

********************************* Lecture - 20 min *********************************
1. Motivation
File systems have lots of data structures:
• bitmap of free blocks
• directory
• file header
• indirect blocks
• data blocks
For performance, all must be cached! OK for reads, but what about writes?

1.1 Modified data in memory (“cached writes”) can be lost
Options for writing data:
Write through – write changes immediately to disk.
    Problem: slow! Have to wait for each write to complete before going on.
Write back – delay writing modified data back to disk (for example, until replaced).
    Problem: can lose data on a crash.

1.2 Multiple updates
If multiple updates are needed to perform some operation, a crash can occur between them!
For example:
to move a file between directories
    1) delete file from old directory
    2) add file to new directory
to transfer $100 from your bank account to mine
    (1) debit your account
    (2) credit my account
to update an existing file (e.g., text editor checkpoint)
    (1) overwrite block 1 of file with new data
    (2) overwrite block 2 of file with new data
    (3) …
    or
    (1) write new file with new data
    (2) remove old file from directory
    (3) add new file to directory
to create a new file
    1) allocate space on disk for header, data
    2) write new header to disk
    3) add new file to directory
What if there is a crash in the middle? Even with write-through, we have a problem.

2. Approach 1 (ad hoc)
Common in older systems.
metadata: needed to keep file system logically consistent (directories, bitmaps, file headers, indirect blocks, etc.)
data: user bytes

2.1 Metadata consistency
For metadata, UNIX uses synchronous write-through.
If multiple updates are needed, UNIX does them in a specific order so that, if a crash occurs, a special program “fsck” can scan the entire disk for internal consistency, check for “in progress” operations, and then fix up anything in progress.
Example: for file create, first write data to file, then update file header, then mark file header “allocated” in bitmap, then mark file blocks “allocated” in bitmap, then update directory, then (if directory grew) mark new file block “allocated” in bitmap.
[Figure: file, file header (hdr), and directory (dir) blocks, labeled 1-6 in the order they are written]
fsck:
    crash state → fsck action
    file header not in bitmap → only writes were to unallocated, unreachable blocks; write “disappears”
    block or file header allocated, but not in bitmap → update bitmap
    file created, but not yet in any directory → delete file
Challenges:
(1) need to get ad hoc reasoning exactly right
(2) poor performance (synchronous writes)
(3) slow recovery – must scan entire disk

2.2 User data consistency
What about user data?
Write back, forced to disk every 30 seconds (or user can call “sync” to force to disk immediately).
No guarantee that blocks are written to disk in any particular order.
Can lose up to 30 seconds of work.
Still, sometimes metadata consistency is enough.
E.g., how should vi or emacs write changes to a file to disk?
option 1: delete old file, write new file (how vi used to work!)
now vi does the following:
    1) write new version to temp file
    2) move old version to other temp file
    3) move new version to real file
    4) unlink old version
If a crash occurs, look in the temp area; if any files are there, send e-mail to the user that there might be a problem.
But what if the user wants multiple file operations to occur as a unit?
Example: bank transfer – the ATM gives you $100 and debits your account; the two must be atomic.

2.3 Implementation tricks
2.3.1 Dependencies
Instead of blocking until a write makes it to disk and then sending the next write, send a series of writes separated by BARRIERs.
The OS builds a dependency graph and ensures that a write does not go to disk until all writes on which it depends have gone to disk.
Example of a general problem: output commit

3. Transaction
transaction – group actions together so they are:
Atomic – all or nothing: either it happens or it doesn’t; no partial operations
Consistent – maintains system invariants, e.g., “total deposits less total withdrawals = total of all account balances”
Isolated – serializable; transactions appear to happen one after another
Durable – persistent -- once it happens, it stays happened
QUESTION: How does this compare to a critical section?
Critical sections are atomic, serializable, consistent, but not durable.
Two more terms:
commit – when transaction is done (visible, durable)
rollback – “forget” uncommitted transaction (e.g., if a failure occurs in the middle of a transaction, it didn’t happen at all)

4. Implementation (one thread)
Key idea – solve the problem of making multiple updates to disk atomically by turning multiple updates into a single disk write!
Illustrate with a simple money transfer from acct y to acct x:
    Begin transaction
        x = x + 1
        y = y - 1
    Commit transaction
Keep a “write-ahead” log (“redo log”) on disk of all changes in the transaction.
A log is like a journal – never erased, a record of everything you’ve done.
Once both changes and the “commit” record are in the log, the transaction is committed.
Then the changes can be “written behind” to disk/checkpoint/final location – if there is a crash after commit, replay the log to make sure the updates get to disk/checkpoint/final location.
[Figure: memory cache (x: 0, y: 2), disk (X: 0, Y: 2), and write-ahead log (on disk or tape or NVRAM) holding X = 1, Y = 1, “commit”]
Sequence of steps to execute transaction:
    1) write new value of x to log (and cache)
    2) write new value of y to log (and cache)
    3) write “commit” to log
    4) write x to disk
    5) write y to disk
    6) reclaim space on log
QUESTION: What if we crash after 1? no
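A minimal Python sketch of the redo-log sequence above, assuming one transaction at a time, a disk and log that both survive a crash, and a volatile cache that does not. `Storage`, `Crash`, `run_transaction`, `recover`, and `trial` are hypothetical names chosen for illustration; a dict stands in for the disk and a list for the log. Recovery replays only updates that are followed by a commit record.

```python
from dataclasses import dataclass, field


class Crash(Exception):
    """Raised to simulate a machine crash right after a given step."""


@dataclass
class Storage:
    # State that survives a crash: final on-disk locations plus the on-disk log.
    disk: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def run_transaction(storage, cache, updates, crash_after=None):
    """Apply `updates` (name -> new value) following the six-step sequence.

    `crash_after` is the 1-based step after which a crash is simulated.
    With two updates, steps 1-2 log the values, step 3 logs "commit",
    steps 4-5 write behind to disk, and step 6 reclaims the log.
    """
    step = 0

    def finish_step():
        nonlocal step
        step += 1
        if step == crash_after:
            raise Crash(step)

    # Log each new value (and update the volatile cache).
    for name, value in updates.items():
        storage.log.append(("update", name, value))
        cache[name] = value
        finish_step()
    # Commit point: once this record is in the log, the transaction is durable.
    storage.log.append(("commit",))
    finish_step()
    # Write behind: copy each new value to its final on-disk location.
    for name, value in updates.items():
        storage.disk[name] = value
        finish_step()
    # Disk now holds every new value, so the log space can be reclaimed.
    storage.log.clear()
    finish_step()


def recover(storage):
    """Redo committed updates from the log; drop any without a commit record."""
    pending = []
    for record in storage.log:
        if record[0] == "update":
            pending.append(record)
        elif record[0] == "commit":
            # Replaying is safe even if some values already reached disk,
            # because each record stores the new value, not a delta.
            for _, name, value in pending:
                storage.disk[name] = value
            pending = []
    # Leftover `pending` records were never committed: forget them (rollback).
    storage.log.clear()


def trial(crash_after=None):
    """Run the x/y example, optionally crash, recover, and return the disk."""
    storage = Storage(disk={"x": 0, "y": 2})
    cache = dict(storage.disk)  # volatile copy; discarded on a crash
    try:
        run_transaction(storage, cache, {"x": 1, "y": 1}, crash_after)
    except Crash:
        pass
    recover(storage)
    return storage.disk


if __name__ == "__main__":
    for point in [2, 3, 4, None]:
        print(point, trial(point))
```

In this sketch, a crash after step 2 (both values logged, no commit record) recovers to x: 0, y: 2, while a crash at step 3 or later recovers to x: 1, y: 1. Logging new values rather than deltas keeps replay idempotent, so recovery can safely run even after part of the write-behind has completed.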