
LFS
CS 380L: Mike Dahlin
April 5, 2011

"Treat disks like tape." – J. Ousterhout

1 Preliminaries

1.1 Review

1.2 Outline
• Introduction
• 2 hard problems
  – Finding data in the log
  – Cleaning
• 2 key ideas
  – Logging and transactions: log your writes
  – Indexing: inode → data can live anywhere → no need to "write back"

1.3 Preview

2 Introduction

2.1 Why I teach this paper
• Not widely deployed
• Useful pedagogical tool – if you understand this, you understand any file system
• Good illustration of research methodology (both the original work and the subsequent evaluation)
• Useful ideas – the ideas live on in solid-state-drive file systems and Solaris ZFS

2.2 Why this is "good" research
• Driven by keen awareness of technology trends
• Willing to radically depart from conventional practice
• Yet keeps sufficient compatibility to keep things simple and limit grunge work
• 3-level analysis:
  – Provide insight with simplified math – "science"
  – Simulation to evaluate and validate ideas that are beyond the math
  – Solid real implementation and measurements/experience
• Extreme research – take the idea to its logical conclusion (e.g., optimize the file system for writes, since "reads will all come from the cache")

2.3 Technology trends
• Big memory – see the density curves of the disk trend paper
• High disk latency, OK bandwidth
• Disks becoming more complicated
• RAIDs and network RAIDs

2.4 Implications
• Reads are taken care of (?)
• Writes are not, because we are paranoid about failure
• Most disk traffic is writes
• Can't afford small writes
  – RAID5 makes small writes worse
• Simplify and make the FS less "device-aware"
  – No tracks, cylinders, etc.
  – Just "big writes fast" + temporal locality between write and read patterns

2.5 Problems with UNIX FFS
• Too many small writes (because most files are small)
• Too many synchronous writes (because of recovery concerns)

2.6 Approaches
• Replace synchronous writes with asynchronous ones
• Replace many small writes with a few large ones
• So: buffer in memory and write to disk in large "segment-sized" chunks
• Append-only log, no overwrite in place

2.7 Key difference between LFS and other log-based systems
• The log is the only and entire truth – there's nothing else

2.8 Challenges
• Two hard problems
  – Metadata design
  – Free-space management
• No update-in-place
  – (Almost) nothing has a permanent home,
  – so how do we find things?
• Free space gets fragmented,
  – so how do we ensure large extents of free space?

3 Admin

Midterm 1 grade distribution (fall 2004):
13 X
14 XXXX
15 XXX
16 XXX
17 XXX
18 X
19 X

Project checkpoint (due Friday)
• Outline of final report
• High-level pseudo-code of design
• Original milestones, progress, new milestones

Project design review (due Friday)
• 2-hour meeting with another group
• Give them the checkpoint (24 hours in advance)
• Present paper outline and pseudo-code
• Reviewers prepare a report (see handout)

4 Index structures

4.1 Index structures in FFS

4.2 Index structures in LFS
• Step 1 – move inodes into the log
• Step 2 – find the mobile inodes with a fixed imap
• Not obvious this is better
  – Why is this better?
  – Don't have to write the imap after every write – just at checkpoints; otherwise roll forward
  – Couldn't you do this with the original inode array?
  – Would there be any advantages to making the imap mobile by adding another level of
indirection?
• Compare different checkpoint organizations: entire disk, inodes, imap, imap map, ...
  – Assume 100 bytes/inode, 4 bytes per disk pointer, 1 KB blocks, 50 MB/s bandwidth, a 100 GB disk, and 10M inodes (→ a 1 GB inode array, 1% of disk space; allows up to 10M files averaging 10 KB each ... reasonable?)
  – Assume 512 MB main memory
  – (Inode array is 1 GB; imap is 10M entries * 4 bytes/entry = 40 MB; imap map is 40 MB / 1 KB * 4 = 160 KB; imap map map is 160 KB / 1 KB * 4 = 640 bytes ...)

                              disk data   inode array   imap     imap map
    size                      100 GB      1 GB          40 MB    160 KB
    time to write checkpoint  2000 sec    20 sec        1 sec    10 ms + seek + rot
    fraction of main memory   200x        2x            ~8%      ~.03%

4.3 Example of LFS update
• Read "/foo"
• Write "/foo"
• Read "/foo" using the in-memory imap
• What if we crash? Replay
  – Read the checkpoint; roll forward using the log
  – To parse the log, need the segment summary block – the "commit"
    ∗ (The segment summary block assumes the entire segment is written at once; an alternative is partial-segment transactions + per-transaction metadata and commit)
  – Given the segment summary block, we can replay the transactions in the log and update the in-memory (and eventually on-disk) checkpoint
• Additional details
  – How to find the "next" log segment?
    ∗ The checkpoint has a pointer to the first log segment to replay
    ∗ The checkpoint also has a free-segment list (or the free-segment list can live in a "well-known file", e.g., the file with inumber 3)
    ∗ The segment summary block of the ith log segment includes a pointer to the next segment in the log
    ∗ (What if the next segment hasn't been written? It still has an old segment summary block.
Don't want to mistake the old SSB for the new one – put a monotonically increasing sequence number in the SSB "commits" → we can detect the old one and ignore it ...)
  – NOTE: an optimization is to defer writing inodes and indirect blocks – the needed metadata updates can be inferred from the segment summary block; eventually they do need to be written so that the imap map has something to point to, but the cost of writing them can be amortized over multiple data-block writes ... (ignore this optimization for now ...)
• Update the checkpoint (eventually)

5 Cleaning

How do we get back free disk space?

5.1 Option 1: threading
• Put new blocks wherever the holes are
• Each block written has a pointer to the next block in the sequential log
• Advantage: don't waste time reading/writing live data
• Law of storage-system entropy: left to itself, free space gets fragmented to the point of approaching your minimum allocation size

5.2 Option 2: compact the log
• Compact the live blocks in the log into a smaller log
• Advantage: creates large extents of free space
• Problem: read/write the same data over and over again

5.3 Option 3: segments – combine threading + compaction
• Want the benefits of both:
  – Compaction: big free space
  – Threading: leave long-lived things in place so we don't copy them again and again
• Solution: "segmented log"
  – Chop the disk into a bunch of large segments
  – Compaction within segments
  – Threading among segments
  – Always write to the "current" "clean" segment before moving on to the next one
  – Segment cleaner: pick some segments and collect their live data together (compaction)
    ∗ Pick a
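The segment-cleaning idea above can be made concrete with a toy model. This is only an illustrative sketch, not the paper's implementation: the segment size, the utilization bookkeeping, and the greedy pick-lowest-utilization policy are all assumptions made for this example (the real LFS cleaner uses a cost-benefit policy, and packs live data into multiple clean segments rather than one).

```python
# Toy model of LFS segment cleaning: compaction within segments,
# threading among segments. Greedy lowest-utilization victim
# selection; all sizes and policies are illustrative assumptions.

SEGMENT_BLOCKS = 8  # blocks per segment (illustrative, not LFS's value)

class Segment:
    def __init__(self):
        self.blocks = []   # block ids ever written into this segment
        self.live = set()  # subset of those blocks that are still live

    def utilization(self):
        return len(self.live) / SEGMENT_BLOCKS

def clean(segments, free_list, n_to_clean):
    """Pick the n lowest-utilization segments, copy their live blocks
    into a fresh segment (compaction), and return the cleaned segments
    to the free list (so threading has large extents to reuse)."""
    victims = sorted(segments, key=Segment.utilization)[:n_to_clean]
    out = Segment()
    for seg in victims:
        for b in sorted(seg.live):
            out.blocks.append(b)
            out.live.add(b)        # live data survives the copy
        segments.remove(seg)
        free_list.append(seg)      # whole victim segment is reusable now
    segments.append(out)
    return out

# Example: three mostly-empty segments collapse into one.
segs, free = [], []
for ids in ([1, 2], [3], [4, 5]):
    s = Segment()
    s.blocks = list(ids)
    s.live = set(ids)
    segs.append(s)

merged = clean(segs, free, 3)
print(sorted(merged.live))  # all five live blocks now share one segment
print(len(free))            # three whole segments freed for reuse
```

Note the trade-off the example exposes: the cleaner pays to re-copy live data (the compaction cost from Option 2) but only within the victim segments, while long-lived, highly utilized segments are left in place (the threading benefit from Option 1).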

