LFS

CS 380L: Mike Dahlin
October 25, 2004

"Treat disks like tape." – J. Ousterhout

1 Preliminaries

1.1 Review

1.2 Outline

• Introduction
• 2 hard problems
  – Finding data in the log
  – Cleaning
• 2 key ideas
  – Logging and transactions: log your writes
  – Indexing: inode → data can live anywhere → no need to "write back"

1.3 Preview

2 Introduction

2.1 Why this is "good" research

• Driven by keen awareness of technology trends
• Willing to radically depart from conventional practice
• Yet keeps sufficient compatibility to keep things simple and limit grunge work
• 3-level analysis:
  – Provide insight with simplified math – "science"
  – Simulation to evaluate and validate ideas that are beyond the math
  – Solid real implementation and measurements/experience
• Extreme research – take the idea to its logical conclusion (e.g., optimize the file system for writes, since "reads will all come from the cache")

2.2 Technology trends

• Big memory (??)
  – ?? density curves of the disk trend paper
• High disk latency, OK bandwidth
• Disks becoming more complicated
• RAIDs and network RAIDs

2.3 Implications

• Reads are taken care of (?)
• Writes are not, because we are paranoid about failure
• Most disk traffic is writes
• Can't afford small writes
  – RAID 5 makes small writes worse
• Simplify and make the FS less "device-aware"
  – No tracks, cylinders, etc.
  – Just "big writes fast" + temporal locality between write and read patterns

2.4 Problems with UNIX FFS

• Too many small writes (because most files are small)
• Too many synchronous writes (because of recovery concerns)

2.5 Approaches

• Replace synchronous writes with asynchronous ones
• Replace many small writes with a few large ones
• So: buffer in memory and write to disk in large "segment-sized" chunks
• Log: append only, no overwrite in place

2.6 Key difference between LFS and other log-based systems

• The log is the only and entire truth; there's nothing else

2.7 Challenges

• Two hard problems
  – Metadata design
  – Free-space management
• No update-in-place:
  – (almost) nothing has a permanent home,
  – so how do we find things?
• Free space gets fragmented:
  – so how do we ensure large extents of free space?

3 Admin

Midterm 1 grade distribution (fall 2004):

  13  X
  14  XXXX
  15  XXX
  16  XXX
  17  XXX
  18  X
  19  X

Project checkpoint (due Friday)

• Outline of final report
• High-level pseudo-code of design
• Original milestones, progress, new milestones

Project design review (due Friday)

• 2-hour meeting with another group
• Give them the checkpoint (24 hours in advance)
• Present paper outline and pseudo-code
• Reviewers prepare a report (see handout)

4 Index structures

4.1 Index structures in FFS

4.2 Index structures in LFS

• Step 1 – move the inodes into the log
• Step 2 – find the mobile inodes with a fixed imap (see the code sketch after the table below)
• Not obvious this is better
  – Why is this better?
  – Don't have to write the imap after every write – just at checkpoints; otherwise roll forward
  – Couldn't you do this with the original inode array?
  – Would there be any advantages to making the imap mobile by adding another level of indirection?
• Compare different checkpoint organizations: entire disk, inode array, imap, imap map, ...
  – Assume 100 bytes/inode, 4 bytes per disk pointer, 50 MB/s disk bandwidth
  – Assume 512 MB main memory

                             disk data   inode array   imap     imap map
  size                       100 GB      1 GB          40 MB    320 KB
  time to write checkpoint   2000 sec    20 sec        ~1 sec   10 ms + seek + rot
  fraction of main memory    200x        2x            ~8%      ~0.06%
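To make the imap indirection concrete, here is a minimal, self-contained C sketch of the resulting read path. Everything in it – the array sizes, the in-memory "log", and names like lookup_block and the_log – is invented for illustration; Sprite LFS's real on-disk layout is more involved.

    /* Minimal sketch of the LFS read path through the inode map (imap).
     * All names and sizes are illustrative, not Sprite LFS's layout. */
    #include <stdio.h>

    #define NUM_INODES     16
    #define PTRS_PER_INODE  4

    /* In LFS an inode lives in the log like any other block; it still
     * holds pointers to data blocks, which may be anywhere in the log. */
    struct inode {
        unsigned size;
        unsigned block_addr[PTRS_PER_INODE]; /* log addresses of data blocks */
    };

    /* The imap maps inode number -> current log address of that inode.
     * It is kept in memory and only written to the log at checkpoints. */
    static unsigned imap[NUM_INODES];

    /* Pretend "log": for this sketch, inode slots indexed by log address. */
    static struct inode the_log[NUM_INODES];

    static unsigned lookup_block(unsigned inum, unsigned blk)
    {
        unsigned inode_addr = imap[inum];        /* step 1: consult the imap */
        struct inode *ip = &the_log[inode_addr]; /* step 2: read the inode   */
        return ip->block_addr[blk];              /* step 3: data block addr  */
    }

    int main(void)
    {
        /* A write appends a new data block and a fresh copy of the inode
         * to the log, then updates the in-memory imap; nothing is
         * overwritten in place, so the inode is free to "move". */
        the_log[3].block_addr[0] = 42; /* new inode copy at log address 3 */
        imap[7] = 3;                   /* inode 7 now lives there         */

        printf("inode 7, block 0 -> log address %u\n", lookup_block(7, 0));
        return 0;
    }

The key line is the imap update in main(): only the in-memory map changes on a write, which is why the imap needs to reach disk only at checkpoints.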
4.3 Example of an LFS update

• Read "/foo"
• Write "/foo"
• Read "/foo" using the in-memory imap
• What if we crash?
• Update the checkpoint (eventually)

5 Cleaning

How do we get back free disk space?

5.1 Option 1: threading

• Put new blocks wherever holes are
• Each block written has a pointer to the next block in the sequential log
• Advantage: don't waste time reading/writing live data
• Law of storage-system entropy: left to itself, free space gets fragmented to the point of approaching your minimum allocation size

5.2 Option 2: compact the log

• Compact the live blocks in the log into a smaller log
• Advantage: creates large extents of free space
• Problem: read/write the same data over and over again

5.3 Option 3: segments – combine threading and compaction

• Want the benefits of both:
  – Compaction: big extents of free space
  – Threading: leave long-lived things in place so we don't copy them again and again
• Solution: "segmented log"
  – Chop the disk into a bunch of large segments
  – Compaction within segments
  – Threading among segments
  – Always write to the "current" clean segment before moving on to the next one
  – Segment cleaner: pick some segments and collect their live data together (compaction)
• Many similarities with generational garbage collection?

6 Policies, evaluation

6.1 Is cleaning going to hurt?

• Write cost = total I/O / new data written
            = (read segments + write live data + write new data) / (write new data)
            = [1 + u + (1 − u)] / (1 − u)
            = 2 / (1 − u)
  – where u is the utilization of the segments being cleaned
• Conclusion: u had better be small, or it's going to hurt badly
• Aha: u doesn't have to be the overall disk utilization

6.2 How do we get cleaned-segment utilization below the overall utilization?

• Want a bimodal distribution:
  – clean low-utilization segments – easy
  – leave high-utilization segments untouched
• Workloads:
  – random writes: can still do better than the average u
  – a typical file system has locality, so we can do even better

6.3 Greedy cleaner

• Greedy cleaner: pick the lowest-u segments to clean
• Works fine for a random workload
• For a "hot-cold" workload – 90% of writes go to 10% of the blocks:
  – 1st mistake: not segregating hot from cold
  – Did that, and it didn't help

6.4 What's wrong?

• Segments are like fish, swimming to the left
• The cleaner spends all its time repeatedly slinging a few hot fish back
• Cold fish hide lots of free space on the cliff, but the cleaner can't get at them – and most fish are cold

6.5 Answer

• Cold free space is more valuable: if you throw back cold fish, it takes them longer to come back
• Hot free space is less valuable: might as well wait a bit longer

6.6 "Cost-benefit cleaner"

• Optimize for benefit/cost = age × (1 − u) / (1 + u)
• Favors cold segments (see the code sketch at the end of these notes)

6.7 Segment size?

An example follow-up question:

• What's the best segment size?
• Big: can amortize seeks more effectively
• Small: even better chance of finding segments that have low, or even zero, utilization
• Find the optimal compromise

7 Conclusion

7.1 Paper's conclusions

• Disk parameters
  – WREN IV disk – 1.3 MB/s max bandwidth, 17.5 ms avg seek, 300 MB capacity
  – LFS: 4 KB block size, 1 MB segment size
• Results
  – 10x performance for small writes
  – Similar performance for large writes
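To tie together the cleaning analysis from section 6, the sketch below evaluates two hypothetical segments under both policies: the write-cost formula 2/(1 − u) from 6.1 and the cost-benefit score age × (1 − u)/(1 + u) from 6.6. The segment struct and all the numbers are made up for illustration; they are not the Sprite LFS implementation.

    /* Sketch of the cleaning policies from section 6: write cost as a
     * function of segment utilization u, plus cost-benefit selection. */
    #include <stdio.h>

    struct segment {
        double u;   /* fraction of the segment that is still live data */
        double age; /* how long the segment has been idle ("coldness") */
    };

    /* Write cost (6.1): cleaning reads the whole segment (1), writes
     * back the live data (u), and the reclaimed space absorbs new data
     * (1 - u), so cost = (1 + u + (1 - u)) / (1 - u) = 2 / (1 - u). */
    static double write_cost(double u)
    {
        return 2.0 / (1.0 - u);
    }

    /* Cost-benefit score (6.6): weigh reclaimed space by its age, so
     * cold, stable free space beats hot space that refills quickly. */
    static double cost_benefit(const struct segment *s)
    {
        return s->age * (1.0 - s->u) / (1.0 + s->u);
    }

    int main(void)
    {
        struct segment segs[] = {
            { .u = 0.20, .age = 1.0 },  /* hot: mostly empty, written recently */
            { .u = 0.75, .age = 50.0 }, /* cold: fuller, but idle a long time  */
        };

        for (int i = 0; i < 2; i++)
            printf("segment %d: u=%.2f  write cost=%.1f  benefit/cost=%.2f\n",
                   i, segs[i].u, write_cost(segs[i].u), cost_benefit(&segs[i]));

        /* Greedy (lowest u) would clean segment 0 over and over; the
         * cost-benefit policy prefers the cold segment 1 even though
         * its immediate write cost is higher. */
        return 0;
    }

Running this shows segment 1 scoring roughly 7.1 against segment 0's 0.67, even though cleaning it costs 8x instead of 2.5x per byte of new data: exactly the bimodal "clean the cold, leave the hot" behavior that sections 6.4–6.6 argue for.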

