Duke CPS 210 - FS Consistency, Block Allocation, and WAFL - D2861589


Pages 35


FS Consistency, Block Allocation, and WAFL
Summary of Issues for File Systems
Rotational Media
The Problem of Disk Layout
FFS and LFS
WAFL: High-Level View
WAFL: A Closer Look
Snapshots
Slide 9
Shadowing
WAFL Consistency Points
The Problem of Metadata Updates
FFS Failure Recovery
Alternative: Logging and Journaling
Metadata Logging
The Nub of WAFL
SnapMirror
Mirroring
What Has Changed?
Details
FFS Cylinder Groups
FFS Allocation Policies
Disk Hardware (4)
What to Know
Log-Structured File System (LFS)
Writing the Log in LFS
Writing the Log: the Rest of the Story
Cleaning in LFS
Allocating a Block in FFS
Clustering in FFS
Effect of Clustering
Sequential File Write
Sequential Writes: A Closer Look
Small-File Create Storm
Small-File Create: A Closer Look

Summary of Issues for File Systems

1. Buffering disk data for access from the processor.
    block I/O (DMA) must use aligned, physically resident buffers
    block update is a read-modify-write
2. Creating/representing/destroying independent files.
    disk block allocation, file block map structures
    directories and symbolic naming
3. Masking the high seek/rotational latency of disk access.
    smart block allocation on disk
    block caching, read-ahead (prefetching), and write-behind
4. 
Reliability and the handling of updates.

Rotational Media

[Figure: disk geometry with labels Sector, Track, Cylinder, Head, Platter, Arm]

Access time = seek time + rotational delay + transfer time
    seek time = 5-15 milliseconds to move the disk arm and settle on a cylinder
    rotational delay = 8 milliseconds for full rotation at 7200 RPM: average delay = 4 ms
    transfer time = 1 millisecond for an 8KB block at 8 MB/s
Bandwidth utilization is less than 50% for any noncontiguous access at a block grain.

The Problem of Disk Layout

The level of indirection in the file block maps allows flexibility in file layout.
“File system design is 99% block allocation.” [McVoy]
Competing goals for block allocation:
• allocation cost
• bandwidth for high-volume transfers
• stamina
• efficient directory operations
Goal: reduce disk arm movement and seek overhead.
    metric of merit: bandwidth utilization (or effective bandwidth)

FFS and LFS

We will study two different approaches to block allocation:
• Cylinder groups in the Fast File System (FFS) [McKusick81]
    clustering enhancements [McVoy91], and improved cluster allocation [McKusick: Smith/Seltzer96]
    FFS can also be extended with metadata logging [e.g., Episode]
• Log-Structured File System (LFS)
    proposed in [Douglis/Ousterhout90]
    implemented/studied in [Rosenblum91]
    BSD port, sort of maybe: [Seltzer93]
    extended with self-tuning methods [Neefe/Anderson97]
• Other approach: extent-based file systems

WAFL: High-Level View

[Figure labels: Allocation maps, Fixed location, User data]
The whole on-disk file system layout is a tree of blocks.
Everything else: write anywhere.

WAFL: A Closer Look

Snapshots

“WAFL’s primary distinguishing characteristic is Snapshots, which are readonly copies of the entire file system.”
This was really the origin of the idea of a point-in-time copy for the file server market.
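One way to see why a point-in-time copy can be cheap in a tree-of-blocks layout is the minimal sketch below. It assumes blocks are never overwritten in place, so a snapshot is just a saved root pointer. The nested-dict "tree" and the `write_path` helper are hypothetical illustrations, not WAFL's actual on-disk structures.

```python
def write_path(root, path, data):
    """Return a NEW root with `data` stored at `path`; `root` is left untouched.

    Only the nodes along `path` are copied. Every other subtree is shared
    between the old and new trees.
    """
    if not path:
        return data
    head, rest = path[0], path[1:]
    child = root.get(head)
    if not isinstance(child, dict):
        child = {}                      # create a missing interior node
    new_root = dict(root)               # copy just this one node
    new_root[head] = write_path(child, rest, data)
    return new_root


live = {}
live = write_path(live, ["home", "a.txt"], b"v1")
live = write_path(live, ["home", "b.txt"], b"b")

snapshot = live                         # taking a snapshot = keeping the old root

live = write_path(live, ["home", "a.txt"], b"v2")

print(snapshot["home"]["a.txt"])        # b'v1'  (snapshot is unchanged)
print(live["home"]["a.txt"])            # b'v2'
print(snapshot["home"]["b.txt"] is live["home"]["b.txt"])  # True (shared block)
```

The snapshot costs nothing at creation time in this sketch. Space is consumed only as later writes copy the nodes on their path.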
What is this idea good for?

Snapshots

The snapshot mechanism is used for user-accessible snapshots and for transient consistency points.
How is this like a fork?

Shadowing

1. starting point
    modify purple/grey blocks
2. write new blocks to disk
    prepare new block map
3. overwrite block map (atomic commit)
    and free old blocks
Shadowing is the basic technique for doing an atomic force.
Frequent problems: nonsequential disk writes, damages clustered allocation on disk. How does WAFL deal with this?
    reminiscent of copy-on-write

WAFL Consistency Points

“WAFL uses Snapshots internally so that it can restart quickly even after an unclean system shutdown.”
“A consistency point is a completely self-consistent image of the entire file system. When WAFL restarts, it simply reverts to the most recent consistency point.”
• Buffer dirty data in memory (delayed writes) and write new consistency points as an atomic batch (force).
• A consistency point transitions the FS from one self-consistent state to another.
• Combine with NFS operation log in NVRAM
    What if NVRAM fails?

The Problem of Metadata Updates

Metadata updates are a second source of FFS seek overhead.
• Metadata writes are poorly localized.
    E.g., extending a file requires writes to the inode, direct and indirect blocks, cylinder group bit maps and summaries, and the file block itself.
Metadata writes can be delayed, but this incurs a higher risk of file system corruption in a crash.
• If you lose your metadata, you are dead in the water.
• FFS schedules metadata block writes carefully to limit the kinds of inconsistencies that can occur.
    Some metadata updates must be synchronous on controllers that don’t respect order of writes.

FFS Failure Recovery

FFS uses a two-pronged approach to handling failures:
1. 
Carefully order metadata updates to ensure that no dangling references can exist on disk after a failure.
• Never recycle a resource (block or inode) before zeroing all pointers to it (truncate, unlink, rmdir).
• Never point to a structure before it has been initialized.
    E.g., sync inode on creat before filling directory entry, and sync a new block before writing the block map.
2. Run a file system scavenger (fsck) to fix other problems.
    Free blocks and inodes that are not referenced.
    Fsck will never encounter a dangling reference or double allocation.

Alternative: Logging and Journaling

Logging can be used to localize synchronous metadata writes, and reduce the work that must be done on recovery.
    Universally used in database systems.
    Used for metadata writes in journaling file systems (e.g., Episode).
Key idea: group each set of related updates into a single log record that can be written to disk atomically (“all-or-nothing”).
• Log records are written to the log file or log disk sequentially.
    No seeks, and preserves temporal ordering.
• Each log record is trailed by a marker (e.g., checksum) that says “this log record is complete”.
• To recover, scan the log and reapply updates.

Metadata Logging

Here’s one approach to building a fast filesystem:
1. Start with FFS with clustering.
2. Make all metadata writes asynchronous.
But, that approach cannot survive a failure, so:
3. Add a
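The log-record idea from the Logging and Journaling slide (group related updates, trail each record with a completeness marker, recover by scanning and reapplying) can be sketched roughly as follows. The record layout here (4-byte length, JSON payload, CRC-32 trailer) and the names `encode_record` and `recover` are assumptions made for illustration, not the format of any particular journaling file system.

```python
import json
import struct
import zlib


def encode_record(updates: dict) -> bytes:
    """Pack one group of related updates as: length | payload | CRC-32 trailer."""
    payload = json.dumps(updates, sort_keys=True).encode()
    header = struct.pack("<I", len(payload))
    trailer = struct.pack("<I", zlib.crc32(payload))  # "this record is complete"
    return header + payload + trailer


def recover(log: bytes) -> dict:
    """Scan the log in order and reapply every complete record.

    Scanning stops at the first record that is truncated or whose checksum
    does not match, so a torn final write is applied not at all.
    """
    state = {}
    pos = 0
    while pos + 4 <= len(log):
        (length,) = struct.unpack_from("<I", log, pos)
        end = pos + 4 + length + 4
        if end > len(log):
            break                                   # record was cut short
        payload = log[pos + 4 : pos + 4 + length]
        (crc,) = struct.unpack_from("<I", log, pos + 4 + length)
        if crc != zlib.crc32(payload):
            break                                   # trailer does not match
        state.update(json.loads(payload))
        pos = end
    return state


# Two complete records followed by one torn by a simulated crash.
log = encode_record({"inode 7": "allocated", "dir /a": "entry -> 7"})
log += encode_record({"inode 7": "size=8192"})
torn = log + encode_record({"inode 9": "allocated"})[:-3]

print(recover(torn))
```

In this sketch the first two records are reapplied and the torn third record is ignored, which is the all-or-nothing behavior the slide describes. Appending records to one byte string stands in for sequential writes to a log file or log disk.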
