Filesystems – Metadata, Paths, & Caching
Vivek Pai
Princeton University

Diskgedanken
- Assuming you back up and restore files, what factors affect the time involved?
- How are these factors changing?
- What issues affect the rates of change?
- How is total backup time changing over the years?
- What is Occam's razor?

Today's Overview
- Quiz recap
- Finish up metadata and reliability
- A little discussion of mounting, etc.
- Move on to performance

Quiz 1 Observations
- I'm disappointed
- Quizzes not yet graded, but...
- Most people did poorly on question 1
- Lots of dimensional analysis
- Lots of sleepers, chatting, weird faces
- Very little (too little) feedback in general
- Open question – looking for a methodical approach

Occam's Razor
- From William of Occam (philosopher): "entities should not be multiplied unnecessarily"
- Often reduced to other statements:
  - "One should not increase, beyond what is necessary, the number of entities required to explain anything"
  - "Make as few assumptions as possible"
  - "Once you have eliminated all other possible explanations, what remains must be the answer"

A Reasonable Approach
- Disk size: 40 GB (20-80 GB common)
- File size: 10 KB (5-20 KB common)
- Access time: 10 ms (5-20 ms common)
- Assume 1 seek per file (reasonable)
- 100 files = 1 MB, each access 0.01 s, so roughly 1 MB/s
- So 40 GB at 1 MB/s = 40,000 s = 11+ hours (sanity-checked in the sketch after the Linked Files slide below)

Changes Over Time
- Disk density doubling each year
- Seek time dropping < 10% per year
- File size growing slowly
- Results:
  - The number of files grows faster than access time shrinks
  - Backup time increases

Most Common Answer
- Disk size / maximum transfer rate
- In other words, read sectors, not files
- Can this be done? Yes, if you have access to the "raw" disk
  - Which means you have "root" permission
  - And the system has raw disk support
- Faster than file-based dump/restore
- No concept of files, however
- What happens if you restore to a disk with a different geometry?

Linked Files (Alto)
- File header points to the 1st block on disk
- Each block points to the next
- Pros:
  - Can grow files dynamically
  - The free list is similar to a file
- Cons:
  - Random access is horrible
  - Unreliable: losing a block means losing the rest
[diagram: file header pointing to the first block; each block points to the next, ending in null]
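To make the cons concrete, here is a minimal C sketch of reading block n of a linked file. The LinkedBlock layout, read_block() helper, and NULL_BLOCK sentinel are hypothetical stand-ins, not the Alto's actual on-disk format; the point is the loop, which turns one logical read into n+1 disk reads.

    #include <stdint.h>

    #define BLOCK_SIZE 512
    #define NULL_BLOCK 0       /* hypothetical end-of-chain sentinel */

    /* Hypothetical on-disk layout: each block stores the number of
     * the next block in the file, followed by the file's data. */
    typedef struct {
        uint32_t next;         /* next block in the file, or NULL_BLOCK */
        uint8_t  data[BLOCK_SIZE - sizeof(uint32_t)];
    } LinkedBlock;

    /* Assume some lower layer that reads one physical block. */
    extern void read_block(uint32_t blockno, LinkedBlock *out);

    /* Read logical block n of the file whose first block is 'first'.
     * Returns 0 on success, -1 if the file has fewer than n+1 blocks.
     * Cost: n+1 disk reads -- this is why random access is horrible,
     * and one corrupted 'next' pointer loses everything after it. */
    int read_nth_block(uint32_t first, unsigned n, LinkedBlock *out)
    {
        uint32_t cur = first;
        for (unsigned i = 0; i < n; i++) {
            if (cur == NULL_BLOCK)
                return -1;             /* chain ended too early */
            read_block(cur, out);      /* fetch just to learn 'next' */
            cur = out->next;
        }
        if (cur == NULL_BLOCK)
            return -1;
        read_block(cur, out);          /* the block we actually wanted */
        return 0;
    }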
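Stepping back, the backup estimate from "A Reasonable Approach" above can be sanity-checked in a few lines of C. All of the figures are the slide's assumed values, not measurements:

    #include <stdio.h>

    int main(void)
    {
        double disk_bytes   = 40e9;    /* 40 GB disk   (assumed)     */
        double file_bytes   = 10e3;    /* 10 KB/file   (assumed)     */
        double seek_seconds = 0.010;   /* 10 ms/access, 1 seek/file  */

        double files   = disk_bytes / file_bytes;  /* 4 million files */
        double seconds = files * seek_seconds;     /* seek-bound time */

        printf("files: %.0f\n", files);
        printf("backup: %.0f s = %.1f hours\n", seconds, seconds / 3600);
        /* 4e6 * 0.01 s = 40,000 s, about 11.1 hours, i.e. ~1 MB/s */
        return 0;
    }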
Contiguous Allocation
- Request space in advance for the size of the file
- Search a bitmap or linked list to locate a free space
- File header holds the first sector in the file and the number of sectors
- Pros:
  - Fast sequential access
  - Easy random access
- Cons:
  - External fragmentation
  - Hard to grow files

Single-Level Indexed Files or Extent-Based Filesystems
- The user declares a max size
- The file header holds an array of pointers to disk blocks
- Pros:
  - Can grow, up to a limit
  - Random access is fast
- Cons:
  - Clumsy to grow beyond the limit
  - Periodic cleanup of new files
  - Up-front declaration is a real pain
[diagram: file header with an array of pointers to disk blocks]

File Allocation Table (FAT)
- Approach:
  - A section of each partition is reserved for the table
  - One FAT entry for each block
  - A file is a linked list of blocks
  - A directory entry points to the 1st block of the file
- Pros:
  - Simple
- Cons:
  - Always have to go through the FAT
  - Wastes space
[diagram: directory entry for "foo" pointing to its first block; FAT entries chain the file's blocks (217, 619, 399) to EOF]

Multi-Level Indexed Files (Unix)
- 13 pointers in a header:
  - 10 direct pointers
  - 11: 1-level indirect
  - 12: 2-level indirect
  - 13: 3-level indirect
- Pros & cons:
  - In favor of small files
  - Can grow
  - Limit is 16 GB, and deep access means lots of seeks
- What happens to reach block 23? Block 5? Block 340? (see the lookup sketch after the RAID slides below)
[diagram: inode pointers reaching data blocks directly and through 1-, 2-, and 3-level indirect blocks]

Reliability in Disk Systems
- Make sure certain actions have occurred before the function completes
  - Known as "synchronous" operation
  - Ex: make sure the new inode is on disk and the directory has been modified before declaring that file creation is complete
- Drawback: speed
  - Some ops are easily made asynchronous: access time
  - Some filesystems don't care: Linux ext2fs

Recovery After Failure
- Need to ensure consistency:
  - Does the free bitmap match a tree walk?
  - Do reference counts in inodes match directory entries?
  - Do blocks appear in multiple inodes?
- This kind of recovery grows with disk size
- Clean shutdown – mark it as such, and no recovery is needed

Reducing Synchronous Times
- Write to faster storage
  - Nonvolatile memory – expensive, requires some additional OS/firmware support
- Write to a special disk or section – logging
  - Only have to examine the log when recovering
  - Eventually have to put the information in place
  - Some information dies in the log itself
- Write in a special order
  - Write metadata in a way that is consistent but possibly recovers less

Challenges
- The Unix filesystem has great flexibility
- Extent-based filesystems have speed
- Seeks kill performance – locality matters
- Bitmaps show contiguous free space
- Linked lists are easy to search
- How do you perform backup/restore?

Bigger, Faster, Stronger
- Making individual disks larger is hard
- Throw more disks at the problem:
  - Capacity increases
  - Effective access speed may increase
  - Probability of failure also increases
- Use some disks to provide redundancy
- Generally assume a fail-stop model
  - Fail-stop versus Byzantine failures

RAID (Redundant Array of Inexpensive Disks)
- Main idea:
  - Store the error-correcting codes on other disks
  - General error-correcting codes are too powerful; use XORs, i.e., single parity (sketched below)
  - Upon any failure, one can recover the entire block from the remaining disks using XORs
- Pros:
  - Reliability
  - High bandwidth
- Cons:
  - The controller is complex
[diagram: RAID controller computing XOR parity across the disks]

Synopsis of RAID Levels
- RAID Level 0: Non-redundant (JBOD)
- RAID Level 1: Mirroring
- RAID Level 2: Byte-interleaved, ECC
- RAID Level 3: Byte-interleaved, parity
- RAID Level 4: Block-interleaved, parity
- RAID Level 5: Block-interleaved, distributed parity

Did RAID Work?
- Performance: yes
- Reliability:
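Returning to the question on the multi-level indexed slide above: a small C sketch of mapping a logical block number onto the 13 inode pointers. The geometry is an assumption for illustration (512-byte blocks with 4-byte pointers, so 128 pointers per indirect block; the slide gives no numbers), and 23, 5, and 340 are read as logical block numbers:

    #include <stdio.h>

    /* Assumed geometry: 512-byte blocks, 4-byte block pointers,
     * so 128 pointers fit in one indirect block. */
    #define NDIRECT   10u
    #define PER_BLOCK 128u

    /* How many levels of indirection (0-3) does logical block 'n'
     * need? Returns -1 if n is beyond the triple-indirect limit. */
    int indirection_level(unsigned long n)
    {
        unsigned long single = PER_BLOCK;
        unsigned long dbl    = single * PER_BLOCK;
        unsigned long triple = dbl * PER_BLOCK;

        if (n < NDIRECT) return 0;    /* direct pointer in the inode */
        n -= NDIRECT;
        if (n < single)  return 1;    /* one extra indirect block    */
        n -= single;
        if (n < dbl)     return 2;    /* two extra indirect blocks   */
        n -= dbl;
        if (n < triple)  return 3;    /* three extra indirect blocks */
        return -1;                    /* file too big for this inode */
    }

    int main(void)
    {
        unsigned long probes[] = { 5, 23, 340 };
        for (int i = 0; i < 3; i++)
            printf("block %lu: %d extra read(s)\n",
                   probes[i], indirection_level(probes[i]));
        /* 5 -> 0 (direct), 23 -> 1 (single), 340 -> 2 (double):
         * each indirection level costs one more disk read and seek. */
        return 0;
    }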
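Finally, the XOR trick from the RAID slide as a short sketch; the 4-data-disk array and 512-byte block size are illustrative assumptions. The parity block is the byte-wise XOR of the data blocks, so any single lost block equals the XOR of the parity with all the survivors:

    #include <stdint.h>
    #include <string.h>

    #define NDATA      4        /* data disks (illustrative) */
    #define BLOCK_SIZE 512

    /* Compute the parity block: byte-wise XOR of the NDATA data blocks. */
    void compute_parity(const uint8_t data[NDATA][BLOCK_SIZE],
                        uint8_t parity[BLOCK_SIZE])
    {
        memset(parity, 0, BLOCK_SIZE);
        for (int d = 0; d < NDATA; d++)
            for (int b = 0; b < BLOCK_SIZE; b++)
                parity[b] ^= data[d][b];
    }

    /* Rebuild the block of failed disk 'lost' by XOR-ing the parity
     * block with every surviving data block: x ^ x = 0, so only the
     * lost block's contribution remains. */
    void reconstruct(const uint8_t data[NDATA][BLOCK_SIZE],
                     const uint8_t parity[BLOCK_SIZE],
                     int lost, uint8_t out[BLOCK_SIZE])
    {
        memcpy(out, parity, BLOCK_SIZE);
        for (int d = 0; d < NDATA; d++)
            if (d != lost)
                for (int b = 0; b < BLOCK_SIZE; b++)
                    out[b] ^= data[d][b];
    }

Note that a small write must also update parity (new parity = old parity XOR old data XOR new data, so the controller reads two blocks and writes two), which is one reason the slide calls the controller complex.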