File system funThe medium is the messageDisk vs. MemoryDisk reviewSome useful trendsFiles: named bytes on diskWhat's hard about grouping blocks?FS vs. VMSome working intuitionsCommon addressing patternsProblem: how to track file's dataStraw man: contiguous allocationStraw man: contiguous allocationLinked filesLinked filesExample: DOS FS (simplified)FAT discussionFAT discussionIndexed filesIndexed filesIndexed filesMulti-level indexed files (old BSD FS)Old BSD FS discussionMore about inodesDirectoriesA short history of directoriesHierarchical UnixNaming magicUnix example: exttt {/a/b/c.c}Default context: working directoryHard and soft links (synonyms)Case study: speeding up FSA plethora of performance costsProblem: Internal fragmentationSolution: fragmentsClustering related objects in FFSClustering in FFSWhat does a cyl. group look like?Finding space for related objsUsing a bitmapSo what did we gain?Other hacksFile system fun• File systems = the hardest part of OS- More papers on FSes than any other single topic• Main tasks of file system:- Don’t go away (ever)- Associate bytes with name (files)- Associate names with each other (directories)- Can implement file systems on disk, over network, in memory,in non-volatile ram (NVRAM), on tape, w/ paper.- We’ll focus on disk and generalize later• Today: files, directories, and a bit of performance1/38The medium is the mes s age• Disk = First thing we’ve seen that doesn’t go away- So: Where all important state ultimately resides• Slow (ms access vs ns for memory)• Huge (100–1,000x bigger than memory)- How to organize large collection of ad hoc information?- Taxonomies! (Basically FS = general way to make these)2/38Disk vs. MemoryMLC NANDDisk Flash DRAMSmallest write sector sector byteAtomic write sector sector byte/wordRandom read 8 ms 75 µs 50 nsRandom write 8 ms 300 µs* 50 nsSequential read 100 MB/s 250 MB/s > 1 GB/sSequential write 100 MB/s 170 MB/s* > 1 GB/sCost $.08–1/GB $3/GB $10-25/GBPersistence Non-volatile Non-volatile Volatile*Flash write performance degrades over time3/38Disk revie w• Disk reads/writes in terms of sectors, not bytes- Read/write single sector or adjacent groups• How to write a single byte? “Read-modify-write”- Read in sector containing the by te- Modify that byte- Write entire sector back to disk- Key: if cached, don’t need to read in• Sector = unit of atomicity.- Sector write done completely, even if crash in middle(disk saves up enough momentum to complete)• Larger atomic units have to be synthesized by OS4/38Some useful trends• Disk bandwidth and cost/bit improving exponentially- Similar to CPU speed, memory size, etc.• Seek time and rotational delay improving very slowly- Why? require moving physical object (disk arm)• Disk accesses a huge system bottleneck & getting worse- Bandwidth increase lets system (pre-)fetch large chunks for about thesame cost as small chunk.- Trade bandwidth for latency if you can get lots of related stuff.- How to get related stuff? Cluster together on disk• Memory size increasing faster than typical workload size- More and more of workload fits in file cache- Disk traffic changes: mostly writes and new data5/38Files: named bytes on disk• File abstraction:- User’s view: named sequence of bytes- FS’s view: collection of disk blocks- File system’s job: translate name & offset to disk blocks:{file, offset}−−→ FS −→disk address• File operations:- Create a file, delete a file- Read from file, write to file• Want: operations to have as few disk accesses aspossible & have minimal space overhead6/38What’s hard about grouping blocks?• Like page tables, file system meta data are simplydata structures used to construct mappings- Page table: map virtual page # to physical page #23−−−−−−−−−→ Page table −−−−−−−−−→33- File meta data: map byte offset to disk block address418−−−−−−−−→ Unix inode −−−−−→8003121- Directory: map name to disk address or file #foo.c−−−−−−−→ directory −−−−−−−−−→447/38FS vs. VM• In both settings, want location transparency• In some ways, FS has easier job than than VM:- CPU time to do FS mappings not a big deal (= no TLB)- Page tables deal with sparse address spaces and random access,files often denser (0 . . . filesize − 1) & ∼sequentially accessed• In some ways FS’s problem is harder:- Each layer of translation = potential disk access- Space a huge premium! (But disk is huge?!?!) Reason?Cache space never enough; amou nt of data you can get in onefetch never enough- Range very extreme: Many files <10 KB, some files many GB8/38Some working intuitions• FS performance dominated by # of disk accesses- Each access costs ∼10 milliseconds- Touch the disk 100 extra times = 1 second- Can easily do 100s of millions of ALU ops in same time• Access cost dominated by movement, not transfer:seek time + rotational delay + # bytes/disk-bw- Can get 50x the data for only ∼3% more overhead- 1 sector: 10ms + 8ms + 10µs(= 512 B/(50 MB/s))≈ 18ms- 50 sectors: 10ms + 8ms + .5ms = 18.5ms• Observations that might be helpful:- All blocks in file tend to be used together, sequentially- All files in a directory tend to be used tog ether- All names in a directory tend to be used together9/38Common addressing patterns• Sequential:- File data processed in sequential order- By far the most common mode- Example: editor writes out new file, compiler reads in file, etc• Random access:- Address any block in file directly without passing throughpredecessors- Examples: data set for demand paging, databases• Keyed access- Search for block with particular values- Examples: associative data base, index- Usually not provided by OS10/38Problem: how to track file’s data• Disk management:- Need to keep track of where file contents are on disk- Must be able to use this to map byte offset to disk block- Structure tracking a file’s sectors is called an index node or inode- File descriptors must be stored on disk, too• Things to keep in mind while designing filestructure:- Most files are small- Much of the disk is allocated to large files- Many of the I/O operations are made to large files- Want good sequential and good random access(what do these require?)11/38Straw man: contiguous allocatio n• “Extent-based”: allocate files like segmented memory- When creating a file, make the user specify
View Full Document