Sun’s VFS and NFS

• A Typical Unix File Tree
• Filesystems
• VFS: the Filesystem Switch
• Vnodes
• Vnode Operations and Attributes
• V/Inode Cache
• Continuum of Distributed Systems
• Network File System (NFS)
• NFS Protocol
• NFS Vnodes
• File Handles
• Pathname Traversal
• From Servers to Services
• NFS: From Concept to Implementation
• NFS as a “Stateless” Service
• Drawbacks of a Stateless Service
• The Synchronous Write Problem
• Speeding Up NFS Writes
• The Retransmission Problem
• Solutions to the Retransmission Problem
• File Cache Consistency

A Typical Unix File Tree

Each volume is a set of directories and files; a host’s file tree is the set of directories and files visible to processes on a given host.

File trees are built by grafting volumes from different devices or from network servers.

[Figure: example file tree. Labels: /, tmp, usr, etc, bin, vmunix, ls, sh, project, users, packages, (volume root), tex, emacs, mount point.]

In Unix, the graft operation is the privileged mount system call, and each volume is a filesystem.

mount (coveredDir, volume)
   coveredDir: directory pathname
   volume: device specifier or network volume
   volume root contents become visible at pathname coveredDir

Filesystems

Each file volume (filesystem) has a type, determined by its disk layout or the network protocol used to access it.
Examples of types: ufs (ffs), lfs, nfs, rfs, cdfs, etc.

Filesystems are administered independently.

Modern systems also include “logical” pseudo-filesystems in the naming tree, accessible through the file syscalls.
   procfs: the /proc filesystem allows access to process internals.
   mfs: the memory file system is a memory-based scratch store.

Processes access filesystems through common system calls.

VFS: the Filesystem Switch

[Figure: layered diagram. User space sits above the syscall layer (file, uio, etc.), which sits above the Virtual File System (VFS). Below VFS are the filesystem modules (NFS, FFS, LFS, *FS, etc.), which rest on the network protocol stack (TCP/IP) and the device drivers.]

Sun Microsystems introduced the virtual file system interface in 1985 to accommodate diverse filesystem types cleanly.

VFS allows diverse specific file systems to coexist in a file tree, isolating all FS-dependencies in pluggable filesystem modules.

VFS was an internal kernel restructuring with no effect on the syscall interface.

Incorporates object-oriented concepts: a generic procedural interface with multiple implementations.

Based on abstract objects with dynamic method binding by type... in C.

Other abstract interfaces in the kernel: device drivers, file objects, executable files, memory objects.

Vnodes

In the VFS framework, every file or directory in active use is represented by a vnode object in kernel memory.

[Figure: the syscall layer above NFS and UFS vnodes, with a list of free vnodes.]

Each vnode has a standard file attributes struct.
Vnode operations are macros that vector to filesystem-specific procedures.
Generic vnode points at filesystem-specific struct (e.g., inode, rnode), seen only by the filesystem.
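The idea of vnode operations as macros that vector to filesystem-specific procedures can be sketched in a few lines of C. This is a toy illustration, not Sun's or BSD's actual definitions: the struct layouts, the two-entry operations table, the `toy_inode` and `toy_rnode` types, and `generic_file_size` are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vnode;   /* forward declaration */

/* Hypothetical, pared-down operations vector; real kernels have many more entries. */
struct vnodeops {
    int (*vop_getattr)(struct vnode *vp, long *size_out);
    const char *(*vop_fsname)(struct vnode *vp);
};

/* Generic vnode: a pointer to the ops table plus an opaque per-filesystem pointer. */
struct vnode {
    const struct vnodeops *v_op;   /* filesystem-specific procedures */
    void *v_data;                  /* e.g., inode or rnode; opaque to generic code */
};

/* Vnode operations as macros that vector through the ops table. */
#define VOP_GETATTR(vp, sizep) ((vp)->v_op->vop_getattr((vp), (sizep)))
#define VOP_FSNAME(vp)         ((vp)->v_op->vop_fsname((vp)))

/* Toy local filesystem: the size lives in an in-memory inode. */
struct toy_inode { long i_size; };

static int ufs_getattr(struct vnode *vp, long *size_out) {
    struct toy_inode *ip = vp->v_data;
    *size_out = ip->i_size;
    return 0;
}
static const char *ufs_fsname(struct vnode *vp) { (void)vp; return "ufs"; }
static const struct vnodeops ufs_vnodeops = { ufs_getattr, ufs_fsname };

/* Toy remote filesystem: the size is whatever the client last cached. */
struct toy_rnode { long r_cached_size; };

static int nfs_getattr(struct vnode *vp, long *size_out) {
    struct toy_rnode *rp = vp->v_data;
    *size_out = rp->r_cached_size;
    return 0;
}
static const char *nfs_fsname(struct vnode *vp) { (void)vp; return "nfs"; }
static const struct vnodeops nfs_vnodeops = { nfs_getattr, nfs_fsname };

/* Generic code: never looks inside v_data, only calls through the macros. */
static long generic_file_size(struct vnode *vp) {
    long size = -1;
    assert(vp != NULL && vp->v_op != NULL);
    if (VOP_GETATTR(vp, &size) != 0)
        return -1;
    return size;
}
```

A caller would build a `struct vnode` with `&ufs_vnodeops` or `&nfs_vnodeops` and pass it to `generic_file_size`; the same call site reaches different procedures depending on the vnode's type, which is the "dynamic method binding by type... in C" mentioned earlier.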
Each specific file system maintains a cache of its resident vnodes.

Vnode Operations and Attributes

directories only
   vop_lookup (OUT vpp, name)
   vop_create (OUT vpp, name, vattr)
   vop_remove (vp, name)
   vop_link (vp, name)
   vop_rename (vp, name, tdvp, tvp, name)
   vop_mkdir (OUT vpp, name, vattr)
   vop_rmdir (vp, name)
   vop_symlink (OUT vpp, name, vattr, contents)
   vop_readdir (uio, cookie)
   vop_readlink (uio)

files only
   vop_getpages (page**, count, offset)
   vop_putpages (page**, count, sync, offset)
   vop_fsync ()

vnode attributes (vattr)
   type (VREG, VDIR, VLNK, etc.)
   mode (9+ bits of permissions)
   nlink (hard link count)
   owner user ID
   owner group ID
   filesystem ID
   unique file ID
   file size (bytes and blocks)
   access time
   modify time
   generation number

generic operations
   vop_getattr (vattr)
   vop_setattr (vattr)
   vhold()
   vholdrele()

CPS 210

V/Inode Cache

[Figure: a hash table indexed by HASH(fsid, fileid), and the VFS free list head.]

Active vnodes are reference-counted by the structures that hold pointers to them.
 - system open file table
 - process current directory
 - file system mount points
 - etc.

Each specific file system maintains its own hash of vnodes (BSD).
 - specific FS handles initialization
 - free list is maintained by VFS

vget(vp): reclaim cached inactive vnode from VFS free list
vref(vp): increment reference count on an active vnode
vrele(vp): release reference count on a vnode
vgone(vp): vnode is no longer valid (file is removed)

Continuum of Distributed Systems
[Figure: a spectrum running from small and fast to big and slow, with "?" at each end. Points along it: Multiprocessors, clusters (GMS), LAN (NFS), Global Internet.]

Parallel Architectures (CPS 221)
   low latency
   high bandwidth
   secure, reliable interconnect
   no independent failures
   coordinated resources

Networks (CPS 214)
   high latency
   low bandwidth
   autonomous nodes
   unreliable network
   fear and distrust
   independent failures
   decentralized administration

Toward the small, fast end: fast network, trusting hosts, coordinated.
Toward the big, slow end: slow network, untrusting hosts, autonomy.

Issues:
   naming and sharing
   performance and scale
   resource management

Network File System (NFS)

[Figure: on the client, user programs call into the syscall layer and VFS, which dispatches to UFS or to the NFS client. The NFS client talks over the network to the NFS server, which sits alongside the server's own syscall layer, VFS, and UFS.]

NFS Protocol

NFS is a network protocol layered above TCP/IP.
• Original implementations (and most today) use UDP datagram transport for low overhead.
   Maximum IP datagram size was increased to match FS block size, to allow send/receive of entire file blocks.
   Some newer implementations use TCP as a transport.
• The NFS protocol is a set of message formats and types.
   Client issues a request message for a service operation.
   Server performs requested operation and returns a reply message with status and (perhaps) requested data.

NFS Vnodes

[Figure: client-side stack with the syscall layer, VFS, UFS, and the NFS client stubs (nfs_vnodeops) operating on an nfsnode, connected by RPC/UDP over the network to the NFS server.]

The nfsnode holds information needed to interact with the server to operate on the file.

struct nfsnode* np = VTONFS(vp);

The NFS protocol has an operation type for (almost) every vnode operation, with similar arguments/results.

File Handles

Question: how does the client tell the server which file or directory the operation applies to?
• Similarly, how does the server return the result of a lookup?
   More generally, how to pass a pointer or an object reference as an argument/result of an RPC call?

In NFS, the reference is a file handle or fhandle, a 32-byte token/ticket whose value is determined by the server.
• Includes all information needed to identify the file/object on the server, and get a pointer to it quickly.

[ volume ID | inode # | generation # ]
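A minimal sketch of how a server might build and check such a handle, assuming the three fields named above are packed at the front of the 32 bytes. The field widths, `struct fh_fields`, `fh_make`, `fh_to_inode`, and the toy inode table are hypothetical; real servers choose their own layout, and the client treats the bytes as opaque. The generation check reflects the usual role of a generation number (detecting an inode that has been recycled for a different file), which the slide itself does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FHSIZE 32   /* handle size stated in the text */

/* What the client sees: 32 opaque bytes to store and send back. */
struct fhandle { unsigned char data[FHSIZE]; };

/* Hypothetical server-side view of the handle contents. */
struct fh_fields {
    uint32_t volume_id;
    uint32_t inode_no;
    uint32_t generation;
};

/* Server: pack the identifying fields into an opaque handle. */
static struct fhandle fh_make(uint32_t vol, uint32_t ino, uint32_t gen) {
    struct fhandle fh;
    struct fh_fields f = { vol, ino, gen };
    memset(fh.data, 0, sizeof fh.data);   /* unused bytes are zero padding */
    memcpy(fh.data, &f, sizeof f);
    return fh;
}

/* Toy in-memory inode table standing in for one volume on the server. */
struct toy_inode { uint32_t generation; int in_use; };
#define NINODES 8
static struct toy_inode inode_table[NINODES];

/* Server: map a handle back to an inode, or NULL if the handle is stale. */
static struct toy_inode *fh_to_inode(const struct fhandle *fh, uint32_t my_volume) {
    struct fh_fields f;
    memcpy(&f, fh->data, sizeof f);
    if (f.volume_id != my_volume || f.inode_no >= NINODES)
        return NULL;                      /* wrong volume or out of range */
    struct toy_inode *ip = &inode_table[f.inode_no];
    if (!ip->in_use || ip->generation != f.generation)
        return NULL;                      /* inode was freed or reused */
    return ip;
}
```

Because the inode number indexes the table directly, the server reaches the object without a pathname lookup, which is one way to read "get a pointer to it quickly".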
Pathname Traversal

When a pathname is passed as an argument to a system call, the syscall layer must “convert it to a vnode”.

Pathname traversal is a sequence of