Distributed File System
Presented by Amir Khella

Presentation layout
  - Terminologies
  - Naming and transparency
  - Semantics of sharing
  - Remote access methods
  - Fault tolerance
  - Scalability
  - Example systems: Sun NFS and Andrew AFS

Introduction and Definitions
  - Goal: allow users of physically distributed computers to share data and storage resources by using a common file system.
  - Service: a software entity running on one or more machines and providing a particular type of function to a priori unknown clients.
  - Server: the service software running on a single machine.
  - Client: a process that can invoke a service using a set of operations that form the client interface.

Introduction (cont'd)
  - Different configurations of DFS:
      - Servers run on dedicated machines.
      - A machine can be both a client and a server.
  - Distinctive features are the multiplicity and autonomy of clients and servers in the system.

Trends and terminologies
  - Multiplicity and dispersion of servers should be transparent to clients.
  - Network transparency: clients should be able to access remote files using the same set of file operations applicable to local files.
  - User mobility: users can log in to any machine in the system.
  - Performance: measured as the amount of time needed to satisfy a service request.
  - Fault tolerance: communication faults, storage device crashes, decay of storage media.
  - Scalability: the capability of the system to adapt to increased service load.

Naming and transparency
  - Naming: the mapping between logical and physical objects.
  - In a transparent DFS, the system should hide where in the network the file is located.
  - Naming and transparency involves:
      - Location transparency and independence
      - Naming schemes
      - Implementation techniques

Location Transparency and Independence
  - Location transparency: the name of a file should not reveal any hint as to its physical storage location.
  - Location independence: the name of a file need not be changed when the file's physical location changes (a change also known as file migration or file mobility).
  - Aspects that differentiate the two concepts:
      - Divorcing data from location is exhibited by location independence.
      - Sharing data is provided by location transparency.
      - Location independence separates the naming hierarchy from the storage-device hierarchy and the inter-server structure.

Naming schemes
  Three main approaches:
  - Files are named by combining their host name and local name.
      - Unique system-wide names
      - Network transparent: the same file operations are used for local and remote files
      - Neither location independent nor location transparent
  - Machines attach (mount) remote directories to their local name space.
      - The shared name space may not be identical on all machines.

Naming schemes (cont'd)
  - A single global name structure spans all the files in the system.
      - The composed file system structure should be isomorphic to the structure of a conventional file system.
      - Many special files can make this ideal difficult to attain; e.g., I/O devices in Unix are treated as files.

Implementation Schemes
  - Pathname translation: the mapping of textual names to low-level identifiers is typically done by a recursive lookup procedure based on the one used in conventional Unix.
  - Structured identifiers:
      - Sets of files are aggregated into component units, and the mapping is provided on a component-unit basis rather than a single-file basis.
      - They are bit strings that usually have two parts:
          - the component unit to which the file belongs
          - the file identifier within the unit
      - These identifiers are location independent and hence can be replicated and cached freely without being invalidated by migration of component units.

Implementation Schemes (cont'd)
  - Hints: a hint is a piece of information that speeds up performance if it is correct and has no negative semantic effect if it is incorrect.
  - Mount mechanism: joining remote file systems to create a global naming structure.
      - A mount operation binds the root of one file system to a directory of another file system.
      - The directory that glues the two file systems together is called a mount point.
      - Mount operations are recorded by the operating system kernel in a mount table.

Semantics of Sharing
  - File session: a series of file accesses made by a client to the same file, always enclosed between the Open and Close operations.
  - Unix semantics
  - Session semantics
  - Immutable shared files semantics
  - Transaction-like semantics

Unix Semantics
  - Every Read of a file sees the effects of all previous Writes performed on that file.
  - Writes to an open file by a client are immediately visible to other clients that have the file open at the same time.
  - Clients can share the pointer to the current location in the file; advancing the pointer by one client affects all sharing clients.

Session semantics
  - Writes to an open file are visible immediately to local clients but are invisible to remote clients that have the same file open simultaneously.
  - Once a file is closed, the changes made to it are visible only in sessions that start later.
  - Already open instances of the file do not reflect these changes.

Immutable shared files semantics
  - Once a file is declared as shared by its creator, it cannot be modified any more.
  - An immutable file has two important properties:
      - Its name may not be reused.
      - Its contents may not be altered.

Transaction-like semantics
  - The effect of file sessions on a file, and their output, are equivalent to the effect and output of executing the same sessions in some serial order.
  - Locking a file for the duration of a session implements these semantics.

Remote access methods
  Two complementary methods handle data transfer when a client requests access to a remote file on a server:
  - Remote service: requests for access are delivered to the server, which performs them and returns the results to the client.
  - Caching: a request for remote access brings a local copy of the data blocks to the client side.
      - Usually the amount of data brought over is much larger than the data needed.
      - Caching works best when the stream of file accesses exhibits locality of reference.
      - One problem arises: cache consistency.

Designing a caching scheme
  A design should address the following decisions:
  - Granularity of cached data
  - Location of the client's cache: main memory vs. local disk
  - How to propagate modifications
  - How to determine whether the client's cached copy is consistent

Cache Unit Size (Granularity)
  - Increasing the cache unit increases the likelihood that data for the next access will be found locally.
  - But the time to transfer data, and consistency, become problems.

  - Simplest policy: write-through
      - Reliable
      - Poor write performance

  - Workstations without disks
  - Fast
  - Server caches will be in
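
The mount mechanism from the Implementation Schemes slides can be illustrated with a small sketch. It is not how NFS or AFS implement mounting: the `MountTable` class, its method names, and the server and path names are all hypothetical, chosen only to show a mount point binding a remote root into a local name space and a lookup consulting the mount table.

```python
import posixpath


class MountTable:
    """Toy mount table (hypothetical): mount point -> (server, remote root)."""

    def __init__(self):
        self._mounts = {}

    def mount(self, mount_point, server, remote_root):
        # Bind the root of a remote file system to a local directory.
        self._mounts[posixpath.normpath(mount_point)] = (server, remote_root)

    def resolve(self, path):
        """Return (server, path_on_server); server is None for local files."""
        path = posixpath.normpath(path)
        prefix = path
        # Walk from the full path up toward the root so that the deepest
        # enclosing mount point wins.
        while True:
            if prefix in self._mounts:
                server, remote_root = self._mounts[prefix]
                rest = path[len(prefix):].lstrip("/")
                if rest:
                    return server, posixpath.join(remote_root, rest)
                return server, remote_root
            if prefix in ("/", ""):
                return None, path  # no mount point on the way: a local file
            prefix = posixpath.dirname(prefix)


table = MountTable()
table.mount("/usr/shared", "serverA", "/export/shared")
print(table.resolve("/usr/shared/doc/a.txt"))
print(table.resolve("/home/user/notes.txt"))
```

Because each machine keeps its own table, two machines that mount the same remote directory at different mount points see different names for the same file, which is why the shared name space may not be identical everywhere.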
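
The claim that structured identifiers survive migration can be made concrete with a minimal sketch. The names here (`FileId`, `NameService`, `place_unit`, the server names) are illustrative assumptions, not part of any real system. The idea shown is that a file identifier names a component unit and a file within it, while a separate table maps each unit to its current server.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    """Structured identifier: component unit plus file number within the unit."""
    unit: int
    file_no: int


class NameService:
    """Hypothetical name service with a per-unit location table."""

    def __init__(self):
        self.names = {}          # textual name -> FileId
        self.unit_location = {}  # component unit -> server currently holding it

    def create(self, name, unit, file_no):
        fid = FileId(unit, file_no)
        self.names[name] = fid
        return fid

    def place_unit(self, unit, server):
        # Used for initial placement and for migration alike:
        # only the unit-level mapping changes, never a FileId.
        self.unit_location[unit] = server

    def locate(self, fid):
        return self.unit_location[fid.unit]


ns = NameService()
ns.place_unit(7, "serverA")
cached = ns.create("/proj/report", unit=7, file_no=42)  # a client caches this id
ns.place_unit(7, "serverB")                             # the whole unit migrates
print(cached == ns.names["/proj/report"], ns.locate(cached))
```

The identifier cached by the client is still valid after the migration; only the small unit-to-server table had to be updated.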
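
Session semantics, as described in the slides, can be sketched by giving each session a private copy of the file at Open and publishing it at Close. This is a simplified model under stated assumptions: `ServerFile` and `Session` are hypothetical names, each session stands for one client machine, and concurrent closes simply overwrite each other.

```python
class ServerFile:
    """Toy server-side file (hypothetical) holding the published contents."""

    def __init__(self, data=b""):
        self.data = data

    def open(self):
        return Session(self)


class Session:
    """One file session: all accesses between Open and Close."""

    def __init__(self, server_file):
        self._file = server_file
        self._copy = bytearray(server_file.data)  # private snapshot taken at Open

    def write(self, offset, payload):
        end = offset + len(payload)
        if end > len(self._copy):
            self._copy.extend(b"\0" * (end - len(self._copy)))
        self._copy[offset:end] = payload  # visible only inside this session

    def read(self):
        return bytes(self._copy)

    def close(self):
        # Changes become visible only to sessions that start after this point.
        self._file.data = bytes(self._copy)


f = ServerFile(b"hello")
a, b = f.open(), f.open()
a.write(0, b"J")
a.close()
print(b.read(), f.open().read())
```

Under Unix semantics, by contrast, session `b` would have observed the write as soon as it was made, because all clients would operate on a single shared image of the file.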
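
The write-through policy mentioned above (reliable, but with poor write performance) can be sketched as a block cache that forwards every write to the server immediately. The `Server` and `WriteThroughCache` classes and their counters are hypothetical, introduced only to make the cost of each operation visible.

```python
class Server:
    """Toy block server (hypothetical) that counts the requests it receives."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0
        self.writes = 0

    def read_block(self, n):
        self.reads += 1
        return self.blocks.get(n, b"")

    def write_block(self, n, data):
        self.writes += 1
        self.blocks[n] = data


class WriteThroughCache:
    """Client cache: reads are served locally, writes always reach the server."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, n):
        if n not in self.cache:
            self.cache[n] = self.server.read_block(n)  # miss: fetch once
        return self.cache[n]

    def write(self, n, data):
        self.cache[n] = data
        self.server.write_block(n, data)  # propagate immediately (write-through)


server = Server()
cache = WriteThroughCache(server)
cache.write(0, b"v1")
cache.write(0, b"v2")
cache.read(0)
print(server.writes, server.reads, server.blocks[0])
```

Every write costs a server round trip, which explains the poor write performance, but a client crash never loses a completed write, which is the reliability the slide refers to. Reads of recently written blocks are satisfied from the cache.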