UMass Amherst CS 677 - Fault Tolerance

CS677: Distributed OS, Computer Science, Lecture 18

Last Class: Fault tolerance
• Reliable communication
  – One-one communication
  – One-many communication
• Distributed commit
  – Two phase commit
  – Three phase commit
• Failure recovery
  – Checkpointing
  – Message logging

Recovery
• Techniques thus far allow failure handling
• Recovery: operations that must be performed after a failure to recover to a correct state
• Techniques:
  – Checkpointing:
    • Periodically checkpoint state
    • Upon a crash, roll back to a previous checkpoint with a consistent state

Coordinated Checkpointing
• Periodically checkpoint state
  – Take a distributed snapshot [discussed in Lec. 11]
• Upon a failure, roll back to the latest snapshot
  – All processes restart from the latest snapshot

Message Logging
• Checkpointing is expensive
  – All processes restart from previous consistent cut
  – Taking a snapshot is expensive
  – Infrequent snapshots => all computations after previous snapshot will need to be redone [wasteful]
• Combine checkpointing (expensive) with message logging (cheap)
  – Take infrequent checkpoints
  – Log all messages between checkpoints to local stable storage
  – To recover: simply replay messages from previous checkpoint
    • Avoids recomputations from previous checkpoint

Today: Distributed File Systems
• Overview of stand-alone (UNIX) file systems
• Issues in distributed file systems
• Next two classes: case studies of distributed file systems
  • NFS
  • Coda
  • xFS
  • Log-structured file systems (time permitting)

File System Basics
• File: named collection of logically related data
  – Unix file: an uninterpreted sequence of bytes
• File system:
  – Provides a logical view of data and storage functions
  – User-friendly interface
  – Provides facility to create, modify, organize, and delete files
  – Provides sharing among users in a controlled manner
  – Provides protection

Unix File System Review
• User file: linear array of bytes. No records, no file types
• Directory: special file not directly writable by user
• File structure: directed acyclic graph [directories may not be shared, files may be shared (why?)]
• Directory entry for each file
  – File name
  – inode number
  – Major device number
  – Minor device number
• All inodes are stored at a special location on disk [super block]
  – Inodes store file attributes and a multi-level index that has a list of disk block locations for the file

Inode Structure
• Fields
  – Mode
  – Owner_ID, group_id
  – Dir_file
  – Protection bits
  – Last access time, last write time, last inode time
  – Size, number of blocks
  – Ref_cnt
  – Address[0], … address[14]
• Multi-level index: 12 direct blocks, plus one single, one double, and one triple indirect block

Distributed File Systems
• File service: specification of what the file system offers
  – Client primitives, application programming interface (API)
• File server: process that implements file service
  – Can have several servers on one machine (UNIX, DOS, …)
• Components of interest
  – File service
  – Directory service

File Service
• Remote access model
  – Work done at the server
    • Stateful server (e.g., databases)
    • Consistent sharing (+)
    • Server may be a bottleneck (-)
    • Need for communication (-)
• Upload/download model
  – Work done at the client
    • Stateless server
    • Simple functionality (+)
    • Moves files/blocks, need storage (-)

System Structure: Server Type
• Stateless server
  – No information is kept at server between client requests
  – All information needed to service a request must be provided by the client with each request (what info?)
  – More tolerant to server crashes
• Stateful server
  – Server maintains information about client accesses
  – Shorter request messages
  – Better performance
  – Idempotency easier
  – Consistency is easier to achieve

NFS Architecture
• Sun's Network File System (NFS): widely used distributed file system
• Uses the virtual file system layer to handle local and remote files

NFS Operations
Operation   v3    v4    Description
Create      Yes   No    Create a regular file
Create      No    Yes   Create a nonregular file
Link        Yes   Yes   Create a hard link to a file
Symlink     Yes   No    Create a symbolic link to a file
Mkdir       Yes   No    Create a subdirectory in a given directory
Mknod       Yes   No    Create a special file
Rename      Yes   Yes   Change the name of a file
Rmdir       Yes   No    Remove an empty subdirectory from a directory
Open        No    Yes   Open a file
Close       No    Yes   Close a file
Lookup      Yes   Yes   Look up a file by means of a file name
Readdir     Yes   Yes   Read the entries in a directory
Readlink    Yes   Yes   Read the path name stored in a symbolic link
Getattr     Yes   Yes   Read the attribute values for a file
Setattr     Yes   Yes   Set one or more attribute values for a file
Read        Yes   Yes   Read the data contained in a file
Write       Yes   Yes   Write data to a file

Communication
[Figure] a) Reading data from a file in NFS version 3. b) Reading data using a compound procedure in version 4.
• Both versions use Open Network Computing (ONC) RPCs
  – One RPC per operation in NFS v3; multiple operations per RPC supported in v4

Naming: Mount Protocol
• NFS uses the mount protocol to access remote files
  – Mount protocol establishes a local name for remote files
  – Users access remote files using local names; OS takes care of the mapping

Naming: Crossing Mount Points
• Mounting nested directories from multiple servers
• NFS v3 does not support transitive exports (for security reasons)
  – NFS v4 allows clients to detect crossing of mount points, supports recursive lookups

Automounting
• Automounting: mount on demand

File Attributes (1)
• Some general mandatory file attributes in NFS.
  – NFS is modeled on Unix-like file systems
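
The recovery scheme on the Message Logging slide (take infrequent checkpoints, log every message in between, replay the log after a crash) can be sketched as below. This is a minimal single-process illustration, not code from the lecture: the class name `LoggedProcess`, its methods, and the choice of a running integer total as the "state" are all assumptions made for the example, and in-memory attributes stand in for stable storage. It also assumes message handling is deterministic, so that replaying the same messages in the same order rebuilds the same state.

```python
class LoggedProcess:
    """Toy process (hypothetical) whose state is a running total of integer messages."""

    def __init__(self):
        self.state = 0        # volatile state, lost on a crash
        self.checkpoint = 0   # last saved state (stands in for stable storage)
        self.log = []         # messages delivered since the last checkpoint

    def deliver(self, msg):
        self.log.append(msg)  # log the message first, then apply it
        self.state += msg

    def take_checkpoint(self):
        self.checkpoint = self.state
        self.log.clear()      # earlier messages are now covered by the checkpoint

    def crash(self):
        self.state = None     # simulate losing volatile state

    def recover(self):
        self.state = self.checkpoint
        for msg in self.log:  # replay logged messages in their original order
            self.state += msg


p = LoggedProcess()
p.deliver(5)
p.take_checkpoint()
p.deliver(2)
p.deliver(3)
p.crash()
p.recover()
print(p.state)
```

Only the two messages delivered after the checkpoint are replayed; the work done before the checkpoint is restored directly from the saved state.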

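The multi-level index on the Inode Structure slide (12 direct blocks plus one single, one double, and one triple indirect block) determines how many pointer lookups are needed to reach a given block of a file. The sketch below classifies a logical block number by index level. The function names and the default of 1024 pointers per indirect block (for instance, 4 KB blocks holding 4-byte pointers) are assumptions for illustration; the slides do not give a block or pointer size.

```python
NUM_DIRECT = 12  # from the slide: 12 direct block addresses


def index_level(block_no, ptrs_per_block=1024):
    """Return which part of the inode index covers logical block `block_no`."""
    if block_no < 0:
        raise ValueError("block number must be non-negative")
    if block_no < NUM_DIRECT:
        return "direct"
    block_no -= NUM_DIRECT
    if block_no < ptrs_per_block:
        return "single indirect"
    block_no -= ptrs_per_block
    if block_no < ptrs_per_block ** 2:
        return "double indirect"
    block_no -= ptrs_per_block ** 2
    if block_no < ptrs_per_block ** 3:
        return "triple indirect"
    raise ValueError("block number exceeds what the index can address")


def max_blocks(ptrs_per_block=1024):
    """Total number of data blocks addressable through one inode."""
    p = ptrs_per_block
    return NUM_DIRECT + p + p ** 2 + p ** 3


for n in (0, 11, 12, 12 + 1024, 12 + 1024 + 1024 ** 2):
    print(n, index_level(n))
```

Small files are reached entirely through the direct addresses, while the triple indirect block dominates the maximum file size.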

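The Communication slide contrasts one RPC per operation (NFS v3) with a compound procedure carrying several operations in one request (NFS v4). The toy model below counts request messages for a lookup followed by a read under each style. It is an in-memory sketch only: `ToyServer`, its method names, and the use of the file name as a file handle are hypothetical, and nothing here reflects the real NFS wire format or an ONC RPC library.

```python
class ToyServer:
    """In-memory stand-in for a file server; counts request messages received."""

    def __init__(self, files):
        self.files = files    # file name -> bytes
        self.requests = 0

    # One operation per request, in the style of NFS v3.
    def lookup(self, name):
        self.requests += 1
        if name not in self.files:
            raise FileNotFoundError(name)
        return name           # the name doubles as a toy file handle

    def read(self, handle, offset, count):
        self.requests += 1
        return self.files[handle][offset:offset + count]

    # Several operations in one request, in the style of an NFS v4 compound.
    def compound(self, ops):
        self.requests += 1
        handle, results = None, []
        for op, *args in ops:
            if op == "lookup":
                (name,) = args
                if name not in self.files:
                    raise FileNotFoundError(name)
                handle = name
                results.append(handle)
            elif op == "read":
                if handle is None:
                    raise ValueError("read requires a preceding lookup")
                offset, count = args
                results.append(self.files[handle][offset:offset + count])
            else:
                raise ValueError(f"unsupported operation: {op}")
        return results


v3 = ToyServer({"notes.txt": b"hello world"})
h = v3.lookup("notes.txt")
data_v3 = v3.read(h, 0, 5)

v4 = ToyServer({"notes.txt": b"hello world"})
data_v4 = v4.compound([("lookup", "notes.txt"), ("read", 0, 5)])[1]

print(v3.requests, v4.requests, data_v3 == data_v4)
```

Note that each `read` in the one-operation style carries the handle, offset, and count, which is the kind of per-request information a stateless server needs from the client.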