Duke CPS 212 - Petal and Frangipani

Petal and Frangipani

[Figure: layer diagrams labeled Petal/Frangipani, Petal, Frangipani, NFS, “SAN”, “NAS”]

Diagram annotations:
• Untrusted
• OS-agnostic
• FS semantics
• Sharing/coordination
• Disk aggregation (“bricks”)
• Filesystem-agnostic
• Recovery and reconfiguration
• Load balancing
• Chained declustering
• Snapshots
• Does not control sharing

Each “cloud” may resize or reconfigure independently.
What indirection is required to make this happen, and where is it?

Remaining Slides

The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at www.thekkath.org.

For CPS 212, several issues are important:
• Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).
• Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
• Understand the similarities/differences between Petal and the other reconfigurable cluster service work we have studied: DDS and Porcupine.
• Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
• Understand the nature, purpose, and role of the three key design elements added for Frangipani: leased locks, a write-ownership consistent caching protocol, and server logging for recovery.

Petal: Distributed Virtual Disks

Systems Research Center
Digital Equipment Corporation
Edward K. Lee
Chandramohan A. Thekkath
10/24/2002

Logical System View

[Figure: file systems (AdvFS, NT FS, PC FS, UFS) reach virtual disks /dev/vdisk1 through /dev/vdisk5 over a Scalable Network; the virtual disks are provided by Petal.]

Physical System View

[Figure: four Petal Servers on a Scalable Network; a Parallel Database or Cluster File System uses /dev/shared1.]

Virtual Disks

Each disk provides a 2^64-byte address space.
Created and destroyed on demand.
Allocates disk storage on demand.
Snapshots via copy-on-write.
Online incremental reconfiguration.

Virtual to Physical Translation

[Figure labels: (vdiskID, offset); Virtual Disk Directory; GMap; (server, disk, diskOffset); PMap0, PMap1, PMap2, PMap3 on Server 0, Server 1, Server 2, Server 3; (disk, diskOffset).]

Global State Management

Based on Leslie Lamport’s Paxos algorithm.
Global state is replicated across all servers.
Consistent in the face of server & network failures.
A majority is needed to update global state.
Any server can be added/removed in the presence of failed servers.

Fault-Tolerant Global Operations

Create/Delete virtual disks.
Snapshot virtual disks.
Add/Remove servers.
Reconfigure virtual disks.

Data Placement & Redundancy

Supports non-redundant and chained-declustered virtual disks.
Parity can be supported if desired.
Chained-declustering tolerates any single component failure.
Tolerates many common multiple failures.
Throughput scales linearly with additional servers.
Throughput degrades gracefully with failures.

Chained Declustering

Server0: D0 D3 D4 D7
Server1: D1 D0 D5 D4
Server2: D2 D1 D6 D5
Server3: D3 D2 D7 D6

Chained Declustering

Server0: D0 D3 D4 D7
Server1: D1 D0 D5 D4 (drawn apart from the other servers in this figure)
Server2: D2 D1 D6 D5
Server3: D3 D2 D7 D6

The Prototype

Digital ATM network.
  • 155 Mbit/s per link.
8 AlphaStation Model 600.
  • 333 MHz Alpha running Digital Unix.
72 RZ29 disks.
  • 4.3 GB, 3.5 inch, fast SCSI (10 MB/s).
  • 9 ms avg. seek, 6 MB/s sustained transfer rate.
Unix kernel device driver.
User-level Petal servers.

The Prototype

[Figure: hosts src-ss1, src-ss2, … src-ss8 and servers petal1, petal2, … petal8 connected by the Digital ATM Network (AN2); each host sees /dev/vdisk1.]

Throughput Scaling

[Chart: Throughput Scale-up versus Number of Servers (both axes 0 to 8). Series: LINEAR, 512B Rd, 8KB Rd, 64KB Rd, 512B Wr, 8KB Wr, 64KB Wr.]

Virtual Disk Reconfiguration

[Chart: Throughput in MB/s (0 to 30) versus Elapsed Time in Minutes (0 to 6). Series: 6 servers, 8 servers. Virtual disk w/ 1GB of allocated storage; 8KB reads & writes.]

Frangipani: A Scalable Distributed File System

C. A. Thekkath, T. Mann, and E. K. Lee
Systems Research Center
Digital Equipment Corporation

Why Not An Old File System on Petal?

Traditional file systems (e.g., UFS, AdvFS) cannot share a block device
The machine that runs the file system can become a bottleneck

Frangipani

Behaves like a local file system
  • multiple machines cooperatively manage a Petal disk
  • users on any machine see a consistent view of data
Exhibits good performance, scaling, and load balancing
Easy to administer

Ease of Administration

Frangipani machines are modular
  • can be added and deleted transparently
Common free space pool
  • users don’t have to be moved
Automatically recovers from crashes
Consistent backup without halting the system

Components of Frangipani

File system core
  • implements the Digital Unix vnode interface
  • uses the Digital Unix Unified Buffer Cache
  • exploits Petal’s large virtual space
Locks with leases
Write-ahead redo log

Locks

Multiple reader/single writer
Locks are moderately coarse-grained
  • protects entire file or directory
Dirty data is written to disk before lock is given to another machine
Each machine aggressively caches locks
  • uses lease timeouts for lock recovery

Logging

Frangipani uses a write-ahead redo log for metadata
  • log records are kept on Petal
Data is written to Petal
  • on sync, fsync, or every 30 seconds
  • on lock revocation or when the log wraps
Each machine has a separate log
  • reduces contention
  • independent recovery

Recovery

Recovery is initiated by the lock service
Recovery can be carried out on any machine
  • log is distributed and available via
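
The Logging and Recovery slides describe a per-machine write-ahead redo log for metadata that any machine can replay. The following is a minimal sketch of that idea, not Frangipani's actual code. The names `LogRecord`, `MachineLog`, and `replay` are hypothetical, and the per-block version number used to make replay idempotent is an assumption of this sketch that the slides do not state.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    block: str    # hypothetical name of a metadata block
    version: int  # version the block carries after this update (assumed scheme)
    value: str    # new contents of the block


@dataclass
class MachineLog:
    """One machine's private redo log; the slides say each machine has its own."""
    records: list = field(default_factory=list)

    def append(self, block: str, version: int, value: str) -> None:
        # Write-ahead: the record is logged before the block itself is updated.
        self.records.append(LogRecord(block, version, value))


def replay(log: MachineLog, disk: dict) -> int:
    """Redo logged updates that are not yet reflected on disk.

    `disk` maps block name -> (version, value). Returns how many records
    were applied. Replaying twice applies nothing the second time.
    """
    applied = 0
    for rec in log.records:
        current_version, _ = disk.get(rec.block, (0, None))
        if rec.version > current_version:
            disk[rec.block] = (rec.version, rec.value)
            applied += 1
    return applied
```

Because `replay` only needs the log and the shared disk state, it could be run by any surviving machine on behalf of a crashed one, which is the property the Recovery slide emphasizes.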

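The Locks slide lists multiple-reader/single-writer locks whose recovery relies on lease timeouts. A rough sketch of how such a lock table might behave follows. `LeaseLockService`, its methods, and the default lease length are all hypothetical choices for illustration. The sketch also omits the revocation step in which a holder writes its dirty data to disk before the lock moves to another machine.

```python
class LeaseLockService:
    """Toy multiple-reader/single-writer lock table with lease expiry."""

    def __init__(self, lease_seconds: float = 30.0):
        # The lease length is an arbitrary default for this sketch.
        self.lease_seconds = lease_seconds
        # lock name -> {machine: (mode, expiry_time)}
        self.holders = {}

    def _live_holders(self, name: str, now: float) -> dict:
        # Drop holders whose lease has run out (e.g. a crashed machine).
        live = {
            machine: (mode, expiry)
            for machine, (mode, expiry) in self.holders.get(name, {}).items()
            if expiry > now
        }
        self.holders[name] = live
        return live

    def acquire(self, name: str, machine: str, mode: str, now: float) -> bool:
        """Try to take `name` in mode 'read' or 'write' at time `now`."""
        if mode not in ("read", "write"):
            raise ValueError("mode must be 'read' or 'write'")
        live = self._live_holders(name, now)
        others = {m: v for m, v in live.items() if m != machine}
        if mode == "write" and others:
            return False  # a writer needs exclusive access
        if mode == "read" and any(m == "write" for m, _ in others.values()):
            return False  # readers cannot coexist with a writer
        live[machine] = (mode, now + self.lease_seconds)
        return True

    def release(self, name: str, machine: str) -> None:
        self.holders.get(name, {}).pop(machine, None)
```

Passing `now` explicitly keeps the sketch deterministic; a real service would read a clock and would also need to account for clock skew between machines.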

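The Chained Declustering slides show each block Dn stored on two neighboring servers (D0 on Server0 and Server1, D1 on Server1 and Server2, and so on, wrapping around). A small sketch of that placement rule is given here. The function names are hypothetical, and treating the first copy as the "primary" is an assumption made for illustration.

```python
def placement(block: int, num_servers: int) -> tuple:
    """Return (primary, secondary) server indices for a block.

    Assumed rule matching the slide's layout: the first copy goes to
    server (block mod n) and the second to the next server in the chain.
    """
    primary = block % num_servers
    secondary = (primary + 1) % num_servers
    return primary, secondary


def layout(num_blocks: int, num_servers: int) -> dict:
    """Map each server index to the list of block names it stores."""
    servers = {s: [] for s in range(num_servers)}
    for b in range(num_blocks):
        primary, secondary = placement(b, num_servers)
        servers[primary].append(f"D{b}")
        servers[secondary].append(f"D{b}")
    return servers


def readable_from(block: int, num_servers: int, failed: set) -> list:
    """Servers that can still serve `block` given a set of failed servers."""
    return [s for s in placement(block, num_servers) if s not in failed]
```

With 8 blocks on 4 servers, `layout(8, 4)` gives Server0 the blocks D0, D3, D4, D7, as on the slide. Losing any one server leaves every block readable from its neighbor, while losing two adjacent servers makes the block they share unavailable.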