Scalable Clusters
Jed Liu
11 April 2002

Overview
- Microsoft Cluster Service
  - Built on Windows NT
  - Provides high-availability services
  - Presents itself to clients as a single system
- Frangipani
  - A scalable distributed file system

Microsoft Cluster Service
Design goals:
- Cluster composed of COTS components
- Scalability: able to add components without interrupting services
- Transparency: clients see the cluster as a single machine
- Reliability: when a node fails, its services can be restarted on a different node

Cluster Abstractions
- Nodes
- Resources
  - e.g., logical disk volumes, NetBIOS names, SMB shares, mail service, SQL service
- Quorum resource
  - Implements persistent storage for the cluster configuration database and change log
- Resource dependencies
  - Tracks dependencies between resources

Cluster Abstractions (cont'd)
- Resource groups
  - The unit of migration: resources in the same group are hosted on the same node
- Cluster database
  - Configuration data for starting the cluster is kept in a database, accessed through the Windows registry
  - The database is replicated at each node in the cluster

Node Failure
- Active members broadcast periodic heartbeat messages
- Failure suspicion occurs when a node misses two successive heartbeat messages from some other node (see the sketch after the regroup slides)
- The regroup algorithm is then initiated to determine the new membership
- Resources that were online at a failed member are brought online at active nodes

Member Regroup Algorithm
Lockstep algorithm:
1. Activate: each node waits for a clock tick, then starts sending and collecting status messages
2. Closing: determine whether partitions exist, and whether the current node is in a partition that should survive
3. Pruning: prune the surviving group so that all nodes are fully connected

Regroup Algorithm (cont'd)
4. Cleanup: surviving nodes update local membership information as appropriate
5. Stabilized: done
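The failure-suspicion rule that triggers regroup is simple enough to sketch. Below is a minimal Go sketch of the two-missed-heartbeats test; all names here (heartbeatMonitor, Observe, Suspects) are hypothetical illustrations, not the MSCS API.

```go
package main

import (
	"fmt"
	"time"
)

// heartbeatMonitor sketches MSCS-style failure suspicion: a peer is
// suspected once two successive heartbeat intervals pass without a
// message from it. (Hypothetical names; not the MSCS implementation.)
type heartbeatMonitor struct {
	interval time.Duration
	lastSeen map[string]time.Time
}

func newHeartbeatMonitor(interval time.Duration, peers []string) *heartbeatMonitor {
	m := &heartbeatMonitor{interval: interval, lastSeen: make(map[string]time.Time)}
	now := time.Now()
	for _, p := range peers {
		m.lastSeen[p] = now
	}
	return m
}

// Observe records a heartbeat received from a peer.
func (m *heartbeatMonitor) Observe(peer string) {
	m.lastSeen[peer] = time.Now()
}

// Suspects returns peers that have missed two successive heartbeats;
// in MSCS, this is the event that initiates the regroup algorithm.
func (m *heartbeatMonitor) Suspects() []string {
	var suspects []string
	for peer, t := range m.lastSeen {
		if time.Since(t) > 2*m.interval {
			suspects = append(suspects, peer)
		}
	}
	return suspects
}

func main() {
	m := newHeartbeatMonitor(100*time.Millisecond, []string{"nodeA", "nodeB"})
	time.Sleep(250 * time.Millisecond) // nodeB stays silent for >2 intervals
	m.Observe("nodeA")
	fmt.Println("suspected:", m.Suspects()) // [nodeB] -> initiate regroup
}
```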
Joining a Cluster
- A sponsor authenticates the joining node
  - Denies access if the applicant isn't authorized to join
- The sponsor sends version info for the config database
  - Also sends updates as needed, if changes were made while the applicant was offline
- The sponsor atomically broadcasts information about the applicant to all other members
- Active members update their local membership information

Forming a Cluster
- Use the local registry to find the address of the quorum resource
- Acquire ownership of the quorum resource
  - An arbitration protocol ensures that at most one node owns the quorum resource
- Synchronize the local cluster database with the master copy

Leaving a Cluster
- The member sends an exit message to all other cluster members and shuts down immediately
- Active members gossip about the exiting member and update their cluster databases

Node States
- Inactive nodes are offline
- Active members are either online or paused
- All active nodes participate in cluster database updates, vote in the quorum algorithm, and maintain heartbeats
- Only online nodes can take ownership of resource groups

Resource Management
- Achieved by invoking calls through a resource control library (implemented as a DLL)
- Through this library, MSCS can monitor the state of the resource

Resource Migration
Reasons for migration:
- Node failure
- Resource failure
- The resource group prefers to execute at a different node
- Operator-requested migration
In the first case, the resource group is pulled to the new node; in all other cases, it is pushed.

Pushing a Resource Group
- All resources at the old node are brought offline
- The old host node chooses a new host
- The local copy of MSCS at the new host brings up the resource group

Pulling a Resource Group
- Active nodes capable of hosting the group determine amongst themselves the new host for the group
- The new host is chosen based on attributes stored in the cluster database
  - Since the database is replicated at all nodes, the decision can be made without any communication! (A sketch of this appears at the end of this MSCS part.)
- The new host brings the resource group online

Client Access to Resources
- Normally, clients access SMB resources using names of the form \\node\service
- This presents a problem: as resources migrate between nodes, the resource name changes
- With MSCS, whenever a resource migrates, the resource's network name also migrates as part of the resource group
- Clients see only services and their network names; the cluster becomes a single virtual node

Membership Manager
- Maintains consensus among active nodes about who is active and who is defined
- A join mechanism admits new members into the cluster
- A regroup mechanism determines the current membership on startup or suspected failure

Global Update Manager
- Used to implement atomic broadcast
- A single node in the cluster is always designated as the locker
- The locker node takes over the atomic broadcast if the original sender fails mid-broadcast
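The locker's role is easiest to see in code. The following Go sketch shows the idea under stated assumptions: the sender installs an update at the locker before fanning out, and idempotent delivery lets the locker finish a broadcast the sender abandoned. The types and names are illustrative, not MSCS internals.

```go
package main

import "fmt"

// A minimal sketch of the locker idea behind the Global Update
// Manager. Hypothetical types; not the MSCS implementation.

type update struct {
	seq     int
	payload string
}

type node struct {
	name string
	log  map[int]string // applied updates, keyed by sequence number
}

// apply is idempotent: re-delivering the same sequence number is a
// no-op, so the locker can safely re-send to every member.
func (n *node) apply(u update) {
	if _, done := n.log[u.seq]; !done {
		n.log[u.seq] = u.payload
	}
}

func main() {
	locker := &node{"locker", map[int]string{}}
	members := []*node{
		{"n1", map[int]string{}},
		{"n2", map[int]string{}},
		{"n3", map[int]string{}},
	}
	u := update{seq: 1, payload: "config change"}

	// Step 1: the sender installs the update at the locker first.
	locker.apply(u)

	// Step 2: the sender updates members one by one, but crashes
	// after reaching only the first member.
	members[0].apply(u)
	// -- sender fails here --

	// Step 3: the locker detects the incomplete broadcast and
	// re-delivers to every member; idempotence makes this safe.
	for _, m := range members {
		m.apply(u)
	}

	for _, m := range members {
		fmt.Println(m.name, "has update 1:", m.log[1])
	}
}
```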
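As promised on the Pulling a Resource Group slide, here is a sketch of why pulling needs no communication: every active node evaluates the same deterministic rule over the same replicated cluster database, so all nodes independently pick the same new host. The database fields shown (online state, preference order) are guesses for illustration, not the real MSCS schema.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeInfo is an illustrative stand-in for one row of the replicated
// cluster database.
type nodeInfo struct {
	name   string
	online bool
	pref   int // the group's preference for this node (lower is better)
}

// chooseHost picks the online node the group most prefers, breaking
// ties by name so the rule is fully deterministic: every node that
// evaluates it over the same replica gets the same answer.
func chooseHost(db []nodeInfo) (string, bool) {
	cands := make([]nodeInfo, 0, len(db))
	for _, n := range db {
		if n.online {
			cands = append(cands, n)
		}
	}
	if len(cands) == 0 {
		return "", false
	}
	sort.Slice(cands, func(i, j int) bool {
		if cands[i].pref != cands[j].pref {
			return cands[i].pref < cands[j].pref
		}
		return cands[i].name < cands[j].name
	})
	return cands[0].name, true
}

func main() {
	// Every node holds an identical replica of this table, so each
	// one independently arrives at the same host without messages.
	db := []nodeInfo{
		{"nodeA", false, 0}, // the failed host
		{"nodeB", true, 2},
		{"nodeC", true, 1},
	}
	host, ok := chooseHost(db)
	fmt.Println(host, ok) // "nodeC true" -- on every node
}
```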
Frangipani
Design goals:
- Provide users with coherent, shared access to files
- Arbitrarily scalable, to provide more storage and higher performance
- Highly available in spite of component failures
- Minimal human administration:
  - Full and consistent backups can be made of the entire file system without bringing it down
  - Complexity of administration stays constant despite the addition of components

Server Layering
[Figure: user programs run on top of Frangipani file servers; the file servers sit on the Petal distributed virtual disk service and a distributed lock service, which in turn sit on the physical disks.]

Assumptions
- Frangipani servers trust:
  - One another
  - The Petal servers
  - The lock service
- Meant to run in a cluster of machines that are under a common administration and can communicate securely

System Structure
- Frangipani is implemented as a file system option in the OS kernel
- All file servers read and write the same file system data structures on the shared Petal disk
- Each file server keeps a redo log in Petal so that when it fails, another server can access the log and run recovery on its behalf
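The redo-log sentence above is the heart of Frangipani's failure story, so a small sketch may help: each server logs its metadata updates to its own region of the shared disk, and any surviving server can replay that log for a failed peer. The record layout below is an illustrative guess, not Frangipani's actual on-disk format (the real system also tags blocks with version numbers to keep replay safe against newer writes).

```go
package main

import "fmt"

// logRecord is a hypothetical redo-log entry: one metadata-block
// update, in the order the failed server issued it.
type logRecord struct {
	seq   uint64 // monotonically increasing per server
	block int    // metadata block the update touches
	data  string // new contents for that block
}

// replay applies a failed server's log to the shared disk image.
// Overwriting whole blocks in log order makes replay idempotent, so
// recovery itself can be retried if the recovering server crashes.
func replay(disk map[int]string, log []logRecord) {
	for _, r := range log {
		disk[r.block] = r.data
	}
}

func main() {
	sharedDisk := map[int]string{} // stands in for the Petal virtual disk
	failedServersLog := []logRecord{
		{seq: 1, block: 7, data: "inode update"},
		{seq: 2, block: 9, data: "directory entry"},
	}
	// A surviving Frangipani server runs recovery on the failed
	// server's behalf by replaying its log against the shared disk.
	replay(sharedDisk, failedServersLog)
	fmt.Println(sharedDisk)
}
```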