Rutgers University CS 417 - Clusters


CS 417: Distributed Systems
24. Clusters
Paul Krzyzanowski ([email protected])
© 2011 Paul Krzyzanowski (12/4/2011)

Designing highly available systems
• Incorporate elements of fault-tolerant design
  – Replication, TMR
• A fully fault-tolerant system would offer non-stop availability
  – You can't achieve this!
• Problem: it's expensive!

Designing highly scalable systems
• SMP architecture
• Problem: performance gain as f(# processors) is sublinear
  – Contention for resources (bus, memory, devices)
  – Also, the solution is expensive!

Clustering
• Achieve reliability and scalability by interconnecting multiple independent systems
• Cluster: a group of standard, autonomous servers configured so they appear on the network as a single machine (a single system image)

Ideally...
• A bunch of off-the-shelf machines
• Interconnected on a high-speed LAN
• Appear as one system to users
• Processes are load-balanced
  – May migrate
  – May run on different systems
  – All IPC mechanisms and file access available
• Fault tolerant
  – Components may fail
  – Machines may be taken down
We don't get all that (yet), at least not in one package.

Clustering types
• Supercomputing (HPC)
  – and batch processing
• High availability (HA)
  – and load balancing
• High availability / high scalability

Cluster Components

Cluster management
• Software to manage cluster membership
  – What are the nodes in the cluster?
  – What are the live nodes in the cluster?
• Quorum
  – The number of elements that must be online for the cluster to function
  – A voting algorithm determines whether the set of nodes has quorum (a majority of nodes) in order to keep running
• Keeping track of quorum
  – Count the cluster nodes running the cluster manager
  – If over ½ are active, the cluster has quorum
  – Forcing a majority avoids split-brain (a minimal majority-test sketch appears after the Network File Systems notes below)

Interconnect

Cluster Interconnect
• Provides communication between nodes in a cluster
• Often a separate network from the LAN
• Sometimes known as a System Area Network (SAN)
• Goals
  – Low latency
    • Avoid OS overhead, layers of protocols, retransmission, etc.
  – High bandwidth
    • High-bandwidth, switched links
    • Avoid the overhead of sharing traffic with non-cluster data
  – Low CPU overhead
  – Low cost
    • Cost usually matters if you're connecting thousands of machines

Example High-Speed Interconnects
• Myricom's Myrinet
  – High-speed switch fabric for low-latency, high-bandwidth interprocess communication between nodes
  – Lower overhead than Ethernet
  – 10 Gbps bandwidth between any two nodes
  – Scales to tens of thousands of nodes
  – Example: used in IBM's Linux Cluster Solution
• InfiniBand
  – Direct interconnect to the CPU bridge chip
  – 2.5 – 30 Gbps
• 10 Gigabit Ethernet

Disks

Shared storage access
• If an application can run on any machine, how does it access file data?
• If an application fails over from one machine to another, how does it access its file data?
• Can applications on different machines share files?

Network File Systems
• One option: network file systems (NFS, SMB, AFS, AFP, etc.)
  – Caching may cause inconsistencies
  – Data sharing is usually a problem
  – Performance may be a problem (LAN vs. local bus access)
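
Returning to the quorum rule from the cluster-management notes above: counting the live cluster-manager instances and requiring strictly more than half reduces to a simple majority test. The sketch below is only illustrative; the function name, the way liveness counts are obtained, and the example cluster sizes are assumptions, not something prescribed by the slides.

```python
# Minimal majority-based quorum test (illustrative; names are assumptions).
def has_quorum(configured_nodes: int, live_nodes: int) -> bool:
    """The cluster keeps running only if strictly more than half of the
    configured nodes are active; requiring a majority means two network
    partitions can never both claim quorum, which avoids split-brain."""
    return live_nodes > configured_nodes // 2

# Example: a 5-node cluster keeps quorum with 3 live nodes but not with 2,
# and a 4-node cluster split 2/2 loses quorum on both sides.
assert has_quorum(5, 3) and not has_quorum(5, 2)
assert not has_quorum(4, 2)
```
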
Shared disk
• Shared disk
  – Allows multiple systems to share access to disk drives
  – Works well if there isn't much contention
  – Disk access must be synchronized
    • Synchronization via a distributed lock manager (DLM)
• Cluster File System
  – The client runs a file system that accesses a shared disk at the block level
  – Examples: IBM General Parallel File System (GPFS), Microsoft Cluster Shared Volumes (CSV), Oracle Cluster File System (OCFS), Red Hat Global File System (GFS)

Cluster File System
• No client/server roles, no disconnected modes
• All nodes are peers and access a shared disk (or disks)
• Distributed Lock Manager (DLM)
  – Process to ensure mutual exclusion when needed
  – Not needed for local file systems on a shared disk
  – Inode-based locking and caching control
• Linux GFS (no relation to Google GFS)
  – Cluster file system accessing storage at the block level
  – Cluster Logical Volume Manager (CLVM): volume management of cluster storage
  – Global Network Block Device (GNBD): block-level storage access over Ethernet; a cheap way to access block-level storage

Shared nothing
• Shared nothing
  – No shared devices
  – Each system has its own storage resources
  – No need to deal with DLMs
  – If machine A needs resources on B, A sends a message to B
    • If B fails, storage requests have to be switched over to a live node
• Exclusive access to shared storage
  – Multiple nodes may have access to shared storage
  – Only one node is granted exclusive access
  – Exclusive access changes on failover

SAN: Node-Disk interconnect
• Storage Area Network (SAN)
• A separate network between nodes and storage arrays
  – Fibre Channel
  – iSCSI
• Any node can be configured to access any storage through a Fibre Channel switch
• DAS: Direct Attached Storage
• SAN: block-level access to a disk via a network
• NAS: file-level access to a remote file system

Failover

HA issues
• How do you detect failover?
• How long does it take to detect?
• How does a dead application move/restart?
• Where does it move to?

Heartbeat network
• Machines need to detect faulty systems
  – Heartbeat: a "ping" mechanism (a minimal detector sketch appears at the end of these notes)
• Need to distinguish system faults from network faults
  – Useful to maintain redundant networks
  – Send a periodic heartbeat to test a machine's liveness
  – Watch out for split-brain!
• Ideally, use a network with a bounded response time
  – Microsoft Cluster Server supports a dedicated "private network"
    • Two network cards connected with a pass-through cable or hub
  – SAN interconnect
  – Cluster interconnect

Failover Configuration Models
• Active/Passive (N+M nodes)
  – M dedicated failover node(s) for N active nodes
  – Passive nodes do nothing until they're needed
• Active/Active
  – Failed workload goes to the remaining nodes

Design options for failover
• Cold failover
  – Application restart
• Warm failover
  – Restart from the last checkpointed image
  – Relies on the application checkpointing itself periodically (see the checkpoint/restore sketch at the end of these notes)
  – May use a write-ahead log (tricky)
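
The heartbeat mechanism above can be sketched as two small loops: each node periodically announces that it is alive, and a monitor suspects any peer it has not heard from within a timeout. This is only a sketch under assumptions not in the slides (UDP transport, the interval and timeout values, and the function names are all illustrative); a real cluster manager would also rely on redundant heartbeat networks and the quorum rule so that a network fault is not mistaken for a dead machine.

```python
# Illustrative heartbeat sender and failure detector (not a real cluster
# manager; transport, timing constants, and names are assumptions).
import socket
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats (assumed value)
FAILURE_TIMEOUT = 3.0      # miss ~3 intervals before suspecting a node (assumed)

def send_heartbeats(peer_addr: tuple[str, int], node_id: bytes) -> None:
    """Periodically announce liveness over the (ideally dedicated,
    bounded-latency) heartbeat network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(node_id, peer_addr)
        time.sleep(HEARTBEAT_INTERVAL)

def monitor(listen_addr: tuple[str, int]) -> None:
    """Track when each peer was last heard from. A quiet peer is only
    *suspected* dead, because a missing heartbeat cannot distinguish a
    crashed node from a broken network path."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(listen_addr)
    sock.settimeout(0.5)                 # poll so timeouts are checked regularly
    last_seen: dict[bytes, float] = {}
    suspected: set[bytes] = set()
    while True:
        try:
            node_id, _ = sock.recvfrom(64)
            last_seen[node_id] = time.time()
            suspected.discard(node_id)   # heard from it again
        except socket.timeout:
            pass
        for node_id, seen in last_seen.items():
            if node_id not in suspected and time.time() - seen > FAILURE_TIMEOUT:
                suspected.add(node_id)
                print(f"suspect {node_id!r}: no heartbeat for {FAILURE_TIMEOUT}s")
```
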
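
Warm failover's "restart from the last checkpointed image" can be illustrated by periodic application-level checkpointing to shared storage, with the standby node restoring the most recent image after a failure. The checkpoint path, the pickle format, and the atomic write-then-rename step below are assumptions for illustration only; anything the application did after its last checkpoint is lost unless it can be replayed from a write-ahead log.

```python
# Sketch of application-level checkpointing for warm failover.
# CHECKPOINT_PATH, the pickle format, and the state layout are illustrative
# assumptions; they are not prescribed by the lecture notes.
import os
import pickle
import tempfile

CHECKPOINT_PATH = "/shared/app.ckpt"   # hypothetical location on shared storage

def checkpoint(state: dict) -> None:
    """Write the checkpoint atomically (temp file, then rename) so a crash in
    the middle of a write never leaves a corrupt image for the standby node."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(CHECKPOINT_PATH))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, CHECKPOINT_PATH)

def restore() -> dict:
    """On failover, the standby restarts from the last checkpointed image;
    later work is lost unless a write-ahead log is also kept and replayed."""
    with open(CHECKPOINT_PATH, "rb") as f:
        return pickle.load(f)
```
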

