UTK CS 594 - Grid/MetaComputing Lecture 7 Notes - D456605

School name The University of Tennessee, Knoxville

Course Cs 594- Computer Systems Fundamentals

Pages 10

This preview shows page 1-2-3 out of 10 pages.

Grid/MetaComputing Lecture 7
Spring 2003
Dr Graham Fagg
with support material from Mark Baker and Michael Resch

Lecture 7 Overview
• Making it all work
  – Resource Management and Scheduling
    • From Cluster/Batch schedulers to MetaComputing
  – Scheduling MPI jobs
    • All we really need to do

Contents - Resource Management and Scheduling
• Clusters, MetaComputing and Super Computers
• Motivation for CMS
• Motivation for using Clusters
• Cycle Stealing
• Management Software
• Problems with Distributed Computing Resource Management
• Cluster Management Software Systems
• From Clusters to MetaComputing

Cluster, Super Computers and Metacomputing
• A main goal of distributed computing research is to provide users with simple and transparent access to a distributed set of heterogeneous computing resources.
• This is often called Metacomputing - the user submits jobs to a virtual distributed computer, rather than specifying particular computers.
• Super Computing, on the other hand, is designed to provide massive computational power to users. Performance is more important than usability.
• MetaComputing promises both.

Cluster and Metacomputing
• Metacomputing is still a research area - current implementations are limited, mostly applying to LANs rather than WANs.
  – Except when SuperComputing conferences are taking place!
  – Situation is getting better all the time.
• LAN implementations are Cluster Management Software (CMS) or Cluster Computing Environments (CCE).
• MetaComputing systems can be built by extending LAN systems, or by using many LAN systems together.

Cluster and Metacomputing
• Cluster Management Software
  – Managing clusters of (mostly) workstations as a distributed compute resource.
  – Built on top of existing OS.
• Cluster Computing Environments
  – Software to allow the cluster to be used as an applications environment, similar to Distributed Shared Memory (DSM) systems.
  – Built into OS kernel for improved performance.

Cluster and Metacomputing
• The World Wide Web is now so ubiquitous that it is becoming the platform of choice for distributed computing (Internet or Intranet) and Metacomputing.
  – See the Webflow project later.
  – Mostly used as the server to client binding structure.
    • I.e. submit a web form rather than a job request ticket.

Motivation for Resource Management
• Users want to be able to submit their jobs without having to worry about where they run - i.e. submit jobs to a metacomputer (virtual computer) rather than search for spare cycles on a real computer.
  – Ease of use. Requires distributed code as well as distributed data!
• Large organisations (companies, universities, national labs, etc.) typically have hundreds or thousands of powerful workstations for use by employees, which are a major under-utilised compute resource.
  – Check lecture 3 notes on Resource Management.
• What are spare cycles, and do we want to use them?

Motivation for using Clusters
• Surveys show utilisation of CPU cycles of desktop workstations is typically <10%.
• Performance of workstations and PCs is rapidly improving (my laptop achieves > 60 Mflop/s on Fortran 77 code).
• As performance grows, percent utilisation will decrease even further!
• Organisations are reluctant to buy large supercomputers, due to the large expense and short useful life span.

Usage
• Usage depends on the class of the users.
[Figure: usage by user class, Meteorology versus Psychology]

Motivations for Clusters
• The communications bandwidth between workstations is increasing as new networking technologies and protocols are implemented in LANs and WANs.
• Workstation clusters are easier to integrate into existing networks than special parallel computers.
  – Install Linux from the local NFS copy… (look at NASA Beowulf)
  – MPPs require special HIPPI switches and interface hookups.
• Many MetaComputers will be made from Clusters, although most of the larger research efforts prefer to integrate different MPPs rather than clusters.
  – I.e. more output for less effort from the MetaComputing System itself (i.e. getting the Bell Award).

Motivation for using Clusters
• The development tools for workstations are more mature than the contrasting proprietary solutions for parallel computers - mainly due to the non-standard nature of many parallel systems.
• Workstation clusters are a cheap and readily available alternative to specialised High Performance Computing (HPC) platforms.
• Use of clusters of workstations as a distributed compute resource is very cost effective - incremental growth of system!!!
  – CPUs and disks are a lot cheaper, but some of the better interconnect cards like Myrinet, Gigabit, etc. are expensive.
  – Well almost… try upgrading all your systems to PIIIs after you just bought all the PIIs… with the wrong motherboards.
    • Do a few a week maybe?

Cycle Stealing
• Usually a workstation will be owned by an individual, group, department, or organisation - it is dedicated to the exclusive use of its owners.
• This brings problems when attempting to form a cluster of workstations for running distributed applications.
  – Unless it is a dedicated cluster like the TORC cluster. (If it is managed correctly that is).
    • TORC runs too many different tests/configurations, i.e. not a stable platform?

Cycle Stealing
• Typically, there are three types of owner, who use their workstations mostly for:
  1. Sending and receiving mail and preparing documents.
  2. Software development - edit, compile, debug and test cycle.
  3. Running compute-intensive applications.

Cycle Stealing
• Cluster computing aims to steal spare cycles from (1) and (2) to provide resources for (3).
• However, this requires overcoming the ownership hurdle - people are very protective of their workstations.
• Usually requires an organizational mandate that computers are to be used in this way.

Cycle Stealing
• Stealing cycles outside standard work hours (e.g. overnight) is easy; stealing idle cycles during work hours without impacting interactive users (both CPU and memory) is much harder.

Management Software
• Software for managing clusters or metacomputers must handle many complex issues:
  – Heterogeneous environments (computer and network hardware, software, OS, protocols, etc.).
  – Resource Management.
    • CPUs, disk arrays, and sometimes long haul network connections.
  – Job scheduling.
    • Handling multiple schedulers at the same time.
  – Job allocation policy (prioritisation).
  – Security and authentication.
  – Cycle
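
As a rough illustration of the cycle-stealing idea in these notes (overnight cycles are easy to take, while work-hours cycles may only be taken when neither the owner's CPU nor memory is affected), a minimal sketch follows. It is not taken from any real cluster management system: the `WorkstationState` fields, the `can_steal_cycles` function, and every threshold are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkstationState:
    """Snapshot of one workstation; all fields are illustrative assumptions."""
    cpu_load: float        # fraction of CPU used by the owner, 0.0 to 1.0
    free_memory_mb: int    # memory not used by the owner's processes
    idle_seconds: int      # time since the last keyboard/mouse activity
    hour: int              # local hour of day, 0 to 23


# Hypothetical policy thresholds, not values from the lecture.
WORK_START, WORK_END = 9, 17
MAX_OWNER_LOAD = 0.10
MIN_IDLE_SECONDS = 15 * 60


def can_steal_cycles(ws: WorkstationState, job_memory_mb: int) -> bool:
    """Return True if a guest job may run without disturbing the owner."""
    # Memory matters as much as CPU: never push the owner into swapping.
    if ws.free_memory_mb < job_memory_mb:
        return False
    # Outside standard work hours the machine is assumed to be free.
    if ws.hour < WORK_START or ws.hour >= WORK_END:
        return True
    # During work hours, require a quiet CPU and an inactive console.
    return ws.cpu_load <= MAX_OWNER_LOAD and ws.idle_seconds >= MIN_IDLE_SECONDS
```

A scheduler built on this sketch would call `can_steal_cycles` for each candidate workstation before placing a job, and would also need a way to suspend or migrate the job when the owner returns, which is not shown here.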
