Condor: A Hunter of Idle Workstations
M. L. Litzkow, M. Livny, and M. W. Mutka
Presented by Priya Ranjan, CS816S, Thursday, Feb 19, 2004
2/26/2004

Motivation
1. Are workstations used efficiently by individual users?
2. Usage patterns
3. Increasing the efficiency of a workstation cluster
4. Any ideas for managing idle workstation capacity?
5. Remote execution facilities

Introduction
1. A rudimentary capacity scheduling system designed in 1988
2. Aims to optimize the utilization of workstations
3. Protects the integrity of local users and their activity
4. The paper outlines the design, implementation, and performance of the Condor scheduling system
449 citations according to ResearchIndex.
Current status: http://www.cs.wisc.edu/condor
"The goal of the Condor Project is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput."

Three issues addressed
1. Analysis of workstation usage patterns
2. Design of remote capacity allocation algorithms
3. Development of remote execution facilities
Understanding these three can give a leverage of up to 2000. What is leverage?

System setup and specifications for the project
1. 100 VAXstation II workstations capable of remote execution
2. Light and heavy users, with applications such as load balancing, neural networks, and combinatorial optimization problems
3. Transparent submission of background jobs
4. Automatic restarting of jobs by Condor
5. Protection of the integrity of the local user
6. The mechanism should consume as little local capacity as possible
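The slides leave "What is leverage?" open. As we read the paper, leverage is the ratio of the CPU capacity a job consumes on remote machines to the capacity its home workstation spends supporting that remote execution (the shadow process, placement, and checkpointing). The minimal Python sketch below only computes that ratio; the `RemoteJob` record and the job figures are hypothetical, chosen so that one job happens to reach the slides' figure of 2000.

```python
from dataclasses import dataclass


@dataclass
class RemoteJob:
    """Hypothetical accounting record for one remotely executed job (seconds)."""
    remote_cpu: float  # CPU time consumed on the remote (idle) workstation
    local_cpu: float   # CPU time spent on the home workstation supporting it


def leverage(job: RemoteJob) -> float:
    """Remote capacity obtained per unit of local capacity spent."""
    if job.local_cpu <= 0:
        raise ValueError("local support cost must be positive")
    return job.remote_cpu / job.local_cpu


def aggregate_leverage(jobs: list[RemoteJob]) -> float:
    """Leverage over many jobs: total remote CPU divided by total local CPU."""
    total_local = sum(j.local_cpu for j in jobs)
    if total_local <= 0:
        raise ValueError("local support cost must be positive")
    return sum(j.remote_cpu for j in jobs) / total_local


# Illustrative numbers only, not measurements from the paper.
jobs = [RemoteJob(remote_cpu=36_000, local_cpu=18),
        RemoteJob(remote_cpu=7_200, local_cpu=12)]
print(leverage(jobs[0]))         # 2000.0
print(aggregate_leverage(jobs))  # 1440.0
```

Pooling the totals, rather than averaging per-job ratios, weights each job by the local support it actually cost; the slides do not say which form the reported figure uses.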
Scheduling structure: centralized vs. distributed
Centralized: one processor gathers information and schedules jobs accordingly. It is hard to scale and extend, and it is a single point of failure.
Distributed: workstations contend for remote processing cycles, and negotiations may be chatty.
Hybrid approach in Condor, the best of both worlds: one central coordinator only assigns capacity, while the workstations keep the state information, schedule their own jobs, and decide the priority of their own jobs.

Scheduling structure
Figure: local schedulers, each with a local queue, and a global coordinator

Scheduling mechanism
The central coordinator polls every 2 minutes for available CPUs and for waiting jobs.
The local scheduler decides whether it can volunteer its CPU.
The local scheduler also checks every 30 seconds for local activity, so that the workstation can be freed of any remote activity.
The central coordinator allocates capacity to local schedulers that have a job waiting.
Keep it SIMPLE and ROBUST: the central coordinator consumes at most 1% of the CPU on a workstation.

Remote Unix (RU)
RU transforms idle workstations into cycle servers.
Invoking RU starts a shadow process on the local machine as a surrogate for the remote process.
Remote system calls can be viewed as remote procedure calls.
Checkpointing is the most powerful feature of RU: it saves the state of a program during its execution so that the program can be restarted. The text, data, bss, and stack segments are saved.
This is very useful in Condor, since a checkpointed program can be moved to the next available workstation.

Fair access to remote capacity
Heavy users vs. light users
The Up-Down algorithm provides fair access:
It maintains a schedule index for every workstation.
The index increases when a workstation gets capacity.
The index decreases when a workstation is denied capacity.
This maintains a balance between waiting time and capacity allocation.
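The coordinator's polling cycle and the Up-Down rules above can be sketched together in Python. This is a simplified model under stated assumptions, not Condor's code: the `Station` record, the `poll` function, and the machine names are hypothetical; the unit step sizes are assumed; a lower index is taken to mean higher priority, which is our reading of the paper. Preemption of remote jobs, job completion, and the 30-second local activity check are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Station:
    """A workstation as seen by the coordinator (hypothetical model)."""
    name: str
    waiting: int = 0  # local jobs waiting for a remote machine
    holding: int = 0  # remote machines currently running this station's jobs
    index: int = 0    # Up-Down schedule index; lower index is served first


def poll(stations: list[Station], free_machines: list[str]) -> list[tuple[str, str]]:
    """One coordinator cycle (every 2 minutes in the paper).

    free_machines lists the idle machines whose local schedulers volunteered
    their CPU. Returns the (station, machine) assignments made in this cycle.
    """
    free = list(free_machines)
    assignments: list[tuple[str, str]] = []
    # Serve stations in ascending index order; sorted() is stable, so ties
    # keep the original list order.
    for s in sorted(stations, key=lambda st: st.index):
        while s.waiting > 0 and free:
            machine = free.pop(0)
            s.waiting -= 1
            s.holding += 1
            assignments.append((s.name, machine))
    # Up-Down update; the step sizes are assumptions.
    for s in stations:
        if s.holding > 0:
            s.index += s.holding  # "up": charged for the capacity it holds
        elif s.waiting > 0:
            s.index -= 1          # "down": wanted capacity but was denied
    return assignments


heavy = Station("heavy", waiting=5)
light = Station("light", waiting=1)
stations = [heavy, light]
print(poll(stations, ["m1", "m2"]))  # [('heavy', 'm1'), ('heavy', 'm2')]
print(poll(stations, ["m3"]))        # [('light', 'm3')]
print(heavy.index, light.index)      # 4 0
```

Although the heavy user is listed first and wins the initial tie, its index rises while the denied light user's index falls, so the light user receives the next free machine.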
Performance (figures)
User profiles
Profile of service demand
Queue length and average wait ratio
Utilization of remote resources over a month and over a week (25% average utilization)
Local impact: rate of checkpointing
Overall performance and leverage

Conclusion
1. It is possible to design and implement an efficient system that makes optimal use of available capacity
2. Improved productivity in terms of leverage, up to 2000
3. Lessons learned about which jobs should not be executed remotely

The Purdue University Network Computing Hub (PUNCH)
Motivation
Network-oriented future computing
Service based
Adapts to the user's demand
Universal access
Online optimal scheduling
Limited support for computing on the web
Demand-based computing can be characterized by its universal accessibility and its ability to make automatic cost/performance trade-off decisions at run time.

Two categories of tools
a. Tools providing support for global scalability, e.g., ATLAS, Globe, Globus, GUSTO, IceT, ParaWeb
b. Tools to access and use globally distributed resources, e.g., CCS, MMM, MOL, NetSolve, Ninf, PUNCH, RCS, VNC
Source or object code may not be available for commercial applications.
Application-independent framework
A powerful multi-user operating system that can do almost everything

Characterization of a network computing system
Interface to the external world
Internal architecture
Class of software it can support
Capabilities of the resource management framework
Design issues: scalability, reliability, security

Information management factors
Portability
Run-specific resource usage characteristics, such as CPU usage, network usage, and memory requirements
Administrative policies and configuration across multiple administrative domains
Dynamic, incremental, and distributed nature of information
Vertically distributed architecture

System architecture
Message-passing hierarchy of client, management, and execution units
Thin-client approach
Demand driven


UMD CMSC 818S - Condor: A Hunter of Idle Workstations
