UT CS 378 - Batch Systems

Slide outline:
Batch Systems
Slide 2
Slide 3
Slide 4
Batch Submission Process
LSF Batch System
Lonestar Queue Definitions
Slide 8
LSF Fairshare
Slide 10
Commonly Used LSF Commands
Slide 12
Slide 13
Batch System Concerns
LSF: Basic MPI Job Script
LSF: Extended MPI Job Script
LSF: Job Script Submission
LSF: Interactive Execution
Batch Script Suggestions
LSF Job Monitoring (showq utility)
LSF Job Monitoring (bjobs command)
LSF Job Monitoring (lsuser utility)
LSF Job Manipulation/Monitoring

Batch Systems
• In a number of scientific computing environments, multiple users must share a compute resource:
  – research clusters
  – supercomputing centers
• On multi-user HPC clusters, the batch system is a key component for aggregating compute nodes into a single, sharable computing resource
• The batch system becomes the “nerve center” for coordinating the use of resources and controlling the state of the system in a way that must be “fair” to its users
• As current and future expert users of large-scale compute resources, you need to be familiar with the basics of a batch system

Batch Systems
• The core functionality of all batch systems is essentially the same, regardless of the size or specific configuration of the compute hardware:
  – Multiple Job Queues:
    • queues provide an orderly environment for managing a large number of jobs
    • queues are defined with a variety of limits for maximum run times, memory usage, and processor counts; they are often assigned different priority levels as well
    • queues may be interactive or non-interactive
  – Job Control:
    • submission of individual jobs to do some work (e.g., serial or parallel HPC applications)
    • simple monitoring and manipulation of individual jobs, and collection of resource usage statistics (e.g., memory usage, CPU usage, and elapsed wall-clock time per job)
  – Job Scheduling:
    • a policy decides priority between individual user jobs
    • the scheduler allocates resources to scheduled jobs

Batch Systems
• Job Scheduling Policies:
  – the scheduler must decide how to prioritize all the jobs on the system and allocate the necessary resources for each job (processors, memory, file systems, etc.)
  – the scheduling process can be easy or non-trivial depending on the size and desired functionality:
    • first in, first out (FIFO) scheduling: jobs are simply scheduled in the order in which they are submitted
    • political scheduling: enables some users to have more priority than others
    • fairshare scheduling: the scheduler ensures users have equal access over time
  – Additional features may also impact scheduling order:
    • advanced reservations: resources can be reserved in advance for a particular user or job
    • backfill: can be combined with any of the scheduling paradigms to allow smaller jobs to run while waiting for enough resources to become available for larger jobs
      – back-fill of smaller jobs helps maximize the overall resource utilization
      – back-fill can be your friend for small duration jobs

Batch Systems
• Common batch systems you may encounter in scientific computing:
  – Platform LSF
  – PBS
  – LoadLeveler (IBM)
  – SGE
• All have similar functionality but different syntax
• It is reasonably straightforward to convert your job scripts from one system to another
• All of the above include specific batch system directives which can be placed in a shell script to request certain resources (processors, queues, etc.)
• We will focus primarily on LSF since it is the system running on Lonestar

Batch Submission Process
[Diagram: internet, Server, Head, Queue, Master, Compute Nodes (C1 C2 C3 C4)]
• Submission: bsub < job
• Queue: job script waits for resources on Server
• Master: compute node that executes the job script, launches ALL MPI processes
• Launch: ssh to each compute node to start executable (e.g. a.out)
• Launch mpirun:
  mpirun -np # ./a.out
  ibrun ./a.out

LSF Batch System
• Lonestar uses Platform LSF for both the batch queuing system and scheduling mechanism (provides similar functionality to PBS, but requires different commands for job submission and monitoring)
• LSF includes global fairshare, a mechanism for ensuring no one user monopolizes the computing resources
• Batch jobs are submitted on the front end and are subsequently executed on compute nodes as resources become available
• Order of job execution depends on a variety of parameters:
  – Submission Time
  – Queue Priority: some queues have higher priorities than others
  – Backfill Opportunities: small jobs may be back-filled while waiting for bigger jobs to complete
  – Fairshare Priority: users who have recently used a lot of compute resources will have a lower priority than those who are submitting new jobs
  – Advanced Reservations: jobs may be blocked in order to accommodate advanced reservations (for example, during maintenance windows)
  – Number of Actively Scheduled Jobs: there are limits on the maximum number of concurrent processors used by each user

Lonestar Queue Definitions
Queue Name    Max Runtime   Min/Max Procs   SU Charge Rate   Use
normal        24 hours      2/256           1.0              Normal usage
high          24 hours      2/256           1.8              Higher priority usage
development   15 min        1/32            1.0              Debugging and development; allows interactive jobs
hero          24 hours      >256            1.0              Large job submission; requires special permission
serial        12 hours      1/1             1.0              For serial jobs; no more than 4 jobs/user
request                                                      Special Requests
spruce                                                       Debugging & development, special priority, urgent comp. env.
systest                                                      System Use (TACC Staff only)

Lonestar Queue Definitions
• Additional Queue Limits
  – In the normal and high queues, at most 512 processors can be used at one time. Jobs requiring more processors are deferred for possible scheduling until running jobs complete. For example, a single user can have the following job combinations eligible for scheduling:
    • Run 2 jobs requiring 256 procs each
    • Run 4 jobs requiring 128 procs each
    • Run 8 jobs requiring 64 procs each
    • Run 16 jobs requiring 32 procs each
  – A maximum of 25 queued jobs per user is allowed at one time

LSF Fairshare
• A global fairshare mechanism is implemented on Lonestar to provide fair access to its substantial compute resources
• Fairshare computes a dynamic priority for each user and uses this priority in making scheduling decisions
• Dynamic priority is based on the following criteria:
  – Number of shares assigned
  – Resources used by jobs belonging to the user:
    • Number of job slots reserved
    • Run time of running jobs
    • Cumulative actual CPU time (not normalized), adjusted so that recently used CPU time is weighted more heavily than CPU time used in the distant past

LSF Fairshare
• bhpart: Command to see
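As a worked check of the Additional Queue Limits slide, the job combinations it lists follow from dividing the 512-processor limit by the size of each job. The sketch below is illustrative only: the limit and the 2/256 processor bounds are taken from the slides, while the function name, the constants, and the reading of the limit as a per-user cap are assumptions made for this example, not part of LSF or of Lonestar's configuration.

```python
# Illustrative sketch; names are hypothetical, numbers come from the slides.
PROC_LIMIT = 512        # max processors in use at one time (normal/high queues)
QUEUE_MIN_PROCS = 2     # Min procs for the normal and high queues
QUEUE_MAX_PROCS = 256   # Max procs for the normal and high queues


def max_concurrent_jobs(procs_per_job, limit=PROC_LIMIT):
    """Return how many equal-sized jobs fit under the processor limit."""
    if not QUEUE_MIN_PROCS <= procs_per_job <= QUEUE_MAX_PROCS:
        raise ValueError(
            f"procs_per_job must be between {QUEUE_MIN_PROCS} "
            f"and {QUEUE_MAX_PROCS}"
        )
    # Integer division: a job that does not fit entirely is deferred.
    return limit // procs_per_job


if __name__ == "__main__":
    for procs in (256, 128, 64, 32):
        print(f"{max_concurrent_jobs(procs)} jobs requiring {procs} procs each")
```

Under these assumptions the loop reproduces the four combinations on the slide (2, 4, 8, and 16 jobs). Any job beyond that count would wait until running jobs complete, and the separate limit of 25 queued jobs per user still applies.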

