DOC PREVIEW
Pitt CS 3150 - Energy Efficient Scheduling

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Energy Efficient Scheduling via Partial Shutdown∗Samir Khuller†Jian Li‡Barna Saha§AbstractMotivated by issues of saving energy in data centers wedefine a collection of new problems referred to as “machineactivation” problems. The central framework we introduceconsiders a collection of m machines (unrelated or related)with each machine i having an activation cost of ai. Thereis also a collection of n jobs that need to be performed, andpi,jis the processing time of job j on machine i. Standardscheduling models assume that the set of machines is fixedand all machines are available. However, in our setting, weassume that there is an activation cost budget of A – wewould like to select a subset S of the machines to activatewith total cost a(S) ≤ A and find a schedule for the n jobson the machines in S minimizing the makespan (or any othermetric).We consider both the unrelated machines setting, aswell as the setting of scheduling uniformly related parallelmachines, where machine i has activation cost aiand speedsi, and the processing time of job j on machine i is pi,j=pjsi,where pjis the processing requirement of job j.For the general unrelated machine activation problem,our main results are that if there is a schedule with makespanT and activation cost A then we can obtain a schedule withmakespan (2+!)T and activation cost 2(1+1!)(lnnOP T+1)A,for any !> 0. We also consider assignment costs for jobsas in the generalized assignment problem, and using ourframework, provide algori thms that minimize the machineactivation and the assignment cost simultaneously. Inaddition, we present a greedy algorithm which only worksfor the basic version and yields a makespan of 2T and anactivation cost A(1 + ln n).For the uniformly related parallel machine schedul-ing problem , we develop a polynomial time approximationscheme that outputs a schedule with the property that theactivation cost of the subset of machines is at most A and themakespan is at most (1 + !)T for any !> 0. For the specialcase of m identical speed machines, the machine acti vationproblem is trivial, since the cheap es t subset of k machinesis always the best choice if the optimal solution activates kmachines. In addition, we consider the case when some jobscan be dropped (and are treated as outliers).∗Research supported by NSF Award CCF-0 7288 39 and aGoogle Research Award.†Department of Computer Science, University of Maryland,College Park, MD 20742. E-mail : [email protected].‡Department of Computer Science, University of Maryland,College Park, MD 20742. E-mail : [email protected].§Department of Computer Science, University of Maryland,College Park, MD 20742. E-mail : [email protected] IntroductionLarge scale data centers have emerged as an extremelypopular way to store and manage a large volume ofdata. Most large corporations, such as Go ogle, HP andAmazon have dozens of data centers. These data centersare typically composed of thousands of machines, andhave extremely high energy requireme nts. Data centersare now being used by companies such as Amazon WebServices, to run large scale computation tasks for othercompanies who do not have the resources to create theirown data centers. This is in addition to their owncomputing requirements.These data centers are designed to be able to handleextremely high work loads in periods of peak demand.However, since the workload on these data centersfluctuates over time, we could selectively shut downpart of the system to save energy when the demandon the system is low. Energy savings results not justfrom putting machines in a sleep state , but also fromsavings in cooling costs.Hamilton (see the recent SIGACT News article [3])argues that a ten fold reduction in the power needs ofthe data center may be possible if we can simply buildsystems that are optimized with power management astheir primary goal. Suggested examples (summarizingfrom the original text) are:1. Explore ways to simply do less during surge loadperiods.2. Explore ways to migrate work in time. The workload on modern cloud platforms is very cyclical,with infrequent peaks and deep valleys. Even valleytime is made more expensive by the need to owna power supply to be able to handle the peaks, anumber of nodes adequate to handle surge loads, anetwork provisioned for worst case demand.This leads to the issue of which machines can weshut down, since all m achines in a data center are notnecessarily identical. Each machine stores some data,and is thus not capable of performing every single jobefficiently unless some data is first migrated to themachine. We will formalize this question very shortly.To quote from the recent article by Birm an etal. (SIGACT News [3]) “Scheduling mechanisms thatassign tasks to machines, but more broadly, play therole of provisioning the data center as a whole. Aswe’ll see below, this aspect of cloud computing is ofgrowing importance because of its organic connectionto power consumption: both to spin disks, and runmachines, but also because active machines produceheat and demand cooling. Scheduling, it turns out,comes down to deciding how to spend money.”Data is replicated on storage systems for bothload balancing during peak demand periods, as wellas for fault tolerance. Typically many jobs have to bescheduled on the machines in the data center. In manycases profile information for a set of jobs is availablein advance, as well as estimates of cyclical workloads.Jobs may be I/O intensive or CPU intensive, in eithercase, an estimate of its processing time on each type ofmachine is available. Jobs that need to access specificdata can be assigned to any one of the subset of machinesthat store the needed data. Our goal is to first selecta subset of machines to activate, and then schedulethe jobs on the active machines. From this aspect ourproblems differ from standard scheduling problems withmultiple machines, where the set of active machines isthe set of all machines. Here we have to de cide whichmachines to activate and then schedule all jobs on theactive machines.The scheduling literature is vast, and one can for-mulate a variety of interesting questions in this model.We initiate this work by focusing our attention on per-haps one of the most widely studied machine schedulingproblems since it matches the requirements of the ap-plication. We have a collection of jobs and unrelatedmachines, and need to decide which subset of machinesto activate. The jobs can only be scheduled on


View Full Document

Pitt CS 3150 - Energy Efficient Scheduling

Documents in this Course
JouleSort

JouleSort

12 pages

Load more
Download Energy Efficient Scheduling
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Energy Efficient Scheduling and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Energy Efficient Scheduling 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?