Distributed Energy-Efficient Scheduling


Proc. 27th IEEE International Performance Computing and Communications Conference (IPCCC), Dec. 2008.

Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids*

Cong Liu (1), Xiao Qin (2), Santosh Kulkarni (2), Chengjun Wang (2), Shuang Li (2), Adam Manzanares (2), and Sanjeev Baskiyar (2)

(1) University of North Carolina at Chapel Hill
(2) Auburn University

* The work reported in this paper was supported by the US National Science Foundation under Grants No. CCF-0742187, No. CNS-0757778, No. CNS-0831502, No. OCI-0753305, No. DUE-0621307, and No. DUE-0830831, and Auburn University under a startup grant.

Abstract

Although data duplication may improve the performance of data-intensive applications on data grids, a large number of data replicas inevitably increases energy dissipation in the storage resources of those grids. To implement a data grid with high energy efficiency, we address in this study the issue of energy-efficient scheduling for data grids supporting real-time and data-intensive applications. Taking into account both data locations and application properties, we design a novel Distributed Energy-Efficient Scheduler (or DEES for short) that aims to seamlessly integrate the process of scheduling tasks with data placement strategies to provide energy savings. DEES is distributed in essence: it can successfully schedule tasks and save energy without knowledge of the complete grid state. DEES encompasses three main components: energy-aware ranking, performance-aware scheduling, and energy-aware dispatching. By reducing the number of data replications and task transfers, DEES effectively saves energy. Simulation results based on a real-world trace demonstrate that, with respect to energy consumption, DEES conserves over 35% more energy than previous approaches without degrading performance.

1. Introduction

Distributed scientific applications in many cases require access to massive data sets.
In High Energy Physics (HEP) applications [8], for example, a handful of experiments have started producing petabytes of data per year and are expected to keep doing so for decades. Data grids [7] have served as a technology bridge between the need to access extremely large data sets and the goal of achieving high data transfer rates by providing geographically distributed computing resources and large-scale storage systems.

When it comes to distributed systems such as data grids, it is the responsibility of schedulers to decide where to run applications (the terms application and task are used interchangeably throughout this paper) based on the applications' specific requirements as well as system workload conditions. Data resources are of paramount importance for many data-intensive applications, from long-running simulations to remote sensing and from biological sequence analysis to video-on-demand systems [11]. A key factor in the process of scheduling data-intensive tasks is the location of the input data required by the tasks.

A straightforward strategy to enhance the performance of data-intensive applications on data grids is to replicate popular (i.e., frequently accessed) data sets to multiple resource sites, thereby offering higher data access speeds than maintaining the data sets at a single site. A wide range of practical and effective data replication strategies have been commonly applied in distributed data centers [15][12]. However, making too many replicas leads to a number of drawbacks. First, it is challenging to maintain consistency among replicas in large-scale distributed systems such as grids. Second, it is nontrivial to efficiently generate replicas of massive data sets on the fly in data grids. Last but not least, a large number of data replicas inevitably and dramatically increases the energy dissipation in storage resources, which in turn often leads to large electricity bills.
Recent studies show that large-scale clusters may require 40 TWh per year, costing over $4 billion per year at a price of $100 per MWh [6].

Clearly, it is a non-trivial task to improve the performance of data-intensive applications through data replicas while reducing energy dissipation in the storage systems of data grids. Because energy efficiency and high performance are two conflicting design goals, better tradeoffs between them are needed for data-intensive applications.

In this paper, we investigate an approach that seamlessly integrates data placement strategies with task scheduling, in which both energy efficiency and real-time requirements (e.g., tasks' deadlines) are fully addressed. In particular, we develop a novel Distributed Energy-Efficient Scheduler called DEES containing three key components: energy-aware ranking, performance-aware scheduling, and energy-aware dispatching. By leveraging an array of data placement strategies, DEES is able to maximize the number of tasks completed before their corresponding deadlines while replicating data in an energy-efficient way. To furnish DEES with an energy-efficient task dispatching mechanism that dispatches real-time tasks to peer computing sites, one has to simultaneously consider three factors: the computational capacities of peer computing sites, the energy consumption introduced by tasks, and data location.

An interesting property of DEES is that its scheduling overhead does not necessarily increase when data grids scale up. This is quite different from most other grid scheduling techniques, in which a centralized scheduler for a data grid inherently becomes a performance bottleneck and a single point of failure. Unlike most existing schedulers deployed in data grids, DEES does not require full knowledge of the workload conditions of all the computing sites in a data grid.
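
As a rough illustration of how the three dispatching factors named above (computational capacity, task-induced energy consumption, and data location) might interact, a minimal Python sketch follows. It is not the DEES algorithm, whose design is deferred to Section 4 of the paper. The Site and Task fields, the two constants, and the rule "pick the lowest-energy neighbour that still meets the deadline" are all assumptions made for this example. Only a list of neighbouring sites is consulted, mirroring the idea that no complete view of the grid is needed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    """A peer computing site as seen by a neighbour (hypothetical fields)."""
    name: str
    mips: float            # computational capacity, million instructions per second
    queue_delay_s: float   # estimated wait before a new task can start
    watts_busy: float      # power drawn while executing a task
    datasets: set = field(default_factory=set)  # data sets stored locally


@dataclass
class Task:
    """A real-time, data-intensive task (hypothetical fields)."""
    name: str
    minstr: float          # task length, millions of instructions
    deadline_s: float      # relative deadline in seconds
    dataset: str           # input data set the task needs
    dataset_gb: float      # size of that data set


# Assumed constants, chosen only to make the example concrete.
TRANSFER_GB_PER_S = 1.0       # rate at which a missing data set is copied
JOULES_PER_GB_MOVED = 200.0   # energy charged for creating a new replica


def estimate(task: Task, site: Site):
    """Return (finish_time_s, energy_j) if `task` were run on `site`."""
    has_data = task.dataset in site.datasets
    transfer_s = 0.0 if has_data else task.dataset_gb / TRANSFER_GB_PER_S
    run_s = task.minstr / site.mips
    finish_s = site.queue_delay_s + transfer_s + run_s
    energy_j = run_s * site.watts_busy
    if not has_data:
        # Running away from the data forces a replica, which costs energy.
        energy_j += task.dataset_gb * JOULES_PER_GB_MOVED
    return finish_s, energy_j


def dispatch(task: Task, neighbours: list) -> Optional[Site]:
    """Pick the lowest-energy neighbour that can still meet the deadline."""
    best, best_energy = None, float("inf")
    for site in neighbours:
        finish_s, energy_j = estimate(task, site)
        if finish_s <= task.deadline_s and energy_j < best_energy:
            best, best_energy = site, energy_j
    return best  # None means no known neighbour can meet the deadline


# Usage: A holds the data, B is fast but needs a replica, C is frugal but slow.
task = Task("t1", minstr=6000, deadline_s=20, dataset="hep-run-7", dataset_gb=8)
peers = [
    Site("A", mips=1000, queue_delay_s=2, watts_busy=200, datasets={"hep-run-7"}),
    Site("B", mips=2000, queue_delay_s=0, watts_busy=150),
    Site("C", mips=200, queue_delay_s=0, watts_busy=30, datasets={"hep-run-7"}),
]
chosen = dispatch(task, peers)
print(chosen.name if chosen else "no feasible site")
```

In this toy setting, site C would use the least energy but misses the deadline, and site B meets the deadline only by paying for a new replica, so the site that already holds the data and is fast enough is selected. A real scheduler would weigh these factors with measured energy and workload models rather than fixed constants.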
Obtaining full knowledge of the state of a grid is itself a difficult task.

The remainder of this paper is organized as follows. A review of recent related work is given in Section 2. Section 3 describes the system model. Section 4 presents the detailed design of DEES. Section 5 presents a comprehensive set of simulations used to evaluate the performance of DEES. Conclusions and future work appear in Section 6.

2. Related work

Unlike traditional parallel and distributed systems, in which required data usually resides in sites where tasks are

