On the Energy (In)efficiency of Hadoop Clusters
Jacob Leverich, Christos Kozyrakis
Presented by: Rini Kaushik

Why is energy management important?
- US datacenter energy costs (EPA): $2 billion in 2003, $10 billion in 2011
- ~1.6% of all US energy consumed
- (Un)green: ~12M tons of CO2 annually*
- Servers worldwide in 2005: 27.3 million (InformationWeek)
* Jeff Chase et al., "Managing Energy and Server Resources in Hosting Centers"
* J. Jackson, "Energy needs in an internet economy: a closer look at the datacenters"

Where is the energy consumed?
- PUE (total facility power divided by IT equipment power) was around 3.0; by 2011 it had improved to about 1.4
(Courtesy: Luiz André Barroso, Urs Hölzle)

Towards energy efficiency
- Opportunity: the reality of CPU utilization is that servers spend most of their time at low utilization
- Ref: "The Case for Energy-Proportional Computing," Luiz André Barroso and Urs Hölzle

Power variation in a typical server
- [Figure: server power draw across utilization levels] (Courtesy: Luiz André Barroso, Urs Hölzle)

Power vs. efficiency
- Efficiency = utilization / power
- Because a server draws substantial power even when idle, efficiency falls off sharply at low utilization
- Ref: "The Case for Energy-Proportional Computing," Luiz André Barroso and Urs Hölzle

An easy energy-efficiency option: scale-down
- Match the number of active nodes to workload needs; turn off the remaining nodes to save power
- Multiple papers use this approach, e.g., "Managing Energy and Server Resources in Hosting Centers," SOSP 2001
- Easy when:
  - Only computation needs to be consolidated
  - Servers are stateless (i.e., they serve data that resides on a shared NAS or SAN)
  - The replication model is simple
  - Workloads can be migrated to fewer machines during periods of low activity
- Hard when:
  - Servers hold significant state
  - Data locality is important

Hadoop primer
- A distributed data processing framework
- The MapReduce programming model has emerged as a scalable way to perform data-intensive computations on commodity cluster computers
- Runs on commodity datacenters, with data stored in HDFS

Unique scale-down challenges of Hadoop clusters
- Computation and data are co-located on the servers
- Servers are stateful and rarely completely idle
- Design principles that work against scale-down:
  - Load balancing for better performance: even during low activity, low load on many servers is preferred to high load on a few
  - Data is striped across nodes for high aggregate I/O
  - Commodity server usage raises reliability and availability concerns, so N-way replication is the norm
- Result: it is hard to turn off servers

Scale-down opportunity: block replication invariants
- No two replicas of a block on the same node
- Replicas on at least two racks
- If an inactive node is turned off, its data is still available on a replica
- [Figure: blocks A-H replicated across nodes 1-9] (Courtesy: Leverich, HotPower'09)
- Naïve approach: only n-1 servers can be turned off, and at best only one rack; otherwise availability is affected

Raises questions
- Which node should be disabled? (a data availability consideration)
- How can a sleeping node be distinguished from a down node? (to prevent needless re-replication)

The covering subset invariant
- Invariant: every block must have at least one replica in the covering subset
- Any node outside the covering subset can then sleep without making a block unavailable (see the sketch below)
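To make the invariant concrete, here is a minimal Python sketch of the placement check it implies. This is not code from the paper; the topology, covering subset, and block placements below are hypothetical.

```python
# Sketch of the replication invariants plus the covering subset invariant.
# Node/rack names and the block placements are made up for illustration.

def rack_of(node, topology):
    """Return the rack a node belongs to."""
    return topology[node]

def check_placement(blocks, topology, covering_subset):
    """Check invariants for a block -> [replica nodes] map."""
    for block, replicas in blocks.items():
        # Invariant 1: no two replicas of a block on the same node.
        assert len(set(replicas)) == len(replicas), f"{block}: two replicas on one node"
        # Invariant 2: replicas span at least two racks.
        racks = {rack_of(n, topology) for n in replicas}
        assert len(racks) >= 2, f"{block}: replicas on only one rack"
        # Covering subset invariant: at least one replica in the covering subset,
        # so every node outside the subset can sleep without losing availability.
        assert any(n in covering_subset for n in replicas), f"{block}: not covered"

topology = {f"node{i}": f"rack{(i - 1) // 3}" for i in range(1, 10)}  # 9 nodes, 3 racks
covering = {"node1", "node4", "node7"}  # hypothetical covering subset, one node per rack
blocks = {
    "A": ["node1", "node2", "node5"],
    "B": ["node4", "node8", "node3"],
    "C": ["node7", "node2", "node6"],
}
check_placement(blocks, topology, covering)
print("Invariants hold; every node outside", sorted(covering), "may sleep.")
```

Under this invariant, every node outside the covering subset is eligible to sleep at once, rather than at most n-1 nodes as in the naïve scheme.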
Covering subset considerations
- Too large: less energy savings, and the rest of the system suffers bottlenecks; on the plus side, performance of the covering set is good
- Too small: limited storage capacity and a performance bottleneck; on the plus side, higher energy savings
- The paper assumes a covering subset of 10-30% of the cluster

Missing considerations and issues
- Assumes a system administrator will establish the covering subset, with no knowledge of workload patterns and no adaptability
- An ad hoc 10-30% allocation can have serious consequences on performance and is not cognizant of workload patterns
- The number of files is not accounted for

Changes to Hadoop
- ReplicationTargetChooser: place one replica on the local node, one replica in the covering subset, and one replica on a different rack
- No re-replication of the blocks on sleeping nodes
- Nodes are disabled and enabled manually

Evaluation
- Disable n nodes; compare Hadoop job energy and performance
- Individual runs of webdata_sort/webdata_scan from GridMix
- 30-minute job batches (with some idle time!)
- Cluster: 36 HP ProLiant DL140 G3 nodes, each with 2 quad-core Xeon 5335s, 32GB RAM, and a 500GB disk
- 9-node covering subset (1/4 of the cluster)
- Energy model: a validated estimate based on CPU utilization; a disabled node counts as 0 Watts; this makes it possible to evaluate hypothetical hardware

Results: performance
- It slows down (obviously) on this peak-performance benchmark
- Sort is worse off than Scan

Results: energy
- Less energy consumed for the same amount of work: 9% to 51% saved

Evaluation observations
- Interesting observation: power goes down as the number of sleeping nodes is increased; however, energy consumption may not
- Energy = power × time; cost = energy × cost per kWh
- More sleeping nodes: lower power, but also lower performance (longer runtime), as the sketch below illustrates
- Sort: 9% energy saving with a 71% performance impact; Scan: 51% energy saving
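To see why power can fall while energy does not, here is a minimal Python sketch of a CPU-utilization-based power model in the spirit of the paper's evaluation. The linear model and all wattages, utilizations, and runtimes below are hypothetical, not the paper's measurements.

```python
# Hypothetical linear power model: a disabled node draws 0 W, an active node
# draws idle power plus a utilization-proportional term. Numbers are made up.
IDLE_W, PEAK_W = 160.0, 250.0

def node_power(cpu_util, asleep=False):
    """Per-node power in Watts under the linear CPU-utilization model."""
    if asleep:
        return 0.0
    return IDLE_W + (PEAK_W - IDLE_W) * cpu_util

def job_energy_joules(active_nodes, cpu_util, runtime_s):
    """Cluster energy for one job: energy = power x time."""
    return active_nodes * node_power(cpu_util) * runtime_s

e_all = job_energy_joules(36, 0.6, 600)      # all 36 nodes on
e_scaled = job_energy_joules(27, 0.8, 850)   # 9 nodes asleep at 0 W; job runs longer
for label, e in [("36 nodes on", e_all), ("9 nodes asleep", e_scaled)]:
    kwh = e / 3.6e6
    print(f"{label}: {kwh:.2f} kWh, ${kwh * 0.10:.2f} at $0.10/kWh")
```

In this hypothetical run, cluster power drops (about 6.3 kW vs. 7.7 kW) but the longer runtime makes total energy higher, which is exactly the power-versus-energy distinction the observation draws.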
Discussion
- The experiments used a very small dataset
- The paper states there is no impact on data availability, which is incorrect; fault-injection experiments are needed
- It assumes a power model in which power depends only on CPU utilization; this may not be accurate, and I/O-bound benchmarks will have a different characteristic
- Replication is also meant for performance: it spreads out hot spots
- There is a tradeoff between availability, performance, and energy efficiency

Future work
- Impact of sleeping nodes on durability
- Revisiting the reliability-via-replication assumption; replication does have performance implications
- Dynamic scheduling that responds to changes in the utilization of the cluster: collaboration between Hadoop's job scheduler and a power controller (see the sketch below)
- Different workloads and their characteristics: some may value QoS and throughput more than energy savings
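As a thought experiment for the dynamic-scheduling direction, here is a minimal Python sketch of a power controller loop that wakes or sleeps non-covering nodes as utilization changes. Nothing like this exists in the paper (nodes are toggled manually there); the thresholds, node names, and utilization trace are all hypothetical.

```python
# Hypothetical power controller cooperating with the job scheduler:
# wake or sleep nodes outside the covering subset based on cluster utilization.
HIGH_UTIL, LOW_UTIL = 0.75, 0.30  # hysteresis band to avoid flapping

def control_step(util, awake, asleep):
    """Move at most one non-covering node between the awake and asleep pools."""
    if util > HIGH_UTIL and asleep:
        node = asleep.pop()
        awake.add(node)   # wake a node; the job scheduler may now place tasks on it
        print(f"util={util:.2f}: waking {node}")
    elif util < LOW_UTIL and awake:
        node = awake.pop()
        asleep.add(node)  # drain and sleep a node; its blocks remain covered
        print(f"util={util:.2f}: sleeping {node}")

awake = {"node2", "node3", "node5", "node6"}   # non-covering nodes currently on
asleep = {"node8", "node9"}                    # non-covering nodes currently off
for util in [0.82, 0.85, 0.40, 0.20, 0.15]:    # hypothetical utilization trace
    control_step(util, awake, asleep)
```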