Map Reduce Programming and Cost-based Optimization



MapReduce Programming and Cost-based Optimization? Crossing this Chasm with Starfish

Herodotos Herodotou, Fei Dong
Duke University
hero@cs.duke.edu, dongfei@cs.duke.edu, shivnath@cs.duke.edu

ABSTRACT

MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains, including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical success of database systems, namely, cost-based optimization. A major challenge here is that, to the MapReduce system, a program consists of black-box map and reduce functions written in some programming language like C, Java, Python, or Ruby.

Starfish is a self-tuning system for big data analytics that includes, to our knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs. Starfish also includes a Profiler to collect detailed statistical information from unmodified MapReduce programs, and a What-if Engine for fine-grained cost estimation.

This demonstration will present the profiling, what-if analysis, and cost-based optimization of MapReduce programs in Starfish. We will show how non-expert users can employ the Starfish Visualizer to (a) get a deep understanding of a MapReduce program's behavior during execution, (b) ask hypothetical questions on how the program's behavior will change when parameter settings, cluster resources, or input data properties change, and (c) ultimately optimize the program.

1 INTRODUCTION

MapReduce is a relatively young framework, both a programming model and an associated run-time system, for large-scale data processing [4]. Hadoop [5] is a popular open-source implementation of MapReduce that many academic, government, and industrial organizations use in production deployments. Hadoop is used for applications such as Web indexing data
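
To make the "black-box map and reduce functions" mentioned in the abstract concrete, the following is a minimal sketch of the MapReduce programming model as a word count in Python. It is an illustration only: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names chosen for this example, the driver runs in a single process in memory, and none of it reflects the Hadoop API or any Starfish component.

```python
from collections import defaultdict


def map_fn(_key, line):
    """User-defined map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-defined reduce: sum all counts emitted for one word."""
    yield word, sum(counts)


def run_job(records, mapper, reducer):
    """Hypothetical in-memory driver (an assumption, not a real framework).

    It only calls mapper and reducer; it never inspects their bodies,
    which is why such functions are black boxes to the system.
    """
    # Map phase: apply the mapper to every (key, value) input record,
    # grouping the intermediate values by their intermediate key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: process each intermediate key in sorted order.
    output = {}
    for out_key in sorted(groups):
        for final_key, final_value in reducer(out_key, groups[out_key]):
            output[final_key] = final_value
    return output


if __name__ == "__main__":
    lines = [(0, "big data analytics"), (1, "big data processing")]
    print(run_job(lines, map_fn, reduce_fn))
```

Because the driver sees only the inputs and outputs of the two functions, any estimate of their cost has to come from observing executions rather than from analyzing the code, which is the setting the abstract describes.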


