UT Dallas CS 6350 - HadoopLecture2015

Developing a Simple Map-Reduce Program for Hadoop
The University of Texas at Dallas
Big Data Course CS6350
Professor: Dr. Latifur Khan
TA: Gbadebo Ayoade ([email protected])
Release Date: Spring 2015
Content courtesy of Mohammad Ridwanur Rahman and Mohammad Ali Ghaderi. Revised by Gbadebo Ayoade.

Introduction
The purpose of this document is to help students who are not familiar with Hadoop develop their first Map-Reduce program for Hadoop.

So far from HW-0:
So far from HW-0, we have a Hadoop cluster on our machine and we know how to run a jar. The next questions are:
- How do we write a map-reduce program?
- How do we get the jar of the map-reduce program?
We will demonstrate both by walking through the WordCount example code.

The process
We assume that you already have Hadoop on your own machine and are ready to develop your first Hadoop program. This document is based on Ubuntu 14.04 and Hadoop 2.6.0. In the following, we discuss the steps in detail.

1. Preparing the IDE
Hadoop programs are Java programs. You may use any Java IDE, such as Eclipse, NetBeans, or IntelliJ IDEA, to develop your Map-Reduce program. We use Eclipse in this document. If you already have Eclipse on your machine, you can skip this section.

To install Eclipse, run this command in the shell:

    sudo apt-get install eclipse

Wait for it to be downloaded. Then use the "eclipse" command to start the environment:

    eclipse

The default workspace should be fine. Click OK, then go to Workbench.

2. New Java Project
Hadoop projects are simple Java projects. Create a new Java project. Enter "MyWordCount" as the project name and click Finish to create the project.

3. Creating the main file
Create a new file named "WordCount.java" and write the following lines there:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class WordCount {

        public static class Map
                extends Mapper<LongWritable, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text(); // type of output key

            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString()); // line to string tokens
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken()); // set word as each input keyword
                    context.write(word, one);  // create a pair <keyword, 1>
                }
            }
        }

        public static class Reduce
                extends Reducer<Text, IntWritable, Text, IntWritable> {

            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0; // initialize the sum for each keyword
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result); // create a pair <keyword, number of occurrences>
            }
        }

        // Driver program
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); // get all args
            if (otherArgs.length != 2) {
                System.err.println("Usage: WordCount <in> <out>");
                System.exit(2);
            }

            // create a job with name "wordcount"
            Job job = new Job(conf, "wordcount");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);

            // uncomment the following line to add the Combiner
            // job.setCombinerClass(Reduce.class);

            // set output key type
            job.setOutputKeyClass(Text.class);
            // set output value type
            job.setOutputValueClass(IntWritable.class);

            // set the HDFS path of the input data
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            // set the HDFS path for the output
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            // wait till job completion
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

4. Download Hadoop to your development machine
Please download Hadoop to your development machine. This is required to get the dependent jar files for Hadoop compilation.
http://mirror.tcpdiag.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

5. Adding the Hadoop references (very important)
To compile Hadoop projects, you need to add the Hadoop libraries as references to your project. Right-click on the project, select "Build Path" -> "Configure Build Paths", and select the "Libraries" tab.
Click "Add External JARs..." to continue. Find "hadoop-mapreduce-client-core-2.6.0.jar" in the <Your hadoop folder>/share/hadoop/mapreduce folder and add it.
Click "Add External JARs..." again. Find "hadoop-common-2.6.0.jar" in the <Your hadoop folder>/share/hadoop/common folder and add it.
You also need to add "commons-cli-1.2.jar" from the <Your hadoop folder>/share/hadoop/common/lib folder.
Your build path configuration should now be similar to this screen.

6. Creating the JAR file for Hadoop
All you need to do now is create the JAR file and run it in Hadoop. Right-click on the project and choose "Export". Then use the "Browse..." button next to the "JAR file:" label to specify the name of the export file. For example, you may use "/home/user/WordCountSample/wordcount.jar" (you can use any other path). There should now be two files inside the WordCountSample folder.

7. Executing the example in Hadoop
Start the Hortonworks VM as shown in the installation document. Ensure the VM is properly started.
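To see what the Mapper and Reducer above actually compute, the small stand-alone class below simulates both phases on a single string. It is not part of Hadoop and the class name WordCountLocal is invented for this sketch; it only mimics the logic — tokenize with the same StringTokenizer, emit a 1 per token, and sum the 1s per key — so you can check your understanding before running the real job.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountLocal {

    // Simulates the map phase (emit <token, 1> for every token) and the
    // reduce phase (sum the 1s per key) in a single in-memory pass.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text); // same tokenizer the Mapper uses
        while (itr.hasMoreTokens()) {
            // merge() plays the role of the Reducer's sum for each keyword
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the quick brown fox jumps over the lazy dog");
        // Print <word>\t<count> pairs, the same shape as the job's output records
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Unlike the real job, this keeps every count in one HashMap; Hadoop instead partitions the keys across reducers, which is why the Reducer only ever sees the values for one key at a time.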
Get the IP of the VM following the steps in the installation document. I am using the NAT configuration for the VM network, so my IP is 127.0.0.1.

7.1 Copy wordcount.jar to the Hortonworks Hadoop sandbox VM
Change directory to where you exported the wordcount jar. Use the scp command to copy the jar file from your development machine to the Hortonworks VM. (Windows users can download WinSCP to load files to the Hortonworks VM.)

    scp -P 2222 wordcount.jar hue@127.0.0.1:

NB: use user: hue, password: hadoop

7.2 Log in to the Hortonworks VM directly, or use ssh or PuTTY (the user should be hue)
To ssh from your development machine, use the command below:

    ssh hue@127.0.0.1 -p 2222

password: hadoop

Removing old folders: (if
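The preview cuts off before the job is actually launched. Before running it on the cluster, you can sanity-check the expected output locally with standard Unix tools (no Hadoop required). The file sample.txt and its contents below are invented for illustration; the pipeline produces the same <word>\t<count> pairs the WordCount job writes to its output directory.

```shell
# Build a tiny sample input (contents invented for illustration).
printf 'hello world\nhello hadoop\n' > sample.txt

# Split into one token per line, sort, count duplicates, then reorder
# each "count word" pair into the job's "word<TAB>count" output shape.
tr -s ' ' '\n' < sample.txt | sort | uniq -c | awk '{print $2 "\t" $1}' > expected.txt
cat expected.txt
```

On the sandbox itself, the job is typically launched with something like `hadoop jar wordcount.jar WordCount <in> <out>`, matching the driver's "Usage: WordCount <in> <out>" message, where both arguments are HDFS paths; the exact steps are in the part of the document the preview omits.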