UT Dallas CS 6350 - HadoopLecture2015

Developing a Simple Map-Reduce Program for Hadoop
The University of Texas at Dallas
Big Data Course CS6350
Professor: Dr. Latifur Khan
TA: Gbadebo Ayoade ([email protected])
Release Date: Spring 2015
Content courtesy of Mohammad Ridwanur Rahman and Mohammad Ali Ghaderi. Revised by Gbadebo Ayoade.

Introduction
The purpose of this document is to help students who are not familiar with Hadoop develop their first Map-Reduce program for Hadoop.

So far from HW-0:
So far from HW-0, we have a Hadoop cluster on our machine and we know how to run a jar. The next questions are:
- How do we write a map-reduce program?
- How do we get the jar of the map-reduce program?
We will demonstrate both by walking through the WordCount example code.

The process
We assume that you already have Hadoop on your own machine and are ready to develop your first Hadoop program. This document is based on Ubuntu 14.04 and Hadoop 2.6.0. In the following, we discuss the steps in detail.

1. Preparing the IDE
Hadoop programs are Java programs. You may use any Java IDE, such as Eclipse, NetBeans, or IntelliJ IDEA, to develop your Map-Reduce program. We use Eclipse in this document. If you already have Eclipse on your machine, you can skip this section.

To install Eclipse, run this command in the shell:

    sudo apt-get install eclipse

Wait for it to be downloaded. Then use the "eclipse" command to start the environment:

    eclipse

The default workspace should be fine. Click OK, then go to Workbench.

2. New Java Project
Hadoop projects are simple Java projects. Create a new Java project. Enter "MyWordCount" as the project name and click Finish to create the project.

3. Creating the main file
Create a new file named "WordCount.java" and write the following lines there:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class WordCount {

        public static class Map
                extends Mapper<LongWritable, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text(); // type of output key

            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString()); // line to string tokens
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken()); // set word as each input keyword
                    context.write(word, one);  // create a pair <keyword, 1>
                }
            }
        }

        public static class Reduce
                extends Reducer<Text, IntWritable, Text, IntWritable> {

            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0; // initialize the sum for each keyword
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result); // create a pair <keyword, number of occurrences>
            }
        }

        // Driver program
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); // get all args
            if (otherArgs.length != 2) {
                System.err.println("Usage: WordCount <in> <out>");
                System.exit(2);
            }

            // create a job with name "wordcount"
            Job job = new Job(conf, "wordcount");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);

            // uncomment the following line to add the Combiner
            // job.setCombinerClass(Reduce.class);

            // set output key type
            job.setOutputKeyClass(Text.class);
            // set output value type
            job.setOutputValueClass(IntWritable.class);

            // set the HDFS path of the input data
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            // set the HDFS path for the output
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            // wait till job completion
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

4. Download Hadoop to your development machine
Please download Hadoop to your development machine. This is required to get the dependent jar files for Hadoop compilation.
http://mirror.tcpdiag.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

5. Adding the Hadoop references (very important)
To compile Hadoop projects, you need to add the Hadoop libraries as references to your project. Right-click on the project, select "Build Path" -> "Configure Build Paths", and select the "Libraries" tab.
Click "Add External JARs..." to continue. Find "hadoop-mapreduce-client-core-2.6.0.jar" in the <Your hadoop folder>/share/hadoop/mapreduce folder and add it.
Click "Add External JARs..." again. Find "hadoop-common-2.6.0.jar" in the <Your hadoop folder>/share/hadoop/common folder and add it.
You also need to add "commons-cli-1.2.jar" from the <Your hadoop folder>/share/hadoop/common/lib folder.
Your build path configuration should now be similar to this screen.

6. Creating the JAR file for Hadoop
All you need to do now is create the JAR file and run it in Hadoop. Right-click on the project and choose "Export". Then use the "Browse..." button next to the "JAR file:" label to specify the name of the export file. For example, you may use "/home/user/WordCountSample/wordcount.jar" (you can use any other path). There should now be two files inside the WordCountSample folder.

7. Executing the example in Hadoop
Start the Hortonworks VM as shown in the installation document. Ensure the VM is properly started.
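To see what the Mapper and Reducer above actually compute, the small stand-alone class below simulates both phases on a single string. It is not part of Hadoop and the class name WordCountLocal is invented for this sketch; it only mimics the logic — tokenize with the same StringTokenizer, emit a 1 per token, and sum the 1s per key — so you can check your understanding before running the real job.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountLocal {

    // Simulates the map phase (emit <token, 1> for every token) and the
    // reduce phase (sum the 1s per key) in a single in-memory pass.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text); // same tokenizer the Mapper uses
        while (itr.hasMoreTokens()) {
            // merge() plays the role of the Reducer's sum for each keyword
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the quick brown fox jumps over the lazy dog");
        // Print <word>\t<count> pairs, the same shape as the job's output records
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Unlike the real job, this keeps every count in one HashMap; Hadoop instead partitions the keys across reducers, which is why the Reducer only ever sees the values for one key at a time.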
Get the IP of the VM following the steps in the installation document. I am using the NAT configuration for the VM network, so my IP is 127.0.0.1.

7.1 Copy wordcount.jar to the Hortonworks Hadoop sandbox VM
Change directory to where you exported the wordcount jar. Use the scp command to copy the jar file from your development machine to the Hortonworks VM. (Windows users can download WinSCP to load files to the Hortonworks VM.)

    scp -P 2222 wordcount.jar hue@127.0.0.1:

NB: use user: hue, password: hadoop

7.2 Log in to the Hortonworks VM directly, or use ssh or PuTTY (the user should be hue)
To ssh from your development machine, use the command below:

    ssh hue@127.0.0.1 -p 2222

password: hadoop

Removing old folders: (if
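The preview cuts off before the job is actually launched. Before running it on the cluster, you can sanity-check the expected output locally with standard Unix tools (no Hadoop required). The file sample.txt and its contents below are invented for illustration; the pipeline produces the same <word>\t<count> pairs the WordCount job writes to its output directory.

```shell
# Build a tiny sample input (contents invented for illustration).
printf 'hello world\nhello hadoop\n' > sample.txt

# Split into one token per line, sort, count duplicates, then reorder
# each "count word" pair into the job's "word<TAB>count" output shape.
tr -s ' ' '\n' < sample.txt | sort | uniq -c | awk '{print $2 "\t" $1}' > expected.txt
cat expected.txt
```

On the sandbox itself, the job is typically launched with something like `hadoop jar wordcount.jar WordCount <in> <out>`, matching the driver's "Usage: WordCount <in> <out>" message, where both arguments are HDFS paths; the exact steps are in the part of the document the preview omits.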