DOC PREVIEW
UT Dallas CS 6350 - homework1

This preview shows page 1 out of 2 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 2 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 2 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS6350:Big Data Analytics and Management Spring 2015DUE DATE: Feb 17,2015TA: Gbadebo [email protected] 1 In this homework you will learn how to solve problems using Map Reduce. Please applyHadoop map-reduce to derive some statistics from IMDB movie data. You can find the dataset in elearning. Copy the data into your hadoop cluster and use it as input data. You can use the put or copyFromLocal HDFS shell command to copy those files into your HDFS directory. There are 3 datafiles :: movies.dat, ratings.dat, users.dat Please read the “README” file to know about the data organization and to know about the Attribute of the data. All are very well explained in that README file. In class there will be brief demo/ discussion about that. Please read the questions carefully and use only the data file that you need. Some question may need only users.dat, or some question may need only movies.dat After being familiar with the data - you are required to write efficient Hadoop Map- Reduce programs in Java to find the following information ::Q1 list all male user id whose age is less or equal to 7 .Using the users.dat file, list all the male userid filtering by age. This demonstrates the use of mapreduce to filter data.Q2 Find the count of female and males users in each age group The age distribution is given below (same as in read me file) * 7: "Under 18" * 24: "18-24" * 31: "25-34" * 41: "35-44" * 51: "45-55" * 56: "55-61"* 62: "62+" Use the users.dat file. A sample output is given below//Age Gender and Count7 M 200 24 F 120 where age is 7, gender is male and count is 200.Q3 List all movie title where genre is “fantasy”The genre input must be taken from command line. Use the movies.dat fileNB:To run your jobs use the following synthaxhadoop jar name_of_jar_file Classname <input dir> <output dir> [<extra input paramter>(may be optional due to question e.g genre input)]Submission :: You have to upload your submission via e-learning before due date. Please upload the following to eLearning: 1. Three jar files, one for each problem/ One jar file containing all solutions. 2. Java files which have the source code. 3. ***A Readme text file about how to run your jar file. Give the command to run your jar


View Full Document

UT Dallas CS 6350 - homework1

Documents in this Course
HW3

HW3

5 pages

NOSQL-CAP

NOSQL-CAP

23 pages

BigTable

BigTable

39 pages

HW3

HW3

5 pages

Load more
Download homework1
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view homework1 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view homework1 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?