UT Dallas CS 6350 - homework1 - D3094439

School name: University of Texas at Dallas

Course: CS 6350 - Big Data Management and Analytics

CS6350: Big Data Analytics and Management
Spring 2015
DUE DATE: Feb 17, 2015
TA: Gbadebo [email protected]

Homework 1

In this homework you will learn how to solve problems using MapReduce. Please apply Hadoop MapReduce to derive some statistics from IMDB movie data. You can find the dataset in eLearning. Copy the data into your Hadoop cluster and use it as input data. You can use the put or copyFromLocal HDFS shell command to copy those files into your HDFS directory.

There are 3 data files: movies.dat, ratings.dat, users.dat

Please read the "README" file to learn how the data is organized and what each attribute means. Everything is well explained in that README file. In class there will be a brief demo/discussion about it. Please read the questions carefully and use only the data file that you need. Some questions may need only users.dat, and some may need only movies.dat.

After becoming familiar with the data, you are required to write efficient Hadoop MapReduce programs in Java to find the following information:

Q1. List all male user IDs whose age is less than or equal to 7.
Using the users.dat file, list all the male user IDs, filtering by age. This demonstrates the use of MapReduce to filter data.

Q2. Find the count of female and male users in each age group.
The age distribution is given below (same as in the README file):
* 7: "Under 18"
* 24: "18-24"
* 31: "25-34"
* 41: "35-44"
* 51: "45-55"
* 56: "55-61"
* 62: "62+"
Use the users.dat file. A sample output is given below:
// Age Gender and Count
7 M 200
24 F 120
In the first row, the age is 7, the gender is male, and the count is 200.

Q3. List all movie titles where the genre is "fantasy".
The genre input must be taken from the command line. Use the movies.dat file.

NB: To run your jobs, use the following syntax:
hadoop jar name_of_jar_file Classname <input dir> <output dir> [<extra input parameter>]
The extra input parameter may be optional depending on the question (e.g. the genre input).

Submission: You have to upload your submission via eLearning before the due date. Please upload the following to eLearning:
1. Three jar files, one for each problem, or one jar file containing all solutions.
2. Java files which have the source code.
3. ***A Readme text file about how to run your jar file. Give the command to run your jar
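
As a rough illustration of the record-level logic behind Q1, Q2, and Q3, the sketch below uses plain Java with no Hadoop dependency, so it can be run on its own. It is not the expected solution. The class name HomeworkSketch, the method names, and the sample records are all hypothetical. The field layouts are assumptions, since the README is not reproduced here: users.dat is taken to be UserID::Gender::Age::Occupation::Zip-code and movies.dat to be MovieID::Title::Genres, with genres separated by '|'. Check the README before relying on either. In an actual job, the per-line checks would probably sit inside a Mapper and the summation inside a Reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HomeworkSketch {
    // ASSUMPTION: fields are separated by "::" (verify against the README).
    private static final String SEP = "::";

    /** Q1, map side: true when the user record is male with age <= maxAge. */
    public static boolean isMaleAtMostAge(String userLine, int maxAge) {
        String[] f = userLine.split(SEP);
        // ASSUMPTION: f[1] is gender ("M"/"F"), f[2] is the age code.
        return f.length >= 3 && f[1].equals("M")
                && Integer.parseInt(f[2].trim()) <= maxAge;
    }

    /** Q2, map side: the grouping key "age gender", e.g. "7 M". */
    public static String ageGenderKey(String userLine) {
        String[] f = userLine.split(SEP);
        return f[2].trim() + " " + f[1];
    }

    /** Q2, reduce side simulated in memory: adds 1 per record for each key. */
    public static Map<String, Integer> countByAgeAndGender(List<String> userLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : userLines) {
            counts.merge(ageGenderKey(line), 1, Integer::sum);
        }
        return counts;
    }

    /** Q3, map side: titles whose genre list contains the requested genre. */
    public static List<String> titlesWithGenre(List<String> movieLines, String genre) {
        List<String> titles = new ArrayList<>();
        for (String line : movieLines) {
            String[] f = line.split(SEP);
            if (f.length < 3) {
                continue; // skip malformed records
            }
            // ASSUMPTION: genres are '|'-separated; match ignores case.
            for (String g : f[2].split("\\|")) {
                if (g.trim().equalsIgnoreCase(genre)) {
                    titles.add(f[1]);
                    break;
                }
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, invented for illustration only.
        List<String> users = Arrays.asList(
                "1::M::7::10::00000", "2::F::24::16::00000", "3::M::7::1::00000");
        List<String> movies = Arrays.asList(
                "1::Movie A (1990)::Adventure|Fantasy", "2::Movie B (1991)::Comedy");

        // The genre comes from the command line, as Q3 requires.
        String genre = args.length > 0 ? args[0] : "fantasy";

        for (String u : users) {
            if (isMaleAtMostAge(u, 7)) {
                System.out.println(u.split(SEP)[0]); // Q1: print the user ID
            }
        }
        countByAgeAndGender(users).forEach((k, v) -> System.out.println(k + " " + v));
        titlesWithGenre(movies, genre).forEach(System.out::println);
    }
}
```

Keeping the parsing and filtering in small static methods makes them easy to check on a handful of lines before wiring them into Mapper and Reducer classes and running on the cluster.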
