UT Dallas CS 6350 - HW4#2015

CS6350: BIG DATA ANALYTICS and MANAGEMENT
Spring 2015
HW #4
Related to: Spark, Data Analytics and Recommendation System
Due: April 22, 2015

This homework consists of two parts. The first part focuses on k-means clustering (data analytics) and the second focuses on recommendation systems.

Q1. Implement the k-means algorithm from scratch using Scala and Spark. Use the attached dataset in the file Q1_testkmean.txt as input. The number of clusters K should be 3. Your Scala code should produce the following output:
- Print each point and the cluster it belongs to.
- Print the final centroids.

Q2. Read the following page on co-occurrence-based recommendation in Mahout: https://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

Mahout is currently switching from MapReduce to Apache Spark. It has an interactive shell (it will be shown in class; the lecture covers how to install it). Using that shell, apply item-based collaborative filtering with Mahout's spark-itemsimilarity. spark-itemsimilarity can be used to create "other people also liked these things" type recommendations.

You can find the dataset in eLearning. Copy the data into your Hadoop cluster and use it as input data. You can use the put or copyFromLocal HDFS shell command to copy the files into your HDFS directory. There are 3 data files: movies.dat, ratings.dat, users.dat. Please read the "README_Important" file to learn how the data is organized and what each attribute means; everything is explained in that file.

"A user rates some movies with rating 3. Our task is to recommend some movies to him that have similar ratings from other users."

Steps to follow:

Read the above link carefully and construct the item-similarity matrix of each movie having rating 3 (use ratings.dat).
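Before the item-similarity matrix can be built from rating-3 entries only, ratings.dat has to be reduced to those entries. The sketch below shows one possible way to do that on plain Scala collections. It rests on assumptions that the handout does not confirm: that each line of ratings.dat looks like UserID::MovieID::Rating::Timestamp (the real layout is described in README_Important), and that a simple "userID,movieID" pair per line is an acceptable input for the similarity job. The names RatingFilter, parse, and userItemPairs are hypothetical.

```scala
import scala.util.Try

object RatingFilter {
  // ASSUMPTION: one rating per line, laid out as UserID::MovieID::Rating::Timestamp.
  final case class Rating(userId: Int, movieId: Int, rating: Int)

  // Parse one line; returns None for lines that do not fit the assumed layout.
  def parse(line: String): Option[Rating] =
    line.trim.split("::") match {
      case Array(u, m, r, _*) => Try(Rating(u.toInt, m.toInt, r.toInt)).toOption
      case _                  => None
    }

  // Keep only ratings equal to `wanted` and emit one "userId,movieId" line each.
  def userItemPairs(lines: Seq[String], wanted: Int = 3): Seq[String] =
    lines.flatMap(parse)
      .filter(_.rating == wanted)
      .map(r => s"${r.userId},${r.movieId}")
}
```

For example, RatingFilter.userItemPairs(Seq("20::120::3::1", "20::500::5::2", "21::855::3::3")) yields the two lines "20,120" and "21,855". In the Spark shell, the same parse, filter, and map steps could be applied to the lines read with sc.textFile instead of to an in-memory Seq.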
The output should be like this:

In the above matrix, the first integer is the movie id (the movie for which we recommend); the rest of the line lists the recommended movie ids with their values (movieid:value).

1. Save the above file to HDFS. Now run the Apache Spark interactive shell. From the shell, take the user id as input (you can fix the id, e.g., val userID = 20). Now find all the movies that he rates with rating 3.
2. Load/read the above file (the item-similarity file) and find the rows whose key (the first movie id) matches one of the user's rated movies.
   For example, suppose a user has id 20 and he rates movies 120 and 855 as 3. Write the code to extract the movie ids from the item-similarity matrix file that are present in the rows for movies 120 and 855, and generate a matrix like the following:
   120 898,951,910,905,1269
   855 3265,1218,1089,3224,247
3. Now replace each movie id with movieid:movie_name. For example:
   120:<Movie_Name> 898:<Movie_Name>,951:<Movie_Name>,910:<Movie_Name>,905:<Movie_Name>,1269:<Movie_Name>
   855:<Movie_Name> 3265:<Movie_Name>,1218:<Movie_Name>,1089:<Movie_Name>,3224:<Movie_Name>,247:<Movie_Name>
   You can apply a join if necessary. (Use movies.dat and ratings.dat.)
   Note: In 120:<Movie_Name>, <Movie_Name> should be replaced with the name of the movie with id 120. Display it without angle brackets.

Submission:
You have to upload your submission via eLearning before the due date. Please upload the following to eLearning:
1. A scripting file, e.g., Q2_1.txt, that shows the building of spark-itemsimilarity, and another scripting file, Q2_2.txt, that shows the Scala/Java program (containing the code for steps 1 - 3). If you use java/scala, then submit all source
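Steps 2 and 3 above (select the similarity rows for the user's rated movies, then attach movie names) can be sketched as follows on plain Scala collections. This is an illustration under stated assumptions, not the expected solution: each row of the item-similarity file is assumed to be a movie id, a tab, and then space-separated id:value pairs, and movies.dat is assumed to be laid out as MovieID::Title::Genres (check README_Important for the real layout). The names SimilarMovies, parseSimilarityRow, parseMovie, and recommend are hypothetical.

```scala
import scala.util.Try

object SimilarMovies {
  // ASSUMPTION: a similarity row looks like "<movieId>\t<id>:<value> <id>:<value> ...".
  def parseSimilarityRow(line: String): Option[(Int, Seq[Int])] =
    line.split("\t", 2) match {
      case Array(key, rest) =>
        Try {
          // Drop the ":value" part and keep only the recommended movie ids.
          val ids = rest.trim.split("\\s+").toSeq
            .filter(_.nonEmpty)
            .map(_.takeWhile(_ != ':').toInt)
          key.trim.toInt -> ids
        }.toOption
      case _ => None
    }

  // ASSUMPTION: movies.dat lines look like MovieID::Title::Genres.
  def parseMovie(line: String): Option[(Int, String)] =
    line.split("::") match {
      case Array(id, title, _*) => Try(id.trim.toInt -> title).toOption
      case _                    => None
    }

  // Step 2: keep rows whose key is one of the rated movies.
  // Step 3: rewrite every id as "id:name" using the movie lookup table.
  def recommend(rated: Set[Int], simLines: Seq[String], movieLines: Seq[String]): Seq[String] = {
    val names = movieLines.flatMap(parseMovie).toMap
    def label(id: Int): String = {
      val name = names.getOrElse(id, "unknown")
      s"$id:$name"
    }
    simLines.flatMap(parseSimilarityRow)
      .filter { case (key, _) => rated.contains(key) }
      .map { case (key, ids) =>
        val recs = ids.map(label).mkString(",")
        s"${label(key)} $recs"
      }
  }
}
```

With made-up titles, SimilarMovies.recommend(Set(120), Seq("120\t898:1.5 951:1.2"), Seq("120::Title A::Drama", "898::Title B::Comedy", "951::Title C::Action")) returns the single line "120:Title A 898:Title B,951:Title C". The map lookup plays the role of the join mentioned in step 3; on Spark pair RDDs the same effect could be obtained with join.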

