UT Dallas CS 6350 - sparktutorial

BIG DATA MANAGEMENT AND ANALYTICS (cs6350)
SPARK TUTORIAL

SPARK INSTALLATION

It works for both Linux and Windows operating systems.

Go to https://spark.apache.org/downloads.html
Choose a package type: Pre-built for Hadoop 2.4 or later.
Download the Spark file.
Extract the file and change directory to bin.
Run spark-shell.

Simple Scala Spark

    val a = Array(1, 2, 3, 2, 3, 4, 5, 2, 1, 2, 3, 4, 3, 4, 5)
    val r = sc.parallelize(a)
    val newr = r.map(x => (x, 1))
    newr.reduceByKey(_ + _).collect()

The output will be:

    res6: Array[(Int, Int)] = Array((4,3), (1,2), (5,2), (2,4), (3,4))

Word count program

    val in = sc.textFile("beeline")
    // " " is assumed as the word separator
    in.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collectAsMap()

Filter commands (filter by zipcode)

    val lines = sc.textFile("users.dat")
    val ln = readLine()   // API to take input from the command line
    // "::" is assumed as the field separator (MovieLens .dat format)
    val linesZipcode = lines.filter(line => line.contains(ln)).map(line => line.split("::")).map(line => line(0)).collect()

Finding average

Find the top 10 movies by average rating, in descending order of rating.

    val lines = sc.textFile("ratings.dat")
    // "::" is assumed as the field separator (MovieLens .dat format)
    val sumratings = lines.map(line => line.split("::")).map(line => (line(1), line(2).toDouble)).reduceByKey(_ + _)
    val counts = lines.map(line => line.split("::")).map(line => (line(1), 1)).reduceByKey(_ + _)

Defining functions in Scala

    def addInt(a: Int, b: Int): Int = {
      var sum: Int = 0
      sum = a + b
      return sum
    }

Applying functions

    val a = Array(1, 2, 3, 2, 3, 4, 5, 2, 1, 2, 3, 4, 3, 4, 5)
    val r = sc.parallelize(a)
    r.map(x => addInt(x, x)).collect()

Stand-alone Scala programs

Create a folder structure as shown below:

    ./simple.sbt
    ./src
    ./src/main
    ./src/main/scala
    ./src/main/scala/SimpleApp.scala

In SimpleApp.scala, write your code:

    package org.apache.spark.examples.streaming
    /* SimpleApp.scala */
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    import java.util.Properties

    object SimpleApp {
      def main(args: Array[String]): Unit = {
      }
    }

In simple.sbt, add the meta information, such as the main class:

    name := "Simple Project"

    version := "1.0"

    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

    mainClass in (Compile, run) := Some("org.apache.spark.examples.streaming.Si...

Run the sbt commands to package and run your code:

    sbt/bin/sbt run
    sbt/bin/sbt package
    sbt/bin/sbt clean
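
The "Finding average" steps stop after building the per-movie rating sums and counts. As a rough sketch of the remaining step, the plain-Scala snippet below divides each sum by its count, sorts in descending order of average, and keeps the first 10. The sample lines, the "::" separator, and the helper name topAverages are assumptions made for illustration and are not part of the tutorial. On RDDs, the same shape would likely be a join of the sum and count pair RDDs followed by mapValues, sortBy, and take(10).

```scala
// Hypothetical sample, assuming the UserID::MovieID::Rating::Timestamp layout.
val sample = Seq(
  "1::10::5::978300760",
  "2::10::3::978302109",
  "1::20::4::978301968",
  "3::20::3::978300275",
  "2::30::2::978824291"
)

// Average rating per movie, highest first, keeping at most n entries.
// topAverages is a hypothetical helper name.
def topAverages(lines: Seq[String], n: Int): Seq[(String, Double)] = {
  // (movieId, rating) pairs, mirroring the map step used for sumratings
  val pairs = lines.map(_.split("::")).map(f => (f(1), f(2).toDouble))
  pairs
    .groupBy(_._1)                                  // movieId -> its (movieId, rating) pairs
    .map { case (movie, rs) =>
      (movie, rs.map(_._2).sum / rs.size)           // sum of ratings / number of ratings
    }
    .toSeq
    .sortBy { case (_, avg) => -avg }               // descending by average rating
    .take(n)
}

val top = topAverages(sample, 10)
top.foreach(println)
```

The snippet runs as-is in the Scala REPL or in spark-shell, since it uses only standard collections. Grouping in memory is fine for a small sample; for a full ratings file, the distributed reduceByKey approach from the tutorial is the better fit.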