DOC PREVIEW
U of I CS 525 - Cloud Programming

This preview shows page 1-2-3-4-31-32-33-34-35-64-65-66-67 out of 67 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Hyun Duk Kim and Chia-Chi LinJanuary 16, 20101 Introduction Pig Latin DryadLINQ Comparison between Pig Latin and DryadLINQ Wave computing Related work Discussion2 Huge Amount of data analysisEspecially web service companies Need of parallel/distributed system Parallel DB Expensive at web scale, Limited SQL3 Map/Reduce◦ More procedural programming model. Popular cloud computing environment Emergence of parallel computing tools◦ Ease of programming User can just submit tasks in the specific form,then tools execute them in distributed manner. Ex. Hadoop, Dryad, …4- Too low-level, Rigid- Hard to maintain, Hard to reuse code- Re-implement common queries- Poor debugging environment- Redundant computing- Load imbalance- Success rate vs. Window sizePower of programming Optimization across jobsPig Latin, DryadLINQ Wave Computing5Christopher Olston, Benjamin Reed, Utkarsh Srivastava,Ravi Kumar, and Andrew Tomkins6DeclarativeSQL Low-level, ProceduralMap/reducePig Latin7 Find the users who tend to visit high-pagerank pagesSQLSELECT user FROM visits, user WHERE avgpr > 0.6IN ( SELECT user, AVG(pagerank) … one nested SQL queryPig LatinV_p = JOIN visits BY url, pages BY url;Users = GROUP v_p BY user;Useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;Answer = FILTER useravg BY avgpr > „0.5‟; … sequence of commandsJava Map/Reducepublic static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {… more than 100 lines8 Execution engine on atop Hadoop Open source project  Mainly developing/using in YahooHadoopPigM/R codePig Latin code9 Find the users who tend to visit high-pagerankpagesURL Category PageRankcnn.com News 0.9bbc.com News 0.8flickr.com Photos 0.7espn.com Sports 0.9Visits URL InfoUserURLTimeAmycnn.com8:00Amybbc.com10:00Amyflickr.com10:05Fredcnn.com12:0010visits = LOAD „visits.txt‟ AS (user, url, time);pages = LOAD „pages.txt‟ AS (url, pagerank);v_p = JOIN visits BY url, pages BY url;users = GROUP v_p BY user;useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;answer = FILTER useravg BY avgpr > „0.5‟; 11visits = LOAD „visits.txt‟ AS (user, url, time);pages = LOAD „pages.txt‟ AS (url, pagerank);V_p = JOIN visits BY url, pages BY url;Users = GROUP v_p BY user;Useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;Answer = FILTER useravg BY avgpr > „0.5‟; visits: (Amy, cnn.com, 8am)(Amy, frogs.com, 9am)(Fred, snails.com, 11am)pages: (cnn.com, 0.8)(frogs.com, 0.8)(snails.com, 0.3)12visits = LOAD „visits.txt‟ AS (user, url, time);pages = LOAD „pages.txt‟ AS (url, pagerank);v_p = JOIN visits BY url, pages BY url;Users = GROUP v_p BY user;Useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;Answer = FILTER useravg BY avgpr > „0.5‟; visits: (Amy, cnn.com, 8am)(Amy, frogs.com, 9am)(Fred, snails.com, 11am)pages: (cnn.com, 0.8)(frogs.com, 0.8)(snails.com, 0.3)v_p: (Amy, cnn.com, 8am, cnn.com, 0.8)(Amy, frogs.com, 9am, frogs.com, 0.8)(Fred, snails.com, 11am, snails.com, 0.3)13visits = LOAD „visits.txt‟ AS (user, url, time);pages = LOAD „pages.txt‟ AS (url, pagerank);v_p = JOIN visits BY url, pages BY url;users = GROUP v_p BY user;Useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;Answer = FILTER useravg BY avgpr > „0.5‟; v_p: (Amy, cnn.com, 8am, cnn.com, 0.8)(Amy, frogs.com, 9am, frogs.com, 0.8)(Fred, snails.com, 11am, snails.com, 0.3)users: (Amy, { (Amy, cnn.com, 8am, cnn.com, 0.8)(Amy, frogs.com, 9am, frogs.com, 0.8) } )(Fred, { (Fred, snails.com, 11am, snails.com, 0.3) } )14visits = LOAD „visits.txt‟ AS (user, url, time);pages = LOAD „pages.txt‟ AS (url, pagerank);v_p = JOIN visits BY url, pages BY url;users = GROUP v_p BY user;Useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;Answer = FILTER useravg BY avgpr > „0.5‟; v_p: (Amy, cnn.com, 8am, cnn.com, 0.8)(Amy, frogs.com, 9am, frogs.com, 0.8)(Fred, snails.com, 11am, snails.com, 0.3)users: (Amy, { (Amy, cnn.com, 8am, cnn.com, 0.8)(Amy, frogs.com, 9am, frogs.com, 0.8) } )(Fred, { (Fred, snails.com, 11am, snails.com, 0.3) } )Nested data model15visits = LOAD „visits.txt‟ AS (user, url, time);pages = LOAD „pages.txt‟ AS (url, pagerank);v_p = JOIN visits BY url, pages BY url;users = GROUP v_p BY user;useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;Answer = FILTER useravg BY avgpr > „0.5‟; users: (Amy, { (Amy, cnn.com, 8am, cnn.com, 0.8)(Amy, frogs.com, 9am, frogs.com, 0.8) } )(Fred, { (Fred, snails.com, 11am, snails.com, 0.3) } )useravg: (Amy, 0.8)(Fred, 0.3)16visits = LOAD „visits.txt‟ AS (user, url, time);pages = LOAD „pages.txt‟ AS (url, pagerank);v_p = JOIN visits BY url, pages BY url;users = GROUP v_p BY user;useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;Answer = FILTER useravg BY avgpr > „0.5‟; users: (Amy, { (Amy, cnn.com, 8am, cnn.com, 0.8)(Amy, frogs.com, 9am, frogs.com, 0.8) } )(Fred, { (Fred, snails.com, 11am, snails.com, 0.3) } )useravg: (Amy, 0.8)(Fred, 0.3)Can use any UDFs17visits = LOAD „visits.txt‟ AS (user, url, time);pages = LOAD „pages.txt‟ AS (url, pagerank);v_p = JOIN visits BY url, pages BY url;users = GROUP v_p BY user;useravg = FOREACH users GENERATE group, AVG(v_p.pagerank) AS avgpr;answer = FILTER useravg BY avgpr > „0.5‟; useravg: (Amy, 0.8)(Fred, 0.3)answer: (Amy, 0.8)18Load visitsLoad pagesJoin by urlGroup by userForeach categorygenerate avg…19Load visitsLoad pagesJoin by urlGroup by userForeach categorygenerate avgMap1Reduce1Reduce2Map2Every group or join operation forms a map-reduce boundaryOther operations pipelined into map and reduce phases…20 Atom„alice‟ Tuple(„alice‟ , „lakers‟) Bag(„alice‟, „lakers‟)(„alice‟, („iPod‟, „apple‟)) Map[ „age‟  20 ] Nested Data Model(Amy, { (Amy, cnn.com, 8am, cnn.com, 0.8) (Amy, frogs.com, 9am, frogs.com, 0.8) } )21 Specifying Input Data: LOAD Per-tuple Processing: FOREACH Discarding Unwanted Data: FILTER Getting Related Data Together: COGROUP Other Commends◦ UNION, CROSS, ORDER, DISTINCT Asking for Output: STOREVery Similar to SQL commands22 Pig Pen23 “Safe” optimizer Performs only


View Full Document

U of I CS 525 - Cloud Programming

Documents in this Course
Epidemics

Epidemics

12 pages

LECTURE

LECTURE

7 pages

LECTURE

LECTURE

39 pages

LECTURE

LECTURE

41 pages

P2P Apps

P2P Apps

49 pages

Lecture

Lecture

48 pages

Epidemics

Epidemics

69 pages

GRIFFIN

GRIFFIN

25 pages

Load more
Download Cloud Programming
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Cloud Programming and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Cloud Programming 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?