DOC PREVIEW
UW CSE 444 - Query Optimization and Pig Latin

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Section 7 Today’s OverviewSlide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13SECTION 9Query Optimization and Pig LatinToday’s Overview•Reminders•Project 4 due last day of class at 5pm•Query Optimization•2 examples from previous finals (you can also find these in the relational algebra/query plan worksheet from section 7)•Pig Latin / Map Reduce•1 example from previous final•Project 4•Quiz section evaluations (last ~20 minutes of class)Warm-up•Consider the query R(A,B) join S(C,D) join T(E,F) •the join condition is B=C and D=E and:•M = 100•B(R) = 30•B(S) = 200•B(T) = 60•B(R join S) = 80•B(S join T) = 50. •Design an optimal query plan that uses only main-memory hash join algorithms. Your plan may store intermediate results to disk if necessaryAnswerLoad R & T into memory and create hash tables of them. Then read blocks of S one at a time, performing the joins in the following graph. Hashjoin B=C Hashjoin D = E R S TSummer 2009 Problem 2/3•R(a,b,c) and S(x, y, z)*note this problem is also on the worksheet form section 7Summer 2009 Problem 2/3•Write a SQL query for the plan shown on the previous slide•Optimize the query plan shown on the previous slide, and tell how much you expect the change to improve the performance•Specify good physical plan for joining R and S on R.a = S.z. Give estimated cost of solution in terms of # of disk I/OsPig Latin: Final Summer 2010raw = LOAD 's3n://uw cse444 proj4/excite.log.bz2' USING ‐ ‐PigStorage('\t')KKAS (user, time, query);a = GROUP raw BY user;b = FOREACH a GENERATE group AS user, COUNT(raw) AS n_searches;c = GROUP b ALL;d = FOREACH c GENERATE AVG(b.n_searches), MIN(b.n_searches), MAX(b.n_searches);STORE d INTO '/user/hadoop/answer.txt' USING PigStorage();Questions (a)•What does this program compute and store into the final output file?KKDescribe what the result is, not the details of how it is computed.KK(Hint: “GROUP b ALL” sends all tuples of bag/relation b to a single group.)Answer (a)•The output contains a single row:•avgKKKminKKKmax•where•avg = mean number of searches by one user•min = minimum number of searches by one user•max = maximum number of searches by one user•The actual output when this was run against the excite.log.bz2 data was:•4.705126673040153 1 452Questions (b)•In order to run this program, the Pig system translates it into a sequence of one or more Hadoop Map Reduce ‐jobs.KK•Describe the Map Reduce job(s) needed to execute this ‐program.KKFor each job describe the input and output of each map and reduce phase, including the keys and values at each step.KK•You do not need to guess exactly how Pig would translate the program as long as your answer gives a reasonable implementation as a sequence of map reduce jobs.‐Answer (b)•This program generates two map reduce jobs.‐•Job 1:•MAP•map 1 input:KKkeys = tuple IDs, values = (user, time, query) tuples.•map 1 output:KKkey = user, value = • .KKThe actual value doesn’t matter, it could be the integer 1, a single character, or any other marker to record one search by that user.REDUCE•reduce 1 input:KKkey = user, value = [•], i.e., array of search markers.•reduce 1 output:KKkey = user, value = length of array, i.e., search count for that user.Answer (b)•Job 2:•MAP•map 2 input:KKkey = user, value = search count for that user•map 2 output:KKkey = “x”, value = search count v.KKHere the key value doesn’t matter except that it needs to be the same for all map outputs.•REDUCE•reduce 2 input:KKkey = “x” (same as map 2 output key), value = [v], i.e., array of all individual user search counts.•reduce 2 output:KKaverage, min, and max of values in input array [v].Pig-Latin References•Massively Parallel Data Analysis with MapReduce•http://www.systems.ethz.ch/education/past-courses/hs08/map-reduce/slides/pig.pdf•Intro to


View Full Document

UW CSE 444 - Query Optimization and Pig Latin

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Query Optimization and Pig Latin
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Query Optimization and Pig Latin and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Query Optimization and Pig Latin 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?