DOC PREVIEW
UT Arlington CSE 3302 - MapReduce Lecture Notes

This preview shows page 1-2-3-4-25-26-27-51-52-53-54 out of 54 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CSE 3302Lecture 24: MapReduceApril 27, 2010Nate NystromUTATuesday, April 27, 2010• its = the third person, neuter, possessive pronoun• it’s = a contraction of “it is”Tuesday, April 27, 2010Task parallelism• Intelligent task design eliminates as many synchronization points as possible, but some will be inevitable• Independent tasks can operate on different physical machines in distributed fashion• Good task design requires identifying common data and functionality to move as a unit Tuesday, April 27, 2010Tuesday, April 27, 2010Tuesday, April 27, 2010• 378,000 results in 0.17 seconds• including images and video• communicates with 1000s of machines• web server• index servers• document servers• spell checker• ad system• news, book, images, maps, ...Tuesday, April 27, 2010How?• Indexing• Google has a copy of the web on its servers• Google computes an index mapping search terms to URLs• blue -> { www.foo.com, www.bar.com, ... }• heeler -> { www.bar.com, ... }Tuesday, April 27, 2010How big is the web?Tuesday, April 27, 2010How big is the web?• Many billions of documents• One trillion URLs (2008)• Google has a copy of the web on its serversTuesday, April 27, 2010How many servers?Tuesday, April 27, 2010How many servers?• Google has ~450,000 servers (2007)• 20 megawatts (~12500 homes)• $2M / month for electricity• 20–100 petaflopsTuesday, April 27, 2010Tuesday, April 27, 2010Indexing the web• Want to index billions of documents stored on thousands of machines• ... in a reasonable timeTuesday, April 27, 2010Problem• Large amounts of raw data –– petabytes• web documents• logs• indices• the web graph• web crawl metadata• queries• More data than can fit on single machineTuesday, April 27, 2010SolutionTuesday, April 27, 2010Solution• Functional programming!Tuesday, April 27, 2010MapReduce• Framework for distributed computing on large data sets• Introduced by Google• Jeffrey Dean and Sanjay Ghemawat, OSDI 2004• Many other implementations• for: C++, Python, Erlang, Scala, Java, ...• Based on functional programming•Users provide two functions: map and reduceTuesday, April 27, 2010• 2008:• 100000 MapReduce jobs per day• Average of 400 machines per job• Average of 5–10 minutes per jobTuesday, April 27, 2010MapReduce• Goal: reduce complexity of distributed computation•Programmers just write map and reduce functions• Runtime system takes care of:• parallelizing the computation• data partitioning• scheduling• failures• communication• executing on a clusterTuesday, April 27, 2010Programming model• Computation takes a set of key–value pairs•Outputs a set of key–value pairs• Map:• takes a key–value pair, generates intermediate pairs• Reduce:• merges intermediate values associated with same keyTuesday, April 27, 2010Map• map: (Kin, Vin) => Seq[(Kout, Vtmp)]• Takes a key–value pair of one type• Returns a list of intermediate key–value pairs of another type• Applied in parallel to every item in the input set• produces a list for each callTuesday, April 27, 2010Reduce• reduce : (Kout, Seq[Vtmp]) => Seq[Vout]• Applied in parallel to each group• Typically returns an option type (some value or none)Tuesday, April 27, 2010Counting wordsdef map(name: String, doc: String): Seq[(String,Int)] = { doc.split(“ “).map(w => (w, 1)) }def reduce(word: String, counts: Seq[Int]): (String,Int) = { val count = counts.foldLeft(0)(_ + _) (word, count)}Tuesday, April 27, 2010grav.txt,“A screaming comesacross the sky.”1984.txt,“It was a bright cold dayin April, and the clockswere striking thirteen.”neuro.txt,“The sky above the portwas the color oftelevision, tuned to adead channel.”“a”, 1“screaming”, 1“comes”, 1“across”, 1“the”, 1“sky”, 1“it”, 1“was”, 1“a”, 1“bright”, 1“cold”, 1“day”, 1“in”, 1“april”, 1“and”, 1“the”, 1“clocks”, 1“were”, 1“striking”, 1“thirteen”, 1“the”, 1“sky”, 1“above”, 1“the”, 1“port”, 1“was”, 1“the”, 1“color”, 1“of”, 1“television”, 1“tuned”, 1“to”, 1“a”, 1“dead”, 1“channel”, 1mapTuesday, April 27, 2010“a”, 1“screaming”, 1“comes”, 1“across”, 1“the”, 1“sky”, 1“it”, 1“was”, 1“a”, 1“bright”, 1“cold”, 1“day”, 1“in”, 1“April”, 1“and”, 1“the”, 1“clocks”, 1“were”, 1“striking”, 1“thirteen”, 1“a”, [1,1,1]“above”, [1]“across”, [1]“and”, [1]“april”, [1]“bright”, [1]“channel”, [1]“clocks”, [1]“cold”, [1]“color”, [1]“comes”, [1]“day”, [1]“dead”, [1]“in”, [1]“it”, [1]“of”, [1]“port”, [1]“screaming”, [1]“sky”, [1,1]“striking”, [1]“television”, [1]“the”, [1,1,1,1,1]“thirteen”, [1]“to”, [1]“tuned”, [1]“was”, [1,1]“were”, [1]“the”, 1“sky”, 1“above”, 1“the”, 1“port”, 1“was”, 1“the”, 1“color”, 1“of”, 1“television”, 1“tuned”, 1“to”, 1“a”, 1“dead”, 1“channel”, 1partitionTuesday, April 27, 2010reduce“a”, [1,1,1]“above”, [1]“across”, [1]“and”, [1]“april”, [1]“bright”, [1]“channel”, [1]“clocks”, [1]“cold”, [1]“color”, [1]“comes”, [1]“day”, [1]“dead”, [1]“in”, [1]“it”, [1]“of”, [1]“port”, [1]“screaming”, [1]“sky”, [1,1]“striking”, [1]“television”, [1]“the”, [1,1,1,1,1]“thirteen”, [1]“to”, [1]“tuned”, [1]“was”, [1,1]“were”, [1]“a”, 3“above”, 1“across”, 1“and”, 1“april”, 1“bright”, 1“channel”, 1“clocks”, 1“cold”, 1“color”, 1“comes”, 1“day”, 1“dead”, 1“in”, 1“it”, 1“of”, 1“port”, 1“screaming”, 1“sky”, 2“striking”, 1“television”, 1“the”, 5“thirteen”, 1“to”, 1“tuned”, 1“was”, 2“were”, 1Tuesday, April 27, 2010Parallelism•If each map operation is independent, all mappers can be done in parallel•reduce requires all outputs of map with same key presented to same reducer at same timeTuesday, April 27, 2010Examples• grep• map emits a line if it matches a pattern• reduce copies the inputs to outputTuesday, April 27, 2010Examples• count url access frequency• map outputs (url,1) for each url in log• reduce adds counts and emits (url,


View Full Document

UT Arlington CSE 3302 - MapReduce Lecture Notes

Documents in this Course
Smalltalk

Smalltalk

11 pages

Syntax

Syntax

5 pages

Syntax

Syntax

5 pages

JAVA

JAVA

57 pages

Semantics

Semantics

41 pages

Control

Control

74 pages

Load more
Download MapReduce Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view MapReduce Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view MapReduce Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?