DOC PREVIEW
UT Arlington CSE 3302 - Lecture Notes

This preview shows page 1-2-3-4-24-25-26-50-51-52-53 out of 53 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CSE 3302Lecture 11: Map/Reduce7 October 2010Nate NystromUTA• 378,000 results in 0.17 seconds• including images and video• communicates with 1000s of machines• web server• index servers• document servers• spell checker• ad system• news, book, images, maps, ...How?• Indexing• Google has a copy of the web on its servers• Google computes an index mapping search terms to URLs• blue -> { www.foo.com, www.bar.com, ... }• heeler -> { www.bar.com, ... }How big is the web?How big is the web?• Many billions of documents• One trillion URLs (2008)• Google has a copy of the web on its serversHow many servers?How many servers?• Google has ~450,000 servers (2007)• 20 megawatts (~12500 homes)• $2M / month for electricity• 20–100 petaflopsIndexing the web• Want to index billions of documents stored on thousands of machines• ... in a reasonable timeProblem• Large amounts of raw data –– petabytes• web documents• logs• indices• the web graph• web crawl metadata• queries• More data than can fit on single machineSolutionSolution• Functional programming!Functional programming•Programming with pure functions• output of the function depends only on the input•no side effects (I/O, assign to shared variables)•functions are values• can pass a function as argument to another function• can return a function from another function•Functional languages promote functional programming•But, can do functional programming in many languagesMapReduce• Framework for distributed computing on large data sets• Introduced by Google• Jeffrey Dean and Sanjay Ghemawat, OSDI 2004• Many other implementations• for: C++, Python, Erlang, Scala, Java, ...• Based on functional programming•Users provide two functions: map and reduce• 2008:• 100000 MapReduce jobs per day• Average of 400 machines per job• Average of 5–10 minutes per jobMapReduce• Goal: reduce complexity of distributed computation•Programmers just write map and reduce functions• Runtime system takes care of:• parallelizing the computation• data partitioning• scheduling• failures• communication• executing on a clusterProgramming model• Computation takes a set of key–value pairs•Outputs a set of key–value pairs• Map:• takes a key–value pair, generates intermediate pairs• Reduce:• merges intermediate values associated with same keyMap• map: (Kin, Vin) => Seq[(Kout, Vtmp)]• Takes a key–value pair of one type• Returns a list of intermediate key–value pairs of another type• Applied in parallel to every item in the input set• produces a list for each callReduce• reduce : (Kout, Seq[Vtmp]) => Seq[Vout]• Applied in parallel to each group• Typically returns an option type (some value or none)Counting wordsdef map(name: String, doc: String): Seq[(String,Int)] = { doc.split(“ “).map(w => (w, 1)) }def reduce(word: String, counts: Seq[Int]): (String,Int) = { val count = counts.foldLeft(0)(_ + _) (word, count)}grav.txt,“A screaming comesacross the sky.”1984.txt,“It was a bright cold dayin April, and the clockswere striking thirteen.”neuro.txt,“The sky above the portwas the color oftelevision, tuned to adead channel.”“a”, 1“screaming”, 1“comes”, 1“across”, 1“the”, 1“sky”, 1“it”, 1“was”, 1“a”, 1“bright”, 1“cold”, 1“day”, 1“in”, 1“april”, 1“and”, 1“the”, 1“clocks”, 1“were”, 1“striking”, 1“thirteen”, 1“the”, 1“sky”, 1“above”, 1“the”, 1“port”, 1“was”, 1“the”, 1“color”, 1“of”, 1“television”, 1“tuned”, 1“to”, 1“a”, 1“dead”, 1“channel”, 1map“a”, 1“screaming”, 1“comes”, 1“across”, 1“the”, 1“sky”, 1“it”, 1“was”, 1“a”, 1“bright”, 1“cold”, 1“day”, 1“in”, 1“April”, 1“and”, 1“the”, 1“clocks”, 1“were”, 1“striking”, 1“thirteen”, 1“a”, [1,1,1]“above”, [1]“across”, [1]“and”, [1]“april”, [1]“bright”, [1]“channel”, [1]“clocks”, [1]“cold”, [1]“color”, [1]“comes”, [1]“day”, [1]“dead”, [1]“in”, [1]“it”, [1]“of”, [1]“port”, [1]“screaming”, [1]“sky”, [1,1]“striking”, [1]“television”, [1]“the”, [1,1,1,1,1]“thirteen”, [1]“to”, [1]“tuned”, [1]“was”, [1,1]“were”, [1]“the”, 1“sky”, 1“above”, 1“the”, 1“port”, 1“was”, 1“the”, 1“color”, 1“of”, 1“television”, 1“tuned”, 1“to”, 1“a”, 1“dead”, 1“channel”, 1partitionreduce“a”, [1,1,1]“above”, [1]“across”, [1]“and”, [1]“april”, [1]“bright”, [1]“channel”, [1]“clocks”, [1]“cold”, [1]“color”, [1]“comes”, [1]“day”, [1]“dead”, [1]“in”, [1]“it”, [1]“of”, [1]“port”, [1]“screaming”, [1]“sky”, [1,1]“striking”, [1]“television”, [1]“the”, [1,1,1,1,1]“thirteen”, [1]“to”, [1]“tuned”, [1]“was”, [1,1]“were”, [1]“a”, 3“above”, 1“across”, 1“and”, 1“april”, 1“bright”, 1“channel”, 1“clocks”, 1“cold”, 1“color”, 1“comes”, 1“day”, 1“dead”, 1“in”, 1“it”, 1“of”, 1“port”, 1“screaming”, 1“sky”, 2“striking”, 1“television”, 1“the”, 5“thirteen”, 1“to”, 1“tuned”, 1“was”, 2“were”, 1Parallelism•If each map operation is independent, all mappers can be done in parallel•reduce requires all outputs of map with same key presented to same reducer at same timeExamples• grep• map emits a line if it matches a pattern• reduce copies the inputs to outputExamples• count url access frequency• map outputs (url,1) for each url in log• reduce adds counts and emits (url, sum)Examples• reverse web-link graph• map outputs (target,source) pairs for each link to target in url named source• reduce cats the list of sources and returns(target, list(source))Examples• term-vector per host• map emits (hostname, term-vector)• reduce adds term vectors, discarding infrequent terms• inverted index• map parses each doc and emits (word, url)• reduce sorts url list: (word, list(url))• sort• map extracts key from each record, emits (key, record)• reduce emits pairs unchangedImplementation•map distributed across multiple machines by partitioning input data into a set of M splits• input splits


View Full Document

UT Arlington CSE 3302 - Lecture Notes

Documents in this Course
Smalltalk

Smalltalk

11 pages

Syntax

Syntax

5 pages

Syntax

Syntax

5 pages

JAVA

JAVA

57 pages

Semantics

Semantics

41 pages

Control

Control

74 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?