Berkeley COMPSCI 61A - Lecture 36

61A Lecture 36
Wednesday, November 30
Tuesday, November 29, 2011

Project 4 Contest Gallery

Prizes will be awarded for the winning entry in each of the following categories.
• Featherweight. At most 128 words of Logo, not including comments and delimiters.
• Heavyweight. At most 1024 words of Logo, not including comments and delimiters.

Winners will be selected by popular vote! (Homework 13)
• Static images of the output of your programs
• Tonight at midnight: I'll post your Logo implementations! Run them to see these images evolve!
• I will also post a solution to the Logo project
  It runs (almost) all of the contest entries
  You can use it as a study guide for the final

(Demo)

MapReduce

Bonus Material

MapReduce is a framework for batch processing of Big Data

What does that mean?
• Framework: A system used by programmers to build applications
• Batch processing: All the data is available at the outset and results aren't consumed until processing completes
• Big Data: A buzzword used to describe datasets so large that they reveal facts about the world via statistical analysis

(Demo)

The big ideas that underlie MapReduce:
• Datasets are too big to be stored or analyzed on one machine
• When using multiple machines, systems issues abound
• Pure functions enable an abstraction barrier between data processing logic and distributed system administration

Systems

Systems research enables the development of applications by defining and implementing abstractions:
• Operating systems provide a stable, consistent interface to unreliable, inconsistent hardware
• Networks provide a simple, robust data transfer interface to constantly evolving communications infrastructure
• Databases provide a declarative interface to software that stores and retrieves information efficiently
• Distributed systems provide a single-entity-level interface to a cluster of multiple machines

A unifying property of effective systems:
Hide complexity, but retain flexibility

The Unix Operating System

Essential features of the Unix operating system (and variants)
• Portability: The same operating system on different hardware
• Multi-Tasking: Many processes run concurrently on a machine
• Plain Text: Data is stored and shared in text format
• Modularity: Small tools are composed flexibly via pipes

[Diagram: a process that receives text input on standard input and sends text output to standard output and standard error]

The standard streams in a Unix-like operating system are conceptually similar to Python iterators

(Demo)

Python Programs in a Unix Environment

The built-in input function reads a line from standard input.
The built-in print function writes a line to standard output.

(Demo)

The values sys.stdin and sys.stdout also provide access to the Unix standard streams as "files."
A Python "file" is an interface that supports iteration, read, and write methods.
Using these "files" takes advantage of the operating system standard stream abstraction.

(Demo)

MapReduce Evaluation Model

Map phase: Apply a mapper function to inputs, emitting a set of intermediate key-value pairs
• The mapper takes an iterator over inputs, such as text lines.
• The mapper yields 0 or more key-value pairs per input.

Reduce phase: For each intermediate key, apply a reducer function to accumulate all values associated with that key
• The reducer takes an iterator over key-value pairs.
• All pairs with a given key are consecutive
• The reducer yields 0 or more values for a key, each associated with that intermediate key.

Example: the mapper emits vowel counts for each input line

    Google MapReduce          ->  o: 2, a: 1, u: 1, e: 3
    Is a Big Data framework   ->  i: 1, a: 4, e: 1, o: 1
    For batch processing      ->  a: 1, o: 2, e: 1, i: 1

The intermediate pairs are grouped by key, and each reducer accumulates the values for one key

    a: 4, a: 1, a: 1   ->  reducer  ->  a: 6
    e: 1, e: 3, e: 1   ->  reducer  ->  e: 5
    ...                             ->  i: 2, o: 5, u: 1

Above-the-Line: Execution model

http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0007.html

Below-the-Line: Parallel Execution

http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0008.html

A "task" is a Unix process running on a machine

[Diagram labels: Map phase, Shuffle, Reduce phase]

Python Examples of a MapReduce Application

The mapper and reducer are both self-contained Python programs
• Read from standard input and write to standard output!

Mapper

    #!/usr/bin/env python3
    # The line above tells Unix: this is Python

    import sys
    from ucb import main
    from mr import emit

    def emit_vowels(line):
        for vowel in 'aeiou':
            count = line.count(vowel)
            if count > 0:
                # emit outputs a key and value as a line of text to standard output
                emit(vowel, count)

    @main
    def run():
        # Mapper inputs are lines of text provided to standard input
        for line in sys.stdin:
            emit_vowels(line)

Reducer

    #!/usr/bin/env python3

    import sys
    from ucb import main
    from mr import emit, values_by_key

    @main
    def run():
        for key, value_iterator in values_by_key(sys.stdin):
            emit(key, sum(value_iterator))

The values_by_key function takes and returns iterators
• Input: lines of text representing key-value pairs, grouped by key
• Output: Iterator over (key, value_iterator) pairs that give all values for each key

What Does the MapReduce Framework Provide

Fault tolerance: A machine or hard drive might crash
• The MapReduce framework automatically re-runs failed tasks.

Speed: Some machine might be slow because it's overloaded
• The framework can run multiple copies of a task and keep the result of the one that finishes first.

Network locality: Data transfer is expensive
• The framework tries to schedule map tasks on the machines that hold the data to be processed.

Monitoring: Will my job finish before dinner?!?
• The framework provides a web-based interface describing jobs.
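
The vowel-counting mapper and reducer in these slides depend on the course's ucb and mr modules and on a MapReduce framework to connect them. To trace the evaluation model by hand, the following is a minimal single-process sketch using only the Python standard library. It is not how the framework is implemented: the names map_phase, shuffle, group_by_key, reduce_phase, and vowel_counts are hypothetical stand-ins introduced for illustration, and an in-memory sort stands in for the distributed shuffle that makes all pairs with a given key consecutive.

```python
from itertools import groupby
from operator import itemgetter


def emit_vowels(line):
    """Mapper logic: yield a (vowel, count) pair for each vowel found in line."""
    for vowel in 'aeiou':
        count = line.count(vowel)
        if count > 0:
            yield vowel, count


def map_phase(lines):
    """Apply the mapper to every input line (hypothetical helper)."""
    for line in lines:
        yield from emit_vowels(line)


def shuffle(pairs):
    """Stand-in for the framework's shuffle: sort so equal keys are consecutive."""
    return sorted(pairs, key=itemgetter(0))


def group_by_key(sorted_pairs):
    """Yield (key, value_iterator) for each run of consecutive equal keys.

    Hypothetical local analogue of values_by_key; it takes pairs rather
    than lines of text.
    """
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, (value for _, value in group)


def reduce_phase(sorted_pairs):
    """Reducer logic: sum all values associated with each key."""
    for key, value_iterator in group_by_key(sorted_pairs):
        yield key, sum(value_iterator)


def vowel_counts(lines):
    """Run map, shuffle, and reduce in one process and collect the totals."""
    return dict(reduce_phase(shuffle(map_phase(lines))))


if __name__ == '__main__':
    text = ['Google MapReduce', 'Is a Big Data framework', 'For batch processing']
    for vowel, total in sorted(vowel_counts(text).items()):
        print(f'{vowel}: {total}')
```

Run on the three example lines from the slides, this prints a: 6, e: 5, i: 2, o: 5, u: 1, matching the reducer outputs in the evaluation model. Because the mapper and reducer are pure functions over iterators, the same logic could be split across many processes without changing the result, which is the abstraction barrier the slides describe.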

