DOC PREVIEW
Berkeley COMPSCI 61C - Map Reduce Lecture 2

This preview shows page 1-2-3-4-5-33-34-35-36-66-67-68-69-70 out of 70 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 70 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1ReviewNew-School Machine Structures (It’s a bit more complicated!)AgendaRequest-Level Parallelism (RLP)Google Query-Serving ArchitectureAnatomy of a Web SearchAnatomy of a Web SearchSlide 9Anatomy of a Web SearchData-Level Parallelism (DLP)Problem Trying To SolveMapReduce SolutionData-Parallel “Divide and Conquer” (MapReduce Processing)MapReduce ExecutionMapReduce Popularity at GoogleGoogle Uses MapReduce For …AgendaCourse InformationThis WeekCourse OrganizationEECS Grading PolicyLate PolicyPolicy on Assignments and Independent WorkSlide 26The Rules (and we really mean it!)Architecture of a Lecture61C in the NewsSlide 30The Secret to Getting Good GradesMapReduce Processing Example: Count Word OccurrencesSlide 33MapReduce ProcessingMapReduce ProcessingMapReduce ProcessingMapReduce ProcessingMapReduce ProcessingMapReduce ProcessingMapReduce ProcessingMapReduce Processing Time LineShow MapReduce Job RunningSlide 43Slide 44Slide 45Slide 46Slide 47Slide 48Slide 49Slide 50Slide 51Slide 52Slide 53Another Example: Word Index (How Often Does a Word Appear?)MapReduce Failure HandlingMapReduce Redundant ExecutionSlide 58MapReduce Locality OptimizationAgendaDesign Goals of a WSCWSC Case Study Server ProvisioningCost of WSCWSC Case Study Capital Expenditure (Capex)Cost of WSCWSC Case Study Operational Expense (Opex)How much does a watt cost in a WSC?Replace Rotating Disks with Flash?Replace Rotating Disks with Flash?WSC Case Study Operational Expense (Opex)January 2011 AWS Instances & PricesSummaryCS 61C: Great Ideas in Computer Architecture (Machine Structures)Instructors:Randy H. KatzDavid A. Pattersonhttp://inst.eecs.Berkeley.edu/~cs61c/sp111Apeinf 2011 -- Lecture #201/14/2019Review•CS61c: Learn 5 great ideas in computer architecture to enable high performance programming via parallelism, not just learn C1. Layers of Representation/Interpretation2. Moore’s Law3. Principle of Locality/Memory Hierarchy4. Parallelism5. Performance Measurement and Improvement6. Dependability via Redundancy•Post PC Era: Parallel processing, smart phones to WSC•WSC SW must cope with failures, varying load, varying HW latency bandwidth•WSC HW sensitive to cost, energy efficiency2Spring 2011 -- Lecture #101/14/2019New-School Machine Structures(It’s a bit more complicated!)•Parallel RequestsAssigned to computere.g., Search “Katz”•Parallel ThreadsAssigned to coree.g., Lookup, Ads•Parallel Instructions>1 instruction @ one timee.g., 5 pipelined instructions•Parallel Data>1 data item @ one timee.g., Add of 4 pairs of words•Hardware descriptionsAll gates @ one time01/14/2019 Spring 2011 -- Lecture #1 3SmartPhoneWarehouse Scale ComputerSoftware HardwareHarnessParallelism &Achieve HighPerformanceLogic Gates CoreCoreCoreCore… Memory (Cache) Memory (Cache)Input/OutputInput/OutputComputerMain MemoryMain MemoryCore Instruction Unit(s) Instruction Unit(s) FunctionalUnit(s) FunctionalUnit(s)A3+B3A2+B2A1+B1A0+B0Today’s LectureAgenda•Request and Data Level Parallelism•Administrivia + 61C in the News + Internship Workshop + The secret to getting good grades at Berkeley•MapReduce•MapReduce Examples•Technology Break•Costs in Warehouse Scale Computer (if time permits)01/14/2019 Apeinf 2011 -- Lecture #2 4Request-Level Parallelism (RLP)•Hundreds or thousands of requests per second–Not your laptop or cell-phone, but popular Internet services like Google search–Such requests are largely independent•Mostly involve read-only databases•Little read-write (aka “producer-consumer”) sharing•Rarely involve read–write data sharing or synchronization across requests•Computation easily partitioned within a request and across different requests01/14/2019 Apeinf 2011 -- Lecture #2 5Google Query-Serving Architecture01/14/2019 Apeinf 2011 -- Lecture #2 6Anatomy of a Web Search•Google “Randy H. Katz”–Direct request to “closest” Google Warehouse Scale Computer–Front-end load balancer directs request to one of many arrays (cluster of servers) within WSC–Within array, select one of many Google Web Servers (GWS) to handle the request and compose the response pages–GWS communicates with Index Servers to find documents that contain the search words, “Randy”, “Katz”, uses location of search as well–Return document list with associated relevance score01/14/2019 Apeinf 2011 -- Lecture #2 7Anatomy of a Web Search•In parallel,–Ad system: books by Katz at Amazon.com–Images of Randy Katz•Use docids (document IDs) to access indexed documents •Compose the page–Result document extracts (with keyword in context) ordered by relevance score–Sponsored links (along the top) and advertisements (along the sides)01/14/2019 Apeinf 2011 -- Lecture #2 801/14/2019 Apeinf 2011 -- Lecture #2 9Anatomy of a Web Search•Implementation strategy–Randomly distribute the entries–Make many copies of data (aka “replicas”)–Load balance requests across replicas•Redundant copies of indices and documents–Breaks up hot spots, e.g., “Justin Bieber”–Increases opportunities for request-level parallelism–Makes the system more tolerant of failures01/14/2019 Apeinf 2011 -- Lecture #2 10Data-Level Parallelism (DLP)•2 kinds–Lots of data in memory that can be operated on in parallel (e.g., adding together 2 arrays)–Lots of data on many disks that can be operated on in parallel (e.g., searching for documents)•March 1 lecture and 3rd project does Data Level Parallelism (DLP) in memory•Today’s lecture and 1st project does DLP across 1000s of servers and disks using MapReduce01/14/2019 Apeinf 2011 -- Lecture #2 11Problem Trying To Solve•How process large amounts of raw data (crawled documents, request logs, …) every day to compute derived data (inverted indicies, page popularity, …) when computation conceptually simple but input data large and distributed across 100s to 1000s of servers so that finish in reasonable time?•Challenge: Parallelize computation, distribute data, tolerate faults without obscuring simple computation with complex code to deal with issues•Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, Jan 2008. 01/14/2019 Apeinf 2011 -- Lecture #2 12MapReduce Solution•Apply Map function to user supplier record of key/value pairs•Compute set of intermediate key/value pairs•Apply


View Full Document

Berkeley COMPSCI 61C - Map Reduce Lecture 2

Documents in this Course
SIMD II

SIMD II

8 pages

Midterm

Midterm

7 pages

Lecture 7

Lecture 7

31 pages

Caches

Caches

7 pages

Lecture 9

Lecture 9

24 pages

Lecture 1

Lecture 1

28 pages

Lecture 2

Lecture 2

25 pages

VM II

VM II

4 pages

Midterm

Midterm

10 pages

Load more
Download Map Reduce Lecture 2
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Map Reduce Lecture 2 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Map Reduce Lecture 2 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?