U of I CS 525 - Cloud Computing I


Cloud Computing - I

Presenters: Abhishek Verma, Nicolas Zea

Outline
  Cloud Computing
  MapReduce: A group-by-aggregate
  Shortcomings
  Pig Latin: A Not-So-Foreign Language for Data Processing
    Pig Philosophy
    Features
    Pig Latin
    Example Data Analysis Task
    Data Flow
    In Pig Latin
    Quick Start and Interoperability
    Optional Schemas
    UDFs as First-class citizens
    Operators
    COGROUP Vs JOIN
    Compilation into MapReduce
    Debugging Environment
    Future Work
  DryadLINQ: A System for General Purpose Distributed Data-Parallel Computing Using a High-Level Language
    Dryad System Architecture
    LINQ
    DryadLINQ Constructs
    Dryad + LINQ = DryadLINQ
    DryadLINQ Execution Overview
    System Implementation
    Static Optimizations
    Dynamic Optimizations
    Evaluation
    Main Benefits
    Discussion
    Comparison
  Improving MapReduce Performance in Heterogeneous Environments
    Hadoop Speculative Execution Overview
    Hadoop's Assumptions
    Breaking Down the Assumptions
    LATE Scheduler
    Performance Comparison Without Stragglers
    Performance Comparison With Stragglers
    Sensitivity
    Takeaways
    Further questions

Cloud Computing
  Map Reduce
    Clean abstraction
    Extremely rigid 2 stage group-by aggregation
    Code reuse and maintenance difficult
  Google → MapReduce, Sawzall
  Yahoo → Hadoop, Pig Latin
  Microsoft → Dryad, DryadLINQ
  Improving MapReduce in heterogeneous environment

MapReduce: A group-by-aggregate
  [Figure: input records (k1,v1), (k2,v2), (k1,v3), (k2,v4), (k1,v5) are divided into two splits, one per map task. Each map output is sorted locally (local QSort), giving (k1,v1), (k1,v3), (k2,v2) and (k1,v5), (k2,v4). The shuffle then routes k1 → v1, v3, v5 and k2 → v2, v4 to the two reduce tasks, which write the output records.]

Shortcomings
  Extremely rigid data flow
  Other flows hacked in: stages, joins, splits
  Common operations must be coded by hand
    Join, filter, projection, aggregates, sorting, distinct
  Semantics hidden inside map-reduce fns
    Difficult to maintain, extend, and optimize
  [Figure: chained map (M) and reduce (R) stages: M R, M R, M R.]

Pig Latin: A Not-So-Foreign Language for Data Processing
  Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
  Yahoo! Research

Pig Philosophy
  Pigs Eat Anything
    Can operate on data w/o metadata: relational, nested, or unstructured.
  Pigs Live Anywhere
    Not tied to one particular parallel framework
  Pigs Are Domestic Animals
    Designed to be easily controlled and modified by its users.
    UDFs: transformation functions, aggregates, grouping functions, and conditionals.
  Pigs Fly
    Processes data quickly(?)

Features
  Dataflow language
    Procedural: different from SQL
  Quick Start and Interoperability
  Nested Data Model
  UDFs as First-Class Citizens
  Parallelism Required
  Debugging Environment

Pig Latin
  Data Model
    Atom: 'cs'
    Tuple: ('cs', 'ece', 'ee')
    Bag: { ('cs', 'ece'), ('cs') }
    Map: [ 'courses' → { ('523', '525', '599') } ]
  Expressions
    Fields by position: $0
    Fields by name: f1
    Map lookup: #

Example Data Analysis Task
  Find the top 10 most visited pages in each category

  Visits
    User   URL          Time
    Amy    cnn.com      8:00
    Amy    bbc.com      10:00
    Amy    flickr.com   10:05
    Fred   cnn.com      12:00

  URL Info
    URL          Category   PageRank
    cnn.com      News       0.9
    bbc.com      News       0.8
    flickr.com   Photos     0.7
    espn.com     Sports     0.9

Data Flow
  Load Visits → Group by url → Foreach url generate count
  Load Url Info
  Join on url → Group by category → Foreach category generate top10 urls

In Pig Latin
    visits = load '/data/visits' as (user, url, time);
    gVisits = group visits by url;
    visitCounts = foreach gVisits generate url, count(visits);
    urlInfo = load '/data/urlInfo' as (url, category, pRank);
    visitCounts = join visitCounts by url, urlInfo by url;
    gCategories = group visitCounts by category;
    topUrls = foreach gCategories generate top(visitCounts,10);
    store topUrls into '/data/topUrls';

Quick Start and Interoperability
  (same script as above)
  Operates directly over files

Optional Schemas
  (same script as above)
  Schemas optional; can be assigned dynamically

UDFs as First-class citizens
  (same script as above)
  UDFs can be used in every construct

Operators
  LOAD: specifying input data
  FOREACH: per-tuple processing
  FLATTEN: eliminate nesting
  FILTER: discarding unwanted data
  COGROUP: getting related data together
    GROUP, JOIN
  STORE: asking for output
  Other: UNION, CROSS, ORDER, DISTINCT

Compilation into MapReduce
  Every group or join operation forms a map-reduce boundary
  Other operations pipelined into map and reduce phases
  [Figure: the data flow from the example task (Load Visits, Group by url, Foreach url generate count, Load Url Info, Join on url, Group by category, Foreach category generate top10 urls) divided into three jobs: Map1/Reduce1, Map2/Reduce2, Map3/Reduce3.]

Debugging Environment
  Write-run-debug cycle
  Sandbox dataset
  Objectives:
    Realism
    Conciseness
    Completeness
  Problems:
    UDFs

Future Work
  Optional "safe" query optimizer
    Performs only high-confidence rewrites
  User interface
    Boxes and arrows UI
    Promote collaboration, sharing code fragments and UDFs
  Tight integration with a scripting language
    Use loops, conditionals of host language

DryadLINQ: A System for General Purpose Distributed Data-Parallel Computing Using a High-Level Language
  Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey

Dryad System Architecture
  [Figure: a job manager handles the job schedule over the control plane, with NS and PD nodes on the cluster; vertices (V) exchange data over the data plane through files, TCP, FIFO, and network channels.]

LINQ
    Collection<T> collection;
    bool IsLegal(Key);
    string Hash(Key);
    var results = from c in collection
                  where IsLegal(c.key)
                  select new { Hash(c.key), c.value };

DryadLINQ Constructs
  Partition
    Collection
    C# objects
  Partitioning: Hash, Range, RoundRobin
  Apply, Fork
  Hints

Dryad + LINQ = DryadLINQ
    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key);
    var results = from c in collection
                  where IsLegal(c.key)
                  select new { Hash(c.key), c.value };
  [Figure: the C# collection and results objects map onto a query plan (Dryad job) made of C# vertex code and data.]

DryadLINQ Execution Overview
  [Figure: on the client machine, a C# query expression is invoked and DryadLINQ produces a distributed query plan; in the data center the plan reads the input tables and writes the output tables, and the results are returned to the client. Figure labels: Client machine, DryadLINQ, (11), Distributed query plan, C#, Query Expr, Data center, Output Tables, Results, Input Tables, Invoke, Query, Output.]
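
The group-by-aggregate flow from the MapReduce figure can be sketched in a few lines of Python. This is only an in-memory illustration, not how Hadoop or Google's MapReduce is implemented; the names `map_reduce`, `identity_map`, and `collect_values` are hypothetical, and the two splits are an assumed division of the five records shown in the figure.

```python
from collections import defaultdict
from operator import itemgetter


def map_reduce(splits, map_fn, reduce_fn):
    """Run a toy map -> local sort -> shuffle -> reduce pipeline in memory."""
    # Map phase: each split is handled by its own (simulated) map task,
    # and each task sorts its output by key, as in the "Local QSort" step.
    map_outputs = []
    for split in splits:
        pairs = [pair for record in split for pair in map_fn(record)]
        pairs.sort(key=itemgetter(0))
        map_outputs.append(pairs)

    # Shuffle phase: bring all values with the same key together.
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)

    # Reduce phase: one reduce call per key.
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


def identity_map(record):
    # Hypothetical map function: emit the (key, value) record unchanged.
    return [record]


def collect_values(key, values):
    # Hypothetical reduce function: return the values for a key in sorted order.
    return sorted(values)


# Assumed split of the five input records from the figure.
splits = [
    [("k1", "v1"), ("k2", "v2"), ("k1", "v3")],
    [("k2", "v4"), ("k1", "v5")],
]
result = map_reduce(splits, identity_map, collect_values)
```

Everything the job computes has to be expressed through `map_fn` and `reduce_fn`, which is the rigidity the Shortcomings slide points at.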

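
To make the example data analysis task concrete, the same data flow (count visits per URL, join with URL info, group by category, take the top N) can be traced in plain Python over the sample rows from the slides. This is a sketch under assumptions, not Pig's execution model: the function name `top_urls_per_category` is hypothetical, the join is treated as an inner join, and ties are broken alphabetically by URL, which the slides do not specify.

```python
from collections import Counter, defaultdict

# Sample rows from the Visits and URL Info tables.
visits = [
    ("Amy", "cnn.com", "8:00"),
    ("Amy", "bbc.com", "10:00"),
    ("Amy", "flickr.com", "10:05"),
    ("Fred", "cnn.com", "12:00"),
]
url_info = [
    ("cnn.com", "News", 0.9),
    ("bbc.com", "News", 0.8),
    ("flickr.com", "Photos", 0.7),
    ("espn.com", "Sports", 0.9),
]


def top_urls_per_category(visits, url_info, n=10):
    # group visits by url; foreach group generate url, count
    visit_counts = Counter(url for _user, url, _time in visits)

    # join visitCounts by url, urlInfo by url (assumed inner join)
    category_of = {url: category for url, category, _rank in url_info}
    by_category = defaultdict(list)
    for url, count in visit_counts.items():
        if url in category_of:
            # group the joined rows by category
            by_category[category_of[url]].append((url, count))

    # foreach category generate the top n urls by visit count
    # (alphabetical tie-break is an assumption made for determinism)
    return {
        category: sorted(rows, key=lambda row: (-row[1], row[0]))[:n]
        for category, rows in by_category.items()
    }


top_urls = top_urls_per_category(visits, url_info)
```

With the sample data, espn.com has no visits, so the Sports category does not appear in the result.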

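
For the COGROUP and JOIN operators, the difference is easiest to see side by side. In Pig Latin, COGROUP keeps the matching tuples from each input in separate nested bags per key, while JOIN is commonly described as COGROUP followed by flattening those bags into their cross product. The Python below is a minimal sketch of that relationship; the function names and the tuple layouts are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, left_key, right_key):
    """Return {key: (bag of left rows, bag of right rows)} without flattening."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    return dict(groups)


def join(left, right, left_key, right_key):
    """Equi-join built as cogroup plus flattening (cross product per key)."""
    joined = []
    for left_bag, right_bag in cogroup(left, right, left_key, right_key).values():
        # A key with an empty bag on either side contributes no rows.
        joined.extend(l + r for l, r in product(left_bag, right_bag))
    return joined


def first_field(row):
    # Both inputs are assumed to carry the url in their first field.
    return row[0]


visit_counts = [("cnn.com", 2), ("bbc.com", 1)]
url_info = [("cnn.com", "News", 0.9), ("espn.com", "Sports", 0.9)]

grouped = cogroup(visit_counts, url_info, first_field, first_field)
joined = join(visit_counts, url_info, first_field, first_field)
```

Keeping the bags nested lets later steps apply their own per-group processing, which is why COGROUP is listed as the more general operator.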