Cloud Computing Skepticism
Abhishek Verma, Saurabh Nangia

Slide titles:
Cloud Computing Skepticism
Outline
Recent Trends
Tremendous Buzz
Gartner Hype Cycle*
Blind men and an Elephant
Reliability
A Comparison of Approaches to Large-Scale Data Analysis*
MapReduce – A major step backwards
MapReduce II*
Tested Systems
Data Loading
Grep Task Results
Select Task Results
Join Task
Concluding Remarks
The Cost of a Cloud: Research Problem in Data Center Networks
Overview
Cost of a Cloud?
Cost of a Cloud
Are Clouds any different?
Enterprise DC vs Cloud DC (1)
Enterprise DC vs Cloud DC (2)
Types of Cloud Service DC (1)
Types of Cloud Service DC (2)
Cost Breakdown
Server Cost (1)
Server Cost (2)
Reducing Server Cost
Infrastructure Cost
Reducing Infrastructure Cost
Power
Reducing Power Costs
Network
Reducing Network Costs
Perspective
Cost of Large Scale DC
Solutions!
Improving DC efficiency
Agility
Networking in Current DC
Conventional Network Architecture
Problems (1)
Problems (2)
Problems (3)
Problems (4)
DC Networking: Design Objectives
Incenting Desirable Behavior (1)
Incenting Desirable Behavior (2)
Geo-Distribution
Optimal Placement & Sizing (1)
Optimal Placement & Sizing (2)
Geo-Distributing State (1)
Geo-Distributing State (2)
Summary
Opinions
To Cloud or Not to Cloud?
References

Outline
• Cloud computing hype
• Cynicism
• MapReduce vs. Parallel DBMS
• Cost of a cloud
• Discussion

Recent Trends
• Google App Engine (April 2008)
• Microsoft Azure (Oct 2008)
• Facebook Platform (May 2007)
• Amazon EC2 (August 2006)
• Amazon S3 (March 2006)
• Salesforce AppExchange (March 2006)

Gartner Hype Cycle*
[Figure: hype cycle; label: Cloud Computing]
* From http://en.wikipedia.org/wiki/Hype_cycle

“Cloud computing is simply a buzzword used to repackage grid computing and utility computing, both of which have existed for decades.”
whatis.com, Definition of Cloud Computing

“The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything
that we already do. […] The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”
Larry Ellison, during Oracle’s Analyst Day
From http://blogs.wsj.com/biztech/2008/09/25/larry-ellisons-brilliant-anti-cloud-computing-rant/

From http://geekandpoke.typepad.com

Reliability
Many enterprises (necessarily or unnecessarily) set their SLA uptimes at 99.99% or higher, which cloud providers have not yet been prepared to match.

Amazon’s cloud outages receive a lot of exposure …
• July 20, 2008: Failure due to stranded zombies, lasts 5 hours
• Feb 15, 2008: Authentication overload leads to two-hour service outage
• October 2007: Service failure lasts two days
• October 2006: Security breach where users could see other users’ data

… and their current SLAs don’t match those of enterprises*
• Amazon EC2: 99.95%
• Amazon S3: 99.9%
* SLAs expressed in Monthly Uptime Percentages; Source: McKinsey & Company

• Not clear that all applications require such high service levels
• IT shops do not always deliver on their SLAs, but their failures are less public and customers can’t switch easily

A Comparison of Approaches to Large-Scale Data Analysis*
Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker
To appear in SIGMOD ’09
* Basic ideas from “MapReduce - a major step backwards”, D. DeWitt and M.
Stonebraker

MapReduce – A major step backwards
• A giant step backward
  • No schemas, Codasyl instead of Relational
• A sub-optimal implementation
  • Uses brute force sequential search, instead of indexing
  • Materializes O(m·r) intermediate files
  • Does not incorporate data skew
• Not novel at all
  • Represents a specific implementation of well known techniques developed nearly 25 years ago
• Missing most of the common current DBMS features
  • Bulk loader, indexing, updates, transactions, integrity constraints, referential integrity, views
• Incompatible with DBMS tools
  • Report writers, business intelligence tools, data mining tools, replication tools, database design tools

Architectural Element | Parallel Databases | MapReduce
Schema Support | Structured | Unstructured
Indexing | B-Trees or hash based | None
Programming Model | Relational | Codasyl
Data Distribution | Projections before aggregation | Logic moved to data, but no optimizations
Execution Strategy | Push | Pull
Flexibility | No, but Ruby on Rails, LINQ | Yes
Fault Tolerance | Transactions have to be restarted in the event of a failure | Yes: replication, speculative execution

MapReduce II*
MapReduce didn't kill our dog, steal our car, or try and date our daughters.
• MapReduce is not a database system, so don't judge it as one
  • Both analyze and perform computations on huge datasets
• MapReduce has excellent scalability; the proof is Google's use
  • Does it scale linearly?
  • No scientific evidence
• MapReduce is cheap and databases are expensive
• We are the old guard trying to defend our turf/legacy from the young turks
  • Propagation of ideas between sub-disciplines is very slow and sketchy
  • Very little information is passed from generation to generation
* http://www.databasecolumn.com/2008/01/mapreduce-continued.html

Tested Systems
• Hadoop
  • 0.19 on Java 1.6, 256MB block size, JVM reuse
  • Rack-awareness enabled
• DBMS-X (unnamed)
  • Parallel DBMS from a “major relational db vendor”
  • Row based, compression enabled
• Vertica (co-founded by Stonebraker)
  • Column oriented
• Hardware configuration: 100 nodes
  • 2.4 GHz Intel Core 2 Duo
  • 4GB RAM, 2 250GB SATA hard disks
  • GigE ports, 128Gbps switching
fabric

Data Loading
• Hadoop
  • Command line utility
• DBMS-X
  • LOAD SQL command
  • Administrative command to re-organize data
• Grep Dataset
  • Record = 10-byte key + 90-byte random value
  • 5.6 million records = 535MB/node
  • Another set = 1TB/cluster

Grep Task Results
SELECT * FROM Data WHERE field LIKE '%XYZ%';

Select Task Results
SELECT pageURL, pageRank
FROM Rankings WHERE pageRank > X;

Join Task
SELECT INTO Temp sourceIP,
       AVG(pageRank) AS avgPageRank,
       SUM(adRevenue) AS totalRevenue
FROM Rankings AS R, UserVisits AS UV
WHERE R.pageURL = UV.destURL
  AND UV.visitDate BETWEEN
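As a rough illustration of the MapReduce side of the Grep task, the SQL pattern match shown for the Grep task can be written as a map-only job. The sketch below is plain Python, not Hadoop code. The names `grep_map` and `run_grep` and the sample records are hypothetical; only the record layout (reading "10b key + 90b random value" as 10 and 90 bytes) and the pattern `XYZ` come from the slides.

```python
from typing import Iterable, Iterator, List, Tuple

KEY_BYTES = 10  # key length, reading the slide's "10b key" as 10 bytes


def grep_map(record: bytes, pattern: bytes = b"XYZ") -> Iterator[Tuple[bytes, bytes]]:
    """Emit (key, value) when the value contains the pattern, like LIKE '%XYZ%'."""
    key, value = record[:KEY_BYTES], record[KEY_BYTES:]
    if pattern in value:  # case-sensitive substring match
        yield key, value


def run_grep(records: Iterable[bytes], pattern: bytes = b"XYZ") -> List[Tuple[bytes, bytes]]:
    """Map-only job: apply grep_map to every record; no reduce step is used."""
    return [pair for record in records for pair in grep_map(record, pattern)]


# Hypothetical sample data: two 100-byte records, only the first one matches.
records = [
    b"key0000001" + b"a" * 40 + b"XYZ" + b"b" * 47,
    b"key0000002" + b"c" * 90,
]
matches = run_grep(records)
print(len(matches), "matching record(s)")
```

Every record is scanned in turn, which is the brute-force sequential search that the critique slide refers to.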
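To put the uptime figures from the Reliability slide in concrete terms, a monthly uptime percentage can be converted into permitted downtime. The sketch below assumes a 30-day month, which is an assumption and not a figure from the slides; the function name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month permitted by a monthly uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Uptime percentages quoted on the Reliability slide.
slas = [("Enterprise target", 99.99), ("Amazon EC2", 99.95), ("Amazon S3", 99.9)]
for name, pct in slas:
    print(f"{name}: {pct}% uptime allows {allowed_downtime_minutes(pct):.1f} min/month")
```

Under that assumption, 99.99% leaves about 4.3 minutes of downtime per month, 99.95% about 21.6 minutes, and 99.9% about 43.2 minutes.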