CMSC424: Database Design
Instructor: Amol Deshpande [email protected]

Today
- Motivation
- Role of DBMS in today’s world
- Syllabus
- Administrivia
- Workload etc.
- Data management challenges in a very simple application
- We will also discuss some interesting open problems/research directions

One thing…
- No laptop use allowed in the class!

Another thing…
- I will not be using slides most of the time
- You should take notes
- But… you will be okay if you just read the textbook

Motivation: Data Overload
- There is a *HUGE* amount of data in this world
- Everywhere you see…
  - Personal (emails, data on your computer)
  - Enterprise: banks, supermarkets, universities, airlines, etc.
  - Scientific (biological, astronomical)
  - …

Motivation: Data Overload
- Much more is produced every day
- “More data will be produced in the next year than has been generated during the entire existence of humankind”
- IBM: “… in 2005, the amount of data will grow from 3.2 million exabytes to 43 million exabytes”
- [[“total amount of printed material in the world is estimated to be 5 exabytes…”]]

Motivation: Data Overload
- Much more is produced every day
- Wal-Mart: 583 terabytes of sales and inventory data
  - Adds a billion rows every day
  - “we know how many 2.4 ounces of tubes of toothpastes sold yesterday and what was sold with them”
  - Yes, we can do it; is there any point to it?
- [[“Library of Congress --> 20 TB”]]

Motivation: Data Overload
- Much more is produced every day
- Nielsen Media Research: 20 GB a day; total 80-100 TB
  - From where?
  - 12000 households or personal meters
  - Extending to iPods and TiVos in recent years
  - Is there a point beyond telling you what great TV shows you are missing?

Motivation: Data Overload
- Scientific data is literally astronomical in scale
- “Wellcome Trust Sanger Institute's World Trace Archive database of DNA sequences hit one billion entries..”
  - Stores all sequence data produced and published by the world scientific community
  - 22 terabytes and doubling every 10 months
  - “Scanning the whole dataset for a single genetic sequence… a lot like searching for a single sentence in the contents of the British Library”

Motivation: Data Overload
- Automatically generated data through instrumentation
  - “Britain to log vehicle movements through cameras. 35 million reads per day.”
  - Wireless sensor networks are becoming ubiquitous
  - RFID: possible to track every single piece of product throughout its life (Gillette boycott)

Motivation: Data Overload
- How do we do anything with this data?
- Where and how do we store it?
  - Disk capacity doubles every 18 months or so; that is not enough
- How do we search through it?
  - Text search?
  - “how much time from here to Pittsburgh if I start at 2pm?”
  - Data is there; more will be soon (live traffic data)

Motivation: Data Overload
- What if the disks crash?
  - Very common, especially if we are talking about thousands of disks backing a single system
- Speed!
  - Imagine a bank and millions of ATMs
  - How much time does it take you to do a withdrawal?
  - The data is not local
- How do we ensure “correctness”?
  - Can’t have money disappearing
  - Harder than you might think

DBMS to the Rescue
- Provide a systematic way to answer most of these questions…
- Aim is to allow easy management of data
  - Store it
  - Update it
  - Query it
- Massively successful for structured data
  - What do I mean by that?

Structured vs Unstructured
- A lot of the data we encounter is structured
- Some have very simple structures
  - E.g. data that can be represented in tabular forms
  - Significantly easier to deal with
  - We will actually focus on such data for much of the class

Account
bname      acct_no   balance
Downtown   A-101     500
Mianus     A-215     700
Perry      A-102     400
R.H        A-305     350

Customer
cname     cstreet   ccity
Jones     Main      Harrison
Smith     North     Rye
Hayes     Main      Harrison
Curry     North     Rye
Lindsay   Park      Pittsfield

Structured vs Unstructured
- Some data has a slightly more complicated structure
  - E.g. graph structures
  - Map data, social network data, the web link structure, etc.
- In many cases, can convert to tabular forms (for storing)
- Slightly harder to deal with
  - Queries require dealing with the graph structure
- [Figure: Collaborations graph]
- Query: Find my Erdős number.

Structured vs Unstructured
- Increasing amount of data in a semi-structured format
  - XML: self-describing tags
- Complicates a lot of things
- We will discuss this toward the end

Structured vs Unstructured
- A huge amount of data is unfortunately unstructured
  - Books, WWW
  - Amenable to pretty much only text search
  - Information Retrieval deals with this topic
- What about Google?
  - Google is actually successful because it uses the structure

DBMS to the Rescue
- Provide a systematic way to answer most of these questions…
  - … for structured data
  - … increasingly for semi-structured data
  - XML database systems have been coming up
- Solving the same problems for truly unstructured data remains an open problem
  - Much research in the Information Retrieval community

DBMS to the Rescue
- They are everywhere!
- Enterprises
  - Banks, airlines, universities
- Internet
  - Searchsystems.net lists 35568 public records DBs
  - Amazon, eBay, IMDB
  - Blogs, social networks…
- Your computer (emails especially)
- …

Out of scope…
- How do we guarantee the data will be there 10 years from now?
  - Much harder than you might think
- Privacy and security!
  - Every other day we see some database leaked on the web
- New kinds of data
  - Scientific/biological, image, audio/video, sensor data, etc.
- Interesting research challenges!

What we will cover…
- Representing information
  - Data modeling
- Languages and systems for querying data
  - Complex queries & query semantics
  - Over massive data sets
- Concurrency control for data manipulation
  - Controlling concurrent access
  - Ensuring transactional semantics
- Reliable data storage
  - Maintain data semantics even if you pull the plug

What we will cover…
- We will see…
  - Algorithms and cost analyses
  - System architecture and implementation
  - Resource management and scheduling
  - Computer language design, semantics and optimization
  - Applications of AI topics including