Berkeley COMPSCI 186 - Introduction to Database Systems


CS186: Introduction to Database Systems
Joe Hellerstein and Christopher Olston
Fall 2005

Queries for Today
• What?
• Why?
• Who?
• How?
• For instance?

What: Database Systems Then

What: Database Systems Today

So… What is a Database?
• We will be broad in our interpretation
• A Database:
  – A very large, integrated collection of data.
• Typically models a real-world "enterprise"
  – Entities (e.g., teams, games)
  – Relationships (e.g., the A's are playing in the World Series)
• Might surprise you how flexible this is
  – Web search:
    • Entities: words, documents
    • Relationships: word in document, document links to document.
  – P2P filesharing:
    • Entities: words, filenames, hosts
    • Relationships: word in filename, file available at host

What is a Database Management System?
• A Database Management System (DBMS) is:
  – A software system designed to store, manage, and facilitate access to databases.
• Typically this term is used narrowly
  – Relational databases with transactions
    • E.g., Oracle, DB2, SQL Server
  – Mostly because they predate other large repositories
    • Also because of technical richness
  – When we say DBMS in this class we will usually follow this convention
    • But keep an open mind about applying the ideas!

What: Is the WWW a DBMS?
• Fairly sophisticated search available
  – Crawler indexes pages on the web
  – Keyword-based search for pages
• But, currently
  – data is mostly unstructured and untyped
  – search only:
    • can't modify the data
    • can't get summaries, complex combinations of data
  – few guarantees provided for freshness of data, consistency across data items, fault tolerance, …
  – Web sites typically have a (relational) DBMS in the background to provide these functions.
• The picture is changing quickly
  – Information Extraction to get structure from unstructured data
  – New standards, e.g., XML and the Semantic Web, can help data modeling

What: "Search" vs. Query
• What if you wanted to find out which actors donated to John Kerry's presidential campaign?
• Try "actors donated to john kerry" in your favorite search engine.
• If it isn't "published", it can't be searched!

What: A "Database Query" Approach
"Yahoo Actors" JOIN "FECInfo"
(Courtesy of the Telegraph research group @Berkeley)
Q: Did it Work?

What: Is a File System a DBMS?
• Thought Experiment 1:
  – You and your project partner are editing the same file.
  – You both save it at the same time.
  – Whose changes survive?
  A) Yours B) Partner's C) Both D) Neither E) ???
• Thought Experiment 2:
  – You're updating a file.
  – The power goes out.
  – Which changes survive?
  A) All B) None C) All Since Last Save D) ???

Q: How do you write programs over a subsystem when it promises you only "???"?
A: Very, very carefully!!

OS Support for Data Management
• Data can be stored in RAM
  – this is what every programming language offers!
  – RAM is fast, and random access
  – Isn't this heaven?
• Every OS includes a File System
  – manages files on a magnetic disk
  – allows open, read, seek, close on a file
  – allows protections to be set on a file
  – drawbacks relative to RAM?

Database Management Systems
• What more could we want than a file system?
  – Simple, efficient ad hoc [1] queries
  – concurrency control
  – recovery
  – benefits of good data modeling
• S.M.O.P. [2]? Not really…
  – as we'll see this semester
  – in fact, the OS often gets in the way!

[1] ad hoc: formed or used for specific or immediate problems or needs
[2] SMOP: Small Matter Of Programming

Current Commercial Outlook
• A major part of the software industry:
  – Oracle, IBM, Microsoft
  – also Sybase, Informix (now IBM), Teradata
  – smaller players: Java-based DBMSs, devices, OO, …
• Well-known benchmarks (esp. TPC)
• Lots of related industries
  – data warehouse, document management, storage, backup, reporting, business intelligence, ERP, CRM, app integration
• Traditional Relational DBMS products dominant and evolving
  – adapted for extensibility (user-defined types), native XML support.
  – Microsoft merger of file system/DB…?
• Open Source coming on strong
  – MySQL, PostgreSQL, Apache Derby, BerkeleyDB, Ingres, EigenBase
• And of course, the other "database" technologies
  – Search engines, P2P, etc.

What database systems will we cover?
• We will try to be broad and touch upon
  – Relational DBMS (e.g., Oracle, SQL Server, DB2, Postgres)
  – Document search engines (e.g., Google, Yahoo! Search, Verity, Spotlight)
  – "Semi-structured" DB systems (e.g., XML repositories like Xindice)
• Starting point
  – We assume you have used web search engines
  – We assume you don't know relational databases
    • Yet they pioneered many of the key ideas
  – So focus will be on relational DBMSs
    • With frequent side-notes on search engines, XML issues

Why take this class?
A. Database systems are at the core of CS
B. They are incredibly important to society
C. The topic is intellectually rich
D. A capstone course for undergrad
E. It isn't that much work
F. Looks good on your resume
Let's spend a little time on each of these

Why take this class?
A. Database systems are the core of CS
• Shift from computation to information
  – True in corporate computing for years
  – Web, p2p made this clear for personal computing
  – Increasingly true of scientific computing
• Need for DB technology has exploded in recent years
  – Corporate: retail swipe/clickstreams, "customer relationship mgmt", "supply chain mgmt", "data warehouses", etc.
  – Web: not just "documents". Search engines, e-commerce, blogs, wikis, other "web services".
  – Scientific: digital libraries, genomics, satellite imagery, physical sensors, simulation data
  – Personal: Music, photo, & video libraries. Email archives. File contents ("desktop search").

Why take this class?
B. DBs are incredibly important to society
• "Knowledge is power." --Sir Francis Bacon
• "With great power comes great responsibility." --SpiderMan's Uncle Ben
Policy-makers should understand technological possibilities.
Informed Technologists
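
Worked example for the "Database Query" approach slide ("Yahoo Actors" JOIN "FECInfo"): the idea is that once both sources are structured tables, the question "which actors donated to a given campaign?" becomes a join rather than a keyword search. The following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, the helper `actors_who_donated`, and every row of data are hypothetical placeholders invented for illustration; they are not the Telegraph group's schema and not real donation records.

```python
import sqlite3


def actors_who_donated(actor_names, donation_rows, recipient):
    """Return the sorted names that appear both in the actor list and
    among the donors to `recipient`.

    actor_names   -- iterable of names (stand-in for "Yahoo Actors")
    donation_rows -- iterable of (donor, recipient, amount) tuples
                     (stand-in for "FECInfo")
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # Hypothetical schema: one table per source.
        conn.execute("CREATE TABLE actors (name TEXT)")
        conn.execute(
            "CREATE TABLE donations (donor TEXT, recipient TEXT, amount INTEGER)"
        )
        conn.executemany(
            "INSERT INTO actors VALUES (?)", [(n,) for n in actor_names]
        )
        conn.executemany("INSERT INTO donations VALUES (?, ?, ?)", donation_rows)

        # The join: match each actor to donation records by name,
        # then keep only donations to the requested recipient.
        cursor = conn.execute(
            "SELECT DISTINCT a.name "
            "FROM actors AS a "
            "JOIN donations AS d ON d.donor = a.name "
            "WHERE d.recipient = ? "
            "ORDER BY a.name",
            (recipient,),
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()


# Placeholder data only; none of these rows describe real people or donations.
ACTORS = ["Actor A", "Actor B", "Actor C"]
DONATIONS = [
    ("Actor A", "Candidate X", 500),
    ("Actor C", "Candidate X", 250),
    ("Actor B", "Candidate Y", 100),
    ("Person D", "Candidate X", 1000),  # a donor who is not an actor
]

if __name__ == "__main__":
    print(actors_who_donated(ACTORS, DONATIONS, "Candidate X"))
```

With the placeholder rows above, "Person D" is filtered out because the name is absent from the actors table, and "Actor B" is filtered out by the recipient condition. Matching on an exact name string is a simplification; real sources would likely need cleaner keys.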

