Course Introduction
Introduction to Databases
Instructor: Joe Bockhorst
University of Wisconsin - Milwaukee

First Reading Assignment
• Chapters 1 and 2 (today and part of Thursday)
• Chapter 13 and handout

“There's a prayer each night that I always pray:
Let the data guide me through every day”
Warren Zevon

Data is Ubiquitous
• Three classes of technological advances are changing our relationship with data:
• More storage space
– allows us to keep more data
• Faster processor (and memory) speeds
– allow us to access and process more data
• Different “sensors”
– allow us to access new kinds of data
http://en.wikipedia.org/wiki/Hard_disk

Microarrays – An Example of a New Sensing Technology
[Figure: a microarray]
• The color of each spot represents the activity level of a gene under some experimental condition
• 10,000s of spots on a single chip

Other Data Examples
• Airline flight management system
• Financial data
• Commercial store (e.g., WalMart) data
• Department of Motor Vehicles
• Surveillance video
• University student records
• Baseball results
• Web sites
• Medical records
• ...

Effective Data Management is Essential
• Organizations need their data to be an asset
• Given:
– the amount of data available to store
– the costs to manage data (hardware, software, labor)
• Ineffective policies can make an organization's data a liability

Database Management System (DBMS)
• DBMS is:
– A collection of software programs
– General purpose
• DBMS enables users to:
– Define DB
– Construct DB
– Change (or update) DB
– Ask questions about the data in DB
– Share DB
• DBMS maintains the integrity of DB

Some RDBM Systems
Commercial Systems
• Oracle ($$$$)
• DB2 (IBM) ($$$)
• SQL Server (Microsoft) ($$)
Open Source Systems
• PostgreSQL
• MySQL
Source: International Data Corporation

Main Goals of this Course
• To understand how to use a DBMS
– How to create DB, data models, SQL, ...
• To understand how a DBMS works
– Physical properties of disks and files, software to manage reading and writing to disk, implementation of algorithms to answer user queries, ...

Databases are self-describing: the catalog describes the structure of the data stored in the DB

Example: Internet Movie Database (IMDB)

Building a DB: Construct a Conceptual Model
• A conceptual model identifies entities and relationships
[Diagram: entities movie (attributes: title, release date) and person (attributes: name, birthdate); relationships acts in (attributes: role, role type) and director of; cardinality labels M, N, N, 1; legend: entity, attribute, relationship]

Building a DB: Define DB Schema
• A schema describes DB using a data model supported by the DBMS (e.g., relational model)
• RDBMS – DBMS that supports the relational model
MOVIE (MID, Title, Director, Rating)
PERSON (PID, Name, Bday)
ACTS_IN (MID, PID, Role, Rtype)

A Schema Diagram for “University” DB (from the textbook)
[Figure labels: tables, columns]

Building a DB: Describe Physical Data Model
• PDM indicates how data is organized on disk
• Includes description of access paths or indexes
– Example: store “Movie” table with records ordered by MID and construct an index on the “Title” attribute
[Figure: file of records of the MOVIE table (records for The Big Lebowski, Star Wars, and The Big Chill, ordered by MID) and an index on the Title column (entries The Big Chill, The Big Lebowski, ...)]

Building a DB: Populate DB
• Set initial records of the DB

MOVIE
MID  Title             Director  Rating
1    The Big Lebowski  72        R
2    Star Wars         29        PG
...

PERSON
PID  Name           Bday
1    Jeff Daniels   12/4/49
2    Harrison Ford  7/13/42
...

ACTS_IN
MID  PID  Role      Rtype
1    1    The Dude  STAR
2    2    Han Solo  CO_STAR
...

Querying The Database
• Most RDBMS allow users to query the database using SQL (structured query language)
• Example: get cast of “The Big Lebowski”
    SELECT Name, Role, Rtype
    FROM PERSON, ACTS_IN
    WHERE MID = '1' AND PERSON.PID = ACTS_IN.PID

Building the Application Program

Implementing Queries
• “Relational Algebra” is a mathematical way to describe operations on relational data
• SQL queries correspond to a sequence of relational algebra operations
– The previous query requires a join operation between person and acts_in
• Query Optimization involves finding a good order in which to carry out operations
• Operator implementation

Managing Physical Data Storage
• RDBMS maintains database (and meta-data) on non-volatile storage (hard disks)
• Physical design impacts RDBMS performance
• Example: The time to answer a query such as “What is the MID of ‘The Big Lebowski’?” can be greatly reduced if an index on the Title column is maintained for the Movie table.

Maintaining Integrity of the Database
• Concurrent users
– Multiple users may attempt to update simultaneously
• Security
– Preventing unauthorized access
• System failures
– If lightning strikes during an update, the DB must be able to be recovered

Summary of Topics
• Conceptual modeling
• Logical Modeling
• Querying the DB
• Building applications
• Implementing Queries
• Managing hardware
• Maintaining Integrity
[Slide annotations group these topics under “how to use DBMS” and “how a DBMS works”]

Control Abstraction
[Figure: User and Application Program above the DBMS; DBMS layers: Query Optimization, Relational Operators, Files and Access Methods, Buffer Management, Disk Space Management; DB below]
• Each layer need not know (or care) how other layers are implemented

Data Abstraction
• Each layer need not know how other layers organize data

Why Use DBMS?
• Program Data Independence
• Controlling redundancy
• Providing backup and recovery
• Efficient query processing
• Others: see Section 1.6

Why not to use a DBMS?
• Consider custom software if DBMS overhead (cost, complexity, performance) is unnecessary
– Example: single user of fixed dataset

Schemas and Instances
• A schema describes a database
– RDBMS typically store schemas in the catalog
• The actual data in the DB at a particular time is the database state
– The current set of all instances in the DB

People who work with DBMSs
• Database Administrator (DBA)
– Maintains databases, DBMS and related software
– [avg salary* $76k]
• Application Programmers
– Software engineers (developers) who build software solutions for end users that access the DBMS
• End Users
– Example: bank teller uses “canned transactions”
• DBMS designers and implementers
– Example: Oracle developers
*source: payscale.com
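
Worked sketch: the movie database in code
The steps from the “Building a DB” and “Querying The Database” slides (define schema, populate, query, index on Title) can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch under stated assumptions, not the course's setup: SQLite stands in for a full RDBMS, the function names and the index name movie_title_idx are made up for illustration, the column types are assumed, and the Director values are one reading of the slide's sample table.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_movie_db() -> sqlite3.Connection:
    """Create an in-memory DB with the slides' MOVIE, PERSON, ACTS_IN tables."""
    conn = sqlite3.connect(":memory:")
    # Define the schema; column types are assumptions, the slides give only names.
    conn.executescript("""
        CREATE TABLE MOVIE   (MID INTEGER PRIMARY KEY, Title TEXT,
                              Director INTEGER, Rating TEXT);
        CREATE TABLE PERSON  (PID INTEGER PRIMARY KEY, Name TEXT, Bday TEXT);
        CREATE TABLE ACTS_IN (MID INTEGER, PID INTEGER, Role TEXT, Rtype TEXT);
        -- Physical design: an access path on Title (hypothetical index name).
        CREATE INDEX movie_title_idx ON MOVIE (Title);
    """)
    # Populate with the sample rows from the "Populate DB" slide.
    # Director values (72, 29) are our reading of the flattened slide table.
    conn.executemany("INSERT INTO MOVIE VALUES (?, ?, ?, ?)",
                     [(1, "The Big Lebowski", 72, "R"),
                      (2, "Star Wars", 29, "PG")])
    conn.executemany("INSERT INTO PERSON VALUES (?, ?, ?)",
                     [(1, "Jeff Daniels", "12/4/49"),
                      (2, "Harrison Ford", "7/13/42")])
    conn.executemany("INSERT INTO ACTS_IN VALUES (?, ?, ?, ?)",
                     [(1, 1, "The Dude", "STAR"),
                      (2, 2, "Han Solo", "CO_STAR")])
    conn.commit()
    return conn


def cast_of(conn: sqlite3.Connection, mid: int) -> List[Tuple[str, str, str]]:
    """The slide's cast query: join PERSON and ACTS_IN on PID for one movie."""
    return conn.execute(
        "SELECT Name, Role, Rtype FROM PERSON, ACTS_IN "
        "WHERE ACTS_IN.MID = ? AND PERSON.PID = ACTS_IN.PID",
        (mid,),
    ).fetchall()


def mid_of_title(conn: sqlite3.Connection, title: str) -> Optional[int]:
    """Look up a movie's MID by title, the query the Title index is meant to help."""
    row = conn.execute("SELECT MID FROM MOVIE WHERE Title = ?", (title,)).fetchone()
    return row[0] if row else None


def catalog_tables(conn: sqlite3.Connection) -> List[str]:
    """List table names by querying SQLite's own catalog table, sqlite_master."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_movie_db()
    print(cast_of(db, 1))
    print(mid_of_title(db, "The Big Lebowski"))
    print(catalog_tables(db))
```

Because SQLite stores its schema in the sqlite_master table, catalog_tables can list the tables with an ordinary query, which is one concrete instance of the “self-describing” property mentioned earlier. Whether the DBMS actually uses the Title index for mid_of_title is decided by its query optimizer, not by the application.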