UI STAT 5400 - Computing in Statistics

Unformatted text preview:

122S:166Computing in StatisticsIntro to relational database conceptsLecture 16Oct. 10, 2011Kate Cowles374 SH, [email protected] to relational database con-cepts• database: a system for storing data• relational database model has become thede-facto standard for the design of databasesboth large and small2• storage of data for use in statistical analysisideally should follow this model• today’s lecture deals with two related topics– efficient storage of data (applies to settingup datafiles for use by S AS or any otheranalysis system)– some aspects of relational database soft-ware (such as Microsoft Access)Material drawn in part fromwww.citilink.com/~jgarrick/vbasic/database/rdbms.html and http://www.citilink.com/~jgarrick/vbasic/database/fundamentals.html3What is a relational database?• relational database stores all its data in “ta-bles”• table is a set of rows and columns– set has no predefined sort order f o r its el-ements– “record” is database terminology for a rowor observation– “field” or “attribute” is database termi-nology for a column or variable4Basic concepts• Primary and Foreign Keys• Queries• Referential Integrity• Normalization5Flat files (how not to store complexdata)• simplest model for a database• a single table which inclu des fields for eachelement you need to store• you have probably worked with flat file databases,at least in the form of spreadsheets• waste storage space and are problematic tomaintain6Example: customer order entry system• You’re managing the data for a company witha number of customers, each of which will beplacing multiple orders.• Each order can have one or more items7Data that we wish to record for eachcomponent of the application• Customers– Customer Number– Company Name– Address– City, State, ZIP Code– Phone Number• Orders– Order Number– Order Date– PO Number• Order Line Items– Item Number– Description– Quantity– Price8Problems with a flat file for represent-ing this data• Each ti me an order is placed, you’ll need torepeat the customer i nformation, inclu dingthe Customer Number, Company Name, etc.• What’s worse is that for each item, you notonly need to repeat the order informatio nsuch as the Order Number and Order Date,but you a lso need to continue repeating thecustomer information as well.• Let’s say there’s one customer who has placedtwo orders, each with four line items. Tomaintain this tiny amount of i nformation,you need to enter the Customer Number andCompany Name eight times.• What if the company should send you a changeof address?9• unacceptable aspects of flat file storage– effort required to maintain the data– likelihood of data entry errors causin g in -consistency in customer address betweenrecords10Solution: use a relational model for thedata• each order entered is related to a customerrecord• each line item is related to an order record• relational database manag ement system (RDBMS)is a piece of software that manages groups ofrecords which are related to one another11Break flat file into three tables• Customers– CustID– CustName– CustAddress– CustCity– CustState– CustZIP– CustPhone• Orders– OrdID– OrdCustID (new field)– OrdDate– OrdPONumber12• OrderDetails– ODID– ODOrdID (new field)– ODDescription– ODQty– ODPrice13Keys• key: a fiel d that can be used to identify arecord• key fields may contain– data element you are storing or deri vedfrom that data– an arbitrary value• example: for the Customers table– could use the company name as a key, bu tif you ever had two companies with thesame name, your system would be broken– alternatively could use some derivation ofthe company name in an eff o rt to preserveenough of th e name to make it easy forusers to derive the name b a sed on the key,but that often breaks down when the ta-bles become large– may be easiest to simply use an arbitrarywhole number14Primary and Foreign Keys• primary key: a field that uniquely identifiesa record in a table– No two records can have the same valuefor a primary key.– Each value in a primary key will identifyone and only one record.• foreign key: represents the value of primarykey for a related table– foreign keys are the cornerstone of rela-tional databa ses– example: in Orders table, OrdCustID fieldwould hold the value of the CustID fieldfor the customer who placed the order.∗ By doing this, we can attach the i nfor-mation for the customer record to theorder by storing only the one value.15Queries• so far we’ve– broken down our order entry system intothree tabl es– added foreign keys to the Orders and Or-derDetails tabl es• Now, rather than repeating the Customerstable data for each Orders table record, wesimply record a customer number in the Or-dCustID field.• By doing this, we can change the informationin the Customers table record and have thatchange be reflected in every order placed bythe customer.• This is acc o mplished by using queries to re-assemble the data.• query: a view of data which represents thedata from one or more tables16Reassembling the data for analysis andpresentation• human users of the system wi ll only be a bleto view data in two dimensions– become rows and columns in a table eith eron the screen or on paper• to see orders pla c ed by our customers– link the Customers and Orders tables us-ing the CustID field from Customers andthe OrdCustID field from Orders∗ value of the Ord C ustID field representsa related record in the Customers tab leand is equal to the CustID value fromthat record– by joining together the two tabl es basedon this relationship, we can add fields fromboth tables and see all ord ers along withany pertinent customer data17Referential Integrity• purpose is to maintain validity of data• example: what would happen if you neededto delete a customer?– if the cu stomer has orders, the orders willbe orpha ned– must have a means in place to enforce thatfor each order, there i s a correspondingcustomer• two ways th at that database managementsystem can enfore “referential integrity”– by cascading deletions through the relatedtables– by preventing d eletions when related reco rd sexist18Normalization• essentially the p rocess of distilling the struc-ture of the database to remove repeating groupsof data into separate tab les.• example: we have


View Full Document

UI STAT 5400 - Computing in Statistics

Documents in this Course
Load more
Download Computing in Statistics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Computing in Statistics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Computing in Statistics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?