Introduction to Database Systems CSE 444 Lectures 6-7: Database Design Magda Balazinska - CSE 444, Fall 2010 12 Outline • Design theory: 3.1-3.4 – [Old edition: 3.4-3.6] Magda Balazinska - CSE 444, Fall 20103 Schema Refinements = Normal Forms • 1st Normal Form = all tables are flat • 2nd Normal Form = obsolete • Boyce Codd Normal Form = will study • 3rd Normal Form = see book Magda Balazinska - CSE 444, Fall 20104 First Normal Form (1NF) • A database schema is in First Normal Form if all tables are flat Name GPA Courses Alice 3.8 Bob 3.7 Carol 3.9 Math DB OS DB OS Math OS Student Name GPA Alice 3.8 Bob 3.7 Carol 3.9 Student Course Math DB OS Student Course Alice Math Carol Math Alice DB Bob DB Alice OS Carol OS Takes Course May need to add keys5 Relational Schema Design Person buys Product name price name ssn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies Magda Balazinska - CSE 444, Fall 20106 Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want Magda Balazinska - CSE 444, Fall 20107 Relational Schema Design Recall set attributes (persons with several phones): Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield One person may have multiple phones, but lives in only one city Primary key is thus (SSN,PhoneNumber) The above is in 1NF, but was is the problem with this schema? Magda Balazinska - CSE 444, Fall 20108 Relational Schema Design Anomalies: • Redundancy = repeat data • Update anomalies = what if Fred moves to “Bellevue”? • Deletion anomalies = what if Joe deletes his phone number? (what if Joe had only one phone #) Recall set attributes (persons with several phones): Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Magda Balazinska - CSE 444, Fall 20109 Relation Decomposition Break the relation into two: Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 Anomalies have gone: • No more repeated data • Easy to move Fred to “Bellevue” (how ?) • Easy to delete all Joe’s phone numbers (how ?) Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield10 Relational Schema Design (or Logical Design) Main idea: • Start with some relational schema • Find out its functional dependencies – They come from the application domain knowledge! • Use them to design a better relational schema Magda Balazinska - CSE 444, Fall 201011 Functional Dependencies • A form of constraint – Hence, part of the schema • Finding them is part of the database design • Use them to normalize the relations Magda Balazinska - CSE 444, Fall 201012 Functional Dependencies (FDs) Definition: If two tuples agree on the attributes then they must also agree on the attributes Formally: A1, A2, …, An B1, B2, …, Bm A1, A2, …, An B1, B2, …, Bm Magda Balazinska - CSE 444, Fall 201013 When Does an FD Hold Definition: A1, ..., Am B1, ..., Bn holds in R if: ∀t, t’ ∈ R, (t.A1 = t’.A1 ∧ ... ∧ t.Am = t’.Am ⇒ t.B1 = t’.B1 ∧ ... ∧ t.Bn = t’.Bn ) A1 ... Am B1 ... nm if t, t’ agree here then t, t’ agree here t t’ R14 Example EmpID Name, Phone, Position Position Phone but not Phone Position An FD holds, or does not hold on an instance: EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Magda Balazinska - CSE 444, Fall 201015 Example Position Phone EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 ← Salesrep E1111 Smith 9876 ← Salesrep E9999 Mary 1234 Lawyer Magda Balazinska - CSE 444, Fall 201016 Example EmpID Name Phone Position E0045 Smith 1234 → Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 → Lawyer But not Phone Position Magda Balazinska - CSE 444, Fall 201017 Example FD’s are constraints: • On some instances they hold • On others they don’t name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 99 Does this instance satisfy all the FDs ? name color category department color, category price18 Example name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Black Toys 99 Gizmo Stationary Green Office-supp. 59 What about this one ? name color category department color, category price19 An Interesting Observation If all these FDs are true: name color category department color, category price Then this FD also holds: name, category price Why ?? Magda Balazinska - CSE 444, Fall 201020 Goal: Find ALL Functional Dependencies • Anomalies occur when certain “bad” FDs hold • We know some of the FDs • Need to find all FDs • Then look for the bad ones Magda Balazinska - CSE 444, Fall 201021 Armstrong’s Rules (1/3) Is equivalent to Splitting rule and Combing rule A1 ... Am B1 ... Bm A1, A2, …, An B1, B2, …, Bm A1, A2, …, An B1 A1, A2, …, An B2 . . . . . A1, A2, …, An Bm Magda Balazinska - CSE 444, Fall 201022 Armstrong’s Rules (2/3) Trivial Rule Why ? A1 … Am where i = 1, 2, ..., n A1, A2, …, An Ai Magda Balazinska - CSE 444, Fall 201023 Armstrong’s Rules (3/3) Transitive Rule If and then Why ? A1, A2, …, An B1, B2, …, Bm B1, B2, …, Bm C1, C2, …, Cp A1, A2, …, An C1, C2, …, Cp Magda Balazinska - CSE 444, Fall 201024 A1 … Am B1 … Bm C1 ... Cp Magda Balazinska - CSE 444, Fall 2010 Armstrong’s Rules (3/3) Illustration25 Example (continued) Start from the following FDs: Infer the following FDs: 1. name color 2. category department 3. color, category price Inferred FD Which Rule did we apply ? 4. name, category name 5. name, category color 6. name, category
View Full Document