1Relational Database Design TheoryPart IICPS 116Introduction to Database Systems2Announcements (October 12) Midterm graded; sample solution available Homework #2 graded; sample solution already distributed Please verify all your grades on Blackboard Project milestone #1 due today3Review Functional dependencies X → Y: If two rows agree on X, they must agree on Y)A generalization of the key concept Non-key functional dependencies: a source of redundancyNontrivialX→YwhereXis not a superkeyNon-trivial X→Ywhere Xis not a superkey)Called a BCNF violation BCNF decomposition: a method for removing redundancies Given R(X, Y, Z) and a BCNF violation X → Y, decompose R into R1(X, Y) and R2(X, Z))A lossless join decomposition Schema in BCNF has no redundancy due to FD’s24Next 3NF (BCNF is too much) Multivalued dependencies: another source of redundancy 4NF (BCNF is not enough)(g)5Motivation for 3NF Address (street_address, city, state, zip) street_address, city, state → zip zip → city, state Keys BCNF?6To decompose or not to decomposeAddress1(zip, city, state)Address2(street_address, zip) FD’s in Address1FD’sinAddress2FD s in Address2 Hey, where is street_address, city, state → zip? Cannot check without joining Address1and Address2back together Problem: Some lossless join decomposition is not dependency-preserving Dilemma: Should we get rid of redundancy at the expense of making constraints harder to enforce?373NF R is in Third Normal Form (3NF) if for every non-trivial FD X → A (where A is single attribute), either X is a superkey of R, or A is a member of at least one key of R)Intuitively, BCNF decomposition on X → Awould “break” the uvy,CNdopo oowou d bkey containing A So Address is already in 3NF Tradeoff: Can enforce all original FD’s on individual decomposed relations Might have some redundancy due to FD’s8BCNF = no redundancy? Student (SID, CID, club) Suppose your classes have nothing to do with the clubs you join FD’s?SID CID club142 CPS116 ballet BCNF? Redundancies?142 CPS116 sumo142 CPS114 ballet142 CPS114 sumo123 CPS116 chess123 CPS116 golf... ... ...9Multivalued dependencies A multivalued dependency (MVD) has the formX ) Y, where X and Y are sets of attributes in a relation R X ) Y means that whenever two rows in R agree on XY Zab1 c1ab2 c2…… …all the attributes of X, then we can swap their Ycomponents and get two new rows that are also in RMust be in R tooXY Zab1 c1ab2 c2ab1 c2ab2 c1…… …410MVD examplesStudent (SID, CID, club)11Complete MVD + FD rules FD reflexivity, augmentation, and transitivity MVD complementation:If X ) Y, then X ) attrs(R) – X – Y MVD augmentation:If X ) Y and V ⊆W, then XW) YV MVD transitivity:If X ) Y and Y ) Z, then X ) Z – Y Replication (FD is MVD):If X → Y, then X ) Y Coalescence:If X ) Y and Z ⊆ Y and there is some W disjoint from Ysuch that W → Z, then X → ZTry proving things using these!12An elegant solution: chase Given a set of FD’s and MVD’s D, does another dependency d (FD or MVD) follow from D? Procedure Start with the hypothesis of d, and treat them as “seed” tuples in a relationtuples in a relation Apply the given dependencies in D repeatedly• If we apply an FD, we infer equality of two symbols• If we apply an MVD, we infer more tuples If we infer the conclusion of d, we have a proof Otherwise, if nothing more can be inferred, we have a counterexample513Proof by chase In R(A, B, C, D), does A ) B and B ) C imply that A ) C?HaveNeedABCD ABCDA ) BB ) C$ab1c1 d1ab2 c2 d2ab2 c1 d1ab1 c2 d2ab2 c1 d2ab2 c2 d1ab1c2 d1ab2 c1 d214Another proof by chase In R(A, B, C, D), does A → B and B → C imply that A → C?HaveNeedc1= c2$ABCDA → Bb1= b2B → Cc1= c2In general, both new tuples and new equalitiesmay be generatedab1 c1 d1ab2 c2 d215Counterexample by chase In R(A, B, C, D), does A ) BC and CD → B imply that A → B?HaveNeedb1= b2'ABCDA ) BCCounterexample!ab1 c1 d1ab2 c2 d2ab2 c2 d1ab1 c1 d26164NF A relation R is in Fourth Normal Form (4NF) if For every non-trivial MVD X ) Y in R, X is a superkey That is, all FD’s and MVD’s follow from “key → other attributes” (i.e., no MVD’s and no FD’s besides key filddi)functional dependencies) 4NF is stronger than BCNF Because every FD is also a MVD174NF decomposition algorithm Find a 4NF violation A non-trivial MVD X ) Y in R where X is not a superkey Decompose R into R1and R2, where R1has attributes X ∪ YRhas attributesX∪Z(Zcontains attributes not inXorY)R2has attributes X∪Z(Zcontains attributes not in Xor Y) Repeat until all relations are in 4NF Almost identical to BCNF decomposition algorithm Any decomposition on a 4NF violation is lossless184NF decomposition exampleStudent (SID, CID, club)4NF violation: SID ) CIDSID CID club142 CPS116 ballet142 CPS116 sumo142 CPS114 ballet142 CPS114 sumo123 CPS116 chess123 CPS116 golf... ... ...Enroll (SID, CID) Join (SID, club)4NF 4NFSID CID142 CPS116142 CPS114123 CPS116... ...SID club142 ballet142 sumo123 chess123 golf... ...7193NF, BCNF, 4NF, and beyondOf hi i l iAnomaly/normal form 3NF BCNF 4NFLose FD’s? No Possible PossibleRedundancy due to FD’s Possible No NoRedundancy due to MVD’s Possible Possible NoOf historical interests 1NF: All column values must be atomic 2NF: Slightly more relaxed than 3NF20Summary Philosophy behind BCNF, 4NF:Data should depend on the key, the whole key, and nothing but the key! Philosophy behind 3NF: … But not at the expense of more expensive constraint
View Full Document