CMSC 424 Database design Lecture 11 Normalization Mihai Pop The Normal Forms 1NF every attribute has an atomic value not a set value 2NF we will not be concerned in this course 3NF if for each FD X Y either it is trivial or X is a superkey Y X is a proper subset of a candidate key BCNF if for each FD X Y either it is trivial or X is a superkey 1NF 2NF 3NF BCNF 4NF we are not concerned in this course 4NF Goals Lossless decomposition Dependency preservation Recap FD closure attribute closure FDs Normal forms etc why Start with a schema Decompose relations until in a normal form Functional dependencies constraints we d like preserved drive the decomposition The resulting schema is better Note that functional dependencies can either be explicit we want to enforce these constraints irrespective of data in the relations can be encoded in SQL implicit the data happen to satisfy them see netflix example Normalization only concerned with explicit FDs Privacy anonymization need to worry about implicit FDs Boyce Codd Normal Form A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F of the form where R and R at least one of the following holds is trivial i e is a superkey for R i e R Example schema not in BCNF bor loan customer id loan number amount because loan number amount holds on bor loan but loan number is not a superkey Decomposing a Schema into BCNF Suppose we have a schema R and a non trivial dependency causes a violation of BCNF We decompose R into R In our example loan number amount and bor loan is replaced by loan number amount R customer id loan number Testing for BCNF To check if a non trivial dependency causes a violation of BCNF compute the attribute closure of and verify that it includes all attributes of R that is it is a superkey of R Simplified test To check if a relation schema R with a given set of functional dependencies F is in BCNF it suffices to check only the dependencies in the given set F for violation of BCNF rather than checking all dependencies in F We can show that if none of the dependencies in F causes a violation of BCNF then none of the dependencies in F will cause a violation of BCNF either Testing for BCNF cont However using only F is incorrect when testing a relation in a decomposition of R E g Consider R A B C D with F A B B C Decompose R into R A B and R A C D 1 2 Neither of the dependencies in F contain only attributes from A C D so we might be mislead into thinking R2 satisfies BCNF In fact dependency A C in F shows R is not in BCNF 2 Simplified test Avoids computing F For every subset of R compute under F i Then either includes no attributes of R or includes all attributes i of Ri In R A C D above A ABC A A BC includes an attribute 2 of Ri but not all violation Then R is the violator A BC ACD C is an FD i actually in F which violates BCNF BCNF Decomposition Algorithm result R done false compute F while not done do if there is a schema R in result that is not in BCNF then begin let be a nontrivial functional dependency that holds on Ri such that R is not in F and result result Ri Ri end else done true i i Note each Ri is in BCNF and decomposition is lossless join Example of BCNF Decomposition R branch name branch city assets customer name loan number amount R Bn Bc As Cn Ln Am F Bn As Bc Ln Am Bn F branch name assets branch city loan number amount branch name Key loan number customer name Decomposition R1 branch name branch city assets R2 branch name customer name loan number amount R3 branch name loan number amount R4 customer name loan number Final decomposition R1 R3 R4 Ln Cn Bn Bc As Am 1 Bn As Bc in R Bn Bn As Bc key not SK Decompose R1 Bn Bc As R2 Bn Cn Ln Am 2 Ln Am Bn in R2 Ln Ln Am Bn As Bc decompose R3 Ln Am Bn R4 Ln Cn not SK BCNF and Dependency Preservation Constraints including functional dependencies are costly to check in practice unless they pertain to only one relation If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold then that decomposition is dependency preserving Because it is not always possible to achieve both BCNF and dependency preservation we consider a weaker normal form known as third normal form Third Normal Form A relation schema R is in third normal form 3NF if for all in F at least one of the following holds is trivial i e is a superkey for R Each attribute A in is contained in a candidate key for R NOTE each attribute may be in a different candidate key If a relation is in BCNF it is in 3NF since in BCNF one of the first two conditions above must hold Third condition is a minimal relaxation of BCNF to ensure dependency preservation will see why later 3NF Cont Example R J K L F JK L L K Two candidate keys JK and JL R is in 3NF JK L L K JK is a superkey K is contained in a candidate key Redundancy in 3NF Example of problems due to redundancy in 3NF R J K L F JK L L K J L K j1 l1 k1 j2 l1 k1 j3 l1 k1 null l2 k2 A schema in 3NF but not in BCNF has the following problems redundancy of information need to use null values e g to represent relationship l k 2 2 when there is no corresponding j value Testing for 3NF Optimization Need to check only FDs in F need not check all FDs in F Use attribute closure to check for each dependency if is a superkey If is not a superkey we have to verify if each attribute in is contained in a candidate key of R this test is more expensive since it involve finding ALL candidate keys testing for 3NF has been shown to be NP hard Canonical Cover Sets of functional dependencies may have redundant dependencies that can be inferred from the others For example A C is redundant in A B B C Parts of a functional dependency may be redundant E g on RHS A B B C A CD can be simplified to A B B C A D E g on LHS A B B C AC D can be simplified to A B B C A D Intuitively a canonical cover of F is a minimal set of functional dependencies equivalent to …
View Full Document
Unlocking...