CMSC 424 – Database designLecture 11NormalizationMihai PopThe Normal Forms•1NF: every attribute has an atomic value (not a set value)•2NF: we will not be concerned in this course•3NF: if for each FD X→Y either–it is trivial or– X is a superkey–Y-X is a proper subset of a candidate key•BCNF: if for each FD X→Y either–it is trivial or–X is a superkey•4NF,…: we are not concerned in this course.BCNF3NF2NF1NF4NF,..Goals•Lossless decomposition•Dependency preservation•Recap: FD closure, attribute closureFDs, Normal forms, etc..., why?•Start with a schema•Decompose relations until in a normal form•Functional dependencies (constraints we'd like preserved) drive the decomposition•The resulting schema is “better”•Note that functional dependencies can either be:–explicit: we want to enforce these constraints irrespective of data in the relations – can be encoded in SQL–implicit: the data happen to satisfy them (see netflix example)Normalization only concerned with explicit FDsPrivacy/anonymization – need to worry about implicit FDsBoyce-Codd Normal FormA relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form where α ⊆ R and β ⊆ R, at least one of the following holds:Example schema not in BCNF:bor_loan = ( customer_id, loan_number, amount )because loan_number → amount holds on bor_loan but loan_number is not a superkey is trivial (i.e. ⊆ ) is a superkey for R, i.e. +=RDecomposing a Schema into BCNF●Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF.We decompose R into:•In our example, –α = loan_number–β = amountand bor_loan is replaced by● = ( loan_number, amount )● = ( customer_id, loan_number ) ∪R−−∪R−−Testing for BCNF●To check if a non-trivial dependency α →β causes a violation of BCNF●compute α+ (the attribute closure of α), and ●verify that it includes all attributes of R, that is, it is a superkey of R.●Simplified test: To check if a relation schema R with a given set of functional dependencies F is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.●We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.Testing for BCNF...cont●However, using only F is incorrect when testing a relation in a decomposition of R●E.g. Consider R (A, B, C, D), with F = { A →B, B →C}●Decompose R into R1(A,B) and R2(A,C,D) ●Neither of the dependencies in F contain only attributes from (A,C,D) so we might be mislead into thinking R2 satisfies BCNF. ●In fact, dependency A → C in F+ shows R2 is not in BCNF. ●Simplified test: Avoids computing F+ ●For every subset α of Ri compute α+ under F●Then either α+ includes no attributes of Ri-α or includes all attributes of Ri● In R2(A,C,D) above A+=ABC, A+-(A)=(BC) includes an attribute of Ri but not all (violation)●Then α (α+ - α)∩ Ri is the violator ABC ∩ (ACD)=C is an FD (actually in F+) which violates BCNFBCNF Decomposition Algorithmresult := {R};done := false;compute F+;while (not done) doif (there is a schema Ri in result that is not in BCNF)then beginlet α → β be a nontrivial functionaldependency that holds on Risuch that α → Ri is not in F+, and α ∩ β = ∅; result := (result – Ri) ∪ (Ri – β) ∪ (α, β ); endelse done := true;Note: each Ri is in BCNF, and decomposition is lossless-join.Example of BCNF Decomposition•R = (branch-name, branch-city, assets,customer-name, loan-number, amount)•F = (branch-name → assets branch-cityloan-number → amount branch-name)Key = {loan-number, customer-name}•Decomposition–R1 = (branch-name, branch-city, assets)–R2 = (branch-name, customer-name, loan_number, amount)–R3 = (branch-name, loan-number, amount)–R4 = (customer-name, loan-number)•Final decomposition R1, R3, R4R=(Bn,Bc,As,Cn,Ln,Am)F={Bn→As Bc, Ln→Am Bn, Ln Cn→Bn Bc As Am} key1) Bn→As Bc in R Bn+={Bn As Bc} not SK DecomposeR1 = (Bn,Bc,As) R2 = (Bn,Cn,Ln,Am)2) Ln→Am Bn in R2 Ln+={Ln Am Bn As Bc} not SK decompose R3=(Ln Am Bn) R4=(Ln Cn)BCNF and Dependency Preservation●Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation●If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving.●Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form, known as third normal form.Third Normal Form•A relation schema R is in third normal form (3NF) if for all:α → β in F+at least one of the following holds:–α → β is trivial (i.e., β ∈ α)–α is a superkey for R–Each attribute A in β – α is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key)•If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).•Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later).3NF (Cont.)•Example–R = (J, K, L)F = {JK → L, L → K}–Two candidate keys: JK and JL–R is in 3NFJK → L JK is a superkeyL → K K is contained in a candidate keyRedundancy in 3NFJj1j2j3nullLl1l1l1l2Kk1k1k1k2•Example of problems due to redundancy in 3NF–R = (J, K, L)F = {JK → L, L → K}A schema in 3NF but not in BCNF has the following problems:● redundancy of information● need to use null values (e.g. to represent relationship l2k2,when there is no corresponding j value)Testing for 3NF•Optimization: Need to check only FDs in F, need not check all FDs in F+.•Use attribute closure to check, for each dependency α → β, if α is a superkey.•If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R–this test is more expensive, since it involve finding ALL candidate keys–testing for 3NF has been shown to be NP-hardCanonical Cover•Sets of functional
View Full Document