UMD CMSC 424 - Lecture 11 Normalization - D2043133

Home> Schools> University of Maryland, College Park> Computer Science (CMSC) > CMSC 424> Lecture 11 Normalization

DOC PREVIEW

UMD CMSC 424 - Lecture 11 Normalization

School name University of Maryland, College Park

Course Cmsc 424- Database Design

Pages 23

This preview shows page 1-2-22-23 out of 23 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CMSC 424 – Database designLecture 11NormalizationMihai PopThe Normal Forms•1NF: every attribute has an atomic value (not a set value)•2NF: we will not be concerned in this course•3NF: if for each FD X→Y either–it is trivial or– X is a superkey–Y-X is a proper subset of a candidate key•BCNF: if for each FD X→Y either–it is trivial or–X is a superkey•4NF,…: we are not concerned in this course.BCNF3NF2NF1NF4NF,..Goals•Lossless decomposition•Dependency preservation•Recap: FD closure, attribute closureFDs, Normal forms, etc..., why?•Start with a schema•Decompose relations until in a normal form•Functional dependencies (constraints we'd like preserved) drive the decomposition•The resulting schema is “better”•Note that functional dependencies can either be:–explicit: we want to enforce these constraints irrespective of data in the relations – can be encoded in SQL–implicit: the data happen to satisfy them (see netflix example)Normalization only concerned with explicit FDsPrivacy/anonymization – need to worry about implicit FDsBoyce-Codd Normal FormA relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form where α ⊆ R and β ⊆ R, at least one of the following holds:Example schema not in BCNF:bor_loan = ( customer_id, loan_number, amount )because loan_number → amount holds on bor_loan but loan_number is not a superkey  is trivial (i.e. ⊆ ) is a superkey for R, i.e.  +=RDecomposing a Schema into BCNF●Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF.We decompose R into:•In our example, –α = loan_number–β = amountand bor_loan is replaced by● = ( loan_number, amount )● = ( customer_id, loan_number ) ∪R−−∪R−−Testing for BCNF●To check if a non-trivial dependency α →β causes a violation of BCNF●compute α+ (the attribute closure of α), and ●verify that it includes all attributes of R, that is, it is a superkey of R.●Simplified test: To check if a relation schema R with a given set of functional dependencies F is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.●We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.Testing for BCNF...cont●However, using only F is incorrect when testing a relation in a decomposition of R●E.g. Consider R (A, B, C, D), with F = { A →B, B →C}●Decompose R into R1(A,B) and R2(A,C,D) ●Neither of the dependencies in F contain only attributes from (A,C,D) so we might be mislead into thinking R2 satisfies BCNF. ●In fact, dependency A → C in F+ shows R2 is not in BCNF. ●Simplified test: Avoids computing F+ ●For every subset α of Ri compute α+ under F●Then either α+ includes no attributes of Ri-α or includes all attributes of Ri● In R2(A,C,D) above A+=ABC, A+-(A)=(BC) includes an attribute of Ri but not all (violation)●Then α  (α+ - α)∩ Ri is the violator ABC ∩ (ACD)=C is an FD (actually in F+) which violates BCNFBCNF Decomposition Algorithmresult := {R};done := false;compute F+;while (not done) doif (there is a schema Ri in result that is not in BCNF)then beginlet α → β be a nontrivial functionaldependency that holds on Risuch that α → Ri is not in F+, and α ∩ β = ∅; result := (result – Ri) ∪ (Ri – β) ∪ (α, β ); endelse done := true;Note: each Ri is in BCNF, and decomposition is lossless-join.Example of BCNF Decomposition•R = (branch-name, branch-city, assets,customer-name, loan-number, amount)•F = (branch-name → assets branch-cityloan-number → amount branch-name)Key = {loan-number, customer-name}•Decomposition–R1 = (branch-name, branch-city, assets)–R2 = (branch-name, customer-name, loan_number, amount)–R3 = (branch-name, loan-number, amount)–R4 = (customer-name, loan-number)•Final decomposition R1, R3, R4R=(Bn,Bc,As,Cn,Ln,Am)F={Bn→As Bc, Ln→Am Bn, Ln Cn→Bn Bc As Am}  key1) Bn→As Bc in R Bn+={Bn As Bc}  not SK DecomposeR1 = (Bn,Bc,As) R2 = (Bn,Cn,Ln,Am)2) Ln→Am Bn in R2 Ln+={Ln Am Bn As Bc}  not SK decompose R3=(Ln Am Bn) R4=(Ln Cn)BCNF and Dependency Preservation●Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation●If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving.●Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form, known as third normal form.Third Normal Form•A relation schema R is in third normal form (3NF) if for all:α → β in F+at least one of the following holds:–α → β is trivial (i.e., β ∈ α)–α is a superkey for R–Each attribute A in β – α is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key)•If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).•Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later).3NF (Cont.)•Example–R = (J, K, L)F = {JK → L, L → K}–Two candidate keys: JK and JL–R is in 3NFJK → L JK is a superkeyL → K K is contained in a candidate keyRedundancy in 3NFJj1j2j3nullLl1l1l1l2Kk1k1k1k2•Example of problems due to redundancy in 3NF–R = (J, K, L)F = {JK → L, L → K}A schema in 3NF but not in BCNF has the following problems:● redundancy of information● need to use null values (e.g. to represent relationship l2k2,when there is no corresponding j value)Testing for 3NF•Optimization: Need to check only FDs in F, need not check all FDs in F+.•Use attribute closure to check, for each dependency α → β, if α is a superkey.•If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R–this test is more expensive, since it involve finding ALL candidate keys–testing for 3NF has been shown to be NP-hardCanonical Cover•Sets of functional

View Full Document