DOC PREVIEW
UMD CMSC 424 - Lecture 11 Normalization

This preview shows page 1-2-22-23 out of 23 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMSC 424 – Database designLecture 11NormalizationMihai PopThe Normal Forms•1NF: every attribute has an atomic value (not a set value)•2NF: we will not be concerned in this course•3NF: if for each FD X→Y either–it is trivial or– X is a superkey–Y-X is a proper subset of a candidate key•BCNF: if for each FD X→Y either–it is trivial or–X is a superkey•4NF,…: we are not concerned in this course.BCNF3NF2NF1NF4NF,..Goals•Lossless decomposition•Dependency preservation•Recap: FD closure, attribute closureFDs, Normal forms, etc..., why?•Start with a schema•Decompose relations until in a normal form•Functional dependencies (constraints we'd like preserved) drive the decomposition•The resulting schema is “better”•Note that functional dependencies can either be:–explicit: we want to enforce these constraints irrespective of data in the relations – can be encoded in SQL–implicit: the data happen to satisfy them (see netflix example)Normalization only concerned with explicit FDsPrivacy/anonymization – need to worry about implicit FDsBoyce-Codd Normal FormA relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form where α ⊆ R and β ⊆ R, at least one of the following holds:Example schema not in BCNF:bor_loan = ( customer_id, loan_number, amount )because loan_number → amount holds on bor_loan but loan_number is not a superkey  is trivial (i.e. ⊆ ) is a superkey for R, i.e.  +=RDecomposing a Schema into BCNF●Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF.We decompose R into:•In our example, –α = loan_number–β = amountand bor_loan is replaced by● = ( loan_number, amount )● = ( customer_id, loan_number ) ∪R−−∪R−−Testing for BCNF●To check if a non-trivial dependency α →β causes a violation of BCNF●compute α+ (the attribute closure of α), and ●verify that it includes all attributes of R, that is, it is a superkey of R.●Simplified test: To check if a relation schema R with a given set of functional dependencies F is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.●We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.Testing for BCNF...cont●However, using only F is incorrect when testing a relation in a decomposition of R●E.g. Consider R (A, B, C, D), with F = { A →B, B →C}●Decompose R into R1(A,B) and R2(A,C,D) ●Neither of the dependencies in F contain only attributes from (A,C,D) so we might be mislead into thinking R2 satisfies BCNF. ●In fact, dependency A → C in F+ shows R2 is not in BCNF. ●Simplified test: Avoids computing F+ ●For every subset α of Ri compute α+ under F●Then either α+ includes no attributes of Ri-α or includes all attributes of Ri● In R2(A,C,D) above A+=ABC, A+-(A)=(BC) includes an attribute of Ri but not all (violation)●Then α  (α+ - α)∩ Ri is the violator ABC ∩ (ACD)=C is an FD (actually in F+) which violates BCNFBCNF Decomposition Algorithmresult := {R};done := false;compute F+;while (not done) doif (there is a schema Ri in result that is not in BCNF)then beginlet α → β be a nontrivial functionaldependency that holds on Risuch that α → Ri is not in F+, and α ∩ β = ∅; result := (result – Ri) ∪ (Ri – β) ∪ (α, β ); endelse done := true;Note: each Ri is in BCNF, and decomposition is lossless-join.Example of BCNF Decomposition•R = (branch-name, branch-city, assets,customer-name, loan-number, amount)•F = (branch-name → assets branch-cityloan-number → amount branch-name)Key = {loan-number, customer-name}•Decomposition–R1 = (branch-name, branch-city, assets)–R2 = (branch-name, customer-name, loan_number, amount)–R3 = (branch-name, loan-number, amount)–R4 = (customer-name, loan-number)•Final decomposition R1, R3, R4R=(Bn,Bc,As,Cn,Ln,Am)F={Bn→As Bc, Ln→Am Bn, Ln Cn→Bn Bc As Am}  key1) Bn→As Bc in R Bn+={Bn As Bc}  not SK DecomposeR1 = (Bn,Bc,As) R2 = (Bn,Cn,Ln,Am)2) Ln→Am Bn in R2 Ln+={Ln Am Bn As Bc}  not SK decompose R3=(Ln Am Bn) R4=(Ln Cn)BCNF and Dependency Preservation●Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation●If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving.●Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form, known as third normal form.Third Normal Form•A relation schema R is in third normal form (3NF) if for all:α → β in F+at least one of the following holds:–α → β is trivial (i.e., β ∈ α)–α is a superkey for R–Each attribute A in β – α is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key)•If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).•Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later).3NF (Cont.)•Example–R = (J, K, L)F = {JK → L, L → K}–Two candidate keys: JK and JL–R is in 3NFJK → L JK is a superkeyL → K K is contained in a candidate keyRedundancy in 3NFJj1j2j3nullLl1l1l1l2Kk1k1k1k2•Example of problems due to redundancy in 3NF–R = (J, K, L)F = {JK → L, L → K}A schema in 3NF but not in BCNF has the following problems:● redundancy of information● need to use null values (e.g. to represent relationship l2k2,when there is no corresponding j value)Testing for 3NF•Optimization: Need to check only FDs in F, need not check all FDs in F+.•Use attribute closure to check, for each dependency α → β, if α is a superkey.•If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R–this test is more expensive, since it involve finding ALL candidate keys–testing for 3NF has been shown to be NP-hardCanonical Cover•Sets of functional


View Full Document

UMD CMSC 424 - Lecture 11 Normalization

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Download Lecture 11 Normalization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture 11 Normalization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 11 Normalization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?