Berkeley COMPSCI 186 - Schema Refinement and Normalization - D2033374

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 186> Schema Refinement and Normalization

DOC PREVIEW

Berkeley COMPSCI 186 - Schema Refinement and Normalization

School name University of California, Berkeley

Course Compsci 186- Introduction to Database Systems

Pages 5

This preview shows page 1-2 out of 5 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1Schema Refinementand NormalizationNobody realizes that some peopleexpend tremendous energymerely to be normal. Albert CamusFunctional Dependencies (FDs) (Review)• A functional dependency X → Y holds over relationschema R if, for every allowable instance r of R: t1 ∈ r, t2 ∈ r, πX (t1) = πX (t2) implies πY (t1) = πY (t2)(where t1 and t2 are tuples;X and Y are sets of attributes)• In other words: X → Y means Given any two tuples in r, if the X values are the same,then the Y values must also be the same. (but not viceversa)• Read “→” as “determines”Reasoning About FDs (Review)• Given some FDs, we can usually infer additional FDs:title → studio, star implies title → studio and title → startitle → studio and title → star implies title → studio, startitle → studio, studio → star implies title → starBut, title, star → studio does NOT necessarily imply thattitle → studio or that star → studio• An FD f is implied by a set of FDs F if f holdswhenever all FDs in F hold.• F+ = closure of F is the set of all FDs that are impliedby F. (includes “trivial dependencies”)Rules of Inference (Review)• Armstrong’s Axioms (X, Y, Z are sets of attributes):–Reflexivity: If X ⊇ Y, then X → Y–Augmentation: If X → Y, then XZ → YZ for any Z–Transitivity: If X → Y and Y → Z, then X → Z• These are sound and complete inference rules for FDs!– i.e., using AA you can compute all the FDs in F+ and onlythese FDs.• Some additional rules (that follow from AA):–Union: If X → Y and X → Z, then X → YZ–Decomposition: If X → YZ, then X → Y and X → ZAttribute Closure• Computing the closure of a set of FDs can be expensive. (Size ofclosure is exponential in # attrs!)• Typically, we just want to check if a given FD X → Y is in theclosure of a set of FDs F. An efficient check:– Compute attribute closure of X (denoted X+) wrt F.X+ = Set of all attributes A such that X → A is in F+• X+ := X• Repeat until no change: if there is an fd U → V in F such that U is in X+,then add V to X+– Check if Y is in X+– Approach can also be used to find the keys of a relation.• If all attributes of R are in the closure of X then X is a superkey forR.• Q: How to check if X is a “candidate key”?Normal Forms• Back to schema refinement…• Q1: is any refinement needed??!• If a relation is in a normal form (BCNF, 3NF etc.):– we know that certain problems are avoided/minimized.– helps decide whether decomposing a relation is useful.• Role of FDs in detecting redundancy:– Consider a relation R with 3 attributes, ABC.• No (non-trivial) FDs hold: There is no redundancy here.• Given A → B: If A is not a key, then several tuples could have thesame A value, and if so, they’ll all have the same B value!• 1st Normal Form – all attributes are atomic– i.e. the relational model• 1st ⊃2nd (of historical interest) ⊃ 3rd ⊃ Boyce-Codd ⊃ …2Boyce-Codd Normal Form (BCNF)• Reln R with FDs F is in BCNF if, for all X → A in F+– A ∈ X (called a trivial FD), or– X is a superkey for R.• In other words: “R is in BCNF if the only non-trivial FDsover R are key constraints.”• If R in BCNF, then every field of every tuple recordsinformation that cannot be inferred using FDs alone.– Say we know FD X → A holds this example relation: • Can you guess the value of themissing attribute?•Yes, so relation is not in BCNF?y2xAy1xAYXDecomposition of a Relation Schema• If a relation is not in a desired normal form, it can bedecomposed into multiple relations that each are in thatnormal form.• Suppose that relation R contains attributes A1 ... An. Adecomposition of R consists of replacing R by two or morerelations such that:– Each new relation scheme contains a subset of theattributes of R, and– Every attribute of R appears as an attribute of at leastone of the new relations. Example (same as before)• SNLRWH has FDs S → SNLRWH and R → W• Q: Is this relation in BCNF?Hourly_EmpsNo, The second FD causes a violation;W values repeatedly associated with R values.Decomposing a Relation• Easiest fix is to create a relation RW to store theseassociations, and to remove W from the mainschema:•Decompositions should be used only when needed.–Q: potential problems of decomposition?•Q: Are both of these relations are now in BCNF?Problems with Decompositions• There are three potential problems to consider:1) May be impossible to reconstruct the original relation! (LossyDecomposition)• Fortunately, not in the SNLRWH example.2) Dependency checking may require joins (not Dependency Preserving)• Fortunately, not in the SNLRWH example.3) Some queries become more expensive.• e.g., How much does Guldu earn?Tradeoff: Must consider these issues vs. redundancy.(Well, not usually #1)Lossless Decomposition (example)=><3Lossy Decomposition (example)Lossless Join Decompositions• Decomposition of R into X and Y is lossless w.r.t. a setof FDs F if, for every instance r that satisfies F: (r) (r) = r• It is always true that r (r) (r)– In general, the other direction does not hold! If it does,the decomposition is lossless-join.• Definition extended to decomposition into 3 or morerelations in a straightforward way.•It is essential that all decompositions used to deal withredundancy be lossless! (Avoids Problem #1)!X!Y!!X!Y><><More on Lossless Decomposition• The decomposition of R into X and Y islossless with respect to F if and only if theclosure of F contains:X ∩ Y → X, orX ∩ Y → Yin example: decomposing ABC into AB and BC islossy, because intersection (i.e., “B”) is not a keyof either resulting relation.• Useful result: If W → Z holds over R and W ∩ Z isempty, then decomposition of R into R-Z and WZ isloss-less.i.e. the common attributesform a superkey for oneside or the otherLossless Decomposition (example)But, now we can’t check A → B without doing a join!Dependency Preserving Decomposition• Dependency preserving decomposition (Intuitive):– If R is decomposed into X, Y and Z, and weenforce the FDs that hold individually on X, on Yand on Z, then all FDs that were given to holdon R must also hold. (Avoids Problem #2 onour list.)• Why do we care??•Projection of set of FDs F : If R is decomposed intoX

View Full Document