1FunctionalDependenciesR&G Chapter 19Science is the knowledge ofconsequences, and dependenceof one fact upon another.Thomas Hobbes(1588-1679)Review: Database Design• Requirements Analysis– user needs; what must database do?• Conceptual Design– high level descr (often done w/ER model)• Logical Design– translate ER into DBMS data model• Schema Refinement– consistency,normalization• Physical Design - indexes, disk layout• Security Design - who accesses whatThe Evils of Redundancy•Redundancy: root of several problems with relationalschemas:– redundant storage, insert/delete/update anomalies•Functional dependencies:– a form of integrity constraint that can identify schemas withsuch problems and suggest refinements.• Main refinement technique: decomposition– replacing ABCD with, say, AB and BCD, or ACD and ABD.• Decomposition should be used judiciously:– Is there reason to decompose a relation?– What problems (if any) does the decomposition cause?Functional Dependencies (FDs)• A functional dependency X → Y holds over relationschema R if, for every allowable instance r of R: t1 ∈ r, t2 ∈ r, πX (t1) = πX (t2) implies πY (t1) = πY (t2)(where t1 and t2 are tuples;X and Y are sets of attributes)• In other words: X → Y means Given any two tuples in r, if the X values are the same,then the Y values must also be the same. (but not viceversa)• Read “→” as “determines”FD’s Continued• An FD is a statement about all allowablerelations.– Must be identified based on semantics ofapplication.– Given some instance r1 of R,we can check if r1 violates some FD f, butwe cannot determine if f holds over R.• Question: How related to keys?• if “K → all attributes of R” then K is asuperkey for R(does not require K to be minimal.)• FDs are a generalization of keys.Example: Constraints on Entity Set• Consider relation obtained from Hourly_Emps: Hourly_Emps (ssn, name, lot, rating, wage_per_hr, hrs_per_wk)• We sometimes denote a relation schema by listing theattributes: e.g., SNLRWH• This is really the set of attributes {S,N,L,R,W,H}.• Sometimes, we refer to the set of all attributes of a relation byusing the relation name. e.g., “Hourly_Emps” for SNLRWHWhat are some FDs on Hourly_Emps?ssn is the key: S → SNLRWHrating determines wage_per_hr: R → Wlot determines lot: L → L (“trivial” dependnency)2Problems Due to R → W•Update anomaly: Should we be allowed to modify Win only the 1st tuple of SNLRWH?•Insertion anomaly: What if we want to insert anemployee and don’t know the hourly wage for his orher rating? (or we get it wrong?)•Deletion anomaly: If we delete all employees withrating 5, we lose the information about the wage forrating 5!S N L R W H123-22-3666 Attishoo 48 8 10 40231-31-5368 Smiley 22 8 10 30131-24-3650 Smethurst 35 5 7 30434-26-3751 Guldu 35 5 7 32612-67-4134 Madayan 35 8 10 40Hourly_EmpsDetecting ReduncancyS N L R W H123-22-3666 Attishoo 48 8 10 40231-31-5368 Smiley 22 8 10 30131-24-3650 Smethurst 35 5 7 30434-26-3751 Guldu 35 5 7 32612-67-4134 Madayan 35 8 10 40Hourly_EmpsQ: Why was R → W problematic, but S →W not? Decomposing a Relation• Redundancy can be removed by “chopping”the relation into pieces (vertically!)• FD’s are used to drive this process.R → W is causing the problems, so decomposeSNLRWH into what relations?S N L R H123-22-3666 Attishoo 48 8 40231-31-5368 Smiley 22 8 30131-24-3650 Smethurst 35 5 30434-26-3751 Guldu 35 5 32612-67-4134 Madayan 35 8 40R W8 105 7Hourly_Emps2WagesRefining an ER Diagram• 1st diagram becomes:Workers(S,N,L,D,Si)Departments(D,M,B)– Lots associated withworkers.• Suppose all workers ina dept are assigned thesame lot: D → L• Redundancy; fixed by:Workers2(S,N,D,Si)Dept_Lots(D,L)Departments(D,M,B)• Can fine-tune this:Workers2(S,N,D,Si)Departments(D,M,B,L)lotdnamebudgetdidsincenameWorks_InDepartmentsEmployeesssnlotdnamebudgetdidsincenameWorks_InDepartmentsEmployeesssnBefore:After:Reasoning About FDs• Given some FDs, we can usually infer additional FDs:title → studio, star implies title → studio and title → startitle → studio and title → star implies title → studio, startitle → studio, studio → star implies title → starBut, title, star → studio does NOT necessarily imply thattitle → studio or that star → studio• An FD f is implied by a set of FDs F if f holdswhenever all FDs in F hold.• F+ = closure of F is the set of all FDs that are impliedby F. (includes “trivial dependencies”)Rules of Inference• Armstrong’s Axioms (X, Y, Z are sets of attributes):–Reflexivity: If X ⊇ Y, then X → Y–Augmentation: If X → Y, then XZ → YZ for any Z–Transitivity: If X → Y and Y → Z, then X → Z• These are sound and complete inference rules for FDs!– i.e., using AA you can compute all the FDs in F+ and onlythese FDs.• Some additional rules (that follow from AA):–Union: If X → Y and X → Z, then X → YZ–Decomposition: If X → YZ, then X → Y and X → Z3Example• Contracts(cid,sid,jid,did,pid,qty,value), and:– C is the key: C → CSJDPQV– Proj purchases each part using single contract: JP → C– Dept purchases at most 1 part from a supplier: SD → P• Problem: Prove that SDJ is a key for Contracts• JP → C, C → CSJDPQV imply JP → CSJDPQV(by transitivity) (shows that JP is a key)• SD → P implies SDJ → JP (by augmentation)• SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV (by transitivity) thus SDJ is a key.Q: can you now infer that SD → CSDPQV (i.e., dropJ on both sides)?No! FD inference is not like arithmetic multiplication.Attribute Closure• Computing the closure of a set of FDs can be expensive. (Size ofclosure is exponential in # attrs!)• Typically, we just want to check if a given FD X → Y is in theclosure of a set of FDs F. An efficient check:– Compute attribute closure of X (denoted X+) wrt F.X+ = Set of all attributes A such that X → A is in F+• X+ := X• Repeat until no change: if there is an fd U → V in F such that U is in X+,then add V to X+– Check if Y is in X+– Approach can also be used to find the keys of a relation.• If all attributes of R are in the closure of X then X is a superkey forR.• Q: How to check if X is a “candidate key”?Attribute Closure (example)• R = {A, B, C, D, E}• F = { B →CD, D → E, B → A,
View Full Document