CMSC424 Normalization Instructor Amol Deshpande amol cs umd edu Databases Data Models Conceptual representation of the data Data Retrieval How to ask questions of the database How to answer those questions Data Storage How where to store data how to access it Data Integrity Manage crashes concurrency Manage semantic inconsistencies Relational Database Design Where did we come up with the schema that we used E g why not store the actor names with movies If from an E R diagram then Did we make the right decisions with the E R diagram Goals Formal definition of what it means to be a good schema How to achieve it Movies Database Schema Movie title year length inColor studioName producerC StarsIn movieTitle movieYear starName MovieStar name address gender birthdate MovieExec name address cert netWorth Studio name address presC Changed to Movie title year length inColor studioName producerC starName StarsIn merged into above MovieStar name address gender birthdate MovieExec name address cert netWorth Studio name address presC Is Is this this a a good good schema schema Movie title year length inColor studioName producerC starName Title Year Length inColor StudioName prodC StarName Star wars 1977 121 Yes Fox 128 Hamill Star wars 1977 121 Yes Fox 128 Fisher Star wars 1977 121 Yes Fox 128 H Ford King Kong 2005 187 Yes Universal 150 Watts King Kong 1933 100 no RKO 20 Fay Issues 1 Redundancy higher storage inconsistencies anomalies update anomalies insertion anamolies 2 Need nulls Unable to represent some information without using nulls How to store movies w o actors pre productions etc Movie title year length inColor studioName producerC starNames Title Year Length inColor StudioName prodC StarNames Star wars 1977 121 Yes Fox 128 Hamill Fisher H ford King Kong 2005 187 Yes Universal 150 Watts King Kong 1933 100 no RKO 20 Fay Issues 3 Avoid sets Hard to represent Hard to query Smaller Smaller schemas schemas always always good good Split Studio name address presC into Studio1 name presC Studio2 name address Name presC Name Address Fox 101 Fox Address1 Studio2 101 Studio2 Address1 Universial 102 Universial Address2 This process is also called decomposition Issues 4 Requires more joins w o any obvious benefits 5 Hard to check for some dependencies What if the address is actually the presC s address No easy way to ensure that constraint w o a join Smaller Smaller schemas schemas always always good good Decompose StarsIn movieTitle movieYear starName into StarsIn1 movieTitle movieYear StarsIn2 movieTitle starName movieTitle movieYear movieTitle starName Star wars 1977 Star Wars Hamill King Kong 1933 King Kong Watts King Kong 2005 King Kong Faye Issues 6 joining them back results in more tuples than what we started with King Kong 1933 Watts King Kong 2005 Faye This is a lossy decomposition We lost some constraints information The previous example was a lossless decomposition Desiredata No sets Correct and faithful to the original design Avoid lossy decompositions As little redundancy as possible To avoid potential anomalies No inability to represent information Nulls shouldn t be required to store information Dependency preservation Should be possible to check for constraints Not Not always always possible possible We We sometimes sometimes relax relax these these for for simpler simpler schemas schemas and and fewer fewer joins joins during during queries queries Approach 1 We will encode and list all our knowledge about the schema Functional dependencies FDs SSN name means SSN implies length If two tuples have the same SSN they must have the same name movietitle length Not true But movietitle movieYear length True 2 We will define a set of rules that the schema must follow to be considered good Normal forms 1NF 2NF 3NF BCNF 4NF A normal form specifies constraints on the schemas and FDs 3 If not in a normal form we modify the schema Functional Dependencies Let R be a relation schema and R and R The functional dependency holds on R iff for any legal relations r R whenever two tuples t1 and t2 of r have same values for they have same values for t1 t2 t1 t2 Example A 1 1 3 B 4 5 7 On this instance A B does NOT hold but B A does hold Functional Dependencies Difference between holding on an instance and holding on all legal relation Title Year Length inColor StudioName prodC StarName Star wars 1977 121 Yes Fox 128 Hamill Star wars 1977 121 Yes Fox 128 Fisher Star wars 1977 121 Yes Fox 128 H Ford King Kong 1933 100 no RKO 20 Fay Title Year holds on this instance Is this a true functional dependency No Two movies in different years can have the same name Can t draw conclusions based on a single instance Need to use domain knowledge to decide which FDs hold Functional Dependencies Functional dependencies and keys A key constraint is a specific form of a FD E g if A is a superkey for R then A R Similarly for candidate keys and primary keys Deriving FDs A set of FDs may imply other FDs e g If A B and B C then clearly A C We will see a formal method for inferring this later Definitions 1 A relation instance r satisfies a set of functional dependencies F if the FDs hold on the relation 2 F holds on a relation schema R if no legal allowable relation instance of R violates it 3 A functional dependency A B is called trivial if B is a subset of A e g Movieyear length length 4 Given a set of functional dependencies F its closure F is all the FDs that are implied by FDs in F Approach 1 We will encode and list all our knowledge about the schema Functional dependencies FDs Also Multi valued dependencies briefly discuss later Join dependencies etc 2 We will define a set of rules that the schema must follow to be considered good Normal forms 1NF 2NF 3NF BCNF 4NF A normal form specifies constraints on the schemas and FDs 3 If not in a normal form we modify the schema BCNF Boyce Codd Normal Form A relation schema R is in BCNF if Every functional dependency A B that holds on it is EITHER 1 Trivial OR 2 A is a superkey of R Why is BCNF good Guarantees that there can be no redundancy because of a functional dependency Consider a relation r A B C D with functional dependency A B and two tuples a1 b1 c1 d1 and a1 b1 c2 d2 b1 is repeated because of the functional dependency BUT this relation is not in BCNF A B is neither trivial nor is A a superkey for the relation BCNF and Redundancy Why does redundancy arise Given a FD A B if A is repeated B A has to be repeated 1 If rule 1 is satisfied B A is empty so not a problem 2
View Full Document
Unlocking...