CMSC 424 Database design Lecture 9 Normalization Mihai Pop Administrative SQL assignment questions Sharath Project please pair up submit pairs by Monday March 4 For midterm chapters 1 4 6 Anything you d like me to go over now Accessing databases from software Embedded SQL special commands within C Java etc code SQL APIs ODBC JDBC Perl DBI Ruby on Rails Basic protocol connect to server run SQL commands tuples returned as cursors iterators allows you iterate over each tuple in result table disconnect from server Read chapter 4 You ll need this for project SQL last thoughts You learn best through practice Every database system is different syntax conventions etc READ THE REFERENCE MANUALS Relational Database Design Where did we come up with the schema that we used E g why not store the actor names with movies Or store the author names with the papers Topics Formal definition of what it means to be a good schema How to achieve it Movies Database Schema Movie title year length inColor studioName producerC StarsIn movieTitle movieYear starName MovieStar name address gender birthdate MovieExec name address cert netWorth Studio name address presC Changed to Movie title year length inColor studioName producerC starName merged into above MovieStar name address gender birthdate MovieExec name address cert netWorth Studio name address presC Example Relation Movie title year length inColor studioName producerC starName merged into above MovieStar name address gender birthdate MovieExec name address cert netWorth Studio name address presC Title Year Length StudioName prodC StarName Star wars 1977 120 Fox 128 Hamill Star wars 1977 120 Fox 128 Fisher Star wars 1977 120 Fox 128 H Ford King Kong 2005 Studio A 150 Naomi King Kong 1940 Studio B 20 Faye What we re looking for in a schema Low no redundancy Easy to understand structure Easy to write queries Efficient to answer queries Ease of maintaining integrity of the data Difficult to do this by hand Normalization formal algorithms for creating a reasonable schema Combine Schemas Suppose we combine borrow and loan to get bor loan customer id loan number amount Result is possible repetition of information L 100 in example below A Combined Schema Without Repetition Consider combining loan branch and loan loan amt br loan number amount branch name No repetition as suggested by example below What About Smaller Schemas Suppose we had started with bor loan How would we know to split up decompose it into borrower and loan Write a rule if there were a schema loan number amount then loan number would be a candidate key Denote as a functional dependency loan number amount Functional Dependencies set of attributes whose values uniquely determine the values of the remaining attributes e g a key defines an FD e g in EMP eno ename sal key FDs eno ename DEPT dno dname floor eno sal WORKS IN eno dno hours other FDs eno dno hours for every pair of values of eno dno there exists exactly one value for hours in general if R and R then holds in the extension r R of R iff for any pair t1 and t2 tuples of r R such that t1 t2 then it is also true that t1 t2 uniqueness of values we can use the FDs as constraints that we want to enforce e g keys for checking if the FDs are satisfied in the database A B satisfied no A C yes C A no AB D yes R A 1 1 2 2 3 B 1 2 2 3 3 C 1 1 2 2 2 D 1 2 2 3 4 FDs continued trivial dependencies if closure need all FDs some logically implied by others e g if A B B C then A C is implied given F set of FDs find F the closure of all logically implied by F Amstrong s axioms reflexivity if then trivial FD augmentation if then transitivity if then More FD Rules union rule decomposition rule pseudotransitivity rule Example R A B C G H I F A B A C CG H CG I B H F A H CG HI AG I AG H if then if then if then A B H transitivity CG H CG I union rule A C augmentation AG CG I CG H there is a non trivial exponential algorithm for computing F Closure of Attribute Sets useful to find if a set of attributes is a superkey the closure of a set of attributes under F is the set of all attributes that are functionally determined by there is an algorithm that computes the closure Example R A B C G H I F A B A C CG H CG I B H Algorithm to Compute AG start with result AG A B expands result AGB A C expands result AGBC CG H result AGBCH CG I result AGBCHI B H no more expansion Note that since G is not on any right hand side no subset of the attributes can be a superkey unless it contains G for there is no FD to generate it
View Full Document
Unlocking...