Unformatted text preview:

Recap of Feb 13 SQL Relational Calculi Functional Dependencies SQL multiple group bys having lots of examples Tuple Calculus Domain Calculus Functional Dependencies F the closure of the set of FDs on a given relation Relational Database Design A major goal in designing a database is to have a schema that makes queries simpler easy to phrase avoids redundancies and update anomalies about which more later Schema and Query Simplicity 1 Example Schema 1 EMP eno ename sal dno DEPT dno dname floor mgr Query 1 find all employees that make more than their manager select e ename from EMP e EMP m DEPT d where e dno m dno and d mgr m eno and e sal m sal Query 2 for each department find the maximum salary select d dname max e sal from EMP e DEPT d where e dno d dno group by d dno Q1 requires two joins Q2 requires a join and a group by Schema and Query Simplicity 2 Example Schema 2 a single relation ED eno ename sal dno dname floor mgr Query 1 find all employees that make more than their manager select e ename from ED e ED m where e mgr m eno and e sal m sal Query 2 for each department find the maximum salary select d dname max sal from ED e group by dno Q1 requires one join Q2 requires just a group by Schema and Query Simplicity 3 How did we get simpler queries Schema 2 was a more complicated relation with more information in essence ED was EMP and DEPT from Schema 1 with the join precomputed Should we just precompute the joins and store bigger relations Taken to the extreme we could compute the universal relation with all attributes inside it and null values for those values that make no sense Why wouldn t we want to do that Problems with too complex relations repetition of information data redundancy and inability to represent certain information update anomalies DB Design Redundancy and Anomalies Redundancy repetition of information each department is repeated for each employee in it great risk of inconsistencies suppose the department is moved to a new floor A simple update change in mgr name department floor etc in Schema 1 becomes multiple updates in Schema 2 Anomalies inability to represent some types of information departments can t exist without employees A department cannot exist until the first employee is inserted and it can no longer exist when the last employee is deleted from the ED relation DB Design Dealing with Anomalies So complex relations make for simpler queries but have the disadvantages of data redundancy and creation of anomalies How do we balance the two objectives We want simple queries no anomalies minimize data redundancy If we start with Schema 2 and discover anomalies we can decompose the relation s until the problems go away This process is called normalization Objectives of DB Design Normalization no redundancy for space efficiency and to reduce the potential for inconsistencies update integrity avoid update anomalies linguistic efficiency simpler queries are much better for the application programmer and for the query optimizer good performance smaller relations imply more joins bad Lossy Decompositions Not all decompositions are reversible lossless Example Shipment S P J decomposed into SP S P and s1 s2 s2 s3 s4 p1 p2 p3 p3 p4 j1 j1 j2 j3 j3 s1 s2 s2 s3 s4 p1 p2 p3 p3 p4 SJ S J s1 s2 s2 s3 s4 j1 j1 j2 j3 j3 Lossy Decompositions Shipment S P J decomposed into SP S P and s1 s2 s2 s3 s4 p1 p2 p3 p3 p4 j1 j1 j2 j3 j3 s1 s2 s2 s3 s4 p1 p2 p3 p3 p4 p1 p2 p3 p3 p4 SJ P J j1 j1 j2 j3 j3 If we join SP and SJ again into SP PJ S P P J we get s1 s2 s2 s2 s3 s3 s4 p1 p2 p3 p3 p3 p3 p4 p1 p2 p3 p3 p3 p3 p4 j1 j1 j2 j3 j2 j3 j3 from the joined tuples we cannot deduce the original form of the data this is called the connection trap and the decomposition is lossy Example of Lossy Join Decomposition Lossy join decompositions result in information loss Example decomposition of R A B into R 1 A and R2 B R A B R 1 1 2 1 R1 X R2 A B 1 2 1 2 A 1 2 R2 B Decomposition Continued Decompose the relation schema All attributes of an original schema R must appear in the decomposition R1 R2 Lossless reversible join decomposition for all possible relations r on schema R the decomposition into R1 R2 is lossless if r R1 r R2 r The decomposition of R into R1 and R2 is lossless if and only if at least one of the following dependencies is in F R1 R 2 R 1 R1 R2 R2 Lossless Join Decomposition and Functional Dependencies So FDs can help determine whether a decomposition is lossless R is a relation schema and F its FDs Then a decomposition R R1 R2 is lossless if at least one of the following dependencies holds R1 R 2 R 1 R1 R 2 R 2 either of the above FDs guarantees uniqueness in the mapping and therefore that the decomposition is lossless Dependency Preservation Dependencies are preserved in a decomposition if we do not need to join in order to enforce FDs all FDs remain intra relational and do not become inter relational To check if a decomposition is dependency preserving we need to examine all FDs in F There is an algorithm for testing dependency preservation requires the computation of F Goals of Normalization Decide whether a particular relation R is in good form if it is not in good form decompose it into a set of relations R1 R2 R3 Rn such that each relation is in good form the decomposition is a lossless join decomposition based upon functional dependencies Normalization Types of FDs in R A B C D with A B a candidate key trivial partial AB A A C C depends upon a part of the key TEACH student teacher subject student subject teacher students not allowed in the same subject of two different teachers teacher subject transitive A C D each teacher teaches only one subject ED eno ename sal dno dname floor mgr eno dno mgr Normalization using FDs When we decompose a relation schema R with a set of functional dependencies F into R1 R2 R3 Rn we want lossless join decomposition otherwise the decomposition results in loss of information relative to the original schema R no redundancy the relations R i should be in either BCNF BoysCodd Normal Form or 3NF Third Normal Form about which more in a slide or two Dependency preservation let Fi be the set of dependencies in F that include only attributes in R i preferably the decomposition should be dependency perserving That is F2 F3 Fn F F1 Otherwise checking updates for violation of FDs may require computing joins which is expensive The Normal …


View Full Document

UMD CMSC 424 - SQL, Relational Calculi, Functional Dependencies

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Loading Unlocking...
Login

Join to view SQL, Relational Calculi, Functional Dependencies and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view SQL, Relational Calculi, Functional Dependencies and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?