Midterm 3 Revision 1RevisionNormalizationFirst Normal FormSecond Normal FormSlide 6Third Normal FormSlide 8ReviewSlide 10Slide 11Lossless Join PropertySlide 13 Testing for the lossless join propertySlide 15Slide 16Slide 17Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23Slide 24Slide 25Slide 26Slide 27Slide 28Slide 29Slide 30OverviewSlide 32Slide 33Slide 34Slide 35The Decomposition AlgorithmSlide 37ExampleBCNF DecompositionSlide 40BCNF Decomposition - AlgorithmBCNF Decomposition - ExampleSlide 43QuestionsSlide 45Slide 46Attribute Preservation ConditionSlide 48Dependency Preservation ConditionSlide 50Slide 51Slide 52Slide 53Slide 54Slide 55Slide 56Slide 57Slide 58Slide 59Slide 60Slide 61Midterm 3 Revision 1Prof. Sin-Min LeeDepartment of Computer Science01/14/19 Revision•For each of the E-R problems, develop a relational schema•Identify the primary keys, foreign keys and attribute domains•Comment on the data quality of each of the relational schema that you have developed01/14/19 Normalization•A process to design good relations•Not a requirement of the relational model or any other model•Attempts to reflect the semantics of the data. This meaning is determined by the organization. –What does salary mean? Profits? Stock Price?•Relations can be in 1 NF, 2 NF, 3 NF, Boyce-Codd NF, 4 NF, and 5 NF•In this revision, we will concern ourselves with 1 NF, 2 NF, 3 NF and 4NF.01/14/19 First Normal Form•All attributes must be single-valued–Each row can assume only one value for a given attribute •Consider employees working for a given department at a given salary level–EMPLOYEE(EMPNO, department, salary)•Now consider employees who works for a given department and receive a raise every year. We are interested in maintaining data about each employee’s yearly salary. •EMPLOYEE(EMPNO, YEAR, department, salary)•Note the very different meanings of the attribute ‘salary’•Note the change in primary key - why?01/14/19 Second Normal Form01/14/19 Third Normal Form•Consider the following: A large urban university offers several types of scholarships. Let’s assume that there are four types and we have a predefined code for each. The type of scholarship that a student is eligible for is based on their major. An analyst sets up the following database:•(SSNO, name, major, GPA, scholarship type)–What problems are we likely to encounter? Why?•Note that scholarship type is not dependent on (determined by) SSNO. Rather, it is determined by major, which is determined by SSNO.•This represents a transitive dependency.01/14/19 Third Normal Form•Decompose the relation into two relations–Student(SSNO, name, major, GPA)–Scholarship_Eligibility(Major, Scholarship_Type)•The 3 NF rule–Once you are in 2 NF, examine if there are any transitive dependencies–Engage in lossless decomposition so as to remove these transitive dependencies–You are now in 3 NF!Review•Given R(A, B, C) and F={A->B}•Is R in 2NF?•Is R in 3NF?•Is R in BCNFReview•Given R(A, B, C) and F={A->B, B->C}•Is R in 2NF?•Is R in 3NF?•Is R in BCNFReview•Given R(A, B, C) and F={A->B, B->C}•Is R in 2NF?•Is R in 3NF?•Is R in BCNFLossless Join Property•A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if for every relation instance r of R that satisfies F, the following holds:•/ *(<R1>(r), ... , <Rm>(r)) = rTesting for the lossless join property •Consider R(A, B, C, D, E) and F={AB C, B D}•D={R1(A, B, C), R2(B,D)}•Is D a lossless decomposition?Lossless Join Property•Consider R(A, B, C, D) and F= {AB C}•Let D1={R1(A, B, C), R2(C, D)}•/Let r be one of the legal instanceA B C D----------------------------a1 b1 c1 d1a2 b1 c1 d2a2 b2 c2 d2Lossless Join Property<R1>: <R2>:A B C C D------------------- -----------a1 b1 c1 c1 d1a2 b1 c1 c1 d2a2 b2 c2 c2 d2Lossless Join Property<R1> * <R2> :A B C D--------------------------------a1 b1 c1 d1a1 b1 c1 d2 <------- Spurious tuplea2 b1 c1 d1 <-------- Spurious tuplea2 b1 c1 d2a2 b2 c2 d2R1 and R2 are not a lossless decomposition.Lossless Join Property•Informally, a decomposition is lossless if no spurious tuples appear when the relations in the decomposition are JOINed. •Thus, decomposition D1 is not losslessTesting for the lossless join property •Step1: create a matrix S with one row i for each relation Ri in the decomposition D, and one column j for each attribute Aj in R;•Step 2: set S(i,j):=bij for all matrix entries;Testing for the lossless join property •Step 3. for each row i representing relation schema Rifor each column j representing attribute Ajif Ri includes attribute Ajthen set S(i,j):=ajTesting for the lossless join property •Step 4. repeat the following until a loop execution results in no changes to Sfor each functional dependency X Y in Ffor all rows in S which have the same symbols in the columns corresponding to attributes in XTesting for the lossless join property •Step 4 – continued . make the symbols in each column that correspond to an attribute in Y be the same in all these rows as follows: –if any of the rows has an "a" symbol for the column, set the other rows to the same "a" symbol in the column –if no "a" symbol exists for the attribute in any of the rows choose one of the "b" symbols that appear in one of the rows for the attribute and set the other rows to the "b" symbols in the column;Testing for the lossless join property •Step 5If a row is made up entirely of "a" symbols, then the decomposition has the lossless join property-otherwise, it does not;/Testing for the lossless join property •Consider R(A,B,C) and F={AB, B C} D = {R1(A,B), R2(B,C)}A B C A B C---------------------- ----------------------R1 a1 a2 b13 => a1 a2 a3R2:b21 a2 a3 b21 a2 a3/•Thus, D is a lossless decomposition.Review•Given R(A, B, C) and F={FD1,FD2,FD3,…,FDk}•Is R in 2NF?•Is R in 3NF?•Is R in BCNF•which is in 3NF but not in BCNF.Overview•It is possible to decompose any relational schema into a set of relational schemas with the following properties:•1) Attribute Preserving•2) FDs preserving•3) Lossless JoinBCNF normalisationHGFEDCBAHGFBCNFFEDCBAnot 2NFFCBCNFDBBCNFEDCBAnot 2NFECBABCNFThe Decomposition Algorithm•Input: A relational schema R and a set of FDs F.•Output: A set of relational schemas R1, R2, ..., RmThe
View Full Document