Unformatted text preview:

CS211 Lecture: Database Designlast revised November 21, 2006Objectives:1. To introduce the anomalies that result from redundant storage of data2. To introduce the notion of functional dependencies3. To introduce the basic rules for BCNF normalizationMaterials: 1. Handout: progressive normalization of a registration scheme to BCNFI. Basic Principles of Relational Database DesignA. The topic of relational database design is a complex one, and one we consider in detail in the DBMS course. For now, we look at a few simple principles, which we will make more formal later.B. One principle is that each relation should have a subset of its attributes which, together, form a PRIMARY KEY for the relation.1. It is helpful, then, to specify the primary key of each relation as part of the design process.2. Of course, we need the primary key of an entity in order to create the tables for any relationships in which it participates, since the primary keys of the entities become columns in the table representing the relationship.3. Good DBMS software will be capable of enforcing a PRIMARY KEY CONSTRAINT - i.e. a primary key can be declared when a table is created, and the DBMS will signal an error if an attempt is made to insert or modify a row in such a way as to create two rows with the same primary key value(s).C. Another principle is to develop the database scheme in such a way as o avoid storage of redundant information.1. Often, this will involve decomposing a relation scheme into two or more smaller schemes.EXAMPLE: We might be inclined to represent information about student registrations by a scheme like this:1Enrolled(department, course_number, section, days, time, room, title,student_id, last_name, first_name, faculty_id, professor_name)HANDOUT2. However, a scheme like this exhibits several serious problems, all arising from REDUNDANCY:a) The course's id, days, time, room, and title are stored once for each student enrolled - potentially dozens of copies.b) The student's id, last and first names are stored once for each course the student is enrolled in.c) The professor's id and name is stored once for each student enrolled in each course he/she teaches3. Redundancy is a problem in its own right, since it wastes storage, and increases the time needed to back up or transmit the information. Moreover, redundancy gives rise to some additional problems beyond wasted space and time:a) The UPDATE ANOMALY.Suppose the room a course meets in is changed. Every Enrolled row in the database must now be updated - one for each student enrolled.(1) This entails a fair amount of clerical work.(2) If some rows are updated while others are not, the database will give conflicting answers to the question "where does ____ meet?"b) An even worse problem is the DELETION ANOMALY. (1) Suppose that the last student enrolled is dropped from the course. All information about the course in the database is now lost! (One might argue that this is not a problem, since courses with zero enrollment make no sense. However, this could happen early in the registration process - e.g. if a senior is mistakenly registered for a freshman course, and this is 2corrected before freshmen register. In any case, the decision to delete a course should be made by an appropriate administrator, not by the software!(2) Likewise if a student is dropped from all his/her courses, information about the student is lost. This may not be what is intended.c) There is a related problem called the INSERTION ANOMALY:(1) We cannot even store information in the database about a course before some student enrolls - unless we want to create a "dummy" student.(2) Likewise, we cannot store information about a student until the student is enrolled in at least one course.(3) Can you think of another example?ASKWe cannot store information about a faculty member who is not teaching any courses - e.g. a faculty member on sabbatical.4. A better scheme - though still not a perfect one, as we shall see - would be to break this scheme up into several tables:Enrolled(department, course_number, section, student_id)Course(department, course_number, section, days, time, room, title, faculty_id)Student(student_id, last_name, first_name)Professor(faculty_id, professor_name)The process of breaking a large single scheme into two or more smaller schemes is called DECOMPOSITIOND. Decomposition must be done with care, lest information be lost.EXAMPLE: Suppose, in avoiding to store redundant information, we had come up with this decomposition (same as above, except for no Enrolled scheme, and no faculty_id attribute in Course.)3Course(department, course_number, section, days, time, room, title)Student(student_id, last_name, first_name)Professor(faculty_id, professor_name)1. It appears that we haven't lost any information - all the data that was stored in the original single scheme is still present in some scheme. Indeed, each value is stored in exactly one table.2. However, we call such a decomposition a LOSSY-JOIN DECOMPOSITION, because we have actually lost some information.What information have we lost?ASKa) We have lost the information about what students are enrolled in what courses.b) We have lost the information about which faculty member teaches which course.c) In contrast, our original decomposition was LOSSLESS-JOIN. If we did the following natural join (where |X| stands for natural join): Enrolled |X| Course |X| Student |X| Professor we would get back the undecomposed table we started with.(If we tried to do a similar set of natural joins on our lossy-join decomposition, we would end up with every student enrolled in every course, taught by every professor!)3. The "acid test" of any decomposition performed to address redundancy is that it must be LOSSLESS-JOIN.E. A principle related to using lossless join decompositions to avoid redundancy is the explicit identification of FOREIGN KEYS.1. In our lossless join decomposition, what made the decomposition work correctly is that the first scheme - Enrolled - had foreign keys that referenced the Course and Student tables; and Course had a foreign key that referenced the Professor table.42. Many DBMS's (though not MySQL, unfortunately), allow foreign keys to be declared when a table is created. The DBMS will then enforce the rule that no row can be inserted or modified in such a way as to have foreign key values that do not appear in some row of the table being referenced.e.g. if we made


View Full Document

Gordon CPS 211 - Database Design

Download Database Design
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Database Design and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Database Design 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?