SJSU CMPE 226 - DB Normalization - D926321

Home> Schools> San Jose State University> Computer Engineering (CMPE) > CMPE 226> DB Normalization

DOC PREVIEW

SJSU CMPE 226 - DB Normalization

School name San Jose State University

Course Cmpe 226- Database Systems

Pages 8

This preview shows page 1-2-3 out of 8 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Data Normalization FundamentalsData Normalization Fundamentalsby Luke ChungPresident of FMS Inc.http://www.fmsinc.comIntroductionThe ability to analyze data in Access is a fundamental skill that all developers must master. The better you are at organizing your data and knowing how to analyze it, the easier your application development will be. There are lots of ways to analyze data and different techniques must be used depending on your goal. However, there are a few fundamentals that must be understood:- Data Normalization- Separate Databases for the Application and DataData Normalization OverviewThere are lots of articles and books on data normalization. They usually scare people including me. I am not going to get into a theoretical discussion of the pros and cons of data normalization levels. Basically, it comes down to this: how to store and retrieve data efficiently. This differs depending on the database used, so the more you understand how to manipulate data in Access, the more obvious the way you should store data in tables and fields. A primary goal of good database design is to make sure your data can be easily maintained over time. Databases are great at managing more records. They are terrible if fields need to be added since all its queries, forms, reports, and code are field dependent.Spreadsheet GurusData normalization is a particularly difficult concept for spreadsheet experts. Having been a spreadsheet developer prior to using databases, I sympathize with those struggling to make the transition. The main reason you’re using a database rather than a spreadsheet is probably because have so much data you can’t manage it properly in Excel. The fundamental advantage of a database is that it allows your data to grow without causing other problems. The big disaster in most spreadsheets is the need to add new columns or worksheets (for new years, products, etc.) which cause massive rewrites of formulas and macros that are difficult to debug and test thoroughly. Been there. Designed properly, databases let your data grow over time without affecting your queries or reports. You need to understand how to structure your data so your database takes advantage of this. How you store your data is totally different from how you show it. So, stop creating fields for each month, quarter or year, and start storing dates as a field. You’ll be glad youdid it:Non-Normalized "Spreadsheet" DataNormalized DataBoth tables in the example above contain the same data, but they are distinctly different. Notice how the normalized table lets you easily add more records (years) without forcing a restructuring of the table. In the non-normalized table, adding next year’s data would require adding a field. By avoiding the need to add a field when you get more data, you eliminate the need to update all the objects (queries, forms, reports, macros, and modules)that depend on the table. Basically, in databases, new records are "free" while new columns are "expensive". Try to structure your tables so you don’t need to modify their fields over time.Efficient Storage and Unique IDs A fundamental principle of data normalization is the same data should not be stored in multiple places. Information that change over time such as customer names and addressesshould be stored in one table and other tables referencing that information should link to it. Unique IDs (key fields) with no connection to the data are used to link between tables. For instance, customer information should be stored in a customer table with a Customer ID field identifying the record. Access lets you use an AutoNumber field to automaticallyassign new ID numbers. It doesn’t matter what the ID number is or whether they are consecutive. The ID number has no meaning other than identifying the record and letting records in other tables link to that record.I’ve seen databases where people use ID numbers that combine a few letters of the last name, first name, and number. That makes no sense and creates a mess over time. The ID should just be a number and if you want your data sorted in a particular order, use a secondary index.Data Normalization ExtremesIt is important to not take data normalization to extremes in Access. Most people are familiar with separating Customers into its own table. If not, they quickly discover why they need to. But what about optional fields like telephone numbers: business phone, fax number, mobile phone, modem, home phone, home fax, etc.? Most customers won’t haveall those numbers, but you may want to store them for those that do. There are three approaches:1. All the fields are in the Customer table.2. A separate table is created for each type of telephone (a one-to-one link). The table would contain the customer ID and telephone number.3. A telephone table is created with these fields: customer ID, the Telephone Type ID, and number so you could conceivably have unlimited phone numbers (a one-to-many link).There are arguments for each alternative and to some extent it depends how well you know your data. Data normalization purists and programs such as Erwin often suggest separate table(s). Obviously option 3 is the most flexible since it allows you to support anunlimited number and type of phones. If you cannot limit the number of alternatives, this is your only choice. However, if you can limit the types, you should opt for option 1 in Access.First, Access stores data in variable length records. One of the reasons for data normalization is to save disk space. Old file formats such as dBase, FoxPro, and Paradox stored data in fixed length records with each record taking the same space regardless if itsfields were blank. The more fields, the larger the table. Not only is disk space is cheap today, Access records only grow if data is contained in them. Therefore, this kind of data normalization is not necessary for efficient data storage in Access. Second, and more important, the retrieval of data across multiple tables may be an unnecessary hassle. If you are always going to show the customer’s phone and fax number, retrieving those records out of another table is an unnecessary and will hurt your performance. The way Access is designed, it is much easier to just pick the fields from the Customer table rather than using a separate query or sub-report to grab each telephonetype separately.Storing Duplicate DataIt is very important to remember there are situations where you must store what seems

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3 out of 8 pages.

SJSU CMPE 226 - DB Normalization

Sign up for free to view:

Please select your school