
Databases: The Table

1 Introduction

Introduce different scenarios as to how we come to use a database:

• In industry, data collected from a manufacturing process are stored in databases, and we are interested in the production process and in improving, e.g., yield.

• A clinical trial, where data are gathered on observational units for a variety of different purposes.

Clinical trials study how well a new drug or treatment works, and in order for the Food and Drug Administration (FDA) to approve the drug, there must be convincing evidence that the treatment is safe and effective. Therefore it is critical that accurate, reliable, and secure data are kept on the patients involved.

Clinical trials involve many people, including doctors and nurses at multiple remote locations who monitor the health of the patient; lab workers who process lab tests; social workers and health care professionals who maintain contact with the patients; a research team, including doctors and statisticians, who follow the progress of the trial, analyze the data, and report results; data managers and programmers who collect and clean data; and managers who oversee the trial. These team members must share ideas, files, information, and knowledge on a real-time basis.

Clinical trials involve large numbers of patients over long periods of time. Several kinds of information need to be kept on a patient, including personal data such as name and address, lab results, and the identity of the attending physician. After an initial interview and once a participant agrees to join the study, a baseline visit gives information against which to measure future changes.
The participant receives the test drug, a comparison drug, or a placebo, and visits the physician’s clinic on numerous occasions for check-ups and additional lab work to assess the effects of the treatment and the health of the patient. Typically, patients return to the clinic at regular time intervals, but patients may miss appointments, drop out of the study, and otherwise have varying numbers of check-ups. Clinical studies also have enrollment windows during which patients can join the study, and as the study progresses, patients may drop out before completion of the study. Live data: the results are monitored so that the trial can be stopped on ethical grounds when one treatment has been shown to be far superior to another.

• Information is gathered for tracking inventory and sales in Wal-Mart. Different groups decide to “mine” it for relationships to see if they can improve the Supply Chain Network (SCN), marketing strategies, etc.

• You are starting a study with different types of data (images, numbers, files, etc.) and a large quantity of it (e.g., from a collection source such as a computer network). Rather than using some ad hoc solution to manage the data without knowing precisely how you will use it, you choose to keep your options open and to use a general database system to manage the data. S-Net is an example.

Cover topics such as

• databases imposed on users because of corporate/institutional approaches to gathering and managing data
• meta-data
• synchronization
• client-server computing
• security
• performance (specialized)
• connections to data frames and statistical data “models”
• live data

Other advantages of a database:

• Synchronized access to data
• Propagation and standards enforced when updates, deletions, and additions are made
• Centralized data for backups

Oftentimes we are forced to use a database because that is how the data are made available to us.

2 The Basic Relational Component: The Table

The basic conceptual unit in a relational database is the two-dimensional table.
A simple example appears in Figure 1, where the table contains laboratory results and test dates for three patients in a hypothetical clinical trial. The data form a rectangular arrangement of values similar to a data frame, where a row represents a case, record, or experimental unit, and a column represents a variable, characteristic, or attribute of the cases. In this example, the three columns correspond to a patient identification number, the date of the patient’s lab test, and the result of the test, and each of the eight rows represents a specific lab test for a particular patient. We see that patient #101 received tests on four occasions, patient #102 was given three tests, and the third patient has been tested only once.

The terminology used in database management differs from a statistician’s vocabulary. A data frame or table is called a relation. Rows in tables are commonly called tuples, rather than cases, and columns are known as attributes. The degree of a table corresponds to its number of columns, and the cardinality of a table refers to the number of rows. Statisticians usually refer to these as the dimension and the sample size or population size, respectively. Figure 2 summarizes these various table descriptors.

ID    Test Date     Lab Results
101   2000-01-20    3.7
101   2000-03-15    NULL
101   2000-09-21    10.1
101   2001-09-01    12.9
102   2000-10-20    6.5
102   2000-12-07    7.3
102   2001-03-13    12.2
103   2000-02-16    10.1

Figure 1: Lab results for 3 patients in a hypothetical clinical trial. Reported here are the patient identification number (ID), the date of the test, and the results. The results from patient #101’s test on March 15, 2000 are missing.

Object          Statistics    Database
Table           Data frame    Relation
Row             Case          Tuple
Column          Variable      Attribute
Row ID          Row name      Key
Row count       Size          Cardinality
Column count    Dimension     Degree

Figure 2: Correspondence of statistics descriptors to database terms for a two-dimensional table.

2.1 Entity

An entity is an abstraction of the database table. It denotes the general object of interest.
In the example found in Figure 1, the entity is a lab test. An instance of the entity is a single, particular occurrence, such as the lab test that patient #102 received on the 7th of December 2000. A natural follow-on to the idea that a case is a single, particular occurrence of the entity is that the rows in a table are unique. To uniquely identify each row in the table, we use what is called a key, which is simply an attribute or a combination of attributes. In our clinical trial (Figure 1), the key for the table is a composite key made from the patient identification number and test date. (We assume here that patients do not have more than one lab test on the same day.) When we look over the rows in the table, we see that the test dates are unique, yet we do not use the single attribute test date for the key to this table because although we have not observed two patients with the same test date so far,
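
The table of Figure 1, its degree and cardinality, and its composite key can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions, not part of the original notes: the table name lab_test and the column names patient_id, test_date, and result are hypothetical, and SQLite is chosen simply because it runs in memory without a server.

```python
import sqlite3

# Rows of Figure 1; None stands in for the missing (NULL) lab result.
ROWS = [
    (101, "2000-01-20", 3.7),
    (101, "2000-03-15", None),
    (101, "2000-09-21", 10.1),
    (101, "2001-09-01", 12.9),
    (102, "2000-10-20", 6.5),
    (102, "2000-12-07", 7.3),
    (102, "2001-03-13", 12.2),
    (103, "2000-02-16", 10.1),
]


def build_lab_table():
    """Create an in-memory table holding the Figure 1 data.

    The table and column names are hypothetical, chosen for this sketch.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        """
        CREATE TABLE lab_test (
            patient_id INTEGER NOT NULL,
            test_date  TEXT    NOT NULL,
            result     REAL,
            -- composite key: patient identification number plus test date
            PRIMARY KEY (patient_id, test_date)
        )
        """
    )
    con.executemany("INSERT INTO lab_test VALUES (?, ?, ?)", ROWS)
    con.commit()
    return con


def degree(con):
    """Degree of the relation: its number of columns (attributes)."""
    cur = con.execute("SELECT * FROM lab_test")
    return len(cur.description)


def cardinality(con):
    """Cardinality of the relation: its number of rows (tuples)."""
    return con.execute("SELECT COUNT(*) FROM lab_test").fetchone()[0]


def key_rejects_duplicate(con):
    """Try to add a second test for patient 102 on 2000-12-07.

    Returns True if the composite key blocks the insert.
    """
    try:
        con.execute(
            "INSERT INTO lab_test VALUES (?, ?, ?)",
            (102, "2000-12-07", 9.9),  # made-up result value
        )
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    con = build_lab_table()
    print(degree(con), cardinality(con))  # 3 columns and 8 rows
    print(key_rejects_duplicate(con))     # True: the key already exists
```

Declaring the key on the pair (patient_id, test_date) lets the database itself enforce the assumption that a patient has at most one lab test per day, instead of relying on the rows happening to be unique.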

