UST QMCS 450 - Part 19 Implementation

Unformatted text preview:

Part 19 ImplementationCopyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 2 Performance Efficiency Depends On: • Physical data storage • Use of indices • Query optimization • Compiled vs. interpreted execution • Ability to predict database usage, communicate that prediction to the DBMS, and make use of that informationCopyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 3 Physical Data Storage Flat files/tables can be inefficient PROBLEM: Access to an employee and all assigned tasks may require one physical disk access per task Possible Solutions: • use a hybrid DBMS (hierarchical, network) (different physical and logical) • store first normal form relations • store in master/detail (JoinDef) form ("array" within relation) One relation per entity can be inefficient PROBLEM: 80-20 rule states that 80% of all retrievals will occur against 20% of the attributes (e.g. emergency contact) Possible Solutions: • multiple relations per entity, cluster attributes • separate access mechanism for each attributeCopyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 4 Table Organization Default organization in most systems is a "heap" • non-sequential file, usually in order of input Organizations commonly available: heap non-sequential, duplicates, new records at end cheap compressed heap heapsort sorted at modify, maintained as heap cheapsort compressed heapsort hash random hash table, no duplicates chash compressed hash btree dynamic B-tree, no duplicates cbtree compressed btree isam static indexed-sequential, no duplicates cisam compressed isam sorted maintained as sorted sequential Ingres allows table organization to be established via: MODIFY emp TO chash UNIQUE ON name, WHERE FILLFACTOR = 50;Copyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 5 Use of Indices An index is used to speed up retrieval • aid "associative" retrieval by rapidly mapping values to locations • can answer existence questions without data access • “no free lunch” principle - slows updates General index types • Sequential - useful for range queries • Direct - useful for list queries Specific index types • sorted sequential • direct relation • B-tree • hashing • pointer chains • bit mapsCopyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 6 Index Creation 1. Create an index CREATE INDEX rateindex ON emp(rate) • generates a table of the form rate pointer In Oracle, this new index table is sorted and can be used immediately. In Ingres, this index table is a “cheap” and is useless until modified. 2. In Ingres, organize the index MODIFY rateindex TO Btree on rate; 3. In Oracle, automatically get an index when you specify that a field is a PRIMARY KEY or UNIQUE 4. Use of the index is selected by query optimizerCopyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 7 Query Optimization REQUIRED in any mainframe / mini environment to get acceptable performance OPPORTUNITY to capitalize on the strengths of the relational model System free to • decide which record-level operations are needed • use information available to it • data values • history of access • select from a wide variety of alternatives EXAMPLE: SELECT DISTINCT s.sname FROM s, sp WHERE s.s# = sp.s# AND sp.p# = 'p2'; (100 suppliers - 10,000 shipments - 50 shipments of p2) Method 1: Generate Cartesian product, then restrict Method 2: Restrict sp before join Method 2 is 1/2 of 1% the work of Method 1!Copyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 8 Query Optimization Process 1. Cast query into an internal representation • SQL has many ways to state the same query • usually cast as an abstract syntax tree 2. Convert query into a canonical form • apply rules for restating query • usually converted to conjunctive normal form p or (q and r) --> (p or q) and (p or r) 3. Choose candidate low-level procedures • consider availability of indices, physical locations of data, and size of relations 4. Generate query plans and choose the cheapest • can be done at compile time or run time • estimate the number of disk accesses or use a rule-based system • don't generate/consider all combinationsCopyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 9 Index Selection Query optimizers generally try to use indices when available to speed retrieval. However, indices slow update speed and take disk space. This is further complicated by the fact that you can index combinations of columns in addition to single columns. So if you have a table with 12 columns, there are over 1.3 billion possible column combinations that can be indexed in this one table alone. In addition, each combination of columns can be indexed multiple times, using various index organizations. Below is a set of “rules of thumb” to use when setting up indices: 1. If you are doing almost exclusively data entry, forget about indices until you start doing queries. 2. Index your primary keys. (For example deptno in table dept, and the combination of ename, project_id, and tname in table task.) 3. Index any field(s) referenced as foreign keys. (For example mgr and deptno in table emp) 4. Index any fields used in a significant number of WHERE clauses, provided that no one value occurs in more than 20% of the rows. (For example, if there were lots of retrievals on the task table both by hours and by tname, hours would be better than tname.)Copyright  1971-2002 Thomas P. Sturm Implementation Part 19, Page 10 Mathematical Sidelight on Number of Possible Indices For 1 field, there is only 1 index possible. For 2 fields, a and b, you can index a, b, ab, ba, so 4 indices are possible. For 3 fields, a, b, and c, you can index a, b, c, ab, ba, ac, ca, bc, cb, abc, acb, bac, bca, cab, cba, so 15 indices are possible. In general, you can create n + n(n-1) + n(n-1)(n-2) + … + n! indices on n columns. This can be re-written as n!(1/(n-1)! + 1/(n-2)! + 1/(n-3)! + … + 1/(1!) + 1/(0!)), but the sum in the parentheses approaches e = 2.718281828… as n gets large, so the number of indices is approximately e(n!). The table below gives the exact values for all small values of n. # of columns Number of indices possible 1 1 2 4 3 15 4 64 5 325 6 1956 7 13699 8 109600 9 986409


View Full Document

UST QMCS 450 - Part 19 Implementation

Download Part 19 Implementation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Part 19 Implementation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Part 19 Implementation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?