DOC PREVIEW
Berkeley COMPSCI 186 - Physical Database Design and Tuning

This preview shows page 1-2-22-23 out of 23 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Physical Database Design and TuningIntroductionUnderstanding the WorkloadCreating an ISUD ChartDecisions to MakeIndex SelectionIssues to Consider in Index SelectionIssues in Index Selection (Contd.)Example 1Example 2Examples of ClusteringClustering and JoinsMulti-Attribute Index KeysIndex-Only PlansHorizontal DecompositionsHorizontal Decompositions (Contd.)Masking Conceptual Schema ChangesIndex Tuning “Wizards”Tuning Queries and ViewsMore Guidelines for Query TuningGuidelines for Query Tuning (Contd.)Points to RememberPoints to remember (Contd.)Physical Database Design and TuningR&G - Chapter 20Although the whole of this life were said to be nothing but a dream and the physical world nothing but a phantasm, I should call this dream or phantasm real enough, if, using reason well, we were never deceived by it. Baron Gottfried Wilhelm von LeibnizIntroduction•We have talked at length about “database design”–Conceptual Schema: info to capture, tables, columns, views, etc.–Physical Schema: indexes, clustering, etc.•Physical design linked tightly to query optimization•We must begin by understanding the workload:–The most important queries and how often they arise.–The most important updates and how often they arise.–The desired performance for these queries and updates.Understanding the Workload•For each query in the workload:–Which relations does it access?–Which attributes are retrieved?–Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? •For each update in the workload:–Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?–The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.Creating an ISUD Chart Employee TableTransactionFrequency% table Name Salary AddressPayroll Run monthly 100 S S SAdd Emps daily 0.1 I I IDelete Emps daily 0.1 D D DGive Raises monthly 10 S UInsert, Select, Update, Delete FrequenciesDecisions to Make•What indexes should we create?–Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?•For each index, what kind of an index should it be?–Clustered? Dynamic/static? •Should we make changes to the conceptual schema?–For example, denormalize•Horizontal partitioning, replication, views ...Index Selection•One approach:–Consider most important queries in turn. –Consider best plan using the current indexes, and see if better plan is possible with an additional index. –If so, create it.•Before creating an index, must also consider the impact on updates in the workload!–Trade-off: indexes can make queries go faster, updates slower. Require disk space, too.Issues to Consider in Index Selection•Attributes mentioned in a WHERE clause are candidates for index search keys.–Range conditions are sensitive to clustering –Exact match conditions don’t require clustering•Or do they???? :-)•Try to choose indexes that benefit as many queries as possible. •NOTE: only one index can be clustered per relation! –So choose it based on important queries that benefit the most from clustering!!Issues in Index Selection (Contd.)•Multi-attribute search keys should be considered when a WHERE clause contains several conditions.–If range selections are involved, order of attributes should be carefully chosen to match the range ordering.–Such indexes can sometimes enable index-only strategies for important queries.•For index-only strategies, clustering is not important!•When considering a join condition:–B+-tree on inner is very good for Index Nested Loops.•Should be clustered if join column is not key for inner, and inner tuples need to be retrieved.–Clustered B+ tree on join column(s) good for Sort-Merge.Example 1•B+ tree index on D.dname supports ‘Toy’ selection.–Given this, index on D.dno is not needed.•B+ tree index on E.dno allows us to get matching (inner) Emp tuples for each selected (outer) Dept tuple.•What if WHERE included: `` ... AND E.age=25’’ ?–Could retrieve Emp tuples using index on E.age, then join with Dept tuples satisfying dname selection. Comparable to strategy that used E.dno index. –So, if E.age index is already created, this query provides much less motivation for adding an E.dno index.SELECT E.ename, D.mgrFROM Emp E, Dept DWHERE E.dno=D.dno AND D.dname=‘Toy’Example 2•All selections are on Emp so it should be the outer relation in any Index NL join.–Suggests that we build a B+ tree index on D.dno.•What index should we build on Emp?–B+ tree on E.sal could be used, OR an index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions.•As a rule of thumb, equality selections more selective than range selections.•As both examples indicate, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query. Have to understand optimizers!SELECT E.ename, D.mgrFROM Emp E, Dept DWHERE E.sal BETWEEN 10000 AND 20000 AND E.hobby=‘Stamps’ AND E.dno=D.dnoExamples of Clustering•B+ tree index on E.age can be used to get qualifying tuples.–How selective is the condition?–Is the index clustered?•Consider the GROUP BY query.–If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly.–Clustered E.dno index may be better!•Equality queries and duplicates:–Clustering on E.hobby helps!SELECT E.dnoFROM Emp EWHERE E.age>40SELECT E.dno, COUNT (*)FROM Emp EWHERE E.age>10GROUP BY E.dnoSELECT E.dnoFROM Emp EWHERE E.hobby=StampsClustering and Joins•Clustering is especially important when accessing inner tuples in INL.–Should make index on E.dno clustered.•Suppose that the WHERE clause is instead:WHERE E.hobby=‘Stamps’ AND E.dno=D.dno–If many employees collect stamps, Sort-Merge join may be worth considering. A clustered index on D.dno would help.•Summary: Clustering is useful whenever many tuples are to be retrieved.SELECT E.ename, D.mgrFROM Emp E, Dept DWHERE D.dname=‘Toy’ AND E.dno=D.dnoMulti-Attribute Index Keys•To retrieve Emp records with age=30 AND sal=4000, an index on <age,sal> would be better than an index on age or an index on sal.–Such indexes also called composite or concatenated indexes.–Choice of index key orthogonal to clustering etc.•If condition is:


View Full Document

Berkeley COMPSCI 186 - Physical Database Design and Tuning

Documents in this Course
Load more
Download Physical Database Design and Tuning
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Physical Database Design and Tuning and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Physical Database Design and Tuning 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?