Introduction to Database Systems CSE 444 Lecture 17: Database Tuning Magda Balazinska - CSE 444, Fall 2010 1Database Tuning Overview • The database tuning problem • Index selection (discuss in detail) • Horizontal/vertical partitioning (see lecture 4) • Denormalization (discuss briefly) 2 This material is partially based on the book: “Database Management Systems” by Ramakrishnan and Gehrke, Ch. 20 Magda Balazinska - CSE 444, Fall 2010Magda Balazinska - CSE 444, Fall 2010 Levels of Abstraction in a DBMS Disk Physical Schema Conceptual Schema External Schema External Schema External Schema a.k.a logical schema describes stored data in terms of data model includes storage details file organization indexes views access control 3The Database Tuning Problem • We are given a workload description – List of queries and their frequencies – List of updates and their frequencies – Performance goals for each type of query • Perform physical database design – Choice of indexes – Tuning the conceptual schema • Denormalization, vertical and horizontal partition – Query and transaction tuning 4 Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem • Given a database schema (tables, attributes) • Given a “query workload”: – Workload = a set of (query, frequency) pairs – The queries may be both SELECT and updates – Frequency = either a count, or a percentage • Select a set of indexes that optimizes the workload 5 In general this is a very hard problem Magda Balazinska - CSE 444, Fall 2010Index Selection: Which Search Key • Make some attribute K a search key if the WHERE clause contains: – An exact match on K – A range predicate on K – A join on K 6 Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem 1 7 V(M, N, P); SELECT * FROM V WHERE N=? SELECT * FROM V WHERE P=? 100000 queries: 100 queries: Your workload is this What indexes ? Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem 1 8 V(M, N, P); SELECT * FROM V WHERE N=? SELECT * FROM V WHERE P=? 100000 queries: 100 queries: Your workload is this A: V(N) and V(P) (hash tables or B-trees) Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem 2 9 V(M, N, P); SELECT * FROM V WHERE N>? and N<? SELECT * FROM V WHERE P=? 100000 queries: 100 queries: Your workload is this What indexes ? INSERT INTO V VALUES (?, ?, ?) 100000 queries: Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem 2 10 V(M, N, P); SELECT * FROM V WHERE P=? 100000 queries: 100 queries: Your workload is this INSERT INTO V VALUES (?, ?, ?) 100000 queries: A: definitely V(N) (must B-tree); unsure about V(P) SELECT * FROM V WHERE N>? and N<? Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem 3 11 V(M, N, P); SELECT * FROM V WHERE N=? SELECT * FROM V WHERE N=? and P>? 100000 queries: 1000000 queries: Your workload is this What indexes ? INSERT INTO V VALUES (?, ?, ?) 100000 queries: Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem 3 12 V(M, N, P); SELECT * FROM V WHERE N=? SELECT * FROM V WHERE N=? and P>? 100000 queries: 1000000 queries: Your workload is this A: V(N, P) INSERT INTO V VALUES (?, ?, ?) 100000 queries: Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem 4 13 V(M, N, P); SELECT * FROM V WHERE P>? and P<? 1000 queries: 100000 queries: Your workload is this SELECT * FROM V WHERE N>? and N<? What indexes ? Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem 4 14 V(M, N, P); SELECT * FROM V WHERE P>? and P<? 1000 queries: 100000 queries: Your workload is this SELECT * FROM V WHERE N>? and N<? A: V(N) secondary, V(P) primary index Magda Balazinska - CSE 444, Fall 2010The Index Selection Problem • SQL Server – Automatically, thanks to AutoAdmin project – Much acclaimed successful research project from mid 90’s, similar ideas adopted by the other major vendors • PostgreSQL – You will do it manually, part of project 3 – But tuning wizards also exist 15 Magda Balazinska - CSE 444, Fall 2010Basic Index Selection Guidelines • Consider queries in workload in order of importance • Consider relations accessed by query – No point indexing other relations • Look at WHERE clause for possible search key • Try to choose indexes that speed-up multiple queries • And then consider the following… Magda Balazinska - CSE 444, Fall 2010 16Index Selection: Multi-attribute Keys Consider creating a multi-attribute key on K1, K2, … if • WHERE clause has matches on K1, K2, … – But also consider separate indexes • SELECT clause contains only K1, K2, .. – A covering index is one that can be used exclusively to answer a query, e.g. index R(K1,K2) covers the query: 17 SELECT K2 FROM R WHERE K1=55 Magda Balazinska - CSE 444, Fall 2010To Cluster or Not • Range queries benefit mostly from clustering • Covering indexes do not need to be clustered: they work equally well unclustered 18 Magda Balazinska - CSE 444, Fall 201019 Percentage tuples retrieved Cost 0 100 Sequential scan SELECT * FROM R WHERE K>? and K<? Magda Balazinska - CSE 444, Fall 2010Hash Table v.s. B+ tree • Rule 1: always use a B+ tree • Rule 2: use a Hash table on K when: – There is a very important selection query on equality (WHERE K=?), and no range queries – You know that the optimizer uses a nested loop join where K is the join attribute of the inner relation (you will understand that in a few lectures) 20 Magda Balazinska - CSE 444, Fall 2010Balance Queries v.s. Updates • Indexes speed up queries – SELECT FROM WHERE • But they usually slow down updates: – INSERT, DELETE, UPDATE – However some updates benefit from indexes 21 UPDATE R SET A = 7 WHERE K=55 Magda Balazinska - CSE 444, Fall 2010Tools for Index Selection • SQL Server 2000 Index Tuning Wizard • DB2 Index Advisor • How they work: – They walk through a large number of configurations, compute their costs, and choose the configuration with minimum cost 22 Magda Balazinska - CSE 444, Fall 2010Tuning the Conceptual Schema • Index selection • Horizontal/vertical partitioning (see lecture 4) • Denormalization 23 Magda Balazinska - CSE 444, Fall 2010Denormalization 24 SELECT x.pid, x.pname FROM Product x, Company y WHERE x.cid = y.cid
View Full Document