CORNELL CS 632 - Final Exam Part C - D682758

Home> Schools> Cornell University> Computer Science (CS) > CS 632> Final Exam Part C

CORNELL CS 632 - Final Exam Part C

Pages 3

Download Save

Unformatted text preview:

CS632 Final Exam Part CA. Demers7 May 2003Here is the last piece of the final – the promised question on query processing.As usual, you may consult others for ideas and proof approaches, but pleasereference your sources and write up answers independently.4 Query ProcessingThis is a squishy “thought question” to expose some of the over-simplificationhidden in our discussion of query optimizers.In a production database system, most queries may be precompiled rather thanad hoc. Such a query has zero or more parameters. The query is compiled andoptimized once, then executed repeatedly with different values of the parame-ters. The salary updating transaction from part (1e) is an example of this. Thevalue of the parameter w is supplied at execution time ; it is not available to theoptimizer.An obvious argument in favor of this approach is that query optimization isexpensive, and precompiling queries should allow the cost of the optimizer tobe amortized over a large number of query evaluations.Unfortunately, things are not quite so simple. In general, the execution cost of aquery depends strongly on the parameter values; it also depends on the numberof memory buffers made available when the query runs.It also depends on the buffer manager’s replacement policy (LRU,FIFO, LIFO, . . . ) but at this level of detail we ignore the replace-ment policy, assuming the buffer manager never does somethingstupid like evicting a page when there are empty buffers available.We also ignore the effect of running more than one query concur-rently. This too is unrealistic – in many real systems there are “hot”1tables, used by a large fraction of the queries, which essentially livein the shared buffer pool – but it simplifies the discussion to ignorethis effect.To summarize, we are computing the cost in I/O operations of a single queryrunning in isolation, as a function of its parameter values p1, . . . , pkand thebuffer p ool size B.Problem 4a: First consider varying only the parameter values, letting eachquery run in an infinite buffer pool. Give an example of a parameterized con-junctive query (and the database instance it is applied to) for which at leastthree different query plans are optimal depending on the parameter values.In addition to specifying the relation contents and what indices exist, don’tforget to specify important physical properties like the size of the relation(s), thenumber of tuples or index entries per disk block, whether indices are clusteredor unclustered, and so forth.Can you extend your example to arbitrarily large values of three – that is, forany n can you construct an example query for which n distinct query plansare optimal, depending on parameter values? How fast do the query, databaseschema and/or database instance sizes grow as a function of n? (Hint: I don’tknow the best answer to this last part).Problem 4b: Now consider varying the size of the available buffer pool, keep-ing parameter values constant.Every query plan has a required space, which is the least number of buffers Bwith which it is feasible to execute the plan at all, and a maximum cost, whichis the cost of executing the plan in its required space. In the I/O cost model acouple of things are obvious:The execution cost decreases monotonically with B (this justifies the terminol-ogy “maximum cost” above).In the limit as B → ∞, the cost of any query plan is at most the sum of thesizes of its leaf relations (why?).Now every query plan has an unconstrained cost, which is the cost of evaluatingit with an infinite, initially-empty buffer pool, and an optimum space, which isthe least number of buffers B for which the plan achieves its unconstrained cost.What are the values of the above parameters for the following query processingalgorithms:2Nested loop join of relations A and B.Index nested loop join of relations A and B using an unclustered B+ tree indexon B.Select by full table scan of relation A.Select by range query using (clustered or unclustered) B+ tree index on A.State any assumptions you need to make about physical properties of the database,selectivities, etc.Problem 4c: Repeat (4a) but for buffer pool size rather than query param-eter. That is, for various values of n give examples for which n different queryplans are optimal, depending on the buffer pool size B. How does the size ofthe example (query, schema, instance, indices) depend on n? (The same hintapplies: I don’t know the optimal

View Full Document


School:
Email:
New Password:
Confirm Password:

CORNELL CS 632 - Final Exam Part C

Sign up for free to view:

Please select your school