This preview shows page 1-2-3-4 out of 11 pages.

View Full Document
View Full Document

End of preview. Want to read all 11 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document
Unformatted text preview:

6.830 Lab 4: Query Optimization6.830 Lab 4: Query OptimizationAssigned: 11/04/10 Due: 11/23/10 Version History: ● 11/02/10 : Initial version In this lab, you will implement a query optimizer on top of SimpleDB. The main tasks include implementing a selectivity estimation framework and a cost-based optimizer. You have freedom as to exactly what you implement, but we recommend using something similar to the Selinger cost-based optimizer discussed in class. The remainder of this document describes what is involved in adding optimizer support and provides a basic outline of how you might add this support to your database. As with the previous lab, we recommend that you start as early as possible. Locking and transactions can be quite tricky to debug! 0. Find bugs, be patient, earn candybars It is very possible you are going to find bugs, inconsistencies, and bad, outdated, or incorrect documentation, etc. We apologize profusely. We did our best, but, alas, we are fallible human beings. We ask you, therefore, to do this lab with an adventurous mindset. Don't get mad if something is not clear, or even wrong; rather, try to figure it out yourself or send us a friendly email. We promise to help out by sending bugfixes, new tarballs, etc. ...and if you find a bug in our code, we'll give you a candybar (see Section 3.3)! 1. Getting started You should begin with the code you submitted for Lab 3 (if you did not submit code for Lab 3, or your solution didn't work properly, contact us to discuss options.) We have provided you with extra test cases as well as source code files for this lab that are not in the original code distribution you received. We reiterate that the unit tests we provide are to help guide your implementation along, but they are not intended to be comprehensive or to establish correctness. You will need to add these new test cases to your release. The easiest way to do this is to untar the new code in the same directory as your top-level simpledb directory, as follows: ● Make a backup of your Lab 3 solution by typing : $ tar -cvzf 6.830-lab3-submitted.tar.gz 6.830-lab3● Rename the directory that contains the code: $ mv 6.830-lab3 6.830-lab46.830 Lab 4: Query Optimization● Download the new tests and skeleton code for Lab 4 from http://db.csail.mit.edu/6.830/6.830-lab4-supplement.tar.gz: $ wget http://db.csail.mit.edu/6.830/6.830-lab4-supplement.tar.gz● Extract the new files for Lab 4 by typing: tar -xvzf 6.830-lab4-supplement.tar.gzThe following files are modified and the tar command should overwrite them in your lab4 directory: ❍ src/Parser.java❍ src/simpledb/SimpleDb.java❍ src/simpledb/Utility.java❍ test/systemtest/SystemTestUtil.javaNote that the changes include a new SimpleDb.java file under src/simpledb. The modified Parser.java refers to this SimpleDb.java. It's safe to delete src/SimpleDb.java. If you have modified either of these files, you should be able to simply install our new version, though you may wish to double check with us before doing so. 2. Optimizer outline Recall that the main idea of a cost-based optimizer is to: ❍ Use statistics about tables to estimate "costs" of different query plans. Typically, the cost of a plan is related to the of cardinalities (number of tuples produced by) intermediate joins and selections, as well as the selectivity of selection and join predicates. ❍ Use these statistics to order joins and selections in an optimal way, and to select the best implementation for join algorithms from amongst several alternatives. In this lab, you will implement code to perform both of these functions. The optimizer will be invoked from simpledb/Parser.java. You may wish to review the lab 2 parser exercise before starting this lab. Briefly, if you have a catalog file catalog.txt describing your tables, you can run the parser by typing (from the simpledb/ directory): java -classpath bin/src/:lib/jline-0.9.94.jar:lib/sql4j.jar:lib/zql.jar simpledb.Parser catalog.txt(Note that this method of invocation is slightly different from that given in lab2 because we inadvertedly distributed a simpledb.java class that lacked the "parser" option. If you made the method in lab2 work, you may run the parser in that way as well.) When the Parser is invoked, it will compute statistics over all of the tables (using statistics code you provide). When a query is issued, the parser will convert the query into a logical plan representation and then call your query optimizer to generate an optimal plan. The overall control flow of the SimpelDB modules of the parser and optimizer is shown in Figure 1. The key at6.830 Lab 4: Query Optimizationthe bottom explains the symbols; you will implement the components with double-borders. The classes and methods will be explained in more detail in the text that follows (you may wish to refer back to this diagram), but the basic operation is as follows: 1. Parser.java constructs a set of table statistics (the tableStatsMap) when it is initialized. It then waits for a query to be input, and calls parseQuery on that query. parseQuery constructs a LogicalPlan that represents the parsed query.2. parseQuery calls physicalPlan on the LogicalPlan it has constructed. physicalPlan will return a DBIterator object that can be used to actually run the query. You will implement the methods that help physicalPlan devise an optimal plan. In particular:■ You must write the methods in tableStats that allow it to estimate the selectivities of selections and the cost of scans, using histograms (the IntHistogram class) or some other form of statistics of your devising. physicalPlan constructs a map called filterSelectivities using these methods to estimate the selectivities of the filter expressions over all the tables. ■ You must write the methods in joinOptimizer that allow it to estimate the cost and selectivities of joins.■ You must write the orderJoins method that produces an optimal ordering for a series of joins (likely using the Selinger algorithm studied in class), given the filterSelectivities and tableStats objects computed in prior steps. Figure 1: Diagram illustrating classes, methods, and objects used in the parser and optimizer.6.830 Lab 4: Query Optimization2.1. Statistics Estimation Accurately estimating plan cost is quite tricky. In this lab, we will focus only on the cost of sequences of joins and base table accesses. We won't worry about access method selection (since we only have


View Full Document
Loading Unlocking...
Login

Join to view Lab 4- Query Optimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lab 4- Query Optimization and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?