Unformatted text preview:

EECS 339 Project C Dinda Page 1 of 7 Project C: B+Tree In this last project, you will implement a B+Tree index in C++. At the end of the project, you will have a C++ class that conforms to a specific interface. Your class is then used by various command-line tools. The tools will let you create and manipulate persistent B+Tree indexes stored in virtual disks and accessed through a buffer cache that manipulates disk blocks or pages. The tools will also tell you what the performance is, in terms of how long individual operations take and how many disk reads and writes you do. The I/O model of computation is used – we only count disk time. You can assume that requests to the B+Tree are serialized, meaning that you can finish a request before starting the next one. In a real database system, however, locking and logging are used to allow multiple requests to simultaneously execute on the tree. If you really feel ambitious, you can add support for this for extra credit. You can assume that keys and values in the B+Tree are of fixed size and given when the B+Tree is initialized. In a real database system, however, keys and values can be of variable size. You are welcome to add support for variable length keys and values for extra credit. You will be implementing a “pure” B+Tree. In a B+Tree, leaf nodes hold keys and values, while interior nodes hold keys and block pointers. You can also optionally chain the B+Tree leaf nodes together into a linked list, making range queries much faster. Your class will be evaluated using a test harness that will evaluate its correctness and performance. The test harness will generate a random, but repeatable stream of requests, run them through your implementation and a reference implementation, compare the outputs, pointing out errors in your implementation, and presenting a performance number. We will grade your project based on correctness1 using a random request stream generated from a particular seed. We won’t tell you which seed, but you can test your program using lots of different seeds. Over the course of the project, we may also hand binaries of other implementations. This project may be done in groups of up to three people. Getting and installing the framework To install the framework, log into your account on 339 and do the following: cd ~ tar xvfz ~cs339/HANDOUT/btree_lab.tgz cd btree_lab 1 For this year, performance will not be part of the grading criteria.EECS 339 Project C Dinda Page 2 of 7 export PATH=$PATH:. more README The README file will give you detailed instructions on how to configure the framework and verify that it is working. You will be writing btree.cc and btree.h, and adding other files as you see fit. Note test.pl – it is the test harness mentioned above. ref_impl.pl is the reference implementation. You can use test_me.pl to run your implementation against the reference implementation. Your implementation will be executed via sim.cc The code will compile and work on any Unix-like environment, provided you have gcc and Perl. If you want graphical displays of your B+Trees, you’ll also need to AT&T’s GraphViz package (which is free). We’ve had no trouble running this code on other Linux machines, and on Cygwin (free) on Windows. Your code will be tested on 339. Because you can graphically display trees, you may find it useful to have X or VNC. If you’re using an XServer (for example, on your own Linux box, or Cygwin or XWin32 on Windows), you can easily forward X traffic back your machine. Simply Enable X11 tunneling in your ssh client (ssh –X, for example). If you’d like to use VNC, you’ll need to explicitly set up an ssh tunnel. (ssh –Lmylocalport:339.cs.northwestern.edu:myvncport [email protected]). If the last two paragraphs made no sense to you, please ask and we will help. Knowing how to set up your own Unix or Unix-like development environment, and knowing how to do ssh tunneling for various purposes are very useful skills. B+Tree operations and the command-line At a high-level of abstraction, a B+Tree is a mapping from keys to values. B+Trees can require that all keys be unique, but it’s not necessary – there is a distinction between a key in a B+Tree and a key in relational database terminology. This is also necessary for SQL. In SQL, it is perfectly OK to create an index on some attribute or set of attributes that form neither a key or superkey. Your implementation can assume that all keys are unique. A B+Tree implementation must perform the following operations: • Initialize: create a new B+Tree structure on the disk – “format” or “mkfs” in a file system. • Attach: open a B+Tree for use. In our API, this is combined with initialization. You can ask for the B+Tree to be attached with our without initialization. • Insert (key, value) • Update (key, value) • Delete (key) • Lookup (key) : returns value associated with the key In addition, your B+Tree implementation will also support the following operations:EECS 339 Project C Dinda Page 3 of 7 • Sanity Check: do a self-check of the tree looking for problems – “chkdsk” or “fsck” in a file system. • Display: do a traversal of the B+Tree, printing out the sorted (key,value) pairs in ascending order of the keys. This is coupled with scripts based on GraphViz to allow you to graphically visualize your trees. The btree_* tools that are built implement these operations. This lets you manipulate a B+Tree from the command line. At the end of each execution, the performance statistics are printed. The sim tool reads a sequence of these operations, starting with an initialization, from standard input and applies them. The results of each operation are printed. At the very end, the performance statistics for the entire run are printed. What does the B+Tree look like on the disk? A B+Tree on the disk looks a lot like a file system on a disk. The blocks of the disk are used to store B+Tree nodes. B+Tree nodes come in two forms: • Internal nodes: These store keys and pointers. • Leaf nodes: These store keys and their associated values. By pointer, what I mean is a disk block number (the blocks are numbered from 0 to the total number of blocks minus one. The size of a block is determined when the disk is created. Since the B+Tree nodes are the size of disk blocks, we often


View Full Document

NU EECS 339 - Project C

Download Project C
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Project C and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Project C 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?