NU EECS 339 - Project C - B+Tree - D825437

Home> Schools> Northwestern University> Elect Engineering & Comp Sci (EECS) > EECS 339> Project C - B+Tree

NU EECS 339 - Project C - B+Tree

Course Eecs 339- Introduction to Databases

Pages 8

Download Save

Unformatted text preview:

EECS 339 Project C DindaProject C: B+TreeTHIS PROJECT HAS BEEN UPDATED TOREFLECT THE REDUCED TIME AVAILABLEIn this last project, you will implement a B+Tree index in C++. At the end of the project,you will have a C++ class that conforms to a specific interface. Your class is then usedby various command-line tools. The tools will let you create and manipulate persistentB+Tree indexes stored in virtual disks and accessed through a buffer cache thatmanipulates disk blocks or pages. The tools will also tell you what the performance is, interms of how long individual operations take and how many disk reads and writes youdo. The I/O model of computation is used – we only count disk time. You can assume that requests to the B+Tree are serialized, meaning that you can finish arequest before starting the next one. In a real database system, however, locking andlogging are used to allow multiple requests to simultaneously execute on the tree. If youreally feel ambitious, you can add support for this for extra credit. You can assume that keys and values in the B+Tree are of fixed size and given when theB+Tree is initialized. In a real database system, however, keys and values can be ofvariable size. You are welcome to add support for variable length keys and values forextra credit. You will be implementing a “pure” B+Tree. In a B+Tree, leaf nodes hold keys andvalues, while interior nodes hold keys and block pointers. You can also optionally chainthe B+Tree leaf nodes together into a linked list, making range queries much faster.Your class will be evaluated using a test harness that will evaluate its correctness andperformance. The test harness will generate a random, but repeatable stream of requests,run them through your implementation and a reference implementation, compare theoutputs, pointing out errors in your implementation, and presenting a performancenumber. We will grade your project based on correctness1 using a random request streamgenerated from a particular seed. We won’t tell you which seed, but you can test your1 For this year, performance will not be part of the grading criteria.Page 1 of 8EECS 339 Project C Dindaprogram using lots of different seeds. Over the course of the project, we may also handbinaries of other implementations.This project may be done in groups of up to three people. Getting and installing the frameworkTo install the framework, log into your account on 339 and do the following:cd ~tar xvfz ~cs339/HANDOUT/btree/btree_lab.tgzcd btree_labexport PATH=$PATH:.more READMEThe README file will give you detailed instructions on how to configure the frameworkand verify that it is working. You will be writing btree.cc and btree.h, and adding otherfiles as you see fit. Note test.pl – it is the test harness mentioned above. ref_impl.pl isthe reference implementation. You can use test_me.pl to run your implementationagainst the reference implementation. Your implementation will be executed via sim.ccThe code will compile and work on any Unix-like environment, provided you have gccand Perl. If you want graphical displays of your B+Trees, you’ll also need to AT&T’sGraphViz package (which is free). We’ve had no trouble running this code on otherLinux machines, and on Cygwin (free) on Windows. 339 has all the necessary toolsinstalled. Your code will be tested on 339.Because you can graphically display trees, you may find it useful to have X or VNC toprovide remote access to graphical applications or desktops running on the Unixmachine. Please see the handout “Using Unix Remotely Without The ExcruciatingPain”, which is available from the course web site for information on how to do this. B+Tree operations and the command-lineAt a high-level of abstraction, a B+Tree is a mapping from keys to values. B+Trees canrequire that all keys be unique, but it’s not necessary – there is a distinction between akey in a B+Tree and a key in relational database terminology. This is also necessary forSQL. In SQL, it is perfectly OK to create an index on some attribute or set of attributesPage 2 of 8EECS 339 Project C Dindathat form neither a key or superkey. Your implementation can assume that all keys areunique.A B+Tree implementation must perform the following operations:• Initialize: create a new B+Tree structure on the disk – “format” or “mkfs” in afile system. • Attach: open a B+Tree for use. In our API, this is combined with initialization.You can ask for the B+Tree to be attached with our without initialization.• Insert (key, value)• Update (key, value)• Delete (key) Eliminated due to lack of time• Lookup (key) : returns value associated with the keyIn addition, your B+Tree implementation will also support the following operations:• Sanity Check: do a self-check of the tree looking for problems – “chkdsk” or“fsck” in a file system.• Display: do a traversal of the B+Tree, printing out the sorted (key,value) pairs inascending order of the keys. This is coupled with scripts based on GraphViz toallow you to graphically visualize your trees.The btree_* tools that are built implement these operations. This lets you manipulate aB+Tree from the command line. At the end of each execution, the performance statisticsare printed. The sim tool reads a sequence of these operations, starting with an initialization, fromstandard input and applies them. The results of each operation are printed. At the veryend, the performance statistics for the entire run are printed.What does the B+Tree look like on the disk?A B+Tree on the disk looks a lot like a file system on a disk. The blocks of the disk areused to store B+Tree nodes. B+Tree nodes come in two forms:• Internal nodes: These store keys and pointers.• Leaf nodes: These store keys and their associated values.By pointer, what I mean is a disk block number (the blocks are numbered from 0 to thetotal number of blocks minus one. The size of a block is determined when the disk isPage 3 of 8EECS 339 Project C Dindacreated. Since the B+Tree nodes are the size of disk blocks, we often use the words“node” and “block” interchangeably The size of a key and the size of a value are determined when the B+Tree is initializedand need not be the same (and generally are not). Because of this variation, you will needserializers/unserializers that read and write disk blocks into appropriate in-memorystructures.2 The

View Full Document


School:
Email:
New Password:
Confirm Password:

NU EECS 339 - Project C - B+Tree

Sign up for free to view:

Please select your school