DOC PREVIEW
MIT 6 033 - TRIPLET

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

KiKiDB:a triplet store with two-key indexingJoe [email protected] Design Project 1Prof. Steve Ward TR2-3TA. David SchultzMarch 22, 20071. IntroductionA triplet is a three field data structure represented abstractly as (subject String, relationship String, object String). Efficient triplet storage is essential for effectively browsing large relational databases such as those in E-commerce applications1 or the RDF graph data of the Semantic Web2. Applications of triplet stores typically have many subjects with related objects and far fewer types of relationships. Users of these applications are not usually interested in accessing a single triplet at a time. Instead, they want to access a collection of properties about a given object. Searching against just two of the fields for all matching triplets is the most common operation. This triplet store design is optimized for performance for this type of search.KiKiDB is a two-key indexed triplet store designed to excel in environments where disk space is cheap and low latency searches are required. A triplet stored in the system is replicated in three hash tables uniquely indexed for efficient two-key searches.Each hash table is indexed with two different triplet fields as shown in Figure 1. The cache partition at the end of the disk stores RAM cache objects during system shutdown.KiKiDB is desirable when disk space is inexpensive and search patterns are known to frequently take advantage of the system's two-key indexes.2. Design DescriptionKiKiDB runs as a network server with RPC calls for manipulating the storage system. Section 2.1 presents the RPC interfaces, 2.2 describes the storage system's disk structures, and 2.3 describes the implementation of the RPC API calls.1Figure 1. a terabyte physical disk partitioned for the KiKiDB triplet storage system2.1. Storage System InterfaceThis design supports an RPC interface with four calls for manipulating the triplet store. Insertion and deletion are named insert() and remove(). Their parameters are both similar to triplet form (subject String, relationship String, object String) and immediately affect the success of subsequent lookups. Lookups and searches are handled by find(). find() has 5 parameters (subject String, relationship String, object String, start integer, count integer). These parameters include the triplet fields like insert() and remove() except any of find()'s three string parameters also support the wildcard character (“*”) which instructs the server to match all triplets regardless of the “wildcarded” field's contents. The start and count parameters are helpful range settings for returning a portion of a search with many results. find(x,y,*,S,C) skips S triplets and then returns no more than C subsequent triplets.2.2. Disk StructuresThree hash tables are allocated on disk as shown in Figure 1. Hash tables contain triplets and index markers. Both of these are 128 bytes structures with two header status bits. The two status bits mark a sector when deleted and define it as a triplet or index marker accordingly. The triplet, index marker, hash table and cache partition disk structures are detailed in sections 2.2.1-2.2.4. Notes on garbage collection follow in Section 2.2.5.2.2.1. TripletsA triplet is a 128 byte sector stored at a hashed location in each of the three hash tables. Its data starts with two state bits followed by three null-terminated (“\0”) strings as shown in Figure 2. If more space is necessary to store the strings, the sector may occupy two sequential sectors and indicate presence of the second sector by omitting the null-terminator in the first sector (see Figure 3).2Figure 2. Two active triplets in the system: (bob, owns, windows)(bob, owns, osx)* addresses are disk locations in bytesFigure 3. A deleted triplet spanning two 128-byte sectors:(bob, productkey, qqgr4-asdft-...)Though the definition of status bits could vary between different implementations, this paper presents with a scheme consistent with Figure 4.2.2.2. Index MarkersAn index marker demarcates triplet sector groupings in hash tables as shown in Figure 5. A marker stores only the two fields indexed by its parent hash table. A third field is a binary integer that counts how many triplets matching the two indexed fields exist after the index marker. This improves search performance by advertising an expected number of results to find(). A count of N indicates that find() should stop seeking when searching after finding N matching triplets and return the results. An index marker is created when an inserted triplet has values for the table's indexed fields that do not have a preexisting index marker as shown in Figure 6. Index markers are also created by searches that returns no results to expedite future searches. Marker counts are incremented on insert(), decremented on remove(), and must precede all counted triplets.2.2.3. Hash TablesA hash table is formed of 4 KB disk blocks. Each block contains 32 active, deleted, or empty 128 byte triplet or marker sectors. The table location is obtained by hashing the two triplet fields it indexes. As shown in Figure 7, a function for hash table B in Figure 1 has the form hash(object String, relationship String). When a hash collision occurs during 3The marker above (windows, owns, 2) is two because one of the three matching triplets was deleted.Markers may be out of order due to order of inserts, deletes, and finds or hash collisions yet still improve performance.Figure 5. Two possible sequences of markers and triplets from hash table B of Figure 1.Active DeletedTriplets 1 0 0 0Index Markers 1 1 0 1Figure 4. Status bit definitionsFigure 6. Inserting a triplet:(1, 0, foo, has, dog)to the 1st table of Figure 5 requires inserting a marker:(1, 1, dog, has, 1)The new index marker is inserted at 256 to replace the previously deleted sector and the new triplet is added at 512, the next empty sector after its marker.insert(), the system references the disk usage bitmap3 to locate the next empty sector using linear probing4. When placing an index marker sector, the next deleted or empty sector after the hashed location for the index marker is selected. When placing a triplet sector, the next deleted or empty sector after its index marker is selected. If an empty sector is reached in the seek for the index marker, a new index marker must be inserted as discussed in Section 2.2.2. The new index marker


View Full Document

MIT 6 033 - TRIPLET

Documents in this Course
End Layer

End Layer

11 pages

Quiz 1

Quiz 1

4 pages

Threads

Threads

18 pages

Quiz I

Quiz I

15 pages

Atomicity

Atomicity

10 pages

QUIZ I

QUIZ I

7 pages

Load more
Download TRIPLET
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view TRIPLET and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view TRIPLET 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?