UW-Milwaukee COMPSCI 557 - Heap Files - D2777198

Home> Schools> University of Wisconsin-Milwaukee> Computer Science (COMPSCI) > COMPSCI 557> Heap Files

DOC PREVIEW

UW-Milwaukee COMPSCI 557 - Heap Files

School name University of Wisconsin-Milwaukee

Course Compsci 557- Introduction to Database Systems

Pages 17

This preview shows page 1-2-3-4-5-6 out of 17 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

TodayDirectory of Slots ExampleHeap FilesHeap File ExampleProg 1 HintsSlide 6Slide 7Slide 8Slide 9Ordered FilesFile of Ordered RecordsHashed FilesHashed Files (contd.)Slide 14Slide 15Hashed Files - Overflow handlingFill in This TableToday•Review of Directory of Slot Block Organizations•Heap Files•Program 1 Hints•Ordered Files & Hash Files•RAIDDirectory of Slots ExampleHeap Files•Heap files are stored as unordered records–the use of “heap” here is unrelated to the “free store” used for dynamic memory allocation•The simplest organizationHeap File ExampleParadise, Sal 231Favor, Sue 123Mach, Chris 401Rodgers, Bill 616Smith, MaryYost, Ned 819Alm, LouisLink, StevePatch, LindaJones, Jim 9832819314183 Ming, YaoTuring, Alan933709block 1 of file block 2 of file block N of fileName IDName ID Name IDassuming N data blocks, R records per block, I/O cost of D and “record processing time” of C what are the costs of:2D+C2D+CN(D+RC)N(D+RC)/2N(D+RC)inserting a record, deleting record given RID (ignore reclaiming space)scan, search for “key” w/ equality selection,search w/ range selection?Prog 1 Hints•check slot numbers to make sure they are valid•sizeof(), memcpy(), memmove()–man is your friend•test patterns – make sure you handle error cases correctly•error reportingclass HFPage { struct slot_t { short offset; short length; }; // equals EMPTY_SLOT if slot is not in use static const int DPFIXED = sizeof(slot_t) + 4 * sizeof(short)+ 3 * sizeof(PageId); short slotCnt; // number of slots in use short usedPtr; // offset of first used byte in data[] short freeSpace; // number of bytes free in data[] short type; // an arbitrary value used by subclasses as needed PageId prevPage; // backward pointer to data page PageId nextPage; // forward pointer to data page PageId curPage; // page number of this page slot_t slot[1]; // first element of slot array. char data[MAX_SPACE - DPFIXED]; // methods ...// **********************************************************// page class constructorvoid HFPage::init(PageId pageNo){ nextPage = prevPage = INVALID_PAGE; slotCnt = 0; // no slots in use curPage = pageNo; usedPtr = sizeof(data); // offset of used space in data array freeSpace = sizeof(data) + sizeof(slot_t); // amount of space available // (initially one unused slot)}•init()•getNextPage(), setNextPage()•getPrevPage(), setPrevPage()•insertRecord(), deleteRecord()•firstRecord(), nextRecord()•getRecord(), returnRecord()•available_space()•empty()int HFPage::available_space(void){ // look for an empty slot. if one exists, then freeSpace // bytes are available to hold a record. int i; for (i=0; i < slotCnt; i++) { if (slot[i].length == EMPTY_SLOT) return freeSpace; } // no empty slot exists. must reserve sizeof(slot_t) bytes // from freeSpace to hold new slot. return freeSpace - sizeof(slot_t);}Ordered Files•Also called a sequential file.•File records are kept sorted by the values of an ordering field.•Insertion is expensive: records must be inserted in the correct order.–It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.•A binary search can be used to search for a record on its ordering field value.–This requires reading and searching log2 of the file blocks on the average, an improvement over linear search.•Reading the records in order of the ordering field is quite efficient.File of Ordered RecordsHashed Files•Hashing for disk files is called External Hashing•The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ..., bucketM-1.–Typically, a bucket corresponds to one (or a fixed number of) disk block.•One of the file fields is designated to be the hash key of the file.•The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function.•Search is very efficient on the hash key.•Collisions occur when a new record hashes to a bucket that is already full.–An overflow file is kept for storing such records.–Overflow records that hash to each bucket can be linked together.Hashed Files (contd.)Hashed Files (contd.)•There are numerous methods for collision resolution, including the following:–Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found. –Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. –Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.Hashed Files (contd.)•To reduce overflow records, a hash file is typically kept 70-80% full.•The hash function h should distribute the records uniformly among the buckets–Otherwise, search time will be increased because many overflow records will exist.•Main disadvantages of static external hashing:–Fixed number of buckets M is a problem if the number of records in the file grows or shrinks.–Ordered access on the hash key is quite inefficient (requires sorting the records).Hashed Files - Overflow handlingFill in This TableHeapSortedHashedscan equality range insert delete search

View Full Document