DOC PREVIEW
Duke CPS 116 - Physical Data Organization

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1Physical Data OrganizationCPS 116Introduction to Database Systems2Announcements (November 4) Homework #3 due this Thursday Help session this Wednesday 5-6pm in LSRC D344 Project milestone #2 due next Tuesday3Outline It’s all about disks! That’s why we always draw databases as  And why the single most important metric in database processing is the number of disk I/O’s performed Storing data on a disk Record layout Block layout4Storage hierarchyRegistersCacheMemoryWhy a hierarchy?DiskTa p e sWhy a hierarchy?5How far away is data?Location CyclesRegisters 1On-chip cache 2On-board cache 10Location TimeMy head 1 min.This room 2 min.Duke campus 10 min.Memory 100Disk 106Tape 109pWa s h i n g t o n D.C. 1.5 hr.Pluto 2 yr.Andromeda 2000 yr.(Source: AlphaSort paper, 1995)) I/O dominates—design your algorithms to reduce I/O!6A typical diskPlatterPlatterTr a c k sSpindle rotationPlatterSpindleArm movementDisk armDisk headCylinders“Moving parts” are slow27Top viewTr a c kTr a c kTr a c kSectorsHigher-density sectors on inner tracksand/or more sectorson outer tracksA block is alogical unitof transferconsisting ofone or more sectors8Disk access timeSum of: Seek time: time for disk heads to move to the correct cylinder Rotational delay: time for the desired block to rotate yunder the disk head Transfer time: time to read/write data in the block (= time for disk to rotate over the block)9Random disk accessSeek time + rotational delay + transfer time Average seek time Time to skip one half of the cylinders? Not quite; should be time to skip a third of them (why?) “Typical” value: 5 ms  Average rotational delay Time for a half rotation (a function of RPM) “Typical” value: 4.2 ms (7200 RPM)10Sequential disk accessSeek time + rotational delay + transfer time Seek time 0 (assuming data is on the same track) Rotational delayy 0 (assuming data is in the next block on the track) Easily an order of magnitude faster than random disk access!11Performance tricks Disk layout strategy Keep related things (what are they?) close together: same sector/block → same track → same cylinder → adjacent cylinder Double buffering While processing the current block in memory, prefetch the next bl k f di k ( l I/O i h i )block from disk (overlap I/O with processing) Disk scheduling algorithm Example: “elevator” algorithm Track buffer Read/write one entire track at a time Parallel I/O More disk heads working at the same time12Record layoutRecord = row in a table Variable-format records Rare in DBMS—table schema dictates the format Relevant for semi-structured data such as XML Focus on fixed-format records With fixed-length fields only, or With possible variable-length fields313Fixed-length fields All field lengths and offsets are constant Computed from schema, stored in the system catalog Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT);44614204Bart (padded with space)2410 2.328 36 Watch out for alignment May need to pad; reorder columns if that helps What about NULL? Add a bitmap at the beginning of the record14Variable-length records Example: CREATE TABLE Student(SID INT, name VARCHAR(20), age INT, GPA FLOAT,comment VARCHAR(100)); Approach 1: use field delimiters (‘\0’ okay?)14204Bt\01023816W i d kid!\0 Approach 2: use an offset array Put all variable-length fields at the end (why?) Update is messy if it changes the length of a field142Bart\0102.3Weird kid!\014204Bart10 2.3816Weird kid!18 22 3222 3215LOB fields Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT, picture BLOB(32000));Student records get “de-clustered” Bad because most queries do not involve picture Decomposition (automatically done by DBMS and transparent to the user) Student(SID, name, age, GPA) StudentPicture(SID, picture)16Block layoutHow do you organize records in a block? NSM (N-ary Storage Model) Most commercial DBMS PAX(Partition Attributes Across)() Ailamaki et al., VLDB 200117NSM Store records from the beginning of each block Use a directory at the end of each block To locate records and manage free space Necessary for variable-length records142 Bart 10 2.3 123 Milhouse 10 3.1456 Ralph 8 2.3857 Lisa 8 4.3Why store data and directoryat two different ends?Both can grow easily18Options Reorganize after every update/delete to avoid fragmentation (gaps between records) Need to rewrite half of the block on average What if records are fixed-length? Reorganize after delete• Only need to move one record• Need a pointer to the beginning of free space Do not reorganize after update• Need a bitmap indicating which slots are in use419Cache behavior of NSM Query: SELECT SID FROM Student WHERE GPA > 2.0;Assumption: cache line size < record size Lots of cache missesSIDandGPAare not close enough by memory standardsSIDand GPAare not close enough by memory standards142 Bart 10 2.3 123 Milhouse 10 3.1456 Ralph 8 2.3857 Lisa 8 4.3142 Bart 10 2.3 123 Milhouse10 3.1 857 Lisa8 4.3 456 Ralph 8Cache2.3 20PAX Most queries only access a few columns Cluster values of the same columns in each block When a particular column of a row is brought into the cache, the same column of the next row is brought in togetherReorganize after every update142 123 857 4561111Bart Milhouse Lisa Ralph10 10 8 82.3 3.1 4.3 2.34(number of records)1111(for variable-length records only)and delete to keep fields together(IS NOT NULL bitmap)21Summary Storage hierarchy Why I/O’s dominate the cost of database operations Disk Steps in completing a disk access Sequential versus random accesses Record layout Handling variable-length fields Handling NULL Handling modifications Block layout NSM: the traditional layout PAX: a layout that tries to improve cache


View Full Document

Duke CPS 116 - Physical Data Organization

Documents in this Course
Part I

Part I

8 pages

XSLT

XSLT

4 pages

XSLT

XSLT

8 pages

Part I

Part I

8 pages

XSLT

XSLT

8 pages

Load more
Download Physical Data Organization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Physical Data Organization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Physical Data Organization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?