Duke CPS 116 - Physical Data Organization - D2080936

Home> Schools> Duke University> (CPS) > CPS 116> Physical Data Organization

DOC PREVIEW

Duke CPS 116 - Physical Data Organization

School name Duke University

Course Cps 116- Introduction to Database Systems

Pages 7

This preview shows page 1-2 out of 7 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1Physical Data OrganizationCPS 116Introduction to Database Systems2Announcements (November 3) Homework #3 due today Project milestone #2 due in a week3Outline It’s all about disks! That’s why we always draw databases as  And why the single most important metric in database processing is the number of disk I/O’s performed Storing data on a disk Record layout Block layout24Storage hierarchyRegistersCacheMemoryDiskTapesWhy a hierarchy?5How far away is data?Location CyclesRegisters 1On-chip cache 2On-board cache 10Memory 100Disk 106Tape 109Location Time(Source: AlphaSort paper, 1995)) I/O dominates—design your algorithms to reduce I/O!6A typical diskSpindle rotationPlatterPlatterSpindlePlatterTracksArm movementDisk armDisk headCylinders“Moving parts” are slow37Top viewTrackTrackTrackSectorsHigher-density sectors on inner tracksand/or more sectorson outer tracksA block is alogical unitof transferconsisting ofone or more sectors8Disk access timeSum of: Seek time: time for disk heads to move to the correct cylinder Rotational delay: time for the desired block to rotate under the disk head Transfer time: time to read/write data in the block (= time for disk to rotate over the block)9Random disk accessSeek time + rotational delay + transfer time Average seek time Time to skip one half of the cylinders? “Typical” value: 5 ms  Average rotational delay Time for a half rotation (a function of RPM) “Typical” value: 4.2 ms (7200 RPM)410Sequential disk accessSeek time + rotational delay + transfer time Seek time 0 (assuming data is on the same track) Rotational delay 0 (assuming data is in the next block on the track) Easily an order of magnitude faster than random disk access!11Performance tricks Disk layout strategy Keep related things (what are they?) close together: same sector/block → same track → same cylinder → adjacent cylinder Double buffering While processing the current block in memory, prefetch the next block from disk (overlap I/O with processing) Disk scheduling algorithm Example: “elevator” algorithm Track buffer Read/write one entire track at a time Parallel I/O More disk heads working at the same time12Record layoutRecord = row in a table Variable-format records Rare in DBMS—table schema dictates the format Relevant for semi-structured data such as XML Focus on fixed-format records With fixed-length fields only, or With possible variable-length fields513Fixed-length fields All field lengths and offsets are constant Computed from schema, stored in the system catalog Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT);14204Bart (padded with space)2410 2.328 36Watch out for alignment May need to pad; reorder columns if that helps What about NULL?14Variable-length records Example: CREATE TABLE Student(SID INT, name VARCHAR(20), age INT, GPA FLOAT,comment VARCHAR(100)); Approach 1: use field delimiters (‘\0’ okay?) Approach 2: use an offset array Put all variable-length fields at the end (why?) Update is messy if it changes the length of a field14204Bart\010 2.3816Weird kid!\014204Bart10 2.3816Weird kid!18 22 3222 3215LOB fields Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT, picture BLOB(32000));Decomposition (automatically done by DBMS and transparent to the user) Student(SID, name, age, GPA) StudentPicture(SID, picture)616Block layoutHow do you organize records in a block? NSM (N-ary Storage Model) Most commercial DBMS PAX (Partition Attributes Across) Ailamaki et al., VLDB 200117NSM Store records from the beginning of each block Use a directory at the end of each block To locate records and manage free space Necessary for variable-length records142 Bart 10 2.3 123 Milhouse 10 3.1456 Ralph 8 2.3857 Lisa 8 4.3Why store data and directoryat two different ends?18Options Reorganize after every update/delete to avoid fragmentation (gaps between records) Need to rewrite half of the block on average What if records are fixed-length? Reorganize after delete• Only need to move one record• Need a pointer to the beginning of free space Do not reorganize after update• Need a bitmap indicating which slots are in use719Cache behavior of NSM Query: SELECT SID FROM Student WHERE GPA > 2.0;Assumption: cache block size < record size Lots of cache misses ID and GPA are not close enough by memory standards142 Bart 10 2.3 123 Milhouse 10 3.1456 Ralph 8 2.3857 Lisa 8 4.3142 Bart 10 2.3 123 Milhouse10 3.1 857 Lisa8 4.3 456 Ralph 8Cache2.3 20PAX Most queries only access a few columns Cluster values of the same columns in each block When a particular column of a row is brought into the cache, thesame column of the next row is brought in together142 123 857 4561111Bart Milhouse Lisa Ralph10 10 8 82.3 3.1 4.3 2.34(number of records)1111Reorganize after every update(for variable-length records only)and delete to keep fields together(IS NOT NULL bitmap)21Summary Storage hierarchy Why I/O’s dominate the cost of database operations Disk Steps in completing a disk access Sequential versus random accesses Record layout Handling variable-length fields Handling NULL Handling modifications Block layout NSM: the traditional layout PAX: a layout that tries to improve cache

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 7 pages.

Duke CPS 116 - Physical Data Organization

Sign up for free to view:

Please select your school