DOC PREVIEW
UW CSE 444 - Lecture Notes

This preview shows page 1-2-3-22-23-24-44-45-46 out of 46 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Lecture 19: Data Storage and IndexesOutlineFiles and TablesRepresenting Data ElementsIssuesRecord Formats: Fixed LengthRecord HeaderVariable Length RecordsStoring Records in BlocksSpanning Records Across BlocksBLOBFile TypesModifications: InsertionOverflow BlocksModifications: DeletionsModifications: UpdatesPointersIndexesIndex ClassificationPrimary IndexSlide 21Secondary IndexesClustered/UnclusteredClustered vs. Unclustered IndexSlide 25B+ TreesB+ Trees BasicsB+ Tree ExampleB+ Tree DesignSearching a B+ TreeB+ Trees in PracticeInsertion in a B+ TreeSlide 33Slide 34Slide 35Slide 36Slide 37Slide 38Deletion from a B+ TreeSlide 40Slide 41Slide 42Slide 43Slide 44Slide 45Summary on B+ Trees1Lecture 19: Data Storage and IndexesWednesday, November 15, 20062Outline•Representing data elements (12)•Index structures (13.1, 13.2)•B-trees (13.3)3Files and Tables•A disk = a sequence of blocks•A file = a subsequence of blocks, usually contiguous•Need to store tables/records/indexes in files/block4Representing Data Elements•Relational database elements:•A tuple is represented as a record•The table is a sequence of recordsCREATE TABLE Product (pid INT PRIMARY KEY,name CHAR(20),description VARCHAR(200),maker CHAR(10) REFERENCES Company(name))CREATE TABLE Product (pid INT PRIMARY KEY,name CHAR(20),description VARCHAR(200),maker CHAR(10) REFERENCES Company(name))5Issues•Represent attributes inside the records•Represent the records inside the blocs6Record Formats: Fixed Length•Information about field types same for all records in a file; stored in system catalogs.•Finding i’th field requires scan of record.•Note the importance of schema information!Base address (B)L1 L2L3 L4pid namedescr makerAddress = B+L1+L27Record HeaderL1 L2L3 L4To schemalengthtimestampNeed the header because:•The schema may changefor a while new+old may coexist•Records from different relations may coexistheaderpid namedescr maker8Variable Length RecordsL1 L2L3 L4Other header informationlengthPlace the fixed fields first: F1Then the variable length fields: F2, F3, F4Null values take 2 bytes onlySometimes they take 0 bytes (when at the end)headerpid namedescr maker9Storing Records in Blocks•Blocks have fixed size (typically 4k – 8k)R1R2R3BLOCKR410Spanning Records Across Blocks•When records are very large•Or even medium size: saves space in blocksblockheaderblockheaderR1 R2R2R311BLOB•Binary large objects•Supported by modern database systems•E.g. images, sounds, etc.•Storage: attempt to cluster blocks togetherCLOB = character large objec•Supports only restricted operations12File Types•Unsorted (heap)•Sorted (e.g. by pid)13Modifications: Insertion•File is unsorted: add it to the end (easy )•File is sorted:–Is there space in the right block ?•Yes: we are lucky, store it there–Is there space in a neighboring block ?•Look 1-2 blocks to the left/right, shift records–If anything else fails, create overflow block14Overflow Blocks•After a while the file starts being dominated by overflow blocks: time to reorganizeBlockn-1BlocknBlockn+1Overflow15Modifications: Deletions•Free space in block, shift records•Maybe be able to eliminate an overflow block•Can never really eliminate the record, because others may point to it–Place a tombstone instead (a NULL record)How can we point to a record in an RDBMS ?16Modifications: Updates•If new record is shorter than previous, easy •If it is longer, need to shift records, create overflow blocks17PointersLogical pointer to a record consists of:•Logical block number•An offset in the block’s headerNote: review what a pointer in C isWhere do weneed themin RDBMS ?Indexes•An index on a file speeds up selections on the search key fields for the index.–Any subset of the fields of a relation can be the search key for an index on the relation.–Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation).•An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given key value k.19Index Classification•Primary/secondary–Primary = may reorder data according to index–Secondary = cannot reorder data•Clustered/unclustered–Clustered = records close in the index are close in the data–Unclustered = records close in the index may be far in the data•Dense/sparse–Dense = every key in the data appears in the index–Sparse = the index contains only some keys•B+ tree / Hash table / …20Primary Index•File is sorted on the index attribute•Dense index: sequence of (key,pointer) pairs1020304050607080102030405060708021Primary Index•Sparse index1030507090110130150102030405060708022Secondary Indexes•To index other attributes than primary key•Always dense (why ?)1010202020303030203030201020103023Clustered/Unclustered•Primary indexes = usually clustered•Secondary indexes = usually unclusteredClustered vs. Unclustered IndexData entries(Index File)(Data file)Data RecordsData entriesData RecordsCLUSTEREDUNCLUSTERED25Secondary Indexes•Applications:–index other attributes than primary key–index unsorted files (heap files)–index clustered data26B+ Trees•Search trees•Idea in B Trees:–make 1 node = 1 block•Idea in B+ Trees:–Make leaves into a linked list (range queries are easier)27•Parameter d = the degree•Each node has >= d and <= 2d keys (except root)•Each leaf has >=d and <= 2d keys:B+ Trees Basics30 120 240Keys k < 30Keys 30<=k<120Keys 120<=k<240 Keys 240<=k40 50 604050 60Next leaf28B+ Tree Example8020 60 100 120 14010 15 18 20 30 40 50 60 65 80 85 9010 15 18 20 30 40 50 60 65 80 85 90d = 2Find the key 4040  8020 < 40  6030 < 40  4029B+ Tree Design•How large d ?•Example:–Key size = 4 bytes–Pointer size = 8 bytes–Block size = 4096 byes•2d x 4 + (2d+1) x 8 <= 4096•d = 17030Searching a B+ Tree•Exact key values:–Start at the root–Proceed down, to the leaf•Range queries:–As above–Then sequential traversalSelect nameFrom peopleWhere age = 25Select nameFrom peopleWhere age = 25Select nameFrom peopleWhere 20 <= age and age <= 30Select nameFrom peopleWhere 20 <= age and age <= 30B+ Trees in Practice•Typical order: 100. Typical fill-factor: 67%.–average fanout = 133•Typical capacities:–Height 4: 1334 = 312,900,700 records–Height 3: 1333 = 2,352,637 records•Can often hold top levels in buffer pool:–Level 1 = 1 page = 8


View Full Document

UW CSE 444 - Lecture Notes

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?