UW CSE 444 - Lecture Notes


Introduction to Database Systems
CSE 444
Lecture 15: Data Storage and Indexes
CSE 444 - Summer 2010

Where We Are
• How to use a DBMS as a:
  – Data analyst: SQL, SQL, SQL, …
  – Application programmer: JDBC, XML, …
  – Database administrator: tuning, triggers, security
  – Massive-scale data analyst: Pig/MapReduce
• How DBMSs work:
  – Transactions
  – Data storage and indexing
  – Query execution
• Databases as a service

Outline
• Storage model
• Index structures (Section 14.1)
• B-trees (Section 14.2)

Storage Model
• DBMS needs spatial and temporal control over storage
  – Spatial control for performance
  – Temporal control for correctness and performance
• Solution: Buffer manager inside DBMS (see past lectures)
• For spatial control, two alternatives
  – Use “raw” disk device interface directly
  – Use OS files

Spatial Control: Using “Raw” Disk Device Interface
• Overview
  – DBMS issues low-level storage requests directly to disk device
• Advantages
  – DBMS can ensure that important queries access data sequentially
  – Can provide highest performance
• Disadvantages
  – Requires devoting entire disks to the DBMS
  – Reduces portability as low-level disk interfaces are OS specific
  – Many devices are in fact “virtual disk devices”

Spatial Control: Using OS Files
• Overview
  – DBMS creates one or more very large OS files
• Advantages
  – Allocating large file on empty disk can yield good physical locality
• Disadvantages
  – OS can limit file size to a single disk
  – OS can limit the number of open file descriptors
  – But these drawbacks have mostly been overcome by modern OSs

Commercial Systems
• Most commercial systems offer both alternatives
  – Raw device interface for peak performance
  – OS files more commonly used
• In both cases, we end up with a DBMS file abstraction implemented on top of OS files or raw device interface

Outline
• Storage model
• Index structures (Section 14.1)
  – [Old edition: 13.1 and 13.2]
• B-trees (Section 14.2)
  – [Old edition: 13.3]

Database File Types
The data file can be one of:
• Heap file
  – Set of records, partitioned into blocks
  – Unsorted
• Sequential file
  – Sorted according to some attribute(s) called key
Note: “key” here means something different from “primary key”

Index
• A (possibly separate) file that allows fast access to records in the data file
• The index contains (key, value) pairs:
  – The key = an attribute value
  – The value = one of:
    • pointer to the record (secondary index)
    • or the record itself (primary index)
Note: “key” (aka “search key”) again means something different from “primary key”

Index Classification
• Clustered/unclustered
  – Clustered = data file is ordered by the index’s search key
  – Unclustered = otherwise
• Primary/secondary
  – Meaning 1: same as clustered/unclustered
  – Meaning 2:
    • Primary = index over set of fields that include the primary key
    • Secondary = not primary; index cannot reorder data, does not determine data location
• Organization: B+ tree or Hash table

Clustered Index
• File is sorted on the index attribute
• Only one per table
[Figure: index file with keys 10 through 80 pointing into a data file sorted on the same keys]

Unclustered Index
• Several per table
[Figure: index file entries (keys 10, 20, 30) pointing into a data file that is not sorted on the index key]

Clustered vs. Unclustered Index
[Figure: a B+ tree over data entries (index file) and the data records (data file), shown for the CLUSTERED and UNCLUSTERED cases]

B+ Trees
• Search trees
• Idea in B Trees
  – Make 1 node = 1 block
• Idea in B+ Trees
  – Make leaves into a linked list: facilitates range queries

B+ Trees Basics
• Parameter d = the degree
• Each node has >= d and <= 2d keys (except root)
• A node with m keys also has m+1 pointers
• Each leaf has >= d and <= 2d keys
[Figure: inner node with keys 30, 120, 240 and four pointers, leading to: keys k < 30; keys 30 <= k < 120; keys 120 <= k < 240; keys 240 <= k]
[Figure: leaf with keys 40, 50, 60 and a pointer to the next leaf]

B+ Tree Example
d = 2; find the key 40
[Figure: root: 80; inner nodes: 20 60 | 100 120 140; leaves: 10 15 18 | 20 30 40 50 | 60 65 | 80 85 90]
Search path: 40 ≤ 80 at the root; 20 < 40 ≤ 60 at the inner node; 30 < 40 ≤ 40 at the leaf

Using a B+ Tree
Index on People(age)
• Exact key values:
  – Start at the root
  – Proceed down, to the leaf
    Select name
    From People
    Where age = 25
• Range queries:
  – As above
  – Then sequential traversal
    Select name
    From People
    Where 20 <= age and age <= 30

B+ Tree Design
• How large is d?
• Example:
  – Key size = 4 bytes
  – Pointer size = 8 bytes
  – Block size = 4096 bytes
• 2d x 4 + (2d+1) x 8 <= 4096
• d = 170

B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%
  – average fanout = 133
• Typical capacities
  – Height 4: 133^4 = 312,900,721 records
  – Height 3: 133^3 = 2,352,637 records
• Can often hold top levels in buffer pool
  – Level 1 = 1 page = 8 Kbytes
  – Level 2 = 133 pages = about 1 Mbyte
  – Level 3 = 17,689 pages = about 133 Mbytes

Insertion in a B+ Tree
Insert (K, P)
• Find leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split node, insert in parent:
[Figure: node K1 K2 K3 K4 K5 with pointers P0 to P5 splits into a node K1 K2 (pointers P0 P1 P2) and a node K4 K5 (pointers P3 P4 P5); K3 goes to the parent]
• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only

Insertion in a B+ Tree
Insert K=19
[Figure: root: 80; inner nodes: 20 60 | 100 120 140; leaves: 10 15 18 | 20 30 40 50 | 60 65 | 80 85 90]

Insertion in a B+ Tree
After insertion
[Figure: root: 80; inner nodes: 20 60 | 100 120 140; leaves: 10 15 18 19 | 20 30 40 50 | 60 65 | 80 85 90]

Insertion in a B+ Tree
Now insert 25
[Figure: root: 80; inner nodes: 20 60 | 100 120 140; leaves: 10 15 18 19 ...]
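The arithmetic on the "B+ Tree Design" and "B+ Trees in Practice" slides can be checked with a few lines of Python. This is only a sketch: the function names `max_degree` and `capacity` are made up for illustration, and the capacity formula simply assumes every node has the average fanout.

```python
def max_degree(key_size: int, pointer_size: int, block_size: int) -> int:
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    # Rearranged: 2d*(key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


def capacity(fanout: int, height: int) -> int:
    """Records reachable if every node on every level has `fanout` children."""
    return fanout ** height


d = max_degree(key_size=4, pointer_size=8, block_size=4096)
print(d)                 # 170
print(capacity(133, 3))  # 2352637
print(capacity(133, 4))  # 312900721
```

With 4-byte keys, 8-byte pointers, and 4096-byte blocks the inequality reduces to 24d + 8 <= 4096, which gives d = 170. The exact value of 133^4 is 312,900,721.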
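The lookup and range-query procedure from the "Using a B+ Tree" slide can be sketched as follows. Everything here is illustrative rather than taken from the lecture: the class names `Leaf` and `Inner` are hypothetical, the tree is a simplified two-level tree built from the leaf keys of the example slide (the slide's tree has one more level), and the routing rule follows the "B+ Trees Basics" slide, where pointer i covers keys with keys[i-1] <= k < keys[i].

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # linked list of leaves, left to right


@dataclass
class Inner:
    keys: List[int]
    children: list  # always len(keys) + 1 children


def find_leaf(node, key: int) -> Leaf:
    """Start at the root and proceed down to the leaf that may hold key."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key: int) -> bool:
    """Exact-match search."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return i < len(leaf.keys) and leaf.keys[i] == key


def range_query(root, lo: int, hi: int) -> List[int]:
    """Descend to the leaf for lo, then traverse the leaves sequentially."""
    leaf = find_leaf(root, lo)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result
            if k >= lo:
                result.append(k)
        leaf = leaf.next
    return result


# Hypothetical two-level tree with d = 2 over the example's leaf keys.
leaves = [Leaf([10, 15, 18]), Leaf([20, 30, 40, 50]), Leaf([60, 65]), Leaf([80, 85, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Inner([20, 60, 80], leaves)

print(lookup(root, 40))           # True
print(lookup(root, 25))           # False
print(range_query(root, 20, 30))  # [20, 30]
```

The range query touches inner nodes only once, on the way down; after that it follows the leaf links, which is why B+ trees chain their leaves.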
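The split rule from the "Insertion in a B+ Tree" slide can be sketched on key lists alone. This is a simplified illustration, not the lecture's code: the function names are hypothetical, pointers (P0 to P5) are left out, and propagating the separator into the parent is not shown. The result for inserting 25 is what the stated rule produces, not a copy of the slide's figure.

```python
from bisect import insort
from typing import List, Optional, Tuple


def insert_into_leaf(
    keys: List[int], key: int, d: int
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert key into a sorted leaf and split it if it overflows.

    Returns (left, right, separator). right and separator are None
    when the leaf still holds at most 2d keys.
    """
    keys = list(keys)  # work on a copy
    insort(keys, key)
    if len(keys) <= 2 * d:
        return keys, None, None
    # Overflow with 2d+1 keys: the middle key goes to the parent AND
    # stays as the first key of the right leaf.
    left, right = keys[:d], keys[d:]
    return left, right, right[0]


def split_inner(keys: List[int], d: int) -> Tuple[List[int], List[int], int]:
    """Split an overflowing inner node: the middle key moves up only."""
    assert len(keys) == 2 * d + 1
    return keys[:d], keys[d + 1:], keys[d]


# Insert K=19 into leaf 10 15 18 with d = 2: no overflow.
print(insert_into_leaf([10, 15, 18], 19, 2))      # ([10, 15, 18, 19], None, None)
# Insert 25 into the full leaf 20 30 40 50: overflow, so the leaf splits.
print(insert_into_leaf([20, 30, 40, 50], 25, 2))  # ([20, 25], [30, 40, 50], 30)
# Inner node K1..K5: K3 moves up and is kept in neither half.
print(split_inner([1, 2, 3, 4, 5], 2))            # ([1, 2], [4, 5], 3)
```

The only difference between the two split functions is whether the middle key is duplicated: a leaf must keep it because leaves hold the actual data entries, while an inner node only needs it as a routing key in the parent.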

