DOC PREVIEW
UW CSE 444 - Lecture Notes

This preview shows page 1-2-3 out of 8 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

11/2/10 1 Introduction to Database Systems CSE 444 Lecture 16: Data Storage and Indexes Magda Balazinska - CSE 444, Fall 2010 1 About the Midterm • Open book and open notes – But you won’t have time to read during midterm! – No laptops, no mobile devices • Four questions: 1. SQL 2. ER Diagrams / Database design 3. Transactions - recovery 4. Transactions - concurrency control CSE 444 - Spring 2009 2 More About the Midterm • Review Lectures 1 through 15 – Read the lecture notes carefully – Read the book for extra details, extra explanations • Review Project 1 (Project 2 not on any exam) • Review HW1 and HW2 • Take a look at sample midterms & finals CSE 444 - Spring 2009 3 Where We Are • How to use a DBMS as a: – Data analyst: SQL, SQL, SQL,… – Application programmer: JDBC, XML,… – Database administrator: tuning, triggers, security – Massive-scale data analyst: Pig/MapReduce • How DBMSs work: – Transactions – Data storage and indexing – Query execution • Databases as a service 4 Magda Balazinska - CSE 444, Fall 2010 5 Outline • Storage model • Index structures (Section 14.1) – [Old edition: 13.1 and 13.2] • B-trees (Section 14.2) – [Old edition: 13.3] Magda Balazinska - CSE 444, Fall 2010 Storage Model • DBMS needs spatial and temporal control over storage – Spatial control for performance – Temporal control for correctness and performance • Solution: Buffer manager inside DBMS (see past lectures) • For spatial control, two alternatives – Use “raw” disk device interface directly – Use OS files Magda Balazinska - CSE 444, Fall 2010 611/2/10 2 Magda Balazinska - CSE 444, Fall 2010 Spatial Control Using “Raw” Disk Device Interface • Overview – DBMS issues low-level storage requests directly to disk device • Advantages – DBMS can ensure that important queries access data sequentially – Can provide highest performance • Disadvantages – Requires devoting entire disks to the DBMS – Reduces portability as low-level disk interfaces are OS specific – Many devices are in fact “virtual disk devices” 7 Magda Balazinska - CSE 444, Fall 2010 Spatial Control Using OS Files • Overview – DBMS creates one or more very large OS files • Advantages – Allocating large file on empty disk can yield good physical locality • Disadvantages – OS can limit file size to a single disk – OS can limit the number of open file descriptors – But these drawbacks have mostly been overcome by modern OSs 8 Magda Balazinska - CSE 444, Fall 2010 Commercial Systems • Most commercial systems offer both alternatives – Raw device interface for peak performance – OS files more commonly used • In both cases, we end-up with a DBMS file abstraction implemented on top of OS files or raw device interface 9 10 Outline • Storage model • Index structures (Section 14.1) – [Old edition: 13.1 and 13.2] • B-trees (Section 14.2) – [Old edition: 13.3] Magda Balazinska - CSE 444, Fall 2010 11 Database File Types The data file can be one of: • Heap file – Set of records, partitioned into blocks – Unsorted • Sequential file – Sorted according to some attribute(s) called key “key” here means something else than “primary key” Magda Balazinska - CSE 444, Fall 2010 Index • A (possibly separate) file, that allows fast access to records in the data file given a search key • The index contains (key, value) pairs: – The key = an attribute value – The value = either a pointer to the record, or the record itself 12 “key” (aka “search key”) again means something else Magda Balazinska - CSE 444, Fall 201011/2/10 3 13 Index Classification • Clustered/unclustered – Clustered = records close in index are close in data – Unclustered = records close in index may be far in data • Primary/secondary – Meaning 1: • Primary = is over attributes that include the primary key • Secondary = otherwise – Meaning 2: means the same as clustered/unclustered • Organization: B+ tree or Hash table Magda Balazinska - CSE 444, Fall 2010 Clustered/Unclustered • Clustered – Index determines the location of indexed records – Typically, clustered index is one where values are data records (but not necessary) • Unclustered – Index cannot reorder data, does not determine data location – In these indexes: value = pointer to data record Magda Balazinska - CSE 444, Fall 2010 14 15 Clustered Index • File is sorted on the index attribute • Only one per table 10 20 30 40 50 60 70 80 10 20 30 40 50 60 70 80 Index File Data File 16 Unclustered Index • Several per table 10 10 20 20 20 30 30 30 20 30 30 20 10 20 10 30 Magda Balazinska - CSE 444, Fall 2010 Clustered vs. Unclustered Index Data entries (Index File) (Data file) Data Records Data entries Data Records CLUSTERED UNCLUSTERED B+ Tree B+ Tree 17 Magda Balazinska - CSE 444, Fall 2010 • More commonly, in a clustered B+ Tree index, data entries are data records Hash-Based Index Example CSE 444 - Spring 2009 18 10 21 20 20 30 18 40 19 50 22 60 18 70 21 80 19 H1 h1(sid) = 00 h1(sid) = 11 sid Example hash-based index on sid (student id) This is a primary index because it determines the location of indexed records In this case, data entries in the index are actual data records There is no separate data file This index is also clustered Index File Hash function h111/2/10 4 Hash-Based Index Example 2 CSE 444 - Spring 2009 19 18 18 20 22 19 21 21 19 10 21 20 20 30 18 40 19 50 22 60 18 70 21 80 19 H2 age h2(age) = 00 h2(age) = 01 Secondary index Data entries in index are (key,RID) pairs Unclustered index Data File Index File Magda Balazinska - CSE 444, Fall 2010 Hash-Based Index 18 18 20 22 19 21 21 19 10 21 20 20 30 18 40 19 50 22 60 18 70 21 80 19 H1 h1(sid) = 00 h1(sid) = 11 sid H2 age h2(age) = 00 h2(age) = 01 Another example of clustered/primary index Another example of unclustered/secondary index Good for point queries but not range queries 20 21 Outline • Storage model • Index structures (Section 14.1) – [Old edition: 13.1 and 13.2] • B-trees (Section 14.2) – [Old edition: 13.3] Magda Balazinska - CSE 444,


View Full Document

UW CSE 444 - Lecture Notes

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?