Unformatted text preview:

Tree-Structured IndexesReview: Files, Pages, RecordsIndexes: IntroductionIndexes: OverviewIndexes: What Selections do they support?Example Tree IndexAlternatives for Data Entry k* in IndexAlternatives for Data Entries (Contd.)Slide 9Index Classification (continued)Clustered vs. Unclustered IndexUnclustered vs. Clustered IndexesCost of OperationsComposite Search KeysTree-Structured Indexes: IntroductionA Note of CautionRange SearchesISAMExample ISAM TreeISAM is a STATIC StructureExample: Insert 23*, 48*, 41*, 42*PowerPoint PresentationISAM ---- Issues?Administrivia - Exam Schedule ChangeB+ Tree: The Most Widely Used IndexExample B+ TreeA Note on TerminologyB+Tree PagesB+ Trees in PracticeInserting a Data Entry into a B+ TreeExample B+ Tree – Inserting 23*Example B+ Tree - Inserting 8*Data vs. Index Page Split (from previous example of inserting “8”)Deleting a Data Entry from a B+ TreeSlide 36Example Tree (including 8*) Delete 19* and 20* ...... And Then Deleting 24*Example of Non-leaf Re-distributionAfter Re-distributionPrefix Key CompressionBulk Loading of a B+ TreeBulk Loading (Contd.)Summary of Bulk LoadingA Note on `Order’SummarySummary (Contd.)Slide 48Tree-Structured IndexesCS 186, Spring 2006, Lectures 5 &6R & G Chapters 9 & 10“If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham LincolnReview: Files, Pages, Records•Abstraction of stored data is “files” of “records”.–Records live on pages–Physical Record ID (RID) = <page#, slot#>•Variable length data requires more sophisticated structures for records and pages. (why?)–Records: offset array in header–Pages: Slotted pages w/internal offsets & free space area•Often best to be “lazy” about issues such as free space management, exact ordering, etc. (why?)•Files can be unordered (heap), sorted, or kinda sorted (i.e., “clustered”) on a search key.–Tradeoffs are update/maintenance cost vs. speed of accesses via the search key.–Files can be clustered (sorted) at most one way.•Indexes can be used to speed up many kinds of accesses. (i.e., “access paths”)Indexes: Introduction•Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g.,–Find all students in the “CS” department–Find all students with a gpa > 3•An index on a file is a disk-based data structure that speeds up selections on the search key fields for the index.–Any subset of the fields of a relation can be the search key for an index on the relation.–Search key is not the same as key (e.g. doesn’t have to be unique ID).•An index contains a collection of data entries, and supports efficient retrieval of all records with a given search key value k. –Typically, index also contains auxiliary information that directs searches to the desired data entriesIndexes: Overview•Many indexing techniques exist:– B+ trees, hash-based structures, R trees, …•Can have multiple (different) indexes per file.–E.g. file sorted by age, with a hash index on salary and a B+tree index on name.•Index Classification–What selections does it support–Representation of data entries in index•i.e., what kind of info is the index actually storing?•3 alternatives here–Clustered vs. Unclustered Indexes–Single Key vs. Composite Indexes–Tree-based, hash-based, otherIndexes: What Selections do they support?•Selections of form field <op> constant•Equality selections (op is =)–Either “tree” or “hash” indexes help here.•Range selections (op is one of <, >, <=, >=, BETWEEN)–“Hash” indexes don’t work for these.•More exotic selections:–2-dimensional ranges (“east of Berkeley and west of Truckee and North of Fresno and South of Eureka”)•Or n-dimensional–2-dimensional distances (“within 2 miles of Soda Hall”)•Or n-dimensional–Ranking queries (“10 restaurants closest to Berkeley”)–Regular expression matches, genome string matches, etc.–Keyword/Web search - includes “importance” of words in documents, link structure, …Example Tree Index•Index entries:<search key value, page id> they direct search for data entries in leaves.•Example where each node can hold 2 entries;10* 15* 20* 27* 33* 37*40*46*51*55*63*97*20 3351 6340RootAlternatives for Data Entry k* in Index•Question: What is actually stored in the leaves of the index for key value “k”? (i.e., what are the “data entries”?)•Three alternatives:1. Actual data record(s) with key value k2. {<k, rid of matching data record>}3. <k, list of rids of matching data records>•Choice is orthogonal to the indexing technique.–e.g., B+ trees, hash-based structures, R trees, …Alternatives for Data Entries (Contd.)•Alternative 1: Actual data record (with key value k)–If this is used, index structure is a file organization for data records (like Heap files or sorted files).–At most one index on a given collection of data records can use Alternative 1. –This alternative saves pointer lookups but can be expensive to maintain with insertions and deletions.Alternatives for Data Entries (Contd.)Alternative 2 {<k, rid of matching data record>}and Alternative 3 <k, list of rids of matching data records>•Easier to maintain than Alt 1. •If more than one index is required on a given file, at most one index can use Alternative 1; rest must use Alternatives 2 or 3.•Alternative 3 more compact than Alternative 2, but leads to variable sized data entries even if search keys are of fixed length.•Even worse, for large rid lists the data entry would have to span multiple blocks!Index Classification (continued)•Clustered vs. unclustered: If order of data records is the same as, or `close to’, order of index data entries, then called clustered index.–A file can be clustered on at most one search key.–Cost of retrieving data records through index varies greatly based on whether index is clustered or not!–Alternative 1 implies clustered, but not vice-versa.Clustered vs. Unclustered Index•Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file.– To build clustered index, first sort the Heap file (with some free space on each block for future inserts). –Overflow blocks may be needed for inserts. (Thus, order of data recs is `close to’, but not identical to, the sort order.)Index entriesData entriesdirect search for (Index File)(Data file)Data Recordsdata entriesData entriesData


View Full Document

Berkeley COMPSCI 186 - Tree-Structured Indexes

Documents in this Course
Load more
Download Tree-Structured Indexes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Tree-Structured Indexes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Tree-Structured Indexes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?