DOC PREVIEW
Berkeley COMPSCI 186 - Query Processing 1: Joins and Sorting

This preview shows page 1-2-3-18-19-37-38-39 out of 39 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Query Processing 1: Joins and SortingAdministriviaReviewReview – SQL QueriesQuestionsWhy Sort?Streaming Data Through RAM2-Way SortTwo-Way External Merge SortGeneral External Merge SortCost of External Merge SortNumber of Passes of External SortInternal Sort AlgorithmMore on HeapsortI/O for External Merge SortNumber of Passes of Optimized SortSorting Records!Using B+ Trees for SortingClustered B+ Tree Used for SortingUnclustered B+ Tree Used for SortingExternal Sorting vs. Unclustered IndexSorting - ReviewSorting – Review (cont)A Related Topic: JoinsSchema for ExamplesEquality Joins With One Join ColumnSimple Nested Loops JoinIndex Nested Loops JoinExamples of Index Nested LoopsBlock Nested Loops JoinExamples of Block Nested LoopsSort-Merge Join (R S)Example of Sort-Merge JoinRefinement of Sort-Merge JoinHash-JoinObservations on Hash-JoinCost of Hash-JoinGeneral Join ConditionsConclusionsQuery Processing 1:Joins and SortingR&G, Chapters 12, 13, 14Lecture 8One of the advantages of being disorderly is that one is constantly making excitingdiscoveries. A. A. MilneAdministrivia•Homework 1 Due Tonight•You must be in groups of 3 to submit(11 without groups Wednesday PM, if you’re the last two you’re allowed to be a group of two)•Homework 2 Available over the WeekendReview•Data Modelling–Relational–E-R•Storing Data–File Indexes–Buffer Pool Management•Query Languages–SQL–Relational Algebra–Relational CalculusReview – SQL Queries•Data Definition Language (DDL)•Data Manipulation Language (DML)–Range variables in Select clause–Expressions in Select, Where clauses–Set operators between queries:•Union, Intersect, Except/Minus–Set operators in nested queries:•In, Exists, Unique, <op> Any, <op> All–Aggregates: Count, Sum, Avg, Min, Max–Group By–Group By/HavingQuestions•We learned that the same query can be written many ways.–How does DBMS decide which is best?•We learned about tree & hash indexes.–How does DBMS know when to use them?•Sometimes we need to sort data.–How to sort more data than will fit in memory?Why Sort?•A classic problem in computer science!•Database needs it in order–e.g., find students in increasing gpa order–first step in bulk loading B+ tree index.–eliminating duplicates–aggregating related groups of tuples–Sort-merge join algorithm involves sorting.•Problem: sort 1Gb of data with 1Mb of RAM.–why not virtual memory?Streaming Data Through RAM•An important detail for sorting & other DB operations•Simple case:–Compute f(x) for each record, write out the result–Read a page from INPUT to Input Buffer–Write f(x) for each item to Output Buffer–When Input Buffer is consumed, read another page–When Output Buffer fills, write it to OUTPUT•Reads and Writes are not coordinated–E.g., if f() is Compress(), you read many pages per write.–E.g., if f() is DeCompress(), you write many pages per read.f(x)RAMInputBufferOutputBufferOUTPUTINPUT2-Way Sort•Pass 0: Read a page, sort it, write it.–only one buffer page is used (as in previous slide)•Pass 1, 2, …, etc.:–requires 3 buffer pages–merge pairs of runs into runs twice as long–three buffer pages used.Main memory buffersINPUT 1INPUT 2OUTPUTDiskDiskTwo-Way External Merge Sort•Each pass we read + write each page in file.•N pages in the file => the number of passes•So total cost is: •Idea: Divide and conquer: sort subfiles and merge  log21N  2 12N Nlog Input file1-page runs2-page runs4-page runs8-page runsPASS 0PASS 1PASS 2PASS 393,46,29,4 8,7 5,6 3,123,45,62,6 4,9 7,81,3 22,34,64,78,91,35,6 22,34,46,78,91,23,561,22,33,44,56,67,8General External Merge Sort•To sort a file with N pages using B buffer pages:–Pass 0: use B buffer pages. Produce sorted runs of B pages each. –Pass 1, 2, …, etc.: merge B-1 runs.  N B/B Main memory buffersINPUT 1INPUT B-1OUTPUTDiskDiskINPUT 2. . .. . .. . . More than 3 buffer pages. How can we utilize them?Cost of External Merge Sort•Number of passes:•Cost = 2N * (# of passes)•E.g., with 5 buffer pages, to sort 108 page file:–Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages) •Now, do four-way (B-1) merges–Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages)–Pass 2: 2 sorted runs, 80 pages and 28 pages–Pass 3: Sorted file of 108 pages  11log /BN B 108 5/ 22 4/Number of Passes of External Sort N B=3 B=5 B=9 B=17 B=129 B=257100 7 4 3 2 1 11,000 10 5 4 3 2 210,000 13 7 5 4 2 2100,000 17 9 6 5 3 31,000,000 20 10 7 5 3 310,000,000 23 12 8 6 4 3100,000,000 26 14 9 7 4 41,000,000,000 30 15 10 8 5 4( I/O cost is 2N times number of passes)Internal Sort Algorithm•Quicksort is a fast way to sort in memory.•Alternative: “tournament sort” (a.k.a. “heapsort”, “replacement selection”)•Keep two heaps in memory, H1 and H2read B-2 pages of records, inserting into H1;while (records left) {m = H1.removemin(); put m in output buffer;if (H1 is empty)H1 = H2; H2.reset(); start new output run; elseread in a new record r (use 1 buffer for input pages);if (r < m) H2.insert(r);else H1.insert(r);}H1.output(); start new run; H2.output();More on Heapsort•Fact: average length of a run is 2(B-2)–The “snowplow” analogy•Worst-Case:–What is min length of a run?–How does this arise?•Best-Case:–What is max length of a run?–How does this arise?•Quicksort is faster, but … longer runs often means fewer passes!BI/O for External Merge Sort•Actually, doing I/O a page at a time–Not an I/O per record•In fact, read a block (chunk) of pages sequentially!•Suggests we should make each buffer (input/output) be a chunk of pages.–But this will reduce fan-out during merge passes!–In practice, most files still sorted in 2-3 passes.Number of Passes of Optimized SortN B=1,000 B=5,000 B=10,000100 1 1 11,000 1 1 110,000 2 2 1100,000 3 2 21,000,000 3 2 210,000,000 4 3 3100,000,000 5 3 31,000,000,000 5 4 3 Block size = 32, initial pass produces runs of size 2B.Sorting Records!•Sorting has become a blood sport!–Parallel sorting is the name of the game ...•Minute Sort: how many 100-byte records can you sort in a minute? –Typical DBMS: 10MB (~100,000 records)–Current World record: 21.8 GB•64 dual-processor Pentium-III PCs (1999)•Penny Sort: how many can you sort for a penny?–Current world record: 12GB•1380


View Full Document

Berkeley COMPSCI 186 - Query Processing 1: Joins and Sorting

Documents in this Course
Load more
Download Query Processing 1: Joins and Sorting
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Query Processing 1: Joins and Sorting and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Query Processing 1: Joins and Sorting 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?