UMD CMSC 424 - Class Notes - D1691502

Home> Schools> University of Maryland, College Park> Computer Science (CMSC) > CMSC 424> Class Notes

DOC PREVIEW

UMD CMSC 424 - Class Notes

School name University of Maryland, College Park

Course Cmsc 424- Database Design

Pages 15

This preview shows page 1-2-3-4-5 out of 15 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 15 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 15 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 15 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 15 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 15 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 15 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Quick Review of Apr 17 materialTodayQuery ProcessingQuery Processing DiagramQuery Processing StepsQuery Processing ExampleQuery Processing MetadataQuery Processing Metadata (2)Selection for Equality (no indices)Selection for Inequality (no indices)Projection on A1Selection (Indexed Scan) for EqualitySelection (Indexed Scan) for InequalityComplex SelectionsComplex Selections with IndicesQuick Review of Apr 17 material •Multiple-Key Access–There are good and bad ways to run queries on multiple single keys•Indices on Multiple Attributes–Combining two keys into a single concatenated attribute–Grid files•crude array with one dimension and linear scale for each attribute•more than one cell may point to a given bucket of values•array grid may be dynamically resized during use–Other alternatives are spatial databases: R-tree, quad-trees, k-d tree•Bitmap Indices–linear array of bits: bit j is set if tuple j has the attribute that this bitmap tracks (e.g., “Moonroof”: bit j is 1 if record j is a car with moonroof)–Queries are answered by combining several bitmaps using and, or, notToday •HW #4: due Thursday April 24 (next class)–Questions: 12.11, 12.12, 12.13, 12.16•No HW for next week•Today:–Start Chapter 13: Query ProcessingQuery Processing•SQL is good for humans, but not as an internal (machine) representation of how to calculate a result•Processing an SQL (or other) query requires these steps:–parsing and translation•turning the query into a useful internal representation in the extended relational algebra–optimization•manipulating the relational algebra query into the most efficient form (one that gets results the fastest)–evaluation•actually computing the results of the queryQuery Processing DiagramQuery Processing Steps1. parsing and translation–details of parsing are covered in other places (texts and courses on compilers). We’ve already covered SQL and relational algebra; translating between the two should be relatively familiar ground2. optimization–This is the meat of chapter 13. How to figure out which plan, among many, is the best way to execute a query3. evaluation–actually computing the results of the query is mostly mechanical (doesn’t require much cleverness) once a good plan is in place.Query Processing Example•Initial query: select balancefrom accountwhere balance<2500•Two different relational algebra expressions could represent this query:–sel balance<2500(Pro balance(account))–Pro balance( sel balance<2500(account))•which choice is better? It depends upon metadata (data about the data) and what indices are available for use on these operations.Query Processing Metadata•Cost parameters (some are easy to maintain; some are very hard -- this is statistical info maintained in the system’s catalog)–n(r ): number of tuples in relation r–b(r ): number of disk blocks containing tuples of relation r–s(r ): average size of a tuple of relation r–f(r ): blocking factor of r: how many tuples fit in a disk block–V(A,r): number of distinct values of attribute A in r. (V(A,r)=n(r ) if A is a candidate key)–SC(A,r): average selectivity cardinality factor for attribute A of r. Equivalent to n(r )/V(A,r). (1 if A is a key)–min(A,r): minimum value of attribute A in r–max(A,r): maximum value of attribute A in rQuery Processing Metadata (2)•Cost parameters are used in two important computations:–I/O cost of an operation–the size of the result•In the following examination we’ll find it useful to differentiate three important operations:–Selection (search) for equality (R.A1=c)–Selection (search) for inequality (R.A1>c) (range queries)–Projection on attribute A1Selection for Equality (no indices)•Selection (search) for equality (R.A1=c)–cost (sequential search on a sorted relation) =b(r )/2 average unsuccessfulb(r )/2 + SC(A1,r) -1 average successful–cost (binary search on a sorted relation) =log b(r ) average unsuccessfullog b(r ) + SC(A1,r) -1 average successful–size of the result n(select(R.A1=c)) =SC(A1,r) = n(r )/V(A1,r)Selection for Inequality (no indices)•Selection (search) for inequality (R.A1>c)–cost (file unsorted) = b(r )–cost (file sorted on A1) =b(r )/2 + b(r )/2 (if we assume that half the tuples qualify)b(r ) in general (regardless of the number of tuples that qualify. Why?)–size of the result = depends upon the query; unpredictableProjection on A1•Projection on attribute A1–cost = b(r )–size of the result n(Pro(R,A1)) =V(A1,r)Selection (Indexed Scan) for EqualityPrimary Index on key:cost = (height+1) unsuccessfulcost = (height+1) +1 successfulPrimary (clustering) Index on non-key:cost = (height+1) + SC(A1,r)/f(r )all tuples with the same value are clusteredSecondary Indexcost = (height+1) + SC(A1,r)tuples with the same value are scatteredSelection (Indexed Scan) for InequalityPrimary Index on key: search for first value and then pick tuples >= valuecost = (height+1) +1+ size of the result (in disk pages)= height+2 + n(r ) * (max(A,r)-c)/(max(A,r)-min(A,r))/f(r )Primary (clustering) Index on non-key:cost as above (all tuples with the same value are clustered)Secondary (non-clustering) Indexcost = (height+1) +B-treeLeaves/2 + size of result (in tuples)= height+1 + B-treeLeaves/2 + n(r ) * (max(A,r)-c)/(max(A,r)-min(A,r))Complex Selections•Conjunction (select where theta1 and theta2)(s1 = # of tuples satisfying selection condition theta1)combined SC = (s1/n(r )) * (s2/n(r )) = s1*s2/n(r )2assuming independence of predicates•Disjunction (select where theta1 or theta2)combined SC = 1 - (1 - s1/n(r )) * (1 - s2/n(r ))= s1/n(r )) + s2/n(r ) - s1*s2/n(r )2•Negation (select where not theta1)n(! Theta1) = n(r ) - n(Theta1)Complex Selections with IndicesGOAL: apply the most restrictive condition first and combined use of multiple indices to reduce the intermediate results as early as possible•Why? No index will be available on intermediate results!•Conjunctive selection using one index B: –select using B and then apply remaining predicates on intermediate results•Conjunctive selection using a composite key index (R.A1, R.A2): –create a composite key or range from the query values and search directly (range search on the first attribute (MSB of the composite key) only)•Conjunctive selection using two indices B1 and B2: –search each separately and intersect the tuple identifiers (TIDs) •Disjunctive selection using two

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4-5 out of 15 pages.

UMD CMSC 424 - Class Notes

Sign up for free to view:

Please select your school