DOC PREVIEW
UMD CMSC 424 - Database Design

This preview shows page 1-2-3-4-29-30-31-32-33-60-61-62-63 out of 63 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1DatabasesQuery ProcessingOverview“Cost”Query ProcessingSelection OperationSelection OperationSelection OperationSelection w/ B+-Tree IndexesSelection OperationSelection OperationSelection OperationQuery ProcessingJoinNested-loops JoinNested-loops JoinNested-loops JoinBlock Nested-loops JoinBlock Nested-loops JoinBlock Nested-loops JoinIndex Nested-loops JoinIndex Nested-loops JoinIndex Nested-loops JoinSo far…Hash JoinHash JoinHash JoinHash JoinHash JoinHash JoinHash Join: IssuesHash Join: IssuesHash Join: IssuesHybrid Hash JoinHybrid Hash JoinSo far…Merge-Join (Sort-merge join)Merge-Join (Sort-merge join)Joins: SummaryQuery ProcessingSortingExternal sort-mergeExternal sort-mergeExample: External Sorting Using Sort-MergeExternal Merge Sort (Cont.)Query ProcessingGroup By and AggregationGroup By and AggregationGroup By and AggregationDuplicate EliminationSet operationsQuery ProcessingEvaluation of ExpressionsEvaluation of ExpressionsMaterializationPipeliningPipeliningPipelining: Demand-drivenHash-Join Iterator InterfaceHash-Join Iterator InterfacePipelining (Cont.)Recap: Query ProcessingCMSC424: Database DesignInstructor: Amol Deshpande [email protected]Data ModelsConceptual representation of the dataData RetrievalHow to ask questions of the databaseHow to answer those questionsData StorageHow/where to store data, how to access itData IntegrityManage crashes, concurrencyManage semantic inconsistenciesDatabasesQuery ProcessingOverviewSelection operation Join operatorsSortingOther operatorsPutting it all together…OverviewUserselect *from R, Swhere …R, B+Tree on R.aS, Hash Index on S.a…ResultsQuery ParserResolve the references,Syntax errors etc.Converts the query to an internal format relational algebra like Query OptimizerFind the best way to evaluate the query Which index to use ? What join method to use ? … Query ProcessorRead the data from the filesDo the query processing joins, selections, aggregates …“Cost”Complicated to computeWe will focus on disk:Number of I/Os ?Not sufficientNumber of seeks matters a lot… why ?tT – time to transfer one blocktS – time for one seekCost for b block transfers plus S seeks b * tT + S * tS Measured in secondsQuery ProcessingOverviewSelection operation Join operatorsSortingOther operatorsPutting it all together…Selection Operationselect * from person where SSN = “123”Option 1: Sequential ScanRead the relation start to end and look for “123”Can always be used (not true for the other options)Cost ?Let br = Number of relation blocksThen:1 seek and br block transfersSo:tS + br * tT secImprovements:If SSN is a key, then can stop when foundSo on average, br/2 blocks accessedSelection Operationselect * from person where SSN = “123”Option 2 : Binary Search:Pre-condition:The relation is sorted on SSNSelection condition is an equalityE.g. can’t apply to “Name like ‘%424%’”Do binary searchCost of finding the first tuple that matcheslog2(br) * (tT + tS)All I/Os are random, so need a seek for allThe last few are closeby, but we ignore such small effectsNot quite: What if 10000 tuples match the condition ?Incurs additional costSelection Operationselect * from person where SSN = “123”Option 3 : Use IndexPre-condition:An appropriate index must existUse the indexFind the first leaf page that contains the search keyRetrieve all the tuples that match by following the pointersIf primary index, the relation is sorted by the search keyGo to the relation and read blocks sequentiallyIf secondary index, must follow all pointers using the indexSelection w/ B+-Tree Indexesn * (tT + tS)n = number of records that matchThis can be badhi * (tT + tS)secondary index, not a key, equality1 * (tT + tS)hi * (tT + tS)secondary index, candidate key, equality1 * (tT + tS) + (b – 1) * tTNote: primary == sortedb = number of pages that contain the matcheshi * (tT + tS)primary index, not a key, equality1 * (tT + tS)hi * (tT + tS)primary index, candidate key, equalitycost of retrieving the tuplescost of finding the first leafhi = height of the indexSelection OperationSelections involving rangesselect * from accounts where balance > 100000select * from matches where matchdate between ’10/20/06’ and ’10/30/06’Option 1: Sequential scanOption 2: Using an appropriate indexCan’t use hash indexes for this purposeCost formulas:Range queries == “equality” on “non-key” attributesSo rows 3 and 5 in the preceding pageSelection OperationComplex selectionsConjunctive: select * from accounts where balance > 100000 and SSN = “123”Disjunctive: select * from accounts where balance > 100000 or SSN = “123”Option 1: Sequential scanOption 2 (Conjunctive only): Using an appropriate index on one of the conditionsE.g. Use SSN index to evaluate SSN = “123”. Apply the second condtion to the tuples that matchOr do the other way around (if index on balance exists)Which is better ?Option 3 (Conjunctive only) : Choose a multi-key index Not commonly availableSelection OperationComplex selectionsConjunctive: select * from accounts where balance > 100000 and SSN = “123”Disjunctive: select * from accounts where balance > 100000 or SSN = “123”Option 4: Conjunction or disjunction of record identifiersUse indexes to find all RIDs that match each of the conditionsDo an intersection (for conjunction) or a union (for disjunction)Sort the records and fetch them in one shotCalled “Index-ANDing” or “Index-ORing”Heavily used in commercial systemsQuery ProcessingOverviewSelection operation Join operatorsSortingOther operatorsPutting it all together…Joinselect * from R, S where R.a = S.aCalled an “equi-join”select * from R, S where |R.a – S.a | < 0.5Not an “equi-join”Option 1: Nested-loops for each tuple r in R for each tuple s in S check if r.a = s.a (or whether |r.a – s.a| < 0.5)Can be used for any join conditionAs opposed to some algorithms we will see laterR called outer relationS called inner relationNested-loops JoinCost ? Depends on the actual values of parameters, especially memorybr, bs  Number of blocks of R and Snr, ns  Number of


View Full Document

UMD CMSC 424 - Database Design

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Download Database Design
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Database Design and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Database Design 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?