DOC PREVIEW
UMD CMSC 424 - Lecture 20 Join size prediction Concurrency/recovery

This preview shows page 1-2-3-25-26-27 out of 27 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMSC 424 – Database designLecture 20Join size predictionConcurrency/recoveryMihai PopAdminJob DescriptionSoftware DeveloperThis Software Developer position is located at the U.S. Army Medical Research and Materiel Command (MRMC)Bioinformatics Cell (BIC), Ft. Detrick, MD. Applicants should have a Bachelors or Masters degree in computerscience, engineering, or a related discipline and an interest in pursuing the development of software tools thatdirectly support the mission of the MRMC life scientists. In particular, the Software Developer supportsbiomedical research by transitioning proof of concept software prototypes to production quality systems forresearch applications. This position requires substantial experience developing Web enabled client/serversystems with a database backend. A strong knowledge of servelets, CGI, HTML, JavaScript, SQL, XML, Javaapplets, and GNU tools is desirable. Must know C and possess good knowledge of OO, MySQL/Oracle, andPython/Ruby/LISP. Operating system knowledge should include Unix variants (BSD’s, GNU/Linux) andWindows OS. The Software Developer will also help maintain the BIC’s IT infrastructure. Experience withinstalling and maintaining UNIX/LINUX based server machines, and a working knowledge of Local AreaNetworks, MS Exchange based mail servers, DNS servers and CISCO firewalls is a plus.Please submit resume to:Jaques Reifman, Ph.D.Senior Research ScientistU.S. Army Medical Research and Material CommandFt. Detrick, MDEmail: [email protected] Phone: 301-619-7915www.bioanalysis.orgStatistical Information for Cost Estimation•nr: number of tuples in a relation r.•br: number of blocks containing tuples of r.•lr: size of a tuple of r.•fr: blocking factor of r — i.e., the number of tuples of r that fit into one block.•V(A, r): number of distinct values that appear in r for attribute A; same as the size of ∏A(r).•If tuples of r are stored together physically in a file, then: •Also: instead of V(A,r) can store a histogram=rfrnrbSelection Size Estimation•σA=v(r)•nr / V(A,r) : number of records that will satisfy the selection•Equality condition on a key attribute: size estimate = 1•σA≤V(r) (case of σA ≥ V(r) is symmetric)–Let c denote the estimated number of tuples satisfying the condition. –If min(A,r) and max(A,r) are available in catalog•c = 0 if v < min(A,r)•c =– If histograms available, can refine above estimate–In absence of statistical information c is assumed to be nr / 2.nr.v−min  A , r max A , r−min  A , rSize Estimation of Complex Selections•The selectivity of a condition θi is the probability that a tuple in the relation r satisfies θi . – If si is the number of satisfying tuples in r, the selectivity of θi is given by si /nr.•Conjunction: σθ1∧ θ2∧. . . ∧ θn (r). Assuming indepdence, estimate of tuples in the result is:•Disjunction:σθ1∨ θ2 ∨. . . ∨ θn (r). Estimated number of tuples:•Negation: σ¬θ(r). Estimated number of tuples:nr – size(σθ(r))nrnrnsssn∗∗∗∗ . . . 21−∗∗−∗−−∗)1(...)1()1(121rnrrrnsnsnsnJoin Operation: Running ExampleRunning example: depositor customerCatalog information for join examples:•ncustomer = 10,000.•fcustomer = 25, which implies that bcustomer =10000/25 = 400.•ndepositor = 5000.•fdepositor = 50, which implies that bdepositor = 5000/50 = 100.•V(customer_name, depositor) = 2500, which implies that , on average, each customer has two accounts.–Also assume that customer_name in depositor is a foreign key on customer.–V(customer_name, customer) = 10000 (primary key!)hEstimation of the Size of Joins•The Cartesian product r x s contains nr .ns tuples; each tuple occupies sr + ss bytes.•If R ∩ S = ∅, then r s is the same as r x s. •If R ∩ S is a key for R, then a tuple of s will join with at most one tuple from r–therefore, the number of tuples in r s is no greater than the number of tuples in s.•If R ∩ S in S is a foreign key in S referencing R, then the number of tuples in r s is exactly the same as the number of tuples in s.•The case for R ∩ S being a foreign key referencing S is symmetric.•In the example query depositor customer, customer_name in depositor is a foreign key of customer– hence, the result has exactly ndepositor tuples, which is 5000Estimation of the Size of Joins (Cont.)•If R ∩ S = {A} is not a key for R or S.If we assume that every tuple t in R produces tuples in R S, the number of tuples in R S is estimated to be:If the reverse is true, the estimate obtained will be:The lower of these two estimates is probably the more accurate one.•Can improve on above if histograms are available–Use formula similar to above, for each cell of histograms on the two relations ),( sAVnnsr∗),( rAVnnsr∗Estimation of the Size of Joins (Cont.)•Compute the size estimates for depositor customer without using information about foreign keys:–V(customer_name, depositor) = 2500, andV(customer_name, customer) = 10000–The two estimates are 5000 * 10000/2500 - 20,000 and 5000 * 10000/10000 = 5000–We choose the lower estimate, which in this case, is the same as our earlier computation using foreign keys.Size Estimation for Other Operations•Projection: estimated size of ∏A(r) = V(A,r)•Aggregation : estimated size of AgF(r) = V(A,r)•Set operations– For unions/intersections of selections on the same relation: rewrite and use size estimate for selections•E.g. σθ1 (r) ∪ σθ2 (r) can be rewritten as σθ1 σθ2 (r)–For operations on different relations:•estimated size of r ∪ s = size of r + size of s. •estimated size of r ∩ s = minimum size of r and size of s.•estimated size of r – s = r.•All the three estimates may be quite inaccurate, but provide upper bounds on the sizes.Transactions, concurrency, recovery•Until now we learned–how to design an efficient database–how to quickly answer queries•Next: how we ensure database is consistent:–equipment failures (disk, power, internet...)–concurrent accessesOverview•Transaction: A sequence of database actions enclosed within special tags•Properties:–Atomicity: Entire transaction or nothing–Consistency: Transaction, executed completely, takes database from one consistent state to another–Isolation: Concurrent transactions appear to run


View Full Document

UMD CMSC 424 - Lecture 20 Join size prediction Concurrency/recovery

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Download Lecture 20 Join size prediction Concurrency/recovery
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture 20 Join size prediction Concurrency/recovery and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 20 Join size prediction Concurrency/recovery 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?