Unformatted text preview:

CMSC 424 Database design Lecture 20 Join size prediction Concurrency recovery Mihai Pop Admin Job Description Software Developer This Software Developer position is located at the U S Army Medical Research and Materiel Command MRMC Bioinformatics Cell BIC Ft Detrick MD Applicants should have a Bachelors or Masters degree in computer science engineering or a related discipline and an interest in pursuing the development of software tools that directly support the mission of the MRMC life scientists In particular the Software Developer supports biomedical research by transitioning proof of concept software prototypes to production quality systems for research applications This position requires substantial experience developing Web enabled client server systems with a database backend A strong knowledge of servelets CGI HTML JavaScript SQL XML Java applets and GNU tools is desirable Must know C and possess good knowledge of OO MySQL Oracle and Python Ruby LISP Operating system knowledge should include Unix variants BSD s GNU Linux and Windows OS The Software Developer will also help maintain the BIC s IT infrastructure Experience with installing and maintaining UNIX LINUX based server machines and a working knowledge of Local Area Networks MS Exchange based mail servers DNS servers and CISCO firewalls is a plus Please submit resume to Jaques Reifman Ph D Senior Research Scientist U S Army Medical Research and Material Command Ft Detrick MD Email reifman bioanalysis org Phone 301 619 7915 www bioanalysis org Statistical Information for Cost Estimation nr number of tuples in a relation r br number of blocks containing tuples of r lr size of a tuple of r fr blocking factor of r i e the number of tuples of r that fit into one block V A r number of distinct values that appear in r for attribute A same as the size of A r If tuples of r are stored together physically in a file then nr br f r Also instead of V A r can store a histogram Selection Size Estimation A v r nr V A r number of records that will satisfy the selection Equality condition on a key attribute size estimate 1 A V r case of A V r is symmetric Let c denote the estimated number of tuples satisfying the condition If min A r and max A r are available in catalog c 0 if v min A r c nr v min A r max A r min A r If histograms available can refine above estimate In absence of statistical information c is assumed to be nr 2 Size Estimation of Complex Selections The selectivity of a condition i is the probability that a tuple in the relation r satisfies i If si is the number of satisfying tuples in r the selectivity of i is given by si nr Conjunction 1 2 n r Assuming indepdence estimate of s1 s2 sn tuples in the result is nr nrn Disjunction 1 2 n r Estimated number of tuples s1 s2 sn nr 1 1 1 1 nr nr nr Negation r Estimated number of tuples nr size r Join Operation Running Example h Running example depositor customer Catalog information for join examples ncustomer 10 000 fcustomer 25 which implies that bcustomer 10000 25 400 ndepositor 5000 fdepositor 50 which implies that bdepositor 5000 50 100 V customer name depositor 2500 which implies that on average each customer has two accounts Also assume that customer name in depositor is a foreign key on customer V customer name customer 10000 primary key Estimation of the Size of Joins The Cartesian product r x s contains nr ns tuples each tuple occupies sr ss bytes If R S then r s is the same as r x s If R S is a key for R then a tuple of s will join with at most one tuple from r therefore the number of tuples in r s is no greater than the number of tuples in s If R S in S is a foreign key in S referencing R then the number of tuples in r s is exactly the same as the number of tuples in s The case for R S being a foreign key referencing S is symmetric In the example query depositor customer customer name in depositor is a foreign key of customer hence the result has exactly ndepositor tuples which is 5000 Estimation of the Size of Joins Cont If R S A is not a key for R or S If we assume that every tuple t in R produces tuples in R S the number of tuples in R S is estimated to be nr ns V A s If the reverse is true the estimate obtained will be nr ns V A r The lower of these two estimates is probably the more accurate one Can improve on above if histograms are available Use formula similar to above for each cell of histograms on the two relations Estimation of the Size of Joins Cont Compute the size estimates for depositor customer without using information about foreign keys V customer name depositor 2500 and V customer name customer 10000 The two estimates are 5000 10000 2500 20 000 and 5000 10000 10000 5000 We choose the lower estimate which in this case is the same as our earlier computation using foreign keys Size Estimation for Other Operations Projection estimated size of A r V A r Aggregation estimated size of AgF r V A r Set operations For unions intersections of selections on the same relation rewrite and use size estimate for selections E g 1 r 2 r can be rewritten as 1 2 r For operations on different relations estimated size of r s size of r size of s estimated size of r s minimum size of r and size of s estimated size of r s r All the three estimates may be quite inaccurate but provide upper bounds on the sizes Transactions concurrency recovery Until now we learned how to design an efficient database how to quickly answer queries Next how we ensure database is consistent equipment failures disk power internet concurrent accesses Overview Transaction A sequence of database actions enclosed within special tags Properties Atomicity Entire transaction or nothing Consistency Transaction executed completely takes database from one consistent state to another Isolation Concurrent transactions appear to run in isolation Durability Effects of committed transactions are not lost Consistency Transaction programmer needs to guarantee that DBMS can do a few things e g enforce constraints on the data Rest DBMS guarantees Assumptions and Goals Assumptions The system can crash at any time Similarly the power can go out at any point Contents of the main memory won t survive a crash or power outage BUT disks are durable They might stop but data is not lost Disks only guarantee atomic sector writes nothing more Transactions are by themselves consistent Goals Guaranteed durability atomicity As much concurrency as possible while not compromising isolation and or consistency Two transactions


View Full Document

UMD CMSC 424 - Lecture 20 Join size prediction Concurrency/recovery

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Loading Unlocking...
Login

Join to view Lecture 20 Join size prediction Concurrency/recovery and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 20 Join size prediction Concurrency/recovery and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?