MIT 6 830 - Problem Set 3 - D2139619

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 830> Problem Set 3

DOC PREVIEW

MIT 6 830 - Problem Set 3

School name Massachusetts Institute of Technology

Course 6 830- Database Systems

Pages 7

This preview shows page 1-2 out of 7 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1 6.830 Problem Set 3 6.830 Problem Set 3 Assigned: 10/28 Due: 11/30 The purpose of this problem set is to give you some practice with concepts related to query optimization and concurrency control and recovery. 1 Parallel Query Processing Recall that the standard way to execute a query in a shared-nothing parallel database is to horizontally partition all tables across all nodes in the database. Recall also that hashing can be used to repartition one or both tables as needed to compute joins in a distributed fashion. Suppose you are trying to compute the following query: SELECT * FROM R,S WHERE R.a > v1 AND S.b > v2 AND R.c = S.d You are given the following: • Both tables are 5,000 MB, • The disk can read at 50 MB/sec (for the purposes of this problem, you may ignore differences between sequential and random I/O), • The network on each node can transmit data at 40 MB/sec, regardless of the number or rate at which other nodes are simultaneously transmitting, • A computer cannot send over the network and read or write from its disk at the same time, • The selectivity of both selection predicates is 0.2, • Each tuple in R joins with exactly one tuple in S, • Each machine in your distributed database has 500 MB of memory. Question 1: Suppose both tables are stored on a single node. Describe the best query plan for executing this query on that node. Question 2: Ignoring CPU costs, estimate the time to answer this query on a single node. Question 3: Now, suppose that the tables are hash-partitioned on R.a and S.b across a 4 node distributed database. Describe the best distributed query plan for executing this query. Question 4: Ignoring CPU costs, estimate the time to answer this query on the distributed database. Question 5: Now, suppose that the tables are hash-partitioned on R.c and S.d across a 4 node distributed database. Describe the best distributed query plan for executing this query. Question 6: Ignoring CPU costs, estimate the time to answer this query on the distributed database.6.830 Problem Set 3 2 2 Locking Below are 3 simpliﬁed tables from the Wikipedia database. page: (id int auto_increment, title char(100), latest int) revision: (id int auto_increment, comment char(100), pageid int, textid int) text: (id int auto_increment, content char(5000)) page.latest references revision.id, revision.pageid references page.id, and revision.textid ref-erences text.id. Each id ﬁeld starts at 0 and are ordered contiguously. There are clustered B+Trees on page.id, revision.id, and text.id, as well as a secondary B+Tree index on page.title. There are 10M pages, 200M revisions, and each revision has one text tuple (200M text tuples). Integers occupy 4 bytes. Data pages hold 8,000 bytes, and each index page holds 500 pointers and key values. You can assume that each page has the same number of revisions. For each of the following queries: • Indicate the query plan that the database would most likely use to answer the query. • Estimate the number of read (S) and write (X) locks that would be acquired by the execution of your plan assuming the use of page level locking. Assume that locks must be acquired on both index and data pages before they are read or written. • Estimate the the number of read and write locks that would be required by your plan assuming row locking? Suppose that locks must be acquired for each distinct value read from an index (e.g., if the actor with id “10” is read from the database, a single index lock is acquired.) Question 7: a. SELECT * FROM pages WHERE id > 5,000,000 b. INSERT INTO revision VALUES (default, ’some comment’, 1000, 10000) c. UPDATE revision SET comment = ’new comment’ WHERE pageid =25000 Question 8: Describe a workload where using table level locking will out perform page or row level locking. Please explicitly write down an example of the types of queries you have in mind and provide a quick explaination of your reasoning.6.830 Problem Set 3 3 3 ARIES Question 9: Suppose an ARIES-based database system (using page-level logging) runs the following transaction over a single-table database of gradstud with the schema <studid int, name char(100), area char(100), officeid int>. Assume that the table is stored in a heap ﬁle on disk but that records were inserted in ofﬁceid order, such that records in the ﬁle are sorted by ofﬁceid (but the database system does not maintain this sorted order as additional updates happen.) Each disk page is 2000 bytes. Due to the recession, the students are packed like sardines. Students 0-4 are in ofﬁce 1, students 5-9 are in ofﬁce 2, and students 10-14 are in ofﬁce 3. To make room for a particularly bright new student, existing students need to be shufﬂed around... transactionally. BEGIN TRANSACTION SELECT name FROM gradstud WHERE officeid = 1; UPDATE gradstud SET officeid = 2 WHERE studid = 1; ABORT; BEGIN TRANSACTION UPDATE gradstud SET officeid = 2 WHERE officeid = 3; INSERT INTO gradstud VALUES (15, ’Billy’, ’Quizboy’, 3) COMMIT If these are the only transactions that have ever run on the system, what will the contents of the log look like after they complete? Show the log records, indicating their type, the transaction they apply to, and any additional data needed to describe the state of the records. You don’t need to write out the contents of the before and after images in detail, but you should describe what they consist of. The ARIES paper mentions that CLRs allow it to avoid performing an UNDO more than once if there is a crash during recovery: “... ARIES will rollback only those actions that had not already been undone. This is possible since history is repeated for such transactions and

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 7 pages.

MIT 6 830 - Problem Set 3

Sign up for free to view:

Please select your school