UW CSE 444 - Study Notes - D2793204

Home> Schools> University of Washington> Computer Science and Engineering (CSE) > CSE 444> Study Notes

DOC PREVIEW

UW CSE 444 - Study Notes

School name University of Washington

Course Cse 444- Database Internals

Pages 11

This preview shows page 1-2-3-4 out of 11 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Computer Science & Engineering 444 – Final ExamJune 10, 2002Closed book & notes – 120 minutes100 points totalName: ____________________________ StuID:_____________________ Part Score1234Total 1. [30 points] Consider the following relational data:Id Time Title Composer Conductor3 1:01 Mad Rush J. Sibelius L. Bernstein4 1:47 Andante L. Beethoven NULL5 1:57Upon EnchantedGroundA. Hovhaness NULLId Parent Soloist13 3 A. Karis55 5 Y. Kondonassis56 5 F. Hendrickx57 5 H. CorynName:_________________________________________________________ a. [10 points] Write the XML document wnyc.xml obtained by exporting therelational data into the DTD specified below. <!ELEMENT wnyc (piece*)><!ELEMENT piece (time, title, composer, conductor?, soloist*><!ELEMENT time (#PCDATA)><!ELEMENT title (#PCDATA)><!ELEMENT composer (#PCDATA)><!ELEMENT conductor (#PCDATA)><!ELEMENT soloist (#PCDATA)><!ATTLIST piece id ID REQUIRED>Name:_________________________________________________________ b. [5 points] Write an XQuery to retrieve all conductors that conduct "Mad Rush" by"J. Sibelius". The result should not contain duplicates. c. [5 points] Write an XQuery to retrieve all soloists that conduct some pieces thatare broadcast at the same time as “Andante” (including “Andante”). The resultshould not contain duplicates.Name:_________________________________________________________ d. [10 points] What’s the result of the following XQuery when it is applied to theXML document in (a)<wnyc-by-solo> {FOR $s IN distinct-values (/wnyc/piece/soloist)RETURN <solo><name>{$s/text()}</name>{FOR $c IN distinct-values (//piece[soloist = $s/text()]/conductor)RETURN <conductor><name>{$c/text()}</name>{FOR $p IN (//piece[conductor = $c/text() AND soloist = $s/text()])RETURN <piece> { $p/title,$p/composer}</piece>}</conductor>}</solo> } </wnyc-by-solo>Name:_________________________________________________________ 2. [20 points] Consider the following relational data: Purchase (pname, date, buyer, seller) Product (name, price, manufacturer, category)o Relation Purchase contains 10000 tuples and has 50 tuples per page.o Relation Product contains 2000 tuples and has 10 tuples per page.o Attribute name is the primary key for Produceo Both relations are stored as simple heap files.o Neither relation has any indexes built on it.o 102 buffer pages are availableConsider the following query: Select name, seller, buyer From Product, Purchase Where Product.name = Purchase.pnamePlease answer the following questions:a. [5 points] What is the cost of joining Purchase and Product using a Page-oriented Nested Loops Join?b. [5 points] What is the cost of joining Purchase and Product using a HashJoin?Name:_________________________________________________________ c. [5 points] Suppose we had 205 pages of main memory buffer pages available.How would you do the join of Purchase and Product? What would be thecost of the join? d. [5 points] Now suppose we have a hash-index on pname of Purchase. Theindex is non-clustered. The records are NOT stored together with the keyvalues in the index. We need 1.2 I/O to get data entry in index page onaverage. What’s the cost of joining Purchase and Product using Index NestedLoop Join?Name:_________________________________________________________ 3. [26 points] Consider the following relational schema, that includes two relations: Author(pid, name) Paper(pid, title, year, citations) The Paper relation stores a set of published papers, including their publication ID,title, year of publication, and the number of citations by other papers. The Authorrelation stores for every paper ID, the set of author names (so there is a row in theAuthor table for every author). a. [7 points] Write an SQL query over the above schema that returns for everyauthor, the maximal number of citations on any of his/her publications, butonly considering papers that have at most 3 authors. The query returns a set ofpairs of the form (author, n), where n is the maximal number of citations on apaper of author.Name:_________________________________________________________ b. [6 points] Consider the following query, that returns for every year, themaximal number of citations for papers published that year, but only if thenumber of citations is more than 20: Select year, Max(citations) From Paper Group By year Having Max(citations) > 20. Can you propose a transformation on the above query that is likely to reduce thecost of evaluating the query?c. [5 points] Suppose the query was modified to also return the number of paperspublished in those years, i.e., Select year, Max(citations), Count(*) From Paper Group By year Having Max(citations) > 20. Would the optimization you proposed above still be valid? Why or why not?Name:_________________________________________________________ d. [8 points] Suppose we had a third relation in the schema, Cites(pid1, pid2),storing the actual citation references. That is, if the tuple (paper1, paper2) is inthe relation Cites, then paper1 cites paper2. Write a trigger that enforces thatwhenever a tuple (paper1, paper2) is inserted into the Cites relation, then theyear of publication of paper2 is not greater than the year of publication ofpaper1. When the condition is not satisfied, the tuple should not be added tothe relation.Name:_________________________________________________________ 4. [24 points] Answer the following short questions in 4 lines or less.a. What are the key differences between the Entity/Relationship and the ODL modeling formalisms?b. Is it always a good idea to push selections before joins in query execution plans? Why or why not? c. Give an example of where knowledge of a key constraint is useful for speeding up query processing.Name:_________________________________________________________ d. When is a sort-merge join preferable over a hash-join?e. When is a relation in Boyce-Codd Normal form?f. What is the difference between a sparse and dense

View Full Document

UW CSE 444 - Study Notes

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4 out of 11 pages.

UW CSE 444 - Study Notes

Sign up for free to view:

Please select your school