UH COSC 6340 - COSC 6340 HOMEWORK# 1 - D827267

Home> Schools> University of Houston> Computer Science (COSC) > COSC 6340> COSC 6340 HOMEWORK# 1

UH COSC 6340 - COSC 6340 HOMEWORK# 1

Pages 2

Download Save

Unformatted text preview:

Dr. Christoph F. EickGraded Homework1 COSC 6340Spring 2003Available Points: 72Due Dates: Problems 1-3: Submit solutions electronically to [email protected] (deadline We., March 19, 9p; no late solutions accepted!) and bring a hardcopy to the Th., March 20 class. Some solutions will be discussed in the March 20, 2003 class.Problems 4-5: Tu., April 1, 2003 in class or during my office hours (no late solutions accepted!)1) Implementing Joins [15]Assume we have to implement a natural join, denoted by ‘‘, between two relations R1(A, B, C) and R2(A, D). A is the primary key for R2 and R1.A is a foreign key (referring to R2.A) in R1 (R1[A] . R2[A]). R1 and R2 are stored as an unordered file (blocks are assumed to be 100% full); R1 contains 200000 tuples that are stored in 500 blocks, and R2 has 50000 tuples that are stored in 1000 blocks (that is, every R2.A value occurs at an average four time as a value of R1.A). Moreover, assume that only a small buffer size of 8 blocks is available. Based on these assumptions answer the following questions (indicate, if you assume in your computations that the output relation is written back to disk or not): a. How many tuples will the output relation R=R1  R2 contain? [1]b. What is the cost for implementing the natural join using the block-nested loop join? [2]c. Now assume that either a hashed index on A of R1 or a hashed index on A of R2 is available (assume that there are no overflow pages). Compute the cost for using the indexnested loops join using the index for R1.A and for using the index nested loops join for R2.A (2 computations][4]d. Is it possible to apply the hash-join in this case (explain your answer!)? [2]e. Which of the 4 proposed implementations is the best? [1]f. Now assume that the response time of the methods you evaluated is not satisfactory. What else could be done, to speed up R1 R2? [5]2) Relational Database Design [12]Assume a relation R(A, B, C, D, E) with the following functional dependencies is given:(1) A  BC(2) B  A(3) D  A(4) E  Da) What is (are) the candidate key(s) for R? [1]1b) Transform R into a relational schema that is in BCNF and which does not have any lost dependencies; if this is not possible decompose R into a schema that is in BCNF and has the fewest number of lost functional dependencies. Give the derivation tree for your solution and specify clearly, if and what functional dependencies are lost for a particular relational schema. [11]3) Reasoning with Dependencies [13]Consider the following relation R(A,B,C,D,E) with the following dependencies is given ( denotes a functional dependency and  denotes a multi-valued dependency):(1) A  C(2) B  DEa) Assume we decompose R into R1(A,B, C) and R2(C, D, E). Does this decomposition have the lossless join property --- is it possible to reconstruct R from R1 and R2 usinga natural join? Give reasons for your answer! [6]b) What is (are) the candidate key(s) of R? [1]c) Does ED hold for R? Give reasons for your answer (either show that the dependency can be inferred from (1) and (2), or give a counter example of the contents of R that satisfies (1) and (2) but which violates ED)! [6]4) Physical Database Design and Database Tuning [18]Solve problem 20.6 (problem 16.8 second edition) of the textbook (pages 689/493)5) Object Similarity Assessment [14]Assume the following relation Students(ssn, age, gpa, avg_class_rank, Country, Continent) that contains students that were admitted in the year 2000 into our undergraduate program is given. You can assume that  age is an integer; the maximum age is 50 the minimum age is 20, and the average age is 28 and the mean absolute deviation is 10. gpa denotes the UH COSC gpa; the average gpa is 2.8 and the mean absolute deviation is 0.6; the maximum gpa is 4.0 the minimum gpa is 0. Avg_class_rank has 5 values (4=top-5%, 3=top-15%, 2=top-25% 1=top_half, 0=bottom half) Assume that students from the same continent are more similar than students fromdifferent continents.a) Define a student similarity measure that considers gpa and class_rank of being of major importance, and age and country/continent of being of minor importance. [11]b) Using your (dis)similarity measure compute the (dis)similarity for the following 4 student tuples [3] : (111111111, 25, 2.8, 2, China, Asia)(222222222, 24, 3.7, 3, India, Asia)(333333333, 37, 2.9, 4, USA, America)(444444444, 24, 3.8, 4, USA,

View Full Document


School:
Email:
New Password:
Confirm Password:

UH COSC 6340 - COSC 6340 HOMEWORK# 1

Sign up for free to view:

Please select your school