CS 511: Design of Database Management System
Homework 1
Due 2:00 PM CST on September 12, 2007

NOTE: Please submit a hard copy of your homework. Bring it to the lecture table at the beginning of the lecture on September 12, 2007. The hard copy should be as clearly readable as possible. You may lose points for unreadable or untidy presentation.

I2CS Students: You should email your solutions to the TA (termehch@uiuc.edu) in PDF or Word format. Please send your email with the subject "CS511 HW1" and the solution file attached by 2 PM UIUC time (CST).

Problem 1: Relational Model (20 Pts)

Briefly mention two flaws of the relational data model.

The relational model has a number of shortcomings. For instance:
1. The relational model is not flexible enough to capture the various forms of relationships among data in real applications, such as semi-structured data.
2. Since the model is itself quite complicated and recommends normalization, its performance is relatively lower than that of simple file-based models.
3. It does not support customized operations for complex data types.

Problem 2: Relational Model (30 Pts)

Scientists use multiple sensors to gather temperature information about a specific object in scientific observations in order to obtain an accurate result. Each sensor observes the temperature every second and sends the data to the main server, where it is stored at specific times. A sensor may sometimes report nothing (a null value) or a wrong value due to physical problems. Scientists have to use the data from all sensors to find the most precise temperature at each time. Hence, they merge the observed data and take its average for each time spot. They also compute differences among data gathered at different times, or compound them to see the temperature behavior over a longer time period. A one-dimensional array is a good fit to represent this data. Specify and formulate an exact data model to represent the data and an algebra that covers the required operations. You could add additional data item(s) to the array structure in your
model.

The problem asks for an array-based model. In this model the basic element is a one-dimensional array instead of a relation. Each observation could be modeled by an array A[N+M], which stores observations over N time steps. The extra M elements could be used to store information such as the starting time of the observation. We use just the observation start time here. In order to cover the final operations, we will need the following basic operations:
1. Value Selection, S(A, i): Given an array name and an element index, returns the value stored at that index. Using this operator, a new operator called Time Selection, ST(A), could be defined, which, given an array name, returns the starting time of the observation period.
2. Compound, C(A, B): Given two arrays with consecutive observation time periods, returns a new array whose size is the sum of the sizes of the input arrays and whose starting time is the smaller of the two input starting times.
3. Valid, VA(value): Given a value, decides whether it is valid. A value is valid if it is not null and lies within a specified range of values [MIN, MAX].
4. Basic mathematical operations among numbers.

It is easy to show that, given a database management system based on the above model, the required operations could be implemented. We could also formulate the required operations, such as the difference between two arrays, as basic operations, but that would be redundant.

Obviously, there could be various solutions for this problem. Solutions that had redundant operations were given full points. Relation-based models were given full points as well.

Problem 3: System R (30 Pts)

In relational database management systems like System R, when the user does not assign any value to a column in a row, the column is tagged with a special value called null.
1. Give two different interpretations of having a null value in a column of type Boolean.
2. Logical conjunction (AND) and logical disjunction (OR) consider just true and false as their input and output values. For each interpretation in the previous part, create a new truth table for those operations that includes the null value.

1. NULL could represent
any value in the domain that the user does not know at the time of data entry, for instance whether an employee is older than 25. Therefore, it complies with the properties of Boolean algebra.

Operand 1    TRUE    FALSE   NULL
Operand 2    NULL    NULL    NULL
AND          NULL    FALSE   NULL
OR           TRUE    NULL    NULL

2. The existence of NULL in some columns may be due to a value that lies outside the designed domain. This usually arises from changes in the application domain. For instance, a database design might model the marital status of an employee as married (True) or single (False) and not foresee separated or widowed cases, which might have different employment rules. The truth table for this case is the same as in the previous case.

Other reasonable interpretations get points as well. Codd indicated in his 1990 book, The Relational Model for Database Management, Version 2, that the single NULL mandated by the SQL standard was inadequate and should be replaced by two separate Null-type markers to indicate the reason why data is missing. These two Null-type markers are commonly referred to as A-values and I-values, representing "Missing But Applicable" and "Missing But Inapplicable", respectively. Others have suggested adding further Null-type markers to Codd's recommendation to indicate even more reasons why a data value might be missing, which increases the complexity of SQL's logic system.

Problem 4: R-Tree Indexing (20 Pts)

High-dimensional data is data whose number of dimensions is more than, say, 15. Briefly mention one disadvantage of using an R-Tree index for high-dimensional data.

1. When the number of dimensions is very high, users mostly submit queries that involve only some of the dimensions. An R-Tree index adds unnecessary processing time for these kinds of queries.
2. Practical observations show that the overlap among nodes in an R-Tree increases considerably for a large number of dimensions.

Other reasonable arguments get full points.
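As an illustration of the array-based model in Problem 2, the basic operations S, ST, C and VA can be sketched in Python. This is only one possible reading of the solution, not a reference implementation: the class name `ObsArray`, the function names, and the range constants `MIN_TEMP` and `MAX_TEMP` are hypothetical, and the sketch assumes that all sensors being merged share the same start time and length.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical valid range [MIN, MAX]; the homework does not fix these values.
MIN_TEMP, MAX_TEMP = -50.0, 150.0


@dataclass
class ObsArray:
    start: int                     # observation start time (the extra element)
    values: List[Optional[float]]  # one reading per second; None models null


def select(a: ObsArray, i: int) -> Optional[float]:
    """Value Selection S(A, i): the value stored at index i."""
    return a.values[i]


def start_time(a: ObsArray) -> int:
    """Time Selection ST(A): the start time of the observation period."""
    return a.start


def compound(a: ObsArray, b: ObsArray) -> ObsArray:
    """Compound C(A, B): concatenate two consecutive observation periods."""
    first, second = (a, b) if a.start <= b.start else (b, a)
    if first.start + len(first.values) != second.start:
        raise ValueError("observation periods are not consecutive")
    return ObsArray(first.start, first.values + second.values)


def valid(v: Optional[float]) -> bool:
    """Valid VA(value): not null and inside [MIN_TEMP, MAX_TEMP]."""
    return v is not None and MIN_TEMP <= v <= MAX_TEMP


def merge_average(sensors: List[ObsArray]) -> ObsArray:
    """Derived operation: per-time average over the valid readings only.

    Assumes every sensor array has the same start time and length.
    """
    out: List[Optional[float]] = []
    for i in range(len(sensors[0].values)):
        good = [select(s, i) for s in sensors if valid(select(s, i))]
        out.append(sum(good) / len(good) if good else None)
    return ObsArray(start_time(sensors[0]), out)


def difference(a: ObsArray, b: ObsArray) -> List[Optional[float]]:
    """Derived operation: element-wise difference, null where either side is invalid."""
    return [x - y if valid(x) and valid(y) else None
            for x, y in zip(a.values, b.values)]
```

Here `merge_average` and `difference` are built only from the basic operations, which is the sense in which defining them as primitives would be redundant.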
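The truth table in Problem 3 follows what is commonly called Kleene's three-valued logic: NULL stands for "unknown", so the result is NULL only when the known operand does not already decide the outcome. A minimal Python sketch, assuming `None` models NULL and using the hypothetical function names `and3` and `or3`:

```python
from itertools import product
from typing import Optional

TriBool = Optional[bool]  # None models NULL ("unknown")


def and3(a: TriBool, b: TriBool) -> TriBool:
    """Three-valued AND: FALSE dominates, otherwise NULL propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: TriBool, b: TriBool) -> TriBool:
    """Three-valued OR: TRUE dominates, otherwise NULL propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def show(v: TriBool) -> str:
    """Render a three-valued result the way the truth table writes it."""
    return "NULL" if v is None else str(v).upper()


if __name__ == "__main__":
    # Print the full 3 x 3 truth table for both operators.
    for a, b in product([True, False, None], repeat=2):
        print(f"{show(a):5} {show(b):5}  AND={show(and3(a, b)):5}  OR={show(or3(a, b))}")
```

The three rows of the table above where Operand 2 is NULL correspond to `and3(x, None)` and `or3(x, None)` for `x` in TRUE, FALSE, NULL.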