UVA DEPARTMENT OF COMPUTER SCIENCEPhysical-1Physical Data OrganizationDatabase design using logical model of the database- appropriate level for users to focus on- user independence from implementation detailsPerformance- other major factor in user satisfaction- depends on- efficient data structures for data representation- efficiency of system operation on those structuresDisk access- one of the most critical factors in performance- main memory is in general not big enough for entire DB- recovery problem with main memory DB- disk contains data files and system files includingdata dictionary and index filesUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-2Storage Media HierarchyStorage medium: primary storage and secondary storage- database is stored physically on some some storage medium- primary storage: can be operated directly by CPU--- main memory & cache- secondary storage: larger capacity, lower cost, slower access;cannot be operated directly by CPU; must be copied to primaryHierarchy- access speed, cost per unit of data, reliability- cache: fastest and most costly- main memory- flash memory: limited number of writes (also slow)non-volatile: disk-substitute in embedded systems- magnetic disk and optical disk (CD-ROM)- tape storage: sequential access; for backup and archivalUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-3Disk Access and Buffer ManagementDisk- direct access storage device (not sequential)- arm movement involves seek time and latency time- goal is to reduce # of disk access and seek time- a block need not to be transferred every time- buffer blocks: closely related with concurrency controland recovery strategy of the database systemBuffer management- goal is to increase hit ratio- similar to virtual memory management in OS- differences: forced writing for recovery andMRU (most recently used first) replacement algorithm- priority-based replacement: data dictionary andindex blocks have high priorityUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-4RAIDRedundant arrays of independent disks- motivation: large # of small disks might be costeffective; higher reliability and higher performanceHigher reliability by redundancy- mirroring/shadowing: a logical disk consists of twophysical disks --- write on bothHigher performance by parallelism- data striping: splitting data across multiple disks- bit-level or block-level striping- with n disks, block i will go to disk (i mod n) + 1RAID levels- to provide redundancy at lower cost using disk stripingcombined with error-correcting bits, instead of mirroringUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-5File OrganizationFile- a sequence of records mapped unto disk blocks- block: unit of data transfer between disk and memory- block size ranges from 512 bytes to few Kbytes- fixed-length records vs variable-length recordsFixed-length records- size of each field is declared- when delete, mark it to be ignored: searching fordeleted free space may not be efficient- use pointer for free space: danger of dangling pointerwhich no longer points to the desired record- problem of interblock records: needs 2 accesses... block i) (block i+1 ...----------record jUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-6Variable-length RecordsWhen such situations occur?- multiple record types in one file- record type allows variable length fields- repeating groups (multiple values)Methods to deal with them- byte string representation: special end-of-recordsymbol ( ) at the end of each record- each record is a string of consecutive bytes- difficulty in reusing the space of deleted record- fixed-length representation:1) reserved space for expected maximum length- useful only if most are close to max. length2) a list of fixed-length records chained by pointers3) anchor block (first record of the chain) and overflowblock (all the others) chained by pointersUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-7Mapping Data to FilesRelational database- straight-forward- in most cases, each relation in a separate fileFile organization- how to organize a given set of records in files- heap file: any record can be placed anywhere (no ordering)- sequential file: records are stored in a sequential order- hashing file: hash function computes the specific blockfor the record based some attribute value- clustering file: records of different relations stored onthe same file/block for efficient processing- related records can be read by one block read- may be inefficient for other operationsUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-8Efficient SearchingAdditional structures help searching- associated with files to make the search forrecords based on certain field more efficient- for direct data locating w/o sequential search- two approaches: indexing and hashingSequential file- records are chained together by pointersfor fast retrieval in search key order- records are stored physically in search key orderto minimize the number of block accesses- difficult to maintain the physical sequentialorder as records are inserted and deleted- binary search for files can be done on the blocksrather than on the records, if block address areavailable in the file headerUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-9Index StructuresIndex file- index is usually defined on a single fieldof a record (index field)- index file is for fast random accessDense index- one index record for every search-key value- faster access but higher overheadSparse index- index records for only some of the records- less faster but less overhead(Brighton) (record: Brighton, ..) (Brighton)(Downhill) (record: Downhill, ..)(Marinion) (record: Marinion, ..) (Marinion)dense index sparse indexUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-10Index StructuresHierarchy of index- multi-level index for a large index file- index tree (search tree)Primary and secondary index- primary index is the one whose search key specifiesthe sequential order of the file- secondary index: index other than primary one- secondary index improves the performance of queriesthat use keys other than the primary search key- modifying DB imposes a serious overhead on secondaryindex (compared to the primary index)- dense index is desirable than sparse index forsecondary index, since the file is not orderedphysically according to the secondary indexUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-11Clustering IndexClustering field- a non-key field that does not have a distinct valuefor each record, on which records of a file arephysically orderedClustering index- clustering index is
View Full Document