UVA CS 662 - Physical Data Organization - D3043993

Home> Schools> University Of Virginia> Computer Science (CS) > CS 662> Physical Data Organization

DOC PREVIEW

UVA CS 662 - Physical Data Organization

School name University Of Virginia

Course Cs 662- Database Systems

Pages 5

This preview shows page 1-2 out of 5 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

UVA DEPARTMENT OF COMPUTER SCIENCEPhysical-1Physical Data OrganizationDatabase design using logical model of the database- appropriate level for users to focus on- user independence from implementation detailsPerformance- other major factor in user satisfaction- depends on- efficient data structures for data representation- efficiency of system operation on those structuresDisk access- one of the most critical factors in performance- main memory is in general not big enough for entire DB- recovery problem with main memory DB- disk contains data files and system files includingdata dictionary and index filesUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-2Storage Media HierarchyStorage medium: primary storage and secondary storage- database is stored physically on some some storage medium- primary storage: can be operated directly by CPU--- main memory & cache- secondary storage: larger capacity, lower cost, slower access;cannot be operated directly by CPU; must be copied to primaryHierarchy- access speed, cost per unit of data, reliability- cache: fastest and most costly- main memory- flash memory: limited number of writes (also slow)non-volatile: disk-substitute in embedded systems- magnetic disk and optical disk (CD-ROM)- tape storage: sequential access; for backup and archivalUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-3Disk Access and Buffer ManagementDisk- direct access storage device (not sequential)- arm movement involves seek time and latency time- goal is to reduce # of disk access and seek time- a block need not to be transferred every time- buffer blocks: closely related with concurrency controland recovery strategy of the database systemBuffer management- goal is to increase hit ratio- similar to virtual memory management in OS- differences: forced writing for recovery andMRU (most recently used first) replacement algorithm- priority-based replacement: data dictionary andindex blocks have high priorityUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-4RAIDRedundant arrays of independent disks- motivation: large # of small disks might be costeffective; higher reliability and higher performanceHigher reliability by redundancy- mirroring/shadowing: a logical disk consists of twophysical disks --- write on bothHigher performance by parallelism- data striping: splitting data across multiple disks- bit-level or block-level striping- with n disks, block i will go to disk (i mod n) + 1RAID levels- to provide redundancy at lower cost using disk stripingcombined with error-correcting bits, instead of mirroringUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-5File OrganizationFile- a sequence of records mapped unto disk blocks- block: unit of data transfer between disk and memory- block size ranges from 512 bytes to few Kbytes- fixed-length records vs variable-length recordsFixed-length records- size of each field is declared- when delete, mark it to be ignored: searching fordeleted free space may not be efficient- use pointer for free space: danger of dangling pointerwhich no longer points to the desired record- problem of interblock records: needs 2 accesses... block i) (block i+1 ...----------record jUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-6Variable-length RecordsWhen such situations occur?- multiple record types in one file- record type allows variable length fields- repeating groups (multiple values)Methods to deal with them- byte string representation: special end-of-recordsymbol ( ) at the end of each record- each record is a string of consecutive bytes- difficulty in reusing the space of deleted record- fixed-length representation:1) reserved space for expected maximum length- useful only if most are close to max. length2) a list of fixed-length records chained by pointers3) anchor block (first record of the chain) and overflowblock (all the others) chained by pointersUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-7Mapping Data to FilesRelational database- straight-forward- in most cases, each relation in a separate fileFile organization- how to organize a given set of records in files- heap file: any record can be placed anywhere (no ordering)- sequential file: records are stored in a sequential order- hashing file: hash function computes the specific blockfor the record based some attribute value- clustering file: records of different relations stored onthe same file/block for efficient processing- related records can be read by one block read- may be inefficient for other operationsUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-8Efficient SearchingAdditional structures help searching- associated with files to make the search forrecords based on certain field more efficient- for direct data locating w/o sequential search- two approaches: indexing and hashingSequential file- records are chained together by pointersfor fast retrieval in search key order- records are stored physically in search key orderto minimize the number of block accesses- difficult to maintain the physical sequentialorder as records are inserted and deleted- binary search for files can be done on the blocksrather than on the records, if block address areavailable in the file headerUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-9Index StructuresIndex file- index is usually defined on a single fieldof a record (index field)- index file is for fast random accessDense index- one index record for every search-key value- faster access but higher overheadSparse index- index records for only some of the records- less faster but less overhead(Brighton) (record: Brighton, ..) (Brighton)(Downhill) (record: Downhill, ..)(Marinion) (record: Marinion, ..) (Marinion)dense index sparse indexUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-10Index StructuresHierarchy of index- multi-level index for a large index file- index tree (search tree)Primary and secondary index- primary index is the one whose search key specifiesthe sequential order of the file- secondary index: index other than primary one- secondary index improves the performance of queriesthat use keys other than the primary search key- modifying DB imposes a serious overhead on secondaryindex (compared to the primary index)- dense index is desirable than sparse index forsecondary index, since the file is not orderedphysically according to the secondary indexUVA DEPARTMENT OF COMPUTER SCIENCEPhysical-11Clustering IndexClustering field- a non-key field that does not have a distinct valuefor each record, on which records of a file arephysically orderedClustering index- clustering index is

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 5 pages.

UVA CS 662 - Physical Data Organization

Sign up for free to view:

Please select your school