UT Dallas CS 6360 - CS-6360 Chapter 17-FileStructuresHashing - D3104085

Home> Schools> University of Texas at Dallas> Computer Science (CS) > CS 6360> CS-6360 Chapter 17-FileStructuresHashing

DOC PREVIEW

UT Dallas CS 6360 - CS-6360 Chapter 17-FileStructuresHashing

School name University of Texas at Dallas

Course Cs 6360- Database Design

Pages 87

This preview shows page 1-2-3-4-5-6-40-41-42-43-44-82-83-84-85-86-87 out of 87 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 87 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Chapter 17 File Structures and Hashing CS 6360 Database Design Dr Chris Irwin Davis Email cid021000 utdallas edu Phone 972 883 3574 O ce ECSS 4 705 Placing File Records on Disk Operations on Files Files of Unordered Records Heap Files File of Ordered Records Sorted Files Hashing techniques Other Primary File Organizations Parallelizing Disk Access Using RAID New Storage Systems 2 Placing File Records On Disk Data is usually stored in the form of records Each record consists of a collection of related data values or items where each value is formed of one or more bytes and corresponds to a particular field of the record Records usually describe entities and their attributes 3 Placing File Records On Disk For example an EMPLOYEE record represents an employee entity and each field value in the record specifies some attribute of that employee such as Name Birth date Salary or Supervisor A collection of field names and their corresponding data types constitutes a record type or record format definition A data type associated with each field specifies the types of values a field can take 4 Data Types and Size The data type of a field is usually one of the standard data types used in programming e g numeric string Boolean sometimes Date and or Time The number of bytes required for each data type is fixed for a given computer system Software platform dependent Hardware platform dependent 5 Records For example an EMPLOYEE record type may be defined using the C programming language notation as the following structure What if an attribute is not a primitive type 6 BLOBs The need may arise for storing data items that consist of large unstructured objects which represent images digitized video or audio streams or free text These are referred to as BLOBs binary large objects A BLOB data item is typically stored separately from its record in a pool of disk blocks and a pointer to the BLOB is included in the record 7 Files Fixed Length Records and Variable Length Records Files Fixed Length Records and VariableLength Records A file is a sequence of records In many cases all records in a file are of the same record type If every record in the file has exactly the same size in bytes the file is said to be made up of fixed length records If different records in the file have different sizes the file is said to be made up of variable length records 9 Variable Length Records A file may have variable length records for several reasons The file records are of the same record type but one or more of the fields are of varying size variablelength fields For example the Name field of EMPLOYEE can be a variable length field The file records are of the same record type but one or more of the fields may have multiple values for individual records such a field is called a repeating field and a group of values for the field is often called a repeating group 10 Variable Length Records The file records are of the same record type but one or more of the fields are optional that is they may have values for some but not all of the file records optional fields The file contains records of different record types and hence of varying size mixed file This would occur if related records of different types were clustered placed together on disk blocks for example the GRADE REPORT records of a particular student may be placed following that STUDENT s record 11 Record Storage Formats 12 Record Storage Formats 13 Record Storage Formats 14 Operations on Files OPEN Readies the file for access and associates a pointer that will refer to a current file record at each point in time FIND Searches for the first file record that satisfies a certain condition and makes it the current file record FINDNEXT Searches for the next file record from the current record that satisfies a certain condition and makes it the current file record READ Reads the current file record into a program variable INSERT Inserts a new record into the file makes it the current file record 15 Operations on Files DELETE Removes the current file record from the file usually by marking the record to indicate that it is no longer valid MODIFY Changes the values of some fields of the current file record CLOSE Terminates access to the file REORGANIZE Reorganizes the file records For example the records marked deleted are physically removed from the file or a new organization of the file records is created FIND ORDERED Retrieves all the records in the file in some specified order READ ORDERED Read the file blocks in order of a specific field of the file 16 Unordered Records Heap Files Also called a heap or a pile file New records are inserted at the end of the file A linear search through the file records is necessary to search for a record This requires reading and searching half the file blocks on the average and is hence quite expensive Record insertion is quite efficient Reading the records in order of a particular field requires sorting the file records 17 Ordered Records Sorted Files Also called a sequential file File records are kept sorted by the values of an ordering field Insertion is expensive records must be inserted in the correct order It is common to keep a separate unordered overflow or transaction file for new records to improve insertion efficiency this is periodically merged with the main ordered file 18 Ordered Records Sorted Files A binary search can be used to search for a record on its ordering field value This requires reading and searching log2 of the file blocks on the average an improvement over linear search Reading the records in order of the ordering field is quite efficient 19 Memory Blocks of a Sorted File Other Strategies to Organize Data What if data were evenly distributed among memory buckets Which field to sort order by How to insure even distribution 21 Hashing Another type of primary file organization is based on hashing which provides very fast access to records under certain search conditions This organization is usually called a hash file The search condition must be an equality condition on a single field called the hash field In most cases the hash field is also a key field of the file in which case it is called the hash key 22 Hashing The idea behind hashing is to provide a function h called a hash function or randomizing function which is applied to the hash field value of a record and yields the address of the disk block in which the record is stored A search for the record within the block can be carried out in a main

View Full Document