File Organizations and IndexingReview: Memory, Disks, & Buffer MgtContextFiles of RecordsRecord Formats: Fixed LengthRecord Formats: Variable LengthPage Formats: Fixed Length Records“Slotted Page” for Variable Length RecordsSystem CatalogsAttr_Cat(attr_name, rel_name, type, position)Alternative File OrganizationsUnordered (Heap) FilesHeap File Implemented as a ListHeap File Using a Page DirectoryCost Model for AnalysisSome Assumptions in the AnalysisCost of OperationsSorted FilesSlide 19Indexes: avoiding the extremesIndex OverviewFile Structure SummaryFile Organizations and IndexingLecture 4R&G Chapter 8"If you don't find it in the index, look very carefully through the entire catalogue." -- Sears, Roebuck, and Co., Consumer's Guide, 1897Review: Memory, Disks, & Buffer Mgt•Everything won’t fit in RAM (usually)•Hierarchy of storage, RAM, disk, tape•“Block” - unit of storage on disk•“Frame” – a block-sized chunk of memory•Allocate space on disk for fast access•Buffer pool management–Frames in RAM to hold blocks–Policy to move blocks between RAM & diskContextQuery Optimizationand ExecutionRelational OperatorsFiles and Access MethodsBuffer ManagementDisk Space ManagementDBFiles of Records•Blocks interface for I/O, but…•Higher levels of DBMS operate on records, and files of records.•FILE: A collection of pages, each containing a collection of records. Must support:–insert/delete/modify record–fetch a particular record (specified using record id)–scan all records (possibly with some conditions on the records to be retrieved)•Note: typically page size = block size = frame size.Record Formats: Fixed Length•Information about field types same for all records in a file; stored in system catalogs.•Finding i’th field done via arithmetic.Base address (B)L1 L2L3 L4F1 F2F3 F4Address = B+L1+L2Record Formats: Variable Length•Two alternative formats (# fields is fixed): Second offers direct access to i’th field, efficient storage of nulls (special don’t know value); small directory overhead. $ $$ $Fields Delimited by Special SymbolsF1 F2 F3 F4F1 F2 F3 F4Array of Field OffsetsPage Formats: Fixed Length RecordsRecord id = <page id, slot #>. In first alternative, moving records for free space management changes rid; may not be acceptable.Slot 1Slot 2Slot N. . .. . .N M10. . .M ... 3 2 1PACKEDUNPACKED, BITMAPSlot 1Slot 2Slot NFreeSpaceSlot M11number of recordsnumberof slots“Slotted Page” for Variable Length Records•Record id = <page id, slot #>•Can move records on page without changing rid; so, attractive for fixed-length records too.•Page is full when data space and slot array meet.Page iRid = (i,N)Rid = (i,2)Rid = (i,1)Pointerto startof freespaceSLOT DIRECTORYN . . . 2 12016 24N# slotsSlot ArrayData•Q: What if a record grows too big to fit in the page???System Catalogs•For each relation:–name, file location, file structure (e.g., Heap file)–attribute name and type, for each attribute–index name, for each index–integrity constraints•For each index:–structure (e.g., B+ tree) and search key fields•For each view:–view name and definition•Plus statistics, authorization, buffer pool size, etc. Catalogs are themselves stored as relations!Attr_Cat(attr_name, rel_name, type, position) attr_name rel_name type position attr_name Attribute_Cat string 1 rel_name Attribute_Cat string 2 type Attribute_Cat string 3 position Attribute_Cat integer 4 sid Students string 1 name Students string 2 login Students string 3 age Students integer 4 gpa Students real 5 fid Faculty string 1 fname Faculty string 2 sal Faculty real 3Alternative File OrganizationsMany alternatives exist, each good for some situations, and not so good in others:–Heap files: Suitable when typical access is a file scan retrieving all records.–Sorted Files: Best for retrieval in search key order, or only a `range’ of records is needed.–Clustered Files (with Indexes): A compromise between the above two extremes.Unordered (Heap) Files•Simplest file structure contains records in no particular order.•As file grows and shrinks, disk pages are allocated and de-allocated.•To support record level operations, we must:–keep track of the pages in a file–keep track of free space on pages–keep track of the records on a page•There are many alternatives for keeping track of this.–We’ll consider 2Heap File Implemented as a List •The header page id and Heap file name must be stored someplace.–Database “catalog”•Each page contains 2 `pointers’ plus data.HeaderPageDataPageDataPageDataPageDataPageDataPageDataPagePages withFree SpaceFull PagesHeap File Using a Page Directory•The entry for a page can include the number of free bytes on the page.•The directory is a collection of pages; linked list implementation is just one alternative.–Much smaller than linked list of all HF pages!DataPage 1DataPage 2DataPage NHeaderPageDIRECTORY•Q: How to find a particular record in a Heap file???Cost Model for AnalysisWe ignore CPU costs, for simplicity:–B: The number of data blocks–R: Number of records per block–D: (Average) time to read or write disk block•Measuring number of block I/O’s ignores gains of pre-fetching and sequential access; thus, even I/O cost is only loosely approximated. •Average-case analysis; based on several simplistic assumptions.–Often called a “back of the envelope” calculation. Good enough to show the overall trends!Some Assumptions in the Analysis•Single record insert and delete.•Equality selection - exactly one match (what if more or less???).•Heap Files:–Insert always appends to end of file.–Delete just leaves free space in the page.•Sorted Files:–Files compacted after deletions.–Selections on search key.Cost of Operations B: The number of data pagesR: Number of records per pageD: (Average) time to read or write disk pageHeap File Sorted File Clustered FileScan all recordsEquality Search(unique key)Range SearchInsertDeleteBD0.5 BDBD2D(0.5B+1)DSorted Files•Q: When do Heap files perform well? When don’t they?•Heap files are lazy on update - you end up paying on searches.•Sorted files eagerly maintain the file on update.–The opposite choice in the trade-off•Let’s consider an extreme version–No gaps allowed, pages fully
View Full Document