Unformatted text preview:

1Chapter 12 1Chapter 12:Representing Data Elements(Slides by Hector Garcia-Molina,http://www-db.stanford.edu/~hector/cs245/notes.htm)Chapter 12 2• How to lay out data on disk• Sec. 12.3 and 15.7 (self-study): How to move data to memoryTopics for todayChapter 12 3Issues:Flexibility Space UtilizationComplexity Performance2Chapter 12 4To evaluate a given strategy, computefollowing parameters:-> space used for expected data-> expected time to- fetch record given key- fetch record with next key- insert record- append record- delete record- update record- read all file- reorganize fileChapter 12 5What are the data items we want to store?• a salary• a name• a date• a pictureWhat we have available: Bytes8bitsChapter 12 6To represent:• Integer (short): 2 bytese.g., 35 is00000000 00100011• Real, floating pointn bits for mantissa, m for exponent….3Chapter 12 7• Characters→ various coding schemes suggested,most popular is asciiTo represent:Example:A: 1000001a: 11000015: 0110101LF: 0001010Chapter 12 8• Booleane.g., TRUE FALSE1111 11110000 0000To represent:• Application specifice.g., RED → 1 GREEN → 3 BLUE → 2 YELLOW → 4 …Can we use less than 1 byte/code?Yes, but only if desperate...Chapter 12 9• Datese.g.: - Integer, # days since Jan 1, 1900 - 8 characters, YYYYMMDD - 7 characters, YYYYDDD(not YYMMDD! Why?)• Timee.g. - Integer, seconds since midnight - characters, HHMMSSFFTo represent:4Chapter 12 10• String of characters– Null terminatede.g.,– Length givene.g.,- Fixed lengthc tac ta3To represent:Chapter 12 11• Bag of bitsLength BitsTo represent:Chapter 12 12Key Point• Fixed length items• Variable length items- usually length given at beginning5Chapter 12 13• Type of an item: Tells us how tointerpret(plus size if fixed)AlsoChapter 12 14Data ItemsRecordsBlocksFilesMemoryOverviewChapter 12 15Record - Collection of related dataitems (called FIELDS)E.g.: Employee record:name field,salary field,date-of-hire field, ...6Chapter 12 16Types of records:• Main choices:– FIXED vs VARIABLE FORMAT– FIXED vs VARIABLE LENGTHChapter 12 17A SCHEMA (not record) containsfollowing information- # fields- type of each field- order in record- meaning of each fieldFixed formatChapter 12 18Example: fixed format and lengthEmployee record(1) E#, 2 byte integer(2) E.name, 10 char. Schema(3) Dept, 2 byte code55s m i t h0283j o n e s01Records7Chapter 12 19• Record itself contains format“Self Describing”Variable formatChapter 12 20Example: variable format and length4I52 4S DROF46Field name codes could also be strings, i.e. TAGS # FieldsCode identifying field as E#Integer typeCode for EnameString typeLength of str.Chapter 12 21Variable format useful for:• “sparse” records• repeating fields• evolving formatsBut may waste space...8Chapter 12 22• EXAMPLE: var format record with repeating fieldsEmployee can have children3 E_name: Fred Child: Sally Child: TomChapter 12 23Note: Repeating fields does not imply- variable format, nor- variable sizeJohn Sailing Chess --• Key is to allocate maximum number ofrepeating fields (if not used → null)Chapter 12 24Many variants betweenfixed - variable format:Ex. #1: Include record type in recordrecord type record lengthtells me whatto expect(i.e. points to schema)5 27 . . . .9Chapter 12 25Record header - data at beginningthat describes recordMay contain:- record type- record length- time stamp- other stuff ...Chapter 12 26Ex #2 of variant between FIXED/VAR format• Hybrid format– one part is fixed, other variableE.g.: All employees have E#, name, deptother fields vary.25 Smith Toy 2 retiredHobby:chess# of varfieldsChapter 12 27Other interesting issues:• Compression– within record - e.g. code selection– collection of records - e.g. find commonpatterns• Encryption10Chapter 12 28Next: placing records into blocksblocks ...a fileassume fixedlength blocksassume a single file (for now)Chapter 12 29(1) separating records(2) spanned vs. unspanned(3) mixed record types – clustering(4) split recordsOptions for storing records in blocks:Chapter 12 30Block(a) no need to separate - fixed size recs.(b) special marker(c) give record lengths (or offsets)- within each record- in block header(1) Separating recordsR2R1 R311Chapter 12 31• Unspanned: records must be within oneblockblock 1 block 2 ...• Spannedblock 1 block 2...(2) Spanned vs. UnspannedR1 R2R1R3 R4 R5R2R3(a)R3(b)R6R5R4R7(a)Chapter 12 32need indication need indicationof partial record of continuation“pointer” to rest (+ from where?)R1 R2R3(a)R3(b)R6R5R4R7(a)With spanned records:Chapter 12 33• Unspanned is much simpler, but maywaste space…• Spanned essential ifrecord size > block sizeSpanned vs. unspanned:12Chapter 12 34Example106 recordseach of size 2,050 bytes (fixed)block size = 4096 bytesblock 1 block 2 2050 bytes wasted 2046 2050 bytes wasted 2046R1 R2• Total wasted = 2 x 109 Utiliz = 50%• Total space = 4 x 109Chapter 12 35• Mixed - records of different types(e.g. EMPLOYEE, DEPT)allowed in same block e.g., a block(3) Mixed record typesEMPe1DEPTd1DEPTd2Chapter 12 36Why do we want to mix?Answer: CLUSTERINGRecords that are frequentlyaccessed together should bein the same block13Chapter 12 37Compromise:No mixing, but keep relatedrecords in same cylinder ...Chapter 12 38ExampleQ1: select A#, C_NAME, C_CITY, …from DEPOSIT, CUSTOMERwhere DEPOSIT.C_NAME =CUSTOMER.C.NAMEa blockCUSTOMER,NAME=SMITHDEPOSIT,NAME=SMITHDEPOSIT,NAME=SMITHChapter 12 39• If Q1 frequent, clustering good• But if Q2 frequentQ2: SELECT * FROM CUSTOMERCLUSTERING IS COUNTER PRODUCTIVE14Chapter 12 40Fixed part in one blockTypically forhybrid formatVariable part in another block(4) Split recordsChapter 12 41Block with fixed recs.R1 (a)R1 (b)Block with variable recs.R2 (a)R2 (b)R2 (c)This block alsohas fixed recs.Chapter 12 42QuestionWhat is difference between- Split records- Simply using two differentrecord types?15Chapter 12 43(1) Separating records(2) Spanned vs. Unspanned(3) Mixed record types - Clustering(4) Split recordsOptions for storing records in blocksChapter 12 44(1) Insertion/Deletion(2) Buffer Management(3) Comparison of SchemesOther TopicsChapter 12 45BlockDeletionRx16Chapter 12 46Options:(a) Immediately reclaim space(b) Mark deleted– May need chain of deleted records(for re-use)– Need a way to mark:• special charactersChapter 12 47 As


View Full Document

NCSU CSC 440 - Representing Data Elements

Download Representing Data Elements
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Representing Data Elements and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Representing Data Elements 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?