3/22/11%1%Storage%and%Indexing%CISC437/637,%Lecture%#11%Ben%Cartere<e%1%Copyright%©%Ben%Cartere<e%Physical%Layer%• So%far%we%have%focused%on%highGlevel%database%layers%– Conceptual:%%EGR%model,%EGR%diagrams%– Logical:%%relaMonal%model,%SQL%• UlMmately%the%logical%layer%needs%to%be%implemented%in%data%structures%and%stored%in%files%on%disk%– The!physical!layer!• Understanding%how%the%files%are%organized%is%essenMal%to%using%a%DBMS%effecMvely%Copyright%©%Ben%Cartere<e% 3%3/22/11%2%Physical%Storage%Media%• The%actual%physical%objects%that%data%is%stored%in%• It%is%convenient%to%classify%media%according%to%characterisMcs:%– Speed%of%access%– Cost%per%unit%of%storage%– Reliability%• Loss%of%data%on%power%outage%or%crash%• Failure%of%device%– VolaMlity%• VolaM le%storage%loses%contents%when%shut%down%• NonGvolaMle%persists%during%shut%down%period%Copyright%©%Ben%Cartere<e% 4%Physical%Storage%Media%• Cache%–%very%fast;%very%expensive;%volaMle;%managed%by%hardware/OS%• Main!memory!–%fast;%comparaMvely%expensive;%volaMle%– Usually%too%expensive%to%allow%storage%of%enMre%DB%• Flash!memory%–%fast%read,%slower%write;%ro ughl y%as%costly%as%main%memory;%nonGvolaMle!Copyright%©%Ben%Cartere<e% 5%3/22/11%3%Physical%Storage%Media%• Magne3c!disk!–%slow;%cheap;%nonGvolaMle%– This%is%where%the%full%DB%would%be%stored%– DBMS%must%move%data%from%disk%to%memory%for%access,%and%from%memory%to%disk%for%storage%• Doi ng%th is%efficiently%is%the%driving%force%behind%storage%decisions%• Op3cal!storage%–%slow;%cheap;%nonGvolaMle%– CDs,%DVDs%• Tape%–%very%slow;%very%cheap;%nonGvolaMle%– Only%allows%sequen3al!access:%%data%accessed%in%storage%order%from%beginning%• Comp are%to%direct!access%offered%by%magneMc%disks%– Used%for%backups%and%archives%Copyright%©%Ben%Cartere<e% 6%Physical%Storage%Hierarchy%Copyright%©%Ben%Cartere<e% 7%primary%secondary%(onGline)%terMary%(offGline)%3/22/11%4%MagneMc%Disk%Mechanism%Copyright%©%Ben%Cartere<e% 8%MagneMc%Disk%Details%• Read<write!head%reads%and%writes%magneMcallyGencoded%informaMon%from%conMnuouslyGspinning%pla>ers!• A%pla<er%is%divided%into%circular%tracks%– 50,000%–%100,000%tracks%per%pla<er%• Tracks%are%divided%into%sectors%– The%smallest%unit%of%informaMon%that%can%be%stored%(typically%512%bytes)%– 500+%sectors/track%on%inner%tracks;%1000+%on%outer%• When%a%read/write%operaMon%executes,%head%goes%to%correct%track%and%reads/writes%data%from/to%sectors%as%they%pass%underneath%%Copyright%©%Ben%Cartere<e% 9%3/22/11%5%Measuring%Disk%Performance%• Access!3me%–%Mme%from%issue%of%read/write%request%to%when%data%transfer%begins%– Seek!3me%–%Mme%to%posiMon%the%read/write%head%over%the%correct%track%• Worst%case%=%Mme%to%move%from%innermost%track%to%outermost%• Average%case%=%½%worst%case%• 4%–%10%ms%on%typical%disks%– Rota3onal!latency%–%Mme%for%sector%being%accessed%to%appear%under%head%• Worst%case%=%full%rotaMon%• Average%case%=%½%worst%case%• 4%–%11%ms%on%typical%disks%Copyright%©%Ben%Cartere<e% 10%Measuring%Disk%Performance%• Data<transfer!rate%–%the%rate%at%which%data%can%be%retrieved%from%o r%stored%to%disk%– 25%–%100%MB%per%second%max;%lower%for%inner%tracks%• Mean!3me!to!failure%–%average%Mme%the%disk%is%expected%to%run%conMnuously%with%no%failure%– 3%–%5%years%Copyright%©%Ben%Cartere<e% 11%3/22/11%6%Disk%Blocks%and%Pages%• A%page%or%block%is%a%logical%unit%consisMng%of%a%fixed%number%of%conMguous%sectors%– Data%is%transferred%from%disk%to%main%memory%in%pages%– Usually%4KB%to%8KB%pages%• Smaller%%more%transfers%from%disk%• Larger%%more%wasted%space%from%parMallyGused%blocks%• Access%pa<erns:%– Sequen3al%–%successive%requests%for%successive%pages;%requires%only%one%disk%seek%– Random%–%successive%requests%for%pages%at%arbitrary%posiMons;%each%request%requires%a%seek!Copyright%©%Ben%Cartere<e% 12%OpMmizing%Page%Access%• Keeping%disk%access%fast%is%imperaMve%for%large%databases%• At%the%hardware/OS%level:%– Buffering%of%pages%stored%temporarily%in%memory%while%not%used%(also%done%by%DBMS)!– Read<ahead%of%pages%in%case%they%might%be%used%– Scheduling%of%page%access%to%minimize%disk%arm%movement%– File!organiza3on%to%place%pages%on%disk%that%corresponds%to%expected%access%pa<ern%– Nonvola3le!write!buffers%for%temporary%storage%– Log!disk%devoted%to%recording%page%updates!Copyright%©%Ben%Cartere<e% 13%3/22/11%7%File%OrganizaMon%• A%database%is%stored%on%disk%in%a%collecMon%of%files%that%span%one%or%more%pages%• Each%file%contains%a%sequence%of%records%• Pages%may%contain%one%or%more%records%• The%DBMS%can%insert%and%update%records%in%files,%and%scan%through%records%in%a%file%one%at%a%Mme%– Note%that%we%have%not%specified%that%all%records%are%from%the%same%relaMon%• How%these%files%are%structured%and%stored%on%disk%can%have%a%big%impact%on%data%access%speed%Copyright%©%Ben%Cartere<e% 14%OrganizaMon%of%Records%in%Files%• Simplest%organizaMon%scheme%is%a%heap!file%– Records%are%stored%in%random%order%across%pages%• A%sequen3al!file%stores%each%record%with%a%pointer%to%the%locaMon%of%the%next%record%on%disk%– Records%ordered%by%search!key!• Search%keys%correspond%to%[sets%of]%fields/a<ributes/columns%• They%do%not%need%to%be%keys%in%the%relaMonal%sense%• An%index%is%a%separate%file%that%indicates%the%block%that%full%records%can%be%found%in%– The%idea%of%an%index%is%to%efficiently%locate%all%records%matching%a%search%key%Copyright%©%Ben%Cartere<e% 15%3/22/11%8%Indexes%• An%index!structure!consists%of%a%series%of%index!entries!with%form%(k,%r)%– A%search%key%value%and%a%pointer%to%a%record%– Could%include%a%list%of%pointers%in%some%cases%• Index%structures%are%stored%on%disk%in%index!files%• Three%basic%kinds%of%indexes:%– Ordered!indexes%in%which%search%keys%are%stored%in%sorted%order%– Hash!indexe s%in%which%search%keys%are%distributed%uniformly%in%“buckets”%– Tree!indexes!in%which%search%keys%are%organized%in%a%tree%structure!Copyright%©%Ben%Cartere<e% 16%Ordered%Indexes%• A%clustered!index%is%an%ordered%index%to%a%file%of%records%in%the%same%order%–
View Full Document