Unformatted text preview:

3/27/11&1&Hash+Based&Indexing&• Records&in&a&file&are&grouped&into&buckets(• Each&bucket&consists&of&a&primary(page&and&zero&or&more&overflow&pages(• A&hash(func7on&h&takes&a&search+key&value&k&and&returns&the&address&of&a&bucket&• To&find&records&matching&a&search&key&value&k,&calculate&h(k),&then&look&through&bucketed&pages&sequenKally&to&find&matching&data&entries&Copyright&©&Ben&CartereNe& 21&Tree+Based&Indexing&• Search&key&values&are&organized&in&a&tree(• The&highest&level&is&the&root(• The&lowest&level&(the&leaf(level)&contains&data&entries&• Each&node&in&the&tree&is&a&page&on&disk&– Retrieving&nodes&involves&disk&I/O&– And&therefore&the&number&of&disk&reads&in&a&search&is&equal&to&the&length&of&the&path&from&root&to&leaf&• A&B+(tree&is&an&index&structure&that&ensures&all&paths&from&root&to&leaf&are&the&same&length&Copyright&©&Ben&CartereNe& 22&3/27/11&2&Comparing&File&OrganizaKons&• We&are&interested&in&the&total%cost&of&accessing&and&modifying&data&with&a&given&file&organizaKon&scheme&• Specifically,&what&is&the&cost&of:&– Scan&(fetch&all&records&in&a&file)&– Search(with(equality(selec7on&(fetch&records&that&match&an&equality&condiKon)&– Search(with(range(selec7on&(fetch&records&that&match&a&range&condiKon)&– Insert&(insert&a&new&record&into&a&file)&– Delete&(delete&a&record&from&a&file)(Copyright&©&Ben&CartereNe& 23&Cost&Model&• To&esKmate&cost,&we&need&a&model&of&total&&execuKon&Kme&• Our&model&is&a&simplified&one:&– B&is&the&number&of&pages&(assuming&100%&capacity)&– R&is&the&number&of&records&per&page&(100%&capacity)&– D&is&the&average&Kme&to&read/write&a&page&from/to&disk&– C&is&average&Kme&to&process&a&record&• Consider&the&average%case%• This&is&good&enough&to&indicate&trends&Copyright&©&Ben&CartereNe& 24&3/27/11&3&Heap&Files&• Heap&file&=&randomly&ordered&records&• Costs:&– Scan:&&B(D+RC)&– Search&with&equality&selecKon:&&0.5B(D+RC)&&• (if&equali ty&field&i s&a&key;&same&as&scan&if&not)&– Search&with&range&selecKon:&&B(D+RC)&– Insert:&&2D+C&– Delete:&&search&cost&+&C+D&Copyright&©&Ben&CartereNe& 25&Sorted&Files&• Records&stored&directly,&sorted&on&one&or&more&fields&• Costs:&– Scan:&&B(D&+&RC)&– Search&with&equality&selecKon:&&D&log2&B&+&C&log2&R&• (assuming&selecKon&field&is&the&sort&field)&– Search&with&range&selecKon:&&D&log2&B&+&C&log2&R&– Insert:&&search&+&B(D&+&RC)&– Delete:&&search&+&B(D&+&RC)&Copyright&©&Ben&CartereNe& 26&3/27/11&4&Clustered&File/Tree&Index&• Sorted&file&with&B++tree&index&on&sort&field&– F&=&fanout,&max&number&of&pointers&in&a&node&• AssumpKon:&&pages&at&67%&capacity&• Costs:&– Space&overhead:&&0.5B&– Scan:&&1.5B(D&+&RC)&– Search&with&equality&selecKon:&&&• D&logF&(1.5B)&+&C&log2&R&– Search&with&range&selecKon:&&same&– Insert:&&search&+&D&– Delete:&&same&Copyright&©&Ben&CartereNe& 27&Heap&File/Unclustered&Tree&Index&• Unsorted&file&with&B++tree&on&search&key&• AssumpKon:&&index&data&entry&=&10%&record&size&• Costs:&– Space&overhead:&&0.15B(F&–&1)&– Scan:&&BD(R&+&0.15)&+&2BRC&– Search&with&equality&selecKon:&&&• D&logF(0.15B)&+&C&log2(6.7R)&+&D&– Search&with&range&selecKon:&&“same”&(plus&another&D&for&each&matching&record)&– Insert:&&3D&+&C&+&D&logF(0.15B)&+&C&log2(6.7R)&• (heap&insert&+&tree&search&+&leaf&node&page&write)&– Delete:&&heap&delete&+&tree&search&+&leaf&node&page&write&Copyright&©&Ben&CartereNe& 28&3/27/11&5&Heap&File/Unclustered&Hash&Index&• Unsorted&file&with&hash&funcKon&on&search&key&– H&=&Kme&to&compute&hash&funcKon&• AssumpKon:&&index&pages&are&80%&full&• Costs:&– Space&overhead:&&0.125B&– Scan:&&BD(R&+&0.125)&+&2BRC&– Search&with&equality&selecKon:&&H&+&2D&+&4RC&– Search&with&range&selecKon:&&B(D&+&RC)&– Insert:&&heap&insert&+&H&+&2D&+&C&– Delete:&&search&+&2D&Copyright&©&Ben&CartereNe& 29&Comparison&of&I/O&Costs&File(type(Scan((Equal(search(Range(search(Insert((Delete((Heap(BD&0.5BD&BD&2D&Search&+&D&Sorted(BD&Dlog2&B&Dlog2&B&+&m&Search&+&BD&Search&+&BD&Clustered(1.5BD&DlogF&1.5B&DlogF&1.5B&+&m&Search&+&D&Search&+&D&Tree(index(BD(R&+&0.15)&D(1&+&logF&0.15B)&D(logF&0.15B&+&m)&D(3&+&logF&0.15B)&Search&+&2D&Hash(index(BD(R&+&0.125)&2D&BD&4D&Search&+&2D&Copyright&©&Ben&CartereNe& 30&• &&Best&for&scanning:&&sorted&• &&Best&for&equality&search:&&hash&index&• &&Best&for&range&search:&&clustered&• &&Best&for&inserKons:&&heap&• &&Best&for&deleKons:&&heap&• &&Worst&for&scanning:&&any&index&• &&Worst&for&equality&search:&&tree&index&• &&Worst&for&range&search:&&hash&index&• &&Worst&for&inserKons:&&sorted&• &&Worst&for&deleKons:&&sorted&3/27/11&6&CreaKng&Indexes&in&MySQL&CREATE&INDEX&indexName&&ON&RelaKon&(A1,&…)&&USING&[HASH|BTREE]&• Creates&an&index&of&the&specified&type&on&the&specified&column(s)&&• MySQL&automaKcally&maintains&it&with&inserts/updates/deletes&• MySQL&creates&clustered&indexes&on&primary&keys&by&default&– If&no&primary&key,&clustered&index&on&first&UNIQUE&NOT&NULL&field&– If&no&UNIQUE&NOT&NULL&field,&clustered&index&on&internal&row&ID&Copyright&©&Ben&CartereNe& 31&Indexes&and&Performance&Tuning&• A&workload&is&a&mix&of&queries&and&update&operaKons&• We’d&like&to&create&indexes&that&will&support&the&expected&workload&efficiently&• For&each&query&in&the&workload:&– What&relaKons&does&it&access?&– What&aNributes&are&retrieved?&– Which&aNributes&are&involved&in&select/join&clauses?&– How&selecKve&are&those&condiKons?&• For&each&update&in&the&workload:&– What&type&of&update&(INSERT/UPDATE/DELETE)?&– What&aNributes&are&affected?&Copyright&©&Ben&CartereNe& 32&3/27/11&7&Choosing&Indexes&• What&indexes&shoul d&we&create?&– Which&relaKons&need&indexes?&– What&fields&should&be&used&as&search&keys?&– Do&we&need&more&than&one&index&for&a&relaKon?&• What&type&of&index?&– Clustered?&&Hash?&&B++tree?&Copyright&©&Ben&CartereNe& 33&Index&SelecKon&Guidelines&• ANributes&in&WHERE&clauses&are&good&candidates&– Many&exact&matches&&hash&– Many&range&queries&&tree&• If&WHERE&clauses&ouen&contain&several&condiKons,&consider&mulK+aNr&index&–


View Full Document

UD CISC 637 - Hash-­Based Indexing

Download Hash-­Based Indexing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hash-­Based Indexing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hash-­Based Indexing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?