Distributed Databases
CISC437/637 Lecture
Ben Carterette
5/9/10
Copyright © Ben Carterette

Distributed Database Systems
• A distributed database consists of loosely coupled sites with no shared physical components
• A DBMS runs at each site, independent of the others
– Contrast with a parallel DBMS: a single parent process that can use many CPUs
• Transactions may access data at one or more sites

Homogeneous vs Heterogeneous
• Distributed databases can be either homogeneous or heterogeneous
• Homogeneous:
– Same DBMS software running at each site
– Sites know about each other and agree to cooperate in processing requests
• Sites give up control over software and schema
– Appears as a single system to the user
• Heterogeneous:
– Different DBMS software at each site
– Possibly incompatible schemas across sites
– Sites may not be aware of each other

Classical View of Distributed DBs
• Two properties were considered desirable:
– Distributed data independence: users should not have to know where the data is
– Distributed transaction atomicity: users should be able to submit transactions that access data across sites just like purely local transactions
• In practice, these properties are not always easy to support, and may not even be desirable
• Distributed DBs are an evolving topic
– The "right" way depends heavily on what the uses are

Distributed DB Architectures
• Client-server
– DBMS servers manage data and execute transactions independently
– Clients submit a query to a single server
• Collaborating server
– DBMS servers cooperate in executing transactions spanning servers
– One server receives a query and determines the best distributed processing plan across servers
• Middleware systems
– One server is capable of distributing queries; the rest are completely local

Distributing Data
• Relations will be stored on disks across sites
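As a toy illustration of storing one relation's data across sites (the relation, its contents, and the site names are all hypothetical; each "site" is just an in-memory list):

```python
# Hypothetical relation: Employees(id, name, dept)
employees = [
    {"id": 1, "name": "Ann", "dept": "Sales"},
    {"id": 2, "name": "Bob", "dept": "Engineering"},
    {"id": 3, "name": "Cat", "dept": "Sales"},
]

# Horizontal fragmentation: full rows are assigned to sites,
# here by department (a hypothetical fragmentation rule).
sites = {"S1": [], "S2": []}
for row in employees:
    target = "S1" if row["dept"] == "Sales" else "S2"
    sites[target].append(row)

# The original relation is the union of the horizontal fragments.
reconstructed = sites["S1"] + sites["S2"]
assert sorted(r["id"] for r in reconstructed) == [1, 2, 3]
```

Vertical fragmentation would instead place full columns (e.g. `id, name` at one site and `id, dept` at another), so reconstruction requires a join on the key rather than a union.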
• Relations can be fragmented (partitioned) in different ways, but not re-partitioned during querying
– Horizontal fragmentation: full rows stored at different sites
– Vertical fragmentation: full columns stored at different sites
• Relations can be replicated at more than one site
– For increased availability of data and faster query evaluation
– Synchronous vs asynchronous replication determines how current the replicas are

Distributed DB Catalog
• Processing distributed queries requires keeping track of how data is distributed
– Replicas of fragments must be uniquely identifiable across sites
• A global naming scheme can work in homogeneous settings, but not if local control is required
• Instead, use a two-field naming scheme:
– <local name, origin site, [replica id]> stored at the origin site
– Global replica name
– To find a relation, look in its origin site's catalog

Simple Distributed Queries
• Selecting from a single relation
• Consider three cases:
– Horizontally fragmented: querying may require formulating two subqueries, then unioning the results
– Vertically fragmented: the entire relation must be reconstructed before the query can be evaluated
– Full replication: the query can be executed at any one site
• Any fragmentation requires sending data over a network, which adds to processing time

Distributed Joins
• Distributed joins can be very expensive
• Two simple strategies:
– Fetch as needed
• Nested-loop join, transmitting records from one site to another as necessary
• Network transmission costs dominate, even if indexes can be leveraged
– Ship to one site
• Send all the relations to a single site, then compute the join

Semijoin
• A user at site S1 requests a join of R1 (at S1) and R2 (at S2)
• Three steps:
1. At S1, project R1 onto the common columns; send the projection to S2
2. At S2, compute the natural join of R2 and the projection of R1 to produce the reduction of R2 w.r.t. R1
3. Send the reduction of R2 back to S1 to complete the join
• Trades the cost of computing and sending projections/reductions for the cost of sending a full relation

Bloomjoin
• Similar to semijoin, but uses bit vectors of length k
• Steps:
– At S1, hash the common column values into the range [0, k-1]; send the resulting bit vector to S2
– At S2, hash each record's join column values into the same range; if bit j in the received vector is 0, discard the record
• This is the reduction of S2 w.r.t. S1
– Send the remaining records back to S1 to be joined
• Bit vectors can be very small, but collisions are possible

Distributed Query Optimization
• Again we want to consider all plans and pick the cheapest
• Three new considerations:
– Communication costs
– Local site autonomy
– New distributed join algorithms
• The query site constructs a global plan along with suggestions for local plans at each site
– Individual sites are free to change their plans

Distributed Updates
• Replication entails redundancy
– And therefore the possibility of update anomalies
• Two types of replication:
– Synchronous, in which all copies of the data are updated before the transaction commits
– Asynchronous, in which copies of data are periodically updated
• Asynchronous is more efficient, but requires users to know about the data distribution

Synchronous Replication
• Two algorithms:
– Voting: a modifying transaction does not update every copy, just a majority
• When a transaction executes a read, it reads enough replicas to guarantee it has the most recent data
• Expensive due to many reads
– Read-any write-all: modifying transactions update every copy
• A reading transaction can read any copy
• May be feasible if updates are rare
• When locking is considered, things get messy

Asynchronous Replication
• Two approaches:
– In peer-to-peer, one or more copies are designated as masters that can be updated
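The Bloomjoin filtering described above can be sketched as follows (a minimal sketch: the relations and fragment contents are hypothetical, and Python's built-in `hash` stands in for the hash function):

```python
K = 64  # bit-vector length; smaller vectors save bandwidth but risk more collisions

def bit_vector(values, k=K):
    """At S1: hash each join-column value into [0, k-1] and set that bit."""
    bits = [0] * k
    for v in values:
        bits[hash(v) % k] = 1
    return bits

def reduce_relation(rows, key, bits, k=K):
    """At S2: discard rows whose hashed join value maps to a 0 bit."""
    return [r for r in rows if bits[hash(r[key]) % k] == 1]

# Hypothetical fragments at two sites, joined on "id"
r1 = [{"id": 1}, {"id": 2}]                        # stored at S1
r2 = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]  # stored at S2

bv = bit_vector(row["id"] for row in r1)    # S1 -> S2: send only K bits
reduction = reduce_relation(r2, "id", bv)   # S2 -> S1: send surviving rows only
# Every matching row survives; a non-matching row survives only on a hash collision.
```

Note the asymmetry with semijoin: instead of shipping a projection of R1, S1 ships a fixed-size bit vector, at the price of occasional false positives in the reduction.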
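The voting scheme for synchronous replication described above can be sketched as a majority-quorum read over versioned copies (a minimal sketch with hypothetical data; real systems also need locking and failure handling):

```python
# Each copy carries a version number. A write updates a majority of copies,
# so any majority read must overlap the write and see the latest version.
copies = [
    {"version": 3, "value": "new"},   # updated by the last write
    {"version": 3, "value": "new"},   # updated by the last write
    {"version": 2, "value": "old"},   # missed by the last write
]

def majority_read(copies):
    quorum = len(copies) // 2 + 1      # read enough replicas: a majority
    sample = copies[:quorum]           # any majority works; take the first one here
    # The highest version in any majority is guaranteed to be the most recent.
    return max(sample, key=lambda c: c["version"])["value"]

assert majority_read(copies) == "new"
```

This is why voting makes reads expensive: every read touches a majority of copies, whereas read-any write-all lets a read touch a single copy at the cost of writing all of them.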