DOC PREVIEW
UW CSE 444 - Introduction to Database Systems

This preview shows page 1-2-3-23-24-25-26-47-48-49 out of 49 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Introduction to Database Systems CSE 444, Winter 2011 Lecture 24: Parallel Databases h"p://www.cs.washington.edu/educa3on/courses/cse444/11wi/7777Version7March74,720117Outlook 27Overview  Parallel7architectures7and7operators:7Ch.720.17 Map@reduce:7Ch.720.27 Semijoin7reduc3ons,7full7reducers:7Ch.720.47 We7covered7this7a7few7lectures7ago737Parallel v.s. Distributed Databases  Parallel7database7system:7 Improve7performance7through7parallel7implementa3on7 Distributed7database7system:7 Data7is7stored7across7several7sites,7each7site7managed7by7a7DBMS7capable7of7running7independently747Parallel DBMSs 57 Goal7 Improve7performance7by7execu3ng7mul3ple7opera3ons7in7parallel77 Key7benefit7 Cheaper7to7scale7than7relying7on7a7single7increasingly7more7powerful7processor7 Key7challenge7 Ensure7overhead7and7conten3on7do7not7kill7performance7Performance Metrics for Parallel DBMSs  Speedup77 More7processors7→7higher7speed7 Individual7queries7should7run7faster7 Should7do7more7transac3ons7per7second7(TPS)7 Fixed7problem7size7overall,7vary7#7of7processors7("strong7scaling”)7 Scaleup7 More7processors7→7can7process7more7data7 Fixed7problem7size7per(processor,7vary7#7of7processors(("weak7scaling”)7 Batch7scaleup7 Same7query7on7larger7input7data7should7take7the7same73me7 Transac3on7scaleup7 N@3mes7as7many7TPS7on7N@3mes7larger7database7 But7each7transac3on7typically7remains7small767Linear v.s. Non-linear Speedup 77#7processors7(=P)7Speedup7Linear v.s. Non-linear Scaleup 87#7processors7(=P)7AND7data7size77Batch7Scaleup7×17 ×57×107×157Challenges to Linear Speedup and Scaleup  Startup7cost77 Cost7of7star3ng7an7opera3on7on7many7processors7 Interference7 Conten3on7for7resources7between7processors7 Skew7 Slowest7processor7becomes7the7bo"leneck797 Shared7memory7 Shared7disk7 Shared7nothing7107Architectures for Parallel DatabasesShared Memory 117Interconnec3on7Network7P7 P7Global7Shared7Memory7D7 D7 D7P7Shared Disk 127Interconnec3on7Network7P7 P7 P7M7 M7 M7D7 D7 D7Shared Nothing 137Interconnec3on7Network7P7 P7 P7M7 M7 M7D7 D7 D7Shared Nothing  Most7scalable7architecture7 Minimizes7interference7by7minimizing7resource7sharing7 Can7use7commodity7hardware7 Also7most7difficult7to7program7and7manage7 Processor7=7server7=7node77 “Processor”7!=7core7 P7=7number7of7nodes7147We7will7focus7on7shared7nothing7Important7ques3on:7what7exactly7can7we7actually7parallelize7in7a7parallel7database?7 Inter@query7parallelism7 Each7query7runs7on7one7processor7 Inter@operator7parallelism7 A7query7runs7on7mul3ple7processors7 An7operator7runs7on7one7processor7 Intra@operator7parallelism7 An7operator7runs7on7mul3ple7processors7157Taxonomy for Parallel Query EvaluationHorizontal Data Partitioning  Rela3on7R7split7into7P7chunks7R0,7…,7RP@1,7stored7at7the7P7nodes7 37ways7to7horizontally7par33on7a7rela3on:7 Round7robin:7tuple7ti7to7chunk7(i7mod7P)7 Hash7based7par33oning7on7a"ribute7A:7 Tuple7t7to7chunk7h(t.A)7mod7P7 Range7based7par33oning7on7a"ribute7A:7 Tuple7t7to7chunk7i7if7vi@17<7t.A7<7vi7167Horizontal Data Partitioning  All7three7choices7are7just7special7cases:7 For7each7tuple,7compute7bin(=(f(t)( Different7proper3es7of7the7func3on7f7determine7hash7vs.7range7vs.7round7robin7vs.7anything7177Parallel Selection Compute7σA=v(R),7or7σv1<A<v2(R)7 On7a7conven3onal7database:7cost7=7B(R)7(cost7is7here7~73me,7and7in7worst7case)7 Q:7What7is7the7cost7on7a7parallel7database7with7P7processors7?7 Round7robin7 Hash7par33oned7 Range7par33oned7187Parallel Selection  Q:7What7is7the7cost7on7a7parallel7database7with7P7processors7?7(cost7is7here7~73me,7and7in7worst7case)7 σ:7B(R)7/7P7in7all7cases7 However,7different7processors7do7the7work:7 Round7robin:7all7servers7do7the7work7 Hash:7one7server7for7σA=v(R),7all7for7σv1<A<v2(R)7 Range:7one7server7only7197Data Partitioning Revisited 207What7are7the7pros7and7cons7?7 Round7robin7 Good7load7balance7but7always7needs7to7read7all7the7data7 Hash7based7par33oning7 Good7load7balance7but7works7only7for7equality7predicates7and7full7scans77 Range7based7par33oning7 Works7well7for7range7predicates7but7can7suffer7from7data7skew7Parallel Group By: γA, sum(B)(R)  Step71:7server7i7par33ons7chunk7Ri7using7a7hash7func3on7h(t.A)7mod7P:7Ri0,7Ri1,7…,7Ri,P@1777 Step72:7server7i7sends7par33on7Rij7to7serve7j7 Step73:77server7j7computes7γA,7sum(B)7on77R0j,7R1j,7…,7RP@1,j77217Parallel Group By: γA, sum(B)(R): Cost Recall7conven3onal7cost7=773B(R)7 Cost7of7Step71:77B(R)/P77I/O7opera3ons7 Cost7of7Step72:7(P@1)/P7B(R)7blocks7are7sent,7no7cost7 Network7costs7assumed7to7be7much7lower7than7I/O7 Cost7of7Step73:727B(R)/P7 Why7?7 When7can7we7reduce7it7to707?7Total7=73B(R)7/7P777(communica3on7costs7ignored)7227Parallel Group By: γA, sum(B)(R)  Can7we7do7be"er?7 Sum?7 Count?7 Avg?7 Max?7 Median?7237Parallel Group By: γA, sum(B)(R)  Sum(B)7=7Sum(B0)7+7Sum(B1)7+7…7+7Sum(Bn)7 Count(B)7=7Count(B0)7+7Count(B1)7+7…7+7Count(Bn)7 Max(B)7=7Max(Max(B0)7+7Max(B1)7+7…7+7Max(Bn))7 Avg(B)7=7Sum(B)7/7Count(B)7 Median(B)7=77247distribu3ve7(bad7name)7algebraic7holis3c7Source:7Naming7from:7[Gray7et7al.:7DataCube,7ICDE71996]7Parallel Join: R ⋈A=B S  Step717 For7all7servers7in7[0,k],7server7i7par33ons7chunk7Ri7using7a7hash7func3on7h(t.A)7mod7P:7Ri0,7Ri1,7…,7Ri,P@1777 For7all7servers7in7[k+1,P],7server7j7par33ons7chunk7Sj7using7a7hash7func3on7h(t.A)7mod7P:7Sj0,7Sj1,7…,7Rj,P@17777 Step72:77 Server7i7sends7par33on7Riu7to7server7u7 Server7j7sends7par33on7Sju7to7server7u77 Steps73:7Server7u7computes7the7join7of7Riu7with7Sju7257Parallel Join: R ⋈A=B S: Cost  Step71:77(B(R)7+7B(S))/P7 Step72:7707 (P@1)/P7(B(R)7+7B(S))7blocks7are7sent,7but7we7assume7network7costs7to7be7<<7disk7I/O7costs7 Step73:7 07if7smaller7table7fits7in7main7memory:7B(S)/p7<=M7 4(B(R)+B(S))/P7otherwise77


View Full Document

UW CSE 444 - Introduction to Database Systems

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Introduction to Database Systems
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Introduction to Database Systems and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Introduction to Database Systems 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?