UW CSE 444 - Parallel Databases - D342563

Home> Schools> University of Washington> Computer Science and Engineering (CSE) > CSE 444> Parallel Databases

DOC PREVIEW

UW CSE 444 - Parallel Databases

School name University of Washington

Course Cse 444- Database Internals

Pages 36

This preview shows page 1-2-17-18-19-35-36 out of 36 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 36 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 36 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 36 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 36 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 36 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 36 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 36 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 36 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Lecture'22:'Parallel'Databases'Wednesday,'May'26,'2010'Dan Suciu -- 444 Spring 2010 1Overview'• Parallel'architectures'an d'op erators:'Ch.'20.1'• MapBreduce:'Ch.'20.2'• Semijoin'reducFons,'full'reducers:'Ch.'20.4'– We'covered'this'a'few'lectures'ago'Dan Suciu -- 444 Spring 2010 2Parallel'v.s.'Distributed'Databases'• Parallel'database'system:'– Improve'performance'through'parallel'implementaFon'• Distributed'database'system:'– Data'is'stored'across'several'sites,'each'site'managed'by'a'DBMS'capable'of'running'independently'Dan Suciu -- 444 Spring 2010 3Parallel'DBMS s'• Goal'– Improve'performance'by'execuFng'mulFple'operaFons'in'parallel'• Key'beneﬁt'– Cheaper'to'scale'than'relying'on'a'single'increasingly'more'powerful'processor'• Key'challenge'– Ensure'overhead'and'contenFon'do'not'kill'performance'Dan Suciu -- 444 Spring 2010 4Performance'Metrics''for'Parallel'DBMSs'• Speedup'– More'processors''higher'speed'– Individual'queries'should'run'faster'– Should'do'more'transacFons'per'second'(TPS)'• Scaleup'– More'processors''can'process'more'data'– Batch'scaleup'• Same'query'on'larger'input'data'should'take'the'same'Fme'– TransacFon'scaleup'• NBFmes'as'many'TPS'on'NBFmes'larger'database'• But'each'transacFon'typically'remains'small'Dan Suciu -- 444 Spring 2010 5Linear'v.s.'NonBlinear'Speedup'Dan Suciu -- 444 Spring 2010 # processors (=P) Speedup 6Linear'v.s.'NonBlinear'Scaleup'# processors (=P) AND data size Batch Scaleup ×1 ×5 ×10 ×15 Dan Suciu -- 444 Spring 2010 7Challenges'to''Linear'Speedup'and'Scaleup'• Startup'cost''– Cost'of' starFng'an'operaFon'on'many'processors'• Interference'– ContenFon'for'resources'between'processors'• Skew'– Slowest'processor'becomes'the'boWleneck'Dan Suciu -- 444 Spring 2010 8Architectures'for'Parallel'Databases'• Shared'memor y'• Shared'disk'• Shared'nothing'Dan Suciu -- 444 Spring 2010 9Shared'Memory'InterconnecFon'Network'P' P' P'Global'Shared'Memory'D' D' D'Dan Suciu -- 444 Spring 2010 10Shared'Disk'InterconnecFon'Network'P' P' P'M' M' M'D' D' D'Dan Suciu -- 444 Spring 2010 11Shared'Nothing'InterconnecFon'Network'P' P' P'M' M' M'D' D' D'Dan Suciu -- 444 Spring 2010 12Shared'Nothing'• Most'scalable'architecture'– Minimizes'interference'by'minimizing'resource'sharing'– Can'use'commodity'hardware'• Also'most'diﬃcult'to'program'and'manage'• Processor'='ser ver'='node'• P'='number'of'nodes'Dan Suciu -- 444 Spring 2010 We'will'focus'on'shared'nothing'13QuesFon'• What'exactly'can' we'parallelize'in'a'parallel'DB'?'Dan Suciu -- 444 Spring 2010 14Taxonomy'for'Parallel'Quer y'EvaluaFon'• InterBquery'parallelism'– Each'query'runs' on'o ne'processor'• InterBoperator'parallelism'– A'query'ru ns'on'mulFple'processors'– An'operator'run s'on'one'processor'• IntraBoperator'parallelism'– An'operator'run s'on'mulFple'processors'Dan Suciu -- 444 Spring 2010 We'study'only'intraBoperator'parallelism:'most'scalable'15Horizontal'Data'ParFFoning'• RelaFon'R'split'into'P'chunks'R0,'…,'RPB1,'stored'at'the'P'nodes'• Round'robin:'tuple'ti'to'chunk'(i 'mod'P)'• Hash'based'parFFoning'on'aWribute'A:'– Tuple't'to'chunk'h(t.A)'mod'P'• Range'based'parFFoning'on'aWribute'A:'– Tuple't'to'chunk'i'if'viB1'<'t.A'<'vi'Dan Suciu -- 444 Spring 2010 16Parallel'Sel ecFon'Compute'σA=v(R),'or'σv1<A<v2(R)'• On'a'convenFonal'database:'cost'='B(R)'• Q:'What'is'th e'cost'on'a'parallel'database'with'P'processors' ?'– Round'robin'– Hash'parFFoned'– Range'parFFoned'Dan Suciu -- 444 Spring 2010 17Parallel'Sel ecFon'• Q:'What'is'th e'cost'on'a'parallel'database'with'P'processors' ?'• A:'B(R)'/'P'in'all'cases'• However,'diﬀerent'processors'do'the'work:'– Round'robin:'all'servers'do'th e'work'– Hash:'one'ser ver'for'σA=v(R),'all'for'σv1<A<v2(R)'– Range:'one'server'onl y'Dan Suciu -- 444 Spring 2010 18Data'ParFFoning'Revisited'What'are'the'pros'and'cons'?'• Round'robin'– Good'load'balance'but'always'needs'to'read'all'the'data'• Hash'based'parFFoning'– Good'load'balance'but'works'only'for'equality'predicates'and'full'scans'• Range'based'parFFoning'– Works'well'fo r'range'predicates'but'can'suﬀer'from'data'skew'Dan Suciu -- 444 Spring 2010 19Parallel'Group'By:''γA,'sum(B)(R)'• Step'1:'server'i'parFFons'chunk'Ri'using'a'hash'funcFon'h(t.A)'mod'P:'Ri0,'Ri1,'…,'Ri,PB1'''• Step'2:'server'i'sends'parFFon'Rij'to'serve'j'• Step'3:''server'j'computes'γA,'sum(B)'on''R0j,'R1j,'…,'RPB1,j''Dan Suciu -- 444 Spring 2010 20Cost'of'Parallel'Group'By'Recall' convenFonal'cost'=''3B(R)'• Cost'of'Step'1:''B(R)/P''I/O'operaFons'• Cost'of'Step'2:'(PB1)/P'B(R)'blocks'are'sent'– Network'costs'assumed'to'be'much'lower'than'I/O'• Cost'of'Step'3:'2'B(R)/P'– Why'?'– When'can'we'reduce'it'to'0'?'Total'='3B(R)'/'P''+'communi caFon'costs'Dan Suciu -- 444 Spring 2010 21Parallel'Joi n:''R'⋈A=B'S'• Step'1'– For'all'servers'in'[0,k],'ser ver'i'parFFons'chunk'Ri'using'a'hash'funcFon'h(t.A)'mod'P:'Ri0,'Ri1,'…,'Ri,PB1'''– For'all'servers'in'[k+1,P],'ser ver'j'parFFons'chunk'Sj'using'a'hash'funcFon'h(t.A)'mod'P:'Sj0,'Sj1,'…,'Rj,PB1'''• Step'2:''– Server'i'sends'parFFon'Riu'to'server'u'– Server'j'sends'parFFon'Sju'to'server'u'• Steps'3:'Server'u'computes'the'join'of'Riu'with'Sju'Dan Suciu -- 444 Spring 2010 22Cost'of'Parallel'Join'• Step'1:''(B(R)'+'B(S))/ P'• Step'2:''0'– (PB1)/P'(B(R)'+'B(S))'blocks'are'sent,'but'we'assume'network'costs'to'be'<<'disk'I/O'costs'• Step'3:'– 0'if'smaller'table'ﬁts'i n'mai n'memory:'B(S)/p'<=M'– 2(B(R)+B(S))/P'otherwise'Dan Suciu -- 444 Spring 2010 23Parallel'Dataﬂow'ImplementaFon'• Use'relaFonal'operators'unchanged''• Add'special'split'and'merge'operators'– Handle'data'rouFng ,'bu ﬀering,'and'ﬂow'control'• Example:'exchange'operator''– Inserted'between'consecuFve'operators'in'the'query'plan'– Can'act'as'either'a'producer'or'consumer'–

View Full Document