DOC PREVIEW
UW CSE 444 - Parallel Databases

This preview shows page 1-2-17-18-19-35-36 out of 36 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Lecture'22:'Parallel'Databases'Wednesday,'May'26,'2010'Dan Suciu -- 444 Spring 2010 1Overview'• Parallel'architectures'an d'op erators:'Ch.'20.1'• MapBreduce:'Ch.'20.2'• Semijoin'reducFons,'full'reducers:'Ch.'20.4'– We'covered'this'a'few'lectures'ago'Dan Suciu -- 444 Spring 2010 2Parallel'v.s.'Distributed'Databases'• Parallel'database'system:'– Improve'performance'through'parallel'implementaFon'• Distributed'database'system:'– Data'is'stored'across'several'sites,'each'site'managed'by'a'DBMS'capable'of'running'independently'Dan Suciu -- 444 Spring 2010 3Parallel'DBMS s'• Goal'– Improve'performance'by'execuFng'mulFple'operaFons'in'parallel'• Key'benefit'– Cheaper'to'scale'than'relying'on'a'single'increasingly'more'powerful'processor'• Key'challenge'– Ensure'overhead'and'contenFon'do'not'kill'performance'Dan Suciu -- 444 Spring 2010 4Performance'Metrics''for'Parallel'DBMSs'• Speedup'– More'processors''higher'speed'– Individual'queries'should'run'faster'– Should'do'more'transacFons'per'second'(TPS)'• Scaleup'– More'processors''can'process'more'data'– Batch'scaleup'• Same'query'on'larger'input'data'should'take'the'same'Fme'– TransacFon'scaleup'• NBFmes'as'many'TPS'on'NBFmes'larger'database'• But'each'transacFon'typically'remains'small'Dan Suciu -- 444 Spring 2010 5Linear'v.s.'NonBlinear'Speedup'Dan Suciu -- 444 Spring 2010 # processors (=P) Speedup 6Linear'v.s.'NonBlinear'Scaleup'# processors (=P) AND data size Batch Scaleup ×1 ×5 ×10 ×15 Dan Suciu -- 444 Spring 2010 7Challenges'to''Linear'Speedup'and'Scaleup'• Startup'cost''– Cost'of' starFng'an'operaFon'on'many'processors'• Interference'– ContenFon'for'resources'between'processors'• Skew'– Slowest'processor'becomes'the'boWleneck'Dan Suciu -- 444 Spring 2010 8Architectures'for'Parallel'Databases'• Shared'memor y'• Shared'disk'• Shared'nothing'Dan Suciu -- 444 Spring 2010 9Shared'Memory'InterconnecFon'Network'P' P' P'Global'Shared'Memory'D' D' D'Dan Suciu -- 444 Spring 2010 10Shared'Disk'InterconnecFon'Network'P' P' P'M' M' M'D' D' D'Dan Suciu -- 444 Spring 2010 11Shared'Nothing'InterconnecFon'Network'P' P' P'M' M' M'D' D' D'Dan Suciu -- 444 Spring 2010 12Shared'Nothing'• Most'scalable'architecture'– Minimizes'interference'by'minimizing'resource'sharing'– Can'use'commodity'hardware'• Also'most'difficult'to'program'and'manage'• Processor'='ser ver'='node'• P'='number'of'nodes'Dan Suciu -- 444 Spring 2010 We'will'focus'on'shared'nothing'13QuesFon'• What'exactly'can' we'parallelize'in'a'parallel'DB'?'Dan Suciu -- 444 Spring 2010 14Taxonomy'for'Parallel'Quer y'EvaluaFon'• InterBquery'parallelism'– Each'query'runs' on'o ne'processor'• InterBoperator'parallelism'– A'query'ru ns'on'mulFple'processors'– An'operator'run s'on'one'processor'• IntraBoperator'parallelism'– An'operator'run s'on'mulFple'processors'Dan Suciu -- 444 Spring 2010 We'study'only'intraBoperator'parallelism:'most'scalable'15Horizontal'Data'ParFFoning'• RelaFon'R'split'into'P'chunks'R0,'…,'RPB1,'stored'at'the'P'nodes'• Round'robin:'tuple'ti'to'chunk'(i 'mod'P)'• Hash'based'parFFoning'on'aWribute'A:'– Tuple't'to'chunk'h(t.A)'mod'P'• Range'based'parFFoning'on'aWribute'A:'– Tuple't'to'chunk'i'if'viB1'<'t.A'<'vi'Dan Suciu -- 444 Spring 2010 16Parallel'Sel ecFon'Compute'σA=v(R),'or'σv1<A<v2(R)'• On'a'convenFonal'database:'cost'='B(R)'• Q:'What'is'th e'cost'on'a'parallel'database'with'P'processors' ?'– Round'robin'– Hash'parFFoned'– Range'parFFoned'Dan Suciu -- 444 Spring 2010 17Parallel'Sel ecFon'• Q:'What'is'th e'cost'on'a'parallel'database'with'P'processors' ?'• A:'B(R)'/'P'in'all'cases'• However,'different'processors'do'the'work:'– Round'robin:'all'servers'do'th e'work'– Hash:'one'ser ver'for'σA=v(R),'all'for'σv1<A<v2(R)'– Range:'one'server'onl y'Dan Suciu -- 444 Spring 2010 18Data'ParFFoning'Revisited'What'are'the'pros'and'cons'?'• Round'robin'– Good'load'balance'but'always'needs'to'read'all'the'data'• Hash'based'parFFoning'– Good'load'balance'but'works'only'for'equality'predicates'and'full'scans'• Range'based'parFFoning'– Works'well'fo r'range'predicates'but'can'suffer'from'data'skew'Dan Suciu -- 444 Spring 2010 19Parallel'Group'By:''γA,'sum(B)(R)'• Step'1:'server'i'parFFons'chunk'Ri'using'a'hash'funcFon'h(t.A)'mod'P:'Ri0,'Ri1,'…,'Ri,PB1'''• Step'2:'server'i'sends'parFFon'Rij'to'serve'j'• Step'3:''server'j'computes'γA,'sum(B)'on''R0j,'R1j,'…,'RPB1,j''Dan Suciu -- 444 Spring 2010 20Cost'of'Parallel'Group'By'Recall' convenFonal'cost'=''3B(R)'• Cost'of'Step'1:''B(R)/P''I/O'operaFons'• Cost'of'Step'2:'(PB1)/P'B(R)'blocks'are'sent'– Network'costs'assumed'to'be'much'lower'than'I/O'• Cost'of'Step'3:'2'B(R)/P'– Why'?'– When'can'we'reduce'it'to'0'?'Total'='3B(R)'/'P''+'communi caFon'costs'Dan Suciu -- 444 Spring 2010 21Parallel'Joi n:''R'⋈A=B'S'• Step'1'– For'all'servers'in'[0,k],'ser ver'i'parFFons'chunk'Ri'using'a'hash'funcFon'h(t.A)'mod'P:'Ri0,'Ri1,'…,'Ri,PB1'''– For'all'servers'in'[k+1,P],'ser ver'j'parFFons'chunk'Sj'using'a'hash'funcFon'h(t.A)'mod'P:'Sj0,'Sj1,'…,'Rj,PB1'''• Step'2:''– Server'i'sends'parFFon'Riu'to'server'u'– Server'j'sends'parFFon'Sju'to'server'u'• Steps'3:'Server'u'computes'the'join'of'Riu'with'Sju'Dan Suciu -- 444 Spring 2010 22Cost'of'Parallel'Join'• Step'1:''(B(R)'+'B(S))/ P'• Step'2:''0'– (PB1)/P'(B(R)'+'B(S))'blocks'are'sent,'but'we'assume'network'costs'to'be'<<'disk'I/O'costs'• Step'3:'– 0'if'smaller'table'fits'i n'mai n'memory:'B(S)/p'<=M'– 2(B(R)+B(S))/P'otherwise'Dan Suciu -- 444 Spring 2010 23Parallel'Dataflow'ImplementaFon'• Use'relaFonal'operators'unchanged''• Add'special'split'and'merge'operators'– Handle'data'rouFng ,'bu ffering,'and'flow'control'• Example:'exchange'operator''– Inserted'between'consecuFve'operators'in'the'query'plan'– Can'act'as'either'a'producer'or'consumer'–


View Full Document

UW CSE 444 - Parallel Databases

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Parallel Databases
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Parallel Databases and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Parallel Databases 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?