DOC PREVIEW
Berkeley COMPSCI C267 - Parallel Sorting

This preview shows page 1-2-3-4-29-30-31-32-59-60-61-62 out of 62 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Parallel SortingSome Sorting algorithmsLogP – model to predict, understand performanceBottom Line on CM-5 using Split-C (Preview)Bitonic Sort (1/2)Bitonic Sort (2/2)Bitonic Sort for n=16 – all dependencies shownBitonic Sort: time per keySample SortSample Sort: TimesSample Sort Timing BreakdownSequential Radix Sort: Counting SortRadix Sort: Separate Key Into PartsHisto-radix sortHisto-Radix Sort (again)Radix Sort: TimesRadix Sort: Timing BreakdownLocal Sort Performance on CM-5Radix Sort: Timing dependence on Key distributionBottom Line on CM-5 using Split-CSorting ConclusionsExtra slidesRadix: Stream Broadcast ProblemWhat’s the right communication mechanism?ComparisonOutlineMeasuring PerformanceAmdahl’s Law (review)Amdahl’s Law (for 1024 processors)Scaled SpeedupScaled Speedup (background)Little’s LawSlide 33In Performance Analysis: Use more DataExample: Immersed Boundary SimulationPerformance AnalysisBuilding a Performance ModelA Performance ModelModel Success and FailureOSKI SPMV: What We ExpectWhat We Get (The Need for Search)Using Multiple ModelsMultiple ModelsSlide 44Extended Example Using Performance Modeling (LogP) To Explain Data Application to SortingDeriving the LogP ModelLogPUsing the LogP ModelUse of the LogP Model (cont)LogP "philosophy"Typical SortCosts Split-C (UPC predecessor) OperationsLogP modelLogP Parameters TodayLocal Computation Parameters - EmpiricalOdd-Even Merge - classic parallel sortWhere’s the Parallelism?Mapping to a Butterfly (or Hypercube)Bitonic Sort with N/P per nodeBitonic: BreakdownBitonic: Effect of Key DistributionsSlide 6204/27/2009 CS267 Lecture 241Parallel SortingJames Demmelwww.cs.berkeley.edu/~demmel/cs267_Spr09Some Sorting algorithms•Choice of algorithm depends on•Is the data all in memory, or on disk/tape?•Do we compare operands, or just use their values?•Sequential or parallel? Shared or distributed memory? SIMD or not?•We will consider all data in memory and parallel:•Bitonic Sort•Naturally parallel, suitable for SIMD architectures•Sample Sort•Good generalization of Quick Sort to parallel case•Radix Sort•Data measured on CM-5•Written in Split-C (precursor of UPC)•www.cs.berkeley.edu/~culler/papers/sort.ps04/27/2009 CS267 Lecture 242LogP – model to predict, understand performanceInterconnection NetworkMPMPMP° ° °P ( processors )Limited Volume( L/ g to or from a proc)o (overhead)L (latency)og (gap)•Latency in sending a (small) message between modules•overhead felt by the processor on sending or receiving msg•gap between successive sends or receives (1/BW)•ProcessorsBottom Line on CM-5 using Split-C (Preview)N/Pus/key0.0020.0040.0060.0080.00100.00120.00140.001638432768655361310722621445242881048576Bitonic 1024Bitonic 32Column 1024Column 32Radix 1024Radix 32Sample 1024Sample 32•Good fit between predicted (using LogP model) and measured (10%)•No single algorithm always best•scaling by processor, input size, sensitivity•All are global / local hybrids•the local part is hard to implement and modelAlgorithm, #Procs04/27/2009 CS267 Lecture 24404/27/2009 CS267 Lecture 245Bitonic Sort (1/2)•A bitonic sequence is one that is:1. Monotonically increasing and then monotonically decreasing2. Or can be circularly shifted to satisfy 1•A half-cleaner takes a bitonic sequence and produces1. First half is smaller than smallest element in 2nd2. Both halves are bitonic•Apply recursively to each half to complete sorting•Where do we get a bitonic sequence to start with?0011100000001011000001110000101104/27/2009 CS267 Lecture 246Bitonic Sort (2/2)•A bitonic sequence is one that is:1. Monotonically increasing and then monotonically decreasing2. Or can be circularly shifted to satisfy •Any sequence of length 2 is bitonic•So we can sort it using bitonic sorter on last slide•Sort all length-2 sequences in alternating increasing/decreasing order•Two such length-2 sequences form a length-4 bitonic sequence•Recursively •Sort length-2k sequences in alternating increasing/decreasing order• Two such length-2k sequences form a length-2k+1 bitonic sequence (that can be sorted)Bitonic Sort for n=16 – all dependencies shownBlock Layoutlg N/p stages are local sort – Use best local sortremaining stages involve Block-to-cyclic, local merges (i - lg N/P cols)cyclic-to-block, local merges ( lg N/p cols within stage)Similar pattern as FFT: similar optimizations possibleBitonic Sort: time per keyPredicted using LogP ModelN/Pus/key010203040506070801638432768655361310722621445242881048576MeasuredN/Pus/key0102030405060708016384327686553613107226214452428810485765122561286432#Procs04/27/2009 CS267 Lecture 248Sample Sort1. compute P-1 values of keys that would split the input into roughly equal pieces.– take S~64 samples per processor– sort P·S keys (on processor 0)– let keys S, 2·S, . . . (P-1)·S be “Splitters”– broadcast Splitters2. Distribute keys based on Splitters Splitter(i-1) < keys  Splitter(i) all sent to proc i3. Local sort of keys on each proc[4.] possibly reshift, so each proc has N/p keysIf samples represent total population, then Splitters should dividepopulation into P roughly equal pieces 04/27/2009 CS267 Lecture 249Sample Sort: TimesPredictedN/Pus/key0510152025301638432768655361310722621445242881048576MeasuredN/Pus/key05101520253016384327686553613107226214452428810485765122561286432# Processors04/27/2009 CS267 Lecture 2410Sample Sort Timing BreakdownN/Pus/key0510152025301638432768655361310722621445242881048576SplitSortDistSplit-mSort-mDist-mPredicted and Measured (-m) times04/27/2009 CS267 Lecture 2411001/14/19CS267 Lecture 2412Sequential Radix Sort: Counting Sort•Idea: build a histogram of the keys and compute position in answer array for each elementA = [3, 5, 4, 1, 3, 4, 1, 4]•Make temp array B, and write values into position B = [1, 1, 3, 3, 4, 4, 4, 5]•Cost = O(#keys + size of histogram)•What if histogram too large (eg all 32-bit ints? All words?)012340 1 2 3 4 5001/14/19CS267 Lecture 2413Radix Sort: Separate Key Into Parts•Divide keys into parts, e.g., by digits (radix)•Using counting sort on these each radix:•Start with least-significantsat run sat pinsaw pin saw runtip tip tip satrun sat pin sawpin saw run tipsort on 3rd charactersort on 2nd charactersort on 1st charactersat run sat pinsaw pin saw runtip tip tip satrun sat pin sawpin saw run tipsat run sat pinsaw pin saw runtip tip pin satrun sat tip sawpin saw run tipsat run


View Full Document

Berkeley COMPSCI C267 - Parallel Sorting

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Parallel Sorting
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Parallel Sorting and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Parallel Sorting 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?