DOC PREVIEW
MIT 6 893 - Study Guide

This preview shows page 1-2-3-18-19-36-37-38 out of 38 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 38 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Radix Sort and Hash-Join for Vector ComputersPowerPoint PresentationSlide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23Slide 24Slide 25Slide 26Slide 27Slide 28Slide 29Slide 30Slide 31Slide 32Slide 33Slide 34Slide 35Slide 36Slide 37Slide 38Radix Sort and Hash-Join for Vector ComputersRipal Nathuji6.893: Advanced VLSI Computer Architecture 10/12/006.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersWhat is Radix Sorting?•Sort by least significant digit instead of most significant digit•Better than sorting by most significant digit since it saveshaving to keep track of multiple sort jobs6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersProperties of Radix Sorting Algorithms•Treat keys as multidigit numbers, where each digit is an integer from <0…(m-1)> where m is the radix•The radix m is variable, and chosen to minimize runningtimeExample: 32-bit key as 4 digit number m is equal to the number of distinct digits so m =•Performance: Runs in O(n) Other comparison based sorts such as quicksort run in O(n log n) time ***Not advantageous for machines w/cache2562284/326.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersSerial Radix Sort•N = # of keys to sortK = array of keysD = array of r-bit digitsValues of Bucket[] after each phase:•Histogram-Keys:Bucket[i] contains the numberof digits having value i•Scan-Buckets:Bucket[i] contains the numberof digits with values < i•Rank-And-Permute:Each key of value i is placedin its final location by gettingthe offset from Bucket[i] andincrementing the bucket6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersHow Can We Parallelize the Serial Radix Sort?Problem:•Loop dependencies in all three phases Solution:•Use a separate set of buckets for each processorEach processor takes care of N/P keys where P is number of processors.This resolves the data dependencies, but creates a newproblem with Scan-Buckets: How can we sort the digits globally instead of just within the scope of each individual processor.6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersFully Parallelizing Scan-BucketsInstead of having each processor simply scan its own buckets, after doing Scan-Buckets we would like the value of Buckets[i,j] to be:The sum can be calculated by flattening the matrixand executing a Scan-Buckets on the flattened matrix6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersTechniques Used In the Data-Parallel Radix Sort•Virtual Processors•Loop Rakings•Processor Memory Layout6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersVirtual Processors•Vector multiprocessors offer two levels of parallelism:multiprocessor facilities and vector facilities. •To take advantage of this, view each element of a vector register as a virtual processor. So a machine with register length L and P processors has L x P virtual processors. •Now the total number of keys can be divided into L x P sets.6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersLoop Raking•Usually operations on arrays are vectorized using strip mining. In strip mining an element of a vector registerhandles every Lth-element•Unfortunately using strip mining each virtual processorwill have to handle a strided set of keys instead of acontiguous block as required by the parallel algorithm •Using a technique called loop raking, each virtual processorhandles a contiguous block of keys. Loop raking uses aconstant stride of N/L to access elements6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersProcessor Memory Layout•A memory location X is contained in bank (X mod B)where B is the number of banks•Simultaneous accesses to the same bank result in delayThere are two possible ways to lay out the buckets in memory:•Place the buckets for each virtual processor in contiguousmemory locations:This approach could cause multiple virtual processors toaccess the same bank simultaneously.•Place the buckets so that the buckets used by each virtualprocessor are in separate memory banks (i.e. Place all the buckets of a certain value from all virtual processors in contiguous memory locations):This approach keeps multiple virtual processors fromaccessing the same bank simultaneously6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersProcessor Memory Layout: Example0x00000x00010x00020x00030x00040x00050x00060x0007Bank 0Bank 1Bank 2Bank 36.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersImplementation of Radix Sort on 8-processor CRAY Y-MPFour Routines:1. Extract Digit: •Extracts current digit from keys and computes an index into the array of buckets•Uses loop raking•Time for routine: TExtract-Digit=1.2.N/P2. Histogram Keys:•Uses loop raking•Time for routine: 2 steps TClear-Buckets=1.1. 2r.L THistogram-Keys=2.4.N/P3. Scan Buckets:•Uses loop raking•Time for routing: TScan-Buckets=2.5.2r.L.P/P= 2.5.2r.L6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersImplementation of Radix Sort on 8-processor CRAY Y-MP4. Permute Keys:•Uses loop raking•Time to permute a vector ranges from 1.3 cycles/elementto 5.5 cycles/element•Time for routine: TRank-And-Permute=3.5.N/p6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersPerformance AnalysisTotal sorting times:•TCounting-Sort=L.2r.Tbucket+N/P.Tkey•TRadix-Sort=b/r(L.2r.Tbucket+N/P.Tkey)Choice of Radix:•The optimal value for r increases with the number of elements per processor•Choosing r below the optimal value puts too much work on keys, choosing r above the optimal value putstoo much work on buckets•Value for r and approximation of total sort time:6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersChoosing a Value for r6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersPredicted vs. Measured Performance6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersOther Factors of Performance•Vector Length•Multiple Processors•Space6.893 Ripal NathujiRadix Sort and Hash-Join for Vector ComputersVarying Vector Length•Decreasing the vector length decreases the number of virtualprocessors•Advantage: decreases the time for cleaning and scanningbuckets •Disadvantage: increases the cost per element for performing the histogram, Tkey•Conclusion: Reducing the


View Full Document

MIT 6 893 - Study Guide

Documents in this Course
Toolkits

Toolkits

16 pages

Cricket

Cricket

29 pages

Quiz 1

Quiz 1

8 pages

Security

Security

28 pages

Load more
Download Study Guide
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study Guide and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study Guide 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?