DOC PREVIEW
UT CS 395T - Hoard- A Scalable Memory Allocator for Multithreaded Applications

This preview shows page 1-2-3-27-28-29 out of 29 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Hoard: A Scalable Memory Allocator for Multithreaded ApplicationsOutlineMotivationDesired allocator attributes on a multiprocessor systemThe problem of false sharingThe problem of fragmentationExample: Pure Private Heaps AllocatorHow to Break Pure Private Heaps: FragmentationExample II: Private Heaps with OwnershipHow to Break Private Heaps with Ownership:FragmentationExisting approachesUniprocessor Allocators on MultiprocessorsExisting Multiprocessor AllocatorsHoard as the solutionHoard OverviewSuperblock managementHoard pseudo-codeHeap contentionHeap contention (2)Experimental EvaluationBenchmarksSpeedScalability - threadtestScalability – LarsonScalability - BEMengineFalse sharing behaviorFragmentation resultsHoard ConclusionsDiscussion PointsHoard: A Scalable Memory Allocator for Multithreaded ApplicationsEmery Berger, Kathryn McKinley, Robert Blumofe, Paul WilsonPresented by Ivan Jibaja(Some slides adapted from Emery Berger’s presentation)1Outline•Motivation•Problems in allocator design–False sharing–Fragmentation•Existing approaches•Hoard design•Experimental evaluation2Motivation•Parallel multithreaded programs prevalent–Web servers, search engines, DB managers etc.–Run on CMP/SMP for high performance•Memory allocation is a bottleneck–Prevents scaling with number of processors 3Desired allocator attributes on a multiprocessor system•Speed–Competitive with uniprocessor allocators on 1 cpu•Scalability–Performance linear with the number of processors•Fragmentation (=max allocated / max in use)–High fragmentation  poor data locality  paging•False sharing avoidance 4The problem of false sharing•Program causes false sharing•Allocate number of objects in a cache line, pass objects to different threads •Allocators cause false sharing!•Actively: •malloc satisfies different thread requests from same cache line•Passively:•free allows future malloc to produce false sharingprocessor 1 processor 2x2 = malloc(s);x1 = malloc(s);A cache linethrash… thrash…5The problem of fragmentation•Blowup:–Increase in memory consumption when allocator reclaims memory freed by program, but fails to use it for future requests–Mainly a problem of concurrent allocators–Unbounded (worst case) or bounded (O(P)) 6Example: Pure Private Heaps Allocator•Pure private heaps:•one heap per processor.•malloc gets memoryfrom the processor's heap or the system•free puts memory on the processor's heap•Avoids heap contention•Examples: STL, Cilkx1= malloc(s)free(x1) free(x2)x3= malloc(s)x2= malloc(s)x4= malloc(s)processor 1 processor 2= allocated by heap 1= free, on heap 27How to Break Pure Private Heaps: Fragmentation•Pure private heaps:•memory consumption can grow without bound!•Producer-consumer:•processor 1 allocates•processor 2 frees•Memory always unavailable to producerfree(x1)x2= malloc(s)free(x2)x1= malloc(s)processor 1 processor 2x3= malloc(s)free(x3)8Example II: Private Heaps with Ownership•free puts memory back on the originating processor's heap.•Avoids unbounded memory consumption•Examples: ptmalloc,LKmalloc x1= malloc(s)free(x1)free(x2)x2= malloc(s)processor 1 processor 29How to Break Private Heaps with Ownership:Fragmentation•memory consumption can blowup by a factor of P.•Round-robin producer-consumer:processor i allocatesprocessor i+1 frees•Program requires 1 (K) blocks, allocator gets 3 (P*K) blocksfree(x2)free(x1)free(x3)x1= malloc(s)x2= malloc(s)x3=malloc(s)processor 1 processor 2 processor 310Existing approaches11Uniprocessor Allocators on Multiprocessors•Fragmentation: Excellent–Very low for most programs [Wilson & Johnstone]•Speed & Scalability: Poor–Heap contention•A single lock protects the heap•Can exacerbate false sharing–Different processors can share cache lines12Existing Multiprocessor Allocators•Speed:•One concurrent heap (e.g., concurrent B-tree): •O(log (#size-classes)) cost per memory operation•too many locks/atomic updates Fast allocators use multiple heaps•Scalability:•Allocator-induced false sharing •Other bottlenecks (e.g. nextHeap global in Ptmalloc)•Fragmentation:•P-fold increase or even unbounded13Hoard as the solution14Hoard Overview•P per-processor heaps & 1 global heap•Each thread accesses only its local heap & global •Manages memory in page-sized superblocks of same-sized objects (LIFO free-list)–Avoids false sharing by not carving up cache lines–Avoids heap contention – local heaps allocate & free small blocks from their superblocks•Avoids blowup by–Moving superblocks to global heap when fraction of free memory exceeds some threshold15Superblock managementEmptiness threshold: (ui ≥ (1-f)*ai)∨(ui ≥ ai – K*S)f = ¼K = 0•Multiple heaps  Avoid actively induced false sharing•Block coalescing  Avoid passively induced false sharing•Superblocks transferred are usually empty and transfer is infrequent16Hoard pseudo-codemalloc(sz)1. If sz > S/2, allocate the superblock from the OS and return it.2. i  hash(current thread)3. Lock heap i4. Scan heap i’s list of superblocks from full to least (for the size class of sz)5. If there is no superblock with free space {6. Check heap 0 (global) for a superblock7. If there is none {8. Allocate S bytes as superblock s & set owner to heap i9. } Else {10. Transfer the superblock s to heap i11. u0  u0 – s.u; ui ui + s.u12. a0  a0 - S; ai  ai + S13. }14. }15. ui ui + sz; s.u  s.u + sz16. Unlock heap i17. Return a block from the superblockfree(ptr)1. If the block is “large”2. Free superblock to OS and return3. Find the superblock s this blocks comes from4. Lock s5. Lock heap i, the superblock’s owner6. Deallocate the block from the superblock7. ui ui – block size8. s.u  s.u – block size9. If (i = 0) unlock heap i, superblock s and return10. If (ui < ai – K*S) and (ui<(1-f)*ai) {11. Transfer a mostly-empty superblock s1 to heap 0 (global)12. u0 u0 + s1.u; ui  ui – s1.u13. a0  a0 + S; ai  ai – S14. } 15. Unlock heap i and superblock s17Heap contention•Per-processor Heap contention–1 allocator thread / multiple threads free•Inherently unscalable–Pairs of producer/consumer threads•malloc/free calls serialized•At most 2X slowdown (undesirable but scalable)–Empirically only a small fraction of memory is freed by another thread  Contention expected to be low 18Heap


View Full Document

UT CS 395T - Hoard- A Scalable Memory Allocator for Multithreaded Applications

Documents in this Course
TERRA

TERRA

23 pages

OpenCL

OpenCL

15 pages

Byzantine

Byzantine

32 pages

Load more
Download Hoard- A Scalable Memory Allocator for Multithreaded Applications
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hoard- A Scalable Memory Allocator for Multithreaded Applications and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hoard- A Scalable Memory Allocator for Multithreaded Applications 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?