COMP 530 Operating Systems The Art and Science of small Memory Allocation Don Porter 1 COMP 530 Operating Systems Lecture goal This lecture is about allocating small objects Less than one page in size 4KB Past lectures have focused on allocating physical pages or segments Understand how memory allocators work Understand trade offs and current best practices 2 COMP 530 Operating Systems Background Embedded Lists In CS2 I learned to code linked lists like this next object A B A B next object A B A B next object A B A B Requires 2 allocations per node object node A common C idiom is to turn this around Next Next Next 3 COMP 530 Operating Systems Virtual Address Space Code text 0 Big Picture n heap heap empty stack libc so 0xffffffff int main struct foo x malloc sizeof struct foo void malloc ssize t n if heap empty mmap add pages to heap find a free block of size n Key idea Sub divide a page for each malloc call 4 libc soheapCode text stackn COMP 530 Operating Systems Today s Lecture How to implement malloc or new Note that new is essentially malloc constructor malloc is part of libc and executes in the application malloc gets pages of memory from the OS via mmap and then sub divides them for the application A brief history of Linux internal kmalloc implementations 5 COMP 530 Operating Systems Bump allocator malloc 6 malloc 12 malloc 20 malloc 5 6 COMP 530 Operating Systems Bump allocator Simply bumps up the free pointer How does free work It doesn t Well you could try to recycle cells if you wanted but complicated bookkeeping Controversial observation This is ideal for simple programs You only care about free if you need the memory for something else 7 COMP 530 Operating Systems Assume memory is limited Hoard best of breed concurrent allocator User applications Seminal paper Your lab 3 is a simplified version of Hoard No concurrency no large 2K objects no realloc etc There are other good designs out there jemalloc supermalloc 8 COMP 530 Operating Systems Overarching issues Fragmentation Allocation and free latency Implementation complexity 9 COMP 530 Operating Systems Fragmentation Review What is it Why does it happen What is Internal fragmentation Wasted space when you round an allocation up External fragmentation When you end up with small chunks of free memory that are too small to be useful Which kind does our bump allocator have 10 COMP 530 Operating Systems Hoard Superblocks At a high level allocator operates on superblocks Chunk of virtually contiguous pages All objects in a superblock are the same size A given superblock is treated as an array of same sized objects They generalize to powers of b 1 In usual practice b 2 11 COMP 530 Operating Systems Superblock intuition Free list in LIFO order Store list pointers in free objects Free next next next next 512 byte object heap 4 KB page 4 KB page next next next Free space Each page an array of objects 12 4 KB page4 KB pageFree list in LIFO orderEach page an array of objectsStore list pointers in free objects Power of 2 3 Free List 7 8 9 Free List Free List Free List 10 Free List 11 COMP 530 Operating Systems Big picture for one CPU Superblocks sub divided differently at each level Some sizes can be empty Free List One of these per CPU and one shared 13 Superblocks sub divided differently at each levelSome sizes can be empty COMP 530 Operating Systems Superblock Intuition malloc 8 1 Find the nearest power of 2 heap 23 8 2 Find free object in superblock 3 Add a superblock if needed Goto 2 14 COMP 530 Operating Systems 512 byte object heap Pick first free object malloc 400 Free next next next next 4 KB page 4 KB page Free space next next next 15 4 KB page4 KB pagePick first free object COMP 530 Operating Systems Superblock example Suppose my program allocates objects of sizes 14 15 17 34 and 40 bytes How many superblocks do I need if b 2 3 16 32 and 64 byte chunks If I allocate a 15 byte object from an 16 byte superblock doesn t that yield internal fragmentation Yes but it is bounded to 50 Give up some space to bound worst case and complexity 16 COMP 530 Operating Systems High level strategy Allocate a heap for each processor and one shared heap Note not threads but CPUs Can only use as many heaps as CPUs at once Requires some way to figure out current processor Try per CPU heap first If no free blocks of right size then try global heap Why try this first If that fails get another superblock for per CPU heap 17 COMP 530 Operating Systems Example malloc on CPU 0 Global Heap Second try global heap First try per CPU heap If global heap full grow per CPU heap CPU 0 Heap CPU 1 Heap 18 First try per CPU heapSecond try global heapIf global heap full grow per CPU heap COMP 530 Operating Systems Big objects If an object size is bigger than half the size of a superblock just mmap it Recall a superblock is on the order of pages already What about fragmentation Example 4097 byte object 1 page 1 byte Argument More trouble than it is worth Extra bookkeeping potential contention and potential bad cache behavior 19 COMP 530 Operating Systems Memory free Simply put back on free list within its superblock How do you tell which superblock an object is from Suppose superblock is 8k 2pages And always mapped at an address evenly divisible by 8k Object at address 0x431a01c Just mask out the low 13 bits Came from a superblock that starts at 0x431a000 Simple math can tell you where an object came from 20 COMP 530 Operating Systems free x Add to front of free list Free list next next next X next next 512 byte object heap 4 KB page 4 KB page Free space X 21 4 KB page4 KB pageXAdd to front of free list COMP 530 Operating Systems LIFO Why are objects re allocated most recently used first Aren t all good OS heuristics FIFO More likely to be already in cache hot Recall from undergrad architecture that it takes quite a few cycles to load data into cache from memory If it is all the same let s try to recycle the object already in our cache 22 COMP 530 Operating Systems Hoard Simplicity The bookkeeping for alloc and free is straightforward Many allocators are quite complex looking at you slab Overall CPUs 1 heaps Per heap 1 list of superblocks per object size 22 211 Per superblock Need to know which how many objects are free LIFO list of free blocks 23 Order 5 Free List 7 8 9 Free List Free List Free List 10 Free List 11 COMP 530 Operating Systems CPU 0 Heap Illustrated Free List LIFO order Some sizes can be empty Free List One of these per CPU and one shared 24
View Full Document