Carnegie Mellon
Introduction to Computer Systems
15-213/18-243, Spring 2009
19th Lecture, Mar. 26
Instructors: Gregory Kesden and Markus Püschel

Last Time: Linux VM as Collection of "Areas"
[Figure: task_struct points to mm_struct (fields mm, pgd, mmap); mmap heads a vm_next-linked list of vm_area_struct records (fields vm_start, vm_end, vm_prot, vm_flags, vm_next), each describing one area of the process virtual memory: text (0x08048000), data (0x0804a020), shared libraries (0x40000000), down to address 0]
- pgd: page directory address
- vm_prot: read/write permissions for this area
- vm_flags: shared with other processes or private to this process

Last Time: Memory Mapping
- Creation of new VM area
  - Create new vm_area_struct and page tables for area
- Area can be backed by (i.e., get its initial values from):
  - File on disk; copy-on-write possible (e.g., fork())
  - Nothing (e.g., .bss): demand-zero
- Key point: no virtual pages are copied into physical memory until they are referenced!
  - Known as "demand paging"

Last Time: P6 Address Translation
[Figure: the CPU emits a 32-bit virtual address (VA) split into VPN (20 bits, subdivided into VPN1/VPN2 of 10 bits each) and VPO (12 bits); TLBT/TLBI (16/4 bits) index the TLB (16 sets, 4 entries/set); on a TLB miss, PDBR, PDE, and PTE walk the page tables; the resulting physical address (PA) splits into PPN (20 bits) and PPO (12 bits), and CT/CI/CO (20/7/5 bits) index the L1 cache (128 sets, 4 lines/set), with L1 misses going to L2 and DRAM]

Today
- Performance optimization for VM system
- Dynamic memory allocation

Large Pages
- 4 MB on 32-bit, 2 MB on 64-bit
- Simplify address translation: a 10-bit VPN with a 22-bit VPO, versus a 20-bit VPN with a 12-bit VPO for 4 KB pages
- Useful for programs with very large, contiguous working sets
  - Reduces compulsory TLB misses
- How to use (Linux)
  - hugetlbfs support (since at least 2.6.16)
  - Use libhugetlbfs {m,c,re}alloc replacements

Buffering: Example MMM
- Matrix-matrix multiply c = c + a*b, blocked for cache into B x B blocks
- Assume blocking for L2 cache
  - say, 512 KB = 2^19 B = 2^16 doubles = C
  - 3B^2 ≤ C means B ≈ 150

Buffering: Example MMM (cont.)
- But: look at one iteration of c = c + a*b
  - assume a row is longer than 4 KB = 512 doubles; block size B = 150
  - each row is used O(B) times, but with O(B^2) operations in between
- Consequence
  - Each row is on a different page
  - More rows than TLB entries: TLB thrashing
- Solution: buffering = copy block to contiguous memory
  - O(B^2) cost for O(B^3) operations

Today
- Performance optimization for VM system
- Dynamic memory allocation

Process Memory Image
[Figure, from high to low addresses: kernel virtual memory (memory protected from user code); stack (%esp); run-time heap (via malloc), bounded above by the "brk" ptr; uninitialized data (.bss); initialized data (.data); program text (.text); address 0]
- Allocators request additional heap memory from the kernel using the sbrk() function:
  error = sbrk(amt_more)

Why Dynamic Memory Allocation?
- Sizes of needed data structures may only be known at runtime

Dynamic Memory Allocation
- Memory allocator?
  - VM hardware and kernel allocate pages
  - Application objects are typically smaller
  - Allocator manages objects within pages
  [Figure: Application sits on top of the Dynamic Memory Allocator, which manages Heap Memory]
- Explicit vs. implicit memory allocator
  - Explicit: application allocates and frees space
    - In C: malloc() and free()
  - Implicit: application allocates, but does not free space
    - In Java, ML, Lisp: garbage collection
- Allocation
  - A memory allocator doles out memory blocks to the application
  - A "block" is a contiguous range of bytes of any size, in this context
- Today: simple explicit memory allocation

Malloc Package
#include <stdlib.h>
- void *malloc(size_t size)
  - Successful: returns a pointer to a memory block of at least size bytes, (typically) aligned to an 8-byte boundary
    - If size == 0, returns NULL
  - Unsuccessful: returns NULL (0) and sets errno
- void free(void *p)
  - Returns the block pointed at by p to the pool of available memory
  - p must come from a previous call to malloc() or realloc()
- void *realloc(void *p, size_t size)
  - Changes size of block p and returns pointer to new block
  - Contents of new block unchanged up to min of old and new size
  - Old block has been free()'d
    (logically, if new != old)

Malloc Example
void foo(int n, int m) {
    int i, *p;

    /* allocate a block of n ints */
    p = (int *) malloc(n * sizeof(int));
    if (p == NULL) {
        perror("malloc");
        exit(0);
    }
    for (i = 0; i < n; i++) p[i] = i;

    /* add room for m more ints at the end of the p block */
    if ((p = (int *) realloc(p, (n + m) * sizeof(int))) == NULL) {
        perror("realloc");
        exit(0);
    }
    for (i = n; i < n + m; i++) p[i] = i;

    /* print new array */
    for (i = 0; i < n + m; i++)
        printf("%d\n", p[i]);

    free(p); /* return p to available memory pool */
}

Assumptions Made in This Lecture
- Memory is word addressed (each word can hold a pointer)
[Figure: the heap drawn as a row of words, showing an allocated block (4 words), a free block (3 words), and individual allocated and free words]

Allocation Example
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(2)

Constraints
- Applications
  - Can issue arbitrary sequence of malloc() and free() requests
  - free() requests must be to a malloc()'d block
- Allocators
  - Can't control number or size of allocated blocks
  - Must respond immediately to malloc() requests
    - i.e., can't reorder or buffer requests
  - Must allocate blocks from free memory
    - i.e., can only place allocated blocks in free memory
  - Must align blocks so they satisfy all alignment requirements
    - 8-byte alignment for GNU malloc (libc malloc) on Linux boxes
  - Can manipulate and modify only free memory
  - Can't move the allocated blocks once they are malloc()'d
    - i.e., compaction is not allowed

Performance Goal: Throughput
- Given some sequence of malloc and free requests: R0, R1, ..., Rk, ..., Rn-1
- Goals: maximize throughput and peak memory utilization
  - These goals are often conflicting
- Throughput: number of completed requests per unit time
  - Example: 5,000 malloc() calls and 5,000 free() calls in 10 seconds
  - Throughput is 1,000 operations/second
- How to do malloc() and free() in O(1)? What's the problem?

Performance Goal: Peak Memory Utilization
- Given some sequence of malloc and free requests: R0, R1, ..., Rk, ...
, Rn-1
- Def: aggregate payload Pk
  - malloc(p) results in a block with a payload of p bytes
  - After request Rk has completed, the aggregate payload Pk is the sum of currently allocated payloads
    - all malloc()'d stuff minus all free()'d stuff
- Def: current heap size Hk
  - Assume Hk is monotonically nondecreasing
    - reminder: it grows when the allocator uses sbrk()
- Def: peak memory