Garbage Collection IssuesFinding the Object GraphFinding the Object Graphldots Finding the Object Graphldots Pointer MapsPointer Mapsldots Pointer Mapsldots The Mark-and-Sweep AlgorithmAlgorithm: Mark and SweepAlgorithm: Mark and Sweepldots Algorithm: Mark and Sweepldots Marking PhaseMarking Phaseldots Marking: ``Look Ma, No Stack!''Marking: ``Look Ma, No Stack!''ldots Marking: ``Look Ma, No Stack!''ldots Marking: ``Look Ma, No Stack!''ldots Where do the markbits go???Where do the markbits goldots Sweeping: CompactionCost of Garbage CollectionCost of Garbage CollectionCost of GC --- Mark-and-SweepCost of GC --- Mark-and-Sweepldots Readings and References520 —Spring 2008 — 8CSc 520Principles of ProgrammingLanguages8 : Garbage Collection — Mark and SweepChristian [email protected] of Computer ScienceUniversity of ArizonaCopyrightc 2008 Christian Collberg[1]520 —Spring 2008 — 8Garbage Collection Issues[2]520 —Spring 2008 — 8Finding the Object GraphFinding the roots: The dynamic objects in a program form agraph. Most GC algorithms need to traverse this graph.The roots of the graph can be in1. global variables2. registers3. local variables/formal parameters on the stack.Hence, the compiler must communicate to the GCwhich registers/variables contain roots.[3]520 —Spring 2008 — 8Finding the Object Graph...Finding internal pointers: Structured variables (arrays,records, objects) may contain internal pointers. Thesemust be known to the GC so that it can traverse thegraph. Hence, the compiler must communicate to theGC the type of each dynamic object and the internalstructure of each type.Finding the beginning of objects: What happens if the onlypointer to an object points somewhere in the middle ofthe object? We must either be able to find the beginningof the object, or make sure the compiler does notgenerate such code.[4]520 —Spring 2008 — 8Finding the Object Graph...8:AC%r8ExecutionStackforARPforARQforARRtemplateGGlobalsPtrs: [12]Size:96Template for PPtrs: [0]Size:4Template for MAINSize:32Ptrs: [4,12]Template for T1Ptrs: [8]Size: 24Template for T2HeapO: templateO: template8:12:O: template8:[5]520 —Spring 2008 — 8Pointer MapsThe internal structure of activation records & structuredvariables is described by run-time templates.Every run-time object has an extra word that points to atype descriptor (or Temaplate), a structure describingwhich words in the object are pointers. This map isconstructed at compile-time and stored statically in thedata segment of the executable.[6]520 —Spring 2008 — 8Pointer Maps...When the GC is invoked, registers may also containvalid pointers. The compiler must therefore alsogenerate (for every point where the GC may be called)a pointer map that describes which registers hold livepointers at this point. For this reason, we usually onlyallow the GC to run at certain points, often the pointswhere new is called.We must also provide pointer maps for every functioncall point. A function P may call Q which calls newwhich invokes the GC. We need to know which words inP ’s activation record that at this point contain livepointers.[7]520 —Spring 2008 — 8Pointer Maps...How does the GC look up which pointer map belongs toa particular call to procedure P at a particular addressa? The pointer maps are indexed by the return addressof P ! So, to traverse the stack of activation records, theGC looks at each frame, extracts the return address,finds the pointer map for that address, and extractseach pointer according to the map.[8]520 —Spring 2008 — 8The Mark-and-Sweep Algorithm[9]520 —Spring 2008 — 8Algorithm: Mark and SweepThe basic idea behind Mark-and-Sweep is to traverseand mark all the cells that can be reached from the rootcells.A root cell is any pointer on the stack or in globalmemory which points to objects on the heap.Once all the live cells (those which are pointed to by aglobal variable or some other live cells) have beenmarked, we scan through the heap and separate thelive data from the garbage.If we are dealing with equal size objects only (this isthe case in LISP, for example) the we scan the heapand link all the unmarked objects onto the free list.At the same time we can unmark the live cells.[10]520 —Spring 2008 — 8Algorithm: Mark and Sweep...If we have cells of different sizes, just linking thefreed objects together may result in heapfragmentation. Instead we need to compact theheap, by collecting live cells together in a contiguousmemory area on the heap and doing the same withthe garbage cells in another area.[11]520 —Spring 2008 — 8Algorithm: Mark and Sweep...Marking Phase:1. Mark all objects unmarked.2. Find all roots, i.e. heap pointers in stack, regs & globals.3. Mark reachable blocks using a depth first searchstarting at the roots.(a) DFS may run out of stack space!(b) Use non-recursive (Deutsch-Schorr-Waite) DFS.Scanning Phase:same-size-cells Scan heap and put un-marked(non-reachable) cells back on free-list.different-size-cells Compact the heap to preventfragmentation.[12]520 —Spring 2008 — 8Marking PhaseA straight-forward implementation of mark and sweepmay run into memory problems itself! Adepth-first-search makes use of a stack, and the size ofthe stack will be the same as the depth of the objectgraph.Remember that the stack and the heap share the samememory space, and may even grow towards eachother.So, if we’re out of luck we might run into this situation:the heap is full (otherwise we wouldn’t be gc:ing!),the object graph is deep,we run out of stack space during the marking phase.We’re now out of memory alltogether. Difficult torecover from![13]520 —Spring 2008 — 8Marking Phase...Fortunately, there is a smart algorithm for marking inconstant space, called the Deutsch-Schorr-Waitealgorithm. Actually, it was developed simultaneously byPeter Deutsch and by Herbert Schorr and W. M. Waite.The basic idea is to store the DFS stack in the objectgraph itself. When a new node (object) is encountered1. we set the “marked”-bit to 1,2. the node (object) is made to point to the previousnode,3. two global variables current and previous areupdated.current points to the current cell, previous to thepreviously visited cell.[14]520 —Spring 2008 — 8Marking: “Look Ma, No Stack!”Use pointer reversal to encode the DFS stack in theobject graph itself.When the DFS reaches a new cell, change a pointer inthe cell to point back to the DFS parent cell. When wecan go no
View Full Document