EE392C Advanced Topics in Computer Architecture Polymorphic Processors Stanford University Lecture 13 Thursday 22 May 2003 Dynamic Compilation I Lecture 13 Lecturer Scribe 1 Tuesday 13 May 2003 Navneet Aron Sorav Bansal Varun Malhotra Janani Ravi Introduction With the modern software heavily utilizing shared libraries dynamic class loading for instance in Java and runtime binding the scope of static compiler analysis is becoming restrictive The optimizations by the static compiler are limited by the information available at static compile time Using profiling information may improve the accuracy of the information of run time program behavior but this approach doesn t hold promise for all general purpose programs This motivates shifting of optimizations from static compilation time to runtime Additionally these days software is being shipped as a collection of DLLs rather than a single executable and complexity is shifting from hardware to software Dynamic compilation exploits program staging by postponing certain optimizations until initial stages of execution have been completed which permits accurate prediction of program behavior Dynamic Compiler can also optimize binaries that have not been compiled by the vendors with high levels of static optimization in order to make debugging easy Several optimizations are suited to a dynamic optimization framework like Adaptive optimizations require instant responses to changes in program behavior for instance online profiling techniques can be used to identify hot regions allowing optimizations to concentrate on areas where they can be most effective Architecture specific code transformations can be performed at runtime without restricting the executable to target a single processor and thus allowing the executable to remain generic Inter module optimizations for programs using shared and dynamically loaded libraries can be done by a dynamic compiler as all the code is available to a dynamic optimizer However since the dynamic compiler uses cycles and other resources like memory at run time the overhead of the optimization must be optimized before any improvements can be seen This limits the scope of optimizations that can be done at runtime and makes the efficiency of the optimizations extremely critical 2 EE392C Lecture 13 2 Summary of the papers 2 1 The JRPM System for Dynamically Parallelizing Java Programs Java Runtime Parallelizing Machine JRPM 1 is a system for parallelizing sequential programs automatically JRPM consists of following components Hydra Chip Multiprocessor provides low latency inter processor communication Test profiler provides the mechanism for obtaining statistical data about dynamic dependencies thread sizes and amount of buffering space needed for speculative state Thread Level Speculation TLS support Allows converting an otherwise sequential program into parallel execution model while ensuring that the memory accesses between threads maintain original sequential program order Java Virtual Machine VM Provides an environment where dynamic optimizations can be performed without modifying the source binaries Hydra Chip multiprocessor CMP provides following hardware support for thread level or loop level speculation It has a co processor attached to each main processor for controlling speculative state Apart from that it has speculative tag bits for the purpose of memory disambiguation between threads and has a L2 cache to store speculative data In order to tackle hazards caused by various data dependencies JRPM employs various techniques It uses forwarding of data to threads in sequential order and restarting threads to deal with RAW read after write hazards For WAW write after write violations buffered writes are made available only to sequentially later threads For the case of WAR write after read hazards buffering and committing the speculative writes is done in program order An important consideration in JRPM is to choose speculative thread loops STL a consequence of which is to identify the thread size Small loops are good because they have lesser chances of speculative buffer overflow and they have lesser penalties if RAW hazards occur later in execution On the other hand larger threads are benefiacial as they have lesser overheads Apart from looking at the thread size load dependency state overflow analysis are also done to identify best loops to be chosen as STL Certain control routines are inserted by the compiler for speculative execution after the selection of loops STL A faster compilation is achieved by interleaving compilation stages and by minimizing compiler passes To enhance the performance of this speculative code further optimizations are done Non communication loop inductors are used by assigning loop indices to processors at static time and incrementing them by 4 after every iteration processors in Hydra 4 Several other optimizations like loop invariant register allocation EE392C Lecture 13 3 resettable non communicating loop inductors thread synchronization locks reduction operator optimization multilevel STL decomposition hoisted startup and shutdown are used While handling exceptions only the exceptions caused by a non speculative thread need to be handled For the purpose of garbage collection mark and sweep is used An optimization used in JRPM is to split the free list of objects into several free lists one for each processor Synchronization mechanism in original sequential code are modified for speculative execution to avoid unnecessary synchronization JRPM claims to achieve a speedup of 3 4 for floating point applications 2 3 for multimedia applications and 1 5 2 5 for integer applications with Hydra 4 processors Although JRPM achieves good overall performance with low overheads of profiling and dynamic compilation but the target applications are restricted to a subset of programs For instance it is not possible to parallelize programs with system calls in the critical code regions 2 2 Dynamo A Transparent Dynamic Optimization System Heavyweight static compiler optimization techniques are limited in their scope and capability This is particularly true with the growing trend towards Object Oriented languages using dynamic linking and dynamic code generation environments for instance JAVA JITs and dynamic binary translators Dynamo 2 is a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the
View Full Document
Unlocking...