Stanford CS 295 - Embra- Fast and Flexible Machine Simulation - D2242869

Sigmetrics ’96

Embra: Fast and Flexible Machine Simulation

Emmett Witchel
Laboratory for Computer Science
Massachusetts Institute of Technology
[email protected]
http://www.pdos.lcs.mit.edu/~witchel/

Mendel Rosenblum
Computer Systems Laboratory
Stanford University
[email protected]
http://www-flash.stanford.edu/SimOS/

Abstract

This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Embra models the processors of a MIPS R3000/R4000 machine faithfully enough to run a commercial operating system and arbitrary user applications. To achieve high simulation speed, Embra uses dynamic binary translation to generate code sequences which simulate the workload. It is the first machine simulator to use this technique. Embra can simulate real workloads such as multi-process compiles and the SPEC92 benchmarks running on Silicon Graphics’ IRIX 5.3 at speeds only 3 to 9 times slower than native execution of the workload, making Embra the fastest reported complete machine simulator. Dynamic binary translation also gives Embra the flexibility to dynamically control both the simulation statistics reported and the simulation model accuracy with low performance overheads. For example, Embra can customize its generated code to include a processor cache model which allows it to compute the cache misses and memory stall time of a workload. Customized code generation allows Embra to simulate a machine with caches at slowdowns of only a factor of 7 to 20. Most of the statistics generated at this speed match those produced by a slower reference simulator to within 1%. This paper describes the techniques used by Embra to achieve high performance, focusing on the requirements unique to machine simulation, including modeling the processor, memory management unit, and caches. In order to study Embra’s memory system performance we use the SimOS simulation system to examine Embra itself.
We present a detailed breakdown of Embra’s memory system performance for two cache hierarchies to understand Embra’s current performance and to show that Embra’s implementation techniques benefit significantly from the larger cache hierarchies that are becoming available. Embra has been used for operating system development and testing as well as for studies of computer architecture. In this capacity it has simulated large, commercial workloads including IRIX running a relational database system and a CAD system for billions of simulated machine cycles.

1 Introduction

This paper describes Embra, a high speed simulator of the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. Embra models the hardware of these machines in enough detail to boot and run commercial operating systems with arbitrary application workloads. Embra’s high-speed, detailed simulation has allowed us to construct a sophisticated machine simulation environment capable of supporting research on operating systems and computer architecture. Using Embra, we can run large, complex workloads, such as commercial database management systems, in a simulation environment, enabling us to study the workload’s execution and how it interacts with the operating system and computer architecture.

Embra achieves high speed through the aggressive use of on-the-fly or dynamic binary translation. Rather than simulating CPUs by interpreting a workload’s instructions, Embra translates blocks of instructions into code that, when executed, simulates the execution of the original block.
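
As a rough illustration of translating blocks instead of interpreting each instruction, the following Python sketch turns a basic block of a made-up instruction set into a host-language function once, then reuses it from a translation cache. The toy opcodes (`addi`, `add`, `bne`, `halt`), the function names, and the use of Python source as the "generated code" are all assumptions made for this sketch; they are not taken from Embra, which targets MIPS and emits machine code.

```python
# Toy illustration only: the instruction set, names, and structure here are
# assumptions for this sketch, not Embra's actual design.

def translate_block(program, pc):
    """Translate the basic block starting at pc into a Python function.

    The generated function updates the register list in place and returns
    the next pc (or None when the program halts).
    """
    lines = ["def block(regs):"]
    while True:
        op, *args = program[pc]
        if op == "addi":                      # rd = rs + imm
            rd, rs, imm = args
            lines.append(f"    regs[{rd}] = regs[{rs}] + {imm}")
        elif op == "add":                     # rd = rs + rt
            rd, rs, rt = args
            lines.append(f"    regs[{rd}] = regs[{rs}] + regs[{rt}]")
        elif op == "bne":                     # a branch ends the block
            rs, rt, target = args
            lines.append(
                f"    return {target} if regs[{rs}] != regs[{rt}] else {pc + 1}"
            )
            break
        elif op == "halt":                    # halting also ends the block
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)         # "code generation" step
    return namespace["block"]


def run(program, regs):
    """Run from pc 0, translating each block once and reusing it afterwards.

    Returns the number of blocks that had to be translated.
    """
    cache = {}                                # start pc -> translated block
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = cache[pc] = translate_block(program, pc)
        pc = block(regs)
    return len(cache)


# Sum 5 + 4 + 3 + 2 + 1 into register 1; register 0 is left at zero.
program = [
    ("addi", 1, 0, 0),    # 0: r1 = 0
    ("addi", 2, 0, 5),    # 1: r2 = 5
    ("add", 1, 1, 2),     # 2: r1 += r2
    ("addi", 2, 2, -1),   # 3: r2 -= 1
    ("bne", 2, 0, 2),     # 4: loop to 2 while r2 != 0
    ("halt",),            # 5
]
regs = [0, 0, 0]
blocks_translated = run(program, regs)
```

The loop body runs five times but is decoded only when its block is first translated; later iterations call the cached function directly, which is where the saving over per-instruction interpretation comes from in this sketch.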
This use of binary translation allows Embra to eliminate most of the overhead of instruction interpretation. The result is that Embra can simulate workloads running at up to one fourth the speed of the unsimulated workload, faster than any other complete machine simulator described in the literature.

Binary translation also allows Embra to support a high degree of simulation flexibility without high performance costs. Embra can customize the translations it generates to model specific machine features or to compute specific information about the simulated execution. The translations only include the code needed to perform the tasks specified by the user, so extra features incur no cost when they are not being used. For example, operating system developers can use Embra to test new algorithms with quick turn-around time. Once the code is known to execute correctly, the developer can instruct Embra to model processor caches, producing more accurate performance estimates. Embra’s cache modeling mode enables it to generate workload statistics, most of which match a much slower reference simulator within 1%.

The dynamic nature of Embra’s translations allows the user to change the level of detail in the middle of a simulation run. This allows the user to employ a high speed mode to skip over uninteresting parts of the workload, and switch to a more detailed mode for the sections of interest. This ability to simulate in detail only the interesting parts of a workload is important when studying complex workloads that have long initialization or setup periods before a steady-state is reached.

In this paper, we describe Embra in enough detail to allow other developers to build similar systems. Section 2 describes the SimOS machine simulation environment, of which Embra is a part. SimOS both provides motivation for high speed simulation and places requirements for features Embra must support. Section 3 presents a detailed description of the basic machine simulation techniques used in Embra.
This includes the use of on-the-fly binary translation for fast instruction set interpretation and support for fast modeling of memory management hardware for instruction fetches and data accesses. We also describe a set of optimizations we found were necessary to maintain high speed for large, complex workloads and for modeling multiprocessors. Section 4 presents Embra’s technique of customized translations which provide flexibility in what is modeled and reported. The section focuses on translations customized to include modeling of processor caches.

We measure the simulation speed and accuracy of Embra in Section 5. This section also contains a study of Embra’s memory system behavior for two different cache hierarchies. The study allows us to better understand Embra’s current performance and to predict its performance for future cache hierarchies. Section 6
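
The idea of customized translations from the introduction, where generated code contains cache-modeling work only when the user asks for it, can be sketched in Python as follows. Everything here is an assumption for illustration: the `CacheModel` class, the direct-mapped geometry, the load-only "block", and the `translate_loads` helper are hypothetical and do not describe how Embra actually models caches.

```python
# Toy illustration only: names, the load-only instruction stream, and the
# direct-mapped cache parameters are assumptions, not Embra's actual design.

LINE_BYTES = 16     # assumed cache line size in bytes
NUM_LINES = 4       # assumed number of lines in a direct-mapped cache


class CacheModel:
    """Minimal direct-mapped cache that only counts misses."""

    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.misses = 0

    def access(self, addr):
        line = addr // LINE_BYTES
        index = line % NUM_LINES
        if self.tags[index] != line:          # miss: fill the line
            self.tags[index] = line
            self.misses += 1


def translate_loads(addresses, model_cache):
    """Generate a function that loads from each address and sums the values.

    The cache-check call is emitted only when model_cache is true, so the
    fast variant carries no cache-modeling code at all.
    Returns the generated function and its source text.
    """
    lines = ["def block(mem, cache):", "    total = 0"]
    for addr in addresses:
        if model_cache:
            lines.append(f"    cache.access({addr})")
        lines.append(f"    total += mem[{addr}]")
    lines.append("    return total")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)
    return namespace["block"], source


mem = {0: 10, 4: 20, 64: 30}
addresses = [0, 4, 64, 0]     # 0 and 64 map to the same cache line index

fast_block, fast_source = translate_loads(addresses, model_cache=False)
detailed_block, detailed_source = translate_loads(addresses, model_cache=True)

fast_total = fast_block(mem, None)            # no cache object needed
cache = CacheModel()
detailed_total = detailed_block(mem, cache)   # same result, plus miss counts
```

Both variants compute the same sum; only the detailed one pays for the cache lookups, and switching between them is a matter of choosing which translation to generate.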
