TAMU CSCE 614 - Simplescalar2008

Introduction to SimpleScalar
(Based on SimpleScalar Tutorial)
CPSC 614
Texas A&M University

Overview
• What is an architectural simulator?
  – A tool that reproduces the behavior of a computing device
• Why do we use a simulator?
  – It leverages a faster, more flexible software development cycle
    • Permits more design space exploration
    • Facilitates validation before hardware becomes available
    • Level of abstraction is tailored to the design task
    • Makes it possible to increase or improve system instrumentation
    • Usually less expensive than building a real system

A Taxonomy of Simulation Tools
[Figure: taxonomy of simulation tools. Shaded tools are included in the SimpleScalar Tool Set.]

Functional vs. Performance
• Functional simulators implement the architecture.
  – Perform real execution
  – Implement what programmers see
• Performance simulators implement the microarchitecture.
  – Model system resources/internals
  – Are concerned with timing
  – Do not implement what programmers see

Trace- vs. Execution-Driven
• Trace-Driven
  – Simulator reads a 'trace' of the instructions captured during a previous execution
  – Easy to implement, no functional components necessary
• Execution-Driven
  – Simulator runs the program (trace-on-the-fly)
  – Hard to implement
  – Advantages
    • Faster than tracing
    • No need to store traces
    • Register and memory values are available (they usually are not in a trace)
    • Supports mis-speculation cost modeling

SimpleScalar Tool Set
• Computer architecture research test bed
  – Compilers, assembler, linker, libraries, and simulators
  – Targeted to the virtual SimpleScalar architecture
  – Hosted on almost any Unix-like machine

Advantages of SimpleScalar
• Highly flexible
  – Functional simulator + performance simulator
• Portable
  – Host: virtual target runs on most Unix-like systems
  – Target: simulators can support multiple ISAs
• Extensible
  – Source is included for compiler, libraries, simulators
  – Easy to write simulators
• Performance
  – Runs codes approaching 'real' sizes

Simulator Suite
Simulator              Size           Type                  Models / reports                                     Speed
Sim-Fast               300 lines      functional                                                                 4+ MIPS
Sim-Safe               350 lines      functional w/checks
Sim-Profile            900 lines      functional            lots of stats
Sim-Cache, Sim-BPred   < 1000 lines   functional            cache stats, branch stats
Sim-Outorder           3900 lines     performance           OoO issue, branch pred., mis-spec., ALUs, cache, TLB  200+ KIPS
(Figure axes: Performance, Detail. Simulation speed falls and modeled detail grows from Sim-Fast to Sim-Outorder.)

Sim-Fast
• Functional simulation
• Optimized for speed
• Assumes no cache
• Assumes no instruction checking
• Does not support DLite!
• Does not allow command line arguments
• < 300 lines of code

Sim-Cache
• Cache simulation
• Ideal for fast simulation of caches (when the effect of cache performance on execution time is not needed)
• Accepts command line arguments for:
  – Level 1 & 2 instruction and data caches
  – TLB configuration (data and instruction)
  – Flush and compress
  – and more
• Ideal for performing high-level cache studies that don't take the access time of the caches into account

Sim-Bpred
• Simulates different branch prediction mechanisms
• Generates prediction hit and miss rate reports
• Does not simulate the effect of branch prediction on total execution time
• Predictor types:
  – nottaken
  – taken
  – perfect
  – bimod: bimodal predictor
  – 2lev: 2-level adaptive predictor
  – comb: combined predictor (bimodal and 2-level)

Sim-Profile
• Program profiler
• Generates detailed profiles, by symbol and by address
• Keeps track of and reports
  – Dynamic instruction counts
  – Instruction class counts
  – Branch class counts
  – Usage of address modes
  – Profiles of the text & data segment

Sim-Outorder
• Most complicated and detailed simulator
• Supports out-of-order issue and execution
• Provides reports
  – Branch prediction
  – Cache
  – External memory
  – Various configuration

Sim-Outorder HW Architecture
[Figure: Sim-Outorder pipeline diagram. Labels: Fetch, Dispatch, Register Scheduler, Memory Scheduler, Exe, Mem, Writeback, Commit, I-Cache, I-TLB, D-Cache, D-TLB, Virtual Memory.]

Sim-Outorder (Main Loop)
• sim_main() in sim-outorder.c

    ruu_init();
    for (;;) {
      ruu_commit();
      ruu_writeback();
      lsq_refresh();
      ruu_issue();
      ruu_dispatch();
      ruu_fetch();
    }

• The loop body is executed once for each simulated machine cycle
• Walks the pipeline from Commit to Fetch
  – Reverse traversal handles inter-stage latch synchronization in a single pass

RUU/LSQ in Sim-Outorder
• RUU (Register Update Unit)
  – Handles register synchronization/communication
  – Serves as reorder buffer and reservation stations
  – Performs out-of-order issue when register and memory dependences are satisfied
• LSQ (Load/Store Queue)
  – Handles memory synchronization/communication
  – Contains all loads and stores in program order
• Relationship between RUU and LSQ
  – Memory dependences are resolved by the LSQ
  – Load/store effective addresses are calculated in the RUU

Specifying Sim-outorder
-bpred <type>
-bpred:bimod <size>
-bpred:2lev <l1size> <l2size> <hist_size>
…
-config <file>
-dumpconfig <file>
-fetch:ifqsize <size>    - instruction fetch queue size (in insts)
-fetch:mplat <cycles>    - extra branch misprediction latency (cycles)
…
For Assignment #1, change at least l1size.

$ sim-outorder -config <file> <benchmark command line>

Benchmark
• SPEC CPU 2000
  – Integer/Floating Point
  – http://www.spec.org
  – For homework: Alpha binaries, input data files
[Figure: Directory organization. Labels: CFP2000 (179.art, …), CINT2000 (164.gzip, …), src, data (ref, test, train), input, output.]

SimPoint
• Goal
  – To find simulation points that accurately represent the complete execution of a program, based on phase analysis
• Single Simulation Points (standard for homework)
  – If the Simulation Point is 90, then you start simulating at instruction 90 * 100 million (9 billion) and stop simulating at instruction 9.1 billion.
• Multiple Simulation Points

References
• SimpleScalar Tutorial/Hack Guide
  – Read tutorial/Run, test, and debug
• WWW Computer
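The Sim-Bpred slide lists "bimod" as a bimodal predictor, and -bpred:bimod takes a <size>. A bimodal predictor is commonly described as a table of 2-bit saturating counters indexed by the branch address. The C sketch below illustrates that idea only. It is not taken from the SimpleScalar source: the names bimod_table, bimod_index, bimod_predict and bimod_update, the table size of 2048, the 4-byte instruction alignment, and the initial counter value are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of table entries. Hypothetical value; plays the role of <size>. */
#define BIMOD_SIZE 2048u

/* One 2-bit saturating counter per entry: 0 and 1 predict not-taken,
   2 and 3 predict taken. Starting every counter at 0 is an assumption. */
static uint8_t bimod_table[BIMOD_SIZE];

/* Map a branch address to a table entry. Dropping the two low address
   bits assumes 4-byte instructions, an illustrative choice. */
static unsigned bimod_index(uint64_t pc)
{
    assert((BIMOD_SIZE & (BIMOD_SIZE - 1u)) == 0u); /* power of two */
    return (unsigned)((pc >> 2) & (BIMOD_SIZE - 1u));
}

/* Return 1 if the branch at pc is predicted taken, 0 otherwise. */
int bimod_predict(uint64_t pc)
{
    return bimod_table[bimod_index(pc)] >= 2;
}

/* Train the counter with the actual outcome, saturating at 0 and 3. */
void bimod_update(uint64_t pc, int taken)
{
    uint8_t *ctr = &bimod_table[bimod_index(pc)];

    if (taken) {
        if (*ctr < 3)
            (*ctr)++;
    } else {
        if (*ctr > 0)
            (*ctr)--;
    }
}
```

With these assumptions, a branch must be taken twice before the prediction flips to taken, and a single not-taken outcome after saturation does not flip it back. That hysteresis is the point of using two bits instead of one.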

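The main-loop slide says that walking the pipeline from Commit back to Fetch handles inter-stage latch synchronization in a single pass. The toy model below is one way to see why. It is a sketch, not sim-outorder code: the three-stage pipeline, the slot array, and the functions step_stage and cycles_to_retire are hypothetical. Each stage holds at most one instruction and hands it to the next stage when that stage is free.

```c
#include <assert.h>

#define NSTAGES 3     /* toy pipeline depth (assumption, not sim-outorder's) */
#define EMPTY   (-1)  /* marks a stage that holds no instruction */

/* Run stage s for one cycle. Returns 1 if an instruction retired. */
static int step_stage(int slot[NSTAGES], int s)
{
    assert(s >= 0 && s < NSTAGES);
    if (slot[s] == EMPTY)
        return 0;
    if (s == NSTAGES - 1) {        /* last stage: retire the instruction */
        slot[s] = EMPTY;
        return 1;
    }
    if (slot[s + 1] == EMPTY) {    /* hand off to the next stage if free */
        slot[s + 1] = slot[s];
        slot[s] = EMPTY;
    }
    return 0;
}

/* Count the cycles until one instruction placed in stage 0 retires.
   reverse != 0 walks the stages last-to-first, like the main loop;
   reverse == 0 walks them first-to-last for comparison. */
int cycles_to_retire(int reverse)
{
    int slot[NSTAGES];
    int cycles = 0, retired = 0, i;

    for (i = 0; i < NSTAGES; i++)
        slot[i] = EMPTY;
    slot[0] = 0;                   /* instruction id 0 enters the pipeline */

    while (!retired) {
        cycles++;
        for (i = 0; i < NSTAGES; i++) {
            int s = reverse ? NSTAGES - 1 - i : i;
            retired |= step_stage(slot, s);
        }
    }
    return cycles;
}
```

In this model cycles_to_retire(1) returns 3, one cycle per stage, because each stage only sees what its predecessor produced in the previous cycle. cycles_to_retire(0) returns 1: with a forward walk the instruction falls through every stage in the same cycle, which a simulator would otherwise have to prevent with double-buffered latches.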

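The single-simulation-point rule on the SimPoint slide is plain arithmetic: the point number times the 100-million-instruction interval gives the first simulated instruction, and simulation stops one interval later. A small C sketch of that calculation follows. The names SIMPOINT_INTERVAL, simpoint_start and simpoint_end are made up for the example and are not part of SimPoint or SimpleScalar.

```c
#include <assert.h>
#include <stdint.h>

/* Interval length from the slide: 100 million instructions. */
#define SIMPOINT_INTERVAL UINT64_C(100000000)

/* Instruction count at which simulation of the given point starts. */
uint64_t simpoint_start(uint64_t point)
{
    /* Guard against 64-bit overflow of the start and end values. */
    assert(point < UINT64_MAX / SIMPOINT_INTERVAL);
    return point * SIMPOINT_INTERVAL;
}

/* Instruction count at which simulation stops: one interval later. */
uint64_t simpoint_end(uint64_t point)
{
    return simpoint_start(point) + SIMPOINT_INTERVAL;
}
```

For simulation point 90, simpoint_start(90) is 9,000,000,000 and simpoint_end(90) is 9,100,000,000, matching the 9 billion and 9.1 billion on the slide. The 64-bit type matters here, since 9 billion does not fit in a 32-bit integer.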