University of Maryland DRAM Simulator Manual

Contents:
CPU Interface
Memory Controllers
Address Mapping
Paging Policy
Simulating Different DRAM Access Protocols
MASE DRAM User's Guide: DRAM Related Options

The DRAM system enhancement described herein is more than the simple addition of a single memory model to SimpleScalar. It consists of the addition of a bus interface unit, one or more transaction-driven memory controllers, and one or more command-driven memory systems. This documentation has been prepared to familiarize the user with the terminology used in the design of the memory system, and to provide a brief explanation of the basic assumptions of the DRAM simulation system. We hope that with this document the user can quickly come up to speed, and can modify and otherwise take advantage of the memory system framework provided in this enhancement.

In figure 1, we show our basic assumption for the "computer system". We implicitly assume three distinct entities that interact in the life of a memory transaction request: the processor, the memory controller(s), and the DRAM memory system(s). Each of these three entities is assumed to be an independently clocked synchronous state machine that operates in a separate clocking domain. However, in our current implementation of the DRAM system, we assume that there are only two clocking domains: the CPU clock domain and the DRAM memory system clock domain. We assume that the DRAM memory system and the memory controller operate in the DRAM memory system clock domain, and that the CPU operates in the CPU clock domain. This assumption holds for legacy systems with separate memory controllers. In newer systems, where the memory controller is integrated into the CPU core, the assumption may be reversed: the memory controller is assumed to operate in the same clocking domain as the CPU. A more generalized model would operate the three entities in three independent clock domains; the frequency of each clock domain could then be set separately, and the model altered as necessary. However, at this time we believe that such an implementation would be unnecessarily complex, and would decrease simulation speed for a minimal increase in the flexibility and accuracy of the system simulation model.

[Figure 1: Execution of a Load Instruction in an Abstract Modern Processor. The figure shows the stages of instruction execution (Fetch, Decode, Exec, Mem, WB) and the steps taken through the memory hierarchy in a modern processor.
Part A: Searching on-chip for data
  [A1] virtual to physical address translation (DTLB access)
  [A2] L1 D-Cache access. If miss then proceed to
  [A3] L2 Cache access. If miss then send to BIU
  [A4 + B] Bus Interface Unit (BIU) obtains data from main memory
Part B: Going off-chip for data
  [B1] BIU arbitrates for ownership of address bus **
  [B2] request sent to system controller
  [B3] physical addr. to memory addr. translation
  [B4] memory request scheduling **
  [B5] memory addr. setup (RAS/CAS)
  [B6, B7] DRAM dev. obtains data and returns to controller
  [B8] system controller returns data to CPU
** Steps not required for some processor/system controllers; protocol dependent.
Labels in the figure: Processor Core, DTLB, L1 cache, L2 cache, BIU (Bus Interface Unit), memory controller, I/O to memory traffic, memory request scheduling, physical to memory addr mapping, read data buffer, DRAM System, DRAM core, CPU clocking domain, DRAM clocking domain.]

CPU Interface

We illustrate our basic assumptions about the CPU in figure 2. In essence, we assume an out-of-order execution core, where different portions of the processor can all generate memory requests.
We assume that each request is tagged with a request id (rid), so that when the memory callback function is called, we can uniquely identify the functional unit that generated the request and also identify the specific pending operation by the request id. We assume that each functional unit may sustain more than one memory transaction miss at a given instant in time, and that memory transactions may be returned out of order by the memory system. We assume that the life of a memory transaction request begins when a requesting functional unit generates a DRAM memory request. The requesting unit begins this process by attempting to place the request into a slot in the bus interface unit. The bus interface unit is assumed to have multiple slots, and the slots in this data structure have no assumed ordering. The requesting unit may place the request into any slot. If a free slot is available, the request is placed into the bus interface unit and the status MEM_UNKNOWN is returned to the requesting functional unit, implying that the memory system will return the latency of the request at a later time, when the latency is known. If all of the slots are filled and no free slot is available, MEM_RETRY is returned to the requesting functional unit, implying that the functional unit must retry the request at a later time to see whether a slot has become available.

[Figure 2a: Bus Interface in Simulated CPU. Labels in the figure: mase-fe, mase-exec, mase-commit, BIU: bus interface unit.]
[Figure 2b: Bus Interface Data Structure. Fields per slot: status, rid, start_time, address, access_type. Status values shown: Valid, Invalid. Access types shown: D Read, D Write, I Fetch.]

Memory Controllers

In figure 3, we show a generalized system controller that supports multiple processors.
Although our current implementation supports only a single-threaded CPU as simulated by the MASE simulation code, the technique described here is generic in nature and may be extended to a small-scale symmetric multiprocessor system with shared memory. The simulation of the memory controller begins with the selection of a memory request from the bus interface unit. Currently, we default to a simple FIFO scheme, but we allow read requests and instruction fetch requests to bypass write requests for increased performance. A more intelligent memory request selection scheme can also increase system throughput in a shared-memory multiprocessor system by choosing the appropriate entry from the various bus interface units. The intelligent selection of entries from the separate bus interface units of each processor implies that sophisticated arbitration hardware exists to select the most critical memory transaction request. Such hardware will be difficult to design
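
To make the slot behavior from the CPU Interface section concrete, the following is a minimal Python sketch, not the simulator's actual code. The field names (status, rid, start_time, address, access_type) and the status names MEM_UNKNOWN and MEM_RETRY come from the text and figure 2b; the class names, method names, string values, and the use of -1 for empty fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

# Status names taken from the text; the string values are placeholders.
MEM_UNKNOWN = "MEM_UNKNOWN"  # request accepted, latency reported later
MEM_RETRY = "MEM_RETRY"      # no free slot, requester must try again


class AccessType(Enum):
    D_READ = "D Read"
    D_WRITE = "D Write"
    I_FETCH = "I Fetch"


@dataclass
class Slot:
    """One bus interface slot (hypothetical layout based on figure 2b)."""
    valid: bool = False
    rid: int = -1
    start_time: int = -1
    address: int = 0
    access_type: Optional[AccessType] = None


class BusInterfaceUnit:
    """Unordered pool of slots; a request may occupy any free slot."""

    def __init__(self, num_slots: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(num_slots)]

    def add_request(self, rid: int, now: int, address: int,
                    access_type: AccessType) -> str:
        # Slots have no ordering, so the first free one is as good as any.
        for i, slot in enumerate(self.slots):
            if not slot.valid:
                self.slots[i] = Slot(True, rid, now, address, access_type)
                return MEM_UNKNOWN
        return MEM_RETRY

    def release(self, rid: int) -> bool:
        """Free the slot holding request `rid`; return False if not found."""
        for i, slot in enumerate(self.slots):
            if slot.valid and slot.rid == rid:
                self.slots[i] = Slot()
                return True
        return False
```

With a two-slot unit, a third request receives MEM_RETRY until one of the earlier requests is released, which mirrors the retry behavior described above.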

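
The default request selection described in the Memory Controllers section (FIFO order, with read and instruction fetch requests allowed to bypass write requests) could be sketched as below. This is one possible reading of that policy, not the simulator's implementation: the Request type and the select_next function are hypothetical, age is judged by start_time, and the sketch ignores the case where a bypassing read targets the same address as an older pending write.

```python
from typing import NamedTuple, Optional, Sequence


class Request(NamedTuple):
    rid: int
    start_time: int
    access_type: str  # "D Read", "D Write", or "I Fetch"


def select_next(pending: Sequence[Request]) -> Optional[Request]:
    """Pick the oldest read or fetch if any exist, else the oldest write."""
    if not pending:
        return None
    # Reads and instruction fetches are considered before any write.
    non_writes = [r for r in pending if r.access_type != "D Write"]
    candidates = non_writes if non_writes else list(pending)
    # Within the chosen group, keep FIFO order by arrival time.
    return min(candidates, key=lambda r: r.start_time)
```

Given a write that arrived at time 5, a fetch at time 7, and a read at time 9, this sketch selects the fetch: it is the oldest request that is not a write.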

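
The two-clock-domain assumption stated in the introduction can be illustrated with a small Python sketch that advances a DRAM clock from a loop running in CPU cycles. The function name and the frequencies used in the example are invented for illustration; the manual does not say how the simulator relates the two clocks internally.

```python
from typing import List


def dram_ticks_per_cpu_cycle(cpu_mhz: int, dram_mhz: int,
                             cpu_cycles: int) -> List[int]:
    """For each CPU cycle, return how many DRAM clock edges fall in it.

    An integer accumulator is used so that non-integer frequency ratios
    do not drift over long runs.
    """
    ticks = []
    acc = 0
    for _ in range(cpu_cycles):
        acc += dram_mhz
        n = acc // cpu_mhz      # whole DRAM cycles completed so far
        acc -= n * cpu_mhz      # carry the remainder to the next cycle
        ticks.append(n)
    return ticks
```

For a hypothetical 2000 MHz CPU and 400 MHz DRAM system, the DRAM domain (memory controller and DRAM devices, under the manual's assumption) would be stepped once every fifth CPU cycle.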
UMD ENEE 759H - DRAM Simulator Manual
