UMD ENEE 759H - Memory Arithmetic Unit Interface


Memory Arithmetic Unit Interface
Jason M. Meier, Justin S. Teller, Tom J. Keeley

Current Paradigm
[Diagram: CPU, Memory Controller, and DRAM System, with a timeline for CPU and MEMORY CTRL showing Task 1 and Task 2; "Done: Task 1"]

Active Pages Implementation
• Used configurable DRAM (RADRAM)
• Reconfigurable logic implements various memory functions
• An “Active Page” consists of a page of data and a set of associated functions
• Works on individual DRAM chips
• Processor-centric and memory-centric partitioning
* Active Pages: Oskin, Chong, Sherwood, ISCA '98

MAUI Implementation
[Diagram: CPU, Memory Controller containing the MAUI and its MAU, and DRAM System, with a timeline for CPU and MEMORY CTRL/MAUI showing Task 1 and Task 2; "Done: Task 1"]

MAUI Instruction Set: LOAD REG
MAUI_LD <m_rd>,offset(<cpu_rs>)
1) The CPU sends an MAU_LOAD register command to the memory controller (MC) across the front-side bus, along with the register number and the address to read.
2) The MC interprets the command and places a Read command in the transaction queue.
3) The DRAM performs the read.
4) The result is stored in the appropriate register of the MAUI register file.
[Diagram: CPU, Memory Controller with MAUI/MAU, and DRAM System; timeline rows CPU, MC/MAUI, and DRAM (R) with steps 1 to 4 marked]

MAUI Instruction Set II: LOADI REG
MAUI_LDI <rd>,<cpu_rs>
1) The CPU sends an MAU_LOADI register command to the MC across the front-side bus, along with the register number and the integer to store.
2) The MC interprets the command and places the integer in the appropriate register of the MAUI register file.
[Diagram: same components; timeline rows CPU, MC/MAUI, and DRAM with steps 1 and 2 marked]

MAUI Instruction Set III: MAU_ADD
MAUI_ADD <rd>,<rs1>,<rs2>,<rsz>
1) The CPU invalidates cached addresses that fall within the range of the destination array. Cached addresses within the range of the source arrays are written back if dirty.
2) The CPU sends an MAUI_ADD command to the MC across the front-side bus, along with the register numbers.
3) The MC interprets the command; the MAUI adds the appropriate registers and places a Write command and the next two Read commands in the transaction queue.
4) Step 3 repeats for the length of the array.
[Diagram: timeline rows CPU, MC/MAUI, and DRAM with steps 1 to 4 marked; the DRAM row repeats R R W]

Issues: Read & Write Locks

Issues: Address Mapping
[Diagram: TLB mapping Virtual Space to Physical Space]
Memory that is contiguous in virtual space may not be contiguous in physical space.
• MAUI assumes consecutive addressing (size register)
• MAUI operations that cross page boundaries must be split into separate operations for each page
• The programmer will not know the mapping scheme
• Result: all MAUI operations will need to be privileged instructions, accessed by programs through a system call.

Issues: Compiler Issues
• The compiler will be responsible for deciding when MAUI instructions should be used.
• This decision will be based on the size of the array, whether it is likely to be in the cache, or whether it is likely to be used by an instruction that isn't implemented in the MAUI.

Issues: Task Interrupts
[Diagram: CPU, Memory Controller with MAUI/MAU, and DRAM System, with a timeline for CPU and MEMORY CTRL/MAUI showing Task 1 and Task 2]

Example: maui_add (I to XI)
Each slide shows the Memory Controller (MAUI register file, Transaction Queue, BIU) and Memory. The MAUI register file holds Size(r4), Offset, RL1_beg/RL1_end, RL2_beg/RL2_end, WL_beg/WL_end, R1 to R3 (each with Data, Addr, and status), and MAU_Status.
I. maui_ld r1, 0: R1_Addr = 0; MAU_Status = open
II. maui_ld r2, 5: R1_Addr = 0, R2_Addr = 5; MAU_Status = open
III. maui_ld r3, 10: R1_Addr = 0, R2_Addr = 5, R3_Addr = 10; MAU_Status = open
IV. maui_ld r4, 2: Size(r4) = 2; R1_Addr = 0, R2_Addr = 5, R3_Addr = 10; MAU_Status = open
V. maui_add r3, r1, r2: Size(r4) = 2, Offset = 0; RL1_beg = 0, RL1_end = 1; RL2_beg = 5, RL2_end = 6; WL_beg = 10, WL_end = 11; R1_status = w, R2_status = w, R3_status = u; MAU_Status = occupied; Transaction Queue: R, 0 and R, 5
VI. maui_add r3, r1, r2*: D1[0]; Read 10
VII. maui_add r3, r1, r2*: D2[0]; Read 10
VIII. maui_add r3, r1, r2*: Transaction Queue: R, 1; R, 6; W, 10, D1[0]+D2[0]; Read 10
IX. maui_add r3, r1, r2*: D1[1]; Write 6, D
X. maui_add r3, r1, r2*: D2[1]; Write 6, D
XI. Next Instruction; Transaction Queue: W, 10, D1[1]+D2[1]

Advantages & Disadvantages
Advantages
• Better performance for DRAM-latency-bound computations
• Lower latency to DRAM than the CPU has
• Reduced traffic on the front-side bus
• Concurrent execution
Disadvantages
• The MAUI operates at a lower clock frequency
• Increased compiler complexity
• Increased fabrication costs (more logic = more $$)
• Recently used data may not be cached

Alternative Implementation: MAUI Occupies its Own Read & Write Bus
[Diagram: CPU, DRAM System, Memory Controller, and MAUI/MAU, with a separate MAUI Read & Write Bus]
• Eliminates contention with the CPU for DRAM system resources (GOOD)
• Creates a circular data flow, resulting in increased performance (GOOD)
• Needs a specialized triple-ported DRAM system, leading to increased production costs (BAD)

Test Setup
• Simulated on SimpleScalar version 4.0
• One set of test benches with dual-array operations running in both the MAUI and the CPU, with four different array sizes. This trial was repeated for both shared and independent memory-access buses.
• Found up to a 43% speedup!

Results
[Chart: Total CPU Cycles]

Future Enhancements I
[Diagram: DRAM System and Memory Controller with the MAUI and multiple MAUs; MEMORY CTRL/MAUI timeline running Task 1, Task 2, and Task 3]
• MAU multi-tasking
• Larger register file
• More MAUs for parallelism
• Small cache

Future Enhancements II
[Diagram: MAU_ADD timeline with rows CPU, MC/MAUI, and DRAM; "Better Pipelining" compares R R W with R R R R R R W W]
• Better pipelining
• Larger register file to hold intermediate
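The maui_add walkthrough (slides I to XI) can be replayed with a small Python model. This is a sketch under stated assumptions, not the authors' design: the names MauiModel, ld, and add are hypothetical; r4 is treated as the size register because slide IV shows maui_ld r4, 2 setting Size(r4) = 2 (so add reads the size from there instead of taking an <rsz> operand); each address holds one element; and element i of the destination is written to R3_Addr + i. The queue order (next two reads, then the write) follows the listing on slide VIII, which may not reflect the real queue policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model of the MAUI register file and of how one maui_add
# expands into transaction-queue commands. Not the authors' implementation.


@dataclass
class MauiModel:
    memory: Dict[int, int]                                # address -> element
    addr: Dict[str, int] = field(default_factory=dict)    # R1_Addr, R2_Addr, R3_Addr
    size: int = 0                                         # Size(r4)
    status: str = "open"                                  # MAU_Status
    queue_log: List[tuple] = field(default_factory=list)  # commands in issue order

    def ld(self, reg: str, value: int) -> None:
        # Assumption: r4 is the size register (slide IV); other registers
        # record a base address (slide I: maui_ld r1, 0 -> R1_Addr = 0).
        if reg == "r4":
            self.size = value
        else:
            self.addr[reg] = value

    def add(self, rd: str, rs1: str, rs2: str) -> Dict[str, Tuple[int, int]]:
        a, b, d, n = self.addr[rs1], self.addr[rs2], self.addr[rd], self.size
        # Inclusive lock ranges, like RL1_beg..RL1_end on slide V.
        locks = {"RL1": (a, a + n - 1), "RL2": (b, b + n - 1), "WL": (d, d + n - 1)}
        self.status = "occupied"
        self.queue_log += [("R", a), ("R", b)]               # first two reads
        for i in range(n):
            total = self.memory[a + i] + self.memory[b + i]  # the MAU adds the operands
            if i + 1 < n:                                    # queue the next two reads...
                self.queue_log += [("R", a + i + 1), ("R", b + i + 1)]
            self.queue_log.append(("W", d + i, total))       # ...then this element's write
            self.memory[d + i] = total
        self.status = "open"
        return locks


m = MauiModel(memory={0: 1, 1: 2, 5: 10, 6: 20, 10: 0, 11: 0})
for reg, value in [("r1", 0), ("r2", 5), ("r3", 10), ("r4", 2)]:
    m.ld(reg, value)
print(m.add("r3", "r1", "r2"))  # {'RL1': (0, 1), 'RL2': (5, 6), 'WL': (10, 11)}
print(m.queue_log)  # [('R', 0), ('R', 5), ('R', 1), ('R', 6), ('W', 10, 11), ('W', 11, 22)]
```

With the register values from slides I to IV, the computed lock ranges match those on slide V.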

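The read and write locks are only named in the deck, but the maui_add walkthrough also shows "Read 10" and "Write 6, D" while RL2 = 5 to 6 and WL = 10 to 11 are held. One plausible conflict rule is sketched below. It is an assumption, not the documented MAUI policy: a CPU read must wait if it hits the write-locked range, because the result is not written yet, and a CPU write must wait if it hits any locked range, because a source may not have been read yet or the destination is still being produced. The function names in_range and must_wait are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # inclusive [beg, end], like RL1_beg..RL1_end


def in_range(addr: int, r: Range) -> bool:
    beg, end = r
    return beg <= addr <= end


def must_wait(op: str, addr: int, locks: Dict[str, Range]) -> bool:
    """Hypothetical rule: reads wait on the write lock (WL);
    writes wait on any read lock (RL1, RL2) or the write lock."""
    if op == "R":
        return in_range(addr, locks["WL"])
    if op == "W":
        return any(in_range(addr, r) for r in locks.values())
    raise ValueError(f"unknown op {op!r}")


locks = {"RL1": (0, 1), "RL2": (5, 6), "WL": (10, 11)}  # values from slide V
print(must_wait("R", 10, locks))  # True: destination not yet written
print(must_wait("W", 6, locks))   # True: source element may not be read yet
print(must_wait("R", 6, locks))   # False: reading a source does not change it
```

Letting reads of source ranges proceed is the design choice here: two readers of the same data cannot interfere, so only writes need to respect the read locks.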

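The Address Mapping slide says MAUI operations that cross page boundaries must be split into separate operations for each page. A minimal sketch of that split for a three-array MAUI_ADD follows. It assumes 4 KiB pages, byte addressing with 1-byte elements, and a page table given as a plain dict from virtual page number to physical frame number. The names PAGE_SIZE, translate, and split_add are hypothetical. In the deck's design, each resulting piece would be issued as its own privileged MAUI operation through a system call.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages, byte-addressed, 1-byte elements


def translate(vaddr: int, page_table: Dict[int, int]) -> int:
    # Map a virtual address to a physical one via a hypothetical page table.
    vpn, off = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + off


def split_add(dst: int, src1: int, src2: int, size: int,
              page_table: Dict[int, int]) -> List[Tuple[int, int, int, int]]:
    """Split one virtual MAUI_ADD into pieces in which all three ranges stay
    inside a single page, so each is physically contiguous.
    Returns (phys_dst, phys_src1, phys_src2, length) tuples."""
    pieces = []
    i = 0
    while i < size:
        # Largest length such that none of the three arrays crosses a page.
        n = min(size - i,
                *(PAGE_SIZE - (base + i) % PAGE_SIZE for base in (dst, src1, src2)))
        pieces.append((translate(dst + i, page_table),
                       translate(src1 + i, page_table),
                       translate(src2 + i, page_table), n))
        i += n
    return pieces


# src1 starts 96 bytes before a page boundary, so the operation splits in two.
print(split_add(8192, 4000, 12388, 200, {0: 7, 1: 3, 2: 9, 3: 5}))
# [(36864, 32672, 20580, 96), (36960, 12288, 20676, 104)]
```

Because the split point is the nearest boundary among all three arrays, a misaligned array forces extra pieces even when the others are page-aligned.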

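Step 1 of MAU_ADD (invalidate cached destination addresses, write back dirty source lines) can be sketched as follows. The cache model is an assumption: a dict from line address to (data, dirty), 64-byte lines, and byte addresses. The names LINE, overlaps, and prepare_for_maui_add are hypothetical. One step goes beyond the slide and is an added assumption: dirty lines that overlap the destination are also written back before being invalidated, so bytes that share a line with the destination but lie outside it are not lost.

```python
from typing import Dict, Tuple

LINE = 64  # assumption: 64-byte cache lines, byte addresses


def overlaps(line_addr: int, beg: int, end: int) -> bool:
    # Does the line [line_addr, line_addr + LINE) intersect [beg, end]?
    return line_addr <= end and beg < line_addr + LINE


def prepare_for_maui_add(cache: Dict[int, Tuple[bytes, bool]],
                         memory: Dict[int, bytes],
                         dst: Tuple[int, int],
                         srcs: Tuple[Tuple[int, int], ...]) -> None:
    ranges = (dst,) + srcs
    # 1) Write back dirty lines covering a source (so the MAUI reads current
    #    data) or the destination (added assumption, see above).
    for addr, (data, dirty) in list(cache.items()):
        if dirty and any(overlaps(addr, b, e) for b, e in ranges):
            memory[addr] = data
            cache[addr] = (data, False)
    # 2) Invalidate lines covering the destination, so the CPU cannot
    #    later hit stale copies of data the MAUI is about to overwrite.
    for addr in [a for a in cache if overlaps(a, *dst)]:
        del cache[addr]


cache = {0: (b"a", True), 64: (b"b", False), 128: (b"c", True), 256: (b"d", True)}
memory: Dict[int, bytes] = {}
prepare_for_maui_add(cache, memory, dst=(128, 191), srcs=((0, 63), (64, 127)))
print(sorted(cache))   # [0, 64, 256]
print(sorted(memory))  # [0, 128]
```

Source lines stay cached (now clean) because the MAUI only reads them; the line at 256 is untouched because it overlaps none of the three arrays.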