Berkeley COMPSCI 252 - Macro instruction synthesis for embedded processors - D2800065

School name University of California, Berkeley

Course Compsci 252- Graduate Computer Architecture

Pages 16


Macro instruction synthesis for embedded processors
Pinhong Chen
Yunjian Jiang (william)
CS252 project presentation

Motivation
• Start from a simple processor core
• Find new macro instructions to enhance performance and reduce code size
• Application-specific
• Using dedicated hardware to speed up
[Figure: processor block diagram. Labels: I/D Mem., ALU, Reg, Bus unit, control, Application, Macro Instr. Ext., Control, Reg/Mem Access]

RISC8 Architecture
Why RISC8?
• Simple 8-bit ISA with 43 instructions
• Addressable space 64K bytes
• Complete ISA, including Load/Store, Arithmetic, Logical, Branch, Multiplication, Division, Stack Operation, Subroutine call, Interrupt Operations, etc.
• Small Verilog core size is 3.5K gates in 0.25um
• Clock speed of 300MHz is reported (our result is about 200MHz)
• Synthesizable RTL core
• Free assembler

Methodology
[Figure: tool-flow diagram. Labels: Application (*.c), Front end, IR (exp. tree), Code Gen., RTL exp. tree, Asm. code, Assembler, mach. code, simulation, performance, Instr. Profiling, and three Instr. Syn blocks]

Different Levels of expression trees
Example statement: sum += c & 5
[Figure: the expression tree for this statement at three levels.
  SUIF IR: nodes ASSIGN, ADD, AND, VAR, CON
  RTL IR after code gen: nodes ASSIGN, ADD, AND, byte, con08, reg, acc
  Reconstructed from mach. code: nodes ASSIGN, ADD, AND, MOV, VAR, con08, addr16]

Expression trees
SUIF IR
• Data type carried
• Inaccurate cost
• No profiling
• Simple: fewer tree nodes
• Machine independent
Register level
• Data type carried
• One-to-one between macro instructions
• Profiling data can be back annotated
• Machine dependent
Machine code
• Data type lost
• One-to-one between machine instructions
• Profiling data accurate
• Large expression trees
• Machine dependent

Instruction Enumeration
• Traverse tree structure in post-order
• Normalize sub-tree orders
• Combine patterns from sub-trees
• Hash new instruction patterns
• Collect register usage and memory access for evaluation
• Annotate profiling information
[Figure: example RTL tree. Nodes: ADD, AND, byte, con08, acc, reg]

Machine Code Level Tree Reconstruction
• Build IR tree from machine codes
• Recover data dependencies from assembly code
  - Clear definition by ISA, e.g. AND r2 ==> acc = acc & r2
  - Limited to a basic block
• Eliminate intermediate storage nodes
[Figure: the example tree before elimination (nodes ADD, AND, byte, con08, acc, reg) and after elimination (nodes ADD, AND, byte, con08), with the resulting patterns marked "Special Instr."]

Table-Driven Assembly Development Tools
[Figure: tool-flow diagram. Labels: Asm. code, Assembler, mach. code, Disassembler, Simulator, performance, Instr. Profile, New Instr. Select, Instr. Table, New Instruction Candidates, Instr. Syn]

Table-driven back-end tool automation
    @new_ins=(
      'mac'=>{
        otree=>['r0','nADD','r0',['nMUL','Rn','addr16']],
        pattern=>'Rn addr16',
        code=>['00000011','00000$Rn','$addr16[0]','$addr16[1]'],
        sim=>'$R[0]+=$R[$Rn]*$memory[$addr16]',
        cycles=>13,
        decode=>'$Rn=$memory[$pc++] & 0x7; $addr16[0]=$memory[$pc++]; $addr16[1]=$memory[$pc++]; $addr16=$addr16[0]|($addr16[1]<<8);'
      }
    );

Op-Code Reuse
• Op codes may not be fully used in a specific application
• Remove unused instruction op-codes
• Typical applications use far fewer than 256 op-codes
• Cost of op-code reuse
  - Decoding logic
  - Less flexibility

    Application   FIR   ADPCM   GSM   max7219   LCD4x20   PRN-IO
    Opcodes        28      49    32        39        40       30

Implementation
• Compiler front-end: SUIF
• Code generator: SPAM-olive, retargeted to RISC8
• RTL pattern enumeration: C++
• RISC8 assembler: PERL
• RISC8 simulator: PERL
• Machine level pattern enumeration: PERL
• Macro driven instruction implementation automation: PERL

Benchmarks
Columns: Benchmark, Instructions, #. The values of the # column are run together in the extracted text and are reproduced as found.

adpcm
    null:nASSIGN(word,nAND(areg,const16))
    null:nASSIGN(word,nADD(areg,word))
    bool:nBOOL(areg,const16)
    bool:nBOOL(nAND(areg,const16),areg)
    areg:nIOR(nAND(areg,const16),word)
    #: 4040863624
GSM-encoder
    acc:nAND(acc,const08)
    acc:nAND(nASR(acc,const08),reg)
    acc:nIOR(nAND(acc,const08),reg)
    acc:nASR(acc,const08)
    acc:nIOR(nAND(nASR(acc,const08),const08),reg)
    #: 796492414330621
PRN-IO
    acc:nIOR(acc,const08)
    null:nASSIGN(byte,nIOR(acc,reg))
    null:nASSIGN(byte,nIOR(acc,const08))
    bool:nBOOL(nAND(areg,const16),const16)
    #: 240969660
LCD_4X20
    bool:nBOOL(acc,const08)
    null:nASSIGN(byte,nADD(reg,one))
    #: 9930
max7219
    Acc:nIOR(acc,const08)
    bool:nBOOL(nAND(acc,reg),zero)
    #: 14048

GSM encoder
• Hardware/software tradeoff
• Software gain: execution speed, code size
• Hardware cost: functional unit, decoding logic, data path configuration
[Figure: chart with vertical axis from 0 to 4000; categories base, instr#1, instr#2, instr#3, instr#4; series code-size, cycle, hardware]

Conclusions
• RTL level pattern enumeration
  - Key to automating instruction identification, code-generation, assembly and simulation
  - No need to change algorithm source code
• Hardware/software trade-off
  - Good estimation of performance gain and hardware cost at register-transfer level
• Op-code
