Berkeley COMPSCI 252 - Macro instruction synthesis for embedded processors

Macro instruction synthesis for embedded processors
Pinhong Chen, Yunjian Jiang (William)
CS252 project presentation

Motivation
• Start from a simple processor core
• Find new macro instructions to enhance performance and reduce code size
• Application-specific
• Using dedicated hardware to speed up
[Figure: processor block diagram. Labels: I/D Mem., ALU, Reg, Bus unit, control, Application, Macro Instr. Ext., Control, Reg/Mem Access]

RISC8 Architecture
Why RISC8?
• Simple 8-bit ISA with 43 instructions
• Addressable space 64K bytes
• Complete ISA, including Load/Store, Arithmetic, Logical, Branch, Multiplication, Division, Stack Operation, Subroutine call, Interrupt Operations, etc.
• Small Verilog core size is 3.5K gates in 0.25 um
• Clock speed of 300 MHz is reported (our result is about 200 MHz)
• Synthesizable RTL core
• Free assembler

Methodology
[Figure: tool flow. Labels: Application (*.c), Front end, IR (exp. tree), Code Gen., Asm. code, Assembler, mach. code, simulation, performance, Instr. Profiling, RTL exp. tree, Instr. Syn (three instances)]

Different Levels of expression trees
Example: sum += c & 5
[Figure: three expression trees for the example, node labels in extraction order]
• SUIF IR: ASSIGN, ADD, VAR, AND, VAR, VAR, CON
• RTL IR after code gen: ASSIGN, ADD, byte, con08, reg, acc, acc, reg, AND, acc, reg, byte, byte
• Reconstructed from mach. code: ASSIGN, ADD, VAR, con08, AND, addr16, MOV, VAR, addr16

Expression trees
SUIF IR
• Data type carried
• Inaccurate cost
• No profiling
• Simple: fewer tree nodes
• Machine independent
Register level
• Data type carried
• One-to-one between macro instructions
• Profiling data can be back annotated
• Machine dependent
Machine code
• Data type lost
• One-to-one between machine instructions
• Profiling data accurate
• Large expression trees
• Machine dependent

Instruction Enumeration
• Traverse tree structure in post-order
• Normalize sub-tree orders
• Combine patterns from sub-trees
• Hash new instruction patterns
• Collect register usage and memory access for evaluation
• Annotate profiling information
[Figure: expression tree. Node labels: ADD, byte, con08, acc, reg, AND, acc, reg, byte]

Machine Code Level Tree Reconstruction
• Build IR tree from machine codes
• Recover data dependencies from assembly code
• Clear definition by ISA, e.g. AND r2 ==> acc = acc & r2
• Limited to a basic block
• Eliminate intermediate storage nodes
[Figure: expression tree. Node labels: ADD, byte, con08, acc, reg, AND, acc, reg, byte]
[Figure, second slide with the same bullets: node labels ADD, byte, con08, AND, byte; annotation "Special Instr." (twice)]

Table-Driven Assembly Development Tools
[Figure: tool flow. Labels: Asm. code, Assembler, mach. code, performance, Instr. Profile, Disassembler, Simulator, New Instr. Select, Instr. Table, New Instruction Candidates, Asm. code, Instr. Syn]

Table-driven back-end tool automation
    @new_ins=(
      'mac'=>{
        otree=>['r0','nADD','r0',['nMUL','Rn','addr16']],
        pattern=>'Rn addr16',
        code=>['00000011','00000$Rn','$addr16[0]','$addr16[1]'],
        sim=>'$R[0]+=$R[$Rn]*$memory[$addr16]',
        cycles=>13,
        decode=>'$Rn=$memory[$pc++] & 0x7; $addr16[0]=$memory[$pc++]; $addr16[1]=$memory[$pc++]; $addr16=$addr16[0]|($addr16[1]<<8);'
      }
    );

Op-Code Reuse
• Op-codes may not be fully used in a specific application
• Remove unused instruction op-codes
• Typical applications use far fewer than 256 op-codes
• Cost of op-code reuse
  • Decoding logic
  • Less flexibility

    Application   FIR   ADPCM   GSM   max7219   LCD4x20   PRN-IO
    Opcodes       28    49      32    39        40        30

Implementation
• Compiler front-end: SUIF
• Code generator: SPAM-olive, retargeted to RISC8
• RTL pattern enumeration: C++
• RISC8 assembler: PERL
• RISC8 simulator: PERL
• Machine level pattern enumeration: PERL
• Macro driven instruction implementation automation: PERL

Benchmarks
Columns: Benchmark, Instructions, #

adpcm
    null:nASSIGN(word,nAND(areg,const16))
    null:nASSIGN(word,nADD(areg,word))
    bool:nBOOL(areg,const16)
    bool:nBOOL(nAND(areg,const16),areg)
    areg:nIOR(nAND(areg,const16),word)
    # (values run together in the source): 4040863624
GSM-encoder
    acc:nAND(acc,const08)
    acc:nAND(nASR(acc,const08),reg)
    acc:nIOR(nAND(acc,const08),reg)
    acc:nASR(acc,const08)
    acc:nIOR(nAND(nASR(acc,const08),const08),reg)
    # (values run together in the source): 796492414330621
PRN-IO
    acc:nIOR(acc,const08)
    null:nASSIGN(byte,nIOR(acc,reg))
    null:nASSIGN(byte,nIOR(acc,const08))
    bool:nBOOL(nAND(areg,const16),const16)
    # (values run together in the source): 240969660
LCD_4X20
    bool:nBOOL(acc,const08)
    null:nASSIGN(byte,nADD(reg,one))
    # (values run together in the source): 9930
max7219
    acc:nIOR(acc,const08)
    bool:nBOOL(nAND(acc,reg),zero)
    # (values run together in the source): 14048

GSM encoder
• Hardware/software tradeoff
• Software gain: execution speed, code size
• Hardware cost: functional unit, decoding logic, data path configuration
[Figure: chart. Y-axis ticks 0 to 4000 in steps of 500; x-axis categories: base, instr#1, instr#2, instr#3, instr#4; series: code-size, cycle, hardware]

Conclusions
• RTL level pattern enumeration
  • Key to automating instruction identification, code-generation, assembly and simulation
  • No need to change algorithm source code
• Hardware/software trade-off
  • Good estimation of performance gain and hardware cost at register-transfer level
• Op-code

