Introduction to Bluespec: A new methodology for designing Hardware

Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
February 7, 2011
http://csg.csail.mit.edu/6.375


What is needed to make hardware design easier

- Extreme IP reuse ("IP" = "Intellectual Property")
  - Multiple instantiations of a block for different performance and application requirements
  - Packaging of IP so that the blocks can be assembled easily to build a large system (black box model)
- Ability to do modular refinement
- Whole system simulation to enable concurrent hardware-software development


IP Reuse sounds wonderful until you try it ...

Example: Commercially available FIFO IP block

[Figure: FIFO block with signals data_in, push_req_n, pop_req_n, clk, rstn, data_out, full, empty, annotated with usage constraints]

These constraints are spread over many pages of the documentation...


Bluespec promotes composition through guarded interfaces

[Figure: theModuleA and theModuleB both connect to a FIFO, theFifo, through its methods enq, deq, and first. Each method has rdy and enab signals. enq is guarded by "not full"; deq and first are guarded by "not empty".]

theModuleA:

    theFifo.enq(value1);
    theFifo.deq();
    value2 = theFifo.first();

theModuleB:

    theFifo.enq(value3);
    theFifo.deq();
    value4 = theFifo.first();


Bluespec: A new way of expressing behavior using Guarded Atomic Actions

- Formalizes composition
  - Modules with guarded interfaces
  - Compiler manages connectivity (muxing and associated control)
- Powerful static elaboration facility
  - Permits parameterization of designs at all levels
- Transaction level modeling
  - Allows C and Verilog codes to be encapsulated in Bluespec modules


Bluespec: State and Rules organized into modules

[Figure: a module and its interface]

- All state (e.g., Registers, FIFOs, RAMs, ...) is explicit.
- Behavior is expressed in terms of atomic actions on the state:
  Rule: guard -> action
- Rules can manipulate state in other modules only via their interfaces.


GCD: A simple example to explain hardware generation from Bluespec


Programming with rules: A simple example

Euclid's algorithm for computing the Greatest Common Divisor (GCD):

    15  6
    answer: 3


GCD in BSV

    module mkGCD (I_GCD);
      // State: registers x and y
      Reg#(Int#(32)) x <- mkRegU;
      Reg#(Int#(32)) y <- mkReg(0);

      // Internal behavior: rules swap and subtract
      rule swap ((x > y) && (y != 0));
        x <= y; y <= x;
      endrule
      rule subtract ((x <= y) && (y != 0));
        y <= y - x;
      endrule

      // External interface: methods start and result
      method Action start(Int#(32) a, Int#(32) b) if (y==0);
        x <= a; y <= b;
      endmethod
      method Int#(32) result() if (y==0);
        return x;
      endmethod
    endmodule

Assume a /= 0.


GCD Hardware Module

[Figure: GCD module. The start method takes two Int#(32) inputs and has enab and rdy signals; the result method returns an Int#(32) and has a rdy signal. The implicit condition on both methods is y == 0.]

    interface I_GCD;
      method Action start (Int#(32) a, Int#(32) b);
      method Int#(32) result();
    endinterface

- The module can easily be made polymorphic
- Many different implementations can provide the same interface: module mkGCD (I_GCD)


GCD: Another implementation

Combine swap and subtract rule

    module mkGCD (I_GCD);
      Reg#(Int#(32)) x <- mkRegU;
      Reg#(Int#(32)) y <- mkReg(0);

      rule swapANDsub ((x > y) && (y != 0));
        x <= y; y <= x - y;
      endrule
      rule subtract ((x<=y) && (y!=0));
        y <= y - x;
      endrule

      method Action start(Int#(32) a, Int#(32) b) if (y==0);
        x <= a; y <= b;
      endmethod
      method Int#(32) result() if (y==0);
        return x;
      endmethod
    endmodule


Bluespec Tool flow

[Figure: tool flow diagram. Labels: Bluespec SystemVerilog source, Bluespec Compiler, Verilog 95 RTL, Verilog sim, RTL synthesis, gates, FPGA, Power estimation tool, C, Bluesim (Cycle Accurate), VCD output, Debussy Visualization]


Generated Verilog RTL: GCD

    module mkGCD(CLK,RST_N,start_a,start_b,EN_start,RDY_start,result,RDY_result);
      input CLK; input RST_N;
      // action method start
      input [31 : 0] start_a; input [31 : 0] start_b; input EN_start;
      output RDY_start;
      // value method result
      output [31 : 0] result; output RDY_result;
      // register x and y
      reg [31 : 0] x;
      wire [31 : 0] x$D_IN; wire x$EN;
      reg [31 : 0] y;
      wire [31 : 0] y$D_IN; wire y$EN;
      ...
      // rule RL_subtract
      assign WILL_FIRE_RL_subtract = x_SLE_y___d3 && !y_EQ_0___d10 ;
      // rule RL_swap
      assign WILL_FIRE_RL_swap = !x_SLE_y___d3 && !y_EQ_0___d10 ;
      ...


Generated Hardware

[Figure: registers x and y with enables x_en and y_en, a > comparator, a !(=0) test, the start inputs, and the result output with rdy signals. The comparator and zero test produce the signals swap? and subtract?.]

    x_en =
    y_en =

    rule swap ((x>y)&&(y!=0));
      x <= y; y <= x;
    endrule
    rule subtract ((x<=y)&&(y!=0));
      y <= y - x;
    endrule


Generated Hardware Module

[Figure: the same datapath with a subtractor (sub) and the start_en signal added]

    x_en = swap?
    y_en = swap? OR subtract?
    rdy =


GCD: A Simple Test Bench

    module mkTest ();
      Reg#(Int#(32)) state <- mkReg(0);
      I_GCD gcd <- mkGCD();

      rule go (state == 0);
        gcd.start (423, 142);
        state <= 1;
      endrule
      rule finish (state == 1);
        $display ("GCD of 423 & 142 =%d", gcd.result());
        state <= 2;
      endrule
    endmodule


GCD: Test Bench

Feeds all pairs (c1,c2), 1 <= c1 <= 7, 1 <= c2 <= 63, to GCD

    module mkTest ();
      Reg#(Int#(32)) state <- mkReg(0);
      Reg#(Int#(4)) c1 <- mkReg(1);
      Reg#(Int#(7)) c2 <- mkReg(1);
      I_GCD gcd <- mkGCD();

      rule req (state==0);
        gcd.start(signExtend(c1), signExtend(c2));
        state <= 1;
      endrule
      rule resp (state==1);
        $display ("GCD of %d & %d =%d", c1, c2, gcd.result());
        if (c1==7) begin c1 <= 1; c2 <= c2+1; end
        else c1 <= c1+1;
        if (c1==7 && c2==63) state <= 2; else state <= 0;
      endrule
    endmodule


GCD: Synthesis results

- Original (16 bits)
  - Clock Period: 1.6 ns
  - Area: 4240 µm²
- Unrolled (16 bits)
  - Clock Period: 1.65 ns
  - Area: 5944 µm²
- Unrolled takes 31% fewer cycles on the testbench


Need for a rule scheduler
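
As a point of reference, the two mkGCD variants shown earlier can be modeled in software. The sketch below is a hypothetical Python model, not something produced by the Bluespec tools: the function names `step_original`, `step_combined`, and `run` are invented for illustration, and it assumes one rule firing per clock cycle and positive inputs (the slides only state a /= 0). In both variants the rule guards are mutually exclusive, so at most one rule is enabled in any cycle and this particular design never forces a choice between rules.

```python
from math import gcd as reference_gcd


def step_original(x, y):
    """One clock cycle of the two-rule mkGCD (rules swap and subtract)."""
    if x > y and y != 0:        # guard of rule swap
        return y, x             # x <= y; y <= x  (both read the old values)
    if x <= y and y != 0:       # guard of rule subtract
        return x, y - x         # y <= y - x
    return x, y                 # no guard holds: state is unchanged


def step_combined(x, y):
    """One clock cycle of the variant with rules swapANDsub and subtract."""
    if x > y and y != 0:        # guard of rule swapANDsub
        return y, x - y         # x <= y; y <= x - y  (both read the old values)
    if x <= y and y != 0:       # guard of rule subtract
        return x, y - x
    return x, y


def run(step, a, b):
    """Model start(a, b), then clock until result() is ready (y == 0).

    Returns (result, cycles). Positivity of the inputs is an assumption of
    this sketch; the slides only require a /= 0.
    """
    assert a > 0 and b >= 0
    x, y = a, b
    cycles = 0
    while y != 0:
        x, y = step(x, y)
        cycles += 1
    return x, cycles


if __name__ == "__main__":
    for a, b in [(15, 6), (423, 142)]:
        print(a, b, run(step_original, a, b), run(step_combined, a, b))
```

Running the script prints the result and cycle count of each variant for the 15, 6 example and for the 423, 142 test bench input. Because swapANDsub performs a swap and the subtraction that would follow it in a single cycle, this model never needs more cycles for the combined variant than for the original.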