Instruction Set Comparisons CS 740 Sept. 18, 2000Procedure ExamplesAlpha rfactVAXVAX RegistersVAX Operand SpecifiersVAX Procedure LinkageVAX rfactSparse Matrix CodeCSR EncodingCSR ExampleCSR Multiply: Clean VersionCSR Multiply: Fast VersionOptimized Inner LoopVAX Inner LoopPower / PowerPCPowerPC CuriositiesPowerPC StructureIBM Compiler PPC Inner LoopCodeWarrior Compiler PPC Inner LoopPerformance ComparisonSummaryInstruction Set ComparisonsCS 740Sept. 18, 2000Topics•Code Examples–Procedure Linkage–Sparse Matrix Code•Instruction Sets–Alpha–PowerPC–VAXCS 740 F’00– 2 –Procedure ExamplesProcedure linkage•Passing of control and data•stack managementControl•Register tests•Condition codes•loop countersint rfact(int n){ if (n <= 1) return 1; return n*rfact(n-1);}Code ExampleCS 740 F’00– 3 –Alpha rfactRegisters•$16 Argument n•$9 Saved n–Callee save•$0 Return ValueStack Frame•16 bytes•$9•$26 return PC rfact:ldgp $29,0($27) # setup gprfact..ng:lda $30,-16($30) # $sp -= 16.frame $30,16,$26,0stq $26,0($30) # save return addrstq $9,8($30) # save $9.mask 0x4000200,-16.prologue 1bis $16,$16,$9 # $9 = ncmple $9,1,$1 # if (n <= 1) thenbne $1,$80 # branch to $80subq $9,1,$16 # $16 = n - 1bsr $26,rfact..ng # recursive callmulq $9,$0,$0 # $0 = n*rfact(n-1)br $31,$81 # branch to epilogue.align 4$80:bis $31,1,$0 # return val = 1$81:ldq $26,0($30) # restore retrn addrldq $9,8($30) # restore $9addq $30,16,$30 # $sp += 16ret $31,($26),1save $9save $26$sp + 8$sp + 0CS 740 F’00– 4 –VAXPinnacle of CISC•Maximize instruction density•Provide instructions closely matched to typical program operationsInstruction format•OP, arg1, arg2, …–Each argument has arbitrary specifier–Accessing operands may have side effectsCondition Codes•Set by arithmetic and comparison instructions•Basis for successive branchesProcedure Linkage•Direct implementation of stack disciplineCS 740 F’00– 5 –VAX Registers•R0–R11 General purpose–R2–R7 Callee save in example code–Use pair to hold double•R12 AP Argument pointer–Stack region holding procedure arguments•R13 FP Frame pointer–Base of current stack frame•R14 SP Stack pointer–Top of stack•R15 PC Program counter–Used to access data in code•N C V Z Condition codes–Information about last operation result–Negative, Carry, 2’s OVF, ZeroArgsArg CountSavedCallerStateLocalVarsTempSpaceCurrentFramePreviousFrameAPFPSPVaries with push/popAddressCS 740 F’00– 6 –VAX Operand SpecifiersFormsNotation Value Side Ef UseRi ri General purpose register$v v Immediate data(Ri) M[ri] Memory referencev(Ri) M[ri+v] Mem. ref. with displacementA[Ri] M[a+ri*d] Array indexing–A is specifier denoting address a–d is size of datum(Ri)+ M[ri] Ri += d Stepping pointer forward-(Ri) M[ri-d] Ri –= d Stepping pointer backExamples•Push src move src, –(SP)•Pop dest move (SP)+, destCS 740 F’00– 7 –VAX Procedure LinkageCaller•Push Arguments–Relative to SP•Execute CALLS narg, proc–narg denotes number of argument words–proc starts with mask denoting registers to be saved on stackCALLS Instruction•Creates stack frame–Saved registers, PC, FP, AP, mask, PSW•Sets AP, FP, SPCallee•Compute return value in R0•Execute ret–Undoes effect of CALLSArg1Argn•••SavedStatenargAPSPArg1Argn•••SPCALLS nargFPproc: mask1st instr.CS 740 F’00– 8 –VAX rfactRegisters•r6 saved n•r0 return valueStack Frame•save r6Note•Destination argument last_rfact: .word 0x40 # Save register r6 movl 4(ap),r6 # r6 <- n cmpl r6,$1 # if n > 1 jgtr L1 # then goto L1 movl $1,r0 # r0 <- 1 ret # returnL1: pushab -1(r6) # push n-1 calls $1,_rfact # call recursively mull2 r6,r0 # return result * n retn1Temp SpaceCurrentFrameAPFPSPr6Saved CallerStatePreviousFrameCS 740 F’00– 9 –Sparse Matrix CodeTask•Multiply sparse matrix times dense vector–Matrix has many zero entries–Save space and time by keeping only nonzero entries–Common applicationCompressed Sparse Row Representation [(0,3.5) (1,0.9) (3,2.2)] [(1,4.1) (3,1.9)] [(0,4.6) (2,0.7) (3,2.7) (5,3.0)] [(2,2.9)] [(2,1.2) (4,2.8)] [(5,3.4)]3.50.9 2.24.1 1.94.6 0.72.7 3.02.91.2 2.83.4CS 740 F’00– 10 –CSR EncodingParameters•nrow Number of rows (and columns)•nentries Number of nonzero matrix entriesVal•List of nonzero values (nentries)Cindex•List of column indices (nentries)Rstart•List of starting positions for each row (nrow+1)typedef struct { int nrow; int nentries; double *val; int *cindex; int *rstart;} csr_rec, *csr_ptr;CS 740 F’00– 11 –CSR ExampleParameters•nrow = 6•nentries = 13Val [3.5, 0.9, 2.2, 4.1, 1.9, 4.6, 0.7, 2.7, 3.0, 2.9, 1.2, 2.8, 3.4]Cindex [0, 1, 3, 1, 3, 0, 2, 3, 5, 2, 2, 4, 5]Rstart [0, 3, 5, 9, 10, 12, 13] [(0,3.5) (1,0.9) (3,2.2)] [(1,4.1) (3,1.9)] [(0,4.6) (2,0.7) (3,2.7) (5,3.0)] [(2,2.9)] [(2,1.2) (4,2.8)] [(5,3.4)]CS 740 F’00– 12 –CSR Multiply: Clean VersionInnermost Operation•z[r] += M[r,c] * x[c]•Column c given by cindex[ci]•Matrix element M[r,c] by val[ci]void csr_mult_smpl(csr_ptr M, double *x, double *z){ int r, ci; for (r = 0; r < M->nrow; r++) { z[r] = 0.0; for (ci = M->rstart[r]; ci < M->rstart[r+1]; ci++) z[r] += M->val[ci] * x[M->cindex[ci]]; }}xMz=*rcindex[ci]CS 740 F’00– 13 –CSR Multiply: Fast VersionPerformance•Approx 2X faster•Avoids repeated memory referencesvoid csr_mult_opt(csr_ptr M, ftype_t *x, ftype_t *z){ ftype_t *val = M->val; int *cindex_start = M->cindex; int *cindex = M->cindex; int *rnstart = M->rstart+1; ftype_t *z_end = z+M->nrow; while (z < z_end) { ftype_t temp = 0.0; int *cindex_end = cindex_start + *(rnstart++); while (cindex < cindex_end) temp += *(val++) * x[*cindex++]; *z++ = temp; }}CS 740 F’00– 14 –Optimized Inner LoopInner Loop Pointerscip steps through cindexvalp steps through ValMultiply next matrix value by vector element and add to sum while (...) temp += *(valp++) * x[*cip++];CS 740 F’00– 15 –VAX Inner LoopRegisters•r4 cip•r2,r3 temp•r5 valp•r6 cip_end •r10 x Observe•muld3 instruction does 1/2 of the work! while (...) temp += *(valp++) * x[*cip++];L36: movl (r4)+,r0 # r0 <- *cip++ muld3 (r5)+,(r10)[r0],r0 # r0,r1 <- *valp++ * x[r0] addd2 r0,r2 # temp += r0,r1
View Full Document