DOC PREVIEW
CMU CS 15740 - Instruction Set Comparisons

This preview shows page 1-2-21-22 out of 22 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Instruction Set Comparisons CS 740 Sept. 18, 2000Procedure ExamplesAlpha rfactVAXVAX RegistersVAX Operand SpecifiersVAX Procedure LinkageVAX rfactSparse Matrix CodeCSR EncodingCSR ExampleCSR Multiply: Clean VersionCSR Multiply: Fast VersionOptimized Inner LoopVAX Inner LoopPower / PowerPCPowerPC CuriositiesPowerPC StructureIBM Compiler PPC Inner LoopCodeWarrior Compiler PPC Inner LoopPerformance ComparisonSummaryInstruction Set ComparisonsCS 740Sept. 18, 2000Topics•Code Examples–Procedure Linkage–Sparse Matrix Code•Instruction Sets–Alpha–PowerPC–VAXCS 740 F’00– 2 –Procedure ExamplesProcedure linkage•Passing of control and data•stack managementControl•Register tests•Condition codes•loop countersint rfact(int n){ if (n <= 1) return 1; return n*rfact(n-1);}Code ExampleCS 740 F’00– 3 –Alpha rfactRegisters•$16 Argument n•$9 Saved n–Callee save•$0 Return ValueStack Frame•16 bytes•$9•$26 return PC rfact:ldgp $29,0($27) # setup gprfact..ng:lda $30,-16($30) # $sp -= 16.frame $30,16,$26,0stq $26,0($30) # save return addrstq $9,8($30) # save $9.mask 0x4000200,-16.prologue 1bis $16,$16,$9 # $9 = ncmple $9,1,$1 # if (n <= 1) thenbne $1,$80 # branch to $80subq $9,1,$16 # $16 = n - 1bsr $26,rfact..ng # recursive callmulq $9,$0,$0 # $0 = n*rfact(n-1)br $31,$81 # branch to epilogue.align 4$80:bis $31,1,$0 # return val = 1$81:ldq $26,0($30) # restore retrn addrldq $9,8($30) # restore $9addq $30,16,$30 # $sp += 16ret $31,($26),1save $9save $26$sp + 8$sp + 0CS 740 F’00– 4 –VAXPinnacle of CISC•Maximize instruction density•Provide instructions closely matched to typical program operationsInstruction format•OP, arg1, arg2, …–Each argument has arbitrary specifier–Accessing operands may have side effectsCondition Codes•Set by arithmetic and comparison instructions•Basis for successive branchesProcedure Linkage•Direct implementation of stack disciplineCS 740 F’00– 5 –VAX Registers•R0–R11 General purpose–R2–R7 Callee save in example code–Use pair to hold double•R12 AP Argument pointer–Stack region holding procedure arguments•R13 FP Frame pointer–Base of current stack frame•R14 SP Stack pointer–Top of stack•R15 PC Program counter–Used to access data in code•N C V Z Condition codes–Information about last operation result–Negative, Carry, 2’s OVF, ZeroArgsArg CountSavedCallerStateLocalVarsTempSpaceCurrentFramePreviousFrameAPFPSPVaries with push/popAddressCS 740 F’00– 6 –VAX Operand SpecifiersFormsNotation Value Side Ef UseRi ri General purpose register$v v Immediate data(Ri) M[ri] Memory referencev(Ri) M[ri+v] Mem. ref. with displacementA[Ri] M[a+ri*d] Array indexing–A is specifier denoting address a–d is size of datum(Ri)+ M[ri] Ri += d Stepping pointer forward-(Ri) M[ri-d] Ri –= d Stepping pointer backExamples•Push src move src, –(SP)•Pop dest move (SP)+, destCS 740 F’00– 7 –VAX Procedure LinkageCaller•Push Arguments–Relative to SP•Execute CALLS narg, proc–narg denotes number of argument words–proc starts with mask denoting registers to be saved on stackCALLS Instruction•Creates stack frame–Saved registers, PC, FP, AP, mask, PSW•Sets AP, FP, SPCallee•Compute return value in R0•Execute ret–Undoes effect of CALLSArg1Argn•••SavedStatenargAPSPArg1Argn•••SPCALLS nargFPproc: mask1st instr.CS 740 F’00– 8 –VAX rfactRegisters•r6 saved n•r0 return valueStack Frame•save r6Note•Destination argument last_rfact: .word 0x40 # Save register r6 movl 4(ap),r6 # r6 <- n cmpl r6,$1 # if n > 1 jgtr L1 # then goto L1 movl $1,r0 # r0 <- 1 ret # returnL1: pushab -1(r6) # push n-1 calls $1,_rfact # call recursively mull2 r6,r0 # return result * n retn1Temp SpaceCurrentFrameAPFPSPr6Saved CallerStatePreviousFrameCS 740 F’00– 9 –Sparse Matrix CodeTask•Multiply sparse matrix times dense vector–Matrix has many zero entries–Save space and time by keeping only nonzero entries–Common applicationCompressed Sparse Row Representation [(0,3.5) (1,0.9) (3,2.2)] [(1,4.1) (3,1.9)] [(0,4.6) (2,0.7) (3,2.7) (5,3.0)] [(2,2.9)] [(2,1.2) (4,2.8)] [(5,3.4)]3.50.9 2.24.1 1.94.6 0.72.7 3.02.91.2 2.83.4CS 740 F’00– 10 –CSR EncodingParameters•nrow Number of rows (and columns)•nentries Number of nonzero matrix entriesVal•List of nonzero values (nentries)Cindex•List of column indices (nentries)Rstart•List of starting positions for each row (nrow+1)typedef struct { int nrow; int nentries; double *val; int *cindex; int *rstart;} csr_rec, *csr_ptr;CS 740 F’00– 11 –CSR ExampleParameters•nrow = 6•nentries = 13Val [3.5, 0.9, 2.2, 4.1, 1.9, 4.6, 0.7, 2.7, 3.0, 2.9, 1.2, 2.8, 3.4]Cindex [0, 1, 3, 1, 3, 0, 2, 3, 5, 2, 2, 4, 5]Rstart [0, 3, 5, 9, 10, 12, 13] [(0,3.5) (1,0.9) (3,2.2)] [(1,4.1) (3,1.9)] [(0,4.6) (2,0.7) (3,2.7) (5,3.0)] [(2,2.9)] [(2,1.2) (4,2.8)] [(5,3.4)]CS 740 F’00– 12 –CSR Multiply: Clean VersionInnermost Operation•z[r] += M[r,c] * x[c]•Column c given by cindex[ci]•Matrix element M[r,c] by val[ci]void csr_mult_smpl(csr_ptr M, double *x, double *z){ int r, ci; for (r = 0; r < M->nrow; r++) { z[r] = 0.0; for (ci = M->rstart[r]; ci < M->rstart[r+1]; ci++) z[r] += M->val[ci] * x[M->cindex[ci]]; }}xMz=*rcindex[ci]CS 740 F’00– 13 –CSR Multiply: Fast VersionPerformance•Approx 2X faster•Avoids repeated memory referencesvoid csr_mult_opt(csr_ptr M, ftype_t *x, ftype_t *z){ ftype_t *val = M->val; int *cindex_start = M->cindex; int *cindex = M->cindex; int *rnstart = M->rstart+1; ftype_t *z_end = z+M->nrow; while (z < z_end) { ftype_t temp = 0.0; int *cindex_end = cindex_start + *(rnstart++); while (cindex < cindex_end) temp += *(val++) * x[*cindex++]; *z++ = temp; }}CS 740 F’00– 14 –Optimized Inner LoopInner Loop Pointerscip steps through cindexvalp steps through ValMultiply next matrix value by vector element and add to sum while (...) temp += *(valp++) * x[*cip++];CS 740 F’00– 15 –VAX Inner LoopRegisters•r4 cip•r2,r3 temp•r5 valp•r6 cip_end •r10 x Observe•muld3 instruction does 1/2 of the work! while (...) temp += *(valp++) * x[*cip++];L36: movl (r4)+,r0 # r0 <- *cip++ muld3 (r5)+,(r10)[r0],r0 # r0,r1 <- *valp++ * x[r0] addd2 r0,r2 # temp += r0,r1


View Full Document

CMU CS 15740 - Instruction Set Comparisons

Documents in this Course
leecture

leecture

17 pages

Lecture

Lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

9 pages

Lecture

Lecture

13 pages

lecture

lecture

25 pages

lect17

lect17

7 pages

Lecture

Lecture

65 pages

Lecture

Lecture

28 pages

lect07

lect07

24 pages

lect07

lect07

12 pages

lect03

lect03

3 pages

lecture

lecture

11 pages

lecture

lecture

20 pages

lecture

lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

10 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

18 pages

lecture

lecture

63 pages

lecture

lecture

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

18 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

lecture

lecture

34 pages

lecture

lecture

47 pages

lecture

lecture

7 pages

Lecture

Lecture

18 pages

Lecture

Lecture

7 pages

Lecture

Lecture

21 pages

Lecture

Lecture

10 pages

Lecture

Lecture

39 pages

Lecture

Lecture

11 pages

lect04

lect04

40 pages

Load more
Download Instruction Set Comparisons
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Instruction Set Comparisons and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Instruction Set Comparisons 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?