DOC PREVIEW
U of I CS 433 - Lecture notes

This preview shows page 1-2-3-4-5-6 out of 17 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS433: Computer System OrganizationDimensions of the ProblemBranch Delay Penalty on VLIWPredicationPredicationPredicationPredicationCompiled Using Conditional BranchConverting to Predicated FormScheduling NaivelySome DifficultiesMore Sophisticated Conversion and SchedulingSchedulingEfficiency and UtilizationConverting Predicated AssignmentsIf we have more than CMOVMid-Term ReviewCS433: Computer System OrganizationLuddy HarrisonCompiling for VLIWs part 2:PredicationDimensions of the Problemz Exposing adequate ILPz Unrollingz Unroll and Jamz Software pipeliningz Register renamingz Allocating Register Banks and Functional Units(last time)z Instruction Scheduling and Register Allocationz Scheduling around interlocksz Trace Schedulingz Predication(this time)Branch Delay Penalty on VLIWBC L6 cycle penalty =24 lost instructionissue opportunitiesL:PredicationX = a[i];if (x > 0) {b[i] = x – c[i];} else {e[i] = x + d[i];}If there is a 50% probability of a branch, then this code will suffer aP/2 cycle penalty (or a PW/2 instruction issue opportunity penalty)on the average, whereP is the penalty for a conditional branch (or mispredicted branch)W is the issue width of the machinePredicationX = a[i];if (x > 0) {b[i] = x – c[i];} else {e[i] = x + d[i];}The condition guarding a section(s) of code will become the predicate. It will be used in both positive and negative (complemented) form.PredicationX = a[i];if (x > 0) {b[i] = x – c[i];} else {e[i] = x + d[i];}If we are compiling: reads must be either conditionalized on predicate (to prevent illegal access) or else proved to be in-boundsand aligned.If programming by hand, reads may be executed unconditionally but at a cost of additional bandwidth consumption.PredicationX = a[i];if (x > 0) {b[i] = x – c[i];} else {e[i] = x + d[i];}Stores must be conditionalized (predicated). These change the visible state of the machine after the conditional section has finished.When programming by hand, we may occasionally discover that a conditional write can be done unconditionally, but this is an oddity.Compiled Using Conditional BranchP1 = CMP R1, 0 // compare X to 0BLT P1 LR1 = R2 + R3 // compute c[i]R4 = LOAD R1 // load c[i]R5 = R0 – R4R6 = R7 + R3 // compute b[i]STORE R5, R6JMP ML:R11 = R12 + R3 // compute d[i]R14 = LOAD R11 // load d[i]R15 = R0 – R14R16 = R17 + R3STORE R15, R15M:X = a[i];if (x > 0) {b[i] = x – c[i];} else {e[i] = x + d[i];}Converting to Predicated FormP1 = CMP R1, 0 // compare X to 0IF P1 R1 = R2 + R3 // compute c[i]IF P1 R4 = LOAD R1 // load c[i]IF P1 R5 = R0 – R4IF P1 R6 = R7 + R3 // compute b[i]IF P1 STORE R5, R6IF !P1 R11 = R12 + R3 // compute d[i]IF !P1 R14 = LOAD R11 // load d[i]IF !P1 R15 = R0 – R14IF !P1 R16 = R17 + R3IF !P1 STORE R15, R16If every instruction type can be predicated, it is relatively simple to convert into a naïve predicated form.Scheduling NaivelyP1 = CMP R1, 0IF P1 R1 = R2 + R3 || IF !P1 R11 = R12 + R3IF P1 R4 = LOAD R1 || IF !P1 R14 = LOAD R11IF P1 R5 = R0 – R4 || IF !P1 R15 = R0 – R14 ||IF P1 R6 = R7 + R3 || IF !P1 R16 = R17 + R3IF P1 STORE R5, R6 || IF !P1 STORE R15, R16Some DifficultiesP1 = CMP R1, 0IF P1 R1 = R2 + R3 || IF !P1 R11 = R12 + R3IF P1 R4 = LOAD R1 || IF !P1 R14 = LOAD R11IF P1 R5 = R0 – R4 || IF !P1 R15 = R0 – R14 ||IF P1 R6 = R7 + R3 || IF !P1 R16 = R17 + R3IF P1 STORE R5, R6 || IF !P1 STORE R15, R16•If we do this in virtual register form, it appears that the predicated assignments do not kill their destinationsR1 = 19…IF P1 R1 = x+yIF P1 R2 = R1+7it looks as though the first assignment to R1 reaches the use of R1•Nothing can be hoisted above the comparisonP1 = CMP…IF P1 R1 = R2 + R3More Sophisticated Conversion and SchedulingA // A-D are unrelated instructions prior to the CMPBCDP1 = CMP R1, 0 // compare X to 0R1 = R2 + R3R4 = LOAD R1// only OK if we are sure it can’t trapR5 = R0 – R4R6 = R7 + R3IF P1 STORE R5, R6 // must be predicated if we want the same resultR11 = R12 + R3R14 = LOAD R11 // only OK if we are sure it can’t trapR15 = R0 – R14R16 = R17 + R3IF !P1 STORE R15, R16SchedulingR1 = R2 + R3 || R11 = R12 + R3AR4 = LOAD R1 || R14 = LOAD R11 || P1 = CMP R1, 0BR5 = R0 – R4 || R15 = R0 – R14 || R6 = R7 + R3 || R16 = R17 + R3CIF P1 STORE R5, R6 || IF !P1 STORE R15, R16DPredication•Expands the basic blocks (straight-line segments) of the code•Creates additional scheduling opportunities•Comes at a cost: useless work is performed unconditionallyEfficiency and UtilizationR1 = R2 + R3 || R11 = R12 + R3R4 = LOAD R1 || R14 = LOAD R11 || P1 = CMP R1, 0R5 = R0 – R4 || R15 = R0 – R14 || R6 = R7 + R3 || R16 = R17 + R3IF P1 STORE R5, R6 || IF !P1 STORE R15, R16If this is a 4-wide machine, then•Utilizationis 11 / 16•Efficiencyis 6 / 16true case: 6 useful instructions in 16 slotsfalse case: 6 useful instructions in 16 slotsaverage is (6 + 6)/2 instructions in 16 slotsConverting Predicated Assignmentsif (x > 0)y = a + b;elsey = c + d;CMP R8, 0 // x > 0R1 = R2 + R3 // a + bR4 = R5 + R6 // c + dCMOV GT R4, R1 // if (x > 0) R4 = c + d•at this point, R4 holds the value of y•The first 3 instructions can be done in parallel•It is common for machines to have conditional move as their only support for predication•(not so common in the case of VLIWs however)CMP R8, 0 ||R1 = R2 + R3 ||R4 = R5 + R6…CMOV GT R4, R1y is assigned onboth sides of the “if”If we have more than CMOVif (x > 0)y = a + b;elsey = c + d;CMP R8, 0…IFGT R4 = R2 + R3 || IFLE R4 = R5 + R6y is assigned onboth sides of the “if”The TigerSHARC does this, expressed as an IF .. ELSE form.Mid-Term Reviewz Bandwidth and Latencyz “Saturating” the bandwidth of one dimension of the machinez Conditionsz Condition codesz Conditional branchingz Data typesz Integer, fractional, saturation, etc.z Pipeliningz Instruction Setsz MIPS, ARM, Thumb, TigerSHARC, C6Xz Static ILP Exploitationz Vector processingz VLIW processingz Compiler techniquesz Unroll / Jamz Schedulingz Predicationz This isn’t an exhaustive listz The homeworks are a good guide


View Full Document

U of I CS 433 - Lecture notes

Download Lecture notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?