CSUN COMP 546 - Vertical


Table 14.1 Reported Speedups of Superscalar-Like Machines

| Reference | Speedup |
|---|---|
| [TJAD70] | 1.8 |
| [KUCK77] | 8 |
| [WEIS84] | 1.58 |
| [ACOS86] | 2.7 |
| [SOHI90] | 1.8 |
| [SMIT89] | 2.3 |
| [JOUP89b] | 2.2 |
| [LEE91] | 7 |

Table 14.2 Cortex-A8 Memory System Effects on Instruction Timings

| Replay event | Delay | Description |
|---|---|---|
| Load data miss | 8 cycles | 1. A load instruction misses in the L1 data cache. 2. A request is then made to the L2 data cache. 3. If a miss also occurs in the L2 data cache, then a second replay occurs. The number of stall cycles depends on the external system memory timing. The minimum time required to receive the critical word for an L2 cache miss is approximately 25 cycles, but can be much longer because of L3 memory latencies. |
| Data TLB miss | 24 cycles | 1. A table walk because of a miss in the L1 TLB causes a 24-cycle delay, assuming the translation table entries are found in the L2 cache. 2. If the translation table entries are not present in the L2 cache, the number of stall cycles depends on the external system memory timing. |
| Store buffer full | 8 cycles plus latency to drain fill buffer | 1. A store instruction miss does not result in any stalls unless the store buffer is full. 2. In the case of a full store buffer, the delay is at least eight cycles. The delay can be more if it takes longer to drain some entries from the store buffer. |
| Unaligned load or store request | 8 cycles | 1. If a load instruction address is unaligned and the full access is not contained within a 128-bit boundary, there is an 8-cycle penalty. 2. If a store instruction address is unaligned and the full access is not contained within a 64-bit boundary, there is an 8-cycle penalty. |

Table 14.3 Cortex-A8 Dual-Issue Restrictions

| Restriction type | Description | Example | Cycle | Restriction |
|---|---|---|---|---|
| Load/store resource hazard | There is only one LS pipeline. Only one LS instruction can be issued per cycle. It can be in pipeline 0 or pipeline 1. | LDR r5, [r6] | 1 | |
| | | STR r7, [r8] | 2 | Wait for LS unit |
| | | MOV r9, r10 | 2 | Dual issue possible |
| Multiply resource hazard | There is only one multiply pipeline, and it is only available in pipeline 0. | ADD r1, r2, r3 | 1 | |
| | | MUL r4, r5, r6 | 2 | Wait for pipeline 0 |
| | | MUL r7, r8, r9 | 3 | Wait for multiply unit |
| Branch resource hazard | There can be only one branch per cycle. It can be in pipeline 0 or pipeline 1. A branch is any instruction that changes the PC. | BX r1 | 1 | |
| | | BEQ 0x1000 | 2 | Wait for branch |
| | | ADD r1, r2, r3 | 2 | Dual issue possible |
| Data output hazard | Instructions with the same destination cannot be issued in the same cycle. This can happen with conditional code. | MOVEQ r1, r2 | 1 | |
| | | MOVNE r1, r3 | 2 | Wait because of output dependency |
| | | LDR r5, [r6] | 2 | Dual issue possible |
| Data source hazard | Instructions cannot be issued if their data is not available. See the scheduling tables for source requirements and stages results. | ADD r1, r2, r3 | 1 | |
| | | ADD r4, r1, r6 | 2 | Wait for r1 |
| | | LDR r7, [r4] | 4 | Wait two cycles for r4 |
| Multi-cycle instructions | Multi-cycle instructions must issue in pipeline 0 and can only dual issue in their last iteration. | MOV r1, r2 | 1 | |
| | | LDM r3, {r4-r7} | 2 | Wait for pipeline 0, transfer r4 |
| | | LDM (cycle 2) | 3 | Transfer r5, r6 |
| | | LDM (cycle 3) | 4 | Transfer r7 |
| | | ADD r8, r9, r10 | 4 | Dual issue possible on last transfer |

Table 14.4 Cortex-A8 Example Dual Issue Instruction Sequence for Integer Pipeline

| Cycle | Program Counter | Instruction | Timing Description |
|---|---|---|---|
| 1 | 0x00000ed0 | BX r14 | Dual issue pipeline 0 |
| 1 | 0x00000ee4 | CMP r0,#0 | Dual issue in pipeline 1 |
| 2 | 0x00000ee8 | MOV r3,#3 | Dual issue pipeline 0 |
| 2 | 0x00000eec | MOV r0,#0 | Dual issue in pipeline 1 |
| 3 | 0x00000ef0 | STREQ r3,[r1,#0] | Dual issue in pipeline 0, r3 not needed until E3 |
| 3 | 0x00000ef4 | CMP r2,#4 | Dual issue in pipeline 1 |
| 4 | 0x00000ef8 | LDRLS pc,[pc,r2,LSL #2] | Single issue pipeline 0, +1 cycle for load to pc, no extra cycle for shift since LSL #2 |
| 5 | 0x00000f2c | MOV r0,#1 | Dual issue with 2nd iteration of load in pipeline 1 |
| 6 | 0x00000f30 | B {pc}+8 #0xf38 | Dual issue pipeline 0 |
| 7 | 0x00000f38 | STR r0,[r1,#0] | Dual issue pipeline 1 |
| 7 | 0x00000f3c | LDR pc,[r13],#4 | Single issue pipeline 0, +1 cycle for load to pc |
| 8 | 0x0000017c | ADD r2,r4,#0xc | Dual issue with 2nd iteration of load in pipeline 1 |
| 9 | 0x00000180 | LDR r0,[r6,#4] | Dual issue pipeline 0 |
| 9 | 0x00000184 | MOV r1,#0xa | Dual issue pipeline 1 |
| 12 | 0x00000188 | LDR r0,[r0,#0] | Single issue pipeline 0: r0 produced in E3, required in E1, so +2 cycle stall |
| 13 | 0x0000018c | STR r0,[r4,#0] | Single issue pipeline 0 due to LS resource hazard, no extra delay for r0 since produced in E3 and consumed in E3 |
| 14 | 0x00000190 | LDR r0,[r4,#0xc] | Single issue pipeline 0 due to LS resource hazard |
| 15 | 0x00000194 | LDMFD r13!,{r4-r6,r14} | Load multiple loads r4 in 1st cycle, r5 and r6 in 2nd cycle, r14 in 3rd cycle, 3 cycles total |
| 17 | 0x00000198 | B {pc}+0xda8 #0xf40 | Dual issue in pipeline 1 with 3rd cycle of LDM |
| 18 | 0x00000f40 | ADD r0,r0,#2 ARM | Single issue in pipeline 0 |
| 19 | 0x00000f44 | ADD r0,r1,r0 ARM | Single issue in pipeline 0, no dual issue due to hazard on r0 produced in E2 and required in
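The pairing rules in Table 14.3 can be explored with a small model. The Python sketch below is only an illustration under stated assumptions, not a description of the actual Cortex-A8 issue logic: the names Instr, dual_issue_blocker, and assign_cycles are hypothetical, the older instruction of a pair is assumed to occupy pipeline 0, and any read of the older instruction's destination is conservatively treated as blocking same-cycle issue. Result latencies and multi-cycle instructions are not modeled, so the sketch does not reproduce the stall cycles of the data source or LDM examples.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical, hand-annotated instruction record (no assembly parsing)."""
    text: str
    kind: str                      # "ls", "mul", "branch", or "alu"
    dest: Optional[str] = None     # destination register, if any
    srcs: Tuple[str, ...] = ()     # source registers


def dual_issue_blocker(first: Instr, second: Instr) -> Optional[str]:
    """Return the Table 14.3 restriction that stops `second` from issuing
    in the same cycle as `first`, or None if the pair may dual issue.
    Assumption: `first` (older) is in pipeline 0, `second` in pipeline 1."""
    if first.kind == "ls" and second.kind == "ls":
        return "load/store resource hazard"      # only one LS pipeline
    if second.kind == "mul":
        return "multiply resource hazard"        # multiply only in pipeline 0
    if first.kind == "branch" and second.kind == "branch":
        return "branch resource hazard"          # one branch per cycle
    if first.dest is not None and first.dest == second.dest:
        return "data output hazard"              # same destination register
    if first.dest is not None and first.dest in second.srcs:
        return "data source hazard"              # conservative simplification
    return None


def assign_cycles(program: List[Instr]) -> List[int]:
    """Greedy in-order issue: pair each instruction with its successor
    whenever no restriction applies; otherwise issue it alone."""
    cycles: List[int] = []
    cycle = 0
    i = 0
    while i < len(program):
        cycle += 1
        cycles.append(cycle)
        if i + 1 < len(program) and dual_issue_blocker(program[i], program[i + 1]) is None:
            cycles.append(cycle)   # successor issues in the same cycle
            i += 2
        else:
            i += 1
    return cycles


# Example sequences copied from the first four rows of Table 14.3.
EXAMPLES = {
    "load/store": [
        Instr("LDR r5, [r6]", "ls", "r5", ("r6",)),
        Instr("STR r7, [r8]", "ls", None, ("r7", "r8")),
        Instr("MOV r9, r10", "alu", "r9", ("r10",)),
    ],
    "multiply": [
        Instr("ADD r1, r2, r3", "alu", "r1", ("r2", "r3")),
        Instr("MUL r4, r5, r6", "mul", "r4", ("r5", "r6")),
        Instr("MUL r7, r8, r9", "mul", "r7", ("r8", "r9")),
    ],
    "branch": [
        Instr("BX r1", "branch", None, ("r1",)),
        Instr("BEQ 0x1000", "branch"),
        Instr("ADD r1, r2, r3", "alu", "r1", ("r2", "r3")),
    ],
    "output": [
        Instr("MOVEQ r1, r2", "alu", "r1", ("r2",)),
        Instr("MOVNE r1, r3", "alu", "r1", ("r3",)),
        Instr("LDR r5, [r6]", "ls", "r5", ("r6",)),
    ],
}

if __name__ == "__main__":
    for name, program in EXAMPLES.items():
        print(name, assign_cycles(program))
    # load/store → [1, 2, 2]
    # multiply   → [1, 2, 3]
    # branch     → [1, 2, 2]
    # output     → [1, 2, 2]
```

For these four sequences the computed cycle numbers agree with the Cycle column of Table 14.3. Extending the model to the data source and multi-cycle rows would require per-instruction result and operand stage information, which the scheduling tables referenced in Table 14.3 provide.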

