Frequently Used Trace Cache
Marc Power, Eric Furbish, Indraneel Datta
Advanced Computer Architecture, under Dr. Scott Rixner

[efurbish@hoss simplesim-2.0]> wc -l trace_cache.c
666

Motivation
• Large trace caches suffer increased latency
  • Lowers performance below ideal
[Figure: Norm. IPC vs TC Size (1-cycle lat.). x-axis: Size (256 to 16384); y-axis: IPC Normalized to Max; series: ammp, test-fmath, test-llong, test-lswlr, test-math, test-printf]
• We’d like to minimize the performance lost to latency.
[Figure: Norm. IPC vs TC Size (variable lat.). x-axis: Size; y-axis: IPC Normalized to Max; series: ammp, test-math, mcf, test-fmath, test-llong, test-lswlr, test-printf]

Concept & Hypothesis
• To recover from the ill effects of latency in the trace cache, we need some low latency means of getting at the same data.
• Normally, solve by implementing L1/L2 cache hierarchy
• Contention in the L1 can evict frequently used lines with infrequently used ones
• Contention could possibly be reduced by intelligently filling the L1
• Frequently Used Trace Cache (FUTC):
  • Single-cycle latency, small L1
  • Judiciously filled from L2 with frequently used lines
• Given the new gizmo, we hoped:
  • IPC(TCs + FUTCi) > IPC(TCs + L1TCi) > IPC(TCs)

FUTC Implementation
• One saturating counter per trace cache line
  • Incremented on read hit
• Counters cleared on TC writes
• Rewrites not propagated into FUTC
  • Saves read port
  • Maintains logical meaning of counters (apply to a specific trace)
• Lines with counters over threshold are promoted
  • Promoted on a read hit
  • Counter is not reset

FUTC @ 10000 ft.
[Figure modified from E. Rotenberg, S. Bennett, J. Smith. Trace cache: a low latency approach to high bandwidth instruction fetching. Tech Report 1310, CS Dept., Univ. of Wisc.-Madison, 1996]

Area and Power Costs
• Size of FUTC (8-64KB is reasonable) in order to remain under single-cycle latency
• Counters consume 1-3KB for 1MB trace cache
  • Assumes threshold of 1-8 (1-3 bits)
• Power cost is minimal
  • No worse than L1 trace cache
  • Single-ported for speed and power

Methodology
1. Use “infinite” backend to stress fetch mechanism
2. Select ‘Optimal’ TC Size
  • Based on ideal (single-cycle) performance vs. performance with latency accounted for
  • Pick best candidate for improvement with FUTC
3. Run varying “approximate L1” sizes (8-64KB)
  • L1 = FUTC with threshold 0
4. Run varying FUTC sizes (8-64KB) with thresholds of 1, 2, 4 and 8
5. Compare TC/FUTC and L1/FUTC

Verification
• Branch predictor accuracies of 90-99%
  • Exactly the range we would expect
• Performance and trace cache hit rates very similar to trace cache paper
• Dispatch and commit streams are identical to unmodified sim-outorder
  • Program flow is guaranteed to be correct
• It has seen billions of instructions without flaws

Results
[Figure: IPC vs. FUTC size/Threshold. x-axis: Size/Threshold (8KB, 16KB, 32KB, 64KB, each at thresholds 0, 1, 2, 4, 8); y-axis: IPC; series: test-fmath, test-llong, test-lswlr, test-math, test-printf, ammp, vpr, mcf]

Results (cont.)
[Figure: FUTC Miss Rate vs FUTC Size/Threshold. x-axis: Size/Threshold (8KB, 16KB, 32KB, 64KB, each at thresholds 0, 1, 2, 4, 8); y-axis: Miss Rate; series: test-fmath, test-llong, test-lswlr, test-math, test-printf, ammp, vpr, mcf]

Best-Case FUTC Examined
[Figure: Best-Case FUTC vs L1 and TC. x-axis: ammp, mcf, test-printf, test-math; y-axis: Normalized IPC; series: TC Only, L1 TC, FUTC; size labels as extracted: 32KB, 16KB, 16KB, 16KB]

Future Work
• Run larger programs
  • Increased contention could make FUTC effective
• Use partial matching and inactive issue
  • Higher TC hit rate = more promotions = more contention

Conclusions
• Hypothesis was partly correct
  • Small auxiliary cache always helps over TC
  • Contention in L1 outweighed by FUTC “warmup”
• New Hypothesis:
  • IPC(TCs + L1 TCi) > IPC(TCs + FUTCi) > IPC(TCs)
  • Really: IPC(TCs + AUX-TCi) >
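The promotion policy from the FUTC Implementation slide (one saturating counter per trace cache line, incremented on a read hit, cleared on a TC write, with lines promoted once the counter is over the threshold) can be sketched in C roughly as follows. This is a minimal illustration, not the authors' trace_cache.c. The names (fetch_trace, tc_write, futc_threshold), the direct-mapped organization, the table sizes, the 4-bit counter width, and the choice to check the FUTC before the TC are all assumptions made for the example. Traces are reduced to a single integer tag.

```c
#include <stdbool.h>
#include <stdint.h>

#define TC_LINES    1024  /* assumed: number of lines in the large trace cache */
#define FUTC_LINES  64    /* assumed: number of lines in the small FUTC */
#define COUNTER_MAX 15    /* assumed: 4-bit saturating counter */

typedef struct {
    bool     valid;
    uint32_t tag;      /* hypothetical stand-in for the trace identity */
    uint8_t  counter;  /* read hits seen since this line was last written */
} tc_line_t;

typedef struct {
    bool     valid;
    uint32_t tag;
} futc_line_t;

typedef enum { FETCH_MISS, FETCH_HIT_TC, FETCH_HIT_FUTC } fetch_result_t;

static tc_line_t   tc[TC_LINES];
static futc_line_t futc[FUTC_LINES];
static unsigned    futc_threshold = 2;  /* threshold 0 behaves like a plain L1 */

/* Fill a TC line with a new trace. The counter is cleared so that it always
 * describes the trace currently stored in the line. The FUTC is deliberately
 * left untouched: rewrites are not propagated. */
void tc_write(uint32_t tag)
{
    tc_line_t *line = &tc[tag % TC_LINES];
    line->valid = true;
    line->tag = tag;
    line->counter = 0;
}

/* Look up a trace. A FUTC hit models the low-latency path; a TC hit bumps
 * the saturating counter and promotes the line once the counter is over the
 * threshold. The counter is not reset on promotion. */
fetch_result_t fetch_trace(uint32_t tag)
{
    futc_line_t *fast = &futc[tag % FUTC_LINES];
    if (fast->valid && fast->tag == tag)
        return FETCH_HIT_FUTC;

    tc_line_t *line = &tc[tag % TC_LINES];
    if (!line->valid || line->tag != tag)
        return FETCH_MISS;

    if (line->counter < COUNTER_MAX)
        line->counter++;                 /* saturate instead of wrapping */

    if (line->counter > futc_threshold) {
        fast->valid = true;              /* promote: copy the line into the FUTC */
        fast->tag = tag;
    }
    return FETCH_HIT_TC;
}
```

With futc_threshold set to 2 in this sketch, a freshly written trace is served from the TC three times and from the FUTC on the fourth fetch. With a threshold of 0 the first TC hit already promotes the line, which is how the methodology's "L1 = FUTC with threshold 0" configuration would fall out of the same code.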

