Instruction and Data Address Trace Compression Aleksandar Milenkovi collaborative work with Milena Milenkovi and Martin Burtscher Electrical and Computer Engineering Department The University of Alabama in Huntsville Email milenka ece uah edu Web http www ece uah edu milenka http www ece uah edu lacasa Outline Program Execution Traces Trace Compression Trace Compression in Hardware Stream caches and predictors for instruction address trace compression Data address stride caches for data address trace compression Results Conclusions 2 Program Execution Traces Streams of recorded events Basic block traces Address traces Instruction words Operands Trace uses Computer architects for evaluation of new architectures Computer analysts for workload characterization Software developers for program tuning optimization and debugging 3 Instruction and Data Address Traces An Example for i 0 i 100 i c i s a i b i sum sum c i Dinero Execution Trace Instruction Type Address Data Address 0x020001f4 mov r1 r12 lsl 2 2 0x020001f4 0x020001f8 ldr r2 r4 r1 0 0x020001f8 0xbfffbe24 0x020001fc ldr r3 r14 r1 0 0x020001fc 0xbfffbc94 0x02000200 mla r0 r2 r8 r3 2 0x02000200 0x02000204 add r12 r12 1 1 0 2 0x02000204 0x02000208 cmp r12 99 99 0 2 0x02000208 0x0200020c add r6 r6 r0 2 0x0200020c 0x02000210 str r0 r5 r1 1 0x02000210 0x02000214 ble 0x20001f4 2 0x02000214 0xbfffbb04 4 Trace Issues Trace issues Traces tend to be very large Capture Compression Processing In terabytes for a minute of program execution Expensive to store transfer and use Effective reduction techniques Lossless High compression ratio Fast decompression 5 Outline Program Execution Traces Trace Compression Trace Compression in Hardware Stream caches and predictors for instruction address trace compression Data address stride caches for data address trace compression Results Conclusions 6 Trace Compression General purpose compression algorithms Ziv Lempel gzip Burroughs Wheeler transformation bzip2 Sequitur Trace specific compression techniques Tuned to exploit redundancy in traces Better compression faster can be further combined with general purpose compression algorithms 7 Trace Specific Compression Techniques Lossless Compression Instructions Instructions data Link data addresses to dynamic basic block Offset Mache Samples 1989 LBTC Luo and John 2004 Replacing an execution sequence with its identifier Acyclic path WPP Larus 1999 Time Stamped WPP Zhang and Gupta 2001 Control flow graph trace of transitions QPT Larus 1993 N tuple Milenkovic Milenkovic and Kulick 2003 Pleszkun 1994 SBC Milenkovic and Milenkovic 2003 Offset repetitions PDATS Johnson Ha and Zaidi 2001 Link data addresses to loop Regenerate addresses Instruction PDI Johnson Ha and Zaidi 2001 Graph with number of repetitions in nodes Abstract execution Hamou Lhadj and Lethbridge 2002 Eggers et al 1990 Larus 1993 Elnozahy 1999 SIGMA DeRose et al 2002 Value Predictor VPC Burtscher and Jeeradit 2003 TCGEN Burtscher and Sam 2005 8 Outline Program Execution Traces Trace Compression Trace Compression in Hardware Stream caches and predictors for instruction address traces Data address stride caches for data address traces Results Conclusions 9 Why Trace Compression in Hardware Problem 1 Capture program traces In software trap after each instruction or taken branch E g IBM s Performance Inspector Slowdown 100 times Multiple cores on a single chip more detailed information needed e g time stamps of events Problem 2 debugging is far from fun Stop execution on breakpoints examine the state Time consuming difficult may miss a critical state leading to erroneous behavior Stopping the CPU may perturb the sequence of events making your bugs disappear Need an unobtrusive real time tracing mechanism 10 Trace Compression in Hardware Goals Small on chip area and small number of pins Real time compression never stall the processor Achieve a good compression ratio Solution A set of compression algorithms targeting on the fly compression of instruction and data address traces 11 Exploiting Stream and Strides Instruction address trace compression Limited number and strong temporal locality of instruction streams Replace an instruction stream with its identifier Data address trace compression Spatial and temporal locality of data addresses Recognize regular strides 12 Trace Compressor System Overview Processor Core System Under Test Task Switch Counter Processor Core Data Address Program Data Address Buffer PC DA Memory Stream Cache SC Trace Compressor Trace port External Trace Unit for Storing Processing PC or Intelligent Drive SCIT SCMT Predictor Byte rep FSM Data Address Stride Cache DASC DT DMT Byte rep FSM Trace Output Controller To External Unit 13 Outline Program Execution Traces Trace Compression Trace Compression in Hardware Stream caches and predictors for instruction address traces Data address stride caches for data address traces Results Conclusions 14 Stream Detector Stream Cache 0x020001f4 0x020001f8 0x02000214 PC Stream Cache SC PPC SA SL Instruction Stream S SA S L Buffer S SA S L 0x020001f4 0x09 00 0 iWay NWAY 1 1 0 0 4 reserved 1 F S SA S SL iWay 0x0E i SA iSet NSET 1 Hit Miss 0x00 it 0 SCIT Stream Cache Stream Cache 0x0E it 1 Index Trace Miss Trace 0x020001f4 0x09 SCMT SA SL 0x0E it 99 SA SA L S SA S L From Instruction Stream Buffer 15 SC Itrace Compression Compress instruction stream 1 Get the next instruction stream record from the instruction stream buffer S SA S SL 2 Lookup in the stream cache with iSet F S SA S SL 3 if hit 4 Emit iSet iWay to SCIT 5 else 6 Emit reserved value 0 to SCIT 7 Emit stream descriptor S SA S SL to SCMT 8 Select an entry iWay in the iSet set to be replaced 9 Update stream cache entry SC iSet iWay Valid 1 SC iSet iWay SA S SA SC iSet iWay SL S SL 10 Update stream cache replacement indicators Design Decisions Instruction Stream Buffer size Not to stall processor e g have consecutive very short instruction streams Stream cache Size Associativity Replacement policy Mapping function 16 SC Itrace Compression An Analytical Model Size Dinero I Size SCIT Size SCMT Size Dinero I N 4 Bytes N log N N Size SCIT 2 SET WAYS Bytes SL Dyn 8 N Size SCMT 1 SC Hit N SET NWAYS 5 Bytes SL Dyn 4 SL Dyn CR SC I 1 log 2 N SET NWAYS 5 1 SC Hit N SET NWAYS 8 CR SC I Lim CR SC I Lim SC Hit 1 SC Hit 1 N SET NWAYS 256 N SET NWAYS 128 N SET NWAYS 64 32 SL Dyn log 2 N SET NWAYS Lim CR SC I 4 SL Dyn Legend CR SC I compression ratio N number of instructions SL Dyn average stream
View Full Document
Unlocking...