CS6810, School of Computing, University of Utah

Quantitative Analysis

Today's topics:
• failure analysis
• performance analysis
• some basic quantitative principles
• caution - potholes: it's easy to lie with numbers

Some Issues So Far
• And it's only the 2nd class
• You'll note my preference: conceptual stuff in the lectures, practical stuff in the homeworks
  » give me feedback when this approach isn't good enough
• Text isn't in the bookstore - major screw-up
  » due to a late teaching-assignment change
  » order it online: it'll be faster and cheaper
• Homework #1 will be on the web later today - make sure you start early
  » holiday weekend ahead: maybe you'd like to enjoy it

Reliability
• Reliability is a key concern in some segments
  mission-critical embedded systems
  » e.g. nuclear power plants, automotive, aero & space, ...
  when high availability is needed
  » either due to monetary loss or contract (SLAs and SLOs)
• Weakest-link theory
• Useful acronyms (note these are averages, and "user mileage may vary")
  » MTTF: mean time to failure
  » MTTR: mean time to repair
  » MTBF (B = between) = MTTF + MTTR
  » availability = MTTF / MTBF
  » simple for a module; more complex for a larger system

Failure Mechanisms
• 2 types
  hard: permanent failure
  transient: temporary failure
  » due to environmental issues
    • alpha particles, heat, cross-talk, noise, vibration, ...
• Device-specific (a small set of examples)
  ICs
  » transistors can fail due to excess heat and current
    • extremely reliable in general
  » wires fail due to excess current: "metal migration"
  Disks (check out the recent Google paper on this)
  » MHDs: oxide deterioration, head saturation, coil-motor accuracy
  » SSDs: block-erase oxide thinning
  DRAMs (check out the recent Google paper on this too!)
» IC’s but alpha particles disrupt stored chargePage 2 5 CS6810 School of Computing University of Utah Improving Reliability • 2 strategies build more reliable devices » more costly & a very slippery slope use more of them redundancy • Redundancy shows up in lots of costumes extra bits – CRC & ECC codes » even more exotic: Turbo, Viterbi, etc. extra gates and wires » seldom used today redundant blocks » 2: compare and signal error if they don’t agree » some odd number: vote and take majority, flag anyway redundant everything » retry elsewhere if something fails hybrid » e.g. NAND Flash – ECC on block, quarantine block before things get nasty 6 CS6810 School of Computing University of Utah Performance • 2 aspects throughput: rate of completion of multiple jobs, processes, or threads single thread performance or execution time making one better usually degrades the other • Comparing: performance = 1/execution_time similar game for throughput comparisons 7 CS6810 School of Computing University of Utah Measuring Performance • Tricky in today’s multiprocessing world alias factors » elapsed time (stopwatch) is load dependent » context switch • process is swapped out part of the time it’s supposedly running » page faults • only fair if your workload is the only one running » I/O delays • processing may be dwarfed by slow I/O response time » OS overheads • fair if OS service is important part of your workload • unfair if service to other workloads are observed • Fortunately tools exist to help break out time into different bins » still some cruft gets swept under the rug 8 CS6810 School of Computing University of Utah Tools • Unix time command otb> time » 0.898u 0.311s 2:39.79 0.7% 0+0k 0+0io 9pf+02 meaning » u = seconds of user process execution time » s = seconds of system execution time (OS) » 2:39.79 minutes of elapsed time • includes page faults, I/O overhead, etc. (a.k.a. 
external overheads)
  » k = KB of text + data used
  » io = amount of I/O sent
  » pf = major plus minor page faults
    • major: the page was on disk
    • minor: TLB miss, but the page is in main memory (DRAM)
  Beware: OS "system time" is undervalued
  » call and return linkages are usually charged to user time
• Higher fidelity: use on-chip counters via a tool like Intel's VTune

Lots of Performance Analysis Tools
• The key is to learn what they're good at
  some are good at
  » tracking certain HW events: cache misses, TLB misses, IPC
  » coarse-grained phase changes
    • aggregate finer details into a larger "average"
• The point: use the right tool for the job
  seems obvious, but often users don't get it
• Some things are very hard
  each tool has a "probe effect"
  » often hard to determine the overhead
    • partially because it may be inconsistent

Evaluating Machines
• Which programs do you choose?
  real programs
  » ideal but problematic
    • you can't just read about them
    • it's a lot of work
    • what you care about may be diverse and change over time
  kernels
  » computationally intensive pieces of your programs
    • same problems as above, PLUS
      - you have to profile your code to find the right stuff
      - intuition about where the time goes is suspect
    • use existing kernels, e.g. Livermore Loops and Linpack
      - small loops over big data sets
      - good chance they don't represent your computational needs
      - not real programs anyway; they just stress the CPU
• What would you do? (without looking at the next slide!)

Benchmarks
• Industry-standard reporting mechanism
  burden
  » need to understand what the benchmark measures
    • int, float, cache, main memory, interconnect, ...
  » enormous diversity in today's benchmarks
• Common benchmark suites
  SPEC:
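The user/system/elapsed breakdown that the Unix time command reports (Tools slide) can also be collected from inside a program. A minimal sketch in Python, assuming a Unix system (the `measure` helper and the toy workload are illustrative, not from the slides):

```python
import time
import resource  # Unix-only: per-process CPU-time accounting

def measure(workload):
    """Run workload() and report user, system, and elapsed (wall) time,
    roughly the u / s / elapsed fields of the Unix time command."""
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    w0 = time.perf_counter()
    workload()
    w1 = time.perf_counter()
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "user":    r1.ru_utime - r0.ru_utime,  # CPU seconds in user mode
        "system":  r1.ru_stime - r0.ru_stime,  # CPU seconds in the OS
        "elapsed": w1 - w0,                    # wall clock: load dependent
    }

# CPU-bound toy workload; on a loaded machine, elapsed can far exceed
# user + system, which is exactly the "stopwatch is load dependent" point.
stats = measure(lambda: sum(i * i for i in range(1_000_000)))
```

Note that, as the slides warn, OS "system time" is undervalued here too: work the kernel does on the process's behalf in interrupt context is not charged to either bucket.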
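The reliability arithmetic from the Reliability slide (MTBF = MTTF + MTTR, availability = MTTF/MTBF) can be sketched in a few lines; the module numbers below are made up for illustration:

```python
def availability(mttf_hours, mttr_hours):
    """availability = MTTF / MTBF, where MTBF = MTTF + MTTR."""
    mtbf = mttf_hours + mttr_hours
    return mttf_hours / mtbf

# Hypothetical module: fails on average every 10,000 hours and
# takes 10 hours to repair.
a = availability(10_000, 10)
print(f"{a:.6f}")  # 10000 / 10010 -> 0.999001
```

Note these are averages over a fleet ("user mileage may vary"), and that composing module availabilities into system availability is the harder problem the slide flags.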
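The "some odd number: vote and take the majority, flag anyway" form of block redundancy (Improving Reliability slide) can be sketched as a triple-modular-redundancy voter; the replica functions here are hypothetical stand-ins for redundant hardware blocks:

```python
from collections import Counter

def tmr_vote(replicas, *args):
    """Run an odd number of redundant replicas, return the majority
    answer, and flag a fault whenever they are not unanimous."""
    results = [replica(*args) for replica in replicas]
    winner, votes = Counter(results).most_common(1)[0]
    fault_detected = votes < len(results)  # "flag anyway", per the slide
    return winner, fault_detected

# Two healthy replicas and one hit by a (hypothetical) transient bit flip.
good = lambda x: x + 1
flipped = lambda x: (x + 1) ^ 0x4  # single-bit upset in the result
value, fault = tmr_vote([good, good, flipped], 41)
# Majority voting masks the transient error but still reports it.
```

The two-copy variant on the same slide can only detect a disagreement, not mask it; that is why masking schemes use an odd replica count.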