The following paper was originally published in the Proceedings of the USENIX 1996 Annual Technical Conference, San Diego, California, January 1996.

For more information about the USENIX Association contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: [email protected]
4. WWW URL: http://www.usenix.org

lmbench: Portable Tools for Performance Analysis

Larry McVoy, Silicon Graphics, Inc.
Carl Staelin, Hewlett-Packard Laboratories

Abstract

lmbench is a micro-benchmark suite designed to focus attention on the basic building blocks of many common system applications, such as databases, simulations, software development, and networking. In almost all cases, the individual tests are the result of analysis and isolation of a customer's actual performance problem. These tools can be, and currently are, used to compare different system implementations from different vendors. In several cases, the benchmarks have uncovered previously unknown bugs and design flaws. The results have shown a strong correlation between memory system performance and overall performance. lmbench includes an extensible database of results from systems current as of late 1995.

1. Introduction

lmbench provides a suite of benchmarks that attempt to measure the most commonly found performance bottlenecks in a wide range of system applications. These bottlenecks have been identified, isolated, and reproduced in a set of small micro-benchmarks, which measure system latency and bandwidth of data movement among the processor and memory, network, file system, and disk. The intent is to produce numbers that real applications can reproduce, rather than the frequently quoted and somewhat less reproducible marketing performance numbers.

The benchmarks focus on latency and bandwidth because performance issues are usually caused by latency problems, bandwidth problems, or some combination of the two.
Each benchmark exists because it captures some unique performance problem present in one or more important applications. For example, the TCP latency benchmark is an accurate predictor of the Oracle distributed lock manager's performance, the memory latency benchmark gives a strong indication of Verilog simulation performance, and the file system latency benchmark models a critical path in software development.

lmbench was developed to identify and evaluate system performance bottlenecks present in many machines in 1993-1995. It is entirely possible that computer architectures will have changed and advanced enough in the next few years to render parts of this benchmark suite obsolete or irrelevant.

lmbench is already in widespread use at many sites by both end users and system designers. In some cases, lmbench has provided the data necessary to discover and correct critical performance problems that might have gone unnoticed. lmbench uncovered a problem in Sun's memory management software that made all pages map to the same location in the cache, effectively turning a 512 kilobyte (K) cache into a 4K cache.

lmbench measures only a system's ability to transfer data between processor, cache, memory, network, and disk. It does not measure other parts of the system, such as the graphics subsystem, nor is it a MIPS, MFLOPS, throughput, saturation, stress, graphics, or multiprocessor test suite. It is frequently run on multiprocessor (MP) systems to compare their performance against uniprocessor systems, but it does not take advantage of any multiprocessor features.

The benchmarks are written using standard, portable system interfaces and facilities commonly used by applications, so lmbench is portable and comparable over a wide set of Unix systems. lmbench has been run on AIX, BSDI, HP-UX, IRIX, Linux, FreeBSD, NetBSD, OSF/1, Solaris, and SunOS.
Part of the suite has been run on Windows/NT as well.

lmbench is freely distributed under the Free Software Foundation's General Public License [Stallman89], with the additional restriction that results may be reported only if the benchmarks are unmodified.

2. Prior work

Benchmarking and performance analysis is not a new endeavor. There are too many other benchmark suites to list all of them here. We compare lmbench to a set of similar benchmarks.

• I/O (disk) benchmarks: IOstone [Park90] wants to be an I/O benchmark, but actually measures the memory subsystem; all of the tests fit easily in the cache. IObench [Wolman89] is a systematic file system and disk benchmark, but it is complicated and unwieldy. In [McVoy91] we reviewed many I/O benchmarks and found them all lacking because they took too long to run and were too complex a solution to a fairly simple problem. We wrote a small, simple I/O benchmark, lmdd, that measures sequential and random I/O far faster than either IOstone or IObench. As part of [McVoy91] the results from lmdd were checked against IObench (as well as some other Sun internal I/O benchmarks). lmdd proved to be more accurate than any of the other benchmarks. At least one disk vendor routinely uses lmdd to do performance testing of its disk drives.

Chen and Patterson [Chen93, Chen94] measure I/O performance under a variety of workloads that are automatically varied to test the range of the system's performance. Our efforts differ in that we are more interested in the CPU overhead of a single request, rather than the capacity of the system as a whole.

• Berkeley Software Distribution's microbench suite: The BSD effort generated an extensive set of test benchmarks to do regression testing (both quality and performance) of the BSD releases.
We did not use this as a basis for our work (although we used ideas) for the following reasons: (a) missing tests, such as memory latency; (b) too many tests, so the results tended to be obscured under a mountain of numbers; and (c) the wrong copyright, since we wanted the Free Software Foundation's General Public License.

• Ousterhout's Operating System benchmark: [Ousterhout90] proposes several system benchmarks to measure system call latency, context switch time, and file system performance. We used the same ideas as a basis for our work, while trying to go farther. We measured a more complete set of primitives, including some hardware measurements; went into greater depth on some of the tests, such as context switching; and went to great lengths to make the benchmark portable and extensible.

• Networking benchmarks: Netperf measures networking bandwidth and latency and was written