UCI P 140C - Object Duplication for Improving Reliability


Object Duplication for Improving Reliability

G. Chen, G. Chen, M. Kandemir, N. Vijaykrishnan, M. J. Irwin
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802, USA
e-mail: {guilchen,gchen,kandemir,vijay,mji}@cse.psu.edu

Abstract: Soft errors are becoming a common problem in current systems due to the scaling of technology, which results in the use of smaller devices, lower voltages, and power-saving techniques. In this work, we focus on soft errors that can occur in the objects created in heap memory, and investigate techniques for enhancing immunity to soft errors through various object duplication schemes. The idea is to access the duplicate object when the checksum associated with the primary object indicates an error. We implemented several duplication-based schemes and conducted extensive experiments. Our results clearly show that this spectrum of schemes enables us to balance the tradeoff between error rate and heap space consumption.

I. INTRODUCTION

Soft errors are a major reliability concern as embedded memories grow in size. Soft errors occur when a memory bit flips its value due to external radiation effects, thus corrupting the stored data. The need for reliable memory has become even more acute due to the use of power-saving techniques such as voltage scaling in current embedded systems.

A common approach to handling these memory errors is to use error detection and correction hardware [14, 13]. However, embedded systems are usually sold in huge quantities and thus tend to be more sensitive to per-device cost than their high-performance counterparts. Consequently, a hardware approach, which increases the overall cost of the system, may not be attractive for low-cost embedded systems. Further, an embedded system may run a set of applications, and not all of them may require fault tolerance.
Employing expensive hardware for just the few applications that need fault tolerance may not be the best economic option. In comparison, a software scheme can take application-specific requirements into account and tune its policy to the limited resources of the embedded device.

The focus of this work is on handling soft errors in object-oriented frameworks. We select an embedded Java Virtual Machine (JVM) as our target object-oriented environment, and inject errors into the heap memory that stores the objects in order to investigate techniques for enhancing immunity to soft errors through various object duplication schemes. While much work has been done on reliable computation at the circuit, architectural, operating system, and application levels [1, 2, 10, 11, 15, 17, 18], our work focuses explicitly on the integrity of objects, and is complementary to model checking and verification based work [7, 12].

The rest of this paper is organized as follows. Section II discusses our error injection model. Section III presents our object duplication schemes, including full duplication, compression-based duplication, and selective duplication. Section IV presents an experimental evaluation of these schemes. Section V concludes the paper with a summary of our major observations.

II. THE ERROR INJECTION MODEL

We use Sun’s KVM [5] to implement the object duplication-based error protection techniques proposed in this work. KVM is a compact, portable Java Virtual Machine specifically designed for small, resource-constrained devices. KVM uses a handle-free mark-sweep-compact garbage collector. An error management module is added to KVM to store the error information for each object. For every bytecode executed, KVM invokes our error injection function to inject errors into the object instances in the heap. The error injection function scans the heap; every bit in the object instances has a fixed probability of incurring an error.
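As a rough illustration of this injection model, the Python sketch below performs one heap scan per call, giving every bit of every object the same independent error probability, and records the corrupted bit positions in a separate table rather than altering the data, in the spirit of the error management module. The names `HeapObject`, `ErrorManager`, `inject_errors`, and `has_error` are hypothetical; this is not KVM's actual C implementation, and the `has_error` lookup merely models the check performed when an object is accessed.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HeapObject:
    """A heap-resident object instance (hypothetical); only its size matters here."""
    obj_id: int
    size_bits: int


@dataclass
class ErrorManager:
    """Hypothetical error-management module: remembers which bits of which
    objects are corrupted instead of flipping the object data itself."""
    errors: Dict[int, Set[int]] = field(default_factory=dict)

    def record(self, obj_id: int, bit: int) -> None:
        self.errors.setdefault(obj_id, set()).add(bit)

    def has_error(self, obj_id: int, first_bit: int, nbits: int) -> bool:
        """True if any bit in [first_bit, first_bit + nbits) of the object is marked bad."""
        bad = self.errors.get(obj_id, set())
        return any(first_bit <= b < first_bit + nbits for b in bad)


def inject_errors(heap, manager: ErrorManager, p: float, rng: random.Random) -> int:
    """One heap scan: each bit of each object independently incurs an error
    with probability p. Returns the number of errors injected by this scan."""
    injected = 0
    for obj in heap:
        for bit in range(obj.size_bits):
            if rng.random() < p:
                manager.record(obj.obj_id, bit)
                injected += 1
    return injected


if __name__ == "__main__":
    rng = random.Random(1)
    heap = [HeapObject(i, 64 * 8) for i in range(10)]  # ten hypothetical 64-byte objects
    mgr = ErrorManager()
    # One scan per executed bytecode, for 100 bytecodes. The rate is exaggerated
    # far above the paper's default of 10^-10 so that some errors actually appear.
    total = sum(inject_errors(heap, mgr, 1e-4, rng) for _ in range(100))
    print("errors injected:", total)
    print("object 0, first 32 bits corrupted?", mgr.has_error(0, 0, 32))
```

Testing every bit is the simplest faithful reading of "every bit has a fixed probability"; a faster injector could instead sample the gap to the next error from a geometric distribution, which yields the same per-bit statistics.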
When an object is accessed, we check the error management module to determine whether the accessed part contains any error(s). The default error injection probability for our base experiments is 10^-10 per bit. While we perform experiments with different error injection rates, the rates used in our experiments are generally higher than those seen with current technology. The main reason is that errors become likely only when an application executes for long durations or is executed repeatedly, so we need an accelerated testing environment. Accelerated testing is meaningful because many embedded Java applications need to operate without errors for long durations, ranging from several hours (e.g., cell phones) to months (e.g., sensors).

III. DUPLICATION SCHEMES

A. Motivation for Object Duplication

In this work, unless stated otherwise, we assume that each object is protected using a “checksum-based scheme” (called CHK). In this scheme, each object has a single checksum attached to it. The checksum calculations are performed in a similar fashion to that in [3]. Specifically, each object header is extended with one additional word to store the precomputed checksum. This checksum is checked upon a read request to a field, and updated upon a write request.

The Java applications used in this study are given in Table I.
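The CHK protection described above, combined with the abstract's idea of accessing a duplicate object when the primary's checksum indicates an error, can be sketched as follows. Two assumptions are made purely for illustration: the checksum is a plain XOR of the object's 32-bit field words (the paper instead follows [3], whose exact function is not reproduced here), and the duplicate carries its own checksum. `CheckedObject` and `DuplicatedObject` are hypothetical names, not part of KVM.

```python
from typing import List, Optional

MASK32 = 0xFFFFFFFF


def xor_checksum(words: List[int]) -> int:
    """Assumed checksum: XOR of all 32-bit field words (a stand-in for the scheme in [3])."""
    c = 0
    for w in words:
        c ^= w & MASK32
    return c


class CheckedObject:
    """Object whose header is extended by one word holding a precomputed checksum."""

    def __init__(self, fields: List[int]):
        self.fields = [f & MASK32 for f in fields]
        self.checksum = xor_checksum(self.fields)

    def is_intact(self) -> bool:
        return xor_checksum(self.fields) == self.checksum

    def write(self, index: int, value: int) -> None:
        # Incremental update: XOR out the old value, XOR in the new one.
        value &= MASK32
        self.checksum ^= self.fields[index] ^ value
        self.fields[index] = value


class DuplicatedObject:
    """Hypothetical pairing of a primary object with an optional full duplicate."""

    def __init__(self, fields: List[int], duplicate: bool = True):
        self.primary = CheckedObject(fields)
        self.copy: Optional[CheckedObject] = CheckedObject(fields) if duplicate else None

    def read(self, index: int) -> int:
        # CHK: the checksum is verified on every field read.
        if self.primary.is_intact():
            return self.primary.fields[index]
        # Checksum mismatch: access the duplicate instead, if one exists and is intact.
        if self.copy is not None and self.copy.is_intact():
            return self.copy.fields[index]
        raise RuntimeError("error detected and no intact copy available")

    def write(self, index: int, value: int) -> None:
        # Writes go to both copies so the duplicate stays consistent with the primary.
        self.primary.write(index, value)
        if self.copy is not None:
            self.copy.write(index, value)
```

The write path updates the checksum incrementally rather than recomputing it from the current field contents: recomputing would silently absorb an error already sitting in another field, whereas the incremental form keeps that mismatch visible to the next read. With `duplicate=False` the object behaves like plain CHK, which can detect but not recover from the error.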
Calc, firstaid, jpeg, and mvideo are taken from the http://www.microjava.com site, and auction, image, manyballs, and pushpuzzle come with the MIDP 1.0.3 reference implementation [8].

TABLE I
THE JAVA BENCHMARK CODES AND THEIR CHARACTERISTICS.

Benchmark   Description      Execution Cycles   Errors Injected/Consumed/Detected
                             (millions)
auction     ticket auction   467.4              75 / 27 / 25
calc        calculator       338.5              71 / 14 / 14
firstaid    firstaid info    618.5              423 / 127 / 126
jpeg        jpeg viewer      1,052.9            1314 / 329 / 313
image       photo album      1,157.1            430 / 302 / 271
manyballs   bouncing balls   475.2              53 / 13 / 11
mvideo      video player     2,732.6            63 / 44 / 40
pushpuzzle  puzzle game      479.7              72 / 14 / 13

[Fig. 1. Average heap occupancy of our applications. Bar chart; y-axis: average heap occupancy (0% to 100%); one bar each for auction, calc, firstaid, image, jpeg, manyballs, mvideo, and pushpuzzle.]

The third column gives the execution cycles (in millions) for the “base execution.” In this base execution (denoted BASE in the rest of the paper), the objects are not protected. The last column shows statistics on the behavior of the CHK scheme. It gives the total number of errors injected into the memory, the number of errors that have been consumed by the application, and the number of errors detected by

