Area Efficient Architectures for Information Integrity in Cache Memories

Seongwoo Kim and Arun K. Somani
Department of Electrical and Computer Engineering
Iowa State University
Ames, IA 50011, USA
{skim, [email protected]}

Abstract

Information integrity in cache memories is a fundamental requirement for dependable computing. Conventional architectures for enhancing cache reliability using check codes make it difficult to trade off the level of data integrity against the chip area requirement. We focus on transient fault tolerance in primary cache memories and develop new architectural solutions that maximize fault coverage when the budgeted silicon area is not sufficient for the conventional configuration of an error checking code. The underlying idea is to exploit the corollary of reference locality in the organization and management of the code: a higher protection priority is dynamically assigned to the portions of the cache that are more error-prone and have a higher probability of access. The error-prone likelihood prediction is based on access frequency. We evaluate the effectiveness of the proposed schemes using trace-driven simulation combined with software error injection under four different fault manifestation models. The simulation results show that, for most benchmarks, the proposed architectures are effective and area efficient in increasing cache integrity under all four models.

1. Introduction

The memory hierarchy is one of the most important elements of modern computer systems, and its reliability significantly affects overall system dependability. The purposes of integrating an error checking scheme into the memory system are to prevent any error that occurs in the memory from propagating to other components and to overcome the effects of errors locally, contributing to the overall goal of failure-free computation.

Transient faults, which occur more often than permanent faults [6], [14], can corrupt information in the memory, i.e., cause instruction and data errors. These soft errors may result in erroneous computation. In particular, errors in the cache memory, which is the data storage closest to the CPU, can easily propagate into the processor registers and other memory elements and eventually cause computation failures. Although cache memory quality has improved tremendously with advances in VLSI technology, transient faults cannot be avoided completely. As a result, data integrity checking, i.e., detecting and correcting soft errors, is frequently used in cache memories.

The primary technique for ensuring data integrity is the addition of information redundancy to the original data. Whenever a data item is written into the cache, a check (or protection) code such as parity or an error-correcting code (ECC) is stored with it. We refer to a pair of data and its check code as a parity group. When an item is requested, the corresponding parity group is read and an error syndrome is generated to detect, and if possible correct, any error. The capability of the protection code needs to be chosen according to the required degree of data integrity, the expected error rate given the harshness of the operating environment, and the design and test cost.
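To make the parity group mechanism concrete, the C sketch below stores one even-parity bit with each 8-bit data item and regenerates the syndrome on every read, in the spirit of the byte-parity scheme discussed next. It is only a minimal software illustration; the names parity_group_t, cache_store, and cache_load are ours, not the paper's.

  #include <stdint.h>
  #include <stdbool.h>

  /* Even parity over one 8-bit item: XOR-fold all bits into the LSB. */
  static uint8_t byte_parity(uint8_t d) {
      d ^= d >> 4;
      d ^= d >> 2;
      d ^= d >> 1;
      return d & 1u;
  }

  /* A "parity group": a data byte stored together with its check bit
     (one parity bit per 8-bit data item). */
  typedef struct {
      uint8_t data;
      uint8_t check;
  } parity_group_t;

  /* On a cache write, the check code is generated and stored with the data. */
  static void cache_store(parity_group_t *g, uint8_t value) {
      g->data  = value;
      g->check = byte_parity(value);
  }

  /* On a cache read, the syndrome is regenerated; a nonzero syndrome signals
     a single-bit error, which byte parity can only detect, not correct. */
  static bool cache_load(const parity_group_t *g, uint8_t *out) {
      *out = g->data;
      return (byte_parity(g->data) ^ g->check) == 0;   /* false => error */
  }

A SEC-DED code would widen the check field to a Hamming-style syndrome that can also locate, and therefore correct, a single flipped bit, at the cost of several check bits per protected word.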
Although the exact rate and behavior of transient faults in a system cannot be predicted, current data integrity checking schemes for caches are generally selected on the basis of a single-bit failure model. Thus, the byte-parity scheme (one parity bit per 8-bit data item) [15] and single-error-correcting, double-error-detecting (SEC-DED) codes [5] are widespread. Many higher-capability codes for byte or burst error control have also been studied [3], [8].

Check codes employed to increase cache reliability are constructed in a uniform structure, i.e., every unit of data is protected by a check code of the chosen capability. This conventional method is reasonable under the assumption that each cache item has the same probability of error occurrence. However, it has the following deficiencies:

- A check code in the uniform structure is an expensive way to enhance cache reliability; it is overkill under extremely low error rates.
- It is not flexible in terms of chip area requirement, as the area occupied by the check code is directly proportional to the cache size. If the budgeted area is not sufficient for the uniform structure, no intermediate architectures are currently available, and the high overhead may force integrity checking to be sacrificed.

The uniform structure enables every item to be checked, yet error checking is necessary only for those items that are likely to be corrupted. If such cache items can be predicted, higher data integrity can be achieved with a smaller amount of chip area devoted to the check code. In practice, there are several reasons why soft errors tend to concentrate in a few locations. Information in the cache can be altered during read/write operations due to low noise margins, so cache lines that are frequently accessed may have a higher probability of corruption. Cross-coupling effects may also induce errors in locations neighboring a line being accessed. Global random disturbances, on the other hand, affect any location equally. More importantly, errors in unused lines are of no concern.

In this paper, we take these factors into account to develop area efficient architectural solutions for improving cache integrity. The underlying idea is that the more error-prone and more frequently used cache lines must be protected first. Random faults are not biased toward a specific location or time; however, if a fault occurs during the access of a line, it is more likely to affect the data being accessed. As a result, access frequency makes a difference in the probability of error occurrence between active (frequently accessed) and inactive (rarely accessed) lines. With large caches, the majority of accesses are usually localized in a small portion of the cache, and this frequently accessed part is considered more error-prone. Corrupted items in the most frequently used (MFU) lines are likely to be consumed as instructions or operands, quickly affecting the computation, whereas errors in inactive lines have a higher probability of being replaced or overwritten with new, correct data [13]. Data errors are harmful only if they are used in an operation, suggesting that not providing check codes for inactive lines may not affect the integrity of the computation.
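The access-frequency-driven protection outlined above is described here only at the architectural level; the following C sketch shows one possible bookkeeping policy, in which a fixed pool of check-code entries, smaller than the number of cache lines, is reassigned to the lines with the highest access counts. The sizes, names (NUM_CODES, code_owner, on_access), and the reassignment rule are illustrative assumptions, not the mechanism the paper proposes.

  #include <stdint.h>

  #define NUM_LINES 256        /* cache lines (illustrative size)           */
  #define NUM_CODES  64        /* budgeted check-code entries, < NUM_LINES  */

  /* Per-line access counters serve as the error-prone likelihood predictor. */
  static uint32_t access_count[NUM_LINES];
  static int      code_owner[NUM_CODES];   /* protected line index, -1 if free */

  static void init_protection(void) {
      for (int i = 0; i < NUM_LINES; i++) access_count[i] = 0;
      for (int i = 0; i < NUM_CODES; i++) code_owner[i]   = -1;
  }

  /* Called on every cache access: bump the line's counter and, if the line is
     unprotected, let it take over the code entry guarding the coldest line. */
  static void on_access(int line) {
      access_count[line]++;

      int      victim = -1;
      uint32_t victim_count = UINT32_MAX;
      for (int i = 0; i < NUM_CODES; i++) {
          if (code_owner[i] == line)
              return;                             /* already protected */
          uint32_t c = (code_owner[i] < 0) ? 0 : access_count[code_owner[i]];
          if (c < victim_count) {
              victim = i;
              victim_count = c;
          }
      }
      /* Reassign only if this line is now hotter than the coldest protected
         line, so the check codes follow the most frequently used lines.     */
      if (victim >= 0 && access_count[line] > victim_count) {
          code_owner[victim] = line;
          /* a real design would regenerate the check code for `line` here */
      }
  }

In hardware, the counters would presumably be small saturating counters updated alongside the tag array rather than full 32-bit counts; the point of the sketch is only that a limited check-code budget can be made to track the MFU lines.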
We present
