CS 838 Final Report
Eric Hill and Pranay Koka

Introduction

For our CS 838 course project, we chose to look at availability solutions in CMPs. More specifically, we investigated portions of two recently proposed availability schemes, SafetyNet and ReVive. The goal of this project was to characterize how performance would be affected if either of these two schemes were implemented. This characterization was done by contrasting the storage overheads created by SafetyNet with the bandwidth constraints imposed by ReVive. We focus primarily on the memory hierarchy in a CMP because it is the source of the majority of the overheads associated with availability.

Both of the schemes we explore in this project use backward error recovery. Backward error recovery (BER) schemes periodically checkpoint system state and roll back to a previously validated checkpoint upon fault detection [1]. This is in contrast to forward error recovery schemes, which rely on redundant hardware to detect and correct errors without rolling back. The experiments conducted in this study all focus on how checkpoint storage affects both levels of cache in a CMP.

Motivation

Our main motivation for this project is that we view RAS (Reliability, Availability, and Serviceability) as an important feature of business-class servers. For a large company, server downtime is equivalent to lost money, so the ability to recover from hardware faults is a valuable feature for these servers. We believe that future business-class servers will be built from CMPs, so we thought it would be worthwhile to study previously proposed SMP availability schemes in the context of a CMP.

BER Design Parameters

Checkpoint Consistency

Checkpoint consistency refers to when the different components in an MP system decide to checkpoint their state. The simplest form of consistency is global consistency, in which every node in the system checkpoints its state at the same point in physical time. In a system that uses global consistency, all components must synchronize before checkpointing their state. Synchronizing in physical time requires distributing a global skewless clock throughout the system, which is a relatively difficult thing to do in large systems. Also, after synchronization, all outstanding transactions must complete before the global checkpoint is taken [2]. These implementation details can potentially limit the performance of a system using this type of consistency.

A looser form of checkpoint consistency is coordinated local consistency. Components in systems using this form of consistency all checkpoint their state at the same point in logical time. A global logical clock must be distributed throughout such systems. System events such as coherence transactions can be used as the base of logical time, or a logical time base can be derived from a physical clock [3]. The latter option is used in our experiments.

The most unorganized form of checkpoint consistency is uncoordinated local consistency. Components in this type of system checkpoint their state independently of all other components. These systems have the advantage of simplicity, but they suffer from the problem of cascading rollbacks.

Recording of Checkpoint State

There are several different ways that system state can be checkpointed. The simplest method is flushing of data. The obvious drawback of flushing is that it consumes bandwidth that could have been allocated to useful work. Another alternative is incremental logging: instead of flushing all checkpointed state, changes to the cache state can be logged as they occur. Logging allows the checkpointing overhead to be spread out over the entire checkpoint interval.
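To make the logging alternative concrete, below is a minimal C++ sketch of incremental logging in the style of SafetyNet's checkpoint log buffers: before the first write to a cache block within a checkpoint interval, the block's old value is copied into a log, so the overhead is spread over the interval and the log grows with the number of unique blocks written rather than with the total number of stores. This sketch is illustrative only and is not code from the simulator; the class and function names (CheckpointLog, OnWrite) and the 64-byte block size are our own assumptions, while the 100,000-cycle interval matches the simulation parameters in Table 1.

#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Checkpoint interval taken from Table 1; the block size is an assumed value.
constexpr std::uint64_t kCheckpointIntervalCycles = 100000;
constexpr std::size_t   kBlockBytes = 64;

using BlockData = std::array<std::uint8_t, kBlockBytes>;

struct LogEntry {
    std::uint64_t address;   // block address whose pre-checkpoint value is saved
    BlockData     oldValue;  // copy taken before the first write in the interval
};

class CheckpointLog {
public:
    // Coordinated local consistency with a logical time base derived from a
    // physical clock: the checkpoint number is simply cycle / interval.
    static std::uint64_t CheckpointNumber(std::uint64_t cycle) {
        return cycle / kCheckpointIntervalCycles;
    }

    // Called before a store commits. Only the first write to a block in a
    // given checkpoint interval logs the old value, so log growth tracks the
    // number of unique blocks written, not the total number of stores.
    void OnWrite(std::uint64_t cycle, std::uint64_t blockAddr,
                 const BlockData& oldValue) {
        const std::uint64_t ckpt = CheckpointNumber(cycle);
        auto it = lastLogged_.find(blockAddr);
        if (it != lastLogged_.end() && it->second == ckpt) {
            return;  // this block's pre-checkpoint value is already logged
        }
        lastLogged_[blockAddr] = ckpt;
        log_.push_back({blockAddr, oldValue});
    }

    // Once a checkpoint is validated, its entries can be freed; on a detected
    // fault, entries are applied in reverse order to restore the old state.
    std::size_t entriesLogged() const { return log_.size(); }

private:
    std::vector<LogEntry> log_;  // the CLB contents for the current interval
    std::unordered_map<std::uint64_t, std::uint64_t> lastLogged_;  // per-block last logged interval
};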
Location of Checkpointed State

For our purposes, there are only two options for where checkpointed state can reside: on chip or off chip. The advantage of keeping checkpointed state on chip is that there is no overhead for accessing the state during checkpointing or recovery. The primary drawback is that the checkpointed state consumes on-chip area that could have been allocated to other resources. Conversely, there is no on-chip storage overhead when checkpointed state is stored off chip, but there is overhead for accessing the state during checkpointing and recovery.

SafetyNet

The first availability scheme we looked at, SafetyNet, was originally proposed by Sorin et al. [3]. This scheme uses coordinated local checkpointing, incremental logging of data, and stores checkpointed state on chip in checkpoint log buffers (CLBs). In order to use coordinated local checkpointing, SafetyNet distributes throughout the system a global logical clock derived from a physical clock. This distributed clock has a small skew. Sorin et al. [3] argue that this skew is not a problem as long as it is smaller than the minimum transfer time through the interconnection network: as long as the skew is less than the minimum transfer latency, the situation where a message leaves one node at logical time n and arrives at another node at logical time n - 1 is guaranteed not to occur [3].

ReVive

The second availability scheme we looked at, ReVive, was originally proposed by Prvulovic et al. [2]. This scheme uses global checkpointing, flushing of data, and stores checkpointed L2 cache state off chip in memory. We initially thought that ReVive would be a good fit for a single-chip CMP system, since skewless clock distribution and synchronization should be less expensive in a CMP than in a multi-chip SMP.

Simulator

A Simics-based CMP simulator provided by the Multifacet group was used to run experiments measuring storage and bandwidth. The means by which I/O statistics were collected is described later in this report. Table 1 below summarizes our simulation parameters.

    L1 cache size         64 KB
    L2 cache size         16 MB, 4-way set associative
    Processor model       4-way out-of-order superscalar
    Checkpoint interval   100,000 cycles
    Simulation length     25 transactions

    Table 1. Simulation parameters.

Experiments

In order to collect data for this project, we added various event counters to the ruby simulator. A detailed description of the counters we used and their significance is included in the next section. We also modified the opal simulator to stop every checkpoint interval (which we set to 100,000 cycles) and return information about the counters placed in ruby. In order to estimate the number of cache blocks that need to be flushed for the ReVive

