A Software Layer for Disk Fault Injection
Jake Adriaens, Dan Gibson
CS 736, Spring 2005
Instructor: Remzi Arpaci-Dusseau

Outline
  1. Introduction (Motivation, Challenges)
  2. Related Work
  3. Implementation Details (IDE Driver)
  4. Fault Model
  5. Methods, Evaluation
  6. Summary

Overview (1)
  - Software system for modeling IDE disk faults in an x86 Linux-based computer
  - Modification to the IDE driver for read/write event interception

Overview (2)
  - Disk faults are described at a high level
  - Faults are passed to a kernel-level module
  - On a read/write event, the IDE driver calls the kernel module to perform request modification
    - Before a write event, the module may modify the data to be written
    - After a read event, the module may modify the data read from disk

Motivation
  - Why purposely cause disk failures?
  - Commodity HW and SW fails, usually at unexpected times
  - Causing failures at expected times can help improve fault-tolerance measures
  - Can be used to determine the fault tolerance of systems
  - Various flavors of RAID need fault injection

Motivation
  - Faults can happen at the worst time
    - In the middle of a PowerPoint presentation

Challenges
  - Drivers are typically written with reliability in mind
    - May have error detection/correction measures
    - Should these be removed? Fooled? Applauded?
  - Low-level drivers critically affect performance and stability of the system
    - Disk faults need not be stable, but shouldn't have unusual side effects

Challenges
  - Failure models are difficult to justify
    - Disk manufacturers don't offer details on how/why their disks fail
    - The failstop model is widely used; it models complete, detected disk failure
    - Other models must be chosen generally, to account for many different disks, controllers, etc.

Related Work: Software Fault Injection
  - Huang et al. (and many others) use software fault injection for modifying cached web pages (ACM Proc. WWW)
  - Jarboui et al. inject software faults into the Linux kernel and observe system behavior
  - Nagaraja et al. inject faults into cluster-based systems

Related Work: Disk Faults (Modeling, Detection)
  - Kaaniche et al. inject disk faults to study RAID behavior
  - Kari et al. present fault detection and diagnosis techniques (separate studies)
  - Various other RAID and/or FS papers use some form of fault injection to model failures

Related Work: Hardware Fault Injection

Implementation: Core Components
  - User-level parser
  - In-kernel injection module
  - In-driver upcalls
  - System calls
  - Added 20 lines to the IDE driver code
  - Kernel module is demand-loaded, 250 lines in size
  - 2 system calls, inject_fault and getdrivesize: 120 lines

Implementation: User-level Console
  - Used for fault definition
    - Console interface for fault definition
    - Processes batch files
  - Checks faults for validity
    - Sector ranges, probability, etc. (more later)
  - Passes faults to the kernel module

Implementation: IDE Driver Modification
  - Added upcalls to the injection module
    - Pass I/O requests to the module for modification
    - Provide callback service on I/O completion
  - Added special-purpose code for certain fault models
    - Failstop model requires in-driver actions

Implementation: Kernel Module
  - Receives fault lists from the user-level console
  - Called by the IDE driver to perform insertion when:
    - LBA sector (SCSI-like) becomes known: sector may be modified
    - Write is initiated: data to be written may be modified
    - Read completes: data may be modified before returning control to the I/O initiator

Implementation: System Calls
  - Added two system calls
    - inject_faults: used to pass fault definitions to the kernel module from user space
    - getsectors: used to determine raw sector ranges of IDE devices by name (there are other ways to do this)

Implementation
  [Figure: path of a disk request through the system. Labels: Faults Defined, Disk Request, Control Returns, I/O Initiated, I/O Returns, Faults Injected, Upcall, Modified Request, Bus Traffic]

IDE Driver (2.4.26 Linux Kernel)
  - Important structures
    - struct request: information about an IDE request
      - READ/WRITE
      - Number of sectors
      - Etc.
    - struct ide_drive_s / ide_drive_t: information about a drive
      - Drive name (e.g., hdc)
      - Sizing/addressing information
      - Etc.

IDE Driver (2.4.26 Linux Kernel)
  - Functions
    - ide_do_rw_disk (3 versions)
      - Common choke point for reads/writes
      - Many other similar functions exist; only this one is in use
      - Two versions swapped by preprocessor directives
      - One for DMA, one for PIO

Failure Model
  - Models selected to represent a generic IDE disk
    - No modeling of specific failures (e.g., Western Digital's classic servo malfunction)
  - Models based on ranges of affected logical sectors (à la SCSI)

Failure Model: Fault Types
  - sectorfail
    - Models inability of a given sector, block, or sector range to store data reliably
    - Excited on read of sector
    - Data read is permuted in some way
      - Randomized
      - Set to specific value
      - Added to offset
      - Shifted by one or more bytes

Failure Model: Fault Types
  - sectorro
    - Writes to block have no effect on stored value
    - Excited on writes to sector
    - Write requests ignored
  - sectorwrong
    - Traffic to a given block is directed to a different block
    - Excited on reads/writes
    - Address permuted similarly to data

Failure Model: Fault Types
  - transaddr
    - Sector number is wrong for the first fault excitation, but right for all others
    - Excited on reads/writes
    - Sector permuted as in sectorwrong
  - transdata
    - Data is wrong for the first fault excitation
    - Data permuted as in sectorfail

Failure Model: Fault Types
  - failstop
    - Drive is totally unresponsive; performs no reads or writes
    - Differs from traditional failstop in that our failstop is invisible
      - Drive does not report any errors; it simply fails to perform reads or writes to any sector

Verification of Faults
  - Faults excited and observed by microbenchmarks tailored to individual fault types
  - Techniques similar to latent fault detection (Kari et al. and other studies)
  - Verification of faults is fault-specific

Verification: sectorfail
  - Corrupts data when read from disk
    1. Write known data to disk; observe location using a printk statement
    2. Inject a sectorfail fault at the location of the file on disk
    3. Unmount/remount FS (flush cache)
    4. Attempt to read the faulty file with cat

Verification: sectorro
  - Ignores writes to a given location
    1. Write known data to disk
    2. Inject a sectorro fault
    3. Flush file cache
    4. Write different data to the same location
    5. Flush file cache
    6. Read data from (1) from
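
The sectorfail, sectorro, and transdata behaviors, and the write-then-read checks used to verify them, can be illustrated with a small userspace simulation. This is only a sketch in Python and not the kernel module from the slides: the names Fault and FaultyDisk, the 512-byte sector size, and the fixed fill byte (standing in for the "set to specific value" permutation) are all assumptions made for illustration. The in-memory list stands in for the disk, so there is no file cache to flush.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # assumed bytes per sector (not stated in the slides)


@dataclass
class Fault:
    """Hypothetical fault record: a type plus an inclusive sector range."""
    kind: str            # "sectorfail", "sectorro", or "transdata"
    first: int           # first affected logical sector
    last: int            # last affected logical sector
    fill: int = 0xFF     # assumed byte used to overwrite corrupted data
    fired: bool = False  # transient faults are excited only once

    def covers(self, sector):
        return self.first <= sector <= self.last


class FaultyDisk:
    """In-memory stand-in for a disk whose I/O paths consult a fault list."""

    def __init__(self, sectors):
        self.data = [bytes(SECTOR_SIZE) for _ in range(sectors)]
        self.faults = []

    def inject(self, fault):
        self.faults.append(fault)

    def write(self, sector, payload):
        # sectorro: the write is dropped silently, with no error reported.
        for f in self.faults:
            if f.kind == "sectorro" and f.covers(sector):
                return
        self.data[sector] = payload

    def read(self, sector):
        payload = self.data[sector]
        for f in self.faults:
            if not f.covers(sector):
                continue
            if f.kind == "sectorfail":
                # Data is permuted on every read of the sector.
                payload = bytes([f.fill]) * SECTOR_SIZE
            elif f.kind == "transdata" and not f.fired:
                # Data is wrong only on the first excitation.
                f.fired = True
                payload = bytes([f.fill]) * SECTOR_SIZE
        return payload


# sectorro check: write known data, inject, write different data, read back.
disk = FaultyDisk(sectors=8)
known = b"A" * SECTOR_SIZE
disk.write(3, known)
disk.inject(Fault("sectorro", 3, 3))
disk.write(3, b"B" * SECTOR_SIZE)
assert disk.read(3) == known       # the second write had no effect

# sectorfail check: write known data, inject, read back corrupted data.
disk.write(5, known)
disk.inject(Fault("sectorfail", 5, 5))
assert disk.read(5) != known       # data is permuted on read
```

In this sketch each check passes when the fault is observed: the read-only sector still holds the original data, and the failed sector returns something other than what was written.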