
Fault Isolation for Device Drivers

Jorrit N. Herder, Herbert Bos, Ben Gras, Philip Homburg, and Andrew S. Tanenbaum
Dept. of Computer Science, VU University Amsterdam, The Netherlands
E-mail: {jnherder, herbertb, beng, philip, ast}@cs.vu.nl

Abstract

This work explores the principles and practice of isolating low-level device drivers in order to improve OS dependability. In particular, we explore the operations drivers can perform and how fault propagation in the event a bug is triggered can be prevented. We have prototyped our ideas in an open-source multiserver OS (MINIX 3) that isolates drivers by strictly enforcing least authority, and iteratively refined our isolation techniques using a pragmatic approach based on extensive software-implemented fault-injection (SWIFI) testing. In the end, out of 3,400,000 common faults injected randomly into 4 different Ethernet drivers using both programmed I/O and DMA, no fault was able to break our protection mechanisms and crash the OS. In total, we experienced only one hang, but this appears to be caused by buggy hardware.

Keywords: Operating Systems, Device Drivers, Bugs, Dependability, Fault Isolation, SWIFI Testing

"Have no fear of perfection, you'll never reach it."
Salvador Dalí (1904-1989)

1 INTRODUCTION

Despite recent research advances, commodity operating systems still fail to meet public demand for dependability. Studies seem to indicate that unplanned downtime is mainly due to faulty system software [13, 37]. A survey across many languages found well-written software to have 6 faults/KLoC, with 1 fault/KLoC as a lower bound when using the best techniques [16]. In line with this estimate, FreeBSD reportedly has 3.35 post-release faults/KLoC [5], even though this project has strict testing rules and anyone is able to inspect the source code.

It is now beyond a doubt that extensions, such as device drivers, are responsible for the majority of OS crashes. Even though extensions typically comprise up to two-thirds of the OS code base, they are generally provided by untrusted third parties and have a reported error rate of 3-7 times higher than other code [3]. Indeed, Windows XP crash dumps showed that 65-83% of all crashes can be attributed to extensions, and drivers in particular [10, 26].

The reason that these crashes can occur is the close integration of untrusted extensions with the trusted core kernel. This violates the principle of least authority by granting excessive power to potentially buggy components. As a consequence, a malfunctioning device driver can, for example, wipe out kernel data structures or overwrite servers and drivers. Not surprisingly, memory corruption was found to be one of the main OS crash causes [35].

Fixing buggy drivers is infeasible since configurations are continuously changing with, for example, 88 new drivers per day in 2004 [26]. On top of this, maintainability of existing drivers is hard due to changing kernel interfaces and growth of the code base [29]. Our analysis of the Linux 2.6 kernel shows a sustained growth in LoC of about 5.5% every 6 months, as shown in Fig. 1. Over the past 4.5 years, the kernel has grown 49.2% and now surpasses 5.1M lines of executable code, largely due to device drivers, comprising 57.6% of the kernel or 3.0M lines of code.

[Figure 1: Growth of the Linux 2.6 kernel since its release. Y-axis: lines of executable code (LoC), 0 to 5,000,000. Series: Other, Net, Fs, Drivers, Arch.]

While there is a consensus that drivers need to be isolated, e.g., [19, 20, 21, 36], the issue to be addressed in each approach is "Who can do what, and how can this be done safely?" We strongly believe that least authority should be the guiding principle in any dependable design:

"Every program should operate using the least set of privileges necessary to complete its job. Primarily, this principle limits the damage that can result from an accident or error. It also reduces the number of potential interactions among privileged programs so that unintentional, unwanted, or improper uses of privilege are less likely to occur." [31]

1.1 Contribution and Paper Outline

In contrast to earlier work [17], this study addresses the fundamental issue of fault isolation for device drivers. The main contributions are (i) a classification of driver operations that are root causes of fault propagation, and (ii) a set of isolation techniques to curtail these powers in the face of bugs. We believe this analysis, as well as the isolation techniques proposed, to be an important result for any effort to isolate faults in drivers, in any OS.

A secondary contribution consists of the full integration of our isolation techniques in a freely available open-source OS, MINIX 3. MINIX 3 strictly adheres to least authority. As a baseline, each driver is run in a separate user-mode UNIX process with a private (IO)MMU-protected address space. This takes away all privileges and renders each driver harmless. Next, because this protection is too coarse-grained, we have provided various fine-grained mechanisms to grant selective access to resources needed by the driver to do its job. Different per-driver policies can be defined by the administrator. The kernel and trusted OS servers act as a reference monitor and mediate all accesses to privileged resources such as CPU, device I/O, memory, and system services. This design is illustrated in Fig. 2.

Rather than proving isolation formally [7], we have taken a pragmatic, empirical approach and iteratively refined our isolation techniques using software-implemented fault injection (SWIFI). After several design iterations, MINIX 3 is now able to withstand millions of faults representative for system code. Even though we injected 3,400,000 faults, not a single fault was able to break the driver's isolation or corrupt other parts of the OS. We did experience one hang, but this appears to be caused by buggy hardware.

This paper continues as follows. First, we relate our work to other approaches (Sec. 2) and discuss assumptions and limitations (Sec. 3). Next, we introduce isolation techniques based on a classification of privileged driver operations (Sec. 4) and illustrate our ideas with a case study (Sec. 5). Then, we describe the experimental setup (Sec. 6) and the results of our SWIFI tests (Sec. 7). Finally, we discuss lessons learned (Sec. 8) and conclude (Sec. 9).

[Figure 2. Diagram labels: Super User; Multiserver OS; Grant Selective Access; User Space; Unprivileged Processes; Kernel Space; Mediate Resource Access; Hardware; Enforce Protection Domains; Isolation Policy; Driver Manager; Isolated Driver; Store]
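
The idea of a default-deny baseline with selectively granted, per-driver privileges can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the MINIX 3 implementation: the names DriverPolicy and ReferenceMonitor, the driver name "eth_driver", the server names, the port range, and the IRQ number are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverPolicy:
    """Hypothetical per-driver policy. Anything not listed is denied."""
    io_ports: tuple = ()                   # inclusive (low, high) port ranges
    irq_lines: frozenset = frozenset()     # IRQ lines the driver may handle
    ipc_targets: frozenset = frozenset()   # processes the driver may message


class ReferenceMonitor:
    """Toy stand-in for the trusted components that mediate every access."""

    def __init__(self):
        self._policies = {}

    def install_policy(self, driver, policy):
        # In this sketch, the administrator supplies the policy up front.
        self._policies[driver] = policy

    def may_access_port(self, driver, port):
        policy = self._policies.get(driver)
        if policy is None:
            return False  # unknown drivers hold no privileges at all
        return any(low <= port <= high for low, high in policy.io_ports)

    def may_use_irq(self, driver, irq):
        policy = self._policies.get(driver)
        return policy is not None and irq in policy.irq_lines

    def may_send(self, driver, target):
        policy = self._policies.get(driver)
        return policy is not None and target in policy.ipc_targets


# Example with made-up values: one Ethernet driver, one port range,
# one IRQ line, and one permitted IPC destination.
monitor = ReferenceMonitor()
monitor.install_policy(
    "eth_driver",
    DriverPolicy(
        io_ports=((0xC000, 0xC0FF),),
        irq_lines=frozenset({11}),
        ipc_targets=frozenset({"network_server"}),
    ),
)
```

With this policy installed, monitor.may_access_port("eth_driver", 0xC010) is allowed, while a request for a port outside the granted range, an IRQ that was not granted, or a message to an unlisted process is refused. The design choice the sketch tries to capture is that a buggy driver can only misuse what its policy explicitly lists, so the damage from a fault stays within that set.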



PSU CSE 544 - Fault Isolation for Device Drivers
