Xen-ophobia: On profiling boot startup

Theophilus A Benson, Steven Kappes
{tbenson, kappes}@cs.wisc.edu
Computer Sciences Department, University of Wisconsin, Madison
May 15, 2008

Abstract

The fact that systems fail should come as no surprise to anyone who has ever developed or worked on a system. A failure reduces the availability of the system and hence the productivity of entities using this system. To increase system availability, several approaches have been developed; these approaches range from simple techniques, such as restarting the entire system, to complex algorithms that isolate and restart the failed subsystem. We find the simple approach of a system reboot to be particularly interesting because it is widely deployed. We profile the set of instructions executed during the startup phase of the system and identify heavily utilized segments. We implement in Xen a framework for both monitoring the startup sequence and identifying highly utilized segments of code. We show that modifications to the identified segments affect the startup sequence. Finally, we examine the identified segments and suggest modifications that will, if implemented, increase availability by reducing the boot-up time.

1 Introduction

The system restart plays a key role in the recovery model for many large and popular systems. For example, the Blue Screen of Death, or system failure, is perceived as the default recovery behavior for failures in the Windows OS [4]. In the famous Blue Screen of Death scenario, Windows panics and prompts the user to restart the system. System reboot as a recovery model is not unique to Windows operating systems; in [7], Prabhakaran et al. show that the system reboot model is employed by several Linux file systems. There are many reasons for the prevalent usage of this model, ranging from simplicity of implementation to negligible overhead. Although simple, this model is not without its drawbacks: it forces the system to perform the entire boot-up sequence. The boot-up sequence is often time consuming, as it loads and initializes modules and drivers required by programs that use the system. The amount of time spent executing the boot-up sequence directly impacts the availability of the system. System availability is the ratio

\[
\text{Availability} = \frac{\text{MeanTimeToFailure}}{\text{MeanTimeToFailure} + \text{MeanTimeToRecovery}} \tag{1}
\]

Mean time to recovery is essentially the amount of time spent executing the boot-up sequence, and it follows from eq. 1 that optimizing the boot-up sequence will increase system availability.

In this paper we pose the following question: Is it possible to identify a small portion of the boot-up sequence which, if optimized, will result in significant improvements in the availability of the system? To answer this question, we design and implement a framework to profile the boot-up sequence. This framework utilizes PC sampling to identify the distribution of time spent in each segment of code executed during boot-up. The framework identifies and ranks code segments based on the frequency with which we sample them. With the framework in place, we develop and apply a ballooning technique to increase the number of instructions executed in a segment. The goal of ballooning is to quantify and validate the achievable gains from the optimization of the identified regions. We show that by ballooning, and thereby increasing the number of executed instructions, we increase the boot-up time and the rank of the modified segment. We claim that the inverse also holds: deflating a function, or decreasing the number of executed instructions, should reduce boot-up time and the rank of the modified segments. Upon examination and analysis of the top 5 segments identified by the framework, we developed a few suggestions for reducing the cycles spent executing the boot-up sequence. Finally, we validate our framework by analyzing the profiles of certain code segments over different boot-up sequences.

The rest of this paper is organized as follows. Section 2 presents a brief literature survey of the domain space. In Section 3 we discuss the implementation of our framework. We then present and evaluate the results of running our framework on a real Linux operating system in Section 4. Finally, we conclude with a summary of our achievements in Section 5.

2 Related Works

Our work builds on [7], which identifies failure models employed in various portions of the file system code. Our work looks into increasing the availability of systems that employ the system restart recovery model. We believe that system restart recovery is the most widely applied failure recovery model in the systems community. The systems community has worked for many years on profiling techniques to identify code which requires optimization. Most of the work in the profiling space focuses on user-level applications [2] and kernel code [1]. Recently, however, [6] presented a seminal approach [...]

Our current approach differs from [2] in that our framework does not discriminate between user-level processes and treats all user-level processes as one. Unlike our architecture, the architecture presented in [2] targets specific processes and only analyzes time spent in user space. Unlike prior kernel profilers [1], which require the kernel to load certain modules before profiling can be initiated, our approach can begin profiling from startup time. In comparison, our method suffers a major drawback because we only profile kernels that have been modified to run on the Xen virtual machine. These modifications alter the shape of our results and add noise that would otherwise not be present in the environment profiled by other kernel profilers. Contrary to the approach taken in [1], we perform software-level monitoring. Finally, seminal work in [6] presents a framework similar to ours; however, we differ in two respects: where profiling analysis is performed and the number of data structures used. [6] runs profiling analysis tools from within the profiled operating system, while we run our tools from the main operating system running in domain 0.

3 Architecture

Our profiler implementation requires changes at three different levels: the Xen hypervisor, the Linux kernel, and user-level tools. This implementation supports flexibility and ease of use with regard to acquiring profiling data.

3.1 Program Counter Sampling

A useful way to determine a performance profile is with program counter sampling. The program counter contains the address of the next instruction to be executed. By reading this value, it is possible to determine what code is currently running. Therefore, this value can be sampled many [...]
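To make the idea of ranking code segments by sampling frequency concrete, the following is a minimal sketch in Python, not the framework's actual implementation. It assumes that program counter samples have already been collected and that a symbol table mapping start addresses to segment names is available. The addresses, the segment names, and the helper names `segment_for` and `rank_segments` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, segment name), sorted by address.
# These addresses and names are illustrative only, not taken from the paper.
SYMBOLS = [
    (0x1000, "segment_a"),
    (0x2000, "segment_b"),
    (0x3000, "segment_c"),
]


def segment_for(pc, symbols=SYMBOLS):
    """Return the segment whose start address is the largest one <= pc."""
    starts = [start for start, _ in symbols]
    index = bisect_right(starts, pc) - 1
    if index < 0:
        return "unknown"  # pc lies below the first known segment
    return symbols[index][1]


def rank_segments(samples, symbols=SYMBOLS):
    """Count samples per segment and return (name, count) pairs, most sampled first."""
    counts = Counter(segment_for(pc, symbols) for pc in samples)
    return counts.most_common()


# Hypothetical program counter samples, as if read at a fixed interval.
samples = [0x1004, 0x2010, 0x2014, 0x2ABC, 0x3008, 0x2018, 0x0FF0]
ranking = rank_segments(samples)

for name, count in ranking:
    print(f"{name}: {count} samples ({count / len(samples):.0%})")
```

Under the assumption that samples are taken at a roughly constant rate, the share of samples that fall in a segment approximates the share of boot-up time spent there, which is why a frequency ranking can point at candidates for optimization.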