SLYK: A Transparent, Fault-Tolerant Migration Platform

Jasper Lin, Jennifer Shu, Olivier Koch, Shuchyng You
{jasperln, jshu, olivierk, yoshu117}@mit.edu

Abstract

The recent trend towards mobile computing has introduced new challenges, such as migrating a user's computing environment as he moves from location to location. Although laptops offer a great deal of mobility, they still suffer from traditional drawbacks such as having weak computing power compared to desktops and being relatively expensive and encumbering. In the past few years, the concept of a virtual environment that can be suspended at one place and resumed at another has started to emerge, opening the door to true mobility. We propose a virtual machine-based migration platform that preserves active network connections across machine migrations. To make our system fully deployable, we require neither cooperation from the outside world nor any modification to the host or guest operating system. The platform provides machine and network transparency as well as fault tolerance and data integrity.

1 Introduction

In today's computing environment, it is common for one user to encounter several different computers in the course of a day. Computers are increasingly being viewed as public utilities that are as ubiquitous as electricity and water. As a result, users no longer need to value computers as an expensive resource; instead, they can place greater worth on the state, including personal data and open applications, stored on these computers.

A mobile working environment would be useful to virtually anyone who uses a computer. For instance, a student working on a desktop in his lab could transfer his state to his laptop at home and be able to switch back and forth every day. If he were in the middle of a long simulation, for example, he could resume it on different computers without having to start over or wait in lab until it finished. Similarly, a businessman could travel across the United States and recover his latest work on any computer with Internet access.

Figure 1: SLYK facilitates the migration of user state between network-connected machines.

This novel view of computing, however, opens up a new batch of interesting challenges that need to be addressed before such a system can be successfully deployed. First, for maximum deployability, no cooperation from the rest of the network should be required. In other words, if a user is migrating from one machine to another, the rest of the network should not be aware of the change, and the user's active network connections should not be broken. We refer to this challenge as network transparency. Second, in order to offer the most flexibility, the migration platform needs to be hardware and operating system independent. One promising approach is to use virtual machines (VMs), which by nature do not depend on the underlying operating system and therefore allow true machine transparency [8]. Third, our system needs to provide a fault-tolerant approach for data storage. If a machine running the migration platform suffers a hard disk (HD) failure, for example, then other machines should be able to resume without loss of data.

In this paper, we attempt to address the above challenges while focusing on the problem of migrating user state as users traverse machines. We propose SLYK¹, a virtual machine-based migration platform that preserves active connections across machine migrations and offers a high level of fault tolerance. Since a VM simulates a complete architecture, users are permitted to run any operating system and application compatible with the emulated architecture. The state of any virtual system running on SLYK can be packaged and sent over a network to be resumed by any other machine running SLYK, as shown in Figure 1.

¹ SLYK (pronounced "Slick") is taken from the authors' last names: Shu, Lin, You, and Koch.

2 Related Work

Previous work on using VMs to migrate state focuses mainly on optimizing the performance of emulation and the speed of migration [15]. Although performance is important for the mainstream adoption of VMs, there are other important factors, such as fault tolerance and transparent operation with the outside world.

Internet Suspend and Resume (ISR) [9] presents a straightforward implementation of a VM migration infrastructure. Upon suspend, the state of the VM is stored on a remote NFS server. When resumed, the state of the VM is copied from NFS onto the target machine and the VM is started. "Optimizing the Migration of Virtual Computers" describes several optimizations to speed migration time [15]. Their goal is to make it practical to migrate state between home and work computers over a 384 kbps link. During the process of migration, virtual HD blocks are left on the source machine to be requested as needed by the target machine.

Both of these systems are based on VMWare [13], so they only work on the x86 platform. Additionally, these projects suffer from two other drawbacks. First, all active network connections are lost during migration. Applications that depend on these connections need to be reset on the target machine. Second, HD blocks which have not been requested and cached locally may become inaccessible when their host machines go down.

Mobile IP [14] provides mobility by always routing packets first to a static home host, then to the mobile host, which works when a static host is always available and not separated by the network. However, failure of this home host results in loss of all active mobile connections. SLYK uses a Mobile IP-like infrastructure to migrate active connections, but the home host can be dynamically specified.

There have been several proposals to migrate state at a finer granularity than full system migration [3, 18, 21]. These systems exploit specific knowledge about the state or execution environment to ship the minimum amount of data needed for seamless transition. In contrast, VM approaches involve potentially needing to send much more state than needed. However, the general approach adopted by VM migration platforms allows the migration of many more operating systems and applications without any modification. Furthermore, several optimizations can be performed to reduce the inherent overhead of migration via VMs [13, 15].

3 Challenges

We face many difficulties in designing a migration platform that is transparent, fault-tolerant, and efficient. In order for a migrated machine to function as if it were still operating on the original host, we need to transfer a large amount of state, including that of the HD, RAM, and CPU. Unfortunately, the majority of