
Virtual Machines 2
Advanced Topics in Computer Systems, CS262A
Prof. Eric Brewer
10/21/09

Live Migration of Virtual Machines
ReVirt: Virtual Machine Logging and Replay

Goal of this lecture: illustrate some of the value of leveraging the VMM interface for new properties. We cover two here (migration and exact replay), but there are many others as well, including debugging and reliability.

I. Live Migration

Migration is useful:
  o Load balancing for long-lived jobs (why not short-lived?)
  o Ease of management: controlled maintenance windows
  o Fault tolerance: move job away from flaky (but not yet broken) hardware
  o Energy efficiency: rearrange loads to reduce A/C needs
  o Data center is the right target

Two basic strategies:
  o Local names: move the state physically to the new machine
      - Local memory, CPU registers, local disk (if used; typically not in data centers)
      - Not really possible for some physical devices, e.g. tape drive
  o Global names: can just use the same name in the new location
      - Network attached storage provides global names for persistent state
      - Network address translation or layer 2 names allows movement of IP addresses

Historically, migration focused on processes:
  o Typically move the process and leave some support for it back on the original machine
      - E.g. old host handles local disk access, forwards network traffic
      - These are "residual dependencies": old host must remain up and in use
      - Hard to move exactly the right data for a process (which bits of the OS must move?)
      - E.g. hard to move TCP state of an active connection for a process
  o See Zap paper for best of process-based migration

VMM Migration:
  o Move the whole OS as a unit: don't need to understand the OS or its state
  o Can move apps for which you have no source code (and are not trusted by the owner)
  o Can avoid residual dependencies in data center thanks to global names
  o Non-live VMM migration is also useful:
      - Migrate your work environment home and back: put the suspended VMM on a USB key or send it over the network
      - Collective project, Internet suspend and resume

Goals:
  o Minimize downtime (maximize availability)
  o Keep the total migration time manageable
  o Limit the impact of migration on both the migratee and the local network

Live migration approach:
  o Allocate resources at the destination (to ensure it can receive the domain)
  o Iteratively copy memory pages to the destination host
      - Service continues to run at this time on the source host
      - Any page that gets written will have to be moved again
      - Iterate until (a) only a small amount remains, or (b) not making much forward progress
      - Can increase bandwidth used for later iterations to reduce the time during which pages are dirtied
  o Stop and copy the remaining (dirty) state
      - Service is down during this interval
      - At end of the copy, the source and destination domains are identical and either one could be restarted
      - Once copy is acknowledged, the migration is committed (in the transactional sense)
  o Update IP address to MAC address translation using gratuitous ARP packet
      - Service packets start coming to the new host
      - May lose some packets, but this could have happened anyway and TCP will recover
  o Restart service on the new host
  o Delete domain from the source host (no residual dependencies)

Types of live migration:
  o Managed migration: move the OS without its participation
  o Managed migration with some paravirtualization
      - Stun rogue processes that dirty memory too quickly
      - Move unused pages out of the domain so they don't need to be copied
  o Self migration: OS participates in the migration (paravirtualization)
      - Harder to get a consistent OS snapshot since the OS is running

Excellent results on all three goals:
  o Downtimes are very short (60ms for Quake 3)
  o Impact on service and network is limited and reasonable
  o Total migration time is minutes
  o Once migration is complete, source domain is completely free

II. ReVirt

Idea: use VM interface to replay non-deterministic attacks exactly

The overall problem:
  o Many ways to take over a machine (even the kernel has many bugs)
  o The number of ways and sophistication is increasing (CERT advisories)
  o The PC is too complex to be correct
  o So can't really prevent problems, but we can eliminate holes as we find them
  o Goal: make it much easier to find the exploited hole after an attack

Basic approach: log the events that took place (audit trail) so that we can reconstruct the attack
  o Problem 1: integrity: attacker can change/remove the logs
  o Problem 2: the logs are incomplete and may miss key events
  o Problem 3: the events may not be enough to recreate non-deterministic bugs (can't decrypt past traffic either, since keys are one thing that is non-deterministic)

ReVirt approach:
  o 1) Log in the VMM to ensure integrity
      - Compromised OS does not have access to VMM logs (even if logger runs in another domain rather than in the VMM proper)
      - Small code base reduces bugs in VMM
  o 2) Record non-deterministic events using logs and checkpoints, and replay everything exactly to pinpoint the exploit
  o Narrow VMM interface makes it possible to log all events well: no need to understand drivers, hardware, etc.

Alternative solutions:
  o OS on OS
      - User-mode Linux: run Linux OS as an app inside the real OS
      - This is a kind of paravirtualization
      - Must map all low-level events to Unix signals
      - Use host OS devices instead of raw devices
      - ReVirt would work better on top of Xen, which is newer and both faster and has a much smaller trusted computing base

General replay approach:
  o Start from a safe checkpoint, then roll forward using the log, watching for the exploit
  o May not find it on the first pass, but should learn something. So restart and roll forward again based on new information; repeat.
      - In theory could implement a "step backward" debugging option, which would restore the checkpoint and roll forward almost all the way to the present (except for the last step)
  o Insight: only need to log non-deterministic events and inputs; the rest by definition are deterministic and will replay naturally
      - Time: must record the exact time of each event and replay it at the same (virtual) time. For example, replay an interrupt during the exact same instruction as before.
      - Input (keyboard, mouse, network packet, etc.): must log the exact input as well as its time. This is easy, except for packets, which can consume much space.

ReVirt logging specifics:
  o Copy disk image as first checkpoint
  o Log events in the VMM into a circular buffer, then periodically move to disk
  o For events, log the instruction and the number of branches since the last interrupt
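
The live migration approach from Section I (iteratively copy pages while the service runs, re-send whatever was dirtied, then stop and copy the remainder) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Xen implementation: memory is a plain dict of page number to contents, and `live_migrate`, `make_workload`, `small_enough`, and `writes_per_page_sent` are hypothetical names introduced here. The workload model assumes the number of pages dirtied in a round is proportional to how many pages were sent in that round, capped by the size of the hot (writable working) set.

```python
from typing import Callable, Dict, List, Set, Tuple

Memory = Dict[int, int]  # page number -> page contents (hypothetical model)


def make_workload(hot_pages: List[int],
                  writes_per_page_sent: float) -> Callable[[Memory, int], Set[int]]:
    """Build a toy guest workload that keeps writing while pages are copied.

    Assumption: longer copy rounds give the guest more time to dirty pages,
    so pages dirtied = pages_sent * writes_per_page_sent, capped by the
    size of the hot set.
    """
    counter = 0

    def workload(memory: Memory, pages_sent: int) -> Set[int]:
        nonlocal counter
        n = min(len(hot_pages), int(pages_sent * writes_per_page_sent))
        dirtied = set(hot_pages[:n])
        for page in dirtied:
            counter += 1
            memory[page] = counter  # the write changes the page contents
        return dirtied

    return workload


def live_migrate(source: Memory,
                 workload: Callable[[Memory, int], Set[int]],
                 small_enough: int = 4,
                 max_rounds: int = 10) -> Tuple[Memory, List[int], int]:
    """Pre-copy migration: returns (destination memory, pages sent per
    live round, pages sent during the stop-and-copy downtime)."""
    dest: Memory = {}
    to_copy: Set[int] = set(source)  # round 1 sends every page
    sent_per_round: List[int] = []

    for _ in range(max_rounds):
        for page in to_copy:
            dest[page] = source[page]
        sent_per_round.append(len(to_copy))

        # The service is still running on the source during the round.
        dirtied = workload(source, len(to_copy))
        no_progress = len(dirtied) >= len(to_copy)
        to_copy = dirtied
        if len(dirtied) <= small_enough or no_progress:
            break  # (a) only a little remains, or (b) no forward progress

    # Stop-and-copy: the service is paused, so nothing new is dirtied.
    for page in to_copy:
        dest[page] = source[page]
    return dest, sent_per_round, len(to_copy)
```

For example, with 1000 pages, a 100-page hot set, and `writes_per_page_sent=0.25`, each round sends fewer pages than the last until only a handful remain for the stop-and-copy phase. Raising the ratio to 1.0 triggers the "no forward progress" exit instead, and the whole hot set is copied during downtime.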
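
The replay insight from Section II (log only the non-deterministic events with their exact virtual time, and let everything else re-execute deterministically) can be sketched with a toy "guest". This is a minimal illustration, not ReVirt's actual mechanism: the guest is a single integer state updated by a deterministic `step` function, virtual time is simplified to one instruction counter rather than an instruction address plus branch count, and all names (`step`, `run`, `live_events`, `logged_events`) are hypothetical.

```python
import random
from collections import deque
from typing import Callable, List, Optional, Tuple

Log = List[Tuple[int, int]]  # (instruction count, input value)
EventSource = Callable[[int], Optional[int]]


def step(state: int, instr: int) -> int:
    """One deterministic 'instruction': same inputs always give same output."""
    return (state * 31 + instr) % 1_000_003


def run(num_instructions: int, events: EventSource) -> Tuple[int, Log]:
    """Execute the toy guest, logging every non-deterministic event.

    Only the events (and the instruction count at which each arrived) are
    logged; the deterministic steps need no log entries.
    """
    state = 0
    log: Log = []
    for count in range(num_instructions):
        value = events(count)
        if value is not None:
            log.append((count, value))  # exact virtual time + exact input
            state ^= value              # effect of the "interrupt handler"
        state = step(state, count)
    return state, log


def live_events(seed: Optional[int] = None) -> EventSource:
    """Original run: inputs arrive at unpredictable times with random data."""
    rng = random.Random(seed)

    def source(count: int) -> Optional[int]:
        if rng.random() < 0.01:
            return rng.randrange(1 << 16)
        return None

    return source


def logged_events(log: Log) -> EventSource:
    """Replay run: deliver each logged input at its recorded instruction."""
    pending = deque(log)

    def source(count: int) -> Optional[int]:
        if pending and pending[0][0] == count:
            return pending.popleft()[1]
        return None

    return source
```

A recorded run can then be replayed with `run(n, logged_events(log))`, which should reach the same final state as the original `run(n, live_events())` because every input is re-injected at the same instruction count. The log stays small relative to the instruction stream, which is the point of logging only non-deterministic events.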



Berkeley COMPSCI 262A - Live Migration of Virtual Machines ReVirt: Virtual-Machine Logging and Replay
