Berkeley COMPSCI 262A - Live Migration of Virtual Machines ReVirt: Virtual-Machine Logging and Replay


Virtual Machines 2
Advanced Topics in Computer Systems, CS262A
Prof. Eric Brewer
10/21/09

Live Migration of Virtual Machines
ReVirt: Virtual-Machine Logging and Replay

Goal of this lecture: illustrate some of the value of leveraging the VMM interface for new properties; we cover two here (migration and exact replay), but there are many others as well, including debugging and reliability.

I. Live Migration

Migration is useful:
o Load balancing for long-lived jobs (why not short lived?)
o Ease of management: controlled maintenance windows
o Fault tolerance: move job away from flaky (but not yet broken) hardware
o Energy efficiency: rearrange loads to reduce A/C needs
o Data center is the right target

Two basic strategies:
o Local names: move the state physically to the new machine
  • Local memory, CPU registers, local disk (if used -- typically not in data centers)
  • Not really possible for some physical devices, e.g. tape drive
o Global names: can just use the same name in the new location
  • Network attached storage provides global names for persistent state
  • Network address translation or layer 2 names allow movement of IP addresses

Historically, migration focused on processes:
o Typically move the process and leave some support for it back on the original machine
  • e.g. old host handles local disk access, forwards network traffic
  • these are “residual dependencies” -- old host must remain up and in use
  • Hard to move exactly the right data for a process -- which bits of the OS must move?
  • E.g. hard to move TCP state of an active connection for a process
o See Zap paper for best of process-based migration

VMM Migration:
o Move the whole OS as a unit -- don’t need to understand the OS or its state
o Can move apps for which you have no source code (and that are not trusted by the owner)
o Can avoid residual dependencies in data center thanks to global names
o Non-live VMM migration is also useful:
  • migrate your work environment home and back: put the suspended VMM on a USB key or send it over the network
  • Collective project, “Internet suspend and resume”

Goals:
o Minimize downtime (maximize availability)
o Keep the total migration time manageable
o Limit the impact of migration on both the migratee and the local network

Live migration approach:
o Allocate resources at the destination (to ensure it can receive the domain)
o Iteratively copy memory pages to the destination host
  • Service continues to run at this time on the source host
  • Any page that gets written will have to be moved again
  • Iterate until a) only a small amount remains, or b) not making much forward progress
  • Can increase bandwidth used for later iterations to reduce the time during which pages are dirtied
o Stop and copy the remaining (dirty) state
  • Service is down during this interval
  • At end of the copy, the source and destination domains are identical and either one could be restarted
  • Once copy is acknowledged, the migration is committed in the transactional sense
o Update IP address to MAC address translation using “gratuitous ARP” packet
  • Service packets start coming to the new host
  • May lose some packets, but this could have happened anyway and TCP will recover
o Restart service on the new host
o Delete domain from the source host (no residual dependencies)

Types of live migration:
o Managed migration: move the OS without its participation
o Managed migration with some paravirtualization
  • Stun rogue processes that dirty memory too quickly
  • Move unused pages out of the domain so they don’t need to be copied
o Self migration: OS participates in the migration (paravirtualization)
  • harder to get a consistent OS snapshot since the OS is running!

Excellent results on all three goals:
o downtimes are very short (60 ms for Quake 3!)
o impact on service and network are limited and reasonable
o total migration time is minutes
o Once migration is complete, source domain is completely free

II. ReVirt

Idea: use VM interface to replay non-deterministic attacks exactly

The overall problem:
o Many ways to take over a machine; even the kernel has many bugs
o The number of ways and sophistication is increasing (CERT advisories)
o The PC is too complex to be correct
o So, can’t really prevent problems, but we can eliminate holes as we find them
o Goal: make it much easier to find the exploited hole after an attack

Basic approach: log the events that took place (audit trail) so that we can reconstruct the attack
o Problem 1: integrity: attacker can change/remove the logs
o Problem 2: the logs are incomplete and may miss key events
o Problem 3: the events may not be enough to recreate non-deterministic bugs; can’t decrypt past traffic either since keys are one thing that is non-deterministic

ReVirt approach:
o 1) Log in the VMM to ensure integrity
  • Compromised OS does not have access to VMM logs, even if logger runs in another domain (rather than in the VMM proper)
  • Small code base reduces bugs in VMM
o 2) Record non-deterministic events using logs and checkpoints and replay everything exactly to pinpoint the exploit
o Narrow VMM interface makes it possible to log all events well -- no need to understand drivers, hardware, etc.

Alternative solutions:
o

OS on OS:
o User-mode Linux: run Linux OS as an “app” inside the real OS
o This is a kind of paravirtualization
o Must map all low-level events to Unix signals
o Use host OS devices instead of raw devices
o ReVirt would work better on top of Xen (which is newer), which is both faster and has a much smaller “trusted computing base”

General replay approach:
o Start from a safe checkpoint, then roll forward using the log, watching for the exploit
o May not find it on the first pass, but should learn something. So restart and roll forward again based on new information; repeat. In theory, could implement a “step backward” debugging option, which would restore the checkpoint and roll forward almost all the way to the present (except for the last “step”)
o Insight: only need to log non-deterministic events and inputs; the rest by definition are deterministic and will replay naturally
  • Time: must record the exact time of each event and replay it at the same (virtual) time. For example, replay an interrupt during the exact same instruction as before.
  • Input (keyboard, mouse, network packet, etc.): must log the exact input as well as
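
The replay insight above (log only the non-deterministic events, tag each with its virtual time, and let everything else re-execute) can be illustrated with a small toy. This is a sketch under stated assumptions, not ReVirt's implementation: `ToyMachine`, `run_and_log`, and `replay` are hypothetical names, the "guest" is a trivial deterministic state machine, and virtual time is modeled as an instruction counter.

```python
import copy
import random


class ToyMachine:
    """Hypothetical deterministic 'guest': state changes only via step() and deliver()."""

    def __init__(self):
        self.instr_count = 0  # virtual time, counted in executed instructions
        self.acc = 0          # stands in for all guest state

    def step(self):
        # Deterministic work: depends only on current state.
        self.acc = (self.acc * 31 + 7) % 1_000_003
        self.instr_count += 1

    def deliver(self, value):
        # A non-deterministic input (e.g. a network packet) arriving right now.
        self.acc ^= value


def run_and_log(machine, n_instr, rng):
    """Live run: inputs arrive at random; log (virtual time, value) for each one."""
    log = []
    for _ in range(n_instr):
        if rng.random() < 0.1:
            value = rng.randrange(256)
            log.append((machine.instr_count, value))
            machine.deliver(value)
        machine.step()
    return log


def replay(checkpoint, log, n_instr):
    """Restore the checkpoint and roll forward, injecting each logged input
    at the same virtual time at which it originally arrived."""
    machine = copy.deepcopy(checkpoint)
    next_event = 0
    for _ in range(n_instr):
        while next_event < len(log) and log[next_event][0] == machine.instr_count:
            machine.deliver(log[next_event][1])
            next_event += 1
        machine.step()
    return machine


live = ToyMachine()
checkpoint = copy.deepcopy(live)              # safe checkpoint before the run
log = run_and_log(live, 1000, random.Random())
replayed = replay(checkpoint, log, 1000)      # same final state as the live run
one_step_back = replay(checkpoint, log, 999)  # "step backward": stop one step early
```

In this toy, "step backward" is just a replay that stops one instruction short of the present, which mirrors the restore-and-roll-forward idea in the notes.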

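
Returning to Section I, the live migration approach (iteratively copy pages while the service runs, then stop and copy what is still dirty) can be sketched as a simulation. This is a minimal sketch, not Xen's code: `live_migrate`, `make_guest`, the `small_enough` and `max_rounds` thresholds, and the toy guest that writes only to a small hot set of pages are all assumptions made for illustration.

```python
import random


def live_migrate(source_pages, run_guest, small_enough=8, max_rounds=30):
    """Simulated pre-copy migration.

    source_pages: dict mapping page number -> contents (mutated by the guest).
    run_guest: hypothetical callback; given the number of pages just copied,
               lets the guest run for that long and returns the set of pages
               it wrote in the meantime.
    Returns (destination pages, pre-copy rounds, pages sent during downtime).
    """
    dest = {}
    to_send = set(source_pages)  # first round: every page
    rounds = 0
    while True:
        rounds += 1
        for page in to_send:
            dest[page] = source_pages[page]
        # The service keeps running on the source, so some pages are dirty again.
        dirtied = run_guest(len(to_send))
        no_progress = len(dirtied) >= len(to_send)
        if len(dirtied) <= small_enough or no_progress or rounds >= max_rounds:
            break
        to_send = dirtied
    # Stop and copy: the guest is paused, so nothing is dirtied during this pass.
    for page in dirtied:
        dest[page] = source_pages[page]
    return dest, rounds, len(dirtied)


def make_guest(source_pages, hot_set, rng):
    """Toy guest: the longer a copy round takes, the more writes it performs,
    all of them landing in a small hot set of pages."""
    def run_guest(pages_in_flight):
        dirtied = set()
        for _ in range(pages_in_flight // 4):
            page = rng.choice(hot_set)
            source_pages[page] = rng.randrange(2**32)
            dirtied.add(page)
        return dirtied
    return run_guest


memory = {page: 0 for page in range(1024)}
guest = make_guest(memory, hot_set=list(range(32)), rng=random.Random())
dest, rounds, downtime_pages = live_migrate(memory, guest)
```

Only the pages copied in the final pass contribute to downtime, which is why shrinking the dirty set before stopping the guest matters more than the total number of pages sent.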
