While most of the buzz around server virtualisation in general, and VMware Infrastructure in particular, has been about server consolidation and greening the data centre, disaster recovery may be the IT area where server virtualisation technology has the biggest impact.
Disaster recovery (DR) planning for mission-critical applications historically called for replicating the data for these applications and having servers standing by at the DR site ready to take over at a moment's notice.
Most organisations can save money by virtualising these standby servers. A single offsite host can run the standby domain controller, SQL server, Exchange server and several more as virtual machines. You save not only the cost of all those physical servers, but also the rack space and power charges at your DR site.
Saving money while still providing the same level of protection as your old, expensive physical-server solution is a good thing. But the real payoff is improving the recovery time of the applications that you wouldn't dedicate a standby server to. Most organisations soon realise they can promote some applications from the secondary tier to having standby servers, since the standby servers are essentially free.
Solving bare metal restore to different hardware
In the "old days," secondary applications were limited to restore from tape as their protection model, resulting in multiday recovery points and recovery times.
The "different hardware" problem is solved because virtual machines are indeed virtual machines -- they all run with the same set of drivers and can't tell if they've been moved from one host to another. In addition, virtual machine snapshots from VMware or even Microsoft's Virtual Server or Hyper-V are just files, so restoring a virtual machine is just a matter of mounting the files on a new host.
Rather than relying on tape transfers, you can schedule snapshots of your virtual machines and transfer them to the DR site over the replication link. And if your network guys can prioritise traffic properly, it won't interfere with real-time replication.
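Because a snapshot is just a set of files, the transfer step can be sketched in a few lines of Python. This is a minimal illustration, not a VMware tool: it assumes the snapshot files for one virtual machine already sit in a local directory and that the DR site is reachable as a mounted path over the replication link. The function names `sha256_of` and `ship_snapshot` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ship_snapshot(snapshot_dir: Path, dr_dir: Path) -> List[str]:
    """Copy every file in snapshot_dir to dr_dir and verify each copy.

    Assumption: dr_dir is a path at the DR site (for example a mounted
    share); how it is mounted is outside the scope of this sketch.
    """
    dr_dir.mkdir(parents=True, exist_ok=True)
    shipped = []
    for source in sorted(snapshot_dir.iterdir()):
        if not source.is_file():
            continue
        target = dr_dir / source.name
        shutil.copy2(source, target)  # copies contents and timestamps
        if sha256_of(source) != sha256_of(target):
            raise IOError(f"checksum mismatch for {source.name}")
        shipped.append(source.name)
    return shipped
```

A scheduler such as cron could call `ship_snapshot` after each snapshot is taken. Traffic prioritisation on the link remains a job for the network, not for the script.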
The real fun comes when a disaster is declared and you have to start switching over to the standby servers. Because the suspenders-and-belt crowd set up their DR infrastructure to be able to take over at full speed the minute the switch was thrown, their DR site has lots of compute horsepower. (Of course lots of horsepower means lots of money.)
The more frugal companies combine VMotion, which moves virtual servers from one host to another while they're still running, with "shared server" offerings from DR providers like SunGard. With shared servers, you pay a few shekels to the DR provider every month for the right to claim servers out of their stock at the DR site when you declare an emergency. Once you declare one, you get the servers for your exclusive use and can install VMware ESX on them.
Then, once the new hosts are up, you can use VMotion (or, even better, VMware DRS) to dynamically allocate virtual servers to hosts based on load and to mount your virtual servers on the new hosts. This will boost your application performance, probably before your users can get to their new workplaces to use the applications.
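To make "allocate virtual servers to hosts based on load" concrete, here is a small Python sketch of one simple placement rule: take the heaviest virtual machines first and put each on the host with the most spare capacity. This is not how VMware DRS works internally; it is a plain greedy heuristic, and the function name, host names, and load figures are all hypothetical.

```python
from typing import Dict, List


def place_vms(vm_loads: Dict[str, float],
              host_capacity: Dict[str, float]) -> Dict[str, List[str]]:
    """Greedily assign each VM to the host with the most free capacity.

    Loads and capacities share one arbitrary unit (say, CPU cores).
    Raises ValueError if a VM does not fit on the host chosen for it.
    """
    used = {host: 0.0 for host in host_capacity}
    placement: Dict[str, List[str]] = {host: [] for host in host_capacity}

    # Heaviest VMs first, so the big ones are placed while hosts are empty.
    by_load = sorted(vm_loads.items(), key=lambda item: item[1], reverse=True)
    for vm, load in by_load:
        # Pick the host with the most remaining headroom.
        host = max(host_capacity, key=lambda h: host_capacity[h] - used[h])
        if host_capacity[host] - used[host] < load:
            raise ValueError(f"no host has room for {vm}")
        placement[host].append(vm)
        used[host] += load
    return placement


# Hypothetical example: four standby VMs, two newly claimed hosts.
plan = place_vms(
    {"dc": 1.0, "sql": 4.0, "exchange": 3.0, "web": 2.0},
    {"esx1": 6.0, "esx2": 6.0},
)
print(plan)  # {'esx1': ['sql', 'dc'], 'esx2': ['exchange', 'web']}
```

Each host ends up carrying five units of load in this example. A real scheduler would also weigh memory, affinity rules and the cost of each migration.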
Note: The same trick -- albeit with longer recovery times -- can be used with vendors' server quick-ship programs, which ship you new servers in the event of a disaster.
This was first published in February 2008