I was building out a new cloud site the other day, and it got me thinking about how system migrations and disaster recovery are really just two sides of the same coin. I've always thought that the two are significantly similar at their core, and I bring that up with people whenever we're planning out one or the other.
Building a disaster recovery environment involves taking an inventory of your current production environment, building a clone at a different site, replicating data, updating software, providing users with access, and executing dry run cutovers to make sure that it's all set up correctly.
Migrating your systems involves taking an inventory of your current production environment, building a clone at a different site, replicating data, updating software, providing users with access, and executing dry run cutovers to make sure that it's all set up correctly.
There are, of course, differences when you dig into the details. But they're still closely related.
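To make the overlap concrete, here's a minimal sketch that builds both checklists from the same list of steps. The step names come straight from the two descriptions above; the function name `build_plan` and the wording of each final step are my own assumptions, added only to show where the two projects diverge.

```python
from __future__ import annotations

# The steps that both descriptions above have in common.
SHARED_STEPS = [
    "inventory production environment",
    "build clone at a different site",
    "replicate data",
    "update software",
    "provide user access",
    "execute dry-run cutover",
]

# Assumed final steps, for illustration only: this is where the two diverge.
FINAL_STEP = {
    "migration": "cut over permanently and retire the old site",
    "disaster_recovery": "keep the site on standby and retest on a schedule",
}


def build_plan(kind: str) -> list[str]:
    """Return the ordered checklist for a 'migration' or 'disaster_recovery' project."""
    if kind not in FINAL_STEP:
        raise ValueError(f"unknown plan kind: {kind}")
    return SHARED_STEPS + [FINAL_STEP[kind]]


migration = build_plan("migration")
dr = build_plan("disaster_recovery")

# Everything except the last step is identical between the two plans.
shared = [step for step in migration if step in dr]
print(len(shared), "of", len(migration), "steps are shared")
```

Only the last item differs, which is the whole point: most of the planning work can be reused between the two.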
System Migration Specifics
With system migrations, if you're fortunate enough not to be under a time crunch, it's usually a good long-term idea to take a close look at the systems you have running and see whether anything can be resized, consolidated, migrated to a "better" cloud resource, or just eliminated altogether. You'll usually encounter a lot of resistance because people hate change, so you need to be able to distinguish real technical constraints from plain reluctance, and be ready to do some convincing.
The Lift-N-Shift is the simplest kind of migration: you just replicate everything from your current production site to your new site. It will almost always cost you more in the long run, though, especially given the high prices that cloud providers charge for the types of "legacy" resources that you'd be lifting and shifting.
You can also potentially plan for a gradual or phased migration, instead of an all-or-nothing scenario, to lower the risk of failure or reduce the level of inconvenience to the business.
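One way to picture a phased migration is as a series of waves. The sketch below is only an illustration: the `System` class, the 1-to-5 `risk` score, the example system names, and the rule of moving the lowest-risk systems first are all assumptions on my part, not a prescribed method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    risk: int  # hypothetical score: 1 = low business impact if the move fails, 5 = high


def plan_waves(systems: list[System], wave_size: int) -> list[list[str]]:
    """Group systems into migration waves, lowest risk first (assumed ordering rule)."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    # Sort by risk, then by name so the result is deterministic.
    ordered = sorted(systems, key=lambda s: (s.risk, s.name))
    return [
        [s.name for s in ordered[i:i + wave_size]]
        for i in range(0, len(ordered), wave_size)
    ]


# Hypothetical inventory.
systems = [
    System("billing", 5),
    System("wiki", 1),
    System("crm", 3),
    System("build-server", 2),
    System("reporting", 2),
]

waves = plan_waves(systems, wave_size=2)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Moving the low-stakes systems first means the early waves double as rehearsals for the ones the business really cares about.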
Disaster Recovery Specifics
You don't necessarily need to create an exact clone of your production environment at your disaster recovery site -- you only need to ensure that critical business functions can continue until your primary site is back online. This means you can usually identify a few things to leave out and save some money, even if it does cause some inconvenience.
Disaster recovery sites do need to be tested and validated on a regular basis. It's easy to forget about them or let them fall into disarray, but you're only doing yourself and everyone else a disservice. Schedule regular tests and fix problems immediately. Otherwise you'll be caught with your pants down in an emergency and the business will grind to a halt.
Your data should be replicated to your disaster recovery site in near real time to keep data loss to a minimum. One of the worst things that can happen to a business is losing all records of a transaction -- you'll be at risk of very expensive lawsuits and will most likely lose one or more customers.
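"Near real time" is easier to hold yourself to when it's a number you can check. Here's a minimal sketch that compares the last write on the primary against the last write applied at the recovery site. The function names, the timestamps, and the 60-second limit are hypothetical; how you obtain the two timestamps depends entirely on your replication tooling.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(primary_last_commit: datetime,
                    replica_last_applied: datetime) -> timedelta:
    """How far the recovery site trails production; never negative."""
    return max(primary_last_commit - replica_last_applied, timedelta(0))


def within_target(lag: timedelta, max_lag: timedelta) -> bool:
    """True when the lag is at or under the limit the business has agreed to."""
    return lag <= max_lag


# Hypothetical timestamps: the replica is 25 seconds behind the primary.
primary = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
replica = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)

lag = replication_lag(primary, replica)
print(f"lag: {lag.total_seconds():.0f}s")                    # lag: 25s
print("ok:", within_target(lag, timedelta(seconds=60)))      # ok: True
```

Anything written during that lag window is what you stand to lose in a failover, so the limit is really a business decision rather than a technical one.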
Why We Like the Cloud
Having access to cloud resources makes things much more interesting.
We can run relatively small resources while our migration/failover site is sitting idle to keep costs down. Once we're ready to activate the new site, we can upsize those resources (usually with a simple point-and-click) to handle actual production-level traffic. This is usually very difficult in on-premises environments, where you need to justify the purchase of additional hardware that will be sitting idle most of the time.
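The savings are easy to estimate with some back-of-the-envelope arithmetic. Every number below is made up for illustration (the hourly rates, the instance count, and the 730-hour month); plug in your own provider's pricing. Amounts are kept in whole cents to avoid floating-point rounding surprises.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours per year / 12


def monthly_cost_cents(rate_cents_per_hour: int, instance_count: int,
                       hours: int = HOURS_PER_MONTH) -> int:
    """Monthly cost in cents for a group of identical instances."""
    return rate_cents_per_hour * instance_count * hours


# Hypothetical pricing, in cents per instance-hour.
SMALL_RATE = 2    # tiny instance while the site sits idle
FULL_RATE = 40    # production-sized instance
INSTANCES = 10

idle_standby = monthly_cost_cents(SMALL_RATE, INSTANCES)
full_standby = monthly_cost_cents(FULL_RATE, INSTANCES)
savings = full_standby - idle_standby

print(f"idle-sized standby: ${idle_standby / 100:,.2f}/month")  # $146.00/month
print(f"full-sized standby: ${full_standby / 100:,.2f}/month")  # $2,920.00/month
print(f"saved while idle:   ${savings / 100:,.2f}/month")       # $2,774.00/month
```

With these assumed rates, the idle site costs a twentieth of what a full-sized standby would, and you only pay the full rate once you actually fail over or cut over.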
It's important that we periodically re-evaluate our cloud resource sizes, or else we may be overpaying for something we don't need or forcing underperforming systems on our users and creating additional headaches for ourselves.
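A periodic re-evaluation can start as something very simple. The sketch below looks at CPU utilization samples and suggests a direction. The function name `recommend`, the 20% and 80% thresholds, and the choice of CPU as the only signal are all assumptions for illustration; a real review would also weigh memory, I/O, and cost.

```python
from __future__ import annotations

from statistics import mean


def recommend(cpu_samples: list[float], low: float = 20.0, high: float = 80.0) -> str:
    """Suggest 'upsize', 'downsize', or 'keep' from CPU utilization percentages.

    Thresholds are hypothetical defaults, not provider guidance.
    """
    if not cpu_samples:
        raise ValueError("need at least one utilization sample")
    # A peak at or above the high mark means users are likely feeling it.
    if max(cpu_samples) >= high:
        return "upsize"
    # A low average with no high peaks means we're probably overpaying.
    if mean(cpu_samples) <= low:
        return "downsize"
    return "keep"


print(recommend([5.0, 10.0, 8.0]))     # downsize
print(recommend([50.0, 95.0, 60.0]))   # upsize
print(recommend([40.0, 55.0, 45.0]))   # keep
```

Checking the peak before the average is deliberate here: an underperforming system hurts users right away, while an oversized one only hurts the budget.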
We can also stand up our failover sites in far-flung geographic locations that we wouldn't normally have access to, as easily as though they were in our local datacenter. For those of us in earthquake-prone California, this is a huge boon.
Connecting the Dots
Being able to take a holistic view of tech and recognize the similarities between different approaches -- especially in areas like system migration and disaster recovery -- is a valuable skill in today's complex, interconnected world.