Below is an extract from my presentation at SQL Saturday 252 on high availability and disaster recovery. In this section we talk about the basic process of backing up databases and whether that is enough to meet your requirements.
Let’s leave aside all the cool options you get natively and talk about something more mundane: backups. Regardless of what else you have in place, you still want to be doing your backups regularly. If everything goes completely pear-shaped, if all your best-laid plans fall apart, then so long as you have a native SQL backup you still have your data. Even without any of the other options we are going to be discussing, you can recover from a serious outage if you have a secured backup. There might be some trial and error if you haven’t documented users and settings, and it might take some time. Whenever you resort to restoring from backup you are also making a call on acceptable data loss. But you haven’t lost everything. You can recover.
To make it easier on yourself, there are other things critical to the application using the database that you should be documenting and regularly scripting as well. What version and patch level is the server? Are the logins being scripted? What about agent jobs? Linked servers? SQL config settings?
I’ve come across businesses whose only recovery option was their scheduled backups, and who were not interested in anything more. Sometimes that’s a little surprising considering the nature of the business. Some are financial, where any data loss would be measured in actual dollars: transactions going unrecorded, and sold items not being sent to customers who would probably not be too happy about that. One in particular springs to mind, where the company had a defined and documented RTO and RPO. These were backed by SLAs, and the company could take a day of downtime and lose up to 24 hours of data and still meet those targets. For a disaster they had a documented build process: they would simply install a new server from scratch, install and patch SQL, restore the databases, reapply their standard permissions and repoint their application server.
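To make that kind of check concrete, here is a minimal sketch in Python. It assumes the worst-case data loss of a backup-only strategy is roughly the gap between backups, and that the rebuild time is the sum of the documented steps. The step durations and the names `RecoveryTargets` and `backup_only_meets_targets` are hypothetical and only there for illustration; they are not figures from that company.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTargets:
    rto_hours: float  # maximum tolerated downtime
    rpo_hours: float  # maximum tolerated data loss, measured in time


def backup_only_meets_targets(targets, backup_interval_hours, step_hours):
    """Return True if a backup-only strategy fits inside the agreed targets.

    Assumption: a failure just before the next backup loses one full
    interval of data, so the interval is the worst-case data loss.
    """
    worst_case_loss_hours = backup_interval_hours
    total_rebuild_hours = sum(step_hours.values())
    return (worst_case_loss_hours <= targets.rpo_hours
            and total_rebuild_hours <= targets.rto_hours)


# Hypothetical durations for the rebuild steps described above.
rebuild_steps = {
    "install server from scratch": 3.0,
    "install and patch SQL Server": 2.0,
    "restore databases": 4.0,
    "reapply standard permissions": 0.5,
    "repoint application server": 0.5,
}

targets = RecoveryTargets(rto_hours=24, rpo_hours=24)
print(backup_only_meets_targets(targets, 24, rebuild_steps))
```

With nightly backups and a 24 hour RPO the check passes. Tighten the RPO to one hour and the same nightly backup no longer qualifies, however fast the rebuild is.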
This all raises the question of whether just taking your backups qualifies as High Availability or Disaster Recovery. To me, so long as you are able to meet the RTO and RPO, it absolutely does. It’s also going to depend largely on the data sizes you are talking about. When you are talking megabytes or low gigabytes you can probably get away with this sort of strategy. But when you are talking terabytes, just the time to retrieve and copy your backup to a new server will probably blow your RTO, let alone restoring it and hoping like hell it works right on the first attempt. It comes back to the process of establishing those RTO and RPO values. If everyone agrees on values that are very high, and they will actually admit they agreed to them if there really is a disaster, then using a standard backup strategy as your only form of HA/DR is fine. It’s something you should be doing even if you have yourself covered with the other HA/DR options, because it gives you a backstop should everything else go wrong.
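The effect of data size on the RTO is easy to put rough numbers on. The sketch below is back-of-the-envelope arithmetic only: the throughput figures and the function name `estimated_recovery_hours` are assumptions for illustration, and real copy and restore rates depend on your network, storage and backup compression.

```python
def estimated_recovery_hours(backup_size_gb, copy_mb_per_s, restore_mb_per_s,
                             fixed_overhead_hours=0.0):
    """Rough time to copy a backup to a new server and then restore it.

    fixed_overhead_hours covers work that does not scale with data size,
    such as building and patching the replacement server.
    """
    size_mb = backup_size_gb * 1024
    copy_hours = size_mb / copy_mb_per_s / 3600
    restore_hours = size_mb / restore_mb_per_s / 3600
    return fixed_overhead_hours + copy_hours + restore_hours


# Assumed throughput of 100 MB/s for both the copy and the restore,
# plus 4 hours of fixed overhead for the server build.
for size_gb in (5, 200, 2048):
    hours = estimated_recovery_hours(size_gb, 100, 100, fixed_overhead_hours=4)
    print(f"{size_gb:>5} GB: about {hours:.1f} hours")
```

Under those assumed rates a few gigabytes barely adds to the server build time, while a 2 TB backup adds many hours of pure data movement before anyone has confirmed the restore actually worked.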
Now, having said that, and this applies to everything I’ll be saying about the other technologies as well: test it. When you start putting together your backup strategy you need to think backwards. Think of the process you are going to go through to restore everything. Not just what you need, but in what order you are going to bring it back. And document all the steps, because it may ultimately not be you doing the process, and it should be a process that more than one person can pick up and run with.
One of the other speakers at SQL Saturday was Argenis Fernandez. I think the first blog post of his I ever read was about a disaster situation where the DBA was confident he had everything in place: regular backups, taken off to tape each night, with the tape stored securely in a safe offsite. Unfortunately he never tested the restore process. So when a disaster rolled around and he put the tape back in to grab the backup, the tape was completely blank. The magnetic lock on the safe had erased it. You get a lot of peace of mind from knowing that the process you are going to use to recover your system has been tested and works.
(EDIT – For the life of me I can’t find the post from Argenis I refer to above. So perhaps I dreamt it).
(EDIT 2 – Here I am 3 years later re-writing my presentation from 2013 and I found the post I incorrectly attributed to Argenis. It was actually written by Anthony Bartolo and can be found here.)