I’ve written a bit about various backup products for SQL, but I wanted to extend on that a little bit to focus on some of the specifics. In this post we will be talking about the division in responsibility for backups and restores between the DBA and the system administrators.
These two people are often the same, but in this post I will treat them as if they are two different people. At a minimum it is two different functions of their job, and I’m a believer that if you have to wear many hats you yourself should act as though you have two jobs. That way when your boss comes along and offers you another employee to help you out, you can just hand over one of your hats like it’s a pre-packaged job.
Despite the improvement of specialist products for taking SQL backups, I still have a strong preference that native SQL Server backups are used. If pushed I will say – the best backup solution is probably the one which the person doing the restore is most familiar with, which is a slightly different proposition, but I’m going to assume the person responsible for the backups should be the DBA, because they are the one most likely to be in the firing line if the database can’t be restored.
I try and simplify backups to something approximating the below:
- The DBA takes native SQL Server backups to a local disk. These backups are retained for X days.
- The system administrator is responsible for taking a copy of the native SQL backups from the local drive and copying them to long term storage, where they are retained for X2 days\months\years.
It then follows that if a restore is needed to a database server within X, then the DBA can do that directly from disk. If a restore point is between X and X2, the system administrator can restore the backup from long term storage back to the local drive, where the DBA can proceed with the install.
Then all that is required is defining X and X2, and scheduling in occasional test restores to random dates to prove the whole system works.
Now most people reading this will (I hope) be waiting for something interesting in all this. It’s an incredibly simple setup, very clear who is responsible for what and if a backup can ever not be restored – it should be very clear as to who has dropped the ball.
But here’s the thing, in my role I get to see an insight into multiple different SQL environments, and it seems even this very simple process can get horribly tangled up resulting in a whole lot of confusion, finger pointing, and ultimately data loss. The situation is complicated enough when one person is responsible for both jobs. WHen it’s two people it becomes a little murkier, and when it’s multiple people across multiple business units then it quickly sinks to a level where it’s really easy to find you don’t have a backup right at the most critical point – when you need a restore. Add in 1 or more external vendors and things can quickly spiral out of control.
Do you know your restore processes? If I was to ask for a restore of a given database to a given date could you tell me?
- Who I should actually be asking?
- What is the longest ago I can restore to?
- What points in time can I restore to? Real time? Nightly backups? Monthly archives?
- Who needs to authorize a production restore?
- Who needs to perform a production restore?
If all of those questions are not immediately clear, then I’m going to have to raise the D word – Documentation. Someone needs to take responsibility for documenting the backup and restore processes. They need to interview all the people we think are involved in the process and get a really clear understanding of what they are doing, what they think other people are doing and then where there are any mismatches.
As a job – it sucks, but it can be quite enlightening. And it’s better doing a job that sucks that averts a disaster, than not doing the job that sucks, and finding out it sucks not to have a job.