In my previous post on backups I talked about division of responsibility and documenting your process. In this post I want to talk about backup retention, how to decide what that retention should be, and how to get the most mileage out of your local disks.
I’m going to be talking about a simple backup process involving the DBA taking care of native SQL backups to disk, and the system administrator taking those flat file backups and moving them to long-term storage. I defined it in my previous post like this:
I try to simplify backups to something approximating the below:
1. The DBA takes the native SQL Server backups to a local disk. These backups are retained for X days.
2. The system administrator is responsible for taking a copy of the native SQL backups from the local drive and copying them to long-term storage, where they are retained for X2 days/months/years. (A sketch of what step 1 might look like in T-SQL follows below.)
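As a rough illustration of step 1, the sketch below takes a compressed, checksummed full backup to a local drive. The database name SalesDB and the D:\SQLBackups path are placeholders, not anything from a real system, and step 2 is simply a file copy of the resulting .bak files handled outside SQL Server.

    -- Step 1 (DBA): native full backup to local disk.
    -- SalesDB and the backup path are illustrative only.
    BACKUP DATABASE SalesDB
    TO DISK = N'D:\SQLBackups\SalesDB_FULL.bak'
    WITH COMPRESSION, CHECKSUM, STATS = 10;

    -- Step 2 (sysadmin): copy D:\SQLBackups\*.bak to long-term storage
    -- using whatever file-copy tooling is already in place.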
Once you have agreed on the above process, it should be a relatively simple task to work out what X and X2 are and then put it in place. You’ll notice the most technical I have got to this point is referring to ‘native SQL backups’ and ‘long-term storage’. I haven’t talked about full backups, differentials or log backups. I haven’t spoken about high availability or disaster recovery. The reason for this is that right now this is not a technical discussion, it’s still a business one. It’s really easy for us as DBAs to jump in and put in place a standardized recovery process, and in most cases that’s good enough. But what we actually need to do first is understand what the business needs for retention.
X and X2 are going to define your Recovery Point Objective – the point in time you could recover to. In other words, how much data you could potentially lose if you restore to a certain point in time: if your log backups run every 15 minutes, the worst case from backups alone is roughly 15 minutes of lost data. This is important, but it’s also only part of the picture.
A consideration that is regularly overlooked is the Recovery Time Objective – how long an application can be offline. This is critical and something you simply don’t know unless you perform semi-regular restore tests. The reason I say that is that if you have an RTO of 1 hour and your full backup takes two hours to restore, you will never be able to meet that target with backups alone. You simply can’t make restores run faster past a certain point. If your RTO is less than the time it takes to restore your backups, then you need to provide that availability another way – and that’s where the discussion of high availability needs to begin. We’ll dig more into that in a future post.
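One low-effort way to get that number is to restore the latest backup as a throwaway copy and watch the clock. A minimal sketch, again using the placeholder SalesDB names; the logical file names are assumptions you would confirm first with RESTORE FILELISTONLY:

    -- Check the logical file names inside the backup before restoring.
    RESTORE FILELISTONLY FROM DISK = N'D:\SQLBackups\SalesDB_FULL.bak';

    -- Restore a throwaway copy to measure how long recovery actually takes.
    -- Logical names (SalesDB, SalesDB_log) and paths are placeholders.
    RESTORE DATABASE SalesDB_RestoreTest
    FROM DISK = N'D:\SQLBackups\SalesDB_FULL.bak'
    WITH MOVE N'SalesDB'     TO N'D:\SQLData\SalesDB_RestoreTest.mdf',
         MOVE N'SalesDB_log' TO N'D:\SQLData\SalesDB_RestoreTest.ldf',
         STATS = 10, RECOVERY;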
Assuming that you can restore within the RTO in all your known scenarios, the only question left is developing a backup strategy to meet your RPO. The starting point for a lot of DBAs is a nightly full backup and 5, 10, 15 or 60 minute transaction log backups. This meets the most common restore scenario I have seen (most restores are within the past 24 hours in my experience). But it also plays havoc with the amount of data you are moving to long-term storage.
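The log backup half of that default strategy is a single statement scheduled every few minutes. A minimal sketch, assuming the same placeholder database (which would need to be in the FULL or BULK_LOGGED recovery model) and a made-up timestamped file name:

    -- Transaction log backup, scheduled (e.g. via a SQL Agent job) every 15 minutes.
    -- The timestamp in the file name would normally be generated by the job.
    BACKUP LOG SalesDB
    TO DISK = N'D:\SQLBackups\SalesDB_LOG_20240115_0915.trn'
    WITH COMPRESSION, CHECKSUM;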
I’ve recently been experimenting with much longer recovery chains. A monthly full backup, weekly differentials and log backups every 15 minutes has proven to be a good starting point on a lot of the bigger databases we support. These databases are often 95% or more data that never gets updated, and the updates that do happen are small and not that heavy. In one case we moved from a 254GB nightly full backup to this model, and the total combined size of a month’s differentials and log backups was 23GB. That allowed us a massive saving of disk space, with much faster restores to any point in time in the last month.
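The weekly differential in that longer chain is the same BACKUP DATABASE statement with one extra option. A sketch only, using the same placeholder names:

    -- Weekly differential: everything changed since the last full backup.
    -- Monthly fulls plus these stay small when most of the data never changes.
    BACKUP DATABASE SalesDB
    TO DISK = N'D:\SQLBackups\SalesDB_DIFF.bak'
    WITH DIFFERENTIAL, COMPRESSION, CHECKSUM, STATS = 10;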
My point is that it’s not the year 2000 any more; the GUI will build restore scripts for us, so we are not relying on manually crafting restore processes across 500 log backups – SSMS spits the code out for us. It’s no more work for you to hit F5 whether it’s restoring 10 log backups or 200.
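The script you get for a point-in-time restore is just the chain below, however many log backups long it happens to be. A hand-written sketch with placeholder file names and an arbitrary STOPAT time:

    -- Full, then the latest differential, then every log backup up to the target time.
    RESTORE DATABASE SalesDB
        FROM DISK = N'D:\SQLBackups\SalesDB_FULL.bak' WITH NORECOVERY, REPLACE;
    RESTORE DATABASE SalesDB
        FROM DISK = N'D:\SQLBackups\SalesDB_DIFF.bak' WITH NORECOVERY;
    RESTORE LOG SalesDB
        FROM DISK = N'D:\SQLBackups\SalesDB_LOG_0900.trn' WITH NORECOVERY;
    RESTORE LOG SalesDB
        FROM DISK = N'D:\SQLBackups\SalesDB_LOG_0915.trn'
        WITH STOPAT = N'2024-01-15T09:07:00', RECOVERY;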
The other approach to this is breaking archived data out into partitions or separate filegroups, so that you are only ever backing up the data that is actually changing. SQL Server’s BACKUP command is incredibly flexible, so get to know your data and design a plan that gives you the best bang for your buck rather than persisting with a generic plan across all your databases.
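As one example of what that flexibility looks like: once archived data sits on a read-only filegroup you can back it up once, then only ever back up the read-write portion. A sketch, with the filegroup name being a made-up placeholder:

    -- One-off backup of an archive filegroup after it is marked read-only.
    BACKUP DATABASE SalesDB
        FILEGROUP = N'Archive2019'
    TO DISK = N'D:\SQLBackups\SalesDB_Archive2019.bak'
    WITH COMPRESSION, CHECKSUM;

    -- Routine backups then only touch the data that actually changes.
    BACKUP DATABASE SalesDB
        READ_WRITE_FILEGROUPS
    TO DISK = N'D:\SQLBackups\SalesDB_RW_Partial.bak'
    WITH COMPRESSION, CHECKSUM;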