The below is an extract from my presentation at SQL Saturday 252 on high availability and disaster recovery. This post covers the basics of Always On Availability Groups.
So if you have mirroring you need to fail over each database, and sometimes you are going to have multiple databases servicing the same application and some will fail over automatically and leave you with databases in different places, attempting to join with each other and failing, and connection strings bouncing back and forwards between servers and generally….a big mess.
Wouldn’t it be nice if we could guarantee that if one database failed over, it took all of the other associated or reliant databases across with it. And wouldn’t it be nice if we could use a virtual name like with a cluster so we could point an application at one place and then the windows cluster, active directory and SQL server would do the donkey work? Well….the good news is if you have SQL Server 2012 then you can.
Availability groups are like mirroring, except well…better. You can have multiple databases in a group and guarantee you fail them over together, you can reliable report off any of those mirrors and you can use a single connection string and point it at a virtual name and let it decide which server it needs to action your query on.
You can have up to 4 secondary replicas as well as your read\write primary. You have the same options around whether you want any individual replica to be Synchronous or Asynchronous. You can specify a shared backup location so when you fail over between replicas your recoverability chain remains intact. You can take backups from replicas in copy only mode.
Initially….initially it looks like all the best bits of mirroring and clustering have been beautifully merged together to give us the HA\DR solution we have been waiting for. But…not quite.
SQL 2012 is the first crack at Availability Groups, and they are pretty good, but they still have a few things to sort out and are going to get better.
Let’s start with backups. If you take backups on different replicas you need to be aware that each instance only keeps backup history from it’s own msdb database. So while the backup chain itself will still be fine, you can’t query the history of backups on any individual instance, you need to write some smarts which looks at all the msdb databases and collates that information. Now obviously you can be taking copy only backups off your secondary replicas, but they are going to be copy only.
SQL 2012 is really built to be running on Windows 2012 server, but it’s not required. And there are a whole heap of considerations if you are using Windows Server 2008 R2. There’s bugs that are known(and you can do a google to find out what they are) running availability groups on Windows 2008 R2 purely because when Windows Server 2008 R2 was being developed the whole concept of “Always On” availability was just an idea in the back of some developers head. Mostly these are around the way and the times that the server quorums are calculated, but availabilities groups push windows clustering technology as hard….or probably harder….than any other application does, so there’s things being found all the time, and patches coming out all the time.
There are a lot of single points of failure. Is the listener accepting writeable connections. Is it sending read connections to the correct place. Is load being balanced between replicas? Are the replicas up to date. These are all new considerations that you probably aren’t actively monitoring for with your current environment. There’s a whole bunch of new moving parts in this setup and if one of them fails you are not necessarily going to know what and how if you haven’t worked out what needs to be monitored. At best you may find that everything is going to your primary replica so you aren’t getting any scaled out performance, at worst – you may not even get that.
There’s no combined statistics and DMV’s. So if you are doing your performance tuning you are going to look at a DMV and note a whole bunch of indexes not being used. But the reason they aren’t being used might not be that they aren’t good….it might be that you are offloading all of the queries that would be using them to a different replica. Likewise, a query might be performing badly and you go to troubleshoot it and find it’s incredibly fast on the server you test it on. But maybe that’s because of the load on the server the user was being directed to.
Basically using Always on Availability groups is introducing a more complex environment, and while not all the things above are bad things, they are things you need to be aware of and consider. The effective monitoring of an AOAG environment has different rules than monitoring a standalone server and the DBA needs to keep up with those. But change is what we are all about in IT right?