Category Archives: SQL Server

An Introduction to Always On Availability Groups

The below is an extract from my presentation at SQL Saturday 252 on high availability and disaster recovery.  This post covers the basics of Always On Availability Groups.

So if you have mirroring you need to fail over each database, and sometimes you are going to have multiple databases servicing the same application and some will fail over automatically and leave you with databases in different places, attempting to join with each other and failing, and connection strings bouncing back and forwards between servers and generally….a big mess.

Wouldn’t it be nice if we could guarantee that if one database failed over, it took all of the other associated or reliant databases across with it.  And wouldn’t it be nice if we could use a virtual name like with a cluster so we could point an application at one place and then the windows cluster, active directory and SQL server would do the donkey work?  Well….the good news is if you have SQL Server 2012 then you can.

Availability groups are like mirroring, except well…better.  You can have multiple databases in a group and guarantee you fail them over together, you can reliable report off any of those mirrors and you can use a single connection string and point it at a virtual name and let it decide which server it needs to action your query on.

You can have up to 4 secondary replicas as well as your read\write primary.  You have the same options around whether you want any individual replica to be Synchronous or Asynchronous.  You can specify a shared backup location so when you fail over between replicas your recoverability chain remains intact.  You can take backups from replicas in copy only mode.

Initially….initially it looks like all the best bits of mirroring and clustering have been beautifully merged together to give us the HA\DR solution we have been waiting for.   But…not quite.

SQL 2012 is the first crack at Availability Groups, and they are pretty good, but they still have a few things to sort out and are going to get better.

Let’s start with backups.  If you take backups on different replicas you need to be aware that each instance only keeps backup history from it’s own msdb database.  So while the backup chain itself will still be fine, you can’t query the history of backups on any individual instance, you need to write some smarts which looks at all the msdb databases and collates that information.  Now obviously you can be taking copy only backups off your secondary replicas, but they are going to be copy only.

SQL 2012 is really built to be running on Windows 2012 server, but it’s not required.  And there are a whole heap of considerations if you are using Windows Server 2008 R2.  There’s bugs that are known(and you can do a google to find out what they are) running availability groups on Windows 2008 R2 purely because when Windows Server 2008 R2 was being developed the whole concept of “Always On” availability was just an idea in the back of some developers head.  Mostly these are around the way and the times that the server quorums are calculated, but availabilities groups push windows clustering technology as hard….or probably harder….than any other application does, so there’s things being found all the time, and patches coming out all the time.

There are a lot of single points of failure.  Is the listener accepting writeable connections.  Is it sending read connections to the correct place.  Is load being balanced between replicas?  Are the replicas up to date.  These are all new considerations that you probably aren’t actively monitoring for with your current environment.  There’s a whole bunch of new moving parts in this setup and if one of them fails you are not necessarily going to know what and how if you haven’t worked out what needs to be monitored.  At best you may find that everything is going to your primary replica so you aren’t getting any scaled out performance, at worst – you may not even get that.

There’s no combined statistics and DMV’s.  So if you are doing your performance tuning you are going to look at a DMV and note a whole bunch of indexes not being used.  But the reason they aren’t being used might not be that they aren’t good….it might be that you are offloading all of the queries that would be using them to a different replica.  Likewise, a query might be performing badly and you go to troubleshoot it and find it’s incredibly fast on the server you test it on.  But maybe that’s because of the load on the server the user was being directed to.

Basically using Always on Availability groups is introducing a more complex environment, and while not all the things above are bad things, they are things you need to be aware of and consider.  The effective monitoring of an AOAG environment has different rules than monitoring a standalone server and the DBA needs to keep up with those.  But change is what we are all about in IT right?

 

 

 

 

Mirror, mirror….

The below is an extract from my presentation at SQL Saturday 252 on high availability and disaster recovery.  This post covers the basics of SQL Server Mirroring.

Like log shipping, mirroring is something that you are going to set up at the database level.   That means that if you change anything outside the database on the SQL instance, you are going to need to make the change on both sides of the equation.  So if you add a login on the principal then you need to manually make the same change on the mirror.  Incidentally you’ll notice there’s a fresh set of jargon with each new technology.  Logshipping has it’s primaries and secondaries.  Clustering has it’s active and passive nodes.  Mirroring has it’s principals and mirrors and replication has it’s publishers and subscribers.  Availability groups continue the pattern with primary and secondary replicas.

But anyway – we’re talking about mirroring.  So what mirroring is doing is redo-ing every insert, update or delete that happens on the principal server on another server.  So you are increasing your availability by effectively having two copies of the data which are pretty close to exactly the same.  I say pretty close because there’s a couple of ways you can set this up.

Firstly you can set it up as a synchronous mirror.  With a synchronous mirror every transaction that happens on the principal is written into the log on the principal, then moved over to the mirror, written into the log and committed on the mirror, then it sends an acknowledgement back to the principal to say the transaction is committed and only then does it actually commit the transaction and notify the application the transaction is successful.  It’s also known as High-Safety because your mirror is always going to hold all the data that the principal has….sometimes more infact.

Now if you are mirroring in high safety mode you can also add another server into the mix here – a witness server.  When you have a witness and your mirroring is configured in high safety mode you have a situation where automatic failover is available.  So, if something goes wrong on the principal, it’s automatically going to fail over to the mirror.  The principal now becomes the mirror and the mirror the principal.

That’s cool from the database point of view, and it’s nice to see that happen….when it’s meant to…and being able to think what a clever chap you are for keeping the data available, but it’s not like clustering, you aren’t pointing to a virtual server name that just knows where the actual data is sitting.  You need to make sure the applications connecting to the database have failover connection strings, so they know that if the principal server isn’t there, then they can try to get the data from a different server.  You also need to be pretty sure that all your logins and any agent jobs, maintenance plans and backups can all handle the state of the database.  So does your backup check to see if the database is online before it kicks off, or does it just fail?

Now, all that is available in the standard edition of SQL Server, and gives you a nice high availability option.  But if you spend your pennies on the Enterprise version….you can also have Asynchronous mirroring.  What does that do?  Well basically the same thing, it’s just that instead of committing on the mirror before it will commit on the principal, SQL just goes ahead and writes your transaction locally then fires off a copy of it to the mirror.  It doesn’t care if you have committed to a mirror, it just wants to get on with writing things locally and returning responses to the application.  You don’t have a witness involved and there is no automatic failover.  So….it seems like this is worse right?  You are paying Enterprise licensing for something that doesn’t give you an equal level of availability?  That seems like a pretty poor deal, but what you do buy is….performance.  Because there is no latency on the write across the network your application is going to perform much much faster.  And that’s why the other name for this is High-Performance mode.

There’s also some nice bonuses to mirroring.  In 2008 and higher….and we are all on 2008 or higher right?…..you have Automatic page repair.  So if there’s an attempt to read a page on the principal and it finds that page is bad it will go over to the mirror and try and get a clean copy of the page and fix the principal then send that info back to the application.

Also introduced in 2008 the data stream was compressed which lowers the amount of data being sent both ways, and as a result lowers the latency of the transaction.

And you can report off the mirrored databases.  CAN being the important word here.  You do this using snapshots on the mirror side and snapshots of course require enterprise edition.  It’s got some serious drawbacks around the point in time nature of the reporting, the database needing to be synchronised before you can do the snapshot.  I’m not going to talk about it too much more because I’ve never seen an implementation of it I’d be happy to call a reporting solution.

On the minus side, there’s implications to doing this in the SQL log and you need to do some planning around log sizes and physical structure.  And there’s network issues, particularly around High Safety.  You are sending a lot of information across the network and you want it to be pretty close to realtime.  If you are using High-Safety then you aren’t going to get a response to your application until it’s committed on the mirror as well as the principal, so if there’s lag, be it from limited hardware or just purely from volume of data being transferred, there’s a relatively high chance of a time out.  This can be mitigated by having good hardware, good network speeds and applications having good code.  If your lucky you may even be able to influence one of those things.

And an important point – Mirroring is deprecated.  That’s different from discontinued – it’s still there, it’s just on the chopping block.  In a future version it will be discontinued and we will be using Availability groups instead.

An Introduction to SQL Clustering

The below is an extract from my presentation at SQL Saturday 252 on high availability and disaster recovery.  This post talks about SQL Server Failover Clustering.

This post is going to cover the basic consepts of Windows Clustering.  So, while logshipping is definitely a disaster recovery strategy, clustering is all about availability.  With clustering you have two (or more) windows servers and they form a windows failover cluster.  On top of that you are going to install a SQL Server failover cluster.  When you do this it needs to be specifically installed that way.  So with the other technologies we are talking about you always have the option of saying later – let’s add mirroring, or let’s add some logshipping to that.  You don’t have the ability to do that with clustering – you need to plan it from day one.  If you have an existing SQL server and you want to cluster it then you are basically looking at installing the cluster and doing a server migration.

Clusters are much easier to setup now than in previous versions of SQL, but there’s still quite a bit involved.  You start with two(or more) windows servers and you form a Windows Failover Cluster with them.  Then you assign some shared disk which is accessible to all the nodes of the windows cluster.  After that you install a SQL clustered instance and that becomes the owner of the disk resource you added.  When you do this you have to configure a few things at the active directory level – a cluster name, some IP addresses and a handful of permissions.  You can then add as many other sql nodes as your SQL edition allows.  Now from SQL 2005 you can have a 2 node active passive cluster with Standard Edition.  In 2000, and in SQL 7 you needed Enterprise.  There’s still a good advantage to using enterprise cluster, because you can have many more nodes in your cluster.

With a cluster there is a single set of data which is stored on the shared disk.  If one node fails, or if you manually fail over to another node then the disk is detached from the current node, attached to the new node and the SQL Service is automatically started.  This has a whole bunch of advantages.  First of all, you have a virtual instance name.  You point the application at that and then it doesn’t matter which actual windows server owns the disks and the sql service – active directory and the windows failover cluster sort that out and direct your application to the right place.  It also means that if you need to schedule downtime on an instance – for patching as an example – you just move the service to a new node.

This is increasingly important as people look to consolidate their databases on bigger and bigger SQL Servers.  When we are talking 64 bit architecture, multi-core processors and high end SAN back ends the limitations around hardware are far less than what they have been previously, so you can have a single SQL Instance supporting multiple high usage applications.  So it’s more and more common to see these big general usage SQL Servers.  But you know what they say about great power…….the server responsible for this increased load is increasingly important.  It becomes more difficult to schedule downtime because you need to co-ordinate that downtime with multiple different application owners or business units.

I have a little professional disagreement with a colleague over this.  His view is that if you have to patch 20 servers supporting one application each and need 1 hour of outage time to do that patch then you need to arrange 20 hours of outage.  If you have them all on the same server there’s still 20 applications to arrange an outage with so you still have to arrange a 1 hour outage with 20 different applications.  That’s true – but it’s also not the point.  The point is that those 20 outages need to be AT THE SAME TIME.  And that’s where it gets tricky, because in order to load balance your business use, and your data loads and your backups and….all the other scheduled stuff that happens to your server, you will have spread the load evening across the hours in the day.  I hope.  So there’s always likely to be something going on.

At any rate, adding a cluster gives you 2 big benefits here.  Firstly there’s the obvious failover capacity if a disaster happens on a single server, but additionally your outage window now need only be the failover times, and believe me it’s easier to sell a 1 minute ‘micro-outage’ than a 1 or 2 hour outage.

How to resolve “No Global Profile Is Configured” issue in SQL Server

One of the reasons I’m not a huge fan of Maintenance Plans is that they do everything in a very set way.  My biggest gripe around this remains around lack of smarts in optimisation of indexes but I recently came across another annoyance and that is that the “Notify an operator” task relies on a public or global operator.  While you can quite happily send an email with TSQL by specifying a private mail account, if there is no public account this task will fail and you will get a message like below:

Msg 14636, Level 16, State 1, Procedure sp_send_dbmail, Line 94

No global profile is configured. Specify a profile name in the @profile_name parameter

The error gives you most of what you need to know to figure this one out, although not clearly enough for my liking.  It tells us the problem is in the procedure that sends dbmail, so we know it’s a database mail issue.  And it tells us that there is no global profile defined so we should specify a value for @profile_name.  Well….yes I would do that….if I was using TSQL.  But I’m not, I’m using a maintenance plan.  So I have two options.

  1. Instead of a notify operator task, I can use a TSQL task and write a piece of custom tsql to send myself a message, in which case I can specify my private profile.  Or…
  2. Define a new default profile or change an existing private profile to be the default.

Now, in my mind a tsql task is a fine option because you can do a few additional things which can’t be done with a notify operator task(like attach a file such as the error log for example) , but it can be a pain cutting and pasting the TSQL across all your various notifications.  So if you want to resolve the error all you need to do is:

1.  Open database mail configuration by logging into SQL Server Management studio and expanding Management and right clicking on DatabaseMail, then choosing configure:

Public Mail Profile 1

2.  Choose Manage Profile Security.

Public Mail Profile 2

3.  Configure your profile to be the default profile:

Public Mail Profile 3

There’s no need to restart anything, your maintenance plan should now send email fine.

How to Gain Emergency Sysadmin rights to a SQL Server

As a SQL Consultant there have been a number of times where I have had to gain ‘emergency access’ to a SQL Server installation.  This can be because a previous DBA has not documented login details, a server has always been around and no-one is sure exactly what it does, or because all staff with administrator rights have left the company.  On several occasions I have had to resolve an issue where a junior DBA or staff member has removed sysadmin rights from the BUILTINAdministrators group without realizing that this was the very group they were gaining their sysadmin rights from (Yes, SQL Server let’s you drop your own sysadmin rights, and once you’ve done it you can’t undo it under your own credentials!).

Regardless of the reason if you need emergency access to a server, there is an easy way and a hard way.  There’s also a really easy way, but it does rely on the stars being aligned in a certain fashion, or more precisely the server being configured in a particular fashion.

The Really Easy Way

   In SQL Server 2005 the BUILTINAdministrators group was automatically assigned full system administrator rights within SQL Server.  This was considered something of a security hole and in most cases the rights were removed by the DBA’s looking after the servers.  However it’s possible the SQL instance you are looking at is a default install of 2005, has been upgrade from a default install, or has had the logins transferred from one.  So the first thing to try is simply to add your account in as a local administrator on the windows server and attempt to log in to SQL Server.  You might get lucky.

The Easy Way

If the really easy way fails, you are left with the easy way and the hard way.  Let’s start with the easy way, because it’s…well…easier.  The downside of the easy way is that it requires a server restart.  If you can’t restart the server then I’m afraid you’ll have to do things the hard way.

The easy way is documented in this Microsoft article and simply involves restarting the SQL Server in single user mode and then logging in.  When logging in to single user mode any member of the local administrator group on the windows server will also have system administrator rights within SQL Server.  It’s basically a more secure method of the BUILTINAdministrator method described above.

The Hard Way

Finally we come to the hard way.  The hard way makes use of the fact that the NT Authority System login is automatically given administrator rights in a SQL instance.  Again this can sometimes be removed by a protective DBA, but there’s less reason to than with BUILTINAdministrators as this method is far less well known and also requires a deliberate attempt to gain sysadmin rights.  You can’t accidently follow this process whereas there are a number of reasons for adding a login to the local administrators group.

Let’s begin:

  1.  Open up notepad and enter the following text.

 sqlcmd.exe -S MYSQLSERVER -q “Exec sp_addsrvrolemember ‘BUILTINAdministrators’, ‘sysadmin’;”

pause

‘MYSQLSERVER’ needs to be replaced with the name of your SQL instance.  Now save this as type ‘all files’ with an extension of ‘.cmd’.  You’ve just created a batch file that if run will add the group ‘BUILTINAdministrators’ to the system administrator role.  Of course, if you run that file it will fail, because you don’t have the rights to do that just yet.

  1. So, to run it as NTAUTHORITY SYSTEM we need to make use of the ‘AT’ command.  Schedule a task to run your file by doing the following:
      1. Open a command prompt(cmd at the run menu).  You may need to do this as an administrator.
      2. Type the following:  at 16:30 /interactive “C:tempsqladmin_restorer.cmd” – Obviously substitute the path and filename for your own file.

2.  This job will pop up on your screen at 4:30pm letting you know the batch file has run.  The BUILTINAdministrator should now be a member of the sysadmin role, and anyone set up as a local administrator will have full sql admin access.

All three of these methods do rely on you having local administrator rights on your SQL Server machine.  The first 2 methods are built in functionality from Microsoft.  It’s only the final method which is a true workaround.

 

An Introduction to Log Shipping

The below is an extract from my presentation at SQL Saturday 252 on high availability and disaster recovery.  This post covers the basics of SQL Server log shipping.

Log shipping is the natural extension of relying on backups alone, because it’s a process which automates the restoration of your backups.  It’s basically a restore of your full backup on the day you establish log backups followed by a series of log backup restores.  So on your primary server you are going to take log backups at a set interval(the default is every 15 minutes) and your secondary has a job which will periodically check for any new log backups and copy them from the primary to the secondary server.  A third job will then look at the settings and restore any appropriate transaction logs.  So we are achieving a few things here.  Not only have we set up a warm standby of the production server which is up to date within 15-30 minutes, we are also actively confirming our log backups are usable, because if they are not log shipping will break.

One of the things that makes log shipping such an attractive DR option is that it is incredibly easy, and in most cases FREE!  Why do I say most cases?  Well, you basically have a passive SQL license available to you for each licensed instance, provided that it is a purely passive server for disaster recovery purposes.  I’ll throw out my standard disclaimer that I’m not Microsoft, and don’t sell their licenses for them, so check with your licensing guys, because there are a few gotchas.

For all it’s simplicity, log shipping gives us a lot of flexibility.  Firstly, you can logship to multiple servers.  Secondly you have the option to restore the database in standby mode which allows you to report off it.  Thirdly you can put in a load delay, so the secondary won’t load logs until, say, 8 hours have passed(You can configure this entirely to your own requirements).  And fourthly – you can combine all the above things together to suit both your disaster recovery and business requirements.  So you can have one production database taking it’s backups each 15 minutes.  Then you can have a pure DR secondary which restores a copy of the database and is the target for a failover.  Then add another secondary on your BI server and set that up so it’s only running restores at 8 o’clock in the morning, but doing it in read only so it can act as a data source for your BI reports.  So BI is now accessing all the data in your prod database, up to the current day – and it’s not messing up production.

And because you have a load delay, when your rogue developer hits F5 in the wrong window and updates the account balance of everyone to $100(because he forgot he commented the where clause and he thought he was on the dev box anyway) – well, you can bring that standby online and grab that table data and take it back to production.

You can even logship to the same server as the production database, because logshipping allows you to rename the destination database to whatever you want.  Why would you do that?  Well it allows you to avoid locking issues by reporting off a second database.

There’s a few gotchas though.  Firstly log shipping relies on your recovery chain being intact.  If someone takes a single log backup that isn’t available to the secondary then logshipping is going to break.  Remember that log shipping is just the process of going through a regular restore.  If you lose a log backup out of your regular recovery chain then you can’t restore further.  Of course, personally I’d prefer find out some rogue 3rd party application is taking random log backups by log shipping falling over than in a live disaster situation.  Setting up log shipping has actually revealed exactly this issue on more than one client site I’ve dealt with.

Secondly – if you want to use the secondary server to report off then it is considered a production server for licensing purposes.  You can also have only one passive standby license.  So while you can theoretically have as many logshipped secondaries as you like, after the first one you are paying for them all.

Are Backups Valid DR?

The below is an extract from my presentation at SQL Saturday 252 on high availability and disaster recovery.  In this section we talk about the basic process of backing up databases and whether this is enough to meet your requirements.

Let’s leave aside all the cool options you get natively and talk about something more mundane.  Backups.  Regardless of what else you have in place, you still want to be doing your backups regularly.  If everything goes completely pear shaped, if all your best laid plans fall apart, so long as you have a native SQL backup then you still have your data.  Without all the other options we are going to be discussing you can still recover from a serious outage if you have a secured backup.  There might be some trial and error if you haven’t documented users and settings, it might take some time and whenever you resort to restoring from backup you are making a call on acceptable data loss – but you haven’t lost everything.  You can recover.

To make it easier on yourself there’s other things that are critical to the application using the database you should be documenting and regularly scripting as well.  What version and patch level is the server? Are the logins being scripted?  What about agent jobs?  Linked servers?  SQL config settings?

I’ve come across businesses whose only recovery options was from their scheduled backups and they were not interested in any additional options.  Sometimes it’s a little surprising considering the nature of their businesses.  Some are financial and any dataloss would be measured in actual dollars, actual transactions being unrecorded and sold items not being sent to customers who would probably not be too happy about that.  One in particular springs to mind where the company had a defined and documented RTO and RPO.  It was backed by SLA’s around those RTO’s and RPO’s and they could take a day of downtime and lose up to 24 hours of data and still meet those targets.  In a disaster they had a documented build process and they would simply install(from scratch) a new server, install and patch SQL, restore the databases, reapply their standard permissions and repoint their application server.

This all raises the question of whether just taking your backups qualifies as Higher Availability or Disaster Recovery.  To me…..so long as you are able to meet the RTO and RPO then it absolutely is.  It’s also going to be largely dependent on the data sizes you are talking about.  When you are talking megabytes or low gigabytes you can probably get away with this sort of strategy.  But when you are talking terrabytes just the time to retrieve and copy your backup to a new server will probably blow your RTO, let alone restoring it and hoping like hell it works right on the first attempt.  It comes back to the process of establishing those RTO and RPO values.  If everyone agrees on values that are very high, and they will actually admit they agreed to them if there really is a disaster, then using a standard backup strategy as your only form of DR\HA is fine.  It’s something that should be done even if you do have yourself covered with the other HA\DR options because it gives you a backstop should everything else go wrong.

Now having said that…….and this applies to everything I’ll be saying about the other technologies as well…..test it.  When you start putting together your backup strategy you need to think backwards.  Think of the process you are going to go through to restore everything.  Not just what do you need, but what order are you going to bring it back.  And document all the steps, because it may ultimately not be you doing the process, and it should be a process that more than one person can pick up and run with.

One of the other speakers at SQL Saturday was Argenis Fernandez.  I think the first blog post of his I ever read he was talking about a disaster situation where the DBA was confident he had everything in space.  Regular backups – taken off to tape each night – tape stored securely in a safe offsite.  Unfortunately he never tested the restore process.  So when a disaster rolled around and he put the tape back in to grab the backup from – the tape was completely blank.  The magnetic lock on the safe erased it.  You get a lot of peace of mind from knowing that the process that you are going to use to recover your system has been tested and works.

(EDIT – For the life of me I can’t find the post from Argenis I refer to above.  So perhaps I dreamt it).

(EDIT 2 – Here I am 3 years later re-writing my presentation from 2013 and I found the post I incorrectly attributed to Argenis.  It was actually written by Anthony Bartolo and can be found here.)