Author Archives: Rob

About Rob

http://gumption.co.nz/about-me/

TSQL Tuesday – SQL 2022 Planning for the Plan to Fail

If you fail to plan, then you plan to fail. But sometimes your plan sucks and you fail anyway. In these situations, a plan B is nice to have up your sleeve.

Glenn Berry has invited us to talk about our favourite feature in SQL 2022 for TSQL Tuesday this month and for me it’s all about the extra information in Query Plans and the prospect of getting some of my clients busier workloads experiencing the gains from Built-In Query Intelligence.

The query plan is the critical secret SQL sauce in delivering data back to a user as quickly as efficiently as possible. Well, nearly as quickly and efficiently. Before SQL 2022 various forms of black magic were involved in forcing compiled plans to perform consistently. The query optimizer didn’t plan to fail, but if the parameters passed weren’t what it expected it often would, sometimes horribly. In SQL 2022 we have the addition of a Plan B.

Built in query intelligence is a real gamechanger for consistent performance of query plans. And it all starts with a very little change that I haven’t actually heard mentioned too much:

Query store turned on by default

The Query store is incredibly powerful, and simultaneously surprisingly underused in the wild. This is mostly because by default it’s not on, but that changes in SQL 2022. The simple fact that, as a consultant I can come to a new server and already have Query Store available means that the information I can feed back to a client before making any change in the environment is greatly enhanced.

And then the planning got better.

The classic parameter sniffing scenario starts with either a cached plan for an atypical portion of a data set, or of a dataset that represents very disparate usage scenario’s. A classic example would be a user data set with a mix of users from large and small towns and cities. The typical usage may reference users in Capital-City but the first execution is run against a user from Little-Shite-On-The-Mire and we get a seek cached instead of a scan. All subsequent queries run with that plan and server performance is abysmal.

The ability to have a plan cached for both scenario’s is huge and is an example of my favourite type of performance enhancement in SQL Server. You don’t need to turn stuff on or off, you don’t need to change code, you don’t need to modify user behaviour and the overhead is small\insignificant. Just upgrade to SQL 2022, set your compatibility mode correctly and stuff will run faster and more consistently. I love when discussing SQL upgrades with clients being able to discuss enhancements in the way queries are processed because it really does give a compelling reason for them to stay current when they don’t have huge resource to push into configuring and maintaining new features, but just want to know SQL will keep doing what it does, only better.

Add in enhancements to reading from Availability group replicas, hints in query stores, Cardinality Estimator Feedback in plans, and feedback and learnings being persisted between failovers and restarts and I’m really looking forward to getting some of the heavier workloads running against SQL 2022 and using the new toys we’ve been given.

Will the Basic Authentication change Break Database Mail?

The prospect of Microsofts upcoming deprecation of basic authentication has lead to a bunch of questions around what impact it will have for people using database mail. I thought I’d take a moment to record my understanding in the hope it saves some people out there some stress.

Database Mail should not be effected by this change. It uses SMTP authentication which Microsoft has explicitly stated will not be turned off.

Microsofts exact wording is “Please note this change does not affect SMTP AUTH – we will continue supporting Basic Authentication for the time being.  There is a huge number of devices and appliances that use SMTP for sending mail, and so we’re not including SMTP in this change – though we are working on ways to further secure SMTP AUTH and we’ll share more on that in due course.” which is pretty clear. The confusion I think comes from people who work with databases having no idea how to differentiate what is SMTP auth(Using basic authentication) and what is Basic Authentication.

Another thing to be aware of is that some of this work has already happened and not all of it will happen all at once on 1st October. By default basic authentication is not available in new tenancies and SMTP authentication is disabled. For existing tenancies basic authentication has been disabled if it is not detected as in use, and finally the change is slated to happen “from October 1st”. What that means is that October 1st is not necessarily a drop dead date and things might continue to operate after that depending how far down the change list you are. This is an important distinction for people planning to check that everything is working on October 1st-2nd(A weekend) to judge whether they have covered all the bases. Unfortunately that will not work as a strategy, you should have everything sorted by October 1st, but changes may not hit your tenancy until later in the month.

It Depends

Many, many years ago, I started a blog. It was mostly so I could write some bits of code and findings down because I have a memory like a gold fish and I figured having a record of some of the issues I encountered, and the ways they were overcome might be useful. I thought maybe it was worth sharing this with the world rather than keeping it to myself too, because you know….the SQL community is awesome about telling me the answers to my questions, and if jotting down a few things helps someone else out there solve the problem they are baffled by then…mission accomplished. Because I was a junior DBA when I started my blog, and because I had a habit of asking questions looking for a black and white answer and never received it, I also called my blog “It Depends”.

Many many years later Debra Melkin happened to choose “It Depends” as the title for her TSQL Tuesday blog invitation, and while I was a couple of weeks late to the party, I thought it was a good chance to revisit what is all to often a standard response to any question in the DBA world.

So way back in 2011 Paul Randal wrote a post that stated something along the lines of “The answer to every question in SQL consulting is ‘It Depends’, unless the question is ‘should I enable autoshrink?’ in which case the answer is ‘no‘.” What is so powerful about that post (and to be honest the SQLSkills approach to consulting in general) is that they don’t use “it depends” as a conversation ender – it’s a conversation starter. It turns a question searching for a black and white answer into a list of more targetted questions to find out what the right answer is for you, and for your particular base.

Despite actually naming my blog “It Depends” I try hard not to use the phrase itself, but I do find myself answering questions that could be answered with “It Depends” with 4-5 questions which allow me to give a targeted answer that fits the clients specific environment and acknowledges the business drivers rather than focusing on the specific technical answers to questions.

And with this in mind I (finally) get to my response to Debra’s invitation. She asks specifically about times someone asks a question about SQL which started off with an “It depends” redefinition of the question to find out what the thing they are actually asking about or trying to achieve is, and leads into a rant that may or may not be related. So here’s my rant.

“Your technical skills are not the point.”

This is both my biggest realization and biggest disappointment in my technical career. I think all people who reach the highest level in technical roles have to cross this bridge at some point. In general we cross the bridge with regret and more than a little reluctance, but the sooner we can cross it and get on with our career growth the better. You see for the first part of your career there is a massive wall of technical knowledge that we find ourselves battering against, constantly trying to get technically more and more proficient. But at some point, either independently or because someone higher up the food chain kindly takes us aside and explains it to us, we come to a realization that being technically brilliant is far less important than working our what it is that the business is trying to achieve.

Let’s use a question about availability groups as an example. “Would this work with availability groups?” is the type of question I see asked from time to time. The technical answer can be “yes” or “no” and too often that’s the answer I see given by junior technical people. And they are right, but the problem is they’ve answered the question presented to them instead of working out what question they should have been asked. Why are you asking me about availability groups? Do you want to report off this? Do you want it to be highly available? What is it that you actually want to achieve?

I see business people frustrated with IT slowing them down instead of empowering them. I see technical people frustrated with Salespeople selling things that can’t exist because there are technical limitations and I see developers cobbling together a solution that meets an immediate requirement but then hamstrings the performance of key business systems for years to come. And the reason for all these problems is usually the same. A business question is asked as a technical question, and the technical answer is misinterpreted as a business answer. In other words, people are speaking the language of their role, and no-one is translating.

That is why when you ask a junior if you can do something they will often say “Sure” and they’ll make a change and suddenly you can do the thing you wanted, but when you ask a more senior person the same thing you get met with a seeming wall of questions back.

IT systems are complex, and throwing one switch can often have a string of unexpected consequences. That’s why the more senior a person the more questions you’ll likely get asked before they take any action taken. But business systems can be even more complicated, and technical people need to be aware that ultimately they will not be judged on how perfectly a system is set up, but on how perfectly it meets business requirements and expectations.

I wish it wasn’t so. I wish when a business person asks me something they understood the technologies they were talking about and asked a correctly structured technical question. But the reality is that being the ‘translator’ is often the most critical role that can filled in any project. Getting the tech people to understand the business need and the business people to understand the technical possibilities is a skill that is both rare, and in the end very lucrative. Even if we wish it wasn’t so.

The Dangers of Incorrectly Set Default Database

I struck an interesting problem this week when migrating a database. Apparently the database was my users default, which immediately kicked me out of the server and prevented me logging back in. This is the story of how it happened and the steps I needed to take in order to resolve it.

It wasn’t a dark and stormy night, but considering what was about to go down it probably should have been. Typical weather really.

I was migrating the final database off and old SQL 2012 server. It was the final database on this server which we began migrating databases off over 18 months ago, so I was pleased to see it go. The database itself was tiny, so my part of the migration was just to perform a backup restore migration to the new server. The full backup ran fine in a couple of seconds, but when I ran the tail-log backup it completed and I immediately got disconnected.

I thought that was odd, but proceeded with the restore on the new server which went fine. We were using a CNAME redirect to catch any connection strings we didn’t know about, so the old server was renamed as part of the process. I asked the server engineer to hold off his rename and CNAME steps while I investigated the disconnect.

I quickly found I couldn’t log in with either my own account or the service account our application connects to SQL. Both received the same error.

Cannot open user default database. Login Failed. Login failed for user ‘xx’.

You shall not pass.

So, we have provided support for this client for a while, but the server predates our companies support contract. However what it looked like to me was the group containing our logins had it’s default database set to the database I’d just taken offline. Annoying, but not insurmountable. I asked one of the client testers I knew had security permissions on the server to log in and flip our groups default database back to master. But she got the same error.

Things were suddenly looking dark. I quickly completed the migration steps on the target server and we did the server rename and CNAME switch to the testers could test the successful migration, but what should have been me sitting quietly for an hour while they completed testing quickly became a mad flurry of trying everything to get access into the old server should we require a rollback. Now I wasn’t too concerned, we did have a snapshot taken prior to the change that was our sledgehammer rollback, but I didn’t know what was going on and wanted to understand.

There was a simple fix I should have done at this point, which is just to change the database to master in my connection string. When I did that it failed. I don’t know why, and haven’t been able to recreate that failure, which leads me to think that when I changed the database it didn’t ‘stick’ and I only tried it once, assumed it failed and moved on to find another solution. A lot of time should have been saved here by doing that correctly. For the record that is on the options->connection properties screen in SSMS. All you need to do is type in ‘master’ as per below:

THE SOLUTION – The “Connect to database” will usually show ‘<default>’ – and ‘<Browse Server>’ will fail because you can’t login to your default database to get a database list. Manually typing in ‘master’ will save the day.

That should allow you to log in and update the default database of users who can’t log in. It also shows why trying the same thing twice is sometimes NOT the definition of insanity, because you may have done it wrong the first time, particularly if you are restoring databases in the other window.

At any rate – assuming the direct connect with the correct default database was still failing we tried and failed the following steps:

  1. Access the server via “sa” – only to find the “sa” password was not recorded in the client password vault.
  2. Access the server via various other Windows accounts – All failed with the same default database error.
  3. Access the server via other SQL accounts – These succeeded but none had the credentials to add much to the troubleshooting or remediation.
  4. Force admin access via single user restart – This also failed with the default database issue.
  5. Force sysadmin access via DBATools. This is a really cool feature of DBATools, but in this case failed with an error that didn’t provide much context on the problem(I suspect it was the same default database login failure causing the issue).

So what we knew at this point was that all windows logins failed, while all SQL logins succeeded. I was beginning to suspect I knew the nature of what had happened. Some account that windows logins were a member of had been added with the default database set to the one I had just dropped. But how to prove it? And how to fix it?

I took the following steps:

  1. Create a local windows user.
  2. Add that user to the servers local administrator group
  3. Restart SQL in single user mode with access limited to SQLCMD connections
  4. Run my SQLCMD script to create a SQL login in the context of the new local windows account
  5. Let out a huge sign of relief when that worked.
  6. Restart the server in multi user
  7. Log in with my shiny new SQL sysadmin account and start the troubleshooting.

Once I had sysadmin access back troubleshooting took only a few seconds. Sure enough the DOMAIN\USERS account had been added to the server. It had no specific rights on anything but had the default database set to the database we’d just taken offline. I flicked that across to “master” and everyone was immediately able to log back in again.

All of this took around 1/2 hour to work through, and ultimately didn’t have any impact on the migration except for a couple of minutes delay before I handed the database over for the rename. I was able to do my troubleshooting in parallel with the application testing, so ‘no harm, no foul’ right? Well, yes. But it could have been so much worse if this wasn’t the last database we shifted off the server. This was an incredibly dangerous setting. If we had moved this database first instead of last the simple act of taking it offline would have taken every other application on the server offline.

Hopefully, if you find yourself in the same situation the information in this post will save you some time both diagnosing and resolving the problem. I’ve definitely added checks for Domain\Users accounts into my pre-migration checks, and while I don’t ever expect to see this scenario again, if I do it’s nice to know the fix is 10 seconds and 2 clicks away on the login screen.

  1. Be aware of the Domain Users group when making changes in a production environment.
  2. When troubleshooting – keep notes of all the bits you tried – because you may make a solution way more complicated than it actually needs to be

NT SERVICE\SQLSERVERAGENT won’t accept password – SOLVED

I was patching an old SQL 2016 instance this afternoon and post patch the SQL Agent didn’t start. I took a config screenshot and flipped it over to local system and it started fine. Shouldn’t have been a problem right, because the NT SERVICE\SQLSERVERAGENT account is a managed account so you just leave the password blank? Of course it was not going to be so easy.

The SQL configuration manager would not accept a blank password – even though the account doesn’t need to authenticate(It’s a service!) the SQL configuration manager requires a password to start the sql agent and finalise the change. When I try to enter that it fails. Repeatedly.

Invalid Parameter [0x80041008] when applying NT SERVICE\SQLSERVERAGENT system account

This certainly sounds like a bug, but luckily it’s a SQL Server bug. Starting up services.msc I was able to change the account in the normal way – entering NT SERVICE\SQLSERVERAGENT as the user and leaving the password blank.

Solved with services.msc

The easiest way to reclaim sysadmin access in SQL Server

Nearly a decade ago I posted a blog about how to get into SQL Server if you find yourself locked out. It remains one of the most viewed posts on the site because it’s a common issue. I strike it at least a couple of times a month where we have a client who wants us to assess their SQL environment, but they actually don’t know who can provide sysadmin access.

Back in 2013 I listed a bunch of ways to get into SQL if you didn’t have a SQL Sysadmin credential to use or access to someone who did, but I wanted to revisit it today because that’s not my go to anymore. A lot has changed in the last 9 years, and one of the most impactful has been the awesome community tool DBATools. You can check it out at DBATools.io. Honestly this set of powershell scripts streamlines so many of the processes I used to do manually and SQL access is one of these.

Using the Reset-DbaAdmin command you can do all the manual steps outlined in the post above in a line of powershell. You should be warned you are still not some super untraceable hacker – the command restarts SQL in the same way the manual method does, but it does it all in one line of code, handles a bunch of the more common issues with that method(such as someone else stealing your connection) and it does it all SUPER FAST.

So if you have just started a job and there’s a critical server that no-one knows the credentials for, or the password vault has a set of invalid credentials, or whatever reason is preventing you logging in as sysadmin – check out DBATools for a quick and easy way of getting back into your server – and as an added bonus you will now have access to one of the best free additions to your DBA toolbelt.

Chrome opens then shuts – no error message

My daughters both got new laptops for Christmas, one has had no issues whatsoever, one had an issue once we installed Chrome. Whenever you tried to open chrome it would flash open and then….nothing. No error dialogue, nothing logged in event log, just…nothing. This was incredibly frustrating to troubleshoot and it turned out to be Microsoft up to the old ‘directly sabotage the competition and call it security’ trick.

Generally I’m pretty chill, but this one has gotten me worked up. I’m quite angry that I lost the better part of a day of my holidays trying to figure out what was going on here. We even took the laptop back to the store we purchased it from and thier tech had no idea what was going on. He re-installed the operating system, reinstalled chrome, spouted some nonsense about 32-bit vs 64-bit that made no sense but I didn’t argue with because Chrome was magically working again. Chrome then worked happily for a week and then my daughter bought me her laptop and once again it just wouldn’t start.

I tried a bunch of stuff from various forums and blogs out there. Here’s a list of stuff I can remember:

  • Reinstalled Chome – starting with the big hammer. This made no difference.
  • Tried starting Chrome in incognito – same behaviour.
  • Entered exceptions in firewall. Same behaviour.
  • Disabled Firewall. Same behaviour.
  • Disabled anti-virus. Same behaviour.
  • Started the laptop in safe mode with network support – Chome Started!
  • Created a new non-MS account on the laptop and logged in as that – Chrome Started.
  • Installed Firefox which also failed, but started successfully under the two conditions above(Safe mode and different account).

The firefox step changed my thinking somewhat, I was now reasonably certain I had an account issue rather than a software, firewall or anti-virus issue. I was about to look into wiping the default profile when I happened to glance in my email and noticed an installation notification from Microsoft Family Safety from a few days ago. When I set up the girls previous laptops they were a lot younger, and I remember they were added to Microsoft Family. This never caused an issue on the old laptops, but I thought it was worth taking a look into. I logged into Microsoft Family and lo and behld….the smoking gun.

Microsoft had placed a filter on pretty much all their major browser competitors. Chrome, Firefox and opera are all listed as ‘blocked apps’. Presumably as non-Microsoft browsers Microsoft is unable to content monitor on them, so simple disabled the ability to use them. I removed this and instantly everything worked on my daughters laptop again. You can log into Microsoft Family here and see if this is the problem effecting you.

Now, I have as many questions as answers from this. Why did it work after the operating system reinstall? Why did we never have any issues on the old laptop. If it’s listed as blocked shouldn’t it always block? Why wouldn’t Microsoft have the decency to drop a little message into the event log? Something like “Google Chrome was prevented from starting due to your Microsoft Family Safety Policy” would have saved a huge amount of very frustrating troubleshooting time. But then, from what I can see the frustration with this problem online is all directed at Google, rather than Microsoft. And maybe that’s the point.