A quick one today, prompted by an interesting error I received. A client dropped me a message that they could run a stored procedure fine as themselves, but when they dropped it into a SQL Agent job it started to fail. The message was pretty clear:
“Executed as user: Domain\SQLAgent. Login failed for user ‘Domain\SQLAgent’. [SQLSTATE 28000] (Error 18456). The step failed.”
I like problems that are easy to solve: just check the Domain\SQLAgent account on the SQL instance and find it’s… enabled… and has all the permissions required to run this stored procedure? Hmmm.
Then I noted that despite getting a login failure message there was no corresponding login failure in the SQL error logs. Time to dig into the code, and I was able to identify that the stored procedure referenced a linked server. That linked server was (hooray) set up to log in to the remote server in the context of the executing account. So the login wasn’t failing locally, it was failing on the remote server. Sure enough, I could see login failure messages on the remote server that matched the times the job had tried to run.
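If you want to check how a linked server maps logins without clicking through the GUI, a query along these lines against the standard catalog views will show whether it impersonates the caller (uses_self_credential = 1 means “connect in the context of the executing account”):

-- List linked servers and how they map local logins to the remote server.
SELECT s.name AS linked_server,
       p.name AS local_login,
       ll.uses_self_credential,
       ll.remote_name
FROM sys.linked_logins AS ll
JOIN sys.servers AS s ON s.server_id = ll.server_id
LEFT JOIN sys.server_principals AS p ON p.principal_id = ll.local_principal_id
WHERE s.is_linked = 1;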
Resolving the remote login failure allowed the job to run successfully. Hopefully this post will point you towards this potential cause and save you some time banging your head against a wall!
This month’s TSQL Tuesday invitation is hosted by Gethyn Ellis, who asks about the best career advice you ever got. For me it was more of a career realization; although it’s undoubtedly been said to me multiple times, I’m just a bit slow registering stuff. Some would probably say I still haven’t ‘got it’ when it comes to this particular idea.
I’m an IT guy, and probably somewhat stereotypical in that I got into this line of work because the power of computers in general, and data in particular, has always been so interesting to me. Finally, here in 2023, with the advent of AI coming into the mainstream, the general population is starting to understand some of the ‘coolness’ I saw in IT back when I started my career. The idea that a computer could be programmed to do sets of tasks for me and increase the amount of stuff I could know and achieve so drastically has always been a wonder. IT is a fascinating and challenging field to be in.
But also fascinating, and definitely challenging, are people. When I look back on my career none of the milestones have been technical, they are all off the back of forming and evolving relationships with people. Whether that be walking into an IT company off the street and telling them I was available to work, the first beer with people who have kept an eye on my career, helping revive and grow a user group, speaking at a SQL event for the first time or running a SQL event for the first time – the key points have always been more about people than technology.
We have to build those relationships when we first convince someone to give us a job in IT. Any given project is likely to be as much a series of political and scheduling challenges as purely technical ones. As we move up the pecking order we need to manage expectations and relationships with both those in a position to move us to the next rung, and those who may resent us moving off the rung we’re on. The technical stuff is important, for sure, but it’s not the thing that will define your career – your ability to form and nurture relationships with people is.
When it comes to consulting and management your technical skills may allow you to get a job done, but it is your people skills, your ability to express the technical requirements to non-technical people and how you present a solution that will win you that job in the first place. It’s how you turn up and deal with those around you that will dictate how a customer feels about coming back to you for assistance next time. And when you decide that paid employment is not getting you the financial results you want and move into contract consulting or start your own business – you had better believe that how you present yourself to people and handle relationships is at least as important as how you perform the technical aspects of the projects themselves.
Now I’m not encouraging you to neglect the technical side of the job – you’ve still got to be able to do the thing – but understand that the success of the thing is not just defined by whether your output window is red or black – it’s defined by understanding why you are doing the job and whom it affects, and engaging with those people to ensure they are getting the outcome they are after.
Microsoft’s recommendation for TempDB on Azure VMs is to place it on the D: drive that every Azure VM is created with. In most cases this is going to be the fastest storage on the virtual machine, and given that TempDB files do not need to be persisted, this can give you access to a high performance TempDB configuration without the need to provision additional storage.
The only issue is that the ephemeral drive is temporary. Everything is wiped from it whenever a machine is deallocated (note: contents can survive a reboot, just not a shutdown-deallocate event). Because I keep forgetting how to set it up I thought I might as well store it here, so next time I forget I don’t have to log into the last VM I set up to dig it out.
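As a side note, pointing TempDB at the ephemeral drive in the first place is a one-off change. This sketch assumes the default logical file names (tempdev and templog) and takes effect at the next restart:

-- Move the TempDB files to the ephemeral drive; applies on next restart.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLTEMPDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'D:\SQLTEMPDB\templog.ldf');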
Step 1. Create a folder called C:\Scripts and create a file in it called SQL-Startup.ps1. Edit that file and enter the following code.
# Log everything this script does, for later troubleshooting.
Start-Transcript -Path "C:\Scripts\Transcript$(((Get-Date).ToUniversalTime()).ToString("yyyyMMdd_hhmmss")).txt" -Append

$SQLService = "SQL Server (MSSQLSERVER)"
$SQLAgentService = "SQL Server Agent (MSSQLSERVER)"
$tempfolder = "D:\SQLTEMPDB"

# Recreate the TempDB folder if the ephemeral drive has been wiped.
if (!(Test-Path -Path $tempfolder)) {
    New-Item -ItemType Directory -Path $tempfolder
}

# Start SQL Server and the Agent now the folder exists.
Start-Service $SQLService
Start-Service $SQLAgentService

# Leave a record of each run.
$runtime = Get-Date
echo "{$runtime}: I just ran" >> C:\Scripts\outputfile.txt
Step 2. Create a Windows Scheduled Task which will run that script each time the virtual machine starts.
Click the Windows button and type ‘Scheduler’, then select and open Task Scheduler.
Click “Create Task”
On General Tab:
Set name to “SQL Startup”
Change the user to run as a service account with permissions to create a folder and start SQL services.
Select “Run whether user is logged on or not”.
On Triggers Tab:
Select “New”
Select “At startup” from the dropdown menu.
Ensure “Enabled” is checked and click Okay.
On the Actions Tab:
Select “New”
Select Action as “Start a program”
For the program, type (or browse to) powershell.exe
For Add Arguments type -ExecutionPolicy Unrestricted -File "C:\Scripts\SQL-Startup.ps1" (the program name itself doesn’t need repeating here).
Click Okay.
On the Conditions Tab:
No Changes
On the Settings Tab:
No Changes
Click Okay – You will be prompted for the credentials for the service account you selected to run the job under.
Step 3. Set SQL to auto-start (delayed)
Click Windows Start button and type “Services.msc”
Right click on the SQL Service and change Startup Type to “Automatic (Delayed Start)”
Click Okay.
Step 4. Ensure Service Account has correct rights
Ensure your service account has sufficient permissions to start SQL and create a folder.
Ensure your service account has “Log on as a batch job” rights.
Now you can stop SQL, delete the TempDB folder and restart the virtual machine. On bootup the script should recreate the folder and add a record of the restart to C:\Scripts\outputfile.txt.
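If you want to confirm from SQL itself that TempDB landed where you expect, a quick check against the system catalog is:

-- Show where the TempDB data and log files currently live.
SELECT name, physical_name
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');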
Another audit picks up “Scan for Startup Procs” and flags it as a problem. The recommendation is to turn it off. I disagree, and here’s why…
Like my previous post about xp_cmdshell, I have a stake here. We use startup procedures in our monitoring so whenever SQL starts we can check the error logs and calculate downtime etc. It’s useful, and this is exactly the scenario I believe the configuration is made for.
Unlike xp_cmdshell, though, I can see a plausible configuration where this can be a problem. So let’s talk about what it does.
If you enable this option, SQL will scan for stored procedures marked to run on startup the next time it restarts (as a side note, you need to restart the instance for the setting to take effect). You can mark a stored procedure to run on startup by using sp_procoption. And when I say you, I mean YOU, the sysadmin.
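For reference, the whole setup looks something like this (a sketch; the procedure name is made up, and startup procedures have to live in master):

-- Enable the instance setting (it's an advanced option; restart required to take effect).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'scan for startup procs', 1;
RECONFIGURE;

-- Mark a procedure in master to run at startup (hypothetical name).
EXEC sp_procoption @ProcName = N'dbo.usp_LogStartupTime',
                   @OptionName = 'startup',
                   @OptionValue = 'on';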
Yes the only user who has the ability to mark a stored procedure to run on startup is a sysadmin, so like xp_cmdshell this is a vulnerability that can only be exploited by a sysadmin. And if they are already a sysadmin they can turn it on anyway. Also…you know…they are a sysadmin. If they are malicious you have bigger problems than them running a stored procedure on startup. They can run ANY stored procedure, right now.
Of course, it’s a bit more complex than this. Sysadmin rights are only required to mark the stored procedure to run on startup. A user without sysadmin privileges who has the correct rights on the master database can alter the stored procedure afterwards. So yes, there is a vector here for someone with db_owner on master to elevate to sysadmin, and then do whatever else they want.
This is where I have issues with some of these generic recommendations in audits. They are painted as black and white, and as we all know the answer in SQL is always ‘it depends’. But when you just label something as ‘bad’ and people accept that rather than asking why and how, perfectly safe and useful pieces of functionality get turned off.
A better approach would be to treat this finding as the start of a conversation: sit down with the server owners, understand if the functionality is needed and, if so, what mitigations can be put in place to ensure it is not a risk. There aren’t many good reasons for giving a user elevated rights on the master database.
I’ll stop now because this post has a lot of potential to turn into a rant on the difference between compliance and security, but if you found this blog looking for a recommendation about using “Scan for Startup Procs” my advice would be this. If there’s another way of doing what you want that is equally efficient, use that. “Scan for Startup Procs” is a setting you will find yourself having to defend every time there is an audit of your server and you may not want to fight that fight. If you do want to fight it, making the setting safe is very simple. Don’t let anyone have rights to edit the stored procedure you create and you are fine. Bear in mind however that with all your good intentions, one day you may not be there, the guidance may not be followed and someone will be able to exploit this.
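If you do find yourself defending the setting in an audit, it helps to be able to show exactly which procedures are marked to run at startup:

-- List every procedure marked to auto-execute at startup.
SELECT name
FROM master.sys.procedures
WHERE is_auto_executed = 1;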
It’s happened again, xp_cmdshell being enabled is flagged as a security vulnerability. The recommendation is to turn it off. I disagree, and here’s why…
First of all let’s be open about the fact that I have a stake in this game. Our monitoring toolset uses xp_cmdshell for a few things. It shells out from SQL and checks the block sizes on the disks, whether power saving is enabled on the server and so forth. It’s extremely useful functionality which would be a pain to replace. But I do find I have to keep answering this question, and it’s often asked by people who should know better.
Let’s also be clear on a few important points:
The xp_cmdshell configuration is off by default.
It can be turned on and off by a SQL sysadmin.
By default, if turned on it can ONLY be used by a SQL sysadmin.
If you want a non-sysadmin to use it you can grant that right explicitly. You then need to set up a proxy Windows account for it to execute under (the full configuration is sketched after this list).
If no proxy is in place a non-sysadmin can’t use it, and a sysadmin using it will generate a Windows process in the context of the SQL Service account.
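Putting that together, here’s roughly what configuring it for a non-sysadmin looks like (a minimal sketch, with made-up account and login names):

-- Enable xp_cmdshell (it's an advanced option, so show those first).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

-- Set the proxy Windows account that non-sysadmin calls will execute under (hypothetical account).
EXEC sp_xp_cmdshell_proxy_account 'DOMAIN\LowPrivProxy', 'StrongPasswordHere';

-- Grant a specific non-sysadmin the right to call it (run in master).
GRANT EXECUTE ON xp_cmdshell TO [DOMAIN\SomeUser];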
If you put all that together you will see that only a sysadmin can use xp_cmdshell unless you explicitly configure access for another user, and even more importantly you can’t stop a sysadmin using it.
Therefore whether the setting is enabled or not is irrelevant. The only person who can use it has the ability to turn it on anyway, and you can control the level of access to the operating system by correctly assigning least privilege to the SQL Service account or the proxy account.
Now… the question might arise about whether the functionality itself is a security vulnerability, and I’m not going to argue that it’s not a possible attack vector. It provides a powerful integration between SQL Server and the operating system. But so does running an Agent job with a CmdExec step, and I’ve never seen that flagged in a security audit.
Neither xp_cmdshell nor CmdExec agent job functionality is something you can turn off in such a fashion that a sysadmin cannot turn it back on (although even as I type that, about five things to try occur to me), so I think the very best this ‘audit recommendation’ will ever achieve is protection against non-malicious ‘accidents’ by tired DBAs. Disabling it will be no more than a minor inconvenience for a malicious user who has the rights to use it once enabled.
Well, this was an odd one, and luckily on my test environment, but definitely one I’ll look out for in the future. I was doing some maintenance removing old unused test databases when suddenly I got hit with this error:
Logon failed for login ‘ROBLT\RobSQL’ due to trigger execution. Changed database context to ‘master’. Changed language setting to us_english. (Microsoft SQL Server, Error: 17892)
This was not a fun error. Everything I tried to do in Object Explorer generated this error. Basically I couldn’t create any new connections to the server. I did have an existing query window open against the master database which was still working, and that turned out to be my saving grace, as you’ll see later.
The error is pretty clear about what is going on. There is a trigger that is failing, which is preventing any new logins from being established. If this was a production server it would mean NO-ONE could connect at all, which would raise the stakes somewhat from my play server. The thing is, as this was a test server I had no idea what trigger it was talking about. I vaguely remembered playing with login triggers 6 months ago but had no idea which trigger I needed to stop. So from my open query window to master, I stopped them all with the below TSQL:
DISABLE TRIGGER ALL ON ALL SERVER
This immediately allowed me to access Object Explorer and go and look for the specific trigger that had caused the problem.
This trigger had of course been trying to write into one of the databases I had cleaned up. Lesson learnt: check for server-scoped triggers before you hit that delete button.
There are a number of other scenarios that could get you into this pickle, such as creating a trigger to whitelist logins and forgetting to add yourself, or creating a trigger that references an object that doesn’t exist. If you want to be more surgical you can check for the trigger name with:
SELECT * FROM sys.server_triggers;
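Once you’ve identified the culprit from that view, you can disable just that trigger instead of all of them (the trigger name here is made up):

-- Disable only the offending server-scoped trigger.
DISABLE TRIGGER [trg_LogonWhitelist] ON ALL SERVER;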
All this of course assumes you can get into the server in the first place. I was lucky enough to have an open live connection. If you don’t, a failing trigger will prevent EVERYONE from connecting to the server, including sysadmins. You don’t have to hand in your resignation just yet though. Microsoft has foreseen that people will do stupid things with login triggers so there are a few ways to address it outlined in the documentation:
Log in using the Dedicated Administrator Connection: details of how to do that are here if you are not sure. The DAC is not affected by login triggers. If you have enabled the setting “remote admin connections” (and you should, as shown after this list) you can do this remotely; otherwise you need to be on the same server that SQL is installed on.
Start SQL in minimal configuration mode. You can do this by adding the -f flag when you start the server.
Either of these options will allow you to log into the server and disable or fix the trigger.
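If you want the remote DAC available ahead of time (option one above), it’s a single configuration change; connecting through it from sqlcmd then just means prefixing the server name with admin: (the server name below is a placeholder):

-- Allow DAC connections from a remote machine; no restart required.
EXEC sp_configure 'remote admin connections', 1;
RECONFIGURE;

From a command prompt, sqlcmd -S admin:YourServer -E will then get you a DAC session even while the logon trigger is misbehaving.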
Wow. I feel like I’ve just been tossed a hand grenade and I don’t have a pin. I don’t consider myself a developer by any stretch of the imagination, but there’s a bunch of code out there I have written that is running businesses. Some big businesses. And the idea that there is a grade that any code should pass before it goes into production always makes me a bit nervous, because I’m pretty sure the bar would need to be quite low for everything I’ve ever put out there to pass it.
So let’s step back a moment and talk about this place I live. It’s called reality. And in reality business problems often come to you like this:
Big Manager: The Scurbwazler is about to exceed its threshold, we need to offload the thingiwozly workload via the bropfizzle or the whole thing is going to fall over.
Developer: I can get a prototype done that will stop it blowing up, but once the emergency is over we need to go back and make sure it’s performant and secure.
Big Manager: Sure, sure, of course.
(Insert code-producing sound effects here).
Developer: Okay it’s done. I’ll look over the security settings after lunc…
Other Big Manager: Hey, what’s going on with the BropFizzle? We’re releasing our new product today and we need its full capacity.
Developer: I guess we could route that via the tertiary system…
And so on it goes.
Now I love the DevOps movement, which makes so many of these problems go away, but I still see changes happen in exactly this way in so many environments, and I totally get the reasons.
So I’m going to try and answer this question from a slightly different angle than what the code should look like and instead talk about how it comes into being.
Production Grade Code should be planned.
Production Grade Code should be tested before it becomes production code.
Production Grade Code should be understood\documented\in source control.
Let’s deal with them one by one.
Production Grade Code should be planned.
This seems pretty obvious. Before a line of code is written someone, and probably a group of someones, should define what the code is for and what it’s meant to do. I keep getting told by my developers that the further along the dev process an error gets before being identified, the more expensive it is to fix. It stands to reason then that the inception of a requirement is the place you want to focus the most attention, to ensure that what you are asking your developers to build is actually the thing you want. Furthermore, time invested in explaining what you’ve just written as a requirement, and allowing Q and A time, prevents multiple rounds of rework later in the process.
Production Grade Code should be tested before it becomes production code.
You all have a pipeline to run your code through, right? Well, if you don’t, you don’t need to feel like you are alone. A lot of businesses are still deploying code on a completely ad hoc basis, and their ‘test’ system bears very little resemblance to their production system.
For code to be production grade, it needs to have passed the functionality tests defined in the planning stage, and ideally it will be stress tested against a similar load to production.
In SQL land the classic example is code that is non-SARGable. Functionally it may perform fine against your table with 1000 rows in it, but when it hits production and tries to run against a table with a few billion rows it will bring your server to its knees.
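As a quick illustration (the table and columns are made up), these two queries return the same rows, but only the second can seek on an index over OrderDate:

-- Non-SARGable: the function wrapped around the column forces a scan of every row.
SELECT OrderID FROM dbo.Orders WHERE YEAR(OrderDate) = 2023;

-- SARGable: a bare column compared to a range can use an index seek.
SELECT OrderID FROM dbo.Orders WHERE OrderDate >= '20230101' AND OrderDate < '20240101';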
Production Grade Code should be understood\documented\in source control
I’ve been in too many places where only one person understands how systems interact, or how data moves, or what relies on the system you are talking about upgrading\migrating. Inevitably that person will be unavailable. Sometimes they stick around till they retire, sometimes they leave for greener pastures, but they will inevitably leave.
I hate documentation, but I hate lack-of-documentation even more.
I think there’s a period in people’s career lifecycle where they just do heaps of stuff and never document anything and feel productive, but increasingly I don’t want to be the only guy who understands (and therefore can fix) anything I write. I don’t want to be the bottleneck. I want any competent DBA or developer to be able to look at anything I’ve built and go… ohhh… I see what he’s done here. And then they can fix it instead of me.
And that really is the key to me for what “Production Grade” should be. It’s got to serve a clear purpose, do its job and be easily understood and maintainable in the future. To get there we’ve got to get over the idea that code ‘belongs’ to the person who wrote it and think of our code as part of the larger ecosystem.