SQL Agent, Maintenance Plans, Auto-Heals and How to Waste an Afternoon

At my company, we’re lucky to have a custom tool that monitors EVERYTHING happening in our SQL Server environment. But every now and then, I like to explore what’s possible using only SQL Server’s built-in features, just in case you don’t have a comprehensive monitoring suite at your disposal.

Recently, someone made an off-the-cuff comment about using T-SQL tasks in Maintenance Plans to handle more complex logic than SQL Agent handles out of the box. I was briefly excited by the prospect of building improved logic flows directly into SQL Server Maintenance Plans. This month Andy Levy hosts TSQL Tuesday and asks how we handle SQL Agent jobs, and this seemed like a great opportunity to share a story about how I wasted an afternoon testing a few components I have only a passing knowledge of, attempting to implement auto-healing and conditional logic flow, only to snap out of it a few hours later when I realized I was trying to solve a problem that someone else cracked nearly two decades ago.

The visual workflow designer and conditional branching on success or failure in Maintenance Plans seemed like a great way to automate not just routine tasks, but also smart responses to issues as they arise. I love any opportunity to build automatic healing into anything that is likely to alert me. Getting a message saying “A thing nearly broke last night – the problem is fixed” is so much better than getting paged to manually retry something or perform a known remediation action. So I put together a little side-by-side comparison of SQL Agent vs Maintenance Plans for a scenario where a DBA performs the same boring checks and actions when a backup fails, and looked at how both technologies allow retries and logic flow to handle this automatically rather than have a human intervene.

The flow for this process was good at first, but as I dug deeper, I realized that while Maintenance Plans can be powerful, you can achieve the same (and often better) results with a little creative thinking in SQL Server Agent Jobs. In fact, Agent Jobs offer even more flexibility, allowing you to mix and match different step types (T-SQL, PowerShell, SSIS packages, and more) within a single job. Let me walk you through what I learned, with some real-world examples.

Maintenance Plans: Visual Logic, Limited Flexibility

Let’s start with the Maintenance Plan approach. Here’s a screenshot of a workflow I put together to handle database backups with built-in remediation and notifications, but the same ideas can apply to ETL or other processes.

  • The plan tries to back up all user databases.
  • If the backup succeeds, it updates a central monitoring table (sketched just after this list).
  • If the backup fails, it runs some auto-remediation steps (like clearing disk space or restarting a service), then tries the backup again.
  • If the second attempt succeeds it updates the central repository, and if it fails, it notifies the on-call operator.
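
If you are wondering what that central monitoring update actually looks like, here is a minimal sketch of the T-SQL task. The server, database and table names (CentralMonitoring, dbo.BackupMonitoring) and the columns are all hypothetical placeholders for whatever your own repository uses.

    -- Minimal sketch of the "update central monitoring table" T-SQL task.
    -- CentralMonitoring.dbo.BackupMonitoring and its columns are hypothetical placeholders.
    INSERT INTO CentralMonitoring.dbo.BackupMonitoring
        (ServerName, CheckTime, Outcome, Detail)
    VALUES
        (@@SERVERNAME, SYSDATETIME(), 'Success',
         'Nightly user database backup completed by the maintenance plan');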

This is a solid approach, and the visual editor makes it easy to follow the logic. But there are some limitations:

  • You’re mostly limited to backup, maintenance, and T-SQL tasks.
  • Complex branching or looping is awkward.
  • Integrating with external scripts or tools isn’t straightforward.

Agent Jobs: The Real Powerhouse

What’s more, my initial enthusiasm for the workflows was misplaced. SQL Agent actually has success and failure branching built in at a per-step level too; it’s just that in most cases the failure action is simply to report that the job failed. But it doesn’t need to be.

With Agent Jobs, you can:

  • Chain as many steps as you want, each with its own success/failure logic.
  • Use different step types (T-SQL, PowerShell, CmdExec, SSIS, etc.).
  • Easily implement loops, retries, and advanced remediation.
  • Integrate with external systems, scripts, or APIs.

Here’s a screenshot of one way to achieve a better outcome using a non-standard workflow with SQL Agent. The job again handles backups with layered remediation and targeted notifications:

  1. Reset Autoheal Remediation Counter – Prepares the environment for a new backup attempt.
  2. Check Retry Loop – Determines if a retry is needed based on previous attempts. If the retry limit has been exceeded, the step raises an error, which redirects the job to the final step (configured to alert the on-call operator). A sketch of this check follows the list.
  3. Backup User Databases – The main backup step.
  4. Update Central Reporting Repository – On a successful backup the job proceeds to this step, which logs the result and ends the job.
  5. Initiate Backup Drive Maintenance – If the backup fails, attempts to resolve likely issues.
  6. Perform Additional AutoResolution Tasks – Further remediation if needed. Once this step completes, the job is redirected back to step 2, which increments the loop counter and retries the backup.
  7. Job Failed Notification – If all else fails, notifies the operator and ends the job.
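
To give a flavour of step 2, here is a rough sketch of the retry check. The counter table (dbo.AutoHealCounter) and the job name are hypothetical; the only real trick is that raising an error makes the step fail, and the step’s on-failure action is what redirects the job to the notification step.

    -- Rough sketch of the "Check Retry Loop" step. dbo.AutoHealCounter is a
    -- hypothetical table that step 1 resets to zero at the start of each run.
    DECLARE @Attempts int, @MaxRetries int = 3;

    SELECT @Attempts = AttemptCount
    FROM   dbo.AutoHealCounter
    WHERE  JobName = N'Backup User Databases - AutoHeal';

    IF @Attempts >= @MaxRetries
    BEGIN
        -- Failing this step is what sends the job to the notification step,
        -- because that is the on-failure action configured for step 2.
        RAISERROR('Retry limit reached for user database backups.', 16, 1);
        RETURN;
    END

    -- Otherwise record the attempt and let the job continue to the backup step.
    UPDATE dbo.AutoHealCounter
    SET    AttemptCount = AttemptCount + 1
    WHERE  JobName = N'Backup User Databases - AutoHeal';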

Each step specifies exactly what should happen on success or failure, making the workflow both robust and transparent.
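
Under the covers this is just the @on_success_action and @on_fail_action settings on each job step. As a cut-down sketch (job, step and procedure names are placeholders, and only steps 3 and 6 are shown), the wiring with msdb.dbo.sp_add_jobstep looks something like this:

    USE msdb;
    GO
    -- Step 3: the backup itself. Success jumps to the reporting step (4);
    -- failure drops into the drive maintenance step (5).
    EXEC dbo.sp_add_jobstep
        @job_name           = N'Backup User Databases - AutoHeal',
        @step_id            = 3,
        @step_name          = N'Backup User Databases',
        @subsystem          = N'TSQL',
        @command            = N'EXEC dbo.BackupAllUserDatabases;',  -- hypothetical procedure
        @on_success_action  = 4,   -- go to the step named in @on_success_step_id
        @on_success_step_id = 4,
        @on_fail_action     = 4,   -- go to the step named in @on_fail_step_id
        @on_fail_step_id    = 5;

    -- Step 6: after the extra remediation, loop back to the retry check in step 2,
    -- or fall through to the notification step if the remediation itself fails.
    EXEC dbo.sp_add_jobstep
        @job_name           = N'Backup User Databases - AutoHeal',
        @step_id            = 6,
        @step_name          = N'Perform Additional AutoResolution Tasks',
        @subsystem          = N'TSQL',
        @command            = N'EXEC dbo.ExtraRemediation;',        -- hypothetical procedure
        @on_success_action  = 4,
        @on_success_step_id = 2,
        @on_fail_action     = 4,
        @on_fail_step_id    = 7;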

Why Agent Jobs Win

After experimenting with both approaches, I decided I hadn’t in fact finally found a use for Maintenance Plans. Really the only thing they have going for them in this scenario is a nice graphical interface to visualise what is going on. That’s not to undersell the importance of a clearly identifiable workflow; the job step approach is very hard to follow if your workflows become more complicated. But here’s why I think Agent Jobs are the smarter choice for most advanced workflows:

  1. Multiple Step Types – You’re not limited to T-SQL or maintenance tasks. Want to run a PowerShell script, call an external API, or execute a batch file as part of your remediation? No problem.
  2. Granular Control – You can direct the job flow on a step-by-step basis, allowing for complex logic, conditional retries, and targeted notifications.
  3. Easy Integration – Agent Jobs make it simple to integrate with external monitoring systems, send custom alerts, or update centralized logging repositories.
  4. Maintainability – It’s easier to document, update, and troubleshoot jobs when each step is clearly defined and can be of any type you need.

Conclusion – So which should you use?

So obviously I’m leaving out some additional options in my side-by-side test. SQL Agent jobs have auto-retry built in at the per-step level if you want it, and the loop doesn’t have to be at the task or job step level – you can build it right into the T-SQL if you like, as sketched below. I played around with both of these approaches for a while, trying to see if I could refine them to resolve the issues that I came up with. I really wanted that graphical format, but I also wanted the flexibility of the SQL Agent jobs. Which should I use?
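
Here is what that T-SQL-level retry might look like for a single database. The database name, backup path, wait time and attempt count are all illustrative, not a recommendation.

    -- Illustrative retry loop built into the job step's T-SQL rather than the job flow.
    DECLARE @Attempt int = 1, @MaxAttempts int = 3, @Done bit = 0;

    WHILE @Attempt <= @MaxAttempts AND @Done = 0
    BEGIN
        BEGIN TRY
            BACKUP DATABASE [SalesDB]                      -- hypothetical database
            TO DISK = N'B:\Backups\SalesDB.bak'            -- hypothetical path
            WITH INIT, CHECKSUM;
            SET @Done = 1;
        END TRY
        BEGIN CATCH
            IF @Attempt = @MaxAttempts
            BEGIN
                THROW;                 -- out of attempts: surface the error and fail the step
            END;
            WAITFOR DELAY '00:02:00';  -- give the remediation (or the disk) a moment
            SET @Attempt += 1;
        END CATCH
    END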

When you delve into these things it’s really easy to get stuck with your blinkers on. The answer here was pretty clear. Microsoft actually built a whole tool for us to handle SQL data movement with complicated workflows and event handlers, and it has a flexible graphical presentation layer – that tool is called Integration Services. Yes, SSIS has some of its own issues, but flexibility of logic flow is not one of them.

Another afternoon frittered away investigating a problem that was solved years ago….

TSQL Tuesday 184 – Mentoring, Sponsorship, Rubber Ducks and Mates

This month Deborah Melkin takes a turn as the host of TSQL Tuesday and asks some interesting questions about Mentorship and Sponsorship. When I started this blog it was mostly as a place to drop bits of code and technical how-tos, and it has largely stayed away from personal development, leadership and career planning, mostly because I’ve never considered myself any good at that stuff. And yet when I look back at my career and life (something I am doing far too much lately due to a big scary birthday looming in November…) it’s very obvious that certain relationships I have formed have been very influential on the way my career path has developed and where I am now.

For me, these relationships have been fairly organic, and I’ve largely never thought of them as mentoring – rather an open exchange of challenges and solutions. While there are definitely people who have been in a more traditional mentor/mentee relationship with me, I’d like to talk today about some friends in a less traditional ‘mentoring’ relationship. Actually – I want to talk about having lunch with your mates.

For over a decade, I’ve had two friends who I grab lunch with every fortnight or so, and the conversation flips fairly naturally between general friend chat, personal development and specific work issues we are facing. We are all far enough removed in the tech world not to directly influence each other’s careers, but close enough that the problems we encounter tend to be the same type of thing.

What’s interesting is that there’s no mentor and mentee in these scenarios, just a group of guys who are hitting the same problems at different times and exchanging ideas on how to resolve them. Quite honestly, I think any of us would be horrified to have the label ‘mentor’ slapped on us for these lunches, but they provide so many of the benefits.

I’ve heard mentorship described in a number of different ways, but one thing that has stuck with me is a discussion I had some years ago where I suggested someone had acted as a mentor in suggesting a course of action to resolve a problem. I was corrected, and told they had tutored me (given me a solution) rather than mentored me (empowered me to find the solution myself). This distinction is what I get from my regular lunch catch-ups. Sure – we give advice, but there’s never an expectation that the advice is any more than a suggested approach. As I say, we are far enough apart in our job roles that our solutions are usually not specific implementations, but general approaches. The final step of taking the advice on board and acting on it always sits with the person with the problem, loosely ‘the mentee’.

Furthermore, there is some accountability built in. My friends and I have a shared mindset when it comes to how much control we have in our lives. Specifically, we are all of the belief that we have the ability to make decisions and take actions, and that the consequences are on us. None of us accept that we are stuck where we are; we all know that if we want to be doing something else we have the skills and experience to do it. As such, if the same problem comes up two lunches in a row, the answer is usually not a suggestion but a challenge – “What did you do to solve this since we last spoke about it?”

What exactly is a rubber duck?

There is a developer technique known as ‘Rubber Duck Debugging’ where a problem should not be escalated to a senior resource before it has been explained to an inanimate object such as a rubber duck, or a rock. The idea is that verbalizing the problem forces you to slow down and think through the steps and logic, which can often lead to discovering the solution. It’s a form of self-explanation that helps clarify complex ideas.

Lunch with trusted friends who are ready and willing to hear you explain your work problems is another form of rubber ducking. Often it’s just a case of taking the time to talk through something, and the solution presents itself. Other times the solution is already there, but you are just seeking validation that you have reached the right conclusion.

I think there’s a key point in there somewhere about the role a mentor should play. They should be someone that you trust, someone that you respect, someone that has your best interests at heart and someone who will call you out if you are going off track. These are all attributes I look for in friends too.

I fully support the more traditional approach to mentoring, and applaud the efforts that people like Deborah are making in helping people find these relationships through tools such as WITspiration, but I also encourage people not to get too hung up on the label of ‘mentor’, to realize that mentoring doesn’t need to be a big formal arrangement, and that if you are prepared to open yourself up a little you can gain many of the benefits from a group of like-minded friends.

SOS – I’m dead.

In a strange announcement this week, Microsoft declared on February 7th that Azure Data Studio would be retired on the future date of February 6th. You still have until February next year before it becomes unsupported, but I found the bungled announcement of the end of the product’s life somehow fitting for a product that never quite took off. Personally I’d used it a few times, never really saw a compelling reason to dig deeper when Management Studio (SSMS) mostly did what I wanted, and just assumed I wasn’t the target market and went back to doing what I always did. Seems I wasn’t alone.

Microsoft Azure Data Studio (ADS) originally launched as SQL Operations Studio in November 2017, and I can still remember my reaction when someone first pointed out that the acronym for that was SOS. In IT, everything is going to get shortened to an acronym sooner or later, so releasing a product without realizing what its acronym is likely to be showed… a certain lack of planning. Maybe we should have seen the writing on the wall.

Initially focusing on SQL Server database management, the software aimed to offer a modern, user-friendly interface, emphasizing usability and collaboration in data environments. By April 2018, Microsoft had rebranded SQL Operations Studio to Azure Data Studio, maybe aligning it with its cloud platform, but probably just acknowledging that the original name was badly thought out. The rebranding highlighted its focus on cloud-based data operations while also supporting on-premises environments. The product was designed as a lightweight, cross-platform tool available for Windows, macOS, and Linux, providing flexibility for developers working in diverse ecosystems. The multi-platform support is something that SSMS can’t boast, but I’m a Windows user, my clients are Windows users, and it was never functionality that those around me needed. It also offered themes, which some people (who aren’t me) care about, and which seems to be a problem too hard for SSMS to crack.

One of the touted features of Azure Data Studio was that it allowed users to integrate various extensions, letting you customize the environment to meet specific workflow requirements. Notably, Microsoft included extensions for popular programming languages like Python and R, facilitating data-science-centric workflows. The tool also supported Jupyter notebooks, boosting its appeal within the data analysis and machine learning communities. All of that sounds nice until you tried to use it the way you would use, say, Visual Studio Code, and quickly realized that the extension model was heavily curated by what Microsoft was offering you. VS Code, by comparison, has a huge library of extensions with a fairly straightforward pathway to create your own.

So, without really seeing a great benefit for me, I continued using SSMS, and was disappointed when I found that ADS didn’t want to be ignored: if you installed SSMS, you got ADS anyway. This is great fun to explain to clients when you are installing it into their environments. “You said you wanted to install SSMS, why are you installing this other stuff?”

Apart from me never really seeing the point, ADS also faced other criticisms from people who were using it. One recurring complaint was its performance issues when handling large datasets compared to SSMS. Users reported lag and responsiveness challenges, particularly when executing complex queries or managing big data environments.

So the announcement comes as a bit of a surprise to me, but is not likely to change the way I do things. I guess some of the functionality that has been pushed into extensions for ADS (such as Azure migration assessments) will need to be rethought and repackaged over the next year, but overall I think it’s best for Microsoft to focus on SSMS as its primary tool for SQL database management. Version 21 apparently gives themes to people who feel that’s important, and also has Copilot functionality.

TSQL Tuesday – Managing Permissions and the importance of visibility

Steve Jones (Blog) has taken off the co-ordinator hat and put on the hosting hat for this month’s TSQL Tuesday, and he’s come up with something that’s quite topical for me – database permissions. In the last week I’ve been in calls discussing internal permissions on our own databases, calls around managing application permissions and Active Directory groups, and calls about permissions’ security cousin, encryption (hint: if you are looking at a SaaS solution, ask if they encrypt their databases, and then ask for proof).

The common thread is that this is not usually initiated from the application owner’s side, it is initiated from the DBA side. Application owners like their application to work, and it seems common that security is just an assumed thing that happens. It’s not that they don’t care about security – it’s just that they hope someone else is taking care of it. And that can easily be where it ends unless someone asks some very pointed questions about who all these people with access are and whether they really need it. So let’s start with happy land, or how I like to see things set up, and we can talk about reality later:

  • There is a single connection used by the application and actual user permissions to data are handled by roles within the database.
    • Preferably this is a Windows login, but it’s no big deal if it’s a SQL login, so long as the only thing using that SQL login is the application, and the password is sensible.
    • That login should have the minimum permissions that the application needs in the database. Usually this is going to be read and write permissions; maybe it also needs to alter objects, depending on how your releases hang together. When the documentation says the ‘sa’ account is required… it usually isn’t, certainly not past the initial install. A sketch of what this minimal setup can look like follows this list.
  • Any direct access to the database is handled by Windows logins, preferably with groups. Groups are something I love and hate. They are great for managing the reasons for accessing a database and assigning the equivalent of role-based access. They are less great for visibility into who can actually see what data. When I get access to a client system it is usually at the SQL level, and auditing nested AD groups to provide a list of who can actually see what is not always easy. Regardless, for any direct access, least privilege is a must.
  • Changes are completed under change control with a recorded reason and authorization.
  • BI Developers don’t exist.
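
As a concrete (if idealised) sketch of that setup, with placeholder names throughout (DOMAIN\AppServiceAccount, AppDb and app_readwrite are all made up), the happy-land permissions look roughly like this:

    USE [AppDb];   -- hypothetical application database
    GO
    CREATE USER [DOMAIN\AppServiceAccount] FOR LOGIN [DOMAIN\AppServiceAccount];

    -- One role that carries exactly what the application needs, and nothing more.
    CREATE ROLE app_readwrite;
    ALTER ROLE db_datareader ADD MEMBER app_readwrite;   -- read
    ALTER ROLE db_datawriter ADD MEMBER app_readwrite;   -- write
    GRANT EXECUTE ON SCHEMA::dbo TO app_readwrite;       -- only if the app calls procedures

    ALTER ROLE app_readwrite ADD MEMBER [DOMAIN\AppServiceAccount];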

Now, in the real world you don’t often get that. At a minimum that last one is going to be tricky, because Business Intelligence provides massive business benefit when done right, and most applications don’t have quality reporting built in, or the reporting isn’t specific enough to your business, or it doesn’t provide the insights you are looking for. It’s not that I don’t like BI developers, it’s just that they introduce a new variable in controlling my data. Specifically, they want to query it in ways the application developers never intended, and usually they want to move it somewhere I no longer have control of it.

So out in the real world we don’t often see my happy-land simple implementation, and instead we see a huge mish-mash of systems that talk to each other, replicate to secondary servers and have access from people all over the place. That access may or may not still be needed, and may or may not have been appropriate in the first place. My personal favourite was a server we reviewed just before Christmas that had “Domain Users” set up as a system admin. I’m sure Joe the cleaning temp enjoys knowing what his bosses make and will resist the temptation of bumping up his own salary.

If we can’t necessarily control all of the various ways that the databases are accessed, then we should at least ensure that we have visibility. This comes in two forms:

Visibility of what the current state is: Who has server-level roles, who has database owner level access, and who can read or write to each database. Even if you don’t go beyond the built-in server and database-level roles, if you have this information you are well ahead of a good 90% of organizations out there. There are some specialized products that can help with this, and some organizations have custom solutions. Others simply export the environment permissions and keep a record in source control. Most don’t have it at all.

I want to include quite a few things in this visibility. System-level role membership is a given. Database-level role membership should also be relatively easy to collect. Individual permissions, including allow and deny rules, get complicated, and an often overlooked piece is comparing permissions between servers, particularly where high availability is involved. Following that, one of the key pieces of information I look for is ‘who has the ability to change permissions’, with the preferred answer being as few people as possible.
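
As a starting point rather than a complete audit, the first two of those are a couple of catalog-view queries. Explicit GRANT and DENY rules live in sys.server_permissions and sys.database_permissions, but role membership alone already tells you a lot:

    -- Who is in each server-level role.
    SELECT r.name AS server_role, m.name AS member, m.type_desc
    FROM   sys.server_role_members AS srm
    JOIN   sys.server_principals  AS r ON r.principal_id = srm.role_principal_id
    JOIN   sys.server_principals  AS m ON m.principal_id = srm.member_principal_id
    ORDER BY r.name, m.name;

    -- Who is in each role in the current database (run per database).
    SELECT r.name AS database_role, m.name AS member, m.type_desc
    FROM   sys.database_role_members AS drm
    JOIN   sys.database_principals  AS r ON r.principal_id = drm.role_principal_id
    JOIN   sys.database_principals  AS m ON m.principal_id = drm.member_principal_id
    ORDER BY r.name, m.name;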

Visibility of changes: This is trickier, but an alert when permissions change is very nice to have. The reality is that your SQL Server permissions just shouldn’t change that often. Ideally you set up the application and the permissions are static beyond that. New permissions should be tied to a new or changed business requirement. We have this implemented in a few places, so when permissions changes are made there is an immediate alert that goes out to an administrator that includes what changed and who changed it. That provides some nice visibility, and just the fact that someone is being notified actually seems to reduce the number of incidents of random inappropriate escalations.
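
One way to get that alert is a database-level DDL trigger, sketched below. This assumes Database Mail is already configured; the profile name and recipient are placeholders, and in practice you would probably log to a table as well as (or instead of) emailing.

    CREATE TRIGGER trg_PermissionChangeAlert
    ON DATABASE
    FOR GRANT_DATABASE, DENY_DATABASE, REVOKE_DATABASE,
        ADD_ROLE_MEMBER, DROP_ROLE_MEMBER
    AS
    BEGIN
        DECLARE @Event xml = EVENTDATA();
        DECLARE @Body nvarchar(max) =
              N'Permission change in ' + DB_NAME()
            + N' by ' + ORIGINAL_LOGIN() + NCHAR(13)
            + @Event.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)');

        EXEC msdb.dbo.sp_send_dbmail
            @profile_name = N'DBA Alerts',            -- hypothetical mail profile
            @recipients   = N'dba-team@example.com',  -- hypothetical recipient
            @subject      = N'SQL Server permission change detected',
            @body         = @Body;
    END;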

If you have the above two levels of visibility, then you are actually armed to start tidying up permissions or (we can hope) verifying that your current permissions structures and processes are appropriate and working well. If you do not, then getting that visibility should be a priority.

Starting the year thinking about Security

Okay, so last year it happened. A client got their entire infrastructure Cryptolocked, and recovery has been a dog. This was not a SQL Server problem nor a SQL breach, but it brings into perspective so much of what I’ve been talking about on this blog: how SQL Server is part of a wider environment, and how everyone has to be on the same page when it comes to securing and, if worst comes to worst, recovering that environment.

My focus during the start of 2025 is going to be the development of our DataAssure Security module. We’ve got some pretty neat stuff in our managed service already on the security side, but we have built it around a minimum threshold we want everyone to be at. So we capture your failed logins, we keep track of your system administrators and we highlight some glaring security holes (like weak passwords and incorrectly configured linked servers), but the security module will take that to the next level for our clients who are after a more complete solution.

This is going to mean a refresher for me on many of the security features of SQL Server, and we’ll build out a bunch of the below (a taste of the kind of check involved follows the list):

  • TDE and Backup Encryption
  • Certificate Rotation
  • Monitoring Encrypted Connections
  • SQL Audit and Extended Events
  • Server and Database level permissions changes
  • Inactive and over-privileged logins.
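
As a small taste of the kind of check involved, here is a query touching the first two items on that list: which databases have TDE enabled and when the certificates protecting them expire. The column choices are illustrative and this only covers one angle of each feature.

    -- Which databases are encrypted with TDE, and when the protecting certificate expires.
    SELECT d.name               AS database_name,
           dek.encryption_state,            -- 3 = encrypted
           c.name               AS certificate_name,
           c.expiry_date
    FROM   sys.databases AS d
    LEFT JOIN sys.dm_database_encryption_keys AS dek ON dek.database_id = d.database_id
    LEFT JOIN master.sys.certificates         AS c   ON c.thumbprint = dek.encryptor_thumbprint
    ORDER BY d.name;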

Security is hard enough when you are just thinking about your own servers, but designing solutions that are flexible enough to cover everything from hardware stores to financial institutions is especially difficult. Still, no point in starting the year off with easy tasks is there?

Let’s go.

SQL Saturday Wellington 2024

I wanted to make sure that everyone in the Wellington region is aware that Anu and the team are lining up another weekend of high-class SQL training THIS WEEKEND. I do apologize for the late post, but I’ve been chatting to a few Wellington-based clients in the past week to let them know, and I’ve realized that not everyone is aware, so I need to do my bit to get the message out.

I’m going to be sharing some of the good, the bad and the ugly that we’ve seen with live moves of various clients into the Azure cloud. It’s a session that compresses masses of experience into (probably less than) one hour, so come along and take the opportunity to learn from other people’s mistakes.

But check out the schedule, because there are three tracks of data goodness, and what looks to be a fascinating keynote from Kevin McCombe and Mike Winiata from the New Zealand Fire Service.

SQL Saturday South Island 2024

Hi all. Just a reminder if you are not already signed up to get over to:

https://sqlsaturday.com/2024-07-27-sqlsaturday1084

and sign up for SQL Saturday South Island 2024 edition. I’ll be giving a talk on Azure Database migration and sharing some of the good, the bad and the ugly we’ve experienced in moving clients to (and from) Microsoft Azure.

There are also four streams of data goodness with the usual suspects all making an appearance. Check out the link above for the full list. See you there!