Category Archives: SQL Community

What does the next generation look like?

This month’s T-SQL Tuesday topic from John Sterrett is an interesting one, because it’s something that has been occupying quite a bit of brainspace for me recently. John asks “What are you doing, or what can we do to encourage younger people to get involved in the SQL community while increasing the number of younger speakers?”. It’s also Wednesday here in NZ, and a week late – but the topic is important enough to discuss even if I’m late to the party.

I have a big scary number birthday coming up in November, but I’m also aware of a transition in my work life from “Building a CV” to “Building a legacy”. Legacy seems like a strong word, but I am spending an increasing amount of time thinking about the person who has to look after things after me – in my own job role, in my client environments, and, importantly for this topic, within the SQL community.

I have now been speaking at SQL events for around 15 years. I am still staggered by the depth of knowledge of those around me, who give their time and, in most cases, incur not insignificant expense to make themselves available for these (mostly) free training events. I am no less in awe of the amazing things some of these speakers are doing in the world today than I was 15 years ago. And I do wonder if that awe is maybe one of the things that holds some of our younger speakers back.

There is a preconception that to stand in front of a room of people and talk about a technical topic you need to be some sort of guru level savant. In the tech world we accumulate knowledge so it takes quite a serious amount of self confidence to make that first step and put yourself in front of a group of people, many of whom will be quite senior to you, and take that position of expertise.

But here’s my take – it’s actually easier to choose a topic to speak on now than it was 15 years ago. Why? Because technology is changing so fast that no-one has five years’ experience in most of the technologies we are using today. They only came out three years ago, or last year, or yesterday. Add to that the very experienced guys in the room, whether presenting or listening – they have a valuable skillset, which means they are very likely very busy. The ‘curious itch’ I had in my twenties and thirties remains, but there is just no way I can test and discover all the things in all the cool technologies that are coming out. I **need** to hear stories of how people are using new technologies, because the busier I get the less time I have to play with the new way of doing things. But for someone starting their career, who is more time-rich – you can tell me how you got on and what amazing results you achieved.

So what can we do to encourage more of this sharing? First of all I think we need to encourage the younger crowd to attend these events and, importantly, stick around afterwards. There is so much that each group has to offer the other. But mostly you’ll discover that the biggest difference between you and the guys already on the speaking stage is that we are older. Usually IT draws a person with a brain that works a certain way. When I started going to SQL events I felt like I had found my crowd. I could have a chat and found that almost everyone was on the same wavelength. So take advantage of the aftermath of these events.

Secondly – I like shorter, quicker talks. It does take some time to put together a full session, so as a group maybe we can schedule short-form content in with the longer-form content. A 5-10 minute talk is much easier to get through than a full hour session – it’s a small window to show off a single thing or feature, rather than go through a product end to end.

And finally we need to remember the crowd. I’ve done very few talks at SQL Saturdays etc that are above 200 level. I worked out pretty early that the introductory stuff is the sweet spot. Yes there’s some experienced people in the audience, but there’s also a lot of people at the start of their story. I want those guys to be just as comfortable asking questions and making comments as the guys who have done this as long as me.

Why “Turn It Off and Turn It On Again” Still Works – Except for databases!

For this month’s T-SQL Tuesday, Joe Fleming asks us “What’s a strange or unique problem you encountered and resolved, and how did you identify the cause/resolve the issue. What was your thought process?”

I am taking a slightly different tangent. My problem is neither strange nor unique – in fact it’s infuriatingly common, and it stems from one of the most common troubleshooting techniques in IT. While asking users “Have you tried turning it off and on again?” is a common go-to for tech support call handlers, it is not a great idea when the “it” you are talking about is a database server.

My story has happened to me in several forms over the years, but I still remember the first time it did. I had to deal with a problem where users reported a seemingly random set of job failures. Sometimes ETL jobs completed, sometimes they didn’t, sometimes they errored, sometimes they didn’t. Most systems have fairly useful logging of what is happening and SQL Server is no different.

The agent jobs that people were talking about had histories that regularly just stopped. Some had weird last-run statuses of cancelled, or in one case running when there was no process attached to it. I didn’t need to look much further at this point, but the SQL engine error logs confirmed what I suspected – the server was being automatically restarted every night, and jobs were trying to run over this period.
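If you want to check for this pattern yourself, the job history and the engine error log are both queryable. Here’s a minimal sketch of the checks I mean – the search string is just one of the messages SQL Server writes at startup:

```sql
-- Recent agent job outcomes that were anything other than a clean success
-- (run_status: 0 = Failed, 1 = Succeeded, 2 = Retry, 3 = Cancelled, 4 = In Progress)
SELECT j.name,
       h.run_date,
       h.run_time,
       h.run_status
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs      AS j ON j.job_id = h.job_id
WHERE h.step_id = 0          -- job outcome rows only, not individual steps
  AND h.run_status <> 1      -- skip successes
ORDER BY h.run_date DESC, h.run_time DESC;

-- Confirm the nightly restarts in the current engine error log
EXEC sys.xp_readerrorlog 0, 1, N'SQL Server is starting';
```

Lining the failed or cancelled job runs up against the startup messages makes the overlap obvious.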

It’s always nice to start a relationship with a new client with an easy win like this. Within about 5 minutes I could tell them what was going on, could point at the process that was doing it and had a clear path to resolve their sporadic load failure issues. Then they asked me the question “Why was this put in place?”

That’s a very difficult question to answer without having any history of the server, the jobs it has run previously and the people who had worked on it. But there was a clear clue in this case. The restart script contained a comment:

#Restart to help memory problem

Oh boy. I can only theorize at this point, but it looks like the old kneejerk response to “SQL has a memory leak”. Left to its own devices, SQL Server will happily grab up to the maximum amount of memory available to it (largely defined by the Max Server Memory setting). It’s very common for non-DBAs to see SQL using a huge chunk of memory, even when it is doing nothing at that moment, and figure they can fix the problem by rebooting. That’s not a great response. That memory cache is a feature of SQL Server – it’s what allows it to serve queries faster – not some bug or memory leak.
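If memory pressure on the box really is the concern, the fix is to cap the instance rather than reboot it. A quick sketch – the 28672 MB figure is purely illustrative (roughly “leave 4 GB for the OS on a 32 GB box”); the right value depends on what else runs on the server:

```sql
-- Cap SQL Server's memory instead of scheduling nightly restarts
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max server memory (MB)', 28672;  -- illustrative value only
RECONFIGURE;
```

This takes effect without a restart, and SQL Server will trim its caches down to the new ceiling on its own.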

So in this case we could move jobs around to avoid the restart window until we’d gathered enough data to prove that the restart itself was not needed, but reflecting on it does raise the question – “Why does turning it off and turning it on again work so often?”

Here’s a subset of the reasons I can think of that a restart can work:

  • Stuck processes are cleared. All those background processes that can conflict and cause blockages are cleared and start from scratch.
  • Memory is cleared. Memory management is really complicated when you get into it, and over time inefficiencies in what is in memory and whether it needs to be can build up.
  • Network connections are re-established. Networks are also complicated, and a restart will often renew IP leases and re-register devices, resolving some classes of conflict.
  • Updates and software glitches. There may be unapplied updates waiting on a restart to take effect. I’ve seen multiple cases where the very functionality that is supposed to avoid interrupting your workflow by holding off a restart until “a time of your choosing” actually stops it dead by breaking network connectivity or some other dependency. Likewise, software bugs are not always there when you turn software on – hopefully, if they were, they would be picked up in regular pre-release testing. More often a bug is triggered by a certain use case or set of parameters, and is cleared with a restart.

In short, rebooting is a quick way to return a device to a known, clean state. It’s a universal first step because it’s fast, simple, and often effective – even if you don’t know what caused the problem in the first place.

But databases are different. For a start they are usually multi-user systems, so turning them off affects every system connected to them, and turning them on again can often be a slow process. Again, this is not a bug; this is built into SQL Server to maintain consistency. You WANT SQL to play through the logs and bring the database to a consistent state before allowing access to the system. Consistency is only one problem that a reboot can cause – there’s a bunch of others.

  • Multi-System integration – SQL itself is designed to return the database to a consistent state following a transaction, but a SQL database is not a standalone system, it is often one of many that form an intricate web of data movement. Oftentimes complex systems have a very specific order that things need to be shut down and started up in to bring back online cleanly.
  • Connection Persistence – Some applications maintain a persistent connection and expect that connection to be there. One not uncommon Microsoft Enterprise Resource Planning system has the application create objects in TempDB when it first connects to the database. If the database server restarts unexpectedly, these object dependencies must be known and their creation scripted to ensure the application keeps working.
  • Application State Management – Some databases are state management engines for their application, meaning the current state is stored and the process will continue following the restart.
  • High Availability implications – whatever happens on the primary server is also duplicated to secondary servers. This can lead to extended recovery periods as transactions need to be replicated to remote secondaries.

And of course the fact is that many servers are consolidated, running multiple systems, so if you restart to solve your problem, you are affecting multiple other systems at the same time. A restart is an outage on every system affected by it and may be breaching an uptime SLA on someone else’s system you aren’t even aware of.

Finally, for a user system, or even an application server, you can get away with the approach that says you just need things to work again as quickly as possible. For a database server that’s also true, but you want to understand why the problem arose in the first place too. Consult a DBA, find the root cause, and fix it in a way that ensures you don’t see it again. Don’t just jump to “Turn it off and turn it on again.”

SQL Agent, Maintenance Plans, Auto-Heals and How to Waste an Afternoon

At my company, we’re lucky to have a custom tool that monitors EVERYTHING happening in our SQL Server environment. But every now and then, I like to explore what’s possible using only SQL Server’s built-in features, just in case you don’t have a comprehensive monitoring suite at your disposal.

Recently, someone made an off-the-cuff comment about using T-SQL Tasks in Maintenance Plans to handle more complex logic than SQL Agent handles out of the box. I was briefly excited by the prospect of building improved logic flows directly into SQL Server Maintenance Plans. This month Andy Levy hosts T-SQL Tuesday and asks how we handle SQL Agent jobs, and this seemed like a great opportunity to share a story about how I wasted an afternoon testing a few components I have a passing knowledge of, attempting to implement auto-healing and conditional logic flow, only to snap out of it a few hours later when I realized I was trying to solve a problem someone else cracked nearly two decades ago.

The visual workflow designer and conditional branching on success or failure in Maintenance Plans seemed like a great way to automate not just routine tasks, but also smart responses to issues as they arise. I love any opportunity to build automatic healing into anything that is likely to alert me. Getting a message saying “A thing nearly broke last night – the problem is fixed” is so much better than getting paged to manually perform a retry or a known remediation action. So I put together a little side-by-side comparison of SQL Agent vs Maintenance Plans, addressing a scenario where a DBA performs the same boring checks and actions when a backup fails, and how both technologies allow retries and logic flow to handle this automatically rather than having a human intervene.

The flow for this process was good at first, but as I dug deeper, I realized that while Maintenance Plans can be powerful, you can achieve the same (and often better) results with a little creative thinking in SQL Server Agent jobs. In fact, Agent jobs offer even more flexibility, allowing you to mix and match different step types – T-SQL, PowerShell, SSIS packages, and more – within a single job. Let me walk you through what I learned, with some real-world examples.

Maintenance Plans: Visual Logic, Limited Flexibility

Let’s start with the Maintenance Plan approach. Here’s a screenshot of a workflow I put together to handle database backups with built-in remediation and notifications, but the same ideas can apply to ETL or other processes.

  • The plan tries to back up all user databases.
  • If the backup succeeds, it updates a central monitoring table.
  • If the backup fails, it runs some auto-remediation steps (like clearing disk space or restarting a service), then tries the backup again.
  • If the second attempt succeeds it updates the central repository, and if it fails, it notifies the on-call operator.

This is a solid approach, and the visual editor makes it easy to follow the logic. But there are some limitations:

  • You’re mostly limited to backup, maintenance, and T-SQL tasks.
  • Complex branching or looping is awkward.
  • Integrating with external scripts or tools isn’t straightforward.

Agent Jobs: The Real Powerhouse

What’s more, my initial enthusiasm for the workflows was misplaced. SQL Agent actually has success and failure branching built in at a per-step level too; it’s just that in most cases the failure action is to report that the job failed. But it doesn’t need to be.

With Agent Jobs, you can:

  • Chain as many steps as you want, each with its own success/failure logic.
  • Use different step types (T-SQL, PowerShell, CmdExec, SSIS, etc.).
  • Easily implement loops, retries, and advanced remediation.
  • Integrate with external systems, scripts, or APIs.
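That per-step branching is all set when you define the step. A minimal sketch of what I mean, using `msdb.dbo.sp_add_jobstep` – the job name, proc, and step numbers here are hypothetical, matching the workflow described below:

```sql
-- A backup step that, on failure, jumps to a remediation step
-- instead of simply failing the job
EXEC msdb.dbo.sp_add_jobstep
     @job_name           = N'Nightly Backups',                  -- hypothetical job
     @step_name          = N'Backup User Databases',
     @subsystem          = N'TSQL',
     @command            = N'EXEC dbo.BackupAllUserDatabases;', -- hypothetical proc
     @on_success_action  = 4,   -- 4 = go to step...
     @on_success_step_id = 4,   -- ...the central reporting step
     @on_fail_action     = 4,   -- on failure, also go to a specific step...
     @on_fail_step_id    = 5;   -- ...the remediation step
```

The same `@on_success_action`/`@on_fail_action` pairs are what you set on the Advanced tab of each step in the SSMS job step dialog.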

Here’s a screenshot of a way of achieving a better outcome using a non-standard workflow with SQL Agent. The job again handles backups with layered remediation and targeted notifications:

  1. Reset Autoheal Remediation Counter – Prepares the environment for a new backup attempt.
  2. Check Retry Loop – Determines if a retry is needed based on previous attempts. If the number of retries is exceeded, the step generates a failure which redirects to the final step (which is configured to alert the on-call operator).
  3. Backup User Databases – The main backup step.
  4. Update Central Reporting Repository – On successful backup the job proceeds to this step which logs the result and ends the job.
  5. Initiate Backup Drive Maintenance – If the backup fails, attempts to resolve likely issues.
  6. Perform Additional AutoResolution Tasks – Further remediation if needed. Following the completion of this step, the job is redirected back to step two, which increments the loop counter and retries the backup.
  7. Job Failed Notification – If all else fails, notifies the operator and ends the job.
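The trick in step two is to deliberately fail the step once the retry budget is spent, so the step’s on-failure redirect sends the job to the notification step. A rough sketch of what that step’s T-SQL could look like – the `dbo.AutohealCounter` table and the budget of three are hypothetical, with step one resetting the counter:

```sql
-- "Check Retry Loop" step body: count this attempt, and fail the step
-- once the retry budget is exhausted so the on-failure action fires
UPDATE dbo.AutohealCounter SET Attempts = Attempts + 1;

IF (SELECT Attempts FROM dbo.AutohealCounter) > 3
    THROW 50000, N'Retry budget exhausted - failing step to trigger the alert path.', 1;
```

The `THROW` makes the step fail, and the step’s on-failure action (go to the Job Failed Notification step) does the rest.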

Each step specifies exactly what should happen on success or failure, making the workflow both robust and transparent.

Why Agent Jobs Win

After experimenting with both approaches, I decided I hadn’t, in fact, finally found a use for Maintenance Plans. Really the only thing they have going for them in this scenario is a nice graphical interface to visualise what is going on. That’s not to undersell the importance of a clearly identifiable workflow – the job step approach is very hard to follow if your workflows become more complicated. But here’s why I think Agent Jobs are the smarter choice for most advanced workflows:

  1. Multiple Step Types – You’re not limited to T-SQL or maintenance tasks. Want to run a PowerShell script, call an external API, or execute a batch file as part of your remediation? No problem.
  2. Granular Control – You can direct the job flow on a step-by-step basis, allowing for complex logic, conditional retries, and targeted notifications.
  3. Easy Integration – Agent Jobs make it simple to integrate with external monitoring systems, send custom alerts, or update centralized logging repositories.
  4. Maintainability – It’s easier to document, update, and troubleshoot jobs when each step is clearly defined and can be of any type you need.

Conclusion – So which should you use?

So obviously I’m leaving out some additional options in my side-by-side test. SQL Agent jobs have auto-retry built in at the per-step level if you want, and the loop doesn’t have to be at the task or job-step level – you can build it right into the T-SQL if you like. I played around with both of these approaches for a while, trying to see if I could refine them to resolve the issues I came up with. I really wanted that graphical format, but I also wanted the flexibility of SQL Agent jobs. Which should I use?
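For completeness, here’s what that pure T-SQL version of the loop could look like, collapsed into a single job step. The database name, backup path, and remediation proc are all hypothetical placeholders:

```sql
-- The whole retry-with-remediation loop inside one T-SQL job step
DECLARE @attempt int = 1;
WHILE @attempt <= 3
BEGIN
    BEGIN TRY
        BACKUP DATABASE [SalesDB]                      -- hypothetical database
            TO DISK = N'X:\Backups\SalesDB.bak'        -- hypothetical path
            WITH INIT, CHECKSUM;
        BREAK;  -- success: leave the loop
    END TRY
    BEGIN CATCH
        EXEC dbo.InitiateBackupDriveMaintenance;       -- hypothetical remediation proc
        SET @attempt += 1;
        IF @attempt > 3
            THROW;  -- out of retries: re-raise so the step (and job) fails
    END CATCH
END
```

It works, but you lose the per-step visibility in the job history – every attempt and remediation is buried inside one step.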

When you delve into these things it’s really easy to get stuck with your blinkers on. The answer here was pretty clear. Microsoft actually built a whole tool for handling SQL data movement with complicated workflows and event handlers, with a flexible graphical presentation layer – and that tool is called Integration Services. Yes, SSIS has some issues of its own, but flexibility of logic flow is not one of them.

Another afternoon frittered away investigating a problem that was solved years ago….

SQL Saturday Wellington 2024

I wanted to make sure that everyone in the Wellington region is aware that Anu and the team are lining up another weekend of high-class SQL training THIS WEEKEND. I do apologize for the late post, but I’ve been chatting to a few Wellington-based clients in the past week to let them know, and I realize that not everyone is aware, so I need to do my bit to get the message out.

I’m going to be sharing some of the good, the bad and the ugly that we’ve seen with live moves of various clients into the Azure cloud. It’s a session that compresses masses of experience into a (probably less than) one-hour session, so come along and take the opportunity to learn from other people’s mistakes.

But check out the schedule, because there are three tracks of data goodness, and what looks to be a fascinating keynote from Kevin McCombe and Mike Winiata from the New Zealand Fire Service.

SQL Saturday South Island 2024

Hi all. Just a reminder if you are not already signed up to get over to:

https://sqlsaturday.com/2024-07-27-sqlsaturday1084

and sign up for SQL Saturday South Island 2024 edition. I’ll be giving a talk on Azure Database migration and sharing some of the good, the bad and the ugly we’ve experienced in moving clients to (and from) Microsoft Azure.

There’s also 4 streams of data goodness with the usual suspects all making an appearance. Check out the link above for the full list. See you there!

SQL Saturday South Island – Wrap Up

What an amazing weekend.

Sometimes you can go to a free training event and take a few notes of things to go look up and that’s a win. And some days the stars align, you pick the right sessions and you walk away with solutions to problems you are working on right now.

I was lucky enough to attend Craig Ryan’s session on reducing your SQL spend. This is something I focus on in most of my client engagements, so generally I think we have the bases covered. Not even close! Craig had a handful of considerations that I need to work into our own offerings, including comparing pricing in different regions and dusting the cobwebs off the Web Edition of SQL Server – which, to be quite honest, I haven’t considered in my calculations for years. Then add in changing the operating system, and we saw an example of how to achieve the same levels of performance on the same hardware (or often better hardware) for half the price. Great session, well worth the price of admission (which was free, but to be honest I would have paid twice that).

My own session went okay. We went through the five types of encryption available within SQL Server and demoed each, making it clear that technical know-how is not a reason to put off implementing encryption in your environments. And a great thing happened after that session. I had fumbled a couple of times with the technical setup, switching between slide decks and Management Studio, and one of the audience came up to me afterwards with a recommendation to use colour coding for session connections. This is something I often do when I am connecting to prod or non-prod client environments, but I think it would be a great addition to a session where I’m demoing in multiple different environments – both for me and the audience. So thank you, anonymous random guy.

The next session I went to was Warwick Rudd’s session on Azure Data Studio, but Warwick hit a technical snag when the presentation room monitor decided it had had enough and went to sleep. Warwick always presents great content, but I also know he distributes it really well, so I promised to catch it online and took the opportunity to grab a Lime scooter back to the hotel and drop off my laptop, so I was set for a big night after the event without having to lug it around.

I got back in time for Heidi Hasting’s talk about some of the processes she has automated and the tools she uses. I had picked a couple of key words out of her presentation abstract and thought it might just help with an issue a colleague is currently working on. Heidi had flown in late, had the last session of the day, and presented on two hours’ sleep. But man, did she deliver just what I was after, with some pieces being perfect to solve the PowerShell/Power BI analytics/MS licensing issues that have been on my periphery over the last week or two.

And as always the real value in these sessions is that if you don’t get the exact piece of information you are after in a session, you can go out afterwards, buy someone a drink and chat about the technologies we are working with and chew over the problems we are faced with in our day jobs. I literally got thousands of dollars of value out of the event and had an amazing time doing it. Thanks to Hamish and the team for another successful and enjoyable event.

Come to SQL Saturday South Island

Hi all – a quick note telling you to come along and support SQL Saturday South Island.

Once again Hamish has pulled together a great group of speakers from New Zealand, Australia and even the USA. As always it is a free day of training, and you even get a free lunch thrown in, so if not for bettering your professional career and learning lots of cool new stuff – come along and grab a free lunch, because it turns out there IS such a thing.

SQL Saturday South Island 2023 (#1061)