Author Archives: Rob

It’s a Corrupt World

Luckily, in this day and age we don’t see so much database corruption, but it still sends a chill down the spine every time it pops up.  Having dealt with a number of cases in a number of different places now, it’s not the corruption that bothers me so much as the sheer amount of time it takes to fix.  Any unscheduled application outage is going to make you the focus of some very close scrutiny, and there’s going to be a lot of pressure to get things resolved as quickly as you can.  I’m going to assume you have detected the corruption with DBCC CHECKDB and have a minimum repair level of repair_allow_data_loss.  Here are some tips to get the application online as quickly as possible.
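Before the tips: for reference, the check that gets you to this point is a full consistency check.  A minimal sketch (substitute your own database name):

```sql
-- Full consistency check; ALL_ERRORMSGS lists every error found,
-- NO_INFOMSGS keeps informational noise out of the output.
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

The end of the output tells you the minimum repair level CHECKDB thinks is required.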

1.  Don’t try to get the application online as quickly as possible.  That way leads to mistakes and can compound the error.  Your goal should not just be to make this inconvenient problem go away, but to identify its cause and scope, and resolve it in the best way possible.  Having identified that corruption exists, your first move should be to make sure everyone who uses the application is aware there’s a problem.  At this point you need to assume that you are going to be dealing with data loss.  If you don’t talk to your users, everything going into the database can potentially be a further headache for you later.

2.  Start the process of retrieving or securing backups.  Hopefully you are running regular DBCC checks.  It’s worth starting the process of getting all backups between the last known good check and the time the corruption occurred.  Also, move those backups somewhere secure.  There’s nothing worse than seeing your (possibly) good backup suddenly get overwritten with a (definitely) bad one.
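Once those backups are safe, it’s worth confirming they are at least readable before you pin a recovery plan on them.  A minimal sketch (the file path is hypothetical):

```sql
-- Confirm the backup media is readable; WITH CHECKSUM also re-validates the
-- checksums recorded in the backup, if the backup was taken with checksums.
RESTORE VERIFYONLY
FROM DISK = N'X:\SecureBackups\YourDatabase.bak'
WITH CHECKSUM;
```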

3.  Isolate the corruption.  You need to know where it is.  Once you know, the first step is NOT to fix it.  Your first step is to relay that information back to the business (or users) and let them know where the problem lies.  At this point get the application support guys in and confirm the impact of data being lost in that particular table.  You can save yourself hours with an informed application support person.  In the instance of corruption I was dealing with today, it looked like a financial transaction table, and the first 5 rows held 2 $70,000 transactions.  Not a table I wanted to lose data from.  The application guy was able to advise it was a table used only for logging, which was usually cleaned up by an archiving process that just hadn’t been turned on for this server.  Knowing a table is essentially unused makes the decision process for the business very much easier.  (Unfortunately this particular application support person undid all his good work by following up with advice to the client to turn off checksum on all databases, a piece of advice akin to saying we can reduce the number of people convicted of murder by making murder legal.)
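To help with the isolating, SQL Server keeps a record in msdb of damaged pages it has encountered, which you can cross-reference against the CHECKDB output.  A sketch:

```sql
-- Pages SQL Server has flagged as suspect (IO errors, checksum failures, torn pages)
SELECT DB_NAME(database_id) AS [Database],
       file_id,
       page_id,
       event_type,        -- e.g. 1 = 823/824 error, 2 = bad checksum, 3 = torn page
       error_count,
       last_update_date
FROM msdb.dbo.suspect_pages;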

4.  Now you have defined the problem, and the required fix, go ahead and fix it.
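If the business has signed off on the data loss and a restore isn’t an option, the repair itself looks something like this (a sketch – REPAIR_ALLOW_DATA_LOSS requires single user mode):

```sql
-- Last resort repair: only after sign-off, and only when a clean restore isn't possible
ALTER DATABASE YourDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (N'YourDatabase', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
ALTER DATABASE YourDatabase SET MULTI_USER;
```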

I know I made number 4 the shortest step, and it’s often very complicated.  But this is not a step by step technical guide to fix corruption in your database.  This post is about how to prep those around you to deal with the impacts of database corruption when it is identified, and after it is fixed.  In some cases the decision of how to proceed will fall directly to you, in others your recommendations will be considered and followed, and in others they will be discarded completely.  That’s the nature of working in a company, or as a contractor where other people have a vested interest in the database you are managing.  Leaving aside the technical aspects of the fix, it’s critical you manage people and expectation well.

Corruption is a big scary thing, but the biggest scariest bit is that when it strikes people have to make hard decisions, often with other people yelling at them or angrily wondering why their application is offline.  As a DBA, corruption is one of the scariest things you will face.  It’s not your fault (well, usually) but now you are the only one who can get the application back up and running.  This is your reason for being employed, and it’s natural to feel a lot of pressure to get things done as quickly as possible.  The best thing you can do…the very best thing…is not to try to do it as quickly as possible, but to do it as well as possible.  When this problem is resolved, be that in 30 minutes or after pulling an all nighter, there shouldn’t be anyone who is surprised by the outcome.  Just because the repair option is called REPAIR_ALLOW_DATA_LOSS doesn’t mean everyone will understand that data will actually be lost.  People are weird like that.  It’s your job to make sure that all interested parties know the implications of any fix you put in place BEFORE you put it in place.  It’s your job to present the problem, and then clearly explain the pros and cons of the possible solutions.

TSQL to get last good DBCC CHECKDB

This is a piece of code I use to determine the last known good DBCC CHECKDB being run against a database.  I’m surprised this information is so tricky to find.  I’d expect it to be sitting on the database properties tab right under last known full backup.  But it’s not.  Instead it is listed as a database Info property, and we need to jump through some hoops to find it.  To save anyone else jumping through the same hoops I’ve put my script below that gives the last known DBCC CHECKDB date for all databases on a SQL Server instance.

CREATE TABLE #DBCCs
(
    ID INT IDENTITY(1, 1) PRIMARY KEY,
    ParentObject VARCHAR(255),
    Object VARCHAR(255),
    Field VARCHAR(255),
    Value VARCHAR(255),
    DbName NVARCHAR(128) NULL
)

EXEC sp_MSforeachdb N'USE [?];
INSERT #DBCCs
    (ParentObject,
    Object,
    Field,
    Value)
EXEC (''DBCC DBInfo() With TableResults, NO_INFOMSGS'');
UPDATE #DBCCs SET DbName = N''?'' WHERE DbName IS NULL;';

SELECT DbName, Value AS [Last DBCC]
FROM #DBCCs
WHERE Field = 'dbi_dbccLastKnownGood'
--AND Value < GETDATE()-14  --Uncomment to return only databases that haven't had a consistency check in the past 2 weeks.

DROP TABLE #DBCCs


Now, I called it ‘my code’ but in reality, like all my code, it’s based on something shamelessly stolen from smarter people’s minds and shaped to my own purpose.  In this case the core of that little snippet is in Brent Ozar’s Blitz script.  If all you want is the last date of a consistency check then by all means help yourself to the above code.  But if you want to quickly identify a whole bunch of issues on a server you don’t know well, then I’d suggest you take a look at what the full Blitz script can do.

Optober – SQL sp_configure Options – Priority Boost

Optober rocks on today with a quick post on Priority Boost.  I hadn’t planned to deal with this one until much later in the piece, but I’ve been reviewing a very unstable server for a client today, and it tracks back to this option.  It’s another one of those things, like ‘autoshrink’, which sound like a perfectly reasonable idea until you understand what it actually does.

Danger – Only enable this setting if you like running with scissors.

Option Name:  Priority Boost

Usage:  sp_configure ‘priority boost’, 1

Default: 0

Maximum: 1

Requires Restart:  Yes

What it does:  Just like the name says, this option boosts the priority of the SQL Server processes.  Which sounds like a good idea, but what does it actually mean?  Behind the scenes, here is what is happening – SQL Server threads are being made uber important.  They get precedence over just about everything else.  Now this still sounds like a good thing, right – this box is dedicated to SQL, so it should get precedence over everything.  Unfortunately, in this case ‘everything’ includes the operating system.  The stability of SQL Server depends on the stability of the operating system.  I spent 4 years in Christchurch when it was shaking like a frozen puppy, so I can tell you that when the foundations rock, buildings fall over.  The only threads with a higher priority than SQL Server with Priority Boost enabled are the REALTIME_PRIORITY_CLASS level threads, and the usage of those comes with all types of warnings.  You can read more about Windows operating system priority scheduling to see all the big flashy warnings about doing this in an application.  SQL Server is just another application and all the standard warnings apply.

If you do go ahead and enable it, at the very least, you can expect your interactions with the server to be unpredictable.  You could be moving your mouse or clicking the start button, but SQL has priority over that.  You could be opening management studio – but SQL has priority over that.  You could be trying to stop your SQL instance because it is running horribly, but guess what…anything running in SQL has priority over that too.

It gets even worse if you are using a cluster or availability groups.  Giving priority to SQL can mean that your cluster heartbeat is interrupted and your instance initiates a failover.  I’m sure you don’t want your HA continually initiating failovers.

When should I use it:  Alright…so if you got this far, you haven’t really absorbed the above.  This setting should never be used unless explicitly instructed by a Microsoft engineer to resolve a specific issue on a server.  Even then I suspect I would argue with them before throwing the switch.  Take a gun and fire buckshot into your server from 100 paces – it’s probably less damaging than having this setting enabled for any extended period.
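If you’ve inherited a server with it already on, turning it off looks like this (priority boost is an advanced option, so ‘show advanced options’ needs to be enabled to change it):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'priority boost', 0;
RECONFIGURE;
-- The instance must be restarted before the new value takes effect.
```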

What else should you know:  There is some light.  This configuration is on the chopping block and will be removed in a future version of SQL Server.  But every silver lining has a cloud….I’ve been reading that on the books online page for Priority Boost for as long as I can remember.

Optober – SQL sp_configure options – Optimise for Adhoc Workloads

The Optimise for Adhoc Workloads is a setting that I had never given a lot of thought to.  I credit my knowledge of it entirely to new MVP Martin Catherall, who casually dropped into conversation that it is an option he usually configures by default.  I asked a few other people about this, and it seems that the option is either completely unknown or considered to be a standard deployment.  So I’ve done a bit of reading and a bit of testing, and now it’s one I consider pretty much a default deployment as well.

Option Name:  optimize for adhoc workloads

Usage:  sp_configure ‘optimize for adhoc workloads’, 1

Default: 0

Maximum: 1

Requires Restart:  No

What it does:  A certain portion of your SQL buffer pool is used to store cached execution plans to be reused by SQL Server if that query comes along again.  This saves SQL Server from going through the process of generating an execution plan for the same query over and over and over again (unless your developers are in love with OPTION (RECOMPILE), of course).  This option protects against your cache being filled up with execution plans that only ever get used once.  Effectively it stores a ‘stub’ of the execution plan, and then only stores the full plan if it sees the query executed again, which saves a lot of space.

When should I use it:  Almost always!  I’ve put the question to a lot of different people at various SQL Saturday expert panels and user group presentations.  The answer is pretty much always that there is really no downside in enabling this option.  If you have a server with a lot of adhoc querying occurring, it will save space in the buffer pool that can be used for data pages = faster execution.  If you have a server with only a couple of queries executing multiple times, there is a small impact in that each query has to run twice before the full plan is cached after a restart.  But in that instance your plan cache is likely to be smaller anyway due to the limited number of plans in it.

What else should you know:  To see if you are likely to get any benefit from the setting, check what’s happening on your server right now.  (Disclaimer: you’ll want to have had a little while since the last restart before running this script, as the DMVs it accesses are reset on service restart.)

SELECT objtype AS [CacheType]
, COUNT_BIG(*) AS [Total Plans]
, SUM(CAST(size_in_bytes AS DECIMAL(18,2)))/1024/1024 AS [Total MBs]
, AVG(usecounts) AS [Avg Use Count]
, SUM(CAST((CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) AS DECIMAL(18,2)))/1024/1024 AS [Total MBs - USE Count 1]
, SUM(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END) AS [Total Plans - USE Count 1]
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY [Total MBs - USE Count 1] DESC
GO

(Code Courtesy of Kimberly Tripp’s blog post)

That code will tell you how many adhoc 1 use plans you have in the cache, together with how much space those plans are using right now.  You might be surprised!
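Once the option is enabled you can also watch it working – the stubs show up in the same DMV under their own cache object type.  A sketch:

```sql
-- With the option on, single-use adhoc plans appear as 'Compiled Plan Stub'
SELECT cacheobjtype,
       COUNT_BIG(*) AS [Plans],
       SUM(CAST(size_in_bytes AS BIGINT))/1024/1024 AS [Total MBs]
FROM sys.dm_exec_cached_plans
WHERE objtype = 'Adhoc'
GROUP BY cacheobjtype;
```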

Optober – SQL sp_configure options – Allow Updates

Option Name:  allow updates

Usage:  sp_configure ‘allow updates’, 1

Default: 0

Maximum: 1

Requires Restart:  No

What it does:  Once upon a time if you wanted to update system tables in your database you could do so, and this was the magic switch you needed to throw to do it.  In the current enlightened times people realize that updating system tables is generally speaking not a good idea, and if you really need to do it to (for example) fix an allocation error corruption, there is now emergency mode repair which can be used.
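For completeness, that modern replacement – emergency mode repair – looks something like this.  A sketch only, and very much a last resort for when no good backup exists (the database name is a placeholder):

```sql
-- Emergency mode repair: last resort, e.g. when the transaction log is damaged
ALTER DATABASE YourDatabase SET EMERGENCY;
ALTER DATABASE YourDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (N'YourDatabase', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
ALTER DATABASE YourDatabase SET MULTI_USER;
```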

When should I use it:  Never after SQL 2000, and even in SQL 2000 only if you are very clear on what you are doing.  Books Online is pretty clear on this.  If you’ve updated system tables then your database is not in a supported state.  This feature currently has no effect and will be removed in a future version of SQL Server.

What else should you know:  There are other ways to update system tables if there is corruption within your database.  But you need to be aware that if you change an underlying system table, the date of the last change is logged in the header of the database.  It will be there for everyone who cares to look (i.e. Microsoft Support) to see.  I’m not going to go into details on that except to point you over to Paul Randal’s blogs on database corruption.  Though if you are looking at this stuff, god knows how you would have found my blog before his.

Optober – SQL sp_configure Options – Remote Admin Connection

On about day one of being a DBA I was told this was an option that should always be enabled.  It took me much longer to understand what it achieved, and over 5 years before I had to use the dedicated administrator connection ‘in anger’.  But if you use it once and save having to reboot a server, you will come to appreciate it very quickly.

Option Name:  remote admin connection

Usage:  sp_configure ‘remote admin connection’, 1

Default: 0

Maximum: 1

Requires Restart:  No

What it does:  SQL Server keeps a single scheduler dedicated to allowing an administrator to connect.  To do this you connect as normal, but prefix your server name with ADMIN: – now even if SQL Server is completely tied up, this allows you to connect and work out what’s going on.  And that’s a very valuable thing to be able to do.  However, when SQL Server gets itself tied up into the sort of knots that would normally require the dedicated admin connection, you’ll often find the server so slow to respond that trying to do anything on the instance itself is nearly impossible.  I’m assuming you’re not going to have your server hooked up to a mouse, monitor and keyboard on your desk (and if you do……another day perhaps).  So you are most likely going to need to RDP into the server.  When you RDP into a server you are actually adding even more load as your profile gets loaded and your remote session is maintained.  If the server is dying you don’t want to make it work harder.  If this option is enabled, the dedicated admin connection can be used from a remote copy of Management Studio or sqlcmd.
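Connecting remotely looks the same as any other connection, just with the ADMIN: prefix on the server name (from sqlcmd that would be something like sqlcmd -S ADMIN:MyServer -E, where the server name is a placeholder).  Once in, a lightweight first look at what is running might be:

```sql
-- Run over the DAC: keep it cheap, one scheduler is all you have
SELECT session_id,
       status,
       command,
       wait_type,
       blocking_session_id
FROM sys.dm_exec_requests
WHERE session_id > 50;   -- skip system sessions
```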

When should I use it:  Always.  This should be a standard deployment.  You may get objections about it being a security risk but seriously – you already have to be a system administrator to use it, there’s no danger there…or at least no additional danger.  If you have dodgy sysadmins then you have bigger problems than this setting.

What else should you know:   Be aware that you have one thread to play with if you connect to the DAC.  It’s meant as a method to get in, run a quick diagnosis of what has put the server into the state it’s in, and resolve the issue.  It’s not meant to run complicated queries, there’s no chance of parallelism (because it’s just ONE thread) and it’s really only there as a last ditch option to prevent you having to restart the server.

Also be aware that when used locally the DAC is listening on the loopback address.  If you are using the DAC remotely you are going to need to make sure all the necessary firewall ports are open.

Optober – SQL sp_configure options – Backup Compression Default

I was surprised today when doing a review that a client was still under the impression that backup compression was an enterprise feature.  No no no no no.  It’s been there for everyone with Standard Edition since SQL Server 2008 R2 and there’s really no reason not to be using it.  The only real question is why it is not a default setting.

Option Name:  backup compression default

Usage:  sp_configure ‘backup compression default’, 1

Default: 0

Maximum: 1

Requires Restart:  No

What it does:  Who wants smaller, faster backups?  Everyone!  Backup compression is very impressive and can reduce your backup sizes considerably.  On average I would say you can expect your compressed backup to be about 40% of the size of a non compressed backup, but some data is going to compress better and some not so well, so your mileage will vary.

When should I use it:  Almost always!  Not only are you going to get a smaller final backup size, but because the IO is reduced the backups are likely to complete faster than a regular backup.  Now, it’s worth noting that there is compression going on here, and that means extra CPU.  If your CPU is already being heavily taxed then you may want to steer away from backup compression as it will add to your processor load.

What else should you know:  There’s a few other bits that might be helpful with the ‘default backup compression’ option.

  • It’s only a default – It can be overridden in the actual backup command.  So if you want to leave it disabled, but find yourself short on space to take an adhoc backup – you can specify COMPRESSION as a backup option and your backup will be compressed.
  • You cannot mix compressed and non compressed backups within the same device.  I don’t know how common it is to append backups anymore.  I certainly don’t see it a lot.  But if you do, you’ll need to remove or rename the old backups before switching from uncompressed backups to compressed.
  • Restore syntax is exactly the same whether a backup is compressed or not.  SQL just looks at the headers and figures out what to do.
  • All the other options you can throw at the backup command remain the same.  For example if you are doing a backup with checksum it works regardless of whether the backup is compressed or not.
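If you want to see what compression is actually buying you, the backup history in msdb records both sizes (compressed_backup_size is populated from SQL Server 2008 onwards; for uncompressed backups it simply equals the backup size):

```sql
-- Compare raw vs on-disk size for the most recent backups
SELECT TOP (10)
       database_name,
       backup_finish_date,
       CAST(backup_size / 1048576.0 AS DECIMAL(18,1)) AS [Backup MB],
       CAST(compressed_backup_size / 1048576.0 AS DECIMAL(18,1)) AS [On Disk MB],
       CAST(backup_size / NULLIF(compressed_backup_size, 0) AS DECIMAL(18,2)) AS [Ratio]
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC;
```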