From the moment I started playing with this SQL Server thing, I have had certain things repeated to me over and over again. One of those is the importance of removing fragmentation from indexes. There's a whole bunch of stuff out there on the interweb about people rebuilding indexes and seeing performance improve, and so now, a few years further on in my journey, it's quite common for a client to come to me asking whether such and such an index has been rebuilt, or with concerns over fragmentation levels. Increasingly I find myself explaining that rebuilding the index may help, but it's quite an expensive operation, can cause blocking, and may not be the problem anyway.
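If fragmentation really is the concern, it's worth measuring it before doing anything. Here's a minimal sketch of the kind of check I mean, run in the context of the database you care about; the 10% fragmentation and 1,000-page thresholds are illustrative starting points, not rules:

    -- Check fragmentation before assuming a rebuild is needed.
    -- Thresholds below are illustrative, not prescriptive.
    SELECT
        OBJECT_NAME(ips.object_id) AS table_name,
        i.name AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
        AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 10
        AND ips.page_count > 1000  -- tiny indexes rarely matter
    ORDER BY ips.avg_fragmentation_in_percent DESC;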
Now, I'm not saying that indexes shouldn't be rebuilt or maintained. Clearly they should be, but there's a bunch of other things that happen when we rebuild an index, and those may be what is actually resolving the performance issue that raised the original flag. Let's look at them:
Statistics Are Updated
Whenever you rebuild an index, all the statistics related to that index are updated. (Take note: this applies to rebuilding the index; if you are reorganising it, the statistics are not automatically maintained.) Statistics are a very powerful part of how SQL Server decides to access data. Even with auto-update statistics enabled, statistics can get out of date really quickly, and query plans can be generated using incorrect information. That can return results significantly slower than if the statistics were up to date. The most common effect I see around this involves queries based on dates (and in particular ones that compare against something like getdate() or CURRENT_TIMESTAMP). Recently I saw a query which was running around 800 times a day and taking around 17 seconds per execution. Without adjusting anything but updating the statistics, that dropped to 8 seconds. Rebuilding the index would have resulted in the same performance improvement, but with a much higher impact on the server to achieve the same result.
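To illustrate, here's a rough sketch of how you might check how stale the statistics on a table are, and then refresh just the statistics rather than rebuilding the whole index. dbo.Orders is a hypothetical table name, and sys.dm_db_stats_properties needs a reasonably recent build of SQL Server (2008 R2 SP2 / 2012 SP1 or later):

    -- See when each statistic was last updated and how much the data
    -- has changed since. dbo.Orders is a hypothetical table name.
    SELECT
        s.name AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter  -- rows modified since the last update
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.Orders');

    -- Refresh the statistics alone: far cheaper than a rebuild.
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;  -- or a SAMPLE clause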
Plans Are Recalculated
Related to the above, any plan that uses the index is recompiled based on the latest information: fragmentation is removed (hopefully) and statistics are up to date, so SQL Server should choose a more appropriate plan. Alternatively, the query could be slow due to a parameter-sniffing issue, where a plan generated for one value is being reused for other, less appropriate values. Imagine a database which contained every person in the world's favourite colour. If the plan is compiled the first time to get information for the New Zealand population, and all subsequent queries are trying to get information about the US or Europe, SQL Server is obviously going to have trouble accessing those much larger volumes of data in the same way it would the smaller set of data associated with New Zealand. Now, everyone in the world is going to be a big index to rebuild, whereas in that case a recompile hint could see the same performance improvement.
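As a sketch of that favourite-colour scenario, a recompile hint might look like this; dbo.Population and the column names are hypothetical:

    -- Hypothetical query for the favourite-colour example.
    DECLARE @Country nvarchar(50) = N'New Zealand';

    SELECT FavouriteColour, COUNT(*) AS people
    FROM dbo.Population  -- hypothetical table
    WHERE Country = @Country
    GROUP BY FavouriteColour
    OPTION (RECOMPILE);  -- fresh plan every execution, so a plan built
                         -- for a small country is never reused for a big one

The trade-off is extra compilation cost on every execution, which is usually far cheaper than rebuilding an index covering everyone in the world.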
So, in conclusion: yes, your indexes need to be maintained, but if there is a performance issue on a server, it's much better if you can narrow it down to a specific problem which can potentially be resolved surgically than to get out the second biggest hammer you have (behind restarting your server... and one day, when I am brave enough, I'll post on what that is actually doing when it magically fixes your performance issues).