The CXPacket Wait is one of the most misunderstood and worst explained wait types I’ve come across. It’s not so much that people don’t know what it is, but that people don’t know when they need to do something about it. There’s a lot of old posts telling you to set maximum Degree of parallelism to 1 and the wait type goes away(Which is true, but not the best advise), then there’s a lot of newer posts that tell you that’s a bad idea as it stops all parallel execution of queries, but don’t explain when we should and shouldn’t be worries about it.
So What is the CXPacket Wait Type?
Let’s start with a recap. A CX Packet is a “Class Exchange” Packet, and the old entries in Books Online tell us we “You may consider lowering the degree of parallelism if contention on this wait type becomes a problem”. There’s a large part of that statement that hangs on the ‘If contention on this wait type’ part of the statement and a lot of unanswered questions. How do I know if I have contention on this wait type? What should I do if I do?
What’s actually going on here?
When a query goes is able to go parallel(That is it has a cost greater than the cost threshold of parallelism setting, contains operators that would benefit from parallelism, and the maximum degree of parallelism setting not 1), it can be broken up into a group of threads to perform the tasks needed to complete the query. It also requires a thread dedicated to look after all the other threads, co-ordinate and collate their efforts. Brent Ozar explains this like a teacher and a class of students. Some tasks are so simple that either the teacher or any of the students could do them quickly alone, while others will complete faster if the whole class works on them with the teacher ensuring that everyone is ‘on task’. CXPacket waits occur when threads finish at different times and the query is waiting for the remainder of threads to finish. They also accumulate while the ‘teacher’ thread is waiting.
So it’s not really a bad thing in itself, hence the number of newer posts about it which are absolutely against the ‘Set max Degree of Parallelism to 1’ solution. I agree. On modern servers with dozens of cores it seems ludicrous that you would enable a server wide setting to that limits the use of those cores and effectively cripples the performance of queries that were executing fine in parallel. Which bring me back to the subject of the post…
Should I just Ignore CXPacket Waits?
I’ve seen the odd suggestion to just ignore it, or add it to the benign wait type list in whatever query or tool you use to monitor your waits. Most of the time you can, except then when it is a real problem you can’t. So what should you do?
I think the change in the Books Online entry for CXPacket waits gives us the best clue. In 2008 R2 and earlier it says this: “Occurs when trying to synchronize the query processor exchange iterator. You may consider lowering the degree of parallelism if contention on this wait type becomes a problem.” In 2012 and 2014 it says this: “Occurs with parallel query plans when trying to synchronize the query processor exchange iterator. If waiting is excessive and cannot be reduced by tuning the query (such as adding indexes), consider adjusting the cost threshold for parallelism or lowering the degree of parallelism.” In other words you probably want to take this from a server level problem to an individual query problem.
Look at the queries that are generating high CXPacket waits. You’ll need to query out of the plan cache to identify them and grab the top 2 or 3 offenders. Take those over to your test system and verify you see the same behavior. Now test run-time with Max Degree of Parallelism set as per production and then set as 1. What you are most likely to see is that the query performs better with Parallelism enabled. This doesn’t prove CXPackets are not an issue, but does show how bad a solution setting Max Degree of Parallelism is. Now look at your other wait types – these are likely the true source of your issue. You are likely to see scheduler yields or IO Pagelatches which will indicate benefits can be gained from setting the right indexes. You may see the query run faster on test than in production – this can indicate that on production there is contention on another resource.
Once you’ve tuned the heck out of the top few offenders take them back to prod and you should see a reduction in CXPacket. You can repeat that exercise from time to time, particular the comparing run-times to see how much difference the Parallelism makes.
Some Considerations
- If you are using Max Degree of Parallelism at one you can still see CXPacket waits. Developers or DBA can fine-tune queries using the MAXDOP hint to bypass the server wide Max DoP settings.
- Some Applications(Ahem…Microsoft Dynamics AX, Microsoft Sharepoint….) have a vendor recommendation to disable server wide parallelism. See the above point on how to stay supported but also increase large query speed.
- Look at your plan cache as a whole and work out what constitutes a big query and a little one in terms of query cost. You can use Cost Threshold for Parallelism appropriately to dictate which queries go Parallel and which ones don’t.
- Don’t be hamstrung trying to find ‘best practice’. Don’t take any advise you find on the internet as gospel. Your production system is different from everyone elses, and you may need to do some experimentation to find the best setting for you.
And finally, a word of warning. Don’t write a blog post on Parallelism if you can’t spell Parallelism. I think I spelled it wrong about 20 times above:) This should probably expand to a lengthy client discussion on Parallelism, it’s hard enough to spell, you should try saying repeatedly in a conference call!