Microsoft – SQLServer (v2005) – Crashed with error info: Deadlocked Schedulers

Woke up a couple of days ago to a message stating that our Microsoft SQL Server Instance crashed twice during stress testing the night before.

SQL Crashed no way I thought.

But, once I got in to the office checked the SQL Server Error Logs and discovered that SQL had indeed crashed twice.

The errorlog registered something like what is pasted below:


Deadlocked Schedulers

New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 60 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 97%. System Idle: 0%.

New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 120 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 94%. System Idle: 3%.

New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 180 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 95%. System Idle: 2%.

New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 240 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 96%. System Idle: 1%.

New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 96%. System Idle: 1%.

New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 360 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 96%. System Idle: 1%.

 

A layman’s interpretation suggests the following

  1. Every 60 seconds or so SQL Server “queries” whether new requests are being picked up the “Engine”.  If new queries are not being serviced this information is reported along with the current system’s utilization.
  2. The system is also suggesting that we might want to increase the “max worker threads”

We thus went in and looked at the worker threads:

 

Syntax:


exec sp_configure 'max worker threads'

 

Result:

It is set at default of 0

Clarification:

0 is Microsoft’s default setting as it allows the system to control its own destiny

Since we know that a stress test was ongoing, we had a bit more to go with.

It seems that there were many requests being generated and it somewhat overwhelmed the system.

Next in line was to check the ‘max degree of parallelism’.  Some systems, especially ones that allow heavy customization, might generate in-efficient queries in terms of resultant query plans or large result sets.

To insulate the system, a bit better, try the following:

  1. Configure ‘max degree of parallelism’
    http://msdn.microsoft.com/en-us/library/ms181007.aspx
  2. Configure ‘query governor cost limit’ — Not recommended
    http://msdn.microsoft.com/en-us/library/ms190419.aspx

To set the ‘Max degree of Parallelism’ review the ‘Number of Physicals CPUs’ on you machine.  To get accurate CPU count consider using CPUID’s CPU-Z ( http://www.cpuid.com/softwares/cpu-z.html )

Our count was 2 so it is best to set ‘max degree of parallelism’ to 2 (to match CPU count).

1 is also a good count as it means that each single bad query will not be distributed among threads and cause ‘thread starvation’.

As Amit Banerjee (MS Blogger – SQL Server Escalation Services team at Microsoft) pointed out in his blogs postings:

One really wants to review the generated dump files that are available in the SQLError Log folder; especially the .mdmp files.

Run them through WinDmp.

In doing so, one might discover:

  • If there is a blocking issue
  • Number of threads running
  • If the threads are deadlocked and the matching pairs

This is a bit simple introduction.  To get more understanding:

  1. Debugging the deadlock but for the scheduler: Part 1 of 2
    http://troubleshootingsql.com/2011/08/15/debugging-the-deadlock-but-for-the-scheduler-part-1-of-2/
  2. How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888
    http://msdn.microsoft.com/en-us/library/cc917684.aspx

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s