Determining the source of Bug Check 0x133 (DPC_WATCHDOG_VIOLATION) errors on Windows Server 2012

This is an archive post of content I wrote for the NTDebugging Blog on MSDN. Since the MSDN blogs are being retired, I’m transferring my posts here so they aren’t lost. The post has been back-dated to its original publication date.

What is a bug check 0x133?

Starting in Windows Server 2012, a DPC watchdog timer is enabled which will bug check a system if too much time is spent in DPC routines. This bug check was added to help identify drivers that are deadlocked or misbehaving.  The bug check is of type “DPC_WATCHDOG_VIOLATION” and has a code of 0x133.  (Windows 7 also included a DPC watchdog but by default, it only took action when a kernel debugger was attached to the system.)  A description of DPC routines can be found at http://msdn.microsoft.com/en-us/library/windows/hardware/ff544084(v=vs.85).aspx.

The DPC_WATCHDOG_VIOLATION bug check can be triggered in two ways. First, if a single DPC exceeds a specified number of ticks, the system will stop with 0x133 with parameter 1 of the bug check set to 0. In this case, the system’s time limit for single DPC will be in parameter 3, with the number of ticks taken by this DPC in parameter 2. Alternatively, if the system exceeds a larger timeout of time spent cumulatively in all DPCs since the IRQL was raised to DPC level, the system will stop with a 0x133 with parameter 1 set to 1. Microsoft recommends that DPCs should not run longer than 100 microseconds and ISRs should not run longer than 25 microseconds, however the actual timeout values on the system are set much higher.

How to debug a 0x133 (0, …

In the case of a stop 0x133 with the first parameter set to 0, the call stack should contain the offending driver. For example, here is a debug of a 0x133 (0,…) kernel dump:

0: kd> .bugcheck
Bugcheck code 00000133
Arguments 00000000`00000000 00000000`00000283 00000000`00000282 00000000`00000000

Per MSDN, we know that this DPC has run for 0x283 ticks, when the limit was 0x282.

0: kd> k

Child-SP          RetAddr           Call Site

fffff803`08c18428 fffff803`098525df nt!KeBugCheckEx
fffff803`08c18430 fffff803`09723f11 nt! ??::FNODOBFM::`string'+0x13ba4
fffff803`08c184b0 fffff803`09724d98 nt!KeUpdateRunTime+0x51
fffff803`08c184e0 fffff803`09634eba nt!KeUpdateTime+0x3f9
fffff803`08c186d0 fffff803`096f24ae hal!HalpTimerClockInterrupt+0x86
fffff803`08c18700 fffff803`0963dba2 nt!KiInterruptDispatchLBControl+0x1ce
fffff803`08c18898 fffff803`096300d0 hal!HalpTscQueryCounter+0x2
fffff803`08c188a0 fffff880`04be3409 hal!HalpTimerStallExecutionProcessor+0x131
fffff803`08c18930 fffff880`011202ee ECHO!EchoEvtTimerFunc+0x7d  //Here is our driver, and we can see it calls into StallExecutionProcessor
fffff803`08c18960 fffff803`097258b4 Wdf01000!FxTimer::TimerHandler+0x92
fffff803`08c189a0 fffff803`09725ed5 nt!KiProcessExpiredTimerList+0x214
fffff803`08c18ae0 fffff803`09725d88 nt!KiExpireTimerTable+0xa9
fffff803`08c18b80 fffff803`0971fe76 nt!KiTimerExpiration+0xc8
fffff803`08c18c30 fffff803`0972457a nt!KiRetireDpcList+0x1f6
fffff803`08c18da0 00000000`00000000 nt!KiIdleLoop+0x5a

Let’s view the driver’s unassembled DPC routine and see what it is doing:

0: kd> ub fffff880`04be3409
ECHO!EchoEvtTimerFunc+0x54:
fffff880`04be33e0 448b4320        mov     r8d,dword ptr[rbx+20h]
fffff880`04be33e4 488b0d6d2a0000  mov     rcx,qword ptr [ECHO!WdfDriverGlobals (fffff880`04be5e58)]
fffff880`04be33eb 4883631800      and     qword ptr [rbx+18h],0
fffff880`04be33f0 488bd7          mov     rdx,rdi
fffff880`04be33f3 ff150f260000    call    qword ptr [ECHO!WdfFunctions+0x838(fffff880`04be5a08)]
fffff880`04be33f9 bbc0d40100      mov     ebx,1D4C0h
fffff880`04be33fe b964000000      mov     ecx,64h
fffff880`04be3403 ff15f70b0000    call    qword ptr[ECHO!_imp_KeStallExecutionProcessor (fffff880`04be4000)]   //It's calling KeStallExecutionProcessor with 0x64 (decimal 100) as a parameter

0: kd> u fffff880`04be3409
ECHO!EchoEvtTimerFunc+0x7d:
fffff880`04be3409 4883eb01        sub     rbx,1
fffff880`04be340d 75ef            jne     ECHO!EchoEvtTimerFunc+0x72 (fffff880`04be33fe)     //Here we can see it is jumping back to call KeStallExecutionProcessor in a loop
fffff880`04be340f 488b5c2430      mov     rbx,qword ptr[rsp+30h]
fffff880`04be3414 4883c420        add     rsp,20h
fffff880`04be3418 5f              pop     rdi
fffff880`04be3419 c3              ret
fffff880`04be341a cc              int     3
fffff880`04be341b cc              int     3

0: kd> !pcr
KPCR for Processor 0 at fffff80309974000:
    Major 1 Minor 1
      NtTib.ExceptionList: fffff80308c11000
          NtTib.StackBase: fffff80308c12080
         NtTib.StackLimit: 000000d70c7bf988
       NtTib.SubSystemTib: fffff80309974000
            NtTib.Version: 0000000009974180
        NtTib.UserPointer: fffff803099747f0
            NtTib.SelfTib: 000007f7ab80c000

                  SelfPcr: 0000000000000000
                     Prcb: fffff80309974180
                     Irql: 0000000000000000
                      IRR: 0000000000000000
                      IDR: 0000000000000000
            InterruptMode: 0000000000000000
                      IDT: 0000000000000000
                      GDT: 0000000000000000
                      TSS: 0000000000000000

            CurrentThread: fffff803099ce880
               NextThread: fffffa800261cb00
               IdleThread: fffff803099ce880

                DpcQueue:  0xfffffa80020ce790 0xfffff880012e4e9c [Normal] NDIS!NdisReturnNetBufferLists
                           0xfffffa800185f118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize
                           0xfffff8030994fda0 0xfffff8030972bc30 [Normal] nt!KiBalanceSetManagerDeferredRoutine
                           0xfffffa8001dbc118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize
                           0xfffffa8002082300 0xfffff88001701df0 [Normal] USBPORT

The !pcr output shows us queued DPCs for this processor. If you want to see more information about DPCs and the DPC Watchdog, you could dump the PRCB listed in the !pcr output like this:

dt nt!_KPRCB fffff80309974180 Dpc* 

Often the driver will be calling into a function like KeStallExecutionProcessor in a loop, as in our example debug.  To resolve this problem, contact the driver vendor to request an updated driver version that spends less time in its DPC Routine.

How to troubleshoot a 0x133 (1, …

Determining the cause of a stop 0x133 with a first parameter of 1 is a bit more difficult because the problem is a result of DPCs running from multiple drivers, so the call stack is insufficient to determine the culprit.  To troubleshoot this stop, first make sure that the NT Kernel Logger or Circular Kernel Context Logger ETW traces are enabled on the system.  (For directions on setting this up, see http://blogs.msdn.com/b/ntdebugging/archive/2009/12/11/test.aspx.)

Once the logging is enabled and the system bug checks, dump out the list of ETW loggers using !wmitrace.strdump. Find the ID of the NT Kernel logger or the Circular logger. You can then use !wmitrace.logsave (ID) (path to ETL) to write out the ETL log to a file. Load it up with Windows Performance Analyzer and add the DPC or DPC/ISR Duration by Module, Function view (located in the Computation group) to your current analysis window:

WPA DPC Graph
WPA DPC Graph

Next, make sure the table is also shown by clicking the box in the upper right of the view:

Enable Graph and Table view in WPA
Enable Graph and Table view in WPA

Ensure that the Address column is added on the left of the gold bar, then expand each address entry to see individual DPC enters/exits for each function. Using this data, you can determine which DPC routines took the longest by looking at the inclusive duration column, which should be added to the right of the gold bar:

DPC Duration
DPC Duration

In this case, these DPCs took 1 second, which is well over the recommended maximum of 100 us. The module column (and possible the function column, if you have symbols) will show which driver is responsible for that DPC routine. Since our ECHO driver was based on WDF, that is the module named here.

For an example of doing this type of analysis in xperf, see http://blogs.msdn.com/b/ntdebugging/archive/2008/04/03/windows-performance-toolkit-xperf.aspx.

More Information

For additional information about Stop 0x133 errors, see this page on MSDN: http://msdn.microsoft.com/en-us/library/windows/hardware/jj154556(v=vs.85).aspx.

For DPC timing recommendations and for advice on capturing DPC timing information using tracelog, see http://msdn.microsoft.com/en-us/library/windows/hardware/ff545764(v=vs.85).aspx.

Guidelines for writing DPC routines can be found at http://msdn.microsoft.com/en-us/library/windows/hardware/ff546551(v=vs.85).aspx.

-Matt Burrough