Fading Coder

Debugging a Critical Section Deadlock with WinDbg

Tech May 12 3

While investigating an intermittent UI freeze in an application, I captured a memory dump and loaded it into WinDbg for analysis. The main thread's stack immediately pointed to a critical section deadlock: it was blocked inside RtlEnterCriticalSection.

0:000:x86> kb
ChildEBP RetAddr  Args to Child              
0032dd0c 779ed993 00000710 00000000 00000000 ntdll_779b0000!NtWaitForSingleObject+0x15
0032dd70 779ed877 00000000 00000000 024023f0 ntdll_779b0000!RtlpWaitOnCriticalSection+0x13e
0032dd98 58a2fac3 02404c50 856fd57e 024023f0 ntdll_779b0000!RtlEnterCriticalSection+0x150
0032dffc 58a0d4d7 856fea8a 00000000 001c41a0 SogouSoftware_589d0000!CDownloadListUI::UpdateDownloadListUI+0x43

Examining the critical section details:

0:000:x86> !cs 02404c50
-----------------------------------------
Critical section   = 0x0000000002404c50 (+0x2404C50)
DebugInfo          = 0x0000000000611e08
LOCKED
LockCount          = 0xFFFFFFFF
WaiterWoken        = Yes
OwningThread       = 0x0000000000000710
RecursionCount     = 0x1A38
LockSemaphore      = 0x2433B08
SpinCount          = 0x0000000000000000

The output indicates that thread 0x710 owns the critical section. Looking up this thread ID with ~~[710] failed with an error, which typically means no thread with that ID exists in the process anymore, i.e. it has already terminated or been killed.

0:000:x86> ~~[710]
                 ^ Illegal thread error in '~~[710]'

The relevant code where the deadlock occurs:

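#include <numeric>  // std::iota
#include <vector>

void CDownloadListUI::UpdateDownloadListUI()
{
    m_vctLock.Lock();  // the main thread blocks here
    std::vector<int> vecDeleteItems(GetCount());
    std::iota(vecDeleteItems.begin(), vecDeleteItems.end(), 0);  // fill with 0, 1, 2, ...
    // ... processing ...
    m_vctLock.Unlock();
}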

The m_vctLock object is an ATL wrapper around a critical section. After reviewing all locations where this lock is acquired, I confirmed that only the main thread and one worker thread access it. The worker thread was still running and its thread ID did not match 0x710. Could WinDbg be providing incorrect information? Let me examine the critical section structure directly:

0:000:x86> dt _RTL_CRITICAL_SECTION 02404c50
DuiLib!_RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x00611e08 _RTL_CRITICAL_SECTION_DEBUG
   +0x004 LockCount        : 0n-6
   +0x008 RecursionCount   : 0n1
   +0x00c OwningThread     : 0x00001a38 Void
   +0x010 LockSemaphore    : 0x00000710 Void
   +0x014 SpinCount        : 0

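This reveals a discrepancy. The OwningThread field shows 0x1a38, while !cs reported 0x710. Both values do appear in the raw structure, just in different fields: 0x1a38, which !cs labeled RecursionCount, is the real OwningThread, and 0x710, which !cs labeled OwningThread, is actually LockSemaphore, the same handle passed to NtWaitForSingleObject in the main thread's stack. Let me identify which thread matches 0x1a38: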

0:000:x86> ~~[1a38]
   6  Id: 2058.1a38 Suspend: 0  Teb: 7ef94000 Unfrozen
      Start: SogouSoftware_589d0000!_threadstartex (58a5192d) 
      Priority: 0  Priority class: 32  Affinity: f
0:000:x86> ~6s
ntdll_779b0000!ZwWaitForMultipleObjects+0x15:
779d019d 83c404          add     esp,4
0:006:x86> kb
ChildEBP RetAddr  Args to Child              
0370fa5c 768615f7 00000002 0370faac 00000001 ntdll_779b0000!ZwWaitForMultipleObjects+0x15
0370faf8 773519f8 0370faac 0370fb20 00000000 KERNELBASE!WaitForMultipleObjectsEx+0x100
0370fb40 773541d8 00000002 7efde000 00000000 kernel32!WaitForMultipleObjectsExImplementation+0xe0
0370fb5c 589f6ba0 00000002 0370fb84 00000000 kernel32!WaitForMultipleObjects+0x18
0370fbd4 58a51907 58aab894 862df68e 00000000 SogouSoftware_589d0000!CThreadQueue<TagDownloadTask>::ThreadProc+0x100 
0370fc0c 58a51991 00000000 0370fc24 7735336a SogouSoftware_589d0000!_callthreadstartex+0x1b
0370fc18 7735336a 023f5170 0370fc64 779e9882 SogouSoftware_589d0000!_threadstartex+0x64
0370fc24 779e9882 023f5170 771cc6bb 00000000 kernel32!BaseThreadInitThunk+0xe
0370fc64 779e9855 58a5192d 023f5170 00000000 ntdll_779b0000!__RtlUserThreadStart+0x70
0370fc7c 00000000 58a5192d 023f5170 00000000 ntdll_779b0000!_RtlUserThreadStart+0x1b

Thread 6 is idle: it is blocked in WaitForMultipleObjects inside CThreadQueue<TagDownloadTask>::ThreadProc, and none of the frames on its stack belong to CDownloadListUI. Looking at the code, thread 6 runs a callback that acquires m_vctLock in the CDownloadListUI class. That callback had already returned, so there was no reason for the lock to still be held. This pointed to a lock leak: Lock() had been called without a corresponding Unlock().

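Inspecting the callback function revealed the root cause: a rarely taken code path that returned early without calling Unlock(). The RecursionCount of 1 in the dt output fits a single unmatched Lock() call. Thread 6 kept ownership of the critical section after the callback returned, so the next time the main thread called UpdateDownloadListUI it blocked in RtlEnterCriticalSection with nobody left to release the lock: a classic lock leak turned into a deadlock.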
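The post does not show the callback or the fix, so the following is only a hedged C++ sketch of the failure mode. It uses a portable std::mutex in place of the ATL critical section (unlike a Win32 critical section, std::mutex is not recursive), and every name in it (g_listLock, LeakyCallback, GuardedCallback, LockIsFreeFromAnotherThread, DemonstrateLeak) is hypothetical. The leaky version returns early while still holding the lock; the guarded version uses std::lock_guard, which releases the lock in its destructor on every return path.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for m_vctLock; the real code uses an ATL critical section.
std::mutex g_listLock;

// Leaky pattern: the rare early-return path skips unlock().
bool LeakyCallback(bool rarePath)
{
    g_listLock.lock();
    if (rarePath)
        return false;  // BUG: returns while still holding g_listLock
    g_listLock.unlock();
    return true;
}

// Fixed pattern: std::lock_guard unlocks in its destructor on every return path.
bool GuardedCallback(bool rarePath)
{
    std::lock_guard<std::mutex> guard(g_listLock);
    if (rarePath)
        return false;  // guard's destructor releases the lock here
    return true;
}

// Probes the lock from a separate thread, the way the UI thread would see it.
// (Calling try_lock() on a std::mutex the caller already owns is undefined,
// so the probe never runs on the calling thread.)
bool LockIsFreeFromAnotherThread()
{
    bool acquired = false;
    std::thread probe([&acquired] {
        // try_lock() is allowed to fail spuriously, so retry before concluding
        // that some other thread owns the lock.
        for (int attempt = 0; attempt < 100 && !acquired; ++attempt) {
            if (g_listLock.try_lock()) {
                acquired = true;
                g_listLock.unlock();
            }
        }
    });
    probe.join();
    return acquired;
}

// Walks through both patterns; the asserts document the expected lock state.
void DemonstrateLeak()
{
    GuardedCallback(true);                   // rare path, but the guard unlocks
    assert(LockIsFreeFromAnotherThread());

    LeakyCallback(true);                     // rare path leaks the lock
    assert(!LockIsFreeFromAnotherThread());  // any other thread would now block

    g_listLock.unlock();                     // only the owning thread may release it
}
```

A scope guard is the usual way to rule out this class of bug. In ATL code, CComCritSecLock provides the same RAII behavior for ATL critical section classes, so the lock is released even on an early return.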

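Interestingly, the !cs command provided misleading information in this scenario, initially suggesting that a thread had exited while holding the lock. Further investigation confirmed this was a WinDbg quirk. Testing revealed that the issue occurs specifically when analyzing dumps of 32-bit processes captured on 64-bit systems (WOW64), while !cs works correctly for dumps taken on 32-bit systems. The misreported values are consistent with !cs decoding the 32-bit structure using the 64-bit layout, where DebugInfo, OwningThread, LockSemaphore and SpinCount are 8 bytes wide: the 64-bit OwningThread offset (0x10) holds the x86 LockSemaphore, and the 64-bit RecursionCount offset (0xC) holds the x86 OwningThread. When !cs output looks implausible in such a dump, running dt _RTL_CRITICAL_SECTION on the same address is a useful cross-check.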
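To make the offset shift concrete, here is a minimal, hypothetical C++ sketch; it is an illustration of the layout mismatch, not how !cs is actually implemented. It writes the x86 field values from the dt output at their 32-bit offsets, then reads the same bytes back at the 64-bit offsets of RecursionCount and OwningThread. The names (RawBytes, Put32, Get, MakeX86CriticalSection, DemonstrateMisread) are made up for illustration, and the byte handling assumes a little-endian host such as x86/x64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// RTL_CRITICAL_SECTION offsets in the x86 layout (all fields 4 bytes, as in dt).
constexpr std::size_t kX86DebugInfo      = 0x00;
constexpr std::size_t kX86LockCount      = 0x04;
constexpr std::size_t kX86RecursionCount = 0x08;
constexpr std::size_t kX86OwningThread   = 0x0C;
constexpr std::size_t kX86LockSemaphore  = 0x10;
constexpr std::size_t kX86SpinCount      = 0x14;

// Offsets in the x64 layout, where DebugInfo is 8 bytes wide.
constexpr std::size_t kX64RecursionCount = 0x0C;
constexpr std::size_t kX64OwningThread   = 0x10;

// The core of the mismatch: the x64 slots land on different x86 fields.
static_assert(kX64RecursionCount == kX86OwningThread, "x64 RecursionCount reads x86 OwningThread");
static_assert(kX64OwningThread == kX86LockSemaphore, "x64 OwningThread reads x86 LockSemaphore");

// Raw memory large enough for the x64 layout; bytes past 0x18 stay zero here.
struct RawBytes { unsigned char data[0x28] = {}; };

void Put32(RawBytes& raw, std::size_t offset, std::uint32_t value)
{
    std::memcpy(raw.data + offset, &value, sizeof value);
}

template <typename T>
T Get(const RawBytes& raw, std::size_t offset)
{
    T value{};
    std::memcpy(&value, raw.data + offset, sizeof value);
    return value;
}

// Lays out the values from the dt output at their x86 offsets.
RawBytes MakeX86CriticalSection()
{
    RawBytes raw;
    Put32(raw, kX86DebugInfo, 0x00611e08);
    Put32(raw, kX86LockCount, static_cast<std::uint32_t>(-6));
    Put32(raw, kX86RecursionCount, 1);
    Put32(raw, kX86OwningThread, 0x1a38);
    Put32(raw, kX86LockSemaphore, 0x710);
    Put32(raw, kX86SpinCount, 0);
    return raw;
}

// Reads the x86 bytes through the x64 offsets and checks what comes out.
void DemonstrateMisread()
{
    const RawBytes raw = MakeX86CriticalSection();
    assert(Get<std::uint32_t>(raw, kX64RecursionCount) == 0x1a38u);  // real owner TID
    assert(Get<std::uint64_t>(raw, kX64OwningThread) == 0x710u);     // really LockSemaphore
}
```

Because SpinCount is 0, the 8-byte read at offset 0x10 yields exactly the LockSemaphore handle 0x710, which matches what !cs printed as OwningThread.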