Root Cause of Failed Heap Tracing with WinDbg on Windows XP
To debug a use-after-free crash that reproduces consistently on Windows XP SP3 (but not on Windows 7), we planned to enable page heap debugging (DPH) to capture the call stack at the point where the target object was prematurely freed. Since we had not used GFlags-enabled page heap debugging before, we first wrote a small test program to validate the workflow.
After configuring GFlags and symbol paths correctly, we triggered the crash and ran the standard !heap -p -a <target-pointer> command to get the free stack trace. Instead of the expected output, we got the following error:
004010d9 8b11 mov edx,dword ptr [ecx] ds:0023:0161cff0=????????
0:000> !heap -p -a ecx
ReadMemory error for address eeddccee
Use `!address eeddccee' to check validity of the address.
Here ecx holds the pointer to the freed object, and the mov instruction is attempting to load the object's virtual table pointer.
We checked the memory permissions of the target address with !address:
0:000> !address ecx
015d0000 : 0161b000 - 000b5000
Type 00020000 MEM_PRIVATE
Protect 00000001 PAGE_NOACCESS
State 00001000 MEM_COMMIT
Usage RegionUsagePageHeap
Handle 015d1000
The output confirms that the address lies in a page heap region and that the page was set to PAGE_NOACCESS after the free, so page heap was properly enabled.
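As a sanity check on that reading, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the first line of the !address output is read as allocation base : region base - region size, and that the page size is the usual 4 KB for x86. The helper name in_region is made up for this sketch.

```python
# Values copied from the !address output and the faulting instruction above.
# Assumption: "015d0000 : 0161b000 - 000b5000" means
# allocation base : region base - region size.
region_base = 0x0161B000
region_size = 0x000B5000
faulting_address = 0x0161CFF0  # value of ecx at the crash

PAGE_SIZE = 0x1000  # assumed 4 KB x86 page


def in_region(address, base, size):
    """Return True if address lies in the half-open range [base, base + size)."""
    return base <= address < base + size


region_end = region_base + region_size
page_offset = faulting_address % PAGE_SIZE
bytes_to_page_end = PAGE_SIZE - page_offset

print(f"region: {region_base:08x} .. {region_end:08x}")
print(f"ecx inside region: {in_region(faulting_address, region_base, region_size)}")
print(f"offset within page: {page_offset:#x}, bytes to page end: {bytes_to_page_end:#x}")
```

The faulting address falls inside the reported region and sits close to the end of its page, which is consistent with the region being managed by page heap.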
We verified the Image File Execution Options registry entry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options to confirm that the configuration was correct. We also tried several configuration approaches and different XP machines, all with the same error. The exact same workflow worked correctly on 32-bit Windows 7.
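The flag check on such a registry entry can be sketched as follows. This is a minimal illustration, not the procedure we actually ran: it only decodes a GlobalFlag value that has already been read from the registry, the function name decode_global_flag is made up, and the sample value is hypothetical. The two bit constants are the documented GFlags values for page heap (hpa) and the user-mode stack trace database (ust); which bits a given setup should have depends on how GFlags was invoked.

```python
# Documented GFlags bit values.
FLG_USER_STACK_TRACE_DB = 0x00001000  # "ust": user-mode stack trace database
FLG_HEAP_PAGE_ALLOCS = 0x02000000     # "hpa": page heap


def decode_global_flag(value):
    """Decode a GlobalFlag value given as an int or a hex string like '0x02000000'."""
    if isinstance(value, str):
        value = int(value, 16)  # accepts an optional 0x prefix
    return {
        "page_heap": bool(value & FLG_HEAP_PAGE_ALLOCS),
        "stack_trace_db": bool(value & FLG_USER_STACK_TRACE_DB),
    }


# Hypothetical sample value, not taken from our machines.
sample = "0x02001000"
print(decode_global_flag(sample))
```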
After searching public documentation and local resources without finding a resolution, we followed the method from Software Debugging to enumerate the User Mode Stack Trace (UST) database. We could find the allocation stack trace for our test object, but no matching entry for the free operation.
When querying active page heap handles, WinDbg displayed only a single handle, with the output cut short by the memory error, even though page heap was clearly enabled. Our test code was too simple to corrupt debug metadata, and the debugger broke in immediately at the crash, so the evidence pointed to a parsing error in WinDbg itself.
A wider search of online forums confirmed that other users had encountered the same issue, and the reported workaround was to use WinDbg build 6.6.0007.5. On XP, this build worked correctly and produced the full user-mode stack trace of the free operation as expected.
The root cause lies in structural differences between the _STACK_TRACE_DATABASE used in XP and Windows 7:
- On XP, stack trace entries are stored in an array, and the offset of the Buckets member is different from Windows 7.
- On Windows 7, the storage structure was changed to a linked list, and the member offset was adjusted.
Newer versions of WinDbg use the updated Windows 7 structure definition even when debugging XP systems, leading to incorrect memory address calculation when parsing the UST database, which causes the observed ReadMemory error when accessing heap metadata.
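The effect of applying the wrong structure definition can be illustrated with a toy model. Everything in the sketch below is invented for illustration: the offsets, the header size, and the stored values are not the real _STACK_TRACE_DATABASE layout on either system. It only shows how reading a pointer-sized member at the wrong offset yields an unrelated value that a debugger would then try, and fail, to dereference.

```python
import struct

# Hypothetical member offsets; NOT the real _STACK_TRACE_DATABASE layouts.
OLD_LAYOUT = {"Buckets": 8}    # stands in for the layout the target actually uses
NEW_LAYOUT = {"Buckets": 16}   # stands in for the layout the debugger assumes


def read_pointer(memory, offset):
    """Read a 32-bit little-endian pointer from a byte buffer."""
    return struct.unpack_from("<I", memory, offset)[0]


# Build a fake 24-byte header the way the "old" layout would lay it out.
memory = bytearray(24)
struct.pack_into("<I", memory, OLD_LAYOUT["Buckets"], 0x015D2000)  # made-up valid pointer
struct.pack_into("<I", memory, 16, 0xDEADBEEF)                     # unrelated data at offset 16

correct = read_pointer(memory, OLD_LAYOUT["Buckets"])
wrong = read_pointer(memory, NEW_LAYOUT["Buckets"])

print(f"parsed with matching layout:   {correct:08x}")
print(f"parsed with mismatched layout: {wrong:08x}")
```

With the matching layout the parser recovers the stored pointer; with the mismatched one it picks up whatever bytes happen to sit at the other offset, which is the kind of bogus address that ends in a ReadMemory error.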