Analysing Dumps
Okay, so you have your recurring problem, you prepared the system to generate the useful data and now you have something to look at... now what?
To make sense of the raw data in a dump, you need symbols for (ideally all of) the modules whose code appears in it - see common procedure 4
Now we're ready to actually load the dump file for analysis...
- Launch WinDbg
- Click File
- Click 'Open Crash Dump', browse to the .DMP file and double-click it
The header of the dump file will let the debugger know what type of dump it is and what symbols are essential to even make a start - i.e. for a dump of EXPLORER.EXE, at the very least we will need symbols for this file
WinDbg may therefore look like it isn't doing a great deal initially, whereas it is in fact checking for (and downloading) symbols it needs
There is a bar at the bottom for you to enter commands into the debugger once it has done the necessary downloading
Commands I typically enter when debugging, so I know what the debugger is doing when it appears to be idle:
!sym noisy
- This turns on "noisy" symbol loading so you can see where it is getting symbol files from (symbol files can be large and sometimes take a while to download)
Code:
0:007> !sym noisy
noisy mode - symbol prompts on
.reload /f
- This forces a reload of symbols for all modules in the dump file now, including those whose symbols have not yet been downloaded and would otherwise trigger a download in the middle of your debugging session
Code:
0:007> .reload /f
Reloading current modules
.
SYMSRV: Q:\Symbols\Script Checker Interceptor.pdb\527CFF7666C84738B5BDD72E225ADC321\Script Checker Interceptor.pdb not found
SYMSRV: http://msdl.microsoft.com/download/symbols/Script Checker Interceptor.pdb/527CFF7666C84738B5BDD72E225ADC321/Script Checker Interceptor.pdb not found
DBGHELP: Script Checker Interceptor.pdb - file not found
DBGHELP: O:\out_Win32\Release\Script Checker Interceptor.pdb - file not found
*** ERROR: Symbol file could not be found. Defaulted to export symbols for scrchpg.dll -
DBGHELP: scrchpg - export symbols
.
SYMSRV: iexplore.pdb from http://msdl.microsoft.com/download/symbols: 86673 bytes - copied
DBGHELP: iexplore - public symbols
Q:\Symbols\iexplore.pdb\3544BAF610664EC3B420AF05F04F589B2\iexplore.pdb
.
SYMSRV: ieframe.pdb from http://msdl.microsoft.com/download/symbols: 2160324 bytes - copied
DBGHELP: IEFRAME_4df0000 - public symbols
Q:\Symbols\ieframe.pdb\4A4E76B2DB544787AD0633C6BA8271CE2\ieframe.pdb
.
SYMSRV: Q:\Symbols\r3hook64.pdb\993432011C36406A93CDDA0CADB66DB41\r3hook64.pdb not found
SYMSRV: http://msdl.microsoft.com/download/symbols/r3hook64.pdb/993432011C36406A93CDDA0CADB66DB41/r3hook64.pdb not found
DBGHELP: r3hook64.pdb - file not found
DBGHELP: O:\out_win32\Release\r3hook64.pdb - file not found
*** ERROR: Symbol file could not be found. Defaulted to export symbols for r3hook.dll -
DBGHELP: r3hook - export symbols
.
DBGHELP: IEFRAME - public symbols
Q:\Symbols\ieframe.pdb\4A4E76B2DB544787AD0633C6BA8271CE2\ieframe.pdb
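With noisy loading on, the output gets long quickly, so it can help to pick out just the modules that fell back to export symbols. The following is a minimal sketch, assuming you have saved the debugger output to text yourself; the function name is made up for illustration, and the pattern only matches the "Defaulted to export symbols" wording shown in the output above (file names containing spaces would be cut short).

```python
import re

# Matches the fallback message WinDbg printed in the sample output above.
FALLBACK = re.compile(r"Defaulted to export symbols for (\S+)")


def modules_without_symbols(log_text):
    """Return module file names that fell back to export symbols (no duplicates)."""
    seen = []
    for match in FALLBACK.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# A few lines copied from the .reload /f output shown above.
sample = """\
*** ERROR: Symbol file could not be found. Defaulted to export symbols for scrchpg.dll -
DBGHELP: scrchpg - export symbols
SYMSRV: iexplore.pdb from http://msdl.microsoft.com/download/symbols: 86673 bytes - copied
*** ERROR: Symbol file could not be found. Defaulted to export symbols for r3hook.dll -
DBGHELP: r3hook - export symbols
"""

print(modules_without_symbols(sample))
```

Anything this lists is a module whose stack frames will be less trustworthy, since only exported function names are available for it.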
!analyze -v
- Most people start here, as it gives you a quick overview of an exception (assuming it wasn't a manual dump) and in some cases a "probably caused by" guess (and it IS a guess, in some cases very accurate and in others completely wrong)
!vm
- Presents a summary of virtual memory usage along with a list of the processes running at the time of the crash (this one is for kernel-mode dumps)
lm ft
- "List Modules" (with file location & timestamp fields) gives a complete list of module information, which you can scan for old drivers or for versions known to be unstable
Code:
0:007> lmft
start end module name
00000000`003e0000 00000000`00405000 scrchpg scrchpg.dll Fri Mar 09 17:46:53 2007 (45F18F7D)
00000000`00c40000 00000000`00cdb000 iexplore C:\Program Files (x86)\Internet Explorer\iexplore.exe Tue Jun 26 03:50:59 2007 (46807103)
00000000`04df0000 00000000`053bb000 IEFRAME_4df0000 IEFRAME.dll Tue Jun 26 04:50:31 2007 (46807EF7)
00000000`10000000 00000000`10010000 r3hook r3hook.dll Fri Mar 09 17:51:14 2007 (45F19082)
00000000`724c0000 00000000`72a8b000 IEFRAME IEFRAME.dll Tue Jun 26 04:53:52 2007 (46807FC0)
00000000`72e60000 00000000`72ea5000 SCHANNEL SCHANNEL.dll Tue Jun 19 04:07:52 2007 (46773A78)
00000000`72eb0000 00000000`72ed1000 NTMARTA NTMARTA.dll Thu Nov 02 10:43:55 2006 (4549BDDB)
00000000`72fe0000 00000000`73010000 MLANG MLANG.dll Thu Nov 02 10:40:07 2006 (4549BCF7)
00000000`73060000 00000000`7320a000 gdiplus gdiplus.dll Thu Nov 02 10:38:55 2006 (4549BCAF)...
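Scanning a long module list by eye for old builds is tedious, so here is a rough sketch that sorts the modules by build time. It assumes the output has the same layout as the listing above (module name in the third column, hex timestamp in brackets at the end); the function name is my own, and the hex value is treated as seconds since 1970 in UTC.

```python
import re
from datetime import datetime, timezone

# The bracketed 8-digit hex value at the end of each line is the build timestamp.
STAMP = re.compile(r"\(([0-9A-Fa-f]{8})\)")


def modules_by_age(lm_output):
    """Return (module, build date) pairs from 'lm ft' style output, oldest first."""
    rows = []
    for line in lm_output.splitlines():
        stamps = STAMP.findall(line)
        fields = line.split()
        if not stamps or len(fields) < 3:
            continue  # header line or anything else without a timestamp
        built = datetime.fromtimestamp(int(stamps[-1], 16), tz=timezone.utc)
        rows.append((built, fields[2]))
    return [(name, built.date().isoformat()) for built, name in sorted(rows)]


# Three lines copied from the listing above.
sample = r"""start end module name
00000000`003e0000 00000000`00405000 scrchpg scrchpg.dll Fri Mar 09 17:46:53 2007 (45F18F7D)
00000000`00c40000 00000000`00cdb000 iexplore C:\Program Files (x86)\Internet Explorer\iexplore.exe Tue Jun 26 03:50:59 2007 (46807103)
00000000`73060000 00000000`7320a000 gdiplus gdiplus.dll Thu Nov 02 10:38:55 2006 (4549BCAF)
"""

for name, built in modules_by_age(sample):
    print(built, name)
```

An old date is only a hint, not proof of a problem; it just tells you which modules are worth a closer look first.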
lm vm MODULENAME
- Display verbose information on module MODULENAME, often the author of the module and the version strings are in here
Code:
0:007> lmvm r3hook
start end module name
00000000`10000000 00000000`10010000 r3hook (export symbols) r3hook.dll
Loaded symbol image file: r3hook.dll
Image path: r3hook.dll
Image name: r3hook.dll
Timestamp: Fri Mar 09 17:51:14 2007 (45F19082)
CheckSum: 0001316A
ImageSize: 00010000
File version: 6.0.2.621
Product version: 6.0.2.621
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 1.0 App
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Kaspersky Lab
ProductName: Kaspersky Anti-Virus
InternalName: R3HOOK
OriginalFilename: R3HOOK.DLL
ProductVersion: 6.0.2.621
FileVersion: 6.0.2.621
FileDescription: Kaspersky Anti-Virus Ring 3 Hooker
LegalCopyright: Copyright © Kaspersky Lab 1996-2007.
LegalTrademarks: Kaspersky™ Anti-Virus ® is registered trademark of Kaspersky Lab.
kv 50
- show the last (up to) 50 stack entries for the current thread ('kv' by itself might show only the top portion of a stack if it's large)
Code:
0:000> kv 50
Child-SP RetAddr : Args to Child : Call Site
00000000`0017dca8 00000000`76f6ed73 : 00000000`00000001 00000000`7708ead9 00000000`00100246 00000000`00000000 : ntdll!NtWaitForMultipleObjects+0xa
00000000`0017dcb0 00000000`7708e96d : 00000000`00000001 00000000`0017ded0 00000000`00000000 00000000`00000000 : kernel32!WaitForMultipleObjectsEx+0x10b
00000000`0017ddc0 00000000`7708e85e : 00000000`00000001 00000000`0024c260 00000000`00279f50 00000000`00000000 : USER32!RealMsgWaitForMultipleObjectsEx+0x129
00000000`0017de60 000007fe`f92b8fdf : 00000000`002899f0 00000000`0028bb00 00000000`ffffffff 000007fe`f92c3ad9 : USER32!MsgWaitForMultipleObjectsEx+0x46
00000000`0017dea0 000007fe`f92ad845 : 00000000`00000001 00000000`0024c260 00000000`00279f50 00000000`ffffffff : IEUI!CoreSC::Wait+0x4f
00000000`0017def0 000007fe`f5d41b7d : 00000000`00275260 00000000`00000000 00000000`00000000 00000000`00000000 : IEUI!WaitMessageEx+0x75
00000000`0017df30 000007fe`f5d7ccf7 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`0024c260 : IEFRAME!CBrowserFrame::FrameMessagePump+0x1d0
00000000`0017dfa0 000007fe`f5d7dfa4 : 00000000`00000001 00000000`00275260 00000000`00000000 00000000`00272300 : IEFRAME!BrowserThreadProc+0x47
00000000`0017dfd0 000007fe`f5d7debf : 10e1f12f`0000000a 00000000`0024c260 00000000`001fc7c0 00000000`00000001 : IEFRAME!BrowserNewThreadProc+0x92
00000000`0017e010 000007fe`f5d7d6e8 : 00000000`0024c260 00000000`0024c260 00000000`00000001 00000000`00000000 : IEFRAME!SHOpenFolderWindow+0x202
00000000`0017f0c0 00000000`00a8d3d2 : 00000000`001fc7c0 00000000`00000001 00000000`00000001 00720074`00620027 : IEFRAME!IEWinMain+0x369
00000000`0017f370 00000000`00a91b6e : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : iexplore!wWinMain+0x35a
00000000`0017f810 00000000`76f6cdcd : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : iexplore!StringVPrintfWorkerW+0x272
00000000`0017f8d0 00000000`7718c6e1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`0017f900 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d
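When a stack is long, I find it useful to reduce it to the list of modules involved, which can then be checked one by one for version and vendor details. A small sketch, assuming the call site is the last " : " separated column as in the output above (the function name is hypothetical):

```python
def modules_on_stack(kv_output):
    """Return module names from the Call Site column, top of stack first, no repeats."""
    modules = []
    for line in kv_output.splitlines():
        call_site = line.rsplit(" : ", 1)[-1].strip()
        if "!" not in call_site:
            continue  # header line, or a frame with no module!function name
        module = call_site.split("!", 1)[0]
        if module not in modules:
            modules.append(module)
    return modules


# The first few frames copied from the kv output above.
sample = """\
Child-SP RetAddr : Args to Child : Call Site
00000000`0017dca8 00000000`76f6ed73 : 00000000`00000001 00000000`7708ead9 00000000`00100246 00000000`00000000 : ntdll!NtWaitForMultipleObjects+0xa
00000000`0017dcb0 00000000`7708e96d : 00000000`00000001 00000000`0017ded0 00000000`00000000 00000000`00000000 : kernel32!WaitForMultipleObjectsEx+0x10b
00000000`0017ddc0 00000000`7708e85e : 00000000`00000001 00000000`0024c260 00000000`00279f50 00000000`00000000 : USER32!RealMsgWaitForMultipleObjectsEx+0x129
00000000`0017de60 000007fe`f92b8fdf : 00000000`002899f0 00000000`0028bb00 00000000`ffffffff 000007fe`f92c3ad9 : USER32!MsgWaitForMultipleObjectsEx+0x46
00000000`0017dea0 000007fe`f92ad845 : 00000000`00000001 00000000`0024c260 00000000`00279f50 00000000`ffffffff : IEUI!CoreSC::Wait+0x4f
"""

print(modules_on_stack(sample))
```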
The analysis of a dump depends very much on the type of dump (user mode or kernel mode), whether it was a crash or a hang, and if it was a crash what the exception code was - there isn't a magic "!what_went_wrong" instruction to solve everything for you.
Without symbols, debugging is almost impossible - so if MyFancyApp.exe by MadeUpCompany keeps crashing randomly then chances are only the developers at MadeUpCompany could provide any useful diagnosis
I generally find crash dumps easier to diagnose than hang dumps, as there is a point from which we can work backwards to work out how we might have arrived at the exception
That said, hang dumps are sometimes simply "deadlocks"
e.g. thread A owns resource X, holding it exclusively and wants resource Y, while thread B owns resource Y exclusively and wants resource X - the 2 threads are now deadlocked waiting for something that will never occur
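The thread A / thread B situation can be pictured as a loop in a "who is waiting for whom" chain. The sketch below is only a toy model of that idea, not something the debugger gives you directly: the two dictionaries are assumed to have been filled in by hand from whatever you found in the dump, and the function name is made up.

```python
def find_deadlock(owns, wants):
    """Look for a circular wait.

    owns:  resource -> thread that holds it exclusively
    wants: thread   -> resource it is blocked waiting for
    Returns the list of threads in the loop, or None if there is no loop.
    """
    for start in wants:
        chain = [start]
        current = start
        while current in wants and wants[current] in owns:
            current = owns[wants[current]]  # the thread we are waiting on
            if current == start:
                return chain  # we came back to where we started: deadlock
            if current in chain:
                break  # a loop that does not include 'start'; found from another start
            chain.append(current)
    return None


# Thread A holds X and wants Y; thread B holds Y and wants X.
owns = {"X": "A", "Y": "B"}
wants = {"A": "Y", "B": "X"}
print(find_deadlock(owns, wants))
```

If thread B were not waiting for anything, the chain would simply end at B and the function would return None, which is the "slow, not stuck" case.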
For a crash dump, the stack of the running thread at the time of the crash is the first clue as it must have caused the exception - but it does NOT mean that it is guaranteed to be the bad guy (though this is very often the case)
Consider a case where driver P is naughty and overruns a buffer it allocated, extending into a memory area allocated by driver Q - driver P is able to work with its data quite happily, complete its work and hand control back to whoever wants it - then along comes driver Q at some future time to work on its data and finds it to be garbage
BOOM - unhandled exception in kernel mode = bugcheck - stack trace evidence says "probably caused by driver Q"
In order to do any decent kind of analysis, the more information we have, the better - in the case of dump analysis, this means more dumps
What is interesting is what the dumps have in common - because memory is used dynamically, the "driver P/driver Q" scenario above might present as different STOP codes in different drivers and so appear to have no particular pattern
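One simple way to look for common ground is to count how many dumps each module turns up in. This is just a sketch under my own assumptions: the module lists per dump are collected by you beforehand, the dump and driver names below are invented, and the ignore list is there to drop modules that are present in every dump anyway.

```python
from collections import Counter


def modules_by_dump_count(dumps, ignore=()):
    """Return (module, number of dumps it appears in), most frequent first."""
    counts = Counter()
    for modules in dumps.values():
        counts.update(set(modules) - set(ignore))  # count each module once per dump
    # Sort by count (descending), then by name so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: module names taken from three separate dumps.
dumps = {
    "dump1": ["ntdll", "driverP", "driverQ"],
    "dump2": ["ntdll", "driverP", "driverR"],
    "dump3": ["ntdll", "driverP"],
}

print(modules_by_dump_count(dumps, ignore=("ntdll",)))
```

A module that shows up in every dump is not automatically guilty, but it is a better suspect than one that was blamed in a single dump.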
A particularly useful tool if you suspect drivers is "Driver Verifier" - verifier.exe - which happens to be built into Windows
This is a troubleshooting tool and should only be used for identifying problems with system stability, as enabling debugging options here will have an impact on performance - don't go blindly enabling everything this tool can do or you may render Windows unbootable (though "Last Known Good" should get you back)
If, for example, you have a machine which keeps bugchecking with messages relating to "pool corruption", then the "special pool" option on all 3rd party drivers could help - in the "driver P/driver Q" scenario it would actually bugcheck the system earlier, with a different STOP code, and point the finger at driver P when it overran its pool allocation
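To show why catching the overrun at the time of the write matters, here is a toy model of the "driver P/driver Q" scenario. It is a deliberately simplified sketch and not how the Windows pool or Driver Verifier is implemented: the Pool class, the guard flag and the byte values are all invented for illustration.

```python
class Pool:
    """Toy memory pool: allocations sit back to back in one bytearray."""

    def __init__(self, size, guard=False):
        self.memory = bytearray(size)
        self.guard = guard      # True = check every write against the allocation size
        self.blocks = {}        # owner -> (start offset, length)
        self.next_free = 0

    def allocate(self, owner, length):
        self.blocks[owner] = (self.next_free, length)
        self.next_free += length

    def write(self, owner, data):
        start, length = self.blocks[owner]
        if self.guard and len(data) > length:
            # With the guard on, the culprit is caught in the act.
            raise MemoryError(f"{owner} overran its {length}-byte allocation")
        self.memory[start:start + len(data)] = data

    def read(self, owner):
        start, length = self.blocks[owner]
        return bytes(self.memory[start:start + length])


def run(guard):
    pool = Pool(16, guard=guard)
    pool.allocate("driver P", 4)
    pool.allocate("driver Q", 4)
    pool.write("driver Q", b"QQQQ")
    pool.write("driver P", b"PPPPPP")  # 6 bytes into a 4-byte allocation
    return pool.read("driver Q")


print(run(guard=False))  # driver Q's data has been silently damaged
try:
    run(guard=True)
except MemoryError as error:
    print(error)         # the failure now names driver P
```

Without the guard, nothing goes wrong until driver Q reads its data back, by which time driver P is long gone from the stack; with it, the failure happens on P's own write.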
"Drivers" can include video, audio, network, USB, antivirus, virtual device, filters - they can be related to specific hardware devices or used by software such as services or system applications
Sometimes assumptions made by drivers can lead to problems that only occur when the drivers are used in combination with each other - for example, if 2 filter drivers each assume they are at the very top or bottom of the filter stack - and this is why running 2 separate AV products at the same time might not be wise