How To Read Dumps – ESX Crash Dumps That Is


About thirty years ago, in the jungles of South Korea, I spent some time living as a monk. One of the things I learned from those monks was the ancient art of dump reading. Yes! That’s right, I can tell the future by reading the finer texture and smell of a dump.

Ok, that’s not true (I’m but 26), and I can’t tell the future by reading dumps. I can tell you, however, that learning to read ESX dumps would be good for your future.

What Makes A Dump?

Lots and lots of fiber in your diet. That… and PSODs (Purple Screens of Death). A PSOD generates an ESX kernel dump, which ends up as a crash dump file in the /root/ directory, named something like: ‘vmkernel-zdump-<reversed date>.#.#.#’

This file is created on the first reboot following your PSOD, from the contents of your VMKCORE partition. You did make a VMKCORE partition, right? It’s the one with partition type ‘fc’. Can’t find the dump file? Sure? Did you look in your sock drawer? Ok… in that case, run “vmkdump -d /dev/sda5”, replacing /dev/sda5 with the partition reported by esxcfg-dumppart -l.
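If you’d rather not eyeball the partition table, here’s a minimal Python sketch that picks out partitions with type Id fc from `fdisk -l` output. It assumes the usual fdisk -l column layout (Device, Boot, Start, End, Blocks, Id, System); the sample table and the find_vmkcore() helper are made up for illustration, so check the result against esxcfg-dumppart -l before trusting it.

```python
# Sketch: list VMKCORE partitions (partition type Id "fc") from `fdisk -l` output.
# Assumption: the usual fdisk table layout
#   Device Boot Start End Blocks Id System
# where the Boot column only contains "*" for the active partition.

# Hypothetical sample output, for illustration only.
SAMPLE = """\
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda5             100         112       97744+  fc  VMware VMKCORE
"""


def find_vmkcore(fdisk_output):
    """Return the device names whose partition Id is fc (VMKCORE)."""
    found = []
    for line in fdisk_output.splitlines():
        fields = line.split()
        # Only partition rows start with a device path.
        if not fields or not fields[0].startswith("/dev/"):
            continue
        # Drop the optional boot flag so the columns line up.
        if len(fields) > 1 and fields[1] == "*":
            fields = [fields[0]] + fields[2:]
        # fields is now: device, start, end, blocks, id, system...
        if len(fields) >= 5 and fields[4].lower() == "fc":
            found.append(fields[0])
    return found


print(find_vmkcore(SAMPLE))  # prints ['/dev/sda5']
```

On a real host you’d feed it the text of `fdisk -l` instead of SAMPLE.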

I Have My Dump, Now What?

You can do a few things here. The first is to generate a support bundle and send it off to VMware for analysis (which you should do anyway). However, if you’re like me and can’t wait, you can do the following from the service console:

Here is where the dump hides:

# ls -alh
total 14M
-rw-r--r--    1 root     root          13M Feb  6 04:40 vmkernel-zdump-020609.04.40.1
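The file name appears to encode when the dump was written: 020609.04.40 lines up with the Feb 6 04:40 timestamp in the listing, which suggests MMDDYY, then hour and minute, then a sequence number. That layout is my inference from this one example, not a documented format, but if it holds, a small Python sketch can pick the newest dump when several have piled up in /root (parse_zdump_name() and newest_zdump() are hypothetical helpers):

```python
import re
from datetime import datetime

# Assumed layout, inferred from one example (vmkernel-zdump-020609.04.40.1,
# written Feb 6 04:40): MMDDYY.HH.MM.<sequence>.
ZDUMP_RE = re.compile(r"^vmkernel-zdump-(\d{6})\.(\d{2})\.(\d{2})\.(\d+)$")


def parse_zdump_name(name):
    """Return (timestamp, sequence) for a zdump file name, or None."""
    m = ZDUMP_RE.match(name)
    if m is None:
        return None
    date, hour, minute, seq = m.groups()
    when = datetime.strptime(date + hour + minute, "%m%d%y%H%M")
    return when, int(seq)


def newest_zdump(names):
    """Pick the most recent zdump from a list of file names."""
    dumps = []
    for name in names:
        parsed = parse_zdump_name(name)
        if parsed is not None:
            dumps.append((parsed, name))
    if not dumps:
        return None
    return max(dumps)[1]


print(parse_zdump_name("vmkernel-zdump-020609.04.40.1"))
# prints (datetime.datetime(2009, 2, 6, 4, 40), 1)
```

On the host you’d call newest_zdump(os.listdir("/root")) and hand the result to vmkdump -l.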

Let’s extract the vmkernel log from it:

# vmkdump -l vmkernel-zdump-020609.04.40.1
created file vmkernel-log.1

# ls -alh
-rw-r--r--    1 root     root         186K Feb 11 14:32 vmkernel-log.1
-rw-r--r--    1 root     root          13M Feb  6 04:40 vmkernel-zdump-020609.04.40.1

So there it is… now let’s take a look at the insides:

54:01:08:11.385 cpu15:1166)<6>Debug scsi underrun
54:01:08:11.385 cpu15:1166)<6>Debug scsi underrun
54:01:08:11.385 cpu15:1166)<6>Debug scsi underrun
54:01:08:11.386 cpu15:1166)<6>Debug scsi underrun
54:01:08:11.386 cpu15:1166)<6>Debug scsi underrun
54:06:35:47.637 cpu7:1074)<6>qla24xx_abort_command(0): handle to abort=1457
VMware ESX Server [Releasebuild-113339]
Exception type 13 in world 1169:vmm0:197830- @ 0x6ff49b
frame=0x3c47cec ip=0x6ff49b cr2=0x8617c88 cr3=0x3f686000 cr4=0x2660
es=0x3ee64028 ds=0x4028 fs=0x1580000 gs=0x4041
eax=0x2a ebx=0xb3f0f80 ecx=0x9ff47e90 edx=0x50
ebp=0x3c47ed4 esi=0xe edi=0x15806c8 err=0 eflags=0x10286
0:1024/console 1:1196/vmware-vm 2:1200/mks:19783 3:1186/mks:19783
*4:1169/vmm0:1978 5:1161/vmware-vm 6:1170/vmm1:1978 7:1179/mks:19783
8:1176/vmm0:1978 9:1184/vmm1:1978 10:1182/vmware-vm 11:1177/vmm1:1978
12:1162/vmm0:1978 13:1198/vmm1:1978 14:1197/vmm0:1978 15:1039/idle15
@BlueScreen: Exception type 13 in world 1169:vmm0:197830- @ 0x6ff49b
0x3c47ed4:[0x6ff49b]E1000PollTxRing+0x366 stack: 0x7030140, 0xb3f0fb4, 0x0
0x3c47f2c:[0x701474]E1000_PollRings+0x1d7 stack: 0x3ee6a308, 0x704, 0x267d49c0
0x3c47f84:[0x618647]BH_Check+0x2ee stack: 0x1, 0x82000000, 0x85f7d70
0x3c47fd8:[0x62249c]VMKCall+0x147 stack: 0x2d, 0x85f7d70, 0x82000000
0x3c47ffc:[0x67af0b]VMKVMMEnterVMKernel+0x8e stack: 0x0, 0x0, 0x0
VMK uptime: 57:17:09:07.125 TSC: 11937242658207618
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1… using slot 1 of 1… log

The first column is the vmkernel uptime (days:hours:minutes:seconds.milliseconds). The last event logged before the crash was the aborted command handle:

54:06:35:47.637 cpu7:1074)<6>qla24xx_abort_command(0): handle to abort=1457
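To find that last event in a long log without scrolling, a sketch like the one below keeps the last line that starts with an uptime stamp. It assumes every normal log line looks like `<uptime> cpuN:W)message`, where W appears to be the world ID; the regular expression and last_logged_event() are my own, built from the lines in this dump.

```python
import re

# A normal vmkernel log line, as seen in this dump:
#   <days:hh:mm:ss.mmm> cpu<N>:<world>)<message>
LOG_LINE = re.compile(r"^(\d+:\d{2}:\d{2}:\d{2}\.\d+) cpu(\d+):(\d+)\)(.*)$")

# Lines copied from the log excerpt above.
LOG_EXCERPT = [
    "54:01:08:11.386 cpu15:1166)<6>Debug scsi underrun",
    "54:06:35:47.637 cpu7:1074)<6>qla24xx_abort_command(0): handle to abort=1457",
    "VMware ESX Server [Releasebuild-113339]",
    "VMK uptime: 57:17:09:07.125 TSC: 11937242658207618",
]


def last_logged_event(lines):
    """Return the last timestamped log line as a dict, or None."""
    last = None
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            uptime, cpu, world, message = m.groups()
            last = {"uptime": uptime, "cpu": int(cpu),
                    "world": int(world), "message": message}
    return last


print(last_logged_event(LOG_EXCERPT)["uptime"])  # prints 54:06:35:47.637
```

Against the real file you’d pass open("vmkernel-log.1", errors="replace").read().splitlines(), since the log can contain stray escape bytes.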

The uptime of the kernel when the crash occurred is on the second-to-last line:

VMK uptime: 57:17:09:07.125 TSC: 11937242658207618

We can see that there are 3 days, 10 hours and 33 minutes between the last log message and the time of the crash. That means those debug scsi underrun messages, which were logged even earlier, can basically be ignored.
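Since the stamps carry days as well as hours, the subtraction is easy to fumble by hand. Here’s a short Python sketch that turns the days:hours:minutes:seconds stamps into timedelta objects and subtracts them; parse_uptime() is a hypothetical helper, and the two stamps are copied from the log above.

```python
from datetime import timedelta


def parse_uptime(stamp):
    """Turn a days:hours:minutes:seconds.fraction stamp into a timedelta."""
    days, hours, minutes, seconds = stamp.split(":")
    return timedelta(days=int(days), hours=int(hours),
                     minutes=int(minutes), seconds=float(seconds))


last_message = parse_uptime("54:06:35:47.637")  # qla24xx abort line
crash = parse_uptime("57:17:09:07.125")         # "VMK uptime" line
gap = crash - last_message
print(gap)  # prints 3 days, 10:33:19.488000
```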

Now let’s move on to the backtrace itself:

@BlueScreen: Exception type 13 in world 1169:vmm0:notthemama- @ 0x6ff49b
0x3c47ed4:[0x6ff49b]E1000PollTxRing+0x366 stack: 0x7030140, 0xb3f0fb4, 0x0
0x3c47f2c:[0x701474]E1000_PollRings+0x1d7 stack: 0x3ee6a308, 0x704, 0x267d49c0
0x3c47f84:[0x618647]BH_Check+0x2ee stack: 0x1, 0x82000000, 0x85f7d70
0x3c47fd8:[0x62249c]VMKCall+0x147 stack: 0x2d, 0x85f7d70, 0x82000000
0x3c47ffc:[0x67af0b]VMKVMMEnterVMKernel+0x8e stack: 0x0, 0x0, 0x0

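The backtrace reads from the innermost frame outward: the exception hit in E1000PollTxRing, which was called from E1000_PollRings, which was called from BH_Check, then VMKCall, and finally VMKVMMEnterVMKernel at the bottom of the stack.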
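Each backtrace line appears to follow the pattern frame:[instruction address]function+offset, followed by a few words from the stack. The first frame’s addresses match the register dump (ebp=0x3c47ed4, ip=0x6ff49b), which backs that reading up. Here’s a minimal Python sketch, assuming exactly that layout, that turns the lines into a call chain (parse_frames() is a hypothetical helper):

```python
import re

# Assumed layout: frame:[instruction address]function+offset stack: words...
FRAME = re.compile(
    r"^(0x[0-9a-fA-F]+):\[(0x[0-9a-fA-F]+)\](\w+)\+(0x[0-9a-fA-F]+) stack: (.*)$")

# Backtrace lines copied from the dump above.
BACKTRACE = [
    "0x3c47ed4:[0x6ff49b]E1000PollTxRing+0x366 stack: 0x7030140, 0xb3f0fb4, 0x0",
    "0x3c47f2c:[0x701474]E1000_PollRings+0x1d7 stack: 0x3ee6a308, 0x704, 0x267d49c0",
    "0x3c47f84:[0x618647]BH_Check+0x2ee stack: 0x1, 0x82000000, 0x85f7d70",
    "0x3c47fd8:[0x62249c]VMKCall+0x147 stack: 0x2d, 0x85f7d70, 0x82000000",
    "0x3c47ffc:[0x67af0b]VMKVMMEnterVMKernel+0x8e stack: 0x0, 0x0, 0x0",
]


def parse_frames(lines):
    """Parse backtrace lines into dicts, innermost frame first."""
    frames = []
    for line in lines:
        m = FRAME.match(line.strip())
        if m is None:
            continue
        frame, ip, function, offset, stack = m.groups()
        frames.append({
            "frame": int(frame, 16),
            "ip": int(ip, 16),
            "function": function,
            "offset": int(offset, 16),
            "stack": [int(w, 16) for w in stack.split(",") if w.strip()],
        })
    return frames


# Innermost first; read "<-" as "called from".
print(" <- ".join(f["function"] for f in parse_frames(BACKTRACE)))
```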

Based on the function at the top of the backtrace (and exception type 13, which is an x86 general protection fault), this host probably crashed due to some type of packet or frame corruption in the Intel E1000 code path servicing the VM that was running as world ID 1169 in vmm0 (its first virtual CPU), named ‘notthemama’.
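You can cross-check the world ID against the per-CPU world list printed after the registers. Each entry appears to be cpu:world/name, and the entry marked with an asterisk, *4:1169/vmm0:1978, matches world 1169 from the exception line, so the asterisk most likely marks the CPU that took the exception. A sketch under that assumption (parse_world_list() and crashing_world() are hypothetical helpers; note that the names in this list are truncated):

```python
import re

# Assumed entry layout: [*]cpu:world/name, with "*" on the CPU that faulted.
WORLD = re.compile(r"(\*?)(\d+):(\d+)/(\S+)")

# Per-CPU world list copied from the dump above (names are truncated there).
WORLD_LIST = [
    "0:1024/console 1:1196/vmware-vm 2:1200/mks:19783 3:1186/mks:19783",
    "*4:1169/vmm0:1978 5:1161/vmware-vm 6:1170/vmm1:1978 7:1179/mks:19783",
    "8:1176/vmm0:1978 9:1184/vmm1:1978 10:1182/vmware-vm 11:1177/vmm1:1978",
    "12:1162/vmm0:1978 13:1198/vmm1:1978 14:1197/vmm0:1978 15:1039/idle15",
]


def parse_world_list(lines):
    """Split the world list into one dict per CPU entry."""
    entries = []
    for line in lines:
        for token in line.split():
            m = WORLD.fullmatch(token)
            if m:
                star, cpu, world, name = m.groups()
                entries.append({"cpu": int(cpu), "world": int(world),
                                "name": name, "crashed": star == "*"})
    return entries


def crashing_world(entries):
    """Return the entry flagged with "*", or None if there isn't one."""
    return next((e for e in entries if e["crashed"]), None)


print(crashing_world(parse_world_list(WORLD_LIST)))
```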

Thanks for playing along. If you have questions, hit me up in the comments or on Twitter @cody_bunch.
