Is Your NMI Stuck? – More On The NMI Stuck Issue

Is your NMI stuck? Well you better catch it! Hey! I’m allowed to make bad jokes. After all, that is why you keep coming back, no?

To follow up on our prior NMI issues: contrary to what I stated before, this isn’t specific to resource constraints, x64, or vSMP. In fact, I’ve now seen it happen on a number of different configs. After chasing this down with VMware, it seems that there is in fact some kernel funkiness going on.

Per VMware KB 1003936:

Some Linux kernels, prior to version 2.6.20, that run on multiple processors have a bug that can cause the kernel to hang. If this occurs, the following message will display:

NMI appears to be stuck.

Red Hat Enterprise Linux 5 is known to have this problem.
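
For a quick sanity check against the KB's criteria (kernel older than 2.6.20, more than one processor), something like the rough Python sketch below could help. The function names are my own invention, not anything from VMware or Red Hat, and the version comparison is only a heuristic: distro kernels often carry backported fixes, so an "old" version number does not prove the bug is present.

```python
import re


def parse_kernel_release(release):
    """Return the leading numeric version of a kernel release string as a tuple.

    Hypothetical helper: "2.6.18-8.el5" becomes (2, 6, 18); a missing third
    component is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in match.groups())


def may_be_affected(release, cpu_count):
    """Heuristic from the KB text: multiprocessor guest with a kernel older than 2.6.20.

    Assumption: the upstream version number is meaningful. A distro kernel
    with a backported fix would be flagged here even though it is fine.
    """
    return cpu_count > 1 and parse_kernel_release(release) < (2, 6, 20)
```

On a live guest, `platform.release()` and `os.cpu_count()` from the standard library can supply the two arguments.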

There is also the following Red Hat KB which links to a patch. It should be noted, however, that more recent RHEL releases (and other distros) do not seem to have this issue.

As always, leave any questions or comments in the comments or on Twitter. Happy hunting!

Edited to fix my spelling error :\

4 thoughts on “Is Your NMI Stuck? – More On The NMI Stuck Issue”

  • Hi

    I am using SUSE 10.2 and am having some issues with one of my device drivers. When my system boots up, everything seems to be fine, but somewhere in the dmesg log it shows me the same message, NMI seems to be stuck. Nothing else goes wrong until I load my device driver and start reading from it; then the kernel hangs and the error is:

    Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
    <ffffffff802ea868>{_spin_lock_irqsave+3}

    I changed the nmi.c file that you mentioned, but the NMI seems to be stuck message doesn't go away. I checked my kernel version and it is 2.6.16, so I suspect this is a kernel issue.

    I would really appreciate some guidelines on what to do.

    Thanks
    Arun Mittal

  • I'd follow up with the VMware KBs, and perhaps open a support case. It sounds like this particular driver is at fault… is this a custom driver, or a vendor driver? If a vendor driver, have you tried contacting the vendor?

  • Hi

    Thanks for the reply. I just discussed this with my seniors, and they
    told me that only changing the file doesn't make any difference. I need to
    rebuild the kernel for the change to take effect. I didn't know that. Let
    me do that first, and then we will see about the next step.
