It’s late on a Friday. You’ve just virtualized this app server, and now it’s gone straight to hell. No, it didn’t pass Go first, either. You go through the motions of checking the host, knowing that you triple-checked it before putting the workload on there. You verified that you’re not hugely overcommitted on anything: CPU is fine, as are disk and memory. There is no ballooning or swapping; you just can’t put your finger on it. What is it?
After sitting and scratching your head for a while, you have a white-hot flash of inspiration. Perhaps it’s the coffee, perhaps it’s the late hour; who knows? You’re onto something. You fire up perfmon (I did say this was a Windows app server, didn’t I?), and in this glorious moment you add the following counters:
Processor > % Privileged Time (_Total)
System > Context Switches/sec
Memory > Page Faults/sec
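If you’d rather log these counters over a window of time than watch them live, Windows also ships the `typeperf` command-line tool, which can write the same counters to CSV. The Python below is a minimal sketch under stated assumptions. The helper names `typeperf_command` and `average_counters` are hypothetical, and so are the sample count, the interval and the `counters.csv` file name. It also assumes typeperf’s usual CSV layout: a timestamp column first, then one column per counter whose header is prefixed with the machine name.

```python
import csv
import io

# Perfmon counter paths for the three counters listed above.
COUNTERS = [
    r"\Processor(_Total)\% Privileged Time",
    r"\System\Context Switches/sec",
    r"\Memory\Page Faults/sec",
]


def typeperf_command(samples: int = 60, interval_sec: int = 1,
                     out_file: str = "counters.csv") -> list[str]:
    """Build (but do not run) a typeperf command line that logs COUNTERS to CSV.

    The defaults are arbitrary, hypothetical choices.
    """
    return ["typeperf", *COUNTERS,
            "-si", str(interval_sec), "-sc", str(samples),
            "-f", "CSV", "-o", out_file]


def average_counters(csv_text: str) -> dict[str, float]:
    """Average each counter column of typeperf-style CSV output.

    Assumes column 0 is the timestamp. Blank cells are skipped defensively.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    averages = {}
    for col, path in enumerate(header[1:], start=1):
        values = [float(r[col]) for r in data
                  if len(r) > col and r[col].strip()]
        # Drop the "\\MACHINE" prefix so keys match COUNTERS,
        # e.g. "\\HOST\System\Context Switches/sec" -> "\System\Context Switches/sec".
        key = "\\" + path.lstrip("\\").split("\\", 1)[1]
        averages[key] = sum(values) / len(values) if values else float("nan")
    return averages
```

On the guest you could run the command with `subprocess.run(typeperf_command(), check=True)` and then pass the file’s contents to `average_counters`. Passing a list instead of a shell string keeps the `%` and spaces in the counter paths intact.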
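In short, % Privileged Time is the share of processor time spent running in kernel (privileged) mode. Context Switches/sec is the combined rate at which all processors switch from one thread to another. Page Faults/sec counts both hard faults, which must be resolved from disk, and soft faults, which are resolved elsewhere in memory.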
In the screenshot above, none of these values is critical; in fact, they’re all quite good. Performance starts to suffer once these counters get into the higher ranges. For context switches, that is at about 8,000 to 10,000 per processor per second. Page faults are similar, and for % Privileged Time, anything above 50% could be troublesome.
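Those rules of thumb translate into a simple check. This Python sketch assumes you already have averaged readings and know the guest’s vCPU count. `CounterSample` and `flag_counters` are hypothetical names. The text only says page faults are "similar," so reusing the 8,000-per-processor threshold for them is an assumption. The sketch uses the low end of the 8,000 to 10,000 range so that it warns early rather than late.

```python
from dataclasses import dataclass

# Rule-of-thumb thresholds from the text above.
CTX_SWITCH_LIMIT_PER_CPU = 8_000   # context switches/sec per processor
PAGE_FAULT_LIMIT_PER_CPU = 8_000   # page faults/sec per processor (assumed "similar")
PRIVILEGED_TIME_LIMIT_PCT = 50.0   # % Privileged Time (_Total)


@dataclass
class CounterSample:
    """Averaged perfmon readings for one observation window."""
    privileged_time_pct: float   # \Processor(_Total)\% Privileged Time
    context_switches_sec: float  # \System\Context Switches/sec (all CPUs combined)
    page_faults_sec: float       # \Memory\Page Faults/sec (all CPUs combined)
    logical_processors: int      # vCPUs visible to the guest


def flag_counters(sample: CounterSample) -> list[str]:
    """Return the names of counters that exceed the rule-of-thumb limits."""
    cpus = max(sample.logical_processors, 1)
    flags = []
    if sample.privileged_time_pct > PRIVILEGED_TIME_LIMIT_PCT:
        flags.append("% Privileged Time")
    # System-wide totals are normalized to a per-processor rate first.
    if sample.context_switches_sec / cpus > CTX_SWITCH_LIMIT_PER_CPU:
        flags.append("Context Switches/sec")
    if sample.page_faults_sec / cpus > PAGE_FAULT_LIMIT_PER_CPU:
        flags.append("Page Faults/sec")
    return flags


if __name__ == "__main__":
    # Hypothetical readings from a 4-vCPU guest.
    busy = CounterSample(62.0, 41_000, 12_000, 4)
    print(flag_counters(busy))  # ['% Privileged Time', 'Context Switches/sec']
```

Dividing by the vCPU count matters. The same 41,000 context switches per second that trips the check on a 4-vCPU guest would pass on an 8-vCPU one.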
To get around this, some of the newer AMD and Intel processors provide hardware-assisted MMU virtualization (AMD calls it RVI; Intel calls it EPT), which reduces the overhead of context switching inside a VM. Enabling it for specific workloads, like those with high context switching, can yield great results, though you may have to force it on for the VM rather than leaving the choice to the hypervisor.
If changing out your hardware is not an option, and you are able to identify one of these counters as the culprit, you may want to dig further with the plethora of Windows troubleshooting tools (I’ve found the ones from Sysinternals particularly helpful). They will give you some insight into which application, or which parts of an application, are causing these issues.
In particular, I like to pull down Process Explorer to get a really good picture of what’s going on. You can also customize its view, adding per-process columns such as context switch and page fault deltas. Now you can take a look at some of your more likely culprits.
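Process Explorer’s delta columns show how much a cumulative counter has grown since the last refresh, which is what makes a busy process stand out. The same idea can be sketched in Python: take two snapshots of per-process cumulative counts (context switches or page faults) and rank processes by growth. `top_deltas` and the `"name (pid)"` labels are hypothetical, and gathering the snapshots is left to you.

```python
from typing import Mapping


def top_deltas(before: Mapping[str, int], after: Mapping[str, int],
               n: int = 5) -> list[tuple[str, int]]:
    """Rank processes by how much a cumulative counter grew between snapshots.

    Keys are process labels such as "app.exe (1200)" (hypothetical format).
    Processes missing from `before` (started mid-window) count from zero;
    processes missing from `after` (exited) are dropped.
    """
    deltas = [(label, count - before.get(label, 0))
              for label, count in after.items()]
    # Largest growth first; ties broken by label for stable output.
    deltas.sort(key=lambda item: (-item[1], item[0]))
    return deltas[:n]


if __name__ == "__main__":
    # Hypothetical snapshots taken a few seconds apart.
    before = {"app.exe (1200)": 1_000, "svchost.exe (880)": 500}
    after = {"app.exe (1200)": 91_000, "svchost.exe (880)": 900,
             "new.exe (4000)": 50}
    print(top_deltas(before, after, n=2))
    # [('app.exe (1200)', 90000), ('svchost.exe (880)', 400)]
```

If the third-party psutil package is installed, `Process.num_ctx_switches()` is one possible source for context-switch snapshots. On Windows, `Process.memory_info().num_page_faults` can supply page-fault snapshots. Keying by PID as well as name keeps multiple instances, such as several svchost.exe processes, apart.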
A few parting notes: troubleshooting from within the guest OS is skewed by the fact that it is running in a VM. However, once you’ve confirmed that the hypervisor itself is in good shape, these tips are useful for troubleshooting your workload.