While this isn’t as big a problem as it used to be, context switching in applications can still bring a VM to its knees. In this post I’m going to link you to some resources on what context switching is, how to look for it, and why excessive context switches hurt performance.
What Is Context Switching
- Wikipedia’s answer here.
- Linfo’s answer here.
- How they work on Windows.
- Seemingly, “Windows Internals 5th Ed” has some info in Chapter 5: “Processes, Threads and Jobs.”
Monitoring Context Switching
- On Windows – here and here.
- On Linux, you can use SAR:
- sar -w (more here)
- More on the Perfmon counters here.
- Page faults (they can trigger context switches) on Windows here.
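Beyond sar and Perfmon, on Linux you can read a process’s context-switch counters straight out of /proc. Here’s a minimal sketch, assuming a kernel that exposes `voluntary_ctxt_switches` / `nonvoluntary_ctxt_switches` in `/proc/<pid>/status` (the function name is my own):

```python
# Sketch: read per-process context-switch counters from /proc on Linux.
# Voluntary switches happen when a task blocks (I/O, sleep); nonvoluntary
# switches happen when the scheduler preempts it.

def ctxt_switches(pid="self"):
    counts = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("voluntary_ctxt_switches",
                                "nonvoluntary_ctxt_switches")):
                key, value = line.split(":")
                counts[key] = int(value.strip())
    return counts

print(ctxt_switches())  # e.g. {'voluntary_ctxt_switches': 2, 'nonvoluntary_ctxt_switches': 0}
```

Sampling this twice and diffing the counts gives you a switches-per-second rate for a single process, which is handy when sar’s system-wide number doesn’t tell you *who* is doing the switching.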
Why Excessive Context Switching Sucks
I was wondering how much overhead there is when using virtualization. I repeated the benchmarks for the dual E5440, once a normal Linux install, once while running the same install inside VMware ESX Server. The result is that, on average, it’s 2.5x to 3x more expensive to do a context switch when using virtualization. My guess is that this is due to the fact that the guest OS can’t update the page table itself, so when it attempts to change it, the hypervisor intervenes, which causes an extra 2 context switches (one to get inside the hypervisor, one to get out, back to the guest OS).
This probably explains why Intel added the EPT (Extended Page Table) on the Nehalem, since it enables the guest OS to modify its own page table without help of the hypervisor, and the CPU is able to do the end-to-end memory address translation on its own, entirely in hardware (virtual address to "guest-physical" address to physical address).
This bit is important to note… newer-generation processors (those with hardware-assisted nested paging, Intel’s EPT or AMD’s NPT) should not suffer the 2-3x increase in cost. However, context switches are still expensive.
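If you want to get a feel for that cost yourself, the classic trick is a pipe “ping-pong”: two processes bounce a byte back and forth, and every round trip forces at least two context switches. The sketch below is Unix-only (it uses `os.fork`), the numbers it prints are rough and machine-dependent, and the function name is my own:

```python
# Sketch of a pipe ping-pong microbenchmark. Each one-byte round trip
# forces (at least) two context switches, so round-trip time / 2 gives a
# rough per-switch cost. Illustrative only; real tools like lmbench do
# this more carefully.
import os
import time

ROUNDS = 10_000

def ping_pong(rounds=ROUNDS):
    p2c_r, p2c_w = os.pipe()   # parent -> child
    c2p_r, c2p_w = os.pipe()   # child -> parent
    if os.fork() == 0:         # child: echo each byte straight back
        for _ in range(rounds):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")  # wakes the child, parent blocks on read
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start
    os.wait()
    return elapsed / (rounds * 2)  # rough seconds per context switch

print(f"~{ping_pong() * 1e6:.2f} us per context switch (rough)")
```

Run it on bare metal and then inside a guest and you should see the gap the quoted benchmark describes, at least on hardware without EPT/NPT.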
As always, if you have additional questions, please drop a note in the comments, or hit me up on Twitter.