Choosing SQL Clustering or VMware HA – What is Right?

This is a big one to tackle in a single post, but the question comes up often enough to try. I figure the best way to answer it is to understand what each option does:

VMware HA

What it does: VMware HA detects host and VM failures (via VM heartbeat, etc.). On a failure, it attempts to restart the VM or group of VMs on another node in the VMware cluster.

What it doesn’t: VMware HA will not detect application- or OS-level failures (beyond what the VM heartbeat catches). What this means: your SQL VM will only be restarted on another cluster node after a catastrophic failure: someone sticks a screwdriver into the ESX host, etc.

MSCS (SQL Clustering)

What it does: MSCS will detect the failure of any one of its cluster resources and take the defined action. What does that mean? Each cluster resource can have any number of dependencies and a failure action such as Move Resource Group. This setup will also protect you from catastrophic failure of a host, in that the SQL services will fail over to a VM that is running on the other node.
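To make the dependency and failure-action idea a bit more concrete, here is a minimal Python sketch. It is a toy model, not the MSCS API: the resource names, the node names, and the `handle_failure` helper are all made up for illustration, and it skips details such as trying to restart a resource in place before moving the group.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resource group: each resource maps to the resources it depends on.
dependencies = {
    "Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "SQL Server": {"Disk", "Network Name"},
    "SQL Agent": {"SQL Server"},
}


def online_order(deps):
    """Return the resources ordered so that dependencies come online first."""
    return list(TopologicalSorter(deps).static_order())


def handle_failure(owner, nodes, failed_resource, deps):
    """Model a 'Move Resource Group' action.

    Returns the new owner node and the order in which the group's
    resources would be brought online there.
    """
    if failed_resource not in deps:
        raise KeyError(f"unknown resource: {failed_resource}")
    candidates = [node for node in nodes if node != owner]
    if not candidates:
        raise RuntimeError("no other node available to take the group")
    return candidates[0], online_order(deps)


new_owner, order = handle_failure(
    "node-a", ["node-a", "node-b"], "SQL Server", dependencies
)
print(new_owner)
print(order)
```

The point of the sketch is the shape of the behavior: a failure of any single resource moves the whole group, and resources come back in dependency order on the surviving node.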

What you lose

You lose VMware HA for the 2-4 MSCS VMs. Knowing what MSCS does, that may not be a problem, so long as you set up appropriate DRS rules to keep the MSCS VMs from running on the same host.
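The goal of such a rule can be illustrated with a short Python sketch. This is not the DRS API, only a toy check of the "keep these VMs apart" idea; the VM names, host names, and the `violations` helper are hypothetical. The real rules are defined in vCenter.

```python
from collections import defaultdict


def violations(placement, anti_affinity_group):
    """Return {host: [vms]} for every host running more than one VM from the group."""
    by_host = defaultdict(list)
    for vm in anti_affinity_group:
        host = placement.get(vm)
        if host is not None:
            by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical placement of VMs on ESX hosts.
placement = {"sql-node-1": "esx01", "sql-node-2": "esx02", "web-1": "esx01"}
mscs_nodes = ["sql-node-1", "sql-node-2"]

# Both cluster nodes on different hosts: nothing to report.
print(violations(placement, mscs_nodes))

# Both cluster nodes on the same host: a single host failure takes out both.
bad_placement = {**placement, "sql-node-2": "esx01"}
print(violations(bad_placement, mscs_nodes))
```

If both MSCS nodes land on one host, that host becomes a single point of failure again, which is exactly what the rule is there to prevent.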

What you DON’T lose

This one is so critical that I used caps in the section header! Really! Why? Because while you give up some of the more advanced features (HA, etc.) going with MSCS, you DON’T (there go them caps again) lose the ability to have HA for the remainder of your VMs. That’s right, your VM web heads and that accounting VM will still have the advanced features available to them (that is, if you have vCenter and are licensed for them).

Other Considerations

Cost. There are associated costs with either method. For instance, VMware HA requires a vCenter license and a vCenter server to make it work. MSCS requires Windows to be licensed appropriately for both nodes. Both solutions require some form of shared storage medium.

Supportability. While it can be done, MSCS on ESX adds some complexity to the design that would not otherwise be present. Is it a SAN issue? A VM issue? Heartbeat networking? Each piece that changes from your standard method adds complexity to the solution and makes it more ‘interesting’ to troubleshoot.

Which is best?

This is really up to you and what your environment requires. After all, who knows the complexity and requirements of your design better than you? Well… perhaps that Leprechaun from down the street, but alas. Hopefully the notes above help clear up the choice.

Questions? Comments? Other issues I missed? Drop me a note in the comments or via Twitter.

9 thoughts on “Choosing SQL Clustering or VMware HA – What is Right?”

  • Hi

    Good points, but you forgot MS licensing costs as one of the reasons.

    To cluster SQL you require Windows 2003/Windows 2008 Enterprise edition and above. Also, in most cases you will want to give your virtualised SQL server more than 1 vCPU, which again puts the costs up if you buy MS SQL 2005 Std or above.

    MS SQL 2005 Enterprise edition gives you unlimited SQL 2005 VM's per virtual host, but at a premium price of course.

    Sometimes it is cheaper to have the 2-way physical SQL cluster, using 1 quad-core CPU per cluster node (and memory of course), which still gives you the availability, reduces hardware and software costs, and gives you the performance you need.

    Cheers
    David

  • David,
    Licensing was mentioned:
    “Cost. There are associated costs with either method, for instance, VMware
    HA requires a vCenter license, and vCenter server to make it work. MSCS
    requires Windows to be licensed appropriately for both nodes. Both solutions
    require some form of share storage medium.”

    But not in as much detail. I'll update the post accordingly in just a bit.

    Thanks again for the heads up.

  • Or you could simplify your life and get much higher availability by putting this all on a Stratus ftServer. You only need one server, so you save the cost of the second or third (or more) machines. You don’t need an SQL enterprise license (~$25,000) to run on ftServer; it’s a single server with a standard license (~$2,300), which saves more than the cost of the ftServer, not to mention the simplicity of the ftServer vs. either one of these options, and greater than 5-nines uptime to boot…

  • Phil,

    Good comment. I've not heard of ftServer prior. I'll look into it more, but
    they appear to be essentially a hardware solution, which while excellent, may
    not be an option for currently existing environments. It is however, another
    option, and options are always great to have!

    Thanks!
    -Cody

  • Completely silly and flawed logic there with Stratus ftServer. You will never build a server that cannot fail. There will always be elements that are vulnerable, like cabling, power, fiber, switches, etc. You need to build your systems in such a way that the failure of an individual component does not matter. Servers are a cheap commodity; fault tolerance simply adds cost. This is where VMware adds value.

  • Not too flawed – the ftserver has totally duplicated hardware running in lockstep, so there is no hardware failure that cannot be recovered from. In any datacentre all cabling and power will be redundant, as well as switching and data networks. The only downside of ftservers is their cost. As you point out, normal servers are relatively cheap, and thus two servers in a cluster can be a better option than the ftserver (which is essentially like having two servers in one box).
