vSphere Automation 101 – Check for Snapshots

This gets to be the third post in the “vSphere Automation 101” series or so. Not sure I want to call it a series, as that denotes some kind of commitment, and well, as we’ve discussed in prior posts, I’m lazy. Before we get too deep, here are links to the first and second parts of this series:

Automation 101 Posts

TL;DR

You need to monitor both % free space and snapshot age. Use the stuff in the “Handling” sections below to handle both situations.

Finding And Dealing with Old Snapshots

We’ve covered snapshots in the past, but only in the instances where they’ve blown something up. Now, there is nothing wrong with this, but it puts you in the position of being reactive to downtime, rather than pro-active. Of the two, being pro-active can help keep you gainfully employed.

How old is old?

Before you automate the handling of snapshots, you need to figure out how you’d like to handle them in your environment. There are as many ways to handle this as there are vSphere admins, so we’ll talk about a few, and then leave it up to you to decide what is right for your environment. “Old” snapshots can generally be broken into two categories: size and age.

Size:

VMware snapshots, if left unchecked, can grow to some pretty extreme sizes (up to the size of the base VMDK, plus memory). This is multiplied if you have multiple nested snapshots of a VM. Depending on the change rate of your VM, these can grow quite quickly. As snapshots are stored, by default, on the same volume as the vmx file for the VM, the potential is there to run out of space quickly. You can monitor the total size of a snapshot as well as the percentage of space left on the volume.
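To make the two size measurements concrete, here is a minimal sketch. Get-PercentFree is a hypothetical helper name, not something PowerCLI ships, and the commented-out lines assume you already have a Connect-VIServer session and a PowerCLI version that exposes the CapacityGB, FreeSpaceGB and SizeGB properties.

```powershell
# Hypothetical helper: percentage of a volume's capacity that is still free,
# rounded to one decimal place.
function Get-PercentFree {
    param(
        [Parameter(Mandatory = $true)] [double] $CapacityGB,
        [Parameter(Mandatory = $true)] [double] $FreeSpaceGB
    )
    if ($CapacityGB -le 0) { throw "CapacityGB must be greater than zero." }
    [math]::Round(($FreeSpaceGB / $CapacityGB) * 100, 1)
}

# Against a live vCenter (assumes an existing Connect-VIServer session):
# Get-Datastore | Select-Object Name, @{ N = 'PercentFree'; E = { Get-PercentFree -CapacityGB $_.CapacityGB -FreeSpaceGB $_.FreeSpaceGB } }
# Get-VM | Get-Snapshot | Sort-Object SizeGB -Descending | Select-Object VM, Name, SizeGB
```

The first commented line gives you % free per datastore, the second lists snapshots from largest to smallest. Either one could feed a report before you let anything delete on its own.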

Age:

Snapshots are not backups. Phew, now with that said, they are indeed used by several backup products and should not be discounted for their value as a “get out of jail free” card after a bad code update or so. That does not mean, however, that keeping snapshots around for a long time, or stacking up several of them, is a good idea. As described in the size section, snapshots will grow over time, eventually leading to well… boom.
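If you want to see what “old” looks like in your environment before deciding on a number, something like the following might help. Select-OldSnapshot is a hypothetical filter I made up for illustration, and the 7 day default is only a placeholder. It relies on the Created property that PowerCLI snapshot objects carry.

```powershell
# Hypothetical filter: pass through only the snapshots created more than
# $MaxAgeDays before $Now. Works on any object with a Created property.
function Select-OldSnapshot {
    param(
        [Parameter(ValueFromPipeline = $true)] $Snapshot,
        [int] $MaxAgeDays = 7,           # placeholder threshold, pick your own
        [datetime] $Now = (Get-Date)
    )
    process {
        if ($Snapshot.Created -lt $Now.AddDays(-$MaxAgeDays)) { $Snapshot }
    }
}

# Against a live vCenter (assumes an existing Connect-VIServer session):
# Get-VM | Get-Snapshot | Select-OldSnapshot -MaxAgeDays 7 | Select-Object VM, Name, Created
```

Note that this only reports. Nothing is removed, which makes it a safe first step while you work out your threshold.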

Decision?

I can’t make the decision as to how you implement the monitoring of the above. I can however recommend that you make use of some combination of the two as you’ll see below.

Handling Snapshot Size with PowerCLI

For the sake of this post, I’ve chosen to use %free on a datastore as the trigger for a scan for, and deletion of, snapshots for VMs on a given datastore.

Alarm

Thankfully, vSphere comes with a built-in “Datastore usage” alarm to which we can add an action to suit our needs:

You’ll want to add an action similar to: "c:\windows\system32\cmd.exe" "/c echo.|powershell.exe -nologo -noprofile -noninteractive c:\scripts\clean-snaps.ps1"

Script

Note: The script that follows is only here as a “Proof of Concept”. As written it removes every snapshot on every VM on the datastore, so you will want to build in some more logic and safety checks before using it.

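# Load PowerCLI (on newer, module-based PowerCLI releases, use Import-Module VMware.PowerCLI instead)
Add-PSSnapin VMware.VimAutomation.Core
# Connect to vCenter; localhost assumes the script runs on the vCenter Server itself
Connect-VIServer localhost

# vCenter passes the name of the object that raised the alarm (here, the datastore) in this variable
$vmsOnDatastore = Get-VM -Datastore $env:VMWARE_ALARM_TARGET_NAME
# Remove every snapshot on those VMs without prompting (destructive!)
$vmsOnDatastore | Get-Snapshot | Remove-Snapshot -Confirm:$false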
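
As one example of the extra logic and safety mentioned in the note, here is a sketch of the same proof of concept with two guard rails. The $maxAgeDays value, the empty-name check and the -WhatIf dry run are my assumptions for illustration, not part of the script above, so treat this as a starting point rather than a finished cleanup script.

```powershell
# Sketch only: same flow as the proof of concept, with guard rails added.
# $maxAgeDays is a placeholder; pick a value that suits your environment.
$maxAgeDays = 3
$datastoreName = $env:VMWARE_ALARM_TARGET_NAME

# Stop if the alarm did not hand us a target, rather than scanning with an empty name.
if ([string]::IsNullOrEmpty($datastoreName)) {
    throw "VMWARE_ALARM_TARGET_NAME is not set; refusing to continue."
}

Add-PSSnapin VMware.VimAutomation.Core
Connect-VIServer localhost

$cutoff = (Get-Date).AddDays(-$maxAgeDays)

# -WhatIf only reports what would be removed. Drop it once the output looks right.
Get-VM -Datastore $datastoreName |
    Get-Snapshot |
    Where-Object { $_.Created -lt $cutoff } |
    Remove-Snapshot -Confirm:$false -WhatIf
```

Filtering on age here means a snapshot someone took ten minutes ago, right before a risky change, survives the cleanup even when the datastore alarm fires.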

Handling Snapshot Age with vCenter Orchestrator

vCO actually includes a workflow to monitor snapshots based on age, so it’s up to us to schedule said workflow. (If you haven’t worked with or set up vCO yet, start with the vCO 101 post here.)

Workflow:

In the vCO Client:

Scheduled:

Right click the above, click schedule:

Summary

We talked about why you need to look out for snapshots, and discussed the ways in which you can monitor for snapshots running out of control. While you will need to tweak these to suit the variables in your environment, they should provide you with a framework to build upon. As always, if you have questions or comments, drop me a line here or on Twitter here.