This guest post is by David Espejo, who blogs at vcloudopia.wordpress.com, where you can find his back catalogue of posts. David writes mainly in Spanish, but has an expanding English language section here. Find out more about the guest blogger program here.
Welcome! This site is mainly written in Spanish, but I'm starting a new section for an English-speaking audience, touching on different topics that, I believe, may resonate even more thanks to the broader reach the English language enables in this globalized world 😉
As part of my preparation for the VCAP6-DCV Design exam, I read several official documents and ended up with a bunch of notes written, mostly, in a very informal format that I could review quickly whenever I needed to. I have practical experience deploying VSAN since the rough 1.0 release, and I have witnessed the constant effort from the VMware team to enhance the product.
VSAN 6.2 is a release that brings interesting features to the table, and I will touch on some of them in this post, written in the format of personal study notes.
VSAN 6.2 general considerations
- Listen carefully: even if the Stripe Width is left at the default (1), VSAN may decide to stripe a component across multiple disks. When does it decide that? Well, if the object is bigger than 256GB it will be striped across multiple drives
- Regarding network teaming policies such as EtherChannel, LACP, etc.: VSAN does not balance traffic across links
- When FTT=1, the VMDK OBJECT is made up of two COMPONENTS (plus a witness). In VSAN (object storage) OBJECTS ARE MADE UP OF COMPONENTS. See the sketch after this list for the component math.
- There is a typical RAID 1+0 behavior when it comes to writes: for every write requested, there is (at least, assuming FTT=1) one replica of the write that is stored in the write buffer of ANOTHER host in the cluster. That is how the write buffer stays effectively NON-VOLATILE: if the current host fails, the "hot" data in the write buffer remains available on the other host.
- Magnetic disks play two roles in a hybrid configuration: they provide capacity and they are the factor that determines the achievable stripe width
- It’s always better to choose I/O controllers that allow pass-through instead of RAID-0
- Do you like LSI's FastPath? Well, VMware recommends DISABLING any hardware acceleration on the controller
- Heads up! VSAN can rebuild components in a disk group different from the one in which the component used to live. I mean, if the caching device fails in host 1 (in a three-node cluster with FTT=1), VSAN will check whether there is free space in another disk group and will then rebuild the component there. If you have enough capacity but only a single disk group created, VSAN won't be able to recover from this.
- If a caching device fails, THE WHOLE DISK GROUP NEEDS TO BE REBUILT
- VM swap is thick provisioned by default (OSR = 100%), and also has Force Provisioning = true and FTT = 1 preconfigured
- Snapshot delta disks inherit the policy from the base disk
- VSAN does not interoperate with HA completely, because it doesn't let HA validate whether there is enough disk space on the other nodes to rebuild components after a host failure. Simply put, once the 60-minute timeout is reached, VSAN will do whatever it takes to make the VM compliant again: new replicas, new stripes, whatever. This might cause oversubscription of resources
- Fault Domains: a logical extension of the role an ESXi host plays in a VSAN 5.x deployment, where, if FTT>0, no two copies of a VM's data reside on the same host. Well, it's not the Army of One anymore: if you group multiple hosts by, for example, the rack in which they are located, then you have a Fault Domain, and no two copies of a VM's data will reside within that group of hosts.
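To make the stripe and component arithmetic from the notes above a bit more concrete, here is a minimal back-of-the-napkin sketch. It is my own illustration, not an official VMware formula: it only models mirrored (RAID-1) objects, uses the ~256GB component cap mentioned above, and ignores witness components and placement rules.

```python
import math

# Rough, unofficial sketch of VSAN data-component math for a mirrored object.
# Assumptions (mine, for illustration only):
# - Components are capped at roughly 256 GB, as noted above.
# - Number of replicas = FTT + 1.
# - Witness components and placement constraints are NOT modeled.

COMPONENT_MAX_GB = 256

def estimate_data_components(object_size_gb, ftt=1, stripe_width=1):
    """Estimate how many data components a single VSAN object ends up with."""
    replicas = ftt + 1
    # Each replica has to honor both the configured stripe width and the
    # ~256 GB component cap, so take whichever forces more splits.
    chunks_for_size = math.ceil(object_size_gb / COMPONENT_MAX_GB)
    chunks_per_replica = max(stripe_width, chunks_for_size)
    return replicas * chunks_per_replica

print(estimate_data_components(100))                          # FTT=1, SW=1 -> 2
print(estimate_data_components(600))                          # implicit striping -> 6
print(estimate_data_components(100, ftt=1, stripe_width=2))   # explicit SW=2 -> 4
```

In real life the witness component(s) and the number of disk groups available also influence the final layout, so treat this only as a mental model for the exam, not as a sizing tool.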
VSAN AFA (All-Flash)
Since VSAN 6.1 it is supported (it was possible before, though) to create all-flash VSAN disk groups in addition to hybrid ones (which use a flash device for caching and hard drives for capacity). Using AFA implies the following considerations:
- There is no read cache: for obvious reasons, flash devices (NVMe/SSD) in the capacity tier wouldn't need read acceleration.
- There is still a flash device in the disk group that is used for write caching
- A maximum of 200 VMs per host is supported
- CAUTION: you CANNOT have AFA and HYBRID disk groups in the same cluster, that’s not supported right now.
- The 10% cache-to-capacity ratio remains true even in AFA
- With VSAN 6.2 hybrid, it doesn't matter how many full drive writes per day (DWPD) the SSD supports, as long as your daily writes don't exceed the Terabytes Written (TBW) the device can sustain in a day: TBW = (SSD size) * (DWPD). See the sketch after this list for the arithmetic.
- For VSAN AFA, the VMware specification is higher than for hybrid: 4 TBW. In both cases this only really matters for the caching device, not for the capacity drives. This parameter can be found in the VSAN HCL: in the Caching Tier detail for any VSAN Ready Node you will find the Endurance value expressed in TBW (the HP E-Class 779164-B21 SSD is one example).
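Here is a minimal sketch of the endurance and cache-sizing arithmetic from the notes above. Only the TBW formula and the 10% cache-to-capacity rule come from the notes; the drive size and DWPD figures are invented examples of mine, so always check the real Endurance value for your device in the VSAN HCL.

```python
# Rough, unofficial sketch of the endurance and cache-sizing math above.
# Assumptions (mine, for illustration): sizes are in TB and the example
# numbers are made up; they do not come from any HCL entry.

def tbw_per_day(ssd_size_tb, dwpd):
    """Terabytes Written per day, as stated above: TBW = SSD size * DWPD."""
    return ssd_size_tb * dwpd

def suggested_cache_tb(raw_capacity_tb, ratio=0.10):
    """Flash cache suggested for a capacity tier, using the 10% rule above."""
    return raw_capacity_tb * ratio

# A hypothetical 400 GB (0.4 TB) caching SSD rated at 10 DWPD:
print(tbw_per_day(0.4, 10))       # 4.0 TB written per day

# A hypothetical disk group with 8 TB of raw capacity:
print(suggested_cache_tb(8))      # 0.8 TB of flash cache
```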
I will share more design-focused notes in a later post.
Thanks!