Note: This is a guest post provided to us by an extremely smart gentleman, with whom I’ve worked closely in the past. At this time however, homeboy prefers to remain anonymous.
I get questions like this all the time:
"I’m trying to help educate my management on the importance of having the right microcode in place for our array. Our array is requiring an upgrade that is scheduled for sometime in mid to late November (1 month from now). This is causing a delay with our certification of our ESX infrastructure as I cannot schedule it sooner due to political issues."
The simple answer here is that microcode upgrades are required as part of regular SAN maintenance. This applies to arrays, switches, and HBAs. Upgrading microcode should be expedited if an update resolves critical issues or other problems that would affect the stability and availability of the SAN. Vendors have in the past released microcode fixes categorized as critical for everything from stability and availability problems to data corruption.
The problem with most companies out there (SMB and enterprise alike) is that SAN maintenance is not taken as seriously as it should be, and outages are incurred as a result of a lack of due diligence in keeping firmware up to date. In this day and age, and with the advantages of using ESX, customers are able to move all VMs off an ESX host and put it into maintenance mode to update HBA firmware. By the same token, array vendors have developed non-disruptive (online) firmware upgrades for their array controllers. Switches also have the ability to do online upgrades, but more than that, switch fabrics in most enterprise environments are redundant, so one switch can be upgraded at a time while the other stays online, without impacting availability.
With the above being stated, there is nothing physically or logistically preventing a company from upgrading its SAN during the day; however, procedures are in place to perform upgrades and changes off hours to ensure minimal impact should an issue arise. That is standard practice, but as more layers of approval, scheduling, and testing are introduced, longer delays are the end result. The reality is that the more red tape there is to go through to get these changes in place, the greater the chance that an outage or other anomaly will occur due to outdated firmware.
Keeping in mind that firmware is quite a bit different from software patches or service packs (it is simpler and far better tested/QA’ed), keeping firmware current should be considered a priority at every level of the infrastructure to avoid unwanted downtime.