Over the past several weeks we have had multiple power outages (long, short, brown, buzzing…), partly due to recent storms but mostly due to major work on the local distribution lines. Some of my systems are in the cloud, where industrial-grade power management is in place. (I hope.) My personal servers and dev/test systems are on-site and subject to the vagaries of suburban power. While “backup, backup, backup” is the mantra that ensures I won’t lose much, recovering from system corruption can be time-consuming.
Thankfully I also have an uninterruptible power supply (UPS) parked below the server shelving. Over the past month (and several outages) I have been pushed to refine and improve how the environment deals with sudden power issues. Here are some observations along the way:
- apcupsd is brilliant. I have it running on my host server, talking to the UPS over USB. To this I have added a number of outage event scripts to handle the various power-related scenarios.
- My UPS can supply about half an hour of power once the mains goes, but that is from a fully charged state. With multiple outages on the same day, by the second or third time the panic alarm sounds the battery may not have had enough time to recharge. The event handling scripts should therefore read the “minutes remaining” figure from the UPS and act accordingly.
- Don’t panic. One of last week’s outages lasted just 40 seconds. So, if the UPS minutes remaining allow it, wait a bit before starting a controlled shutdown.
- The controlled shutdown of my host server takes care of saving the state of any running VMs. But there are also some NAS boxes, some of which are mounted over the network onto some of the VMs, and I wanted the host server to shut those down too. Unfortunately they come from different manufacturers and none of them supports UPS signalling, but they all offer either SSH access or a web interface, so I was able to script shutdown commands from the host server to each NAS (run after the VMs are saved). To ensure network connectivity during an outage, I also put a small Ethernet switch on the UPS. Power goes, switch stays up, host server saves VMs, shuts down NAS boxes, then shuts itself down.
- I was not able to find a satisfactory way to shut down the UPS programmatically from the host server while still giving the host server enough time to shut itself down before the UPS cuts power. More experimentation may be needed, but preferably on a separate mock-up environment rather than the real thing. After all, even if the UPS is left running, all it is powering is the small Ethernet switch, as everything else has powered down.
- There is no automated recovery when power is restored. I am OK with that, as I am generally on-prem anyway, and to be honest I don’t actually trust the power to be stable until at least 30 minutes after it has been restored.
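The “read minutes remaining, then don’t panic” logic above can be sketched roughly as follows. This is a minimal illustration, not my actual event script: it assumes apcupsd’s network information server is enabled so that `apcaccess status` prints `KEY : value` lines such as `STATUS   : ONBATT` and `TIMELEFT :  28.5 Minutes`, and the reserve and grace-period values (`shutdown_minutes`, `max_wait_minutes`, `interval`) are hypothetical defaults you would tune to your own UPS and hardware. Something like this could be called from the `onbattery` event script that apcupsd’s `apccontrol` runs; note that apcupsd.conf’s own `MINUTES` and `TIMEOUT` settings already cover part of this natively.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# apcaccess prints "KEY : value" lines, e.g. "TIMELEFT :  28.5 Minutes".
_TIMELEFT_RE = re.compile(r"^TIMELEFT\s*:\s*([\d.]+)\s+Minutes", re.MULTILINE)
_STATUS_RE = re.compile(r"^STATUS\s*:\s*(.*)$", re.MULTILINE)


def read_status() -> str:
    """Return the raw `apcaccess status` output (needs apcupsd's NIS enabled)."""
    return subprocess.run(
        ["apcaccess", "status"], capture_output=True, text=True, check=True
    ).stdout


def minutes_remaining(status: str) -> Optional[float]:
    """Estimated runtime left in minutes, or None if the UPS did not report it."""
    match = _TIMELEFT_RE.search(status)
    return float(match.group(1)) if match else None


def on_battery(status: str) -> bool:
    """True while the STATUS line contains the ONBATT flag."""
    match = _STATUS_RE.search(status)
    return bool(match) and "ONBATT" in match.group(1).split()


def grace_period_seconds(minutes_left: Optional[float],
                         shutdown_minutes: float = 10.0,
                         max_wait_minutes: float = 5.0) -> float:
    """How long to wait for the mains to return before shutting down.

    Keeps `shutdown_minutes` of runtime in reserve for the shutdown itself
    (a hypothetical figure) and never waits longer than `max_wait_minutes`.
    """
    if minutes_left is None:
        return 0.0  # unknown runtime: be cautious and shut down now
    spare = minutes_left - shutdown_minutes
    return max(0.0, min(spare, max_wait_minutes)) * 60.0


def should_shut_down(poll: Callable[[], str] = read_status,
                     sleep: Callable[[float], None] = time.sleep,
                     interval: float = 10.0) -> bool:
    """Wait out short outages; return True if a shutdown is still needed."""
    status = poll()
    wait = grace_period_seconds(minutes_remaining(status))
    waited = 0.0
    while waited < wait:
        sleep(interval)
        waited += interval
        status = poll()
        if not on_battery(status):
            return False  # mains came back during the grace period
    return on_battery(status)
```

With a freshly charged battery (say 28.5 minutes left) this waits up to five minutes for the mains to return; on a depleted battery after a second or third outage (say 9 minutes left) the grace period drops to zero and the shutdown starts immediately. Passing `poll` and `sleep` in as parameters keeps the decision logic testable without a real UPS.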
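The shutdown ordering described above (save VMs, then shut down the NAS boxes they mount, then the host itself) can be sketched as a simple ordered plan. Everything specific here is hypothetical: the `/usr/local/sbin/save-vms` script, the NAS hostnames, and the remote shutdown commands all stand in for whatever your own setup uses, and key-based SSH login from the host to each NAS is assumed. NAS boxes that only offer a web interface need a vendor-specific request, which is not shown.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence, Tuple

# Hypothetical NAS targets: (ssh destination, remote shutdown command).
# The right command varies by vendor; check each box's documentation.
NAS_TARGETS: List[Tuple[str, str]] = [
    ("admin@nas1.lan", "poweroff"),
    ("root@nas2.lan", "shutdown -h now"),
]


def build_plan(save_vms: Sequence[str],
               nas_targets: Sequence[Tuple[str, str]],
               host_shutdown: Sequence[str]) -> List[List[str]]:
    """Order matters: save VMs first (they mount NAS shares), then NAS, then host."""
    plan = [list(save_vms)]
    for destination, remote_cmd in nas_targets:
        # BatchMode stops ssh from prompting for a password during an outage;
        # ConnectTimeout stops an unreachable box from stalling the sequence.
        plan.append(["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
                     destination, remote_cmd])
    plan.append(list(host_shutdown))
    return plan


def run_plan(plan: Sequence[Sequence[str]], dry_run: bool = False,
             timeout: float = 120.0) -> List[Tuple[str, Optional[int]]]:
    """Run each step in order, carrying on past failures (the battery won't wait).

    Returns (command line, return code) pairs: None means not run (dry run),
    -1 means the step timed out or its binary could not be started.
    """
    results: List[Tuple[str, Optional[int]]] = []
    for cmd in plan:
        line = shlex.join(cmd)
        if dry_run:
            print(line)
            results.append((line, None))
            continue
        try:
            rc = subprocess.run(list(cmd), timeout=timeout).returncode
        except (subprocess.TimeoutExpired, OSError):
            rc = -1  # step hung or binary missing; move on regardless
        results.append((line, rc))
    return results
```

For example, `run_plan(build_plan(["/usr/local/sbin/save-vms"], NAS_TARGETS, ["shutdown", "-h", "now"]), dry_run=True)` prints the four commands in order without running them, which is a handy check before trusting the sequence to a real outage. The sketch deliberately keeps going when a step fails: with a finite battery, shutting down the remaining boxes cleanly matters more than stopping to diagnose one that did not respond.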
Finally, one thought does occur to me every time the power goes: does the UPS have enough juice left to power the coffee machine?