Human error is not the root cause
Posted by Michał ‘mina86’ Nazarewicz on 12th of January 2025
In 2023 UniSuper, an Australian retirement fund, decided to migrate part of its operations to Google Cloud. As port of the migration, they needed to create virtual machines provisioned with limits higher than what Google’s user interface allowed to set. To achieve their goals, UniSuper contacted Google support. Having access to internal tools, Google engineer was able to create requested instances.
Fast forward to May 2024. UniSuper members lose access to their accounts. The fund blames Google. Some people are sceptical, but eventually UniSuper and Google Cloud publish a joint statement which points at ‘a misconfiguration during provisioning’ as cause of the outage. Later, a postmortem of the incident sheds even more light on events which have transpired.
Turns out that back in 2023, Google engineer used a command line tool to manually create cloud instances according to UniSuper’s requirements. Among various options, said tool had a switch setting cloud instance’s term. The engineer omitted it leading to the instance being created with a fixed term which triggered automatic deletion a year later.
So, human error. Scold the engineer and case closed. Or is it?