By now you have probably seen or heard about the Google SNAFU, summarised nicely by this tweet.
Now you’re probably thinking, “Oh here we go. Someone from not-Google is going to take the opportunity to dunk on Google”, but rest assured, that is not my intention.
As anyone who has been in IT for any length of time knows, it’s generally unwise to kick someone when they’re down. And when you consider that the CEO of Google Cloud made a public apology for the issue, I think we can safely assume there was already a fair amount of “Uh oh…this is bad” going around internally at Google when this was discovered.
(Source: The Guardian)
And of course, as the saying goes, “What goes around, comes around”. The history pages of Information Technology are littered with catastrophic mistakes, so it’s naïve to think that your own turn will never come. There’s a good chance that one day there might be a metaphorical bullet with your name on it.
So this post is about what we can do to dodge that bullet, or perhaps serve as a wake-up call for what we may in the past have labelled “I’m secure in my backup strategy”.
Think about the common maxims most IT shops rely on for confidence in their backups:
- Ensure redundant backups on-site or in the primary region
- Ensure redundant backups off-site or in an alternate region
- Ensure regular testing of the recoverability of backups
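
For the first two maxims, here is a rough sketch of what an automated sanity check might look like. The file paths are purely hypothetical, and a matching checksum is no substitute for actually restoring a copy, which is what the third maxim demands.

```python
# Hypothetical sketch: verify that the on-site and off-site copies of a
# backup exist and are byte-identical. Paths are placeholders only.
import hashlib
from pathlib import Path

ONSITE_COPY = Path("/backups/onsite/db_backup_20240601.dmp")    # hypothetical
OFFSITE_COPY = Path("/mnt/offsite/db_backup_20240601.dmp")      # hypothetical

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large backups don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copies() -> None:
    for copy in (ONSITE_COPY, OFFSITE_COPY):
        if not copy.exists():
            raise SystemExit(f"Missing backup copy: {copy}")
    if sha256_of(ONSITE_COPY) != sha256_of(OFFSITE_COPY):
        raise SystemExit("On-site and off-site copies differ - investigate!")
    print("Redundant copies present and identical. Now restore one to prove it.")

if __name__ == "__main__":
    verify_copies()
```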
If you ticked all of those boxes above, and it had been your company that had the unfortunate experience here, you would now be dead in the water. UniSuper needed more than that.
(Source: The Guardian)
The cause of this (near) disaster, I believe, is a misinterpretation of the historical justifications for how we did backups. I probably need to explain that position.
Originally we did tape backups. Someone walked into the data centre, loaded a tape, and when the backup was finished, the tape was removed and put somewhere safe. The definition of “safe” depended on your recoverability posture. Maybe it was outside of the data centre to guard against a data centre disaster. Maybe it was stored in another building to avoid a building disaster, or maybe at a different site altogether to handle a full site-wide disaster.
Then disks became ridiculously cheap and many backup solutions became disk-based, but the same mantras held firm – some storage on-site, some storage off-site, and so on.
In the world of cloud, this “naturally” equates to backups being both intra-region and inter-region. But one very important element of the historical method has been overlooked: when we had tape backups off-site, or even disk backups off-site, they were inaccessible by software. The definition of “off-site” is not just physically off-site, but also detached from the software running on the primary site. If a backup can be reached by software, it is not really much safer than a backup sitting on your primary server in your primary data centre.
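
One way to approximate that detachment in a cloud world is to land backup copies in object storage that the primary environment’s credentials cannot delete from, with a retention lock on top. The sketch below is a minimal illustration using S3 Object Lock via boto3; the bucket name, object key, path and retention period are hypothetical, and it assumes the bucket was created with Object Lock enabled, ideally in a separate account or with a separate provider altogether. Other clouds offer the same idea under names such as retention policies or immutable buckets.

```python
# Hypothetical sketch: write a backup object that the primary site's software
# cannot delete or overwrite before the retention date, using S3 Object Lock
# in COMPLIANCE mode. Assumes the bucket was created with Object Lock enabled
# and that the credentials used here are *not* available to the primary site.
from datetime import datetime, timedelta, timezone

import boto3

BUCKET = "example-offsite-backups"         # hypothetical bucket name
KEY = "db/db_backup_20240601.dmp"          # hypothetical object key
RETENTION_DAYS = 35                        # pick to suit your recovery policy

s3 = boto3.client("s3")  # ideally a separate account/tenancy from production

with open("/backups/onsite/db_backup_20240601.dmp", "rb") as backup:
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=backup,
        # Object Lock requests need an integrity checksum on the upload.
        ChecksumAlgorithm="SHA256",
        # COMPLIANCE mode means even the bucket owner cannot shorten or
        # remove the retention period once the object is written.
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc)
        + timedelta(days=RETENTION_DAYS),
    )

print(f"Backup locked in s3://{BUCKET}/{KEY} for {RETENTION_DAYS} days")
```

The lock only buys you something if the credentials able to write here are not the same ones your production systems use day to day; otherwise the backup is still, in effect, reachable by software.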
UniSuper avoided a catastrophe by following this principle, keeping backups with an alternate provider.
So rather than dunking on Google (I’m sure they are having their own uncomfortable conversations internally), perhaps now is a good time to review your own backup regime, whether cloud or on-premises, and double-check that software isn’t going to be the “fire” that burns your data centre to the ground.
(thumbnail credit: Zoe Roth)



