A cautionary tale for the #ZFS people: the #Gandi storage failure postmortem https://news.gandi.net/en/2020/01/postmortem-of-the-failure-of-one-hosting-storage-unit-at-lu-bi1-on-january-8-2020/
@jwildeboer Except for the conclusion that it's somehow ZFS's fault ;-)
@freakazoid That's your interpretation of my words. I didn't say or imply that. I found the story interesting and shareworthy for these reasons:
- One server can cause a lot of problems, even when ZFS seems to be set up in a way that should guarantee high resilience.
- Finding the root cause can be quite difficult
- Features missing from older versions came as a surprise and severely slowed down the recovery process.
It's an insightful postmortem of a ZFS failure mode. Hence I shared it.
@jwildeboer It's also a good cautionary tale for those who think "Oh it's triple-replicated so we don't need backups." You always need backups. And you need to test your backups. Just like you need to test every other recovery procedure.