@jwildeboer sorry for the inconvenience, Jan. By chance, are there any Ceph developers around who could help debug a ceph/libceph kernel driver bug?
You seem to be running a Debian kernel based on upstream 5.10.136. Could you update to a more recent one? It looks like Debian has already released newer stable kernels, and I see a bunch of ext4-related fixes in (upstream) 5.10.137.
@antondollmaier We're looking to migrate this to new hardware and LXC anyway. But this also requires some work ..
@codeberg ah, chicken fencing problem...
Been there as well, not just once.
Good luck! (Seriously! Get big and show GitHub who's the better platform :) )
Ceph has some work queued, but there's no memory available. The shrinker kicks in and ext4 is selected to free some memory. And that's where things go south.
So, a possible workaround is to increase system memory. If upgrading the kernel doesn't fix it, I'd suggest reporting a bug to the ext4 mailing list firstname.lastname@example.org. No need to register, it's an open list.
@codeberg @jwildeboer so from what I understand this is (likely) because of memory pressure and then some issue (bug/failed mitigation) in ext4.
I am no expert on this, but out of interest I have followed XFS development for a while. The developers seem very conscious of failure cases, to the point that they successfully countered a bug claim by pointing out that the kernel subsystem defies its own memory allocation, rather than there being a bug in XFS. The whole filesystem seems to be very carefully and deliberately designed. It has also been the default filesystem for a number of distros.
If you have to evaluate configurations, please consider XFS as a file system too.