You signed in with another tab or window.Reload to refresh your session.You signed out in another tab or window.Reload to refresh your session.You switched accounts on another tab or window.Reload to refresh your session.Dismiss alert
In short, we don't allow a page to be deleted if it's the rightmost childof its parent, but that situation can change after we check for it.Problem-------We check that the page to be deleted is not the rightmost child of itsparent, and then lock its left sibling, the page itself, its right sibling,and the parent, in that order. However, if the parent page is split afterthe check but before acquiring the locks, the target page might become therightmost child, if the split happens at the right place. That leads to anerror in vacuum (I reproduced this by setting a breakpoint in debugger):ERROR: failed to delete rightmost child 41 of block 3 in index "foo_pkey"We currently re-check that the page is still the rightmost child, and throwthe above error if it's not. We could easily just give up rather than throwan error, but that approach doesn't scale to half-dead pages. To recap,although we don't normally allow deleting the rightmost child, if the pageis the *only* child of its parent, we delete the child page and mark theparent page as half-dead in one atomic operation. But before we do that, wecheck that the parent can later be deleted, by checking that it in turn isnot the rightmost child of the grandparent (potentially recursing all theway up to the root). But the same situation can arise there - thegrandparent can be split while we're not holding the locks. We end up witha half-dead page that we cannot delete.To make things worse, the keyspace of the deleted page has already beentransferred to its right sibling. As the README points out, the keyspace atthe grandparent level is "out-of-whack" until the half-dead page is deleted,and if enough tuples with keys in the transferred keyspace are inserted, thepage might get split and a downlink might be inserted into the grandparentthat is out-of-order. That might not cause any serious problem if it'stransient (as the README ponders), but is surely bad if it stays that way.Solution--------This patch changes the page deletion algorithm to avoid that problem. Afterchecking that the topmost page in the chain of to-be-deleted pages is notthe rightmost child of its parent, and then deleting the pages from bottomup, unlink the pages from top to bottom. This way, the intermediate stagesare similar to the intermediate stages in page splitting, and there is notransient stage where the keyspace is "out-of-whack". The topmost page inthe to-be-deleted chain doesn't have a downlink pointing to it, like a pagesplit before the downlink has been inserted.This also allows us to get rid of the cleanup step after WAL recovery, if wecrash during page deletion. The deletion will be continued at next VACUUM,but the tree is consistent for searches and insertions at every step.This bug is old, all supported versions are affected, but this patch is toobig to back-patch (and changes the WAL record formats of related records).We have not heard any reports of the bug from users, so clearly it's noteasy to bump into. Maybe backpatch later, after this has had some fieldtesting.Reviewed by Kevin Grittner and Peter Geoghegan.