Rewrite the GiST insertion logic so that we don't need the post-recovery
cleanup stage to finish incomplete inserts or splits anymore. There were two
reasons for the cleanup step:

1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (i.e. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.

2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.

These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them.
If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.

The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finished quickly so
that there was no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with it in previous versions, but this refactoring
fixes it going forward.

With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
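
For illustration only, the scan-side rule from point 2 can be sketched as a
toy model. This is not the code in the patch: the ToyPage struct, the toy_*
function names, TOY_INVALID_BLOCK and the bit value used for F_FOLLOW_RIGHT
are all assumptions made up for this sketch. It only shows the decision
described above: follow the rightlink if the flag is set, and otherwise fall
back to the NSN comparison against the LSN seen on the parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag name is from the commit message; the bit value is illustrative. */
#define F_FOLLOW_RIGHT     (1u << 3)
/* Hypothetical marker for "no right sibling". */
#define TOY_INVALID_BLOCK  UINT32_MAX

/* Hypothetical, heavily simplified stand-in for a GiST page header. */
typedef struct ToyPage
{
    uint32_t flags;      /* page flags, may include F_FOLLOW_RIGHT */
    uint64_t nsn;        /* stamped on the page when it is split */
    uint32_t rightlink;  /* block number of the right sibling */
} ToyPage;

/*
 * Decide whether a scan that reached this page through a downlink must
 * also visit the right sibling. parent_lsn is the LSN the parent page
 * had when the scan read the downlink.
 */
bool
toy_must_follow_rightlink(const ToyPage *page, uint64_t parent_lsn)
{
    if (page->rightlink == TOY_INVALID_BLOCK)
        return false;           /* nothing to the right */

    /* Incomplete split: the sibling has no downlink in the parent yet. */
    if (page->flags & F_FOLLOW_RIGHT)
        return true;

    /* NSN check: the page was split after we looked at the parent. */
    return parent_lsn < page->nsn;
}

/* Once the downlink has been inserted in the parent, clear the flag. */
void
toy_finish_split(ToyPage *page)
{
    assert(page != NULL);
    page->flags &= ~F_FOLLOW_RIGHT;
}
```

In this model, a page with F_FOLLOW_RIGHT set is always followed to the
right, whatever the NSN says. After toy_finish_split() the decision falls
back to the NSN comparison alone.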