Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitb071a31

Browse files
Add nbtree README section on page recycling.
Consolidate discussion of how VACUUM places pages in the FSM forrecycling by adding a new section that comes after discussion of pagedeletion. This structure reflects the fact that page recycling isexplicitly decoupled from page deletion in Lanin & Shasha's paper. Pagerecycling in nbtree is an implementation of what the paper calls "thedrain technique".This decoupling is an important concept for nbtree VACUUM. Searchershave to detect and recover from concurrent page deletions, but they willnever have to reason about concurrent page recycling. Recycling canalmost always be thought of as a low level garbage collection operationthat asynchronously frees the physical space that backs a logical treenode. Almost all code need only concern itself with logical tree nodes.(Note that "logical tree node" is not currently a term of art in thenbtree code -- this all works implicitly.)This is preparation for an upcoming patch that teaches nbtree VACUUM toremember the details of pages that it deletes on the fly, in localmemory. This enables the same VACUUM operation to consider placing itsown deleted pages in the FSM later on, when it reaches the end ofbtvacuumscan().
1 parentb5a66e7 commitb071a31

File tree

1 file changed

+52
-21
lines changed
  • src/backend/access/nbtree

1 file changed

+52
-21
lines changed

‎src/backend/access/nbtree/README

Lines changed: 52 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -329,19 +329,26 @@ down in the chain. This is repeated until there are no internal pages left
329329
in the chain. Finally, the half-dead leaf page itself is unlinked from its
330330
siblings.
331331

332-
A deleted page cannot bereclaimed immediately, since there may be other
332+
A deleted page cannot berecycled immediately, since there may be other
333333
processes waiting to reference it (ie, search processes that just left the
334334
parent, or scans moving right or left from one of the siblings). These
335-
processes must observe that the page is marked dead and recover
336-
accordingly. Searches and forward scans simply follow the right-link
337-
until they find a non-dead page --- this will be where the deleted page's
338-
key-space moved to.
335+
processes must be able to observe a deleted page for some time after the
336+
deletion operation, in order to be able to at least recover from it (they
337+
recover by moving right, as with concurrent page splits). Searchers never
338+
have to worry about concurrent page recycling.
339+
340+
See "Placing deleted pages in the FSM" section below for a description of
341+
when and how deleted pages become safe for VACUUM to make recyclable.
342+
343+
Page deletion and backwards scans
344+
---------------------------------
339345

340346
Moving left in a backward scan is complicated because we must consider
341347
the possibility that the left sibling was just split (meaning we must find
342348
the rightmost page derived from the left sibling), plus the possibility
343349
that the page we were just on has now been deleted and hence isn't in the
344350
sibling chain at all anymore. So the move-left algorithm becomes:
351+
345352
0. Remember the page we are on as the "original page".
346353
1. Follow the original page's left-link (we're done if this is zero).
347354
2. If the current page is live and its right-link matches the "original
@@ -358,28 +365,15 @@ sibling chain at all anymore. So the move-left algorithm becomes:
358365
current left-link). If it is dead, move right until a non-dead page
359366
is found (there must be one, since rightmost pages are never deleted),
360367
mark that as the new "original page", and return to step 1.
368+
361369
This algorithm is correct because the live page found by step 4 will have
362370
the same left keyspace boundary as the page we started from. Therefore,
363371
when we ultimately exit, it must be on a page whose right keyspace
364372
boundary matches the left boundary of where we started --- which is what
365373
we need to be sure we don't miss or re-scan any items.
366374

367-
A deleted page can only be reclaimed once there is no scan or search that
368-
has a reference to it; until then, it must stay in place with its
369-
right-link undisturbed. We implement this by waiting until all active
370-
snapshots and registered snapshots as of the deletion are gone; which is
371-
overly strong, but is simple to implement within Postgres. When marked
372-
dead, a deleted page is labeled with the next-transaction counter value.
373-
VACUUM can reclaim the page for re-use when this transaction number is
374-
guaranteed to be "visible to everyone". As collateral damage, this
375-
implementation also waits for running XIDs with no snapshots and for
376-
snapshots taken until the next transaction to allocate an XID commits.
377-
378-
Reclaiming a page doesn't actually change the state of the page --- we
379-
simply record it in the free space map, from which it will be handed out
380-
the next time a new page is needed for a page split. The deleted page's
381-
contents will be overwritten by the split operation (it will become the
382-
new right page).
375+
Page deletion and tree height
376+
-----------------------------
383377

384378
Because we never delete the rightmost page of any level (and in particular
385379
never delete the root), it's impossible for the height of the tree to
@@ -399,6 +393,43 @@ as part of the atomic update for the delete (either way, the metapage has
399393
to be the last page locked in the update to avoid deadlock risks). This
400394
avoids race conditions if two such operations are executing concurrently.
401395

396+
Placing deleted pages in the FSM
397+
--------------------------------
398+
399+
Recycling a page is decoupled from page deletion. A deleted page can only
400+
be put in the FSM to be recycled once there is no possible scan or search
401+
that has a reference to it; until then, it must stay in place with its
402+
sibling links undisturbed, as a tombstone that allows concurrent searches
403+
to detect and then recover from concurrent deletions (which are rather
404+
like concurrent page splits to searchers). This design is an
405+
implementation of what Lanin and Shasha call "the drain technique".
406+
407+
We implement the technique by waiting until all active snapshots and
408+
registered snapshots as of the page deletion are gone; which is overly
409+
strong, but is simple to implement within Postgres. When marked fully
410+
dead, a deleted page is labeled with the next-transaction counter value.
411+
VACUUM can reclaim the page for re-use when the stored XID is guaranteed
412+
to be "visible to everyone". As collateral damage, we wait for snapshots
413+
taken until the next transaction to allocate an XID commits. We also wait
414+
for running XIDs with no snapshots.
415+
416+
The need for this additional indirection after a page deletion operation
417+
takes place is a natural consequence of the highly permissive rules for
418+
index scans with Lehman and Yao's design. In general an index scan
419+
doesn't have to hold a lock or even a pin on any page when it descends the
420+
tree (nothing that you'd usually think of as an interlock is held "between
421+
levels"). At the same time, index scans cannot be allowed to land on a
422+
truly unrelated page due to concurrent recycling (not to be confused with
423+
concurrent deletion), because that results in wrong answers to queries.
424+
Simpler approaches to page deletion that don't need to defer recycling are
425+
possible, but none seem compatible with Lehman and Yao's design.
426+
427+
Placing an already-deleted page in the FSM to be recycled when needed
428+
doesn't actually change the state of the page. The page will be changed
429+
whenever it is subsequently taken from the FSM for reuse. The deleted
430+
page's contents will be overwritten by the split operation (it will become
431+
the new right sibling page).
432+
402433
Fastpath For Index Insertion
403434
----------------------------
404435

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp