Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit21ad61a

Browse files
committed
Fix deadlock between ginDeletePage() and ginStepRight()
When ginDeletePage() is about to delete page it locks its left sibling to revisethe rightlink. So, it locks pages in right to left manner. Int he same timeginStepRight() locks pages in left to right manner, and that could cause adeadlock.This commit makes ginScanToDelete() keep exclusive lock on left siblings ofcurrently investigated path. That elimites need to relock left sibling inginDeletePage(). Thus, deadlock with ginStepRight() can't happen anymore.Reported-by: Chen HuajunDiscussion:https://postgr.es/m/5c332bd1.87b6.16d7c17aa98.Coremail.chjischj%40163.comAuthor: Alexander KorotkovReviewed-by: Peter GeogheganBackpatch-through: 10
1 parentf115590 commit21ad61a

File tree

2 files changed

+81
-52
lines changed

2 files changed

+81
-52
lines changed

‎src/backend/access/gin/README

Lines changed: 46 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -277,50 +277,63 @@ followed by the packed items.
277277
Concurrency
278278
-----------
279279

280-
The entry tree and each posting treeis a B-tree, with right-links connecting
280+
The entry tree and each posting treeare B-trees, with right-links connecting
281281
sibling pages at the same level. This is the same structure that is used in
282282
the regular B-tree indexam (invented by Lehman & Yao), but we don't support
283283
scanning a GIN trees backwards, so we don't need left-links.
284284

285285
To avoid deadlocks, B-tree pages must always be locked in the same order:
286-
left to right, and bottom to top. When searching, the tree is traversed from
287-
top to bottom, so the lock on the parent page must be released before
286+
left to right, and bottom to top. The exception is page deletion during vacuum,
287+
which would be considered separately. When searching, the tree is traversed
288+
from top to bottom, so the lock on the parent page must be released before
288289
descending to the next level. Concurrent page splits move the keyspace to
289290
right, so after following a downlink, the page actually containing the key
290291
we're looking for might be somewhere to the right of the page we landed on.
291292
In that case, we follow the right-links until we find the page we're looking
292-
for.
293-
294-
To delete a page, the page's left sibling, the target page, and its parent,
295-
are locked in that order, and the page is marked as deleted. However, a
296-
concurrent search might already have read a pointer to the page, and might be
297-
just about to follow it. A page can be reached via the right-link of its left
298-
sibling, or via its downlink in the parent.
293+
for. In spite of searches, insertions keeps pins on all pages in path from
294+
root to the current page. These pages potentially requies downlink insertion,
295+
while pins prevent them from being deleted.
296+
297+
Vacuum never deletes tuples or pages from the entry tree. It traverses entry
298+
tree leafs in logical order by rightlinks and removes deletable TIDs from
299+
posting lists. Posting trees are processed by links from entry tree leafs. They
300+
are vacuumed in two stages. At first stage, deletable TIDs are removed from
301+
leafs. If first stage detects at least one empty page, then at the second stage
302+
ginScanToDelete() deletes empty pages.
303+
304+
ginScanToDelete() scans posting tree to delete empty pages, while vacuum holds
305+
cleanup lock on the posting tree root. This lock prevent concurrent inserts,
306+
because inserters hold a pin on the root page. In spite of inserters searches
307+
don't hold pin on root page. So, while new searches cannot begin while root page
308+
is locked, any already-in-progress scans can continue concurrently with vacuum.
309+
310+
ginScanToDelete() does depth-first tree scanning while keeping each page in path
311+
from root to current page exclusively locked. It also keeps left sibling of
312+
each page in the path locked. Thus, if current page is to be removed, all
313+
required pages to remove both downlink and rightlink are already locked.
314+
Therefore, page deletion locks pages top to bottom and left to right breaking
315+
our general rule. But assuming there is no concurrent insertions, this can't
316+
cause a deadlock.
317+
318+
During replay od page deletion at standby, the page's left sibling, the target
319+
page, and its parent, are locked in that order. So it follows bottom to top and
320+
left to right rule.
321+
322+
A search concurrent to page deletion might already have read a pointer to the
323+
page to be deleted, and might be just about to follow it. A page can be reached
324+
via the right-link of its left sibling, or via its downlink in the parent.
299325

300326
To prevent a backend from reaching a deleted page via a right-link, when
301-
following a right-link the lock on the previous page is not released until
302-
the lock on next page has been acquired.
303-
304-
The downlink is more tricky. A search descending the tree must release the
305-
lock on the parent page before locking the child, or it could deadlock with
306-
a concurrent split of the child page; a page split locks the parent, while
307-
already holding a lock on the child page. So, deleted page cannot be reclaimed
308-
immediately. Instead, we have to wait for every transaction, which might wait
309-
to reference this page, to finish. Corresponding processes must observe that
310-
the page is marked deleted and recover accordingly.
311-
312-
The previous paragraph's reasoning only applies to searches, and only to
313-
posting trees. To protect from inserters following a downlink to a deleted
314-
page, vacuum simply locks out all concurrent insertions to the posting tree,
315-
by holding a super-exclusive lock on the posting tree root. Inserters hold a
316-
pin on the root page, but searches do not, so while new searches cannot begin
317-
while root page is locked, any already-in-progress scans can continue
318-
concurrently with vacuum. In the entry tree, we never delete pages.
319-
320-
(This is quite different from the mechanism the btree indexam uses to make
321-
page-deletions safe; it stamps the deleted pages with an XID and keeps the
322-
deleted pages around with the right-link intact until all concurrent scans
323-
have finished.)
327+
following a right-link the lock on the previous page is not released until the
328+
lock on next page has been acquired.
329+
330+
The downlink is more tricky. A search descending the tree must release the lock
331+
on the parent page before locking the child, or it could deadlock with a
332+
concurrent split of the child page; a page split locks the parent, while already
333+
holding a lock on the child page. So, deleted page cannot be reclaimed
334+
immediately. Instead, we have to wait for every transaction, which might wait to
335+
reference this page, to finish. Corresponding processes must observe that the
336+
page is marked deleted and recover accordingly.
324337

325338
Compatibility
326339
-------------

‎src/backend/access/gin/ginvacuum.c

Lines changed: 35 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -116,7 +116,8 @@ typedef struct DataPageDeleteStack
116116
structDataPageDeleteStack*parent;
117117

118118
BlockNumberblkno;/* current block number */
119-
BlockNumberleftBlkno;/* rightest non-deleted page on left */
119+
BufferleftBuffer;/* pinned and locked rightest non-deleted page
120+
* on left */
120121
boolisRoot;
121122
}DataPageDeleteStack;
122123

@@ -138,11 +139,8 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
138139
/*
139140
* This function MUST be called only if someone of parent pages hold
140141
* exclusive cleanup lock. This guarantees that no insertions currently
141-
* happen in this subtree. Caller also acquire Exclusive lock on deletable
142-
* page and is acquiring and releasing exclusive lock on left page before.
143-
* Left page was locked and released. Then parent and this page are
144-
* locked. We acquire left page lock here only to mark page dirty after
145-
* changing right pointer.
142+
* happen in this subtree. Caller also acquires Exclusive locks on
143+
* deletable, parent and left pages.
146144
*/
147145
lBuffer=ReadBufferExtended(gvs->index,MAIN_FORKNUM,leftBlkno,
148146
RBM_NORMAL,gvs->strategy);
@@ -151,8 +149,6 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
151149
pBuffer=ReadBufferExtended(gvs->index,MAIN_FORKNUM,parentBlkno,
152150
RBM_NORMAL,gvs->strategy);
153151

154-
LockBuffer(lBuffer,GIN_EXCLUSIVE);
155-
156152
START_CRIT_SECTION();
157153

158154
/* Unlink the page by changing left sibling's rightlink */
@@ -220,7 +216,7 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
220216
}
221217

222218
ReleaseBuffer(pBuffer);
223-
UnlockReleaseBuffer(lBuffer);
219+
ReleaseBuffer(lBuffer);
224220
ReleaseBuffer(dBuffer);
225221

226222
END_CRIT_SECTION();
@@ -230,7 +226,11 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
230226

231227

232228
/*
233-
* scans posting tree and deletes empty pages
229+
* Scans posting tree and deletes empty pages. Caller must lock root page for
230+
* cleanup. During scan path from root to current page is kept exclusively
231+
* locked. Also keep left page exclusively locked, because ginDeletePage()
232+
* needs it. If we try to relock left page later, it could deadlock with
233+
* ginStepRight().
234234
*/
235235
staticbool
236236
ginScanToDelete(GinVacuumState*gvs,BlockNumberblkno,boolisRoot,
@@ -253,7 +253,7 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
253253
me= (DataPageDeleteStack*)palloc0(sizeof(DataPageDeleteStack));
254254
me->parent=parent;
255255
parent->child=me;
256-
me->leftBlkno=InvalidBlockNumber;
256+
me->leftBuffer=InvalidBuffer;
257257
}
258258
else
259259
me=parent->child;
@@ -281,6 +281,12 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
281281
if (ginScanToDelete(gvs,PostingItemGetBlockNumber(pitem), FALSE,me,i))
282282
i--;
283283
}
284+
285+
if (GinPageRightMost(page)&&BufferIsValid(me->child->leftBuffer))
286+
{
287+
UnlockReleaseBuffer(me->child->leftBuffer);
288+
me->child->leftBuffer=InvalidBuffer;
289+
}
284290
}
285291

286292
if (GinPageIsLeaf(page))
@@ -291,21 +297,31 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
291297
if (isempty)
292298
{
293299
/* we never delete the left- or rightmost branch */
294-
if (me->leftBlkno!=InvalidBlockNumber&& !GinPageRightMost(page))
300+
if (BufferIsValid(me->leftBuffer)&& !GinPageRightMost(page))
295301
{
296302
Assert(!isRoot);
297-
ginDeletePage(gvs,blkno,me->leftBlkno,me->parent->blkno,myoff,me->parent->isRoot);
303+
ginDeletePage(gvs,blkno,BufferGetBlockNumber(me->leftBuffer),
304+
me->parent->blkno,myoff,me->parent->isRoot);
298305
meDelete= TRUE;
299306
}
300307
}
301308

302-
if (!isRoot)
303-
LockBuffer(buffer,GIN_UNLOCK);
309+
if (!meDelete)
310+
{
311+
if (BufferIsValid(me->leftBuffer))
312+
UnlockReleaseBuffer(me->leftBuffer);
313+
me->leftBuffer=buffer;
314+
}
315+
else
316+
{
317+
if (!isRoot)
318+
LockBuffer(buffer,GIN_UNLOCK);
304319

305-
ReleaseBuffer(buffer);
320+
ReleaseBuffer(buffer);
321+
}
306322

307-
if (!meDelete)
308-
me->leftBlkno=blkno;
323+
if (isRoot)
324+
ReleaseBuffer(buffer);
309325

310326
returnmeDelete;
311327
}
@@ -402,7 +418,7 @@ ginVacuumPostingTree(GinVacuumState *gvs, BlockNumber rootBlkno)
402418
LockBufferForCleanup(buffer);
403419

404420
memset(&root,0,sizeof(DataPageDeleteStack));
405-
root.leftBlkno=InvalidBlockNumber;
421+
root.leftBuffer=InvalidBuffer;
406422
root.isRoot= true;
407423

408424
ginScanToDelete(gvs,rootBlkno, true,&root,InvalidOffsetNumber);

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp