Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit0bef1c0

Browse files
committed
Re-think predicate locking on GIN indexes.
The principle behind the locking was not very well thought-out, and notdocumented. Add a section in the README to explain how it's supposed towork, and change the code so that it actually works that way.This fixes two bugs:1. If fast update was turned on concurrently, subsequent inserts to the pending list would not conflict with predicate locks that were acquired earlier, on entry pages. The included 'predicate-gin-fastupdate' test demonstrates that. To fix, make all scans acquire a predicate lock on the metapage. That lock represents a scan of the pending list, whether or not there is a pending list at the moment. Forget about the optimization to skip locking/checking for locks, when fastupdate=off.2. If a scan finds no match, it still needs to lock the entry page. The point of predicate locks is to lock the gabs between values, whether or not there is a match. The included 'predicate-gin-nomatch' test tests that case.In addition to those two bug fixes, this removes some unnecessary locking,following the principle laid out in the README. Because all items ina posting tree have the same key value, a lock on the posting tree root isenough to cover all the items. (With a very large posting tree, it wouldpossibly be better to lock the posting tree leaf pages instead, so that a"skip scan" with a query like "A & B", you could avoid unnecessary conflictif a new tuple is inserted with A but !B. But let's keep this simple.)Also, some spelling fixes.Author: Heikki Linnakangas with some editorization by meReview: Andrey Borodin, Alexander KorotkovDiscussion:https://www.postgresql.org/message-id/0b3ad2c2-2692-62a9-3a04-5724f2af9114@iki.fi
1 parent7d86799 commit0bef1c0

18 files changed

+251
-117
lines changed

‎src/backend/access/gin/README

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -331,6 +331,40 @@ page-deletions safe; it stamps the deleted pages with an XID and keeps the
331331
deleted pages around with the right-link intact until all concurrent scans
332332
have finished.)
333333

334+
Predicate Locking
335+
-----------------
336+
337+
GIN supports predicate locking, for serializable snapshot isolation.
338+
A predicate locks represent that a scan has scanned a range of values. They
339+
are not concerned with physical pages as such, but the logical key values.
340+
A predicate lock on a page covers the key range that would belong on that
341+
page, whether or not there are any matching tuples there currently. In other
342+
words, a predicate lock on an index page covers the "gaps" between the index
343+
tuples. To minimize false positives, predicate locks are acquired at the
344+
finest level possible.
345+
346+
* Like in the B-tree index, it is enough to lock only leaf pages, because all
347+
insertions happen at the leaf level.
348+
349+
* In an equality search (i.e. not a partial match search), if a key entry has
350+
a posting tree, we lock the posting tree root page, to represent a lock on
351+
just that key entry. Otherwise, we lock the entry tree page. We also lock
352+
the entry tree page if no match is found, to lock the "gap" where the entry
353+
would've been, had there been one.
354+
355+
* In a partial match search, we lock all the entry leaf pages that we scan,
356+
in addition to locks on posting tree roots, to represent the "gaps" between
357+
values.
358+
359+
* In addition to the locks on entry leaf pages and posting tree roots, all
360+
scans grab a lock the metapage. This is to interlock with insertions to
361+
the fast update pending list. An insertion to the pending list can really
362+
belong anywhere in the tree, and the lock on the metapage represents that.
363+
364+
The interlock for fastupdate pending lists means that with fastupdate=on,
365+
we effectively always grab a full-index lock, so you could get a lot of false
366+
positives.
367+
334368
Compatibility
335369
-------------
336370

‎src/backend/access/gin/ginbtree.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,9 @@ ginFindLeafPage(GinBtree btree, bool searchMode, Snapshot snapshot)
8484
stack->parent=NULL;
8585
stack->predictNumber=1;
8686

87+
if (!searchMode)
88+
CheckForSerializableConflictIn(btree->index,NULL,stack->buffer);
89+
8790
for (;;)
8891
{
8992
Pagepage;

‎src/backend/access/gin/gindatapage.c

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1812,8 +1812,8 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
18121812
blkno=BufferGetBlockNumber(buffer);
18131813

18141814
/*
1815-
* Copya predicatelock from entry tree leaf (containing posting list) to
1816-
* posting tree.
1815+
* Copyany predicatelocks fromtheentry tree leaf (containing posting
1816+
*list) to theposting tree.
18171817
*/
18181818
PredicateLockPageSplit(index,BufferGetBlockNumber(entrybuffer),blkno);
18191819

@@ -1864,7 +1864,7 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
18641864
returnblkno;
18651865
}
18661866

1867-
void
1867+
staticvoid
18681868
ginPrepareDataScan(GinBtreebtree,Relationindex,BlockNumberrootBlkno)
18691869
{
18701870
memset(btree,0,sizeof(GinBtreeData));
@@ -1911,7 +1911,6 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
19111911
btree.itemptr=insertdata.items[insertdata.curitem];
19121912
stack=ginFindLeafPage(&btree, false,NULL);
19131913

1914-
GinCheckForSerializableConflictIn(btree.index,NULL,stack->buffer);
19151914
ginInsertValue(&btree,stack,&insertdata,buildStats);
19161915
}
19171916
}

‎src/backend/access/gin/ginfast.c

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@
3131
#include"postmaster/autovacuum.h"
3232
#include"storage/indexfsm.h"
3333
#include"storage/lmgr.h"
34+
#include"storage/predicate.h"
3435
#include"utils/builtins.h"
3536

3637
/* GUC parameter */
@@ -245,6 +246,13 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
245246
metabuffer=ReadBuffer(index,GIN_METAPAGE_BLKNO);
246247
metapage=BufferGetPage(metabuffer);
247248

249+
/*
250+
* An insertion to the pending list could logically belong anywhere in
251+
* the tree, so it conflicts with all serializable scans. All scans
252+
* acquire a predicate lock on the metabuffer to represent that.
253+
*/
254+
CheckForSerializableConflictIn(index,NULL,metabuffer);
255+
248256
if (collector->sumsize+collector->ntuples*sizeof(ItemIdData)>GinListPageSize)
249257
{
250258
/*

‎src/backend/access/gin/ginget.c

Lines changed: 54 additions & 62 deletions
Original file line numberDiff line numberDiff line change
@@ -35,20 +35,6 @@ typedef struct pendingPosition
3535
}pendingPosition;
3636

3737

38-
/*
39-
* Place predicate lock on GIN page if needed.
40-
*/
41-
staticvoid
42-
GinPredicateLockPage(Relationindex,BlockNumberblkno,Snapshotsnapshot)
43-
{
44-
/*
45-
* When fast update is on then no need in locking pages, because we anyway
46-
* need to lock the whole index.
47-
*/
48-
if (!GinGetUseFastUpdate(index))
49-
PredicateLockPage(index,blkno,snapshot);
50-
}
51-
5238
/*
5339
* Goes to the next page if current offset is outside of bounds
5440
*/
@@ -68,7 +54,7 @@ moveRightIfItNeeded(GinBtreeData *btree, GinBtreeStack *stack, Snapshot snapshot
6854
stack->buffer=ginStepRight(stack->buffer,btree->index,GIN_SHARE);
6955
stack->blkno=BufferGetBlockNumber(stack->buffer);
7056
stack->off=FirstOffsetNumber;
71-
GinPredicateLockPage(btree->index,stack->blkno,snapshot);
57+
PredicateLockPage(btree->index,stack->blkno,snapshot);
7258
}
7359

7460
return true;
@@ -100,11 +86,6 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
10086
*/
10187
for (;;)
10288
{
103-
/*
104-
* Predicate lock each leaf page in posting tree
105-
*/
106-
GinPredicateLockPage(index,BufferGetBlockNumber(buffer),snapshot);
107-
10889
page=BufferGetPage(buffer);
10990
if ((GinPageGetOpaque(page)->flags&GIN_DELETED)==0)
11091
{
@@ -158,7 +139,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
158139
* Predicate lock entry leaf page, following pages will be locked by
159140
* moveRightIfItNeeded()
160141
*/
161-
GinPredicateLockPage(btree->index,stack->buffer,snapshot);
142+
PredicateLockPage(btree->index,stack->buffer,snapshot);
162143

163144
for (;;)
164145
{
@@ -253,6 +234,13 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
253234

254235
LockBuffer(stack->buffer,GIN_UNLOCK);
255236

237+
/*
238+
* Acquire predicate lock on the posting tree. We already hold
239+
* a lock on the entry page, but insertions to the posting tree
240+
* don't check for conflicts on that level.
241+
*/
242+
PredicateLockPage(btree->index,rootPostingTree,snapshot);
243+
256244
/* Collect all the TIDs in this entry's posting tree */
257245
scanPostingTree(btree->index,scanEntry,rootPostingTree,
258246
snapshot);
@@ -400,17 +388,20 @@ startScanEntry(GinState *ginstate, GinScanEntry entry, Snapshot snapshot)
400388
{
401389
IndexTupleitup= (IndexTuple)PageGetItem(page,PageGetItemId(page,stackEntry->off));
402390

403-
/* Predicate lock visited entry leaf page */
404-
GinPredicateLockPage(ginstate->index,
405-
BufferGetBlockNumber(stackEntry->buffer),snapshot);
406-
407391
if (GinIsPostingTree(itup))
408392
{
409393
BlockNumberrootPostingTree=GinGetPostingTree(itup);
410394
GinBtreeStack*stack;
411395
Pagepage;
412396
ItemPointerDataminItem;
413397

398+
/*
399+
* This is an equality scan, so lock the root of the posting tree.
400+
* It represents a lock on the exact key value, and covers all the
401+
* items in the posting tree.
402+
*/
403+
PredicateLockPage(ginstate->index,rootPostingTree,snapshot);
404+
414405
/*
415406
* We should unlock entry page before touching posting tree to
416407
* prevent deadlocks with vacuum processes. Because entry is never
@@ -425,12 +416,6 @@ startScanEntry(GinState *ginstate, GinScanEntry entry, Snapshot snapshot)
425416
rootPostingTree,snapshot);
426417
entry->buffer=stack->buffer;
427418

428-
/*
429-
* Predicate lock visited posting tree page, following pages will
430-
* be locked by moveRightIfItNeeded or entryLoadMoreItems
431-
*/
432-
GinPredicateLockPage(ginstate->index,BufferGetBlockNumber(entry->buffer),snapshot);
433-
434419
/*
435420
* We keep buffer pinned because we need to prevent deletion of
436421
* page during scan. See GIN's vacuum implementation. RefCount is
@@ -452,15 +437,38 @@ startScanEntry(GinState *ginstate, GinScanEntry entry, Snapshot snapshot)
452437
freeGinBtreeStack(stack);
453438
entry->isFinished= false;
454439
}
455-
elseif (GinGetNPosting(itup)>0)
440+
else
456441
{
457-
entry->list=ginReadTuple(ginstate,entry->attnum,itup,
458-
&entry->nlist);
459-
entry->predictNumberResult=entry->nlist;
442+
/*
443+
* Lock the entry leaf page. This is more coarse-grained than
444+
* necessary, because it will conflict with any insertions that
445+
* land on the same leaf page, not only the exacty key we searched
446+
* for. But locking an individual tuple would require updating
447+
* that lock whenever it moves because of insertions or vacuums,
448+
* which seems too complicated.
449+
*/
450+
PredicateLockPage(ginstate->index,
451+
BufferGetBlockNumber(stackEntry->buffer),
452+
snapshot);
453+
if (GinGetNPosting(itup)>0)
454+
{
455+
entry->list=ginReadTuple(ginstate,entry->attnum,itup,
456+
&entry->nlist);
457+
entry->predictNumberResult=entry->nlist;
460458

461-
entry->isFinished= false;
459+
entry->isFinished= false;
460+
}
462461
}
463462
}
463+
else
464+
{
465+
/*
466+
* No entry found. Predicate lock the leaf page, to lock the place
467+
* where the entry would've been, had there been one.
468+
*/
469+
PredicateLockPage(ginstate->index,
470+
BufferGetBlockNumber(stackEntry->buffer),snapshot);
471+
}
464472

465473
if (needUnlock)
466474
LockBuffer(stackEntry->buffer,GIN_UNLOCK);
@@ -533,7 +541,7 @@ startScanKey(GinState *ginstate, GinScanOpaque so, GinScanKey key)
533541

534542
for (i=0;i<key->nentries-1;i++)
535543
{
536-
/* Pass all entries <= i asfalse, and the rest as MAYBE */
544+
/* Pass all entries <= i asFALSE, and the rest as MAYBE */
537545
for (j=0;j <=i;j++)
538546
key->entryRes[entryIndexes[j]]=GIN_FALSE;
539547
for (j=i+1;j<key->nentries;j++)
@@ -673,8 +681,6 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry,
673681
entry->btree.fullScan= false;
674682
stack=ginFindLeafPage(&entry->btree, true,snapshot);
675683

676-
GinPredicateLockPage(ginstate->index,BufferGetBlockNumber(stack->buffer),snapshot);
677-
678684
/* we don't need the stack, just the buffer. */
679685
entry->buffer=stack->buffer;
680686
IncrBufferRefCount(entry->buffer);
@@ -719,10 +725,6 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry,
719725
entry->buffer=ginStepRight(entry->buffer,
720726
ginstate->index,
721727
GIN_SHARE);
722-
723-
GinPredicateLockPage(ginstate->index,BufferGetBlockNumber(entry->buffer),snapshot);
724-
725-
726728
page=BufferGetPage(entry->buffer);
727729
}
728730
stepright= true;
@@ -1084,8 +1086,8 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
10841086
* lossy page even when none of the other entries match.
10851087
*
10861088
* Our strategy is to call the tri-state consistent function, with the
1087-
* lossy-page entries set to MAYBE, and all the other entriesfalse. If it
1088-
* returnsfalse, none of the lossy items alone are enough for a match, so
1089+
* lossy-page entries set to MAYBE, and all the other entriesFALSE. If it
1090+
* returnsFALSE, none of the lossy items alone are enough for a match, so
10891091
* we don't need to return a lossy-page pointer. Otherwise, return a
10901092
* lossy-page pointer to indicate that the whole heap page must be
10911093
* checked. (On subsequent calls, we'll do nothing until minItem is past
@@ -1746,8 +1748,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
17461748
}
17471749

17481750
/*
1749-
* Collect all matched rows from pending list into bitmap. Also function
1750-
* takes PendingLockRelation if it's needed.
1751+
* Collect all matched rows from pending list into bitmap.
17511752
*/
17521753
staticvoid
17531754
scanPendingInsert(IndexScanDescscan,TIDBitmap*tbm,int64*ntids)
@@ -1764,6 +1765,12 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
17641765

17651766
*ntids=0;
17661767

1768+
/*
1769+
* Acquire predicate lock on the metapage, to conflict with any
1770+
* fastupdate insertions.
1771+
*/
1772+
PredicateLockPage(scan->indexRelation,GIN_METAPAGE_BLKNO,scan->xs_snapshot);
1773+
17671774
LockBuffer(metabuffer,GIN_SHARE);
17681775
page=BufferGetPage(metabuffer);
17691776
TestForOldSnapshot(scan->xs_snapshot,scan->indexRelation,page);
@@ -1777,24 +1784,9 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
17771784
{
17781785
/* No pending list, so proceed with normal scan */
17791786
UnlockReleaseBuffer(metabuffer);
1780-
1781-
/*
1782-
* If fast update is enabled, we acquire a predicate lock on the
1783-
* entire relation as fast update postpones the insertion of tuples
1784-
* into index structure due to which we can't detect rw conflicts.
1785-
*/
1786-
if (GinGetUseFastUpdate(scan->indexRelation))
1787-
PredicateLockRelation(scan->indexRelation,scan->xs_snapshot);
1788-
17891787
return;
17901788
}
17911789

1792-
/*
1793-
* Pending list is not empty, we need to lock the index doesn't despite on
1794-
* fastupdate state
1795-
*/
1796-
PredicateLockRelation(scan->indexRelation,scan->xs_snapshot);
1797-
17981790
pos.pendingBuffer=ReadBuffer(scan->indexRelation,blkno);
17991791
LockBuffer(pos.pendingBuffer,GIN_SHARE);
18001792
pos.firstOffset=FirstOffsetNumber;

‎src/backend/access/gin/gininsert.c

Lines changed: 2 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -219,7 +219,7 @@ ginEntryInsert(GinState *ginstate,
219219
return;
220220
}
221221

222-
GinCheckForSerializableConflictIn(btree.index,NULL,stack->buffer);
222+
CheckForSerializableConflictIn(ginstate->index,NULL,stack->buffer);
223223
/* modify an existing leaf entry */
224224
itup=addItemPointersToLeafTuple(ginstate,itup,
225225
items,nitem,buildStats,stack->buffer);
@@ -228,7 +228,7 @@ ginEntryInsert(GinState *ginstate,
228228
}
229229
else
230230
{
231-
GinCheckForSerializableConflictIn(btree.index,NULL,stack->buffer);
231+
CheckForSerializableConflictIn(ginstate->index,NULL,stack->buffer);
232232
/* no match, so construct a new leaf entry */
233233
itup=buildFreshLeafTuple(ginstate,attnum,key,category,
234234
items,nitem,buildStats,stack->buffer);
@@ -517,18 +517,6 @@ gininsert(Relation index, Datum *values, bool *isnull,
517517

518518
memset(&collector,0,sizeof(GinTupleCollector));
519519

520-
/*
521-
* With fastupdate on each scan and each insert begin with access to
522-
* pending list, so it effectively lock entire index. In this case we
523-
* aquire predicate lock and check for conflicts over index relation,
524-
* and hope that it will reduce locking overhead.
525-
*
526-
* Do not use GinCheckForSerializableConflictIn() here, because it
527-
* will do nothing (it does actual work only with fastupdate off).
528-
* Check for conflicts for entire index.
529-
*/
530-
CheckForSerializableConflictIn(index,NULL,InvalidBuffer);
531-
532520
for (i=0;i<ginstate->origTupdesc->natts;i++)
533521
ginHeapTupleFastCollect(ginstate,&collector,
534522
(OffsetNumber) (i+1),
@@ -539,16 +527,6 @@ gininsert(Relation index, Datum *values, bool *isnull,
539527
}
540528
else
541529
{
542-
GinStatsDatastats;
543-
544-
/*
545-
* Fastupdate is off but if pending list isn't empty then we need to
546-
* check conflicts with PredicateLockRelation in scanPendingInsert().
547-
*/
548-
ginGetStats(index,&stats);
549-
if (stats.nPendingPages>0)
550-
CheckForSerializableConflictIn(index,NULL,InvalidBuffer);
551-
552530
for (i=0;i<ginstate->origTupdesc->natts;i++)
553531
ginHeapTupleInsert(ginstate, (OffsetNumber) (i+1),
554532
values[i],isnull[i],

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp