Commit 9b10926

Prevent O(N^2) unique index insertion edge case.
Commit dd299df made nbtree treat heap TID as a tiebreaker column, establishing the principle that there is only one correct location (page and page offset number) for every index tuple, no matter what.  Insertions of tuples into non-unique indexes proceed as if heap TID (scan key's scantid) is just another user-attribute value, but insertions into unique indexes are more delicate.  The TID value in scantid must initially be omitted to ensure that the unique index insertion visits every leaf page that duplicates could be on.  The scantid is set once again after unique checking finishes successfully, which can force _bt_findinsertloc() to step right one or more times, to locate the leaf page that the new tuple must be inserted on.

Stepping right within _bt_findinsertloc() was assumed to occur no more frequently than stepping right within _bt_check_unique(), but there was one important case where that assumption was incorrect: inserting a "duplicate" with NULL values.  Since _bt_check_unique() didn't do any real work in this case, it wasn't appropriate for _bt_findinsertloc() to behave as if it was finishing off a conventional unique insertion, where any existing physical duplicate must be dead or recently dead.  _bt_findinsertloc() might have to grovel through a substantial portion of all of the leaf pages in the index to insert a single tuple, even when there were no dead tuples.

To fix, treat insertions of tuples with NULLs into a unique index as if they were insertions into a non-unique index: never unset scantid before calling _bt_search() to descend the tree, and bypass _bt_check_unique() entirely.  _bt_check_unique() is no longer responsible for incoming tuples with NULL values.

Discussion: https://postgr.es/m/CAH2-Wzm08nr+JPx4jMOa9CGqxWYDQ-_D4wtPBiKghXAUiUy-nQ@mail.gmail.com
1 parent f4a3fdf · commit 9b10926

File tree

4 files changed: +56 −76 lines changed

src/backend/access/nbtree/nbtinsert.c

Lines changed: 41 additions & 76 deletions

@@ -55,8 +55,6 @@ static void _bt_insert_parent(Relation rel, Buffer buf, Buffer rbuf,
 				BTStack stack, bool is_root, bool is_only);
 static bool _bt_pgaddtup(Page page, Size itemsize, IndexTuple itup,
 			 OffsetNumber itup_off);
-static bool _bt_isequal(TupleDesc itupdesc, BTScanInsert itup_key,
-			Page page, OffsetNumber offnum);
 static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
 
@@ -91,9 +89,31 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 
 	/* we need an insertion scan key to do our search, so build one */
 	itup_key = _bt_mkscankey(rel, itup);
-	/* No scantid until uniqueness established in checkingunique case */
-	if (checkingunique && itup_key->heapkeyspace)
-		itup_key->scantid = NULL;
+
+	if (checkingunique)
+	{
+		if (!itup_key->anynullkeys)
+		{
+			/* No (heapkeyspace) scantid until uniqueness established */
+			itup_key->scantid = NULL;
+		}
+		else
+		{
+			/*
+			 * Scan key for new tuple contains NULL key values.  Bypass
+			 * checkingunique steps.  They are unnecessary because core code
+			 * considers NULL unequal to every value, including NULL.
+			 *
+			 * This optimization avoids O(N^2) behavior within the
+			 * _bt_findinsertloc() heapkeyspace path when a unique index has a
+			 * large number of "duplicates" with NULL key values.
+			 */
+			checkingunique = false;
+			/* Tuple is unique in the sense that core code cares about */
+			Assert(checkUnique != UNIQUE_CHECK_EXISTING);
+			is_unique = true;
+		}
+	}
 
 	/*
 	 * Fill in the BTInsertState working area, to track the current page and
@@ -209,7 +229,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
  * NOTE: obviously, _bt_check_unique can only detect keys that are already
  * in the index; so it cannot defend against concurrent insertions of the
  * same key.  We protect against that by means of holding a write lock on
- * the first page the value could be on, regardless of the value of its
+ * the first page the value could be on, with omitted/-inf value for the
  * implicit heap TID tiebreaker attribute.  Any other would-be inserter of
  * the same key must acquire a write lock on the same page, so only one
  * would-be inserter can be making the check at one time.  Furthermore,
@@ -266,10 +286,9 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	/*
 	 * The only conflict predicate locking cares about for indexes is when
 	 * an index tuple insert conflicts with an existing lock.  We don't
-	 * know the actual page we're going to insert to yet because scantid
-	 * was not filled in initially, but it's okay to use the "first valid"
-	 * page instead.  This reasoning also applies to INCLUDE indexes,
-	 * whose extra attributes are not considered part of the key space.
+	 * know the actual page we're going to insert on for sure just yet in
+	 * checkingunique and !heapkeyspace cases, but it's okay to use the
+	 * first page the value could be on (with scantid omitted) instead.
 	 */
 	CheckForSerializableConflictIn(rel, NULL, insertstate.buf);
 
@@ -315,13 +334,16 @@ _bt_doinsert(Relation rel, IndexTuple itup,
  * As a side-effect, sets state in insertstate that can later be used by
  * _bt_findinsertloc() to reuse most of the binary search work we do
  * here.
+ *
+ * Do not call here when there are NULL values in scan key.  NULL should be
+ * considered unequal to NULL when checking for duplicates, but we are not
+ * prepared to handle that correctly.
  */
 static TransactionId
 _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 				 IndexUniqueCheck checkUnique, bool *is_unique,
 				 uint32 *speculativeToken)
 {
-	TupleDesc	itupdesc = RelationGetDescr(rel);
 	IndexTuple	itup = insertstate->itup;
 	BTScanInsert itup_key = insertstate->itup_key;
 	SnapshotData SnapshotDirty;
@@ -354,6 +376,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 	 * Scan over all equal tuples, looking for live conflicts.
 	 */
 	Assert(!insertstate->bounds_valid || insertstate->low == offset);
+	Assert(!itup_key->anynullkeys);
 	Assert(itup_key->scantid == NULL);
 	for (;;)
 	{
@@ -375,16 +398,16 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 		 * original page, which may indicate that we need to examine a
 		 * second or subsequent page.
 		 *
-		 * Note that this optimization avoids calling _bt_isequal()
-		 * entirely when there are no duplicates, as long as the offset
-		 * where the key will go is not at the end of the page.
+		 * Note that this optimization allows us to avoid calling
+		 * _bt_compare() directly when there are no duplicates, as long as
+		 * the offset where the key will go is not at the end of the page.
 		 */
 		if (nbuf == InvalidBuffer && offset == insertstate->stricthigh)
 		{
 			Assert(insertstate->bounds_valid);
 			Assert(insertstate->low >= P_FIRSTDATAKEY(opaque));
 			Assert(insertstate->low <= insertstate->stricthigh);
-			Assert(!_bt_isequal(itupdesc, itup_key, page, offset));
+			Assert(_bt_compare(rel, itup_key, page, offset) < 0);
 			break;
 		}
 
@@ -394,9 +417,9 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 		 * We can skip items that are marked killed.
 		 *
 		 * In the presence of heavy update activity an index may contain
-		 * many killed items with the same key; running _bt_isequal() on
+		 * many killed items with the same key; running _bt_compare() on
 		 * each killed item gets expensive.  Just advance over killed
-		 * items as quickly as we can.  We only apply _bt_isequal() when
+		 * items as quickly as we can.  We only apply _bt_compare() when
 		 * we get to a non-killed item.  Even those comparisons could be
 		 * avoided (in the common case where there is only one page to
 		 * visit) by reusing bounds, but just skipping dead items is fast
@@ -407,13 +430,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 			ItemPointerData htid;
 			bool		all_dead;
 
-			/*
-			 * _bt_compare returns 0 for (1,NULL) and (1,NULL) - this's
-			 * how we handling NULLs - and so we must not use _bt_compare
-			 * in real comparison, but only for ordering/finding items on
-			 * pages. - vadim 03/24/97
-			 */
-			if (!_bt_isequal(itupdesc, itup_key, page, offset))
+			if (_bt_compare(rel, itup_key, page, offset) != 0)
 				break;			/* we're past all the equal tuples */
 
 			/* okay, we gotta fetch the heap tuple ... */
@@ -2184,58 +2201,6 @@ _bt_pgaddtup(Page page,
 	return true;
 }
 
-/*
- * _bt_isequal - used in _bt_doinsert in check for duplicates.
- *
- * This is very similar to _bt_compare, except for NULL and negative infinity
- * handling.  Rule is simple: NOT_NULL not equal NULL, NULL not equal NULL too.
- */
-static bool
-_bt_isequal(TupleDesc itupdesc, BTScanInsert itup_key, Page page,
-			OffsetNumber offnum)
-{
-	IndexTuple	itup;
-	ScanKey		scankey;
-	int			i;
-
-	/* Better be comparing to a non-pivot item */
-	Assert(P_ISLEAF((BTPageOpaque) PageGetSpecialPointer(page)));
-	Assert(offnum >= P_FIRSTDATAKEY((BTPageOpaque) PageGetSpecialPointer(page)));
-	Assert(itup_key->scantid == NULL);
-
-	scankey = itup_key->scankeys;
-	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-
-	for (i = 1; i <= itup_key->keysz; i++)
-	{
-		AttrNumber	attno;
-		Datum		datum;
-		bool		isNull;
-		int32		result;
-
-		attno = scankey->sk_attno;
-		Assert(attno == i);
-		datum = index_getattr(itup, attno, itupdesc, &isNull);
-
-		/* NULLs are never equal to anything */
-		if (isNull || (scankey->sk_flags & SK_ISNULL))
-			return false;
-
-		result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func,
-												 scankey->sk_collation,
-												 datum,
-												 scankey->sk_argument));
-
-		if (result != 0)
-			return false;
-
-		scankey++;
-	}
-
-	/* if we get here, the keys are equal */
-	return true;
-}
-
 /*
  * _bt_vacuum_one_page - vacuum just one index page.
  *
src/backend/access/nbtree/nbtsearch.c

Lines changed: 1 addition & 0 deletions

@@ -1235,6 +1235,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 
 	/* Initialize remaining insertion scan key fields */
 	inskey.heapkeyspace = _bt_heapkeyspace(rel);
+	inskey.anynullkeys = false;	/* unused */
 	inskey.nextkey = nextkey;
 	inskey.pivotsearch = false;
 	inskey.scantid = NULL;

src/backend/access/nbtree/nbtutils.c

Lines changed: 4 additions & 0 deletions

@@ -107,6 +107,7 @@ _bt_mkscankey(Relation rel, IndexTuple itup)
 	key = palloc(offsetof(BTScanInsertData, scankeys) +
 				 sizeof(ScanKeyData) * indnkeyatts);
 	key->heapkeyspace = itup == NULL || _bt_heapkeyspace(rel);
+	key->anynullkeys = false;	/* initial assumption */
 	key->nextkey = false;
 	key->pivotsearch = false;
 	key->keysz = Min(indnkeyatts, tupnatts);
@@ -147,6 +148,9 @@ _bt_mkscankey(Relation rel, IndexTuple itup)
 									   rel->rd_indcollation[i],
 									   procinfo,
 									   arg);
+		/* Record if any key attribute is NULL (or truncated) */
+		if (null)
+			key->anynullkeys = true;
 	}
 
 	return key;

src/include/access/nbtree.h

Lines changed: 10 additions & 0 deletions

@@ -435,6 +435,15 @@ typedef BTStackData *BTStack;
  * indexes whose version is >= version 4.  It's convenient to keep this close
  * by, rather than accessing the metapage repeatedly.
  *
+ * anynullkeys indicates if any of the keys had NULL value when scankey was
+ * built from index tuple (note that already-truncated tuple key attributes
+ * set NULL as a placeholder key value, which also affects value of
+ * anynullkeys).  This is a convenience for unique index non-pivot tuple
+ * insertion, which usually temporarily unsets scantid, but shouldn't iff
+ * anynullkeys is true.  Value generally matches non-pivot tuple's HasNulls
+ * bit, but may not when inserting into an INCLUDE index (tuple header value
+ * is affected by the NULL-ness of both key and non-key attributes).
+ *
  * When nextkey is false (the usual case), _bt_search and _bt_binsrch will
  * locate the first item >= scankey.  When nextkey is true, they will locate
  * the first item > scan key.
@@ -461,6 +470,7 @@ typedef BTStackData *BTStack;
 typedef struct BTScanInsertData
 {
 	bool		heapkeyspace;
+	bool		anynullkeys;
 	bool		nextkey;
 	bool		pivotsearch;
 	ItemPointer scantid;		/* tiebreaker for scankeys */

0 commit comments
