Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit6c512fc

Browse files
committed
Fix handling of empty ranges and NULLs in BRIN
BRIN indexes did not properly distinguish between summaries for empty(no rows) and all-NULL ranges, treating them as essentially the samething. Summaries were initialized with allnulls=true, and opclassessimply reset allnulls to false when processing the first non-NULL value.This however produces incorrect results if the range starts with a NULLvalue (or a sequence of NULL values), in which case we forget the rangecontains NULL values when adding the first non-NULL value.This happens because the allnulls flag is used for two separatepurposes - to mark empty ranges (not representing any rows yet) andranges containing only NULL values.Opclasses don't know which of these cases it is, and so don't knowwhether to set hasnulls=true. Setting the flag in both cases would makeit correct, but it would also make BRIN indexes useless for queries withIS NULL clauses. All ranges start empty (and thus allnulls=true), so allranges would end up with either allnulls=true or hasnulls=true.The severity of the issue is somewhat reduced by the fact that it onlyhappens when adding values to an existing summary with allnulls=true.This can happen e.g. for small tables (because a summary for the firstrange exists for all BRIN indexes), or for tables with large fraction ofNULL values in the indexed columns.Bulk summarization (e.g. during CREATE INDEX or automatic summarization)that processes all values at once is not affected by this issue. In thiscase the flags were updated in a slightly different way, not forgettingthe NULL values.To identify empty ranges we use a new flag, stored in an unused bit inthe BRIN tuple header so the on-disk format remains the same. A matchingflag is added to BrinMemTuple, into a 3B gap after bt_placeholder.That means there's no risk of ABI breakage, although we don't actuallypass the BrinMemTuple to any public API.We could also skip storing index tuples for empty summaries, but thenwe'd have to always process such ranges - even if there are no rows inlarge parts of the table (e.g. after a bulk DELETE), it would stillrequire reading the pages etc. So we store them, but ignore them whenbuilding the bitmap.Backpatch to 11. The issue exists since BRIN indexes were introduced in9.5, but older releases are already EOL.Backpatch-through: 11Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro HerreraDiscussion:https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
1 parent2b1ab28 commit6c512fc

File tree

5 files changed

+167
-8
lines changed

5 files changed

+167
-8
lines changed

‎src/backend/access/brin/brin.c

Lines changed: 145 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@
3535
#include"storage/freespace.h"
3636
#include"utils/acl.h"
3737
#include"utils/builtins.h"
38+
#include"utils/datum.h"
3839
#include"utils/index_selfuncs.h"
3940
#include"utils/memutils.h"
4041
#include"utils/rel.h"
@@ -173,7 +174,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
173174

174175
for (;;)
175176
{
176-
boolneed_insert= false;
177+
boolneed_insert;
177178
OffsetNumberoff;
178179
BrinTuple*brtup;
179180
BrinMemTuple*dtup;
@@ -241,6 +242,9 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
241242

242243
dtup=brin_deform_tuple(bdesc,brtup,NULL);
243244

245+
/* If the range starts empty, we're certainly going to modify it. */
246+
need_insert=dtup->bt_empty_range;
247+
244248
/*
245249
* Compare the key values of the new tuple to the stored index values;
246250
* our deformed tuple will get updated if the new tuple doesn't fit
@@ -253,8 +257,20 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
253257
Datumresult;
254258
BrinValues*bval;
255259
FmgrInfo*addValue;
260+
boolhas_nulls;
256261

257262
bval=&dtup->bt_columns[keyno];
263+
264+
/*
265+
* Does the range have actual NULL values? Either of the flags can
266+
* be set, but we ignore the state before adding first row.
267+
*
268+
* We have to remember this, because we'll modify the flags and we
269+
* need to know if the range started as empty.
270+
*/
271+
has_nulls= ((!dtup->bt_empty_range)&&
272+
(bval->bv_hasnulls||bval->bv_allnulls));
273+
258274
addValue=index_getprocinfo(idxRel,keyno+1,
259275
BRIN_PROCNUM_ADDVALUE);
260276
result=FunctionCall4Coll(addValue,
@@ -265,8 +281,33 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
265281
nulls[keyno]);
266282
/* if that returned true, we need to insert the updated tuple */
267283
need_insert |=DatumGetBool(result);
284+
285+
/*
286+
* If the range was had actual NULL values (i.e. did not start empty),
287+
* make sure we don't forget about the NULL values. Either the allnulls
288+
* flag is still set to true, or (if the opclass cleared it) we need to
289+
* set hasnulls=true.
290+
*
291+
* XXX This can only happen when the opclass modified the tuple, so the
292+
* modified flag should be set.
293+
*/
294+
if (has_nulls&& !(bval->bv_hasnulls||bval->bv_allnulls))
295+
{
296+
Assert(need_insert);
297+
bval->bv_hasnulls= true;
298+
}
268299
}
269300

301+
/*
302+
* After updating summaries for all the keys, mark it as not empty.
303+
*
304+
* If we're actually changing the flag value (i.e. tuple started as
305+
* empty), we should have modified the tuple. So we should not see
306+
* empty range that was not modified.
307+
*/
308+
Assert(!dtup->bt_empty_range||need_insert);
309+
dtup->bt_empty_range= false;
310+
270311
if (!need_insert)
271312
{
272313
/*
@@ -508,6 +549,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
508549
CurrentMemoryContext);
509550
}
510551

552+
/*
553+
* If the BRIN tuple indicates that this range is empty,
554+
* we can skip it: there's nothing to match. We don't
555+
* need to examine the next columns.
556+
*/
557+
if (dtup->bt_empty_range)
558+
{
559+
addrange= false;
560+
break;
561+
}
562+
511563
/*
512564
* Check whether the scan key is consistent with the page
513565
* range values; if so, have the pages in the range added
@@ -645,8 +697,24 @@ brinbuildCallback(Relation index,
645697
FmgrInfo*addValue;
646698
BrinValues*col;
647699
Form_pg_attributeattr=TupleDescAttr(state->bs_bdesc->bd_tupdesc,i);
700+
boolhas_nulls;
648701

649702
col=&state->bs_dtuple->bt_columns[i];
703+
704+
/*
705+
* Does the range have actual NULL values? Either of the flags can
706+
* be set, but we ignore the state before adding first row.
707+
*
708+
* We have to remember this, because we'll modify the flags and we
709+
* need to know if the range started as empty.
710+
*/
711+
has_nulls= ((!state->bs_dtuple->bt_empty_range)&&
712+
(col->bv_hasnulls||col->bv_allnulls));
713+
714+
/*
715+
* Call the BRIN_PROCNUM_ADDVALUE procedure. We do this even for NULL
716+
* values, because who knows what the opclass is doing.
717+
*/
650718
addValue=index_getprocinfo(index,i+1,
651719
BRIN_PROCNUM_ADDVALUE);
652720

@@ -658,7 +726,25 @@ brinbuildCallback(Relation index,
658726
PointerGetDatum(state->bs_bdesc),
659727
PointerGetDatum(col),
660728
values[i],isnull[i]);
729+
730+
/*
731+
* If the range was had actual NULL values (i.e. did not start empty),
732+
* make sure we don't forget about the NULL values. Either the allnulls
733+
* flag is still set to true, or (if the opclass cleared it) we need to
734+
* set hasnulls=true.
735+
*/
736+
if (has_nulls&& !(col->bv_hasnulls||col->bv_allnulls))
737+
col->bv_hasnulls= true;
661738
}
739+
740+
/*
741+
* After updating summaries for all the keys, mark it as not empty.
742+
*
743+
* If we're actually changing the flag value (i.e. tuple started as
744+
* empty), we should have modified the tuple. So we should not see
745+
* empty range that was not modified.
746+
*/
747+
state->bs_dtuple->bt_empty_range= false;
662748
}
663749

664750
/*
@@ -1465,6 +1551,64 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
14651551
db=brin_deform_tuple(bdesc,b,NULL);
14661552
MemoryContextSwitchTo(oldcxt);
14671553

1554+
/*
1555+
* Check if the ranges are empty.
1556+
*
1557+
* If at least one of them is empty, we don't need to call per-key union
1558+
* functions at all. If "b" is empty, we just use "a" as the result (it
1559+
* might be empty fine, but that's fine). If "a" is empty but "b" is not,
1560+
* we use "b" as the result (but we have to copy the data into "a" first).
1561+
*
1562+
* Only when both ranges are non-empty, we actually do the per-key merge.
1563+
*/
1564+
1565+
/* If "b" is empty - ignore it and just use "a" (even if it's empty etc.). */
1566+
if (db->bt_empty_range)
1567+
{
1568+
/* skip the per-key merge */
1569+
MemoryContextDelete(cxt);
1570+
return;
1571+
}
1572+
1573+
/*
1574+
* Now we know "b" is not empty. If "a" is empty, then "b" is the result.
1575+
* But we need to copy the data from "b" to "a" first, because that's how
1576+
* we pass result out.
1577+
*
1578+
* We have to copy all the global/per-key flags etc. too.
1579+
*/
1580+
if (a->bt_empty_range)
1581+
{
1582+
for (keyno=0;keyno<bdesc->bd_tupdesc->natts;keyno++)
1583+
{
1584+
inti;
1585+
BrinValues*col_a=&a->bt_columns[keyno];
1586+
BrinValues*col_b=&db->bt_columns[keyno];
1587+
BrinOpcInfo*opcinfo=bdesc->bd_info[keyno];
1588+
1589+
col_a->bv_allnulls=col_b->bv_allnulls;
1590+
col_a->bv_hasnulls=col_b->bv_hasnulls;
1591+
1592+
/* If "b" has no data, we're done. */
1593+
if (col_b->bv_allnulls)
1594+
continue;
1595+
1596+
for (i=0;i<opcinfo->oi_nstored;i++)
1597+
col_a->bv_values[i]=
1598+
datumCopy(col_b->bv_values[i],
1599+
opcinfo->oi_typcache[i]->typbyval,
1600+
opcinfo->oi_typcache[i]->typlen);
1601+
}
1602+
1603+
/* "a" started empty, but "b" was not empty, so remember that */
1604+
a->bt_empty_range= false;
1605+
1606+
/* skip the per-key merge */
1607+
MemoryContextDelete(cxt);
1608+
return;
1609+
}
1610+
1611+
/* Neither range is empty, so call the union proc. */
14681612
for (keyno=0;keyno<bdesc->bd_tupdesc->natts;keyno++)
14691613
{
14701614
FmgrInfo*unionFn;

‎src/backend/access/brin/brin_tuple.c

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -349,6 +349,9 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
349349
if (tuple->bt_placeholder)
350350
rettuple->bt_info |=BRIN_PLACEHOLDER_MASK;
351351

352+
if (tuple->bt_empty_range)
353+
rettuple->bt_info |=BRIN_EMPTY_RANGE_MASK;
354+
352355
*size=len;
353356
returnrettuple;
354357
}
@@ -376,7 +379,7 @@ brin_form_placeholder_tuple(BrinDesc *brdesc, BlockNumber blkno, Size *size)
376379
rettuple=palloc0(len);
377380
rettuple->bt_blkno=blkno;
378381
rettuple->bt_info=hoff;
379-
rettuple->bt_info |=BRIN_NULLS_MASK |BRIN_PLACEHOLDER_MASK;
382+
rettuple->bt_info |=BRIN_NULLS_MASK |BRIN_PLACEHOLDER_MASK |BRIN_EMPTY_RANGE_MASK;
380383

381384
bitP= ((bits8*) ((char*)rettuple+SizeOfBrinTuple))-1;
382385
bitmask=HIGHBIT;
@@ -466,6 +469,8 @@ brin_new_memtuple(BrinDesc *brdesc)
466469
dtup->bt_allnulls=palloc(sizeof(bool)*brdesc->bd_tupdesc->natts);
467470
dtup->bt_hasnulls=palloc(sizeof(bool)*brdesc->bd_tupdesc->natts);
468471

472+
dtup->bt_empty_range= true;
473+
469474
dtup->bt_context=AllocSetContextCreate(CurrentMemoryContext,
470475
"brin dtuple",
471476
ALLOCSET_DEFAULT_SIZES);
@@ -499,6 +504,8 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
499504
currdatum+=sizeof(Datum)*brdesc->bd_info[i]->oi_nstored;
500505
}
501506

507+
dtuple->bt_empty_range= true;
508+
502509
returndtuple;
503510
}
504511

@@ -532,6 +539,11 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
532539

533540
if (BrinTupleIsPlaceholder(tuple))
534541
dtup->bt_placeholder= true;
542+
543+
/* ranges start as empty, depends on the BrinTuple */
544+
if (!BrinTupleIsEmptyRange(tuple))
545+
dtup->bt_empty_range= false;
546+
535547
dtup->bt_blkno=tuple->bt_blkno;
536548

537549
values=dtup->bt_values;

‎src/include/access/brin_tuple.h

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@ typedef struct BrinValues
3636
typedefstructBrinMemTuple
3737
{
3838
boolbt_placeholder;/* this is a placeholder tuple */
39+
boolbt_empty_range;/* range represents no tuples */
3940
BlockNumberbt_blkno;/* heap blkno that the tuple is for */
4041
MemoryContextbt_context;/* memcxt holding the bt_columns values */
4142
/* output arrays for brin_deform_tuple: */
@@ -61,7 +62,7 @@ typedef struct BrinTuple
6162
*
6263
* 7th (high) bit: has nulls
6364
* 6th bit: is placeholder tuple
64-
* 5th bit:unused
65+
* 5th bit:range is empty
6566
* 4-0 bit: offset of data
6667
* ---------------
6768
*/
@@ -74,13 +75,14 @@ typedef struct BrinTuple
7475
* bt_info manipulation macros
7576
*/
7677
#defineBRIN_OFFSET_MASK0x1F
77-
/* bit0x20 is not used at present */
78+
#defineBRIN_EMPTY_RANGE_MASK0x20
7879
#defineBRIN_PLACEHOLDER_MASK0x40
7980
#defineBRIN_NULLS_MASK0x80
8081

8182
#defineBrinTupleDataOffset(tup)((Size) (((BrinTuple *) (tup))->bt_info & BRIN_OFFSET_MASK))
8283
#defineBrinTupleHasNulls(tup)(((((BrinTuple *) (tup))->bt_info & BRIN_NULLS_MASK)) != 0)
8384
#defineBrinTupleIsPlaceholder(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_PLACEHOLDER_MASK)) != 0)
85+
#defineBrinTupleIsEmptyRange(tup) (((((BrinTuple *) (tup))->bt_info & BRIN_EMPTY_RANGE_MASK)) != 0)
8486

8587

8688
externBrinTuple*brin_form_tuple(BrinDesc*brdesc,BlockNumberblkno,

‎src/test/modules/brin/expected/summarization-and-inprogress-insertion.out

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
44
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
55
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
66
----------+------+------+--------+--------+-----------+--------
7-
1| 0| 1|f |f |f |{1 .. 1}
7+
1| 0| 1|f |t |f |{1 .. 1}
88
(1 row)
99

1010
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -26,7 +26,7 @@ step s2c: COMMIT;
2626
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
2727
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
2828
----------+------+------+--------+--------+-----------+-----------
29-
1| 0| 1|f |f |f |{1 .. 1}
29+
1| 0| 1|f |t |f |{1 .. 1}
3030
2| 1| 1|f |f |f |{1 .. 1000}
3131
(2 rows)
3232

@@ -35,7 +35,7 @@ starting permutation: s2check s1b s1i s2vacuum s1c s2check
3535
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
3636
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
3737
----------+------+------+--------+--------+-----------+--------
38-
1| 0| 1|f |f |f |{1 .. 1}
38+
1| 0| 1|f |t |f |{1 .. 1}
3939
(1 row)
4040

4141
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
@@ -45,7 +45,7 @@ step s1c: COMMIT;
4545
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
4646
itemoffset|blknum|attnum|allnulls|hasnulls|placeholder|value
4747
----------+------+------+--------+--------+-----------+-----------
48-
1| 0| 1|f |f |f |{1 .. 1}
48+
1| 0| 1|f |t |f |{1 .. 1}
4949
2| 1| 1|f |f |f |{1 .. 1000}
5050
(2 rows)
5151

‎src/test/modules/brin/specs/summarization-and-inprogress-insertion.spec

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ setup
99
)WITH(fillfactor=10);
1010
CREATEINDEXbrinidxONbrin_isoUSINGbrin(value)WITH(pages_per_range=1);
1111
--thisfillsthefirstpage
12+
INSERTINTObrin_isoVALUES(NULL);
1213
DO $$
1314
DECLAREcurtidtid;
1415
BEGIN

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp