Commit a24bffc

Avoid parallel nbtree index scan hangs with SAOPs.

Commit 5bf748b, which enhanced nbtree ScalarArrayOp execution, made parallel index scans work with the new design for arrays via explicit scheduling of primitive index scans. A backend that successfully scheduled the scan's next primitive index scan saved its backend local array keys in shared memory. Any backend could pick up the scheduled primitive scan within _bt_first. This scheme decouples scheduling a primitive scan from starting the scan (by performing another descent of the index via a _bt_search call from _bt_first) to make things robust.

The scheme had a deadlock hazard, at least when the leader process participated in the scan. _bt_parallel_seize had a code path that made backends that were not in an immediate position to start a scheduled primitive index scan wait for some other backend to do so instead. Under the right circumstances, the leader process could wait here forever: the leader would wait for any other backend to start the primitive scan, while every worker was busy waiting on the leader to consume tuples from the scan's tuple queue.

To fix, don't wait for a scheduled primitive index scan to be started by some other eligible backend from within _bt_parallel_seize (when the calling backend isn't in a position to do so itself). Return false instead, while recording that the scan has a scheduled primitive index scan in backend local state. This leaves the backend in the same state as the existing case where a backend schedules (or tries to schedule) another primitive index scan from within _bt_advance_array_keys, before calling _bt_parallel_seize. _bt_parallel_seize already handles that case by returning false without waiting, and without unsetting the backend local state. Leaving the backend in this state enables it to start a previously scheduled primitive index scan once it gets back to _bt_first.

Oversight in commit 5bf748b, which enhanced nbtree ScalarArrayOp execution.

Matthias van de Meent, with tweaks by me.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reported-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmMGaPa32u9x_FvEbPTUkP5e95i=QxR8054nvCRydP-sw@mail.gmail.com
Backpatch: 17-, where nbtree SAOP execution was enhanced.
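To make the protocol change concrete, here is a minimal single-threaded C model of the decision _bt_parallel_seize makes when it finds that a primitive index scan has already been scheduled. It is a sketch under stated assumptions, not PostgreSQL code: the names `PageStatus`, `SharedScan`, `LocalScan`, `SeizeResult`, `seize_model`, and the `pre_fix` switch are hypothetical, and the real function's mutex, condition variable, array-key restore, and block numbers are left out. `WOULD_WAIT` marks the point where the real code sleeps until another backend changes the shared state; the hang described above is the leader reaching that point for a scheduled primitive scan that no worker is free to start.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins loosely mirroring the BTPARALLEL_* states. */
typedef enum
{
    PS_IDLE,            /* a page is available to be seized */
    PS_NEED_PRIMSCAN,   /* a primitive index scan has been scheduled */
    PS_ADVANCING,       /* some backend currently holds the scan */
    PS_DONE             /* no pages remain */
} PageStatus;

/* Outcome of one seize attempt in this model. */
typedef enum
{
    SEIZED,             /* real function returns true */
    NOT_SEIZED,         /* real function returns false without blocking */
    WOULD_WAIT          /* real function would sleep on its condition variable */
} SeizeResult;

typedef struct
{
    PageStatus status;  /* shared by all participants */
} SharedScan;

typedef struct
{
    bool needPrimScan;  /* backend-local: a primitive scan is pending */
    int  numArrayKeys;  /* primitive scans only exist with array keys */
} LocalScan;

/*
 * first=true means the caller is _bt_first, the only place that can start a
 * primitive index scan.  pre_fix=true selects the behavior before this commit.
 */
static SeizeResult
seize_model(SharedScan *shared, LocalScan *local, bool first, bool pre_fix)
{
    if (first)
        local->needPrimScan = false;    /* _bt_first initializes local state */
    else if (local->needPrimScan)
        return NOT_SEIZED;              /* must get back to _bt_first first */

    switch (shared->status)
    {
        case PS_DONE:
            return NOT_SEIZED;          /* no pages remain */
        case PS_NEED_PRIMSCAN:
            assert(local->numArrayKeys > 0);
            if (first)
            {
                /* caller can start the scheduled primitive scan right away */
                shared->status = PS_ADVANCING;
                local->needPrimScan = true;
                return SEIZED;
            }
            if (pre_fix)
                return WOULD_WAIT;      /* the hazard: wait for another backend */
            local->needPrimScan = true; /* remember it, retry from _bt_first */
            return NOT_SEIZED;
        case PS_ADVANCING:
            return WOULD_WAIT;          /* brief: until the holder releases it */
        case PS_IDLE:
        default:
            shared->status = PS_ADVANCING;
            return SEIZED;
    }
}
```

Calling `seize_model` with `first=false` on a scan whose shared status is `PS_NEED_PRIMSCAN` yields `WOULD_WAIT` with `pre_fix=true`, but `NOT_SEIZED` with `needPrimScan` recorded once the fix is in place; a later call with `first=true` (the `_bt_first` path) then seizes the scan and moves it to `PS_ADVANCING`, so the backend never depends on another process to make progress.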
1 parent 054a23b commit a24bffc

1 file changed (+33 -20 lines)

src/backend/access/nbtree/nbtree.c

Lines changed: 33 additions & 20 deletions
@@ -585,7 +585,10 @@ btparallelrescan(IndexScanDesc scan)
  *		or _bt_parallel_done().
  *
  * The return value is true if we successfully seized the scan and false
- * if we did not. The latter case occurs if no pages remain.
+ * if we did not. The latter case occurs when no pages remain, or when
+ * another primitive index scan is scheduled that caller's backend cannot
+ * start just yet (only backends that call from _bt_first are capable of
+ * starting primitive index scans, which they indicate by passing first=true).
  *
  * If the return value is true, *pageno returns the next or current page
  * of the scan (depending on the scan direction). An invalid block number
@@ -596,10 +599,6 @@ btparallelrescan(IndexScanDesc scan)
  * scan will return false.
  *
  * Callers should ignore the value of pageno if the return value is false.
- *
- * Callers that are in a position to start a new primitive index scan must
- * pass first=true (all other callers pass first=false). We just return false
- * for first=false callers that require another primitive index scan.
  */
 bool
 _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno, bool first)
@@ -616,22 +615,16 @@ _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno, bool first)
 	{
 		/*
 		 * Initialize array related state when called from _bt_first, assuming
-		 * that this will either be the first primitive index scan for the
-		 * scan, or a previous explicitly scheduled primitive scan.
-		 *
-		 * Note: so->needPrimScan is only set when a scheduled primitive index
-		 * scan is set to be performed in caller's worker process. It should
-		 * not be set here by us for the first primitive scan, nor should we
-		 * ever set it for a parallel scan that has no array keys.
+		 * that this will be the first primitive index scan for the scan
 		 */
 		so->needPrimScan = false;
 		so->scanBehind = false;
 	}
 	else
 	{
 		/*
-		 * Don't attempt to seize the scan when backend requires another
-		 * primitive index scan unless we're in a position to start it now
+		 * Don't attempt to seize the scan when it requires another primitive
+		 * index scan, since caller's backend cannot start it right now
 		 */
 		if (so->needPrimScan)
 			return false;
@@ -653,12 +646,9 @@ _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno, bool first)
 		{
 			Assert(so->numArrayKeys);
 
-			/*
-			 * If we can start another primitive scan right away, do so.
-			 * Otherwise just wait.
-			 */
 			if (first)
 			{
+				/* Can start scheduled primitive scan right away, so do so */
 				btscan->btps_pageStatus = BTPARALLEL_ADVANCING;
 				for (int i = 0; i < so->numArrayKeys; i++)
 				{
@@ -668,11 +658,25 @@ _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno, bool first)
 					array->cur_elem = btscan->btps_arrElems[i];
 					skey->sk_argument = array->elem_values[array->cur_elem];
 				}
-				so->needPrimScan = true;
-				so->scanBehind = false;
 				*pageno = InvalidBlockNumber;
 				exit_loop = true;
 			}
+			else
+			{
+				/*
+				 * Don't attempt to seize the scan when it requires another
+				 * primitive index scan, since caller's backend cannot start
+				 * it right now
+				 */
+				status = false;
+			}
+
+			/*
+			 * Either way, update backend local state to indicate that a
+			 * pending primitive scan is required
+			 */
+			so->needPrimScan = true;
+			so->scanBehind = false;
 		}
 		else if (btscan->btps_pageStatus != BTPARALLEL_ADVANCING)
 		{
@@ -731,6 +735,7 @@ _bt_parallel_release(IndexScanDesc scan, BlockNumber scan_page)
 void
 _bt_parallel_done(IndexScanDesc scan)
 {
+	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	ParallelIndexScanDesc parallel_scan = scan->parallel_scan;
 	BTParallelScanDesc btscan;
 	bool		status_changed = false;
@@ -739,6 +744,13 @@ _bt_parallel_done(IndexScanDesc scan)
 	if (parallel_scan == NULL)
 		return;
 
+	/*
+	 * Should not mark parallel scan done when there's still a pending
+	 * primitive index scan
+	 */
+	if (so->needPrimScan)
+		return;
+
 	btscan = (BTParallelScanDesc) OffsetToPointer((void *) parallel_scan,
 												  parallel_scan->ps_offset);
 
@@ -747,6 +759,7 @@ _bt_parallel_done(IndexScanDesc scan)
 	 * already
 	 */
 	SpinLockAcquire(&btscan->btps_mutex);
+	Assert(btscan->btps_pageStatus != BTPARALLEL_NEED_PRIMSCAN);
 	if (btscan->btps_pageStatus != BTPARALLEL_DONE)
 	{
 		btscan->btps_pageStatus = BTPARALLEL_DONE;
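The early return added to _bt_parallel_done can be modeled the same way. This is again a hypothetical sketch: `done_model` and its types are illustrative and redefined so the block stands alone. It keeps the diff's new guard on the backend-local `needPrimScan` flag and the new assertion, and its boolean result stands in for the real function's `status_changed` flag; the spinlock and the condition-variable wakeup are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins loosely mirroring the BTPARALLEL_* states. */
typedef enum { PS_IDLE, PS_NEED_PRIMSCAN, PS_ADVANCING, PS_DONE } PageStatus;

typedef struct { PageStatus status; } SharedScan;   /* shared by all participants */
typedef struct { bool needPrimScan; } LocalScan;    /* backend-local state */

/*
 * Model of the patched _bt_parallel_done().  Returns true when this call
 * changed the shared status, standing in for the real status_changed flag.
 */
static bool
done_model(SharedScan *shared, const LocalScan *local)
{
    /* New guard: a pending primitive index scan means the scan isn't done */
    if (local->needPrimScan)
        return false;

    /* The real code takes btps_mutex here; locking is not modeled */
    assert(shared->status != PS_NEED_PRIMSCAN);  /* mirrors the new Assert */
    if (shared->status != PS_DONE)
    {
        shared->status = PS_DONE;
        return true;
    }
    return false;
}
```

With `needPrimScan` set, `done_model` leaves a `PS_NEED_PRIMSCAN` status untouched, so the scheduled primitive scan is not discarded; otherwise it moves the shared status to `PS_DONE` once and reports no change on later calls.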
