Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit1bd4bc8

Browse files
Optimize nbtree backwards scans.
Make nbtree backwards scans optimistically access the next page to beread to the left by following a prevPage block number that's now stashedin currPos when the leaf page is first read. This approach matches theone taken during forward scans, which follow a symmetric nextPage blocknumber from currPos. We stash both a prevPage and a nextPage, since thescan direction might change (when fetching from a scrollable cursor).Backwards scans will no longer need to lock the same page twice, exceptin rare cases where the scan detects a concurrent page split (or pagedeletion). Testing has shown this optimization to be particularlyeffective during parallel index-only backwards scans: ~12% reductions inquery execution time are quite possible.We're much better off being optimistic; concurrent left sibling pagesplits are rare in general. It's possible that we'll need to lock morepages than the pessimistic approach would have, but only when there are_multiple_ concurrent splits of the left sibling page we now start at.If there's just a single concurrent left sibling page split, the newapproach to scanning backwards will at least break even relative to theold one (we'll acquire the same number of leaf page locks as before).The optimization from this commit has long been contemplated by commentsadded by commit2ed5b87, which changed the rules for locking/pinningduring nbtree index scans. The approach that that commit introduced toleaf level link traversal when scanning forwards is now more or lessapplied all the time, regardless of the direction we're scanning in.Following uniform conventions around sibling link traversal is simpler.The only real remaining difference between our forward and backwardshandling is that our backwards handling must still detect and recoverfrom any concurrent left sibling splits (and concurrent page deletions),as documented in the nbtree README. That is structured as a single,isolated extra step that takes place in _bt_readnextpage.Also use this opportunity to further simplify the functions that dealwith reading pages and traversing sibling links on the leaf level, andto document their preconditions and postconditions (with respect tothings like buffer locks, buffer pins, and seizing the parallel scan).This enhancement completely supersedes the one recently added by commit3f44959.Author: Matthias van de Meent <boekewurm+postgres@gmail.com>Author: Peter Geoghegan <pg@bowt.ie>Discussion:https://postgr.es/m/CAEze2WgpBGRgTTxTWVPXc9+PB6fc1a7t+VyGXHzfnrFXcQVxnA@mail.gmail.comDiscussion:https://postgr.es/m/CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com
1 parent9e2d813 commit1bd4bc8

File tree

4 files changed

+388
-466
lines changed

4 files changed

+388
-466
lines changed

‎src/backend/access/nbtree/nbtree.c

Lines changed: 49 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,9 @@ typedef enum
6666
*/
6767
typedefstructBTParallelScanDescData
6868
{
69-
BlockNumberbtps_scanPage;/* latest or next page to be scanned */
69+
BlockNumberbtps_nextScanPage;/* next page to be scanned */
70+
BlockNumberbtps_lastCurrPage;/* page whose sibling link was copied into
71+
* btps_nextScanPage */
7072
BTPS_Statebtps_pageStatus;/* indicates whether next page is
7173
* available for scan. see above for
7274
* possible states of parallel scan. */
@@ -550,7 +552,8 @@ btinitparallelscan(void *target)
550552
BTParallelScanDescbt_target= (BTParallelScanDesc)target;
551553

552554
SpinLockInit(&bt_target->btps_mutex);
553-
bt_target->btps_scanPage=InvalidBlockNumber;
555+
bt_target->btps_nextScanPage=InvalidBlockNumber;
556+
bt_target->btps_lastCurrPage=InvalidBlockNumber;
554557
bt_target->btps_pageStatus=BTPARALLEL_NOT_INITIALIZED;
555558
ConditionVariableInit(&bt_target->btps_cv);
556559
}
@@ -575,7 +578,8 @@ btparallelrescan(IndexScanDesc scan)
575578
* consistency.
576579
*/
577580
SpinLockAcquire(&btscan->btps_mutex);
578-
btscan->btps_scanPage=InvalidBlockNumber;
581+
btscan->btps_nextScanPage=InvalidBlockNumber;
582+
btscan->btps_lastCurrPage=InvalidBlockNumber;
579583
btscan->btps_pageStatus=BTPARALLEL_NOT_INITIALIZED;
580584
SpinLockRelease(&btscan->btps_mutex);
581585
}
@@ -591,26 +595,38 @@ btparallelrescan(IndexScanDesc scan)
591595
* start just yet (only backends that call from _bt_first are capable of
592596
* starting primitive index scans, which they indicate by passing first=true).
593597
*
594-
* If the return value is true, *pageno returns the nextor current page
595-
*of thescan (depending onthescan direction). An invalid block number
596-
* means the scan hasn't yet started, or that caller needs to start the next
597-
* primitive index scan (if it's the latter case we'll set so.needPrimScan).
598-
* The first time a participating process reaches the last page, it will return
599-
*true and set *pageno to P_NONE; after that, further attempts to seize the
600-
* scan will return false.
598+
* If the return value is true, *next_scan_page returns the nextpage of the
599+
* scan, and *last_curr_page returnsthepage that *next_scan_page came from.
600+
*An invalid *next_scan_pagemeans the scan hasn't yet started, or that
601+
*caller needs to start the nextprimitive index scan (if it's the latter
602+
*case we'll set so.needPrimScan).The first time a participating process
603+
*reaches the last page, it will return true and set *next_scan_page to
604+
*P_NONE; after that, further attempts to seize thescan will return false.
601605
*
602-
* Callers should ignore the value of pageno if the return value is false.
606+
* Callers should ignore the value of *next_scan_page and *last_curr_page if
607+
* the return value is false.
603608
*/
604609
bool
605-
_bt_parallel_seize(IndexScanDescscan,BlockNumber*pageno,boolfirst)
610+
_bt_parallel_seize(IndexScanDescscan,BlockNumber*next_scan_page,
611+
BlockNumber*last_curr_page,boolfirst)
606612
{
607613
BTScanOpaqueso= (BTScanOpaque)scan->opaque;
608614
boolexit_loop= false;
609615
boolstatus= true;
610616
ParallelIndexScanDescparallel_scan=scan->parallel_scan;
611617
BTParallelScanDescbtscan;
612618

613-
*pageno=P_NONE;
619+
*next_scan_page=P_NONE;
620+
*last_curr_page=InvalidBlockNumber;
621+
622+
/*
623+
* Reset so->currPos, and initialize moreLeft/moreRight such that the next
624+
* call to _bt_readnextpage treats this backend similarly to a serial
625+
* backend that steps from *last_curr_page to *next_scan_page (unless this
626+
* backend's so->currPos is initialized by _bt_readfirstpage before then).
627+
*/
628+
BTScanPosInvalidate(so->currPos);
629+
so->currPos.moreLeft=so->currPos.moreRight= true;
614630

615631
if (first)
616632
{
@@ -660,7 +676,7 @@ _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno, bool first)
660676
array->cur_elem=btscan->btps_arrElems[i];
661677
skey->sk_argument=array->elem_values[array->cur_elem];
662678
}
663-
*pageno=InvalidBlockNumber;
679+
*next_scan_page=InvalidBlockNumber;
664680
exit_loop= true;
665681
}
666682
else
@@ -688,7 +704,8 @@ _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno, bool first)
688704
* of advancing it to a new page!
689705
*/
690706
btscan->btps_pageStatus=BTPARALLEL_ADVANCING;
691-
*pageno=btscan->btps_scanPage;
707+
*next_scan_page=btscan->btps_nextScanPage;
708+
*last_curr_page=btscan->btps_lastCurrPage;
692709
exit_loop= true;
693710
}
694711
SpinLockRelease(&btscan->btps_mutex);
@@ -703,17 +720,21 @@ _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno, bool first)
703720

704721
/*
705722
* _bt_parallel_release() -- Complete the process of advancing the scan to a
706-
*new page. We now have the new valuebtps_scanPage; some other backend
723+
*new page. We now have the new valuebtps_nextScanPage; another backend
707724
*can now begin advancing the scan.
708725
*
709-
* Callers whose scan uses array keys must save theirscan_page argument so
726+
* Callers whose scan uses array keys must save theircurr_page argument so
710727
* that it can be passed to _bt_parallel_primscan_schedule, should caller
711-
* determine that another primitive index scan is required. If that happens,
712-
* scan_page won't be scanned by any backend (unless the next primitive index
713-
* scan lands on scan_page).
728+
* determine that another primitive index scan is required.
729+
*
730+
* Note: unlike the serial case, parallel scans don't need to remember both
731+
* sibling links. next_scan_page is whichever link is next given the scan's
732+
* direction. That's all we'll ever need, since the direction of a parallel
733+
* scan can never change.
714734
*/
715735
void
716-
_bt_parallel_release(IndexScanDescscan,BlockNumberscan_page)
736+
_bt_parallel_release(IndexScanDescscan,BlockNumbernext_scan_page,
737+
BlockNumbercurr_page)
717738
{
718739
ParallelIndexScanDescparallel_scan=scan->parallel_scan;
719740
BTParallelScanDescbtscan;
@@ -722,7 +743,8 @@ _bt_parallel_release(IndexScanDesc scan, BlockNumber scan_page)
722743
parallel_scan->ps_offset);
723744

724745
SpinLockAcquire(&btscan->btps_mutex);
725-
btscan->btps_scanPage=scan_page;
746+
btscan->btps_nextScanPage=next_scan_page;
747+
btscan->btps_lastCurrPage=curr_page;
726748
btscan->btps_pageStatus=BTPARALLEL_IDLE;
727749
SpinLockRelease(&btscan->btps_mutex);
728750
ConditionVariableSignal(&btscan->btps_cv);
@@ -778,13 +800,13 @@ _bt_parallel_done(IndexScanDesc scan)
778800
/*
779801
* _bt_parallel_primscan_schedule() -- Schedule another primitive index scan.
780802
*
781-
* Caller passes theblock number most recently passed to _bt_parallel_release
803+
* Caller passes thecurr_page most recently passed to _bt_parallel_release
782804
* by its backend. Caller successfully schedules the next primitive index scan
783805
* if the shared parallel state hasn't been seized since caller's backend last
784806
* advanced the scan.
785807
*/
786808
void
787-
_bt_parallel_primscan_schedule(IndexScanDescscan,BlockNumberprev_scan_page)
809+
_bt_parallel_primscan_schedule(IndexScanDescscan,BlockNumbercurr_page)
788810
{
789811
BTScanOpaqueso= (BTScanOpaque)scan->opaque;
790812
ParallelIndexScanDescparallel_scan=scan->parallel_scan;
@@ -796,10 +818,11 @@ _bt_parallel_primscan_schedule(IndexScanDesc scan, BlockNumber prev_scan_page)
796818
parallel_scan->ps_offset);
797819

798820
SpinLockAcquire(&btscan->btps_mutex);
799-
if (btscan->btps_scanPage==prev_scan_page&&
821+
if (btscan->btps_lastCurrPage==curr_page&&
800822
btscan->btps_pageStatus==BTPARALLEL_IDLE)
801823
{
802-
btscan->btps_scanPage=InvalidBlockNumber;
824+
btscan->btps_nextScanPage=InvalidBlockNumber;
825+
btscan->btps_lastCurrPage=InvalidBlockNumber;
803826
btscan->btps_pageStatus=BTPARALLEL_NEED_PRIMSCAN;
804827

805828
/* Serialize scan's current array keys */

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp