Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitd2086b0

Browse files
committed
Reduce path length for locking leaf B-tree pages during insertion
In our B-tree implementation appropriate leaf page for new tupleinsertion is acquired using _bt_search() function. This function alwaysreturns leaf page locked in shared mode. In order to obtain exclusivelock, caller have to relock the page.This commit makes _bt_search() function lock leaf page immediately inexclusive mode when needed. That removes unnecessary relock and, inturn reduces lock contention for B-tree leaf pages. Our experimentson multi-core systems showed acceleration up to 4.5 times in cornercase.Discussion:https://postgr.es/m/CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt%2BbSeaJworwFoig%40mail.gmail.comAuthor: Alexander KorotkovReviewed-by: Yoshikazu Imai, Simon Riggs, Peter Geoghegan
1 parent8a9b72c commitd2086b0

File tree

2 files changed

+41
-22
lines changed

2 files changed

+41
-22
lines changed

‎src/backend/access/nbtree/nbtinsert.c

Lines changed: 4 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -216,23 +216,12 @@ _bt_doinsert(Relation rel, IndexTuple itup,
216216

217217
if (!fastpath)
218218
{
219-
/* find the first page containing this key */
220-
stack=_bt_search(rel,indnkeyatts,itup_scankey, false,&buf,BT_WRITE,
221-
NULL);
222-
223-
/* trade in our read lock for a write lock */
224-
LockBuffer(buf,BUFFER_LOCK_UNLOCK);
225-
LockBuffer(buf,BT_WRITE);
226-
227219
/*
228-
* If the page was split between the time that we surrendered our read
229-
* lock and acquired our write lock, then this page may no longer be
230-
* the right place for the key we want to insert. In this case, we
231-
* need to move right in the tree. See Lehman and Yao for an
232-
* excruciatingly precise description.
220+
* Find the first page containing this key. Buffer returned by
221+
* _bt_search() is locked in exclusive mode.
233222
*/
234-
buf=_bt_moveright(rel,buf,indnkeyatts,itup_scankey, false,
235-
true,stack,BT_WRITE,NULL);
223+
stack=_bt_search(rel,indnkeyatts,itup_scankey, false,&buf,BT_WRITE,
224+
NULL);
236225
}
237226

238227
/*

‎src/backend/access/nbtree/nbtsearch.c

Lines changed: 37 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -87,17 +87,18 @@ _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
8787
* place during the descent through the tree. This is not needed when
8888
* positioning for an insert or delete, so NULL is used for those cases.
8989
*
90-
*NOTE that thereturned buffer isread-lockedregardless of theaccess
91-
*parameter. However,access = BT_WRITE will allow an empty root page
92-
*to be created and returned.When access = BT_READ, an empty index
93-
*will result in *bufP being set toInvalidBuffer. Also, in BT_WRITE mode,
94-
*any incomplete splits encounteredduring the search will be finished.
90+
*Thereturned buffer is lockedaccording toaccess parameter. Additionally,
91+
* access = BT_WRITE will allow an empty root page to be created and returned.
92+
* When access = BT_READ, an empty index will result in *bufP being set to
93+
* InvalidBuffer. Also, in BT_WRITE mode, any incomplete splits encountered
94+
* during the search will be finished.
9595
*/
9696
BTStack
9797
_bt_search(Relationrel,intkeysz,ScanKeyscankey,boolnextkey,
9898
Buffer*bufP,intaccess,Snapshotsnapshot)
9999
{
100100
BTStackstack_in=NULL;
101+
intpage_access=BT_READ;
101102

102103
/* Get the root page to start with */
103104
*bufP=_bt_getroot(rel,access);
@@ -132,7 +133,7 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
132133
*/
133134
*bufP=_bt_moveright(rel,*bufP,keysz,scankey,nextkey,
134135
(access==BT_WRITE),stack_in,
135-
BT_READ,snapshot);
136+
page_access,snapshot);
136137

137138
/* if this is a leaf page, we're done */
138139
page=BufferGetPage(*bufP);
@@ -166,13 +167,42 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
166167
new_stack->bts_btentry=blkno;
167168
new_stack->bts_parent=stack_in;
168169

170+
/*
171+
* Page level 1 is lowest non-leaf page level prior to leaves. So,
172+
* if we're on the level 1 and asked to lock leaf page in write mode,
173+
* then lock next page in write mode, because it must be a leaf.
174+
*/
175+
if (opaque->btpo.level==1&&access==BT_WRITE)
176+
page_access=BT_WRITE;
177+
169178
/* drop the read lock on the parent page, acquire one on the child */
170-
*bufP=_bt_relandgetbuf(rel,*bufP,blkno,BT_READ);
179+
*bufP=_bt_relandgetbuf(rel,*bufP,blkno,page_access);
171180

172181
/* okay, all set to move down a level */
173182
stack_in=new_stack;
174183
}
175184

185+
/*
186+
* If we're asked to lock leaf in write mode, but didn't manage to, then
187+
* relock. That may happend when root page appears to be leaf.
188+
*/
189+
if (access==BT_WRITE&&page_access==BT_READ)
190+
{
191+
/* trade in our read lock for a write lock */
192+
LockBuffer(*bufP,BUFFER_LOCK_UNLOCK);
193+
LockBuffer(*bufP,BT_WRITE);
194+
195+
/*
196+
* If the page was split between the time that we surrendered our read
197+
* lock and acquired our write lock, then this page may no longer be
198+
* the right place for the key we want to insert. In this case, we
199+
* need to move right in the tree. See Lehman and Yao for an
200+
* excruciatingly precise description.
201+
*/
202+
*bufP=_bt_moveright(rel,*bufP,keysz,scankey,nextkey,
203+
true,stack_in,BT_WRITE,snapshot);
204+
}
205+
176206
returnstack_in;
177207
}
178208

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp