Commit 5d7962c

Change locking regimen around buffer replacement.
Previously, we used an lwlock that was held from the time we began
seeking a candidate buffer until the time when we found and pinned
one, which is disastrous for concurrency.  Instead, use a spinlock
which is held just long enough to pop the freelist or advance the
clock sweep hand, and then released.  If we need to advance the clock
sweep further, we reacquire the spinlock once per buffer.

This represents a significant increase in atomic operations around
buffer eviction, but it still wins on many workloads.  On others, it
may result in no gain, or even cause a regression, unless the number
of buffer mapping locks is also increased.  However, that seems like
material for a separate commit.  We may also need to consider other
methods of mitigating contention on this spinlock, such as splitting
it into multiple locks or jumping the clock sweep hand more than one
buffer at a time, but those, too, seem like separate improvements.

Patch by me, inspired by a much larger patch from Amit Kapila.
Reviewed by Andres Freund.
1 parent 1dcfb8d; commit 5d7962c

File tree

5 files changed (+61 -60 lines)


src/backend/storage/buffer/README

Lines changed: 19 additions & 20 deletions
@@ -125,14 +125,12 @@ bits of the tag's hash value. The rules stated above apply to each partition
 independently.  If it is necessary to lock more than one partition at a time,
 they must be locked in partition-number order to avoid risk of deadlock.
 
-* A separate system-wide LWLock, the BufFreelistLock, provides mutual
+* A separate system-wide spinlock, buffer_strategy_lock, provides mutual
 exclusion for operations that access the buffer free list or select
-buffers for replacement.  This is always taken in exclusive mode since
-there are no read-only operations on those data structures.  The buffer
-management policy is designed so that BufFreelistLock need not be taken
-except in paths that will require I/O, and thus will be slow anyway.
-(Details appear below.)  It is never necessary to hold the BufMappingLock
-and the BufFreelistLock at the same time.
+buffers for replacement.  A spinlock is used here rather than a lightweight
+lock for efficiency; no other locks of any sort should be acquired while
+buffer_strategy_lock is held.  This is essential to allow buffer replacement
+to happen in multiple backends with reasonable concurrency.
 
 * Each buffer header contains a spinlock that must be taken when examining
 or changing fields of that buffer header.  This allows operations such as
@@ -165,7 +163,7 @@ consider their pages unlikely to be needed soon; however, the current
 algorithm never does that.  The list is singly-linked using fields in the
 buffer headers; we maintain head and tail pointers in global variables.
 (Note: although the list links are in the buffer headers, they are
-considered to be protected by the BufFreelistLock, not the buffer-header
+considered to be protected by the buffer_strategy_lock, not the buffer-header
 spinlocks.)  To choose a victim buffer to recycle when there are no free
 buffers available, we use a simple clock-sweep algorithm, which avoids the
 need to take system-wide locks during common operations.  It works like
@@ -178,25 +176,26 @@ buffer reference count, so it's nearly free.)
 
 The "clock hand" is a buffer index, nextVictimBuffer, that moves circularly
 through all the available buffers.  nextVictimBuffer is protected by the
-BufFreelistLock.
+buffer_strategy_lock.
 
 The algorithm for a process that needs to obtain a victim buffer is:
 
-1. Obtain BufFreelistLock.
+1. Obtain buffer_strategy_lock.
 
-2. If buffer free list is nonempty, remove its head buffer.  If the buffer
-is pinned or has a nonzero usage count, it cannot be used; ignore it and
-return to the start of step 2.  Otherwise, pin the buffer, release
-BufFreelistLock, and return the buffer.
+2. If buffer free list is nonempty, remove its head buffer.  Release
+buffer_strategy_lock.  If the buffer is pinned or has a nonzero usage count,
+it cannot be used; ignore it and go back to step 1.  Otherwise, pin the
+buffer, and return it.
 
-3. Otherwise, select the buffer pointed to by nextVictimBuffer, and
-circularly advance nextVictimBuffer for next time.
+3. Otherwise, the buffer free list is empty.  Select the buffer pointed to by
+nextVictimBuffer, and circularly advance nextVictimBuffer for next time.
+Release buffer_strategy_lock.
 
 4. If the selected buffer is pinned or has a nonzero usage count, it cannot
-be used.  Decrement its usage count (if nonzero) and return to step 3 to
-examine the next buffer.
+be used.  Decrement its usage count (if nonzero), reacquire
+buffer_strategy_lock, and return to step 3 to examine the next buffer.
 
-5. Pin the selected buffer, release BufFreelistLock, and return the buffer.
+5. Pin the selected buffer, and return.
 
 (Note that if the selected buffer is dirty, we will have to write it out
 before we can recycle it; if someone else pins the buffer meanwhile we will
@@ -259,7 +258,7 @@ dirty and not pinned nor marked with a positive usage count.  It pins,
 writes, and releases any such buffer.
 
 If we can assume that reading nextVictimBuffer is an atomic action, then
-the writer doesn't even need to take the BufFreelistLock in order to look
+the writer doesn't even need to take buffer_strategy_lock in order to look
 for buffers to write; it needs only to spinlock each buffer header for long
 enough to check the dirtybit.  Even without that assumption, the writer
 only needs to take the lock long enough to read the variable value, not
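The victim-selection steps in the README hunk above can be sketched as a toy, single-threaded model: the strategy spinlock is held only while the clock hand advances, and the candidate buffer is examined after the lock is released. Everything here (`Buf`, `sweep_pick`, `NBUF`, the `atomic_flag` spinlock) is illustrative, not PostgreSQL's actual code.

```c
#include <stdatomic.h>

#define NBUF 4

typedef struct
{
	int		usage_count;	/* decremented by the sweep; victim when 0 */
	int		pinned;			/* pinned buffers are skipped entirely */
} Buf;

static Buf buffers[NBUF];			/* zero-initialized: all unpinned, count 0 */
static int next_victim = 0;			/* the "clock hand" (nextVictimBuffer) */
static atomic_flag strategy_lock = ATOMIC_FLAG_INIT;	/* buffer_strategy_lock stand-in */

static void
spin_acquire(void)
{
	while (atomic_flag_test_and_set(&strategy_lock))
		;						/* busy-wait until the lock is ours */
}

static void
spin_release(void)
{
	atomic_flag_clear(&strategy_lock);
}

/*
 * Advance the hand one buffer per lock acquisition, as the patch does:
 * the spinlock is held only while moving the hand, then released while
 * the candidate itself is inspected.
 */
static int
sweep_pick(void)
{
	for (;;)
	{
		spin_acquire();
		int		victim = next_victim;

		next_victim = (next_victim + 1) % NBUF;
		spin_release();

		/* Candidate examined without holding the strategy lock. */
		if (!buffers[victim].pinned)
		{
			if (buffers[victim].usage_count == 0)
				return victim;
			buffers[victim].usage_count--;	/* age it and keep sweeping */
		}
	}
}
```

In the real code the candidate's own header spinlock is taken before the pin and usage-count checks; that detail is omitted from this sketch.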

src/backend/storage/buffer/bufmgr.c

Lines changed: 2 additions & 10 deletions
@@ -889,15 +889,11 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	/* Loop here in case we have to try another victim buffer */
 	for (;;)
 	{
-		bool		lock_held;
-
 		/*
 		 * Select a victim buffer.  The buffer is returned with its header
-		 * spinlock still held!  Also (in most cases) the BufFreelistLock is
-		 * still held, since it would be bad to hold the spinlock while
-		 * possibly waking up other processes.
+		 * spinlock still held!
 		 */
-		buf = StrategyGetBuffer(strategy, &lock_held);
+		buf = StrategyGetBuffer(strategy);
 
 		Assert(buf->refcount == 0);
 
@@ -907,10 +903,6 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		/* Pin the buffer and then release the buffer spinlock */
 		PinBuffer_Locked(buf);
 
-		/* Now it's safe to release the freelist lock */
-		if (lock_held)
-			LWLockRelease(BufFreelistLock);
-
 		/*
 		 * If the buffer was dirty, try to write it out.  There is a race
 		 * condition here, in that someone might dirty it after we released it
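With `lock_held` gone, the caller's contract becomes simple: the strategy layer returns a buffer with its header spinlock held, and the caller pins it before releasing that lock. A minimal model of that hand-off, in which `ToyBuffer`, `get_victim`, and `pin_locked` are invented stand-ins (not the real `StrategyGetBuffer`/`PinBuffer_Locked`):

```c
#include <stdatomic.h>

typedef struct
{
	atomic_flag hdr_lock;	/* per-buffer header spinlock */
	int			refcount;	/* pin count; protected by hdr_lock */
} ToyBuffer;

static ToyBuffer victim = { ATOMIC_FLAG_INIT, 0 };

/* Stand-in for StrategyGetBuffer: returns with the header lock held. */
static ToyBuffer *
get_victim(void)
{
	while (atomic_flag_test_and_set(&victim.hdr_lock))
		;					/* spin until the header lock is ours */
	return &victim;
}

/* Stand-in for PinBuffer_Locked: pin, then drop the header lock. */
static int
pin_locked(ToyBuffer *buf)
{
	int			pins = ++buf->refcount;

	atomic_flag_clear(&buf->hdr_lock);	/* release the header spinlock */
	return pins;
}
```

Returning with the header lock held is what closes the window in which another backend could pin the buffer between selection and `pin_locked`.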

src/backend/storage/buffer/freelist.c

Lines changed: 37 additions & 26 deletions
@@ -24,6 +24,9 @@
  */
 typedef struct
 {
+	/* Spinlock: protects the values below */
+	slock_t		buffer_strategy_lock;
+
 	/* Clock sweep hand: index of next buffer to consider grabbing */
 	int			nextVictimBuffer;
 
@@ -101,37 +104,28 @@ static void AddBufferToRing(BufferAccessStrategy strategy,
  *	strategy is a BufferAccessStrategy object, or NULL for default strategy.
  *
  *	To ensure that no one else can pin the buffer before we do, we must
- *	return the buffer with the buffer header spinlock still held.  If
- *	*lock_held is set on exit, we have returned with the BufFreelistLock
- *	still held, as well; the caller must release that lock once the spinlock
- *	is dropped.  We do it that way because releasing the BufFreelistLock
- *	might awaken other processes, and it would be bad to do the associated
- *	kernel calls while holding the buffer header spinlock.
+ *	return the buffer with the buffer header spinlock still held.
  */
 volatile BufferDesc *
-StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
+StrategyGetBuffer(BufferAccessStrategy strategy)
 {
 	volatile BufferDesc *buf;
 	Latch	   *bgwriterLatch;
 	int			trycounter;
 
 	/*
 	 * If given a strategy object, see whether it can select a buffer.  We
-	 * assume strategy objects don't need the BufFreelistLock.
+	 * assume strategy objects don't need buffer_strategy_lock.
 	 */
 	if (strategy != NULL)
 	{
 		buf = GetBufferFromRing(strategy);
 		if (buf != NULL)
-		{
-			*lock_held = false;
 			return buf;
-		}
 	}
 
 	/* Nope, so lock the freelist */
-	*lock_held = true;
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
 
 	/*
 	 * We count buffer allocation requests so that the bgwriter can estimate
@@ -142,22 +136,22 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 
 	/*
 	 * If bgwriterLatch is set, we need to waken the bgwriter, but we should
-	 * not do so while holding BufFreelistLock; so release and re-grab.  This
-	 * is annoyingly tedious, but it happens at most once per bgwriter cycle,
-	 * so the performance hit is minimal.
+	 * not do so while holding buffer_strategy_lock; so release and re-grab.
+	 * This is annoyingly tedious, but it happens at most once per bgwriter
+	 * cycle, so the performance hit is minimal.
	 */
 	bgwriterLatch = StrategyControl->bgwriterLatch;
 	if (bgwriterLatch)
 	{
 		StrategyControl->bgwriterLatch = NULL;
-		LWLockRelease(BufFreelistLock);
+		SpinLockRelease(&StrategyControl->buffer_strategy_lock);
 		SetLatch(bgwriterLatch);
-		LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+		SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
 	}
 
 	/*
 	 * Try to get a buffer from the freelist.  Note that the freeNext fields
-	 * are considered to be protected by the BufFreelistLock not the
+	 * are considered to be protected by the buffer_strategy_lock not the
 	 * individual buffer spinlocks, so it's OK to manipulate them without
 	 * holding the spinlock.
	 */
@@ -170,6 +164,12 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		StrategyControl->firstFreeBuffer = buf->freeNext;
 		buf->freeNext = FREENEXT_NOT_IN_LIST;
 
+		/*
+		 * Release the lock so someone else can access the freelist (or run
+		 * the clocksweep) while we check out this buffer.
+		 */
+		SpinLockRelease(&StrategyControl->buffer_strategy_lock);
+
 		/*
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; discard it and retry.  (This can only happen if VACUUM put a
@@ -185,6 +185,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 			return buf;
 		}
 		UnlockBufHdr(buf);
+
+		/* Reacquire the lock and go around for another pass. */
+		SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
 	}
 
 	/* Nothing on the freelist, so run the "clock sweep" algorithm */
@@ -199,6 +202,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 			StrategyControl->completePasses++;
 		}
 
+		/* Release the lock before manipulating the candidate buffer. */
+		SpinLockRelease(&StrategyControl->buffer_strategy_lock);
+
 		/*
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; decrement the usage_count (unless pinned) and keep scanning.
@@ -232,6 +238,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 				elog(ERROR, "no unpinned buffers available");
 		}
 		UnlockBufHdr(buf);
+
+		/* Reacquire the lock and get a new candidate buffer. */
+		SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
 	}
 }
 
@@ -241,7 +250,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
 
 	/*
 	 * It is possible that we are told to put something in the freelist that
@@ -255,7 +264,7 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
 		StrategyControl->firstFreeBuffer = buf->buf_id;
 	}
 
-	LWLockRelease(BufFreelistLock);
+	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
 }
 
 /*
@@ -274,7 +283,7 @@ StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
 {
 	int			result;
 
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
 	result = StrategyControl->nextVictimBuffer;
 	if (complete_passes)
 		*complete_passes = StrategyControl->completePasses;
@@ -283,7 +292,7 @@ StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
 		*num_buf_alloc = StrategyControl->numBufferAllocs;
 		StrategyControl->numBufferAllocs = 0;
 	}
-	LWLockRelease(BufFreelistLock);
+	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
 	return result;
 }
 
@@ -299,13 +308,13 @@ void
 StrategyNotifyBgWriter(Latch *bgwriterLatch)
 {
 	/*
-	 * We acquire the BufFreelistLock just to ensure that the store appears
+	 * We acquire buffer_strategy_lock just to ensure that the store appears
 	 * atomic to StrategyGetBuffer.  The bgwriter should call this rather
 	 * infrequently, so there's no performance penalty from being safe.
	 */
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
 	StrategyControl->bgwriterLatch = bgwriterLatch;
-	LWLockRelease(BufFreelistLock);
+	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
 }
 
 
@@ -370,6 +379,8 @@ StrategyInitialize(bool init)
	 */
 	Assert(init);
 
+	SpinLockInit(&StrategyControl->buffer_strategy_lock);
+
 	/*
 	 * Grab the whole linked list of free buffers for our strategy.  We
 	 * assume it was previously set up by InitBufferPool().
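The StrategyFreeBuffer change keeps the same freelist push; only the lock around it is now the spinlock. A toy model of that push, using the same FREENEXT sentinel convention (array indices stand in for BufferDescs; `free_buffer`, `free_next`, and `NBUF` are invented for this sketch):

```c
#include <stdatomic.h>

/* Sentinel values mirroring PostgreSQL's freeNext convention. */
#define FREENEXT_NOT_IN_LIST	(-2)
#define FREENEXT_END_OF_LIST	(-1)
#define NBUF 3

/* Per-buffer freeNext links; protected by the strategy spinlock, not
 * by any per-buffer lock, just as the diff's comments say. */
static int free_next[NBUF] =
{
	FREENEXT_NOT_IN_LIST, FREENEXT_NOT_IN_LIST, FREENEXT_NOT_IN_LIST
};
static int first_free = FREENEXT_END_OF_LIST;
static atomic_flag strategy_lock = ATOMIC_FLAG_INIT;

/* Push buf_id onto the freelist head; returns the resulting head. */
static int
free_buffer(int buf_id)
{
	int			head;

	while (atomic_flag_test_and_set(&strategy_lock))
		;						/* SpinLockAcquire equivalent */

	/* Only push if not already on the list, as the real code checks. */
	if (free_next[buf_id] == FREENEXT_NOT_IN_LIST)
	{
		free_next[buf_id] = first_free;
		first_free = buf_id;
	}
	head = first_free;

	atomic_flag_clear(&strategy_lock);	/* SpinLockRelease equivalent */
	return head;
}
```

Pushing a buffer that is already on the list is a no-op, mirroring the "told to put something in the freelist that is already in it" check the diff preserves.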

src/include/storage/buf_internals.h

Lines changed: 2 additions & 3 deletions
@@ -115,7 +115,7 @@ typedef struct buftag
  * Note: buf_hdr_lock must be held to examine or change the tag, flags,
  * usage_count, refcount, or wait_backend_pid fields.  buf_id field never
  * changes after initialization, so does not need locking.  freeNext is
- * protected by the BufFreelistLock not buf_hdr_lock.  The LWLocks can take
+ * protected by the buffer_strategy_lock not buf_hdr_lock.  The LWLocks can take
  * care of themselves.  The buf_hdr_lock is *not* used to control access to
  * the data in the buffer!
  *
@@ -185,8 +185,7 @@ extern BufferDesc *LocalBufferDescriptors;
  */
 
 /* freelist.c */
-extern volatile BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
-				  bool *lock_held);
+extern volatile BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy);
 extern void StrategyFreeBuffer(volatile BufferDesc *buf);
 extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
 					 volatile BufferDesc *buf);

src/include/storage/lwlock.h

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ extern PGDLLIMPORT LWLockPadded *MainLWLockArray;
  * if you remove a lock, consider leaving a gap in the numbering sequence for
  * the benefit of DTrace and other external debugging scripts.
  */
-#define BufFreelistLock				(&MainLWLockArray[0].lock)
+/* 0 is available; was formerly BufFreelistLock */
 #define ShmemIndexLock				(&MainLWLockArray[1].lock)
 #define OidGenLock					(&MainLWLockArray[2].lock)
 #define XidGenLock					(&MainLWLockArray[3].lock)
