Commit 9cd00c4

Checkpoint sorting and balancing.

Up to now checkpoints were written in the order they're in the
BufferDescriptors. That's nearly random in a lot of cases, which
performs badly on rotating media, but even on SSDs it causes slowdowns.

To avoid that, sort checkpoints before writing them out. We currently
sort by tablespace, relfilenode, fork and block number.

One of the major reasons that previously wasn't done, was fear of
imbalance between tablespaces. To address that balance writes between
tablespaces.

The other prime concern was that the relatively large allocation to sort
the buffers in might fail, preventing checkpoints from happening. Thus
pre-allocate the required memory in shared memory, at server startup.

This particularly makes it more efficient to have checkpoint flushing
enabled, because that'll often result in a lot of writes that can be
coalesced into one flush.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund

1 parent 428b1d6, commit 9cd00c4
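
The sort order and the tablespace balancing described in the commit message can be illustrated with a small standalone sketch. This is not the code from the commit (the bufmgr.c changes are not part of this excerpt). The struct layouts, every `sketch_`/`Sketch` name, and the linear "pick the tablespace that is furthest behind" scan are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for CkptSortItem; the real layout is not shown here. */
typedef struct SketchSortItem
{
    uint32_t    tablespace;
    uint32_t    relfilenode;
    int32_t     fork;
    uint32_t    block;
    int         buf_id;         /* which buffer this entry describes */
} SketchSortItem;

/* Hypothetical per-tablespace progress counter used for balancing. */
typedef struct SketchProgress
{
    uint64_t    total;          /* buffers this tablespace must write */
    uint64_t    written;        /* buffers written so far */
} SketchProgress;

/* Three-way compare that avoids the overflow of plain subtraction. */
#define SKETCH_CMP(a, b) (((a) > (b)) - ((a) < (b)))

/* Order by tablespace, relfilenode, fork, then block number. */
static int
sketch_ckpt_cmp(const void *pa, const void *pb)
{
    const SketchSortItem *a = pa;
    const SketchSortItem *b = pb;
    int         c;

    if ((c = SKETCH_CMP(a->tablespace, b->tablespace)) != 0)
        return c;
    if ((c = SKETCH_CMP(a->relfilenode, b->relfilenode)) != 0)
        return c;
    if ((c = SKETCH_CMP(a->fork, b->fork)) != 0)
        return c;
    return SKETCH_CMP(a->block, b->block);
}

/* Sort the to-be-written items in place. */
static void
sketch_sort_items(SketchSortItem *items, size_t n)
{
    assert(items != NULL || n == 0);
    if (n > 1)
        qsort(items, n, sizeof(SketchSortItem), sketch_ckpt_cmp);
}

/*
 * Return the index of the unfinished tablespace with the smallest
 * written/total fraction, or -1 when every tablespace is finished.
 * Fractions are compared by cross-multiplication, which assumes the
 * products fit in 64 bits.
 */
static int
sketch_pick_tablespace(const SketchProgress *ts, int nts)
{
    int         best = -1;

    for (int i = 0; i < nts; i++)
    {
        if (ts[i].written >= ts[i].total)
            continue;           /* nothing left to write here */
        if (best < 0 ||
            ts[i].written * ts[best].total < ts[best].written * ts[i].total)
            best = i;
    }
    return best;
}
```

A caller would sort once with `sketch_sort_items`, then repeatedly call `sketch_pick_tablespace`, write the next buffer of the chosen tablespace, and increment its `written` counter. Each tablespace then advances at roughly the same relative pace instead of one being written out completely before the next starts.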

5 files changed: 277 additions, 44 deletions

src/backend/storage/buffer/README

Lines changed: 0 additions & 5 deletions
@@ -267,11 +267,6 @@ only needs to take the lock long enough to read the variable value, not
 while scanning the buffers. (This is a very substantial improvement in
 the contention cost of the writer compared to PG 8.0.)
 
-During a checkpoint, the writer's strategy must be to write every dirty
-buffer (pinned or not!). We may as well make it start this scan from
-nextVictimBuffer, however, so that the first-to-be-written pages are the
-ones that backends might otherwise have to write for themselves soon.
-
 The background writer takes shared content lock on a buffer while writing it
 out (and anyone else who flushes buffer contents to disk must do so too).
 This ensures that the page image transferred to disk is reasonably consistent.

src/backend/storage/buffer/buf_init.c

Lines changed: 19 additions & 3 deletions
@@ -24,6 +24,7 @@ LWLockMinimallyPadded *BufferIOLWLockArray = NULL;
 LWLockTranche BufferIOLWLockTranche;
 LWLockTranche BufferContentLWLockTranche;
 WritebackContext BackendWritebackContext;
+CkptSortItem *CkptBufferIds;
 
 
 /*
@@ -70,7 +71,8 @@ InitBufferPool(void)
 {
     bool        foundBufs,
                 foundDescs,
-                foundIOLocks;
+                foundIOLocks,
+                foundBufCkpt;
 
     /* Align descriptors to a cacheline boundary. */
     BufferDescriptors = (BufferDescPadded *)
@@ -104,10 +106,21 @@ InitBufferPool(void)
     LWLockRegisterTranche(LWTRANCHE_BUFFER_CONTENT,
                           &BufferContentLWLockTranche);
 
-    if (foundDescs || foundBufs || foundIOLocks)
+    /*
+     * The array used to sort to-be-checkpointed buffer ids is located in
+     * shared memory, to avoid having to allocate significant amounts of
+     * memory at runtime. As that'd be in the middle of a checkpoint, or when
+     * the checkpointer is restarted, memory allocation failures would be
+     * painful.
+     */
+    CkptBufferIds = (CkptSortItem *)
+        ShmemInitStruct("Checkpoint BufferIds",
+                        NBuffers * sizeof(CkptSortItem), &foundBufCkpt);
+
+    if (foundDescs || foundBufs || foundIOLocks || foundBufCkpt)
     {
         /* should find all of these, or none of them */
-        Assert(foundDescs && foundBufs && foundIOLocks);
+        Assert(foundDescs && foundBufs && foundIOLocks && foundBufCkpt);
         /* note: this path is only taken in EXEC_BACKEND case */
     }
     else
@@ -190,5 +203,8 @@ BufferShmemSize(void)
     /* to allow aligning the above */
     size = add_size(size, PG_CACHE_LINE_SIZE);
 
+    /* size of checkpoint sort array in bufmgr.c */
+    size = add_size(size, mul_size(NBuffers, sizeof(CkptSortItem)));
+
     return size;
 }
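
The pre-allocation pattern in this file (size the array with overflow-checked arithmetic, reserve it once at startup, and only fill it during a checkpoint) can be sketched outside PostgreSQL as follows. This is an illustration under stated assumptions, not the server's code: `malloc` stands in for shared memory, the helpers return `false` on overflow (PostgreSQL's own `add_size` and `mul_size` raise an error instead), and every `sketch_`/`Sketch` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for CkptSortItem. */
typedef struct SketchItem
{
    uint32_t    block;
    int         buf_id;
} SketchItem;

static SketchItem *sketch_items = NULL;     /* allocated once at startup */
static size_t sketch_capacity = 0;

/* Multiply two sizes, reporting overflow instead of wrapping around. */
static bool
sketch_mul_size(size_t a, size_t b, size_t *out)
{
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *out = a * b;
    return true;
}

/* Add two sizes, reporting overflow instead of wrapping around. */
static bool
sketch_add_size(size_t a, size_t b, size_t *out)
{
    if (b > SIZE_MAX - a)
        return false;
    *out = a + b;
    return true;
}

/* Startup: reserve one slot per buffer so later work never allocates. */
static bool
sketch_startup(size_t nbuffers)
{
    size_t      bytes;

    if (nbuffers == 0 ||
        !sketch_mul_size(nbuffers, sizeof(SketchItem), &bytes))
        return false;
    sketch_items = malloc(bytes);
    if (sketch_items == NULL)
        return false;
    sketch_capacity = nbuffers;
    return true;
}

/* Checkpoint time: fill the preallocated array; no allocation happens here. */
static size_t
sketch_collect(const int *dirty_ids, size_t ndirty)
{
    assert(sketch_items != NULL && ndirty <= sketch_capacity);
    for (size_t i = 0; i < ndirty; i++)
    {
        sketch_items[i].buf_id = dirty_ids[i];
        sketch_items[i].block = 0;  /* placeholder; a real item would carry the buffer's block number */
    }
    return ndirty;
}
```

Because the array has one slot per buffer, the number of dirty buffers can never exceed its capacity, so the only point where memory can run out is startup, where a failure is reported immediately rather than in the middle of a checkpoint.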
