Commit 7587286

Improve coding around the fsync request queue.

In all branches back to 8.3, this patch fixes a questionable assumption in
CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are
no uninitialized pad bytes in the request queue structs.  This would only
cause trouble if (a) there were such pad bytes, which could happen in 8.4
and up if the compiler makes enum ForkNumber narrower than 32 bits, but
otherwise would require not-currently-planned changes in the widths of
other typedefs; and (b) the kernel has not uniformly initialized the
contents of shared memory to zeroes.  Still, it seems a tad risky, and we
can easily remove any risk by pre-zeroing the request array for ourselves.
In addition to that, we need to establish a coding rule that struct
RelFileNode can't contain any padding bytes, since such structs are copied
into the request array verbatim.  (There are other places that are assuming
this anyway, it turns out.)

In 9.1 and up, the risk was a bit larger because we were also effectively
assuming that struct RelFileNodeBackend contained no pad bytes, and with
fields of different types in there, that would be much easier to break.
However, there is no good reason to ever transmit fsync or delete requests
for temp files to the bgwriter/checkpointer, so we can revert the request
structs to plain RelFileNode, getting rid of the padding risk and saving
some marginal number of bytes and cycles in fsync queue manipulation while
we are at it.  The savings might be more than marginal during deletion of
a temp relation, because the old code transmitted an entirely useless but
nonetheless expensive-to-process ForgetRelationFsync request to the
background process, and also had the background process perform the file
deletion even though that can safely be done immediately.

In addition, make some cleanup of nearby comments and small improvements to
the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.

1 parent f81d50d

File tree

3 files changed: +54 −25 lines changed
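Before reading the diffs, the compaction strategy is worth seeing in isolation. The sketch below is not PostgreSQL code: it replaces the dynahash lookup with a quadratic memcmp scan and uses a made-up Request struct, but it keeps the patch's approach of marking an earlier duplicate as skippable so that the latest occurrence of each request survives and queue order is preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SKETCH_REQUESTS 64

/* Hypothetical stand-in for BgWriterRequest: same-width fields, no padding. */
typedef struct
{
	unsigned int rnode;			/* pretend relation identifier */
	unsigned int forknum;
	unsigned int segno;
} Request;

/*
 * Remove earlier duplicates from the queue, keeping the last occurrence of
 * each distinct request.  PostgreSQL builds a hash table keyed on the raw
 * struct bytes; this sketch compares structs with memcmp instead, which
 * likewise only works if any pad bytes are consistently initialized.
 * Returns the compacted queue length.
 */
int
compact_queue(Request *requests, int num_requests)
{
	bool		skip_slot[MAX_SKETCH_REQUESTS] = {false};
	int			n,
				preserve_count;

	assert(num_requests <= MAX_SKETCH_REQUESTS);

	/* A request is skippable if it's followed by a later, identical one. */
	for (n = 0; n < num_requests; n++)
	{
		int			later;

		for (later = n + 1; later < num_requests; later++)
		{
			if (memcmp(&requests[n], &requests[later], sizeof(Request)) == 0)
			{
				skip_slot[n] = true;
				break;
			}
		}
	}

	/* Slide the surviving entries down, preserving their order. */
	preserve_count = 0;
	for (n = 0; n < num_requests; n++)
	{
		if (!skip_slot[n])
			requests[preserve_count++] = requests[n];
	}
	return preserve_count;
}
```

For example, a queue A, B, A, C compacts to B, A, C: the first A is dropped because an identical later request exists.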

‎src/backend/postmaster/bgwriter.c

41 additions & 22 deletions

@@ -885,17 +885,22 @@ BgWriterShmemSize(void)
 void
 BgWriterShmemInit(void)
 {
+	Size		size = BgWriterShmemSize();
 	bool		found;
 
 	BgWriterShmem = (BgWriterShmemStruct *)
 		ShmemInitStruct("Background Writer Data",
-						BgWriterShmemSize(),
+						size,
 						&found);
 
 	if (!found)
 	{
-		/* First time through, so initialize */
-		MemSet(BgWriterShmem, 0, sizeof(BgWriterShmemStruct));
+		/*
+		 * First time through, so initialize.  Note that we zero the whole
+		 * requests array; this is so that CompactBgwriterRequestQueue
+		 * can assume that any pad bytes in the request structs are zeroes.
+		 */
+		MemSet(BgWriterShmem, 0, size);
 		SpinLockInit(&BgWriterShmem->ckpt_lck);
 		BgWriterShmem->max_requests = NBuffers;
 	}
@@ -1111,25 +1116,27 @@ ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 
 /*
  * CompactBgwriterRequestQueue
- *		Remove duplicates from the request queue to avoid backend fsyncs.
+ *		Remove duplicates from the request queue to avoid backend fsyncs.
+ *		Returns "true" if any entries were removed.
  *
  * Although a full fsync request queue is not common, it can lead to severe
  * performance problems when it does happen.  So far, this situation has
  * only been observed to occur when the system is under heavy write load,
- * and especially during the "sync" phase of a checkpoint. Without this
+ * and especially during the "sync" phase of a checkpoint.  Without this
  * logic, each backend begins doing an fsync for every block written, which
 * gets very expensive and can slow down the whole system.
 *
 * Trying to do this every time the queue is full could lose if there
- * aren't any removable entries.  But should be vanishingly rare in
+ * aren't any removable entries.  But that should be vanishingly rare in
 * practice: there's one queue entry per shared buffer.
 */
 static bool
-CompactBgwriterRequestQueue()
+CompactBgwriterRequestQueue(void)
 {
-	struct BgWriterSlotMapping {
-		BgWriterRequest	request;
-		int		slot;
+	struct BgWriterSlotMapping
+	{
+		BgWriterRequest request;
+		int			slot;
 	};
 
 	int			n,
@@ -1142,20 +1149,22 @@ CompactBgwriterRequestQueue()
 	/* must hold BgWriterCommLock in exclusive mode */
 	Assert(LWLockHeldByMe(BgWriterCommLock));
 
+	/* Initialize skip_slot array */
+	skip_slot = palloc0(sizeof(bool) * BgWriterShmem->num_requests);
+
 	/* Initialize temporary hash table */
 	MemSet(&ctl, 0, sizeof(ctl));
 	ctl.keysize = sizeof(BgWriterRequest);
 	ctl.entrysize = sizeof(struct BgWriterSlotMapping);
 	ctl.hash = tag_hash;
+	ctl.hcxt = CurrentMemoryContext;
+
 	htab = hash_create("CompactBgwriterRequestQueue",
 					   BgWriterShmem->num_requests,
 					   &ctl,
-					   HASH_ELEM | HASH_FUNCTION);
+					   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
 
-	/* Initialize skip_slot array */
-	skip_slot = palloc0(sizeof(bool) * BgWriterShmem->num_requests);
-
-	/*
+	/*
 	 * The basic idea here is that a request can be skipped if it's followed
 	 * by a later, identical request.  It might seem more sensible to work
 	 * backwards from the end of the queue and check whether a request is
@@ -1164,23 +1173,32 @@ CompactBgwriterRequestQueue()
 	 * intervening FORGET_RELATION_FSYNC or FORGET_DATABASE_FSYNC request, so
 	 * we do it this way.  It would be possible to be even smarter if we made
 	 * the code below understand the specific semantics of such requests (it
-	 * could blow away preceding entries that would end up being cancelled
+	 * could blow away preceding entries that would end up being canceled
 	 * anyhow), but it's not clear that the extra complexity would buy us
 	 * anything.
 	 */
-	for (n = 0; n < BgWriterShmem->num_requests; ++n)
+	for (n = 0; n < BgWriterShmem->num_requests; n++)
 	{
 		BgWriterRequest *request;
 		struct BgWriterSlotMapping *slotmap;
-		bool	found;
+		bool		found;
 
+		/*
+		 * We use the request struct directly as a hashtable key.  This
+		 * assumes that any padding bytes in the structs are consistently the
+		 * same, which should be okay because we zeroed them in
+		 * BgWriterShmemInit.  Note also that RelFileNode had better
+		 * contain no pad bytes.
+		 */
 		request = &BgWriterShmem->requests[n];
 		slotmap = hash_search(htab, request, HASH_ENTER, &found);
 		if (found)
 		{
+			/* Duplicate, so mark the previous occurrence as skippable */
 			skip_slot[slotmap->slot] = true;
-			++num_skipped;
+			num_skipped++;
 		}
+		/* Remember slot containing latest occurrence of this request value */
 		slotmap->slot = n;
 	}
 
@@ -1195,15 +1213,16 @@ CompactBgwriterRequestQueue()
 	}
 
 	/* We found some duplicates; remove them. */
-	for (n = 0, preserve_count = 0; n < BgWriterShmem->num_requests; ++n)
+	preserve_count = 0;
+	for (n = 0; n < BgWriterShmem->num_requests; n++)
 	{
 		if (skip_slot[n])
 			continue;
 		BgWriterShmem->requests[preserve_count++] = BgWriterShmem->requests[n];
 	}
 	ereport(DEBUG1,
-			(errmsg("compacted fsync request queue from %d entries to %d entries",
-					BgWriterShmem->num_requests, preserve_count)));
+	   (errmsg("compacted fsync request queue from %d entries to %d entries",
+			   BgWriterShmem->num_requests, preserve_count)));
 	BgWriterShmem->num_requests = preserve_count;
 
 	/* Cleanup. */
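The new comment in the scan loop explains why the pre-zeroing in BgWriterShmemInit matters. The standalone snippet below (not from the patch) demonstrates the hazard: two structs whose fields compare equal can still differ byte-wise if their pad bytes were never initialized, which would defeat a hash keyed on the raw struct bytes. The struct layout assumes a typical ABI where int requires 4-byte alignment, leaving three pad bytes after the char.

```c
#include <assert.h>
#include <string.h>

/* On typical ABIs, three pad bytes sit between 'tag' and 'value'. */
typedef struct
{
	char		tag;
	int			value;
} Padded;

/*
 * Fill two structs with different "garbage" bytes, then set their fields
 * to equal values.  Returns 1 if the raw bytes compare equal, 0 if not.
 */
int
bytes_equal_after_fill(char fill_a, char fill_b)
{
	Padded		a,
				b;

	memset(&a, fill_a, sizeof(a));	/* simulates uninitialized pad bytes */
	memset(&b, fill_b, sizeof(b));
	a.tag = b.tag = 'x';
	a.value = b.value = 42;
	return memcmp(&a, &b, sizeof(Padded)) == 0;
}
```

With identical fill bytes the structs match byte-for-byte, which is exactly what MemSet(BgWriterShmem, 0, size) guarantees for the request array; with different fills the pad bytes differ even though every field is equal.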

‎src/backend/storage/smgr/md.c

9 additions & 3 deletions

@@ -126,7 +126,7 @@ static MemoryContext MdCxt;	/* context for all md.c allocations */
 typedef struct
 {
 	RelFileNode rnode;			/* the targeted relation */
-	ForkNumber forknum;
+	ForkNumber	forknum;		/* which fork */
 	BlockNumber segno;			/* which segment */
 } PendingOperationTag;
 
@@ -1212,7 +1212,7 @@ mdpostckpt(void)
 * If there is a local pending-ops table, just make an entry in it for
 * mdsync to process later.  Otherwise, try to pass off the fsync request
 * to the background writer process.  If that fails, just do the fsync
- * locally before returning (we expect this will not happen often enough
+ * locally before returning (we hope this will not happen often enough
 * to be a performance problem).
 */
 static void
@@ -1239,6 +1239,9 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 /*
 * register_unlink() -- Schedule a file to be deleted after next checkpoint
 *
+ * We don't bother passing in the fork number, because this is only used
+ * with main forks.
+ *
 * As with register_dirty_segment, this could involve either a local or
 * a remote pending-ops table.
 */
@@ -1349,6 +1352,9 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 		MemoryContext oldcxt = MemoryContextSwitchTo(MdCxt);
 		PendingUnlinkEntry *entry;
 
+		/* PendingUnlinkEntry doesn't store forknum, since it's always MAIN */
+		Assert(forknum == MAIN_FORKNUM);
+
 		entry = palloc(sizeof(PendingUnlinkEntry));
 		entry->rnode = rnode;
 		entry->cycle_ctr = mdckpt_cycle_ctr;
@@ -1398,7 +1404,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 }
 
 /*
- * ForgetRelationFsyncRequests -- forget any fsyncs for a rel
+ * ForgetRelationFsyncRequests -- forget any fsyncs for a relation fork
 */
 void
 ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)

‎src/include/storage/relfilenode.h

4 additions & 0 deletions

@@ -65,6 +65,10 @@ typedef enum ForkNumber
 * Note: in pg_class, relfilenode can be zero to denote that the relation
 * is a "mapped" relation, whose current true filenode number is available
 * from relmapper.c.  Again, this case is NOT allowed in RelFileNodes.
+ *
+ * Note: various places use RelFileNode in hashtable keys.  Therefore,
+ * there *must not* be any unused padding bytes in this struct.  That
+ * should be safe as long as all the fields are of type Oid.
 */
 typedef struct RelFileNode
 {
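The header's new "no padding bytes" coding rule can also be checked mechanically. The hypothetical snippet below is not part of the patch (and PostgreSQL of this era predates C11's _Static_assert); it mirrors the struct's field layout and makes the rule a compile-time failure if padding ever appears:

```c
#include <assert.h>

typedef unsigned int Oid;		/* PostgreSQL Oids are 32-bit unsigned */

/* Field layout mirrors src/include/storage/relfilenode.h. */
typedef struct RelFileNode
{
	Oid			spcNode;		/* tablespace */
	Oid			dbNode;			/* database */
	Oid			relNode;		/* relation */
} RelFileNode;

/*
 * All three fields have the same type, so the compiler has no reason to
 * insert padding; the assertion makes the "no pad bytes" rule explicit.
 */
_Static_assert(sizeof(RelFileNode) == 3 * sizeof(Oid),
			   "RelFileNode must not contain padding bytes");
```

Keeping every field of type Oid is what makes this safe: mixing in a narrower type (as the enum ForkNumber worry in the commit message illustrates) could introduce pad bytes and break byte-wise hashing of the struct.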
