Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit2f961b1

Browse files
committed
Improve coding around the fsync request queue.
In all branches back to 8.3, this patch fixes a questionable assumption inCompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there areno uninitialized pad bytes in the request queue structs. This would onlycause trouble if (a) there were such pad bytes, which could happen in 8.4and up if the compiler makes enum ForkNumber narrower than 32 bits, butotherwise would require not-currently-planned changes in the widths ofother typedefs; and (b) the kernel has not uniformly initialized thecontents of shared memory to zeroes. Still, it seems a tad risky, and wecan easily remove any risk by pre-zeroing the request array for ourselves.In addition to that, we need to establish a coding rule that structRelFileNode can't contain any padding bytes, since such structs are copiedinto the request array verbatim. (There are other places that are assumingthis anyway, it turns out.)In 9.1 and up, the risk was a bit larger because we were also effectivelyassuming that struct RelFileNodeBackend contained no pad bytes, and withfields of different types in there, that would be much easier to break.However, there is no good reason to ever transmit fsync or delete requestsfor temp files to the bgwriter/checkpointer, so we can revert the requeststructs to plain RelFileNode, getting rid of the padding risk and savingsome marginal number of bytes and cycles in fsync queue manipulation whilewe are at it. The savings might be more than marginal during deletion ofa temp relation, because the old code transmitted an entirely useless butnonetheless expensive-to-process ForgetRelationFsync request to thebackground process, and also had the background process perform the filedeletion even though that can safely be done immediately.In addition, make some cleanup of nearby comments and small improvements tothe code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
1 parent5dd19d1 commit2f961b1

File tree

6 files changed

+107
-52
lines changed

6 files changed

+107
-52
lines changed

‎src/backend/postmaster/bgwriter.c

Lines changed: 36 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -116,7 +116,7 @@
116116
*/
117117
typedefstruct
118118
{
119-
RelFileNodeBackendrnode;
119+
RelFileNodernode;
120120
ForkNumberforknum;
121121
BlockNumbersegno;/* see md.c for special values */
122122
/* might add a real request-type field later; not needed yet */
@@ -896,17 +896,22 @@ BgWriterShmemSize(void)
896896
void
897897
BgWriterShmemInit(void)
898898
{
899+
Sizesize=BgWriterShmemSize();
899900
boolfound;
900901

901902
BgWriterShmem= (BgWriterShmemStruct*)
902903
ShmemInitStruct("Background Writer Data",
903-
BgWriterShmemSize(),
904+
size,
904905
&found);
905906

906907
if (!found)
907908
{
908-
/* First time through, so initialize */
909-
MemSet(BgWriterShmem,0,sizeof(BgWriterShmemStruct));
909+
/*
910+
* First time through, so initialize. Note that we zero the whole
911+
* requests array; this is so that CompactBgwriterRequestQueue
912+
* can assume that any pad bytes in the request structs are zeroes.
913+
*/
914+
MemSet(BgWriterShmem,0,size);
910915
SpinLockInit(&BgWriterShmem->ckpt_lck);
911916
BgWriterShmem->max_requests=NBuffers;
912917
}
@@ -1068,6 +1073,10 @@ RequestCheckpoint(int flags)
10681073
* is dirty and must be fsync'd before next checkpoint. We also use this
10691074
* opportunity to count such writes for statistical purposes.
10701075
*
1076+
* This functionality is only supported for regular (not backend-local)
1077+
* relations, so the rnode argument is intentionally RelFileNode not
1078+
* RelFileNodeBackend.
1079+
*
10711080
* segno specifies which segment (not block!) of the relation needs to be
10721081
* fsync'd. (Since the valid range is much less than BlockNumber, we can
10731082
* use high values for special flags; that's all internal to md.c, which
@@ -1084,8 +1093,7 @@ RequestCheckpoint(int flags)
10841093
* let the backend know by returning false.
10851094
*/
10861095
bool
1087-
ForwardFsyncRequest(RelFileNodeBackendrnode,ForkNumberforknum,
1088-
BlockNumbersegno)
1096+
ForwardFsyncRequest(RelFileNodernode,ForkNumberforknum,BlockNumbersegno)
10891097
{
10901098
BgWriterRequest*request;
10911099

@@ -1129,6 +1137,7 @@ ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
11291137
/*
11301138
* CompactBgwriterRequestQueue
11311139
*Remove duplicates from the request queue to avoid backend fsyncs.
1140+
*Returns "true" if any entries were removed.
11321141
*
11331142
* Although a full fsync request queue is not common, it can lead to severe
11341143
* performance problems when it does happen. So far, this situation has
@@ -1138,11 +1147,11 @@ ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
11381147
* gets very expensive and can slow down the whole system.
11391148
*
11401149
* Trying to do this every time the queue is full could lose if there
1141-
* aren't any removable entries. But should be vanishingly rare in
1150+
* aren't any removable entries. Butthatshould be vanishingly rare in
11421151
* practice: there's one queue entry per shared buffer.
11431152
*/
11441153
staticbool
1145-
CompactBgwriterRequestQueue()
1154+
CompactBgwriterRequestQueue(void)
11461155
{
11471156
structBgWriterSlotMapping
11481157
{
@@ -1160,18 +1169,20 @@ CompactBgwriterRequestQueue()
11601169
/* must hold BgWriterCommLock in exclusive mode */
11611170
Assert(LWLockHeldByMe(BgWriterCommLock));
11621171

1172+
/* Initialize skip_slot array */
1173+
skip_slot=palloc0(sizeof(bool)*BgWriterShmem->num_requests);
1174+
11631175
/* Initialize temporary hash table */
11641176
MemSet(&ctl,0,sizeof(ctl));
11651177
ctl.keysize=sizeof(BgWriterRequest);
11661178
ctl.entrysize=sizeof(structBgWriterSlotMapping);
11671179
ctl.hash=tag_hash;
1180+
ctl.hcxt=CurrentMemoryContext;
1181+
11681182
htab=hash_create("CompactBgwriterRequestQueue",
11691183
BgWriterShmem->num_requests,
11701184
&ctl,
1171-
HASH_ELEM |HASH_FUNCTION);
1172-
1173-
/* Initialize skip_slot array */
1174-
skip_slot=palloc0(sizeof(bool)*BgWriterShmem->num_requests);
1185+
HASH_ELEM |HASH_FUNCTION |HASH_CONTEXT);
11751186

11761187
/*
11771188
* The basic idea here is that a request can be skipped if it's followed
@@ -1186,19 +1197,28 @@ CompactBgwriterRequestQueue()
11861197
* anyhow), but it's not clear that the extra complexity would buy us
11871198
* anything.
11881199
*/
1189-
for (n=0;n<BgWriterShmem->num_requests;++n)
1200+
for (n=0;n<BgWriterShmem->num_requests;n++)
11901201
{
11911202
BgWriterRequest*request;
11921203
structBgWriterSlotMapping*slotmap;
11931204
boolfound;
11941205

1206+
/*
1207+
* We use the request struct directly as a hashtable key. This
1208+
* assumes that any padding bytes in the structs are consistently the
1209+
* same, which should be okay because we zeroed them in
1210+
* BgWriterShmemInit. Note also that RelFileNode had better
1211+
* contain no pad bytes.
1212+
*/
11951213
request=&BgWriterShmem->requests[n];
11961214
slotmap=hash_search(htab,request,HASH_ENTER,&found);
11971215
if (found)
11981216
{
1217+
/* Duplicate, so mark the previous occurrence as skippable */
11991218
skip_slot[slotmap->slot]= true;
1200-
++num_skipped;
1219+
num_skipped++;
12011220
}
1221+
/* Remember slot containing latest occurrence of this request value */
12021222
slotmap->slot=n;
12031223
}
12041224

@@ -1213,7 +1233,8 @@ CompactBgwriterRequestQueue()
12131233
}
12141234

12151235
/* We found some duplicates; remove them. */
1216-
for (n=0,preserve_count=0;n<BgWriterShmem->num_requests;++n)
1236+
preserve_count=0;
1237+
for (n=0;n<BgWriterShmem->num_requests;n++)
12171238
{
12181239
if (skip_slot[n])
12191240
continue;

‎src/backend/storage/buffer/bufmgr.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1971,7 +1971,8 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
19711971
{
19721972
inti;
19731973

1974-
if (rnode.backend!=InvalidBackendId)
1974+
/* If it's a local relation, it's localbuf.c's problem. */
1975+
if (RelFileNodeBackendIsTemp(rnode))
19751976
{
19761977
if (rnode.backend==MyBackendId)
19771978
DropRelFileNodeLocalBuffers(rnode.node,forkNum,firstDelBlock);

‎src/backend/storage/smgr/md.c

Lines changed: 51 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -121,13 +121,17 @@ static MemoryContext MdCxt;/* context for all md.c allocations */
121121
* be deleted after the next checkpoint, but we use a linked list instead of
122122
* a hash table, because we don't expect there to be any duplicate requests.
123123
*
124+
* These mechanisms are only used for non-temp relations; we never fsync
125+
* temp rels, nor do we need to postpone their deletion (see comments in
126+
* mdunlink).
127+
*
124128
* (Regular backends do not track pending operations locally, but forward
125129
* them to the bgwriter.)
126130
*/
127131
typedefstruct
128132
{
129-
RelFileNodeBackendrnode;/* the targeted relation */
130-
ForkNumberforknum;
133+
RelFileNodernode;/* the targeted relation */
134+
ForkNumberforknum;/* which fork */
131135
BlockNumbersegno;/* which segment */
132136
}PendingOperationTag;
133137

@@ -142,7 +146,7 @@ typedef struct
142146

143147
typedefstruct
144148
{
145-
RelFileNodeBackendrnode;/* the dead relation to delete */
149+
RelFileNodernode;/* the dead relation to delete */
146150
CycleCtrcycle_ctr;/* mdckpt_cycle_ctr when request was made */
147151
}PendingUnlinkEntry;
148152

@@ -301,11 +305,11 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
301305
/*
302306
*mdunlink() -- Unlink a relation.
303307
*
304-
* Note that we're passed aRelFileNode --- by the time this is called,
308+
* Note that we're passed aRelFileNodeBackend --- by the time this is called,
305309
* there won't be an SMgrRelation hashtable entry anymore.
306310
*
307-
*Actually, we don't unlink the first segment file of therelation, but
308-
* just truncate it to zero length, and record a request to unlink it after
311+
*For regular relations, we don't unlink the first segment file of therel,
312+
*butjust truncate it to zero length, and record a request to unlink it after
309313
* the next checkpoint. Additional segments can be unlinked immediately,
310314
* however. Leaving the empty file in place prevents that relfilenode
311315
* number from being reused. The scenario this protects us from is:
@@ -322,6 +326,12 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
322326
* number until it's safe, because relfilenode assignment skips over any
323327
* existing file.
324328
*
329+
* We do not need to go through this dance for temp relations, though, because
330+
* we never make WAL entries for temp rels, and so a temp rel poses no threat
331+
* to the health of a regular rel that has taken over its relfilenode number.
332+
* The fact that temp rels and regular rels have different file naming
333+
* patterns provides additional safety.
334+
*
325335
* All the above applies only to the relation's main fork; other forks can
326336
* just be removed immediately, since they are not needed to prevent the
327337
* relfilenode number from being recycled. Also, we do not carefully
@@ -344,16 +354,18 @@ mdunlink(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
344354

345355
/*
346356
* We have to clean out any pending fsync requests for the doomed
347-
* relation, else the next mdsync() will fail.
357+
* relation, else the next mdsync() will fail. There can't be any such
358+
* requests for a temp relation, though.
348359
*/
349-
ForgetRelationFsyncRequests(rnode,forkNum);
360+
if (!RelFileNodeBackendIsTemp(rnode))
361+
ForgetRelationFsyncRequests(rnode.node,forkNum);
350362

351363
path=relpath(rnode,forkNum);
352364

353365
/*
354366
* Delete or truncate the first segment.
355367
*/
356-
if (isRedo||forkNum!=MAIN_FORKNUM)
368+
if (isRedo||forkNum!=MAIN_FORKNUM||RelFileNodeBackendIsTemp(rnode))
357369
{
358370
ret=unlink(path);
359371
if (ret<0&&errno!=ENOENT)
@@ -1078,8 +1090,7 @@ mdsync(void)
10781090
* the relation will have been dirtied through this same smgr
10791091
* relation, and so we can save a file open/close cycle.
10801092
*/
1081-
reln=smgropen(entry->tag.rnode.node,
1082-
entry->tag.rnode.backend);
1093+
reln=smgropen(entry->tag.rnode,InvalidBackendId);
10831094

10841095
/*
10851096
* It is possible that the relation has been dropped or
@@ -1230,7 +1241,7 @@ mdpostckpt(void)
12301241
Assert((CycleCtr) (entry->cycle_ctr+1)==mdckpt_cycle_ctr);
12311242

12321243
/* Unlink the file */
1233-
path=relpath(entry->rnode,MAIN_FORKNUM);
1244+
path=relpathperm(entry->rnode,MAIN_FORKNUM);
12341245
if (unlink(path)<0)
12351246
{
12361247
/*
@@ -1258,20 +1269,23 @@ mdpostckpt(void)
12581269
* If there is a local pending-ops table, just make an entry in it for
12591270
* mdsync to process later. Otherwise, try to pass off the fsync request
12601271
* to the background writer process. If that fails, just do the fsync
1261-
* locally before returning (weexpect this will not happen often enough
1272+
* locally before returning (wehope this will not happen often enough
12621273
* to be a performance problem).
12631274
*/
12641275
staticvoid
12651276
register_dirty_segment(SMgrRelationreln,ForkNumberforknum,MdfdVec*seg)
12661277
{
1278+
/* Temp relations should never be fsync'd */
1279+
Assert(!SmgrIsTemp(reln));
1280+
12671281
if (pendingOpsTable)
12681282
{
12691283
/* push it into local pending-ops table */
1270-
RememberFsyncRequest(reln->smgr_rnode,forknum,seg->mdfd_segno);
1284+
RememberFsyncRequest(reln->smgr_rnode.node,forknum,seg->mdfd_segno);
12711285
}
12721286
else
12731287
{
1274-
if (ForwardFsyncRequest(reln->smgr_rnode,forknum,seg->mdfd_segno))
1288+
if (ForwardFsyncRequest(reln->smgr_rnode.node,forknum,seg->mdfd_segno))
12751289
return;/* passed it off successfully */
12761290

12771291
ereport(DEBUG1,
@@ -1288,16 +1302,23 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
12881302
/*
12891303
* register_unlink() -- Schedule a file to be deleted after next checkpoint
12901304
*
1305+
* We don't bother passing in the fork number, because this is only used
1306+
* with main forks.
1307+
*
12911308
* As with register_dirty_segment, this could involve either a local or
12921309
* a remote pending-ops table.
12931310
*/
12941311
staticvoid
12951312
register_unlink(RelFileNodeBackendrnode)
12961313
{
1314+
/* Should never be used with temp relations */
1315+
Assert(!RelFileNodeBackendIsTemp(rnode));
1316+
12971317
if (pendingOpsTable)
12981318
{
12991319
/* push it into local pending-ops table */
1300-
RememberFsyncRequest(rnode,MAIN_FORKNUM,UNLINK_RELATION_REQUEST);
1320+
RememberFsyncRequest(rnode.node,MAIN_FORKNUM,
1321+
UNLINK_RELATION_REQUEST);
13011322
}
13021323
else
13031324
{
@@ -1309,7 +1330,7 @@ register_unlink(RelFileNodeBackend rnode)
13091330
* XXX should we just leave the file orphaned instead?
13101331
*/
13111332
Assert(IsUnderPostmaster);
1312-
while (!ForwardFsyncRequest(rnode,MAIN_FORKNUM,
1333+
while (!ForwardFsyncRequest(rnode.node,MAIN_FORKNUM,
13131334
UNLINK_RELATION_REQUEST))
13141335
pg_usleep(10000L);/* 10 msec seems a good number */
13151336
}
@@ -1335,8 +1356,7 @@ register_unlink(RelFileNodeBackend rnode)
13351356
* structure for them.)
13361357
*/
13371358
void
1338-
RememberFsyncRequest(RelFileNodeBackendrnode,ForkNumberforknum,
1339-
BlockNumbersegno)
1359+
RememberFsyncRequest(RelFileNodernode,ForkNumberforknum,BlockNumbersegno)
13401360
{
13411361
Assert(pendingOpsTable);
13421362

@@ -1349,7 +1369,7 @@ RememberFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
13491369
hash_seq_init(&hstat,pendingOpsTable);
13501370
while ((entry= (PendingOperationEntry*)hash_seq_search(&hstat))!=NULL)
13511371
{
1352-
if (RelFileNodeBackendEquals(entry->tag.rnode,rnode)&&
1372+
if (RelFileNodeEquals(entry->tag.rnode,rnode)&&
13531373
entry->tag.forknum==forknum)
13541374
{
13551375
/* Okay, cancel this entry */
@@ -1370,7 +1390,7 @@ RememberFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
13701390
hash_seq_init(&hstat,pendingOpsTable);
13711391
while ((entry= (PendingOperationEntry*)hash_seq_search(&hstat))!=NULL)
13721392
{
1373-
if (entry->tag.rnode.node.dbNode==rnode.node.dbNode)
1393+
if (entry->tag.rnode.dbNode==rnode.dbNode)
13741394
{
13751395
/* Okay, cancel this entry */
13761396
entry->canceled= true;
@@ -1384,7 +1404,7 @@ RememberFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
13841404
PendingUnlinkEntry*entry= (PendingUnlinkEntry*)lfirst(cell);
13851405

13861406
next=lnext(cell);
1387-
if (entry->rnode.node.dbNode==rnode.node.dbNode)
1407+
if (entry->rnode.dbNode==rnode.dbNode)
13881408
{
13891409
pendingUnlinks=list_delete_cell(pendingUnlinks,cell,prev);
13901410
pfree(entry);
@@ -1399,6 +1419,9 @@ RememberFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
13991419
MemoryContextoldcxt=MemoryContextSwitchTo(MdCxt);
14001420
PendingUnlinkEntry*entry;
14011421

1422+
/* PendingUnlinkEntry doesn't store forknum, since it's always MAIN */
1423+
Assert(forknum==MAIN_FORKNUM);
1424+
14021425
entry=palloc(sizeof(PendingUnlinkEntry));
14031426
entry->rnode=rnode;
14041427
entry->cycle_ctr=mdckpt_cycle_ctr;
@@ -1448,10 +1471,10 @@ RememberFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
14481471
}
14491472

14501473
/*
1451-
* ForgetRelationFsyncRequests -- forget any fsyncs for arel
1474+
* ForgetRelationFsyncRequests -- forget any fsyncs for arelation fork
14521475
*/
14531476
void
1454-
ForgetRelationFsyncRequests(RelFileNodeBackendrnode,ForkNumberforknum)
1477+
ForgetRelationFsyncRequests(RelFileNodernode,ForkNumberforknum)
14551478
{
14561479
if (pendingOpsTable)
14571480
{
@@ -1486,12 +1509,11 @@ ForgetRelationFsyncRequests(RelFileNodeBackend rnode, ForkNumber forknum)
14861509
void
14871510
ForgetDatabaseFsyncRequests(Oiddbid)
14881511
{
1489-
RelFileNodeBackendrnode;
1512+
RelFileNodernode;
14901513

1491-
rnode.node.dbNode=dbid;
1492-
rnode.node.spcNode=0;
1493-
rnode.node.relNode=0;
1494-
rnode.backend=InvalidBackendId;
1514+
rnode.dbNode=dbid;
1515+
rnode.spcNode=0;
1516+
rnode.relNode=0;
14951517

14961518
if (pendingOpsTable)
14971519
{

‎src/include/postmaster/bgwriter.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ extern void BackgroundWriterMain(void);
2727
externvoidRequestCheckpoint(intflags);
2828
externvoidCheckpointWriteDelay(intflags,doubleprogress);
2929

30-
externboolForwardFsyncRequest(RelFileNodeBackendrnode,ForkNumberforknum,
30+
externboolForwardFsyncRequest(RelFileNodernode,ForkNumberforknum,
3131
BlockNumbersegno);
3232
externvoidAbsorbFsyncRequests(void);
3333

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp