Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit85d8b30

Browse files
committed
Apply a better fix to mdunlinkfork().
Replace the stopgap fix I made in0e758ae with a cleaner one.The real problem with4ab5dae is that it contorted this function'slogic substantially, by introducing a third code path that requireddifferent behavior in the function's main loop. That seems quiteunnecessary on closer inspection: the new IsBinaryUpgrade case canjust share the behavior of the other immediate-unlink cases. Hence,revert4ab5dae and most of0e758ae (keeping the latter'ssave/restore errno fix), and add IsBinaryUpgrade to the set ofconditions tested to choose immediate unlink.Also fix some additional places with sloppy handling of errno,to ensure we have an invariant that we always continue processingafter any non-ENOENT failure of do_truncate. I doubt that that'sfixing any bug of field importance, so I don't feel it necessary toback-patch; but we might as well get it right while we're here.Also improve the comments, which had drifted a bit from what thecode actually does, and neglected to mention some importantconsiderations.Back-patch to v15, not because this is fixing any bug but becauseit doesn't seem like a good idea for v15's mdunlinkfork logic to besignificantly different from both v14 and v16.Discussion:https://postgr.es/m/3797575.1667924888@sss.pgh.pa.us
1 parentff0d8f2 commit85d8b30

File tree

1 file changed

+45
-37
lines changed
  • src/backend/storage/smgr

1 file changed

+45
-37
lines changed

‎src/backend/storage/smgr/md.c

Lines changed: 45 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -257,11 +257,23 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
257257
* next checkpoint, we prevent reassignment of the relfilenumber until it's
258258
* safe, because relfilenumber assignment skips over any existing file.
259259
*
260+
* Additional segments, if any, are truncated and then unlinked. The reason
261+
* for truncating is that other backends may still hold open FDs for these at
262+
* the smgr level, so that the kernel can't remove the file yet. We want to
263+
* reclaim the disk space right away despite that.
264+
*
260265
* We do not need to go through this dance for temp relations, though, because
261266
* we never make WAL entries for temp rels, and so a temp rel poses no threat
262267
* to the health of a regular rel that has taken over its relfilenumber.
263268
* The fact that temp rels and regular rels have different file naming
264-
* patterns provides additional safety.
269+
* patterns provides additional safety. Other backends shouldn't have open
270+
* FDs for them, either.
271+
*
272+
* We also don't do it while performing a binary upgrade. There is no reuse
273+
* hazard in that case, since after a crash or even a simple ERROR, the
274+
* upgrade fails and the whole cluster must be recreated from scratch.
275+
* Furthermore, it is important to remove the files from disk immediately,
276+
* because we may be about to reuse the same relfilenumber.
265277
*
266278
* All the above applies only to the relation's main fork; other forks can
267279
* just be removed immediately, since they are not needed to prevent the
@@ -274,6 +286,9 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
274286
* for later, since during redo there's no possibility of creating a
275287
* conflicting relation.
276288
*
289+
* Note: we currently just never warn about ENOENT at all. We could warn in
290+
* the main-fork, non-isRedo case, but it doesn't seem worth the trouble.
291+
*
277292
* Note: any failure should be reported as WARNING not ERROR, because
278293
* we are usually not in a transaction anymore when this is called.
279294
*/
@@ -319,19 +334,19 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
319334
{
320335
char*path;
321336
intret;
322-
BlockNumbersegno=0;
337+
intsave_errno;
323338

324339
path=relpath(rlocator,forknum);
325340

326341
/*
327-
* Delete or truncate the first segment.
342+
* Truncate and then unlink the first segment, or just register a request
343+
* to unlink it later, as described in the comments for mdunlink().
328344
*/
329-
if (isRedo||forknum!=MAIN_FORKNUM||RelFileLocatorBackendIsTemp(rlocator))
345+
if (isRedo||IsBinaryUpgrade||forknum!=MAIN_FORKNUM||
346+
RelFileLocatorBackendIsTemp(rlocator))
330347
{
331348
if (!RelFileLocatorBackendIsTemp(rlocator))
332349
{
333-
intsave_errno;
334-
335350
/* Prevent other backends' fds from holding on to the disk space */
336351
ret=do_truncate(path);
337352

@@ -344,63 +359,56 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
344359
ret=0;
345360

346361
/* Next unlink the file, unless it was already found to be missing */
347-
if (ret==0||errno!=ENOENT)
362+
if (ret>=0||errno!=ENOENT)
348363
{
349364
ret=unlink(path);
350365
if (ret<0&&errno!=ENOENT)
366+
{
367+
save_errno=errno;
351368
ereport(WARNING,
352369
(errcode_for_file_access(),
353370
errmsg("could not remove file \"%s\": %m",path)));
354-
segno++;
371+
errno=save_errno;
372+
}
355373
}
356374
}
357375
else
358376
{
359377
/* Prevent other backends' fds from holding on to the disk space */
360378
ret=do_truncate(path);
361379

362-
/*
363-
* Except during a binary upgrade, register request to unlink first
364-
* segment later, rather than now.
365-
*
366-
* If we're performing a binary upgrade, the dangers described in the
367-
* header comments for mdunlink() do not exist, since after a crash or
368-
* even a simple ERROR, the upgrade fails and the whole new cluster
369-
* must be recreated from scratch. And, on the other hand, it is
370-
* important to remove the files from disk immediately, because we may
371-
* be about to reuse the same relfilenumber.
372-
*/
373-
if (!IsBinaryUpgrade)
374-
{
375-
register_unlink_segment(rlocator,forknum,0/* first seg */ );
376-
segno++;
377-
}
380+
/* Register request to unlink first segment later */
381+
save_errno=errno;
382+
register_unlink_segment(rlocator,forknum,0/* first seg */ );
383+
errno=save_errno;
378384
}
379385

380386
/*
381-
* Delete any remaining segments (we might or might not have dealt with
382-
* the first one above).
387+
* Delete any additional segments.
388+
*
389+
* Note that because we loop until getting ENOENT, we will correctly
390+
* remove all inactive segments as well as active ones. Ideally we'd
391+
* continue the loop until getting exactly that errno, but that risks an
392+
* infinite loop if the problem is directory-wide (for instance, if we
393+
* suddenly can't read the data directory itself). We compromise by
394+
* continuing after a non-ENOENT truncate error, but stopping after any
395+
* unlink error. If there is indeed a directory-wide problem, additional
396+
* unlink attempts wouldn't work anyway.
383397
*/
384-
if (ret >=0)
398+
if (ret >=0||errno!=ENOENT)
385399
{
386400
char*segpath= (char*)palloc(strlen(path)+12);
401+
BlockNumbersegno;
387402

388-
/*
389-
* Note that because we loop until getting ENOENT, we will correctly
390-
* remove all inactive segments as well as active ones.
391-
*/
392-
for (;;segno++)
403+
for (segno=1;;segno++)
393404
{
394-
if (segno==0)
395-
strcpy(segpath,path);
396-
else
397-
sprintf(segpath,"%s.%u",path,segno);
405+
sprintf(segpath,"%s.%u",path,segno);
398406

399407
if (!RelFileLocatorBackendIsTemp(rlocator))
400408
{
401409
/*
402410
* Prevent other backends' fds from holding on to the disk
403-
* space.
411+
* space. We're done if we see ENOENT, though.
404412
*/
405413
if (do_truncate(segpath)<0&&errno==ENOENT)
406414
break;

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp