Commit 7e2a18a

Perform an immediate shutdown if the postmaster.pid file is removed.
The postmaster now checks every minute or so (worst case, at most two minutes) that postmaster.pid is still there and still contains its own PID. If not, it performs an immediate shutdown, as though it had received SIGQUIT.

The original goal behind this change was to ensure that failed buildfarm runs would get fully cleaned up, even if the test scripts had left a postmaster running, which is not an infrequent occurrence. When the buildfarm script removes a test postmaster's $PGDATA directory, its next check on postmaster.pid will fail and cause it to exit. Previously, manual intervention was often needed to get rid of such orphaned postmasters, since they'd block new test postmasters from obtaining the expected socket address.

However, by checking postmaster.pid and not something else, we can provide additional robustness: manual removal of postmaster.pid is a frequent DBA mistake, and now we can at least limit the damage that will ensue if a new postmaster is started while the old one is still alive.

Back-patch to all supported branches, since we won't get the desired improvement in buildfarm reliability otherwise.
1 parent 8f6bb85, commit 7e2a18a
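
The "every minute or so (worst case, at most two minutes)" bound in the commit message can be reproduced with a small simulation. This is a hypothetical sketch, not PostgreSQL code: `simulate_worst_gap` and `CHECK_INTERVAL` are names invented here, and the only assumptions taken from the patch are that a recheck fires once at least 60 seconds have elapsed since the previous one and that the loop may sleep up to 60 seconds between iterations.

```c
#include <assert.h>
#include <stddef.h>

#define CHECK_INTERVAL 60       /* seconds between rechecks, as in the patch */

/*
 * Hypothetical helper: given the times (in seconds) at which the main loop
 * wakes up, return the largest gap between two consecutive rechecks.
 * "start" is the time of the initial recheck reference point.
 */
static long
simulate_worst_gap(const long *wake_times, int n, long start)
{
    long last_check = start;
    long worst = 0;

    assert(wake_times != NULL && n >= 0);

    for (int i = 0; i < n; i++)
    {
        long now = wake_times[i];

        /* Same shape of test as the patch: only act once the interval is up. */
        if (now - last_check >= CHECK_INTERVAL)
        {
            if (now - last_check > worst)
                worst = now - last_check;
            last_check = now;
        }
    }
    return worst;
}
```

With wake-ups at 59, 119, 179 and 239 seconds and a starting reference of 0, the wake-up at 59 seconds is one second too early, so the first recheck happens at 119 seconds. That is the worst case: just under two minutes.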

3 files changed: +115 -14 lines changed


src/backend/postmaster/postmaster.c

Lines changed: 44 additions & 14 deletions
@@ -1602,9 +1602,10 @@ ServerLoop(void)
     fd_set      readmask;
     int         nSockets;
     time_t      now,
+                last_lockfile_recheck_time,
                 last_touch_time;

-    last_touch_time = time(NULL);
+    last_lockfile_recheck_time = last_touch_time = time(NULL);

     nSockets = initMasks(&readmask);

@@ -1754,19 +1755,6 @@ ServerLoop(void)
         if (StartWorkerNeeded || HaveCrashedWorker)
             maybe_start_bgworker();

-        /*
-         * Touch Unix socket and lock files every 58 minutes, to ensure that
-         * they are not removed by overzealous /tmp-cleaning tasks. We assume
-         * no one runs cleaners with cutoff times of less than an hour ...
-         */
-        now = time(NULL);
-        if (now - last_touch_time >= 58 * SECS_PER_MINUTE)
-        {
-            TouchSocketFiles();
-            TouchSocketLockFiles();
-            last_touch_time = now;
-        }
-
 #ifdef HAVE_PTHREAD_IS_THREADED_NP

         /*
@@ -1793,6 +1781,48 @@ ServerLoop(void)
             /* reset flag so we don't SIGKILL again */
             AbortStartTime = 0;
         }
+
+        /*
+         * Lastly, check to see if it's time to do some things that we don't
+         * want to do every single time through the loop, because they're a
+         * bit expensive. Note that there's up to a minute of slop in when
+         * these tasks will be performed, since DetermineSleepTime() will let
+         * us sleep at most that long.
+         */
+        now = time(NULL);
+
+        /*
+         * Once a minute, verify that postmaster.pid hasn't been removed or
+         * overwritten. If it has, we force a shutdown. This avoids having
+         * postmasters and child processes hanging around after their database
+         * is gone, and maybe causing problems if a new database cluster is
+         * created in the same place. It also provides some protection
+         * against a DBA foolishly removing postmaster.pid and manually
+         * starting a new postmaster. Data corruption is likely to ensue from
+         * that anyway, but we can minimize the damage by aborting ASAP.
+         */
+        if (now - last_lockfile_recheck_time >= 1 * SECS_PER_MINUTE)
+        {
+            if (!RecheckDataDirLockFile())
+            {
+                ereport(LOG,
+                        (errmsg("performing immediate shutdown because data directory lock file is invalid")));
+                kill(MyProcPid, SIGQUIT);
+            }
+            last_lockfile_recheck_time = now;
+        }
+
+        /*
+         * Touch Unix socket and lock files every 58 minutes, to ensure that
+         * they are not removed by overzealous /tmp-cleaning tasks. We assume
+         * no one runs cleaners with cutoff times of less than an hour ...
+         */
+        if (now - last_touch_time >= 58 * SECS_PER_MINUTE)
+        {
+            TouchSocketFiles();
+            TouchSocketLockFiles();
+            last_touch_time = now;
+        }
     }
 }

src/backend/utils/init/miscinit.c

Lines changed: 70 additions & 0 deletions
@@ -1218,6 +1218,76 @@ AddToDataDirLockFile(int target_line, const char *str)
 }


+/*
+ * Recheck that the data directory lock file still exists with expected
+ * content. Return TRUE if the lock file appears OK, FALSE if it isn't.
+ *
+ * We call this periodically in the postmaster. The idea is that if the
+ * lock file has been removed or replaced by another postmaster, we should
+ * do a panic database shutdown. Therefore, we should return TRUE if there
+ * is any doubt: we do not want to cause a panic shutdown unnecessarily.
+ * Transient failures like EINTR or ENFILE should not cause us to fail.
+ * (If there really is something wrong, we'll detect it on a future recheck.)
+ */
+bool
+RecheckDataDirLockFile(void)
+{
+    int         fd;
+    int         len;
+    long        file_pid;
+    char        buffer[BLCKSZ];
+
+    fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0);
+    if (fd < 0)
+    {
+        /*
+         * There are many foreseeable false-positive error conditions. For
+         * safety, fail only on enumerated clearly-something-is-wrong
+         * conditions.
+         */
+        switch (errno)
+        {
+            case ENOENT:
+            case ENOTDIR:
+                /* disaster */
+                ereport(LOG,
+                        (errcode_for_file_access(),
+                         errmsg("could not open file \"%s\": %m",
+                                DIRECTORY_LOCK_FILE)));
+                return false;
+            default:
+                /* non-fatal, at least for now */
+                ereport(LOG,
+                        (errcode_for_file_access(),
+                         errmsg("could not open file \"%s\": %m; continuing anyway",
+                                DIRECTORY_LOCK_FILE)));
+                return true;
+        }
+    }
+    len = read(fd, buffer, sizeof(buffer) - 1);
+    if (len < 0)
+    {
+        ereport(LOG,
+                (errcode_for_file_access(),
+                 errmsg("could not read from file \"%s\": %m",
+                        DIRECTORY_LOCK_FILE)));
+        close(fd);
+        return true;            /* treat read failure as nonfatal */
+    }
+    buffer[len] = '\0';
+    close(fd);
+    file_pid = atol(buffer);
+    if (file_pid == getpid())
+        return true;            /* all is well */
+
+    /* Trouble: someone's overwritten the lock file */
+    ereport(LOG,
+            (errmsg("lock file \"%s\" contains wrong PID: %ld instead of %ld",
+                    DIRECTORY_LOCK_FILE, file_pid, (long) getpid())));
+    return false;
+}
+
+
 /*-------------------------------------------------------------------------
  *              Version checking support
  *-------------------------------------------------------------------------
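
The "return TRUE if there is any doubt" policy of RecheckDataDirLockFile() can be exercised outside the server with a reduced analogue. The sketch below is an illustration under stated assumptions, not the committed function: `lock_file_looks_ok` and `write_pid_file` are hypothetical names, logging is omitted, the file is opened read-only instead of with O_RDWR | PG_BINARY, and a small fixed buffer stands in for BLCKSZ.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical analogue of the recheck: return false only when the file is
 * clearly missing or starts with a different PID; every doubtful case
 * (other open errors, read errors) is reported as "OK".
 */
static bool
lock_file_looks_ok(const char *path, long expected_pid)
{
    char    buffer[64];
    ssize_t len;
    int     fd = open(path, O_RDONLY);

    if (fd < 0)
        return !(errno == ENOENT || errno == ENOTDIR);

    len = read(fd, buffer, sizeof(buffer) - 1);
    close(fd);
    if (len < 0)
        return true;            /* read failure: give the benefit of the doubt */

    buffer[len] = '\0';
    return atol(buffer) == expected_pid;    /* PID is the leading number */
}

/* Hypothetical helper for experiments: write "pid\n" as the file's first line. */
static bool
write_pid_file(const char *path, long pid)
{
    FILE   *fp = fopen(path, "w");

    if (fp == NULL)
        return false;
    fprintf(fp, "%ld\n", pid);
    return fclose(fp) == 0;
}
```

Writing the caller's own PID makes the check pass. Overwriting the file with another PID, or unlinking it, makes it fail, which corresponds to the two cases where the patch forces an immediate shutdown.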

src/include/miscadmin.h

Lines changed: 1 addition & 0 deletions
@@ -453,6 +453,7 @@ extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster,
                      const char *socketDir);
 extern void TouchSocketLockFiles(void);
 extern void AddToDataDirLockFile(int target_line, const char *str);
+extern bool RecheckDataDirLockFile(void);
 extern void ValidatePgVersion(const char *path);
 extern void process_shared_preload_libraries(void);
 extern void process_session_preload_libraries(void);
