Commit 6760a4d

Documentation improvement and minor code cleanups for the latch facility.
Improve the documentation around weak-memory-ordering risks, and do a pass of general editorialization on the comments in the latch code. Make the Windows latch code more like the Unix latch code where feasible; in particular provide the same Assert checks in both implementations. Fix poorly-placed WaitLatch call in syncrep.c.

This patch resolves, for the moment, concerns around weak-memory-ordering bugs in latch-related code: we have documented the restrictions and checked that existing calls meet them. In 9.2 I hope that we will install suitable memory barrier instructions in SetLatch/ResetLatch, so that their callers don't need to be quite so careful.
1 parent 028a0c5 · commit 6760a4d

File tree: 5 files changed, +163 −102 lines


src/backend/port/unix_latch.c

Lines changed: 71 additions & 78 deletions

@@ -3,60 +3,6 @@
  * unix_latch.c
  *    Routines for inter-process latches
  *
- * A latch is a boolean variable, with operations that let you to sleep
- * until it is set.  A latch can be set from another process, or a signal
- * handler within the same process.
- *
- * The latch interface is a reliable replacement for the common pattern of
- * using pg_usleep() or select() to wait until a signal arrives, where the
- * signal handler sets a global variable.  Because on some platforms, an
- * incoming signal doesn't interrupt sleep, and even on platforms where it
- * does there is a race condition if the signal arrives just before
- * entering the sleep, the common pattern must periodically wake up and
- * poll the global variable.  pselect() system call was invented to solve
- * the problem, but it is not portable enough.  Latches are designed to
- * overcome these limitations, allowing you to sleep without polling and
- * ensuring a quick response to signals from other processes.
- *
- * There are two kinds of latches: local and shared.  A local latch is
- * initialized by InitLatch, and can only be set from the same process.
- * A local latch can be used to wait for a signal to arrive, by calling
- * SetLatch in the signal handler.  A shared latch resides in shared memory,
- * and must be initialized at postmaster startup by InitSharedLatch.  Before
- * a shared latch can be waited on, it must be associated with a process
- * with OwnLatch.  Only the process owning the latch can wait on it, but any
- * process can set it.
- *
- * There are three basic operations on a latch:
- *
- * SetLatch    - Sets the latch
- * ResetLatch  - Clears the latch, allowing it to be set again
- * WaitLatch   - Waits for the latch to become set
- *
- * The correct pattern to wait for an event is:
- *
- * for (;;)
- * {
- *     ResetLatch();
- *     if (work to do)
- *         Do Stuff();
- *
- *     WaitLatch();
- * }
- *
- * It's important to reset the latch *before* checking if there's work to
- * do.  Otherwise, if someone sets the latch between the check and the
- * ResetLatch call, you will miss it and Wait will block.
- *
- * To wake up the waiter, you must first set a global flag or something
- * else that the main loop tests in the "if (work to do)" part, and call
- * SetLatch *after* that.  SetLatch is designed to return quickly if the
- * latch is already set.
- *
- *
- * Implementation
- * --------------
- *
  * The Unix implementation uses the so-called self-pipe trick to overcome
  * the race condition involved with select() and setting a global flag
  * in the signal handler. When a latch is set and the current process
@@ -65,8 +11,8 @@
  * interrupt select() on all platforms, and even on platforms where it
  * does, a signal that arrives just before the select() call does not
  * prevent the select() from entering sleep. An incoming byte on a pipe
- * however reliably interrupts the sleep, and makes select() to return
- * immediately if the signal arrives just before select() begins.
+ * however reliably interrupts the sleep, and causes select() to return
+ * immediately even if the signal arrives before select() begins.
  *
  * When SetLatch is called from the same process that owns the latch,
  * SetLatch writes the byte directly to the pipe. If it's owned by another
@@ -99,7 +45,7 @@
 /* Are we currently in WaitLatch? The signal handler would like to know. */
 static volatile sig_atomic_t waiting = false;
 
-/* Read and write end of the self-pipe */
+/* Read and write ends of the self-pipe */
 static int  selfpipe_readfd = -1;
 static int  selfpipe_writefd = -1;
 
@@ -115,7 +61,7 @@ static void sendSelfPipeByte(void);
 void
 InitLatch(volatile Latch *latch)
 {
-    /* Initialize the selfpipe if this is our first latch in the process */
+    /* Initialize the self-pipe if this is our first latch in the process */
     if (selfpipe_readfd == -1)
         initSelfPipe();
 
@@ -126,13 +72,14 @@ InitLatch(volatile Latch *latch)
 
 /*
  * Initialize a shared latch that can be set from other processes. The latch
- * is initially owned by no-one, use OwnLatch to associate it with the
+ * is initially owned by no-one; use OwnLatch to associate it with the
  * current process.
  *
  * InitSharedLatch needs to be called in postmaster before forking child
  * processes, usually right after allocating the shared memory block
- * containing the latch with ShmemInitStruct. The Unix implementation
- * doesn't actually require that, but the Windows one does.
+ * containing the latch with ShmemInitStruct. (The Unix implementation
+ * doesn't actually require that, but the Windows one does.) Because of
+ * this restriction, we have no concurrency issues to worry about here.
  */
 void
 InitSharedLatch(volatile Latch *latch)
@@ -144,23 +91,30 @@ InitSharedLatch(volatile Latch *latch)
 
 /*
  * Associate a shared latch with the current process, allowing it to
- * wait on it.
+ * wait on the latch.
  *
- * Make sure that latch_sigusr1_handler() is called from the SIGUSR1 signal
- * handler, as shared latches use SIGUSR1 to for inter-process communication.
+ * Although there is a sanity check for latch-already-owned, we don't do
+ * any sort of locking here, meaning that we could fail to detect the error
+ * if two processes try to own the same latch at about the same time.  If
+ * there is any risk of that, caller must provide an interlock to prevent it.
+ *
+ * In any process that calls OwnLatch(), make sure that
+ * latch_sigusr1_handler() is called from the SIGUSR1 signal handler,
+ * as shared latches use SIGUSR1 for inter-process communication.
  */
 void
 OwnLatch(volatile Latch *latch)
 {
     Assert(latch->is_shared);
 
-    /* Initialize the selfpipe if this is our first latch in the process */
+    /* Initialize the self-pipe if this is our first latch in this process */
     if (selfpipe_readfd == -1)
         initSelfPipe();
 
     /* sanity check */
     if (latch->owner_pid != 0)
         elog(ERROR, "latch already owned");
+
     latch->owner_pid = MyProcPid;
 }
 
@@ -172,6 +126,7 @@ DisownLatch(volatile Latch *latch)
 {
     Assert(latch->is_shared);
     Assert(latch->owner_pid == MyProcPid);
+
     latch->owner_pid = 0;
 }
 
@@ -229,29 +184,38 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
         int         hifd;
 
         /*
-         * Clear the pipe, and check if the latch is set already. If someone
+         * Clear the pipe, then check if the latch is set already. If someone
          * sets the latch between this and the select() below, the setter will
         * write a byte to the pipe (or signal us and the signal handler will
         * do that), and the select() will return immediately.
+         *
+         * Note: we assume that the kernel calls involved in drainSelfPipe()
+         * and SetLatch() will provide adequate synchronization on machines
+         * with weak memory ordering, so that we cannot miss seeing is_set
+         * if the signal byte is already in the pipe when we drain it.
         */
         drainSelfPipe();
+
         if (latch->is_set)
         {
             result = 1;
             break;
         }
 
+        /* Must wait ... set up the event masks for select() */
         FD_ZERO(&input_mask);
+        FD_ZERO(&output_mask);
+
         FD_SET(selfpipe_readfd, &input_mask);
         hifd = selfpipe_readfd;
+
         if (sock != PGINVALID_SOCKET && forRead)
         {
             FD_SET(sock, &input_mask);
             if (sock > hifd)
                 hifd = sock;
         }
 
-        FD_ZERO(&output_mask);
         if (sock != PGINVALID_SOCKET && forWrite)
         {
             FD_SET(sock, &output_mask);
@@ -288,14 +252,23 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 }
 
 /*
- * Sets a latch and wakes up anyone waiting on it. Returns quickly if the
- * latch is already set.
+ * Sets a latch and wakes up anyone waiting on it.
+ *
+ * This is cheap if the latch is already set, otherwise not so much.
 */
 void
 SetLatch(volatile Latch *latch)
 {
     pid_t       owner_pid;
 
+    /*
+     * XXX there really ought to be a memory barrier operation right here,
+     * to ensure that any flag variables we might have changed get flushed
+     * to main memory before we check/set is_set.  Without that, we have to
+     * require that callers provide their own synchronization for machines
+     * with weak memory ordering (see latch.h).
+     */
+
     /* Quick exit if already set */
     if (latch->is_set)
         return;
@@ -307,13 +280,21 @@ SetLatch(volatile Latch *latch)
      * we're in a signal handler. We use the self-pipe to wake up the select()
      * in that case. If it's another process, send a signal.
     *
-     * Fetch owner_pid only once, in case the owner simultaneously disowns the
-     * latch and clears owner_pid. XXX: This assumes that pid_t is atomic,
-     * which isn't guaranteed to be true! In practice, the effective range of
-     * pid_t fits in a 32 bit integer, and so should be atomic. In the worst
-     * case, we might end up signaling wrong process if the right one disowns
-     * the latch just as we fetch owner_pid. Even then, you're very unlucky if
-     * a process with that bogus pid exists.
+     * Fetch owner_pid only once, in case the latch is concurrently getting
+     * owned or disowned. XXX: This assumes that pid_t is atomic, which isn't
+     * guaranteed to be true! In practice, the effective range of pid_t fits
+     * in a 32 bit integer, and so should be atomic. In the worst case, we
+     * might end up signaling the wrong process. Even then, you're very
+     * unlucky if a process with that bogus pid exists and belongs to
+     * Postgres; and PG database processes should handle excess SIGUSR1
+     * interrupts without a problem anyhow.
+     *
+     * Another sort of race condition that's possible here is for a new process
+     * to own the latch immediately after we look, so we don't signal it.
+     * This is okay so long as all callers of ResetLatch/WaitLatch follow the
+     * standard coding convention of waiting at the bottom of their loops,
+     * not the top, so that they'll correctly process latch-setting events that
+     * happen before they enter the loop.
     */
     owner_pid = latch->owner_pid;
     if (owner_pid == 0)
@@ -335,11 +316,23 @@ ResetLatch(volatile Latch *latch)
     Assert(latch->owner_pid == MyProcPid);
 
     latch->is_set = false;
+
+    /*
+     * XXX there really ought to be a memory barrier operation right here, to
+     * ensure that the write to is_set gets flushed to main memory before we
+     * examine any flag variables.  Otherwise a concurrent SetLatch might
+     * falsely conclude that it needn't signal us, even though we have missed
+     * seeing some flag updates that SetLatch was supposed to inform us of.
+     * For the moment, callers must supply their own synchronization of flag
+     * variables (see latch.h).
+     */
 }
 
 /*
- * SetLatch uses SIGUSR1 to wake up the process waiting on the latch. Wake
- * up WaitLatch.
+ * SetLatch uses SIGUSR1 to wake up the process waiting on the latch.
+ *
+ * Wake up WaitLatch, if we're waiting.  (We might not be, since SIGUSR1 is
+ * overloaded for multiple purposes.)
 */
 void
 latch_sigusr1_handler(void)

src/backend/port/win32_latch.c

Lines changed: 13 additions & 12 deletions

@@ -1,9 +1,10 @@
 /*-------------------------------------------------------------------------
  *
  * win32_latch.c
- *    Windows implementation of latches.
+ *    Routines for inter-process latches
  *
- * See unix_latch.c for information on usage.
+ * See unix_latch.c for header comments for the exported functions;
+ * the API presented here is supposed to be the same as there.
  *
  * The Windows implementation uses Windows events that are inherited by
  * all postmaster child processes.
@@ -23,7 +24,6 @@
 #include <unistd.h>
 
 #include "miscadmin.h"
-#include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 
@@ -88,7 +88,7 @@ WaitLatch(volatile Latch *latch, long timeout)
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
+WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
                   bool forWrite, long timeout)
 {
     DWORD       rc;
@@ -98,6 +98,9 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
     int         numevents;
     int         result = 0;
 
+    if (latch->owner_pid != MyProcPid)
+        elog(ERROR, "cannot wait on a latch owned by another process");
+
     latchevent = latch->event;
 
     events[0] = latchevent;
@@ -187,15 +190,10 @@ SetLatch(volatile Latch *latch)
 
     /*
      * See if anyone's waiting for the latch. It can be the current process if
-     * we're in a signal handler. Use a local variable here in case the latch
-     * is just disowned between the test and the SetEvent call, and event
-     * field set to NULL.
+     * we're in a signal handler.
      *
-     * Fetch handle field only once, in case the owner simultaneously disowns
-     * the latch and clears handle. This assumes that HANDLE is atomic, which
-     * isn't guaranteed to be true! In practice, it should be, and in the
-     * worst case we end up calling SetEvent with a bogus handle, and SetEvent
-     * will return an error with no harm done.
+     * Use a local variable here just in case somebody changes the event field
+     * concurrently (which really should not happen).
     */
     handle = latch->event;
     if (handle)
@@ -212,5 +210,8 @@ SetLatch(volatile Latch *latch)
 void
 ResetLatch(volatile Latch *latch)
 {
+    /* Only the owner should reset the latch */
+    Assert(latch->owner_pid == MyProcPid);
+
     latch->is_set = false;
 }

src/backend/replication/syncrep.c

Lines changed: 13 additions & 7 deletions

@@ -166,13 +166,6 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
     {
         int         syncRepState;
 
-        /*
-         * Wait on latch for up to 60 seconds. This allows us to check for
-         * postmaster death regularly while waiting. Note that timeout here
-         * does not necessarily release from loop.
-         */
-        WaitLatch(&MyProc->waitLatch, 60000000L);
-
         /* Must reset the latch before testing state. */
         ResetLatch(&MyProc->waitLatch);
 
@@ -184,6 +177,12 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
         * walsender changes the state to SYNC_REP_WAIT_COMPLETE, it will
         * never update it again, so we can't be seeing a stale value in that
         * case.
+         *
+         * Note: on machines with weak memory ordering, the acquisition of
+         * the lock is essential to avoid race conditions: we cannot be sure
+         * the sender's state update has reached main memory until we acquire
+         * the lock.  We could get rid of this dance if SetLatch/ResetLatch
+         * contained memory barriers.
         */
         syncRepState = MyProc->syncRepState;
         if (syncRepState == SYNC_REP_WAITING)
@@ -246,6 +245,13 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
             SyncRepCancelWait();
             break;
         }
+
+        /*
+         * Wait on latch for up to 60 seconds. This allows us to check for
+         * cancel/die signal or postmaster death regularly while waiting. Note
+         * that timeout here does not necessarily release from loop.
+         */
+        WaitLatch(&MyProc->waitLatch, 60000000L);
     }
 
     /*

src/backend/storage/lmgr/proc.c

Lines changed: 1 addition & 1 deletion

@@ -338,7 +338,7 @@ InitProcess(void)
     MyProc->waitLSN.xrecoff = 0;
     MyProc->syncRepState = SYNC_REP_NOT_WAITING;
     SHMQueueElemInit(&(MyProc->syncRepLinks));
-    OwnLatch((Latch *) &MyProc->waitLatch);
+    OwnLatch(&MyProc->waitLatch);
 
     /*
      * We might be reusing a semaphore that belonged to a failed process. So

