Commit eba917d
Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY
The async notification queue contains the XID of the sender, and when
processing notifications we call TransactionIdDidCommit() on the
XID. But we had no safeguards to prevent the CLOG segments containing
those XIDs from being truncated away. As a result, if a backend didn't
for some reason process its notifications for a long time, or when a
new backend issued LISTEN, you could get an error like:

    test=# listen c21;
    ERROR:  58P01: could not access status of transaction 14279685
    DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
    LOCATION:  SlruReportIOError, slru.c:1087

To fix, make VACUUM "freeze" the XIDs in the async notification queue
before truncating the CLOG. Old XIDs are replaced with
FrozenTransactionId or InvalidTransactionId.

Note: This commit is not a full fix. A race condition remains, where a
backend is executing asyncQueueReadAllNotifications() and has just
made a local copy of an async SLRU page which contains old XIDs, while
vacuum concurrently truncates the CLOG covering those XIDs. When the
backend then calls TransactionIdDidCommit() on those XIDs from the
local copy, you still get the error. The next commit will fix that
remaining race condition.

This was first reported by Sergey Zhuravlev in 2021, with many other
people hitting the same issue later. Thanks to:

- Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink
  for investigating and providing reproducible test cases,
- Matheus Alcantara and Arseniy Mukhin for review and earlier proposed
  patches to fix this,
- Álvaro Herrera and Masahiko Sawada for reviews,
- Yura Sokolov aka funny-falcon for the idea of marking transactions
  as committed in the notification queue, and
- Joel Jacobson for the final patch version.

I hope I didn't forget anyone.

Backpatch to all supported versions. I believe the bug goes back all
the way to commit d1e0272, which introduced the SLRU-based async
notification queue.

Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org
Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com
Backpatch-through: 14

1 parent 7cb05dd, commit eba917d
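
The failure and the fix described in the commit message can be illustrated with a toy model. The sketch below is not PostgreSQL code: `ToyClog`, `toy_status`, `toy_truncate` and `toy_freeze_one` are hypothetical names, the status of a normal XID is faked as "even means committed", and XIDs are compared with a plain `<` that ignores wraparound. It only aims to show why rewriting the stored XID before truncation removes the need for a CLOG lookup afterwards.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Same numeric values as PostgreSQL's special transaction IDs. */
#define INVALID_XID      ((Xid) 0)   /* InvalidTransactionId */
#define FROZEN_XID       ((Xid) 2)   /* FrozenTransactionId */
#define FIRST_NORMAL_XID ((Xid) 3)   /* FirstNormalTransactionId */

typedef enum ToyStatus
{
    TOY_COMMITTED,
    TOY_ABORTED,
    TOY_LOOKUP_FAILED            /* models "could not access status of transaction" */
} ToyStatus;

/* Hypothetical stand-in for the CLOG: only remembers where it was truncated. */
typedef struct ToyClog
{
    Xid oldest_kept;             /* status of XIDs below this is gone */
} ToyClog;

/* Toy status lookup. Special XIDs never touch the (toy) CLOG. */
static ToyStatus
toy_status(const ToyClog *clog, Xid xid)
{
    if (xid == FROZEN_XID)
        return TOY_COMMITTED;
    if (xid < FIRST_NORMAL_XID)
        return TOY_ABORTED;
    if (xid < clog->oldest_kept)
        return TOY_LOOKUP_FAILED;
    /* Assumption for the toy only: even XIDs committed, odd XIDs aborted. */
    return (xid % 2 == 0) ? TOY_COMMITTED : TOY_ABORTED;
}

/* Toy truncation: forget everything older than frozen_xid. */
static void
toy_truncate(ToyClog *clog, Xid frozen_xid)
{
    if (frozen_xid > clog->oldest_kept)
        clog->oldest_kept = frozen_xid;
}

/* Toy freeze of one stored XID; must run while the status is still readable. */
static Xid
toy_freeze_one(const ToyClog *clog, Xid xid, Xid frozen_xid)
{
    if (xid >= FIRST_NORMAL_XID && xid < frozen_xid)
        return (toy_status(clog, xid) == TOY_COMMITTED) ? FROZEN_XID : INVALID_XID;
    return xid;
}
```

In this model, an entry that still stores XID 100 fails its lookup once the toy CLOG is truncated at 200, while an entry frozen beforehand resolves without any lookup. In the real server the status lookup appears to return a fixed answer for the special XIDs without reading pg_xact, which would explain why this commit leaves the reading side untouched; treat that as an assumption to check against transam.c.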

3 files changed: 121 additions, 0 deletions
‎src/backend/commands/async.c‎

Lines changed: 111 additions & 0 deletions
@@ -2244,6 +2244,117 @@ asyncQueueAdvanceTail(void)
     LWLockRelease(NotifyQueueTailLock);
 }

+/*
+ * AsyncNotifyFreezeXids
+ *
+ * Prepare the async notification queue for CLOG truncation by freezing
+ * transaction IDs that are about to become inaccessible.
+ *
+ * This function is called by VACUUM before advancing datfrozenxid. It scans
+ * the notification queue and replaces XIDs that would become inaccessible
+ * after CLOG truncation with special markers:
+ * - Committed transactions are set to FrozenTransactionId
+ * - Aborted/crashed transactions are set to InvalidTransactionId
+ *
+ * Only XIDs < newFrozenXid are processed, as those are the ones whose CLOG
+ * pages will be truncated. If XID < newFrozenXid, it cannot still be running
+ * (or it would have held back newFrozenXid through ProcArray).
+ * Therefore, if TransactionIdDidCommit returns false, we know the transaction
+ * either aborted explicitly or crashed, and we can safely mark it invalid.
+ */
+void
+AsyncNotifyFreezeXids(TransactionId newFrozenXid)
+{
+    QueuePosition pos;
+    QueuePosition head;
+    int64       curpage = -1;
+    int         slotno = -1;
+    char       *page_buffer = NULL;
+    bool        page_dirty = false;
+
+    /*
+     * Acquire locks in the correct order to avoid deadlocks. As per the
+     * locking protocol: NotifyQueueTailLock, then NotifyQueueLock, then
+     * NotifySLRULock.
+     *
+     * We only need SHARED mode since we're just reading the head/tail
+     * positions, not modifying them.
+     */
+    LWLockAcquire(NotifyQueueTailLock, LW_SHARED);
+    LWLockAcquire(NotifyQueueLock, LW_SHARED);
+
+    pos = QUEUE_TAIL;
+    head = QUEUE_HEAD;
+
+    /* Release NotifyQueueLock early, we only needed to read the positions */
+    LWLockRelease(NotifyQueueLock);
+
+    /*
+     * Scan the queue from tail to head, freezing XIDs as needed. We hold
+     * NotifyQueueTailLock throughout to ensure the tail doesn't move while
+     * we're working.
+     */
+    while (!QUEUE_POS_EQUAL(pos, head))
+    {
+        AsyncQueueEntry *qe;
+        TransactionId xid;
+        int64       pageno = QUEUE_POS_PAGE(pos);
+        int         offset = QUEUE_POS_OFFSET(pos);
+
+        /* If we need a different page, release old lock and get new one */
+        if (pageno != curpage)
+        {
+            /* Release previous page if any */
+            if (slotno >= 0)
+            {
+                if (page_dirty)
+                {
+                    NotifyCtl->shared->page_dirty[slotno] = true;
+                    page_dirty = false;
+                }
+                LWLockRelease(NotifySLRULock);
+            }
+
+            LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+            slotno = SimpleLruReadPage(NotifyCtl, pageno, true,
+                                       InvalidTransactionId);
+            page_buffer = NotifyCtl->shared->page_buffer[slotno];
+            curpage = pageno;
+        }
+
+        qe = (AsyncQueueEntry *) (page_buffer + offset);
+        xid = qe->xid;
+
+        if (TransactionIdIsNormal(xid) &&
+            TransactionIdPrecedes(xid, newFrozenXid))
+        {
+            if (TransactionIdDidCommit(xid))
+            {
+                qe->xid = FrozenTransactionId;
+                page_dirty = true;
+            }
+            else
+            {
+                qe->xid = InvalidTransactionId;
+                page_dirty = true;
+            }
+        }
+
+        /* Advance to next entry */
+        asyncQueueAdvance(&pos, qe->length);
+    }
+
+    /* Release final page lock if we acquired one */
+    if (slotno >= 0)
+    {
+        if (page_dirty)
+            NotifyCtl->shared->page_dirty[slotno] = true;
+        LWLockRelease(NotifySLRULock);
+    }
+
+    LWLockRelease(NotifyQueueTailLock);
+}
+
 /*
  * ProcessIncomingNotify
  *
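
The "XID < newFrozenXid" test in the function comment is a circular comparison, not a plain integer one: TransactionIdPrecedes() compares normal XIDs modulo 2^32 by looking at the sign of their 32-bit difference. The standalone sketch below mimics that rule together with the freeze decision. It is a simplified model rather than the server's code: `ToyEntry`, `toy_freeze_xids`, the `DidCommitFn` callback and `toy_even_committed` are hypothetical names, and there is no locking or SLRU page handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Same numeric values as PostgreSQL's special transaction IDs. */
#define INVALID_XID      ((Xid) 0)   /* InvalidTransactionId */
#define FROZEN_XID       ((Xid) 2)   /* FrozenTransactionId */
#define FIRST_NORMAL_XID ((Xid) 3)   /* FirstNormalTransactionId */

/* Hypothetical queue entry; the real AsyncQueueEntry carries more fields. */
typedef struct ToyEntry
{
    Xid xid;
} ToyEntry;

/* Hypothetical callback standing in for TransactionIdDidCommit(). */
typedef bool (*DidCommitFn) (Xid xid);

/* Counterpart of TransactionIdIsNormal(). */
static bool
xid_is_normal(Xid xid)
{
    return xid >= FIRST_NORMAL_XID;
}

/*
 * Circular "a is older than b" for two normal XIDs: the unsigned difference
 * is reinterpreted as signed, so XIDs just before a wraparound still count
 * as older than small XIDs just after it. (The narrowing cast relies on
 * two's complement behavior, as mainstream compilers provide.)
 */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Rewrite every normal XID older than new_frozen_xid; return how many changed. */
static size_t
toy_freeze_xids(ToyEntry *entries, size_t n, Xid new_frozen_xid,
                DidCommitFn did_commit)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++)
    {
        Xid xid = entries[i].xid;

        if (xid_is_normal(xid) && xid_precedes(xid, new_frozen_xid))
        {
            entries[i].xid = did_commit(xid) ? FROZEN_XID : INVALID_XID;
            changed++;
        }
    }
    return changed;
}

/* Toy status oracle, for illustration only: even XIDs count as committed. */
static bool
toy_even_committed(Xid xid)
{
    return (xid % 2) == 0;
}
```

With a horizon of 10 just after a wraparound, an entry holding 4294967000 is treated as older and gets frozen, while an entry holding 100 is treated as newer and left alone. Entries that already hold a special XID are skipped, so running the pass repeatedly does not change them again.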

‎src/backend/commands/vacuum.c‎

Lines changed: 7 additions & 0 deletions
@@ -35,6 +35,7 @@
 #include "catalog/pg_database.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_namespace.h"
+#include "commands/async.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
 #include "commands/vacuum.h"
@@ -1788,6 +1789,12 @@ vac_truncate_clog(TransactionId frozenXID,
         return;
     }

+    /*
+     * Freeze any old transaction IDs in the async notification queue before
+     * CLOG truncation.
+     */
+    AsyncNotifyFreezeXids(frozenXID);
+
     /*
      * Advance the oldest value for commit timestamps before truncating, so
      * that if a user requests a timestamp for a transaction we're truncating

‎src/include/commands/async.h‎

Lines changed: 3 additions & 0 deletions
@@ -51,4 +51,7 @@ extern void HandleNotifyInterrupt(void);
 /* process interrupts */
 extern void ProcessNotifyInterrupt(bool flush);

+/* freeze old transaction IDs in notify queue (called by VACUUM) */
+extern void AsyncNotifyFreezeXids(TransactionId newFrozenXid);
+
 #endif                          /* ASYNC_H */
