forked from postgres/postgres
Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace #3
Open
ololobus wants to merge 3 commits into postgrespro:master_ci from ololobus:tablespace3
Conversation
REINDEX already does a full relation rewrite; this patch adds the possibility to specify a new tablespace where the new relfilenode will be created.
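For illustration, a minimal sketch of how the proposed form could be invoked, assuming the TABLE variant takes the same trailing TABLESPACE clause as the DATABASE variant shown in the example below; the tablespace, path, and table names here are made up, and CLUSTER/VACUUM FULL get analogous clauses per the patch description:

-- Hypothetical names/paths, shown only to sketch the proposed grammar.
CREATE TABLESPACE fast_ts LOCATION '/mnt/fast_disk/pg_tblspc';
CREATE TABLE test_tbl (id int PRIMARY KEY, payload text);
-- Rebuild the table's indexes, placing the new relfilenodes into fast_ts:
REINDEX TABLE test_tbl TABLESPACE fast_ts;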
Is this fine? It says "cannot reindex system catalogs concurrently" (once), and hits the pg_toast tables for information_schema. Should it skip toast indexes (like it said)? Or should it REINDEX them on the same tablespace?

template1=# REINDEX DATABASE CONCURRENTLY template1 TABLESPACE pg_default;
2020-03-09 15:33:51.792 CDT [6464] WARNING: cannot reindex system catalogs concurrently, skipping all
WARNING: cannot reindex system catalogs concurrently, skipping all
2020-03-09 15:33:51.794 CDT [6464] WARNING: skipping tablespace change of "pg_toast_12558_index"
2020-03-09 15:33:51.794 CDT [6464] DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
WARNING: skipping tablespace change of "pg_toast_12558_index"
DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
2020-03-09 15:33:51.924 CDT [6464] WARNING: skipping tablespace change of "pg_toast_12543_index"
2020-03-09 15:33:51.924 CDT [6464] DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
WARNING: skipping tablespace change of "pg_toast_12543_index"
DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
2020-03-09 15:33:51.982 CDT [6464] WARNING: skipping tablespace change of "pg_toast_12548_index"
2020-03-09 15:33:51.982 CDT [6464] DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
WARNING: skipping tablespace change of "pg_toast_12548_index"
DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
2020-03-09 15:33:52.048 CDT [6464] WARNING: skipping tablespace change of "pg_toast_12553_index"
2020-03-09 15:33:52.048 CDT [6464] DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
WARNING: skipping tablespace change of "pg_toast_12553_index"
DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
REINDEX
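A hedged way to check the outcome of such a run, i.e. which indexes actually ended up in which tablespace, using only the system catalogs (reltablespace = 0 means the database default):

-- List every index and the tablespace its relfilenode currently lives in.
SELECT c.relname AS index_name,
       COALESCE(t.spcname, 'database default') AS tablespace
  FROM pg_class c
  LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
 WHERE c.relkind = 'i'
 ORDER BY c.relname;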
The tablespace3 branch was force-pushed from 3adf730 to ec97e14.
nik1tam pushed a commit that referenced this pull request on Aug 15, 2022
We've heard a couple of reports of people having trouble with multi-gigabyte-sized query-texts files. It occurred to me that on 32-bit platforms, there could be an issue with integer overflow of calculations associated with the total query text size. Address that with several changes:

1. Limit pg_stat_statements.max to INT_MAX / 2 not INT_MAX. The hashtable code will bound it to that anyway unless "long" is 64 bits. We still need overflow guards on its use, but this helps.

2. Add a check to prevent extending the query-texts file to more than MaxAllocHugeSize. If it got that big, qtext_load_file would certainly fail, so there's not much point in allowing it. Without this, we'd need to consider whether extent, query_offset, and related variables shouldn't be off_t not size_t.

3. Adjust the comparisons in need_gc_qtexts() to be done in 64-bit arithmetic on all platforms. It appears possible that under duress those multiplications could overflow 32 bits, yielding a false conclusion that we need to garbage-collect the texts file, which could lead to repeatedly garbage-collecting after every hash table insertion.

Per report from Bruno da Silva. I'm not convinced that these issues fully explain his problem; there may be some other bug that's contributing to the query-texts file becoming so large in the first place. But it did get that big, so #2 is a reasonable defense, and #3 could explain the reported performance difficulties.

(See also commit 8bbe4cb, which addressed some related bugs. The second Discussion: link is the thread that led up to that.)

This issue is old, and is primarily a problem for old platforms, so back-patch.

Discussion: https://postgr.es/m/CAB+Nuk93fL1Q9eLOCotvLP07g7RAv4vbdrkm0cVQohDVMpAb9A@mail.gmail.com
Discussion: https://postgr.es/m/5601D354.5000703@BlueTreble.com
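As a rough, hedged illustration of the sizes the commit message is talking about (assuming the pg_stat_statements extension is installed): the entry limit is the pg_stat_statements.max GUC, and the external query-texts file grows roughly with the total length of the stored query strings, which can be estimated from the view itself:

SHOW pg_stat_statements.max;
-- Approximate volume of stored query text currently tracked.
SELECT count(*) AS entries,
       pg_size_pretty(sum(length(query))::bigint) AS approx_query_text_volume
  FROM pg_stat_statements;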
glukhovn pushed a commit that referenced this pull request on Feb 10, 2023
danolivo pushed a commit that referenced this pull request on Jul 2, 2024
1. TruncateMultiXact() performs the SLRU truncations in a critical section. Deleting the SLRU segments calls ForwardSyncRequest(), which will try to compact the request queue if it's full (CompactCheckpointerRequestQueue()). That in turn allocates memory, which is not allowed in a critical section. Backtrace:

TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
/lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing anything, if it's called in a critical section. That covers the above call path, as well as any other similar cases where RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum process doing that truncation can deadlock with the checkpointer process. TruncateMultiXact() sets "MyProc->delayChkptFlags |= DELAY_CHKPT_START". If the sync request queue is full and cannot be compacted, the process will repeatedly sleep and retry, until there is room in the queue. However, if the checkpointer is trying to start a checkpoint at the same time, and is waiting for the DELAY_CHKPT_START processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

#0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
#2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
#3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:538
#4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
#5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
#6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
#7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
#8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
#9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

#0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
#3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
#4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to finish.

Backpatch to v14. Before that, SLRU deletion didn't call RegisterSyncRequest, which avoided this failure. I'm not sure if there are other similar scenarios on older versions, but we haven't had any such reports.

Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
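Not part of the commit itself, but a hedged way to observe a hang like the one described above from SQL on a running cluster: pg_stat_activity exposes what the checkpointer and autovacuum workers are currently waiting on.

-- Show the current wait events of the background processes involved.
SELECT pid, backend_type, wait_event_type, wait_event
  FROM pg_stat_activity
 WHERE backend_type IN ('checkpointer', 'autovacuum worker');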