Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit8f83ce8

Browse files
committed
Fix pg_dump for hash partitioning on enum columns.
Hash partitioning on an enum is problematic because the hash codes arederived from the OIDs assigned to the enum values, which will almostcertainly be different after a dump-and-reload than they were before.This means that some rows probably end up in different partitions thanbefore, causing restore to fail because of partition constraintviolations. (pg_upgrade dodges this problem by using hacks to forcethe enum values to keep the same OIDs, but that's not possible nordesirable for pg_dump.)Users can work around that by specifying --load-via-partition-root,but since that's a dump-time not restore-time decision, one mightfind out the need for it far too late. Instead, teach pg_dump toapply that option automatically when dealing with a partitionedtable that has hash-on-enum partitioning.Also deal with a pre-existing issue for --load-via-partition-rootmode: in a parallel restore, we try to TRUNCATE target tables justbefore loading them, in order to enable some backend optimizations.This is bad when using --load-via-partition-root because (a) we'relikely to suffer deadlocks from restore jobs trying to restore rowsinto other partitions than they came from, and (b) if we miss gettinga deadlock we might still lose data due to a TRUNCATE removing rowsfrom some already-completed restore job.The fix for this is conceptually simple: just don't TRUNCATE if we'redealing with a --load-via-partition-root case. The tricky bit is forpg_restore to identify those cases. In dumps using COPY commands wecan inspect each COPY command to see if it targets the nominal targettable or some ancestor. However, in dumps using INSERT commands it'spretty impractical to examine the INSERTs in advance. To provide asolution for that going forward, modify pg_dump to mark TABLE DATAitems that are using --load-via-partition-root with a comment.(This change also responds to a complaint from Robert Haas thatthe dump output for --load-via-partition-root is pretty confusing.)pg_restore checks for the special comment as well as checking theCOPY command if present. This will fail to identify the combinationof --load-via-partition-root and --inserts in pre-existing dump files,but that should be a pretty rare case in the field. If it doeshappen you will probably get a deadlock failure that you can workaround by not using parallel restore, which is the same as beforethis bug fix.Having done this, there seems no remaining reason for the alarmismin the pg_dump man page about combining --load-via-partition-rootwith parallel restore, so remove that warning.Patch by me; thanks to Julien Rouhaud for review. Back-patch tov11 where hash partitioning was introduced.Discussion:https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
1 parentdbe926b commit8f83ce8

File tree

7 files changed

+287
-52
lines changed

7 files changed

+287
-52
lines changed

‎doc/src/sgml/ref/pg_dump.sgml

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -800,16 +800,6 @@ PostgreSQL documentation
800800
and the two systems have different definitions of the collation used
801801
to sort the partitioning column.
802802
</para>
803-
804-
<para>
805-
It is best not to use parallelism when restoring from an archive made
806-
with this option, because <application>pg_restore</application> will
807-
not know exactly which partition(s) a given archive data item will
808-
load data into. This could result in inefficiency due to lock
809-
conflicts between parallel jobs, or perhaps even restore failures due
810-
to foreign key constraints being set up before all the relevant data
811-
is loaded.
812-
</para>
813803
</listitem>
814804
</varlistentry>
815805

‎doc/src/sgml/ref/pg_dumpall.sgml

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -360,10 +360,6 @@ PostgreSQL documentation
360360
and the two systems have different definitions of the collation used
361361
to sort the partitioning column.
362362
</para>
363-
364-
<!-- Currently, we don't need pg_dump's warning about parallelism here,
365-
since parallel restore from a pg_dumpall script is impossible.
366-
-->
367363
</listitem>
368364
</varlistentry>
369365

‎src/bin/pg_dump/common.c

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -230,6 +230,9 @@ getSchemaData(Archive *fout, int *numTablesPtr)
230230
pg_log_info("flagging inherited columns in subtables");
231231
flagInhAttrs(fout->dopt,tblinfo,numTables);
232232

233+
pg_log_info("reading partitioning data");
234+
getPartitioningInfo(fout);
235+
233236
pg_log_info("reading indexes");
234237
getIndexes(fout,tblinfo,numTables);
235238

@@ -280,7 +283,6 @@ static void
280283
flagInhTables(Archive*fout,TableInfo*tblinfo,intnumTables,
281284
InhInfo*inhinfo,intnumInherits)
282285
{
283-
DumpOptions*dopt=fout->dopt;
284286
inti,
285287
j;
286288

@@ -296,18 +298,18 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
296298
continue;
297299

298300
/*
299-
* Normally, we don't bother computing anything for non-target tables,
300-
*but if load-via-partition-root is specified, we gather information
301-
*on every partition in the systemso thatgetRootTableInfo can trace
302-
*from any given toleafpartition all the way up to the root.(We
303-
*don't need tomarkthem as interesting for getTableAttrs, though.)
301+
* Normally, we don't bother computing anything for non-target tables.
302+
*However, we must find the parents of non-root partitioned tables in
303+
*any case,so thatwe can trace from leaf partitions up to the root
304+
*(in case aleafis to be dumped but its parents are not).We need
305+
*notmarksuch parents interesting for getTableAttrs, though.
304306
*/
305307
if (!tblinfo[i].dobj.dump)
306308
{
307309
mark_parents= false;
308310

309-
if (!dopt->load_via_partition_root||
310-
!tblinfo[i].ispartition)
311+
if (!(tblinfo[i].relkind==RELKIND_PARTITIONED_TABLE&&
312+
tblinfo[i].ispartition))
311313
find_parents= false;
312314
}
313315

‎src/bin/pg_dump/pg_backup_archiver.c

Lines changed: 64 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,7 @@ static RestorePass _tocEntryRestorePass(TocEntry *te);
9191
staticbool_tocEntryIsACL(TocEntry*te);
9292
staticvoid_disableTriggersIfNecessary(ArchiveHandle*AH,TocEntry*te);
9393
staticvoid_enableTriggersIfNecessary(ArchiveHandle*AH,TocEntry*te);
94+
staticboolis_load_via_partition_root(TocEntry*te);
9495
staticvoidbuildTocEntryArrays(ArchiveHandle*AH);
9596
staticvoid_moveBefore(ArchiveHandle*AH,TocEntry*pos,TocEntry*te);
9697
staticint_discoverArchiveFormat(ArchiveHandle*AH);
@@ -879,6 +880,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
879880
}
880881
else
881882
{
883+
booluse_truncate;
884+
882885
_disableTriggersIfNecessary(AH,te);
883886

884887
/* Select owner and schema as necessary */
@@ -890,13 +893,24 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
890893

891894
/*
892895
* In parallel restore, if we created the table earlier in
893-
* the run then we wrap the COPY in a transaction and
894-
* precede it with a TRUNCATE. If archiving is not on
895-
* this prevents WAL-logging the COPY. This obtains a
896-
* speedup similar to that from using single_txn mode in
897-
* non-parallel restores.
896+
* this run (so that we know it is empty) and we are not
897+
* restoring a load-via-partition-root data item then we
898+
* wrap the COPY in a transaction and precede it with a
899+
* TRUNCATE. If wal_level is set to minimal this prevents
900+
* WAL-logging the COPY. This obtains a speedup similar
901+
* to that from using single_txn mode in non-parallel
902+
* restores.
903+
*
904+
* We mustn't do this for load-via-partition-root cases
905+
* because some data might get moved across partition
906+
* boundaries, risking deadlock and/or loss of previously
907+
* loaded data. (We assume that all partitions of a
908+
* partitioned table will be treated the same way.)
898909
*/
899-
if (is_parallel&&te->created)
910+
use_truncate=is_parallel&&te->created&&
911+
!is_load_via_partition_root(te);
912+
913+
if (use_truncate)
900914
{
901915
/*
902916
* Parallel restore is always talking directly to a
@@ -937,7 +951,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
937951
AH->outputKind=OUTPUT_SQLCMDS;
938952

939953
/* close out the transaction started above */
940-
if (is_parallel&&te->created)
954+
if (use_truncate)
941955
CommitTransaction(&AH->public);
942956

943957
_enableTriggersIfNecessary(AH,te);
@@ -1029,6 +1043,43 @@ _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te)
10291043
fmtQualifiedId(te->namespace,te->tag));
10301044
}
10311045

1046+
/*
1047+
* Detect whether a TABLE DATA TOC item is performing "load via partition
1048+
* root", that is the target table is an ancestor partition rather than the
1049+
* table the TOC item is nominally for.
1050+
*
1051+
* In newer archive files this can be detected by checking for a special
1052+
* comment placed in te->defn. In older files we have to fall back to seeing
1053+
* if the COPY statement targets the named table or some other one. This
1054+
* will not work for data dumped as INSERT commands, so we could give a false
1055+
* negative in that case; fortunately, that's a rarely-used option.
1056+
*/
1057+
staticbool
1058+
is_load_via_partition_root(TocEntry*te)
1059+
{
1060+
if (te->defn&&
1061+
strncmp(te->defn,"-- load via partition root ",27)==0)
1062+
return true;
1063+
if (te->copyStmt&&*te->copyStmt)
1064+
{
1065+
PQExpBuffercopyStmt=createPQExpBuffer();
1066+
boolresult;
1067+
1068+
/*
1069+
* Build the initial part of the COPY as it would appear if the
1070+
* nominal target table is the actual target. If we see anything
1071+
* else, it must be a load-via-partition-root case.
1072+
*/
1073+
appendPQExpBuffer(copyStmt,"COPY %s ",
1074+
fmtQualifiedId(te->namespace,te->tag));
1075+
result=strncmp(te->copyStmt,copyStmt->data,copyStmt->len)!=0;
1076+
destroyPQExpBuffer(copyStmt);
1077+
returnresult;
1078+
}
1079+
/* Assume it's not load-via-partition-root */
1080+
return false;
1081+
}
1082+
10321083
/*
10331084
* This is a routine that is part of the dumper interface, hence the 'Archive*' parameter.
10341085
*/
@@ -2971,8 +3022,12 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
29713022
res=res& ~REQ_DATA;
29723023
}
29733024

2974-
/* If there's no definition command, there's no schema component */
2975-
if (!te->defn|| !te->defn[0])
3025+
/*
3026+
* If there's no definition command, there's no schema component. Treat
3027+
* "load via partition root" comments as not schema.
3028+
*/
3029+
if (!te->defn|| !te->defn[0]||
3030+
strncmp(te->defn,"-- load via partition root ",27)==0)
29763031
res=res& ~REQ_SCHEMA;
29773032

29783033
/*

‎src/bin/pg_dump/pg_dump.c

Lines changed: 130 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -297,6 +297,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
297297
static char *get_synchronized_snapshot(Archive *fout);
298298
static void setupDumpWorker(Archive *AHX);
299299
static TableInfo *getRootTableInfo(TableInfo *tbinfo);
300+
static bool forcePartitionRootLoad(const TableInfo *tbinfo);
300301

301302

302303
int
@@ -2016,11 +2017,13 @@ dumpTableData_insert(Archive *fout, void *dcontext)
20162017
insertStmt = createPQExpBuffer();
20172018

20182019
/*
2019-
* When load-via-partition-root is set, get the root table name
2020-
* for the partition table, so that we can reload data through the
2021-
* root table.
2020+
* When load-via-partition-root is set or forced, get the root
2021+
*table namefor the partition table, so that we can reload data
2022+
*through theroot table.
20222023
*/
2023-
if (dopt->load_via_partition_root && tbinfo->ispartition)
2024+
if (tbinfo->ispartition &&
2025+
(dopt->load_via_partition_root ||
2026+
forcePartitionRootLoad(tbinfo)))
20242027
targettab = getRootTableInfo(tbinfo);
20252028
else
20262029
targettab = tbinfo;
@@ -2218,6 +2221,35 @@ getRootTableInfo(TableInfo *tbinfo)
22182221
return parentTbinfo;
22192222
}
22202223

2224+
/*
2225+
* forcePartitionRootLoad
2226+
* Check if we must force load_via_partition_root for this partition.
2227+
*
2228+
* This is required if any level of ancestral partitioned table has an
2229+
* unsafe partitioning scheme.
2230+
*/
2231+
static bool
2232+
forcePartitionRootLoad(const TableInfo *tbinfo)
2233+
{
2234+
TableInfo *parentTbinfo;
2235+
2236+
Assert(tbinfo->ispartition);
2237+
Assert(tbinfo->numParents == 1);
2238+
2239+
parentTbinfo = tbinfo->parents[0];
2240+
if (parentTbinfo->unsafe_partitions)
2241+
return true;
2242+
while (parentTbinfo->ispartition)
2243+
{
2244+
Assert(parentTbinfo->numParents == 1);
2245+
parentTbinfo = parentTbinfo->parents[0];
2246+
if (parentTbinfo->unsafe_partitions)
2247+
return true;
2248+
}
2249+
2250+
return false;
2251+
}
2252+
22212253
/*
22222254
* dumpTableData -
22232255
* dump the contents of a single table
@@ -2232,34 +2264,40 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
22322264
PQExpBuffer copyBuf = createPQExpBuffer();
22332265
PQExpBuffer clistBuf = createPQExpBuffer();
22342266
DataDumperPtr dumpFn;
2267+
char *tdDefn = NULL;
22352268
char *copyStmt;
22362269
const char *copyFrom;
22372270

22382271
/* We had better have loaded per-column details about this table */
22392272
Assert(tbinfo->interesting);
22402273

2274+
/*
2275+
* When load-via-partition-root is set or forced, get the root table name
2276+
* for the partition table, so that we can reload data through the root
2277+
* table. Then construct a comment to be inserted into the TOC entry's
2278+
* defn field, so that such cases can be identified reliably.
2279+
*/
2280+
if (tbinfo->ispartition &&
2281+
(dopt->load_via_partition_root ||
2282+
forcePartitionRootLoad(tbinfo)))
2283+
{
2284+
TableInfo *parentTbinfo;
2285+
2286+
parentTbinfo = getRootTableInfo(tbinfo);
2287+
copyFrom = fmtQualifiedDumpable(parentTbinfo);
2288+
printfPQExpBuffer(copyBuf, "-- load via partition root %s",
2289+
copyFrom);
2290+
tdDefn = pg_strdup(copyBuf->data);
2291+
}
2292+
else
2293+
copyFrom = fmtQualifiedDumpable(tbinfo);
2294+
22412295
if (!dopt->dump_inserts)
22422296
{
22432297
/* Dump/restore using COPY */
22442298
dumpFn = dumpTableData_copy;
2245-
2246-
/*
2247-
* When load-via-partition-root is set, get the root table name for
2248-
* the partition table, so that we can reload data through the root
2249-
* table.
2250-
*/
2251-
if (dopt->load_via_partition_root && tbinfo->ispartition)
2252-
{
2253-
TableInfo *parentTbinfo;
2254-
2255-
parentTbinfo = getRootTableInfo(tbinfo);
2256-
copyFrom = fmtQualifiedDumpable(parentTbinfo);
2257-
}
2258-
else
2259-
copyFrom = fmtQualifiedDumpable(tbinfo);
2260-
22612299
/* must use 2 steps here 'cause fmtId is nonreentrant */
2262-
appendPQExpBuffer(copyBuf, "COPY %s ",
2300+
printfPQExpBuffer(copyBuf, "COPY %s ",
22632301
copyFrom);
22642302
appendPQExpBuffer(copyBuf, "%s FROM stdin;\n",
22652303
fmtCopyColumnList(tbinfo, clistBuf));
@@ -2287,6 +2325,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
22872325
.owner = tbinfo->rolname,
22882326
.description = "TABLE DATA",
22892327
.section = SECTION_DATA,
2328+
.createStmt = tdDefn,
22902329
.copyStmt = copyStmt,
22912330
.deps = &(tbinfo->dobj.dumpId),
22922331
.nDeps = 1,
@@ -6899,6 +6938,76 @@ getInherits(Archive *fout, int *numInherits)
68996938
return inhinfo;
69006939
}
69016940

6941+
/*
6942+
* getPartitioningInfo
6943+
* get information about partitioning
6944+
*
6945+
* For the most part, we only collect partitioning info about tables we
6946+
* intend to dump. However, this function has to consider all partitioned
6947+
* tables in the database, because we need to know about parents of partitions
6948+
* we are going to dump even if the parents themselves won't be dumped.
6949+
*
6950+
* Specifically, what we need to know is whether each partitioned table
6951+
* has an "unsafe" partitioning scheme that requires us to force
6952+
* load-via-partition-root mode for its children. Currently the only case
6953+
* for which we force that is hash partitioning on enum columns, since the
6954+
* hash codes depend on enum value OIDs which won't be replicated across
6955+
* dump-and-reload. There are other cases in which load-via-partition-root
6956+
* might be necessary, but we expect users to cope with them.
6957+
*/
6958+
void
6959+
getPartitioningInfo(Archive *fout)
6960+
{
6961+
PQExpBuffer query;
6962+
PGresult *res;
6963+
intntups;
6964+
6965+
/* hash partitioning didn't exist before v11 */
6966+
if (fout->remoteVersion < 110000)
6967+
return;
6968+
/* needn't bother if schema-only dump */
6969+
if (fout->dopt->schemaOnly)
6970+
return;
6971+
6972+
query = createPQExpBuffer();
6973+
6974+
/*
6975+
* Unsafe partitioning schemes are exactly those for which hash enum_ops
6976+
* appears among the partition opclasses. We needn't check partstrat.
6977+
*
6978+
* Note that this query may well retrieve info about tables we aren't
6979+
* going to dump and hence have no lock on. That's okay since we need not
6980+
* invoke any unsafe server-side functions.
6981+
*/
6982+
appendPQExpBufferStr(query,
6983+
"SELECT partrelid FROM pg_partitioned_table WHERE\n"
6984+
"(SELECT c.oid FROM pg_opclass c JOIN pg_am a "
6985+
"ON c.opcmethod = a.oid\n"
6986+
"WHERE opcname = 'enum_ops' "
6987+
"AND opcnamespace = 'pg_catalog'::regnamespace "
6988+
"AND amname = 'hash') = ANY(partclass)");
6989+
6990+
res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
6991+
6992+
ntups = PQntuples(res);
6993+
6994+
for (int i = 0; i < ntups; i++)
6995+
{
6996+
Oidtabrelid = atooid(PQgetvalue(res, i, 0));
6997+
TableInfo *tbinfo;
6998+
6999+
tbinfo = findTableByOid(tabrelid);
7000+
if (tbinfo == NULL)
7001+
fatal("failed sanity check, table OID %u appearing in pg_partitioned_table not found",
7002+
tabrelid);
7003+
tbinfo->unsafe_partitions = true;
7004+
}
7005+
7006+
PQclear(res);
7007+
7008+
destroyPQExpBuffer(query);
7009+
}
7010+
69027011
/*
69037012
* getIndexes
69047013
* get information about every index on a dumpable table

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp