Commit 7e7c5b6

Fix pg_dump for hash partitioning on enum columns.
Hash partitioning on an enum is problematic because the hash codes are derived from the OIDs assigned to the enum values, which will almost certainly be different after a dump-and-reload than they were before. This means that some rows probably end up in different partitions than before, causing restore to fail because of partition constraint violations. (pg_upgrade dodges this problem by using hacks to force the enum values to keep the same OIDs, but that's not possible nor desirable for pg_dump.)

Users can work around that by specifying --load-via-partition-root, but since that's a dump-time not restore-time decision, one might find out the need for it far too late. Instead, teach pg_dump to apply that option automatically when dealing with a partitioned table that has hash-on-enum partitioning.

Also deal with a pre-existing issue for --load-via-partition-root mode: in a parallel restore, we try to TRUNCATE target tables just before loading them, in order to enable some backend optimizations. This is bad when using --load-via-partition-root because (a) we're likely to suffer deadlocks from restore jobs trying to restore rows into other partitions than they came from, and (b) if we miss getting a deadlock we might still lose data due to a TRUNCATE removing rows from some already-completed restore job.

The fix for this is conceptually simple: just don't TRUNCATE if we're dealing with a --load-via-partition-root case. The tricky bit is for pg_restore to identify those cases. In dumps using COPY commands we can inspect each COPY command to see if it targets the nominal target table or some ancestor. However, in dumps using INSERT commands it's pretty impractical to examine the INSERTs in advance. To provide a solution for that going forward, modify pg_dump to mark TABLE DATA items that are using --load-via-partition-root with a comment. (This change also responds to a complaint from Robert Haas that the dump output for --load-via-partition-root is pretty confusing.) pg_restore checks for the special comment as well as checking the COPY command if present. This will fail to identify the combination of --load-via-partition-root and --inserts in pre-existing dump files, but that should be a pretty rare case in the field. If it does happen you will probably get a deadlock failure that you can work around by not using parallel restore, which is the same as before this bug fix.

Having done this, there seems no remaining reason for the alarmism in the pg_dump man page about combining --load-via-partition-root with parallel restore, so remove that warning.

Patch by me; thanks to Julien Rouhaud for review. Back-patch to v11 where hash partitioning was introduced.

Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
1 parent ee88003 · commit 7e7c5b6
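
To make the failure mode concrete, here is a minimal sketch (hypothetical type and table names, not taken from the patch) of the kind of schema this commit targets. Hash partition routing for an enum column depends on the OIDs assigned to the enum labels, and those OIDs change when the type is re-created by a restore, so rows loaded directly back into the original leaf partitions can violate the partition constraints.

-- Hypothetical example of an "unsafe" partitioning scheme: hash on an enum column.
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE TABLE events (id int, feeling mood)
    PARTITION BY HASH (feeling);
CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

INSERT INTO events VALUES (1, 'happy'), (2, 'sad');

-- On the source database 'happy' might hash into events_p0; after a restore the
-- re-created enum labels get new OIDs, so the same value can hash into events_p1
-- instead, and a row restored directly into events_p0 then fails its partition
-- constraint.  Loading via the root table avoids this.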

File tree

7 files changed: +287 −52 lines changed

doc/src/sgml/ref/pg_dump.sgml

Lines changed: 0 additions & 10 deletions
@@ -834,16 +834,6 @@ PostgreSQL documentation
         and the two systems have different definitions of the collation used
         to sort the partitioning column.
        </para>
-
-       <para>
-        It is best not to use parallelism when restoring from an archive made
-        with this option, because <application>pg_restore</application> will
-        not know exactly which partition(s) a given archive data item will
-        load data into.  This could result in inefficiency due to lock
-        conflicts between parallel jobs, or perhaps even restore failures due
-        to foreign key constraints being set up before all the relevant data
-        is loaded.
-       </para>
       </listitem>
      </varlistentry>

doc/src/sgml/ref/pg_dumpall.sgml

Lines changed: 0 additions & 4 deletions
@@ -358,10 +358,6 @@ PostgreSQL documentation
         and the two systems have different definitions of the collation used
         to sort the partitioning column.
        </para>
-
-       <!-- Currently, we don't need pg_dump's warning about parallelism here,
-            since parallel restore from a pg_dumpall script is impossible.
-       -->
       </listitem>
      </varlistentry>

src/bin/pg_dump/common.c

Lines changed: 10 additions & 8 deletions
@@ -228,6 +228,9 @@ getSchemaData(Archive *fout, int *numTablesPtr)
     pg_log_info("flagging inherited columns in subtables");
     flagInhAttrs(fout->dopt, tblinfo, numTables);
 
+    pg_log_info("reading partitioning data");
+    getPartitioningInfo(fout);
+
     pg_log_info("reading indexes");
     getIndexes(fout, tblinfo, numTables);
 
@@ -278,7 +281,6 @@ static void
 flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
               InhInfo *inhinfo, int numInherits)
 {
-    DumpOptions *dopt = fout->dopt;
     int         i,
                 j;
 
@@ -294,18 +296,18 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
             continue;
 
         /*
-         * Normally, we don't bother computing anything for non-target tables,
-         * but if load-via-partition-root is specified, we gather information
-         * on every partition in the system so that getRootTableInfo can trace
-         * from any given leaf partition all the way up to the root.  (We
-         * don't need to mark them as interesting for getTableAttrs, though.)
+         * Normally, we don't bother computing anything for non-target tables.
+         * However, we must find the parents of non-root partitioned tables in
+         * any case, so that we can trace from leaf partitions up to the root
+         * (in case a leaf is to be dumped but its parents are not).  We need
+         * not mark such parents interesting for getTableAttrs, though.
          */
         if (!tblinfo[i].dobj.dump)
         {
             mark_parents = false;
 
-            if (!dopt->load_via_partition_root ||
-                !tblinfo[i].ispartition)
+            if (!(tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE &&
+                  tblinfo[i].ispartition))
                 find_parents = false;
         }
 
src/bin/pg_dump/pg_backup_archiver.c

Lines changed: 64 additions & 9 deletions
@@ -90,6 +90,7 @@ static RestorePass _tocEntryRestorePass(TocEntry *te);
 static bool _tocEntryIsACL(TocEntry *te);
 static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
 static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
+static bool is_load_via_partition_root(TocEntry *te);
 static void buildTocEntryArrays(ArchiveHandle *AH);
 static void _moveBefore(ArchiveHandle *AH, TocEntry *pos, TocEntry *te);
 static int  _discoverArchiveFormat(ArchiveHandle *AH);
@@ -878,6 +879,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
         }
         else
         {
+            bool        use_truncate;
+
             _disableTriggersIfNecessary(AH, te);
 
             /* Select owner and schema as necessary */
@@ -889,13 +892,24 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 
             /*
              * In parallel restore, if we created the table earlier in
-             * the run then we wrap the COPY in a transaction and
-             * precede it with a TRUNCATE.  If archiving is not on
-             * this prevents WAL-logging the COPY.  This obtains a
-             * speedup similar to that from using single_txn mode in
-             * non-parallel restores.
+             * this run (so that we know it is empty) and we are not
+             * restoring a load-via-partition-root data item then we
+             * wrap the COPY in a transaction and precede it with a
+             * TRUNCATE.  If wal_level is set to minimal this prevents
+             * WAL-logging the COPY.  This obtains a speedup similar
+             * to that from using single_txn mode in non-parallel
+             * restores.
+             *
+             * We mustn't do this for load-via-partition-root cases
+             * because some data might get moved across partition
+             * boundaries, risking deadlock and/or loss of previously
+             * loaded data.  (We assume that all partitions of a
+             * partitioned table will be treated the same way.)
              */
-            if (is_parallel && te->created)
+            use_truncate = is_parallel && te->created &&
+                !is_load_via_partition_root(te);
+
+            if (use_truncate)
             {
                 /*
                  * Parallel restore is always talking directly to a
@@ -936,7 +950,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
             AH->outputKind = OUTPUT_SQLCMDS;
 
             /* close out the transaction started above */
-            if (is_parallel && te->created)
+            if (use_truncate)
                 CommitTransaction(&AH->public);
 
             _enableTriggersIfNecessary(AH, te);
@@ -1028,6 +1042,43 @@ _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te)
              fmtQualifiedId(te->namespace, te->tag));
 }
 
+/*
+ * Detect whether a TABLE DATA TOC item is performing "load via partition
+ * root", that is the target table is an ancestor partition rather than the
+ * table the TOC item is nominally for.
+ *
+ * In newer archive files this can be detected by checking for a special
+ * comment placed in te->defn.  In older files we have to fall back to seeing
+ * if the COPY statement targets the named table or some other one.  This
+ * will not work for data dumped as INSERT commands, so we could give a false
+ * negative in that case; fortunately, that's a rarely-used option.
+ */
+static bool
+is_load_via_partition_root(TocEntry *te)
+{
+    if (te->defn &&
+        strncmp(te->defn, "-- load via partition root ", 27) == 0)
+        return true;
+    if (te->copyStmt && *te->copyStmt)
+    {
+        PQExpBuffer copyStmt = createPQExpBuffer();
+        bool        result;
+
+        /*
+         * Build the initial part of the COPY as it would appear if the
+         * nominal target table is the actual target.  If we see anything
+         * else, it must be a load-via-partition-root case.
+         */
+        appendPQExpBuffer(copyStmt, "COPY %s ",
+                          fmtQualifiedId(te->namespace, te->tag));
+        result = strncmp(te->copyStmt, copyStmt->data, copyStmt->len) != 0;
+        destroyPQExpBuffer(copyStmt);
+        return result;
+    }
+    /* Assume it's not load-via-partition-root */
+    return false;
+}
+
 /*
  * This is a routine that is part of the dumper interface, hence the 'Archive*' parameter.
  */
@@ -2965,8 +3016,12 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
         res = res & ~REQ_DATA;
     }
 
-    /* If there's no definition command, there's no schema component */
-    if (!te->defn || !te->defn[0])
+    /*
+     * If there's no definition command, there's no schema component.  Treat
+     * "load via partition root" comments as not schema.
+     */
+    if (!te->defn || !te->defn[0] ||
+        strncmp(te->defn, "-- load via partition root ", 27) == 0)
         res = res & ~REQ_SCHEMA;
 
     /*
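
For illustration, here is roughly what the TABLE DATA item for a leaf partition looks like in a plain-text dump once load-via-partition-root is in effect (hypothetical names continuing the sketch above; surrounding dump headers omitted). The leading comment is the marker that is_load_via_partition_root() checks in te->defn, and the COPY targets the root table rather than the nominal leaf:

-- load via partition root public.events
COPY public.events (id, feeling) FROM stdin;
1	happy
\.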

src/bin/pg_dump/pg_dump.c

Lines changed: 130 additions & 21 deletions
@@ -291,6 +291,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(TableInfo *tbinfo);
+static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
 
 int
@@ -2075,11 +2076,13 @@ dumpTableData_insert(Archive *fout, void *dcontext)
         insertStmt = createPQExpBuffer();
 
         /*
-         * When load-via-partition-root is set, get the root table name
-         * for the partition table, so that we can reload data through the
-         * root table.
+         * When load-via-partition-root is set or forced, get the root
+         * table name for the partition table, so that we can reload data
+         * through the root table.
          */
-        if (dopt->load_via_partition_root && tbinfo->ispartition)
+        if (tbinfo->ispartition &&
+            (dopt->load_via_partition_root ||
+             forcePartitionRootLoad(tbinfo)))
             targettab = getRootTableInfo(tbinfo);
         else
             targettab = tbinfo;
@@ -2277,6 +2280,35 @@ getRootTableInfo(TableInfo *tbinfo)
     return parentTbinfo;
 }
 
+/*
+ * forcePartitionRootLoad
+ *     Check if we must force load_via_partition_root for this partition.
+ *
+ * This is required if any level of ancestral partitioned table has an
+ * unsafe partitioning scheme.
+ */
+static bool
+forcePartitionRootLoad(const TableInfo *tbinfo)
+{
+    TableInfo  *parentTbinfo;
+
+    Assert(tbinfo->ispartition);
+    Assert(tbinfo->numParents == 1);
+
+    parentTbinfo = tbinfo->parents[0];
+    if (parentTbinfo->unsafe_partitions)
+        return true;
+    while (parentTbinfo->ispartition)
+    {
+        Assert(parentTbinfo->numParents == 1);
+        parentTbinfo = parentTbinfo->parents[0];
+        if (parentTbinfo->unsafe_partitions)
+            return true;
+    }
+
+    return false;
+}
+
 /*
  * dumpTableData -
  *      dump the contents of a single table
@@ -2291,34 +2323,40 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
     PQExpBuffer copyBuf = createPQExpBuffer();
     PQExpBuffer clistBuf = createPQExpBuffer();
     DataDumperPtr dumpFn;
+    char       *tdDefn = NULL;
     char       *copyStmt;
     const char *copyFrom;
 
     /* We had better have loaded per-column details about this table */
     Assert(tbinfo->interesting);
 
+    /*
+     * When load-via-partition-root is set or forced, get the root table name
+     * for the partition table, so that we can reload data through the root
+     * table.  Then construct a comment to be inserted into the TOC entry's
+     * defn field, so that such cases can be identified reliably.
+     */
+    if (tbinfo->ispartition &&
+        (dopt->load_via_partition_root ||
+         forcePartitionRootLoad(tbinfo)))
+    {
+        TableInfo  *parentTbinfo;
+
+        parentTbinfo = getRootTableInfo(tbinfo);
+        copyFrom = fmtQualifiedDumpable(parentTbinfo);
+        printfPQExpBuffer(copyBuf, "-- load via partition root %s",
+                          copyFrom);
+        tdDefn = pg_strdup(copyBuf->data);
+    }
+    else
+        copyFrom = fmtQualifiedDumpable(tbinfo);
+
     if (!dopt->dump_inserts)
     {
         /* Dump/restore using COPY */
         dumpFn = dumpTableData_copy;
-
-        /*
-         * When load-via-partition-root is set, get the root table name for
-         * the partition table, so that we can reload data through the root
-         * table.
-         */
-        if (dopt->load_via_partition_root && tbinfo->ispartition)
-        {
-            TableInfo  *parentTbinfo;
-
-            parentTbinfo = getRootTableInfo(tbinfo);
-            copyFrom = fmtQualifiedDumpable(parentTbinfo);
-        }
-        else
-            copyFrom = fmtQualifiedDumpable(tbinfo);
-
         /* must use 2 steps here 'cause fmtId is nonreentrant */
-        appendPQExpBuffer(copyBuf, "COPY %s ",
+        printfPQExpBuffer(copyBuf, "COPY %s ",
                           copyFrom);
         appendPQExpBuffer(copyBuf, "%s FROM stdin;\n",
                           fmtCopyColumnList(tbinfo, clistBuf));
@@ -2346,6 +2384,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
                           .owner = tbinfo->rolname,
                           .description = "TABLE DATA",
                           .section = SECTION_DATA,
+                          .createStmt = tdDefn,
                           .copyStmt = copyStmt,
                           .deps = &(tbinfo->dobj.dumpId),
                           .nDeps = 1,
@@ -7007,6 +7046,76 @@ getInherits(Archive *fout, int *numInherits)
     return inhinfo;
 }
 
+/*
+ * getPartitioningInfo
+ *    get information about partitioning
+ *
+ * For the most part, we only collect partitioning info about tables we
+ * intend to dump.  However, this function has to consider all partitioned
+ * tables in the database, because we need to know about parents of partitions
+ * we are going to dump even if the parents themselves won't be dumped.
+ *
+ * Specifically, what we need to know is whether each partitioned table
+ * has an "unsafe" partitioning scheme that requires us to force
+ * load-via-partition-root mode for its children.  Currently the only case
+ * for which we force that is hash partitioning on enum columns, since the
+ * hash codes depend on enum value OIDs which won't be replicated across
+ * dump-and-reload.  There are other cases in which load-via-partition-root
+ * might be necessary, but we expect users to cope with them.
+ */
+void
+getPartitioningInfo(Archive *fout)
+{
+    PQExpBuffer query;
+    PGresult   *res;
+    int         ntups;
+
+    /* hash partitioning didn't exist before v11 */
+    if (fout->remoteVersion < 110000)
+        return;
+    /* needn't bother if schema-only dump */
+    if (fout->dopt->schemaOnly)
+        return;
+
+    query = createPQExpBuffer();
+
+    /*
+     * Unsafe partitioning schemes are exactly those for which hash enum_ops
+     * appears among the partition opclasses.  We needn't check partstrat.
+     *
+     * Note that this query may well retrieve info about tables we aren't
+     * going to dump and hence have no lock on.  That's okay since we need not
+     * invoke any unsafe server-side functions.
+     */
+    appendPQExpBufferStr(query,
+                         "SELECT partrelid FROM pg_partitioned_table WHERE\n"
+                         "(SELECT c.oid FROM pg_opclass c JOIN pg_am a "
+                         "ON c.opcmethod = a.oid\n"
+                         "WHERE opcname = 'enum_ops' "
+                         "AND opcnamespace = 'pg_catalog'::regnamespace "
+                         "AND amname = 'hash') = ANY(partclass)");
+
+    res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+    ntups = PQntuples(res);
+
+    for (int i = 0; i < ntups; i++)
+    {
+        Oid         tabrelid = atooid(PQgetvalue(res, i, 0));
+        TableInfo  *tbinfo;
+
+        tbinfo = findTableByOid(tabrelid);
+        if (tbinfo == NULL)
+            fatal("failed sanity check, table OID %u appearing in pg_partitioned_table not found",
+                  tabrelid);
+        tbinfo->unsafe_partitions = true;
+    }
+
+    PQclear(res);
+
+    destroyPQExpBuffer(query);
+}
+
 /*
  * getIndexes
  *    get information about every index on a dumpable table
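
As a side note, the catalog query built in getPartitioningInfo() can also be run by hand to see which tables pg_dump will now force into load-via-partition-root mode. A standalone psql version follows; the regclass cast is added here for readability and is not part of the patch:

-- List partitioned tables whose partition key uses the hash enum_ops opclass.
SELECT p.partrelid::regclass AS partitioned_table
FROM pg_partitioned_table p
WHERE (SELECT c.oid
       FROM pg_opclass c
       JOIN pg_am a ON c.opcmethod = a.oid
       WHERE c.opcname = 'enum_ops'
         AND c.opcnamespace = 'pg_catalog'::regnamespace
         AND a.amname = 'hash') = ANY (p.partclass);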
