Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit8aba932

Browse files
committed
Fix relcache inconsistency hazard in partition detach
During queries coming from ri_triggers.c, we need to omit partitionsthat are marked pending detach -- otherwise, the RI query is trickedinto allowing a row into the referencing table whose corresponding rowis in the detached partition. Which is bogus: once the detach operationcompletes, the row becomes an orphan.However, the code was not doing that in repeatable-read transactions,because relcache kept a copy of the partition descriptor that includedthe partition, and used it in the RI query. This commit changes thepartdesc cache code to only keep descriptors that aren't dependent ona snapshot (namely: those where no detached partition exist, and thosewhere detached partitions are included). When a partdesc-without-detached-partitions is requested, we create one afresh each time; also,those partdescs are stored in PortalContext instead ofCacheMemoryContext.find_inheritance_children gets a new output *detached_exist boolean,which indicates whether any partition marked pending-detach is found.Its "include_detached" input flag is changed to "omit_detached", becausethat name captures desired the semantics more naturally.CreatePartitionDirectory() and RelationGetPartitionDesc() arguments areidentically renamed.This was noticed because a buildfarm member that runs with relcacheclobbering, which would not keep the improperly cached partdesc, brokeone test, which led us to realize that the expected output of that testwas bogus. This commit also corrects that expected output.Author: Amit Langote <amitlangote09@gmail.com>Author: Álvaro Herrera <alvherre@alvh.no-ip.org>Discussion:https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
1 parent82b13db commit8aba932

File tree

12 files changed

+160
-101
lines changed

12 files changed

+160
-101
lines changed

‎src/backend/catalog/heap.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3840,7 +3840,7 @@ StorePartitionBound(Relation rel, Relation parent, PartitionBoundSpec *bound)
38403840
* removed.
38413841
*/
38423842
defaultPartOid=
3843-
get_default_oid_from_partdesc(RelationGetPartitionDesc(parent,false));
3843+
get_default_oid_from_partdesc(RelationGetPartitionDesc(parent,true));
38443844
if (OidIsValid(defaultPartOid))
38453845
CacheInvalidateRelcacheByRelid(defaultPartOid);
38463846

‎src/backend/catalog/pg_inherits.c

Lines changed: 39 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -52,13 +52,19 @@ typedef struct SeenRelsEntry
5252
* then no locks are acquired, but caller must beware of race conditions
5353
* against possible DROPs of child relations.
5454
*
55-
* include_detached says to include all partitions, even if they're marked
56-
* detached. Passing it as false means they might or might not be included,
57-
* depending on the visibility of the pg_inherits row for the active snapshot.
55+
* If a partition's pg_inherits row is marked "detach pending",
56+
* *detached_exist (if not null) is set true, otherwise it is set false.
57+
*
58+
* If omit_detached is true and there is an active snapshot (not the same as
59+
* the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
60+
* marked "detach pending" is visible to that snapshot, then that partition is
61+
* omitted from the output list. This makes partitions invisible depending on
62+
* whether the transaction that marked those partitions as detached appears
63+
* committed to the active snapshot.
5864
*/
5965
List*
60-
find_inheritance_children(OidparentrelId,boolinclude_detached,
61-
LOCKMODElockmode)
66+
find_inheritance_children(OidparentrelId,boolomit_detached,
67+
LOCKMODElockmode,bool*detached_exist)
6268
{
6369
List*list=NIL;
6470
Relationrelation;
@@ -78,6 +84,9 @@ find_inheritance_children(Oid parentrelId, bool include_detached,
7884
if (!has_subclass(parentrelId))
7985
returnNIL;
8086

87+
if (detached_exist)
88+
*detached_exist= false;
89+
8190
/*
8291
* Scan pg_inherits and build a working array of subclass OIDs.
8392
*/
@@ -99,29 +108,35 @@ find_inheritance_children(Oid parentrelId, bool include_detached,
99108
{
100109
/*
101110
* Cope with partitions concurrently being detached. When we see a
102-
* partition marked "detach pending", we only include it in the set of
103-
* visible partitions if caller requested all detached partitions, or
104-
* if its pg_inherits tuple's xmin is still visible to the active
105-
* snapshot.
111+
* partition marked "detach pending", we omit it from the returned set
112+
* of visible partitions if caller requested that and the tuple's xmin
113+
* does not appear in progress to the active snapshot. (If there's no
114+
* active snapshot set, that means we're not running a user query, so
115+
* it's OK to always include detached partitions in that case; if the
116+
* xmin is still running to the active snapshot, then the partition
117+
* has not been detached yet and so we include it.)
106118
*
107-
* The reason for thischeck is that we want to avoid seeing the
119+
* The reason for thishack is that we want to avoid seeing the
108120
* partition as alive in RI queries during REPEATABLE READ or
109-
* SERIALIZABLE transactions. (If there's no active snapshot set,
110-
* that means we're not running a user query, so it's OK to always
111-
* include detached partitions in that case.)
121+
* SERIALIZABLE transactions: such queries use a different snapshot
122+
* than the one used by regular (user) queries.
112123
*/
113-
if (((Form_pg_inherits)GETSTRUCT(inheritsTuple))->inhdetachpending&&
114-
!include_detached&&
115-
ActiveSnapshotSet())
124+
if (((Form_pg_inherits)GETSTRUCT(inheritsTuple))->inhdetachpending)
116125
{
117-
TransactionIdxmin;
118-
Snapshotsnap;
126+
if (detached_exist)
127+
*detached_exist= true;
128+
129+
if (omit_detached&&ActiveSnapshotSet())
130+
{
131+
TransactionIdxmin;
132+
Snapshotsnap;
119133

120-
xmin=HeapTupleHeaderGetXmin(inheritsTuple->t_data);
121-
snap=GetActiveSnapshot();
134+
xmin=HeapTupleHeaderGetXmin(inheritsTuple->t_data);
135+
snap=GetActiveSnapshot();
122136

123-
if (!XidInMVCCSnapshot(xmin,snap))
124-
continue;
137+
if (!XidInMVCCSnapshot(xmin,snap))
138+
continue;
139+
}
125140
}
126141

127142
inhrelid= ((Form_pg_inherits)GETSTRUCT(inheritsTuple))->inhrelid;
@@ -235,8 +250,8 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
235250
ListCell*lc;
236251

237252
/* Get the direct children of this rel */
238-
currentchildren=find_inheritance_children(currentrel,false,
239-
lockmode);
253+
currentchildren=find_inheritance_children(currentrel,true,
254+
lockmode,NULL);
240255

241256
/*
242257
* Add to the queue only those children not already seen. This avoids

‎src/backend/commands/indexcmds.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1123,7 +1123,7 @@ DefineIndex(Oid relationId,
11231123
*/
11241124
if (partitioned&&stmt->relation&& !stmt->relation->inh)
11251125
{
1126-
PartitionDescpd=RelationGetPartitionDesc(rel,false);
1126+
PartitionDescpd=RelationGetPartitionDesc(rel,true);
11271127

11281128
if (pd->nparts!=0)
11291129
flags |=INDEX_CREATE_INVALID;
@@ -1180,7 +1180,7 @@ DefineIndex(Oid relationId,
11801180
*
11811181
* If we're called internally (no stmt->relation), recurse always.
11821182
*/
1183-
partdesc=RelationGetPartitionDesc(rel,false);
1183+
partdesc=RelationGetPartitionDesc(rel,true);
11841184
if ((!stmt->relation||stmt->relation->inh)&&partdesc->nparts>0)
11851185
{
11861186
intnparts=partdesc->nparts;

‎src/backend/commands/tablecmds.c

Lines changed: 20 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -1041,7 +1041,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
10411041
*/
10421042
defaultPartOid =
10431043
get_default_oid_from_partdesc(RelationGetPartitionDesc(parent,
1044-
false));
1044+
true));
10451045
if (OidIsValid(defaultPartOid))
10461046
defaultRel = table_open(defaultPartOid, AccessExclusiveLock);
10471047

@@ -3507,7 +3507,7 @@ renameatt_internal(Oid myrelid,
35073507
* expected_parents will only be 0 if we are not already recursing.
35083508
*/
35093509
if (expected_parents == 0 &&
3510-
find_inheritance_children(myrelid,false, NoLock) != NIL)
3510+
find_inheritance_children(myrelid,true, NoLock, NULL) != NIL)
35113511
ereport(ERROR,
35123512
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
35133513
errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -3706,7 +3706,7 @@ rename_constraint_internal(Oid myrelid,
37063706
else
37073707
{
37083708
if (expected_parents == 0 &&
3709-
find_inheritance_children(myrelid,false, NoLock) != NIL)
3709+
find_inheritance_children(myrelid,true, NoLock, NULL) != NIL)
37103710
ereport(ERROR,
37113711
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
37123712
errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -6580,7 +6580,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
65806580
*/
65816581
if (colDef->identity &&
65826582
recurse &&
6583-
find_inheritance_children(myrelid,false, NoLock) != NIL)
6583+
find_inheritance_children(myrelid,true, NoLock, NULL) != NIL)
65846584
ereport(ERROR,
65856585
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
65866586
errmsg("cannot recursively add identity column to table that has child tables")));
@@ -6826,7 +6826,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
68266826
* use find_all_inheritors to do it in one pass.
68276827
*/
68286828
children =
6829-
find_inheritance_children(RelationGetRelid(rel),false, lockmode);
6829+
find_inheritance_children(RelationGetRelid(rel),true, lockmode, NULL);
68306830

68316831
/*
68326832
* If we are told not to recurse, there had better not be any child
@@ -6980,7 +6980,7 @@ ATPrepDropNotNull(Relation rel, bool recurse, bool recursing)
69806980
*/
69816981
if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
69826982
{
6983-
PartitionDesc partdesc = RelationGetPartitionDesc(rel,false);
6983+
PartitionDesc partdesc = RelationGetPartitionDesc(rel,true);
69846984

69856985
Assert(partdesc != NULL);
69866986
if (partdesc->nparts > 0 && !recurse && !recursing)
@@ -7689,7 +7689,7 @@ ATPrepDropExpression(Relation rel, AlterTableCmd *cmd, bool recurse, bool recurs
76897689
* resulting state can be properly dumped and restored.
76907690
*/
76917691
if (!recurse &&
7692-
find_inheritance_children(RelationGetRelid(rel),false, lockmode))
7692+
find_inheritance_children(RelationGetRelid(rel),true, lockmode, NULL))
76937693
ereport(ERROR,
76947694
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
76957695
errmsg("ALTER TABLE / DROP EXPRESSION must be applied to child tables too")));
@@ -8297,7 +8297,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
82978297
* use find_all_inheritors to do it in one pass.
82988298
*/
82998299
children =
8300-
find_inheritance_children(RelationGetRelid(rel),false, lockmode);
8300+
find_inheritance_children(RelationGetRelid(rel),true, lockmode, NULL);
83018301

83028302
if (children)
83038303
{
@@ -8785,7 +8785,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
87858785
* use find_all_inheritors to do it in one pass.
87868786
*/
87878787
children =
8788-
find_inheritance_children(RelationGetRelid(rel),false, lockmode);
8788+
find_inheritance_children(RelationGetRelid(rel),true, lockmode, NULL);
87898789

87908790
/*
87918791
* Check if ONLY was specified with ALTER TABLE. If so, allow the
@@ -9400,7 +9400,7 @@ addFkRecurseReferenced(List **wqueue, Constraint *fkconstraint, Relation rel,
94009400
*/
94019401
if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
94029402
{
9403-
PartitionDesc pd = RelationGetPartitionDesc(pkrel,false);
9403+
PartitionDesc pd = RelationGetPartitionDesc(pkrel,true);
94049404

94059405
for (int i = 0; i < pd->nparts; i++)
94069406
{
@@ -9534,7 +9534,7 @@ addFkRecurseReferencing(List **wqueue, Constraint *fkconstraint, Relation rel,
95349534
}
95359535
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
95369536
{
9537-
PartitionDesc pd = RelationGetPartitionDesc(rel,false);
9537+
PartitionDesc pd = RelationGetPartitionDesc(rel,true);
95389538

95399539
/*
95409540
* Recurse to take appropriate action on each partition; either we
@@ -11318,8 +11318,8 @@ ATExecDropConstraint(Relation rel, const char *constrName,
1131811318
* use find_all_inheritors to do it in one pass.
1131911319
*/
1132011320
if (!is_no_inherit_constraint)
11321-
children =
11322-
find_inheritance_children(RelationGetRelid(rel), false, lockmode);
11321+
children = find_inheritance_children(RelationGetRelid(rel), true,
11322+
lockmode, NULL);
1132311323
else
1132411324
children = NIL;
1132511325

@@ -11703,8 +11703,8 @@ ATPrepAlterColumnType(List **wqueue,
1170311703
}
1170411704
}
1170511705
else if (!recursing &&
11706-
find_inheritance_children(RelationGetRelid(rel),false,
11707-
NoLock) != NIL)
11706+
find_inheritance_children(RelationGetRelid(rel),true,
11707+
NoLock, NULL) != NIL)
1170811708
ereport(ERROR,
1170911709
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
1171011710
errmsg("type of inherited column \"%s\" must be changed in child tables too",
@@ -16875,7 +16875,7 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
1687516875
}
1687616876
else if (scanrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
1687716877
{
16878-
PartitionDesc partdesc = RelationGetPartitionDesc(scanrel,false);
16878+
PartitionDesc partdesc = RelationGetPartitionDesc(scanrel,true);
1687916879
inti;
1688016880

1688116881
for (i = 0; i < partdesc->nparts; i++)
@@ -16935,7 +16935,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd,
1693516935
* new partition will change its partition constraint.
1693616936
*/
1693716937
defaultPartOid =
16938-
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel,false));
16938+
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel,true));
1693916939
if (OidIsValid(defaultPartOid))
1694016940
LockRelationOid(defaultPartOid, AccessExclusiveLock);
1694116941

@@ -17551,7 +17551,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
1755117551
* will change its partition constraint.
1755217552
*/
1755317553
defaultPartOid =
17554-
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel,false));
17554+
get_default_oid_from_partdesc(RelationGetPartitionDesc(rel,true));
1755517555
if (OidIsValid(defaultPartOid))
1755617556
{
1755717557
/*
@@ -18148,7 +18148,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
1814818148
RelationGetRelationName(partIdx))));
1814918149

1815018150
/* Make sure it indexes a partition of the other index's table */
18151-
partDesc = RelationGetPartitionDesc(parentTbl,false);
18151+
partDesc = RelationGetPartitionDesc(parentTbl,true);
1815218152
found = false;
1815318153
for (i = 0; i < partDesc->nparts; i++)
1815418154
{
@@ -18302,7 +18302,7 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
1830218302
* If we found as many inherited indexes as the partitioned table has
1830318303
* partitions, we're good; update pg_index to set indisvalid.
1830418304
*/
18305-
if (tuples == RelationGetPartitionDesc(partedTbl,false)->nparts)
18305+
if (tuples == RelationGetPartitionDesc(partedTbl,true)->nparts)
1830618306
{
1830718307
RelationidxRel;
1830818308
HeapTuplenewtup;

‎src/backend/commands/trigger.c

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1119,7 +1119,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
11191119
*/
11201120
if (partition_recurse)
11211121
{
1122-
PartitionDescpartdesc=RelationGetPartitionDesc(rel,false);
1122+
PartitionDescpartdesc=RelationGetPartitionDesc(rel,true);
11231123
List*idxs=NIL;
11241124
List*childTbls=NIL;
11251125
ListCell*l;
@@ -1141,8 +1141,8 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
11411141
ListCell*l;
11421142
List*idxs=NIL;
11431143

1144-
idxs=find_inheritance_children(indexOid,false,
1145-
ShareRowExclusiveLock);
1144+
idxs=find_inheritance_children(indexOid,true,
1145+
ShareRowExclusiveLock,NULL);
11461146
foreach(l,idxs)
11471147
childTbls=lappend_oid(childTbls,
11481148
IndexGetRelation(lfirst_oid(l),

‎src/backend/executor/execPartition.c

Lines changed: 8 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -991,19 +991,16 @@ ExecInitPartitionDispatchInfo(EState *estate,
991991

992992
/*
993993
* For data modification, it is better that executor does not include
994-
* partitions being detached, except in snapshot-isolation mode. This
995-
* means that a read-committed transaction immediately gets a "no
996-
* partition for tuple" error when a tuple is inserted into a partition
997-
* that's being detached concurrently, but a transaction in repeatable-
998-
* read mode can still use the partition. Note that because partition
999-
* detach uses ShareLock on the partition (which conflicts with DML),
1000-
* we're certain that the detach won't be able to complete until any
1001-
* inserting transaction is done.
994+
* partitions being detached, except when running in snapshot-isolation
995+
* mode. This means that a read-committed transaction immediately gets a
996+
* "no partition for tuple" error when a tuple is inserted into a
997+
* partition that's being detached concurrently, but a transaction in
998+
* repeatable-read mode can still use such a partition.
1002999
*/
10031000
if (estate->es_partition_directory==NULL)
10041001
estate->es_partition_directory=
10051002
CreatePartitionDirectory(estate->es_query_cxt,
1006-
IsolationUsesXactSnapshot());
1003+
!IsolationUsesXactSnapshot());
10071004

10081005
oldcxt=MemoryContextSwitchTo(proute->memcxt);
10091006

@@ -1571,10 +1568,10 @@ ExecCreatePartitionPruneState(PlanState *planstate,
15711568
ListCell*lc;
15721569
inti;
15731570

1574-
/*Executor mustalwaysinclude detached partitions */
1571+
/*For data reading, executoralwaysomits detached partitions */
15751572
if (estate->es_partition_directory==NULL)
15761573
estate->es_partition_directory=
1577-
CreatePartitionDirectory(estate->es_query_cxt,true);
1574+
CreatePartitionDirectory(estate->es_query_cxt,false);
15781575

15791576
n_part_hierarchies=list_length(partitionpruneinfo->prune_infos);
15801577
Assert(n_part_hierarchies>0);

‎src/backend/optimizer/util/plancat.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2200,7 +2200,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
22002200
if (root->glob->partition_directory==NULL)
22012201
{
22022202
root->glob->partition_directory=
2203-
CreatePartitionDirectory(CurrentMemoryContext,false);
2203+
CreatePartitionDirectory(CurrentMemoryContext,true);
22042204
}
22052205

22062206
partdesc=PartitionDirectoryLookup(root->glob->partition_directory,

‎src/backend/partitioning/partbounds.c

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2798,7 +2798,7 @@ check_new_partition_bound(char *relname, Relation parent,
27982798
PartitionBoundSpec*spec,ParseState*pstate)
27992799
{
28002800
PartitionKeykey=RelationGetPartitionKey(parent);
2801-
PartitionDescpartdesc=RelationGetPartitionDesc(parent,true);
2801+
PartitionDescpartdesc=RelationGetPartitionDesc(parent,false);
28022802
PartitionBoundInfoboundinfo=partdesc->boundinfo;
28032803
intwith=-1;
28042804
booloverlap= false;
@@ -3991,7 +3991,7 @@ get_qual_for_list(Relation parent, PartitionBoundSpec *spec)
39913991
{
39923992
inti;
39933993
intndatums=0;
3994-
PartitionDescpdesc=RelationGetPartitionDesc(parent,true);/* XXX correct? */
3994+
PartitionDescpdesc=RelationGetPartitionDesc(parent,false);
39953995
PartitionBoundInfoboundinfo=pdesc->boundinfo;
39963996

39973997
if (boundinfo)
@@ -4191,7 +4191,7 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
41914191
if (spec->is_default)
41924192
{
41934193
List*or_expr_args=NIL;
4194-
PartitionDescpdesc=RelationGetPartitionDesc(parent,true);/* XXX correct? */
4194+
PartitionDescpdesc=RelationGetPartitionDesc(parent,false);
41954195
Oid*inhoids=pdesc->oids;
41964196
intnparts=pdesc->nparts,
41974197
i;

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp