Commit 898e5e3

Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.

We still require AccessExclusiveLock on the partition itself, because
otherwise an insert that violates the newly-imposed partition
constraint could be in progress at the same time that we're changing
that constraint; only the lock level on the parent relation is
weakened.

To make this safe, we have to cope with (at least) three separate
problems. First, relevant DDL might commit while we're in the process
of building a PartitionDesc. If so, find_inheritance_children() might
see a new partition while the RELOID system cache still has the old
partition bound cached, and even before invalidation messages have
been queued. To fix that, if we see that the pg_class tuple seems to
be missing or to have a null relpartbound, refetch the value directly
from the table. We can't get the wrong value, because DETACH PARTITION
still requires AccessExclusiveLock throughout; if we ever want to
change that, this will need more thought. In testing, I found it quite
difficult to hit even the null-relpartbound case; the race condition
is extremely tight, but the theoretical risk is there.

Second, successive calls to RelationGetPartitionDesc might not return
the same answer. The query planner will get confused if looking up the
PartitionDesc for a particular relation does not return a consistent
answer for the entire duration of query planning. Likewise, query
execution will get confused if the same relation seems to have a
different PartitionDesc at different times. Invent a new
PartitionDirectory concept and use it to ensure consistency. This
ensures that a single invocation of either the planner or the executor
sees the same view of the PartitionDesc from beginning to end, but it
does not guarantee that the planner and the executor see the same
view. Since this allows pointers to old PartitionDesc entries to
survive even after a relcache rebuild, also postpone removing the old
PartitionDesc entry until we're certain no one is using it.

For the most part, it seems to be OK for the planner and executor to
have different views of the PartitionDesc, because the executor will
just ignore any concurrently added partitions which were unknown at
plan time; those partitions won't be part of the inheritance
expansion, but invalidation messages will trigger replanning at some
point. Normally, this happens by the time the very next command is
executed, but if the next command acquires no locks and executes a
prepared query, it can manage not to notice until a new transaction is
started. We might want to tighten that up, but it's material for a
separate patch. There would still be a small window where a query
that started just after an ATTACH PARTITION command committed might
fail to notice its results -- but only if the command starts before
the commit has been acknowledged to the user. All in all, the warts
here around serializability seem small enough to be worth accepting
for the considerable advantage of being able to add partitions without
a full table lock.

Although in general the consequences of new partitions showing up
between planning and execution are limited to the query not noticing
the new partitions, run-time partition pruning will get confused in
that case, so that's the third problem that this patch fixes.
Run-time partition pruning assumes that indexes into the PartitionDesc
are stable between planning and execution. So, add code so that if
new partitions are added between plan time and execution time, the
indexes stored in the subplan_map[] and subpart_map[] arrays within
the plan's PartitionedRelPruneInfo get adjusted accordingly. There
does not seem to be a simple way to generalize this scheme to cope
with partitions that are removed, mostly because they could then get
added back again with different bounds, but it works OK for added
partitions.

This code does not try to ensure that every backend participating in
a parallel query sees the same view of the PartitionDesc. That
currently doesn't matter, because we never pass PartitionDesc
indexes between backends. Each backend will ignore the concurrently
added partitions which it notices, and it doesn't matter if different
backends are ignoring different sets of concurrently added partitions.
If in the future that matters, for example because we allow writes in
parallel query and want all participants to do tuple routing to the same
set of partitions, the PartitionDirectory concept could be improved to
share PartitionDescs across backends. There is a draft patch to
serialize and restore PartitionDescs on the thread where this patch
was discussed, which may be a useful place to start.

Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs,
Amit Langote, and Michael Paquier for discussion, and to Alvaro
Herrera for some review.

Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
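
The PartitionDirectory idea from the commit message can be illustrated with a small standalone sketch. This is not the PostgreSQL implementation: `ToyRel`, `ToyPartDesc`, `ToyDirectory` and every `toy_*` function are hypothetical stand-ins, and a fixed-size array replaces whatever lookup structure the real code uses. The sketch only shows the two properties the message describes: a directory returns the same descriptor for a relation on every lookup, and an old descriptor is not freed while a directory still points at it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are PostgreSQL types. */
typedef struct ToyPartDesc {
    int nparts;     /* number of partitions in this version */
    int refcount;   /* directories still pointing at this version */
} ToyPartDesc;

typedef struct ToyRel {
    unsigned     relid;
    ToyPartDesc *current;   /* latest descriptor; replaced on "rebuild" */
} ToyRel;

#define TOY_DIR_MAX 16

typedef struct ToyDirectory {
    int          nentries;
    ToyRel      *rels[TOY_DIR_MAX];
    ToyPartDesc *descs[TOY_DIR_MAX];
} ToyDirectory;

static ToyPartDesc *toy_desc_new(int nparts)
{
    ToyPartDesc *d = malloc(sizeof(*d));

    assert(d != NULL);
    d->nparts = nparts;
    d->refcount = 0;
    return d;
}

/* Simulate a cache rebuild after a concurrent ATTACH PARTITION. */
static void toy_rel_rebuild(ToyRel *rel, int new_nparts)
{
    ToyPartDesc *old = rel->current;

    rel->current = toy_desc_new(new_nparts);
    if (old != NULL && old->refcount == 0)
        free(old);          /* unpinned: safe to drop right away */
    /* otherwise the last directory to release it frees it */
}

static ToyDirectory *toy_dir_create(void)
{
    ToyDirectory *dir = calloc(1, sizeof(*dir));

    assert(dir != NULL);
    return dir;
}

/* First lookup pins the current descriptor; later lookups reuse it. */
static ToyPartDesc *toy_dir_lookup(ToyDirectory *dir, ToyRel *rel)
{
    for (int i = 0; i < dir->nentries; i++)
        if (dir->rels[i] == rel)
            return dir->descs[i];

    assert(dir->nentries < TOY_DIR_MAX);
    rel->current->refcount++;
    dir->rels[dir->nentries] = rel;
    dir->descs[dir->nentries] = rel->current;
    return dir->descs[dir->nentries++];
}

/* Unpin everything; free descriptors that are both unpinned and stale. */
static void toy_dir_destroy(ToyDirectory *dir)
{
    for (int i = 0; i < dir->nentries; i++)
    {
        ToyPartDesc *d = dir->descs[i];

        if (--d->refcount == 0 && dir->rels[i]->current != d)
            free(d);
    }
    free(dir);
}
```

In this sketch, a directory created before `toy_rel_rebuild()` keeps returning the two-partition descriptor, while a directory created afterwards sees the newer one. That mirrors the statement that one planner or executor invocation gets a consistent view, but the two need not agree with each other.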
1 parent ec51727 commit 898e5e3

21 files changed, +314 -45 lines changed

‎doc/src/sgml/ddl.sgml

Lines changed: 2 additions & 1 deletion
@@ -3827,7 +3827,8 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
     the system will be able to skip the scan to validate the implicit
     partition constraint. Without such a constraint, the table will be
     scanned to validate the partition constraint while holding an
-    <literal>ACCESS EXCLUSIVE</literal> lock on the parent table.
+    <literal>ACCESS EXCLUSIVE</literal> lock on that partition
+    and a <literal>SHARE UPDATE EXCLUSIVE</literal> lock on the parent table.
     One may then drop the constraint after <command>ATTACH PARTITION</command>
     is finished, because it is no longer necessary.
    </para>

‎src/backend/commands/copy.c

Lines changed: 1 addition & 1 deletion
@@ -2556,7 +2556,7 @@ CopyFrom(CopyState cstate)
      * CopyFrom tuple routing.
      */
     if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-        proute = ExecSetupPartitionTupleRouting(NULL, cstate->rel);
+        proute = ExecSetupPartitionTupleRouting(estate, NULL, cstate->rel);

     if (cstate->whereClause)
         cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),

‎src/backend/commands/tablecmds.c

Lines changed: 3 additions & 0 deletions
@@ -3692,6 +3692,9 @@ AlterTableGetLockLevel(List *cmds)
                 break;

             case AT_AttachPartition:
+                cmd_lockmode = ShareUpdateExclusiveLock;
+                break;
+
             case AT_DetachPartition:
                 cmd_lockmode = AccessExclusiveLock;
                 break;
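
To see what the weaker lock level on the parent buys, here is a small standalone sketch of PostgreSQL's table-level lock conflict rules as given in the "Explicit Locking" documentation. The mode names and numbering follow PostgreSQL, but the table layout, the `BIT` macro and the function `lock_modes_conflict()` are hypothetical helpers written for this illustration, not code from the lock manager.

```c
#include <assert.h>
#include <stdbool.h>

/* Table-level lock modes, weakest to strongest, numbered from 1. */
enum {
    AccessShareLock = 1,
    RowShareLock,
    RowExclusiveLock,
    ShareUpdateExclusiveLock,
    ShareLock,
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock
};

#define BIT(mode) (1u << (mode))

/* conflict_tab[m] is the set of modes that conflict with mode m. */
static const unsigned conflict_tab[] = {
    [AccessShareLock] = BIT(AccessExclusiveLock),
    [RowShareLock] = BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [RowExclusiveLock] = BIT(ShareLock) | BIT(ShareRowExclusiveLock) |
        BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [ShareUpdateExclusiveLock] = BIT(ShareUpdateExclusiveLock) |
        BIT(ShareLock) | BIT(ShareRowExclusiveLock) |
        BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [ShareLock] = BIT(RowExclusiveLock) | BIT(ShareUpdateExclusiveLock) |
        BIT(ShareRowExclusiveLock) | BIT(ExclusiveLock) |
        BIT(AccessExclusiveLock),
    [ShareRowExclusiveLock] = BIT(RowExclusiveLock) |
        BIT(ShareUpdateExclusiveLock) | BIT(ShareLock) |
        BIT(ShareRowExclusiveLock) | BIT(ExclusiveLock) |
        BIT(AccessExclusiveLock),
    [ExclusiveLock] = BIT(RowShareLock) | BIT(RowExclusiveLock) |
        BIT(ShareUpdateExclusiveLock) | BIT(ShareLock) |
        BIT(ShareRowExclusiveLock) | BIT(ExclusiveLock) |
        BIT(AccessExclusiveLock),
    [AccessExclusiveLock] = BIT(AccessShareLock) | BIT(RowShareLock) |
        BIT(RowExclusiveLock) | BIT(ShareUpdateExclusiveLock) |
        BIT(ShareLock) | BIT(ShareRowExclusiveLock) |
        BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
};

/* Hypothetical helper: would `requested` have to wait behind `held`? */
static bool lock_modes_conflict(int held, int requested)
{
    assert(held >= AccessShareLock && held <= AccessExclusiveLock);
    assert(requested >= AccessShareLock && requested <= AccessExclusiveLock);
    return (conflict_tab[requested] & BIT(held)) != 0;
}
```

Under these rules, a session holding ShareUpdateExclusiveLock on the parent does not block readers (AccessShareLock) or writers (RowExclusiveLock) of the parent, whereas AccessExclusiveLock blocks both. ShareUpdateExclusiveLock conflicts with itself, so two concurrent ATTACH PARTITION commands on the same parent would still serialize.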

‎src/backend/executor/execPartition.c

Lines changed: 77 additions & 19 deletions
@@ -167,7 +167,8 @@ static void ExecInitRoutingInfo(ModifyTableState *mtstate,
                     PartitionDispatch dispatch,
                     ResultRelInfo *partRelInfo,
                     int partidx);
-static PartitionDispatch ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute,
+static PartitionDispatch ExecInitPartitionDispatchInfo(EState *estate,
+                              PartitionTupleRouting *proute,
                               Oid partoid, PartitionDispatch parent_pd, int partidx);
 static void FormPartitionKeyDatum(PartitionDispatch pd,
                       TupleTableSlot *slot,
@@ -201,7 +202,8 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
  * it should be estate->es_query_cxt.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
+ExecSetupPartitionTupleRouting(EState *estate, ModifyTableState *mtstate,
+                               Relation rel)
 {
     PartitionTupleRouting *proute;
     ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
@@ -223,7 +225,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
      * parent as NULL as we don't need to care about any parent of the target
      * partitioned table.
      */
-    ExecInitPartitionDispatchInfo(proute, RelationGetRelid(rel), NULL, 0);
+    ExecInitPartitionDispatchInfo(estate, proute, RelationGetRelid(rel),
+                                  NULL, 0);

     /*
      * If performing an UPDATE with tuple routing, we can reuse partition
@@ -424,7 +427,8 @@ ExecFindPartition(ModifyTableState *mtstate,
                  * Create the new PartitionDispatch. We pass the current one
                  * in as the parent PartitionDispatch
                  */
-                subdispatch = ExecInitPartitionDispatchInfo(proute,
+                subdispatch = ExecInitPartitionDispatchInfo(mtstate->ps.state,
+                                                            proute,
                                                             partdesc->oids[partidx],
                                                             dispatch, partidx);
                 Assert(dispatch->indexes[partidx] >= 0 &&
@@ -988,7 +992,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
  * PartitionDispatch later.
  */
 static PartitionDispatch
-ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute, Oid partoid,
+ExecInitPartitionDispatchInfo(EState *estate,
+                              PartitionTupleRouting *proute, Oid partoid,
                               PartitionDispatch parent_pd, int partidx)
 {
     Relation rel;
@@ -997,6 +1002,10 @@ ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute, Oid partoid,
     int dispatchidx;
     MemoryContext oldcxt;

+    if (estate->es_partition_directory == NULL)
+        estate->es_partition_directory =
+            CreatePartitionDirectory(estate->es_query_cxt);
+
     oldcxt = MemoryContextSwitchTo(proute->memcxt);

     /*
@@ -1008,7 +1017,7 @@ ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute, Oid partoid,
         rel = table_open(partoid, RowExclusiveLock);
     else
         rel = proute->partition_root;
-    partdesc = RelationGetPartitionDesc(rel);
+    partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel);

     pd = (PartitionDispatch) palloc(offsetof(PartitionDispatchData, indexes) +
                                     partdesc->nparts * sizeof(int));
@@ -1554,6 +1563,10 @@ ExecCreatePartitionPruneState(PlanState *planstate,
     ListCell *lc;
     int i;

+    if (estate->es_partition_directory == NULL)
+        estate->es_partition_directory =
+            CreatePartitionDirectory(estate->es_query_cxt);
+
     n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
     Assert(n_part_hierarchies > 0);

@@ -1610,18 +1623,6 @@ ExecCreatePartitionPruneState(PlanState *planstate,
             int n_steps;
             ListCell *lc3;

-            /*
-             * We must copy the subplan_map rather than pointing directly to
-             * the plan's version, as we may end up making modifications to it
-             * later.
-             */
-            pprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
-            memcpy(pprune->subplan_map, pinfo->subplan_map,
-                   sizeof(int) * pinfo->nparts);
-
-            /* We can use the subpart_map verbatim, since we never modify it */
-            pprune->subpart_map = pinfo->subpart_map;
-
             /* present_parts is also subject to later modification */
             pprune->present_parts = bms_copy(pinfo->present_parts);

@@ -1633,7 +1634,64 @@ ExecCreatePartitionPruneState(PlanState *planstate,
              */
             partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
             partkey = RelationGetPartitionKey(partrel);
-            partdesc = RelationGetPartitionDesc(partrel);
+            partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+                                                partrel);
+
+            /*
+             * Initialize the subplan_map and subpart_map. Since detaching a
+             * partition requires AccessExclusiveLock, no partitions can have
+             * disappeared, nor can the bounds for any partition have changed.
+             * However, new partitions may have been added.
+             */
+            Assert(partdesc->nparts >= pinfo->nparts);
+            pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+            if (partdesc->nparts == pinfo->nparts)
+            {
+                /*
+                 * There are no new partitions, so this is simple. We can
+                 * simply point to the subpart_map from the plan, but we must
+                 * copy the subplan_map since we may change it later.
+                 */
+                pprune->subpart_map = pinfo->subpart_map;
+                memcpy(pprune->subplan_map, pinfo->subplan_map,
+                       sizeof(int) * pinfo->nparts);
+
+                /* Double-check that list of relations has not changed. */
+                Assert(memcmp(partdesc->oids, pinfo->relid_map,
+                              pinfo->nparts * sizeof(Oid)) == 0);
+            }
+            else
+            {
+                int pd_idx = 0;
+                int pp_idx;
+
+                /*
+                 * Some new partitions have appeared since plan time, and
+                 * those are reflected in our PartitionDesc but were not
+                 * present in the one used to construct subplan_map and
+                 * subpart_map. So we must construct new and longer arrays
+                 * where the partitions that were originally present map to the
+                 * same place, and any added indexes map to -1, as if the
+                 * new partitions had been pruned.
+                 */
+                pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+                for (pp_idx = 0; pp_idx < partdesc->nparts; ++pp_idx)
+                {
+                    if (pinfo->relid_map[pd_idx] != partdesc->oids[pp_idx])
+                    {
+                        pprune->subplan_map[pp_idx] = -1;
+                        pprune->subpart_map[pp_idx] = -1;
+                    }
+                    else
+                    {
+                        pprune->subplan_map[pp_idx] =
+                            pinfo->subplan_map[pd_idx];
+                        pprune->subpart_map[pp_idx] =
+                            pinfo->subpart_map[pd_idx++];
+                    }
+                }
+                Assert(pd_idx == pinfo->nparts);
+            }

             n_steps = list_length(pinfo->pruning_steps);
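
The index adjustment in the last hunk can be tried in isolation with the sketch below. It is a simplified illustration rather than the executor code: `ToyOid` and `remap_plan_indexes()` are hypothetical names, plain `malloc` stands in for `palloc`, and only one map is rebuilt instead of both `subplan_map` and `subpart_map`. It assumes, as the commit does, that the plan-time partitions appear in the execution-time list in the same relative order and that partitions have only been added. Unlike the diff, the sketch also checks `pd_idx < plan_n` before reading the plan-time array, so a newly added partition at the end of the list cannot cause a read past the end.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int ToyOid;    /* hypothetical stand-in for Oid */

/*
 * plan_oids[] and plan_map[] describe the plan_n partitions known at plan
 * time. exec_oids[] lists the exec_n >= plan_n partitions visible at
 * execution time. Returns a malloc'd array of exec_n entries in which
 * partitions known at plan time keep their mapped value and newly added
 * ones map to -1, as if they had been pruned. Returns NULL if the
 * allocation fails.
 */
static int *remap_plan_indexes(const ToyOid *plan_oids, const int *plan_map,
                               int plan_n,
                               const ToyOid *exec_oids, int exec_n)
{
    int *out;
    int  pd_idx = 0;    /* position in the plan-time arrays */

    assert(exec_n >= plan_n);
    out = malloc(sizeof(int) * (size_t) exec_n);
    if (out == NULL)
        return NULL;

    for (int pp_idx = 0; pp_idx < exec_n; pp_idx++)
    {
        if (pd_idx < plan_n && plan_oids[pd_idx] == exec_oids[pp_idx])
            out[pp_idx] = plan_map[pd_idx++];   /* known at plan time */
        else
            out[pp_idx] = -1;                   /* added since planning */
    }

    /* Every plan-time partition must have been matched exactly once. */
    assert(pd_idx == plan_n);
    return out;
}
```

For example, with plan-time partitions {100, 300} mapped to subplans {0, 1} and execution-time partitions {100, 200, 300, 400}, the rebuilt map is {0, -1, 1, -1}: the two partitions attached after planning are treated as pruned.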

‎src/backend/executor/execUtils.c

Lines changed: 8 additions & 0 deletions
@@ -54,6 +54,7 @@
 #include "mb/pg_wchar.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
@@ -214,6 +215,13 @@ FreeExecutorState(EState *estate)
         estate->es_jit = NULL;
     }

+    /* release partition directory, if allocated */
+    if (estate->es_partition_directory)
+    {
+        DestroyPartitionDirectory(estate->es_partition_directory);
+        estate->es_partition_directory = NULL;
+    }
+
     /*
      * Free the per-query memory context, thereby releasing all working
      * memory, including the EState node itself.

‎src/backend/executor/nodeModifyTable.c

Lines changed: 1 addition & 1 deletion
@@ -2186,7 +2186,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
         (operation == CMD_INSERT || update_tuple_routing_needed))
         mtstate->mt_partition_tuple_routing =
-            ExecSetupPartitionTupleRouting(mtstate, rel);
+            ExecSetupPartitionTupleRouting(estate, mtstate, rel);

     /*
      * Build state for collecting transition tuples. This requires having a

‎src/backend/nodes/copyfuncs.c

Lines changed: 1 addition & 0 deletions
@@ -1197,6 +1197,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
     COPY_SCALAR_FIELD(nexprs);
     COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
     COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
+    COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(int));
     COPY_POINTER_FIELD(hasexecparam, from->nexprs * sizeof(bool));
     COPY_SCALAR_FIELD(do_initial_prune);
     COPY_SCALAR_FIELD(do_exec_prune);

‎src/backend/nodes/outfuncs.c

Lines changed: 1 addition & 0 deletions
@@ -947,6 +947,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
     WRITE_INT_FIELD(nexprs);
     WRITE_INT_ARRAY(subplan_map, node->nparts);
     WRITE_INT_ARRAY(subpart_map, node->nparts);
+    WRITE_OID_ARRAY(relid_map, node->nparts);
     WRITE_BOOL_ARRAY(hasexecparam, node->nexprs);
     WRITE_BOOL_FIELD(do_initial_prune);
     WRITE_BOOL_FIELD(do_exec_prune);

‎src/backend/nodes/readfuncs.c

Lines changed: 1 addition & 0 deletions
@@ -2386,6 +2386,7 @@ _readPartitionedRelPruneInfo(void)
     READ_INT_FIELD(nexprs);
     READ_INT_ARRAY(subplan_map, local_node->nparts);
     READ_INT_ARRAY(subpart_map, local_node->nparts);
+    READ_OID_ARRAY(relid_map, local_node->nparts);
     READ_BOOL_ARRAY(hasexecparam, local_node->nexprs);
     READ_BOOL_FIELD(do_initial_prune);
     READ_BOOL_FIELD(do_exec_prune);

‎src/backend/optimizer/plan/planner.c

Lines changed: 4 additions & 0 deletions
@@ -56,6 +56,7 @@
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "parser/parse_agg.h"
+#include "partitioning/partdesc.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/dsm_impl.h"
 #include "utils/rel.h"
@@ -567,6 +568,9 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
         result->jitFlags |= PGJIT_DEFORM;
     }

+    if (glob->partition_directory != NULL)
+        DestroyPartitionDirectory(glob->partition_directory);
+
     return result;
 }

‎src/backend/optimizer/util/inherit.c

Lines changed: 8 additions & 1 deletion
@@ -147,6 +147,10 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
     {
         Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);

+        if (root->glob->partition_directory == NULL)
+            root->glob->partition_directory =
+                CreatePartitionDirectory(CurrentMemoryContext);
+
         /*
          * If this table has partitions, recursively expand and lock them.
          * While at it, also extract the partition key columns of all the
@@ -246,7 +250,10 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
     int i;
     RangeTblEntry *childrte;
     Index childRTindex;
-    PartitionDesc partdesc = RelationGetPartitionDesc(parentrel);
+    PartitionDesc partdesc;
+
+    partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
+                                        parentrel);

     check_stack_depth();

‎src/backend/optimizer/util/plancat.c

Lines changed: 2 additions & 1 deletion
@@ -2086,7 +2086,8 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,

     Assert(relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);

-    partdesc = RelationGetPartitionDesc(relation);
+    partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
+                                        relation);
     partkey = RelationGetPartitionKey(relation);
     rel->part_scheme = find_partition_scheme(root, relation);
     Assert(partdesc != NULL && rel->part_scheme != NULL);
