Commit 305db95

Fix creation of partition descriptor during concurrent detach+drop

If a partition undergoes DETACH CONCURRENTLY immediately followed by DROP, this could cause a problem for a concurrent transaction recomputing the partition descriptor when running a prepared statement, because it tries to dereference a pointer to a tuple that's not found in a catalog scan.

The existing retry logic added in commit dbca346 is sufficient to cope with the overall problem, provided we don't try to dereference a non-existent heap tuple.

Arguably, the code in RelationBuildPartitionDesc() has been wrong all along, since no check was added in commit 898e5e3 against receiving a NULL tuple from the catalog scan; that bug has only become user-visible with DETACH CONCURRENTLY, which was added in branch 14. Therefore, even though there's no known mechanism to cause a crash because of this, backpatch the addition of such a check to all supported branches. In branches prior to 14, this would cause the code to fail with a "missing relpartbound for relation XYZ" error instead of crashing; that's okay, because there are no reports of such behavior anyway.

Author: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18559-b48286d2eacd9a4e@postgresql.org
1 parent 16e67bc · commit 305db95
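
For reference, the guarded catalog lookup that the commit message describes can be sketched as a standalone backend helper. This is a hypothetical illustration only: the function name fetch_relpartbound_if_present() and the header list are assumptions of the sketch, not part of the commit, which patches RelationBuildPartitionDesc() in place (see the diff below).

/*
 * Hypothetical sketch (not part of the commit): fetch a partition's
 * relpartbound from pg_class, tolerating the tuple being gone because the
 * partition was dropped after a concurrent DETACH.  Returns NULL when the
 * tuple or the attribute is missing; the caller treats that as "retry".
 * The header list is indicative; the real code lives in partdesc.c.
 */
#include "postgres.h"

#include "access/genam.h"
#include "access/htup_details.h"
#include "access/table.h"
#include "catalog/indexing.h"
#include "catalog/pg_class.h"
#include "nodes/parsenodes.h"
#include "storage/lockdefs.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
#include "utils/rel.h"

static PartitionBoundSpec *
fetch_relpartbound_if_present(Oid inhrelid)
{
    Relation    pg_class;
    SysScanDesc scan;
    ScanKeyData key[1];
    HeapTuple   tuple;
    PartitionBoundSpec *boundspec = NULL;

    pg_class = table_open(RelationRelationId, AccessShareLock);
    ScanKeyInit(&key[0],
                Anum_pg_class_oid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(inhrelid));
    scan = systable_beginscan(pg_class, ClassOidIndexId, true,
                              NULL, 1, key);

    /* One tuple in the normal case, zero if the table was dropped meanwhile */
    tuple = systable_getnext(scan);
    if (HeapTupleIsValid(tuple))
    {
        Datum       datum;
        bool        isnull;

        datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
                             RelationGetDescr(pg_class), &isnull);
        if (!isnull)
            boundspec = stringToNode(TextDatumGetCString(datum));
    }

    systable_endscan(scan);
    table_close(pg_class, AccessShareLock);

    return boundspec;
}

In the committed code the same guard is inlined into RelationBuildPartitionDesc(), as the diff below shows.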

1 file changed: +22, -8 lines

src/backend/partitioning/partdesc.c

Lines changed: 22 additions & 8 deletions
@@ -210,6 +210,10 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
 			 * shared queue.  We solve this problem by reading pg_class directly
 			 * for the desired tuple.
 			 *
+			 * If the partition recently detached is also dropped, we get no tuple
+			 * from the scan.  In that case, we also retry, and next time through
+			 * here, we don't see that partition anymore.
+			 *
 			 * The other problem is that DETACH CONCURRENTLY is in the process of
 			 * removing a partition, which happens in two steps: first it marks it
 			 * as "detach pending", commits, then unsets relpartbound.  If
@@ -224,8 +228,6 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
 			Relation	pg_class;
 			SysScanDesc	scan;
 			ScanKeyData	key[1];
-			Datum		datum;
-			bool		isnull;
 
 			pg_class = table_open(RelationRelationId, AccessShareLock);
 			ScanKeyInit(&key[0],
@@ -234,17 +236,29 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
 						ObjectIdGetDatum(inhrelid));
 			scan = systable_beginscan(pg_class, ClassOidIndexId, true,
 									  NULL, 1, key);
+
+			/*
+			 * We could get one tuple from the scan (the normal case), or zero
+			 * tuples if the table has been dropped meanwhile.
+			 */
 			tuple = systable_getnext(scan);
-			datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
-								 RelationGetDescr(pg_class), &isnull);
-			if (!isnull)
-				boundspec = stringToNode(TextDatumGetCString(datum));
+			if (HeapTupleIsValid(tuple))
+			{
+				Datum		datum;
+				bool		isnull;
+
+				datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
+									 RelationGetDescr(pg_class), &isnull);
+				if (!isnull)
+					boundspec = stringToNode(TextDatumGetCString(datum));
+			}
 			systable_endscan(scan);
 			table_close(pg_class, AccessShareLock);
 
 			/*
-			 * If we still don't get a relpartbound value, then it must be
-			 * because of DETACH CONCURRENTLY.  Restart from the top, as
+			 * If we still don't get a relpartbound value (either because
+			 * boundspec is null or because there was no tuple), then it must
+			 * be because of DETACH CONCURRENTLY.  Restart from the top, as
 			 * explained above.  We only do this once, for two reasons: first,
 			 * only one DETACH CONCURRENTLY session could affect us at a time,
 			 * since each of them would have to wait for the snapshot under
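
The comments in the last hunk describe a restart-once policy rather than failing immediately. The following hypothetical caller, built on the helper sketched above, illustrates that shape; list_partition_oids() is an invented placeholder for fetching the table's children, and the real control flow inside RelationBuildPartitionDesc() (added by commit dbca346) differs in detail. It assumes the headers from the earlier sketch plus "nodes/pg_list.h".

/*
 * Hypothetical illustration (not PostgreSQL code) of the restart-once
 * behavior described in the diff comments.  list_partition_oids() is an
 * invented placeholder for listing the partitioned table's children as
 * visible to our snapshot.
 */
static void
build_partdesc_sketch(Relation rel, bool omit_detached)
{
    bool        retried = false;
    List       *inhoids;
    ListCell   *lc;

retry:
    inhoids = list_partition_oids(rel, omit_detached);  /* placeholder */

    foreach(lc, inhoids)
    {
        Oid         inhrelid = lfirst_oid(lc);
        PartitionBoundSpec *boundspec;

        boundspec = fetch_relpartbound_if_present(inhrelid);
        if (boundspec == NULL)
        {
            /*
             * The bound is gone: the partition was detached (and possibly
             * dropped) concurrently.  Restart once; on the second pass the
             * partition no longer shows up in the list.  A single retry is
             * enough because only one DETACH CONCURRENTLY session can
             * affect us at a time.
             */
            if (retried)
                elog(ERROR, "missing relpartbound for relation %u", inhrelid);
            retried = true;
            goto retry;
        }

        /* ... accumulate boundspec into the partition descriptor ... */
    }
}

In branches before 14, which lack DETACH CONCURRENTLY, reaching this path now raises the "missing relpartbound for relation XYZ" error instead of crashing, as the commit message notes.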
