NotificationsYou must be signed in to change notification settings
Fork6
Star31

Commit3ba11d3

committed

Teach CLUSTER to use seqscan-and-sort when it's faster than indexscan.

... or at least, when the planner's cost estimates say it will be faster.Leonardo Francalanci, reviewed by Itagaki Takahiro and Tom Lane

1 parent694c56a commit3ba11d3Copy full SHA for 3ba11d3

File tree

14 files changed

+713

-138

lines changed

doc/src/sgml/ref
- cluster.sgml
src
- backend
  - commands
    - cluster.c
  - optimizer
    - path
      - costsize.c
    - plan
    - prep
      - prepunion.c
    - util
      - pathnode.c
      - plancat.c
  - utils/sort
    - tuplesort.c
- include
  - optimizer
  - utils
    - tuplesort.h

14 files changed

+713

-138

lines changed

`‎doc/src/sgml/ref/cluster.sgml`

Lines changed: 30 additions & 37 deletions

Original file line number	Diff line number	Diff line change
`@@ -128,18 +128,33 @@ CLUSTER [VERBOSE]`
`128`	`128`	`</para>`
`129`	`129`
`130`	`130`	`<para>`
`131`		`-During the cluster operation, a temporary copy ofthe tableis created`
`132`		`-that containsthetable data inthe indexorder. Temporary copies of`
`133`		`-each index on the table are created as well. Therefore, you need free`
`134`		`-space on disk at least equal to the sum of the table sizeandthe index`
`135`		`-sizes.`
	`131`	`+<command>CLUSTER</> can re-sortthe tableusing either an indexscan`
	`132`	`+onthespecified index, or (ifthe indexis a b-tree) a sequential`
	`133`	`+scan followed by sorting. It will attempt to choose the method that`
	`134`	`+will be faster, based on planner cost parametersandavailable statistical`
	`135`	`+information.`
`136`	`136`	`</para>`
`137`	`137`
`138`	`138`	`<para>`
`139`		`- Because <command>CLUSTER</command> remembers the clustering information,`
`140`		`- one can cluster the tables one wants clustered manually the first time, and`
`141`		`- setup a timed event similar to <command>VACUUM</command> so that the tables`
`142`		`- are periodically reclustered.`
	`139`	`+ When an indexscan is used, a temporary copy of the table is created that`
	`140`	`+ contains the table data in the index order. Temporary copies of each`
	`141`	`+ index on the table are created as well. Therefore, you need free space on`
	`142`	`+ disk at least equal to the sum of the table size and the index sizes.`
	`143`	`+ </para>`
	`144`	`+`
	`145`	`+ <para>`
	`146`	`+ When a sequential scan and sort is used, a temporary sort file is`
	`147`	`+ also created, so that the peak temporary space requirement is as much`
	`148`	`+ as double the table size, plus the index sizes. This method is often`
	`149`	`+ faster than the indexscan method, but if the disk space requirement is`
	`150`	`+ intolerable, you can disable this choice by temporarily setting <xref`
	`151`	`+ linkend="guc-enable-sort"> to <literal>off</>.`
	`152`	`+ </para>`
	`153`	`+`
	`154`	`+ <para>`
	`155`	`+ It is advisable to set <xref linkend="guc-maintenance-work-mem"> to`
	`156`	`+ a reasonably large value (but not more than the amount of RAM you can`
	`157`	`+ dedicate to the <command>CLUSTER</> operation) before clustering.`
`143`	`158`	`</para>`
`144`	`159`
`145`	`160`	`<para>`
`@@ -150,35 +165,13 @@ CLUSTER [VERBOSE]`
`150`	`165`	`</para>`
`151`	`166`
`152`	`167`	`<para>`
`153`		`- There is another way to cluster data. The`
`154`		`- <command>CLUSTER</command> command reorders the original table by`
`155`		`- scanning it using the index you specify. This can be slow`
`156`		`- on large tables because the rows are fetched from the table`
`157`		`- in index order, and if the table is disordered, the`
`158`		`- entries are on random pages, so there is one disk page`
`159`		`- retrieved for every row moved. (<productname>PostgreSQL</productname> has`
`160`		`- a cache, but the majority of a big table will not fit in the cache.)`
`161`		`- The other way to cluster a table is to use:`
`162`		`-`
`163`		`-<programlisting>`
`164`		`-CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS`
`165`		`- SELECT * FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;`
`166`		`-</programlisting>`
`167`		`-`
`168`		`- which uses the <productname>PostgreSQL</productname> sorting code`
`169`		`- to produce the desired order;`
`170`		`- this is usually much faster than an index scan for disordered data.`
`171`		`- Then you drop the old table, use`
`172`		`- <command>ALTER TABLE ... RENAME</command>`
`173`		`- to rename <replaceable class="parameter">newtable</replaceable> to the`
`174`		`- old name, and recreate the table's indexes.`
`175`		`- The big disadvantage of this approach is that it does not preserve`
`176`		`- OIDs, constraints, foreign key relationships, granted privileges, and`
`177`		`- other ancillary properties of the table — all such items must be`
`178`		`- manually recreated. Another disadvantage is that this way requires a sort`
`179`		`- temporary file about the same size as the table itself, so peak disk usage`
`180`		`- is about three times the table size instead of twice the table size.`
	`168`	`+ Because <command>CLUSTER</command> remembers which indexes are clustered,`
	`169`	`+ one can cluster the tables one wants clustered manually the first time,`
	`170`	`+ then set up a periodic maintenance script that executes`
	`171`	`+ <command>CLUSTER</> without any parameters, so that the desired tables`
	`172`	`+ are periodically reclustered.`
`181`	`173`	`</para>`
	`174`	`+`
`182`	`175`	`</refsect1>`
`183`	`176`
`184`	`177`	`<refsect1>`

`‎src/backend/commands/cluster.c`

Lines changed: 121 additions & 40 deletions

Original file line number	Diff line number	Diff line change
`@@ -36,6 +36,7 @@`
`36`	`36`	`#include"commands/trigger.h"`
`37`	`37`	`#include"commands/vacuum.h"`
`38`	`38`	`#include"miscadmin.h"`
	`39`	`+#include"optimizer/planner.h"`
`39`	`40`	`#include"storage/bufmgr.h"`
`40`	`41`	`#include"storage/procarray.h"`
`41`	`42`	`#include"storage/smgr.h"`
`@@ -49,6 +50,7 @@`
`49`	`50`	`#include"utils/snapmgr.h"`
`50`	`51`	`#include"utils/syscache.h"`
`51`	`52`	`#include"utils/tqual.h"`
	`53`	`+#include"utils/tuplesort.h"`
`52`	`54`
`53`	`55`
`54`	`56`	`/*`
`@@ -69,7 +71,10 @@ static void copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,`
`69`	`71`	`intfreeze_min_age,intfreeze_table_age,`
`70`	`72`	`boolpSwapToastByContent,TransactionIdpFreezeXid);`
`71`	`73`	`staticList*get_tables_to_cluster(MemoryContextcluster_context);`
`72`		`-`
	`74`	`+staticvoidreform_and_rewrite_tuple(HeapTupletuple,`
	`75`	`+TupleDescoldTupDesc,TupleDescnewTupDesc,`
	`76`	`+Datumvalues,boolisnull,`
	`77`	`+boolnewRelHasOids,RewriteStaterwstate);`
`73`	`78`
`74`	`79`
`75`	`80`	`/*---------------------------------------------------------------------------`
`@@ -759,6 +764,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,`
`759`	`764`	`TransactionIdOldestXmin;`
`760`	`765`	`TransactionIdFreezeXid;`
`761`	`766`	`RewriteStaterwstate;`
	`767`	`+booluse_sort;`
	`768`	`+Tuplesortstate*tuplesort;`
`762`	`769`
`763`	`770`	`/*`
`764`	`771`	`* Open the relations we need.`
`@@ -845,12 +852,30 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,`
`845`	`852`	`rwstate=begin_heap_rewrite(NewHeap,OldestXmin,FreezeXid,use_wal);`
`846`	`853`
`847`	`854`	`/*`
`848`		`- * Scan through the OldHeap, either in OldIndex order or sequentially, and`
`849`		`- * copy each tuple into the NewHeap. To ensure we see recently-dead`
`850`		`- * tuples that still need to be copied, we scan with SnapshotAny and use`
	`855`	`+ * Decide whether to use an indexscan or seqscan-and-optional-sort to`
	`856`	`+ * scan the OldHeap. We know how to use a sort to duplicate the ordering`
	`857`	`+ * of a btree index, and will use seqscan-and-sort for that case if the`
	`858`	`+ * planner tells us it's cheaper. Otherwise, always indexscan if an`
	`859`	`+ * index is provided, else plain seqscan.`
	`860`	`+ */`
	`861`	`+if (OldIndex!=NULL&&OldIndex->rd_rel->relam==BTREE_AM_OID)`
	`862`	`+use_sort=plan_cluster_use_sort(OIDOldHeap,OIDOldIndex);`
	`863`	`+else`
	`864`	`+use_sort= false;`
	`865`	`+`
	`866`	`+/* Set up sorting if wanted */`
	`867`	`+if (use_sort)`
	`868`	`+tuplesort=tuplesort_begin_cluster(oldTupDesc,OldIndex,`
	`869`	`+maintenance_work_mem, false);`
	`870`	`+else`
	`871`	`+tuplesort=NULL;`
	`872`	`+`
	`873`	`+/*`
	`874`	`+ * Prepare to scan the OldHeap. To ensure we see recently-dead tuples`
	`875`	`+ * that still need to be copied, we scan with SnapshotAny and use`
`851`	`876`	`* HeapTupleSatisfiesVacuum for the visibility test.`
`852`	`877`	`*/`
`853`		`-if (OldIndex!=NULL)`
	`878`	`+if (OldIndex!=NULL&& !use_sort)`
`854`	`879`	`{`
`855`	`880`	`heapScan=NULL;`
`856`	`881`	`indexScan=index_beginscan(OldHeap,OldIndex,`
`@@ -862,17 +887,21 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,`
`862`	`887`	`indexScan=NULL;`
`863`	`888`	`}`
`864`	`889`
	`890`	`+/*`
	`891`	`+ * Scan through the OldHeap, either in OldIndex order or sequentially;`
	`892`	`+ * copy each tuple into the NewHeap, or transiently to the tuplesort`
	`893`	`+ * module. Note that we don't bother sorting dead tuples (they won't`
	`894`	`+ * get to the new table anyway).`
	`895`	`+ */`
`865`	`896`	`for (;;)`
`866`	`897`	`{`
`867`	`898`	`HeapTupletuple;`
`868`		`-HeapTuplecopiedTuple;`
`869`	`899`	`Bufferbuf;`
`870`	`900`	`boolisdead;`
`871`		`-inti;`
`872`	`901`
`873`	`902`	`CHECK_FOR_INTERRUPTS();`
`874`	`903`
`875`		`-if (OldIndex!=NULL)`
	`904`	`+if (indexScan!=NULL)`
`876`	`905`	`{`
`877`	`906`	`tuple=index_getnext(indexScan,ForwardScanDirection);`
`878`	`907`	`if (tuple==NULL)`
`@@ -951,45 +980,50 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,`
`951`	`980`	`continue;`
`952`	`981`	`}`
`953`	`982`
`954`		`-/*`
`955`		`- * We cannot simply copy the tuple as-is, for several reasons:`
`956`		`- *`
`957`		`- * 1. We'd like to squeeze out the values of any dropped columns, both`
`958`		`- * to save space and to ensure we have no corner-case failures. (It's`
`959`		`- * possible for example that the new table hasn't got a TOAST table`
`960`		`- * and so is unable to store any large values of dropped cols.)`
`961`		`- *`
`962`		`- * 2. The tuple might not even be legal for the new table; this is`
`963`		`- * currently only known to happen as an after-effect of ALTER TABLE`
`964`		`- * SET WITHOUT OIDS.`
`965`		`- *`
`966`		`- * So, we must reconstruct the tuple from component Datums.`
`967`		`- */`
`968`		`-heap_deform_tuple(tuple,oldTupDesc,values,isnull);`
	`983`	`+if (tuplesort!=NULL)`
	`984`	`+tuplesort_putheaptuple(tuplesort,tuple);`
	`985`	`+else`
	`986`	`+reform_and_rewrite_tuple(tuple,`
	`987`	`+oldTupDesc,newTupDesc,`
	`988`	`+values,isnull,`
	`989`	`+NewHeap->rd_rel->relhasoids,rwstate);`
	`990`	`+}`
`969`	`991`
`970`		`-/* Be sure to null out any dropped columns */`
`971`		`-for (i=0;i<natts;i++)`
	`992`	`+if (indexScan!=NULL)`
	`993`	`+index_endscan(indexScan);`
	`994`	`+if (heapScan!=NULL)`
	`995`	`+heap_endscan(heapScan);`
	`996`	`+`
	`997`	`+/*`
	`998`	`+ * In scan-and-sort mode, complete the sort, then read out all live`
	`999`	`+ * tuples from the tuplestore and write them to the new relation.`
	`1000`	`+ */`
	`1001`	`+if (tuplesort!=NULL)`
	`1002`	`+{`
	`1003`	`+tuplesort_performsort(tuplesort);`
	`1004`	`+`
	`1005`	`+for (;;)`
`972`	`1006`	`{`
`973`		`-if (newTupDesc->attrs[i]->attisdropped)`
`974`		`-isnull[i]= true;`
`975`		`-}`
	`1007`	`+HeapTupletuple;`
	`1008`	`+boolshouldfree;`
`976`	`1009`
`977`		`-copiedTuple=heap_form_tuple(newTupDesc,values,isnull);`
	`1010`	`+CHECK_FOR_INTERRUPTS();`
`978`	`1011`
`979`		`-/* Preserve OID, if any */`
`980`		`-if (NewHeap->rd_rel->relhasoids)`
`981`		`-HeapTupleSetOid(copiedTuple,HeapTupleGetOid(tuple));`
	`1012`	`+tuple=tuplesort_getheaptuple(tuplesort, true,&shouldfree);`
	`1013`	`+if (tuple==NULL)`
	`1014`	`+break;`
`982`	`1015`
`983`		`-/* The heap rewrite module does the rest */`
`984`		`-rewrite_heap_tuple(rwstate,tuple,copiedTuple);`
	`1016`	`+reform_and_rewrite_tuple(tuple,`
	`1017`	`+oldTupDesc,newTupDesc,`
	`1018`	`+values,isnull,`
	`1019`	`+NewHeap->rd_rel->relhasoids,rwstate);`
`985`	`1020`
`986`		`-heap_freetuple(copiedTuple);`
`987`		`-}`
	`1021`	`+if (shouldfree)`
	`1022`	`+heap_freetuple(tuple);`
	`1023`	`+}`
`988`	`1024`
`989`		`-if (OldIndex!=NULL)`
`990`		`-index_endscan(indexScan);`
`991`		`-else`
`992`		`-heap_endscan(heapScan);`
	`1025`	`+tuplesort_end(tuplesort);`
	`1026`	`+}`
`993`	`1027`
`994`	`1028`	`/* Write out any remaining tuples, and fsync if needed */`
`995`	`1029`	`end_heap_rewrite(rwstate);`
`@@ -1488,3 +1522,50 @@ get_tables_to_cluster(MemoryContext cluster_context)`
`1488`	`1522`
`1489`	`1523`	`returnrvs;`
`1490`	`1524`	`}`
	`1525`	`+`
	`1526`	`+`
	`1527`	`+/*`
	`1528`	`+ * Reconstruct and rewrite the given tuple`
	`1529`	`+ *`
	`1530`	`+ * We cannot simply copy the tuple as-is, for several reasons:`
	`1531`	`+ *`
	`1532`	`+ * 1. We'd like to squeeze out the values of any dropped columns, both`
	`1533`	`+ * to save space and to ensure we have no corner-case failures. (It's`
	`1534`	`+ * possible for example that the new table hasn't got a TOAST table`
	`1535`	`+ * and so is unable to store any large values of dropped cols.)`
	`1536`	`+ *`
	`1537`	`+ * 2. The tuple might not even be legal for the new table; this is`
	`1538`	`+ * currently only known to happen as an after-effect of ALTER TABLE`
	`1539`	`+ * SET WITHOUT OIDS.`
	`1540`	`+ *`
	`1541`	`+ * So, we must reconstruct the tuple from component Datums.`
	`1542`	`+ */`
	`1543`	`+staticvoid`
	`1544`	`+reform_and_rewrite_tuple(HeapTupletuple,`
	`1545`	`+TupleDescoldTupDesc,TupleDescnewTupDesc,`
	`1546`	`+Datumvalues,boolisnull,`
	`1547`	`+boolnewRelHasOids,RewriteStaterwstate)`
	`1548`	`+{`
	`1549`	`+HeapTuplecopiedTuple;`
	`1550`	`+inti;`
	`1551`	`+`
	`1552`	`+heap_deform_tuple(tuple,oldTupDesc,values,isnull);`
	`1553`	`+`
	`1554`	`+/* Be sure to null out any dropped columns */`
	`1555`	`+for (i=0;i<newTupDesc->natts;i++)`
	`1556`	`+{`
	`1557`	`+if (newTupDesc->attrs[i]->attisdropped)`
	`1558`	`+isnull[i]= true;`
	`1559`	`+}`
	`1560`	`+`
	`1561`	`+copiedTuple=heap_form_tuple(newTupDesc,values,isnull);`
	`1562`	`+`
	`1563`	`+/* Preserve OID, if any */`
	`1564`	`+if (newRelHasOids)`
	`1565`	`+HeapTupleSetOid(copiedTuple,HeapTupleGetOid(tuple));`
	`1566`	`+`
	`1567`	`+/* The heap rewrite module does the rest */`
	`1568`	`+rewrite_heap_tuple(rwstate,tuple,copiedTuple);`
	`1569`	`+`
	`1570`	`+heap_freetuple(copiedTuple);`
	`1571`	`+}`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commit3ba11d3

File tree

14 files changed

14 files changed

`‎doc/src/sgml/ref/cluster.sgml`

`‎src/backend/commands/cluster.c`

0 commit comments