Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit3ba11d3

Browse files
committed
Teach CLUSTER to use seqscan-and-sort when it's faster than indexscan.
... or at least, when the planner's cost estimates say it will be faster.Leonardo Francalanci, reviewed by Itagaki Takahiro and Tom Lane
1 parent694c56a commit3ba11d3

File tree

14 files changed

+713
-138
lines changed

14 files changed

+713
-138
lines changed

‎doc/src/sgml/ref/cluster.sgml

Lines changed: 30 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -128,18 +128,33 @@ CLUSTER [VERBOSE]
128128
</para>
129129

130130
<para>
131-
During the cluster operation, a temporary copy ofthe tableis created
132-
that containsthetable data inthe indexorder. Temporary copies of
133-
each index on the table are created as well. Therefore, you need free
134-
space on disk at least equal to the sum of the table sizeandthe index
135-
sizes.
131+
<command>CLUSTER</> can re-sortthe tableusing either an indexscan
132+
onthespecified index, or (ifthe indexis a b-tree) a sequential
133+
scan followed by sorting. It will attempt to choose the method that
134+
will be faster, based on planner cost parametersandavailable statistical
135+
information.
136136
</para>
137137

138138
<para>
139-
Because <command>CLUSTER</command> remembers the clustering information,
140-
one can cluster the tables one wants clustered manually the first time, and
141-
setup a timed event similar to <command>VACUUM</command> so that the tables
142-
are periodically reclustered.
139+
When an indexscan is used, a temporary copy of the table is created that
140+
contains the table data in the index order. Temporary copies of each
141+
index on the table are created as well. Therefore, you need free space on
142+
disk at least equal to the sum of the table size and the index sizes.
143+
</para>
144+
145+
<para>
146+
When a sequential scan and sort is used, a temporary sort file is
147+
also created, so that the peak temporary space requirement is as much
148+
as double the table size, plus the index sizes. This method is often
149+
faster than the indexscan method, but if the disk space requirement is
150+
intolerable, you can disable this choice by temporarily setting <xref
151+
linkend="guc-enable-sort"> to <literal>off</>.
152+
</para>
153+
154+
<para>
155+
It is advisable to set <xref linkend="guc-maintenance-work-mem"> to
156+
a reasonably large value (but not more than the amount of RAM you can
157+
dedicate to the <command>CLUSTER</> operation) before clustering.
143158
</para>
144159

145160
<para>
@@ -150,35 +165,13 @@ CLUSTER [VERBOSE]
150165
</para>
151166

152167
<para>
153-
There is another way to cluster data. The
154-
<command>CLUSTER</command> command reorders the original table by
155-
scanning it using the index you specify. This can be slow
156-
on large tables because the rows are fetched from the table
157-
in index order, and if the table is disordered, the
158-
entries are on random pages, so there is one disk page
159-
retrieved for every row moved. (<productname>PostgreSQL</productname> has
160-
a cache, but the majority of a big table will not fit in the cache.)
161-
The other way to cluster a table is to use:
162-
163-
<programlisting>
164-
CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS
165-
SELECT * FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
166-
</programlisting>
167-
168-
which uses the <productname>PostgreSQL</productname> sorting code
169-
to produce the desired order;
170-
this is usually much faster than an index scan for disordered data.
171-
Then you drop the old table, use
172-
<command>ALTER TABLE ... RENAME</command>
173-
to rename <replaceable class="parameter">newtable</replaceable> to the
174-
old name, and recreate the table's indexes.
175-
The big disadvantage of this approach is that it does not preserve
176-
OIDs, constraints, foreign key relationships, granted privileges, and
177-
other ancillary properties of the table &mdash; all such items must be
178-
manually recreated. Another disadvantage is that this way requires a sort
179-
temporary file about the same size as the table itself, so peak disk usage
180-
is about three times the table size instead of twice the table size.
168+
Because <command>CLUSTER</command> remembers which indexes are clustered,
169+
one can cluster the tables one wants clustered manually the first time,
170+
then set up a periodic maintenance script that executes
171+
<command>CLUSTER</> without any parameters, so that the desired tables
172+
are periodically reclustered.
181173
</para>
174+
182175
</refsect1>
183176

184177
<refsect1>

‎src/backend/commands/cluster.c

Lines changed: 121 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@
3636
#include"commands/trigger.h"
3737
#include"commands/vacuum.h"
3838
#include"miscadmin.h"
39+
#include"optimizer/planner.h"
3940
#include"storage/bufmgr.h"
4041
#include"storage/procarray.h"
4142
#include"storage/smgr.h"
@@ -49,6 +50,7 @@
4950
#include"utils/snapmgr.h"
5051
#include"utils/syscache.h"
5152
#include"utils/tqual.h"
53+
#include"utils/tuplesort.h"
5254

5355

5456
/*
@@ -69,7 +71,10 @@ static void copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
6971
intfreeze_min_age,intfreeze_table_age,
7072
bool*pSwapToastByContent,TransactionId*pFreezeXid);
7173
staticList*get_tables_to_cluster(MemoryContextcluster_context);
72-
74+
staticvoidreform_and_rewrite_tuple(HeapTupletuple,
75+
TupleDescoldTupDesc,TupleDescnewTupDesc,
76+
Datum*values,bool*isnull,
77+
boolnewRelHasOids,RewriteStaterwstate);
7378

7479

7580
/*---------------------------------------------------------------------------
@@ -759,6 +764,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
759764
TransactionIdOldestXmin;
760765
TransactionIdFreezeXid;
761766
RewriteStaterwstate;
767+
booluse_sort;
768+
Tuplesortstate*tuplesort;
762769

763770
/*
764771
* Open the relations we need.
@@ -845,12 +852,30 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
845852
rwstate=begin_heap_rewrite(NewHeap,OldestXmin,FreezeXid,use_wal);
846853

847854
/*
848-
* Scan through the OldHeap, either in OldIndex order or sequentially, and
849-
* copy each tuple into the NewHeap. To ensure we see recently-dead
850-
* tuples that still need to be copied, we scan with SnapshotAny and use
855+
* Decide whether to use an indexscan or seqscan-and-optional-sort to
856+
* scan the OldHeap. We know how to use a sort to duplicate the ordering
857+
* of a btree index, and will use seqscan-and-sort for that case if the
858+
* planner tells us it's cheaper. Otherwise, always indexscan if an
859+
* index is provided, else plain seqscan.
860+
*/
861+
if (OldIndex!=NULL&&OldIndex->rd_rel->relam==BTREE_AM_OID)
862+
use_sort=plan_cluster_use_sort(OIDOldHeap,OIDOldIndex);
863+
else
864+
use_sort= false;
865+
866+
/* Set up sorting if wanted */
867+
if (use_sort)
868+
tuplesort=tuplesort_begin_cluster(oldTupDesc,OldIndex,
869+
maintenance_work_mem, false);
870+
else
871+
tuplesort=NULL;
872+
873+
/*
874+
* Prepare to scan the OldHeap. To ensure we see recently-dead tuples
875+
* that still need to be copied, we scan with SnapshotAny and use
851876
* HeapTupleSatisfiesVacuum for the visibility test.
852877
*/
853-
if (OldIndex!=NULL)
878+
if (OldIndex!=NULL&& !use_sort)
854879
{
855880
heapScan=NULL;
856881
indexScan=index_beginscan(OldHeap,OldIndex,
@@ -862,17 +887,21 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
862887
indexScan=NULL;
863888
}
864889

890+
/*
891+
* Scan through the OldHeap, either in OldIndex order or sequentially;
892+
* copy each tuple into the NewHeap, or transiently to the tuplesort
893+
* module. Note that we don't bother sorting dead tuples (they won't
894+
* get to the new table anyway).
895+
*/
865896
for (;;)
866897
{
867898
HeapTupletuple;
868-
HeapTuplecopiedTuple;
869899
Bufferbuf;
870900
boolisdead;
871-
inti;
872901

873902
CHECK_FOR_INTERRUPTS();
874903

875-
if (OldIndex!=NULL)
904+
if (indexScan!=NULL)
876905
{
877906
tuple=index_getnext(indexScan,ForwardScanDirection);
878907
if (tuple==NULL)
@@ -951,45 +980,50 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
951980
continue;
952981
}
953982

954-
/*
955-
* We cannot simply copy the tuple as-is, for several reasons:
956-
*
957-
* 1. We'd like to squeeze out the values of any dropped columns, both
958-
* to save space and to ensure we have no corner-case failures. (It's
959-
* possible for example that the new table hasn't got a TOAST table
960-
* and so is unable to store any large values of dropped cols.)
961-
*
962-
* 2. The tuple might not even be legal for the new table; this is
963-
* currently only known to happen as an after-effect of ALTER TABLE
964-
* SET WITHOUT OIDS.
965-
*
966-
* So, we must reconstruct the tuple from component Datums.
967-
*/
968-
heap_deform_tuple(tuple,oldTupDesc,values,isnull);
983+
if (tuplesort!=NULL)
984+
tuplesort_putheaptuple(tuplesort,tuple);
985+
else
986+
reform_and_rewrite_tuple(tuple,
987+
oldTupDesc,newTupDesc,
988+
values,isnull,
989+
NewHeap->rd_rel->relhasoids,rwstate);
990+
}
969991

970-
/* Be sure to null out any dropped columns */
971-
for (i=0;i<natts;i++)
992+
if (indexScan!=NULL)
993+
index_endscan(indexScan);
994+
if (heapScan!=NULL)
995+
heap_endscan(heapScan);
996+
997+
/*
998+
* In scan-and-sort mode, complete the sort, then read out all live
999+
* tuples from the tuplestore and write them to the new relation.
1000+
*/
1001+
if (tuplesort!=NULL)
1002+
{
1003+
tuplesort_performsort(tuplesort);
1004+
1005+
for (;;)
9721006
{
973-
if (newTupDesc->attrs[i]->attisdropped)
974-
isnull[i]= true;
975-
}
1007+
HeapTupletuple;
1008+
boolshouldfree;
9761009

977-
copiedTuple=heap_form_tuple(newTupDesc,values,isnull);
1010+
CHECK_FOR_INTERRUPTS();
9781011

979-
/* Preserve OID, if any */
980-
if (NewHeap->rd_rel->relhasoids)
981-
HeapTupleSetOid(copiedTuple,HeapTupleGetOid(tuple));
1012+
tuple=tuplesort_getheaptuple(tuplesort, true,&shouldfree);
1013+
if (tuple==NULL)
1014+
break;
9821015

983-
/* The heap rewrite module does the rest */
984-
rewrite_heap_tuple(rwstate,tuple,copiedTuple);
1016+
reform_and_rewrite_tuple(tuple,
1017+
oldTupDesc,newTupDesc,
1018+
values,isnull,
1019+
NewHeap->rd_rel->relhasoids,rwstate);
9851020

986-
heap_freetuple(copiedTuple);
987-
}
1021+
if (shouldfree)
1022+
heap_freetuple(tuple);
1023+
}
9881024

989-
if (OldIndex!=NULL)
990-
index_endscan(indexScan);
991-
else
992-
heap_endscan(heapScan);
1025+
tuplesort_end(tuplesort);
1026+
}
9931027

9941028
/* Write out any remaining tuples, and fsync if needed */
9951029
end_heap_rewrite(rwstate);
@@ -1488,3 +1522,50 @@ get_tables_to_cluster(MemoryContext cluster_context)
14881522

14891523
returnrvs;
14901524
}
1525+
1526+
1527+
/*
1528+
* Reconstruct and rewrite the given tuple
1529+
*
1530+
* We cannot simply copy the tuple as-is, for several reasons:
1531+
*
1532+
* 1. We'd like to squeeze out the values of any dropped columns, both
1533+
* to save space and to ensure we have no corner-case failures. (It's
1534+
* possible for example that the new table hasn't got a TOAST table
1535+
* and so is unable to store any large values of dropped cols.)
1536+
*
1537+
* 2. The tuple might not even be legal for the new table; this is
1538+
* currently only known to happen as an after-effect of ALTER TABLE
1539+
* SET WITHOUT OIDS.
1540+
*
1541+
* So, we must reconstruct the tuple from component Datums.
1542+
*/
1543+
staticvoid
1544+
reform_and_rewrite_tuple(HeapTupletuple,
1545+
TupleDescoldTupDesc,TupleDescnewTupDesc,
1546+
Datum*values,bool*isnull,
1547+
boolnewRelHasOids,RewriteStaterwstate)
1548+
{
1549+
HeapTuplecopiedTuple;
1550+
inti;
1551+
1552+
heap_deform_tuple(tuple,oldTupDesc,values,isnull);
1553+
1554+
/* Be sure to null out any dropped columns */
1555+
for (i=0;i<newTupDesc->natts;i++)
1556+
{
1557+
if (newTupDesc->attrs[i]->attisdropped)
1558+
isnull[i]= true;
1559+
}
1560+
1561+
copiedTuple=heap_form_tuple(newTupDesc,values,isnull);
1562+
1563+
/* Preserve OID, if any */
1564+
if (newRelHasOids)
1565+
HeapTupleSetOid(copiedTuple,HeapTupleGetOid(tuple));
1566+
1567+
/* The heap rewrite module does the rest */
1568+
rewrite_heap_tuple(rwstate,tuple,copiedTuple);
1569+
1570+
heap_freetuple(copiedTuple);
1571+
}

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp