1
1
<!--
2
- $Header: /cvsroot/pgsql/doc/src/sgml/ref/cluster.sgml,v 1.18 2002/08/10 21:03:33 momjian Exp $
2
+ $Header: /cvsroot/pgsql/doc/src/sgml/ref/cluster.sgml,v 1.19 2002/08/11 02:43:57 tgl Exp $
3
3
PostgreSQL documentation
4
4
-->
5
5
@@ -73,19 +73,6 @@ CLUSTER
73
73
</para>
74
74
</listitem>
75
75
</varlistentry>
76
- <varlistentry>
77
- <term><computeroutput>
78
- ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exist!
79
- </computeroutput></term>
80
- <listitem>
81
- <para>
82
- <comment>
83
- The specified relation was not shown in the error message,
84
- which contained a random string instead of the relation name.
85
- </comment>
86
- </para>
87
- </listitem>
88
- </varlistentry>
89
76
</variablelist>
90
77
</para>
91
78
</refsect2>
@@ -101,7 +88,7 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
101
88
<para>
102
89
<command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
103
90
to cluster the table specified
104
- by <replaceable class="parameter">table</replaceable> approximately
91
+ by <replaceable class="parameter">table</replaceable>
105
92
based on the index specified by
106
93
<replaceable class="parameter">indexname</replaceable>. The index must
107
94
already have been defined on
@@ -110,11 +97,11 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
110
97
111
98
<para>
112
99
When a table is clustered, it is physically reordered
113
- based on the index information.The clustering isstatic.
114
- In other words, as the table is updated, the changes are
115
- not clustered.No attempt is made tokeep new instances or
116
- updated tuplesclustered . If one wishes, one can
117
- re-cluster manually by issuing the command again.
100
+ based on the index information.Clustering isa one-time operation:
101
+ when the table is subsequently updated, the changes are
102
+ not clustered. That is, no attempt is made tostore new or
103
+ updated tuplesaccording to their index order . If one wishes, one can
104
+ periodically re-cluster by issuing the command again.
118
105
</para>
119
106
120
107
<refsect2 id="R2-SQL-CLUSTER-3">
@@ -146,49 +133,50 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
146
133
</para>
147
134
148
135
<para>
149
- There are two ways to cluster data. The first is with the
150
- <command>CLUSTER</command> command, which reorders the original table with
136
+ During the cluster operation, a temporary copy of the table is created
137
+ that contains the table data in the index order. Temporary copies of
138
+ each index on the table are created as well. Therefore, you need free
139
+ space on disk at least equal to the sum of the table size and the index
140
+ sizes.
141
+ </para>
142
+
143
+ <para>
144
+ CLUSTER preserves GRANT, inheritance, index, foreign key, and other
145
+ ancillary information about the table.
146
+ </para>
147
+
148
+ <para>
149
+ Because the optimizer records statistics about the ordering of tables, it
150
+ is advisable to run <command>ANALYZE</command> on the newly clustered
151
+ table. Otherwise, the optimizer may make poor choices of query plans.
152
+ </para>
153
+
154
+ <para>
155
+ There is another way to cluster data. The
156
+ <command>CLUSTER</command> command reorders the original table using
151
157
the ordering of the index you specify. This can be slow
152
158
on large tables because the rows are fetched from the heap
153
159
in index order, and if the heap table is unordered, the
154
160
entries are on random pages, so there is one disk page
155
- retrieved for every row moved. <productname>PostgreSQL</productname> has a cache,
156
- but the majority of a big table will not fit in the cache.
157
- </para>
158
-
159
- <para>
160
- Another way to cluster data is to use
161
+ retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache,
162
+ but the majority of a big table will not fit in the cache.)
163
+ The other way to cluster a table is to use
161
164
162
165
<programlisting>
163
166
SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <replaceable class="parameter">newtable</replaceable>
164
167
FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>
165
168
</programlisting>
166
169
167
170
which uses the <productname>PostgreSQL</productname> sorting code in
168
- the ORDER BY clause to match the index, and which is much faster for
171
+ the ORDER BY clause to create the desired order; this is usually much
172
+ faster than an indexscan for
169
173
unordered data. You then drop the old table, use
170
174
<command>ALTER TABLE...RENAME</command>
171
175
to rename <replaceable class="parameter">newtable</replaceable> to the old name, and
172
- recreate the table's indexes. The only problem is that <acronym>OID</acronym>s
173
- will not be preserved. From then on, <command>CLUSTER</command> should be
174
- fast because most of the heap data has already been
175
- ordered, and the existing index is used.
176
- </para>
177
-
178
- <para>
179
- During the cluster operation, a temporal table is created that contains
180
- the table in the index order. Due to this, you need to have free space
181
- on disk at least the size of the table itself, or the biggest index if
182
- you have one that is larger than the table.
183
- </para>
184
-
185
- <para>
186
- CLUSTER preserves GRANT, inheritance index, and foreign key information.
187
- </para>
188
-
189
- <para>
190
- Because the optimizer records the cluster status of tables, it is
191
- advised to run <command>ANALYZE</command> on the newly clustered table.
176
+ recreate the table's indexes. However, this approach does not preserve
177
+ OIDs, constraints, foreign key relationships, granted privileges, and
178
+ other ancillary properties of the table --- all such items must be
179
+ manually recreated.
192
180
</para>
193
181
194
182
</refsect2>
@@ -199,7 +187,7 @@ SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <repla
199
187
Usage
200
188
</title>
201
189
<para>
202
- Cluster the employees relation on the basis of itssalary attribute:
190
+ Cluster the employees relation on the basis of itsID attribute:
203
191
</para>
204
192
<programlisting>
205
193
CLUSTER emp_ind ON emp;