|
1 | 1 | <!--
|
2 |
| -$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.37 2006/10/31 01:52:31 neilc Exp $ |
| 2 | +$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.38 2006/11/04 19:03:51 tgl Exp $ |
3 | 3 | PostgreSQL documentation
|
4 | 4 | -->
|
5 | 5 |
|
@@ -108,8 +108,8 @@ CLUSTER
|
108 | 108 | If you are requesting a range of indexed values from a table, or a
|
109 | 109 | single indexed value that has multiple rows that match,
|
110 | 110 | <command>CLUSTER</command> will help because once the index identifies the
|
111 |
| -heap page for the first row that matches, all other rows |
112 |
| - that match are probably already on the sameheap page, |
| 111 | +table page for the first row that matches, all other rows |
| 112 | + that match are probably already on the sametable page, |
113 | 113 | and so you save disk accesses and speed up the query.
|
114 | 114 | </para>
|
115 | 115 |
|
@@ -137,30 +137,33 @@ CLUSTER
|
137 | 137 |
|
138 | 138 | <para>
|
139 | 139 | There is another way to cluster data. The
|
140 |
| - <command>CLUSTER</command> command reorders the original tableusing |
141 |
| -the ordering of the index you specify. This can be slow |
142 |
| - on large tables because the rows are fetched from theheap |
143 |
| - in index order, and if theheaptable isunordered, the |
| 140 | + <command>CLUSTER</command> command reorders the original tableby |
| 141 | +scanning it using the index you specify. This can be slow |
| 142 | + on large tables because the rows are fetched from thetable |
| 143 | + in index order, and if the table isdisordered, the |
144 | 144 | entries are on random pages, so there is one disk page
|
145 |
| - retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache, |
146 |
| - but the majority of a big table will not fit in the cache.) |
| 145 | + retrieved for every row moved. (<productname>PostgreSQL</productname> has |
| 146 | +a cache,but the majority of a big table will not fit in the cache.) |
147 | 147 | The other way to cluster a table is to use
|
148 | 148 |
|
149 | 149 | <programlisting>
|
150 | 150 | CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS
|
151 |
| - SELECT<replaceable class="parameter">columnlist</replaceable> FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>; |
| 151 | + SELECT* FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>; |
152 | 152 | </programlisting>
|
153 | 153 |
|
154 |
| - which uses the <productname>PostgreSQL</productname> sorting code in |
155 |
| -the <literal>ORDER BY</literal> clausetocreate the desired order; this is usually much |
156 |
| - faster than an index scan for |
157 |
| -unordered data. You then drop the old table, use |
| 154 | + which uses the <productname>PostgreSQL</productname> sorting code |
| 155 | + toproduce the desired order; |
| 156 | +this is usually muchfaster than an index scan for disordered data. |
| 157 | +Then you drop the old table, use |
158 | 158 | <command>ALTER TABLE ... RENAME</command>
|
159 |
| - to rename <replaceable class="parameter">newtable</replaceable> to the old name, and |
160 |
| - recreate the table's indexes. However, this approach does not preserve |
| 159 | + to rename <replaceable class="parameter">newtable</replaceable> to the |
| 160 | + old name, and recreate the table's indexes. |
| 161 | + The big disadvantage of this approach is that it does not preserve |
161 | 162 | OIDs, constraints, foreign key relationships, granted privileges, and
|
162 | 163 | other ancillary properties of the table — all such items must be
|
163 |
| - manually recreated. |
| 164 | + manually recreated. Another disadvantage is that this way requires a sort |
| 165 | + temporary file about the same size as the table itself, so peak disk usage |
| 166 | + is about three times the table size instead of twice the table size. |
164 | 167 | </para>
|
165 | 168 | </refsect1>
|
166 | 169 |
|
|