|
1 | 1 | <!-- |
2 | | -$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.37 2006/10/31 01:52:31 neilc Exp $ |
| 2 | +$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.38 2006/11/04 19:03:51 tgl Exp $ |
3 | 3 | PostgreSQL documentation |
4 | 4 | --> |
5 | 5 |
|
@@ -108,8 +108,8 @@ CLUSTER |
108 | 108 | If you are requesting a range of indexed values from a table, or a |
109 | 109 | single indexed value that has multiple rows that match, |
110 | 110 | <command>CLUSTER</command> will help because once the index identifies the |
111 | | -heap page for the first row that matches, all other rows |
112 | | - that match are probably already on the sameheap page, |
| 111 | +table page for the first row that matches, all other rows |
| 112 | + that match are probably already on the sametable page, |
113 | 113 | and so you save disk accesses and speed up the query. |
114 | 114 | </para> |
115 | 115 |
|
@@ -137,30 +137,33 @@ CLUSTER |
137 | 137 |
|
138 | 138 | <para> |
139 | 139 | There is another way to cluster data. The |
140 | | - <command>CLUSTER</command> command reorders the original tableusing |
141 | | -the ordering of the index you specify. This can be slow |
142 | | - on large tables because the rows are fetched from theheap |
143 | | - in index order, and if theheaptable isunordered, the |
| 140 | + <command>CLUSTER</command> command reorders the original tableby |
| 141 | +scanning it using the index you specify. This can be slow |
| 142 | + on large tables because the rows are fetched from thetable |
| 143 | + in index order, and if the table isdisordered, the |
144 | 144 | entries are on random pages, so there is one disk page |
145 | | - retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache, |
146 | | - but the majority of a big table will not fit in the cache.) |
| 145 | + retrieved for every row moved. (<productname>PostgreSQL</productname> has |
| 146 | +a cache,but the majority of a big table will not fit in the cache.) |
147 | 147 | The other way to cluster a table is to use |
148 | 148 |
|
149 | 149 | <programlisting> |
150 | 150 | CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS |
151 | | - SELECT<replaceable class="parameter">columnlist</replaceable> FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>; |
| 151 | + SELECT* FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>; |
152 | 152 | </programlisting> |
153 | 153 |
|
154 | | - which uses the <productname>PostgreSQL</productname> sorting code in |
155 | | -the <literal>ORDER BY</literal> clausetocreate the desired order; this is usually much |
156 | | - faster than an index scan for |
157 | | -unordered data. You then drop the old table, use |
| 154 | + which uses the <productname>PostgreSQL</productname> sorting code |
| 155 | + toproduce the desired order; |
| 156 | +this is usually muchfaster than an index scan for disordered data. |
| 157 | +Then you drop the old table, use |
158 | 158 | <command>ALTER TABLE ... RENAME</command> |
159 | | - to rename <replaceable class="parameter">newtable</replaceable> to the old name, and |
160 | | - recreate the table's indexes. However, this approach does not preserve |
| 159 | + to rename <replaceable class="parameter">newtable</replaceable> to the |
| 160 | + old name, and recreate the table's indexes. |
| 161 | + The big disadvantage of this approach is that it does not preserve |
161 | 162 | OIDs, constraints, foreign key relationships, granted privileges, and |
162 | 163 | other ancillary properties of the table — all such items must be |
163 | | - manually recreated. |
| 164 | + manually recreated. Another disadvantage is that this way requires a sort |
| 165 | + temporary file about the same size as the table itself, so peak disk usage |
| 166 | + is about three times the table size instead of twice the table size. |
164 | 167 | </para> |
165 | 168 | </refsect1> |
166 | 169 |
|
|