11<!--
2- $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.43 2004/03/25 18:57:57 tgl Exp $
2+ $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.44 2004/04/22 07:02:36 neilc Exp $
33-->
44
55 <chapter id="performance-tips">
@@ -28,8 +28,8 @@ $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.43 2004/03/25 18:57:57 tgl Exp
2828 plan</firstterm> for each query it is given. Choosing the right
2929 plan to match the query structure and the properties of the data
3030 is absolutely critical for good performance. You can use the
31- <command>EXPLAIN</command> command to see what query plan the system
32- creates for any query.
31+ <xref linkend="sql-explain" endterm="sql-explain-title"> command
32+ to see what query plan the system creates for any query.
3333 Plan-reading is an art that deserves an extensive tutorial, which
3434 this is not; but here is some basic information.
3535 </para>
@@ -638,30 +638,51 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
638638 </indexterm>
639639
640640 <para>
641- Turn off autocommit and just do one commit at
642- the end. (In plain SQL, this means issuing <command>BEGIN</command>
643- at the start and <command>COMMIT</command> at the end. Some client
644- libraries may do this behind your back, in which case you need to
645- make sure the library does it when you want it done.)
646- If you allow each insertion to be committed separately,
647- <productname>PostgreSQL</productname> is doing a lot of work for each
648- row that is added.
649- An additional benefit of doing all insertions in one transaction
650- is that if the insertion of one row were to fail then the
651- insertion of all rows inserted up to that point would be rolled
652- back, so you won't be stuck with partially loaded data.
641+ Turn off autocommit and just do one commit at the end. (In plain
642+ SQL, this means issuing <command>BEGIN</command> at the start and
643+ <command>COMMIT</command> at the end. Some client libraries may
644+ do this behind your back, in which case you need to make sure the
645+ library does it when you want it done.) If you allow each
646+ insertion to be committed separately,
647+ <productname>PostgreSQL</productname> is doing a lot of work for
648+ each row that is added. An additional benefit of doing all
649+ insertions in one transaction is that if the insertion of one row
650+ were to fail then the insertion of all rows inserted up to that
651+ point would be rolled back, so you won't be stuck with partially
652+ loaded data.
653+ </para>
654+
655+ <para>
656+ If you are issuing a large sequence of <command>INSERT</command>
657+ commands to bulk load some data, also consider using <xref
658+ linkend="sql-prepare" endterm="sql-prepare-title"> to create a
659+ prepared <command>INSERT</command> statement. Since you are
660+ executing the same command multiple times, it is more efficient to
661+ prepare the command once and then use <command>EXECUTE</command>
662+ as many times as required.
653663 </para>
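A minimal sketch of this pattern in plain SQL, assuming a hypothetical two-column table named `sales` (the table and statement names are only illustrative):

```sql
BEGIN;

-- Prepare the statement once; each EXECUTE reuses the stored plan
-- instead of parsing and planning the INSERT again.
PREPARE bulk_insert (integer, text) AS
    INSERT INTO sales (id, description) VALUES ($1, $2);

EXECUTE bulk_insert (1, 'first row');
EXECUTE bulk_insert (2, 'second row');
-- ... one EXECUTE per row ...

COMMIT;
```

Because everything runs inside a single BEGIN/COMMIT, a failure on any row rolls back the whole load rather than leaving it partially applied.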
654664 </sect2>
655665
656666 <sect2 id="populate-copy-from">
657- <title>Use <command>COPY FROM</command></title>
667+ <title>Use <command>COPY</command></title>
668+
669+ <para>
670+ Use <xref linkend="sql-copy" endterm="sql-copy-title"> to load
671+ all the rows in one command, instead of using a series of
672+ <command>INSERT</command> commands. The <command>COPY</command>
673+ command is optimized for loading large numbers of rows; it is less
674+ flexible than <command>INSERT</command>, but incurs significantly
675+ less overhead for large data loads. Since <command>COPY</command>
676+ is a single command, there is no need to disable autocommit if you
677+ use this method to populate a table.
678+ </para>
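For illustration, the same hypothetical `sales` table could be populated with a single command; the file path here is an assumption, and the file must be readable by the server:

```sql
-- Load all rows in one command; columns are separated by the default
-- delimiter (tab). No explicit transaction block is needed, since
-- COPY is a single command.
COPY sales FROM '/path/to/sales.dat';
```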
658679
659680 <para>
660- Use <command>COPY FROM STDIN</command> to load all the rowsin one
661- command, instead of using a series of <command>INSERT </command>
662- commands. This reduces parsing, planning, etc. overhead a great
663- deal. If you do this then it is not necessary to turn off
664- autocommit, since it is only one command anyway .
681+ Note that loading a large number of rows using
682+ <command>COPY</command> is almost always faster than using
683+ <command>INSERT</command>, even if multiple
684+ <command>INSERT</command> commands are batched into a single
685+ transaction.
665686 </para>
666687 </sect2>
667688
@@ -678,11 +699,12 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
678699
679700 <para>
680701 If you are augmenting an existing table, you can drop the index,
681- load the table, then recreate the index. Of
682- course, the database performance for other users may be adversely
683- affected during the time that the index is missing. One should also
684- think twice before dropping unique indexes, since the error checking
685- afforded by the unique constraint will be lost while the index is missing.
702+ load the table, and then recreate the index. Of course, the
703+ database performance for other users may be adversely affected
704+ during the time that the index is missing. One should also think
705+ twice before dropping unique indexes, since the error checking
706+ afforded by the unique constraint will be lost while the index is
707+ missing.
686708 </para>
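As a sketch of this drop-and-recreate sequence, using a hypothetical index `sales_id_idx` on the `sales` table:

```sql
-- Remove the index so the bulk load does not maintain it row by row.
DROP INDEX sales_id_idx;

-- ... bulk load the data here (for example, with COPY) ...

-- Rebuild the index once, over the fully loaded table.
CREATE INDEX sales_id_idx ON sales (id);
```

Note that if the dropped index was a unique index, duplicate rows inserted while it is missing will only be detected when the CREATE INDEX is attempted.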
687709 </sect2>
688710
@@ -701,16 +723,39 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
701723 </para>
702724 </sect2>
703725
726+ <sect2 id="populate-checkpoint-segments">
727+ <title>Increase <varname>checkpoint_segments</varname></title>
728+
729+ <para>
730+ Temporarily increasing the <xref
731+ linkend="guc-checkpoint-segments"> configuration variable can also
732+ make large data loads faster. This is because loading a large
733+ amount of data into <productname>PostgreSQL</productname> can
734+ cause checkpoints to occur more often than the normal checkpoint
735+ frequency (specified by the <varname>checkpoint_timeout</varname>
736+ configuration variable). Whenever a checkpoint occurs, all dirty
737+ pages must be flushed to disk. By increasing
738+ <varname>checkpoint_segments</varname> temporarily during bulk
739+ data loads, the number of checkpoints that are required can be
740+ reduced.
741+ </para>
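For example, one might raise the setting in <filename>postgresql.conf</filename> for the duration of the load and restore it afterwards (the value shown is only illustrative):

```
# Temporarily raised for a bulk load; the default is 3.
checkpoint_segments = 30
```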
742+ </sect2>
743+
704744 <sect2 id="populate-analyze">
705745 <title>Run <command>ANALYZE</command> Afterwards</title>
706746
707747 <para>
708- It's a good idea to run <command>ANALYZE</command> or <command>VACUUM
709- ANALYZE</command> anytime you've added or updated a lot of data,
710- including just after initially populating a table. This ensures that
711- the planner has up-to-date statistics about the table. With no statistics
712- or obsolete statistics, the planner may make poor choices of query plans,
713- leading to bad performance on queries that use your table.
748+ Whenever you have significantly altered the distribution of data
749+ within a table, running <xref linkend="sql-analyze"
750+ endterm="sql-analyze-title"> is strongly recommended. This
751+ includes when bulk loading large amounts of data into
752+ <productname>PostgreSQL</productname>. Running
753+ <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
754+ ensures that the planner has up-to-date statistics about the
755+ table. With no statistics or obsolete statistics, the planner may
756+ make poor decisions during query planning, leading to poor
757+ performance on queries that use tables with inaccurate or
758+ nonexistent statistics.
714759 </para>
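Continuing the hypothetical `sales` example, the final step after the load is simply:

```sql
-- Refresh the planner statistics for the newly loaded table:
ANALYZE sales;

-- or, to reclaim space and update statistics in one pass:
VACUUM ANALYZE sales;
```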
715760 </sect2>
716761 </sect1>