11<!--
2- $Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.44 2003/04/20 01:52:55 momjian Exp $
2+ $Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.45 2003/05/07 22:23:27 tgl Exp $
33PostgreSQL documentation
44-->
55
@@ -119,7 +119,7 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
119119 <term><literal>BINARY</literal></term>
120120 <listitem>
121121 <para>
122- Forces all data to be stored or read in binary format rather
122+ Causes all data to be stored or read in binary format rather
123123 than as text. You cannot specify the <option>DELIMITER</option>
124124 or <option>NULL</option> options in binary mode.
125125 </para>
@@ -193,17 +193,18 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
193193 </para>
194194
195195 <para>
196- The <literal>BINARY</literal> key wordwill force all data to be
196+ The <literal>BINARY</literal> key wordcauses all data to be
197197 stored/read as binary format rather than as text. It is
198- somewhat faster than the normal text mode, but a binary format
199- file is not portable across machine architectures.
198+ somewhat faster than the normal text mode, but a binary-format
199+ file is less portable across machine architectures and
200+ <productname>PostgreSQL</productname> versions.
200201 </para>
201202
202203 <para>
203- You must have select privilege onany table
204+ You must have select privilege onthe table
204205 whose values are read by <command>COPY TO</command>, and
205- insert privilege ona table into which values
206- arebeing inserted by <command>COPY FROM</command>.
206+ insert privilege onthe table into which values
207+ are inserted by <command>COPY FROM</command>.
207208 </para>
208209
209210 <para>
@@ -279,8 +280,8 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
279280 End of data can be represented by a single line containing just
280281 backslash-period (<literal>\.</>). An end-of-data marker is
281282 not necessary when reading from a file, since the end of file
282- serves perfectly well;but an end marker must be provided when copying
283- data to or from a clientapplication .
283+ serves perfectly well;it is needed only when copying data to or from
284+ client applications using pre-3.0 clientprotocol .
284285 </para>
285286
286287 <para>
@@ -358,6 +359,9 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
358359 possible to represent a data carriage return by a backslash and carriage
359360 return, and to represent a data newline by a backslash and newline.
360361 However, these representations might not be accepted in future releases.
362+ They are also highly vulnerable to corruption if the COPY file is
363+ transferred across different machines (for example, from Unix to Windows
364+ or vice versa).
361365 </para>
362366
363367 <para>
@@ -374,7 +378,7 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
374378
375379 <para>
376380 The file format used for <command>COPY BINARY</command> changed in
377- <application>PostgreSQL</application> 7.1 . The new format consists
381+ <application>PostgreSQL</application> 7.4 . The new format consists
378382 of a file header, zero or more tuples containing the row data, and
379383 a file trailer.
380384 </para>
@@ -383,15 +387,15 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
383387 <title>File Header</title>
384388
385389 <para>
386- The file header consists of24 bytes of fixed fields, followed
390+ The file header consists of15 bytes of fixed fields, followed
387391 by a variable-length header extension area. The fixed fields are:
388392
389393 <variablelist>
390394 <varlistentry>
391395 <term>Signature</term>
392396 <listitem>
393397 <para>
394- 12 -byte sequence <literal>PGBCOPY \n\377\r\n\0</> --- note that the zero byte
398+ 11 -byte sequence <literal>PGCOPY \n\377\r\n\0</> --- note that the zero byte
395399is a required part of the signature. (The signature is designed to allow
396400easy identification of files that have been munged by a non-8-bit-clean
397401transfer. This signature will be changed by end-of-line-translation
@@ -400,24 +404,14 @@ filters, dropped zero bytes, dropped high bits, or parity changes.)
400404 </listitem>
401405 </varlistentry>
402406
403- <varlistentry>
404- <term>Integer layout field</term>
405- <listitem>
406- <para>
407- 32-bit integer constant 0x01020304 in source's byte order. Potentially, a reader
408- could engage in byte-flipping of subsequent fields if the wrong byte
409- order is detected here.
410- </para>
411- </listitem>
412- </varlistentry>
413-
414407 <varlistentry>
415408 <term>Flags field</term>
416409 <listitem>
417410 <para>
418- 32-bit integer bit mask to denote important aspects of the file format. Bits are
419- numbered from 0 (<acronym>LSB</>) to 31 (<acronym>MSB</>) --- note that this field is stored
420- with source's endianness, as are all subsequent integer fields. Bits
411+ 32-bit integer bit mask to denote important aspects of the file format. Bits
412+ are numbered from 0 (<acronym>LSB</>) to 31 (<acronym>MSB</>). Note that
413+ this field is stored in network byte order (most significant byte first),
414+ as are all the integer fields used in the file format. Bits
42141516-31 are reserved to denote critical file format issues; a reader
422416should abort if it finds an unexpected bit set in this range. Bits 0-15
423417are reserved to signal backwards-compatible format issues; a reader
@@ -471,72 +465,28 @@ is left for a later release.
471465 <title>Tuples</title>
472466 <para>
473467Each tuple begins with a 16-bit integer count of the number of fields in the
474- tuple. (Presently, all tuples in a table will have the same count, but
475- that might not always be true.) Then, repeated for each field in the
476- tuple, there is a 16-bit integer <structfield>typlen</> word possibly followed by field data.
477- The <structfield>typlen</> field is interpreted thus:
478-
479- <variablelist>
480- <varlistentry>
481- <term>Zero</term>
482- <listitem>
483- <para>
484- Field is null. No data follows.
485- </para>
486- </listitem>
487- </varlistentry>
488-
489- <varlistentry>
490- <term>> 0</term>
491- <listitem>
492- <para>
493- Field is a fixed-length data type. Exactly that many
494- bytes of data follow the <structfield>typlen</> word.
495- </para>
496- </listitem>
497- </varlistentry>
498-
499- <varlistentry>
500- <term>-1</term>
501- <listitem>
502- <para>
503- Field is a <literal>varlena</> data type. The next four
504- bytes are the <literal>varlena</> header, which contains
505- the total value length including the header itself.
506- </para>
507- </listitem>
508- </varlistentry>
509-
510- <varlistentry>
511- <term>< -1</term>
512- <listitem>
513- <para>
514- Reserved for future use.
515- </para>
516- </listitem>
517- </varlistentry>
518- </variablelist>
468+ tuple. (Presently, all tuples in a table will have the same count, but that
469+ might not always be true.) Then, repeated for each field in the tuple, there
470+ is a 32-bit length word followed by that many bytes of field data. (The
471+ length word does not include itself, and can be zero.) As a special case,
472+ -1 indicates a NULL field value. No value bytes follow in the NULL case.
519473 </para>
520474
521475 <para>
522- For nonnull fields, the reader can check that the <structfield>typlen</> matches the
523- expected <structfield>typlen</> for the destination column. This provides a simple
524- but very useful check that the data is as expected.
476+ There is no alignment padding or any other extra data between fields.
525477 </para>
526478
527479 <para>
528- There is no alignment padding or any other extra data between fields.
529- Note also that the format does not distinguish whether a data type is
530- pass-by-reference or pass-by-value. Both of these provisions are
531- deliberate: they might help improve portability of the files (although
532- of course endianness and floating-point-format issues can still keep
533- you from moving a binary file across machines).
480+ Presently, all data values in a <command>COPY BINARY</command> file are
481+ assumed to be in binary format (format code one). It is anticipated that a
482+ future extension may add a header field that allows per-column format codes
483+ to be specified.
534484 </para>
535485
536486 <para>
537487If OIDs are included in the file, the OID field immediately follows the
538488field-count word. It is a normal field except that it's not included
539- in the field-count. In particular it has a<structfield>typlen</> --- this will allow
489+ in the field-count. In particular it has alength word --- this will allow
540490handling of 4-byte vs. 8-byte OIDs without too much pain, and will allow
541491OIDs to be shown as null if that ever proves desirable.
542492 </para>
@@ -546,8 +496,8 @@ OIDs to be shown as null if that ever proves desirable.
546496 <title>File Trailer</title>
547497
548498 <para>
549- The file trailer consists ofan 16-bit integer word containing -1. This is
550- easily distinguished from a tuple's field-count word.
499+ The file trailer consists ofa 16-bit integer word containing -1. This
500+ is easily distinguished from a tuple's field-count word.
551501 </para>
552502
553503 <para>
@@ -579,19 +529,22 @@ COPY country FROM '/usr1/proj/bray/sql/country_data';
579529
580530 <para>
581531 Here is a sample of data suitable for copying into a table from
582- <literal>STDIN</literal> (so it must have the termination sequence on the
583- last line):
532+ <literal>STDIN</literal>:
584533<programlisting>
585534AF AFGHANISTAN
586535AL ALBANIA
587536DZ ALGERIA
588537ZM ZAMBIA
589538ZW ZIMBABWE
590- \.
591539</programlisting>
592540 Note that the white space on each line is actually a tab character.
593541 </para>
594542
543+ <para>
544+ XXX the following example is OBSOLETE and needs to be updated for the
545+ 7.4 binary format:
546+ </para>
547+
595548 <para>
596549 The following is the same data, output in binary format on a
597550 Linux/i586 machine. The data is shown after filtering through the