Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit1718f4c

Browse files
committed
Update COPY BINARY file format spec to reflect recent decisions about
external representation of binary data.
1 parent2de6da8 commit1718f4c

File tree

1 file changed

+40
-87
lines changed

1 file changed

+40
-87
lines changed

‎doc/src/sgml/ref/copy.sgml

Lines changed: 40 additions & 87 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.44 2003/04/20 01:52:55 momjian Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.45 2003/05/07 22:23:27 tgl Exp $
33
PostgreSQL documentation
44
-->
55

@@ -119,7 +119,7 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
119119
<term><literal>BINARY</literal></term>
120120
<listitem>
121121
<para>
122-
Forces all data to be stored or read in binary format rather
122+
Causes all data to be stored or read in binary format rather
123123
than as text. You cannot specify the <option>DELIMITER</option>
124124
or <option>NULL</option> options in binary mode.
125125
</para>
@@ -193,17 +193,18 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
193193
</para>
194194

195195
<para>
196-
The <literal>BINARY</literal> key wordwill force all data to be
196+
The <literal>BINARY</literal> key wordcauses all data to be
197197
stored/read as binary format rather than as text. It is
198-
somewhat faster than the normal text mode, but a binary format
199-
file is not portable across machine architectures.
198+
somewhat faster than the normal text mode, but a binary-format
199+
file is less portable across machine architectures and
200+
<productname>PostgreSQL</productname> versions.
200201
</para>
201202

202203
<para>
203-
You must have select privilege onany table
204+
You must have select privilege onthe table
204205
whose values are read by <command>COPY TO</command>, and
205-
insert privilege ona table into which values
206-
arebeinginserted by <command>COPY FROM</command>.
206+
insert privilege onthe table into which values
207+
are inserted by <command>COPY FROM</command>.
207208
</para>
208209

209210
<para>
@@ -279,8 +280,8 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
279280
End of data can be represented by a single line containing just
280281
backslash-period (<literal>\.</>). An end-of-data marker is
281282
not necessary when reading from a file, since the end of file
282-
serves perfectly well;but an end marker must be provided when copying
283-
data to or from aclientapplication.
283+
serves perfectly well;it is needed only when copying data to or from
284+
client applications using pre-3.0clientprotocol.
284285
</para>
285286

286287
<para>
@@ -358,6 +359,9 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
358359
possible to represent a data carriage return by a backslash and carriage
359360
return, and to represent a data newline by a backslash and newline.
360361
However, these representations might not be accepted in future releases.
362+
They are also highly vulnerable to corruption if the COPY file is
363+
transferred across different machines (for example, from Unix to Windows
364+
or vice versa).
361365
</para>
362366

363367
<para>
@@ -374,7 +378,7 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
374378

375379
<para>
376380
The file format used for <command>COPY BINARY</command> changed in
377-
<application>PostgreSQL</application> 7.1. The new format consists
381+
<application>PostgreSQL</application> 7.4. The new format consists
378382
of a file header, zero or more tuples containing the row data, and
379383
a file trailer.
380384
</para>
@@ -383,15 +387,15 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
383387
<title>File Header</title>
384388

385389
<para>
386-
The file header consists of24 bytes of fixed fields, followed
390+
The file header consists of15 bytes of fixed fields, followed
387391
by a variable-length header extension area. The fixed fields are:
388392

389393
<variablelist>
390394
<varlistentry>
391395
<term>Signature</term>
392396
<listitem>
393397
<para>
394-
12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte
398+
11-byte sequence <literal>PGCOPY\n\377\r\n\0</> --- note that the zero byte
395399
is a required part of the signature. (The signature is designed to allow
396400
easy identification of files that have been munged by a non-8-bit-clean
397401
transfer. This signature will be changed by end-of-line-translation
@@ -400,24 +404,14 @@ filters, dropped zero bytes, dropped high bits, or parity changes.)
400404
</listitem>
401405
</varlistentry>
402406

403-
<varlistentry>
404-
<term>Integer layout field</term>
405-
<listitem>
406-
<para>
407-
32-bit integer constant 0x01020304 in source's byte order. Potentially, a reader
408-
could engage in byte-flipping of subsequent fields if the wrong byte
409-
order is detected here.
410-
</para>
411-
</listitem>
412-
</varlistentry>
413-
414407
<varlistentry>
415408
<term>Flags field</term>
416409
<listitem>
417410
<para>
418-
32-bit integer bit mask to denote important aspects of the file format. Bits are
419-
numbered from 0 (<acronym>LSB</>) to 31 (<acronym>MSB</>) --- note that this field is stored
420-
with source's endianness, as are all subsequent integer fields. Bits
411+
32-bit integer bit mask to denote important aspects of the file format. Bits
412+
are numbered from 0 (<acronym>LSB</>) to 31 (<acronym>MSB</>). Note that
413+
this field is stored in network byte order (most significant byte first),
414+
as are all the integer fields used in the file format. Bits
421415
16-31 are reserved to denote critical file format issues; a reader
422416
should abort if it finds an unexpected bit set in this range. Bits 0-15
423417
are reserved to signal backwards-compatible format issues; a reader
@@ -471,72 +465,28 @@ is left for a later release.
471465
<title>Tuples</title>
472466
<para>
473467
Each tuple begins with a 16-bit integer count of the number of fields in the
474-
tuple. (Presently, all tuples in a table will have the same count, but
475-
that might not always be true.) Then, repeated for each field in the
476-
tuple, there is a 16-bit integer <structfield>typlen</> word possibly followed by field data.
477-
The <structfield>typlen</> field is interpreted thus:
478-
479-
<variablelist>
480-
<varlistentry>
481-
<term>Zero</term>
482-
<listitem>
483-
<para>
484-
Field is null. No data follows.
485-
</para>
486-
</listitem>
487-
</varlistentry>
488-
489-
<varlistentry>
490-
<term>&gt; 0</term>
491-
<listitem>
492-
<para>
493-
Field is a fixed-length data type. Exactly that many
494-
bytes of data follow the <structfield>typlen</> word.
495-
</para>
496-
</listitem>
497-
</varlistentry>
498-
499-
<varlistentry>
500-
<term>-1</term>
501-
<listitem>
502-
<para>
503-
Field is a <literal>varlena</> data type. The next four
504-
bytes are the <literal>varlena</> header, which contains
505-
the total value length including the header itself.
506-
</para>
507-
</listitem>
508-
</varlistentry>
509-
510-
<varlistentry>
511-
<term>&lt; -1</term>
512-
<listitem>
513-
<para>
514-
Reserved for future use.
515-
</para>
516-
</listitem>
517-
</varlistentry>
518-
</variablelist>
468+
tuple. (Presently, all tuples in a table will have the same count, but that
469+
might not always be true.) Then, repeated for each field in the tuple, there
470+
is a 32-bit length word followed by that many bytes of field data. (The
471+
length word does not include itself, and can be zero.) As a special case,
472+
-1 indicates a NULL field value. No value bytes follow in the NULL case.
519473
</para>
520474

521475
<para>
522-
For nonnull fields, the reader can check that the <structfield>typlen</> matches the
523-
expected <structfield>typlen</> for the destination column. This provides a simple
524-
but very useful check that the data is as expected.
476+
There is no alignment padding or any other extra data between fields.
525477
</para>
526478

527479
<para>
528-
There is no alignment padding or any other extra data between fields.
529-
Note also that the format does not distinguish whether a data type is
530-
pass-by-reference or pass-by-value. Both of these provisions are
531-
deliberate: they might help improve portability of the files (although
532-
of course endianness and floating-point-format issues can still keep
533-
you from moving a binary file across machines).
480+
Presently, all data values in a <command>COPY BINARY</command> file are
481+
assumed to be in binary format (format code one). It is anticipated that a
482+
future extension may add a header field that allows per-column format codes
483+
to be specified.
534484
</para>
535485

536486
<para>
537487
If OIDs are included in the file, the OID field immediately follows the
538488
field-count word. It is a normal field except that it's not included
539-
in the field-count. In particular it has a<structfield>typlen</> --- this will allow
489+
in the field-count. In particular it has alength word --- this will allow
540490
handling of 4-byte vs. 8-byte OIDs without too much pain, and will allow
541491
OIDs to be shown as null if that ever proves desirable.
542492
</para>
@@ -546,8 +496,8 @@ OIDs to be shown as null if that ever proves desirable.
546496
<title>File Trailer</title>
547497

548498
<para>
549-
The file trailer consists ofan 16-bit integer word containing -1. This is
550-
easily distinguished from a tuple's field-count word.
499+
The file trailer consists ofa 16-bit integer word containing -1. This
500+
iseasily distinguished from a tuple's field-count word.
551501
</para>
552502

553503
<para>
@@ -579,19 +529,22 @@ COPY country FROM '/usr1/proj/bray/sql/country_data';
579529

580530
<para>
581531
Here is a sample of data suitable for copying into a table from
582-
<literal>STDIN</literal> (so it must have the termination sequence on the
583-
last line):
532+
<literal>STDIN</literal>:
584533
<programlisting>
585534
AF AFGHANISTAN
586535
AL ALBANIA
587536
DZ ALGERIA
588537
ZM ZAMBIA
589538
ZW ZIMBABWE
590-
\.
591539
</programlisting>
592540
Note that the white space on each line is actually a tab character.
593541
</para>
594542

543+
<para>
544+
XXX the following example is OBSOLETE and needs to be updated for the
545+
7.4 binary format:
546+
</para>
547+
595548
<para>
596549
The following is the same data, output in binary format on a
597550
Linux/i586 machine. The data is shown after filtering through the

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp