Commit 9e257a1

Browse files

Add parallel pg_dump option.

New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then handed
out to these workers by the master process as needed. pg_restore is
adjusted to use this new infrastructure in place of the old setup
which created a new worker for each step on the fly. Parallel dumps
acquire a snapshot clone in order to stay consistent, if available.

The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.

Joachim Wieland, lightly editorialized by Andrew Dunstan.

1 parent 3b91fe1   commit 9e257a1

22 files changed: +2765 −819 lines changed

‎doc/src/sgml/backup.sgml

Lines changed: 18 additions & 0 deletions

@@ -310,6 +310,24 @@ pg_restore -d <replaceable class="parameter">dbname</replaceable> <replaceable c
     with one of the other two approaches.
    </para>
 
+   <formalpara>
+    <title>Use <application>pg_dump</>'s parallel dump feature.</title>
+    <para>
+     To speed up the dump of a large database, you can use
+     <application>pg_dump</application>'s parallel mode. This will dump
+     multiple tables at the same time. You can control the degree of
+     parallelism with the <command>-j</command> parameter. Parallel dumps
+     are only supported for the "directory" archive format.
+
+<programlisting>
+pg_dump -j <replaceable class="parameter">num</replaceable> -F d -f <replaceable class="parameter">out.dir</replaceable> <replaceable class="parameter">dbname</replaceable>
+</programlisting>
+
+     You can use <command>pg_restore -j</command> to restore a dump in parallel.
+     This will work for any archive of either the "custom" or the "directory"
+     archive mode, whether or not it has been created with <command>pg_dump -j</command>.
+    </para>
+   </formalpara>
   </sect2>
  </sect1>

‎doc/src/sgml/perform.sgml

Lines changed: 9 additions & 0 deletions

@@ -1433,6 +1433,15 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
      base backup.
     </para>
    </listitem>
+   <listitem>
+    <para>
+     Experiment with the parallel dump and restore modes of both
+     <application>pg_dump</> and <application>pg_restore</> and find the
+     optimal number of concurrent jobs to use. Dumping and restoring in
+     parallel by means of the <option>-j</> option should give you
+     significantly higher performance than the serial mode.
+    </para>
+   </listitem>
    <listitem>
     <para>
      Consider whether the whole dump should be restored as a single

‎doc/src/sgml/ref/pg_dump.sgml

Lines changed: 84 additions & 5 deletions

@@ -73,10 +73,12 @@ PostgreSQL documentation
    transfer mechanism. <application>pg_dump</application> can be used to
    backup an entire database, then <application>pg_restore</application>
    can be used to examine the archive and/or select which parts of the
-   database are to be restored. The most flexible output file format is
-   the <quote>custom</quote> format (<option>-Fc</option>). It allows
-   for selection and reordering of all archived items, and is compressed
-   by default.
+   database are to be restored. The most flexible output file formats are
+   the <quote>custom</quote> format (<option>-Fc</option>) and the
+   <quote>directory</quote> format (<option>-Fd</option>). They allow
+   for selection and reordering of all archived items, support parallel
+   restoration, and are compressed by default. The <quote>directory</quote>
+   format is the only format that supports parallel dumps.
   </para>
 
   <para>

@@ -251,7 +253,8 @@ PostgreSQL documentation
         can read. A directory format archive can be manipulated with
         standard Unix tools; for example, files in an uncompressed archive
         can be compressed with the <application>gzip</application> tool.
-        This format is compressed by default.
+        This format is compressed by default and also supports parallel
+        dumps.
        </para>
       </listitem>
      </varlistentry>

@@ -285,6 +288,62 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></></term>
+      <listitem>
+       <para>
+        Run the dump in parallel by dumping <replaceable class="parameter">njobs</replaceable>
+        tables simultaneously. This option reduces the time of the dump but it also
+        increases the load on the database server. You can only use this option with the
+        directory output format because this is the only output format where multiple processes
+        can write their data at the same time.
+       </para>
+       <para>
+        <application>pg_dump</> will open <replaceable class="parameter">njobs</replaceable>
+        + 1 connections to the database, so make sure your <xref linkend="guc-max-connections">
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Requesting exclusive locks on database objects while running a parallel dump could
+        cause the dump to fail. The reason is that the <application>pg_dump</> master process
+        requests shared locks on the objects that the worker processes are going to dump later,
+        in order to make sure that nobody deletes them while the dump is running.
+        If another client then requests an exclusive lock on a table, that lock will not be
+        granted but will be queued waiting for the shared lock of the master process to be
+        released. Consequently, any other access to the table will not be granted either and
+        will queue after the exclusive lock request. This includes the worker process trying
+        to dump the table. Without any precautions this would be a classic deadlock situation.
+        To detect this conflict, the <application>pg_dump</> worker process requests another
+        shared lock using the <literal>NOWAIT</> option. If the worker process is not granted
+        this shared lock, somebody else must have requested an exclusive lock in the meantime
+        and there is no way to continue with the dump, so <application>pg_dump</> has no choice
+        but to abort the dump.
+       </para>
+       <para>
+        For a consistent backup, the database server needs to support synchronized snapshots,
+        a feature that was introduced in <productname>PostgreSQL</productname> 9.2. With this
+        feature, database clients can ensure they see the same dataset even though they use
+        different connections. <command>pg_dump -j</command> uses multiple database
+        connections; it connects to the database once with the master process and
+        once again for each worker job. Without the synchronized snapshot feature, the
+        different worker jobs wouldn't be guaranteed to see the same data in each connection,
+        which could lead to an inconsistent backup.
+       </para>
+       <para>
+        If you want to run a parallel dump of a pre-9.2 server, you need to make sure that the
+        database content doesn't change between the time the master connects to the
+        database and the time the last worker job has connected to the database. The easiest
+        way to do this is to halt any data modifying processes (DDL and DML) accessing the
+        database before starting the backup. You also need to specify the
+        <option>--no-synchronized-snapshots</option> parameter when running
+        <command>pg_dump -j</command> against a pre-9.2 <productname>PostgreSQL</productname>
+        server.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
       <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>

@@ -690,6 +749,17 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--no-synchronized-snapshots</></term>
+      <listitem>
+       <para>
+        This option allows running <command>pg_dump -j</> against a pre-9.2
+        server; see the documentation of the <option>-j</option> parameter
+        for more details.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--no-tablespaces</option></term>
       <listitem>

@@ -1082,6 +1152,15 @@ CREATE DATABASE foo WITH TEMPLATE template0;
</screen>
   </para>
 
+  <para>
+   To dump a database into a directory-format archive in parallel with
+   5 worker jobs:
+
+<screen>
+<prompt>$</prompt> <userinput>pg_dump -Fd mydb -j 5 -f dumpdir</userinput>
+</screen>
+  </para>
+
   <para>
    To reload an archive file into a (freshly created) database named
    <literal>newdb</>:
‎src/bin/pg_dump/Makefile

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@ include $(top_builddir)/src/Makefile.global
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 
 OBJS=	pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
-	pg_backup_null.o pg_backup_tar.o \
+	pg_backup_null.o pg_backup_tar.o parallel.o \
	pg_backup_directory.o dumputils.o compress_io.o $(WIN32RES)
 
 KEYWRDOBJS = keywords.o kwlookup.o

‎src/bin/pg_dump/compress_io.c

Lines changed: 10 additions & 0 deletions

@@ -54,6 +54,7 @@
 
 #include "compress_io.h"
 #include "dumputils.h"
+#include "parallel.h"
 
 /*----------------------
  * Compressor API

@@ -182,6 +183,9 @@ size_t
 WriteDataToArchive(ArchiveHandle *AH, CompressorState *cs,
 				   const void *data, size_t dLen)
 {
+	/* Are we aborting? */
+	checkAborting(AH);
+
 	switch (cs->comprAlg)
 	{
 		case COMPR_ALG_LIBZ:

@@ -351,6 +355,9 @@ ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF)
 	/* no minimal chunk size for zlib */
 	while ((cnt = readF(AH, &buf, &buflen)))
 	{
+		/* Are we aborting? */
+		checkAborting(AH);
+
 		zp->next_in = (void *) buf;
 		zp->avail_in = cnt;

@@ -411,6 +418,9 @@ ReadDataFromArchiveNone(ArchiveHandle *AH, ReadFunc readF)
 
 	while ((cnt = readF(AH, &buf, &buflen)))
 	{
+		/* Are we aborting? */
+		checkAborting(AH);
+
 		ahwrite(buf, 1, cnt, AH);
 	}
‎src/bin/pg_dump/dumputils.c

Lines changed: 73 additions & 13 deletions

@@ -38,6 +38,7 @@ static struct
 } on_exit_nicely_list[MAX_ON_EXIT_NICELY];
 
 static int	on_exit_nicely_index;
+void		(*on_exit_msg_func) (const char *modulename, const char *fmt, va_list ap) = vwrite_msg;
 
 #define supports_grant_options(version) ((version) >= 70400)

@@ -48,11 +49,21 @@ static bool parseAclItem(const char *item, const char *type,
 static char *copyAclUserName(PQExpBuffer output, char *input);
 static void AddAcl(PQExpBuffer aclbuf, const char *keyword,
 	   const char *subname);
+static PQExpBuffer getThreadLocalPQExpBuffer(void);
 
 #ifdef WIN32
+static void shutdown_parallel_dump_utils(int code, void *unused);
 static bool parallel_init_done = false;
 static DWORD tls_index;
 static DWORD mainThreadId;
+
+static void
+shutdown_parallel_dump_utils(int code, void *unused)
+{
+	/* Call the cleanup function only from the main thread */
+	if (mainThreadId == GetCurrentThreadId())
+		WSACleanup();
+}
 #endif
 
 void

@@ -61,23 +72,29 @@ init_parallel_dump_utils(void)
 #ifdef WIN32
 	if (!parallel_init_done)
 	{
+		WSADATA		wsaData;
+		int			err;
+
 		tls_index = TlsAlloc();
-		parallel_init_done = true;
 		mainThreadId = GetCurrentThreadId();
+		err = WSAStartup(MAKEWORD(2, 2), &wsaData);
+		if (err != 0)
+		{
+			fprintf(stderr, _("WSAStartup failed: %d\n"), err);
+			exit_nicely(1);
+		}
+		on_exit_nicely(shutdown_parallel_dump_utils, NULL);
+		parallel_init_done = true;
 	}
 #endif
 }
 
 /*
- * Quotes input string if it's not a legitimate SQL identifier as-is.
- *
- * Note that the returned string must be used before calling fmtId again,
- * since we re-use the same return buffer each time. Non-reentrant but
- * reduces memory leakage. (On Windows the memory leakage will be one buffer
- * per thread, which is at least better than one per call).
+ * Non-reentrant but reduces memory leakage. (On Windows the memory leakage
+ * will be one buffer per thread, which is at least better than one per call).
 */
-const char *
-fmtId(const char *rawid)
+static PQExpBuffer
+getThreadLocalPQExpBuffer(void)
 {
	/*
	 * The Tls code goes awry if we use a static var, so we provide for both

@@ -86,9 +103,6 @@ fmtId(const char *rawid)
 	static PQExpBuffer s_id_return = NULL;
 	PQExpBuffer id_return;
 
-	const char *cp;
-	bool		need_quotes = false;
-
 #ifdef WIN32
 	if (parallel_init_done)
 		id_return = (PQExpBuffer) TlsGetValue(tls_index);	/* 0 when not set */

@@ -118,6 +132,23 @@ fmtId(const char *rawid)
 
 	}
 
+	return id_return;
+}
+
+/*
+ * Quotes input string if it's not a legitimate SQL identifier as-is.
+ *
+ * Note that the returned string must be used before calling fmtId again,
+ * since we re-use the same return buffer each time.
+ */
+const char *
+fmtId(const char *rawid)
+{
+	PQExpBuffer id_return = getThreadLocalPQExpBuffer();
+
+	const char *cp;
+	bool		need_quotes = false;
+
	/*
	 * These checks need to match the identifier production in scan.l. Don't
	 * use islower() etc.

@@ -185,6 +216,35 @@ fmtId(const char *rawid)
 	return id_return->data;
 }
 
+/*
+ * fmtQualifiedId - convert a qualified name to the proper format for
+ * the source database.
+ *
+ * Like fmtId, use the result before calling again.
+ *
+ * Since we call fmtId and it also uses getThreadLocalPQExpBuffer() we cannot
+ * use it until we're finished with calling fmtId().
+ */
+const char *
+fmtQualifiedId(int remoteVersion, const char *schema, const char *id)
+{
+	PQExpBuffer id_return;
+	PQExpBuffer lcl_pqexp = createPQExpBuffer();
+
+	/* Suppress schema name if fetching from pre-7.3 DB */
+	if (remoteVersion >= 70300 && schema && *schema)
+	{
+		appendPQExpBuffer(lcl_pqexp, "%s.", fmtId(schema));
+	}
+	appendPQExpBuffer(lcl_pqexp, "%s", fmtId(id));
+
+	id_return = getThreadLocalPQExpBuffer();
+
+	appendPQExpBuffer(id_return, "%s", lcl_pqexp->data);
+	destroyPQExpBuffer(lcl_pqexp);
+
+	return id_return->data;
+}
 
 /*
 * Convert a string value to an SQL string literal and append it to

@@ -1315,7 +1375,7 @@ exit_horribly(const char *modulename, const char *fmt,...)
 	va_list		ap;
 
 	va_start(ap, fmt);
-	vwrite_msg(modulename, fmt, ap);
+	on_exit_msg_func(modulename, fmt, ap);
 	va_end(ap);
 
 	exit_nicely(1);

