Commit 6623222

Cope with data-offset-less archive files during out-of-order restores.

pg_dump produces custom-format archive files that lack data offsets when it is unable to seek its output. Up to now that's been a hazard for pg_restore. But if pg_restore is able to seek in the archive file, there is no reason to throw up our hands when asked to restore data blocks out of order. Instead, whenever we are searching for a data block, record the locations of the blocks we passed over (that is, fill in the missing data-offset fields in our in-memory copy of the TOC data). Then, when we hit a case that requires going backwards, we can just seek back.

Also track the furthest point that we've searched to, and seek back to there when beginning a search for a new data block. This avoids possible O(N^2) time consumption, by ensuring that each data block is examined at most twice. (On Unix systems, that's at most twice per parallel-restore job; but since Windows uses threads here, the threads can share block location knowledge, reducing the amount of duplicated work.)

We can also improve the code a bit by using fseeko() to skip over data blocks during the search.

This is all of some use even in simple restores, but it's really significant for parallel pg_restore. In that case, we require seekability of the input already, and we will very probably need to do out-of-order restores.

Back-patch to v12, as this fixes a regression introduced by commit 548e509. Before that, parallel restore avoided requesting out-of-order restores, so it would work on a data-offset-less archive. Now it will again.

Ideally this patch would include some test coverage, but there are other open bugs that need to be fixed before we can extend our coverage of parallel restore very much. Plan to revisit that later.

David Gilman and Tom Lane; reviewed by Justin Pryzby

Discussion: https://postgr.es/m/CALBH9DDuJ+scZc4MEvw5uO-=vRyR2=QF9+Yh=3hPEnKHWfS81A@mail.gmail.com

1 parent 39a068c commit 6623222
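
The search strategy from the commit message can be illustrated on a toy file format. The following is a minimal sketch, not pg_restore code: the block layout (a 4-byte id, a 4-byte length, then the payload) and the names `BlockIndex`, `index_init`, `write_block` and `find_block` are all hypothetical, and it uses standard `fseek`/`ftell` with `long` offsets where the patch uses `fseeko` and `pgoff_t`. It shows the two ideas together: positions of skipped blocks are recorded so a later request can seek backwards, and each new search resumes from the furthest point scanned so far.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_BLOCKS	16
#define POS_UNKNOWN (-1L)

/* Hypothetical in-memory index, standing in for per-TOC-entry offsets. */
typedef struct
{
	long		pos[MAX_BLOCKS];	/* header offset per block id, or POS_UNKNOWN */
	long		last_pos;			/* furthest point a search has reached */
} BlockIndex;

static void
index_init(BlockIndex *ix, long first_block_pos)
{
	int			i;

	for (i = 0; i < MAX_BLOCKS; i++)
		ix->pos[i] = POS_UNKNOWN;
	ix->last_pos = first_block_pos;
}

/* Toy writer: header is {id, len}, followed by len payload bytes. */
static int
write_block(FILE *fp, int32_t id, const char *data, int32_t len)
{
	return fwrite(&id, sizeof id, 1, fp) == 1 &&
		fwrite(&len, sizeof len, 1, fp) == 1 &&
		fwrite(data, 1, (size_t) len, fp) == (size_t) len;
}

/*
 * Leave fp positioned at the payload of block want_id and return the
 * payload length, or -1 if the block cannot be found.
 */
static int32_t
find_block(FILE *fp, BlockIndex *ix, int32_t want_id)
{
	int32_t		id;
	int32_t		len;
	long		start;

	assert(want_id >= 0 && want_id < MAX_BLOCKS);

	/* Seek straight to a recorded position, else resume the forward scan. */
	start = (ix->pos[want_id] != POS_UNKNOWN) ? ix->pos[want_id] : ix->last_pos;
	if (fseek(fp, start, SEEK_SET) != 0)
		return -1;

	for (;;)
	{
		long		here = ftell(fp);

		if (here < 0 ||
			fread(&id, sizeof id, 1, fp) != 1 ||
			fread(&len, sizeof len, 1, fp) != 1 ||
			len < 0)
			return -1;			/* EOF, read error, or corrupt header */

		if (id == want_id)
		{
			long		end = here + (long) (sizeof id + sizeof len) + len;

			/* Only ever move the resume point forwards. */
			if (end > ix->last_pos)
				ix->last_pos = end;
			return len;
		}

		/* Remember where this skipped block lives for later requests. */
		if (id >= 0 && id < MAX_BLOCKS && ix->pos[id] == POS_UNKNOWN)
			ix->pos[id] = here;

		/* Skip the payload without reading it. */
		if (fseek(fp, (long) len, SEEK_CUR) != 0)
			return -1;
	}
}
```

With blocks 1, 2 and 3 written in that order, asking for block 3 first records the offsets of blocks 1 and 2 on the way past, so a later request for block 1 is a single backwards seek rather than a failure.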

2 files changed: +117 -34 lines changed

doc/src/sgml/ref/pg_restore.sgml

Lines changed: 8 additions & 7 deletions

@@ -246,12 +246,14 @@ PostgreSQL documentation
       <term><option>--jobs=<replaceable class="parameter">number-of-jobs</replaceable></option></term>
       <listitem>
        <para>
-        Run the most time-consuming parts
-        of <application>pg_restore</application> &mdash; those which load data,
-        create indexes, or create constraints &mdash; using multiple
-        concurrent jobs. This option can dramatically reduce the time
+        Run the most time-consuming steps
+        of <application>pg_restore</application> &mdash; those that load data,
+        create indexes, or create constraints &mdash; concurrently, using up
+        to <replaceable class="parameter">number-of-jobs</replaceable>
+        concurrent sessions. This option can dramatically reduce the time
         to restore a large database to a server running on a
-        multiprocessor machine.
+        multiprocessor machine. This option is ignored when emitting a script
+        rather than connecting directly to a database server.
        </para>
 
        <para>
@@ -274,8 +276,7 @@ PostgreSQL documentation
         Only the custom and directory archive formats are supported
         with this option.
         The input must be a regular file or directory (not, for example, a
-        pipe). This option is ignored when emitting a script rather
-        than connecting directly to a database server. Also, multiple
+        pipe or standard input). Also, multiple
         jobs cannot be used together with the
         option <option>--single-transaction</option>.
        </para>

src/bin/pg_dump/pg_backup_custom.c

Lines changed: 109 additions & 27 deletions

@@ -71,6 +71,8 @@ typedef struct
 {
 	CompressorState *cs;
 	int			hasSeek;
+	/* lastFilePos is used only when reading, and may be invalid if !hasSeek */
+	pgoff_t		lastFilePos;	/* position after last data block we've read */
 } lclContext;
 
 typedef struct
@@ -182,8 +184,13 @@ InitArchiveFmt_Custom(ArchiveHandle *AH)
 
 		ReadHead(AH);
 		ReadToc(AH);
-	}
 
+		/*
+		 * Remember location of first data block (i.e., the point after TOC)
+		 * in case we have to search for desired data blocks.
+		 */
+		ctx->lastFilePos = _getFilePos(AH, ctx);
+	}
 }
 
 /*
@@ -423,13 +430,62 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 	{
 		/*
 		 * We cannot seek directly to the desired block. Instead, skip over
-		 * block headers until we find the one we want. This could fail if we
-		 * are asked to restore items out-of-order.
+		 * block headers until we find the one we want. Remember the
+		 * positions of skipped-over blocks, so that if we later decide we
+		 * need to read one, we'll be able to seek to it.
+		 *
+		 * When our input file is seekable, we can do the search starting from
+		 * the point after the last data block we scanned in previous
+		 * iterations of this function.
 		 */
-		_readBlockHeader(AH, &blkType, &id);
+		if (ctx->hasSeek)
+		{
+			if (fseeko(AH->FH, ctx->lastFilePos, SEEK_SET) != 0)
+				fatal("error during file seek: %m");
+		}
 
-		while (blkType != EOF && id != te->dumpId)
+		for (;;)
 		{
+			pgoff_t		thisBlkPos = _getFilePos(AH, ctx);
+
+			_readBlockHeader(AH, &blkType, &id);
+
+			if (blkType == EOF || id == te->dumpId)
+				break;
+
+			/* Remember the block position, if we got one */
+			if (thisBlkPos >= 0)
+			{
+				TocEntry   *otherte = getTocEntryByDumpId(AH, id);
+
+				if (otherte && otherte->formatData)
+				{
+					lclTocEntry *othertctx = (lclTocEntry *) otherte->formatData;
+
+					/*
+					 * Note: on Windows, multiple threads might access/update
+					 * the same lclTocEntry concurrently, but that should be
+					 * safe as long as we update dataPos before dataState.
+					 * Ideally, we'd use pg_write_barrier() to enforce that,
+					 * but the needed infrastructure doesn't exist in frontend
+					 * code. But Windows only runs on machines with strong
+					 * store ordering, so it should be okay for now.
+					 */
+					if (othertctx->dataState == K_OFFSET_POS_NOT_SET)
+					{
+						othertctx->dataPos = thisBlkPos;
+						othertctx->dataState = K_OFFSET_POS_SET;
+					}
+					else if (othertctx->dataPos != thisBlkPos ||
+							 othertctx->dataState != K_OFFSET_POS_SET)
+					{
+						/* sanity check */
+						pg_log_warning("data block %d has wrong seek position",
+									   id);
+					}
+				}
+			}
+
 			switch (blkType)
 			{
 				case BLK_DATA:
@@ -445,7 +501,6 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 						  blkType);
 					break;
 			}
-			_readBlockHeader(AH, &blkType, &id);
 		}
 	}
 	else
@@ -457,20 +512,18 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 		_readBlockHeader(AH, &blkType, &id);
 	}
 
-	/* Produce suitable failure message if we fell off end of file */
+	/*
+	 * If we reached EOF without finding the block we want, then either it
+	 * doesn't exist, or it does but we lack the ability to seek back to it.
+	 */
 	if (blkType == EOF)
 	{
-		if (tctx->dataState == K_OFFSET_POS_NOT_SET)
-			fatal("could not find block ID %d in archive -- "
-				  "possibly due to out-of-order restore request, "
-				  "which cannot be handled due to lack of data offsets in archive",
-				  te->dumpId);
-		else if (!ctx->hasSeek)
+		if (!ctx->hasSeek)
 			fatal("could not find block ID %d in archive -- "
 				  "possibly due to out-of-order restore request, "
 				  "which cannot be handled due to non-seekable input file",
 				  te->dumpId);
-		else					/* huh, the dataPos led us to EOF? */
+		else
 			fatal("could not find block ID %d in archive -- "
 				  "possibly corrupt archive",
 				  te->dumpId);
@@ -496,6 +549,20 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 				  blkType);
 			break;
 	}
+
+	/*
+	 * If our input file is seekable but lacks data offsets, update our
+	 * knowledge of where to start future searches from. (Note that we did
+	 * not update the current TE's dataState/dataPos. We could have, but
+	 * there is no point since it will not be visited again.)
+	 */
+	if (ctx->hasSeek && tctx->dataState == K_OFFSET_POS_NOT_SET)
+	{
+		pgoff_t		curPos = _getFilePos(AH, ctx);
+
+		if (curPos > ctx->lastFilePos)
+			ctx->lastFilePos = curPos;
+	}
 }
 
 /*
@@ -553,6 +620,7 @@ _skipBlobs(ArchiveHandle *AH)
 static void
 _skipData(ArchiveHandle *AH)
 {
+	lclContext *ctx = (lclContext *) AH->formatData;
 	size_t		blkLen;
 	char	   *buf = NULL;
 	int			buflen = 0;
@@ -561,19 +629,27 @@ _skipData(ArchiveHandle *AH)
 	blkLen = ReadInt(AH);
 	while (blkLen != 0)
 	{
-		if (blkLen > buflen)
+		if (ctx->hasSeek)
 		{
-			if (buf)
-				free(buf);
-			buf = (char *) pg_malloc(blkLen);
-			buflen = blkLen;
+			if (fseeko(AH->FH, blkLen, SEEK_CUR) != 0)
+				fatal("error during file seek: %m");
 		}
-		if ((cnt = fread(buf, 1, blkLen, AH->FH)) != blkLen)
+		else
 		{
-			if (feof(AH->FH))
-				fatal("could not read from input file: end of file");
-			else
-				fatal("could not read from input file: %m");
+			if (blkLen > buflen)
+			{
+				if (buf)
+					free(buf);
+				buf = (char *) pg_malloc(blkLen);
+				buflen = blkLen;
+			}
+			if ((cnt = fread(buf, 1, blkLen, AH->FH)) != blkLen)
+			{
+				if (feof(AH->FH))
+					fatal("could not read from input file: end of file");
+				else
+					fatal("could not read from input file: %m");
+			}
 		}
 
 		blkLen = ReadInt(AH);
@@ -809,6 +885,9 @@ _Clone(ArchiveHandle *AH)
 {
 	lclContext *ctx = (lclContext *) AH->formatData;
 
+	/*
+	 * Each thread must have private lclContext working state.
+	 */
 	AH->formatData = (lclContext *) pg_malloc(sizeof(lclContext));
 	memcpy(AH->formatData, ctx, sizeof(lclContext));
 	ctx = (lclContext *) AH->formatData;
@@ -818,10 +897,13 @@ _Clone(ArchiveHandle *AH)
 		fatal("compressor active");
 
 	/*
+	 * We intentionally do not clone TOC-entry-local state: it's useful to
+	 * share knowledge about where the data blocks are across threads.
+	 * _PrintTocData has to be careful about the order of operations on that
+	 * state, though.
+	 *
 	 * Note: we do not make a local lo_buf because we expect at most one BLOBS
-	 * entry per archive, so no parallelism is possible. Likewise,
-	 * TOC-entry-local state isn't an issue because any one TOC entry is
-	 * touched by just one worker child.
+	 * entry per archive, so no parallelism is possible.
 	 */
 }
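
The `_skipData` change in this diff skips a block by seeking when the input is seekable and falls back to reading and discarding otherwise. The following standalone sketch shows that pattern under stated assumptions: `skip_bytes` is a hypothetical helper, it uses standard `fseek` with a `long` length rather than `fseeko`, and it discards through a fixed-size stack buffer where the patched code keeps a `pg_malloc`'d buffer sized to the block.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Advance fp by len bytes.  When the caller says the stream is seekable,
 * reposition without touching the data; otherwise read and discard it in
 * chunks.  Returns 0 on success, -1 on failure.
 */
static int
skip_bytes(FILE *fp, long len, int seekable)
{
	char		buf[4096];

	assert(len >= 0);

	if (seekable)
		return (fseek(fp, len, SEEK_CUR) == 0) ? 0 : -1;

	while (len > 0)
	{
		size_t		want = (len < (long) sizeof buf) ? (size_t) len : sizeof buf;

		if (fread(buf, 1, want, fp) != want)
			return -1;			/* end of file or read error */
		len -= (long) want;
	}
	return 0;
}
```

One difference between the two paths is worth keeping in mind: on POSIX systems a seek past the end of a file succeeds, so a truncated input is noticed only at the next read, whereas the read-and-discard path reports a short read immediately.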