Description
Bug report
Bug description:
Hello,
I am currently debugging this issue.
I have noticed that the bug can be reproduced when the problematic file is truncated to 9 GiB, but it does not happen when the file is truncated to 8 GiB.
The problem seems to be that the next member offset is computed incorrectly. It seems to point 512 B after the correct TAR header, which, in this case, points into the data for the extended attributes, such as 30 mtime=1752348[...].
One of the differences seems to be this code part, which is not hit for the working case:
Lines 1562 to 1569 in 47b01da
    if "size" in pax_headers:
        # If the extended header replaces the size field,
        # we need to recalculate the offset where the next
        # header starts.
        offset = next.offset_data
        if next.isreg() or next.type not in SUPPORTED_TYPES:
            offset += next._block(next.size)
        tarfile.offset = offset
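To make the effect of the size value on this computation concrete, here is a minimal sketch of the arithmetic. It assumes that `_block` rounds a byte count up to a multiple of 512; the helper names and the `offset_data` value are hypothetical and only serve as illustration, while the two sizes are the ones from the failing archive.

```python
BLOCKSIZE = 512  # assumption: tar data is padded to 512-byte blocks


def round_up_to_block(count):
    """Round a byte count up to the next multiple of BLOCKSIZE."""
    blocks, remainder = divmod(count, BLOCKSIZE)
    if remainder:
        blocks += 1
    return blocks * BLOCKSIZE


def next_header_offset(offset_data, size):
    """Hypothetical helper: offset of the header following a member's data."""
    return offset_data + round_up_to_block(size)


# Arbitrary illustrative value, NOT taken from the failing archive.
offset_data = 1536
size_from_pax = 9602318848       # 'size' in the failing case
realsize_from_pax = 9663676416   # 'GNU.sparse.realsize' in the failing case

print(next_header_offset(offset_data, size_from_pax))
print(next_header_offset(offset_data, realsize_from_pax))
```

Whichever of the two values ends up in the size member decides where the reader looks for the next header, so the two candidates lead to offsets that are far apart.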
While looking into the code just above that snippet, i.e., into _apply_pax_info, I noticed that there is no defined order for applying the size, even though it can appear multiple times!
Lines 1615 to 1634 in 47b01da
    def _apply_pax_info(self, pax_headers, encoding, errors):
        """Replace fields with supplemental information from a previous
           pax extended or global header.
        """
        for keyword, value in pax_headers.items():
            if keyword == "GNU.sparse.name":
                setattr(self, "path", value)
            elif keyword == "GNU.sparse.size":
                setattr(self, "size", int(value))
            elif keyword == "GNU.sparse.realsize":
                setattr(self, "size", int(value))
            elif keyword in PAX_FIELDS:
                if keyword in PAX_NUMBER_FIELDS:
                    try:
                        value = PAX_NUMBER_FIELDS[keyword](value)
                    except ValueError:
                        value = 0
                if keyword == "path":
                    value = value.rstrip("/")
                setattr(self, keyword, value)
In the non-working case, the PAX headers look like this:
{'GNU.sparse.major':'1','GNU.sparse.minor':'0','GNU.sparse.name':'userdata','GNU.sparse.realsize':'9663676416','atime':'1752349406.975921575','ctime':'1752349534.57652562','mtime':'1752349534.57652562','size':'9602318848'}
I.e., the size member first gets set to GNU.sparse.realsize and then to size. The debug output looks like this:
[_apply_pax_info] SET SIZE to: 9663676416 from key: GNU.sparse.realsize
[_apply_pax_info] SET SIZE to: 9602318848 from key: size
[_apply_pax_info] SET key to: 1752349534.5765257 from key: mtime
Is it specified that the order of the PAX headers must always be this way? Else, one might just as well encounter it like this:
{'atime':'1752349406.975921575','ctime':'1752349534.57652562','mtime':'1752349534.57652562','size':'9602318848','GNU.sparse.major':'1','GNU.sparse.minor':'0','GNU.sparse.name':'userdata','GNU.sparse.realsize':'9663676416'}
and either one of these orders would be a bug.
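The order dependence can be reproduced without a real archive. The following is a simplified stand-in for the size-related branches of _apply_pax_info, not the actual tarfile code; `apply_size_keys` is a hypothetical helper, and the dictionaries contain only the two size-related keys from the failing case.

```python
def apply_size_keys(pax_headers):
    """Simplified stand-in: every size-related key overwrites the same value."""
    size = None
    for keyword, value in pax_headers.items():
        if keyword in ("GNU.sparse.size", "GNU.sparse.realsize", "size"):
            size = int(value)  # the last matching key in iteration order wins
    return size


order_a = {"GNU.sparse.realsize": "9663676416", "size": "9602318848"}
order_b = {"size": "9602318848", "GNU.sparse.realsize": "9663676416"}

print(apply_size_keys(order_a))  # 9602318848
print(apply_size_keys(order_b))  # 9663676416
```

Both dictionaries carry the same information, yet the resulting size differs, because dictionaries iterate in insertion order and the last size-related key overwrites the earlier one.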
The working case does not have this ambiguity:
{'GNU.sparse.major':'1','GNU.sparse.minor':'0','GNU.sparse.name':'userdata','GNU.sparse.realsize':'8589934592','atime':'1752349538.445543898','ctime':'1752351104.53673501','mtime':'1752351104.53673501'}
The debug output looks like this:
[_apply_pax_info] SET SIZE to: 8589934592 from key: GNU.sparse.realsize
[_apply_pax_info] SET key to: 1752351104.536735 from key: mtime
I.e., even if there is no ordering problem, there are already different semantics for the TarInfo.size member, as one will contain GNU.sparse.realsize and the other will contain [PAXHeader.]size.
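For anyone who wants to compare against their own archives, a minimal sketch for listing the PAX headers and the resulting size of each member could look like this. The in-memory archive, the "comment" header, and both helper names are made up for the demonstration; only `tarfile.open`, `TarInfo`, `addfile`, `getmembers`, and the `pax_headers` attribute are existing tarfile APIs.

```python
import io
import tarfile


def build_demo_archive():
    """Create a tiny in-memory PAX archive with one extra header (illustrative only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        data = b"hello"
        info = tarfile.TarInfo("userdata")
        info.size = len(data)
        info.pax_headers = {"comment": "demo"}  # hypothetical extra header
        tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def dump_pax_headers(fileobj):
    """Return (name, size, pax_headers) for every member of the archive."""
    with tarfile.open(fileobj=fileobj, mode="r") as tf:
        return [(m.name, m.size, dict(m.pax_headers)) for m in tf.getmembers()]


for name, size, headers in dump_pax_headers(build_demo_archive()):
    print(name, size, headers)
```

For a file on disk, `tarfile.open(path)` can be used instead of the in-memory buffer.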
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux