Description
Bug report
Bug description:
For a more detailed description, please see #136601.
I found a bug that causes TAR file parsing to end prematurely for very large sparse files. The computed offset of the next TAR header is off by one 512 B block.
The problem is the recomputation of the next TAR offset in case the PAX header contains a size key to override the overflowed (> 8GB) TAR size:
Lines 1562 to 1569 in 47b01da

    if "size" in pax_headers:
        # If the extended header replaces the size field,
        # we need to recalculate the offset where the next
        # header starts.
        offset = next.offset_data
        if next.isreg() or next.type not in SUPPORTED_TYPES:
            offset += next._block(next.size)
        tarfile.offset = offset
The problem is that next.offset_data is used for this recomputation even though next.offset_data gets overwritten in _proc_gnusparse_10:
Line 1612 in 47b01da

    next.offset_data = tarfile.fileobj.tell()
This leads to the offset of the next TAR header being off by the number of blocks it takes to store the sparse data.
But maybe I am wrong and have overlooked something. I can say that this change fixes it for my test case:
diff --git a/Lib/tarfile.py b/Lib/tarfile.py
index 068aa13ed7..7f3e62f5a2 100644
--- a/Lib/tarfile.py
+++ b/Lib/tarfile.py
@@ -1565,7 +1565,7 @@ def _proc_pax(self, tarfile):
                 # header starts.
                 offset = next.offset_data
                 if next.isreg() or next.type not in SUPPORTED_TYPES:
-                    offset += next._block(next.size)
+                    offset += next._block(next.size) - BLOCKSIZE
                 tarfile.offset = offset
 
         return next
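The arithmetic behind the one-block discrepancy can be sketched with the values printed by the reproducer for the member sparse. This is only a toy model, not the tarfile implementation: block() mirrors the rounding that TarInfo._block performs, and the model assumes (not confirmed here) that the sparse map occupies exactly one block and that the PAX size counts from the start of that map. The names map_blocks, data_start, buggy_next_header, and expected_next_header are made up for illustration.

```python
BLOCKSIZE = 512  # tar block size, same value as tarfile.BLOCKSIZE


def block(count):
    """Round count up to a multiple of BLOCKSIZE (mirrors TarInfo._block)."""
    blocks, remainder = divmod(count, BLOCKSIZE)
    if remainder:
        blocks += 1
    return blocks * BLOCKSIZE


# Values printed by the reproducer for the member "sparse".
offset_data_after_map = 2048  # offset_data once the sparse map has been read
size = 9653191172             # size taken from the PAX header

# Assumption: the sparse map fits into a single 512 B block and the
# PAX size is measured from the start of that map.
map_blocks = 1
data_start = offset_data_after_map - map_blocks * BLOCKSIZE

# Recomputation based on the already advanced offset_data.
buggy_next_header = offset_data_after_map + block(size)
# Recomputation based on the position before the sparse map was read.
expected_next_header = data_start + block(size)

print(buggy_next_header - expected_next_header)  # 512
```

Under these assumptions the overshoot equals the space taken by the sparse map, which is why subtracting a fixed BLOCKSIZE would only be correct when the map fits into one block.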
Minimal reproducer (tested on EXT4 with GNU tar 1.35):
echo bar > foo
echo bar > sparse
fallocate -l 9G sparse
echo bar >> sparse
fallocate --punch-hole -o 1G -l 10M sparse
tar --numeric-owner --format=pax --sparse-version=1.0 -cSf sparse.tar sparse foo
ls -la sparse.tar
# -rw-rw-r-- 1 user user 9663682560 Jul 13 14:14 sparse.tar
tar tvlf sparse.tar
# -rw-rw-r-- 1000/1000 9663676420 2025-07-13 14:13 sparse
# -rw-rw-r-- 1000/1000 4 2025-07-13 14:11 foo
python3 -c 'import sys, tarfile; [print(tarInfo.sparse, tarInfo.offset, tarInfo.offset_data, tarInfo.size, tarInfo.name) for tarInfo in tarfile.open(sys.argv[1])]' sparse.tar
# [(0, 1073741824), (1084227584, 8579448836), (9663676420, 0)] 0 2048 9653191172 sparse
# -> foo is missing!
cat sparse.tar | xz -9 | zstd -19 | base64
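For repeated testing, the Python one-liner from the reproducer could be wrapped in a small check that reports which expected members tarfile fails to list. This is a minimal sketch; the helper name missing_members is hypothetical and not part of tarfile.

```python
import tarfile


def missing_members(archive_path, expected_names):
    """Return the expected member names that tarfile does not list."""
    with tarfile.open(archive_path) as archive:
        found = {member.name for member in archive}
    return [name for name in expected_names if name not in found]
```

With the archive from the reproducer, missing_members("sparse.tar", ["sparse", "foo"]) would be expected to return ["foo"] on an affected interpreter and an empty list once the offset is computed correctly.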
Reproducer sparse-file-larger-than-8GiB-followed-by-normal-file.tar.xz.zst file as base64:
cat <<EOF | base64 -d | zstd -d > sparse-file-larger-than-8GiB-followed-by-normal-file.tar.xz
KLUv/QRojBIA1CP9N3pYWgAABObWtEYCACEBHAAAABDPWMz//5wCcV0AFwvGh5JaO6ePxyUOuA/zXtE/5U/vyT1WUwqPhMr1HTeZeJyWILwrrtDwH0eKx6KKGcU7D2aYidf/9bCtFMcWp8KxDA1FLF58w9bO4J+eDKd9QfIZFPCutpNB91dMk9bSVazx9pUcWEWn2r0SWsv1BtSYmVDmdKaMdGC/Epx8bcRAnm5Joy2Tgi3O7VouoCAqha+1YYNOQyyB4sG+tDbfLGdW6fyZMztJ/lRFQwtlFpDLHGFpia92kkke+2a/mwMvPc58aiT5X56QuH2mw1OhsrBKnbYYnT89BJjyAh2GTOeDbtZ/lLDGwhvxkXlnCm/M8QiqfUGfqAjnBeikNY2nodSBFo8YQh+636fk9xfuTQ3kKQ8qEWa613HftzHJ/X/ha1bKD91T/SPTCgd/rhyvFtn8FBBiUS7UayidinQBNmGebczIaRsKUQKoffUTC9EbCrRXDQjQMjfDyo7N/eDIxD7jBImHDv8Qk/hxeFn4C83/lShGD6n8fN77mjAuVsCPhfODgcBlxCVT+PWRNjEFpbDub8FwTUcM0ZERqq1gHbrOsScYXFmG6WZSWL7pdqxZ5OVbBQj5x9qt/PtSK3TNHlsgQvndUz34KWQJO4DLKmzftTvwxL0uX6oPPktmQpAT+5I61gCf/xABKwDsc1On/b6ufDEan7eNMW5wnqcjX+woy4XRlZiKfiqR8id19xnABphNmP3Yr9WQD1EPP7IEADz8NCsncIBOR5aC/hM+FaZUAAAAAQD9/778uQYCRAAAAAEA/f85AAJEAAAAAQD9/zkAAkQAAAABAP3/OQACRAAAAAEA/f85AAJEAAAAAQD9/zkAAkQAAAABAP3/OQACRAAAAAEA/f85AAJEAAAAAQD9/zkAAh0GAIQLkODIaNUNHQG+Ib0xB5201x9u5Typk+S1zSY18D/tc2o+BXKM/RM9v6MTQoFntxwNm0So6CELgft8dinBPFJBg583tJn+q69PwBnThZQjYTzvNhv0fkxX4GjmSgwnOrb7GU5pc2qtcrNCcHrPaNkQicmkdyzESbMAA8S2zfCiJIzpnN25EroA08/3fFWQ44JfrakeAIiPdXLNPTRNAAGY3VWAsID7IwAAdKr/2BQXOzADAAAAAARZWgIAG0DNWVsOgERj+N4=
EOF
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux