tarfile: Make it possible to extract nested tarfiles in memory #1032
FileSection.skip() (see below the diff) uses 2-argument readinto, so
attempting to recursively extract archives throws an error. This commit
adds an optional second argument to fix this problem. After this commit, it
is possible to extract nested archives in roughly this fashion:

    with open(path, 'rb') as file:
        tar_outer = tarfile.TarFile(fileobj=file)
        for ti_outer in tar_outer:
            tar_inner = tarfile.TarFile(
                fileobj=tar_outer.extractfile(ti_outer))
            for ti_inner in tar_inner:
                ...

Nested archives are used in some embedded contexts, for example Mender
artifacts.

Signed-off-by: Wojciech Porczyk <wojciech.porczyk@connectpoint.pl>
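For comparison, the nested-extraction pattern from the commit message can be tried end to end on CPython, entirely in memory. This is only a sketch: `make_tar` and `list_nested` are hypothetical helper names, it uses CPython's `tarfile.open` and `io.BytesIO` rather than the MicroPython module this PR changes, and the archive contents are made up for illustration.

```python
import io
import tarfile


def make_tar(members):
    """Build an uncompressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_nested(outer_bytes):
    """Return {inner archive name: [member names]} without touching disk."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(outer_bytes), mode="r:") as tar_outer:
        for ti_outer in tar_outer:
            inner_file = tar_outer.extractfile(ti_outer)
            if inner_file is None:  # directories and other non-file members
                continue
            # The extracted member is itself a file object, so it can be
            # handed straight to a second tar reader.
            with tarfile.open(fileobj=inner_file, mode="r:") as tar_inner:
                result[ti_outer.name] = [ti.name for ti in tar_inner]
    return result


# Made-up sample data: one inner archive wrapped in an outer archive.
inner = make_tar({"a.txt": b"hello", "b.txt": b"world"})
outer = make_tar({"inner.tar": inner})
nested = list_nested(outer)
print(nested)
```

The inner reader only ever sees the file object returned by `extractfile()`, which is why the methods that object supports matter for the MicroPython port.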
dpgeorge commented Jul 31, 2025
Thanks for the patch. I can see why it's needed. But, this is not CPython compatible, and we strive to retain compatibility where possible. Now, So I suggest to fix it by changing how

    --- a/python-stdlib/tarfile/tarfile/__init__.py
    +++ b/python-stdlib/tarfile/tarfile/__init__.py
    @@ -55,9 +55,12 @@ class FileSection:
             if sz:
                 buf = bytearray(16)
                 while sz:
    -                s = min(sz, 16)
    -                self.f.readinto(buf, s)
    -                sz -= s
    +                if sz >= 16:
    +                    self.f.readinto(buf)
    +                    sz -= 16
    +                else:
    +                    self.f.read(sz)
    +                    sz = 0
     
     
     class TarInfo:
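The loop suggested in the diff can be exercised on its own against any stream. The following is a minimal sketch, assuming a standalone function (`skip_bytes` is a hypothetical name, not part of the module) and, as the diff itself does, assuming that `readinto()` fills the whole buffer whenever at least `chunk` bytes remain.

```python
import io


def skip_bytes(f, sz, chunk=16):
    """Discard sz bytes from f using only 1-argument readinto() and read()."""
    buf = bytearray(chunk)  # reused scratch buffer, its contents are ignored
    while sz:
        if sz >= chunk:
            # Full chunk: the portable 1-argument form is enough.
            f.readinto(buf)
            sz -= chunk
        else:
            # Tail shorter than the buffer: fall back to a sized read().
            f.read(sz)
            sz = 0


stream = io.BytesIO(bytes(range(40)))
skip_bytes(stream, 35)
remaining = stream.read()
print(remaining)
```

The trade-off is one small allocation for the final partial chunk, in exchange for working with any file object that offers the standard `readinto(buf)` and `read(n)` methods.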
dpgeorge commented Jul 31, 2025
For reference, this used to work, but commit 2ca1527 optimised
woju commented Aug 1, 2025
I thought this was already the case, because as it is, it doesn't work without 2-argument
Sure, wilco. I'll also change the title of the PR.
dpgeorge commented Aug 1, 2025
Yes, you're right, all fileobj's that it uses must support 2-arg For example, if there's a tar file on the host PC and you use So, instead of trying to add the 2-arg form to all streams/files, better to fix it once here so that it doesn't use the 2-arg form.
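To illustrate the compatibility point, the snippet below probes whether a given stream accepts the 2-argument call. It is a sketch run against CPython's `io.BytesIO`, whose `readinto()` takes exactly one argument; the variable names are illustrative only, and a stream on MicroPython may well behave differently.

```python
import io

buf = bytearray(16)
stream = io.BytesIO(b"x" * 32)

# The 1-argument form is the one CPython file objects define.
n = stream.readinto(buf)

# Probe the 2-argument form; CPython's BytesIO rejects the extra argument.
try:
    stream.readinto(buf, 8)
    two_arg_supported = True
except TypeError:
    two_arg_supported = False

print(n, two_arg_supported)
```

Code that sticks to the 1-argument form never needs a probe like this, which is the motivation for fixing the caller rather than every stream type.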
woju commented Aug 1, 2025
I agree, yes, this sounds like the correct fix. I'll do that the week after; next week I'm on vacation.
Make it possible to extract nested archives, which are used in e.g. Mender artifacts. See commit message for details and a (simplified) example.