Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.1k
gh-129726: Breakgzip.GzipFile
reference loop#130055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
A reference loop was resulting in the `fileobj` held by the `GzipFile`being closed before the `GzipFile`.The issue started withpythongh-89550 in 3.12, but was hidden in most casesuntil 3.13 whenpythongh-62948 made it more visible.
@@ -10,6 +10,7 @@ | |||
import builtins | |||
import io | |||
import _compression | |||
import weakref |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
the ordering of imports here is painful to me, but resisted bringing to PEP8 because this fix needs backporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Please reformat these imports for PEP 8 :-) I will handle merge conflicts on backports if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Tested with Python 3.13 and with Python 3.12 -X dev against their respective gzip module patched with this diff and cannot reproduce the issue anymore with my snippet. Thanks!
Nice work. +1 on a backport as I think this can result in data loss if there's data waiting to be flushed from the buffer or compressor. Should we add some docs saying that the GzipFile object should be closed explicitly instead of relying on finalizers to clean it up so as to not rely on GC heuristics? |
@danifus 👍. The report production code did a |
Hmmm I had missed that production example and it baffles me :p I don't see how more data is attempted to be written after We have this check at the start of Lines 353 to 355 in05e89c3
and this in the finally block at the end of that Lines 365 to 366 in05e89c3
Seems pretty watertight. @xrmx it's possible you saw this issue in your code in the
|
xrmx commentedFeb 14, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I think I've only seen it from
It comes from
And when run with another test sometimes prints the traceback. I also see these BufferError when |
That stacktrace could appear in 2 ways:
Both would give you that traceback location but only the occurrence in the finalizer as the object goes out of scope will give you the
If you claim the buffer with
As for the original issue, are you able to see if code like this:
resolves your issue without this weak reference fix applied to your python? |
Thanks for the explanation, adding the explicit buffer.close() on the other path does not remove the
Nope, adding buffer.close() does not remove all the gzip stacktraces I was seeing before. |
Oh the elephant in the room is that this code is running as a daemon thread and so it well may be that I got these when it gets killed at exit. |
xrmx commentedFeb 14, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
That's not the problem, after poking a bit with the code the following quote from@danifus started to make sense to me:
Adding a BTW the BytesIO doc talks about a |
Glad to hear most of the exceptions are gone. As for the last one, I think you'll just want to Instead, just do something like:
|
xrmx commentedFeb 17, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
The last one is from a test mocking the function getting the view as arg and trying to read it after it has already been released. If I change the code as you suggested running this test I get |
Ah ok, that does sound a bit more involved. You're right, the As far as this issue is concerned, I think the |
I think it's still worthwhile to fix, since CPython sholdn't error if a GzipFile doesn't explicitly have Definitely some Documentation improvements would also be good I think. |
Fyi without the gzip patch and with the explicit cleanup in my code I still see a ton of |
cmaloney commentedFeb 26, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@vstinner I think this is ready for review. (Not sure why the issue-number bot didn't link it back to the original issue). Exception on close issue that became visible with your change to log exceptions if |
Lib/gzip.py Outdated
@@ -226,7 +227,8 @@ def __init__(self, filename=None, mode=None, | |||
0) | |||
self._write_mtime = mtime | |||
self._buffer_size = _WRITE_BUFFER_SIZE | |||
self._buffer = io.BufferedWriter(_WriteBufferStream(self), | |||
write_wrap = _WriteBufferStream(weakref.proxy(self)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Would it be possible to modify _WriteBufferStream to use a simpler weakref.ref() to store gzip_file instead? This object is private so we can change its API.
Something like:
class_WriteBufferStream(io.RawIOBase):"""Minimal object to pass WriteBuffer flushes into GzipFile"""def__init__(self,gzip_file):self.gzip_file=weakref.ref(gzip_file)defwrite(self,data):gzip_file=self.gzip_file()ifgzip_fileisNone:raiseRuntimeError("lost gzip_file")returngzip_file._write_raw(data)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Updated to your suggestion here. I Spent a bit of time trying to remove_WriteBufferStream
andjust useweakref.proxy
, but getting soGzipFile.write
has a differentwrite
function from the underlyingself._buffer: BufferedWriter
(which needs to useself._write_raw
) led me in a lot of circles. More understandable for things I tried to just keep_WriteBufferStream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM
7f39137
intopython:mainUh oh!
There was an error while loading.Please reload this page.
A reference loop was resulting in the `fileobj` held by the `GzipFile`being closed before the `GzipFile`.The issue started withpythongh-89550 in 3.12, but was hidden in most casesuntil 3.13 whenpythongh-62948 made it more visible.(cherry picked from commit7f39137)Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
GH-130669 is a backport of this pull request to the3.13 branch. |
A reference loop was resulting in the `fileobj` held by the `GzipFile`being closed before the `GzipFile`.The issue started withpythongh-89550 in 3.12, but was hidden in most casesuntil 3.13 whenpythongh-62948 made it more visible.(cherry picked from commit7f39137)Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
GH-130670 is a backport of this pull request to the3.12 branch. |
…130670)gh-129726: Break `gzip.GzipFile` reference loop (GH-130055)A reference loop was resulting in the `fileobj` held by the `GzipFile`being closed before the `GzipFile`.The issue started withgh-89550 in 3.12, but was hidden in most casesuntil 3.13 whengh-62948 made it more visible.(cherry picked from commit7f39137)Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
…130669)gh-129726: Break `gzip.GzipFile` reference loop (GH-130055)A reference loop was resulting in the `fileobj` held by the `GzipFile`being closed before the `GzipFile`.The issue started withgh-89550 in 3.12, but was hidden in most casesuntil 3.13 whengh-62948 made it more visible.(cherry picked from commit7f39137)Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
A reference loop was resulting in the `fileobj` held by the `GzipFile`being closed before the `GzipFile`.The issue started withpythongh-89550 in 3.12, but was hidden in most casesuntil 3.13 whenpythongh-62948 made it more visible.
Uh oh!
There was an error while loading.Please reload this page.
A reference loop was resulting in the
fileobj
held by theGzipFile
being closed before theGzipFile
.The issue started withgh-89550 in 3.12, but was hidden in most cases until 3.13 whengh-62948 made it more visible. I think this needs backporting to 3.12
Sample of test failing on
main
:loop found by@danifus , bug found and explored by@xrmx