
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2014-08-07 20:49 byzach.ware, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue22166_debugging_codec_test_leak.diff | ncoghlan,2014-08-13 07:26 | Some ideas & tracemalloc hacks | review | |
| issue22166_debugging_codec_test_leak_v2.diff | ncoghlan,2014-08-13 08:21 | Display full tracebacks from tracemalloc | review | |
| test_codecs_fix1.patch | vstinner,2014-09-02 00:24 | |||
| issue22166_no_leaks_too_many_changes.diff | ncoghlan,2014-09-06 12:34 | Removes leaks, but has unnecessary changes | review | |
| issue22166_clear_internal_cache.diff | ncoghlan,2014-09-06 12:45 | Minimal fix - just clears the internal cache via gc.get_referrers() | review | |
| issue22166_with_forget_codec_helper_api.diff | ncoghlan,2014-09-06 22:32 | Pop from internal cache via private _codecs._forget_codec API | review | |
| Messages (27) | |||
|---|---|---|---|
| msg225040 -(view) | Author: Zachary Ware (zach.ware)*![]() | Date: 2014-08-07 20:49 | |
After9bca86812857 (#22104), test_codecs now reports "leaked" references from ExceptionChainingTest. Previously the tests were only ever loaded one time, so each run of the tests was actually using the same set of TestCase instances; now the tests are freshly loaded before every test run and thus each run uses a new set of TestCase instances. My suspicion is that this is coming into play in ExceptionChainingTest.setUp, where the test's codec_name attribute is derived from the repr and id of the TestCase instance. However, I'm not familiar enough with the codecs code to know where it's going wrong from there.One possible "fix" is to abuse load_tests to ensure the tests are only loaded once:_suite = Nonedef load_tests(loader, suite, pattern): global _suite if _suite is None: _suite = suite return _suite...which restores the old behavior, but strikes me as a fairly horrible hack. Somewhat less bad would be to just skip those tests when they've already been run before. I can't help but feel that there has to be a better solution that I can't see, though.Nick, as the original author of the test class in question, do you have any insight into a better way to fix the "leaks"? | |||
| msg225263 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-08-13 07:26 | |
I did a bit of poking around with tracemalloc (debugging patch diff attached)encodings._cache is definitely one culprit, but something else is going on where it appears the *test cases* are being kept alive.For comparison, with my hacked regrtest, I get this for test_cmd_line:$ ./python -m test -R 3:1 test_grammar[1/1] test_grammarbeginning 4 repetitions1234....[ Top 20 differences ]/home/ncoghlan/devel/py34/Lib/_weakrefset.py:37: size=9936 B (+9936 B), count=54 (+54), average=184 B/home/ncoghlan/devel/py34/Lib/_weakrefset.py:38: size=6696 B (+6696 B), count=54 (+54), average=124 B/home/ncoghlan/devel/py34/Lib/_weakrefset.py:48: size=6480 B (+6480 B), count=27 (+27), average=240 B/home/ncoghlan/devel/py34/Lib/mimetypes.py:487: size=6176 B (+6176 B), count=1 (+1), average=6176 B/home/ncoghlan/devel/py34/Lib/ctypes/__init__.py:272: size=2362 B (+2362 B), count=12 (+12), average=197 B/home/ncoghlan/devel/py34/Lib/ctypes/__init__.py:99: size=2167 B (+2167 B), count=11 (+11), average=197 B/home/ncoghlan/devel/py34/Lib/ctypes/__init__.py:274: size=2137 B (+2137 B), count=11 (+11), average=194 B/home/ncoghlan/devel/py34/Lib/_weakrefset.py:92: size=1944 B (+1944 B), count=27 (+27), average=72 B/home/ncoghlan/devel/py34/Lib/test/regrtest.py:1466: size=1568 B (+1568 B), count=1 (+1), average=1568 B/home/ncoghlan/devel/py34/Lib/_weakrefset.py:84: size=1344 B (+1344 B), count=14 (+14), average=96 B/home/ncoghlan/devel/py34/Lib/contextlib.py:38: size=656 B (+656 B), count=1 (+1), average=656 B/home/ncoghlan/devel/py34/Lib/unittest/suite.py:87: size=512 B (+512 B), count=1 (+1), average=512 B/home/ncoghlan/devel/py34/Lib/test/support/__init__.py:1735: size=488 B (+488 B), count=1 (+1), average=488 B/home/ncoghlan/devel/py34/Lib/mimetypes.py:534: size=416 B (+416 B), count=1 (+1), average=416 B/home/ncoghlan/devel/py34/Lib/mimetypes.py:384: size=416 B (+416 B), count=1 (+1), average=416 B/home/ncoghlan/devel/py34/Lib/test/regrtest.py:1475: size=224 B (+224 B), count=1 (+1), average=224 B/home/ncoghlan/devel/py34/Lib/test/regrtest.py:1464: size=224 B (+224 B), count=1 (+1), average=224 B/home/ncoghlan/devel/py34/Lib/mimetypes.py:387: size=224 B (+224 B), count=1 (+1), average=224 B/home/ncoghlan/devel/py34/Lib/linecache.py:31: size=224 B (+224 B), count=1 (+1), average=224 B/home/ncoghlan/devel/py34/Lib/ctypes/__init__.py:103: size=224 B (+224 B), count=1 (+1), average=224 Btest_grammar leaked [-3] references, sum=-31 test failed: test_grammarWhile for test_codecs I get this:$ ./python -m test -R 3:1 test_codecs[1/1] test_codecsbeginning 4 repetitions1234....[ Top 20 differences ]/home/ncoghlan/devel/py34/Lib/unittest/case.py:408: size=83.7 KiB (+83.7 KiB), count=206 (+206), average=416 B/home/ncoghlan/devel/py34/Lib/unittest/case.py:368: size=41.8 KiB (+41.8 KiB), count=411 (+411), average=104 B/home/ncoghlan/devel/py34/Lib/unittest/case.py:387: size=16.1 KiB (+16.1 KiB), count=206 (+206), average=80 B/home/ncoghlan/devel/py34/Lib/unittest/suite.py:60: size=14.5 KiB (+14.5 KiB), count=206 (+206), average=72 B/home/ncoghlan/devel/py34/Lib/unittest/case.py:381: size=13.6 KiB (+13.6 KiB), count=174 (+174), average=80 B/home/ncoghlan/devel/py34/Lib/test/test_codecs.py:2653: size=12.2 KiB (+12.2 KiB), count=2 (+2), average=6242 B/home/ncoghlan/devel/py34/Lib/test/regrtest.py:1475: size=12.0 KiB (+12.0 KiB), count=1 (+1), average=12.0 KiB/home/ncoghlan/devel/py34/Lib/_weakrefset.py:37: size=9936 B (+9936 B), count=54 (+54), average=184 B/home/ncoghlan/devel/py34/Lib/unittest/loader.py:156: size=7600 B (+7600 B), count=95 (+95), average=80 B/home/ncoghlan/devel/py34/Lib/_weakrefset.py:38: size=6696 B (+6696 B), count=54 (+54), average=124 B/home/ncoghlan/devel/py34/Lib/_weakrefset.py:48: size=6480 B (+6480 B), count=27 (+27), average=240 B/home/ncoghlan/devel/py34/Lib/mimetypes.py:487: size=6176 B (+6176 B), count=1 (+1), average=6176 B/home/ncoghlan/devel/py34/Lib/unittest/case.py:625: size=3232 B (+3232 B), count=4 (+4), average=808 B/home/ncoghlan/devel/py34/Lib/test/test_codecs.py:2637: size=2964 B (+2964 B), count=6 (+6), average=494 B/home/ncoghlan/devel/py34/Lib/test/support/__init__.py:1739: size=2432 B (+2432 B), count=32 (+32), average=76 B/home/ncoghlan/devel/py34/Lib/ctypes/__init__.py:272: size=2362 B (+2362 B), count=12 (+12), average=197 B/home/ncoghlan/devel/py34/Lib/unittest/loader.py:70: size=2304 B (+2304 B), count=32 (+32), average=72 B/home/ncoghlan/devel/py34/Lib/unittest/suite.py:125: size=2168 B (+2168 B), count=7 (+7), average=310 B/home/ncoghlan/devel/py34/Lib/ctypes/__init__.py:274: size=2137 B (+2137 B), count=11 (+11), average=194 B/home/ncoghlan/devel/py34/Lib/ctypes/__init__.py:99: size=2071 B (+2071 B), count=10 (+10), average=207 Btest_codecs leaked [5602] references, sum=5602test_codecs leaked [1056] memory blocks, sum=10561 test failed: test_codecs | |||
| msg225268 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-08-13 08:21 | |
Still not sure what is going on, but I tweaked my tracemalloc hacks based on Victor's draft patch in 19816 (his are designed to make this optional, where mine are hardcoded. I also sort by number of allocations, rather than amount of memory) | |||
| msg225270 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-08-13 08:43 | |
I assigned this to myself when I thought this was going to be an easy fix (since I knew encodings._cache wasn't getting cleared), but it looks substantially more complicated than that.I checked the codec entry was getting cleared by tearDown() correctly, and explicitly clearing the cycle between the test case and the codec implementation didn't help anything.Restoring the cache to its original state did help, but only a little bit.It may be one of the other test cases that's the problem, or else there may be something genuinely odd going on with the exception chaining and local variables. | |||
| msg226252 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-09-02 00:19 | |
ExceptionChainingTest uses a function to raise an arbitrary exception. Problem: the function and its parameter becomes part of the exception traceback. Shortly, the exception links indirectly to itself in a nice reference cycle...Example:---import tracebackclass Err(Exception): def __del__(self): print("del")def raise_exc(exc): raise exce = Err()try: raise_exc(e)except Exception as e: t = e.__traceback__#traceback.clear_frames(t)e = Noneprint("exit")---Uncommand the call the clear_frames() to see the exception deleted before the exit. | |||
| msg226253 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-09-02 00:24 | |
Attached: Draft patch to fix some reference leaks in test_codecs. | |||
| msg226254 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-09-02 00:36 | |
ExceptionChainingTest creates a random codec name. If you change the codec name to a fixed string like "xxx", ExceptionChainingTest.test_raise_by_type() doesn't leak anymore (when test_codecs_fix1.patch is applied), but other tests start to fail.We should maybe enhance the codecs API to be able to unregister a codec? | |||
| msg226259 -(view) | Author: Antoine Pitrou (pitrou)*![]() | Date: 2014-09-02 01:59 | |
> We should maybe enhance the codecs API to be able to unregister a codec?That would be nice. At least as a private API in 3.4. | |||
| msg226262 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-09-02 07:02 | |
Yeah, the unique name was a hack to deal with the fact I couldn't unregister anything.As a simpler short term fix for the reference leak, perhaps it would be worth trying a call to "traceback.clear_frames(caught)" as a new final line in ExceptionChainingTest.assertWrapped? | |||
| msg226485 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-09-06 12:34 | |
Ah, I think I've found a possible underlying culprit: there's a separate C level "codec_search_cache" on the interpreter object that is independent of encodings._cache.The attached patch eliminates the reference leaks entirely (using gc.get_referrers() to get a reference to the otherwise inaccessible internal interpreter cache).However, the patch has a bunch of changes that may not be necessary if that cache is dealt with properly, so I'm going to revert everything, and *start* with fixing the cache cleanup. | |||
| msg226486 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-09-06 12:45 | |
Yep, it looks like the mess with the reference cycles in the tracebacks was just a secondary problem: the real culprit was the internal cache keeping everything else alive.Attached patch just adds clearing encodings._cache and the CPython internal cache to the ExceptionChainingTest.tearDown() method.The latter is accomplised via gc.get_referrers() and popping the codec's name from any dict remaining in that list after clearing the known caches, so it will actually handle any hidden name based caches on any interpreter. | |||
| msg226488 -(view) | Author: Antoine Pitrou (pitrou)*![]() | Date: 2014-09-06 12:55 | |
I'd much rather see a private function somewhere (in the _codecs module) to clear that private interpreter cache. | |||
| msg226489 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-09-06 13:03 | |
Yeah, I'm wary of the GC hack as well. It's a nice way to prove that this the internal cache is the problem, but probably not something we want to commit.A "codecs._forget_codec" helper in the C module would be all that was needed to handle it. Other implementations could then also use that to clear their own internal cache (if they have one). | |||
| msg226490 -(view) | Author: Marc-Andre Lemburg (lemburg)*![]() | Date: 2014-09-06 13:15 | |
On 06.09.2014 15:03, Nick Coghlan wrote:> A "codecs._forget_codec" helper in the C module would be all that was needed to handle it. Other implementations could then also use that to clear their own internal cache (if they have one).If you want to take that approach, please call that functioncodecs._clear_lookup_cache() and also add a new C API_PyCodecRegistry_ClearLookupCache() toPython/codecs.c.Thanks,-- Marc-Andre LemburgeGenix.com | |||
| msg226494 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2014-09-06 16:06 | |
May be just do not use the unique name? | |||
| msg226505 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-09-06 19:14 | |
This investigation also showed why the original tests that used anon-unique name didn't work: the codec was getting cached the first timethrough and hence not seeing the different codecs registered by other tests.MAL, my plan was to provide an API to forget a specific named codec ratherthan flush the whole cache. I agree a private C API would also be a goodidea, though. | |||
| msg226512 -(view) | Author: Marc-Andre Lemburg (lemburg)*![]() | Date: 2014-09-06 20:07 | |
On 06.09.2014 21:14, Nick Coghlan wrote:> > This investigation also showed why the original tests that used a> non-unique name didn't work: the codec was getting cached the first time> through and hence not seeing the different codecs registered by other tests.> > MAL, my plan was to provide an API to forget a specific named codec rather> than flush the whole cache. I agree a private C API would also be a good> idea, though.I don't think such a specific API for only forgetting a single codecis useful outside this particular test case, since you typicallydon't know the names of the cached codecs. | |||
| msg226513 -(view) | Author: Antoine Pitrou (pitrou)*![]() | Date: 2014-09-06 20:09 | |
Well, the point here is to add a private API just for our test suite, not something users would want to call. | |||
| msg226520 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2014-09-06 22:32 | |
Targeted API to forget a particular codec name.Normally we don't care about the caching (as evidenced by the fact it took 14+ years to want this feature).The intent with this API is to be able to selective purge a particular codec when we either care about getting the latest version of a specific codec (no current use cases) or to avoid the appearance of leaking references when looking up lots of custom codecs (this case).I considered dropping the per-test naming scheme, but decided I liked the fact it kept these test instances orthogonal, even though they all use the shared TEST_CODECS registry. The lack of global side effects is also a benefit of just popping the named codec, rather than purging the entire cache. | |||
| msg226911 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2014-09-15 11:55 | |
New changesetfcf45ec7863e by Nick Coghlan in branch '3.4':Issue#22166: clear codec caches in test_codecshttps://hg.python.org/cpython/rev/fcf45ec7863eNew changeset322ee2f2e922 by Nick Coghlan in branch 'default':Merge fix for issue#22166 from 3.4https://hg.python.org/cpython/rev/322ee2f2e922 | |||
| msg226914 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-09-15 13:19 | |
IMO test_codecs_fix1.patch is still needed.Review of Nick's change:+ interp = PyThreadState_GET()->interp;+ if (interp->codec_search_path == NULL) {+ return -1;+ }I don't think that this line raises an exception. You should use an assertion here (or raises an exception, but it would be overkill).+ encodings._cache.pop(self.codec_name, None)+ try:+ _forget_codec(self.codec_name)+ except KeyError:+ passYou may move these lines to a private encoding._forget() function.By the way, codecs._forget() should maybe catch and ignore KeyError?> _TEST_CODECS.pop(self.codec_name, None)This line may be moved to set_codec() using self.addCleanup(). (Well, it's not directly related to your change.) | |||
| msg226915 -(view) | Author: Marc-Andre Lemburg (lemburg)*![]() | Date: 2014-09-15 13:29 | |
On 15.09.2014 15:19, STINNER Victor wrote:> > STINNER Victor added the comment:> > IMO test_codecs_fix1.patch is still needed.> > Review of Nick's change:> > + interp = PyThreadState_GET()->interp;> + if (interp->codec_search_path == NULL) {> + return -1;> + }> > I don't think that this line raises an exception.Agreed.Please also add some documentation to the codecs.h file andperhaps a test for the function itself ;-) | |||
| msg245897 -(view) | Author: Alyssa Coghlan (ncoghlan)*![]() | Date: 2015-06-28 03:21 | |
Victor, would you be willing to take over pushing this one forward? I believe you have a much better idea of what's still needed than I do. | |||
| msg331925 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2018-12-16 12:34 | |
Do you mind to create a PR Victor? | |||
| msg331967 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2018-12-17 10:24 | |
> Do you mind to create a PR Victor?I completely forgot this issue and I don't plan to fix it in the short-term. Feel free to write PR if you want. | |||
| msg338026 -(view) | Author: Stéphane Wirtel (matrixise)*![]() | Date: 2019-03-15 20:09 | |
For 3.7 and 3.8, I have just tried with this command to find leaks in the test_codec* but nothing :/./python -m test -v -l -R 6:6 -u all \ test_codecencodings_iso2022 \ test_codecs \ test_codecmaps_hk test_codecmaps_tw \ test_codecencodings_tw \ test_codecencodings_cn \ test_codeccallbacks \ test_codecencodings_hk \ test_codecencodings_jp \ test_codecencodings_kr \ test_codecmaps_jp \ test_codecmaps_cn \ test_codecmaps_krIf you have another technic for the detection of leaks, please inform me but I think this issue could be closed maybe we have fixed the issue with time. | |||
| msg338437 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2019-03-20 02:49 | |
Codec tests don't leak anymore. I lost track of this very old issue. I don't think that anything should be done. Python code base changed a lot since 2014. I close the issue. | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:06 | admin | set | github: 66362 |
| 2019-03-20 02:49:33 | vstinner | set | status: open -> closed resolution: fixed messages: +msg338437 stage: needs patch -> resolved |
| 2019-03-15 20:09:07 | matrixise | set | nosy: +matrixise messages: +msg338026 |
| 2018-12-20 20:21:28 | brett.cannon | set | nosy: -brett.cannon |
| 2018-12-19 09:41:30 | serhiy.storchaka | set | assignee:serhiy.storchaka |
| 2018-12-17 10:24:35 | vstinner | set | messages: +msg331967 |
| 2018-12-16 12:34:17 | serhiy.storchaka | set | messages: +msg331925 |
| 2018-12-16 12:33:20 | serhiy.storchaka | set | versions: + Python 3.7, Python 3.8, - Python 3.4, Python 3.5 |
| 2015-06-28 03:21:22 | ncoghlan | set | assignee:ncoghlan -> (no value) messages: +msg245897 |
| 2014-12-30 08:32:34 | serhiy.storchaka | set | stage: resolved -> needs patch |
| 2014-09-23 09:42:32 | ncoghlan | set | assignee:ncoghlan |
| 2014-09-15 13:29:44 | lemburg | set | messages: +msg226915 |
| 2014-09-15 13:19:10 | vstinner | set | status: closed -> open resolution: fixed -> (no value) messages: +msg226914 |
| 2014-09-15 11:56:22 | ncoghlan | set | status: open -> closed resolution: fixed stage: resolved |
| 2014-09-15 11:55:36 | python-dev | set | nosy: +python-dev messages: +msg226911 |
| 2014-09-06 22:32:51 | ncoghlan | set | files: +issue22166_with_forget_codec_helper_api.diff messages: +msg226520 |
| 2014-09-06 20:09:48 | pitrou | set | messages: +msg226513 |
| 2014-09-06 20:07:45 | lemburg | set | messages: +msg226512 |
| 2014-09-06 19:14:09 | ncoghlan | set | messages: +msg226505 |
| 2014-09-06 16:06:19 | serhiy.storchaka | set | nosy: +serhiy.storchaka messages: +msg226494 |
| 2014-09-06 13:15:57 | lemburg | set | nosy: +lemburg messages: +msg226490 |
| 2014-09-06 13:03:10 | ncoghlan | set | messages: +msg226489 |
| 2014-09-06 12:55:31 | pitrou | set | messages: +msg226488 |
| 2014-09-06 12:45:10 | ncoghlan | set | files: +issue22166_clear_internal_cache.diff messages: +msg226486 |
| 2014-09-06 12:34:42 | ncoghlan | set | files: +issue22166_no_leaks_too_many_changes.diff messages: +msg226485 |
| 2014-09-02 07:02:29 | ncoghlan | set | messages: +msg226262 |
| 2014-09-02 01:59:25 | pitrou | set | nosy: +pitrou messages: +msg226259 |
| 2014-09-02 00:36:16 | vstinner | set | messages: +msg226254 |
| 2014-09-02 00:24:02 | vstinner | set | files: +test_codecs_fix1.patch messages: +msg226253 title: test_codecs "leaking" references -> test_codecs leaks references |
| 2014-09-02 00:19:18 | vstinner | set | messages: +msg226252 |
| 2014-08-13 08:43:10 | ncoghlan | set | assignee:ncoghlan -> (no value) messages: +msg225270 |
| 2014-08-13 08:21:03 | ncoghlan | set | files: +issue22166_debugging_codec_test_leak_v2.diff messages: +msg225268 |
| 2014-08-13 07:26:14 | ncoghlan | set | files: +issue22166_debugging_codec_test_leak.diff nosy: +vstinner messages: +msg225263 assignee:ncoghlan keywords: +patch |
| 2014-08-07 20:49:27 | zach.ware | create | |