Use less aggressive garbage collection #3045


Merged
tacaswell merged 1 commit into matplotlib:master from kmike:less-gc-collect
Jul 5, 2014

Conversation

kmike
Contributor

For me it fixes #3044.

@kmike changed the title from "Use less aggressive garbage collection." to "Use less aggressive garbage collection" on May 6, 2014
@tacaswell
Member

Does this actually garbage collect the mpl objects? The artists, axes, and figures tend to end up with circular references.

@kmike
Contributor, Author

This is what happens in current matplotlib: http://nbviewer.ipython.org/urls/gist.githubusercontent.com/kmike/93102e3fdf75dfc29631/raw/matplotlib-gc3.ipynb

This is the same example with no gc.collect() in matplotlib: http://nbviewer.ipython.org/urls/gist.githubusercontent.com/kmike/93102e3fdf75dfc29631/raw/matplotlib-gc2.ipynb

This is an example with gc.collect(1) in matplotlib: http://nbviewer.ipython.org/gist/kmike/93102e3fdf75dfc29631/matplotlib-gc.ipynb (note that a subsequent gc.collect doesn't collect anything new).

So yes, I think gc.collect(1) collects some mpl objects. But note that the number of objects increases by about 5k after each plotting call even if gc.collect() is called. This is what happens:

    In [28]: cnt = Counter(
                 cls for cls in
                 [getattr(obj, '__class__', type(obj)) for obj in gc.get_objects()]
                 if 'matplotlib' in str(cls)
             )
             cnt.most_common(20)
    Out[28]:
    [(matplotlib.path.Path, 486),
     (matplotlib.transforms.Bbox, 382),
     (matplotlib.font_manager.FontEntry, 359),
     (matplotlib.transforms.CompositeGenericTransform, 324),
     (matplotlib.transforms.Affine2D, 320),
     (matplotlib.lines.Line2D, 216),
     (matplotlib.markers.MarkerStyle, 216),
     (matplotlib.transforms.IdentityTransform, 193),
     (matplotlib.text.Text, 172),
     (matplotlib.font_manager.FontProperties, 164),
     (matplotlib.transforms.BboxTransformTo, 160),
     (matplotlib.colors.LinearSegmentedColormap, 144),
     (matplotlib.transforms.TransformedBbox, 116),
     (matplotlib.transforms.TransformedPath, 104),
     (matplotlib.transforms.ScaledTranslation, 52),
     (matplotlib.patches.Rectangle, 48),
     (matplotlib.axis.XTick, 40),
     (matplotlib.axis.YTick, 32),
     (matplotlib.cbook.maxdict, 21),
     (matplotlib.cbook.CallbackRegistry, 20)]
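For anyone who wants to repeat this count outside the notebook, here is a minimal self-contained sketch of the same idea. The helper name `count_live_classes` and the `Marker` class are hypothetical, introduced only for illustration, and the sketch uses `type(obj)` alone rather than the `getattr(obj, '__class__', ...)` form above, which is a simplification.

```python
import gc
from collections import Counter


def count_live_classes(needle):
    """Count gc-tracked objects whose class repr contains `needle`."""
    return Counter(
        cls
        for cls in (type(obj) for obj in gc.get_objects())
        if needle in str(cls)
    )


class Marker:
    """Hypothetical class used only to exercise the helper."""


# Keep five instances alive so they show up in gc.get_objects().
markers = [Marker() for _ in range(5)]
print(count_live_classes("Marker").most_common(3))
```

With matplotlib imported and a few plots drawn, `count_live_classes("matplotlib").most_common(20)` should give a table shaped like the one above, though the exact counts will depend on the matplotlib version and the plots made.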

The numbers increase after each "hist" call followed by a full gc.collect(). So it seems gc.collect() doesn't really help, and it can be very slow. It collects some items, and gc.collect(1) seems to collect the same items.

There are a lot of moving parts (IPython, matplotlib, gc, various plotting methods, etc.), so I may be missing something and the analysis could be incorrect. But running a full garbage collection that the user can't control is a bad decision IMHO. If leftovers from matplotlib are an issue, it is always possible to run gc.collect() manually and get the same results.

gc.collect(1) is a compromise that seems to fix most of the problems gc.collect() fixes, without its huge overhead in the presence of long-lived objects.

@tacaswell added this to the v1.4.0 milestone on May 7, 2014
@tacaswell
Member

Thank you for digging down into this.

This looks fine to me, but I think @mdboom or @efiring should take a look at it as well.

@mdboom
Member

@kmike: In your examples, can you clear the figure? An internal reference is still held to it in these examples... Clearing it should, at least by design, result in no leaks from the matplotlib side (though IPython may still hold references to the results of each cell). You might also try testing this outside of IPython to remove that as a factor.

@kmike
Contributor, Author

@mdboom I can try to do it, but are you suggesting we keep gc.collect() if it collects more objects than gc.collect(1)? The disadvantage of gc.collect() is that it checks all objects in memory, which is potentially a costly operation, while gc.collect(1) is bounded.
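A rough way to see that cost difference is to time both calls with a large pile of long-lived objects in memory. This is only a sketch: `time_collect` and `user_objects` are hypothetical names, the object count is arbitrary, and the timings depend heavily on the interpreter version and machine.

```python
import gc
import time


def time_collect(generation=None):
    """Return the wall-clock seconds spent in one gc.collect call."""
    start = time.perf_counter()
    if generation is None:
        gc.collect()          # full collection: every generation
    else:
        gc.collect(generation)
    return time.perf_counter() - start


# Hypothetical stand-in for many long-lived, gc-tracked user objects.
user_objects = [[i] for i in range(200_000)]

# One full pass first, so the survivors are no longer "young"
# (on CPython's classic generational collector they move to the oldest generation).
gc.collect()

full = time_collect()
young = time_collect(1)
print(f"gc.collect():  {full * 1e3:.2f} ms")
print(f"gc.collect(1): {young * 1e3:.2f} ms")
```

None of the 200,000 lists are garbage, yet a full collection still has to walk them; that is the overhead being discussed here.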

@mdboom
Member

clf() should leave no remaining objects around, at least by design. I haven't investigated in some time whether that's still the case. If making this change keeps more matplotlib objects around, then it is a no-go. A speed penalty is always better than an unbounded memory leak. If that's the case, we may need to find another solution to the speed problem -- by introducing more weakrefs where appropriate or refactoring the code to reduce the number of cyclical references.
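To illustrate the weakref idea in isolation: if a child holds only a weak back-reference to its parent, there is no cycle, and plain reference counting frees everything without any gc pass. `ToyFigure` and `ToyAxes` below are hypothetical stand-ins, not matplotlib classes, and this says nothing about how matplotlib's real objects are wired.

```python
import gc
import weakref


class ToyFigure:
    """Hypothetical parent object that owns its children strongly."""

    def __init__(self):
        self.axes = []

    def add_axes(self):
        ax = ToyAxes(self)
        self.axes.append(ax)
        return ax


class ToyAxes:
    """Hypothetical child that only weakly refers back to its parent."""

    def __init__(self, figure):
        self._figure_ref = weakref.ref(figure)

    @property
    def figure(self):
        # Returns None once the parent has been freed.
        return self._figure_ref()


was_enabled = gc.isenabled()
gc.disable()  # make sure only reference counting is at work
try:
    fig = ToyFigure()
    fig.add_axes()
    probe = weakref.ref(fig)
    del fig  # last strong reference dropped
    freed_without_gc = probe() is None
finally:
    if was_enabled:
        gc.enable()

print("freed without a gc pass:", freed_without_gc)
```

The trade-off is that every access to the back-reference must cope with the parent already being gone.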

@tacaswell
Member

My understanding is that the speed cost is due to large numbers of _user_ objects making gc.collect() take a long time.

@WeatherGod
Member

Yeah, that is my understanding too. I think the OP makes a good point.
Calling clf() has side-effects (mostly benign, but still real).
Unfortunately, I am not savvy enough on the gc module to understand the
implications of calling collect(1). The documentation merely refers to
"generations", but I have no clue what that means.


@efiring
Member

The docstring for gc.set_threshold() "explains" the generations and the collection scheme, but that still leaves me with a less-than-complete understanding. My interpretation is that collect(0) will look at objects that have not been checked previously; collect(1) will look at objects that have been checked exactly once; collect(2) at objects that have been checked twice or more; and collect() will look at all objects, by going through each of the three generation lists. It is not clear whether collect(1) actually operates on the generation 0 list and the generation 1 list, or only the latter. If only the latter, it would not seem to be very useful when used in isolation.
The thresholds that I see are 700, 10, and 10, meaning collect(0) is run automatically only when there have been 700 more allocations than deallocations, irrespective of the actual amount of memory involved. Allocations of some primitive objects are not counted.
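The thresholds and the current per-generation counters can be inspected directly with `gc.get_threshold()` and `gc.get_count()`. A small sketch follows; the values printed will vary by interpreter version and by what has been allocated so far, so the 700, 10, 10 mentioned above should not be assumed.

```python
import gc

# Three thresholds, one per generation.
thresholds = gc.get_threshold()

# Three counters, one per generation, compared against the thresholds
# when the interpreter decides whether to run an automatic collection.
counts = gc.get_count()

for generation, (count, threshold) in enumerate(zip(counts, thresholds)):
    print(f"generation {generation}: count={count}, threshold={threshold}")
```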

@kmike
Contributor, Author

gc.collect(1) checks generations <= 1.
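A quick check that a freshly created reference cycle is picked up by `gc.collect(1)`. This is a sketch; `Node` and `make_cycle` are hypothetical helpers, and automatic collection is switched off so that only the explicit call can free the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at a partner, forming a cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    """Build a two-object cycle and return only a weak reference to it."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


was_enabled = gc.isenabled()
gc.disable()  # rule out automatic collections during the experiment
try:
    probe = make_cycle()
    # The cycle keeps both reference counts above zero, so it is still alive.
    alive_before = probe() is not None
    gc.collect(1)
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print("alive before gc.collect(1):", alive_before)
print("alive after gc.collect(1): ", alive_after)
```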

I'd even remove gc.collect altogether; gc.collect(1) is just to be a bit conservative.

It is true that an unbounded memory leak is worse than a speed penalty, but right now we have an O(N) speed penalty where N is the number of live user objects, not the number of matplotlib objects, and the leak is not really a leak because the objects will eventually be collected - in the worst case users can call gc.collect() themselves to make this happen sooner. About the speed penalty they can do nothing.

The "leak" is kilobytes of temporarily allocated memory per chart (maybe megabytes in pathological cases); the speed penalty can be seconds of wait time for each executed IPython cell (see ipython/ipython#5795).

@tacaswell
Member

I am in favor of merging this.

I think this explains some issues I was having with my code at the end of grad school, which, due to the need to graduate, I didn't take the time to track down.

@tacaswell
Member

@mdboom @efiring What do you want to do about this? I am in favor of merging.

@efiring
Member

Responding to @mdboom's last comment: Anything we can do to ensure prompt release of memory is a good move in general, but it doesn't address the OP's problem, which is that gc.collect() can be damagingly slow if there is a huge number of user objects, regardless of whether any of them are actually collectible. I think that in the OP's case, these objects are not created by the plotting, so they are not under our control.

The problem addressed by this PR can be quite bad; I am in favor of giving it a try. It would certainly be good to have a clearer understanding of when, if ever in practice, it would lead to troublesome increases in memory consumption. My impression is that this should be very rare, so the tradeoff is worthwhile.

tacaswell added a commit that referenced this pull request on Jul 5, 2014
Use less aggressive garbage collection
@tacaswell merged commit b4a678a into matplotlib:master on Jul 5, 2014
tacaswell added a commit to tacaswell/matplotlib that referenced this pull request on Aug 23, 2022
Matplotlib has a large number of circular references (between figure and manager, between axes and figure, axes and artist, figure and canvas, and ...) so when the user drops their last reference to a `Figure` (and clears it from pyplot's state), the objects will not be immediately deleted.

To account for this we have long (goes back to e34a333, the "reorganize code" commit in 2004, which is the end of history for much of the code) had a `gc.collect()` in the close logic in order to promptly clean up after ourselves.

However, unconditionally calling `gc.collect` can be a major performance issue (see matplotlib#3044 and matplotlib#3045) because if there are a large number of long-lived user objects Python will spend a lot of time checking objects that are not going away and are never going away.

Instead of doing a full collection we switched to clearing out the lowest two generations. However, this was both not doing what we want (as most of our objects will actually survive) and, due to clearing out the first generation, opened us up to having unbounded memory usage.

In cases with a very tight loop between creating the figure and destroying it (e.g. `plt.figure(); plt.close()`) the first generation will never grow large enough for Python to consider running the collection on the higher generations. This will lead to unbounded memory usage as the long-lived objects are never re-considered to look for reference cycles and hence are never deleted because their reference counts will never go to zero.

closes matplotlib#23701
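The failure mode described in this commit message can be sketched without matplotlib. The idea: a cycle that has already survived a full collection becomes garbage afterwards, and repeated `gc.collect(1)` calls are then compared with a full `gc.collect()`. `Node` is a hypothetical class, and the behaviour of the young-generation passes is an assumption about CPython's classic three-generation collector; newer interpreters may handle older objects differently.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at a partner, forming a cycle."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()  # only the explicit calls below may collect anything
try:
    a, b = Node(), Node()
    a.other, b.other = b, a
    probe = weakref.ref(a)

    # The cycle is still reachable here, so it survives this full pass
    # and is no longer treated as a young object afterwards.
    gc.collect()

    # Now the cycle becomes garbage, but it is not in a young generation.
    del a, b

    for _ in range(100):
        gc.collect(1)
    survived_young_passes = probe() is not None

    gc.collect()
    freed_by_full_collect = probe() is None
finally:
    if was_enabled:
        gc.enable()

print("still alive after 100 x gc.collect(1):", survived_young_passes)
print("freed by a full gc.collect():         ", freed_by_full_collect)
```

On interpreters with the classic generational scheme the first line is expected to report that the cycle is still alive, which is the situation the commit message describes for objects that have already been promoted.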
melissawm pushed a commit to melissawm/matplotlib that referenced this pull request on Dec 19, 2022 (same commit message as the Aug 23, 2022 commit above)
Milestone
v1.4.0
Development

Successfully merging this pull request may close these issues.

matplotlib shouldn't call gc.collect()
6 participants
@kmike @tacaswell @mdboom @WeatherGod @efiring @QuLogic
