Bug summary
I work with a large 1D dataset. For a statistical analysis, I am using the bootstrap method, resampling it many times.
I am interested in looping over all cases in order to put together, on a single figure, a specific result for all resamplings.
However, I run into memory issues (e.g. memory not being freed before the very end of the script, or even leaks).
Here I document some things that at least partially address the issue. None of them is fully satisfactory, though.
I am running the same script both from Python and as a Jupyter notebook (synchronised via jupytext). I am trying to get rid of the memory issues in both cases (the RAM usage easily reaches 16–32 GB once I start playing with enough data).
Code for reproduction
```python
import matplotlib.pyplot as plt
import numpy as np

da = np.sort(np.random.random(int(3e6)))  # Beware: please lower this if your system has less than 32 GB


def custom_plot(da, **kwargs):
    """da is a 1-D ordered xarray.DataArray or numpy array (containing tons of data)"""
    plt.yscale('log')
    n = len(da)
    return plt.plot(da, 1 - (np.arange(1, n + 1) / (n + 1)), **kwargs)


def resampling_method(da, case):
    """
    A complex thing in reality but for the MWE, let us simply return da itself.
    It will lead to the same memory problem.
    """
    return da


plt.figure(figsize=(15, 8), dpi=150)
plt.ylim((1e-10, 1))
for case in np.arange(50):
    custom_plot(resampling_method(da, case))  # each time getting the curve for a different resampling of da
custom_plot(da)  # curve for the original da

plt.savefig("output.png")
plt.show()

import gc
gc.collect();

print("Technically the programme would continue with more calculations.")
print("Notice how the memory won't be freed however until the entire script is finished.")
import time
time.sleep(120)  # This simulates the fact that the programme continues with other calculations.
print("Now the programme exits")
```
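A rough, back-of-the-envelope estimate of why the reproduction script needs a lot of RAM: each call to `custom_plot` builds a new float64 y array with `len(da)` values, and, as far as I can tell, the resulting line keeps that data for as long as the figure exists. The sketch below only counts those y arrays (8 bytes per float64, 50 resamplings plus the original curve); it ignores `da` itself and any extra copies matplotlib may make internally, so it is a lower bound, not a measurement.

```python
# Lower-bound estimate of the y data retained by the figure in the reproduction script.
N_POINTS = int(3e6)       # len(da) in the script
N_CURVES = 50 + 1         # 50 resamplings + the curve for the original da
BYTES_PER_FLOAT64 = 8

y_bytes_per_curve = N_POINTS * BYTES_PER_FLOAT64
y_bytes_total = N_CURVES * y_bytes_per_curve

print(f"{y_bytes_per_curve / 1e6:.0f} MB per curve")     # 24 MB per curve
print(f"{y_bytes_total / 1e9:.2f} GB for all y arrays")  # 1.22 GB for all y arrays
```

Scaling `N_POINTS` and `N_CURVES` to the real dataset gives a first-order idea of the minimum the figure has to hold.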
Actual outcome
Memory issues occur no matter what I have tried so far. Depending on the approach, either the memory is not freed after the plot has been shown or closed, or there are outright memory leaks and massive swap usage.
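One way to check whether the figure object itself is still being kept alive (as opposed to Python having released it while the process's memory footprint stays high) is a weak reference: it does not keep its target alive, and it returns `None` once the target has been freed. The sketch below uses a hypothetical `FakeFigure` class as a stand-in, with a reference cycle mimicking e.g. a Figure and its canvas pointing at each other; such cycles are only reclaimed by the cyclic garbage collector, not by reference counting alone.

```python
import gc
import weakref


class FakeFigure:
    """Hypothetical stand-in for a matplotlib Figure holding a lot of data."""

    def __init__(self, n_values: int) -> None:
        self.data = [0.0] * n_values
        # A reference cycle, mimicking e.g. a Figure and its canvas referencing each other.
        self.myself = self


def is_alive(ref: weakref.ref) -> bool:
    """True as long as the referenced object has not been freed."""
    return ref() is not None


fig = FakeFigure(1_000_000)
fig_ref = weakref.ref(fig)  # a weak reference does not keep `fig` alive
del fig                     # drop the last strong reference held by our code
print(is_alive(fig_ref))    # the cycle may still keep it alive at this point
gc.collect()                # the cyclic garbage collector breaks the cycle
print(is_alive(fig_ref))    # False
```

In the real script, assuming matplotlib figures accept weak references, one would take `fig_ref = weakref.ref(plt.gcf())` before `plt.close('all')`, then call `gc.collect()` and check `is_alive(fig_ref)`. If it is still `True`, something still references the figure.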
Expected outcome
Memory freed well before the end of the programme. I would expect it to be freed soon after the figure is closed.
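To check this expectation on Ubuntu, one could log the process's resident set size (RSS) at a few points in the script, e.g. after `plt.savefig`, after `plt.show()` returns, and after `plt.close('all')` plus `gc.collect()`. The hypothetical helper below reads `VmRSS` from `/proc/self/status`, so it is Linux-only. RSS only drops if the allocator hands pages back to the OS, so an unchanged RSS is not on its own proof of a leak.

```python
def current_rss_mb() -> float:
    """Current resident set size of this process in MiB (Linux only, via /proc)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     123456 kB"; the value is in KiB.
                return int(line.split()[1]) / 1024
    raise RuntimeError("VmRSS not found in /proc/self/status (not Linux?)")
```

For example, `print(f"after close: {current_rss_mb():.0f} MiB")` right after `plt.close('all')` and `gc.collect()`.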
Additional information
NB: I also tried many other things (incl. `plt.cla()` and the like), as well as changing the backend (notably "Agg" and "Qt5Agg"), but that did not solve the problem in the slightest, so I won't document them.
Things that have some effect
- If you do:

  ```python
  plt.show()
  import gc
  gc.collect();
  ```

  - when run from the terminal with Python, the plot is shown in a new window, but the memory used by the figure is not freed afterwards; it remains in use until the end of the entire script;
  - in Jupyter, however, it is freed soon after the figure is displayed.
- If you use `block=False`, `time.sleep` and `plt.close('all')`, the memory is freed after the plot has been created, both with Jupyter and with Python. However, in Python a window is created (stealing focus) and nothing ever appears in it (it is closed after 5 seconds). It would therefore be tempting to comment out `plt.show(block=False)`, but if you do, Jupyter no longer clears the memory:

  ```python
  plt.show(block=False)  # If you comment this out, Jupyter will not clear the memory
  import time
  time.sleep(5)
  plt.close('all')
  import gc
  gc.collect();
  ```
- Given the above, let us check whether Jupyter or Python is being used:

  ```python
  def type_of_script():
      """source: https://stackoverflow.com/a/47428575/452522"""
      try:
          ipy_str = str(type(get_ipython()))
          if 'zmqshell' in ipy_str:
              return 'jupyter'
          if 'terminal' in ipy_str:
              return 'ipython'
      except NameError:  # get_ipython() is only defined inside IPython/Jupyter
          return 'terminal'

  if type_of_script() == "jupyter":
      plt.show(block=False)  # If you comment this out, Jupyter will not clear the memory
  # Plain Python: no window is shown, the figure is only saved to file

  import time
  time.sleep(5)
  plt.close('all')
  import gc
  gc.collect();
  ```
With this:
- Jupyter will create a file and will also display the figure inline.
- Python will only create a file and won't try to show a window.
Both free the memory after the figure has been closed (or rather, 5 seconds later).
This is the most satisfactory approach so far... though not exactly a nice one.
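A possibly more explicit variant of `type_of_script()` is sketched below, assuming IPython may or may not be installed: it imports `get_ipython` from IPython instead of relying on a `NameError`, compares class names rather than substrings, and always returns a string (the original returns `None` for an unrecognised shell). The name `detect_shell` is hypothetical; `ZMQInteractiveShell` and `TerminalInteractiveShell` are the shell classes used by Jupyter kernels and by the IPython terminal, respectively.

```python
def detect_shell() -> str:
    """Return 'jupyter', 'ipython', 'terminal' (plain Python) or 'other'."""
    try:
        from IPython import get_ipython
    except ImportError:  # IPython not installed, so neither Jupyter nor IPython
        return "terminal"
    shell = get_ipython()  # None when IPython is installed but not running
    if shell is None:
        return "terminal"
    name = type(shell).__name__
    if name == "ZMQInteractiveShell":  # Jupyter notebook/lab kernel
        return "jupyter"
    if name == "TerminalInteractiveShell":  # `ipython` in a terminal
        return "ipython"
    return "other"
```

It can be used exactly like the original: `if detect_shell() == "jupyter": plt.show(block=False)`.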
Should one want the figure displayed when running Python from the CLI, however, I haven't found a method where the memory wouldn't remain in use until the very end of the entire script.
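One workaround not tried above, so to be treated as an untested suggestion, is to do the plotting in a separate child process: when the child exits, the operating system reclaims all of its memory, whatever matplotlib or the allocator kept around, and the parent script can continue with its other calculations. The sketch uses `multiprocessing` with the `fork` start method (an assumption that holds on Ubuntu; with `spawn`, the calling code needs an `if __name__ == "__main__":` guard). The worker `plot_in_child` is hypothetical: a large list stands in for the plotted data so that the sketch runs without a display. In real use it would import matplotlib, build the figure, call `plt.savefig(...)` and `plt.show()`, and `join()` would return once the window is closed. Importing matplotlib only inside the child is probably the safest choice.

```python
import multiprocessing as mp


def plot_in_child(n_points: int, n_curves: int) -> None:
    """Hypothetical worker: the large objects only ever exist in the child.

    In real use this would build the figure, call plt.savefig(...) and
    plt.show(); a big list of floats stands in for that here.
    """
    curves = [[float(i) for i in range(n_points)] for _ in range(n_curves)]
    print(f"child built {len(curves)} curves of {n_points} points each")


def run_in_child(target, *args) -> int:
    """Run target(*args) in a child process, wait for it and return its exit code."""
    ctx = mp.get_context("fork")  # assumption: POSIX system such as Ubuntu
    proc = ctx.Process(target=target, args=args)
    proc.start()
    proc.join()  # returns once the child has exited; its memory is back with the OS
    return proc.exitcode


if __name__ == "__main__":
    code = run_in_child(plot_in_child, 200_000, 5)
    print("child exit code:", code)  # 0 if the worker finished without an exception
    # ... the parent script would continue here with its other calculations ...
```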
Some further notes:
There are known memory issues with matplotlib and looping, such as: http://datasideoflife.com/?p=1443
But here I do not create a figure at each iteration of a loop; rather, I accumulate plots from a loop and plot the end result. The solution put forward there (i.e. use `plt.close(fig)`) does not work in this case. This is also distinct from "Memory leak in plt.close() when unshown figures in GUI backends are closed" #20300.
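Since all curves have to end up on a single figure, another way to reduce what that figure holds (again not something tried above) is to thin each curve before plotting it. With `figsize=(15, 8)` and `dpi=150`, the saved image is about 2250 × 1200 pixels, so 3 million points per curve are far more than can be distinguished. In `custom_plot`, the k-th sorted value is plotted at y = 1 - k/(n+1); writing j = n + 1 - k gives y = j/(n+1), so keeping geometrically spaced values of j keeps the tail (small y, where the logarithmic axis needs detail) well resolved. `thin_ccdf` and `max_points` are hypothetical names, and the default of 5000 points is an arbitrary choice.

```python
import numpy as np


def thin_ccdf(da_sorted, max_points: int = 5000):
    """Return (x, y) for a thinned version of the curve drawn by custom_plot.

    custom_plot draws y = 1 - k/(n+1) against the k-th sorted value (k = 1..n).
    With j = n + 1 - k this is y = j/(n+1); geometrically spaced j values are
    dense near j = 1, i.e. in the tail (small y) of the log-scaled axis.
    """
    values = np.asarray(da_sorted)
    n = len(values)
    j = np.geomspace(1, n, num=max_points).astype(np.int64)
    j = np.unique(np.clip(j, 1, n))  # sorted, distinct, 1 <= j <= n
    x = values[n - j]                # 0-based index of the k-th value is k - 1 = n - j
    y = j / (n + 1)
    return x, y


# e.g. inside custom_plot, in place of the original plt.plot call:
#     x, y = thin_ccdf(da)
#     return plt.plot(x, y, **kwargs)
```

At this resolution the thinned curve should look essentially the same, while each line stores at most `max_points` values instead of `len(da)`.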
Operating system
Ubuntu
Matplotlib Version
3.7.3
Matplotlib Backend
module://matplotlib_inline.backend_inline (default)
Python version
3.8.10
Jupyter version
6.5.2
Installation
pip