[Bug]: Tcl_AsyncDelete: async handler deleted by the wrong thread despite being on the main thread #27713

Open
@bluenote10

Description

Bug summary

I recently started running into the infamous Tcl_AsyncDelete: async handler deleted by the wrong thread crash quite frequently, despite using matplotlib strictly from the main thread.

This occurs every time I use a pytorch DataLoader in multi-processing mode in combination with some plotting. I'm aware that matplotlib itself is not thread safe. Note, however, that the multiprocessing usage is internal to pytorch. The matplotlib plotting happens purely in the main process/thread, so there shouldn't be any interference between pytorch and matplotlib that leads to crashes.

Also, the usual work-around of enforcing the agg backend is not applicable in my use cases: in some cases I do want to do some interactive plotting while training neural networks, because it offers more possibilities to inspect what a neural model is doing exactly. (And not using the DataLoader in multi-processing mode slows down the training a lot.)

The pytorch version for reproduction is 2.1.2.

Code for reproduction

"""
This example needs pytorch, i.e., `pip install torch==2.1.2`.

The example reflects a minimal version of the use case pattern
that is causing the crashes.
"""

import os
import threading

import matplotlib
import matplotlib.pyplot as plt
import torch
from torch.utils.data import DataLoader, Dataset


class DummyDataset(Dataset):
    def __len__(self) -> int:
        return 1000

    def __getitem__(self, idx: int) -> dict[str, torch.Tensor]:
        return dict(
            X=torch.zeros(10),
        )


def plot_something(interactive: bool = False) -> None:
    print(f"[plot] pid = {os.getpid()}, thread id = {threading.get_ident()}")
    fig, ax = plt.subplots(1, 1, figsize=(14, 10))
    ax.plot([0.0] * 1000)
    fig.savefig("/tmp/test.png")
    if interactive:
        plt.show()
    plt.close(fig)


print("Matplotlib backend:", matplotlib.get_backend())
print(f"[main] pid = {os.getpid()}, thread id = {threading.get_ident()}")

dataset = DummyDataset()
data_loader = DataLoader(dataset, batch_size=16, num_workers=4)

while True:
    for batch in data_loader:
        X = batch["X"]

        # Most of the time the plot should run headless (plot to file), but under
        # certain conditions the plot should run interactively (using `.show()`).
        some_condition = False
        plot_something(some_condition)

Actual outcome

The snippet runs for a while and eventually crashes. The crashes are non-deterministic, but for me the snippet typically crashes after half a minute or so. The output is (messages leading up to the crash are truncated):

Matplotlib backend: TkAgg
[main] pid = 64946, thread id = 139642307104768
[plot] pid = 64946, thread id = 139642307104768
[plot] pid = 64946, thread id = 139642307104768
<...>
[plot] pid = 64946, thread id = 139642307104768
[plot] pid = 64946, thread id = 139642307104768
Exception ignored in: <function Image.__del__ at 0x7f0049027250>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 4056, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f0049027250>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 4056, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f0048f97ac0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f0048f97ac0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f0048f97ac0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f0048f97ac0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f0048f97ac0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f0048f97ac0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f0048f97ac0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f0048f97ac0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
Tcl_AsyncDelete: async handler deleted by the wrong thread
Traceback (most recent call last):
  File "/home/fabian/git/repo/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1132, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 113, in get
    if not self._poll(timeout):
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.10/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/me/git/repo/venv/lib/python3.10/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 67648) is killed by signal: Aborted.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/me/git/repo/_debug_matplotlib_crash.py", line 40, in <module>
    for batch in data_loader:
  File "/home/me/git/repo/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/me/git/repo/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1328, in _next_data
    idx, data = self._get_data()
  File "/home/me/git/repo/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1294, in _get_data
    success, data = self._try_get_data()
  File "/home/me/git/repo/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1145, in _try_get_data
    raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 67648) exited unexpectedly

Note that the reported pid / thread id for the plotting code is obviously just the main pid / thread id.
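
One possible reading of the log, which is an assumption and not something this report confirms: the process that aborts is a DataLoader worker (pid 67648), not the plotting process (pid 64946). If the workers are started with the fork start method, each worker inherits every Python object that exists in the parent at fork time, which could include Tk objects whose finalizers then run inside the worker. The sketch below uses only the standard library and does not touch Tk or pytorch; `TrackedResource`, `RESOURCE`, `report` and `inherited_in_child` are hypothetical names standing in for a GUI object and a worker.

```python
import multiprocessing as mp
import os


class TrackedResource:
    """Hypothetical stand-in for an object created by the GUI toolkit."""

    def __init__(self) -> None:
        # Remember which process created this object.
        self.owner_pid = os.getpid()


# Created in the parent process before any child is started.
RESOURCE = TrackedResource()


def report(conn) -> None:
    # Runs in the child: send the child's pid and the pid that created RESOURCE.
    conn.send((os.getpid(), RESOURCE.owner_pid))
    conn.close()


def inherited_in_child(start_method: str = "fork") -> tuple[int, int]:
    """Return (child_pid, owner_pid_as_seen_by_child)."""
    ctx = mp.get_context(start_method)
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=report, args=(child_conn,))
    proc.start()
    result = parent_conn.recv()
    proc.join()
    return result


child_pid, owner_pid = inherited_in_child("fork")
print(f"parent pid = {os.getpid()}, child pid = {child_pid}, owner pid = {owner_pid}")
```

With fork, the child reports an owner pid equal to the parent's pid, i.e. it holds a copy of an object it never created. The fork start method is only available on Unix-like systems. `DataLoader` accepts a `multiprocessing_context` argument; whether a different start method avoids the crash reported here is untested.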

Expected outcome

It should not crash.

Additional information

Has this worked in earlier versions?

I'm not sure, but it may have worked, because I have been using this pattern for a long time and it only started crashing recently. However, due to the non-deterministic nature of the crash, this is hard to verify. It may just be that my previous usages of the pattern were not plotting "aggressively" enough.
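
Because the crash is non-deterministic, comparing versions needs repeated runs rather than a single one. A minimal sketch of such a harness follows, assuming the reproduction is saved as a script (the traceback mentions `_debug_matplotlib_crash.py`). `count_crashes` is a hypothetical helper, and the run count and time window are arbitrary choices. Since the reproduction loops forever, a run that reaches the timeout is counted as having survived.

```python
import subprocess
import sys


def count_crashes(cmd: list[str], runs: int, timeout_s: float) -> int:
    """Run `cmd` `runs` times and count runs that exit non-zero before the timeout."""
    crashes = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # The process was still running when the window ended: no crash seen.
            continue
        if result.returncode != 0:
            crashes += 1
    return crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python harness.py _debug_matplotlib_crash.py
    n = count_crashes([sys.executable, sys.argv[1]], runs=10, timeout_s=120.0)
    print(f"{n} of 10 runs crashed within 120 s")
```

Running the same harness in environments with different matplotlib or pytorch versions gives crash counts that can be compared. Note that `subprocess.run` only kills the direct child on timeout, so leftover worker processes may need separate cleanup.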

Operating system

Ubuntu 22.04

Matplotlib Version

3.8.2

Matplotlib Backend

TkAgg

Python version

Python 3.10.12

Jupyter version

none

Installation

pip
