Description
Bug report
The `forkserver` start method provides the ability to call `set_forkserver_preload` on the multiprocessing context to load modules into and configure the forkserver process. By doing this carefully, you can avoid having to do module loading and other work each time the forkserver process is forked to create a new process. Without doing such work, `forkserver` can be much slower than the traditional `fork` start method.
You can specify the module `'__main__'` in the `set_forkserver_preload` list, and the forkserver source has special code for this case. It ensures that the main file path does not have to be configured/loaded after each fork. To do this, in `multiprocessing.forkserver.ensure_running`, it calls `multiprocessing.spawn.get_preparation_data` and then uses the `main_path` entry that may be in the returned dict.
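To see which main-module entries the preparation data actually carries on a given interpreter, a minimal sketch like the following can help. It assumes that calling the internal, undocumented helper `multiprocessing.spawn.get_preparation_data` directly is acceptable for inspection; `main_related_keys` is a hypothetical name used only for illustration. The helper also queries the current start method, so this is best run in a throwaway interpreter.

```python
import multiprocessing.spawn as spawn


def main_related_keys():
    """Return the preparation-data keys that describe the main module."""
    # 'ignore' is only a placeholder process name for this inspection.
    data = spawn.get_preparation_data('ignore')
    return sorted(key for key in data if 'main' in key)


if __name__ == '__main__':
    # On interpreters affected by this issue, 'main_path' is not in this list.
    print(main_related_keys())
```

Running this as a script shows the key names that a consumer of the dict would need to look for.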
Unfortunately, 3 months after it was introduced, this functionality was broken in commit 9a76735. That commit renamed the `main_path` dictionary entry returned by `get_preparation_data` to `init_main_from_path`, but didn't update its use in `multiprocessing.forkserver`.
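The effect of the rename can be illustrated with a simplified model. This is not the CPython source: `pick_main_path` and the example path are hypothetical, and the dict only mimics the shape of the data described above.

```python
def pick_main_path(preparation_data, key):
    """Return the main-file path stored under `key`, or None if it is absent."""
    return preparation_data.get(key)


# Hypothetical preparation data, shaped like the dict after the rename.
data = {
    'init_main_from_path': '/share/app/prog.py',  # made-up path
    'sys_path': ['/share/app'],
}

# Looking up the old name finds nothing, so main is never preloaded.
stale = pick_main_path(data, 'main_path')

# Looking up the renamed entry finds the path again.
fixed = pick_main_path(data, 'init_main_from_path')

print(stale, fixed)
```

Because the missing key yields `None` instead of an error, the preload of `__main__` is skipped silently, which would explain why the breakage went unnoticed.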
Not having the ability to load and configure main on the forkserver ends up being unusually painful for my recent scenario, which is what led me to track this down. I have a Python program on a network share that spawns short-lived processes at a high rate, and multiple machines run this program from the share. A huge slowdown ensues as the smbd processes on the server go crazy responding to every new process on every client reading the file and stat-ing the directory that contains it.
A simple fix in `multiprocessing.forkserver` accounting for the changed name rectifies the problem. I'll work on putting that PR together. Any thoughts on a workaround that doesn't require modifying the Python source are welcome, as I imagine it will be a while until I'm on a Python with the fix.
Here's a simple repro.
```python
import time
import multiprocessing

# Printed at import time, so every extra load of this file is visible.
print('hi from forkserver_repro')


def _silly(i):
    time.sleep(0.2)
    return i % 2


def run_subprocesses():
    process_list = []
    for i in range(10):
        process_list.append(multiprocessing.Process(target=_silly, args=[i]))
    for process in process_list:
        process.start()
    for process in process_list:
        process.join()
    return 0


if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    ctx = multiprocessing.get_context('forkserver')
    ctx.set_forkserver_preload(['__main__',])
    print('Only one more "hi from forkserver_repro" should print! More means a bug!')
    run_subprocesses()
```

Your environment
Ubuntu 20.04.4 LTS, CPython 3.8
Python source code examination indicates this bug is still present in the current version of CPython.
- PR: [3.14] gh-98552: flush std streams in the multiprocessing forkserver before fork (GH-141849) #141851