multiprocessing forkserver does not flush output before fork (was: preloading '__main__' with forkserver has been broken for a long time) #98552

Closed

Assignees

Labels

3.11 (only security fixes), 3.12 (only security fixes), stdlib (Standard Library Python modules in the Lib/ directory), topic-multiprocessing, type-bug (An unexpected behavior, bug, or error)

Description

aggieNick02 opened on Oct 22, 2022

Bug report

The `forkserver` start method provides the ability to call `set_forkserver_preload` on the multiprocessing context to load modules into and configure the forkserver process. By doing this carefully, you can avoid having to do module loading and other work each time the forkserver process is forked to create a new process. Without such preloading, `forkserver` can be much slower than the traditional `fork` start method.

You can specify the module `'__main__'` in the `set_forkserver_preload` list, and the forkserver source has special code for this case. It ensures that the main file path does not have to be configured/loaded after each fork. To do this, in `multiprocessing.forkserver.ensure_running`, it calls `multiprocessing.spawn.get_preparation_data` and then uses the `main_path` entry that may be in the returned dict.
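To check which keys the preparation data actually carries on a given interpreter, something like the following can be run. This is only an inspection sketch: `get_preparation_data` is an undocumented helper, the `'ignore'` argument is just a placeholder process name, and `main_related_keys` is a hypothetical helper introduced here for illustration.

```python
import multiprocessing.spawn


def main_related_keys(data):
    """Return the sorted keys of `data` whose name mentions the main module."""
    return sorted(key for key in data if "main" in key)


# 'ignore' is a placeholder process name; it only ends up in data['name'].
data = multiprocessing.spawn.get_preparation_data("ignore")

# Which main-related keys appear depends on how the interpreter was started
# (script, -m, interactive), so no particular output is assumed here.
print("main-related keys:", main_related_keys(data))
print("has 'main_path':", "main_path" in data)
```

If `main_path` is absent while a key such as `init_main_from_path` is present, any consumer that only looks for `main_path` will silently skip the main module.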

Unfortunately, 3 months after it was introduced, this functionality was broken in commit 9a76735. That commit renamed the `main_path` dictionary entry returned by `get_preparation_data` to `init_main_from_path`, but didn't update its use in `multiprocessing.forkserver`.
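The effect of that rename can be illustrated with plain dictionaries. This is a simplified sketch, not the actual forkserver code: both helper functions and the sample paths are hypothetical, and only the two key names come from the description above.

```python
def pick_main_path_old(prep_data):
    """Mimic a lookup that still expects the pre-rename key."""
    return prep_data.get("main_path")


def pick_main_path_tolerant(prep_data):
    """Accept the renamed key first, then fall back to the original one."""
    for key in ("init_main_from_path", "main_path"):
        if key in prep_data:
            return prep_data[key]
    return None


# Hypothetical preparation data shaped like the post-rename dict;
# the paths are made up for illustration.
prep_data = {
    "sys_path": ["/srv/share"],
    "init_main_from_path": "/srv/share/app.py",
}

print(pick_main_path_old(prep_data))       # None
print(pick_main_path_tolerant(prep_data))  # /srv/share/app.py
```

The old-style lookup finds nothing and raises no error, which would explain how a mismatch like this could go unnoticed for so long.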

Not having the ability to load and configure main on the forkserver ends up being unusually painful for my recent scenario, which led to tracking this down. I have a Python program on a share that spawns short-lived processes at a high rate, and multiple machines run this program from the share. A huge slowdown ensues as smbd processes on the server go crazy responding to every new process on every client reading the file and stat-ing the directory that contains it.

A simple fix in `multiprocessing.forkserver` accounting for the changed name rectifies the problem. I'll work on putting that PR together. Any thoughts on a workaround that doesn't require modifying the Python source are welcome, as I imagine it will be a while until I'm on a Python with the fix.

Here's a simple repro.

```python
import time
import multiprocessing

print('hi from forkserver_repro')


def _silly(i):
    time.sleep(0.2)
    return i % 2


def run_subprocesses():
    process_list = []
    for i in range(10):
        process_list.append(multiprocessing.Process(target=_silly, args=[i]))
    for process in process_list:
        process.start()
    for process in process_list:
        process.join()
    return 0


if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    ctx = multiprocessing.get_context('forkserver')
    ctx.set_forkserver_preload(['__main__',])
    print('Only one more "hi from forkserver_repro" should print! More means a bug!')
    run_subprocesses()
```