multiprocessing forkserver does not flush output before fork (was: preloading '__main__' with forkserver has been broken for a long time) #98552

Closed
Assignees
gpshead
Labels
3.11 (only security fixes), 3.12 (only security fixes), stdlib (Standard Library Python modules in the Lib/ directory), topic-multiprocessing, type-bug (An unexpected behavior, bug, or error)

Description

@aggieNick02

Bug report

The forkserver start method provides the ability to call set_forkserver_preload on the multiprocessing context to load modules into and configure the forkserver process. Done carefully, this avoids repeating module loading and other setup each time the forkserver process is forked to create a new process. Without such preloading, forkserver can be far slower than the traditional fork start method.
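For context, here is a minimal sketch of the intended usage; 'numpy' is just a stand-in for any expensive-to-import module, and failed preload imports are silently ignored by the forkserver:

```python
import multiprocessing

if __name__ == '__main__':
    ctx = multiprocessing.get_context('forkserver')
    # Ask the forkserver to import these once, up front; every subsequent
    # fork inherits the already-imported modules. '__main__' is allowed too.
    ctx.set_forkserver_preload(['numpy', '__main__'])
    p = ctx.Process(target=print, args=('hello from child',))
    p.start()
    p.join()
```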

You can specify the module '__main__' in the set_forkserver_preload list, and the forkserver source has special code for this case. It ensures that the main file path does not have to be configured/loaded after each fork. To do this, multiprocessing.forkserver.ensure_running calls multiprocessing.spawn.get_preparation_data and then uses the main_path entry that may be in the returned dict.
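The relevant logic in Lib/multiprocessing/forkserver.py looks roughly like the following (paraphrased from memory, not a verbatim quote; check the actual source for the authoritative version):

```python
# Inside ForkServer.ensure_running(), roughly:
if self._preload_modules:
    desired_keys = {'main_path', 'sys_path'}   # still expects the old key
    data = spawn.get_preparation_data('ignore')
    data = {x: y for x, y in data.items() if x in desired_keys}
else:
    data = {}
# 'data' is then forwarded as keyword arguments to forkserver.main(), whose
# main_path argument drives the '__main__' preload. Because
# get_preparation_data() no longer returns a 'main_path' key, the filter
# above silently drops the main-module info.
```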

Unfortunately, three months after it was introduced, this functionality was broken by commit 9a76735. That commit renamed the main_path dictionary entry returned by get_preparation_data to init_main_from_path, but didn't update the corresponding use in multiprocessing.forkserver.
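You can confirm the rename on a given interpreter with a quick check. Hedged: get_preparation_data is a private helper, so its exact contents vary across versions, and you need to run this from a script file so that '__main__' has a file path:

```python
import multiprocessing.spawn as spawn

# 'demo' is just a placeholder process name required by the helper.
data = spawn.get_preparation_data('demo')
print(sorted(data))
# On affected versions the keys include 'init_main_from_path' but not
# 'main_path', so forkserver.ensure_running()'s filter never matches.
```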

Not being able to load and configure '__main__' in the forkserver turned out to be unusually painful in my recent scenario, which is what led to tracking this down. I have a Python program on a network share that spawns short-lived processes at a high rate, and multiple machines run this program from the share. A huge slowdown ensues as the smbd processes on the server go crazy responding to every new process on every client reading the file and stat-ing the directory containing it.

A simple fix in multiprocessing.forkserver accounting for the changed name rectifies the problem; I'll work on putting that PR together. Any thoughts on a workaround that doesn't require modifying the Python source are welcome, as I imagine it will be a while until I'm on a Python version with the fix.
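For concreteness, the fix could look something like this inside ensure_running(). This is an untested sketch of the shape of the change, not the actual patch:

```python
desired_keys = {'init_main_from_path', 'sys_path'}
data = spawn.get_preparation_data('ignore')
data = {x: y for x, y in data.items() if x in desired_keys}
# Map the renamed key back to the 'main_path' keyword argument that
# forkserver.main() still accepts.
if 'init_main_from_path' in data:
    data['main_path'] = data.pop('init_main_from_path')
```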

Here's a simple repro.

```python
import time
import multiprocessing

print('hi from forkserver_repro')

def _silly(i):
    time.sleep(0.2)
    return i % 2

def run_subprocesses():
    process_list = []
    for i in range(10):
        process_list.append(multiprocessing.Process(target=_silly, args=[i]))
    for process in process_list:
        process.start()
    for process in process_list:
        process.join()
    return 0

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    ctx = multiprocessing.get_context('forkserver')
    ctx.set_forkserver_preload(['__main__'])
    print('Only one more "hi from forkserver_repro" should print! More means a bug!')
    run_subprocesses()
```
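With the preload working, "hi from forkserver_repro" prints exactly twice: once in the parent and once when the forkserver imports '__main__'. With the bug, each of the ten children re-imports the main module, so it prints ten extra times. Until a fix lands, one possible runtime workaround is to re-expose the renamed key under its old name before the forkserver starts. This is an untested sketch that leans on private multiprocessing internals (get_preparation_data and the two key names), so treat it as fragile:

```python
import functools
import multiprocessing.spawn as spawn

_orig_get_preparation_data = spawn.get_preparation_data

@functools.wraps(_orig_get_preparation_data)
def _patched_get_preparation_data(*args, **kwargs):
    data = _orig_get_preparation_data(*args, **kwargs)
    # Re-expose the renamed key under the old name that
    # forkserver.ensure_running() still filters for.
    if 'init_main_from_path' in data:
        data['main_path'] = data['init_main_from_path']
    return data

spawn.get_preparation_data = _patched_get_preparation_data
```

The patch has to run before the first Process.start() (i.e., before ensure_running launches the forkserver process); the extra key should be harmless to the other start methods, which only read the keys they know about.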

Your environment

Ubuntu 20.04.4 LTS, CPython 3.8
Examination of the CPython source indicates this bug is still present in the current version.


