
PEP 371 – Addition of the multiprocessing package to the standard library

Author:
Jesse Noller <jnoller at gmail.com>, Richard Oudkerk <r.m.oudkerk at googlemail.com>
Status:
Final
Type:
Standards Track
Created:
06-May-2008
Python-Version:
2.6, 3.0
Post-History:
03-Jun-2008


Abstract

This PEP proposes the inclusion of the pyProcessing [1] package into the Python standard library, renamed to “multiprocessing”.

The processing package mimics the standard library threading module functionality to provide a process-based approach to threaded programming, allowing end-users to dispatch multiple tasks that effectively side-step the global interpreter lock.

The package also provides server and client functionality (processing.Manager) to provide remote sharing and management of objects and tasks so that applications may not only leverage multiple cores on the local machine, but also distribute objects and tasks across a cluster of networked machines.
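As an illustration of that server/client functionality, here is a minimal sketch using the manager API as it ships in the renamed package; the port, authkey, and QueueManager name are illustrative, not prescribed by this PEP:

import Queue
from multiprocessing.managers import BaseManager

queue = Queue.Queue()

class QueueManager(BaseManager):
    pass

# Server side: register a callable that hands out the shared queue,
# then serve it on a TCP socket so other machines can connect.
QueueManager.register('get_queue', callable=lambda: queue)

if __name__ == '__main__':
    manager = QueueManager(address=('', 50000), authkey='abracadabra')
    server = manager.get_server()
    server.serve_forever()

# A client on another machine registers the same name, connects,
# and then uses the returned proxy as if the queue were local:
#
#     QueueManager.register('get_queue')
#     m = QueueManager(address=('server.host', 50000), authkey='abracadabra')
#     m.connect()
#     m.get_queue().put('work item')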

While the distributed capabilities of the package are beneficial, the primary focus of this PEP is the core threading-like API and capabilities of the package.

Rationale

The current CPython interpreter implements the Global Interpreter Lock (GIL) and, barring work in Python 3000 or other versions currently planned [2], the GIL will remain as-is within the CPython interpreter for the foreseeable future. While the GIL itself enables clean and easy-to-maintain C code for the interpreter and extensions base, it is frequently an issue for those Python programmers who are leveraging multi-core machines.

The GIL itself prevents more than a single thread from running within the interpreter at any given point in time, effectively removing Python’s ability to take advantage of multi-processor systems.

The pyprocessing package offers a method to side-step the GIL, allowing applications within CPython to take advantage of multi-core architectures without asking users to completely change their programming paradigm (i.e.: dropping threaded programming for another “concurrent” approach - Twisted, Actors, etc.).

The processing package offers CPython a “known API” which mirrors, albeit in a PEP 8 compliant manner, that of the threading API, with known semantics and easy scalability.

In the future, the package might not be as relevant should the CPython interpreter enable “true” threading; however, for some applications, forking an OS process may sometimes be more desirable than using lightweight threads, especially on those platforms where process creation is fast and optimized.

For example, a simple threaded application:

from threading import Thread as worker

def afunc(number):
    print number * 3

t = worker(target=afunc, args=(4,))
t.start()
t.join()

The pyprocessing package mirrored the API so well that, with a simple change of the import to:

from processing import process as worker

The code would now execute through the processing.process class. Obviously, with the renaming of the API to PEP 8 compliance, there would be additional renaming which would need to occur within user applications, however minor.
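For comparison, here is how the same example reads under the PEP 8 compliant names the renamed multiprocessing package exposes (Process rather than process); the __main__ guard is only strictly needed on platforms such as Windows, where children are spawned rather than forked:

from multiprocessing import Process as worker

def afunc(number):
    print number * 3

if __name__ == '__main__':
    # Guard the spawning code so that platforms which re-import
    # the main module in the child (e.g. Windows) do not recurse.
    t = worker(target=afunc, args=(4,))
    t.start()
    t.join()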

This type of compatibility means that, with a minor (in most cases) change in code, users’ applications will be able to leverage all cores and processors on a given machine for parallel execution. In many cases the pyprocessing package is even faster than the normal threading approach for I/O bound programs. This, of course, takes into account that the pyprocessing package is in optimized C code, while the threading module is not.

The “Distributed” Problem

In the discussion on Python-Dev about the inclusion of this package [3] there was confusion about the intentions of this PEP, with the functionality of this package frequently compared against other solutions to the “Distributed” problem, such as MPI-based communication [4], CORBA, or other distributed object approaches [5].

The “distributed” problem is large and varied. Each programmer working within this domain has either very strong opinions about their favorite module/method or a highly customized problem for which no existing solution works.

The acceptance of this package neither precludes nor discourages programmers working on the “distributed” problem from examining other solutions for their problem domain. The intent of including this package is to provide entry-level capabilities for local concurrency and the basic support to spread that concurrency across a network of machines; although the two are not tightly coupled, the pyprocessing package could in fact be used in conjunction with any of the other solutions, including MPI etc.

If necessary, it is possible to completely decouple the local concurrency abilities of the package from the network-capable/shared aspects of the package. Without serious concerns or cause, however, the author of this PEP does not recommend that approach.

Performance Comparison

As we all know, there are “lies, damned lies, and benchmarks”. These speed comparisons, while aimed at showcasing the performance of the pyprocessing package, are by no means comprehensive or applicable to all possible use cases or environments, especially on those platforms with sluggish process forking timing.

All benchmarks were run using the following:

  • 4 Core Intel Xeon CPU @ 3.00GHz
  • 16 GB of RAM
  • Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)
  • pyProcessing 0.52

All of the code for this can be downloaded from http://jessenoller.com/code/bench-src.tgz

The basic method of execution for these benchmarks is in the run_benchmarks.py [6] script, which is simply a wrapper to execute a target function through a single threaded (linear), multi-threaded (via threading), and multi-process (via pyprocessing) function for a static number of iterations with increasing numbers of execution loops and/or threads.

The run_benchmarks.py script executes each function 100 times,picking the best run of that 100 iterations via the timeit module.
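The wrapper script itself is not reproduced here; a minimal sketch of the pattern it follows, with a hypothetical helper named bench, might look like:

import timeit

def bench(label, func, arg):
    # Time 100 independent runs of the target and keep the best,
    # mirroring run_benchmarks.py's use of the timeit module.
    timer = timeit.Timer(lambda: func(arg))
    best = min(timer.repeat(repeat=100, number=1))
    print '%s %f seconds' % (label, best)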

First, to identify the overhead of the spawning of the workers, we execute a function which is simply a pass statement (empty):

cmd: python run_benchmarks.py empty_func.py
Importing empty_func
Starting tests...
non_threaded (1 iters)  0.000001 seconds
threaded (1 threads)    0.000796 seconds
processes (1 procs)     0.000714 seconds
non_threaded (2 iters)  0.000002 seconds
threaded (2 threads)    0.001963 seconds
processes (2 procs)     0.001466 seconds
non_threaded (4 iters)  0.000002 seconds
threaded (4 threads)    0.003986 seconds
processes (4 procs)     0.002701 seconds
non_threaded (8 iters)  0.000003 seconds
threaded (8 threads)    0.007990 seconds
processes (8 procs)     0.005512 seconds

As you can see, process forking via the pyprocessing package is faster than the speed of building and then executing the threaded version of the code.

The second test calculates 50000 Fibonacci numbers inside of each thread (isolated and shared-nothing):

cmd: python run_benchmarks.py fibonacci.py
Importing fibonacci
Starting tests...
non_threaded (1 iters)  0.195548 seconds
threaded (1 threads)    0.197909 seconds
processes (1 procs)     0.201175 seconds
non_threaded (2 iters)  0.397540 seconds
threaded (2 threads)    0.397637 seconds
processes (2 procs)     0.204265 seconds
non_threaded (4 iters)  0.795333 seconds
threaded (4 threads)    0.797262 seconds
processes (4 procs)     0.206990 seconds
non_threaded (8 iters)  1.591680 seconds
threaded (8 threads)    1.596824 seconds
processes (8 procs)     0.417899 seconds

The third test calculates the sum of all primes below 100000, again sharing nothing:

cmd: run_benchmarks.py crunch_primes.py
Importing crunch_primes
Starting tests...
non_threaded (1 iters)  0.495157 seconds
threaded (1 threads)    0.522320 seconds
processes (1 procs)     0.523757 seconds
non_threaded (2 iters)  1.052048 seconds
threaded (2 threads)    1.154726 seconds
processes (2 procs)     0.524603 seconds
non_threaded (4 iters)  2.104733 seconds
threaded (4 threads)    2.455215 seconds
processes (4 procs)     0.530688 seconds
non_threaded (8 iters)  4.217455 seconds
threaded (8 threads)    5.109192 seconds
processes (8 procs)     1.077939 seconds

The reason why tests two and three focused on pure numeric crunching is to showcase how the current threading implementation does hinder non-I/O applications. Obviously, these tests could be improved to use a queue for coordination of results and chunks of work, but that is not required to show the performance of the package and the core processing.process module.
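For the curious, a minimal sketch of such queue-based coordination under the renamed multiprocessing API; the fib worker, worker count, and job count here are illustrative, not the PEP’s actual benchmark code:

from multiprocessing import Process, Queue

def fib(n):
    a, b = 0, 1
    for _ in xrange(n):
        a, b = b, a + b
    return a

def worker(jobs, results):
    # Pull work items until a None sentinel arrives, pushing each
    # answer back through the results queue.
    while True:
        n = jobs.get()
        if n is None:
            break
        results.put((n, fib(n)))

if __name__ == '__main__':
    jobs, results = Queue(), Queue()
    procs = [Process(target=worker, args=(jobs, results)) for _ in range(4)]
    for p in procs:
        p.start()
    for n in range(20):
        jobs.put(n)
    for _ in procs:
        jobs.put(None)  # one sentinel per worker process
    answers = [results.get() for _ in range(20)]
    for p in procs:
        p.join()
    print sorted(answers)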

The next test is an I/O bound test. This is normally where we see a steep improvement in the threading module approach versus a single-threaded approach. In this case, each worker is opening a descriptor to lorem.txt, randomly seeking within it and writing lines to /dev/null:

cmd: python run_benchmarks.py file_io.py
Importing file_io
Starting tests...
non_threaded (1 iters)  0.057750 seconds
threaded (1 threads)    0.089992 seconds
processes (1 procs)     0.090817 seconds
non_threaded (2 iters)  0.180256 seconds
threaded (2 threads)    0.329961 seconds
processes (2 procs)     0.096683 seconds
non_threaded (4 iters)  0.370841 seconds
threaded (4 threads)    1.103678 seconds
processes (4 procs)     0.101535 seconds
non_threaded (8 iters)  0.749571 seconds
threaded (8 threads)    2.437204 seconds
processes (8 procs)     0.203438 seconds

As you can see, pyprocessing is still faster on this I/O operation than using multiple threads. And using multiple threads is slower than the single threaded execution itself.

Finally, we will run a socket-based test to show network I/O performance. This function grabs a URL from a server on the LAN serving a simple Tomcat error page, and it gets the page 100 times. The network is otherwise silent, over a 10G connection:

cmd: python run_benchmarks.py url_get.py
Importing url_get
Starting tests...
non_threaded (1 iters)  0.124774 seconds
threaded (1 threads)    0.120478 seconds
processes (1 procs)     0.121404 seconds
non_threaded (2 iters)  0.239574 seconds
threaded (2 threads)    0.146138 seconds
processes (2 procs)     0.138366 seconds
non_threaded (4 iters)  0.479159 seconds
threaded (4 threads)    0.200985 seconds
processes (4 procs)     0.188847 seconds
non_threaded (8 iters)  0.960621 seconds
threaded (8 threads)    0.659298 seconds
processes (8 procs)     0.298625 seconds

We finally see threaded performance surpass that of single-threaded execution, but the pyprocessing package is still faster when increasing the number of workers. If you stay with one or two threads/workers, then the timing between threads and pyprocessing is fairly close.

One item of note, however, is that there is an implicit overhead within the pyprocessing package’s Queue implementation due to the object serialization.

Alec Thomas provided a short example based on the run_benchmarks.py script to demonstrate this overhead versus the default Queue implementation:

cmd: run_bench_queue.py
non_threaded (1 iters)  0.010546 seconds
threaded (1 threads)    0.015164 seconds
processes (1 procs)     0.066167 seconds
non_threaded (2 iters)  0.020768 seconds
threaded (2 threads)    0.041635 seconds
processes (2 procs)     0.084270 seconds
non_threaded (4 iters)  0.041718 seconds
threaded (4 threads)    0.086394 seconds
processes (4 procs)     0.144176 seconds
non_threaded (8 iters)  0.083488 seconds
threaded (8 threads)    0.184254 seconds
processes (8 procs)     0.302999 seconds
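The overhead exists because a process-backed queue must pickle every object and push it through a pipe, while the thread-based Queue merely passes a reference. A small sketch of how one might observe the difference (the payload and loop count are illustrative):

import time
from Queue import Queue as ThreadQueue  # 'queue' in Python 3
from multiprocessing import Queue as ProcessQueue

payload = {'values': range(1000)}

for name, q in [('threaded', ThreadQueue()), ('processes', ProcessQueue())]:
    start = time.time()
    for _ in range(1000):
        # The process queue pickles the payload and writes it to a
        # pipe; the thread queue only appends a reference to a deque.
        q.put(payload)
        q.get()
    print '%s queue: %f seconds' % (name, time.time() - start)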

Additional benchmarks can be found in the pyprocessing package’s source distribution’s examples/ directory. The examples will be included in the package’s documentation.

Maintenance

Richard M. Oudkerk, the author of the pyprocessing package, has agreed to maintain the package within Python SVN. Jesse Noller has volunteered to also help maintain, document, and test the package.

API Naming

While the package’s API is designed to closely mimic that of the threading and Queue modules as of Python 2.x, those modules are not PEP 8 compliant. It has been decided that instead of adding the package “as is”, and therefore perpetuating the non-PEP 8 compliant naming, we will rename all APIs, classes, etc. to be fully PEP 8 compliant.

This change does affect the ease of drop-in replacement for those using the threading module, but that is an acceptable side effect in the view of the authors, especially given that the threading module’s own API will change.

Issue 3042 in the tracker proposes that for Python 2.6 there will be two APIs for the threading module: the current one, and the PEP 8 compliant one. Warnings about the upcoming removal of the original Java-style API will be issued when -3 is invoked.

In Python 3000, the threading API will become PEP 8 compliant, which means that the multiprocessing module and the threading module will again have matching APIs.

Timing/Schedule

Some concerns have been raised about the timing/lateness of this PEP for the 2.6 and 3.0 releases this year; however, both the authors and others feel that the functionality this package offers outweighs the risk of inclusion.

However, taking into account the desire not to destabilize Python-core, some refactoring of pyprocessing’s code “into” Python-core can be withheld until the next 2.x/3.x releases. This means that the actual risk to Python-core is minimal, and largely constrained to the package itself.

Open Issues

  • Confirm that there are no “default” remote connection capabilities and, if needed, enable the remote security mechanisms by default for those classes which offer remote capabilities.
  • Some of the API (the Queue methods qsize(), task_done() and join()) either needs to be added, or the reason for the exclusion needs to be identified and documented clearly.

Closed Issues

  • The PyGILState bug patch submitted in issue 1683 by roudkerk must be applied for the package unit tests to work.
  • Existing documentation has to be moved to ReST formatting.
  • Reliance on ctypes: The pyprocessing package’s reliance on ctypes prevents the package from functioning on platforms where ctypes is not supported. This is not a restriction of this package, but rather of ctypes.
  • DONE: Rename top-level package from “pyprocessing” to “multiprocessing”.
  • DONE: Also note that the default behavior of process spawning does not make it compatible with use within IDLE as-is; this will be examined as a bug-fix or “setExecutable” enhancement.
  • DONE: Add in a “multiprocessing.setExecutable()” method to override the package’s default behavior of spawning processes using the current executable name rather than the Python interpreter; see the sketch below. Note that Mark Hammond has suggested a factory-style interface for this [7].
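For reference, a minimal sketch of the setExecutable idea, using the set_executable() spelling the package eventually shipped; the interpreter path is illustrative, and in the 2.6-era package the call only has an effect on Windows:

import multiprocessing

def greet():
    print 'hello from the child process'

if __name__ == '__main__':
    # Point the package at the interpreter binary used to launch
    # children; useful when Python is embedded in another executable.
    multiprocessing.set_executable('/usr/bin/python')
    p = multiprocessing.Process(target=greet)
    p.start()
    p.join()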

References

[1]
The 2008-era PyProcessing project (the pyprocessing name was since repurposed): https://web.archive.org/web/20080914113946/https://pyprocessing.berlios.de/
[2]
See Adam Olsen’s “safe threading” project: https://code.google.com/archive/p/python-safethread/
[3]
See: Addition of “pyprocessing” module to standard lib. https://mail.python.org/pipermail/python-dev/2008-May/079417.html
[4]
https://mpi4py.readthedocs.io/
[5]
See “Cluster Computing”: https://wiki.python.org/moin/ParallelProcessing#Cluster_Computing
[6]
The original run_benchmark.py code was published in Python Magazine in December 2007: “Python Threads and the Global Interpreter Lock” by Jesse Noller. It has been modified for this PEP.
[7]
http://groups.google.com/group/python-dev2/msg/54cf06d15cbcbc34

Copyright

This document has been placed in the public domain.

