9.3. Parallelism, resource management, and configuration
9.3.1. Parallelism
Some scikit-learn estimators and utilities parallelize costly operations using multiple CPU cores.
Depending on the type of estimator and sometimes the values of the constructor parameters, this is done in one of three ways:
with higher-level parallelism via joblib.
with lower-level parallelism via OpenMP, used in C or Cython code.
with lower-level parallelism via BLAS, used by NumPy and SciPy for generic operations on arrays.
The n_jobs parameter of estimators always controls the amount of parallelism managed by joblib (processes or threads depending on the joblib backend). The thread-level parallelism managed by OpenMP in scikit-learn’s own Cython code or by BLAS & LAPACK libraries used by NumPy and SciPy operations used in scikit-learn is always controlled by environment variables or threadpoolctl, as explained below. Note that some estimators can leverage all three kinds of parallelism at different points of their training and prediction methods.
We describe these three types of parallelism in more detail in the following subsections.
9.3.1.1. Higher-level parallelism with joblib
When the underlying implementation uses joblib, the number of workers (threads or processes) that are spawned in parallel can be controlled via the n_jobs parameter.
Note
Where (and how) parallelization happens in the estimators using joblib by specifying n_jobs is currently poorly documented. Please help us by improving our docs and tackle issue 14228!
Joblib is able to support both multi-processing and multi-threading. Whether joblib chooses to spawn a thread or a process depends on the backend that it’s using.
scikit-learn generally relies on the loky backend, which is joblib’s default backend. Loky is a multi-processing backend. When doing multi-processing, in order to avoid duplicating the memory in each process (which isn’t reasonable with big datasets), joblib will create a memmap that all processes can share, when the data is bigger than 1MB.
In some specific cases (when the code that is run in parallel releases the GIL), scikit-learn will indicate to joblib that a multi-threading backend is preferable.
As a user, you may control the backend that joblib will use (regardless of what scikit-learn recommends) by using a context manager:
from joblib import parallel_backend

with parallel_backend('threading', n_jobs=2):
    # Your scikit-learn code here
    pass
Please refer to joblib’s docs for more details.
In practice, whether parallelism is helpful at improving runtime depends on many factors. It is usually a good idea to experiment rather than assuming that increasing the number of workers is always a good thing. In some cases it can be highly detrimental to performance to run multiple copies of some estimators or functions in parallel (see oversubscription below).
9.3.1.2. Lower-level parallelism with OpenMP
OpenMP is used to parallelize code written in Cython or C, relying on multi-threading exclusively. By default, the implementations using OpenMP will use as many threads as possible, i.e. as many threads as logical cores.
You can control the exact number of threads that are used either:
via the OMP_NUM_THREADS environment variable, for instance when running a python script:
OMP_NUM_THREADS=4 python my_script.py
or via threadpoolctl, as explained in the threadpoolctl documentation.
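As a rough illustration of the environment-variable route, the sketch below launches a child Python process with OMP_NUM_THREADS set, mirroring the shell command above. The helper name run_with_omp_threads is hypothetical and not part of scikit-learn. The child here only echoes the variable back, whereas a real script would import scikit-learn and fit an estimator.

```python
import os
import subprocess
import sys


def run_with_omp_threads(n_threads, code):
    """Run a Python snippet in a child process with OMP_NUM_THREADS set.

    Hypothetical helper, for illustration only.
    """
    # Copy the current environment and override only the OpenMP thread limit.
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# The child only reports the value it sees; a real script would train a model.
reported = run_with_omp_threads(
    4, "import os; print(os.environ['OMP_NUM_THREADS'])"
)
print(reported)
```

Setting the variable on the child process, rather than inside an interpreter that is already running, avoids having to reason about whether the OpenMP runtime was initialized before the variable changed.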
9.3.1.3. Parallel NumPy and SciPy routines from numerical libraries
scikit-learn relies heavily on NumPy and SciPy, which internally call multi-threaded linear algebra routines (BLAS & LAPACK) implemented in libraries such as MKL, OpenBLAS or BLIS.
You can control the exact number of threads used by BLAS for each library using environment variables, namely:
MKL_NUM_THREADS sets the number of threads MKL uses,
OPENBLAS_NUM_THREADS sets the number of threads OpenBLAS uses,
BLIS_NUM_THREADS sets the number of threads BLIS uses.
Note that BLAS & LAPACK implementations can also be impacted by OMP_NUM_THREADS. To check whether this is the case in your environment, you can inspect how the number of threads effectively used by those libraries is affected when running the following command in a bash or zsh terminal for different values of OMP_NUM_THREADS:
OMP_NUM_THREADS=2 python -m threadpoolctl -i numpy scipy
Note
At the time of writing (2022), NumPy and SciPy packages which are distributed on pypi.org (i.e. the ones installed via pip install) and on the conda-forge channel (i.e. the ones installed via conda install --channel conda-forge) are linked with OpenBLAS, while NumPy and SciPy packages shipped on the defaults conda channel from Anaconda.org (i.e. the ones installed via conda install) are linked by default with MKL.
9.3.1.4. Oversubscription: spawning too many threads
It is generally recommended to avoid using significantly more processes or threads than the number of CPUs on a machine. Oversubscription happens when a program is running too many threads at the same time.
Suppose you have a machine with 8 CPUs. Consider a case where you’re running a GridSearchCV (parallelized with joblib) with n_jobs=8 over a HistGradientBoostingClassifier (parallelized with OpenMP). Each instance of HistGradientBoostingClassifier will spawn 8 threads (since you have 8 CPUs). That’s a total of 8 * 8 = 64 threads, which leads to oversubscription of threads for physical CPU resources and thus to scheduling overhead.
Oversubscription can arise in the exact same fashion with parallelized routines from MKL, OpenBLAS or BLIS that are nested in joblib calls.
Starting from joblib >= 0.14, when the loky backend is used (which is the default), joblib will tell its child processes to limit the number of threads they can use, so as to avoid oversubscription. In practice the heuristic that joblib uses is to tell the processes to use max_threads = n_cpus // n_jobs, via their corresponding environment variable. Back to our example from above, since the joblib backend of GridSearchCV is loky, each process will only be able to use 1 thread instead of 8, thus mitigating the oversubscription issue.
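The arithmetic of this heuristic can be sketched in a few lines of plain Python. This is only an illustration of the formula quoted above, not joblib's code: the function name threads_per_worker is hypothetical, and the lower bound of one thread per worker is an assumption of the sketch.

```python
def threads_per_worker(n_cpus, n_jobs):
    """Thread budget per worker: n_cpus // n_jobs, never less than one.

    Hypothetical helper; the floor of 1 is an assumption of this sketch.
    """
    if n_jobs < 1:
        raise ValueError("this sketch only handles a positive n_jobs")
    return max(1, n_cpus // n_jobs)


n_cpus, n_jobs = 8, 8

# Without any limit, every worker spawns one thread per CPU.
unlimited_total = n_jobs * n_cpus

# With the heuristic, each worker gets its share of the CPUs.
limited_total = n_jobs * threads_per_worker(n_cpus, n_jobs)

print(unlimited_total, limited_total)
```

With 8 CPUs and n_jobs=8, the unlimited case gives the 64 threads from the example, while the limited case stays at 8.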
Note that:
Manually setting one of the environment variables (OMP_NUM_THREADS, MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, or BLIS_NUM_THREADS) will take precedence over what joblib tries to do. The total number of threads will be n_jobs * <LIB>_NUM_THREADS. Note that setting this limit will also impact your computations in the main process, which will only use <LIB>_NUM_THREADS. Joblib exposes a context manager for finer control over the number of threads in its workers (see joblib docs linked below).
When joblib is configured to use the threading backend, there is no mechanism to avoid oversubscriptions when calling into parallel native libraries in the joblib-managed threads.
All scikit-learn estimators that explicitly rely on OpenMP in their Cython code always use threadpoolctl internally to automatically adapt the numbers of threads used by OpenMP and potentially nested BLAS calls so as to avoid oversubscription.
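To make the precedence rule in the first note concrete, here is a small sketch of the resulting thread count. It models a single library variable at a time and takes the environment as a plain dictionary. The function names are hypothetical, and the fallback of at least one thread per worker is an assumption of the sketch rather than documented behavior.

```python
def worker_thread_limit(n_cpus, n_jobs, environ, variable="OMP_NUM_THREADS"):
    """Per-worker thread limit for one library (hypothetical helper).

    A manually set <LIB>_NUM_THREADS wins; otherwise fall back to the
    n_cpus // n_jobs heuristic (floored at 1, an assumption of this sketch).
    """
    if variable in environ:
        return int(environ[variable])
    return max(1, n_cpus // n_jobs)


def total_threads(n_cpus, n_jobs, environ, variable="OMP_NUM_THREADS"):
    """Total threads across workers: n_jobs * per-worker limit."""
    return n_jobs * worker_thread_limit(n_cpus, n_jobs, environ, variable)


# No manual setting: the heuristic keeps 8 workers at 1 thread each.
print(total_threads(8, 8, {}))

# Manual setting: 8 workers * 4 threads each, which oversubscribes 8 CPUs.
print(total_threads(8, 8, {"OMP_NUM_THREADS": "4"}))
```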
You will find additional details about joblib mitigation of oversubscription in joblib documentation.
You will find additional details about parallelism in numerical python libraries in this document from Thomas J. Fan.
9.3.2. Configuration switches
9.3.2.1. Python API
sklearn.set_config and sklearn.config_context can be used to change parameters of the configuration which control aspects of parallelism.
9.3.2.2. Environment variables
These environment variables should be set before importing scikit-learn.
9.3.2.2.1. SKLEARN_ASSUME_FINITE
Sets the default value for the assume_finite argument of sklearn.set_config.
9.3.2.2.2. SKLEARN_WORKING_MEMORY
Sets the default value for the working_memory argument of sklearn.set_config.
9.3.2.2.3. SKLEARN_SEED
Sets the seed of the global random generator when running the tests, for reproducibility.
Note that scikit-learn tests are expected to run deterministically with explicit seeding of their own independent RNG instances instead of relying on the numpy or Python standard library RNG singletons to make sure that test results are independent of the test execution order. However some tests might forget to use explicit seeding and this variable is a way to control the initial state of the aforementioned singletons.
9.3.2.2.4. SKLEARN_TESTS_GLOBAL_RANDOM_SEED
Controls the seeding of the random number generator used in tests that rely on the global_random_seed fixture.
All tests that use this fixture accept the contract that they should deterministically pass for any seed value from 0 to 99 included.
In nightly CI builds, the SKLEARN_TESTS_GLOBAL_RANDOM_SEED environment variable is drawn randomly in the above range and all fixtured tests will run for that specific seed. The goal is to ensure that, over time, our CI will run all tests with different seeds while keeping the test duration of a single run of the full test suite limited. This will check that the assertions of tests written to use this fixture are not dependent on a specific seed value.
The range of admissible seed values is limited to [0, 99] because it is often not possible to write a test that can work for any possible seed and we want to avoid having tests that randomly fail on the CI.
Valid values for SKLEARN_TESTS_GLOBAL_RANDOM_SEED:
SKLEARN_TESTS_GLOBAL_RANDOM_SEED="42": run tests with a fixed seed of 42
SKLEARN_TESTS_GLOBAL_RANDOM_SEED="40-42": run the tests with all seeds between 40 and 42 included
SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all": run the tests with all seeds between 0 and 99 included. This can take a long time: only use for individual tests, not the full test suite!
If the variable is not set, then 42 is used as the global seed in a deterministic manner. This ensures that, by default, the scikit-learn test suite is as deterministic as possible to avoid disrupting our friendly third-party package maintainers. Similarly, this variable should not be set in the CI config of pull requests to make sure that our friendly contributors are not the first people to encounter a seed-sensitivity regression in a test unrelated to the changes of their own PR. Only the scikit-learn maintainers who watch the results of the nightly builds are expected to be annoyed by this.
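The accepted value formats can be summarized with a small parser. This is a sketch of the rules listed above and not scikit-learn's own test configuration code. The function name parse_global_random_seed is hypothetical, and rejecting values outside [0, 99] with a ValueError is an assumption of the sketch.

```python
def parse_global_random_seed(value):
    """Map a SKLEARN_TESTS_GLOBAL_RANDOM_SEED value to a list of seeds.

    Hypothetical helper, for illustration only. `value` is the raw string
    from the environment, or None when the variable is not set.
    """
    if value is None:
        return [42]  # unset: fixed default seed
    if value == "all":
        return list(range(100))  # every admissible seed, 0 to 99 included
    if "-" in value:
        start, stop = (int(part) for part in value.split("-"))
        seeds = list(range(start, stop + 1))  # both bounds included
    else:
        seeds = [int(value)]
    if not seeds or any(seed < 0 or seed > 99 for seed in seeds):
        raise ValueError(f"seeds must lie in [0, 99], got {value!r}")
    return seeds


print(parse_global_random_seed("40-42"))
```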
When writing a new test function that uses this fixture, please use the following command to make sure that it passes deterministically for all admissible seeds on your local machine:
SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -v -k test_your_test_name
9.3.2.2.5. SKLEARN_SKIP_NETWORK_TESTS
When this environment variable is set to a non-zero value, the tests that need network access are skipped. Network tests are also skipped when this environment variable is not set; set it to 0 to run them.
9.3.2.2.6. SKLEARN_RUN_FLOAT32_TESTS
When this environment variable is set to ‘1’, the tests using the global_dtype fixture are also run on float32 data. When this environment variable is not set, the tests are only run on float64 data.
9.3.2.2.7. SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES
When this environment variable is set to a non-zero value, the Cython directive boundscheck is set to True. This is useful for finding segfaults.
9.3.2.2.8. SKLEARN_BUILD_ENABLE_DEBUG_SYMBOLS
When this environment variable is set to a non-zero value, the debug symbols will be included in the compiled C extensions. Only debug symbols for POSIX systems are configured.
9.3.2.2.9. SKLEARN_PAIRWISE_DIST_CHUNK_SIZE
This sets the size of chunk to be used by the underlying PairwiseDistancesReductions implementations. The default value is 256, which has been shown to be adequate on most machines.
Users looking for the best performance might want to tune this variable using powers of 2 so as to get the best parallelism behavior for their hardware, especially with respect to their caches’ sizes.
9.3.2.2.10. SKLEARN_WARNINGS_AS_ERRORS
This environment variable is used to turn warnings into errors in tests and documentation build.
Some CI (Continuous Integration) builds set SKLEARN_WARNINGS_AS_ERRORS=1, for example to make sure that we catch deprecation warnings from our dependencies and that we adapt our code.
To locally run with the same “warnings as errors” setting as in these CI builds you can set SKLEARN_WARNINGS_AS_ERRORS=1.
By default, warnings are not turned into errors. This is the case if SKLEARN_WARNINGS_AS_ERRORS is unset, or SKLEARN_WARNINGS_AS_ERRORS=0.
This environment variable uses specific warning filters to ignore some warnings, since sometimes warnings originate from third-party libraries and there is not much we can do about it. You can see the warning filters in the _get_warnings_filters_info_list function in sklearn/utils/_testing.py.
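The general mechanism of turning warnings into errors while exempting a few known ones can be sketched with the standard library warnings module. This is not the filter list scikit-learn uses: the helper run_strict, the two demo functions and the ignored message pattern are all hypothetical and only show how an "error" filter and an "ignore" filter combine.

```python
import warnings


def run_strict(func):
    """Call func with warnings turned into errors, except one ignored pattern.

    Hypothetical helper; the ignored message is a made-up example.
    """
    with warnings.catch_warnings():
        # Every warning becomes an exception of its own category...
        warnings.simplefilter("error")
        # ...except this one. Filters added later are checked first.
        warnings.filterwarnings(
            "ignore", message="third-party noise", category=UserWarning
        )
        return func()


def noisy():
    warnings.warn("third-party noise from a dependency", UserWarning)
    return "ok"


def deprecated():
    warnings.warn("old API", DeprecationWarning)
    return "ok"


print(run_strict(noisy))  # the ignored warning does not stop execution

try:
    run_strict(deprecated)
except DeprecationWarning as exc:
    print("raised:", exc)
```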
Note that for documentation build, SKLEARN_WARNINGS_AS_ERRORS=1 checks that the documentation build, in particular running examples, does not produce any warnings. This is different from the -W sphinx-build argument that catches syntax warnings in the rst files.