Python Multi-Process Execution Pool: concurrent asynchronous execution pool with custom resource constraints (memory, timeouts, affinity, CPU cores and caching), load balancing and profiling capabilities of the external apps on NUMA architecture
eXascaleInfolab/PyExPool
A Lightweight Multi-Process Execution Pool with load balancing and customizable resource consumption constraints.
\author: (c) Artem Lutov artem@exascale.info
\license: Apache License, Version 2.0
\organizations: eXascale Infolab, Lumais, ScienceWise
\date: 2015-07 v1, 2017-06 v2, 2018-05 v3
\grants: Swiss National Science Foundation grant number CRSII2_147609, European Commission grant Graphint 683253
BibTeX:

    @misc{pyexpool,
        author = {Artem Lutov and Philippe Cudré-Mauroux},
        url = {https://github.com/eXascaleInfolab/PyExPool},
        title = {PyExPool-v.3: A Lightweight Execution Pool with Constraint-aware Load-Balancer.},
        year = {2018}
    }

A Lightweight Multi-Process Execution Pool with load balancing to schedule Jobs execution with per-job timeout, optionally grouping them into Tasks and specifying optional execution parameters considering NUMA architecture peculiarities:
- automatic rescheduling and load balancing (reduction) of the worker processes on low memory condition for the in-RAM computations (requires psutil, can be disabled)
- chained termination of the related worker processes (started jobs) and rescheduling of the non-started jobs to satisfy timeout and memory limit constraints
- automatic CPU affinity management and maximization of the dedicated CPU cache vs parallelization for a worker process
- timeout per each Job (it was the main initial motivation to implement this module, because this feature is not provided by any Python implementation out of the box)
- onstart/ondone callbacks; ondone is called only on successful completion (not termination) for both Jobs and Tasks (groups of jobs)
- stdout/err output, which can be redirected to any custom file or PIPE
- custom parameters for each Job and respective owner Task besides the name/id
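
The per-job timeout listed above can be illustrated with a minimal standard-library sketch. It is not the PyExPool implementation: the helper name `run_with_timeout` and the polling interval are assumptions made for this illustration only.

```python
import subprocess
import sys
import time


def run_with_timeout(args, timeout):
    """Run args as a child process and kill it if it exceeds timeout seconds.

    Hypothetical helper (not part of mpepool).
    Returns the process return code, or None if the process was killed on timeout.
    """
    proc = subprocess.Popen(args)
    deadline = time.time() + timeout
    while proc.poll() is None:  # The child is still running
        if time.time() >= deadline:
            proc.kill()
            proc.wait()  # Reap the killed child to avoid a zombie
            return None
        time.sleep(0.05)  # Polling interval, an arbitrary choice
    return proc.returncode


# A quick job completes, a long one is terminated on timeout
quick = run_with_timeout((sys.executable, '-c', 'pass'), timeout=10)
slow = run_with_timeout((sys.executable, '-c', 'import time; time.sleep(30)'), timeout=0.5)
```

Polling keeps the sketch compatible with both Python 2.7 and 3.x, the versions this module targets.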
Automatic rescheduling of the workers on low memory condition for the in-RAM computations is optional and is the only feature that requires an external package, psutil.
All scheduled jobs share the same CPU affinity policy, which is convenient for benchmarking but less suitable for scheduling both single- and multi-threaded apps with distinct demands for the CPU cache.
All main functionality is implemented as a single-file module to be easily included into your project and customized as a part of your distribution (like in PyCaBeM to execute multiple apps in parallel on the dedicated CPU cores and avoid their swapping from the main memory); it can also be installed as a library. An optional minimalistic Web interface is provided in a separate file to inspect and profile the load balancer and execution pool.
The main purpose of the single-file module is the concurrent execution of modules and external executables with custom resource consumption constraints, cache / parallelization tuning and automatic balancing of the worker processes for the in-memory computations on a single server. PyExPool is typically used as an application framework for benchmarking or heavy-loaded multi-process execution activities on constrained computational resources.
If the concurrent execution of Python functions is required, usage of external modules is not a problem and the automatic jobs scheduling for the in-RAM computations is not necessary, then a more handy and straightforward approach is to use the Pebble library. Convenient transparent parallel computations are provided by Joblib. If a distributed task queue is required with advanced monitoring and reporting facilities then Celery might be a good choice. For comprehensive parallel computing Dask is a good choice. For the parallel execution of only shell scripts GNU parallel might be a good option.
The only other existing open-source load balancer I'm aware of, which has wider functionality than PyExPool (but cannot be integrated into your Python scripts so seamlessly), is Slurm Workload Manager.
The load balancing is enabled when the global variables `_LIMIT_WORKERS_RAM` and `_CHAINED_CONSTRAINTS` are set, and the jobs' `.category` and relative `.size` (if known) are specified. The balancing is performed to use as much RAM and CPU resources as possible performing in-RAM computations and meeting the specified timeout and memory constraints for each job and for the whole pool.
Large executing jobs can be postponed for later execution with a smaller number of worker processes after completion of the smaller jobs. The number of workers is reduced automatically (balanced) during the jobs queue processing to meet memory constraints. It is recommended to add jobs in the order of increasing memory/time complexity if possible to reduce the number of worker process terminations on jobs postponing (rescheduling).
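
The relation between jobs that the balancer relies on (the same category and a comparable relative size) can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `JobStub` and `chained_jobs` are hypothetical names, and treating "heavier" as "non-smaller size" is an assumption based on the `size` and `mem` parameter descriptions.

```python
from collections import namedtuple

# Hypothetical stand-in for a job carrying only the scheduling attributes
JobStub = namedtuple('JobStub', 'name category size')


def chained_jobs(origin, jobs):
    """Select the jobs related to origin that violated a constraint.

    Assumption: related jobs have the same category and a non-smaller size.
    An undefined category or a size of 0 (undefined) prevents chaining.
    """
    if origin.category is None or origin.size <= 0:
        return []
    return [job for job in jobs
            if job is not origin
            and job.category == origin.category
            and job.size >= origin.size]


jobs = [
    JobStub('a', 'net', 10),
    JobStub('b', 'net', 20),  # Same category and heavier than 'a'
    JobStub('c', 'net', 5),   # Same category but lighter
    JobStub('d', 'img', 50),  # Another category
    JobStub('e', 'net', 0),   # Undefined size
]
related = [job.name for job in chained_jobs(jobs[0], jobs)]
```

Under these assumptions, adding jobs in the order of increasing size means a constraint violation affects only the jobs that are still ahead in the queue.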
Demo of the scheduling with memory constraints for the worker processes:
Demo of the scheduling with cache L1 maximization for single-threaded processes on the server with cross-node CPUs enumeration. A whole physical CPU core consisting of two hardware threads is assigned to each worker process, so the L1 cache is dedicated (not shared), but the maximal loading over all CPUs is 50%:
Demo of the WebUI for the Jobs and Tasks tracing and profiling:
Exactly the same fully functional interface is accessible from the console using w3m or other terminal browsers:
To explore the WebUI demo execute the following testcase
$ MANUAL=1 python -m unittest mpetests.TestWebUI.test_failures
and open http://localhost:8081 (or :8080) in the browser.
Include the following modules:
- mpepool - execution pool with load balancer, the only mandatory module,
- mpewui - optional WebUI for the interactive profiling of the scheduled Jobs and Tasks.
These modules can be installed either manually from GitHub or from the PyPI repository:
$ pip install pyexpool
WebUI (`mpewui` module) renders the interface from the bottle HTML templates located in `.`, `./views/` or any other folder from the `bottle.TEMPLATE_PATH` list, where custom views can be placed to overwrite the default pages.
Additionally, hwloc / lstopo should be installed if customized CPU affinity masking and cache control are required, see the Requirements section.
Multi-Process Execution Pool can be run without any external modules with automatically disabled load balancing.
The external modules / apps are required only for the extended functionality:
- Platform-specific requirements:
  - hwloc (includes `lstopo`) is required to identify the enumeration type of logical CPUs to perform correct CPU affinity masking. Required only for the automatic affinity masking with cache usage optimization and only if the CPU enumeration type is not specified manually.
    $ sudo apt-get install -y hwloc
- Cross-platform Python requirements:
  - psutil is required for the dynamic jobs balancing to perform the in-RAM computations (`_LIMIT_WORKERS_RAM = True`) and limit memory consumption of the workers.
    $ sudo pip install psutil
    To perform in-memory computations dedicating almost all available RAM (specifying memlimit ~= physical memory), it is recommended to set swappiness to 1 .. 10:
    $ sudo sysctl -w vm.swappiness=5
    or set it permanently in `/etc/sysctl.conf`: `vm.swappiness = 5`.
  - bottle is required for the minimalistic optional WebUI to monitor executing jobs.
    $ sudo pip install bottle
    WebUI (`mpewui` module) renders the interface from the bottle HTML templates located in `.`, `./views/` or any other folder from the `bottle.TEMPLATE_PATH` list, where custom views can be placed to overwrite the default pages.
  - mock is required exclusively for the unit testing under Python 2; `mock` is included in the standard library of Python 3.
    $ sudo pip install mock
All Python requirements are optional and installed automatically from the `pip` distribution (`$ pip install pyexpool`) or can be installed manually from the `pyreqsopt.txt` file:
$ sudo pip install -r pyreqsopt.txt
The `lstopo` app of the `hwloc` package is a system requirement and should be installed manually from the system-specific package repository or built from the sources.
Flexible API provides automatic CPU affinity management, maximization of the dedicated CPU cache, limitation of the minimal dedicated RAM per worker process, balancing of the worker processes and rescheduling of chains of the related jobs on low memory condition for the in-RAM computations, optional automatic restart of jobs on timeout, access to job's process, parent task, start and stop execution time and more...
`ExecPool` represents a pool of worker processes to execute `Job`s that can be grouped into the hierarchy of `Task`s for more flexible management.

    # Global Parameters
    # Limit the amount of memory (<= RAM) used by worker processes
    # NOTE: requires import of psutil
    _LIMIT_WORKERS_RAM = True

    # Use chained constraints (timeout and memory limitation) in jobs to terminate
    # also related worker processes and/or reschedule jobs, which have the same
    # category and heavier than the origin violating the constraints
    _CHAINED_CONSTRAINTS = True


    Job(name, workdir=None, args=(), timeout=0, rsrtonto=False, task=None  #,*
        , startdelay=0., onstart=None, ondone=None, onfinish=None, params=None, category=None, size=0
        , slowdown=1., omitafn=False, memkind=1, memlim=0., stdout=sys.stdout, stderr=sys.stderr
        , poutlog=None, perrlog=None):
        """Initialize job to be executed

        Job is executed in a separate process via Popen or Process object and is
        managed by the Process Pool Executor

        Main parameters:
        name: str  - job name
        workdir  - working directory for the corresponding process, None means the dir of the benchmarking
        args  - execution arguments including the executable itself for the process
            NOTE: can be None to make a stub process and execute the callbacks
        timeout  - execution timeout in seconds. Default: 0, means infinity
        rsrtonto  - restart the job on timeout, Default: False. Can be used for
            non-deterministic Jobs like generation of the synthetic networks to regenerate
            the network on border cases overcoming getting stuck on specific values of the rand variables.
        task: Task  - origin task if this job is a part of the task
        startdelay  - delay after the job process starting to execute it for some time,
            executed in the CONTEXT OF THE CALLER (main process).
            ATTENTION: should be small (0.1 .. 1 sec)
        onstart  - a callback, which is executed on the job starting (before the execution
            started) in the CONTEXT OF THE CALLER (main process) with the single argument,
            the job. Default: None.
            If onstart() raises an exception then the job is completed before being started (.proc = None)
            returning the error code (can be 0) and tracing the cause to the stderr.
            ATTENTION: must be lightweight
            NOTE:
                - It can be executed several times if the job is restarted on timeout
                - Most of the runtime job attributes are not defined yet
        ondone  - a callback, which is executed on successful completion of the job in the
            CONTEXT OF THE CALLER (main process) with the single argument, the job. Default: None
            ATTENTION: must be lightweight
        onfinish  - a callback, which is executed on either completion or termination of the job in the
            CONTEXT OF THE CALLER (main process) with the single argument, the job. Default: None
            ATTENTION: must be lightweight
        params  - additional parameters to be used in callbacks
        stdout  - None or file name or PIPE for the buffered output to be APPENDED.
            The path is interpreted in the CONTEXT of the CALLER
        stderr  - None or file name or PIPE or STDOUT for the unbuffered error output to be APPENDED
            ATTENTION: PIPE is a buffer in RAM, so do not use it if the output data is huge or unlimited.
            The path is interpreted in the CONTEXT of the CALLER
        poutlog: str  - file name to log non-empty piped stdout pre-pended with the timestamp.
            Relevant only if stdout is PIPE.
        perrlog: str  - file name to log non-empty piped stderr pre-pended with the timestamp.
            Relevant only if stderr is PIPE.

        Scheduling parameters:
        omitafn  - omit affinity policy of the scheduler, which is relevant when the affinity is enabled
            and the process has multiple threads
        category  - classification category, typically semantic context or part of the name,
            used to identify related jobs;
            requires _CHAINED_CONSTRAINTS
        size  - expected relative memory complexity of the jobs of the same category,
            typically it is size of the processing data, >= 0, 0 means undefined size
            and prevents jobs chaining on constraints violation;
            used on _LIMIT_WORKERS_RAM or _CHAINED_CONSTRAINTS
        slowdown  - execution slowdown ratio, >= 0, where (0, 1) - speedup, > 1 - slowdown; 1 by default;
            used for the accurate timeout estimation of the jobs having the same .category and .size.
            requires _CHAINED_CONSTRAINTS
        memkind  - kind of memory to be evaluated (average of virtual and resident memory
            to not overestimate the instant potential consumption of RAM):
            0  - mem for the process itself omitting the spawned sub-processes (if any)
            1  - mem for the heaviest process of the process tree spawned by the original process
                (including the origin itself)
            2  - mem for the whole spawned process tree including the origin process
        memlim: float  - max amount of memory in GB allowed for the job execution, 0 - unlimited

        Execution parameters, initialized automatically on execution:
        tstart  - start time, filled automatically on the execution start (before onstart). Default: None
        tstop  - termination / completion time after ondone
            NOTE: onstart() and ondone() callbacks execution is included in the job execution time
        proc  - process of the job, can be used in the ondone() to read its PIPE
        pipedout  - contains output from the PIPE supplied to stdout if any, None otherwise
            NOTE: pipedout is used to avoid a deadlock waiting on the process completion having a piped stdout
            https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait
        pipederr  - contains output from the PIPE supplied to stderr if any, None otherwise
            NOTE: pipederr is used to avoid a deadlock waiting on the process completion having a piped stderr
            https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait
        mem  - consuming memory (smooth max of average of VMS and RSS, not just the current value)
            or the least expected value inherited from the jobs of the same category having non-smaller size;
            requires _LIMIT_WORKERS_RAM
        terminates  - accumulated number of the received termination requests caused by the constraints violation
            NOTE: > 0 (1 .. ExecPool._KILLDELAY) for the apps terminated by the execution pool
                (resource constraints violation or ExecPool exception),
                == 0 for the crashed apps
        wkslim  - worker processes limit (max number) on the job postponing if any,
            the job is postponed until at most this number of worker processes operate;
            requires _LIMIT_WORKERS_RAM
        chtermtime  - chained termination: None - disabled, False - by memory, True - by time;
            requires _CHAINED_CONSTRAINTS
        """

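
The three `memkind` modes described above can be illustrated on a toy process tree. This is a minimal sketch and not the module's code: the function `evaluate_memory`, the `procs` table and its memory values are hypothetical, whereas the real module measures live processes (via psutil).

```python
def evaluate_memory(procs, root, memkind=1):
    """Evaluate memory of a process tree according to memkind.

    procs: dict - hypothetical table {pid: (mem_gb, [child pids])}
    root - pid of the origin process
    memkind: 0 - the origin process only, 1 - the heaviest process of the tree,
        2 - the whole process tree
    """
    if memkind not in (0, 1, 2):
        raise ValueError('Unexpected memkind: {}'.format(memkind))
    if memkind == 0:
        return procs[root][0]
    # Traverse the whole tree spawned by the origin (including the origin)
    mems = []
    stack = [root]
    while stack:
        mem, children = procs[stack.pop()]
        mems.append(mem)
        stack.extend(children)
    return max(mems) if memkind == 1 else sum(mems)


# The origin process 1 spawns 2 and 3; the process 2 spawns 4
procs = {1: (0.5, [2, 3]), 2: (2.0, [4]), 3: (1.0, []), 4: (0.25, [])}
own = evaluate_memory(procs, 1, memkind=0)       # 0.5
heaviest = evaluate_memory(procs, 1, memkind=1)  # 2.0
total = evaluate_memory(procs, 1, memkind=2)     # 3.75
```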

    Task(name, timeout=0, onstart=None, ondone=None, onfinish=None, params=None
        , task=None, latency=1.5, stdout=sys.stdout, stderr=sys.stderr):
        """Initialize task, which is a group of subtasks including jobs to be executed

        Task is a managing container for subtasks and Jobs.
        Note: the task is considered to be failed if at least one subtask / job is failed
        (terminated or completed with non-zero return code).

        name: str  - task name
        timeout  - execution timeout in seconds. Default: 0, means infinity. ATTENTION: not implemented
        onstart  - a callback, which is executed on the task start (before the subtasks/jobs execution
            started) in the CONTEXT OF THE CALLER (main process) with the single argument,
            the task. Default: None
            ATTENTION: must be lightweight
        ondone  - a callback, which is executed on the SUCCESSFUL completion of the task in the
            CONTEXT OF THE CALLER (main process) with the single argument, the task. Default: None
            ATTENTION: must be lightweight
        onfinish  - a callback, which is executed on either completion or termination of the task in the
            CONTEXT OF THE CALLER (main process) with the single argument, the task. Default: None
            ATTENTION: must be lightweight
        params  - additional parameters to be used in callbacks
        task: Task  - optional owner super-task
        latency: float  - lock timeout in seconds: None means infinite,
            <= 0 means non-blocking, > 0 is the actual timeout
        stdout  - None or file name or PIPE for the buffered output to be APPENDED
        stderr  - None or file name or PIPE or STDOUT for the unbuffered error output to be APPENDED
            ATTENTION: PIPE is a buffer in RAM, so do not use it if the output data is huge or unlimited

        Automatically initialized and updated properties:
        tstart  - start time is filled automatically on the execution start (before onstart). Default: None
        tstop  - termination / completion time after ondone.
        numadded: uint  - the number of direct added subtasks
        numdone: uint  - the number of completed DIRECT subtasks
            (each subtask may contain multiple jobs or sub-sub-tasks)
        numterm: uint  - the number of terminated direct subtasks (including jobs) that are not restarting
        numdone + numterm <= numadded
        """


    AffinityMask(afnstep, first=True, sequential=cpusequential())
        """Affinity mask

        Affinity table is a reduced CPU table by the non-primary HW threads in each core.
        Typically, CPUs are enumerated across the nodes:
        NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
        NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
        In case the number of HW threads per core is 2 then the physical CPU cores are 0 .. 15:
        NUMA node0 CPU(s):     0,2,4,6,8,10,12,14   (16,18,20,22,24,26,28,30  - 2nd HW threads)
        NUMA node1 CPU(s):     1,3,5,7,9,11,13,15   (17,19,21,23,25,27,29,31  - 2nd HW threads)
        But the enumeration can be also sequential:
        NUMA node0 CPU(s):     0,(1),2,(3),...
        ...

        Hardware threads share all levels of the CPU cache, physical CPU cores share only the
        last level of the CPU cache (L2/3).
        The number of worker processes in the pool should be equal to the:
        - physical CPU cores for the cache L1/2 maximization
        - NUMA nodes for the cache L2/3 maximization

        NOTE: `hwloc` utility can be used to detect the type of logical CPUs enumeration:
        `$ sudo apt-get install hwloc`
        See details: http://www.admin-magazine.com/HPC/Articles/hwloc-Which-Processor-Is-Running-Your-Service

        afnstep: int  - affinity step, integer if applied, allowed values:
            1, CORE_THREADS * n,  n in {1, 2, ... CPUS / (NODES * CORE_THREADS)}

            Used to bind worker processes to the logical CPUs to have warm cache and,
            optionally, maximize cache size per a worker process.
            Groups of logical CPUs are selected in a way to maximize the cache locality:
            the single physical CPU is used taking all its hardware threads in each core
            before allocating another core.

            Typical Values:
            1  - maximize parallelization for the single-threaded apps
                (the number of worker processes = logical CPUs)
            CORE_THREADS  - maximize the dedicated CPU cache L1/2
                (the number of worker processes = physical CPU cores)
            CPUS / NODES  - maximize the dedicated CPU cache L3
                (the number of worker processes = physical CPUs)
        first  - mask the first logical unit or all units in the selected group.
            One unit per the group maximizes the dedicated CPU cache for the
            single-threaded worker, all units should be used for the multi-threaded
            apps.
        sequential  - sequential or cross nodes enumeration of the CPUs in the NUMA nodes:
            None  - undefined, interpreted as cross-nodes (the most widely used on servers)
            False  - cross-nodes
            True  - sequential

            For two hardware threads per a physical CPU core, where secondary HW threads
            are taken in brackets:
                Crossnodes enumeration, often used for the server CPUs
                NUMA node0 CPU(s):     0,2(,4,6)
                NUMA node1 CPU(s):     1,3(,5,7)
                Sequential enumeration, often used for the laptop CPUs
                NUMA node0 CPU(s):     0(,1),2(,3)
                NUMA node1 CPU(s):     4(,5),6(,7)
        """

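
The two enumeration types described in the `AffinityMask` docstring can be sketched by listing the primary hardware thread of each physical core, that is, the logical CPUs that remain in the affinity table. This is a simplified illustration assuming a uniform topology; the function `primary_threads` is hypothetical and is not part of the module.

```python
def primary_threads(cpus, core_threads, sequential):
    """List logical CPU indices of the primary HW thread of each physical core.

    Hypothetical helper assuming a uniform topology.
    cpus: int - total number of logical CPUs
    core_threads: int - hardware threads per physical core
    sequential: bool - sequential (True) or cross-nodes (False) enumeration
    """
    if cpus <= 0 or core_threads <= 0 or cpus % core_threads:
        raise ValueError('cpus should be a positive multiple of core_threads')
    cores = cpus // core_threads
    if sequential:
        # HW threads of the same core are adjacent: 0(,1),2(,3),...
        return [icore * core_threads for icore in range(cores)]
    # Cross-nodes: all primary threads are enumerated first, then the secondary ones
    return list(range(cores))


# 8 logical CPUs with 2 HW threads per core, as in the docstring example
crossnodes = primary_threads(8, 2, sequential=False)  # [0, 1, 2, 3]
sequential = primary_threads(8, 2, sequential=True)   # [0, 2, 4, 6]
```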

    ExecPool(wksnum=max(cpu_count()-1, 1), afnmask=None, memlimit=0., latency=0., name=None, webuiapp=None)
        """Multi-process execution pool of jobs

        A worker in the pool executes only a single job, a new worker is created for
        each subsequent job.

        wksnum: int  - number of resident worker processes, >=1. The reasonable
            value <= logical CPUs (returned by cpu_count()) = NUMA nodes * node CPUs,
            where node CPUs = CPU cores * HW threads per core.
            The recommended value is max(cpu_count() - 1, 1) to leave one logical
            CPU for the benchmarking framework and OS applications.

            To guarantee minimal average RAM per a process, for example 2.5 GB
            without _LIMIT_WORKERS_RAM flag (not using psutil for the dynamic
            control of memory consumption):
                wksnum = min(cpu_count(), max(ramfracs(2.5), 1))
        afnmask  - affinity mask for the worker processes, AffinityMask
            None if not applied
        memlimit  - limit total amount of Memory (automatically reduced to
            the amount of physical RAM if the larger value is specified) in gigabytes
            that can be used by worker processes to provide in-RAM computations, >= 0.
            Dynamically reduces the number of workers to consume not more memory
            than specified. The workers are rescheduled starting from the
            most memory-heavy processes.
            NOTE:
                - applicable only if _LIMIT_WORKERS_RAM
                - 0 means unlimited (some jobs might be [partially] swapped)
                - value > 0 is automatically limited with total physical RAM to process
                    jobs in RAM almost without the swapping
        latency  - approximate minimal latency of the workers monitoring in sec, float >= 0;
            0 means automatically defined value (recommended, typically 2-3 sec)
        name  - name of the execution pool to distinguish traces from subsequently
            created execution pools (only on creation or termination)
        webuiapp: WebUiApp  - WebUI app to inspect load balancer remotely

        Internal attributes:
        alive  - whether the execution pool is alive or terminating, bool.
            NOTE: should be reset to True if the execution pool is reused
            after the joining or termination.
        failures: [JobInfo]  - failed (terminated or crashed) jobs with timestamps.
            NOTE: failures contain both terminated and crashed jobs, as well as jobs completed
            with non-zero return code, excluding the jobs terminated by timeout that
            have set .rsrtonto (will be restarted)
        jobsdone: uint  - the number of successfully completed (non-terminated) jobs with zero code
        tasks: set(Task)  - tasks associated with the scheduled jobs
        """

    execute(job, concur=True):
        """Schedule the job for the execution

        job: Job  - the job to be executed, instance of Job
        concur: bool  - concurrent execution or wait until execution completed
            NOTE: concurrent tasks are started at once
        return int  - 0 on successful execution, process return code otherwise
        """

    join(timeout=0):
        """Execution cycle

        timeout: int  - execution timeout in seconds before the workers termination, >= 0.
            0 means unlimited time. The time is measured SINCE the first job
            was scheduled UNTIL the completion of all scheduled jobs.
        return bool  - True on graceful completion, False on termination by the specified
            constraints (timeout, memory limit, etc.)
        """

    clear():
        """Clear execution pool to reuse it

        Raises:
            ValueError: attempt to clear a terminating execution pool
        """

    __del__():
        """Force termination of the pool"""

    __finalize__():
        """Force termination of the pool"""

A simple Web UI is designed to profile Jobs and Tasks, interactively trace their failures and resource consumption. It is implemented in the optional module mpewui and can be spawned by instantiating the `WebUiApp` class. A dedicated `WebUiApp` instance can be created per each `ExecPool`, serving the interfaces on the dedicated addresses (host:port). However, typically, a single global instance of `WebUiApp` is created and supplied to all employed `ExecPool` instances.
Web UI module requires HTML templates installed by default from the `pip` distribution, which can be overwritten with the custom pages located in the views directory.
See WebUI queries manual for API details. An example of the WebUI usage is shown in the `mpetests.TestWebUI.test_failures` of the mpetests.
`WebUiApp` instance works in the dedicated thread of the load balancer application and is designed for the internal profiling with a relatively small number of queries, not as a public web interface for a huge number of clients.
WARNING: high loading of the WebUI may increase latency of the load balancer.

    WebUiApp(host='localhost', port=8080, name=None, daemon=None, group=None, args=(), kwargs={})
        """WebUI App starting in the dedicated thread and providing remote interface to inspect ExecPool

        ATTENTION: Once constructed, the WebUI App lives in the dedicated thread until the main program exit.

        Args:
            host: str  - Web UI host
            port: uint16  - Web UI port
            name: str  - The thread name. By default, a unique name
                is constructed of the form "Thread-N" where N is a small decimal number.
            daemon: bool  - Start the thread in the daemon mode to
                be automatically terminated on the main app exit.
            group  - Reserved for future extension
                when a ThreadGroup class is implemented.
            args: tuple  - The argument tuple for the target invocation.
            kwargs: dict  - A dictionary of keyword arguments for the target invocation.

        Internal attributes:
            cmd: UiCmd  - UI command to be executed, which includes (reserved) attribute(s) for the invocation result.
        """


    UiCmdId = IntEnum('UiCmdId', 'FAILURES LIST_JOBS LIST_TASKS API_MANUAL')
    """UI Command Identifier associated with the REST URL"""


    def ramfracs(fracsize):
        """Evaluate the minimal number of RAM fractions of the specified size in GB

        Used to estimate the reasonable number of processes with the specified minimal
        dedicated RAM.

        fracsize  - minimal size of each fraction in GB, can be a fractional number
        return the minimal number of RAM fractions having the specified size in GB
        """

    def cpucorethreads():
        """The number of hardware threads per a CPU core

        Used to specify CPU affinity dedicating the maximal amount of CPU cache L1/2.
        """

    def cpunodes():
        """The number of NUMA nodes, where CPUs are located

        Used to evaluate CPU index from the affinity table index considering the NUMA architecture.
        """

    def cpusequential():
        """Enumeration type of the logical CPUs: cross-nodes or sequential

        The enumeration can be cross-nodes starting with one hardware thread per each
        NUMA node, or sequential by enumerating all cores and hardware threads in each
        NUMA node first.
        For two hardware threads per a physical CPU core, where secondary hw threads
        are taken in brackets:
            Crossnodes enumeration, often used for the server CPUs
            NUMA node0 CPU(s):     0,2(,4,6)        => PU L#1 (P#4)
            NUMA node1 CPU(s):     1,3(,5,7)
            Sequential enumeration, often used for the laptop CPUs
            NUMA node0 CPU(s):     0(,1),2(,3)      => PU L#1 (P#1)  - indicates sequential
            NUMA node1 CPU(s):     4(,5),6(,7)
        ATTENTION: `hwloc` utility is required to detect the type of logical CPUs
        enumeration:  `$ sudo apt-get install hwloc`
        See details: http://www.admin-magazine.com/HPC/Articles/hwloc-Which-Processor-Is-Running-Your-Service

        return  - enumeration type of the logical CPUs, bool or None:
            None  - was not defined, most likely cross-nodes
            False  - cross-nodes
            True  - sequential
        """

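
The worker count formula from the `ExecPool` description, `wksnum = min(cpu_count(), max(ramfracs(2.5), 1))`, can be sketched with explicit inputs. The function `workers_for_ram` is hypothetical: unlike the real `ramfracs()`, which takes only `fracsize`, this sketch receives the total RAM and the number of logical CPUs as arguments so that the arithmetic is reproducible on any machine.

```python
def workers_for_ram(logical_cpus, total_ram_gb, min_ram_gb):
    """Evaluate the number of workers guaranteeing min_ram_gb of RAM per worker.

    Hypothetical helper mirroring wksnum = min(cpu_count(), max(ramfracs(size), 1)).
    logical_cpus: int - the number of logical CPUs (cpu_count() on a real system)
    total_ram_gb: float - total physical RAM in GB (an explicit input in this sketch)
    min_ram_gb: float - minimal dedicated RAM per worker in GB, > 0
    """
    if min_ram_gb <= 0:
        raise ValueError('min_ram_gb should be positive')
    fracs = int(total_ram_gb / min_ram_gb)  # Whole RAM fractions of the requested size
    return min(logical_cpus, max(fracs, 1))


ram_bound = workers_for_ram(8, 16, 2.5)   # 6: RAM is the bottleneck (16 / 2.5 = 6.4)
cpu_bound = workers_for_ram(4, 64, 2.5)   # 4: CPUs are the bottleneck
low_memory = workers_for_ram(8, 2, 2.5)   # 1: at least one worker is always used
```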
Target version of the Python is 2.7+ including 3.x, also works fine on PyPy.
The workflow consists of the following steps:
- Create Execution Pool.
- Create and schedule Jobs with required parameters, callbacks and optionally packing them into Tasks.
- Wait on Execution pool until all the jobs are completed or terminated, or until the global timeout is elapsed.
See unit tests (`TestExecPool`, `TestProcMemTree`, `TestTasks` classes) for the advanced examples.

    from multiprocessing import cpu_count
    from sys import executable as PYEXEC  # Full path to the current Python interpreter
    # Import all required classes and accessory routines
    from mpepool import AffinityMask, ExecPool, Job, Task, cpucorethreads

    # 1. Create Multi-process execution pool with the optimal affinity step to maximize the dedicated CPU cache size
    # NOTE: afnmask is expected to be an AffinityMask according to the ExecPool API
    execpool = ExecPool(max(cpu_count() - 1, 1), AffinityMask(cpucorethreads()))
    global_timeout = 30 * 60  # 30 min, timeout to execute all scheduled jobs or terminate them

    # 2. Schedule jobs execution in the pool

    # 2.a Job scheduling using external executable: "ls -la"
    execpool.execute(Job(name='list_dir', args=('ls', '-la')))

    # 2.b Job scheduling using python function / code fragment,
    # which is not a goal of the design, but is possible.

    # 2.b.1 Create the job with specified parameters
    jobname = 'NetShuffling'
    jobtimeout = 3 * 60  # 3 min

    # The network shuffling routine to be scheduled as a job,
    # which can also be a call of any external executable (see 2.a above)
    args = (PYEXEC, '-c',
    """import os
    import subprocess

    basenet = '{jobname}' + '{_EXTNETFILE}'
    #print('basenet:', basenet, file=sys.stderr)
    for i in range(1, {shufnum} + 1):
        netfile = ''.join(('{jobname}', '.', str(i), '{_EXTNETFILE}'))
        if {overwrite} or not os.path.exists(netfile):
            # sort -R pgp_udir.net -o pgp_udir_rand3.net
            subprocess.call(('sort', '-R', basenet, '-o', netfile))
    """.format(jobname=jobname, _EXTNETFILE='.net', shufnum=5, overwrite=False))

    # 2.b.2 Schedule the job execution, which might be postponed
    # if there are no free executor processes available
    execpool.execute(Job(name=jobname, workdir='this_sub_dir', args=args, timeout=jobtimeout
        # Note: onstart/ondone callbacks, custom parameters and others can be also specified here!
    ))

    # Add other jobs
    # ...

    # 3. Wait for the jobs execution for the specified timeout at most
    execpool.join(global_timeout)  # 30 min

In case the execution pool is required locally then it can be used in the following way:

    ...
    # Limit the memory consumption of all worker processes with min(32 GB, RAM)
    # and provide latency of 1.5 sec for the jobs rescheduling
    with ExecPool(max(cpu_count() - 1, 1), memlimit=32, latency=1.5) as xpool:
        job = Job('jmem_proc', args=(PYEXEC, '-c', TestProcMemTree.allocAndSpawnProg(
            allocDelayProg(inBytes(amem), duration), allocDelayProg(inBytes(camem), duration)))
            , timeout=timeout, memkind=0, ondone=mock.MagicMock())
        jobx = Job('jmem_max-subproc', args=(PYEXEC, '-c', TestProcMemTree.allocAndSpawnProg(
            allocDelayProg(inBytes(amem), duration), allocDelayProg(inBytes(camem), duration)))
            , timeout=timeout, memkind=1, ondone=mock.MagicMock())
        ...
        xpool.execute(job)
        xpool.execute(jobx)
        ...
        xpool.join(10)  # Timeout for the execution of all jobs is 10 sec [+latency]

The code shown above is fetched from the `TestProcMemTree` unit test.
To perform graceful termination of the Jobs in case of external termination of your program, signal handlers can be set:

    import signal  # Intercept kill signals
    import sys  # Required for sys.exit()
    from multiprocessing import cpu_count
    from mpepool import ExecPool

    # Use execpool as a global variable, which is set to None when all jobs are done,
    # and recreated on jobs scheduling
    execpool = None

    def terminationHandler(signal=None, frame=None, terminate=True):
        """Signal termination handler

        signal  - raised signal
        frame  - origin stack frame
        terminate  - whether to terminate the application
        """
        global execpool

        if execpool:
            del execpool  # Destructors are called later
            # Define execpool to avoid unnecessary trash in the error log, which might
            # be caused by the attempt of subsequent deletion on destruction
            execpool = None  # Note: otherwise execpool becomes undefined
        if terminate:
            sys.exit()  # exit(0), 0 is the default exit code.

    # Set handlers of external signals, which can be the first lines inside
    # if __name__ == '__main__':
    signal.signal(signal.SIGTERM, terminationHandler)
    signal.signal(signal.SIGHUP, terminationHandler)
    signal.signal(signal.SIGINT, terminationHandler)
    signal.signal(signal.SIGQUIT, terminationHandler)
    signal.signal(signal.SIGABRT, terminationHandler)

    # Ignore terminated children procs to avoid zombies
    # ATTENTION: signal.SIG_IGN affects the return code of the former zombie resetting it to 0,
    # where signal.SIG_DFL works fine and without any side effects.
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)

    # Define execpool to schedule some jobs
    execpool = ExecPool(max(cpu_count() - 1, 1))

    # Failsafe usage of execpool ...

Also it is recommended to register the termination handler for the normal interpreter termination using atexit:

    import atexit
    ...
    # Set termination handler for the internal termination
    atexit.register(terminationHandler, terminate=False)

Note: Please, star this project if you use it.
- ExecTime - failover lightweight resource consumption profiler (timings and memory), applicable to multiple processes with optional per-process results labeling and synchronized output to the specified file or `stderr`
- PyCABeM - Python Benchmarking Framework for the Clustering Algorithms Evaluation. Uses extrinsic (NMIs) and intrinsic (Q) measures for the clusters quality evaluation considering overlaps (nodes membership by multiple clusters).