Optimization improvements
Version 1.8.12.99 of backtrader includes an improvement in how data feeds and results are managed during multiprocessing.
Note
The behavior for both improvements has been made optional
The behavior of these options can be controlled through two new Cerebro parameters:
optdatas (default: True)
If True and optimizing (and the system can preload and use runonce), data preloading will be done only once in the main process to save time and resources.
optreturn (default: True)
If True, the optimization results will not be full Strategy objects (with all the datas, indicators, observers …) but objects with the following attributes (the same as in a Strategy): params (or p) and analyzers.
In most occasions, only the analyzers and the params with which a strategy was run are needed to evaluate its performance. If a detailed analysis of the generated values (for example those of indicators) is needed, turn this off.
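A minimal sketch of how the two parameters can be set (the strategy, its parameters and the values shown here are illustrative placeholders, not part of the sample):

import backtrader as bt

# Illustrative strategy used only to have something to optimize
class MyStrategy(bt.Strategy):
    params = (('smaperiod', 15),)

    def __init__(self):
        self.sma = bt.indicators.SMA(self.data, period=self.p.smaperiod)

# optdatas and optreturn default to True; pass False to disable either behavior
cerebro = bt.Cerebro(optdatas=True, optreturn=True)
cerebro.optstrategy(MyStrategy, smaperiod=range(10, 31))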
Data Feed Management
In an optimization scenario this is a likely combination of Cerebro parameters:
preload=True (default)
Data Feeds will be preloaded before running any backtesting code
runonce=True (default)
Indicators will be calculated in batch mode in a tight for loop, instead of step by step.
If both conditions are True and optdatas=True, then:
- The Data Feeds will be preloaded in the main process before spawning the new subprocesses (the ones in charge of executing the backtesting), as sketched below
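A sketch of that combination (the data file name below is an assumption); preload and runonce are already the defaults and are only spelled out for clarity:

import backtrader as bt

# preload=True and runonce=True are the Cerebro defaults. With optdatas=True
# the feed added below is preloaded a single time in the main process and the
# preloaded data is reused by the optimization subprocesses.
cerebro = bt.Cerebro(preload=True, runonce=True, optdatas=True)
cerebro.adddata(bt.feeds.BacktraderCSVData(dataname='mydata.txt'))  # placeholder file name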
Results management
In an optimization scenario, two things should play the most important role when evaluating the different parameters with which each Strategy was run:
strategy.params (or strategy.p)
The actual set of values used for the backtesting
strategy.analyzers
The objects in charge of providing the evaluation of how the Strategy has actually performed. Example:
SharpeRatio_A (the annualized SharpeRatio)
When optreturn=True, instead of returning full strategy instances, placeholder objects will be created which carry the two aforementioned attributes to let the evaluation take place.
This avoids passing back lots of generated data, such as the values generated by indicators during the backtesting.
Should the full strategy objects be wished, simply set optreturn=False during cerebro instantiation or when calling cerebro.run.
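A possible way of evaluating the returned objects, which works both for the placeholders and for full strategy instances (the strategy, the data file name and the analyzer name 'sharpe' are assumptions made for the sake of the example):

import backtrader as bt

class MyStrategy(bt.Strategy):  # minimal illustrative strategy
    params = (('smaperiod', 15),)

    def __init__(self):
        self.sma = bt.indicators.SMA(self.data, period=self.p.smaperiod)

cerebro = bt.Cerebro()  # optdatas=True and optreturn=True are the defaults
cerebro.adddata(bt.feeds.BacktraderCSVData(dataname='mydata.txt'))  # placeholder file name
cerebro.optstrategy(MyStrategy, smaperiod=range(10, 31))
cerebro.addanalyzer(bt.analyzers.SharpeRatio_A, _name='sharpe')

# On Windows, wrap the run in an "if __name__ == '__main__':" guard because
# multiprocessing spawns new processes
results = cerebro.run()  # a list of lists: one inner list per parameter combination
for run in results:
    for strat in run:
        # With optreturn=True only params (p) and analyzers are carried,
        # which is all that is needed here
        analysis = strat.analyzers.sharpe.get_analysis()
        print(strat.params.smaperiod, analysis.get('sharperatio'))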
Some test runs
The optimization sample in the backtrader sources has been extended to add control for optdatas and optreturn (actually to disable them)
Single Core Run
As a reference, what happens when the number of CPUs is limited to 1 and the multiprocessing module is not used:
$ ./optimization.py --maxcpus 1
==================================================
**************************************************
--------------------------------------------------
OrderedDict([(u'smaperiod', 10), (u'macdperiod1', 12), (u'macdperiod2', 26), (u'macdperiod3', 9)])
**************************************************
--------------------------------------------------
OrderedDict([(u'smaperiod', 10), (u'macdperiod1', 13), (u'macdperiod2', 26), (u'macdperiod3', 9)])
...
...
OrderedDict([(u'smaperiod', 29), (u'macdperiod1', 19), (u'macdperiod2', 29), (u'macdperiod3', 14)])
==================================================
Time used: 184.922727833
Multiple Core Runs
Without limiting the number of CPUs, the Python multiprocessing module will try to use all of them. optdatas and optreturn will then be selectively deactivated to gauge their individual effect.
Both optdatas and optreturn active
The default behavior:
$ ./optimization.py
...
...
...
==================================================
Time used: 56.5889185394
The total improvement from having multiple cores plus the data feeds and results management improvements means going down from 184.92 to 56.58 seconds.
Take into account that the sample is using 252 bars and the indicators generate only values with a length of 252 points. This is just an example.
The real question is how much of this is attributable to the new behavior.
optreturn deactivated
Let's pass full strategy objects back to the caller:
$ ./optimization.py --no-optreturn
...
...
...
==================================================
Time used: 67.056914007
The execution time has increased by 18.50% (put another way, keeping optreturn active provides a speed-up of 15.62%).
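For reference, the increase and speed-up percentages quoted here and in the following runs can be derived from the measured times as sketched below (small differences in the last decimal are due to rounding):

# Reproducing the figures for the optreturn case from the times reported above
t_base = 56.5889185394   # optdatas and optreturn both active
t_noret = 67.056914007   # optreturn deactivated

increase = t_noret / t_base - 1.0   # execution time increase -> ~18.5%
speedup = 1.0 - t_base / t_noret    # speed-up of keeping optreturn active -> ~15.6%
print('increase: {:.2%}, speed-up: {:.2%}'.format(increase, speedup))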
optdatas deactivated
Each subprocess is forced to load its own set of values for the data feeds:
$ ./optimization.py --no-optdatas
...
...
...
==================================================
Time used: 72.7238112637
The execution time has increased by 28.52% (put another way, keeping optdatas active provides a speed-up of 22.19%).
Both deactivated
Still using multicore but with the old non-improved behavior:
$ ./optimization.py --no-optdatas --no-optreturn
...
...
...
==================================================
Time used: 83.6246643786
The execution time has increased by 47.79% (put another way, keeping both optimizations active provides a speed-up of 32.34%).
This shows that the use of multiple cores is the major contributor to the time improvement.
Note
The executions have been done on a laptop with an i7-4710HQ (4 cores / 8 logical) and 16 GBytes of RAM, under Windows 10 64-bit. Mileage may vary under other conditions.
Concluding
The greatest factor in time reduction during optimization is the use of multiple cores
The sample runs with optdatas and optreturn show additional speed-ups of around 22.19% and 15.62% respectively (32.34% with both active together in the test)
Sample Usage
$ ./optimization.py --help
usage: optimization.py [-h] [--data DATA] [--fromdate FROMDATE]
                       [--todate TODATE] [--maxcpus MAXCPUS] [--no-runonce]
                       [--exactbars EXACTBARS] [--no-optdatas]
                       [--no-optreturn] [--ma_low MA_LOW] [--ma_high MA_HIGH]
                       [--m1_low M1_LOW] [--m1_high M1_HIGH]
                       [--m2_low M2_LOW] [--m2_high M2_HIGH]
                       [--m3_low M3_LOW] [--m3_high M3_HIGH]

Optimization

optional arguments:
  -h, --help            show this help message and exit
  --data DATA, -d DATA  data to add to the system
  --fromdate FROMDATE, -f FROMDATE
                        Starting date in YYYY-MM-DD format
  --todate TODATE, -t TODATE
                        Starting date in YYYY-MM-DD format
  --maxcpus MAXCPUS, -m MAXCPUS
                        Number of CPUs to use in the optimization
                          - 0 (default): use all available CPUs
                          - 1 -> n: use as many as specified
  --no-runonce          Run in next mode
  --exactbars EXACTBARS
                        Use the specified exactbars still compatible with preload
                          0 No memory savings
                          -1 Moderate memory savings
                          -2 Less moderate memory savings
  --no-optdatas         Do not optimize data preloading in optimization
  --no-optreturn        Do not optimize the returned values to save time
  --ma_low MA_LOW       SMA range low to optimize
  --ma_high MA_HIGH     SMA range high to optimize
  --m1_low M1_LOW       MACD Fast MA range low to optimize
  --m1_high M1_HIGH     MACD Fast MA range high to optimize
  --m2_low M2_LOW       MACD Slow MA range low to optimize
  --m2_high M2_HIGH     MACD Slow MA range high to optimize
  --m3_low M3_LOW       MACD Signal range low to optimize
  --m3_high M3_HIGH     MACD Signal range high to optimize