After assessing the correctness and timing of the program's execution, we are ready to identify the parts of the code that need to be tuned for performance. Those parts are typically quite small compared to the size of the program.
Two profiling modules are available through the Python standard library, as outlined here:
- The profile module: This module is written in pure Python and adds significant overhead to the program execution. Its presence in the standard library is due to its vast platform support and the ease with which it can be extended.
- The cProfile module: This is the main profiling module, with an interface equivalent to profile. It is written in C, has a small overhead, and is suitable as a general-purpose profiler.
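Since the two modules share the same interface, they are interchangeable from the command line; for comparison only, the pure-Python profiler can be invoked as shown below (myscript.py is a placeholder for any script), while the rest of this section uses cProfile:

$ python -m profile myscript.py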
The cProfile module can be used in three different ways, as follows:
- From the command line
- As a Python module
- With IPython
cProfile does not require any change in the source code and can be executed directly on an existing Python script or function. You can use cProfile from the command line in this way:
$ python -m cProfile simul.py
This will print a long output containing several profiling metrics for all of the functions called in the application. You can use the -s option to sort the output by a specific metric. In the following snippet, the output is sorted by the tottime metric, which will be described shortly:
$ python -m cProfile -s tottime simul.py
The data produced by cProfile can be saved in an output file by passing the -o option. The format that cProfile uses is readable by the pstats module and other tools. The usage of the -o option is shown here:
$ python -m cProfile -o prof.out simul.py
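As a minimal sketch of reading that file back (assuming the prof.out file produced by the command above), the pstats module can load, sort, and print the saved statistics:

import pstats

# Load the statistics written by "python -m cProfile -o prof.out simul.py"
stats = pstats.Stats("prof.out")

# Sort by total time and print the ten most expensive entries
stats.sort_stats("tottime").print_stats(10)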
The usage of cProfile as a Python module requires invoking the cProfile.run function in the following way:
from simul import benchmark
import cProfile

cProfile.run("benchmark()")
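cProfile.run also accepts optional sort and filename arguments, so the same call can sort the printed report or save the statistics to a file; a small sketch of both variations follows:

from simul import benchmark
import cProfile

# Sort the printed report by total time
cProfile.run("benchmark()", sort="tottime")

# Save the statistics to a file (readable with pstats) instead of printing them
cProfile.run("benchmark()", filename="prof.out")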
You can also wrap a section of code between method calls of a cProfile.Profile object, as shown here:
from simul import benchmark
import cProfile

pr = cProfile.Profile()
pr.enable()
benchmark()
pr.disable()
pr.print_stats()
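On Python 3.8 and later, cProfile.Profile can also be used as a context manager, which makes the enable and disable calls implicit; this is a small variation on the snippet above rather than something the example requires:

from simul import benchmark
import cProfile

with cProfile.Profile() as pr:
    benchmark()
pr.print_stats()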
cProfile can also be used interactively with IPython. The %prun magic command lets you profile an individual function call, as illustrated in the following screenshot:
Figure 1.5 – Using cProfile within IPython
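In text form, such a session boils down to importing the function and passing the call to %prun; a rough sketch follows (the actual output is the one visible in the screenshot):

In [1]: from simul import benchmark
In [2]: %prun benchmark()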
The cProfile output is divided into five columns, as follows:
- ncalls: The number of times the function was called.
- tottime: The total time spent in the function, not counting calls to other functions.
- cumtime: The time spent in the function, including other function calls.
- percall: The time spent for a single call of the function; it can be obtained by dividing the total or cumulative time by the number of calls.
- filename:lineno: The filename and corresponding line number. This information is not available when calling C extension modules.
The most important metric is tottime, the actual time spent in the function body excluding subcalls, which tells us exactly where the bottleneck is.
Unsurprisingly, the largest portion of time is spent in the evolve function. We can imagine that the loop is the section of the code that needs performance tuning. cProfile only provides information at the function level and does not tell us which specific statements are responsible for the bottleneck. Fortunately, as we will see in the next section, the line_profiler tool is capable of providing line-by-line information on the time spent in the function.
Analyzing the cProfile text output can be daunting for big programs with a lot of calls and subcalls. Some visual tools aid the task by improving navigation with an interactive, graphical interface.
Graphically analyzing profiling results
KCachegrind is a graphical user interface (GUI) useful for analyzing the profiling output emitted by cProfile.
KCachegrind is available in the Ubuntu 16.04 official repositories. The Qt port, QCacheGrind, can be downloaded for Windows from http://sourceforge.net/projects/qcachegrindwin/. Mac users can compile QCacheGrind using MacPorts (http://www.macports.org/) by following the instructions in the blog post at http://blogs.perl.org/users/rurban/2013/04/install-kachegrind-on-macosx-with-ports.html.
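On Debian-based distributions, installing from the package repositories typically amounts to the following command (the package name may differ on other systems):

$ sudo apt-get install kcachegrind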
KCachegrind can't directly read the output files produced by cProfile. Luckily, the pyprof2calltree third-party Python module is able to convert the cProfile output file into a format readable by KCachegrind.
You can install pyprof2calltree from the Python Package Index (PyPI) using the pip install pyprof2calltree command.
To best show the KCachegrind features, we will use another example with a more diversified structure. We define a recursive function, factorial, and two other functions that use factorial, named taylor_exp and taylor_sin. They represent the polynomial coefficients of the Taylor approximations of exp(x) and sin(x) and are illustrated in the following code snippet:
def factorial(n):
    if n == 0:
        return 1.0
    else:
        return n * factorial(n - 1)

def taylor_exp(n):
    return [1.0 / factorial(i) for i in range(n)]

def taylor_sin(n):
    res = []
    for i in range(n):
        if i % 2 == 1:
            res.append((-1)**((i - 1)/2) / float(factorial(i)))
        else:
            res.append(0.0)
    return res

def benchmark():
    taylor_exp(500)
    taylor_sin(500)

if __name__ == '__main__':
    benchmark()
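As a quick sanity check of what these coefficients represent, they can be evaluated as a polynomial; eval_taylor is a hypothetical helper added here for illustration and is not part of the profiled script:

# Hypothetical helper: evaluate a Taylor polynomial with the given
# coefficients at the point x.
def eval_taylor(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

print(eval_taylor(taylor_exp(20), 1.0))  # close to e = 2.71828...
print(eval_taylor(taylor_sin(20), 1.0))  # close to sin(1) = 0.84147...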
To access profile information, we first need to generate a cProfile output file, as follows:
$ python -m cProfile -o prof.out taylor.py
Then, we can convert the output file with pyprof2calltree and launch KCachegrind by running the following code:
$ pyprof2calltree -i prof.out -o prof.calltree
$ kcachegrind prof.calltree # or qcachegrind prof.calltree
The output is shown in the following screenshot:
Figure 1.6 – Profiling output generated by pyprof2calltree and displayed by KCachegrind
The screenshot shows the KCachegrind UI. On the left, we have an output fairly similar to cProfile. The actual column names are slightly different: Incl. translates to the cProfile module's cumtime value and Self translates to tottime. The values are given in percentages by clicking on the Relative button on the menu bar. By clicking on the column headers, you can sort them by the corresponding property.
On the top right, a click on the Callee Map tab will display a diagram of the function costs. In the diagram shown in Figure 1.6, the time percentage spent by the function is proportional to the area of the rectangle. Rectangles can contain sub-rectangles that represent subcalls to other functions. In this case, we can easily see that there are two rectangles for the factorial function. The one on the left corresponds to the calls made by taylor_exp and the one on the right to the calls made by taylor_sin.
On the bottom right, you can display another diagram, a call graph, by clicking on the Call Graph tab. A call graph is a graphical representation of the calling relationship between the functions; each square represents a function and the arrows imply a calling relationship. For example, taylor_exp calls factorial 500 times, and taylor_sin calls factorial 250 times. KCachegrind also detects recursive calls: factorial calls itself 187250 times.
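The recursive count can be verified with a little arithmetic: each factorial(i) call triggers i further recursive calls, so the totals work out as follows (a quick check, not part of the profiled code):

# taylor_exp(500) calls factorial(i) for i = 0..499
exp_recursive = sum(range(500))                      # 124750 recursive calls
# taylor_sin(500) calls factorial(i) only for odd i
sin_recursive = sum(i for i in range(500) if i % 2)  # 62500 recursive calls
print(exp_recursive + sin_recursive)                 # 187250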
You can navigate to the Call Graph or the Callee Map tab by double-clicking on the rectangles; the interface will update accordingly, showing the timing properties relative to the selected function. For example, double-clicking on taylor_exp will cause the graph to change, showing only the contribution of taylor_exp to the total cost.
Gprof2Dot (https://github.com/jrfonseca/gprof2dot) is another popular tool used to produce call graphs. Starting from output files produced by one of the supported profilers, it will generate a .dot diagram representing a call graph.
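A typical invocation on the cProfile output generated earlier pipes the result through Graphviz's dot tool; roughly (the output filename is chosen here just for illustration):

$ gprof2dot -f pstats prof.out | dot -Tpng -o callgraph.png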
Profiling line by line with line_profiler
Now that we know which function we have to optimize, we can use the line_profiler module that provides information on how time is spent in a line-by-line fashion. This is very useful in situations where it's difficult to determine which statements are costly. The line_profiler module is a third-party module that is available on PyPI and can be installed by following the instructions at https://github.com/rkern/line_profiler.
In order to use line_profiler, we need to apply a @profile decorator to the functions we intend to monitor. Note that you don't have to import the profile function from another module as it gets injected into the global namespace when running the kernprof.py profiling script. To produce profiling output for our program, we need to add the @profile decorator to the evolve function, as follows:
@profile
def evolve(self, dt):
    # code
The kernprof.py script will produce an output file and print the result of the profiling on the standard output. We should run the script with two options, as follows:
- -l to use the line_profiler function
- -v to immediately print the results on screen
The usage of kernprof.py is illustrated in the following line of code:
$ kernprof.py -l -v simul.py
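Recent releases of line_profiler install the script simply as kernprof, without the .py suffix, in which case the equivalent invocation is:

$ kernprof -l -v simul.py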
It is also possible to run the profiler in an IPython shell for interactive editing. You should first load the line_profiler extension, which will provide the %lprun magic command. Using that command, you can avoid adding the @profile decorator, as illustrated in the following screenshot:
Figure 1.7 – Using line_profiler within IPython
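A rough text sketch of such a session follows; the ParticleSimulator class name is an assumption about how evolve is defined in simul.py, so adjust the -f target to the actual names in your code:

In [1]: %load_ext line_profiler
In [2]: from simul import benchmark, ParticleSimulator  # ParticleSimulator is assumed
In [3]: %lprun -f ParticleSimulator.evolve benchmark()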
The output is quite intuitive and is divided into six columns, as follows:
- Line #: The number of the line that was run
- Hits: The number of times that line was run
- Time: The execution time of the line in microseconds
- Per Hit: Time/hits
- % Time: Fraction of the total time spent executing that line
- Line Contents: The content of the line
By looking at the % Time column, we can get a pretty good idea of where the time is spent. In this case, there are a few statements in the for loop body with a cost of around 10-20 percent each.