Python support for the Linux perf profiler

author:

Pablo Galindo

The Linux perf profiler is a very powerful tool that allows you to profile and obtain information about the performance of your application. perf also has a very vibrant ecosystem of tools that aid with the analysis of the data that it produces.

The main problem with using the perf profiler with Python applications is that perf only gets information about native symbols, that is, the names of functions and procedures written in C. This means that the names and file names of Python functions in your code will not appear in the output of perf.

Since Python 3.12, the interpreter can run in a special mode that allows Python functions to appear in the output of the perf profiler. When this mode is enabled, the interpreter will interpose a small piece of code compiled on the fly before the execution of every Python function and it will teach perf the relationship between this piece of code and the associated Python function using perf map files.
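To make the idea of a perf map file more concrete, here is a minimal sketch of reading one. It assumes the conventional layout perf uses for such maps: a file such as /tmp/perf-<pid>.map in which each line holds a hexadecimal start address, a hexadecimal size and a symbol name. The addresses in the sample and the helper names parse_perf_map and symbol_for are made up for illustration; only the py::<function>:<file> symbol style is taken from the perf report output shown later in this document.

```python
from typing import NamedTuple, Optional


class MapEntry(NamedTuple):
    start: int  # first address covered by the symbol
    size: int   # number of bytes covered
    name: str   # symbol name, e.g. "py::foo:/src/script.py"


def parse_perf_map(text: str) -> list[MapEntry]:
    """Parse the text of a perf map file into a list of entries."""
    entries = []
    for line in text.splitlines():
        # The symbol name may contain spaces, so split at most twice.
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = parts
        entries.append(MapEntry(int(start, 16), int(size, 16), name.strip()))
    return entries


def symbol_for(entries: list[MapEntry], address: int) -> Optional[str]:
    """Return the name of the symbol that covers `address`, if any."""
    for entry in entries:
        if entry.start <= address < entry.start + entry.size:
            return entry.name
    return None


# Hypothetical sample content; the addresses and sizes are invented.
SAMPLE_MAP = """\
7f3b2c000000 1c py::foo:/src/script.py
7f3b2c00001c 1c py::bar:/src/script.py
"""

entries = parse_perf_map(SAMPLE_MAP)
print(symbol_for(entries, 0x7F3B2C000020))
```

This is roughly the lookup perf has to perform when a sampled instruction pointer falls inside one of the dynamically generated pieces of code: the address alone says nothing, but the map entry ties it back to a Python function and file.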

Note

Support for the perf profiler is currently only available for Linux on select architectures. Check the output of the configure build step or check the output of python -m sysconfig | grep HAVE_PERF_TRAMPOLINE to see if your system is supported.
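The same check can be sketched from inside Python. The helper below is hypothetical (config_keys_matching is not part of the standard library); it simply mirrors the grep above by looking for build configuration variables whose name contains HAVE_PERF_TRAMPOLINE, rather than assuming one exact variable name.

```python
import sysconfig


def config_keys_matching(needle, config_vars=None):
    """Return the build configuration variables whose name contains `needle`.

    `config_vars` defaults to the running interpreter's configuration; a
    plain dict can be passed instead, which is handy for experimenting.
    """
    if config_vars is None:
        config_vars = sysconfig.get_config_vars()
    return {key: value for key, value in config_vars.items() if needle in key}


matches = config_keys_matching("HAVE_PERF_TRAMPOLINE")
if matches:
    print("perf trampoline related settings:", matches)
else:
    print("no HAVE_PERF_TRAMPOLINE setting found in this build")
```

An empty result, or a match whose value is 0, suggests that the interpreter was built without trampoline support.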

For example, consider the following script:

def foo(n):
    result = 0
    for _ in range(n):
        result += 1
    return result

def bar(n):
    foo(n)

def baz(n):
    bar(n)

if __name__ == "__main__":
    baz(1000000)

We can run perf to sample CPU stack traces at 9999 hertz:

$ perf record -F 9999 -g -o perf.data python my_script.py

Then we can use perf report to analyze the data:

$ perf report --stdio -n -g

# Children      Self       Samples  Command     Shared Object       Symbol
# ........  ........  ............  ..........  ..................  ..........................................
#
    91.08%     0.00%             0  python.exe  python.exe          [.] _start
            |
            ---_start
            |
                --90.71%--__libc_start_main
                        Py_BytesMain
                        |
                        |--56.88%--pymain_run_python.constprop.0
                        |          |
                        |          |--56.13%--_PyRun_AnyFileObject
                        |          |          _PyRun_SimpleFileObject
                        |          |          |
                        |          |          |--55.02%--run_mod
                        |          |          |          |
                        |          |          |           --54.65%--PyEval_EvalCode
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     |
                        |          |          |                     |--51.67%--_PyEval_EvalFrameDefault
                        |          |          |                     |          |
                        |          |          |                     |          |--11.52%--_PyLong_Add
                        |          |          |                     |          |          |
                        |          |          |                     |          |          |--2.97%--_PyObject_Malloc
...

As you can see, the Python functions are not shown in the output; only _PyEval_EvalFrameDefault (the function that evaluates the Python bytecode) shows up. Unfortunately that’s not very useful because all Python functions use the same C function to evaluate bytecode, so we cannot know which Python function corresponds to which bytecode-evaluating function.

Instead, if we run the same experiment with perf support enabled we get:

$ perf report --stdio -n -g

# Children      Self       Samples  Command     Shared Object       Symbol
# ........  ........  ............  ..........  ..................  .....................................................................
#
    90.58%     0.36%             1  python.exe  python.exe          [.] _start
            |
            ---_start
            |
                --89.86%--__libc_start_main
                        Py_BytesMain
                        |
                        |--55.43%--pymain_run_python.constprop.0
                        |          |
                        |          |--54.71%--_PyRun_AnyFileObject
                        |          |          _PyRun_SimpleFileObject
                        |          |          |
                        |          |          |--53.62%--run_mod
                        |          |          |          |
                        |          |          |           --53.26%--PyEval_EvalCode
                        |          |          |                     py::<module>:/src/script.py
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     py::baz:/src/script.py
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     py::bar:/src/script.py
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     py::foo:/src/script.py
                        |          |          |                     |
                        |          |          |                     |--51.81%--_PyEval_EvalFrameDefault
                        |          |          |                     |          |
                        |          |          |                     |          |--13.77%--_PyLong_Add
                        |          |          |                     |          |          |
                        |          |          |                     |          |          |--3.26%--_PyObject_Malloc

How to enable perf profiling support

perf profiling support can be enabled either from the start using the environment variable PYTHONPERFSUPPORT or the -X perf option, or dynamically using sys.activate_stack_trampoline() and sys.deactivate_stack_trampoline().

The sys functions take precedence over the -X option, and the -X option takes precedence over the environment variable.

Example, using the environment variable:

$ PYTHONPERFSUPPORT=1 perf record -F 9999 -g -o perf.data python script.py
$ perf report -g -i perf.data

Example, using the -X option:

$ perf record -F 9999 -g -o perf.data python -X perf script.py
$ perf report -g -i perf.data
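If the profiling run is driven from a Python script rather than typed at a shell, the command line can be assembled as a list. The following is only a sketch: perf_record_command is a hypothetical helper, and the flags are the ones used elsewhere in this document (-F for the sampling frequency, -g for call graphs, -o for the output file, -X perf to enable the trampoline).

```python
import sys


def perf_record_command(script, frequency=9999, output="perf.data",
                        python=sys.executable):
    """Build an argument list that records `script` under perf.

    The interpreter is started with -X perf so that Python functions
    show up in the recorded stacks.
    """
    return [
        "perf", "record",
        "-F", str(frequency),   # sampling frequency in hertz
        "-g",                   # record call graphs (stack traces)
        "-o", output,           # where perf writes its data
        python, "-X", "perf", script,
    ]


command = perf_record_command("script.py")
print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where perf is installed; passing a list rather than a single string avoids shell quoting problems with script paths that contain spaces.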

Example, using the sys APIs in file example.py:

import sys

sys.activate_stack_trampoline("perf")
do_profiled_stuff()
sys.deactivate_stack_trampoline()

non_profiled_stuff()
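When only part of a program should be profiled, the activate/deactivate pair can be wrapped in a context manager so that deactivation also happens if the profiled code raises. This is a sketch rather than an official recipe: perf_trampoline is a hypothetical name, the getattr guards are there for interpreters older than 3.12, and the set of exceptions caught on activation is an assumption about how unsupported builds report failure.

```python
import contextlib
import sys


@contextlib.contextmanager
def perf_trampoline():
    """Enable perf support for the duration of a with-block.

    Yields True if this block switched the trampoline on, and False if
    it was unavailable or already active (for example via -X perf).
    """
    activate = getattr(sys, "activate_stack_trampoline", None)
    is_active = getattr(sys, "is_stack_trampoline_active", None)

    enabled = False
    already_active = is_active is not None and is_active()
    if activate is not None and not already_active:
        try:
            activate("perf")
            enabled = True
        except (ValueError, RuntimeError, OSError):
            # Assumption: builds without trampoline support signal the
            # failure with one of these exception types.
            pass
    try:
        yield enabled
    finally:
        # Only undo what this block did; leave an outer setting alone.
        if enabled:
            sys.deactivate_stack_trampoline()


with perf_trampoline() as enabled:
    total = sum(i * i for i in range(1000))  # stand-in for profiled work
print("trampoline enabled by this block:", enabled)
```

Checking sys.is_stack_trampoline_active() first means the block does not switch off perf support that was enabled for the whole process by the -X option or the environment variable.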

…then:

$ perf record -F 9999 -g -o perf.data python ./example.py
$ perf report -g -i perf.data

How to obtain the best results

For best results, Python should be compiled with CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" as this allows profilers to unwind using only the frame pointer and not DWARF debug information. This is necessary because the code that is interposed to allow perf support is dynamically generated, so it doesn’t have any DWARF debugging information available.

You can check if your interpreter has been compiled with this flag by running:

$ python -m sysconfig | grep 'no-omit-frame-pointer'

If you don’t see any output it means that your interpreter has not been compiled with frame pointers and therefore it may not be able to show Python functions in the output of perf.
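A rough Python equivalent of the grep above is sketched below. The helper name config_vars_mentioning is hypothetical; it scans the string-valued build configuration variables (CFLAGS and similar) for the flag text instead of assuming which variable holds it.

```python
import sysconfig


def config_vars_mentioning(text, config_vars=None):
    """Return the names of build configuration variables whose value contains `text`.

    `config_vars` defaults to the running interpreter's configuration.
    Non-string values (many variables are integers) are ignored.
    """
    if config_vars is None:
        config_vars = sysconfig.get_config_vars()
    return sorted(
        name
        for name, value in config_vars.items()
        if isinstance(value, str) and text in value
    )


names = config_vars_mentioning("no-omit-frame-pointer")
if names:
    print("frame pointer flags found in:", ", ".join(names))
else:
    print("interpreter does not appear to be built with frame pointers")
```

Because the match is a substring test, it reports both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer, just as the grep does.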