Description
TL;DR: benchmarks are poorly readable and could be greatly improved. This is a key element in convincing people of the soundness of RustPython, so it should probably not be neglected IMHO.
The violin plots available here are not easily readable, and their Y-axis labels are hardly readable at all because they got left-cut at some point. This is especially troublesome for the `MICROBENCHMARKS` section, for which it is impossible to tell RustPython from CPython.
This issue could be alleviated by doing the following:
- Use a specific color for CPython and another one for RustPython (and keep this color pair consistent across all plots).
- Always have CPython data on top and RustPython data on bottom (this is not consistent: in the `EXECUTION` tab, CPython is on top and RustPython on bottom, while in the `PARSE_TO_AST` tab it is the other way around).
- Only keep the name of the benchmark in the Y-axis labels, i.e. replace `execution/mandelbrot.py/cpython` by either `Mandelbrot` (and use a legend to indicate which color is which interpreter), or make a plot title saying `Mandelbrot` and use the Y-axis labels to tell whether it is CPython or RustPython. (A small plotting sketch of this layout is given right after this list.)
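To make the suggestion concrete, here is a minimal, hypothetical sketch of such a layout with matplotlib. The benchmark names, timings and styling are all made up; the point is only to show one fixed color per interpreter, CPython always drawn above RustPython, short benchmark names on the Y axis, and a legend mapping colors to interpreters:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
benchmarks = ["mandelbrot", "nbody", "fib"]                     # hypothetical benchmark names
cpython = [rng.normal(1.0, 0.05, 200) for _ in benchmarks]      # fake timings in seconds
rustpython = [rng.normal(1.8, 0.15, 200) for _ in benchmarks]   # fake timings in seconds

COLORS = {"CPython": "tab:blue", "RustPython": "tab:orange"}    # one consistent color pair

fig, ax = plt.subplots(figsize=(8, 4))
for i in range(len(benchmarks)):
    # CPython always drawn on top (higher y position), RustPython just below it.
    for offset, (label, data) in enumerate([("CPython", cpython[i]),
                                            ("RustPython", rustpython[i])]):
        parts = ax.violinplot(data, positions=[2 * i + (1 - offset) * 0.6],
                              vert=False, widths=0.5, showmedians=True)
        for body in parts["bodies"]:
            body.set_facecolor(COLORS[label])
            body.set_alpha(0.7)

ax.set_yticks([2 * i + 0.3 for i in range(len(benchmarks))])
ax.set_yticklabels(benchmarks)              # short labels only, no interpreter suffix
ax.set_xlabel("time (s)")
ax.legend(handles=[plt.Rectangle((0, 0), 1, 1, color=c, alpha=0.7) for c in COLORS.values()],
          labels=list(COLORS.keys()))
plt.tight_layout()
plt.show()
```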
In addition to these visual issues, some other improvements could be implemented:
- Make the plots user-friendly using some interactive backend such as `plotly`.
- Put hyperlinks to the benchmark script location / source code, so that users can check what the benchmarks are actually doing.
- Along the same lines, add a small descriptive text about what the benchmark does / why it is relevant (for instance "benchmark X is particularly I/O intensive" or whatnot).
- At the top of the page, give the commit hash / version (possibly with the release date, to tell at a glance whether they are outdated) of both the CPython and RustPython binaries that were used, say whether they were recompiled locally with `-O3`, and list the machine specs (this would allow for meaningful comparison and reproducibility). A hypothetical sketch combining an interactive plot with this metadata follows this list.
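Here is a hedged sketch of what an interactive version could look like with `plotly`, surfacing the run metadata (interpreter versions / commit hashes, machine specs) in the figure title and the benchmark source location in the hover text. All names, paths, hashes and timings below are invented for illustration; the real values would come from the benchmark harness:

```python
import platform
import numpy as np
import plotly.graph_objects as go

rng = np.random.default_rng(0)
meta = {
    "cpython": "3.12.1 (commit abc1234, 2023-12-08)",                           # hypothetical
    "rustpython": "0.3.0 (commit def5678, 2024-01-15, built with --release)",   # hypothetical
    "machine": f"{platform.machine()} / {platform.processor() or 'unknown CPU'}",
}

fig = go.Figure()
for interp, color, loc in [("CPython", "royalblue", 1.0),
                           ("RustPython", "darkorange", 1.8)]:
    fig.add_trace(go.Violin(
        x=rng.normal(loc, 0.1, 200),            # fake timings in seconds
        y=["mandelbrot"] * 200,                 # hypothetical benchmark name
        name=interp,
        orientation="h",
        line_color=color,
        hovertext="source: benches/mandelbrot.py (hypothetical path)",
    ))

fig.update_layout(
    title=(f"mandelbrot: CPython {meta['cpython']} vs RustPython {meta['rustpython']}"
           f"<br><sub>machine: {meta['machine']}</sub>"),
    xaxis_title="time (s)",
    violinmode="group",
)
fig.show()
```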
I think that benchmarks are one of the key elements that might convince anyone to switch from one interpreter to another (apart from functionality / low-level bindings). Hence they should not be neglected.
If someone could point me to where these plots are generated, I'd be happy to help typeset them / add further info (although I might need some technical guidance on why benchmark X is especially relevant or not).