Description
Bug summary
I often work with plots that have a large number of lines, for example showing the trajectories of a large number of particles in a physics simulation. For publication it is good for the axes to be in vector format, but keeping all the tracks as vector elements can create large file sizes.
Rasterization of some elements in the plot can solve this. Matplotlib offers the `rasterized` kwarg to `plot()`, or `Artist.set_rasterized()`. But for this use case it results in a separate bitmap element for each line, making the file size even larger.
I'm not the first person to have issues related to this: #13718
Code for reproduction
For example, this creates a 2.6MB file:
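    import matplotlib.pyplot as plt
    import numpy as np

    N = 20
    rast = True
    s = np.arange(100)
    tracks = []
    # Build N*N sine tracks with varying amplitude and phase.
    for a in np.logspace(0.001, 0.5, N):
        for phase in np.linspace(0.1, np.pi, N):
            tracks.append(a*np.sin(s*0.05 + phase))
    # One plot() call per track, each flagged as rasterized.
    for t in tracks:
        plt.plot(s, t, "-k", alpha=0.5, lw=0.5, rasterized=rast)
    plt.savefig("tracks1.svg")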
grep "<image" tracks1.svg | wc -l
shows that there are 400 image elements.
Calling plot just once with all the tracks makes no difference.
    tracks_na = np.array(tracks).T
    plt.plot(s, tracks_na, "-k", alpha=0.5, lw=0.5, rasterized=rast)
RFC
This happens because of the way `start_rasterizing()` and `stop_rasterizing()` get called from the `allow_rasterization()` wrapper. If `rasterized` is set on the Line2D objects, then start and stop are called around each `Line2D.draw()`. `stop_rasterizing()` renders out to a bitmap and calls `_renderer.draw_image()`. The only way to stay in rasterizing mode is to call `set_rasterized()` on the parent artist of the lines, which would be the axes, but that means that the axes labels and everything else get rasterized.
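The start/stop behaviour described above can be illustrated with a toy model. This is only a sketch: `ToyRenderer`, `ToyArtist`, `toy_allow_rasterization`, and `render` are made-up names for illustration and are not Matplotlib classes or functions. The model assumes that the outermost `stop_rasterizing()` flushes exactly one bitmap.

```python
import functools


class ToyRenderer:
    """Toy stand-in for a mixed vector/raster renderer (not Matplotlib's)."""

    def __init__(self):
        self.depth = 0        # nesting level of rasterized draws
        self.images = 0       # bitmaps flushed to the output
        self.vector_ops = 0   # draws that stayed vector

    def start_rasterizing(self):
        self.depth += 1

    def stop_rasterizing(self):
        self.depth -= 1
        if self.depth == 0:
            self.images += 1  # the outermost stop flushes one bitmap

    def draw_path(self):
        if self.depth == 0:
            self.vector_ops += 1


def toy_allow_rasterization(draw):
    """Wrap draw() so a rasterized artist starts and stops rasterizing."""
    @functools.wraps(draw)
    def wrapper(artist, renderer):
        if artist.rasterized:
            renderer.start_rasterizing()
        try:
            return draw(artist, renderer)
        finally:
            if artist.rasterized:
                renderer.stop_rasterizing()
    return wrapper


class ToyArtist:
    def __init__(self, rasterized=False, children=()):
        self.rasterized = rasterized
        self.children = list(children)

    @toy_allow_rasterization
    def draw(self, renderer):
        renderer.draw_path()
        for child in self.children:
            child.draw(renderer)


def render(root):
    """Draw the tree and return (bitmaps, vector draws)."""
    renderer = ToyRenderer()
    root.draw(renderer)
    return renderer.images, renderer.vector_ops


# Flag on each line: one bitmap per line, the parent stays vector.
lines_flagged = ToyArtist(children=[ToyArtist(rasterized=True) for _ in range(400)])
# Flag on the parent: one bitmap, but nothing stays vector.
axes_flagged = ToyArtist(rasterized=True, children=[ToyArtist() for _ in range(400)])
per_line = render(lines_flagged)
whole_axes = render(axes_flagged)
```

In this model, flagging the 400 lines gives 400 bitmaps, while flagging the parent gives a single bitmap at the cost of rasterizing the parent as well.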
I have thought of a few solutions, but I would like some feedback before starting.
I think an ideal solution would be automatic for the user: Matplotlib would just merge the bitmaps in an optimal way. This must not affect draw order, so it would need to be smart about zorder. It might avoid merging two non-overlapping bitmaps if that increases the total bitmap area.
A possible implementation would be changes to the `allow_rasterization` wrapper so that the rasterization does not stop between consecutive rasterized artists.
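To make the effect of such merging concrete, here is a small sketch that counts how many bitmaps would be produced if consecutive rasterized artists shared one raster pass. It is not Matplotlib code: `count_bitmaps` and the `(zorder, rasterized)` tuples are hypothetical, and it assumes artists are drawn in ascending zorder with ties kept in insertion order.

```python
from itertools import groupby


def count_bitmaps(artists):
    """Count bitmaps if consecutive rasterized artists share one raster pass.

    artists: iterable of (zorder, rasterized) pairs in insertion order.
    """
    # sorted() is stable, so artists with equal zorder keep insertion order.
    draw_order = sorted(artists, key=lambda a: a[0])
    # Each run of consecutive rasterized artists becomes a single bitmap.
    runs = groupby(draw_order, key=lambda a: a[1])
    return sum(1 for rasterized, _ in runs if rasterized)


tracks = [(2, True)] * 400          # 400 rasterized lines at the same zorder
axes_parts = [(2.5, False)] * 4     # vector elements drawn afterwards
merged = count_bitmaps(tracks + axes_parts)

# A vector artist between two rasterized ones splits the run,
# so draw order is preserved at the cost of a second bitmap.
interleaved = count_bitmaps([(1, True), (2, False), (3, True)])
```

Under these assumptions the 400 tracks collapse into one bitmap, while a vector artist sitting between two rasterized ones in zorder still forces two bitmaps.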
A more manual approach is to create an object that can contain multiple artists, and therefore keep the renderer in rasterization mode while drawing those artists. Collections don't fit this role as they have some limitations, e.g. `LineCollection` does not have markers. I also looked at `Container`, but that seems designed for specific use cases. So this could be `ArtistGroup` or `ArtistCollection`. It would derive from Artist, and its `draw()` method would iterate through its children and draw them.
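For illustration, such a group could look roughly like the sketch below. It is not the actual prototype: the names `ArtistGroup` and `add_child` are taken from the proposal, while `_group_children`, `_adopt`, and `count_svg_images` are hypothetical helpers. It assumes the children are never added to the axes themselves, so the group passes on its own figure, axes, transform, and clip path before drawing them.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.artist import Artist, allow_rasterization
from matplotlib.lines import Line2D


class ArtistGroup(Artist):
    """Hypothetical container: draws all children inside one draw() call."""

    def __init__(self):
        super().__init__()
        self._group_children = []

    def add_child(self, artist):
        self._group_children.append(artist)

    def _adopt(self, child):
        # Assumption: children only live in the group, so they inherit
        # figure, axes, transform, and clip path from it (done once).
        if child.axes is self.axes:
            return
        child.set_figure(self.get_figure())
        child.axes = self.axes
        if not child.is_transform_set():
            child.set_transform(self.axes.transData)
        child.set_clip_path(self.axes.patch)

    @allow_rasterization
    def draw(self, renderer):
        if not self.get_visible():
            return
        for child in self._group_children:
            self._adopt(child)
            child.draw(renderer)


def count_svg_images(fig):
    """Save fig as SVG in memory and count its <image> elements."""
    buf = io.BytesIO()
    fig.savefig(buf, format="svg")
    return buf.getvalue().decode("utf-8").count("<image")


s = np.arange(100)
fig, ax = plt.subplots()
group = ArtistGroup()
for phase in np.linspace(0.1, np.pi, 20):
    group.add_child(Line2D(s, np.sin(s * 0.05 + phase), color="k", lw=0.5, alpha=0.5))
group.set_rasterized(True)
ax.add_artist(group)
# add_artist() does not autoscale, so set the limits by hand.
ax.set_xlim(0, 99)
ax.set_ylim(-1.1, 1.1)
n_images = count_svg_images(fig)
```

Because only the group is flagged as rasterized, the wrapper around `ArtistGroup.draw()` covers all children in one raster pass, and the axes, ticks, and labels stay vector.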
I have a minimal prototype, that lets me do:
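    import matplotlib.lines

    # ArtistGroup is the prototype class described above, not part of Matplotlib.
    ag = ArtistGroup()
    for t in tracks:
        ag.add_child(matplotlib.lines.Line2D(s, t, color="k", linestyle="-", alpha=0.5, lw=0.5))
    ag.set_rasterized(True)
    plt.gca().add_artist(ag)
    plt.savefig("tracks3.svg")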
This results in a 304 kB file with a single `<image>` element (compared with 2.6 MB with the original code). Increasing `N` makes the file size difference even larger.
If this is a good approach, it could maybe be used inside `plot()` and similar commands, so that lines drawn with a single plot call would end up in a single rasterization (when requested) with no further user input.
Matplotlib version
- Operating system: Linux
- Matplotlib version: 3.0.3 and git master
- Matplotlib backend: svg
- Python version: 3.7.6
Note: I have been testing with the SVG backend because it's easier to see what is going on in the output files. Judging from file sizes and loading times, all of this holds for the PDF backend too.