Description
This program plots 3 million dots with random colors:
    import matplotlib.pyplot as plt
    import numpy as np

    N = 3000000
    x = np.random.random(N)
    y = np.random.random(N)
    c = np.random.random((N, 3))  # RGB
    plt.scatter(x, y, 1, c, '.')
    plt.show()
The initial display is very slow. Even more problematic: zooming is very slow. If you set `c` to `None` it will use a single color for all points and it will be fast, with zooming taking about 1 second, vs about 20 seconds with multiple colors.
If you zoom until only a few points are visible, the single-color plot will respond instantly, but the multi-color one will still take 20 seconds. It's as if all 3 million colors are being slowly remapped every time--even for points which can't be seen.
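A rough way to put numbers on this comparison is to time a single off-screen render of each variant. The sketch below is only an approximation of the interactive case: it assumes the non-interactive Agg backend, so it measures one full draw rather than an actual zoom, and the helper name `time_scatter_draw` is made up for this example.

```python
import time

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no GUI overhead is measured
import matplotlib.pyplot as plt
import numpy as np


def time_scatter_draw(n, multicolor, seed=0):
    """Return the seconds taken by one full render of an n-point scatter plot."""
    rng = np.random.RandomState(seed)
    x = rng.random_sample(n)
    y = rng.random_sample(n)
    # One RGB triple per point, or None to let scatter use a single color.
    c = rng.random_sample((n, 3)) if multicolor else None
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=1, c=c, marker=".")
    start = time.perf_counter()
    fig.canvas.draw()  # force a complete redraw of the figure
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed
```

Calling `time_scatter_draw(N, False)` and `time_scatter_draw(N, True)` with the same `N` gives the two figures to compare. The absolute times will differ from those seen in an interactive window.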
I would expect multi-color scatter plots to be only marginally slower than single-color ones. A 10x slowdown or worse makes me want to disable colors, but then I can't visualize my data properly.
#2156 (four years ago) was aimed at scatter plot performance but seems to have neglected the multi-color case, which, as @ChrisBeaumont pointed out, is a main use case for `scatter()`: #2156 (comment)
> Unfortunately, the biggest speedup in this PR (the blitting) essentially replicates what plot can do already. The compelling functionality of scatter is, IMO, the ability to map color and/or size onto data. I can envision two "medium"-hanging fruit optimizations, that might push this kind of functionality into the 10^5-6 points range [...]
My real data has more points but only 12 distinct colors, so I'd be happy with a speedup even if it only applies when there are, say, up to 50 distinct colors. I also use `ColorMap` in my real application, but again I only take a few distinct choices from the map (whereas in the example above, every point has a unique color).
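For the few-distinct-colors case, one possible workaround (a sketch, not something Matplotlib does internally and not measured here) is to draw each color group with its own single-color `plot()` call, so that no per-point color array is involved. The function name `plot_by_category` and the `labels`/`palette` inputs are hypothetical: they assume each point already carries an integer category that indexes into a small list of RGB colors.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_by_category(ax, x, y, labels, palette, markersize=1):
    """Draw one single-color Line2D per category instead of one multi-color scatter.

    labels[i] is the integer category of point i and palette[k] is the RGB
    color of category k (both are assumptions made for this sketch).
    """
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    lines = []
    for k, color in enumerate(palette):
        mask = labels == k
        if not mask.any():
            continue  # skip categories that have no points
        (line,) = ax.plot(x[mask], y[mask], linestyle="none", marker=".",
                          markersize=markersize, color=tuple(color))
        lines.append(line)
    return lines


def demo(n=3000000, n_colors=12, seed=0):
    """Plot n random points split across n_colors random colors."""
    rng = np.random.RandomState(seed)
    x = rng.random_sample(n)
    y = rng.random_sample(n)
    labels = rng.randint(0, n_colors, size=n)
    palette = rng.random_sample((n_colors, 3))
    fig, ax = plt.subplots()
    plot_by_category(ax, x, y, labels, palette)
    plt.show()
```

Running `demo()` opens a window with 12 color groups. This makes one pass over the data per color, so it only makes sense when the number of distinct colors is small, and the marker size will not match `scatter()` exactly because `plot()` takes `markersize` in points while `scatter()` takes an area.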
I'm using Matplotlib 2.0.2, NumPy 1.12.1, and Python 3.5.3 on 64-bit Linux with 128 GB of RAM.