Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[Sprint] scatter plots are (reportedly) too slow#2156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
mdboom merged 2 commits intomatplotlib:masterfrommdboom:scatter-speed
Sep 30, 2013

Conversation

mdboom
Copy link
Member

@ChrisBeaumont reported in his great SciPy talk that scatter plots are too slow. We should get to the bottom of that 😉.

@ChrisBeaumont
Copy link
Contributor

This is essentially the issue discussedhere -- callingplt.plot is about 10x faster thanplt.scatter on my machine. I think this is becauseplot copies a stamp over and over in Agg, whereasscatter has to re-draw markers at every datapoint.

It would begreat if matplotlib had some way of mapping size and/or color to data, without sacrificing all of that speed! I have to think there is some middle ground between the speed ofplot and the flexibility ofscatter.

@WeatherGod
Copy link
Member

Last year, john Hunter explained that he did work to optimize line
collections, which assume the same marker throughout a line, while I think
scatter makes patch collections, which can't? I don't exactly recall it
all, but he did say that some of the optimizations could be brought over to
scatter.

Ben
On Jun 26, 2013 6:18 PM, "Chris Beaumont"notifications@github.com wrote:

This is essentially the issue discussed herehttps://github.com/astrofrog/mpl-scatter-density/issues/2#issuecomment-2508221-- calling
plt.plot is about 10x faster than plt.scatter on my machine. I think this
is because plot renders a stamp over and over, whereas scatter has to
re-draw markers at every datapoint.

It would begreat if matplotlib had some way of mapping size and/or
color to data, without sacrificing all of that speed! I have to think there
is some middle ground between the speed of plot and the flexibility of
scatter.


Reply to this email directly or view it on GitHubhttps://github.com//issues/2156#issuecomment-20087105
.

@mdboom
Copy link
MemberAuthor

@ChrisBeaumont: I've attached some low-hanging optimizations to scatter.

I've benchmarked this using points of different sizes (5, 25, 125 pixel diameters) and different numbers of points (100, 1000, 10000, 100000 and 1000000). Here's the baseline timings:

baseline

Greater than linear growth -- probably mostly due to memory/cache pressures with the greater number of points. Here's the relative speedups with this patch.

combined

Three separate optimizations were found:

  1. zip was being used to combine the X and Y coordinates.np.dstack is obviously much better.

dstack

  1. When the points are all the same size, shape and color (etc...), it falls back todraw_markers, which draws the shape once as scanlines and then draws from those scanlines multiple times.

blit

  1. The transformation for each scatter point was being stored in a list of Transform objects. Storing the transformation matrices in a Nx3x3 array is much more efficient both to create and run through.

array_matrices

Beyond this, I haven't found much other low-hanging fruit that doesn't involve changing the output. I'll save that report for another issue.

@mdboom
Copy link
MemberAuthor

The benchmark, FYI.

import matplotlibmatplotlib.use("Agg")import matplotlib.pyplot as pltimport numpy as npfrom numpy import randomimport ioimport timeimport sysnpoints = int(sys.argv[1])size = int(sys.argv[2])np.random.seed(0)xy = np.random.rand(npoints, 2)if size > 0:    sizes = np.power(np.random.rand(npoints) * float(size), 2.0)else:    sizes = -sizes = time.time()x = plt.scatter(xy[:, 0], xy[:, 1], sizes)plt.savefig(io.BytesIO())print(time.time() - s)

@ChrisBeaumont: I'd appreciate some testing in the context of the kind of data you're seeing, just to see how it compares to this completely random and synthetic benchmark.

width + scanlines.max_x() + 1.0,
height+ scanlines.max_y() + 1.0);
height- scanlines.min_y() + 1.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Just curious, what is the purpose of the clipping change in _backend_agg.cpp? Was that a bug?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yep. It was a bug where it would avoid drawing some legitimate things.

@ChrisBeaumont
Copy link
Contributor

Nice! I'll play around with this.

@ChrisBeaumont
Copy link
Contributor

My benchmarks are similar: before this PR:
benchmark_old

And after (note the different yaxis scaling):
benchmark

The comparison withplot is with `plot(x, y, 'o', ms=4), which produces output quite similar to default scatter.

Unfortunately, the biggest speedup in this PR (the blitting) essentially replicates whatplot can do already. The compelling functionality ofscatter is, IMO, the ability to map color and/or size onto data. I can envision two "medium"-hanging fruit optimizations, that might push this kind of functionality into the 10^5-6 points range:

draw_markers could accept an optional color array for each marker. I'm not positive about the pixel blending math, but I suspect it's possible to pre-render a stamp and then blit it with different tints each time? If so, this would speed up calls to scatter with constant sizes but variable colors.

Sometimes the number of combinations of colors/sizes/markers will be small, especially if they are discretized first. In these situations, scatter would benefit from caching the markers, and using the same blitting trick in draw_markers.

Do you think either is feasible? I could take a crack at them in a new PR, to earn my matplotlib/C++ merit badge

@mdboom
Copy link
MemberAuthor

I think handling multiple colors in draw_markers should be possible. It probably requires implementing a custom blender for agg, though. Sounds, um, magical.

As for caching markers based on multiple sizes, that should also be possible. I had experimented with generating one stamp and then resampling it to different sizes, but, even with low-quality nearest neighbor resampling, doesn't gain much over just rendering the vector object directly. But if you built a discrete number of sizes and used those, that should help, I suspect. We probably want some sort of option to scatter where one specifies a number of "bins" to sort the sizes into (or maybe have an "auto" option that would do something smarter). And maybe not do anything special when the number of points is small, since it will add some preprocessing overhead.

@dpsanders
Copy link

Looks really nice!
A non-matplotlib comment: If you want to claim something about the growth compared to linear, it would be useful to add a linear growth comparison to the graph.

@ChrisBeaumont
Copy link
Contributor

Anything with curvature in a log-log plot is definitely super-linear :)

@dpsanders
Copy link

True, but even with curvature it might have not have reached linear yet by
the end of the time window shown!

On Wed, Jul 24, 2013 at 8:28 AM, Chris Beaumontnotifications@github.comwrote:

Anything with curvature in a log-log plot is definitely super-linear :)


Reply to this email directly or view it on GitHubhttps://github.com//pull/2156#issuecomment-21484316
.

Dr. David P. Sanders

Profesor Titular "A" / Associate Professor
Departamento de Física, Facultad de Ciencias
Universidad Nacional Autónoma de México (UNAM)

dpsanders@gmail.com
http://sistemas.fciencias.unam.mx/~dsanders

Cubículo / office:#414, 4o. piso del Depto. de Física

Tel.: +52 55 5622 4965

… list of Transform objects.When the collection has the same styling and only varies by offsets, use draw_markers instead.
@mdboom
Copy link
MemberAuthor

I know@ChrisBeaumont wanted to take this even further, but I wonder if there's any objections to merging this as-is (since it appears to be an improvement on all fronts anyway, even if it doesn't go as far as it could).

@ChrisBeaumont
Copy link
Contributor

+1 on merging. I do still want to push on this more, but am mired in thesis work for the next month. In any event, that can happen in a new PR

mdboom added a commit that referenced this pull requestSep 30, 2013
[Sprint] scatter plots are (reportedly) too slow
@mdboommdboom merged commit379b14c intomatplotlib:masterSep 30, 2013
@mdboommdboom deleted the scatter-speed branchAugust 7, 2014 13:52
@tacaswelltacaswell mentioned this pull requestDec 4, 2014
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers
No reviews
Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

7 participants
@mdboom@ChrisBeaumont@WeatherGod@dpsanders@tonysyu@rainwoodman@tacaswell

[8]ページ先頭

©2009-2025 Movatter.jp