Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Store vertices as (masked) arrays in Poly3DCollection#16688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

Conversation

apaszke
Copy link
Contributor

PR Summary

This removes a bunch of very expensive list comprehensions, dropping
rendering times by half for some 3D surface plots.

Runtime of the following code drops from 16s to 10s after#16675, while stacking this PR on top of#16675 lowers the time further to around 5s. Another 3s are now spent inPolyCollection.set_verts which I'm hoping to improve later.

importnumpyasnpimportmatplotlib.pyplotaspltfig=plt.figure()ax=fig.gca(projection='3d')x=y=np.linspace(-1,1,400)x,y=np.meshgrid(x,y)z=x**2+y**2ax.plot_surface(x,y,z,rstride=1,cstride=1)fig.savefig('dump.png')plt.close()

PR Checklist

  • Has Pytest style unit tests
  • Code isFlake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way


Parameters
----------
vecs : 3xN np.ndarray or np.ma.MaskedArray
Copy link
Contributor

@eric-wiesereric-wieserMar 11, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It might make more sense for this to take and return(..., 3) array, which would mean your caller does not need to reshape, and would make the below...

(doing it this way is consistent with how gufuncs behave in numpy, in that they care only about the last axis by default)

Copy link
ContributorAuthor

@apaszkeapaszkeMar 11, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Note that those reshapes should not trigger any reallocations, so I'm not sure if that would actually help.We could push the transposes into this function, but we cannot make them go away without changing the conventions aroundrenderer.M. Still, I don't think they cause any harm here.

Copy link
Contributor

@eric-wiesereric-wieserMar 11, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This suggestion is more based off simplicity and convention than it is performance.

You're correct that reshapes typically are free, and.T is always free.

Copy link
ContributorAuthor

@apaszkeapaszkeMar 11, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've tried your suggestions and it seems like while the original version takes ~72ms on average, the proposed changes increase the runtime to ~127ms. This is likely due to strided memory patters that start pop up everywhere (padding the array, normalizing the product, reductions over z axis). I do agree that the code is nicer though. Please do let me know how to proceed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for profiling. Is that the runtime of the entire plot, or just this function?

Copy link
Contributor

@eric-wiesereric-wieserMar 11, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Seems that something screwy is going on withmatmul(A, B) vsmatmul(B.T, A.T).T for 2dA andB.matmul is supposed to know they are equivalent and pick the faster one, but it seems to be doing the same thing for both.

Edit:numpy/numpy#15742, it only works with an explicitout argument.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

How large part of the slowdown is caused by matmul picking the wrong/having a slow kernel for this case then?

Copy link
Contributor

@eric-wiesereric-wieserMar 11, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I can get a speedup with this mess:

def_proj_transform_vectors_new(vecs,M):is_masked=isinstance(vecs,np.ma.MaskedArray)ifis_masked:mask=vecs.mask# allocate with a convenient memory ordervecs_pad=np.moveaxis(np.empty((vecs.shape[-1]+1,)+vecs.shape[:-1]),0,-1)vecs_pad[..., :-1]=vecsvecs_pad[...,-1]=1product=np.empty_like(vecs_pad)np.matmul(vecs_pad,M.T,out=product)result=product[..., :3]/product[...,3:]ifis_masked:returnnp.ma.array(result,mask=mask)else:returnresultdefnew():# reshape is for speed only, it works correctly withoutpsegments=_proj_transform_vectors_new(segments.reshape(-1,3),M).reshape(segments.shape)epilogue(psegments)returnpsegments

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This indeed yields a speedup, but a tiny one on my laptop:

New: 42.17msBaseline: 45.15msDiff: 2.98ms

I'm ok with going either way, just let me know 🤷‍♂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think my request would perhaps be just to move the reshapes you have at the call site inside the function - that way you can keep the conventional(...,N) API, but the implementation can be the easy-to-read and efficient code you have now.

@apaszke
Copy link
ContributorAuthor

It would be great if we could merge this, because it is necessary to actually benefit from the patches I'm preparing now.

@apaszkeapaszkeforce-pushed thevectorize_3d_projection branch from872818b tofbee126CompareMarch 22, 2020 12:31
@apaszke
Copy link
ContributorAuthor

@eric-wieser@timhoffm I've addressed your requests, rebased on top of master and squashed the commits.

Comment on lines 678 to 689
face_z = self._zsortfunc(psegments[..., 2], axis=-1)
if isinstance(face_z, np.ma.MaskedArray):
# NOTE: Unpacking .data is safe here, because every face has to
# contain a valid vertex.
face_z = face_z.data
face_order = np.argsort(face_z, axis=-1)[::-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is the point here thatface_z has a single element per face, and is therefore safe?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Exactly. My comment is based on the assumption that you don't have faces of length 0 (i.e. faces with all vertices masked). I think that's reasonable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Alternatively, you could useface_z.filled(-np.inf), to be safe.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Now that I think about it, I don't think that it matters? Faces with no valid vertices should get all vertices coordinates filled with NaN values before drawing making them invisible, so it doesn't matter where do they end up in the face order.

Comment on lines 680 to 687
# NOTE: Unpacking .data is safe here, because every face has to
# contain a valid vertex.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Are you confident enough about this to add

assert not face_z.mask.any()

?

If not, then it would be better to handle it.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

See my comment above. I would be comfortable with adding this assert if we're ok with making an assumption that you can't have faces of length 0. This certainly doesn't happen forplot_surface and I don't see why would you do that in any other case too.

if self._codes3d is not None:
codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
segments_2d = [s.compressed().reshape(-1, 2)
Copy link
Contributor

@eric-wiesereric-wieserMar 22, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

compressed is going to do something bad if somehow only one component was masked (eg, it wasnan at construction).

Safer would be:

Suggested change
segments_2d= [s.compressed().reshape(-1,2)
segments_2d= [s.data[~s.mask.any(axis=-1)]

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

How would you like to interpret the data if one component was masked? That makes no sense because you get one point with only a single coordinate.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oh you mean that multiple similar errors like this could cancel out? That is true, but it's such a strong internal invariant violation that I'm not sure if what your solution does (i.e. just take masked data without asking too many questions) is better? Another alternative would be to dos.data[~s.mask[..., 0]], as the last dimension of mask is always supposed to be constant.

Copy link
Contributor

@eric-wiesereric-wieserMar 22, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You're right it makes no sense, but the way you have it written the behavior would be to interpret[[x1, y1], [x2, --], [x3, y3], [x4, --]] as[[x1, y1], [x2, x3], [y3, x4]], which is going to be very confusing to the user as it combines adjacent points.

Usingany as I do means that if the point is fully masked, its skipped, otherwise it propagates to the plotting anyway. Usingall would also be fine, if you want partial masks to skip plotting too. What's not ok is corrupting adjacent points which aren't bad.

If you really wanted to play it safe you could assert thatnp.all(mask.any(axis=-1) == mask.all(axis-1), but that would cost you a bunch of time.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, I guess usingmask.all might be more reasonable. Would that be ok with you?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've looked into it, and I'm not sure if we should be doing anything like that. Note that right below this line there is another comprehension that builds up a list of codes, each with length that is supposed to be equal to the length of this segment. So we can't just drop vertices arbitrarily and the only way to make this sane is to assume that the invariant I've mentioned holds. What do you think?

@timhoffm
Copy link
Member

Please fix the flake8 errors:
./lib/mpl_toolkits/mplot3d/art3d.py:618:80: E501 line too long (83 > 79 characters)
./lib/mpl_toolkits/mplot3d/art3d.py:623:55: E231 missing whitespace after ','

@apaszkeapaszkeforce-pushed thevectorize_3d_projection branch from41f36ac to1e08084CompareMarch 22, 2020 17:24
if self._codes3d is not None:
codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
segments_2d = [s.data[~s.mask.all(axis=-1)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

With my implementation in the constructor, you now have the invariant thatmask.all(axis==-1) == mask[:,0], which if you wanted to you could use for a very marginal speedup:

Suggested change
segments_2d= [s.data[~s.mask.all(axis=-1)]
# this invariant is ensured by the constructor
asserts.mask.strides[1]==0
# which means that s.mask[:,i] is all the same location in memory, so there's no need to call `all`.
segments_2d= [s.data[~s.mask[:,0]]

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Or I could put in the assert and usecompressed(), which avoids the unnecessary mask manipulation!

Copy link
Contributor

@eric-wiesereric-wieserMar 22, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

compressed is internally more complex that that operation anyway though. You're comparingdata[~mask[:,0]] withdata.ravel()[mask.ravel()].reshape((-1, 2))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

In fact,mask.ravel() is now a lot more expensive thanmask[:,0], as it has to create a copy to handle the 0 stride.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, here's what I came up with:

@@ -667,6 +667,7 @@ class Poly3DCollection(PolyCollection):             self._facecolors3d = self._facecolors            psegments = proj3d._proj_transform_vectors(self._segments, renderer.M)+        is_masked = isinstance(psegments, np.ma.MaskedArray)         num_faces = len(psegments)           # This extra fuss is to re-order face / edge colors@@ -681,19 +682,24 @@ class Poly3DCollection(PolyCollection):                 cedge = cedge.repeat(num_faces, axis=0)           face_z = self._zsortfunc(psegments[..., 2], axis=-1)-        if isinstance(face_z, np.ma.MaskedArray):+        if is_masked:             # NOTE: Unpacking .data is safe here, because every face has to             #       contain a valid vertex.             face_z = face_z.data         face_order = np.argsort(face_z, axis=-1)[::-1]+        segments_2d = psegments[face_order, :, :2]         if self._codes3d is not None:-            segments_2d = [s.data[~s.mask.all(axis=-1)]-                           for s in psegments[face_order, :, :2]]+            if is_masked:+                # NOTE: We cannot assert the same on segments_2d, as it is a+                #       result of advanced indexing, so its mask is a newly+                #       allocated tensor. However, both of those asserts are+                #       equivalent.+                assert psegments.mask.strides[-1] == 0+                segments_2d = [s.compressed().reshape(-1, 2) for s in segments_2d]             codes = [self._codes3d[idx] for idx in face_order]             PolyCollection.set_verts_and_codes(self, segments_2d, codes)         else:-            segments_2d = psegments[face_order, :, :2]             PolyCollection.set_verts(self, segments_2d, self._closed)           self._facecolors2d = cface[face_order]

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Those.ravel()s are no-ops, becausesegments_2d comes from advanced indexing anyway, so I still went withcompressed().

This removes a bunch of very expensive list comprehensions, droppingrendering times by half for some 3D surface plots.
@apaszkeapaszkeforce-pushed thevectorize_3d_projection branch from1e08084 to8633bbdCompareMarch 22, 2020 18:35
@eric-wieser
Copy link
Contributor

Restarting an earlier conversation:

I'm not convinced masked arrays are the right approach here. Component-wise masking feels a little awkward. Would it be better to store a_segment_lengths array separately?

It would be a better model of the data, but that wouldn't allow doing vectorized Z-reductions, because there are no NumPy kernels for arrays with lengths.

Where exactly does this vectorized reduction occur? A option might be to store:

self._segment_3d_data= ...# (n_faces, max_verts, 3)self._vertex_mask= ...# (n_faces, max_verts, 1)

and reassemble them into a masked array only when needed. The assembly should be free, and viabroadcast_to you can avoid_vertex_mask ever tripling in size. It also means you can easily maintain your invariants.

@apaszke
Copy link
ContributorAuthor

apaszke commentedMar 22, 2020
edited
Loading

This is the reduction. I think that keeping the mask along with the array really does make the code simpler, and I don't think it should add a lot of performance penalties (it should be waaaay better than without this patch). I do agree however that it may be simpler to keep the invariants if we separate those. I'm ok with doing that if you think that's the way to go. No strong preference from my side anymore.

@eric-wieser
Copy link
Contributor

eric-wieser commentedMar 22, 2020
edited
Loading

If that's the only place, then you could consider

ifisinstance(segments3d,np.ma.MaskedArray):self._segment_3d=segments3d.data# (n_faces, max_verts, 3)self._vertex_valid=~segments3d.mask.any(axis-1)# (n_faces, max_verts)elifisinstance(segments3d,np.ndarray):self._segment_3d=segments3d# (n_faces, max_verts, 3)self._vertex_valid=np.True_else:# existing code without the broadcast_to, and an inverted condition
vertex_z=psegments[...,2]ifself._vertex_validisnotnp.True_:vertex_z=np.ma.array(psegments[...,2],mask=~self._vertex_valid)face_z=self._zsortfunc(vertex_z,axis=-1)
# the if is an optimization to avoid an unnecessary copy, it would work unconditionallyifself._vertex_validisnotnp.True_:segments_2d= [s[self.self._vertex_valid, :]forsinsegments_2d]

That second snippet becomes even simpler and 4x faster (on an array of 10000 elements) once the numpy version bumps:

vertex_z=psegments[...,2]face_z=self._zsortfunc(vertex_z,axis=-1,where=self._vertex_valid)

@apaszke
Copy link
ContributorAuthor

Yeah I think that's fair and makes things nicer in the end. I'll update the code tomorrow!

eric-wieser reacted with thumbs up emoji

@apaszke
Copy link
ContributorAuthor

My only question is: why usenp.True_ instead ofTrue? Is it becauseTrue is not a validmask argument toma.MaskedArray?

@eric-wieser
Copy link
Contributor

eric-wieser commentedMar 23, 2020
edited
Loading

I guess there's no reason to usenp.True_ any more. That's left from when I was thinking aboutnp.False_, which has special meaning for masked arrays asnp.ma.nomask. If anything, usingnp.True_ may not hit the fast-path ofwhere=, so I wouldn't recommend using it!

@apaszke
Copy link
ContributorAuthor

@eric-wieser Turns out it's not as simple as we thought — see the last commit. There's one extra reduction over the Z axis at the very end of the function, and we always have to take care of masking before we callsuper().set_verts. Still, this version is likely to be a little bit faster than the old one, so I'm leaving the choice up to you.

Comment on lines +677 to +679
pzs = pfaces[..., 2]
if needs_masking:
pzs = np.ma.MaskedArray(pzs, mask=self._invalid_vertices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Tempting to move this down to line 692, where it's used for the first time.

Copy link
ContributorAuthor

@apaszkeapaszkeMar 23, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I felt like it's logically connected topfaces and it's used on lines 692, 725 and 729, so I didn't want to make the first use special in any way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I suggested this because it might be worth adding an optimization today along the lines of:

if numpy_version >= LooseVersion(1.17.0):   # use the where argument, skip the masked array entirelyelse:   ...

Happy to have that declared as out of scope

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm happy to do that, but I'm still a bit afraid that it will make the code even less readable. I'd have to insert the sameif on lines 692 and 729. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe try it and see if it actually helps with performance locally in any significant way? If not, then I think it's not worth it.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've tried this, and thewhere= reductions are indeed faster!

In [1]:importnumpyasnpIn [2]:maskf=np.zeros((399*399,4),dtype=bool)In [3]:maskt=np.ones((399*399,4),dtype=bool)In [4]:x=np.ones((399*399,4),dtype=float)In [5]:mx=np.ma.MaskedArray(x,maskf)In [6]:%timeitnp.max(mx,axis=-1)10.3ms ±441µsperloop (mean ±std.dev.of7runs,100loopseach)In [7]:%timeitnp.max(x,axis=-1,where=maskt,initial=0)7.09ms ±139µsperloop (mean ±std.dev.of7runs,100loopseach)

On the other hand, it looks likenp.average doesn't support thewhere= argument, so I don't think we can do this...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oh, that's a good catch. You canalmost get away with passing the mask into theweights argument of np.average, but that behaves poorly on NaNs.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hah, good point! And it looks like it still beats the masked array implementation, which is very surprising to me:

In [9]:%timeitnp.average(x,axis=-1,weights=maskt)8.69ms ±137µsperloop (mean ±std.dev.of7runs,100loopseach)In [10]:%timeitnp.average(mx,axis=-1)12.9ms ±370µsperloop (mean ±std.dev.of7runs,100loopseach)

But yes, there's an issue with invalid floating point values and there's no default argument we can pass in. I guess the latter problem could be fixed by using a wrapper likelambda *args, **kwargs: np.average(*args, weight=kwargs.pop('where', 1.), **kwargs) in the dict defining zsort functions.

So what's the final decision?

apaszkeand others added2 commitsMarch 23, 2020 14:05
for idx, ((xs, ys, zs), fc, ec)
in enumerate(zip(xyzlist, cface, cedge))),
key=lambda x: x[0], reverse=True)
face_z = self._zsortfunc(pzs, axis=-1)
Copy link
Contributor

@eric-wiesereric-wieserMar 23, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Perhaps a marker for future cleanup would be handy here.

Suggested change
face_z=self._zsortfunc(pzs,axis=-1)
# TODO: When we drop numpy < 1.17, investigate using the `where` argument to reductions
face_z=self._zsortfunc(pzs,axis=-1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Since matplotlib 3.5 we require numpy >= 1.17, so this can be updated immediately.

@QuLogic
Copy link
Member

I was thinking of rebasing#11577, but it looks like this PR chain might obsolete it. What is the status on this PR, besides needing a rebase?

@QuLogic
Copy link
Member

@apaszke Gentle ping?

@scottshambaugh
Copy link
Contributor

scottshambaugh commentedJan 7, 2025
edited
Loading

I've picked this up in#29397, building off this PR. Thank you@apaszke for the initial work here, it was super helpful in showing the way to some significant speedups!

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@eric-wiesereric-wiesereric-wieser left review comments

@anntzeranntzeranntzer left review comments

@timhoffmtimhoffmtimhoffm left review comments

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

6 participants
@apaszke@timhoffm@eric-wieser@QuLogic@scottshambaugh@anntzer

[8]ページ先頭

©2009-2025 Movatter.jp