Movatterモバイル変換

Vectorize patch extraction in Axes3D.plot_surface#16675

lib/mpl_toolkits/mplot3d/proj3d.py OutdatedShow resolvedHide resolved

This was referencedMar 5, 2020

Merged

Add a fast path for NumPy arrays to Collection.set_verts#16689

Merged

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/art3d.py OutdatedShow resolvedHide resolved

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/proj3d.py OutdatedShow resolvedHide resolved

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/art3d.py OutdatedShow resolvedHide resolved

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/art3d.pyShow resolvedHide resolved

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/proj3d.py Outdated


		Parameters
		----------
		vecs : 3xN np.ndarray or np.ma.MaskedArray

Copy link

Contributor

eric-wieserMar 11, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It might make more sense for this to take and return(..., 3) array, which would mean your caller does not need to reshape, and would make the below...

(doing it this way is consistent with how gufuncs behave in numpy, in that they care only about the last axis by default)

Copy link

ContributorAuthor

apaszkeMar 11, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Note that those reshapes should not trigger any reallocations, so I'm not sure if that would actually help.~~We could push the transposes into this function, but we cannot make them go away without changing the conventions aroundrenderer.M. Still, I don't think they cause any harm here.~~

Copy link

Contributor

eric-wieserMar 11, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This suggestion is more based off simplicity and convention than it is performance.

You're correct that reshapes typically are free, and.T is always free.

Copy link

ContributorAuthor

apaszkeMar 11, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've tried your suggestions and it seems like while the original version takes ~72ms on average, the proposed changes increase the runtime to ~127ms. This is likely due to strided memory patters that start pop up everywhere (padding the array, normalizing the product, reductions over z axis). I do agree that the code is nicer though. Please do let me know how to proceed.

Copy link

Contributor

eric-wieserMar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for profiling. Is that the runtime of the entire plot, or just this function?

Copy link

Contributor

eric-wieserMar 11, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Seems that something screwy is going on withmatmul(A, B) vsmatmul(B.T, A.T).T for 2dA andB.matmul is supposed to know they are equivalent and pick the faster one, but it seems to be doing the same thing for both.

Edit:numpy/numpy#15742, it only works with an explicitout argument.

Copy link

ContributorAuthor

apaszkeMar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

How large part of the slowdown is caused by matmul picking the wrong/having a slow kernel for this case then?

Copy link

Contributor

eric-wieserMar 11, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I can get a speedup with this mess:

def_proj_transform_vectors_new(vecs,M):is_masked=isinstance(vecs,np.ma.MaskedArray)ifis_masked:mask=vecs.mask# allocate with a convenient memory ordervecs_pad=np.moveaxis(np.empty((vecs.shape[-1]+1,)+vecs.shape[:-1]),0,-1)vecs_pad[..., :-1]=vecsvecs_pad[...,-1]=1product=np.empty_like(vecs_pad)np.matmul(vecs_pad,M.T,out=product)result=product[..., :3]/product[...,3:]ifis_masked:returnnp.ma.array(result,mask=mask)else:returnresultdefnew():# reshape is for speed only, it works correctly withoutpsegments=_proj_transform_vectors_new(segments.reshape(-1,3),M).reshape(segments.shape)epilogue(psegments)returnpsegments

Copy link

ContributorAuthor

apaszkeMar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This indeed yields a speedup, but a tiny one on my laptop:

New: 42.17msBaseline: 45.15msDiff: 2.98ms

I'm ok with going either way, just let me know 🤷‍♂

Copy link

Contributor

eric-wieserMar 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think my request would perhaps be just to move the reshapes you have at the call site inside the function - that way you can keep the conventional(...,N) API, but the implementation can be the easy-to-read and efficient code you have now.

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/proj3d.py OutdatedShow resolvedHide resolved

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/art3d.py OutdatedShow resolvedHide resolved

eric-wieser reviewed

Pad Axes3D.plot_surface inputs to allow for vectorized processing#16699

lib/mpl_toolkits/mplot3d/art3d.py OutdatedShow resolvedHide resolved

anntzer mentioned this pull request

Mar 13, 2020

Closed

6 tasks

apaszke mentioned this pull request

Mar 20, 2020

Speeding up Axes3D.plot_surface 4-8x#16659

Closed

Copy link

ContributorAuthor

apaszke commentedMar 21, 2020

It would be great if we could merge this, because it is necessary to actually benefit from the patches I'm preparing now.

timhoffm reviewed

Mar 21, 2020

lib/mpl_toolkits/mplot3d/art3d.py OutdatedShow resolvedHide resolved

apaszke force-pushed thevectorize_3d_projection branch from872818b tofbee126Compare

March 22, 2020 12:31

Copy link

ContributorAuthor

apaszke commentedMar 22, 2020

@eric-wieser @timhoffm I've addressed your requests, rebased on top of master and squashed the commits.

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/art3d.py Outdated

Comment on lines 678 to 689

		face_z = self._zsortfunc(psegments[..., 2], axis=-1)
		if isinstance(face_z, np.ma.MaskedArray):
		# NOTE: Unpacking .data is safe here, because every face has to
		# contain a valid vertex.
		face_z = face_z.data
		face_order = np.argsort(face_z, axis=-1)[::-1]

Copy link

Contributor

eric-wieserMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is the point here thatface_z has a single element per face, and is therefore safe?

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Exactly. My comment is based on the assumption that you don't have faces of length 0 (i.e. faces with all vertices masked). I think that's reasonable?

Copy link

Contributor

eric-wieserMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Alternatively, you could useface_z.filled(-np.inf), to be safe.

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Now that I think about it, I don't think that it matters? Faces with no valid vertices should get all vertices coordinates filled with NaN values before drawing making them invisible, so it doesn't matter where do they end up in the face order.

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/art3d.py Outdated

Comment on lines 680 to 687

		# NOTE: Unpacking .data is safe here, because every face has to
		# contain a valid vertex.

Copy link

Contributor

eric-wieserMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Are you confident enough about this to add

assert not face_z.mask.any()

If not, then it would be better to handle it.

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

See my comment above. I would be comfortable with adding this assert if we're ok with making an assumption that you can't have faces of length 0. This certainly doesn't happen forplot_surface and I don't see why would you do that in any other case too.

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/art3d.py Outdated

		if self._codes3d is not None:
		codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
		segments_2d = [s.compressed().reshape(-1, 2)

Copy link

Contributor

eric-wieserMar 22, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

compressed is going to do something bad if somehow only one component was masked (eg, it wasnan at construction).

Safer would be:

Suggested change

	segments_2d= [s.compressed().reshape(-1,2)
	segments_2d= [s.data[~s.mask.any(axis=-1)]

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

How would you like to interpret the data if one component was masked? That makes no sense because you get one point with only a single coordinate.

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oh you mean that multiple similar errors like this could cancel out? That is true, but it's such a strong internal invariant violation that I'm not sure if what your solution does (i.e. just take masked data without asking too many questions) is better? Another alternative would be to dos.data[~s.mask[..., 0]], as the last dimension of mask is always supposed to be constant.

Copy link

Contributor

eric-wieserMar 22, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You're right it makes no sense, but the way you have it written the behavior would be to interpret[[x1, y1], [x2, --], [x3, y3], [x4, --]] as[[x1, y1], [x2, x3], [y3, x4]], which is going to be very confusing to the user as it combines adjacent points.

Usingany as I do means that if the point is fully masked, its skipped, otherwise it propagates to the plotting anyway. Usingall would also be fine, if you want partial masks to skip plotting too. What's not ok is corrupting adjacent points which aren't bad.

If you really wanted to play it safe you could assert thatnp.all(mask.any(axis=-1) == mask.all(axis-1), but that would cost you a bunch of time.

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, I guess usingmask.all might be more reasonable. Would that be ok with you?

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've looked into it, and I'm not sure if we should be doing anything like that. Note that right below this line there is another comprehension that builds up a list of codes, each with length that is supposed to be equal to the length of this segment. So we can't just drop vertices arbitrarily and the only way to make this sane is to assume that the invariant I've mentioned holds. What do you think?

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/proj3d.py OutdatedShow resolvedHide resolved

Copy link

Member

timhoffm commentedMar 22, 2020

Please fix the flake8 errors:
./lib/mpl_toolkits/mplot3d/art3d.py:618:80: E501 line too long (83 > 79 characters)
./lib/mpl_toolkits/mplot3d/art3d.py:623:55: E231 missing whitespace after ','

apaszke force-pushed thevectorize_3d_projection branch from41f36ac to1e08084Compare

March 22, 2020 17:24

eric-wieser reviewed

lib/mpl_toolkits/mplot3d/art3d.py Outdated

		if self._codes3d is not None:
		codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
		segments_2d = [s.data[~s.mask.all(axis=-1)]

Copy link

Contributor

eric-wieserMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

With my implementation in the constructor, you now have the invariant thatmask.all(axis==-1) == mask[:,0], which if you wanted to you could use for a very marginal speedup:

Suggested change

	segments_2d= [s.data[~s.mask.all(axis=-1)]
	# this invariant is ensured by the constructor
	asserts.mask.strides[1]==0
	# which means that s.mask[:,i] is all the same location in memory, so there's no need to call `all`.
	segments_2d= [s.data[~s.mask[:,0]]

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Or I could put in the assert and usecompressed(), which avoids the unnecessary mask manipulation!

Copy link

Contributor

eric-wieserMar 22, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

compressed is internally more complex that that operation anyway though. You're comparingdata[~mask[:,0]] withdata.ravel()[mask.ravel()].reshape((-1, 2))

Copy link

Contributor

eric-wieserMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

In fact,mask.ravel() is now a lot more expensive thanmask[:,0], as it has to create a copy to handle the 0 stride.

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, here's what I came up with:

@@ -667,6 +667,7 @@ class Poly3DCollection(PolyCollection):             self._facecolors3d = self._facecolors            psegments = proj3d._proj_transform_vectors(self._segments, renderer.M)+        is_masked = isinstance(psegments, np.ma.MaskedArray)         num_faces = len(psegments)           # This extra fuss is to re-order face / edge colors@@ -681,19 +682,24 @@ class Poly3DCollection(PolyCollection):                 cedge = cedge.repeat(num_faces, axis=0)           face_z = self._zsortfunc(psegments[..., 2], axis=-1)-        if isinstance(face_z, np.ma.MaskedArray):+        if is_masked:             # NOTE: Unpacking .data is safe here, because every face has to             #       contain a valid vertex.             face_z = face_z.data         face_order = np.argsort(face_z, axis=-1)[::-1]+        segments_2d = psegments[face_order, :, :2]         if self._codes3d is not None:-            segments_2d = [s.data[~s.mask.all(axis=-1)]-                           for s in psegments[face_order, :, :2]]+            if is_masked:+                # NOTE: We cannot assert the same on segments_2d, as it is a+                #       result of advanced indexing, so its mask is a newly+                #       allocated tensor. However, both of those asserts are+                #       equivalent.+                assert psegments.mask.strides[-1] == 0+                segments_2d = [s.compressed().reshape(-1, 2) for s in segments_2d]             codes = [self._codes3d[idx] for idx in face_order]             PolyCollection.set_verts_and_codes(self, segments_2d, codes)         else:-            segments_2d = psegments[face_order, :, :2]             PolyCollection.set_verts(self, segments_2d, self._closed)           self._facecolors2d = cface[face_order]

Copy link

ContributorAuthor

apaszkeMar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Those.ravel()s are no-ops, becausesegments_2d comes from advanced indexing anyway, so I still went withcompressed().

Store vertices as (masked) arrays in Poly3DCollection

8633bbd

This removes a bunch of very expensive list comprehensions, droppingrendering times by half for some 3D surface plots.

apaszke force-pushed thevectorize_3d_projection branch from1e08084 to8633bbdCompare

March 22, 2020 18:35

Copy link

Contributor

eric-wieser commentedMar 22, 2020

Restarting an earlier conversation:

I'm not convinced masked arrays are the right approach here. Component-wise masking feels a little awkward. Would it be better to store a_segment_lengths array separately?

It would be a better model of the data, but that wouldn't allow doing vectorized Z-reductions, because there are no NumPy kernels for arrays with lengths.

Where exactly does this vectorized reduction occur? A option might be to store:

self._segment_3d_data= ...# (n_faces, max_verts, 3)self._vertex_mask= ...# (n_faces, max_verts, 1)

and reassemble them into a masked array only when needed. The assembly should be free, and viabroadcast_to you can avoid_vertex_mask ever tripling in size. It also means you can easily maintain your invariants.

Copy link

ContributorAuthor

apaszke commentedMar 22, 2020•
edited
Loading

This is the reduction. I think that keeping the mask along with the array really does make the code simpler, and I don't think it should add a lot of performance penalties (it should be waaaay better than without this patch). I do agree however that it may be simpler to keep the invariants if we separate those. I'm ok with doing that if you think that's the way to go. No strong preference from my side anymore.

Copy link

Contributor

eric-wieser commentedMar 22, 2020•
edited
Loading

If that's the only place, then you could consider

ifisinstance(segments3d,np.ma.MaskedArray):self._segment_3d=segments3d.data# (n_faces, max_verts, 3)self._vertex_valid=~segments3d.mask.any(axis-1)# (n_faces, max_verts)elifisinstance(segments3d,np.ndarray):self._segment_3d=segments3d# (n_faces, max_verts, 3)self._vertex_valid=np.True_else:# existing code without the broadcast_to, and an inverted condition

vertex_z=psegments[...,2]ifself._vertex_validisnotnp.True_:vertex_z=np.ma.array(psegments[...,2],mask=~self._vertex_valid)face_z=self._zsortfunc(vertex_z,axis=-1)

# the if is an optimization to avoid an unnecessary copy, it would work unconditionallyifself._vertex_validisnotnp.True_:segments_2d= [s[self.self._vertex_valid, :]forsinsegments_2d]

That second snippet becomes even simpler and 4x faster (on an array of 10000 elements) once the numpy version bumps:

vertex_z=psegments[...,2]face_z=self._zsortfunc(vertex_z,axis=-1,where=self._vertex_valid)

Copy link

ContributorAuthor

apaszke commentedMar 22, 2020

Yeah I think that's fair and makes things nicer in the end. I'll update the code tomorrow!

Copy link

ContributorAuthor

apaszke commentedMar 22, 2020

My only question is: why usenp.True_ instead ofTrue? Is it becauseTrue is not a validmask argument toma.MaskedArray?

Copy link

Contributor

eric-wieser commentedMar 23, 2020•
edited
Loading

I guess there's no reason to usenp.True_ any more. That's left from when I was thinking aboutnp.False_, which has special meaning for masked arrays asnp.ma.nomask. If anything, usingnp.True_ may not hit the fast-path ofwhere=, so I wouldn't recommend using it!

Don't store faces as masked arrays

06105a1

Copy link

ContributorAuthor

apaszke commentedMar 23, 2020

@eric-wieser Turns out it's not as simple as we thought — see the last commit. There's one extra reduction over the Z axis at the very end of the function, and we always have to take care of masking before we callsuper().set_verts. Still, this version is likely to be a little bit faster than the old one, so I'm leaving the choice up to you.

eric-wieser reviewed

Mar 23, 2020

lib/mpl_toolkits/mplot3d/art3d.py

Comment on lines +677 to +679

		pzs = pfaces[..., 2]
		if needs_masking:
		pzs = np.ma.MaskedArray(pzs, mask=self._invalid_vertices)

Copy link

Contributor

eric-wieserMar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Tempting to move this down to line 692, where it's used for the first time.

Copy link

ContributorAuthor

apaszkeMar 23, 2020•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I felt like it's logically connected topfaces and it's used on lines 692, 725 and 729, so I didn't want to make the first use special in any way.

Copy link

Contributor

eric-wieserMar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I suggested this because it might be worth adding an optimization today along the lines of:

if numpy_version >= LooseVersion(1.17.0):   # use the where argument, skip the masked array entirelyelse:   ...

Happy to have that declared as out of scope

Copy link

ContributorAuthor

apaszkeMar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm happy to do that, but I'm still a bit afraid that it will make the code even less readable. I'd have to insert the sameif on lines 692 and 729. WDYT?

Copy link

Contributor

eric-wieserMar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe try it and see if it actually helps with performance locally in any significant way? If not, then I think it's not worth it.

Copy link

ContributorAuthor

apaszkeMar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've tried this, and thewhere= reductions are indeed faster!

In [1]:importnumpyasnpIn [2]:maskf=np.zeros((399*399,4),dtype=bool)In [3]:maskt=np.ones((399*399,4),dtype=bool)In [4]:x=np.ones((399*399,4),dtype=float)In [5]:mx=np.ma.MaskedArray(x,maskf)In [6]:%timeitnp.max(mx,axis=-1)10.3ms ±441µsperloop (mean ±std.dev.of7runs,100loopseach)In [7]:%timeitnp.max(x,axis=-1,where=maskt,initial=0)7.09ms ±139µsperloop (mean ±std.dev.of7runs,100loopseach)

On the other hand, it looks likenp.average doesn't support thewhere= argument, so I don't think we can do this...

Copy link

Contributor

eric-wieserMar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oh, that's a good catch. You canalmost get away with passing the mask into theweights argument of np.average, but that behaves poorly on NaNs.

Copy link

ContributorAuthor

apaszkeMar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hah, good point! And it looks like it still beats the masked array implementation, which is very surprising to me:

In [9]:%timeitnp.average(x,axis=-1,weights=maskt)8.69ms ±137µsperloop (mean ±std.dev.of7runs,100loopseach)In [10]:%timeitnp.average(mx,axis=-1)12.9ms ±370µsperloop (mean ±std.dev.of7runs,100loopseach)

But yes, there's an issue with invalid floating point values and there's no default argument we can pass in. I guess the latter problem could be fixed by using a wrapper likelambda *args, **kwargs: np.average(*args, weight=kwargs.pop('where', 1.), **kwargs) in the dict defining zsort functions.

So what's the final decision?

eric-wieser reviewed

Mar 23, 2020

lib/mpl_toolkits/mplot3d/proj3d.py OutdatedShow resolvedHide resolved

apaszkeand others added2 commits

March 23, 2020 14:05

Update lib/mpl_toolkits/mplot3d/proj3d.py

076bfcb

Co-Authored-By: Eric Wieser <wieser.eric@gmail.com>

Ignore floating point errors

301b6fe

eric-wieser reviewed

Mar 23, 2020