The following is ~5x slower when A has negative strides:
```python
import numpy as np

A = np.zeros((512, 512))
x = np.zeros((512))

%timeit np.dot(A, x)
# 49.2 μs ± 19.2 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

A_flipped = A[::-1]
%timeit np.dot(A_flipped, x)
# 241 μs ± 37.9 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
I suspect this is due to a copy made before calling the BLAS GEMV routine, since GEMV does not accept negative strides for A. If that is the case, it is possible to tell BLAS to iterate over A with a positive LDA but a negative incy, which gives the same result without performing a copy.
```python
import numpy as np
from scipy.linalg.blas import dgemv

A = np.arange(9, dtype="float64").reshape((3, 3))
x = np.ones((3,))

y1 = np.empty(A.shape[0])
dgemv(1.0, A[::-1], x, 0.0, y1, overwrite_y=True)

y2 = np.empty(A.shape[0])
dgemv(1.0, A, x, 0.0, y2, incy=-1, overwrite_y=True)

np.testing.assert_allclose(y1, y2)
```
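The same identity can be applied at the user level in plain NumPy. This is a minimal sketch, assuming only the rows of A are reversed: undo the flip to get a view with positive strides, multiply, then reverse the result. The names `A_flipped`, `y_ref` and `y_view` are just for illustration, and whether this actually recovers the full speed would need to be timed.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((512, 512))
x = rng.standard_normal(512)

A_flipped = A[::-1]
# The row stride of the flipped view is negative.
print(A_flipped.strides)

# Reference result, computed directly on the negatively strided view.
y_ref = np.dot(A_flipped, x)

# Flip the rows back (a view with positive strides, no copy),
# multiply, then reverse the output vector (also a view).
y_view = np.dot(A_flipped[::-1], x)[::-1]

np.testing.assert_allclose(y_ref, y_view)
```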
If the columns have negative strides, one needs to iterate in reverse over x as well (a negative incx).
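For the column case, here is a sketch along the same lines as the example above, again going through SciPy's `dgemv` wrapper. It relies on the identity `A[:, ::-1] @ x == A @ x[::-1]`, with `incx=-1` asking BLAS to read x backwards. The values in `x` are arbitrary, chosen to be asymmetric so that the reversal is visible.

```python
import numpy as np
from scipy.linalg.blas import dgemv

A = np.arange(9, dtype="float64").reshape((3, 3))
x = np.array([1.0, 2.0, 3.0])

# Reference: multiply by the view with reversed columns.
y_ref = dgemv(1.0, A[:, ::-1], x)

# Same product using the original A, traversing x in reverse.
y_incx = dgemv(1.0, A, x, incx=-1)

np.testing.assert_allclose(y_ref, y_incx)
np.testing.assert_allclose(y_ref, A[:, ::-1] @ x)
```

If both the rows and the columns are reversed, the two tricks should combine: a negative incx for the input and a negative incy for the output.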
I don't know if this is easy/worth doing in numpy, just wanted to share.