Need for dealing with strides when writing performant code against multiple array libraries #782
-
Early on when putting the array API standard together, we made a decision that strides must not be exposed in the Python API, because multiple array libraries don't support them, and memory layout is more an implementation detail than an API that should be exposed to Python users. We made that decision so early that I believe there isn't even a dedicated issue for it; the closest one is gh-571 about C vs. Fortran memory order. It's also related to gh-24 (strides are mentioned explicitly there). The strides topic came up a couple of times recently though, and we should think about this more:
**Previous comments**

I'll copy some of the key comments below to have them in a single place.

From @jpivarski:

> The thing I was most surprised about in the read-through was that "strides" are not considered part of the array object. I recognize that some libraries (e.g. JAX) are able to do great things by not allowing strides to be free. But then, you show in the SciPy benchmark (Fig. 2c, 2d) that not having this handle implies a severe performance penalty, which is likely to be a major take-away from this paper by most readers. In the discussion, you say that it illustrates that users will need to apply library-specific optimizations after all. (Implementing a backend or runtime switching system is non-goal number 2.) It seems to me like strides-awareness would be a good candidate for one of the optional extensions, if extensions are not strictly sets of functions. Perhaps that's a question or suggestion I should take up with the Consortium's normal channels, not a review of the paper, but it's a question the paper leads me to have.

From @leofang:

> Regarding strides: I have conflicting views and will continue to self-debate. On the one hand, when over half of the array libraries (JAX, TF, Dask, cuNumeric) do not offer strides for various reasons (e.g. it makes no sense for distributed arrays), it's my opinion that even making them optional is not the right way to go, just like many other design considerations that we've made. Also, for those supporting the stride semantics, their strides are accessible via DLPack; it's just that currently it's hard to access them from within pure Python. On the other hand, as a low-level C++ library developer (in short, we write backends for the array libraries), I do see the pain point that it's hard for us to even imagine how our library would be adopted and consumed by the array libraries that do not have strides. It could be possible that our library simply cannot serve as the battery behind them due to different design philosophies, I don't know.
> Though, to be fair, the standard aims at the array libraries, not their backends, so if there's a standardization effort targeting array backends, it might be a better place for standardizing strides.

From @tylerjereddy:

From @rgommers:

> Let's have a look at this case; this is the code in question:
>
> ```python
> # Create strided array of data segments
> if nperseg == 1 and noverlap == 0:
>     result = x[..., np.newaxis]
> else:
>     # https://stackoverflow.com/a/5568169
>     step = nperseg - noverlap
>     shape = x.shape[:-1] + ((x.shape[-1] - noverlap) // step, nperseg)
>     strides = x.strides[:-1] + (step * x.strides[-1], x.strides[-1])
>     result = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides)
> ```
>
> The

**To discuss**

It's clear that strides cannot be universally supported, and equally clear that users who write algorithmic code may need to access strides in some cases to avoid significant performance loss. I think that, essentially, dealing with strides by hand for libraries that allow it is doing the work that in a more full-featured array library will/should be done by a compiler (e.g. the JIT/AOT compilers in JAX or PyTorch). We need to at least suggest and document some strategy for cases like the
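For reference, the segmentation above can be expressed without touching strides at all via NumPy's public `sliding_window_view` API (available since NumPy 1.20). This is a sketch of the equivalent computation, not SciPy's actual implementation; the `segment` helper name is made up here:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def segment(x, nperseg, noverlap):
    """Split the last axis of ``x`` into overlapping segments.

    Produces the same result as the as_strided trick quoted above,
    but through a public, strides-free API.
    """
    step = nperseg - noverlap
    # All windows of length nperseg along the last axis...
    windows = sliding_window_view(x, nperseg, axis=-1)
    # ...then keep every step-th window to get the requested overlap.
    return windows[..., ::step, :]

x = np.arange(10)
out = segment(x, nperseg=4, noverlap=2)
# out.shape == (4, 4); each segment starts 2 samples after the previous one
```

The result is still a zero-copy view internally, but the stride arithmetic stays inside the library rather than in user code, which is exactly the kind of higher-level API that strides-free libraries could also implement.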
Replies: 8 comments
-
Having a look at the usage frequency. Scikit-learn:

```
$ rg strides --type py
sklearn/datasets/tests/test_samples_generator.py
135:    signs = signs.view(dtype="|S{0}".format(signs.strides[0]))
sklearn/feature_extraction/image.py
284:    """Extracts patches of any n-dimensional array in place using strides.
326:    patch_strides = arr.strides
329:    indexing_strides = arr[slices].strides
336:    strides = tuple(list(indexing_strides) + list(patch_strides))
338:    patches = as_strided(arr, shape=shape, strides=strides)

$ rg as_strided --type py
sklearn/feature_extraction/image.py
16:from numpy.lib.stride_tricks import as_strided
338:    patches = as_strided(arr, shape=shape, strides=strides)
```

SciPy:

```
$ rg as_strided --type py
signal/_spectral_py.py
1958:        result = np.lib.stride_tricks.as_strided(x, shape=shape,
linalg/_special_matrices.py
5:from numpy.lib.stride_tricks import as_strided
218:    return as_strided(vals[len(c)-1:], shape=out_shp, strides=(-n, n)).copy()
259:    return as_strided(c_ext[L-1:], shape=(L, L), strides=(-n, n)).copy()
316:    return as_strided(vals, shape=out_shp, strides=(n, n)).copy()
signal/_signaltools.py
2124:    zi = np.lib.stride_tricks.as_strided(zi, expected_shape,
signal/tests/test_signaltools.py
445:    a = np.lib.stride_tricks.as_strided(a, shape=(n, 1000), strides=(8008, 8))
linalg/special_matrices.py
12:    'fiedler', 'fiedler_companion', 'convolution_matrix', 'as_strided'
interpolate/tests/test_fitpack.py
347:    from numpy.lib.stride_tricks import as_strided
354:    x = as_strided(np.zeros(()), shape=(size,))
```

There's some more usages of
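To illustrate what these call sites are doing: scikit-learn's patch extraction (the `as_strided` use in `sklearn/feature_extraction/image.py`) can, for the 2-D case, be reproduced with either raw stride arithmetic or the public `sliding_window_view` API. This is a sketch of the equivalence, not scikit-learn's actual code:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

img = np.arange(5 * 6).reshape(5, 6)
ph, pw = 2, 3  # patch height and width

# Manual stride arithmetic, as in the scikit-learn call site:
shape = (img.shape[0] - ph + 1, img.shape[1] - pw + 1, ph, pw)
strides = img.strides + img.strides  # outer strides step the patch origin
patches_strided = as_strided(img, shape=shape, strides=strides)

# The same result via the strides-free public API:
patches_view = sliding_window_view(img, (ph, pw))

assert np.array_equal(patches_strided, patches_view)
```

This suggests that many of the grep hits above are sliding-window constructions in disguise, which supports the idea of standardizing the use-case rather than the strides themselves.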
-
Interestingly enough, the second of the

It may be insightful to take the
-
Strides also come up a lot in broadcast implementations:

```python
>>> a = np.arange(2*3*5).reshape(2, 3, 5)
>>> b = np.array(100)
>>> aprime, bprime = np.broadcast_arrays(a, b)
>>> bprime.shape
(2, 3, 5)
>>> bprime.strides
(0, 0, 0)
```

I'm not sure if this is relevant (because it's an implementation detail), but in case it is, this is another big way that non-trivial strides enter into array libraries without JIT compilation/operator fusing.
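To make the memory behaviour here explicit, a small check (a sketch, assuming NumPy) shows that the zero-strided broadcast result is a view that reuses the scalar's single element rather than copying it 30 times:

```python
import numpy as np

a = np.arange(2 * 3 * 5).reshape(2, 3, 5)
b = np.array(100)
aprime, bprime = np.broadcast_arrays(a, b)

# Zero strides mean every index maps to the same underlying element,
# so the broadcast allocates no new data:
print(bprime.strides)               # (0, 0, 0)
print(np.shares_memory(bprime, b))  # True

# np.broadcast_to returns the same kind of zero-strided view,
# explicitly marked read-only to prevent surprising aliased writes:
c = np.broadcast_to(b, (2, 3, 5))
print(c.flags.writeable)            # False
```

This is why broadcasting can be in the standard even for libraries without strides: the *semantics* (shape rules) are universal, while the zero-stride trick is just one library's way of implementing them without copies.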
-
I don't think so; broadcasting is a separate topic from strides. Only the array shape is relevant to determine broadcasting. Strided array libraries do have certain behavior regarding the strides of broadcast arrays, as you pointed out. But all array libraries, strided or not, can support broadcasting, and thus it's already included in the standard.

FYI, this issue was discussed in the array API meeting this week 🙂 Someone will write up a summary later, so I wouldn't dare attempt to give a poor summary in my own words. But I did just come up with another reason against standardizing strides: strides are meaningful only for dense arrays/tensors. Suppose I wanted to write a multidimensional array library that is array-API compliant and has dual dense/sparse support under the hood, without users making any decision about whether an array should be stored in the dense format or in any of the known sparse formats (COO, CSR, ...; it gets complicated for N-D tensors). Then, clearly, I simply cannot return strides, because they are an implementation detail of my library that most users should not need to know or rely on.
-
I agree with what @leofang says. Everything up to

and it's very rarely needed to explicitly access or manipulate strides.
-
I suspect all of those uses of strides can be written using

In either case, the point is that

Memory order can be useful to ensure or to hint at (it even makes some sense for sparse, as CSC and CSR are basically the same thing as F vs. C order).
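As a concrete example of the "memory order as a hint, not strides" idea: NumPy already lets code request or test a layout through `order` arguments and contiguity flags, without ever reading raw stride values. A minimal sketch (NumPy-specific; the standard itself does not currently include these functions):

```python
import numpy as np

# Request Fortran (column-major) layout at creation time:
a = np.ones((3, 4), order="F")
print(a.flags.f_contiguous)   # True

# Request C (row-major) layout, copying only if necessary:
b = np.ascontiguousarray(a)
print(b.flags.c_contiguous)   # True

# The flags answer "is this layout X?" as booleans, never exposing
# the stride tuple itself -- the kind of order-level API a standard
# could plausibly offer even to libraries without a strides concept.
```

A library backed by sparse or distributed storage could simply treat such hints as no-ops or best-effort requests, which is not possible with a hard `strides` attribute.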
-
I like this idea: collecting the use-cases that strides have been used for, and defining APIs for them instead of for the strides themselves. (With 18 years of NumPy, and even more for its predecessors, we must have seen all the use-cases by now.) This originally came up in the context of the paper for SciPy; indicating this as a possible direction would satisfy the needs of the paper. (Maybe after more of the group signs off on it, or maybe you're already in agreement on this and I'm just slow to catch on.)
-
Not a summary, but one point I made there, and should probably mention here too, is that for the SciPy
This discussion was converted from issue #641 on April 04, 2024 07:57.