Migration from spmatrix to sparray#

This document provides guidance for converting code from sparsematricesto sparsearrays inscipy.sparse.

The change from sparse matrices to sparse arrays mirrors conversion fromnp.matrix tonp.ndarray. Essentially we must move from an all-2Dmatrix-multiplication-centricmatrix object to a 1D or 2D “array”object that supports the matrix multiplication operator and elementwisecomputation.

Notation: For this guide we denote the sparse array classes generally assparray and the sparse matrix classesspmatrix. Dense numpyarrays are denotednp.ndarray and dense matrix classes arenp.matrix. Supported sparse formats are denoted BSR, COO, CSC, CSR,DIA, DOK, LIL and all formats are supported by both sparray andspmatrix. The termsparse refers to eithersparray orspmatrix, whiledense refers to eithernp.ndarray ornp.matrix.

Overview and big picture#

  • The constructor names*_matrix, e.g. csr_matrix, are changedto*_array.

  • spmatrixM is always 2D (rows x columns) even e.g. M.min(axis=0).sparrayA can be 1D or 2D.Numpy scalars are returned for full (0D) reductions, i.e.M.min().

  • Iterating over a sparray gives 1D sparrays. Iterating spmatrix gives 2D row spmatrices

  • Operators that change behavior are:*,@,*=,@=,**

    • Scalar multiplication, e.g.5*A, uses*, and5@A is notimplemented.

    • sparrays use* for elementwise multiplication and@ formatrix multiplication while spmatrices use either operator* or@ for matrix multiplication. Either can useA.multiply(B) for elementwise multiplication.

    • Scalar exponents, e.g.A**2, use elementwise power for sparray andmatrix power for spmatrix. Matrix power for sparrays usesscipy.sparse.linalg.matrix_power(A,n).

  • When index arrays are provided to the constructor functions, spmatrixselects a dtype based on dtype and values of the incoming arrays, whilesparray only considers the dtype of the incoming arrays. For example,M=csr_matrix((data,indices,indptr)) results inint32 dtype forM.indices so long as the values inindices andindptr are small,even if thedtype of the incoming arrays areint64. In contrast,A=csr_array((data,indices,indptr)) results inint64 dtype forA.indices when the input arrays areint64. This provides morepredictable, often larger, index dtypes in sparrays and less castingto match dtypes.

  • Checking the sparse type and format:

    • issparse(A) returnsTrue for any sparse array/matrix.

    • isspmatrix(M) returnsTrue for any sparse matrix.

    • isspmatrix_csr(M) checks for a sparse matrix with specific format.It should be replaced with an array compatible version such as:

    • issparse(A)andA.format=='csr' which checks for a CSR sparsearray/matrix.

  • Handling your software package API with sparse input/output:

    • Inputs are fairly easy to make work with either spmatrix or sparray. Solong as you useA.multiply(B) for elementwise andA@B for matrixmultiplication, and you usesparse.linalg.matrix_power for matrixpower, you should be fine after you complete the “first pass” of themigration steps described in the next section. Your code will handleboth types of inputs interchangeably.

    • Migrating sparse outputs from your functions requires a little more thought.Make a list of all your public functions that return spmatrix objects.Check whether you feel OK returning sparrays instead. That depends onyour library and its users. If you want to allow these functions tocontinue to return spmatrix or sparray objects, you can often do thatusing a sparse input that also serves as a signal for what type of outputshould be returned. Design your function to return the type that was input.That approach can be extended to dense inputs. If the input is an np.matrixor a masked array with np.matrix as its._baseclass attribute, thenreturn spmatrix. Otherwise return an sparray. Without those inputs, twoother approaches are to create a keyword argument to signal which to return,or create a new function (like we have done with, e.g.eye_array) thathas the same basic syntax, but returns sparray. Which method you chooseshould depend on your library and your users and your preferences.

Recommended steps for migration#

  • First pass (leaving spmatrix in the code):

    • In your spmatrix code, change* to@ for matrixmultiplication. Note that scalar multiplication with sparse shoulduse*. (See helper-codeUse tests to find * and ** spots below)

    • Matrix powers, e.g. M**3, should be converted toscipy.sparse.linalg.matrix_power(A,3)

    • Implement alternatives to unsupported functions/methods likeA.getnnz() ->A.nnz (seeRemoved methods and attributesbelow).

    • Change any logic regardingissparse() andisspmatrix() asneeded. Usually, this means replacingisspmatrix withissparse,andisspmatrix_csr(G) withissparse(G)andG.format=="csr".Moreoverisspmatrix_csr(G)orisspmatrix_csc(G) becomesissparse(G)andG.formatin['csr','csc'].The git search idiomgitgrep'isspm[a-z_]*(' can help find these.

    • Convert allspdiags calls todia_matrix.See docs inspdiags.A search forspdiags is all you need here.

    • Run all your tests on the resulting code. You are still usingspmatrix, not sparray. But your code and tests are prepared forthe change and you should be able to take sparrays as input to yourcode and have them mostly “just work”.

  • Second pass (switching to sparray):

    • Convert construction functions likediags andtriu to thearray version (seeDetails: construction functions below).

    • Rename all*_matrix constructor calls to*_array.

    • Check all functions/methods for which migration causes 1D returnvalues. These are mostly indexing and the reduction functions(seeDetails: shape changes and reductions below).

    • Check all places where you iterate over spmatrices and change themto account for the sparrays yielding 1D sparrays rather than 2D spmatrices.

    • Find and change places where your code makes use ofnp.matrixfeatures. Convert those tonp.ndarray features.

    • If your code reads sparse from files withmmread,hb_readorloadmat, use the new keyword argumentspmatrix=Falsein those functions to read to sparray.

    • If you use sparse libraries that only acceptint32 index arraysfor sparse representations, we suggest using just-in-time conversion.Convert toint32 just before you call the code that requiresint32.

    • sparray selects index dtype based on the dtype of the input array insteadof the values in the array. So if you want your index arrays to beint32,you will need to ensure anint32 dtype for each index array likeindptrthat you pass tocsr_array. Withspmatrix it is tempting to use thedefault int64 dtype fornumpy arrays and rely on the sparse constructorto downcast if the values were small. But this downcasting leads to extrarecasting when working with other matrices, slices or arithmetic expressions.Forsparray you can still rely on the constructors to choose dtypes. Butyou are also given the power to choose your index dtype via the dtype of theincoming index arrays rather than their values. So, if you wantint32,set the dtype, e.g.indices=np.array([1,3,6],dtype=np.int32) orindptr=np.arange(9,dtype=np.int32), when creating the index arrays.SeeIndex Array DTypes below for more info.In many settings, the index array dtype isn’t crucial and you can just letthe constructors choose the dtype for both sparray and spmatrix.

    • Test your code. Andread your code. You have migrated to sparray.

Details: construction functions#

These four functions are new and only handle sparrays:block_array,diags_array,eye_array, andrandom_array.Their signatures are:

defblock_array(blocks,format=None,dtype=None):defdiags_array(diagonals,/,*,offsets=0,shape=None,format=None,dtype=None):defeye_array(m,n=None,*,k=0,dtype=float,format=None):defrandom_array(shape,density=0.01,format='coo',dtype=None,rng=None,data_sampler=None):

Therandom_array function has ashape (2-tuple) arg rather thantwo integers. And therng arg defaults to NumPy’s newdefault_rng().This differs from the spmatrixrand andrandom which default tothe global RandomState instance. If you don’t care much about these things,leaving it as the default should work fine. If you care about seeding yourrandom numbers, you should probably add arng=... keyword argumentto this call when you switch functions. In summary, to migrate torandom_arraychange the function name, switch the shape argument to a single tuple argument,leave any other parameters as before, and think about whatsort ofrng= argument should be used, if any.

Thediags_array function uses keyword-only rules for arguments. So you haveto type theoffsets= in front of the offsets arguments. That seems like a painduring migration from usingdiags, but it helps avoid confusion and eases reading.A single shape parameter replaces two integers for this migration as well.

Existing functions that need careful migration#

These functions return sparray or spmatrix, depending on the input types theyreceive:kron,kronsum,hstack,vstack,block_diag,tril, andtriu. Their signatures are:

defkron(A,B,format=None):defkronsum(A,B,format=None):defhstack(blocks,format=None,dtype=None):defvstack(blocks,format=None,dtype=None):defblock_diag(mats,format=None,dtype=None):deftril(A,k=0,format=None):deftriu(A,k=0,format=None):

Use of these functions should be examined and inputs adjusted to ensure returnvalues are sparrays. And in turn the outputs should be treated as sparrays.To return sparrays, at least one input must be an sparray. If you uselist-of-lists or numpy arrays as input you should convert one of themto a sparse array to get sparse arrays out.

Functions that changed names for the migration#

Function

New function

Comments

eye

eye_array

identity

eye_array

diags

diags_array

keyword-only input

spdiags

dia_array

shape as 2-tuple

bmat

block

rand

random_array

shape as 2-tuple and default_rng

random

random_array

shape as 2-tuple and default_rng

Details: shape changes and reductions#

  • Construction using 1d-list of values:

    • csr_array([1,2,3]).shape==(3,) 1D input makes a 1D array.

    • csr_matrix([1,2,3]).shape==(1,3) 1D input makes a 2D matrix.

  • Indexing and iteration:

    • Indexing of sparray allows 1D objects which can be made 2D usingnp.newaxis orNone. E.g.,A[3,None,:] gives a 2Drow. Indexing of 2D sparray with implicit (not given) column indexgives a 1D result, e.g. A[3] (note: best not to do this - write it asA[3,:] instead). If you need a 2D result, usenp.newaxis, orNone in your index, or wrap the integer index as a list for whichfancy indexing gives 2D, e.g. A[[3],:].

    • Iteration over sparse object:next(M) yields a sparse 2D row matrix,next(A) yields a sparse 1D array.

  • Reduction operations along an axis reduce the shape:

    • M.min(axis=1) returns a 2D row matrix of the min along axis 1.

    • A.min(axis=1) returns a 1Dcoo_array of the min along axis 1.Some reductions return dense arrays/matrices instead of sparse ones:

      Method

      Result

      sum(axis)

      dense

      mean(axis)

      dense

      argmin(axis)

      dense

      argmax(axis)

      dense

      min(axis)

      sparse

      max(axis)

      sparse

      nanmin(axis)

      sparse

      nanmax(axis)

      sparse

    Generally, 2D sparray inputs lead to 1D results. 2D spmatrixinputs lead to 2D results.

  • Some reductions return a scalar. Those should behave as they didbefore and shouldn’t need to be considered during migration. E.g.A.min()

Removed methods and attributes#

  • The methodsget_shape,getrow,getcol,asfptype,getnnz,getH and the attributes.A and.H are only present on spmatrices,not sparrays. It is recommended that you replace usage of them withalternatives before starting the shift to sparray.

    Function

    Alternative

    M.get_shape()

    A.shape

    M.getformat()

    A.format

    M.asfptype(…)

    A.astype(…)

    M.getmaxprint()

    A.maxprint

    M.getnnz()

    A.nnz

    M.getnnz(axis)

    A.count_nonzero(axis)

    M.getH()

    A.conj().T

    M.getrow(i)

    A[i, :]

    M.getcol(j)

    A[:, j]

    M.A

    A.toarray()

    M.H

    A.conj().T

  • Shape assignment (M.shape=(2,6)) is not permitted for sparray.Instead you should useA.reshape.

  • M.getnnz() returns the number of stored values – not the numberof non-zeros.A.nnz does the same. To get the number ofnon-zeros, useA.count_nonzero(). This is not new to themigration, but can be confusing.

    To migrate from theaxis parameter ofM.getnnz(axis=...),you can useA.count_nonzero(axis=...)but it is not an exact replacement because it counts nonzerovalues instead of stored values. The difference is the numberof explicitly stored zero values. If you really want the numberof stored values by axis you will need to use some numpy tools.

    The numpy tools approach works for COO, CSR, CSC formats, so convertto one of them. For CSR and CSC, the major axis is compressed andnp.diff(A.indptr) returns a dense 1D array with the number ofstored values for each major axis value (row for CSR and columnfor CSC). The minor axes can be computed usingnp.bincount(A.indices,minlength=N) whereN is the lengthof the minor axis (e.g. A.shape[1] for CSR). Thebincountfunction works for any axis of COO format usingA.coords[axis]in place ofA.indices.

Use tests to find * and ** spots#

  • It can be tricky to distinguish scalar multiplication* frommatrix multiplciation* as you migrate your code. Python solvedthis, in theory, by introducing the matrix multiplication operator@.* is used for scalar multiplication while@ for matrixmultiplication. But converting expressions that use* for bothcan be tricky and cause eye strain. Luckily, if your code has atest suite that covers the expressions you need to convert, youcan use it to find places where* is being used for matrixmultiplication involving sparse matrices. Change those to@.

    The approach monkey-patches the spmatrix class dunder methodsto raise an exception when* is used for matrix multiplication(and not raise for scalar multiplication). The test suite willflag a failure at these locations. And a test failure is a successhere because it shows where to make changes. Change the offending* to@, look nearby for other similar changes, and run thetests again. Similarly, this approach helps find where** isused for matrix power. SciPy raises an exception when@ isused with for scalar multiplication, so that will catch places whereyou change when you shouldn’t have. So the test suite with thismonkey-patch checks the corrections too.

    Add the following code to yourconftest.py file.Then run your tests locally. If there are many matrix expressions,you might want to test one section of your codebase at a time.A quick read of the code shows that it raises aValueError whenever* is used between two matrix-like objects (sparse or dense),and whenever** is used for matrix power. It also produces a warningwhenever sum/mean/min/max/argmin/argmax are used with an axis so theoutput will be 2D with spmatrix and 1D with sparray. That means youcheck that the code will handle either 1D or 2D output viaflatten/ravel,np.atleast_2d or indexing.

    #================== Added to check spmatrix usage ========================importscipyfromwarningsimportwarndefflag_this_call(*args,**kwds):raiseValueError("Old spmatrix function names for rand/spdiags called")scipy.sparse._construct.rand=flag_this_callscipy.sparse._construct.spdiags=flag_this_callclass_strict_mul_mixin:def__mul__(self,other):ifnotscipy.sparse._sputils.isscalarlike(other):raiseValueError('Operator * used here! Change to @?')returnsuper().__mul__(other)def__rmul__(self,other):ifnotscipy.sparse._sputils.isscalarlike(other):raiseValueError('Operator * used here! Change to @?')returnsuper().__rmul__(other)def__imul__(self,other):ifnotscipy.sparse._sputils.isscalarlike(other):raiseValueError('Operator * used here! Change to @?')returnsuper().__imul__(other)def__pow__(self,*args,**kwargs):raiseValueError('spmatrix ** found! Use linalg.matrix_power?')@propertydefA(self):raiseTypeError('spmatrix A property found! Use .toarray()')@propertydefH(self):raiseTypeError('spmatrix H property found! Use .conjugate().T')defasfptype(self):raiseTypeError('spmatrix asfptype found! rewrite needed')defget_shape(self):raiseTypeError('spmatrix get_shape found! Use .shape')defgetformat(self):raiseTypeError('spmatrix getformat found! Use .format')defgetmaxprint(self):raiseTypeError('spmatrix getmaxprint found! Use .shape')defgetnnz(self):raiseTypeError('spmatrix getnnz found! Use .nnz')defgetH(self):raiseTypeError('spmatrix getH found! Use .conjugate().T')defgetrow(self):raiseTypeError('spmatrix getrow found! Use .row')defgetcol(self):raiseTypeError('spmatrix getcol found! Use .col')defsum(self,*args,**kwds):axis=args[0]iflen(args)==1elseargsifargselsekwds.get("axis",None)ifaxisisnotNone:warn(f"\nMIGRATION WARNING: spmatrix sum found using axis={axis}. ""\nsparray with a single axis will produce 1D output. ""\nCheck nearby to ensure 1D output is handled OK in this spot.\n")print(f"{args=}{axis=}{kwds=}")returnsuper().sum(*args,**kwds)defmean(self,*args,**kwds):axis=args[0]iflen(args)==1elseargsifargselsekwds.get("axis",None)ifaxisisnotNone:warn(f"\nMIGRATION WARNING: spmatrix mean found using axis={axis}.""\nsparray with a single axis will produce 1D output.\n""Check nearby to ensure 1D output is handled OK in this spot.\n")returnsuper().mean(*args,**kwds)defmin(self,*args,**kwds):axis=args[0]iflen(args)==1elseargsifargselsekwds.get("axis",None)ifaxisisnotNone:warn(f"\nMIGRATION WARNING: spmatrix min found using axis={axis}.""\nsparray with a single axis will produce 1D output. ""Check nearby to ensure 1D output is handled OK in this spot.\n")returnsuper().min(*args,**kwds)defmax(self,*args,**kwds):axis=args[0]iflen(args)==1elseargsifargselsekwds.get("axis",None)ifaxisisnotNone:warn(f"\nMIGRATION WARNING: spmatrix max found using axis={axis}.""\nsparray with a single axis will produce 1D output. ""Check nearby to ensure 1D output is handled OK in this spot.\n")returnsuper().max(*args,**kwds)defargmin(self,*args,**kwds):axis=args[0]iflen(args)==1elseargsifargselsekwds.get("axis",None)ifaxisisnotNone:warn(f"\nMIGRATION WARNING: spmatrix argmin found using axis={axis}.""\nsparray with a single axis will produce 1D output. ""Check nearby to ensure 1D output is handled OK in this spot.\n")returnsuper().argmin(*args,**kwds)defargmax(self,*args,**kwds):axis=args[0]iflen(args)==1elseargsifargselsekwds.get("axis",None)ifaxisisnotNone:warn(f"\nMIGRATION WARNING: spmatrix argmax found using axis={axis}.""\nsparray with a single axis will produce 1D output. ""Check nearby to ensure 1D output is handled OK in this spot.\n")returnsuper().argmax(*args,**kwds)classcoo_matrix_strict(_strict_mul_mixin,scipy.sparse.coo_matrix):passclassbsr_matrix_strict(_strict_mul_mixin,scipy.sparse.bsr_matrix):passclasscsr_matrix_strict(_strict_mul_mixin,scipy.sparse.csr_matrix):passclasscsc_matrix_strict(_strict_mul_mixin,scipy.sparse.csc_matrix):passclassdok_matrix_strict(_strict_mul_mixin,scipy.sparse.dok_matrix):passclasslil_matrix_strict(_strict_mul_mixin,scipy.sparse.lil_matrix):passclassdia_matrix_strict(_strict_mul_mixin,scipy.sparse.dia_matrix):passscipy.sparse.coo_matrix=scipy.sparse._coo.coo_matrix=coo_matrix_strictscipy.sparse.bsr_matrix=scipy.sparse._bsr.bsr_matrix=bsr_matrix_strictscipy.sparse.csr_matrix=scipy.sparse._csr.csr_matrix=csr_matrix_strictscipy.sparse.csc_matrix=scipy.sparse._csc.csc_matrix=csc_matrix_strictscipy.sparse.dok_matrix=scipy.sparse._dok.dok_matrix=dok_matrix_strictscipy.sparse.lil_matrix=scipy.sparse._lil.lil_matrix=lil_matrix_strictscipy.sparse.dia_matrix=scipy.sparse._dia.dia_matrix=dia_matrix_strictscipy.sparse._compressed.csr_matrix=csr_matrix_strictscipy.sparse._construct.bsr_matrix=bsr_matrix_strictscipy.sparse._construct.coo_matrix=coo_matrix_strictscipy.sparse._construct.csc_matrix=csc_matrix_strictscipy.sparse._construct.csr_matrix=csr_matrix_strictscipy.sparse._construct.dia_matrix=dia_matrix_strictscipy.sparse._extract.coo_matrix=coo_matrix_strictscipy.sparse._matrix.bsr_matrix=bsr_matrix_strictscipy.sparse._matrix.coo_matrix=coo_matrix_strictscipy.sparse._matrix.csc_matrix=csc_matrix_strictscipy.sparse._matrix.csr_matrix=csr_matrix_strictscipy.sparse._matrix.dia_matrix=dia_matrix_strictscipy.sparse._matrix.dok_matrix=dok_matrix_strictscipy.sparse._matrix.lil_matrix=lil_matrix_strictdelcoo_matrix_strictdelbsr_matrix_strictdelcsr_matrix_strictdelcsc_matrix_strictdeldok_matrix_strictdellil_matrix_strictdeldia_matrix_strict#==========================================

Index Array DTypes#

If you provide compressed indices to a constructor,e.g. csr_array((data,indices,indptr)) sparse arrays set the indexdtype by only checking the index arrays dtype, while sparse matrices check theindex values too and may downcast to int32(seegh-18509 for more details).This means you may get int64 indexing when you used to get int32.You can control this by setting thedtype before instantiating, or byrecasting after construction.

Two sparse utility functions can help with handling the index dtype.Useget_index_dtype(arrays,maxval,check_contents) while creating indicesto find an appropriate dtype (int32 or int64) to use for your compressed indices.

Usesafely_cast_index_arrays(A,idx_dtype) for recasting after construction,while making sure you con’t create overflows during downcasting.This function doesn’t actually change the input array. The cast arrays are returned.And copies are only made when needed. So you can check if casting was done usingifindicesisnotA.indices:

The function signatures are:

defget_index_dtype(arrays=(),maxval=None,check_contents=False):defsafely_cast_index_arrays(A,idx_dtype=np.int32,msg=""):

Example idioms include the following forget_index_dtype:

..code-block::python# select index dtype before construction based on shapeshape=(3,3)idx_dtype=scipy.sparse.get_index_dtype(maxval=max(shape))indices=np.array([0,1,0],dtype=idx_dtype)indptr=np.arange(3,dtype=idx_dtype)A=csr_array((data,indices,indptr),shape=shape)

and forsafely_cast_index_arrays:

..code-block::python# rescast after construction, raising exception if shape too bigindices,indptr=scipy.sparse.safely_cast_index_arrays(B,np.int32)B.indices,B.indptr=indices,indptr

Other#

  • Binary operators+,-,*,/,@,!=,> act on sparse and/or dense operands:

    • If all inputs are sparse, the output is usually sparse as well. Theexception being/ which returns dense (dividing by the defaultvalue0 isnan).

    • If inputs are mixed sparse and dense, the result is usually dense(i.e.,np.ndarray). Exceptions are* which is sparse, and/which is not implemented fordense/sparse, and returns sparse forsparse/dense.

  • Binary operators+,-,*,/,@,!=,> with array and/or matrix operands:

    • If all inputs are arrays, the outputs are arrays and the same is true formatrices.

    • When mixing sparse arrays with sparse matrices, the leading operandprovides the type for the output, e.g. sparray+spmatrix gives asparse array while reversing the order gives a sparse matrix.

    • When mixing dense matrices with sparse arrays, the results are usuallyarrays with exceptions for comparisons, e.g. > which return densematrices.

    • When mixing dense arrays with sparse matrices, the results are usuallymatrices with an exception forarray@sparsematrix which returns adense array.