STUMPY API

Overview

`stumpy.stump`	Compute the z-normalized matrix profile
`stumpy.stumped`	Compute the z-normalized matrix profile with a `dask`/`ray` cluster
`stumpy.gpu_stump`	Compute the z-normalized matrix profile with one or more GPU devices
`stumpy.mass`	Compute the distance profile using the MASS algorithm
`stumpy.scrump`	A class to compute an approximate z-normalized matrix profile
`stumpy.stumpi`	A class to compute an incremental z-normalized matrix profile for streaming data
`stumpy.mstump`	Compute the multi-dimensional z-normalized matrix profile
`stumpy.mstumped`	Compute the multi-dimensional z-normalized matrix profile with a `dask`/`ray` cluster
`stumpy.subspace`	Compute the `k`-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
`stumpy.mdl`	Compute the multi-dimensional number of bits needed to compress one multi-dimensional subsequence with another along each of the `k` dimensions using the minimum description length (MDL)
`stumpy.atsc`	Compute the anchored time series chain (ATSC)
`stumpy.allc`	Compute the all-chain set (ALLC)
`stumpy.fluss`	Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
`stumpy.floss`	A class to compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data
`stumpy.ostinato`	Find the z-normalized consensus motif of multiple time series
`stumpy.ostinatoed`	Find the z-normalized consensus motif of multiple time series with a `dask`/`ray` cluster
`stumpy.gpu_ostinato`	Find the z-normalized consensus motif of multiple time series with one or more GPU devices
`stumpy.mpdist`	Compute the z-normalized matrix profile distance (MPdist) measure between any two time series
`stumpy.mpdisted`	Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a `dask`/`ray` cluster
`stumpy.gpu_mpdist`	Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices
`stumpy.motifs`	Discover the top motifs for time series `T`
`stumpy.match`	Find all matches of a query `Q` in a time series `T`
`stumpy.mmotifs`	Discover the top motifs for the multi-dimensional time series `T`
`stumpy.snippets`	Identify the top `k` snippets that best represent the time series, `T`
`stumpy.stimp`	A class to compute the Pan Matrix Profile
`stumpy.stimped`	A class to compute the Pan Matrix Profile with a `dask`/`ray` cluster
`stumpy.gpu_stimp`	A class to compute the Pan Matrix Profile with one or more GPU devices

stump

stumpy.stump(T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized `_stump` function, which computes the (top-k) matrix profile according to STOMPopt with Pearson correlations.

Parameters:

T_A (numpy.ndarray): The time series or sequence for which to compute the matrix profile.
m (int): Window size.
T_B (numpy.ndarray, default None): The time series or sequence that will be used to annotate `T_A`. For every subsequence in `T_A`, its nearest neighbor in `T_B` will be recorded. Default is `None`, which corresponds to a self-join.
ignore_trivial (bool, default True): Set to `True` if this is a self-join. Otherwise, for an AB-join, set this to `False`.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
k (int, default 1): The number of top `k` smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when `k > 1`. If you have access to a GPU device, then you may be able to leverage `gpu_stump` for better performance and scalability.
T_A_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_A` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_A` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
T_B_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_B` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_B` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
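As an illustration of the callable form of `T_A_subseq_isconstant`/`T_B_subseq_isconstant`, the sketch below defines a user function that takes `a` and `w` plus one extra argument, and then curries that extra argument with `functools.partial`. The function name `rolling_isconstant` and its `tol` argument are hypothetical and chosen only for this example; the "range at most `tol`" rule is an assumption about what one might want to treat as constant, not STUMPY's own definition.

```python
from functools import partial

import numpy as np


def rolling_isconstant(a, w, tol=0.0):
    """Return one boolean per length-`w` subsequence of the 1-D array `a`.

    `tol` is a hypothetical extra argument: a subsequence counts as
    constant when its range (max - min) is at most `tol`.
    """
    windows = np.lib.stride_tricks.sliding_window_view(a, w)
    is_finite = np.all(np.isfinite(windows), axis=1)
    # Subsequences containing nan/inf are reported as not constant (False)
    with np.errstate(invalid="ignore"):
        spread = windows.max(axis=1) - windows.min(axis=1)
        return is_finite & (spread <= tol)


# Curry the extra argument so that the callable only takes (a, w)
isconstant_func = partial(rolling_isconstant, tol=0.5)

T = np.array([1.0, 1.2, 1.1, 5.0, np.nan, 2.0, 2.0, 2.0])
mask = isconstant_func(T, 3)  # one entry per subsequence: len(T) - 3 + 1
```

A callable built this way has the two-argument shape that the parameter description asks for, so it could be passed as `T_A_subseq_isconstant=isconstant_func`.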

Returns:

out (numpy.ndarray)

When `k = 1` (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when `k > 1`, the output array will contain exactly `2 * k + 2` columns. The first `k` columns (i.e., `out[:, :k]`) consist of the top-k matrix profile, the next set of `k` columns (i.e., `out[:, k:2*k]`) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., `out[:, 2*k]` and `out[:, 2*k+1]` or, equivalently, `out[:, -2]` and `out[:, -1]`) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.

For convenience, the matrix profile (distances) and matrix profile indices can also be accessed via their corresponding named array attributes, `.P_` and `.I_`, respectively. Similarly, the corresponding left matrix profile indices and right matrix profile indices may also be accessed via the `.left_I_` and `.right_I_` array attributes. See examples below.
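The column layout for `k > 1` can be sketched with plain NumPy slicing. The array `out` below is a made-up stand-in with the documented `2 * k + 2` columns (its values are meaningless); it is not produced by STUMPY and only shows which slices hold which quantity.

```python
import numpy as np

k = 3
n_subseq = 5
# Stand-in for a top-k output array with 2 * k + 2 columns (values are made up)
out = np.arange(n_subseq * (2 * k + 2), dtype=float).reshape(n_subseq, 2 * k + 2)

P = out[:, :k]               # top-k matrix profile
I = out[:, k : 2 * k]        # corresponding top-k matrix profile indices
left_I = out[:, 2 * k]       # top-1 left matrix profile indices
right_I = out[:, 2 * k + 1]  # top-1 right matrix profile indices
```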

See also

stumpy.stumped: Compute the z-normalized matrix profile with a `dask`/`ray` cluster
stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump: Compute an approximate z-normalized matrix profile

Notes

DOI: 10.1007/s10115-017-1138-x

See Section 4.5

The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.

DOI: 10.1145/3357223.3362721

See Section 3.1 and Section 3.3

The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.

DOI: 10.1109/ICDM.2016.0085

See Table II

Time series, `T_A`, will be annotated with the distance location (or index) of all its subsequences in another time series, `T_B`.

Return: For every subsequence, `Q`, in `T_A`, you will get a distance and index for the closest subsequence in `T_B`. Thus, the array returned will have length `T_A.shape[0] - m + 1`. Additionally, the left and right matrix profiles are also returned.

Note: Unlike in Table II, where `T_A.shape` is expected to be equal to `T_B.shape`, this implementation is generalized so that the shapes of `T_A` and `T_B` can be different. In the case where `T_A.shape == T_B.shape`, our algorithm reduces down to the same algorithm found in Table II.

Additionally, unlike STAMP where the exclusion zone is `m/2`, the default exclusion zone for STOMP is `m/4` (See Definition 3 and Figure 3).

For self-joins, set `ignore_trivial=True` in order to avoid the trivial match.
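To make the AB-join description concrete, here is a brute-force sketch in plain NumPy, assuming no subsequence is constant (so every standard deviation is non-zero). The helper names `znorm` and `ab_join_brute_force` are hypothetical, and this quadratic loop is only an illustration of what is being computed, not the STOMPopt implementation. Note that `T_A` and `T_B` have different lengths and that the result has `T_A.shape[0] - m + 1` entries.

```python
import numpy as np


def znorm(x):
    # Assumes x is not constant (non-zero standard deviation)
    return (x - x.mean()) / x.std()


def ab_join_brute_force(T_A, T_B, m):
    """For every subsequence in T_A, find its nearest neighbor in T_B."""
    n_A = T_A.shape[0] - m + 1
    n_B = T_B.shape[0] - m + 1
    P = np.full(n_A, np.inf)             # nearest-neighbor distances
    I = np.full(n_A, -1, dtype=np.int64) # nearest-neighbor indices in T_B
    for i in range(n_A):
        Q = znorm(T_A[i : i + m])
        for j in range(n_B):
            d = np.linalg.norm(Q - znorm(T_B[j : j + m]))
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I


T_A = np.array([1.0, 2.0, 4.0, 8.0, 3.0])
T_B = np.array([9.0, 0.0, 10.0, 20.0, 40.0, 7.0, 1.0])
P, I = ab_join_brute_force(T_A, T_B, m=3)
```

Because z-normalization removes offset and scale, `[1, 2, 4]` in `T_A` and `[10, 20, 40]` in `T_B` are at (numerically) zero distance from each other.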

Note that left and right matrix profiles are only available for self-joins.

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])
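For readers who want to see where the four output columns of a self-join come from, the following brute-force sketch computes them directly from the full distance matrix. It is not how `stump` works internally (which follows STOMPopt); the function name `self_join_brute_force` is hypothetical, the exclusion zone is assumed to be `ceil(m / 4)` with positions at distance less than or equal to it excluded, and constant subsequences are assumed not to occur.

```python
import numpy as np


def znorm(x):
    # Assumes x is not constant (non-zero standard deviation)
    return (x - x.mean()) / x.std()


def self_join_brute_force(T, m, excl_zone_denom=4):
    n = T.shape[0] - m + 1
    excl_zone = int(np.ceil(m / excl_zone_denom))  # assumed default: m / 4
    Z = np.array([znorm(T[i : i + m]) for i in range(n)])
    # D[i, j] is the z-normalized Euclidean distance between subsequences i and j
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    idx = np.arange(n)
    # Exclude trivial matches around the main diagonal
    D[np.abs(idx[:, None] - idx[None, :]) <= excl_zone] = np.inf

    def nearest(mask):
        masked = np.where(mask, D, np.inf)
        I = masked.argmin(axis=1)
        P = masked[idx, I]
        I[np.isinf(P)] = -1  # no admissible neighbor
        return P, I

    P, I = nearest(np.ones((n, n), dtype=bool))
    _, left_I = nearest(idx[None, :] < idx[:, None])   # neighbors with j < i
    _, right_I = nearest(idx[None, :] > idx[:, None])  # neighbors with j > i
    return P, I, left_I, right_I


T = np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0])
P, I, left_I, right_I = self_join_brute_force(T, m=3)
```

Under these assumptions, the four arrays agree with the columns of the `mparray` shown in the example for this function.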

stumped

stumpy.stumped(client, T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile with a `dask`/`ray` cluster

This is a highly distributed implementation around the Numba JIT-compiled parallelized `_stump` function, which computes the (top-k) matrix profile according to STOMPopt with Pearson correlations.

Parameters:

client (client): A `dask`/`ray` client. Setting up a cluster is beyond the scope of this library. Please refer to the `dask`/`ray` documentation.
T_A (numpy.ndarray): The time series or sequence for which to compute the matrix profile.
m (int): Window size.
T_B (numpy.ndarray, default None): The time series or sequence that will be used to annotate `T_A`. For every subsequence in `T_A`, its nearest neighbor in `T_B` will be recorded. Default is `None`, which corresponds to a self-join.
ignore_trivial (bool, default True): Set to `True` if this is a self-join. Otherwise, for an AB-join, set this to `False`.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
k (int, default 1): The number of top `k` smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when `k > 1`. If you have access to a GPU device, then you may be able to leverage `gpu_stump` for better performance and scalability.
T_A_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_A` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_A` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
T_B_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_B` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_B` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
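The role of `p` in the non-normalized case can be illustrated with a small NumPy sketch of a Minkowski distance profile. The function name `minkowski_distance_profile` is hypothetical and the code is only meant to show how `p = 1` and `p = 2` yield the Manhattan and Euclidean distances; it makes no claim about how the non-normalized STUMPY functions are implemented.

```python
import numpy as np


def minkowski_distance_profile(Q, T, p=2.0):
    """Minkowski (p-norm) distance between Q and every subsequence of T."""
    windows = np.lib.stride_tricks.sliding_window_view(T, Q.shape[0])
    return np.sum(np.abs(windows - Q) ** p, axis=1) ** (1.0 / p)


T = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Q = np.array([1.0, 2.0])
manhattan = minkowski_distance_profile(Q, T, p=1.0)  # sum of absolute differences
euclidean = minkowski_distance_profile(Q, T, p=2.0)  # square root of summed squares
```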

Returns:

out (numpy.ndarray)

When `k = 1` (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when `k > 1`, the output array will contain exactly `2 * k + 2` columns. The first `k` columns (i.e., `out[:, :k]`) consist of the top-k matrix profile, the next set of `k` columns (i.e., `out[:, k:2*k]`) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., `out[:, 2*k]` and `out[:, 2*k+1]` or, equivalently, `out[:, -2]` and `out[:, -1]`) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.

See also

stumpy.stump: Compute the z-normalized matrix profile
stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump: Compute an approximate z-normalized matrix profile

Notes

DOI: 10.1007/s10115-017-1138-x

See Section 4.5

The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.

DOI: 10.1145/3357223.3362721

See Section 3.1 and Section 3.3

DOI: 10.1109/ICDM.2016.0085

See Table II

This is a `dask`/`ray` implementation of stump that scales across multiple servers and is a convenience wrapper around the parallelized `stump._stump` function.

Time series, `T_A`, will be annotated with the distance location (or index) of all its subsequences in another time series, `T_B`.

Additionally, unlike STAMP where the exclusion zone is `m/2`, the default exclusion zone for STOMP is `m/4` (See Definition 3 and Figure 3).

For self-joins, set `ignore_trivial=True` in order to avoid the trivial match.

Note that left and right matrix profiles are only available for self-joins.

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         mp = stumpy.stumped(
...             dask_client,
...             np.array([584., -11., 23., 79., 1001., 0., -19.]),
...             m=3)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])

Alternatively, you can also use `ray`:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.stumped(
...         ray,
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3)
...     ray.shutdown()

gpu_stump

stumpy.gpu_stump(T_A, m, T_B=None, ignore_trivial=True, device_id=0, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile with one or more GPU devices

This is a convenience wrapper around the Numba `cuda.jit` `_gpu_stump` function, which computes the matrix profile according to GPU-STOMP. The default number of threads-per-block is set to `512` and may be changed by setting the global parameter `config.STUMPY_THREADS_PER_BLOCK` to an appropriate number based on your GPU hardware.

Parameters:

T_A (numpy.ndarray): The time series or sequence for which to compute the matrix profile.
m (int): Window size.
T_B (numpy.ndarray, default None): The time series or sequence that will be used to annotate `T_A`. For every subsequence in `T_A`, its nearest neighbor in `T_B` will be recorded. Default is `None`, which corresponds to a self-join.
ignore_trivial (bool, default True): Set to `True` if this is a self-join. Otherwise, for an AB-join, set this to `False`.
device_id (int or list, default 0): The (GPU) device number to use. The default value is `0`. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing `[device.id for device in numba.cuda.list_devices()]`.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
k (int, default 1): The number of top `k` smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when `k > 1`.
T_A_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_A` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_A` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
T_B_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_B` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_B` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.

Returns:

out (numpy.ndarray)

When `k = 1` (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when `k > 1`, the output array will contain exactly `2 * k + 2` columns. The first `k` columns (i.e., `out[:, :k]`) consist of the top-k matrix profile, the next set of `k` columns (i.e., `out[:, k:2*k]`) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., `out[:, 2*k]` and `out[:, 2*k+1]` or, equivalently, `out[:, -2]` and `out[:, -1]`) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.

See also

stumpy.stump: Compute the z-normalized matrix profile
stumpy.stumped: Compute the z-normalized matrix profile with a `dask`/`ray` cluster
stumpy.scrump: Compute an approximate z-normalized matrix profile

Notes

DOI: 10.1109/ICDM.2016.0085

See Table II, Figure 5, and Figure 6

Time series, `T_A`, will be annotated with the distance location (or index) of all its subsequences in another time series, `T_B`.

Additionally, unlike STAMP where the exclusion zone is `m/2`, the default exclusion zone for STOMP is `m/4` (See Definition 3 and Figure 3).

For self-joins, set `ignore_trivial=True` in order to avoid the trivial match.

Note that left and right matrix profiles are only available for self-joins.

Examples

>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     mp = stumpy.gpu_stump(
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3,
...         device_id=all_gpu_devices)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])

mass

stumpy.mass(Q, T, M_T=None, Σ_T=None, normalize=True, p=2.0, T_subseq_isfinite=None, T_subseq_isconstant=None, Q_subseq_isconstant=None, query_idx=None)

Compute the distance profile using the MASS algorithm

This is a convenience wrapper around the Numba JIT-compiled `_mass` function.

Parameters:

Q (numpy.ndarray): Query array or subsequence.
T (numpy.ndarray): Time series or sequence.
M_T (numpy.ndarray, default None): Sliding mean of `T`.
Σ_T (numpy.ndarray, default None): Sliding standard deviation of `T`.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. This parameter is ignored when `normalize == True`.
T_subseq_isfinite (numpy.ndarray, default None): A boolean array that indicates whether a subsequence in `T` contains a `np.nan`/`np.inf` value (`False`). This parameter is ignored when `normalize == True`.
T_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
Q_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether the subsequence in `Q` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether the subsequence in `Q` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
query_idx (int, default None): This is the index position along the time series, `T`, where the query subsequence, `Q`, is located. `query_idx` should be set to `None` if `Q` is not a subsequence of `T`. If `Q` is a subsequence of `T`, providing this argument is optional. If `query_idx` is provided, the distance between `Q` and `T[query_idx:query_idx+m]` will automatically be set to zero.

Returns:

distance_profile (numpy.ndarray): Distance profile.

See also

stumpy.motifs: Discover the top motifs for time series `T`
stumpy.match: Find all matches of a query `Q` in a time series `T`

Notes

DOI: 10.1109/ICDM.2016.0179

See Table II

Note that `Q`, `T` are not directly required to calculate `D`.

Note: Unlike the Matrix Profile I paper, here, `M_T`, `Σ_T` can be calculated once for all subsequences of `T` and passed in so the redundancy is removed.
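The point about computing `M_T` and `Σ_T` once can be sketched as follows. This is a plain NumPy illustration, not the `_mass` implementation: the helper names `sliding_mean_std` and `distance_profile` are hypothetical, the sliding dot product is computed with `np.correlate` rather than an FFT, and the standard identity that the squared z-normalized distance equals `2 * m * (1 - ρ)`, with `ρ` the Pearson correlation, is assumed, along with non-constant subsequences.

```python
import numpy as np


def sliding_mean_std(T, m):
    """Sliding mean and standard deviation of T for window size m."""
    windows = np.lib.stride_tricks.sliding_window_view(T, m)
    return windows.mean(axis=1), windows.std(axis=1)


def distance_profile(Q, T, M_T=None, Σ_T=None):
    """z-normalized distance between Q and every subsequence of T."""
    m = Q.shape[0]
    if M_T is None or Σ_T is None:
        M_T, Σ_T = sliding_mean_std(T, m)
    QT = np.correlate(T, Q, mode="valid")  # sliding dot product of Q with T
    rho = (QT - m * Q.mean() * M_T) / (m * Q.std() * Σ_T)
    # Clip tiny negative values caused by floating point rounding
    return np.sqrt(np.clip(2 * m * (1.0 - rho), 0.0, None))


T = np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0])
Q = np.array([-11.1, 23.4, 79.5, 1001.0])

# The sliding statistics depend only on T and m, so they can be reused
M_T, Σ_T = sliding_mean_std(T, Q.shape[0])
D = distance_profile(Q, T, M_T, Σ_T)
```

When many queries of the same length are compared against the same `T`, the pair `M_T`, `Σ_T` only needs to be computed once and can be passed in for each query.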

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.mass(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.]))
array([3.18792463e+00, 1.11297393e-03, 3.23874018e+00, 3.34470195e+00])

scrump

stumpy.scrump(T_A, m, T_B=None, ignore_trivial=True, percentage=0.01, pre_scrump=False, s=None, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

A class to compute an approximate z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized `_stump` function, which computes the matrix profile according to SCRIMP.

Parameters:

T_A (numpy.ndarray): The time series or sequence for which to compute the matrix profile.
T_B (numpy.ndarray): The time series or sequence that will be used to annotate `T_A`. For every subsequence in `T_A`, its nearest neighbor in `T_B` will be recorded.
m (int): Window size.
ignore_trivial (bool): Set to `True` if this is a self-join. Otherwise, for an AB-join, set this to `False`.
percentage (float): Approximate percentage completed. The value is between `0.0` and `1.0`.
pre_scrump (bool): A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to `True`, this is equivalent to computing SCRIMP++ and may lead to faster convergence.
s (int): The size of the PreSCRIMP fixed interval. If `pre_scrump=True` and `s=None`, then `s` will automatically be set to `s=int(np.ceil(m/config.STUMPY_EXCL_ZONE_DENOM))`, which is the size of the exclusion zone.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this class gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` class decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
k (int, default 1): The number of top `k` smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when `k > 1`.
T_A_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_A` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_A` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
T_B_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_B` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_B` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.

Attributes:

P_ (numpy.ndarray): Get the updated (top-k) matrix profile.
I_ (numpy.ndarray): Get the updated (top-k) matrix profile indices.
left_I_ (numpy.ndarray): Get the updated left (top-1) matrix profile indices.
right_I_ (numpy.ndarray): Get the updated right (top-1) matrix profile indices.

Methods

update()

Update the matrix profile and the matrix profile indices by computing additional new distances (limited by `percentage`) that make up the full distance matrix. It updates the (top-k) matrix profile, (top-1) left matrix profile, (top-1) right matrix profile, (top-k) matrix profile indices, (top-1) left matrix profile indices, and (top-1) right matrix profile indices.
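The idea of filling in only a fraction of the distance matrix can be sketched by visiting a random subset of its diagonals. The code below is a simplified illustration, not the SCRIMP implementation: the function name `approx_profile`, the `seed` argument, and the rule "keep `ceil(percentage * number_of_diagonals)` diagonals" are assumptions made for this example, as is an exclusion zone of `ceil(m / 4)` and the absence of constant subsequences.

```python
import numpy as np


def znorm(x):
    # Assumes x is not constant (non-zero standard deviation)
    return (x - x.mean()) / x.std()


def approx_profile(T, m, percentage, seed=0):
    """Self-join matrix profile from a random subset of diagonals."""
    rng = np.random.default_rng(seed)
    n_sub = T.shape[0] - m + 1
    excl_zone = int(np.ceil(m / 4))
    Z = np.array([znorm(T[i : i + m]) for i in range(n_sub)])
    P = np.full(n_sub, np.inf)
    I = np.full(n_sub, -1, dtype=np.int64)

    # Diagonals outside the exclusion zone, visited in random order
    diags = np.arange(excl_zone + 1, n_sub)
    rng.shuffle(diags)
    n_keep = int(np.ceil(len(diags) * percentage))

    for d in diags[:n_keep]:
        for i in range(n_sub - d):
            j = i + d
            dist = np.linalg.norm(Z[i] - Z[j])
            # Each distance can improve the profile at both i and j
            if dist < P[i]:
                P[i], I[i] = dist, j
            if dist < P[j]:
                P[j], I[j] = dist, i
    return P, I


T = np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0])
P_partial, I_partial = approx_profile(T, m=3, percentage=0.01)
P_exact, I_exact = approx_profile(T, m=3, percentage=1.0)
```

Entries that have not been reached yet stay at `inf` with index `-1`, and every approximate distance is an upper bound on the exact one, since visiting more diagonals can only lower a value.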

See also

stumpy.stump: Compute the z-normalized matrix profile
stumpy.stumped: Compute the z-normalized matrix profile with a `dask`/`ray` cluster
stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2018.00099

See Algorithm 1 and Algorithm 2

Examples

>>> import stumpy
>>> import numpy as np
>>> approx_mp = stumpy.scrump(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     m=3)
>>> approx_mp.update()
>>> approx_mp.P_
array([2.982409  , 3.28412702,        inf, 2.982409  , 3.28412702])
>>> approx_mp.I_
array([ 3,  4, -1,  0,  1])

stumpi

stumpy.stumpi(T, m, egress=True, normalize=True, p=2.0, k=1, mp=None, T_subseq_isconstant_func=None)

A class to compute an incremental z-normalized matrix profile for streaming data

This is based on the on-line STOMPI and STAMPI algorithms.

Parameters:

T (numpy.ndarray): The time series or sequence for which the matrix profile and matrix profile indices will be returned.
m (int): Window size.
egress (bool, default True): If set to `True`, the oldest data point in the time series is removed and the time series length remains constant rather than forever increasing.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this class gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` class decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. This parameter is ignored when `normalize == True`.
k (int, default 1): The number of top `k` smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when `k > 1`.
mp (numpy.ndarray, default None): A pre-computed matrix profile (and corresponding matrix profile indices). This is a 2D array of shape `(len(T) - m + 1, 2 * k + 2)`, where the first `k` columns are the top-k matrix profile, and the next `k` columns are their corresponding indices. The last two columns correspond to the top-1 left and top-1 right matrix profile indices. When `None` (default), this array is computed internally using `stumpy.stump`.
T_subseq_isconstant_func (function, default None): A custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.

Attributes:

P_ (numpy.ndarray): Get the (top-k) matrix profile.
I_ (numpy.ndarray): Get the (top-k) matrix profile indices.
left_P_ (numpy.ndarray): Get the (top-1) left matrix profile.
left_I_ (numpy.ndarray): Get the (top-1) left matrix profile indices.
T_ (numpy.ndarray): Get the time series.

Methods

update(t)

Append a single new data point, `t`, to the time series, `T`, and update the matrix profile.
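What an incremental update with `egress=True` has to do for the left matrix profile can be sketched with a small brute-force class. This is not the STOMPI/STAMPI implementation: the class name `LeftProfileStream` is hypothetical, the exclusion zone is assumed to be `ceil(m / 4)`, subsequences are assumed to be non-constant, and reporting indices as absolute positions in the stream is a choice made here so that the sketch reproduces the `left_P_` and `left_I_` values shown in the Examples for this class.

```python
import numpy as np


def znorm_dist(a, b):
    # Assumes neither a nor b is constant
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return float(np.linalg.norm(za - zb))


class LeftProfileStream:
    """Brute-force left matrix profile that is updated one point at a time."""

    def __init__(self, T, m, egress=True):
        self.m = m
        self.egress = egress
        self.excl_zone = int(np.ceil(m / 4))
        self.T = np.empty(0)
        self.offset = 0  # absolute index of the oldest subsequence still kept
        self.left_P = []
        self.left_I = []
        for t in T:
            self._append(t, drop=False)

    def _append(self, t, drop):
        self.T = np.append(self.T, float(t))
        if drop:
            # Remove the oldest point so that the length stays constant
            self.T = self.T[1:]
            if self.left_P:
                self.left_P.pop(0)
                self.left_I.pop(0)
            self.offset += 1
        if self.T.shape[0] < self.m:
            return
        new = self.T.shape[0] - self.m  # position of the newest subsequence
        best_d, best_i = np.inf, -1
        # Only earlier subsequences outside the exclusion zone are candidates
        for j in range(new - self.excl_zone):
            d = znorm_dist(self.T[new : new + self.m], self.T[j : j + self.m])
            if d < best_d:
                best_d, best_i = d, j + self.offset
        self.left_P.append(best_d)
        self.left_I.append(best_i)

    def update(self, t):
        self._append(t, drop=self.egress)


stream = LeftProfileStream(np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0]), m=3)
stream.update(-19.0)
```

Only the newest subsequence needs new distance computations per update; the entries for older subsequences are carried over, and the oldest entry is discarded when `egress` is enabled.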

Notes

DOI: 10.1007/s10618-017-0519-9

See Table V

Note that line 11 is missing an important `sqrt` operation!

Examples

>>> import stumpy
>>> import numpy as np
>>> stream = stumpy.stumpi(
...     np.array([584., -11., 23., 79., 1001., 0.]),
...     m=3)
>>> stream.update(-19.0)
>>> stream.left_P_
array([       inf, 3.00009263, 2.69407392, 3.05656417])
>>> stream.left_I_
array([-1,  0,  1,  2])

mstump

stumpy.mstump(T, m, include=None, discords=False, normalize=True, p=2.0, T_subseq_isconstant=None)

Compute the multi-dimensional z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized `_mstump` function, which computes the multi-dimensional matrix profile and multi-dimensional matrix profile index according to mSTOMP, a variant of mSTAMP. Note that only self-joins are supported.

Parameters:

T (numpy.ndarray)

The time series or sequence for which to compute the multi-dimensional matrix profile. Each row in `T` represents data from the same dimension while each column in `T` represents data from a different dimension.

m (int)

Window size.

include (list, numpy.ndarray, default None)

A list of (zero-based) indices corresponding to the dimensions in `T` that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

DOI: 10.1109/ICDM.2017.66

discords (bool, default False)

When set to `True`, this reverses the distance matrix which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in `include` are still maintained and respected.

normalize (bool, default True)

When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.

p (float, default 2.0)

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.

T_subseq_isconstant (numpy.ndarray, function, or list, default None)

A parameter that is used to show whether a subsequence of a time series in `T` is constant (`True`) or not. `T_subseq_isconstant` can be a 2D boolean `numpy.ndarray` or a function that can be applied to each time series in `T`. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, `T_subseq_isconstant[i]` corresponds to the `i`-th time series `T[i]` and each element in the list can either be a 1D boolean `numpy.ndarray`, a function, or `None`.

Returns:

P (numpy.ndarray): The multi-dimensional matrix profile. Each row of the array corresponds to each matrix profile for a given dimension (i.e., the first row is the 1-D matrix profile and the second row is the 2-D matrix profile).
I (numpy.ndarray): The multi-dimensional matrix profile index where each row of the array corresponds to each matrix profile index for a given dimension.

See also

stumpy.mstumped: Compute the multi-dimensional z-normalized matrix profile with a `dask`/`ray` cluster
stumpy.subspace: Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
stumpy.mdl: Compute the number of bits needed to compress one array with another using the minimum description length (MDL)

Notes

DOI: 10.1109/ICDM.2017.66

See mSTAMP Algorithm
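To make the row layout of the returned `P` more concrete, the sketch below shows the sort-and-average step from the mSTAMP paper: for a single query, the per-dimension distance profiles are sorted across dimensions, and the `k`-dimensional profile is the mean of the `k` smallest values at each position. The function name `k_dimensional_profiles` and the input values are hypothetical, and details such as exclusion zones and the `include`/`discords` options are left out.

```python
import numpy as np


def k_dimensional_profiles(D):
    """
    D has shape (d, n): one distance profile per dimension.
    Row k of the result averages the k + 1 smallest per-dimension
    distances at each position.
    """
    D_sorted = np.sort(D, axis=0)  # ascending across dimensions
    counts = np.arange(1, D.shape[0] + 1).reshape(-1, 1)
    return np.cumsum(D_sorted, axis=0) / counts


# Hypothetical per-dimension distance profiles for one query subsequence
D = np.array([[4.0, 1.0, 3.0],
              [2.0, 5.0, 3.0]])
profiles = k_dimensional_profiles(D)
```

Here the first row of `profiles` keeps the best single dimension at each position, while the second row averages both dimensions.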

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
(array([[0.        , 1.43947142, 0.        , 2.69407392, 0.11633857],
        [0.777905  , 2.36179922, 1.50004632, 2.92246722, 0.777905  ]]),
 array([[2, 4, 0, 1, 0],
        [4, 4, 0, 1, 0]]))

mstumped

stumpy.mstumped(client, T, m, include=None, discords=False, p=2.0, normalize=True, T_subseq_isconstant=None)

Compute the multi-dimensional z-normalized matrix profile with a `dask`/`ray` cluster

This is a highly distributed implementation around the Numba JIT-compiled parallelized `_mstump` function which computes the multi-dimensional matrix profile according to STOMP. Note that only self-joins are supported.

Parameters:

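client (client)

A `dask`/`ray` client. Setting up a cluster is beyond the scope of this library. Please refer to the `dask`/`ray` documentation.

T (numpy.ndarray)

The time series or sequence for which to compute the multi-dimensional matrix profile. Each row in `T` represents data from the same dimension while each column in `T` represents data from a different dimension.

m (int)

Window size.

include (list, numpy.ndarray, default None)

A list of (zero-based) indices corresponding to the dimensions in `T` that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

DOI: 10.1109/ICDM.2017.66

discords (bool, default False)

When set to `True`, this reverses the distance matrix, which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in `include` are still maintained and respected.

p (float, default 2.0)

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.

normalize (bool, default True)

When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.

T_subseq_isconstant (numpy.ndarray, function, or list, default None)

A parameter that is used to show whether a subsequence of a time series in `T` is constant (`True`) or not. `T_subseq_isconstant` can be a 2D boolean `numpy.ndarray` or a function that can be applied to each time series in `T`. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, `T_subseq_isconstant[i]` corresponds to the `i`-th time series `T[i]` and each element in the list can either be a 1D boolean `numpy.ndarray`, a function, or `None`.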

Returns:

P (numpy.ndarray): The multi-dimensional matrix profile. Each row of the array corresponds to each matrix profile for a given dimension (i.e., the first row is the 1-D matrix profile and the second row is the 2-D matrix profile).
I (numpy.ndarray): The multi-dimensional matrix profile index where each row of the array corresponds to each matrix profile index for a given dimension.

See also

stumpy.mstump: Compute the multi-dimensional z-normalized matrix profile
stumpy.subspace: Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
stumpy.mdl: Compute the number of bits needed to compress one array with another using the minimum description length (MDL)

Notes

DOI: 10.1109/ICDM.2017.66

See mSTAMP Algorithm

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.mstumped(
...             dask_client,
...             np.array([[584., -11., 23., 79., 1001., 0., -19.],
...                       [1., 2., 4., 8., 16., 0., 32.]]),
...             m=3)
(array([[0.        , 1.43947142, 0.        , 2.69407392, 0.11633857],
        [0.777905  , 2.36179922, 1.50004632, 2.92246722, 0.777905  ]]),
 array([[2, 4, 0, 1, 0],
        [4, 4, 0, 1, 0]]))

Alternatively, you can also use `ray`:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.mstumped(
...         ray,
...         np.array([[584., -11., 23., 79., 1001., 0., -19.],
...                   [1., 2., 4., 8., 16., 0., 32.]]),
...         m=3)
...     ray.shutdown()

subspace

stumpy.subspace(T, m, subseq_idx, nn_idx, k, include=None, discords=False, discretize_func=None, n_bit=8, normalize=True, p=2.0, T_subseq_isconstant=None)

Compute the `k`-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index

Parameters:

T (numpy.ndarray)

The time series or sequence for which the multi-dimensional matrix profile and multi-dimensional matrix profile indices were computed.

m (int)

Window size.

subseq_idx (int)

The subsequence index in `T`.

nn_idx (int)

The nearest neighbor index in `T`.

k (int)

The subset number of dimensions out of `D = T.shape[0]`-dimensions to return the subspace for. Note that zero-based indexing is used.

include (numpy.ndarray, default None)

A list of (zero-based) indices corresponding to the dimensions in `T` that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

DOI: 10.1109/ICDM.2017.66

discords (bool, default False)

When set to `True`, this reverses the distance profile to favor discords rather than motifs. Note that indices in `include` are still maintained and respected.

discretize_func (func, default None)

A function for discretizing each input array. When this is `None`, an appropriate discretization function (based on the `normalize` parameter) will be applied.

n_bit (int, default 8)

The number of bits used for discretization. For more information on an appropriate value, see Figure 4 in:

DOI: 10.1109/ICDM.2016.0069

and Figure 2 in:

DOI: 10.1109/ICDM.2011.54

normalize (bool, default True)

When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.

p (float, default 2.0)

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.

T_subseq_isconstant (numpy.ndarray, function, or list, default None)

A parameter that is used to show whether a subsequence of a time series in `T` is constant (`True`) or not. `T_subseq_isconstant` can be a 2D boolean `numpy.ndarray` or a function that can be applied to each time series in `T`. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, `T_subseq_isconstant[i]` corresponds to the `i`-th time series `T[i]` and each element in the list can either be a 1D boolean `numpy.ndarray`, a function, or `None`.

Returns:

S (numpy.ndarray): An array that contains the (singular) `k`-th-dimensional subspace for the subsequence with index equal to `subseq_idx`. Note that `k + 1` rows will be returned.

See also

stumpy.mstump: Compute the multi-dimensional z-normalized matrix profile
stumpy.mstumped: Compute the multi-dimensional z-normalized matrix profile with a `dask`/`ray` cluster
stumpy.mdl: Compute the number of bits needed to compress one array with another using the minimum description length (MDL)

Examples

>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
>>> motifs_idx = np.argsort(mps, axis=1)[:, :2]
>>> k = 1
>>> stumpy.subspace(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3,
...     subseq_idx=motifs_idx[k][0],
...     nn_idx=indices[k][motifs_idx[k][0]],
...     k=k)
array([0, 1])

mdl

stumpy.mdl(T, m, subseq_idx, nn_idx, include=None, discords=False, discretize_func=None, n_bit=8, normalize=True, p=2.0, T_subseq_isconstant=None)

Compute the multi-dimensional number of bits needed to compress one multi-dimensional subsequence with another along each of the `k`-dimensions using the minimum description length (MDL)

Parameters:

T (numpy.ndarray)

The time series or sequence for which the multi-dimensional matrix profile and multi-dimensional matrix profile indices were computed.

m (int)

Window size.

subseq_idx (numpy.ndarray)

The multi-dimensional subsequence indices in `T`.

nn_idx (numpy.ndarray)

The multi-dimensional nearest neighbor indices in `T`.

include (numpy.ndarray, default None)

A list of (zero-based) indices corresponding to the dimensions in `T` that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

DOI: 10.1109/ICDM.2017.66

discords (bool, default False)

When set to `True`, this reverses the distance profile to favor discords rather than motifs. Note that indices in `include` are still maintained and respected.

discretize_func (func, default None)

A function for discretizing each input array. When this is `None`, an appropriate discretization function (based on the `normalize` parameter) will be applied.

n_bit (int, default 8)

The number of bits used for discretization and for computing the bit size. For more information on an appropriate value, see Figure 4 in:

DOI: 10.1109/ICDM.2016.0069

and Figure 2 in:

DOI: 10.1109/ICDM.2011.54

normalize (bool, default True)

When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.

p (float, default 2.0)

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.

T_subseq_isconstant (numpy.ndarray, function, or list, default None)

A parameter that is used to show whether a subsequence of a time series in `T` is constant (`True`) or not. `T_subseq_isconstant` can be a 2D boolean `numpy.ndarray` or a function that can be applied to each time series in `T`. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, `T_subseq_isconstant[i]` corresponds to the `i`-th time series `T[i]` and each element in the list can either be a 1D boolean `numpy.ndarray`, a function, or `None`.
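For readers wondering what a custom `discretize_func` might look like, here is a minimal sketch of a min-max discretization onto `2**n_bit` integer levels. The function name `minmax_discretize` is hypothetical, and the sketch assumes that the function receives a single array; the discretization that STUMPY applies by default may well differ.

```python
import functools

import numpy as np


def minmax_discretize(a, n_bit=8):
    """Map values in `a` onto integer levels 1..2**n_bit via min-max scaling."""
    a = np.asarray(a, dtype=np.float64)
    span = a.max() - a.min()
    if span == 0:
        # Constant input: everything falls on a single level
        return np.ones(a.shape, dtype=np.int64)
    scaled = (a - a.min()) / span  # values in [0, 1]
    return np.round(scaled * (2**n_bit - 1)).astype(np.int64) + 1


# Fix `n_bit` ahead of time so that the function takes only the array
discretize_2bit = functools.partial(minmax_discretize, n_bit=2)
levels = discretize_2bit(np.array([0.0, 1.0, 2.0, 3.0]))
```

With two bits there are four levels, so each of the four evenly spaced inputs lands on its own level.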

Returns:

bit_sizes (numpy.ndarray): The total number of bits computed from MDL for representing each pair of multidimensional subsequences.
S (list): A list of numpy.ndarrays that contain the `k`-th-dimensional subspaces.

See also

stumpy.mstump: Compute the multi-dimensional z-normalized matrix profile
stumpy.mstumped: Compute the multi-dimensional z-normalized matrix profile with a `dask`/`ray` cluster
stumpy.subspace: Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index

Examples

>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
>>> motifs_idx = np.argsort(mps, axis=1)[:, 0]
>>> stumpy.mdl(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3,
...     subseq_idx=motifs_idx,
...     nn_idx=indices[np.arange(motifs_idx.shape[0]), motifs_idx])
(array([ 80.      , 111.509775]), [array([1]), array([0, 1])])

atsc

stumpy.atsc(IL, IR, j)

Compute the anchored time series chain (ATSC)

Note that since the matrix profile indices, `IL` and `IR`, are pre-computed, this function is agnostic to subsequence normalization.

Parameters:

IL (numpy.ndarray): Left matrix profile indices.
IR (numpy.ndarray): Right matrix profile indices.
j (int): The index value for which to compute the ATSC.

Returns:

out (numpy.ndarray): Anchored time series chain for index, `j`.

See also

stumpy.allc: Compute the all-chain set (ALLC)

Notes

DOI: 10.1109/ICDM.2017.79

See Table I

This is the implementation for the anchored time series chains (ATSC).

Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.
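The bounded for-loop mentioned above can be sketched as follows. This is an illustrative reading of the chain-following rule rather than the library's source: starting from `j`, the chain steps to the right neighbor as long as that neighbor's left neighbor points back. The function name `anchored_chain`, the toy index arrays, and the use of `-1` to mark a missing neighbor are assumptions made for this example.

```python
import numpy as np


def anchored_chain(IL, IR, j):
    """Follow right neighbors from `j` while the left neighbor points back."""
    chain = [j]
    for _ in range(len(IL)):  # bounded loop: a chain cannot exceed len(IL) links
        nxt = IR[j]
        if nxt == -1 or IL[nxt] != j:
            break  # no right neighbor, or the link is not mutual
        j = nxt
        chain.append(j)
    return np.array(chain, dtype=np.int64)


# Hypothetical left/right matrix profile indices; -1 marks "no neighbor"
IL = np.array([-1, 0, 0, 2])
IR = np.array([1, 3, 3, -1])

chain_from_0 = anchored_chain(IL, IR, 0)
chain_from_2 = anchored_chain(IL, IR, 2)
```

In this toy input the chain anchored at `0` stops at `1`, because the right neighbor of `1` is `3` but the left neighbor of `3` is `2`, so that link is not mutual.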

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.atsc(mp[:, 2], mp[:, 3], 1)
array([1, 3])

>>> # Alternative example using named attributes
>>>
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.atsc(mp.left_I_, mp.right_I_, 1)
array([1, 3])

allc

stumpy.allc(IL, IR)

Compute the all-chain set (ALLC)

Note that since the matrix profile indices, `IL` and `IR`, are pre-computed, this function is agnostic to subsequence normalization.

Parameters:

IL (numpy.ndarray): Left matrix profile indices.
IR (numpy.ndarray): Right matrix profile indices.

Returns:

S (list(numpy.ndarray)): All-chain set.
C (numpy.ndarray): Anchored time series chain for the longest chain (also known as the unanchored chain). Note that when there are multiple different chains with length equal to `len(C)`, then only one chain from this set is returned. You may iterate over the all-chain set, `S`, to find all other possible chains with length `len(C)`.

See also

stumpy.atsc: Compute the anchored time series chain (ATSC)

Notes

DOI: 10.1109/ICDM.2017.79

See Table II

Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.

This is the implementation for the all-chain set (ALLC) and the unanchored chain is simply the longest one among the all-chain set. Both the all-chain set and unanchored chain are returned.

The all-chain set, `S`, is returned as a list of unique numpy arrays.
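The relationship between the all-chain set and the unanchored chain can be sketched in a few lines of plain Python. The function name `all_chains`, the toy index lists, and the `-1` marker for a missing neighbor are assumptions for illustration only; the sketch returns lists instead of numpy arrays and is not the library's implementation.

```python
def all_chains(IL, IR):
    """Grow a chain from every index that is not already part of one."""
    n = len(IL)
    visited = [False] * n
    chains = []
    for start in range(n):
        if visited[start]:
            continue  # already covered by a chain that started earlier
        j = start
        chain = [j]
        for _ in range(n):  # bounded loop instead of a while-loop
            nxt = IR[j]
            if nxt == -1 or IL[nxt] != j:
                break
            j = nxt
            visited[j] = True
            chain.append(j)
        chains.append(chain)
    longest = max(chains, key=len)  # on ties, the first chain found is kept
    return chains, longest


# Hypothetical left/right matrix profile indices; -1 marks "no neighbor"
IL = [-1, 0, 0, 2]
IR = [1, 3, 3, -1]
S, C = all_chains(IL, IR)
```

Both chains in this toy input have length two, so `C` is only one of several equally long chains, which mirrors the tie-breaking caveat in the description of `C`.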

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.allc(mp[:, 2], mp[:, 3])
([array([1, 3]), array([2]), array([0, 4])], array([0, 4]))

>>> # Alternative example using named attributes
>>>
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.allc(mp.left_I_, mp.right_I_)
([array([1, 3]), array([2]), array([0, 4])], array([0, 4]))

fluss

stumpy.fluss(I, L, n_regimes, excl_factor=5, custom_iac=None)

Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)

Essentially, this is a wrapper to compute the corrected arc curve and regime locations. Note that since the matrix profile indices, `I`, are pre-computed, this function is agnostic to subsequence normalization.

Parameters:

I (numpy.ndarray): The matrix profile indices for the time series of interest.
L (int): The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, `m`, used to compute the matrix profile and matrix profile index, but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
n_regimes (int): The number of regimes to search for. This is one more than the number of regime changes as denoted in the original paper.
excl_factor (int, default 5): The multiplying factor for the regime exclusion zone.
custom_iac (numpy.ndarray, default None): A custom idealized arc curve (IAC) that will be used for correcting the arc curve.

Returns:

cac (numpy.ndarray): A corrected arc curve (CAC).
regime_locs (numpy.ndarray): The locations of the regimes.

See also

stumpy.floss: Compute the Fast Low-Cost Online Semantic Segmentation (FLOSS) for streaming data

Notes

DOI: 10.1109/ICDM.2017.21

See Section A

This is the implementation for Fast Low-cost Unipotent Semantic Segmentation (FLUSS).
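A simplified view of the corrected arc curve may help here. The sketch below counts how many nearest-neighbor arcs cross each position, divides by an idealized arc curve, and masks the edges. It assumes a parabolic IAC purely for simplicity, and the function name `corrected_arc_curve` and the toy indices are hypothetical; this should be read as a sketch of the idea, not as STUMPY's implementation.

```python
import numpy as np


def corrected_arc_curve(I, L, excl_factor=5):
    """Count arcs crossing each position and normalize by a parabolic IAC."""
    I = np.asarray(I, dtype=np.int64)
    n = I.shape[0]

    # Arc curve: +1 where an arc starts, -1 where it ends, then accumulate
    marks = np.zeros(n, dtype=np.int64)
    for i, j in enumerate(I):
        marks[min(i, j)] += 1
        marks[max(i, j)] -= 1
    ac = np.cumsum(marks)

    # Assumed idealized arc curve: an inverted parabola with height n / 2
    idx = np.arange(n)
    iac = 2.0 * idx * (n - idx) / n

    cac = np.ones(n)
    valid = iac > 0
    cac[valid] = np.minimum(ac[valid] / iac[valid], 1.0)

    # Mask the edges, where few arcs can cross (assumes L * excl_factor >= 1)
    edge = L * excl_factor
    cac[:edge] = 1.0
    cac[-edge:] = 1.0
    return cac


# Hypothetical indices: positions 0-2 point at each other, as do positions 3-5
I = [1, 0, 1, 4, 3, 4]
cac = corrected_arc_curve(I, L=1, excl_factor=1)
regime_boundary = int(np.argmin(cac))
```

Because no arc links the first three positions to the last three, the curve drops to zero at the end of the first group, and that minimum is taken as the regime boundary.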

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.fluss(mp[:, 1], 3, 2)
(array([1., 1., 1., 1., 1.]), array([0]))

>>> # Alternative example using named attributes
>>>
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.fluss(mp.I_, 3, 2)
(array([1., 1., 1., 1., 1.]), array([0]))

floss

stumpy.floss(mp, T, m, L, excl_factor=5, n_iter=1000, n_samples=1000, custom_iac=None, normalize=True, p=2.0, T_subseq_isconstant_func=None)

A class to compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data

Parameters:

mp (numpy.ndarray): The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
T (numpy.ndarray): The 1-D time series data used to generate the matrix profile and matrix profile indices found in `mp`. Note that the right matrix profile index is used and the right matrix profile is intelligently recomputed on the fly from `T` instead of using the bidirectional matrix profile.
m (int): The window size for computing sliding window mass. This is identical to the window size used in the matrix profile calculation. For managing edge effects, see the `L` parameter.
L (int): The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, `m`, used to compute the matrix profile and matrix profile index, but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
excl_factor (int, default 5): The multiplying factor for the regime exclusion zone. Note that this is unrelated to the `excl_zone` used to compute the matrix profile.
n_iter (int, default 1000): Number of iterations to average over when determining the parameters for the IAC beta distribution.
n_samples (int, default 1000): Number of distribution samples to draw during each iteration when computing the IAC.
custom_iac (numpy.ndarray, default None): A custom idealized arc curve (IAC) that will be used for correcting the arc curve.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
T_subseq_isconstant_func (function, default None): A custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
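As an illustration of the two-argument contract described for `T_subseq_isconstant_func`, the sketch below defines a user function with an extra tolerance argument and curries it with `functools.partial`. The function name `isconstant_with_tolerance` and the `tol` argument are hypothetical and introduced only for this example.

```python
import functools

import numpy as np


def isconstant_with_tolerance(a, w, tol=0.0):
    """Flag each length-`w` window of `a` whose value range is within `tol`."""
    windows = np.lib.stride_tricks.sliding_window_view(a, w)
    return (windows.max(axis=1) - windows.min(axis=1)) <= tol


# Curry the extra argument so that only `a` and `w` remain
isconstant_func = functools.partial(isconstant_with_tolerance, tol=0.5)

a = np.array([1.0, 1.0, 1.2, 5.0, 5.0, 5.0])
mask = isconstant_func(a, 3)
```

The resulting callable takes exactly `a` and `w`, and returns one boolean per subsequence, that is, `len(a) - w + 1` values.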

Attributes:

cac_1d_ (numpy.ndarray): Get the updated 1-dimensional corrected arc curve (CAC_1D).
P_ (numpy.ndarray): Get the updated matrix profile.
I_ (numpy.ndarray): Get the updated (right) matrix profile indices.
T_ (numpy.ndarray): Get the updated time series, `T`.

Methods

update(t)

Ingress a new data point, `t`, onto the time series, `T`, followed by egressing the oldest single data point from `T`. Then, update the 1-dimensional corrected arc curve (CAC_1D) and the matrix profile.
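The ingress/egress step can be pictured as a fixed-length sliding window over the stream. The sketch below shows only that bookkeeping on `T`; the helper name `slide_window` is hypothetical, and the corrected arc curve and matrix profile updates that `update` also performs are not shown.

```python
import numpy as np


def slide_window(T, t):
    """Return a copy of `T` with the oldest point dropped and `t` appended."""
    T_new = np.empty_like(T)
    T_new[:-1] = T[1:]  # egress the oldest data point
    T_new[-1] = t       # ingress the newest data point
    return T_new


T = np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0])
T = slide_window(T, 19.0)
```

The length of `T` stays constant, which is why the number of subsequences, and therefore the length of the corrected arc curve, does not grow as new points arrive.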

See also

stumpy.fluss: Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)

Notes

DOI: 10.1109/ICDM.2017.21

See Section C

This is the implementation for Fast Low-cost Online Semantic Segmentation (FLOSS).

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0.]), m=3)
>>> stream = stumpy.floss(
...     mp,
...     np.array([584., -11., 23., 79., 1001., 0.]),
...     m=3,
...     L=3)
>>> stream.update(19.)
>>> stream.cac_1d_
array([1., 1., 1., 1.])

ostinato

stumpy.ostinato(Ts, m, normalize=True, p=2.0, Ts_subseq_isconstant=None)

Find the z-normalized consensus motif of multiple time series

This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.

Parameters:

Ts (list): A list of time series for which to find the most central consensus motif.
m (int): Window size.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
Ts_subseq_isconstant (list, default None): A list of rolling window isconstant indicators, one for each time series in `Ts`.

Returns:

central_radius (float): Radius of the most central consensus motif.
central_Ts_idx (int): The time series index in `Ts` that contains the most central consensus motif.
central_subseq_idx (int): The subsequence index within time series `Ts[central_Ts_idx]` that contains the most central consensus motif.

See also

stumpy.ostinatoed: Find the z-normalized consensus motif of multiple time series with a `dask`/`ray` cluster
stumpy.gpu_ostinato: Find the z-normalized consensus motif of multiple time series with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2019.00140

See Table 2

The ostinato algorithm proposed in the paper finds the best radius in `Ts`. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in `Ts` is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif it is necessary to search the subsequences with the best radius via `stumpy.ostinato._get_central_motif`.
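The definitions of radius, best radius, and most central motif can be illustrated on a small table of nearest-neighbor distances. The values below are made up, and the sketch works on a precomputed table rather than on time series, so it says nothing about how the library searches efficiently.

```python
import numpy as np

# Hypothetical nearest-neighbor distances: row i is a candidate subsequence,
# column j is its nearest-neighbor distance in the j-th *other* time series
nn_dist = np.array([
    [0.5, 2.0],
    [1.0, 1.5],
    [1.5, 0.5],
    [3.0, 0.2],
])

# Radius: the distance needed to reach a nearest neighbor in every other series
radii = nn_dist.max(axis=1)
best_radius = radii.min()

# Among candidates tied for the best radius, prefer the smallest mean distance
tied = np.flatnonzero(np.isclose(radii, best_radius))
central = int(tied[np.argmin(nn_dist[tied].mean(axis=1))])
```

Candidates 1 and 2 share the best radius here, and the tie is resolved in favor of candidate 2 because its mean nearest-neighbor distance is smaller.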

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.ostinato(
...     [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...      np.array([600., -10., 23., 17.]),
...      np.array([1., 9., 6., 0.])],
...     m=3)
(1.2370237678153826, 0, 4)

ostinatoed

stumpy.ostinatoed(client, Ts, m, normalize=True, p=2.0, Ts_subseq_isconstant=None)

Find the z-normalized consensus motif of multiple time series with a `dask`/`ray` cluster

This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.

Parameters:

client (client): A `dask`/`ray` client. Setting up a `dask`/`ray` cluster is beyond the scope of this library. Please refer to the `dask`/`ray` Distributed documentation.
Ts (list): A list of time series for which to find the most central consensus motif.
m (int): Window size.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
Ts_subseq_isconstant (list, default None): A list of rolling window isconstant indicators, one for each time series in `Ts`.

Returns:

central_radius (float): Radius of the most central consensus motif.
central_Ts_idx (int): The time series index in `Ts` that contains the most central consensus motif.
central_subseq_idx (int): The subsequence index within time series `Ts[central_Ts_idx]` that contains the most central consensus motif.

See also

stumpy.ostinato: Find the z-normalized consensus motif of multiple time series
stumpy.gpu_ostinato: Find the z-normalized consensus motif of multiple time series with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2019.00140

See Table 2

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.ostinatoed(
...             dask_client,
...             [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...              np.array([600., -10., 23., 17.]),
...              np.array([1., 9., 6., 0.])],
...             m=3)
(1.2370237678153826, 0, 4)

Alternatively, you can also use `ray`:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.ostinatoed(
...         ray,
...         [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...          np.array([600., -10., 23., 17.]),
...          np.array([1., 9., 6., 0.])],
...         m=3)
...     ray.shutdown()

gpu_ostinato

stumpy.gpu_ostinato(Ts, m, device_id=0, normalize=True, p=2.0, Ts_subseq_isconstant=None)

Find the z-normalized consensus motif of multiple time series with one or more GPU devices

This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.

Parameters:

Ts (list): A list of time series for which to find the most central consensus motif.
m (int): Window size.
device_id (int or list, default 0): The (GPU) device number to use. The default value is `0`. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing `[device.id for device in numba.cuda.list_devices()]`.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
Ts_subseq_isconstant (list, default None): A list of rolling window isconstant indicators, one for each time series in `Ts`.

Returns:

central_radius (float): Radius of the most central consensus motif.
central_Ts_idx (int): The time series index in `Ts` that contains the most central consensus motif.
central_subseq_idx (int): The subsequence index within time series `Ts[central_Ts_idx]` that contains the most central consensus motif.

See also

stumpy.ostinato: Find the z-normalized consensus motif of multiple time series
stumpy.ostinatoed: Find the z-normalized consensus motif of multiple time series with a `dask`/`ray` cluster

Notes

DOI: 10.1109/ICDM.2019.00140

See Table 2

Examples

>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     stumpy.gpu_ostinato(
...         [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...          np.array([600., -10., 23., 17.]),
...          np.array([1., 9., 6., 0.])],
...         m=3,
...         device_id=all_gpu_devices)
(1.2370237678153826, 0, 4)

mpdist

stumpy.mpdist(T_A, T_B, m, percentage=0.05, k=None, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series

The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates the output of an AB-join and a BA-join and returns the `k`-th smallest value as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality, but the method is highly scalable.

Parameters:

T_A (numpy.ndarray): The first time series or sequence for which to compute the matrix profile.
T_B (numpy.ndarray): The second time series or sequence for which to compute the matrix profile.
m (int): Window size.
percentage (float, default 0.05): The percentage of distances that will be used to report `mpdist`. The value is between `0.0` and `1.0`.
k (int, default None): Specify the `k`-th value in the concatenated matrix profiles to return. When `k` is not `None`, then the `percentage` parameter is ignored.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
T_A_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_A` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_A` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
T_B_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_B` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_B` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.

Returns:

MPdist (float): The matrix profile distance.

See also

stumpy.mpdisted: Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a `dask`/`ray` cluster
stumpy.gpu_mpdist: Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2018.00119

See Section III
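The selection step described for MPdist, concatenating the AB-join and BA-join profiles and taking the `k`-th smallest value, can be sketched as follows. The helper name `kth_smallest_of_joins` and the profile values are hypothetical, `k` is treated as a zero-based position by assumption, and the mapping from `percentage` to `k` is deliberately left out.

```python
import numpy as np


def kth_smallest_of_joins(P_AB, P_BA, k):
    """Concatenate both join profiles and return the k-th smallest value."""
    P_ABBA = np.concatenate([P_AB, P_BA])
    k = min(k, P_ABBA.shape[0] - 1)  # clamp so that k is a valid position
    # np.partition places the k-th smallest value at position k
    return np.partition(P_ABBA, k)[k]


# Hypothetical AB-join and BA-join matrix profiles
P_AB = np.array([0.3, 2.0])
P_BA = np.array([0.1, 0.4, 0.2, 5.0, 1.0])

dist = kth_smallest_of_joins(P_AB, P_BA, k=2)
```

Picking a small `k` means that a few poorly matching subsequences (such as the `5.0` above) do not affect the reported distance, which is what makes the measure tolerant of partial dissimilarity.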

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.mpdist(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     m=3)
0.00019935236191097894

mpdisted

stumpy.mpdisted(client, T_A, T_B, m, percentage=0.05, k=None, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a `dask`/`ray` cluster

Parameters:

client (client): A `dask`/`ray` client. Setting up a `dask`/`ray` cluster is beyond the scope of this library. Please refer to the `dask`/`ray` documentation.
T_A (numpy.ndarray): The first time series or sequence for which to compute the matrix profile.
T_B (numpy.ndarray): The second time series or sequence for which to compute the matrix profile.
m (int): Window size.
percentage (float, default 0.05): The percentage of distances that will be used to report `mpdist`. The value is between `0.0` and `1.0`. This parameter is ignored when `k` is not `None`.
k (int, default None): Specify the `k`-th value in the concatenated matrix profiles to return. When `k` is not `None`, then the `percentage` parameter is ignored.
normalize (bool, default True): When set to `True`, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the `@core.non_normalized` function decorator.
p (float, default 2.0): The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with `p` being `1` or `2`, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when `normalize == True`.
T_A_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_A` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_A` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.
T_B_subseq_isconstant (numpy.ndarray or function, default None): A boolean array that indicates whether a subsequence in `T_B` is constant (`True`). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in `T_B` is constant (`True`). The function must only take two arguments, `a`, a 1-D array, and `w`, the window size, while additional arguments may be specified by currying the user-defined function using `functools.partial`. Any subsequence with at least one `np.nan`/`np.inf` will automatically have its corresponding value set to `False` in this boolean array.

Returns:

MPdist (float): The matrix profile distance.

See also

stumpy.mpdist: Compute the z-normalized matrix profile distance (MPdist) measure between any two time series
stumpy.gpu_mpdist: Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2018.00119

See Section III

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.mpdisted(
...             dask_client,
...             np.array([-11.1, 23.4, 79.5, 1001.0]),
...             np.array([584., -11., 23., 79., 1001., 0., -19.]),
...             m=3)
0.00019935236191097894

Alternatively, you can also use `ray`:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.mpdisted(
...         ray,
...         np.array([-11.1, 23.4, 79.5, 1001.0]),
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3)
...     ray.shutdown()

gpu_mpdist#

stumpy.gpu_mpdist(T_A, T_B, m, percentage=0.05, k=None, device_id=0, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices.

The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates and sorts the output of an AB-join and a BA-join and returns the value of the k-th smallest number as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality, but the method is highly scalable.
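The selection step described above can be sketched with plain NumPy. This is only an illustration that assumes the AB-join and BA-join matrix profiles have already been computed. The function name mpdist_from_profiles is hypothetical, and treating k as a zero-based index that is clamped to the last valid position is an assumption.

```python
import numpy as np


def mpdist_from_profiles(P_AB, P_BA, k):
    """Hypothetical helper: concatenate two matrix profiles, sort them,
    and return the k-th smallest value (k is zero-based here)."""
    P_ABBA = np.sort(np.concatenate([P_AB, P_BA]))
    # Clamp k so that it never points past the end of the sorted values
    k = min(int(k), P_ABBA.shape[0] - 1)
    return float(P_ABBA[k])


P_AB = np.array([0.3, 0.1])
P_BA = np.array([0.2, 0.5, 0.0])
d = mpdist_from_profiles(P_AB, P_BA, k=1)
```

Because the concatenated values are sorted before selection, the result does not depend on where in either time series the matching subsequences occur.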

Parameters:

T_A numpy.ndarray: The first time series or sequence for which to compute the matrix profile.
T_B numpy.ndarray: The second time series or sequence for which to compute the matrix profile.
m int: Window size.
percentage float, default 0.05: The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0. This parameter is ignored when k is not None.
k int, default None: Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.
device_id int or list, default 0: The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
normalize bool, default True: When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
p float, default 2.0: The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
T_A_subseq_isconstant numpy.ndarray or function, default None: A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
T_B_subseq_isconstant numpy.ndarray or function, default None: A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
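To make the interplay between percentage and k more concrete, here is a minimal sketch of how a percentage might be turned into a position k. The helper name resolve_k is hypothetical, and the exact formula (scaling by the combined length of both time series and capping at the last index of the concatenated profiles) is an assumption rather than a statement of what the library does internally.

```python
import math


def resolve_k(n_A, n_B, m, percentage=0.05, k=None):
    """Hypothetical helper: an explicit k wins; otherwise derive k
    from percentage (assumed formula)."""
    if k is not None:
        return int(k)  # percentage is ignored when k is given
    percentage = min(max(percentage, 0.0), 1.0)
    # Number of values in the concatenated AB-join and BA-join profiles
    n_profile = (n_A - m + 1) + (n_B - m + 1)
    return min(math.ceil(percentage * (n_A + n_B)), n_profile - 1)


k_default = resolve_k(n_A=4, n_B=7, m=3)
```

Under these assumptions, the two short time series used in the examples (lengths 4 and 7 with m=3) would resolve to k_default = 1.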

Returns:

MPdist float: The matrix profile distance.

Notes

DOI: 10.1109/ICDM.2018.00119

See Section III

Examples

>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     stumpy.gpu_mpdist(
...         np.array([-11.1, 23.4, 79.5, 1001.0]),
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3,
...         device_id=all_gpu_devices)
0.00019935236191097894

motifs#

stumpy.motifs(T, P, min_neighbors=1, max_distance=None, cutoff=None, max_matches=10, max_motifs=1, atol=1e-08, normalize=True, p=2.0, T_subseq_isconstant=None)

Discover the top motifs for time series T

A subsequence, Q, becomes a candidate motif if there are at least min_neighbors other subsequence matches in T (outside the exclusion zone) with a distance less than or equal to max_distance.

Note that, in the best case scenario, the returned arrays would have shape (max_motifs, max_matches) and contain all finite values. However, in reality, many conditions (see below) need to be satisfied in order for this to be true. Any truncation in the number of rows (i.e., motifs) may be the result of insufficient candidate motifs with a number of matches greater than or equal to min_neighbors, or of the matrix profile value for the candidate motif being larger than cutoff. Similarly, any truncation in the number of columns (i.e., matches) may be the result of insufficient matches being found with distances (to their corresponding candidate motif) that are equal to or less than max_distance. Only motifs and matches that satisfy all of these constraints will be returned.

If you must return a shape of (max_motifs, max_matches), then you may consider specifying a smaller min_neighbors, a larger max_distance, and/or a larger cutoff. For example, while it is ill-advised, setting min_neighbors=1, max_distance=np.inf, and cutoff=np.inf will ensure that the shape of the output arrays will be (max_motifs, max_matches). However, given the lack of constraints, the quality of each motif and the quality of each match may be drastically different. Setting appropriate conditions will help ensure appropriately constrained results that may be easier to interpret.

Parameters:

T numpy.ndarray: The time series or sequence.
P numpy.ndarray: The (1-dimensional) matrix profile of T. In the case where the matrix profile was computed with k > 1 (i.e., top-k nearest neighbors), you must summarize the top-k nearest-neighbor distances for each subsequence into a single value (e.g., np.mean, np.min, etc.) and then use that derived value as your P.
min_neighbors int, default 1: The minimum number of similar matches a subsequence needs to have in order to be considered a motif. This defaults to 1, which means that a subsequence must have at least one similar match in order to be considered a motif.
max_distance float or function, default None: For a candidate motif, Q, and a non-trivial subsequence, S, max_distance is the maximum distance allowed between Q and S so that S is considered a match of Q. If max_distance is a function, then it must be a function that accepts a single parameter, D, in its function signature, which is the distance profile between Q and T. If None, this defaults to np.nanmax([np.nanmean(D) - 2.0 * np.nanstd(D), np.nanmin(D)]).
cutoff float, default None: The largest matrix profile value (distance) that a candidate motif is allowed to have. If None, this defaults to np.nanmax([np.nanmean(P) - 2.0 * np.nanstd(P), np.nanmin(P)]).
max_matches int, default 10: The maximum number of similar matches of a motif representative to be returned. The resulting matches are sorted by distance, so a value of 10 means that the indices of the 10 most similar subsequences are returned. If None, all matches within max_distance of the motif representative will be returned. Note that the first match is always the self-match/trivial-match for each motif.
max_motifs int, default 1: The maximum number of motifs to return. To consider returning all possible valid motifs, try setting max_motifs to the length of your input matrix profile (i.e., max_motifs=len(P)).
atol float, default 1e-8: The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.
normalize bool, default True: When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
p float, default 2.0: The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
T_subseq_isconstant numpy.ndarray or function, default None: A boolean array that indicates whether a subsequence in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
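The default thresholds quoted for max_distance and cutoff share the same formula, so a small sketch may help show how it behaves. The helper names default_threshold and tighter_max_distance are hypothetical, and the second function is only one possible shape for a user-supplied max_distance callable that accepts the distance profile D.

```python
import numpy as np


def default_threshold(values):
    """Evaluate the documented default: the larger of (mean - 2 * std)
    and the minimum, ignoring NaN values."""
    values = np.asarray(values, dtype=float)
    return np.nanmax(
        [np.nanmean(values) - 2.0 * np.nanstd(values), np.nanmin(values)]
    )


def tighter_max_distance(D):
    """Hypothetical custom callable: a single parameter D, returning
    the minimum distance plus half a standard deviation."""
    return np.nanmin(D) + 0.5 * np.nanstd(D)


D = np.array([0.0, 1.0, 2.0, 3.0, 4.0, np.nan])
threshold = default_threshold(D)

# One very small distance among many large ones lifts the threshold
# above the minimum
skewed = np.array([0.0] + [10.0] * 99)
skewed_threshold = default_threshold(skewed)
```

In the first case, mean minus two standard deviations is negative, so the default falls back to the minimum distance, which means only the closest match clears the threshold.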

Returns:

motif_distances numpy.ndarray: The distances corresponding to a set of subsequence matches for each motif. Note that the first column always corresponds to the distance for the self-match/trivial-match for each motif.
motif_indices numpy.ndarray: The indices corresponding to a set of subsequence matches for each motif. Note that the first column always corresponds to the index for the self-match/trivial-match for each motif.

See also

stumpy.match: Find all matches of a query Q in a time series T
stumpy.mmotifs: Discover the top motifs for the multi-dimensional time series T
stumpy.stump: Compute the z-normalized matrix profile
stumpy.stumped: Compute the z-normalized matrix profile with a dask/ray cluster
stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump: Compute an approximate z-normalized matrix profile

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.motifs(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     mp[:, 0],
...     max_distance=2.0)
(array([[0.        , 0.11633857]]), array([[0, 4]]))

>>> # Alternative example using named attributes
>>>
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.motifs(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     mp.P_,
...     max_distance=2.0)
(array([[0.        , 0.11633857]]), array([[0, 4]]))

match#

stumpy.match(Q, T, M_T=None, Σ_T=None, max_distance=None, max_matches=None, atol=1e-08, query_idx=None, normalize=True, p=2.0, T_subseq_isfinite=None, T_subseq_isconstant=None, Q_subseq_isconstant=None)

Find all matches of a query Q in a time series T

The matches are the subsequences of T whose distances to Q are less than or equal to max_distance, sorted by distance (lowest to highest). Around each occurrence, an exclusion zone is applied before searching for the next.
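The search procedure described above can be sketched as a greedy loop over a precomputed distance profile. This is an illustrative sketch, not the library's implementation: the function name select_matches is hypothetical, and the width of the exclusion zone (excl_zone positions on either side of a match) is an assumption supplied by the caller.

```python
import numpy as np


def select_matches(D, max_distance, excl_zone, max_matches=None):
    """Hypothetical helper: repeatedly take the smallest remaining
    distance, then mask out its neighborhood before searching again."""
    D = np.array(D, dtype=float)  # work on a copy
    D[~np.isfinite(D)] = np.inf   # never select NaN/inf entries
    matches = []
    while D.size > 0 and (max_matches is None or len(matches) < max_matches):
        idx = int(np.argmin(D))
        if not np.isfinite(D[idx]) or D[idx] > max_distance:
            break  # nothing left within max_distance
        matches.append((float(D[idx]), idx))
        # Apply the exclusion zone around the match that was just found
        start = max(0, idx - excl_zone)
        stop = min(D.shape[0], idx + excl_zone + 1)
        D[start:stop] = np.inf
    return matches


D = [0.5, 0.1, 0.2, 3.0, 0.3, 0.05]
found = select_matches(D, max_distance=1.0, excl_zone=1)
```

Here the positions next to each selected match (for example, index 4 next to index 5) are skipped even though their distances are below max_distance, which is how the exclusion zone suppresses near-duplicate matches.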

Parameters:

Q numpy.ndarray: The query sequence. Q does not have to be a subsequence of T.
T numpy.ndarray: The time series of interest.
M_T numpy.ndarray, default None: Sliding mean of time series, T.
Σ_T numpy.ndarray, default None: Sliding standard deviation of time series, T.
max_distance float or function, default None: Maximum distance between Q and a subsequence, S, for S to be considered a match. If max_distance is a function, then it must be a function that accepts a single parameter, D, in its function signature, which is the distance profile between Q and T (a 1D numpy array of size n-m+1). If None, this defaults to np.nanmax([np.nanmean(D) - 2 * np.nanstd(D), np.nanmin(D)]) (i.e., at least the closest match will be returned).
max_matches int, default None: The maximum number of similar occurrences to be returned. The resulting occurrences are sorted by distance, so a value of 10 means that the indices of the 10 most similar subsequences are returned. If None, then all occurrences are returned.
atol float, default 1e-8: The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.
query_idx int, default None: This is the index position along the time series, T, where the query subsequence, Q, is located. query_idx should only be used when the matrix profile is a self-join and should be set to None for matrix profiles computed from AB-joins. If query_idx is set to a specific integer value, then this will help ensure that the self-match will be returned first.
normalize bool, default True: When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
p float, default 2.0: The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
T_subseq_isfinite numpy.ndarray, default None: A boolean array that indicates whether a subsequence in T contains a np.nan/np.inf value (False). This parameter is ignored when normalize=True.
T_subseq_isconstant numpy.ndarray or function, default None: A boolean array that indicates whether a subsequence (of length equal to len(Q)) in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
Q_subseq_isconstant numpy.ndarray or function, default None: A boolean array (of size 1) that indicates whether Q is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in Q is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
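Since M_T and Σ_T are described only briefly, the following sketch shows one way a sliding mean and sliding standard deviation could be computed with NumPy. The helper name sliding_mean_std is hypothetical, the use of the population standard deviation (ddof=0) is an assumption, and non-finite values are not given any special treatment here.

```python
import numpy as np


def sliding_mean_std(T, m):
    """Hypothetical helper: mean and standard deviation of every
    length-m window of T."""
    windows = np.lib.stride_tricks.sliding_window_view(
        np.asarray(T, dtype=float), m
    )
    # One value per subsequence, i.e., len(T) - m + 1 values each
    return windows.mean(axis=1), windows.std(axis=1)


M_T, Σ_T = sliding_mean_std([1.0, 2.0, 3.0, 4.0], m=2)
```

Both arrays have one entry per subsequence, so for a time series of length n and window size m they each contain n-m+1 values.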

Returns:

out numpy.ndarray: The first column consists of distances of subsequences of T whose distances to Q are less than or equal to max_distance, sorted by distance (lowest to highest). The second column consists of the corresponding indices in T.

See also

stumpy.motifs: Discover the top motifs for time series T
stumpy.mmotifs: Discover the top motifs for the multi-dimensional time series T
stumpy.stump: Compute the z-normalized matrix profile
stumpy.stumped: Compute the z-normalized matrix profile with a dask/ray cluster
stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump: Compute an approximate z-normalized matrix profile

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.match(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.])
...     )
array([[0.0011129739290248121, 1]], dtype=object)

mmotifs#

stumpy.mmotifs(T, P, I, min_neighbors=1, max_distance=None, cutoffs=None, max_matches=10, max_motifs=1, atol=1e-08, k=None, include=None, normalize=True, p=2.0, T_subseq_isconstant=None)

Discover the top motifs for the multi-dimensional time series T.

Parameters:

T numpy.ndarray: The multi-dimensional time series or sequence.
P numpy.ndarray: Multi-dimensional Matrix Profile of T.
I numpy.ndarray: Multi-dimensional Matrix Profile indices.
min_neighbors int, default 1: The minimum number of similar matches a subsequence needs to have in order to be considered a motif. This defaults to 1, which means that a subsequence must have at least one similar match in order to be considered a motif.
max_distance float, default None: Maximal distance that is allowed between a query subsequence (a candidate motif) and all subsequences in T to be considered as a match. If None, this defaults to np.nanmax([np.nanmean(D) - 2 * np.nanstd(D), np.nanmin(D)]) (i.e., at least the closest match will be returned).
cutoffs numpy.ndarray or float, default None: The largest matrix profile value (distance) for each dimension of the multidimensional matrix profile that a multidimensional candidate motif is allowed to have. If cutoffs is a scalar value, then this value will be applied to every dimension.
max_matches int, default 10: The maximum number of similar matches (nearest neighbors) to return for each motif. The first match is always the self/trivial-match for each motif.
max_motifs int, default 1: The maximum number of motifs to return. To consider returning all possible valid motifs, try setting max_motifs to the length of your input matrix profile (i.e., max_motifs=len(P)).
atol float, default 1e-8: The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.
k int, default None: The number of dimensions (k + 1) required for discovering all motifs. This value is available for doing guided search or, together with include, for constrained search. If k is None, then this will be automatically computed for each motif using MDL (unconstrained search).
include numpy.ndarray, default None: A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search.
normalize bool, default True: When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
p float, default 2.0: The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
T_subseq_isconstant numpy.ndarray, function, or list, default None: A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.

Returns:

motif_distances: numpy.ndarray: The distances corresponding to a set of subsequence matches for each motif.
motif_indices: numpy.ndarray: The indices corresponding to a set of subsequence matches for each motif.
motif_subspaces: list: A list consisting of arrays that contain the k-dimensional subspace for each motif.
motif_mdls: list: A list consisting of arrays that contain the MDL results for finding the dimension of each motif.

See also

stumpy.motifs: Find the top motifs for time series T
stumpy.match: Find all matches of a query Q in a time series T
stumpy.mstump: Compute the multi-dimensional z-normalized matrix profile
stumpy.mstumped: Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster
stumpy.subspace: Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
stumpy.mdl: Compute the number of bits needed to compress one array with another using the minimum description length (MDL)

Notes

DOI: 10.1109/ICDM.2017.66

For more information on include and search types, see Section IV D and IV E

Examples

>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
>>> stumpy.mmotifs(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     mps,
...     indices)
(array([[4.47034836e-08, 4.47034836e-08]]), array([[0, 2]]), [array([1])], [array([ 80.      , 111.509775])])

snippets#

stumpy.snippets(T, m, k, percentage=1.0, s=None, mpdist_percentage=0.05, mpdist_k=None, normalize=True, p=2.0, mpdist_T_subseq_isconstant=None)

Identify the top k snippets that best represent the time series, T

Parameters:

T numpy.ndarray: The time series or sequence for which to find the snippets.
m int: The snippet window size.
k int: The desired number of snippets.
percentage float, default 1.0: With the length of each non-overlapping subsequence, S[i], set to m, this is the percentage of S[i] (i.e., percentage * m) to set s (the sub-subsequence length) to. When percentage == 1.0, then the full length of S[i] is used to compute the mpdist_vect. When percentage < 1.0, then a shorter sub-subsequence length of s = min(math.ceil(percentage * m), m) from each S[i] is used to compute mpdist_vect. When s is not None, then the percentage parameter is ignored.
s int, default None: With the length of each non-overlapping subsequence, S[i], set to m, this is essentially the sub-subsequence length (i.e., a shorter part of S[i]). When s == m, then the full length of S[i] is used to compute the mpdist_vect. When s < m, then shorter subsequences with length s from each S[i] are used to compute mpdist_vect. When s is not None, then the percentage parameter is ignored.
mpdist_percentage float, default 0.05: The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0.
mpdist_k int, default None: Specify the k-th value in the concatenated matrix profiles to return. When mpdist_k is not None, then the mpdist_percentage parameter is ignored.
normalize bool, default True: When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
p float, default 2.0: The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
mpdist_T_subseq_isconstant numpy.ndarray or function, default None: A boolean array that indicates whether a subsequence (of length equal to s) in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
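The relationship between percentage, s, and m stated above can be written out as a short sketch. The helper name sub_subsequence_length is hypothetical, and the handling of an explicitly supplied s (used as given, without further clamping) is an assumption.

```python
import math


def sub_subsequence_length(m, percentage=1.0, s=None):
    """Hypothetical helper: an explicit s wins; otherwise derive s
    from percentage using the documented formula."""
    if s is None:
        s = min(math.ceil(percentage * m), m)
    return s


s_quarter = sub_subsequence_length(m=10, percentage=0.25)
s_full = sub_subsequence_length(m=10)
```

With m=10, a percentage of 0.25 rounds 2.5 up to a sub-subsequence length of 3, while the default percentage of 1.0 keeps the full length of 10.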

Returns:

snippets numpy.ndarray: The top k snippets.
snippets_indices numpy.ndarray: The index locations for each of the top k snippets.
snippets_profiles numpy.ndarray: The MPdist profiles for each of the top k snippets.
snippets_fractions numpy.ndarray: The fraction of data that each of the top k snippets represents.
snippets_areas numpy.ndarray: The area under the curve corresponding to each profile for each of the top k snippets.
snippets_regimes: numpy.ndarray: The index slices corresponding to the set of regimes for each of the top k snippets. The first column is the (zero-based) snippet index while the second and third columns correspond to the (inclusive) regime start indices and the (exclusive) regime stop indices, respectively.
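To show how the three columns of snippets_regimes might be consumed, the sketch below groups the index positions covered by each snippet. The regimes array is copied from the example output shown for this function, and the helper name indices_per_snippet is hypothetical.

```python
import numpy as np

# Columns: snippet index, inclusive start, exclusive stop
regimes = np.array([[0, 0, 1],
                    [0, 2, 3],
                    [0, 4, 5],
                    [1, 1, 2],
                    [1, 3, 4]])


def indices_per_snippet(regimes):
    """Hypothetical helper: map each snippet index to the list of
    index positions covered by its regimes."""
    covered = {}
    for snippet_idx, start, stop in regimes:
        covered.setdefault(int(snippet_idx), []).extend(
            range(int(start), int(stop))
        )
    return covered


covered = indices_per_snippet(regimes)
```

Because the stop column is exclusive, each row can be used directly as a Python slice, e.g., slice(start, stop).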

Notes

DOI: 10.1109/ICBK.2018.00058

See Table I

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.snippets(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3, k=2)
(array([[ 584.,  -11.,   23.],
        [  79., 1001.,    0.]]),
 array([0, 3]),
 array([[0.        , 3.2452632 , 3.00009263, 2.982409  , 0.11633857],
        [2.982409  , 2.69407392, 3.01719586, 0.        , 2.92154586]]),
 array([0.6, 0.4]),
 array([9.3441034 , 5.81050512]),
 array([[0, 0, 1],
        [0, 2, 3],
        [0, 4, 5],
        [1, 1, 2],
        [1, 3, 4]]))

stimp#

stumpy.stimp(T, min_m=3, max_m=None, step=1, percentage=0.01, pre_scrump=True, normalize=True, p=2.0, T_subseq_isconstant_func=None)

A class to compute the Pan Matrix Profile

This is based on the SKIMP algorithm.

Parameters:

T numpy.ndarray: The time series or sequence for which to compute the pan matrix profile.
min_m int, default 3: The starting (or minimum) subsequence window size for which a matrix profile may be computed.
max_m int, default None: The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m=None, this is set to the maximum allowable subsequence window size.
step int, default 1: The step between subsequence window sizes.
percentage float, default 0.01: The percentage of the full matrix profile to compute for each subsequence window size. When percentage < 1.0, then the scrump algorithm is used. Otherwise, the stump algorithm is used when the exact matrix profile is requested.
pre_scrump bool, default True: A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++. This parameter is ignored when percentage=1.0.
normalize bool, default True: When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
p float, default 2.0: The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
T_subseq_isconstant_func function, default None: A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Attributes:

PAN_ numpy.ndarray: The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.
M_ numpy.ndarray: The full list of (breadth first search (level) ordered) subsequence window sizes.

Methods

update():

Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile.
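To give a feel for what a breadth-first-search (level) ordering of window sizes looks like, the sketch below visits the midpoint of a sorted range first and then the midpoints of the left and right halves, level by level. The helper name bfs_order is hypothetical, and the exact ordering (including how ties at the midpoint are broken) is an assumption that may differ from the values stored in M_.

```python
from collections import deque


def bfs_order(values):
    """Hypothetical helper: level-order traversal of a balanced binary
    search tree built over the sorted input values."""
    values = list(values)
    order = []
    queue = deque([(0, len(values))])  # half-open index ranges
    while queue:
        lo, hi = queue.popleft()
        if lo >= hi:
            continue
        mid = (lo + hi) // 2
        order.append(values[mid])
        queue.append((lo, mid))       # left half
        queue.append((mid + 1, hi))   # right half
    return order


window_sizes = bfs_order(range(3, 10))
```

An ordering of this kind covers the range of window sizes coarsely at first and fills in the gaps with each additional level, so a handful of update() calls can already span the whole range.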

See also

stumpy.stimped: Compute the Pan Matrix Profile with a dask/ray cluster
stumpy.gpu_stimp: Compute the Pan Matrix Profile with one or more GPU devices

Notes

DOI: 10.1109/ICBK.2019.00031

See Table 2

Examples

>>> import stumpy
>>> import numpy as np
>>> pmp = stumpy.stimp(np.array([584., -11., 23., 79., 1001., 0., -19.]))
>>> pmp.update()
>>> pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])

stimped#

stumpy.stimped(client, T, min_m=3, max_m=None, step=1, normalize=True, p=2.0, T_subseq_isconstant_func=None)

A class to compute the Pan Matrix Profile with a dask/ray cluster

This is based on the SKIMP algorithm.

Parameters:

client client: A dask/ray client. Setting up a dask/ray cluster is beyond the scope of this library. Please refer to the dask/ray documentation.
T numpy.ndarray: The time series or sequence for which to compute the pan matrix profile.
min_m int, default 3: The starting (or minimum) subsequence window size for which a matrix profile may be computed.
max_m int, default None: The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m=None, this is set to the maximum allowable subsequence window size.
step int, default 1: The step between subsequence window sizes.
normalize bool, default True: When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
p float, default 2.0: The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
T_subseq_isconstant_func function, default None: A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Attributes:

PAN_ numpy.ndarray: The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.
M_ numpy.ndarray: The full list of (breadth first search (level) ordered) subsequence window sizes.

Methods

update():

Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile.

See also

stumpy.stimp: Compute the Pan Matrix Profile
stumpy.gpu_stimp: Compute the Pan Matrix Profile with one or more GPU devices

Notes

DOI: 10.1109/ICBK.2019.00031

See Table 2

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         pmp = stumpy.stimped(
...             dask_client,
...             np.array([584., -11., 23., 79., 1001., 0., -19.]))
...         pmp.update()
...         pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])

Alternatively, you can also use ray:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     pmp = stumpy.stimped(
...         ray,
...         np.array([584., -11., 23., 79., 1001., 0., -19.]))
...     ray.shutdown()

gpu_stimp#

stumpy.gpu_stimp(T, min_m=3, max_m=None, step=1, device_id=0, normalize=True, p=2.0, T_subseq_isconstant_func=None)

A class to compute the Pan Matrix Profile with one or more GPU devices

This is based on the SKIMP algorithm.

Parameters:

T numpy.ndarray: The time series or sequence for which to compute the pan matrix profile.
min_m int, default 3: The starting (or minimum) subsequence window size for which a matrix profile may be computed.
max_m int, default None: The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m=None, this is set to the maximum allowable subsequence window size.
step int, default 1: The step between subsequence window sizes.
device_id int or list, default 0: The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
normalize bool, default True: When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
p float, default 2.0: The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
T_subseq_isconstant_func function, default None: A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Attributes:

PAN_ numpy.ndarray: The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.
M_ numpy.ndarray: The full list of (breadth first search (level) ordered) subsequence window sizes.

Methods

update():

Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile.

See also

stumpy.stimp: Compute the Pan Matrix Profile
stumpy.stimped: Compute the Pan Matrix Profile with a dask/ray cluster

Notes

DOI: 10.1109/ICBK.2019.00031

See Table 2

Examples

>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     pmp = stumpy.gpu_stimp(
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         device_id=all_gpu_devices)
...     pmp.update()
...     pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])
