STUMPY API


Overview

stumpy.stump

Compute the z-normalized matrix profile

stumpy.stumped

Compute the z-normalized matrix profile with a dask/ray cluster

stumpy.gpu_stump

Compute the z-normalized matrix profile with one or more GPU devices

stumpy.mass

Compute the distance profile using the MASS algorithm

stumpy.scrump

A class to compute an approximate z-normalized matrix profile

stumpy.stumpi

A class to compute an incremental z-normalized matrix profile for streaming data

stumpy.mstump

Compute the multi-dimensional z-normalized matrix profile

stumpy.mstumped

Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster

stumpy.subspace

Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index

stumpy.mdl

Compute the multi-dimensional number of bits needed to compress one multi-dimensional subsequence with another along each of the k-dimensions using the minimum description length (MDL)

stumpy.atsc

Compute the anchored time series chain (ATSC)

stumpy.allc

Compute the all-chain set (ALLC)

stumpy.fluss

Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)

stumpy.floss

A class to compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data

stumpy.ostinato

Find the z-normalized consensus motif of multiple time series

stumpy.ostinatoed

Find the z-normalized consensus motif of multiple time series with a dask/ray cluster

stumpy.gpu_ostinato

Find the z-normalized consensus motif of multiple time series with one or more GPU devices

stumpy.mpdist

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series

stumpy.mpdisted

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a dask/ray cluster

stumpy.gpu_mpdist

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices

stumpy.motifs

Discover the top motifs for time series T

stumpy.match

Find all matches of a query Q in a time series T

stumpy.mmotifs

Discover the top motifs for the multi-dimensional time series T

stumpy.snippets

Identify the top k snippets that best represent the time series, T

stumpy.stimp

A class to compute the Pan Matrix Profile

stumpy.stimped

A class to compute the Pan Matrix Profile with a dask/ray cluster

stumpy.gpu_stimp

A class to compute the Pan Matrix Profile with one or more GPU devices

stump

stumpy.stump(T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function which computes the (top-k) matrix profile according to STOMPopt with Pearson correlations.

Parameters:
T_A : numpy.ndarray

The time series or sequence for which to compute the matrix profile.

m : int

Window size.

T_B : numpy.ndarray, default None

The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None, which corresponds to a self-join.

ignore_trivial : bool, default True

Set to True if this is a self-join. Otherwise, for an AB-join, set this to False.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

k : int, default 1

The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1. If you have access to a GPU device, then you may be able to leverage gpu_stump for better performance and scalability.

T_A_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

T_B_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
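As an illustration of the user-defined function form described above, the following is a minimal sketch of a function with the required (a, w) signature, curried with functools.partial to fix an extra tolerance argument. The name rolling_isconstant and its atol argument are hypothetical and not part of STUMPY; the sketch uses NumPy only.

```python
import functools

import numpy as np


def rolling_isconstant(a, w, atol=0.0):
    """Flag each length-w window of `a` whose spread (max - min) is <= atol.

    Hypothetical helper, not a STUMPY function. Windows that contain
    np.nan/np.inf are reported as False.
    """
    windows = np.lib.stride_tricks.sliding_window_view(a, w)
    finite = np.isfinite(windows).all(axis=1)
    # Replace non-finite windows with zeros so max/min stay well-defined
    safe = np.where(finite[:, None], windows, 0.0)
    spread = safe.max(axis=1) - safe.min(axis=1)
    return finite & (spread <= atol)


# Curry the extra argument so only (a, w) remain, as the parameter requires
is_constant = functools.partial(rolling_isconstant, atol=1e-8)

T = np.array([1.0, 1.0, 1.0, 2.0, np.nan, 5.0, 5.0, 5.0])
mask = is_constant(T, 3)
```

A callable such as is_constant could then presumably be passed as T_A_subseq_isconstant; the sketch does not call STUMPY itself.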

Returns:
out : numpy.ndarray

When k = 1 (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when k > 1, the output array will contain exactly 2 * k + 2 columns. The first k columns (i.e., out[:, :k]) consist of the top-k matrix profile, the next set of k columns (i.e., out[:, k:2*k]) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., out[:, 2*k] and out[:, 2*k+1] or, equivalently, out[:, -2] and out[:, -1]) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.

For convenience, the matrix profile (distances) and matrix profile indices can also be accessed via their corresponding named array attributes, .P_ and .I_, respectively. Similarly, the corresponding left matrix profile indices and right matrix profile indices may also be accessed via the .left_I_ and .right_I_ array attributes. See examples below.
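The column layout for k > 1 can be unpacked with plain slicing. The sketch below uses a small made-up array in place of real output (the values are illustrative only, and split_top_k is a hypothetical helper, not a STUMPY function).

```python
import numpy as np


def split_top_k(out, k):
    """Split an array with 2 * k + 2 columns into its four documented parts."""
    P = out[:, :k].astype(np.float64)  # top-k matrix profile
    I = out[:, k : 2 * k].astype(np.int64)  # top-k matrix profile indices
    left_I = out[:, 2 * k].astype(np.int64)  # same column as out[:, -2]
    right_I = out[:, 2 * k + 1].astype(np.int64)  # same column as out[:, -1]
    return P, I, left_I, right_I


# Made-up values for k = 2 and three subsequences; not real stumpy output
out = np.array(
    [
        [0.5, 1.2, 2, 1, -1, 2],
        [0.7, 1.2, 2, 0, -1, 2],
        [0.5, 0.7, 0, 1, 0, -1],
    ],
    dtype=object,
)
P, I, left_I, right_I = split_top_k(out, k=2)
```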

See also

stumpy.stumped

Compute the z-normalized matrix profile with a dask/ray cluster

stumpy.gpu_stump

Compute the z-normalized matrix profile with one or more GPU devices

stumpy.scrump

Compute an approximate z-normalized matrix profile

Notes

DOI: 10.1007/s10115-017-1138-x

See Section 4.5

The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.

DOI: 10.1145/3357223.3362721

See Section 3.1 and Section 3.3

The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.

DOI: 10.1109/ICDM.2016.0085

See Table II

Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.

Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0] - m + 1. Additionally, the left and right matrix profiles are also returned.

Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces down to the same algorithm found in Table II.

Additionally, unlike STAMP, where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (see Definition 3 and Figure 3).

For self-joins, set ignore_trivial=True in order to avoid the trivial match.

Note that left and right matrix profiles are only available for self-joins.
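To make the notes above concrete, here is a brute-force sketch of a z-normalized self-join with an exclusion zone of ceil(m/4). It is not how stump is implemented (no diagonal traversal, no Pearson/Welford updates, no Numba) and it assumes no subsequence is constant or contains non-finite values. The function name naive_stump is hypothetical.

```python
import numpy as np


def naive_stump(T, m, excl_zone_denom=4):
    """Brute-force top-1 z-normalized self-join (illustrative only)."""
    windows = np.lib.stride_tricks.sliding_window_view(T, m)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    Z = (windows - mu) / sigma  # one z-normalized subsequence per row
    l = Z.shape[0]  # T.shape[0] - m + 1 subsequences
    excl_zone = int(np.ceil(m / excl_zone_denom))

    # Full pairwise distance matrix with trivial matches masked out
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    idx = np.arange(l)
    D[np.abs(idx[:, None] - idx[None, :]) <= excl_zone] = np.inf

    I = D.argmin(axis=1)
    P = D[idx, I]

    # Left/right neighbors: restrict candidates to j < i or j > i; -1 if none
    left = np.where(idx[None, :] < idx[:, None], D, np.inf)
    right = np.where(idx[None, :] > idx[:, None], D, np.inf)
    left_I = np.where(np.isfinite(left).any(axis=1), left.argmin(axis=1), -1)
    right_I = np.where(np.isfinite(right).any(axis=1), right.argmin(axis=1), -1)
    return P, I, left_I, right_I


T = np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0])
P, I, left_I, right_I = naive_stump(T, m=3)
```

On this seven-point series with m=3, the sketch yields five subsequences, and its distances and indices agree with the output shown in the Examples section.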

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])

stumped

stumpy.stumped(client, T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile with a dask/ray cluster

This is a highly distributed implementation around the Numba JIT-compiled parallelized _stump function which computes the (top-k) matrix profile according to STOMPopt with Pearson correlations.

Parameters:
client : client

A dask/ray client. Setting up a cluster is beyond the scope of this library. Please refer to the dask/ray documentation.

T_A : numpy.ndarray

The time series or sequence for which to compute the matrix profile.

m : int

Window size.

T_B : numpy.ndarray, default None

The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None, which corresponds to a self-join.

ignore_trivial : bool, default True

Set to True if this is a self-join. Otherwise, for an AB-join, set this to False.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

k : int, default 1

The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1. If you have access to a GPU device, then you may be able to leverage gpu_stump for better performance and scalability.

T_A_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

T_B_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Returns:
out : numpy.ndarray

When k = 1 (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when k > 1, the output array will contain exactly 2 * k + 2 columns. The first k columns (i.e., out[:, :k]) consist of the top-k matrix profile, the next set of k columns (i.e., out[:, k:2*k]) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., out[:, 2*k] and out[:, 2*k+1] or, equivalently, out[:, -2] and out[:, -1]) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.

For convenience, the matrix profile (distances) and matrix profile indices can also be accessed via their corresponding named array attributes, .P_ and .I_, respectively. Similarly, the corresponding left matrix profile indices and right matrix profile indices may also be accessed via the .left_I_ and .right_I_ array attributes. See examples below.

See also

stumpy.stump

Compute the z-normalized matrix profile

stumpy.gpu_stump

Compute the z-normalized matrix profile with one or more GPU devices

stumpy.scrump

Compute an approximate z-normalized matrix profile

Notes

DOI: 10.1007/s10115-017-1138-x

See Section 4.5

The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.

DOI: 10.1145/3357223.3362721

See Section 3.1 and Section 3.3

The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.

DOI: 10.1109/ICDM.2016.0085

See Table II

This is a dask/ray implementation of stump that scales across multiple servers and is a convenience wrapper around the parallelized stump._stump function.

Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.

Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0] - m + 1. Additionally, the left and right matrix profiles are also returned.

Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces down to the same algorithm found in Table II.

Additionally, unlike STAMP, where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (see Definition 3 and Figure 3).

For self-joins, set ignore_trivial=True in order to avoid the trivial match.

Note that left and right matrix profiles are only available for self-joins.

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         mp = stumpy.stumped(
...             dask_client,
...             np.array([584., -11., 23., 79., 1001., 0., -19.]),
...             m=3)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])

Alternatively, you can also use ray:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.stumped(
...         ray,
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3)
...     ray.shutdown()

gpu_stump

stumpy.gpu_stump(T_A, m, T_B=None, ignore_trivial=True, device_id=0, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile with one or more GPU devices

This is a convenience wrapper around the Numba cuda.jit _gpu_stump function which computes the matrix profile according to GPU-STOMP. The default number of threads-per-block is set to 512 and may be changed by setting the global parameter config.STUMPY_THREADS_PER_BLOCK to an appropriate number based on your GPU hardware.

Parameters:
T_A : numpy.ndarray

The time series or sequence for which to compute the matrix profile.

m : int

Window size.

T_B : numpy.ndarray, default None

The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None, which corresponds to a self-join.

ignore_trivial : bool, default True

Set to True if this is a self-join. Otherwise, for an AB-join, set this to False.

device_id : int or list, default 0

The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

k : int, default 1

The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1.

T_A_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

T_B_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Returns:
out : numpy.ndarray

When k = 1 (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when k > 1, the output array will contain exactly 2 * k + 2 columns. The first k columns (i.e., out[:, :k]) consist of the top-k matrix profile, the next set of k columns (i.e., out[:, k:2*k]) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., out[:, 2*k] and out[:, 2*k+1] or, equivalently, out[:, -2] and out[:, -1]) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.

For convenience, the matrix profile (distances) and matrix profile indices can also be accessed via their corresponding named array attributes, .P_ and .I_, respectively. Similarly, the corresponding left matrix profile indices and right matrix profile indices may also be accessed via the .left_I_ and .right_I_ array attributes. See examples below.

See also

stumpy.stump

Compute the z-normalized matrix profile

stumpy.stumped

Compute the z-normalized matrix profile with a dask/ray cluster

stumpy.scrump

Compute an approximate z-normalized matrix profile

Notes

DOI: 10.1109/ICDM.2016.0085

See Table II, Figure 5, and Figure 6

Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.

Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0] - m + 1. Additionally, the left and right matrix profiles are also returned.

Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces down to the same algorithm found in Table II.

Additionally, unlike STAMP, where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (see Definition 3 and Figure 3).

For self-joins, set ignore_trivial=True in order to avoid the trivial match.

Note that left and right matrix profiles are only available for self-joins.

Examples

>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     mp = stumpy.gpu_stump(
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3,
...         device_id=all_gpu_devices)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])

mass

stumpy.mass(Q, T, M_T=None, Σ_T=None, normalize=True, p=2.0, T_subseq_isfinite=None, T_subseq_isconstant=None, Q_subseq_isconstant=None, query_idx=None)

Compute the distance profile using the MASS algorithm

This is a convenience wrapper around the Numba JIT-compiled _mass function.

Parameters:
Q : numpy.ndarray

Query array or subsequence.

T : numpy.ndarray

Time series or sequence.

M_T : numpy.ndarray, default None

Sliding mean of T.

Σ_T : numpy.ndarray, default None

Sliding standard deviation of T.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. This parameter is ignored when normalize == True.

T_subseq_isfinite : numpy.ndarray, default None

A boolean array that indicates whether a subsequence in T contains a np.nan/np.inf value (False). This parameter is ignored when normalize == True.

T_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Q_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether the subsequence in Q is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether the subsequence in Q is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

query_idx : int, default None

This is the index position along the time series, T, where the query subsequence, Q, is located. query_idx should be set to None if Q is not a subsequence of T. If Q is a subsequence of T, providing this argument is optional. If query_idx is provided, the distance between Q and T[query_idx:query_idx+m] will automatically be set to zero.
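For the non-normalized case (normalize=False), the p parameter selects the Minkowski distance between the query and each window of T. The following NumPy-only sketch shows that computation directly; minkowski_distance_profile is a hypothetical name and this is not the code path STUMPY uses.

```python
import numpy as np


def minkowski_distance_profile(Q, T, p=2.0):
    """Non-normalized p-norm distance between Q and every window of T."""
    windows = np.lib.stride_tricks.sliding_window_view(T, Q.shape[0])
    return np.sum(np.abs(windows - Q) ** p, axis=1) ** (1.0 / p)


Q = np.array([1.0, 2.0])
T = np.array([1.0, 2.0, 4.0, 6.0])
manhattan = minkowski_distance_profile(Q, T, p=1.0)  # p = 1
euclidean = minkowski_distance_profile(Q, T, p=2.0)  # p = 2
```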

Returns:
distance_profile : numpy.ndarray

Distance profile.

See also

stumpy.motifs

Discover the top motifs for time series T

stumpy.match

Find all matches of a query Q in a time series T

Notes

DOI: 10.1109/ICDM.2016.0179

See Table II

Note that Q, T are not directly required to calculate D

Note: Unlike the Matrix Profile I paper, here, M_T, Σ_T can be calculated once for all subsequences of T and passed in so the redundancy is removed
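The role of the precomputed sliding statistics can be sketched as follows. Given the sliding dot product QT and the sliding mean and standard deviation of T, the z-normalized distance follows from D = sqrt(2m(1 - (QT - m·μ_Q·M_T) / (m·σ_Q·Σ_T))). In this sketch np.correlate stands in for the FFT-based sliding dot product that MASS uses, sigma_T stands in for Σ_T, and the function names are hypothetical; constant subsequences are assumed absent.

```python
import numpy as np


def sliding_mean_std(T, m):
    """Mean and standard deviation of every length-m window of T."""
    windows = np.lib.stride_tricks.sliding_window_view(T, m)
    return windows.mean(axis=1), windows.std(axis=1)


def naive_distance_profile(Q, T, M_T=None, sigma_T=None):
    """z-normalized distance profile from the sliding dot product."""
    m = Q.shape[0]
    if M_T is None or sigma_T is None:
        M_T, sigma_T = sliding_mean_std(T, m)
    mu_Q, sigma_Q = Q.mean(), Q.std()
    QT = np.correlate(T, Q, mode="valid")  # dot product of Q with each window
    rho = (QT - m * mu_Q * M_T) / (m * sigma_Q * sigma_T)  # Pearson correlation
    return np.sqrt(np.maximum(2 * m * (1.0 - rho), 0.0))


T = np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0])
Q = np.array([-11.1, 23.4, 79.5, 1001.0])

# Compute the statistics once; they can be reused for any query of length 4
M_T, sigma_T = sliding_mean_std(T, Q.shape[0])
D = naive_distance_profile(Q, T, M_T, sigma_T)
```

For this query and time series, D agrees with the output listed in the Examples section, with the smallest distance at index 1.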

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.mass(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.]))
array([3.18792463e+00, 1.11297393e-03, 3.23874018e+00, 3.34470195e+00])

scrump

stumpy.scrump(T_A, m, T_B=None, ignore_trivial=True, percentage=0.01, pre_scrump=False, s=None, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

A class to compute an approximate z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to SCRIMP.

Parameters:
T_A : numpy.ndarray

The time series or sequence for which to compute the matrix profile.

T_B : numpy.ndarray

The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded.

m : int

Window size.

ignore_trivial : bool

Set to True if this is a self-join. Otherwise, for an AB-join, set this to False.

percentage : float

Approximate percentage completed. The value is between 0.0 and 1.0.

pre_scrump : bool

A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++ and may lead to faster convergence.

s : int

The size of the PreSCRIMP fixed interval. If pre_scrump=True and s=None, then s will automatically be set to s=int(np.ceil(m/config.STUMPY_EXCL_ZONE_DENOM)), which is the size of the exclusion zone.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this class gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized class decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

k : int, default 1

The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1.

T_A_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

T_B_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Attributes:
P_ : numpy.ndarray

Get the updated (top-k) matrix profile.

I_ : numpy.ndarray

Get the updated (top-k) matrix profile indices.

left_I_ : numpy.ndarray

Get the updated left (top-1) matrix profile indices.

right_I_ : numpy.ndarray

Get the updated right (top-1) matrix profile indices.

Methods

update()

Update the matrix profile and the matrix profile indices by computing additional new distances (limited by percentage) that make up the full distance matrix. It updates the (top-k) matrix profile, (top-1) left matrix profile, (top-1) right matrix profile, (top-k) matrix profile indices, (top-1) left matrix profile indices, and (top-1) right matrix profile indices.
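The idea behind repeated update() calls can be sketched with a toy class that evaluates a random subset of the distance matrix diagonals on each call, so the approximation can only improve. This is an assumption-laden illustration (top-1 only, self-join only, no PreSCRIMP, plain NumPy); NaiveScrump and its internals are hypothetical and do not reflect STUMPY's implementation.

```python
import numpy as np


class NaiveScrump:
    """Toy anytime matrix profile: each update() evaluates more diagonals."""

    def __init__(self, T, m, percentage=0.01, seed=0):
        windows = np.lib.stride_tricks.sliding_window_view(T, m)
        mu = windows.mean(axis=1, keepdims=True)
        sigma = windows.std(axis=1, keepdims=True)
        self._Z = (windows - mu) / sigma  # assumes no constant windows
        l = self._Z.shape[0]
        excl_zone = int(np.ceil(m / 4))
        rng = np.random.default_rng(seed)
        # Diagonals outside the exclusion zone, visited in random order
        self._diags = rng.permutation(np.arange(excl_zone + 1, l))
        self._chunk = max(1, int(np.ceil(len(self._diags) * percentage)))
        self._next = 0
        self.P_ = np.full(l, np.inf)
        self.I_ = np.full(l, -1, dtype=np.int64)

    def update(self):
        l = self._Z.shape[0]
        stop = min(self._next + self._chunk, len(self._diags))
        for k in self._diags[self._next : stop]:
            d = np.linalg.norm(self._Z[: l - k] - self._Z[k:], axis=1)
            i = np.arange(l - k)
            j = i + k
            better = d < self.P_[i]  # pair (i, j) improves subsequence i
            self.P_[i[better]] = d[better]
            self.I_[i[better]] = j[better]
            better = d < self.P_[j]  # the same pair may improve subsequence j
            self.P_[j[better]] = d[better]
            self.I_[j[better]] = i[better]
        self._next = stop


T = np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0])
approx = NaiveScrump(T, m=3, percentage=0.01)
approx.update()
P_after_one = approx.P_.copy()  # partial: some entries may still be inf
approx.update()
approx.update()  # all three diagonals processed: now exact
```

With m=3 on this series there are only three diagonals outside the exclusion zone, so three updates reach the exact matrix profile; earlier states are upper bounds on it.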

See also

stumpy.stump

Compute the z-normalized matrix profile

stumpy.stumped

Compute the z-normalized matrix profile with a dask/ray cluster

stumpy.gpu_stump

Compute the z-normalized matrix profile with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2018.00099

See Algorithm 1 and Algorithm 2

Examples

>>> import stumpy
>>> import numpy as np
>>> approx_mp = stumpy.scrump(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     m=3)
>>> approx_mp.update()
>>> approx_mp.P_
array([2.982409  , 3.28412702,        inf, 2.982409  , 3.28412702])
>>> approx_mp.I_
array([ 3,  4, -1,  0,  1])

stumpi

stumpy.stumpi(T, m, egress=True, normalize=True, p=2.0, k=1, mp=None, T_subseq_isconstant_func=None)

A class to compute an incremental z-normalized matrix profile for streaming data

This is based on the on-line STOMPI and STAMPI algorithms.

Parameters:
T : numpy.ndarray

The time series or sequence for which the matrix profile and matrix profile indices will be returned.

m : int

Window size.

egress : bool, default True

If set to True, the oldest data point in the time series is removed and the time series length remains constant rather than forever increasing.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this class gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized class decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. This parameter is ignored when normalize == True.

k : int, default 1

The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1.

mp : numpy.ndarray, default None

A pre-computed matrix profile (and corresponding matrix profile indices). This is a 2D array of shape (len(T) - m + 1, 2 * k + 2), where the first k columns are the top-k matrix profile, and the next k columns are their corresponding indices. The last two columns correspond to the top-1 left and top-1 right matrix profile indices. When None (default), this array is computed internally using stumpy.stump.

T_subseq_isconstant_func : function, default None

A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Attributes:
P_ : numpy.ndarray

Get the (top-k) matrix profile.

I_ : numpy.ndarray

Get the (top-k) matrix profile indices.

left_P_ : numpy.ndarray

Get the (top-1) left matrix profile.

left_I_ : numpy.ndarray

Get the (top-1) left matrix profile indices.

T_ : numpy.ndarray

Get the time series.

Methods

update(t)

Append a single new data point, t, to the time series, T, and update the matrix profile.
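The effect of the egress flag on the stored time series can be sketched in isolation. The helper below only models the buffer handling described for egress (drop the oldest point so the length stays constant, or grow by one point); it does not update any matrix profile, and append_point is a hypothetical name.

```python
import numpy as np


def append_point(T, t, egress=True):
    """Return the time series after appending t, with or without egress."""
    if egress:
        T_new = np.empty_like(T)
        T_new[:-1] = T[1:]  # shift left, discarding the oldest point
        T_new[-1] = t
        return T_new
    return np.append(T, t)  # series grows by one point


T = np.array([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0])
m = 3
T_egress = append_point(T, -19.0, egress=True)
T_grow = append_point(T, -19.0, egress=False)

# Number of subsequences (and matrix profile entries) in each case
n_subseq_egress = T_egress.shape[0] - m + 1
n_subseq_grow = T_grow.shape[0] - m + 1
```

With egress the number of subsequences stays at four for this six-point series, which matches the four entries of left_P_ in the Examples section.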

Notes

DOI: 10.1007/s10618-017-0519-9

See Table V

Note that line 11 is missing an important sqrt operation!

Examples

>>> import stumpy
>>> import numpy as np
>>> stream = stumpy.stumpi(
...     np.array([584., -11., 23., 79., 1001., 0.]),
...     m=3)
>>> stream.update(-19.0)
>>> stream.left_P_
array([       inf, 3.00009263, 2.69407392, 3.05656417])
>>> stream.left_I_
array([-1,  0,  1,  2])

mstump

stumpy.mstump(T, m, include=None, discords=False, normalize=True, p=2.0, T_subseq_isconstant=None)

Compute the multi-dimensional z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized _mstump function which computes the multi-dimensional matrix profile and multi-dimensional matrix profile index according to mSTOMP, a variant of mSTAMP. Note that only self-joins are supported.

Parameters:
T : numpy.ndarray

The time series or sequence for which to compute the multi-dimensional matrix profile. Each row of T holds the data for a single dimension, while each column holds the values of all dimensions at a single time point.

m : int

Window size.

include : list, numpy.ndarray, default None

A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

DOI: 10.1109/ICDM.2017.66

discords : bool, default False

When set to True, this reverses the distance matrix, which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_subseq_isconstant : numpy.ndarray, function, or list, default None

A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.

Returns:
P : numpy.ndarray

The multi-dimensional matrix profile. Each row of the array corresponds to each matrix profile for a given dimension (i.e., the first row is the 1-D matrix profile and the second row is the 2-D matrix profile).

I : numpy.ndarray

The multi-dimensional matrix profile index where each row of the array corresponds to each matrix profile index for a given dimension.

See also

stumpy.mstumped

Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster

stumpy.subspace

Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index

stumpy.mdl

Compute the number of bits needed to compress one array with another using the minimum description length (MDL)

Notes

DOI: 10.1109/ICDM.2017.66

See mSTAMP Algorithm

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
(array([[0.        , 1.43947142, 0.        , 2.69407392, 0.11633857],
        [0.777905  , 2.36179922, 1.50004632, 2.92246722, 0.777905  ]]),
 array([[2, 4, 0, 1, 0],
        [4, 4, 0, 1, 0]]))
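To illustrate how the rows of P relate to one another, here is a minimal sketch of the sort-and-average step described for mSTAMP, assuming the per-dimension distance profiles of a single query subsequence are already available. The function name k_dim_profiles and the input values are made up for illustration, trivial-match exclusion is omitted, and this is not STUMPY's _mstump implementation.

```python
import numpy as np


def k_dim_profiles(D):
    """D has shape (d, n): row r is the distance profile of one query
    subsequence in dimension r. Row k of the result is the
    (k + 1)-dimensional distance profile: at every position, the mean of
    the k + 1 smallest per-dimension distances."""
    D_sorted = np.sort(D, axis=0)  # ascending sort within each column
    counts = np.arange(1, D.shape[0] + 1).reshape(-1, 1)
    return np.cumsum(D_sorted, axis=0) / counts


# Made-up per-dimension distance profiles (2 dimensions, 3 candidates)
D = np.array([[0.5, 2.0, 1.0],
              [1.5, 0.0, 3.0]])

D_k = k_dim_profiles(D)
nn = D_k.argmin(axis=1)  # nearest neighbor position for each k
best = D_k.min(axis=1)   # the matching k-dimensional distance
print(D_k, nn, best)
```

Because each column is sorted independently, the dimensions that contribute to a k-dimensional distance may differ from one position to the next, which is why a separate step is needed to recover the subspace.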

mstumped#

stumpy.mstumped(client, T, m, include=None, discords=False, p=2.0, normalize=True, T_subseq_isconstant=None)

Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster

This is a highly distributed implementation around the Numba JIT-compiled parallelized _mstump function, which computes the multi-dimensional matrix profile according to STOMP. Note that only self-joins are supported.

Parameters:
client : client

A dask/ray client. Setting up a cluster is beyond the scope of this library. Please refer to the dask/ray documentation.

T : numpy.ndarray

The time series or sequence for which to compute the multi-dimensional matrix profile. Each row of T holds the data for a single dimension, while each column holds the values of all dimensions at a single time point.

m : int

Window size.

include : list, numpy.ndarray, default None

A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

DOI: 10.1109/ICDM.2017.66

discords : bool, default False

When set to True, this reverses the distance matrix, which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

T_subseq_isconstant : numpy.ndarray, function, or list, default None

A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.

Returns:
P : numpy.ndarray

The multi-dimensional matrix profile. Each row of the array corresponds to each matrix profile for a given dimension (i.e., the first row is the 1-D matrix profile and the second row is the 2-D matrix profile).

I : numpy.ndarray

The multi-dimensional matrix profile index where each row of the array corresponds to each matrix profile index for a given dimension.

See also

stumpy.mstump

Compute the multi-dimensional z-normalized matrix profile

stumpy.subspace

Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index

stumpy.mdl

Compute the number of bits needed to compress one array with another using the minimum description length (MDL)

Notes

DOI: 10.1109/ICDM.2017.66

See mSTAMP Algorithm

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.mstumped(
...             dask_client,
...             np.array([[584., -11., 23., 79., 1001., 0., -19.],
...                       [1., 2., 4., 8., 16., 0., 32.]]),
...             m=3)
(array([[0.        , 1.43947142, 0.        , 2.69407392, 0.11633857],
        [0.777905  , 2.36179922, 1.50004632, 2.92246722, 0.777905  ]]),
 array([[2, 4, 0, 1, 0],
        [4, 4, 0, 1, 0]]))

Alternatively, you can also use ray:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.mstumped(
...         ray,
...         np.array([[584., -11., 23., 79., 1001., 0., -19.],
...                   [1., 2., 4., 8., 16., 0., 32.]]),
...         m=3)
...     ray.shutdown()

subspace#

stumpy.subspace(T, m, subseq_idx, nn_idx, k, include=None, discords=False, discretize_func=None, n_bit=8, normalize=True, p=2.0, T_subseq_isconstant=None)

Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index

Parameters:
T : numpy.ndarray

The time series or sequence for which the multi-dimensional matrix profile and multi-dimensional matrix profile indices were computed.

m : int

Window size.

subseq_idx : int

The subsequence index in T.

nn_idx : int

The nearest neighbor index in T.

k : int

The subset number of dimensions out of D = T.shape[0] dimensions to return the subspace for. Note that zero-based indexing is used.

include : numpy.ndarray, default None

A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

DOI: 10.1109/ICDM.2017.66

discords : bool, default False

When set to True, this reverses the distance profile to favor discords rather than motifs. Note that indices in include are still maintained and respected.

discretize_func : func, default None

A function for discretizing each input array. When this is None, an appropriate discretization function (based on the normalize parameter) will be applied.

n_bit : int, default 8

The number of bits used for discretization. For more information on an appropriate value, see Figure 4 in:

DOI: 10.1109/ICDM.2016.0069

and Figure 2 in:

DOI: 10.1109/ICDM.2011.54

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_subseq_isconstant : numpy.ndarray, function, or list, default None

A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.

Returns:
S : numpy.ndarray

An array that contains the (singular) k-th-dimensional subspace for the subsequence with index equal to subseq_idx. Note that k + 1 rows will be returned.

See also

stumpy.mstump

Compute the multi-dimensional z-normalized matrix profile

stumpy.mstumped

Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster

stumpy.mdl

Compute the number of bits needed to compress one array with another using the minimum description length (MDL)

Examples

>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
>>> motifs_idx = np.argsort(mps, axis=1)[:, :2]
>>> k = 1
>>> stumpy.subspace(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3,
...     subseq_idx=motifs_idx[k][0],
...     nn_idx=indices[k][motifs_idx[k][0]],
...     k=k)
array([0, 1])
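The idea of a k-dimensional subspace can be sketched by ranking the dimensions by how closely the subsequence and its nearest neighbor agree in each one, then keeping the k + 1 best. This is a simplified illustration under stated assumptions: it compares z-normalized subsequences directly, it ignores include, discords, and discretization, and the function name simple_subspace and the data are made up. It should not be read as STUMPY's implementation.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D array (assumes a non-constant input)."""
    return (x - x.mean()) / x.std()


def simple_subspace(T, m, subseq_idx, nn_idx, k):
    """Return the k + 1 dimensions (rows of T) in which the subsequence at
    subseq_idx and the one at nn_idx are most similar."""
    D = np.empty(T.shape[0])
    for dim in range(T.shape[0]):
        a = znorm(T[dim, subseq_idx:subseq_idx + m])
        b = znorm(T[dim, nn_idx:nn_idx + m])
        D[dim] = np.linalg.norm(a - b)
    # Stable sort so that ties keep the lower dimension index first
    return np.argsort(D, kind="mergesort")[:k + 1]


# Made-up 3-dimensional series; dimension 0 repeats exactly at index 4
T = np.array([[0., 1., 0., 5., 0., 1., 0., 2.],
              [1., 2., 3., 9., 3., 2., 1., 0.],
              [1., 2., 4., 7., 1., 3., 4., 8.]])

S = simple_subspace(T, m=3, subseq_idx=0, nn_idx=4, k=1)
print(S)
```

With k=1 the sketch returns two dimensions, mirroring the note above that k + 1 rows are returned.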

mdl#

stumpy.mdl(T, m, subseq_idx, nn_idx, include=None, discords=False, discretize_func=None, n_bit=8, normalize=True, p=2.0, T_subseq_isconstant=None)

Compute the multi-dimensional number of bits needed to compress one multi-dimensional subsequence with another along each of the k-dimensions using the minimum description length (MDL)

Parameters:
T : numpy.ndarray

The time series or sequence for which the multi-dimensional matrix profile and multi-dimensional matrix profile indices were computed.

m : int

Window size.

subseq_idx : numpy.ndarray

The multi-dimensional subsequence indices in T.

nn_idx : numpy.ndarray

The multi-dimensional nearest neighbor indices in T.

include : numpy.ndarray, default None

A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

DOI: 10.1109/ICDM.2017.66

discords : bool, default False

When set to True, this reverses the distance profile to favor discords rather than motifs. Note that indices in include are still maintained and respected.

discretize_func : func, default None

A function for discretizing each input array. When this is None, an appropriate discretization function (based on the normalize parameter) will be applied.

n_bit : int, default 8

The number of bits used for discretization and for computing the bit size. For more information on an appropriate value, see Figure 4 in:

DOI: 10.1109/ICDM.2016.0069

and Figure 2 in:

DOI: 10.1109/ICDM.2011.54

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_subseq_isconstant : numpy.ndarray, function, or list, default None

A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.

Returns:
bit_sizes : numpy.ndarray

The total number of bits computed from MDL for representing each pair of multidimensional subsequences.

S : list

A list of numpy.ndarrays that contain the k-th-dimensional subspaces.

See also

stumpy.mstump

Compute the multi-dimensional z-normalized matrix profile

stumpy.mstumped

Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster

stumpy.subspace

Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index

Examples

>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
>>> motifs_idx = np.argsort(mps, axis=1)[:, 0]
>>> stumpy.mdl(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3,
...     subseq_idx=motifs_idx,
...     nn_idx=indices[np.arange(motifs_idx.shape[0]), motifs_idx])
(array([ 80.      , 111.509775]), [array([1]), array([0, 1])])

atsc#

stumpy.atsc(IL, IR, j)

Compute the anchored time series chain (ATSC)

Note that since the matrix profile indices, IL and IR, are pre-computed, this function is agnostic to subsequence normalization.

Parameters:
IL : numpy.ndarray

Left matrix profile indices.

IR : numpy.ndarray

Right matrix profile indices.

j : int

The index value for which to compute the ATSC.

Returns:
out : numpy.ndarray

Anchored time series chain for index, j.

See also

stumpy.allc

Compute the all-chain set (ALLC)

Notes

DOI: 10.1109/ICDM.2017.79

See Table I

This is the implementation for the anchored time series chains (ATSC).

Unlike the original paper, we've replaced the while-loop with a more stable for-loop.

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.atsc(mp[:, 2], mp[:, 3], 1)
array([1, 3])
>>> # Alternative example using named attributes
>>>
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.atsc(mp.left_I_, mp.right_I_, 1)
array([1, 3])
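The chain-following rule behind ATSC can be sketched in a few lines: starting at j, keep stepping to the right nearest neighbor for as long as that neighbor's left nearest neighbor points back to the current index. The function name anchored_chain and the IL/IR arrays below are made up for illustration, with -1 marking a missing neighbor; the bounded for-loop mirrors the note above about replacing the while-loop.

```python
import numpy as np


def anchored_chain(IL, IR, j):
    """Follow right neighbors from j while each link is reciprocated by
    the left neighbor of the next element."""
    chain = [j]
    for _ in range(len(IL)):  # bounded loop instead of an open-ended while
        nxt = IR[j]
        if nxt == -1 or IL[nxt] != j:
            break
        j = nxt
        chain.append(j)
    return np.array(chain, dtype=np.int64)


# Made-up left/right matrix profile indices
IL = np.array([-1, 0, 1, 1, 3])
IR = np.array([1, 3, 4, 4, -1])

chain_from_0 = anchored_chain(IL, IR, 0)
chain_from_2 = anchored_chain(IL, IR, 2)
print(chain_from_0, chain_from_2)
```

Index 2 yields a chain of length one because its right neighbor, 4, has 3 (not 2) as its left neighbor, so the link is not reciprocated.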

allc#

stumpy.allc(IL, IR)

Compute the all-chain set (ALLC)

Note that since the matrix profile indices, IL and IR, are pre-computed, this function is agnostic to subsequence normalization.

Parameters:
IL : numpy.ndarray

Left matrix profile indices.

IR : numpy.ndarray

Right matrix profile indices.

Returns:
S : list(numpy.ndarray)

All-chain set.

C : numpy.ndarray

Anchored time series chain for the longest chain (also known as the unanchored chain). Note that when there are multiple different chains with length equal to len(C), then only one chain from this set is returned. You may iterate over the all-chain set, S, to find all other possible chains with length len(C).

See also

stumpy.atsc

Compute the anchored time series chain (ATSC)

Notes

DOI: 10.1109/ICDM.2017.79

See Table II

Unlike the original paper, we've replaced the while-loop with a more stable for-loop.

This is the implementation for the all-chain set (ALLC) and the unanchored chain is simply the longest one among the all-chain set. Both the all-chain set and unanchored chain are returned.

The all-chain set,S, is returned as a list of unique numpy arrays.

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.allc(mp[:, 2], mp[:, 3])
([array([1, 3]), array([2]), array([0, 4])], array([0, 4]))
>>> # Alternative example using named attributes
>>>
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.allc(mp.left_I_, mp.right_I_)
([array([1, 3]), array([2]), array([0, 4])], array([0, 4]))
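As a rough sketch of how the all-chain set and the unanchored chain relate, the snippet below grows a chain from every index that has not already been absorbed into an earlier chain and then picks the longest one. The names anchored_chain and all_chains and the IL/IR arrays are hypothetical; the ordering of S and the tie-breaking for C in this sketch are assumptions and may differ from what stumpy.allc returns.

```python
import numpy as np


def anchored_chain(IL, IR, j):
    """Follow reciprocated right-neighbor links starting at j."""
    chain = [j]
    for _ in range(len(IL)):
        nxt = IR[j]
        if nxt == -1 or IL[nxt] != j:
            break
        j = nxt
        chain.append(j)
    return np.array(chain, dtype=np.int64)


def all_chains(IL, IR):
    """Return (S, C): every chain that starts at an index not yet covered
    by an earlier chain, and the longest of those chains."""
    visited = np.zeros(len(IL), dtype=bool)
    S = []
    for i in range(len(IL)):
        if visited[i]:
            continue  # already part of a chain that started further left
        chain = anchored_chain(IL, IR, i)
        visited[chain] = True
        S.append(chain)
    C = max(S, key=len)  # the first longest chain wins ties
    return S, C


# Made-up left/right matrix profile indices
IL = np.array([-1, 0, 1, 1, 3])
IR = np.array([1, 3, 4, 4, -1])

S, C = all_chains(IL, IR)
print(S, C)
```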

fluss#

stumpy.fluss(I, L, n_regimes, excl_factor=5, custom_iac=None)

Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)

Essentially, this is a wrapper to compute the corrected arc curve and regime locations. Note that since the matrix profile indices, I, are pre-computed, this function is agnostic to subsequence normalization.

Parameters:
I : numpy.ndarray

The matrix profile indices for the time series of interest.

L : int

The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index, but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.

n_regimes : int

The number of regimes to search for. This is one more than the number of regime changes as denoted in the original paper.

excl_factor : int, default 5

The multiplying factor for the regime exclusion zone.

custom_iac : numpy.ndarray, default None

A custom idealized arc curve (IAC) that will be used for correcting the arc curve.

Returns:
cac : numpy.ndarray

A corrected arc curve (CAC).

regime_locs : numpy.ndarray

The locations of the regimes.

See also

stumpy.floss

Compute the Fast Low-Cost Online Semantic Segmentation (FLOSS) for streaming data

Notes

DOI: 10.1109/ICDM.2017.21

See Section A

This is the implementation for Fast Low-cost Unipotent Semantic Segmentation (FLUSS).

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.fluss(mp[:, 0], 3, 2)
(array([1., 1., 1., 1., 1.]), array([0]))
>>> # Alternative example using named attributes
>>>
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.fluss(mp.P_, 3, 2)
(array([1., 1., 1., 1., 1.]), array([0]))
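To make the corrected arc curve more concrete, the sketch below counts, for every position, how many nearest-neighbor arcs pass over it, divides by an idealized arc curve, clips at 1, and masks the edges. The function names, the edge argument, and the index array are hypothetical, and the parabolic idealized curve used in the example is an assumption made for illustration only; it is not necessarily how STUMPY derives its IAC.

```python
import numpy as np


def arc_curve(I):
    """Count, for each position x, the arcs (i, I[i]) with
    min(i, I[i]) < x < max(i, I[i])."""
    I = np.asarray(I)
    marks = np.zeros(len(I))
    for i in range(len(I)):
        j = int(I[i])
        if j < 0 or j == i:
            continue  # no neighbor, or a degenerate arc
        lo, hi = min(i, j), max(i, j)
        marks[lo + 1] += 1  # the arc starts covering positions after lo
        marks[hi] -= 1      # and stops covering them at hi
    return np.cumsum(marks)


def corrected_arc_curve(I, iac, edge):
    """Divide the arc curve by an idealized arc curve, clip at 1.0, and
    set `edge` positions (edge >= 1) at both ends to 1.0."""
    ac = arc_curve(I)
    cac = np.ones_like(ac)
    np.divide(ac, iac, out=cac, where=iac > 0)
    cac = np.minimum(cac, 1.0)
    cac[:edge] = 1.0
    cac[-edge:] = 1.0
    return cac


# Made-up indices: positions 0-5 and 6-11 only point within their own half
I = np.array([3, 4, 5, 0, 1, 2, 9, 10, 11, 6, 7, 8])
n = len(I)
x = np.arange(n)
iac = 2.0 * x * (n - x) / n  # assumed parabolic idealized arc curve

ac = arc_curve(I)
cac = corrected_arc_curve(I, iac, edge=2)
print(ac, int(np.argmin(cac)))
```

No arc crosses the boundary between the two halves, so the corrected curve reaches its minimum there, which is the cue used to place a regime change.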

floss#

stumpy.floss(mp, T, m, L, excl_factor=5, n_iter=1000, n_samples=1000, custom_iac=None, normalize=True, p=2.0, T_subseq_isconstant_func=None)

A class to compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data

Parameters:
mp : numpy.ndarray

The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.

T : numpy.ndarray

The 1-D time series data used to generate the matrix profile and matrix profile indices found in mp. Note that the right matrix profile index is used and the right matrix profile is intelligently recomputed on the fly from T instead of using the bidirectional matrix profile.

m : int

The window size for computing sliding window mass. This is identical to the window size used in the matrix profile calculation. For managing edge effects, see the L parameter.

L : int

The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index, but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.

excl_factor : int, default 5

The multiplying factor for the regime exclusion zone. Note that this is unrelated to the excl_zone used to compute the matrix profile.

n_iter : int, default 1000

Number of iterations to average over when determining the parameters for the IAC beta distribution.

n_samples : int, default 1000

Number of distribution samples to draw during each iteration when computing the IAC.

custom_iac : numpy.ndarray, default None

A custom idealized arc curve (IAC) that will be used for correcting the arc curve.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_subseq_isconstant_func : function, default None

A custom, user-defined function that returns a boolean array indicating whether each subsequence in T is constant (True). The function must take only two arguments: a, a 1-D array, and w, the window size. Additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Attributes:
cac_1d_ : numpy.ndarray

Get the updated 1-dimensional corrected arc curve (CAC_1D).

P_ : numpy.ndarray

Get the updated matrix profile.

I_ : numpy.ndarray

Get the updated (right) matrix profile indices.

T_ : numpy.ndarray

Get the updated time series, T.

Methods

update(t)

Ingress a new data point, t, onto the time series, T, followed by egressing the oldest single data point from T. Then, update the 1-dimensional corrected arc curve (CAC_1D) and the matrix profile.

See also

stumpy.fluss

Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)

Notes

DOI: 10.1109/ICDM.2017.21

See Section C

This is the implementation for Fast Low-cost Online Semantic Segmentation (FLOSS).

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0.]), m=3)
>>> stream = stumpy.floss(
...     mp,
...     np.array([584., -11., 23., 79., 1001., 0.]),
...     m=3,
...     L=3)
>>> stream.update(19.)
>>> stream.cac_1d_
array([1., 1., 1., 1.])
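The ingress/egress behavior of update(t) means the time series keeps a fixed length, unlike an append-only stream. A minimal sketch of that sliding-window step on a plain array is shown below; the helper name slide is hypothetical and the snippet only illustrates the bookkeeping on T, not the recomputation of the matrix profile or the CAC.

```python
import numpy as np


def slide(T, t):
    """Return a copy of T with the oldest point dropped (egress) and the
    new point t added at the end (ingress); the length is unchanged."""
    T = np.array(T, dtype=np.float64)  # copy so the input is left untouched
    T[:-1] = T[1:]
    T[-1] = t
    return T


T_old = np.array([584., -11., 23., 79., 1001., 0.])
T_new = slide(T_old, 19.)
n_subseq = len(T_new) - 3 + 1  # number of subsequences for m=3
print(T_new, n_subseq)
```

With six points and m=3 there are four subsequences both before and after the update, which is consistent with the four-element cac_1d_ shown in the example above.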

ostinato#

stumpy.ostinato(Ts, m, normalize=True, p=2.0, Ts_subseq_isconstant=None)

Find the z-normalized consensus motif of multiple time series

This is a wrapper around the vanilla version of the ostinato algorithm, which finds the best radius, and a helper function that finds the most central conserved motif.

Parameters:
Ts : list

A list of time series for which to find the most central consensus motif.

m : int

Window size.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

Ts_subseq_isconstant : list, default None

A list of rolling window isconstant values, one for each time series in Ts.

Returns:
central_radius : float

Radius of the most central consensus motif.

central_Ts_idx : int

The time series index in Ts that contains the most central consensus motif.

central_subseq_idx : int

The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif.

See also

stumpy.ostinatoed

Find the z-normalized consensus motif of multiple time series with a dask/ray cluster

stumpy.gpu_ostinato

Find the z-normalized consensus motif of multiple time series with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2019.00140

See Table 2

The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif, it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif.

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.ostinato(
...     [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...      np.array([600., -10., 23., 17.]),
...      np.array([1., 9., 6., 0.])],
...     m=3)
(1.2370237678153826, 0, 4)
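The notion of a radius from the notes above can be illustrated with a brute-force search: for every subsequence, take the largest of its nearest-neighbor distances to each of the other time series, then keep the subsequence for which that value is smallest. This is a deliberately naive sketch with hypothetical function names and made-up data; it assumes no constant subsequences, returns the first best radius it encounters, and does not perform the extra search for the most central motif.

```python
import numpy as np


def znorm_windows(T, m):
    """All z-normalized length-m subsequences of T, one per row
    (assumes that no subsequence is constant)."""
    T = np.asarray(T, dtype=np.float64)
    W = np.lib.stride_tricks.sliding_window_view(T, m)
    return (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)


def brute_force_radius(Ts, m):
    """Return (radius, ts_idx, subseq_idx) for the smallest radius."""
    windows = [znorm_windows(T, m) for T in Ts]
    best = (np.inf, -1, -1)
    for a, Wa in enumerate(windows):
        for i, q in enumerate(Wa):
            radius = 0.0
            for b, Wb in enumerate(windows):
                if b == a:
                    continue
                # Distance from q to its nearest neighbor in series b
                nearest = np.linalg.norm(Wb - q, axis=1).min()
                radius = max(radius, nearest)
            if radius < best[0]:
                best = (radius, a, i)
    return best


# Made-up series that all contain a scaled or shifted copy of [1, 3, 2]
Ts = [np.array([0., 4., 1., 3., 2., 9.]),
      np.array([10., 30., 20., 0., 5.]),
      np.array([7., 2., 4., 3., 8.])]

radius, ts_idx, subseq_idx = brute_force_radius(Ts, m=3)
print(radius, ts_idx, subseq_idx)
```

In this data the shared pattern occurs once in every series, so three different subsequences attain (numerically) the same best radius, which is the situation in which an additional search for the most central one becomes relevant.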

ostinatoed#

stumpy.ostinatoed(client, Ts, m, normalize=True, p=2.0, Ts_subseq_isconstant=None)

Find the z-normalized consensus motif of multiple time series with a dask/ray cluster

This is a wrapper around the vanilla version of the ostinato algorithm, which finds the best radius, and a helper function that finds the most central conserved motif.

Parameters:
client : client

A dask/ray client. Setting up a dask/ray cluster is beyond the scope of this library. Please refer to the dask/ray Distributed documentation.

Ts : list

A list of time series for which to find the most central consensus motif.

m : int

Window size.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

Ts_subseq_isconstant : list, default None

A list of rolling window isconstant values, one for each time series in Ts.

Returns:
central_radius : float

Radius of the most central consensus motif.

central_Ts_idx : int

The time series index in Ts that contains the most central consensus motif.

central_subseq_idx : int

The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif.

See also

stumpy.ostinato

Find the z-normalized consensus motif of multiple time series

stumpy.gpu_ostinato

Find the z-normalized consensus motif of multiple time series with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2019.00140

See Table 2

The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif, it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif.

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.ostinatoed(
...             dask_client,
...             [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...              np.array([600., -10., 23., 17.]),
...              np.array([1., 9., 6., 0.])],
...             m=3)
(1.2370237678153826, 0, 4)

Alternatively, you can also use ray:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.ostinatoed(
...         ray,
...         [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...          np.array([600., -10., 23., 17.]),
...          np.array([1., 9., 6., 0.])],
...         m=3)
...     ray.shutdown()

gpu_ostinato#

stumpy.gpu_ostinato(Ts, m, device_id=0, normalize=True, p=2.0, Ts_subseq_isconstant=None)

Find the z-normalized consensus motif of multiple time series with one or more GPU devices

This is a wrapper around the vanilla version of the ostinato algorithm, which finds the best radius, and a helper function that finds the most central conserved motif.

Parameters:
Ts : list

A list of time series for which to find the most central consensus motif.

m : int

Window size.

device_id : int or list, default 0

The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

Ts_subseq_isconstant : list, default None

A list of rolling window isconstant values, one for each time series in Ts.

Returns:
central_radius : float

Radius of the most central consensus motif.

central_Ts_idx : int

The time series index in Ts that contains the most central consensus motif.

central_subseq_idx : int

The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif.

See also

stumpy.ostinato

Find the z-normalized consensus motif of multiple time series

stumpy.ostinatoed

Find the z-normalized consensus motif of multiple time series with a dask/ray cluster

Notes

DOI: 10.1109/ICDM.2019.00140

See Table 2

The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif, it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif.

Examples

>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     stumpy.gpu_ostinato(
...         [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...          np.array([600., -10., 23., 17.]),
...          np.array([1., 9., 6., 0.])],
...         m=3,
...         device_id=all_gpu_devices)
(1.2370237678153826, 0, 4)

mpdist#

stumpy.mpdist(T_A, T_B, m, percentage=0.05, k=None, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series

The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates the output of an AB-join and a BA-join and returns the k-th smallest value as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality, but the method is highly scalable.

Parameters:
T_A : numpy.ndarray

The first time series or sequence for which to compute the matrix profile.

T_B : numpy.ndarray

The second time series or sequence for which to compute the matrix profile.

m : int

Window size.

percentage : float, default 0.05

The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0.

k : int

Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_A_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must take only two arguments: a, a 1-D array, and w, the window size. Additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

T_B_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must take only two arguments: a, a 1-D array, and w, the window size. Additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Returns:
MPdist : float

The matrix profile distance.

See also

mpdisted

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a dask/ray cluster

gpu_mpdist

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2018.00119

See Section III

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.mpdist(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     m=3)
0.00019935236191097894
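The T_A_subseq_isconstant and T_B_subseq_isconstant parameters accept a function of exactly two arguments, a and w, with any extra arguments curried via functools.partial. A minimal sketch of such a function follows; the name isconstant_with_tolerance and its tol argument are hypothetical and chosen only for illustration.

```python
import functools

import numpy as np


def isconstant_with_tolerance(a, w, tol=0.0):
    """Flag each length-w window of a whose value range is at most tol.

    Hypothetical helper: the (a, w) signature follows the parameter
    description, while tol is an extra argument supplied through partial.
    """
    windows = np.lib.stride_tricks.sliding_window_view(a, w)
    return (windows.max(axis=1) - windows.min(axis=1)) <= tol


# Curry the extra argument so that only (a, w) remain
isconstant_func = functools.partial(isconstant_with_tolerance, tol=0.5)

T_A = np.array([1.0, 1.1, 1.2, 5.0, 9.0, 9.0, 9.0])
mask = isconstant_func(T_A, 3)

# The curried function could then be passed along, for example:
# stumpy.mpdist(T_A, T_B, m=3, T_A_subseq_isconstant=isconstant_func)
```

Here only the first and last windows are flagged, since they are the only ones whose values stay within 0.5 of each other.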

mpdisted

stumpy.mpdisted(client, T_A, T_B, m, percentage=0.05, k=None, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a dask/ray cluster

The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates the output of an AB-join and a BA-join and returns the k-th smallest value as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality, but the method is highly scalable.

Parameters:
client : client

A dask/ray client. Setting up a dask/ray cluster is beyond the scope of this library. Please refer to the dask/ray documentation.

T_A : numpy.ndarray

The first time series or sequence for which to compute the matrix profile.

T_B : numpy.ndarray

The second time series or sequence for which to compute the matrix profile.

m : int

Window size.

percentage : float, default 0.05

The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0. This parameter is ignored when k is not None.

k : int, default None

Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_A_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

T_B_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
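The p parameter above selects the Minkowski p-norm that is applied when normalize is False. As a reminder of what that means, here is a small sketch on two raw vectors; minkowski_distance is a hypothetical helper written for this illustration and is not part of stumpy.

```python
import numpy as np


def minkowski_distance(x, y, p=2.0):
    """Minkowski distance between two equal-length vectors (hypothetical helper)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


x = [0.0, 0.0]
y = [3.0, 4.0]

manhattan = minkowski_distance(x, y, p=1.0)  # sum of absolute differences
euclidean = minkowski_distance(x, y, p=2.0)  # straight-line distance
```

With p=1.0 the differences 3 and 4 are summed, while p=2.0 gives the familiar length of the 3-4-5 triangle's hypotenuse.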

Returns:
MPdist : float

The matrix profile distance.

See also

mpdist

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series

gpu_mpdist

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices

Notes

DOI: 10.1109/ICDM.2018.00119

See Section III

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.mpdisted(
...             dask_client,
...             np.array([-11.1, 23.4, 79.5, 1001.0]),
...             np.array([584., -11., 23., 79., 1001., 0., -19.]),
...             m=3)
0.00019935236191097894

Alternatively, you can also use ray:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.mpdisted(
...         ray,
...         np.array([-11.1, 23.4, 79.5, 1001.0]),
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3)
...     ray.shutdown()

gpu_mpdist

stumpy.gpu_mpdist(T_A, T_B, m, percentage=0.05, k=None, device_id=0, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)

Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices

The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates and sorts the output of an AB-join and a BA-join and returns the value of the k-th smallest number as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality, but the method is highly scalable.

Parameters:
T_A : numpy.ndarray

The first time series or sequence for which to compute the matrix profile.

T_B : numpy.ndarray

The second time series or sequence for which to compute the matrix profile.

m : int

Window size.

percentage : float, default 0.05

The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0. This parameter is ignored when k is not None.

k : int, default None

Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.

device_id : int or list, default 0

The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_A_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

T_B_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Returns:
MPdist : float

The matrix profile distance.

Notes

DOI: 10.1109/ICDM.2018.00119

See Section III

Examples

>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     stumpy.gpu_mpdist(
...         np.array([-11.1, 23.4, 79.5, 1001.0]),
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3,
...         device_id=all_gpu_devices)
0.00019935236191097894

motifs

stumpy.motifs(T, P, min_neighbors=1, max_distance=None, cutoff=None, max_matches=10, max_motifs=1, atol=1e-08, normalize=True, p=2.0, T_subseq_isconstant=None)

Discover the top motifs for time series T

A subsequence, Q, becomes a candidate motif if there are at least min_neighbors number of other subsequence matches in T (outside the exclusion zone) with a distance less than or equal to max_distance.

Note that, in the best case scenario, the returned arrays would have shape (max_motifs, max_matches) and contain all finite values. However, in reality, many conditions (see below) need to be satisfied in order for this to be true. Any truncation in the number of rows (i.e., motifs) may be the result of insufficient candidate motifs with matches greater than or equal to min_neighbors or that the matrix profile value for the candidate motif was larger than cutoff. Similarly, any truncation in the number of columns (i.e., matches) may be the result of insufficient matches being found with distances (to their corresponding candidate motif) that are equal to or less than max_distance. Only motifs and matches that satisfy all of these constraints will be returned.

If you must return a shape of (max_motifs, max_matches), then you may consider specifying a smaller min_neighbors, a larger max_distance, and/or a larger cutoff. For example, while it is ill advised, setting min_neighbors=1, max_distance=np.inf, and cutoff=np.inf will ensure that the shape of the output arrays will be (max_motifs, max_matches). However, given the lack of constraints, the quality of each motif and the quality of each match may be drastically different. Setting appropriate conditions will help ensure appropriately constrained results that may be easier to interpret.

Parameters:
T : numpy.ndarray

The time series or sequence.

P : numpy.ndarray

The (1-dimensional) matrix profile of T. In the case where the matrix profile was computed with k > 1 (i.e., top-k nearest neighbors), you must summarize the top-k nearest-neighbor distances for each subsequence into a single value (e.g., np.mean, np.min, etc.) and then use that derived value as your P.

min_neighbors : int, default 1

The minimum number of similar matches a subsequence needs to have in order to be considered a motif. This defaults to 1, which means that a subsequence must have at least one similar match in order to be considered a motif.

max_distance : float or function, default None

For a candidate motif, Q, and a non-trivial subsequence, S, max_distance is the maximum distance allowed between Q and S so that S is considered a match of Q. If max_distance is a function, then it must be a function that accepts a single parameter, D, in its function signature, which is the distance profile between Q and T. If None, this defaults to np.nanmax([np.nanmean(D) - 2.0 * np.nanstd(D), np.nanmin(D)]).

cutoff : float, default None

The largest matrix profile value (distance) that a candidate motif is allowed to have. If None, this defaults to np.nanmax([np.nanmean(P) - 2.0 * np.nanstd(P), np.nanmin(P)]).

max_matches : int, default 10

The maximum number of similar matches of a motif representative to be returned. The resulting matches are sorted by distance, so a value of 10 means that the indices of the most similar 10 subsequences are returned. If None, all matches within max_distance of the motif representative will be returned. Note that the first match is always the self-match/trivial-match for each motif.

max_motifs : int, default 1

The maximum number of motifs to return. To consider returning all possible valid motifs, try setting max_motifs to the length of your input matrix profile (i.e., max_motifs=len(P)).

atol : float, default 1e-8

The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Returns:
motif_distances : numpy.ndarray

The distances corresponding to a set of subsequence matches for each motif. Note that the first column always corresponds to the distance for the self-match/trivial-match for each motif.

motif_indices : numpy.ndarray

The indices corresponding to a set of subsequence matches for each motif. Note that the first column always corresponds to the index for the self-match/trivial-match for each motif.

See also

stumpy.match

Find all matches of a query Q in a time series T

stumpy.mmotifs

Discover the top motifs for the multi-dimensional time series T

stumpy.stump

Compute the z-normalized matrix profile

stumpy.stumped

Compute the z-normalized matrix profile with a dask/ray cluster

stumpy.gpu_stump

Compute the z-normalized matrix profile with one or more GPU devices

stumpy.scrump

Compute an approximate z-normalized matrix profile

Examples

>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.motifs(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     mp[:, 0],
...     max_distance=2.0)
(array([[0.        , 0.11633857]]), array([[0, 4]]))
>>> # Alternative example using named attributes
>>>
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.motifs(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     mp.P_,
...     max_distance=2.0)
(array([[0.        , 0.11633857]]), array([[0, 4]]))
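The default for max_distance (and, with P in place of D, for cutoff) is the expression quoted in the parameter descriptions above. The sketch below evaluates that expression on two made-up distance profiles and also shows the shape of a user-supplied function of D. The names default_max_distance and twice_the_minimum, and the toy arrays, are hypothetical.

```python
import numpy as np


def default_max_distance(D):
    """Evaluate the documented default: the larger of (mean - 2 * std) and min."""
    return np.nanmax([np.nanmean(D) - 2.0 * np.nanstd(D), np.nanmin(D)])


def twice_the_minimum(D):
    """Example of a callable max_distance: accepts only the distance profile D."""
    return 2.0 * np.nanmin(D)


# Evenly spread distances: mean - 2 * std falls below the minimum, so the
# minimum is used and at least the closest match qualifies.
D_spread = np.array([1.0, 2.0, 3.0])
threshold_spread = default_max_distance(D_spread)

# One very close match among many distant ones (NaN values are ignored):
# here mean - 2 * std lies above the minimum and becomes the threshold.
D_outlier = np.array([0.0] + [10.0] * 9 + [np.nan])
threshold_outlier = default_max_distance(D_outlier)

custom_threshold = twice_the_minimum(D_spread)
```

Taking the larger of the two terms guarantees that the threshold never drops below the smallest observed distance.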

match

stumpy.match(Q, T, M_T=None, Σ_T=None, max_distance=None, max_matches=None, atol=1e-08, query_idx=None, normalize=True, p=2.0, T_subseq_isfinite=None, T_subseq_isconstant=None, Q_subseq_isconstant=None)

Find all matches of a query Q in a time series T

This returns the indices of subsequences whose distances to Q are less than or equal to max_distance, sorted by distance (lowest to highest). Around each occurrence, an exclusion zone is applied before searching for the next.

Parameters:
Q : numpy.ndarray

The query sequence. Q does not have to be a subsequence of T.

T : numpy.ndarray

The time series of interest.

M_T : numpy.ndarray, default None

Sliding mean of time series, T.

Σ_T : numpy.ndarray, default None

Sliding standard deviation of time series, T.

max_distance : float or function, default None

Maximum distance between Q and a subsequence, S, for S to be considered a match. If max_distance is a function, then it must be a function that accepts a single parameter, D, in its function signature, which is the distance profile between Q and T (a 1D numpy array of size n - m + 1). If None, this defaults to np.nanmax([np.nanmean(D) - 2 * np.nanstd(D), np.nanmin(D)]) (i.e., at least the closest match will be returned).

max_matches : int, default None

The maximum number of similar occurrences to be returned. The resulting occurrences are sorted by distance, so a value of 10 means that the indices of the most similar 10 subsequences are returned. If None, then all occurrences are returned.

atol : float, default 1e-8

The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.

query_idx : int, default None

This is the index position along the time series, T, where the query subsequence, Q, is located. query_idx should only be used when the matrix profile is a self-join and should be set to None for matrix profiles computed from AB-joins. If query_idx is set to a specific integer value, then this will help ensure that the self-match will be returned first.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_subseq_isfinite : numpy.ndarray, default None

A boolean array that indicates whether a subsequence in T contains a np.nan/np.inf value (False). This parameter is ignored when normalize=True.

T_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence (of length equal to len(Q)) in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Q_subseq_isconstant : numpy.ndarray or function, default None

A boolean array (of size 1) that indicates whether Q is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in Q is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Returns:
out : numpy.ndarray

The first column consists of distances of subsequences of T whose distances to Q are less than or equal to max_distance, sorted by distance (lowest to highest). The second column consists of the corresponding indices in T.

See also

stumpy.motifs

Discover the top motifs for time series T

stumpy.mmotifs

Discover the top motifs for the multi-dimensional time series T

stumpy.stump

Compute the z-normalized matrix profile

stumpy.stumped

Compute the z-normalized matrix profile with a dask/ray cluster

stumpy.gpu_stump

Compute the z-normalized matrix profile with one or more GPU devices

stumpy.scrump

Compute an approximate z-normalized matrix profile

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.match(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.])
... )
array([[0.0011129739290248121, 1]], dtype=object)
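The matching procedure described above (threshold on distance, return matches from closest to farthest, apply an exclusion zone around each occurrence) can be sketched with brute force. This is an illustration only and not stumpy's implementation: naive_match is a hypothetical helper, the exclusion zone half-width excl_zone is passed explicitly because its size is not stated here, and no window is assumed to be constant.

```python
import numpy as np


def naive_match(Q, T, max_distance, excl_zone):
    """Greedy, brute-force matching of Q against T (hypothetical helper)."""
    Q = np.asarray(Q, dtype=float)
    T = np.asarray(T, dtype=float)
    m = Q.shape[0]

    # z-normalize the query and every length-m window of T
    windows = np.lib.stride_tricks.sliding_window_view(T, m)
    Q_z = (Q - Q.mean()) / Q.std()
    W_z = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(
        axis=1, keepdims=True
    )

    # Distance profile between Q and T
    D = np.linalg.norm(W_z - Q_z, axis=1)

    matches = []
    while True:
        idx = int(np.argmin(D))
        if not np.isfinite(D[idx]) or D[idx] > max_distance:
            break  # nothing left within the threshold
        matches.append((float(D[idx]), idx))
        # Exclusion zone: suppress the neighbors of the accepted occurrence
        start = max(0, idx - excl_zone)
        stop = min(D.shape[0], idx + excl_zone + 1)
        D[start:stop] = np.inf
    return matches


matches = naive_match(
    [-11.1, 23.4, 79.5, 1001.0],
    [584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0],
    max_distance=0.1,
    excl_zone=1,
)

repeated = naive_match(
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    max_distance=1.0,
    excl_zone=1,
)
```

Because the closest remaining occurrence is always taken first, the returned list is ordered from lowest to highest distance, and the exclusion zone keeps overlapping neighbors of an accepted match from being reported again.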

mmotifs

stumpy.mmotifs(T, P, I, min_neighbors=1, max_distance=None, cutoffs=None, max_matches=10, max_motifs=1, atol=1e-08, k=None, include=None, normalize=True, p=2.0, T_subseq_isconstant=None)

Discover the top motifs for the multi-dimensional time series T.

Parameters:
T : numpy.ndarray

The multi-dimensional time series or sequence.

P : numpy.ndarray

Multi-dimensional matrix profile of T.

I : numpy.ndarray

Multi-dimensional matrix profile indices.

min_neighbors : int, default 1

The minimum number of similar matches a subsequence needs to have in order to be considered a motif. This defaults to 1, which means that a subsequence must have at least one similar match in order to be considered a motif.

max_distance : float, default None

Maximal distance that is allowed between a query subsequence (a candidate motif) and all subsequences in T to be considered as a match. If None, this defaults to np.nanmax([np.nanmean(D) - 2 * np.nanstd(D), np.nanmin(D)]) (i.e., at least the closest match will be returned).

cutoffs : numpy.ndarray or float, default None

The largest matrix profile value (distance) for each dimension of the multidimensional matrix profile that a multidimensional candidate motif is allowed to have. If cutoffs is a scalar value, then this value will be applied to every dimension.

max_matches : int, default 10

The maximum number of similar matches (nearest neighbors) to return for each motif. The first match is always the self/trivial-match for each motif.

max_motifs : int, default 1

The maximum number of motifs to return. To consider returning all possible valid motifs, try setting max_motifs to the length of your input matrix profile (i.e., max_motifs=len(P)).

atol : float, default 1e-8

The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.

k : int, default None

The number of dimensions (k + 1) required for discovering all motifs. This value is available for doing guided search or, together with include, for constrained search. If k is None, then this will be automatically computed for each motif using MDL (unconstrained search).

include : numpy.ndarray, default None

A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_subseq_isconstant : numpy.ndarray, function, or list, default None

A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.

Returns:
motif_distances : numpy.ndarray

The distances corresponding to a set of subsequence matches for each motif.

motif_indices : numpy.ndarray

The indices corresponding to a set of subsequence matches for each motif.

motif_subspaces : list

A list consisting of arrays that contain the k-dimensional subspace for each motif.

motif_mdls : list

A list consisting of arrays that contain the MDL results for finding the dimension of each motif.

See also

stumpy.motifs

Find the top motifs for time series T

stumpy.match

Find all matches of a query Q in a time series T

stumpy.mstump

Compute the multi-dimensional z-normalized matrix profile

stumpy.mstumped

Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster

stumpy.subspace

Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index

stumpy.mdl

Compute the number of bits needed to compress one array with another using the minimum description length (MDL)

Notes

DOI: 10.1109/ICDM.2017.66

For more information on include and search types, see Section IV D and IV E

Examples

>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
>>> stumpy.mmotifs(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [1., 2., 4., 8., 16., 0., 32.]]),
...     mps,
...     indices)
(array([[4.47034836e-08, 4.47034836e-08]]), array([[0, 2]]), [array([1])], [array([ 80.      , 111.509775])])

snippets

stumpy.snippets(T, m, k, percentage=1.0, s=None, mpdist_percentage=0.05, mpdist_k=None, normalize=True, p=2.0, mpdist_T_subseq_isconstant=None)

Identify the top k snippets that best represent the time series, T

Parameters:
T : numpy.ndarray

The time series or sequence for which to find the snippets.

m : int

The snippet window size.

k : int

The desired number of snippets.

percentage : float, default 1.0

With the length of each non-overlapping subsequence, S[i], set to m, this is the percentage of S[i] (i.e., percentage * m) to set s (the sub-subsequence length) to. When percentage == 1.0, then the full length of S[i] is used to compute the mpdist_vect. When percentage < 1.0, then a shorter sub-subsequence length of s = min(math.ceil(percentage * m), m) from each S[i] is used to compute mpdist_vect. When s is not None, then the percentage parameter is ignored.

s : int, default None

With the length of each non-overlapping subsequence, S[i], set to m, this is essentially the sub-subsequence length (i.e., a shorter part of S[i]). When s == m, then the full length of S[i] is used to compute the mpdist_vect. When s < m, then shorter subsequences with length s from each S[i] are used to compute mpdist_vect. When s is not None, then the percentage parameter is ignored.

mpdist_percentage : float, default 0.05

The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0.

mpdist_k : int, default None

Specify the k-th value in the concatenated matrix profiles to return. When mpdist_k is not None, then the mpdist_percentage parameter is ignored.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

mpdist_T_subseq_isconstant : numpy.ndarray or function, default None

A boolean array that indicates whether a subsequence (of length equal to s) in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
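The interplay between percentage and s described above comes down to a small piece of arithmetic. The sketch below restates it; sub_subsequence_length is a hypothetical helper name, and only the formula s = min(math.ceil(percentage * m), m) and the "s takes precedence" rule are taken from the parameter descriptions.

```python
import math


def sub_subsequence_length(m, percentage=1.0, s=None):
    """Resolve the sub-subsequence length from m, percentage, and s.

    Hypothetical helper: an explicit s wins; otherwise s is derived from
    percentage and never exceeds m.
    """
    if s is None:
        s = min(math.ceil(percentage * m), m)
    return s


full_length = sub_subsequence_length(10)                    # percentage == 1.0
quarter = sub_subsequence_length(10, percentage=0.25)       # ceil(2.5)
explicit = sub_subsequence_length(10, percentage=0.25, s=4) # percentage ignored
```

With m=10, a percentage of 0.25 rounds 2.5 up to 3, while passing s=4 bypasses percentage entirely.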

Returns:
snippets : numpy.ndarray

The top k snippets.

snippets_indices : numpy.ndarray

The index locations for each of the top k snippets.

snippets_profiles : numpy.ndarray

The MPdist profiles for each of the top k snippets.

snippets_fractions : numpy.ndarray

The fraction of data that each of the top k snippets represents.

snippets_areas : numpy.ndarray

The area under the curve corresponding to each profile for each of the top k snippets.

snippets_regimes : numpy.ndarray

The index slices corresponding to the set of regimes for each of the top k snippets. The first column is the (zero-based) snippet index while the second and third columns correspond to the (inclusive) regime start indices and the (exclusive) regime stop indices, respectively.

Notes

DOI: 10.1109/ICBK.2018.00058

See Table I

Examples

>>> import stumpy
>>> import numpy as np
>>> stumpy.snippets(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3, k=2)
(array([[ 584.,  -11.,   23.],
        [  79., 1001.,    0.]]),
 array([0, 3]),
 array([[0.        , 3.2452632 , 3.00009263, 2.982409  , 0.11633857],
        [2.982409  , 2.69407392, 3.01719586, 0.        , 2.92154586]]),
 array([0.6, 0.4]),
 array([9.3441034 , 5.81050512]),
 array([[0, 0, 1],
        [0, 2, 3],
        [0, 4, 5],
        [1, 1, 2],
        [1, 3, 4]]))
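The snippets_regimes array is easiest to read once its rows are grouped by snippet. The sketch below does that for the regimes printed in the example above, using the documented column layout (snippet index, inclusive start, exclusive stop). The variable names are hypothetical. For this particular output the regime lengths happen to add up to the reported snippets_fractions; that is an observation about the example rather than a statement about how the library computes the fractions.

```python
import numpy as np

# Regimes copied from the example output above
snippets_regimes = np.array(
    [[0, 0, 1],
     [0, 2, 3],
     [0, 4, 5],
     [1, 1, 2],
     [1, 3, 4]]
)
k = 2  # number of snippets requested in the example

# Group the (start, stop) pairs by snippet index as Python slices
regime_slices = {snippet_idx: [] for snippet_idx in range(k)}
covered = np.zeros(k, dtype=int)
for snippet_idx, start, stop in snippets_regimes:
    regime_slices[int(snippet_idx)].append(slice(int(start), int(stop)))
    covered[snippet_idx] += stop - start  # stop is exclusive

fractions = covered / covered.sum()
```

Each slice in regime_slices can be used directly for indexing, since the stop index is already exclusive.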

stimp

stumpy.stimp(T, min_m=3, max_m=None, step=1, percentage=0.01, pre_scrump=True, normalize=True, p=2.0, T_subseq_isconstant_func=None)

A class to compute the Pan Matrix Profile

This is based on the SKIMP algorithm.

Parameters:
T : numpy.ndarray

The time series or sequence for which to compute the pan matrix profile.

min_m : int, default 3

The starting (or minimum) subsequence window size for which a matrix profile may be computed.

max_m : int, default None

The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m=None, this is set to the maximum allowable subsequence window size.

step : int, default 1

The step between subsequence window sizes.

percentage : float, default 0.01

The percentage of the full matrix profile to compute for each subsequence window size. When percentage < 1.0, then the scrump algorithm is used. Otherwise, the stump algorithm is used when the exact matrix profile is requested.

pre_scrump : bool, default True

A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++. This parameter is ignored when percentage=1.0.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.

T_subseq_isconstant_func : function, default None

A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.

Attributes:
PAN_ : numpy.ndarray

The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.

M_ : numpy.ndarray

The full list of (breadth-first-search (level) ordered) subsequence window sizes.

Methods

update():

Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile

See also

stumpy.stimped

Compute the Pan Matrix Profile with a dask/ray cluster

stumpy.gpu_stimp

Compute the Pan Matrix Profile with one or more GPU devices

Notes

DOI: 10.1109/ICBK.2019.00031

See Table 2

Examples

>>> import stumpy
>>> import numpy as np
>>> pmp = stumpy.stimp(np.array([584., -11., 23., 79., 1001., 0., -19.]))
>>> pmp.update()
>>> pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])
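Each call to update() processes the next window size in a breadth-first-search (level) order. To give a feel for what such an ordering looks like, the sketch below produces one possible level-order traversal of a sorted list of window sizes by repeatedly taking midpoints. The function bfs_order is hypothetical, and the exact order stored in M_ may differ from what this sketch produces.

```python
from collections import deque


def bfs_order(values):
    """Return values in a breadth-first (level) order of repeated midpoints.

    Hypothetical illustration: the middle element comes first, then the
    middles of the left and right halves, and so on.
    """
    values = sorted(values)
    order = []
    queue = deque([(0, len(values))])  # half-open index ranges [lo, hi)
    while queue:
        lo, hi = queue.popleft()
        if lo >= hi:
            continue  # empty range, nothing to visit
        mid = (lo + hi) // 2
        order.append(values[mid])
        queue.append((lo, mid))
        queue.append((mid + 1, hi))
    return order


window_sizes = bfs_order(range(3, 10))
```

An ordering of this kind covers the range of window sizes coarsely first and refines it with each further step, which is why a partially updated pan matrix profile can already be informative.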

stimped

stumpy.stimped(client, T, min_m=3, max_m=None, step=1, normalize=True, p=2.0, T_subseq_isconstant_func=None)

A class to compute the Pan Matrix Profile with a dask/ray cluster

This is based on the SKIMP algorithm.

Parameters:
client : client

A dask/ray client. Setting up a dask/ray cluster is beyond the scope of this library. Please refer to the dask/ray documentation.

T : numpy.ndarray

The time series or sequence for which to compute the pan matrix profile.

min_m : int, default 3

The starting (or minimum) subsequence window size for which a matrix profile may be computed.

max_m : int, default None

The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m=None, this is set to the maximum allowable subsequence window size.

step : int, default 1

The step between subsequence window sizes.

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize==True.

T_subseq_isconstant_func : function, default None

A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
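To make the p parameter above concrete, the following sketch computes the Minkowski distance between two short subsequences for p=1 (Manhattan) and p=2 (Euclidean). The helper minkowski_distance is hypothetical and written with plain NumPy; it only illustrates the formula and is not how STUMPY computes distances internally. Recall that p only has an effect when normalize=False.

```python
import numpy as np


def minkowski_distance(x, y, p=2.0):
    """Return the Minkowski distance of order p between two equal-length 1-D arrays.

    Hypothetical helper for illustration; it is not part of STUMPY.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


a = np.array([0.0, 3.0, 1.0])
b = np.array([4.0, 0.0, 1.0])

manhattan = minkowski_distance(a, b, p=1.0)  # |0-4| + |3-0| + |1-1| = 7
euclidean = minkowski_distance(a, b, p=2.0)  # sqrt(16 + 9 + 0) = 5
```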

Attributes:
PAN_ : numpy.ndarray

The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.

M_ : numpy.ndarray

The full list of (breadth-first-search (level) ordered) subsequence window sizes.
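The "binarized" and "repeated" steps mentioned for PAN_ can be pictured with the simplified sketch below. It thresholds the rows that have already been computed into 0/1 values and then copies each computed row into the uncomputed rows that follow it. The function binarize_and_repeat, the threshold of 0.2, and the toy matrix are all assumptions made for illustration; the normalization and contrast steps are omitted, and STUMPY's actual transformation may differ.

```python
import numpy as np


def binarize_and_repeat(pan, computed_rows, threshold=0.2):
    """Threshold computed rows to 0/1 and forward-fill the uncomputed rows.

    Simplified illustration only; not STUMPY's internal code.
    """
    out = np.array(pan, dtype=np.float64)  # work on a copy
    rows = np.sort(np.asarray(computed_rows))
    # Binarize: values at or below the threshold become 0, the rest become 1
    out[rows] = np.where(out[rows] <= threshold, 0.0, 1.0)
    # Repeat: each computed row fills the rows up to the next computed row
    stops = list(rows[1:]) + [out.shape[0]]
    for start, stop in zip(rows, stops):
        out[start:stop] = out[start]
    return out


pan = np.array([
    [0.05, 0.90, 0.40],        # computed
    [np.nan, np.nan, np.nan],  # not yet computed
    [0.10, 0.15, 0.80],        # computed
    [np.nan, np.nan, np.nan],  # not yet computed
])
result = binarize_and_repeat(pan, computed_rows=[2, 0])
```

Rows above the first computed row, if any, are left untouched in this sketch.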

Methods

update():

Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile.

See also

stumpy.stimp

Compute the Pan Matrix Profile

stumpy.gpu_stimp

Compute the Pan Matrix Profile with one or more GPU devices

Notes

DOI: 10.1109/ICBK.2019.00031

See Table 2

Examples

>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         pmp = stumpy.stimped(
...             dask_client,
...             np.array([584., -11., 23., 79., 1001., 0., -19.]))
...         pmp.update()
...         pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])

Alternatively, you can also use ray:

>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     pmp = stumpy.stimped(
...         ray,
...         np.array([584., -11., 23., 79., 1001., 0., -19.]))
...     ray.shutdown()

gpu_stimp

stumpy.gpu_stimp(T, min_m=3, max_m=None, step=1, device_id=0, normalize=True, p=2.0, T_subseq_isconstant_func=None)

A class to compute the Pan Matrix Profile with one or more GPU devices

This is based on the SKIMP algorithm.

Parameters:
T : numpy.ndarray

The time series or sequence for which to compute the pan matrix profile.

min_m : int, default 3

The starting (or minimum) subsequence window size for which a matrix profile may be computed.

max_m : int, default None

The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m=None, this is set to the maximum allowable subsequence window size.

step : int, default 1

The step between subsequence window sizes.

device_id : int or list, default 0

The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].

normalize : bool, default True

When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.

p : float, default 2.0

The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize==True.

T_subseq_isconstant_func : function, default None

A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
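For readers unfamiliar with what normalize=True implies, the sketch below z-normalizes two subsequences that share the same shape but differ in offset and scale, and shows that their Euclidean distance becomes (numerically) zero. The helper z_normalize is hypothetical, and returning zeros for a constant subsequence is a simplification made here; STUMPY has its own handling of constant subsequences.

```python
import numpy as np


def z_normalize(subseq):
    """Shift a subsequence to zero mean and scale it to unit standard deviation.

    Hypothetical helper for illustration; it is not part of STUMPY.
    """
    subseq = np.asarray(subseq, dtype=np.float64)
    std = subseq.std()
    if std == 0:
        # Simplification: map a constant subsequence to all zeros
        return np.zeros_like(subseq)
    return (subseq - subseq.mean()) / std


a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])  # same shape as `a`, different offset and scale

raw_dist = np.linalg.norm(a - b)                             # large
znorm_dist = np.linalg.norm(z_normalize(a) - z_normalize(b))  # approximately 0
```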

Attributes:
PAN_ : numpy.ndarray

The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.

M_ : numpy.ndarray

The full list of (breadth-first-search (level) ordered) subsequence window sizes.

Methods

update():

Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile.

See also

stumpy.stimp

Compute the Pan Matrix Profile

stumpy.stimped

Compute the Pan Matrix Profile with a dask/ray cluster

Notes

DOI: 10.1109/ICBK.2019.00031

See Table 2

Examples

>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     pmp = stumpy.gpu_stimp(
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         device_id=all_gpu_devices)
...     pmp.update()
...     pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])