Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
It should be noted that the terms used in the description of the present application and the claims and the above-mentioned drawings are only used for describing the embodiments, and are not intended to limit the scope of the present application. It will be understood that the terms "comprises," "comprising," "includes," "including" and/or "having," when used herein, specify the presence of stated features, integers, steps, operations, elements, components, and/or groups thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be further understood that, although the terms "first," "second," "third," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element without departing from the scope of the present invention. Similarly, the second element may be referred to as a first element. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more.
Before describing embodiments of the present application in further detail, the terms and terminology involved in the embodiments are explained below and will be used throughout the following description.
Time series: an ordered sequence formed by arranging the values of a phenomenon or statistical index at different time points in chronological order.
User portrait: users are clustered or categorized based on their behavioral characteristic data (typically time series data), thereby enabling the characterization of different types of users.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of the terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The terminal devices 101, 102, 103 may be various electronic devices with display screens including, but not limited to, desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative, and that any number of terminal devices, networks, and servers may be provided as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
The time sequence classification method provided by the embodiment of the application is generally executed by the server 105, and accordingly, the time sequence classification device is generally disposed in the server 105. However, it is easily understood by those skilled in the art that the time-series classification method provided in the embodiment of the present application may be performed by the terminal devices 101, 102, 103, and accordingly, the time-series classification apparatus may be provided in the terminal devices 101, 102, 103, which is not particularly limited in the present exemplary embodiment. For example, in an exemplary embodiment, the user may upload the time sequence to the server 105 through the terminal devices 101, 102, 103, and the server 105 processes the time sequence through the time sequence classification method provided by the embodiment of the present application, and sends the obtained classification result to the terminal devices 101, 102, 103.
It can be understood that the time series classification method can be applied to any real-world time series data classification scenario, such as electrocardiogram (ECG) anomaly classification and sensor-based action classification. In addition, the time series classification method provided by the embodiment of the present application can also be applied to user portrait analysis: by classifying time series, users are effectively classified, so that different types of users can be characterized.
Consider the application scenario of portrait analysis of game users. To protect the physical and mental health of minors, identifying minors among game users is an important issue today. Although real-name authentication of minors is performed, a large number of minors still play games using their parents' mobile phones. On the one hand, excessive play can affect the healthy growth of minors; on the other hand, minors may pay for games through their parents' mobile phones, causing refund complaints and negative social public opinion. Therefore, how to identify underage game users remains a significant problem worth studying.
In the application scenario of game user portrait analysis, the server 105 may be a game server, and the terminal devices 101, 102, 103 may be terminal devices provided with a game application program in which a game account is logged in. The game data generated by a user playing the game through the game account carries time stamps and can thus form a time series; therefore, whether the playing user is a minor can be identified by classifying the time series.
The user to be identified is a user for whom underage identification is required. The user to be identified performs game behaviors through the game application program of the terminal device, forming a time series to be classified, which the game server can acquire. Meanwhile, the game server can acquire a classified time series set, where the classified time series set includes a plurality of classified time series and the category of each classified time series; the classified time series may take, for example, the refund behavior of the user as the category, and each classified time series includes a plurality of first subsequences obtained by division according to the target sliding window length. Then, the game server can divide the time series to be classified according to the target sliding window length to obtain a plurality of second subsequences, and calculate the similarity between the time series to be classified and each classified time series from the second subsequences and the first subsequences. Finally, the game server determines the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series, thereby characterizing the user portrait of the user to be identified and judging whether the user to be identified is a minor.
Through the technical solution of time series classification, game users can be effectively analyzed and characterized, so that underage users in the game are identified and the phenomenon of minors playing games is addressed, avoiding potential refund complaints and negative social public opinion.
It should be noted that the above application scenario is merely an exemplary example, and does not limit the application scenario of the technical solution of the embodiment of the present application, and the technical solution of the embodiment of the present application may be applied to any time series data classification scenario.
The implementation details of the technical scheme of the embodiment of the application are described in detail below:
Classification methods for time series mainly include nearest neighbor classification methods, Shapelet analysis methods, Bag-of-Patterns (BoP) methods, and the Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE).
The nearest neighbor classification method measures the similarity between time series using measures such as the Euclidean distance, and then selects the category of the most similar time series as the classification result. The Shapelet analysis method searches for specific subsequences and uses the presence of such a subsequence as the key feature distinguishing different classes of series, where the presence of the subsequence is characterized based on the Euclidean distance. The Bag-of-Patterns method first converts the real-valued series into a symbol series using the Symbolic Aggregate approXimation (SAX) technique, then, after determining a word length, constructs a dictionary from the words appearing in the symbol series, records the frequency with which each word appears, and finally classifies using these frequencies. Compared with the nearest neighbor classification method, the Shapelet analysis method, and the Bag-of-Patterns method, HIVE-COTE achieves higher classification accuracy; it ensembles more than 30 independent classifiers, including the nearest neighbor classification method, the Shapelet analysis method, the Bag-of-Patterns method, and the like.
In addition, real-valued sequences may be classified using deep neural networks, for example, a residual network (Residual Neural Network, ResNet) that includes 9 residual convolutional layers and over 500,000 network parameters.
These time series classification methods assume by default that the time series data have been preprocessed into equal-length sequences without missing values or outliers. In practical applications, however, taking game applications as an example, the payment, activity, and other data of a user in the game all carry time stamps and are therefore typical time series data. When performing portrait analysis on users, time series classification can effectively classify users and characterize different types of users. However, because users play with different frequencies, different users generate time series of different lengths; moreover, due to processing errors or recording omissions, missing values and outliers often exist in the time series data. Thus, real time series data typically contain missing values and outliers and are not of equal length.
Although the time series can be forced to equal length by simple truncation or sampling, this may lose part of the information in the time series, and time series of different lengths occur naturally. For example, heartbeat time series are typically of unequal length, as heartbeats themselves are not exactly equally spaced. The operation of forcing time series to equal length may itself distort the data, causing problems in subsequent analysis.
Missing values may be filled in by interpolation and the like, but such methods are applicable only when there are a small number of non-consecutive missing values, not when values are missing consecutively. Consecutive missing values are very common and are often caused by network transmission problems. In addition, even when imputation is possible, the validity of the imputed values is difficult to guarantee, which may affect subsequent analysis.
Outliers can be identified by anomaly detection algorithms, but how to handle them afterwards remains a tricky problem. A common practice is to treat outliers as missing values, but this runs into the same problems as missing value handling.
Among the above time series classification methods, the nearest neighbor classification method can measure the similarity between two time series of different lengths by choosing the dynamic time warping (DTW) algorithm, but it cannot cope with missing values and outliers, and the DTW algorithm has a high time complexity and is difficult to apply to large-scale data.
The Shapelet analysis method can avoid the problems of unequal length, missing values, and outliers, but because it traverses all possible subsequences, its time complexity is extremely high and it is impractical. In addition, its actual classification performance is poor and noticeably lower than that of the nearest neighbor method. The Bag-of-Patterns method can be applied to time series of different lengths, but missing values and outliers must still be preprocessed.
HIVE-COTE is built on basic algorithms such as the nearest neighbor, Shapelet, and Bag-of-Patterns methods, so it naturally cannot avoid these problems; deep learning methods likewise require the time series to be preprocessed in advance to enable effective model training.
In summary, time series classification algorithms usually assume that the time series data to be classified have been preprocessed, whereas real-world time series data often differ in length and contain missing values and outliers, making it difficult to apply the related classification algorithms to real-environment data simply and effectively. In addition, the traditional preprocessing flow cannot guarantee the quality of the processed data, leaving potential problems for subsequent data analysis.
In this regard, an embodiment of the present application provides a time series classification method. The method includes: obtaining a time series to be classified and a classified time series set, where the classified time series set includes a plurality of classified time series and the category of each classified time series, and each classified time series includes a plurality of first subsequences obtained by division according to a target sliding window length; dividing the time series to be classified according to the target sliding window length to obtain a plurality of second subsequences; calculating the similarity between the time series to be classified and each classified time series according to the plurality of second subsequences and the plurality of first subsequences; and finally, determining the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series. The technical solution provided by the embodiment of the present application requires no additional preprocessing of the time series and can be applied directly to raw time series data existing in real scenarios. By dividing the time series data into subsequences using the target sliding window length, the influence of unequal lengths, missing values, or outliers is effectively avoided; the solution is compatible with any similarity measure, can effectively classify time series, and improves classification accuracy and efficiency.
Fig. 2 shows a flow chart of a time series classification method according to an embodiment of the application. The method may be performed by a server, which may be the server 105 shown in fig. 1; it may, of course, also be performed by a terminal device, such as the terminal device 101 shown in fig. 1. Referring to fig. 2, the time series classification method includes:
step S210, obtaining a time sequence to be classified and a classified time sequence set, wherein the classified time sequence set comprises a plurality of classified time sequences and categories of each classified time sequence, and each classified time sequence comprises a plurality of first subsequences obtained by division according to the target sliding window length;
step S220, dividing the time sequence to be classified according to the target sliding window length to obtain a plurality of second subsequences;
step S230, calculating the similarity between the time sequence to be classified and each classified time sequence according to the plurality of second subsequences and the plurality of first subsequences contained in each classified time sequence;
step S240, determining the category of the time sequence to be classified according to the similarity between the time sequence to be classified and each classified time sequence.
These steps are described in detail below.
In step S210, a time sequence to be classified and a set of classified time sequences are acquired, the set of classified time sequences including a plurality of classified time sequences and categories of respective classified time sequences, the respective classified time sequences including a plurality of first sub-sequences divided by a target sliding window length.
In this embodiment, the time series to be classified is the time series whose category is to be determined, and the classified time series set is a set composed of a plurality of classified time series together with the category of each classified time series.
Each classified time series is divided into a plurality of first subsequences based on the target sliding window length, where the length of each first subsequence obtained by division is the target sliding window length. For example, if a classified time series is T1 = (t1, t2, t3, t4, t5) and the target sliding window length is 2, then dividing T1 based on the target sliding window length yields 4 first subsequences: T11 = (t1, t2), T12 = (t2, t3), T13 = (t3, t4), T14 = (t4, t5).
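The following is a minimal sketch of this sliding window division; it is not the patent's reference implementation, and the function and variable names are illustrative assumptions.

```python
from typing import List, Sequence


def sliding_window_split(series: Sequence[float], window_length: int) -> List[List[float]]:
    """Divide a series into overlapping subsequences of `window_length`,
    stepping one position at a time."""
    if window_length > len(series):
        return []  # a window longer than the series yields no subsequences
    return [list(series[i:i + window_length])
            for i in range(len(series) - window_length + 1)]


# Example: T1 = (t1, ..., t5) with window length 2 yields 4 first subsequences.
T1 = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sliding_window_split(T1, 2))  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
```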
In step S220, the time sequence to be classified is divided according to the target sliding window length, so as to obtain a plurality of second sub-sequences.
In addition to dividing each classified time sequence by the target sliding window length to obtain a plurality of first subsequences, the time sequence to be classified is further divided based on the target sliding window length to obtain a plurality of second subsequences.
In step S230, a similarity between the time series to be classified and the respective classified time series is calculated according to the plurality of second sub-series and the plurality of first sub-series included in the respective classified time series.
After the time series to be classified is obtained and divided into a plurality of second subsequences, the similarity between the time series to be classified and each classified time series can be calculated from the plurality of second subsequences and the plurality of first subsequences contained in each classified time series. The similarity may be calculated using, for example, the Euclidean distance or dynamic time warping; the embodiment of the present application does not particularly limit the similarity calculation method.
For example, suppose that the classified time series set includes 4 classified time series: T1 = (t1, t2, t3, t4), T2 = (t5, t6), T3 = (t7, t8), and T4 = (t9, t10, t11); the time series to be classified is P1 = (p1, p2, p3); and the target sliding window length is 2. Based on the target sliding window length, the following first subsequences are obtained: T11 = (t1, t2), T12 = (t2, t3), T13 = (t3, t4), T21 = (t5, t6), T31 = (t7, t8), T41 = (t9, t10), T42 = (t10, t11); and the following second subsequences are obtained: P11 = (p1, p2), P12 = (p2, p3).
To calculate the similarity between the classified time series T1 and the time series to be classified P1, the following similarities may be calculated: S1 between T11 = (t1, t2) and P11 = (p1, p2), S2 between T11 = (t1, t2) and P12 = (p2, p3), S3 between T12 = (t2, t3) and P11 = (p1, p2), S4 between T12 = (t2, t3) and P12 = (p2, p3), S5 between T13 = (t3, t4) and P11 = (p1, p2), and S6 between T13 = (t3, t4) and P12 = (p2, p3). Having obtained the six similarities S1, S2, S3, S4, S5, S6, the maximum of the six may then be taken as the similarity between the classified time series T1 and the time series to be classified P1. By the same principle, the similarity between T2 and P1, between T3 and P1, and between T4 and P1 can be calculated.
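A minimal sketch of this subsequence-level similarity follows, under two stated assumptions that the embodiment does not fix (it allows any similarity measure): the Euclidean distance is used as the base measure, and a distance d is mapped to a similarity as 1 / (1 + d) so that larger values mean more similar.

```python
import math
from typing import List, Sequence


def euclidean_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity between two equal-length subsequences (assumed mapping)."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def series_similarity(first_subsequences: List[Sequence[float]],
                      second_subsequences: List[Sequence[float]]) -> float:
    """Similarity between a classified series and the series to be classified:
    the maximum similarity over all pairs of a first and a second subsequence."""
    return max(euclidean_similarity(f, s)
               for f in first_subsequences
               for s in second_subsequences)


# Worked example mirroring T1 and P1 above (illustrative numbers):
first = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]   # first subsequences of T1
second = [[1.5, 2.5], [2.5, 3.5]]              # second subsequences of P1
print(series_similarity(first, second))        # max over the 3 x 2 = 6 pairs
```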
In step S240, the category of the time series to be classified is determined according to the similarity between the time series to be classified and the respective classified time series.
The similarity describes the degree of similarity of the local features between the time series to be classified and each of the classified time series, and therefore, after the similarity between the time series to be classified and each of the classified time series is calculated in step S230, the category of the time series to be classified can be determined therefrom.
In one embodiment of the present application, after the similarity between the time series to be classified and each classified time series is calculated, the maximum similarity among the similarities between the time series to be classified and each classified time series may be obtained, and the category of the classified time series corresponding to the maximum similarity is taken as the category of the time series to be classified.
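Putting steps S220 to S240 together, the sketch below shows one way the maximum-similarity rule could look; it reuses the illustrative helpers sliding_window_split and series_similarity from the sketches above, and the names and the rule of skipping series shorter than the window are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

LabeledSeries = Tuple[Sequence[float], str]  # (classified series, its category)


def classify(series_to_classify: Sequence[float],
             classified_set: List[LabeledSeries],
             window_length: int) -> Optional[str]:
    """Return the category of the classified series most similar to the input."""
    second = sliding_window_split(series_to_classify, window_length)
    best_category, best_similarity = None, float("-inf")
    for classified_series, category in classified_set:
        first = sliding_window_split(classified_series, window_length)
        if not first or not second:
            continue  # a series shorter than the window yields no subsequences
        similarity = series_similarity(first, second)
        if similarity > best_similarity:
            best_category, best_similarity = category, similarity
    return best_category
```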
Based on the technical solution of this embodiment, subsequences are obtained by dividing the time series data using the target sliding window length, which effectively avoids the influence of unequal lengths, missing values, or outliers. No additional preprocessing of the time series is required, so the method can be applied directly to raw time series data in real scenarios; it is compatible with any similarity measure, can effectively classify time series, and improves classification accuracy and efficiency.
Fig. 3 shows a flow chart of determining the target sliding window length according to one embodiment of the application. As shown in fig. 3, the process may specifically include steps S310 to S340, which are described as follows:
step S310, the classified time sequence set is divided into a first subset and a second subset, and a plurality of sliding window lengths are generated according to the sequence lengths of the classified time sequences.
In this embodiment, in order to determine the target sliding window length, the classified time series set may be first divided, specifically, the classified time series set may be divided into a first subset and a second subset according to a certain number proportion, where the first subset and the second subset respectively include different numbers of classified time series. Meanwhile, a plurality of sliding window lengths may be generated according to the sequence lengths of the respective classified time sequences.
The plurality of sliding window lengths may be all lengths less than or equal to the shortest sequence length in the classified time series set. For example, assuming that the classified time series set includes 5 classified time series whose sequence lengths are 10, 6, 12, 4, and 5, respectively, the shortest sequence length in the set is 4, so the plurality of sliding window lengths may be 1, 2, 3, and 4.
Here, the first subset may be used as a training subset for the time series classification, and the second subset as a verification subset for verifying whether the classification based on the first subset is correct. Thus, to ensure the classification effect, the number of classified time series contained in the first subset may be made larger than the number contained in the second subset, e.g. the ratio of the size of the first subset to that of the second subset is 7:3.
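A minimal sketch of step S310 follows, under illustrative assumptions not fixed by the embodiment: the 7:3 split is performed randomly, and the candidate sliding window lengths are every integer from 1 up to the shortest sequence length in the classified set.

```python
import random
from typing import List, Sequence, Tuple

LabeledSeries = Tuple[Sequence[float], str]  # (classified series, its category)


def split_and_candidate_windows(
        classified_set: List[LabeledSeries],
        train_ratio: float = 0.7,
        seed: int = 0) -> Tuple[List[LabeledSeries], List[LabeledSeries], List[int]]:
    """Split the classified set into first (training) and second (verification)
    subsets and enumerate candidate sliding window lengths."""
    shuffled = classified_set[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    first_subset, second_subset = shuffled[:cut], shuffled[cut:]
    shortest = min(len(series) for series, _ in classified_set)
    window_lengths = list(range(1, shortest + 1))
    return first_subset, second_subset, window_lengths
```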
Step S320, dividing the classified time sequences included in the first subset according to the lengths of the sliding windows to obtain a plurality of third sub-sequences corresponding to the lengths of the sliding windows, and dividing the classified time sequences included in the second subset according to the lengths of the sliding windows to obtain a plurality of fourth sub-sequences corresponding to the lengths of the sliding windows.
After generating the plurality of sliding window lengths, the classified time sequences contained in the first subset may be further divided according to the sliding window lengths to obtain a plurality of third sub-sequences corresponding to the sliding window lengths, and at the same time, the classified time sequences contained in the second subset may be further divided according to the sliding window lengths to obtain a plurality of fourth sub-sequences corresponding to the sliding window lengths.
For example, suppose that the first subset includes 3 classified time series, T1, T2, and T3, where T1 = (t1, t2), T2 = (t3, t4, t5, t6), and T3 = (t7, t8, t9); and the second subset includes 2 classified time series, T4 and T5, where T4 = (t10, t11, t12, t13, t14, t15) and T5 = (t16, t17, t18). Since the shortest sequence length in the classified time series set is 2, two sliding window lengths w1 and w2 can be generated, with w1 = 1 and w2 = 2.
Thus, when w1 = 1, dividing the 3 classified time series contained in the first subset yields 9 third subsequences: T111 = (t1), T112 = (t2), T121 = (t3), T122 = (t4), T123 = (t5), T124 = (t6), T131 = (t7), T132 = (t8), T133 = (t9). When w2 = 2, dividing the 3 classified time series contained in the first subset yields 6 third subsequences: T211 = (t1, t2), T221 = (t3, t4), T222 = (t4, t5), T223 = (t5, t6), T231 = (t7, t8), T232 = (t8, t9).
Similarly, when w1 = 1, dividing the 2 classified time series contained in the second subset yields 9 fourth subsequences: T141 = (t10), T142 = (t11), T143 = (t12), T144 = (t13), T145 = (t14), T146 = (t15), T151 = (t16), T152 = (t17), T153 = (t18). When w2 = 2, dividing the 2 classified time series contained in the second subset yields 7 fourth subsequences: T241 = (t10, t11), T242 = (t11, t12), T243 = (t12, t13), T244 = (t13, t14), T245 = (t14, t15), T251 = (t16, t17), T252 = (t17, t18).
Step S330, determining the classification accuracy corresponding to each sliding window length according to the third subsequences and the fourth subsequences.
The third subsequences are obtained by dividing the classified time series in the first subset, which serves as the training subset; the fourth subsequences are obtained by dividing the classified time series in the second subset, which serves as the verification subset used to check whether the classification is correct. Therefore, by classifying with the third subsequences and verifying with the fourth subsequences, the classification accuracy corresponding to each sliding window length can be determined.
In one embodiment of the present application, as shown in fig. 4, step S330 specifically includes steps S410 to S430, which are specifically described as follows:
step S410, calculating the similarity of the classified time sequences included in the first subset and the classified time sequences included in the second subset with respect to the respective sliding window lengths according to the third sub-sequences and the fourth sub-sequences.
In this embodiment, in order to determine the classification accuracy corresponding to each sliding window length, the similarity between the classified time sequence included in the first subset and the classified time sequence included in the second subset with respect to each sliding window length may be calculated first according to the third subsequences and the fourth subsequences.
Continuing the example in step S320, when the sliding window length w1 = 1, dividing the 3 classified time series in the first subset yields the 9 third subsequences T111 = (t1), T112 = (t2), T121 = (t3), T122 = (t4), T123 = (t5), T124 = (t6), T131 = (t7), T132 = (t8), T133 = (t9), and dividing the 2 classified time series in the second subset yields the 9 fourth subsequences T141 = (t10), T142 = (t11), T143 = (t12), T144 = (t13), T145 = (t14), T146 = (t15), T151 = (t16), T152 = (t17), T153 = (t18). Thus, by comparing T111 = (t1) and T112 = (t2) respectively with T141 = (t10), T142 = (t11), T143 = (t12), T144 = (t13), T145 = (t14), T146 = (t15), a plurality of similarities can be calculated, and the maximum of these similarities can be taken as the similarity S11 between T1 and T4. Similarly, from the third and fourth subsequences, the similarity S12 between T2 and T4, S13 between T3 and T4, S14 between T1 and T5, S15 between T2 and T5, and S16 between T3 and T5 can be calculated.
When the sliding window length w2 = 2, dividing the 3 classified time series in the first subset yields the 6 third subsequences T211 = (t1, t2), T221 = (t3, t4), T222 = (t4, t5), T223 = (t5, t6), T231 = (t7, t8), T232 = (t8, t9), and dividing the 2 classified time series in the second subset yields the 7 fourth subsequences T241 = (t10, t11), T242 = (t11, t12), T243 = (t12, t13), T244 = (t13, t14), T245 = (t14, t15), T251 = (t16, t17), T252 = (t17, t18). Thus, by comparing the third subsequence T211 = (t1, t2) with the fourth subsequences T241 = (t10, t11), T242 = (t11, t12), T243 = (t12, t13), T244 = (t13, t14), T245 = (t14, t15), a plurality of similarities can be calculated, and the maximum of these similarities can be taken as the similarity S21 between T1 and T4. Similarly, from the third and fourth subsequences, the similarity S22 between T2 and T4, S23 between T3 and T4, S24 between T1 and T5, S25 between T2 and T5, and S26 between T3 and T5 can be calculated.
Step S420, determining a reference category of the classified time series included in the second subset with respect to the respective sliding window length according to the similarity of the classified time series included in the first subset with respect to the respective sliding window length.
After the similarities between the classified time series contained in the first subset and the classified time series contained in the second subset have been calculated with respect to each sliding window length, these similarities describe the degree of similarity of local features between the two, and the reference category of each classified time series contained in the second subset with respect to each sliding window length can be determined accordingly.
In one embodiment of the present application, step S420 may specifically include:
and acquiring the maximum similarity of the similarity between the classified time sequences contained in the first subset and the classified time sequences contained in the second subset relative to the sliding window lengths, and taking the category of the classified time sequences corresponding to the maximum similarity as the reference category of the classified time sequences contained in the second subset relative to the sliding window lengths.
Specifically, the maximum similarity describes the maximum degree of similarity between the classified time series included in the first subset and the classified time series included in the second subset, and therefore, the category of the classified time series corresponding to the maximum similarity may be regarded as the reference category of the classified time series included in the second subset with respect to the respective sliding window lengths.
Continuing the example in step S410, when the sliding window length w1 = 1, the similarities S11 between T1 and T4, S12 between T2 and T4, S13 between T3 and T4, S14 between T1 and T5, S15 between T2 and T5, and S16 between T3 and T5 can be calculated. If S12 is the maximum among S11, S12, and S13, the category of T2 can be taken as the reference category of T4; if S16 is the maximum among S14, S15, and S16, the category of T3 can be taken as the reference category of T5.
When the sliding window length w2 = 2, the similarities S21 between T1 and T4, S22 between T2 and T4, S23 between T3 and T4, S24 between T1 and T5, S25 between T2 and T5, and S26 between T3 and T5 can be calculated. If S23 is the maximum among S21, S22, and S23, the category of T3 can be taken as the reference category of T4; if S25 is the maximum among S24, S25, and S26, the category of T2 can be taken as the reference category of T5.
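A sketch of steps S410 and S420 follows, reusing the illustrative classify() sketch above: for a given sliding window length, each classified series in the second (verification) subset receives as its reference category the category of the most similar classified series in the first (training) subset. Names are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

LabeledSeries = Tuple[Sequence[float], str]


def reference_categories(first_subset: List[LabeledSeries],
                         second_subset: List[LabeledSeries],
                         window_length: int) -> List[Optional[str]]:
    """Reference category of each verification series for one window length."""
    return [classify(series, first_subset, window_length)
            for series, _ in second_subset]
```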
Step S430, determining the classification accuracy corresponding to each sliding window length according to the category of the classified time sequence included in the second subset and the reference category of the classified time sequence included in the second subset relative to each sliding window length.
It will be appreciated that if the category of the classified time series is the same as the reference category of the classified time series, the classification accuracy of the classified time series may be determined to be 100%, whereas if it is not the same, the classification accuracy of the classified time series may be determined to be 0%.
Specifically, in this step, if the reference category of a classified time series contained in the second subset with respect to a sliding window length is the same as the category of that classified time series, its classification accuracy with respect to that sliding window length may be determined to be 100%; otherwise, it may be determined to be 0%.
Further, according to the classification accuracy of the classified time series included in the second subset with respect to the respective sliding window lengths, the classification accuracy corresponding to the respective sliding window lengths may be determined.
In one embodiment of the present application, step S430 may specifically include:
calculating the ratio of the sum of the classification accuracy of the classified time sequences contained in the second subset relative to the lengths of the sliding windows to the number of the classified time sequences contained in the second subset, and taking the ratio as the classification accuracy corresponding to the lengths of the sliding windows.
In this embodiment, the ratio of the sum of the classification accuracy of the classified time series included in the second subset with respect to the respective sliding window lengths to the number of classified time series included in the second subset may be regarded as the classification accuracy corresponding to the respective sliding window lengths.
Continuing the example in step S420, when the sliding window length w1 = 1, the category of T2 is taken as the reference category of T4 and the category of T3 is taken as the reference category of T5. If the category of T2 is c1, then the reference category of T4 is c1; since the category of T4 is also c1, the classification accuracy of T4 with respect to the sliding window length w1 = 1 is 100%. If the category of T3 is c2, then the reference category of T5 is c2; since the category of T5 is c3, the classification accuracy of T5 with respect to the sliding window length w1 = 1 is 0%. Thus, the classification accuracy corresponding to the sliding window length w1 can be calculated as (100% + 0%)/2 = 50%.
When the sliding window length w2 = 2, the category of T3 is taken as the reference category of T4 and the category of T2 is taken as the reference category of T5. The category of T3 is c2, so the reference category of T4 is c2, while the category of T4 is c1; thus, the classification accuracy of T4 with respect to the sliding window length w2 = 2 is 0%. The category of T2 is c1, so the reference category of T5 is c1, while the category of T5 is c3; thus, the classification accuracy of T5 with respect to the sliding window length w2 = 2 is 0%. Therefore, the classification accuracy corresponding to the sliding window length w2 can be calculated as (0% + 0%)/2 = 0%.
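The accuracy bookkeeping of step S430 could look like the sketch below, reusing the illustrative reference_categories() helper above: a verification series scores 100% when its reference category equals its true category and 0% otherwise, and the accuracy of a sliding window length is the average over the second subset, e.g. (100% + 0%)/2 = 50% for w1 in the worked example.

```python
from typing import List, Sequence, Tuple

LabeledSeries = Tuple[Sequence[float], str]


def window_accuracy(first_subset: List[LabeledSeries],
                    second_subset: List[LabeledSeries],
                    window_length: int) -> float:
    """Fraction of verification series whose reference category is correct."""
    refs = reference_categories(first_subset, second_subset, window_length)
    correct = sum(1 for (_, category), ref in zip(second_subset, refs)
                  if ref == category)
    return correct / len(second_subset)
```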
In one embodiment of the present application, as shown in fig. 6, step S330 may further specifically include step S610 to step S640, which are specifically described below:
step S610, calculating, multiple times, the similarity of the classified time sequences included in the first subset and the classified time sequences included in the second subset with respect to the respective sliding window lengths according to the third sub-sequences and the fourth sub-sequences.
In this embodiment, in order to determine the classification accuracy corresponding to each sliding window length, the similarity between the classified time series contained in the first subset and the classified time series contained in the second subset with respect to each sliding window length may first be calculated from the third subsequences and the fourth subsequences, and this calculation may be performed multiple times. The number of calculations may be inversely related to the amount of data involved: if the amount of data is large, the number of calculations may be reduced; if it is small, the number may be increased. The number can be determined according to the actual situation.
Step S620, determining a single reference category of the classified time series included in the second subset with respect to the respective sliding window lengths according to the similarity obtained by each calculation.
After performing the calculations a number of times, a single reference category of the classified time series contained in the second subset relative to each sliding window length may be determined based on the similarities obtained from each calculation. The specific determination method is the same as that described in step S420.
Step S630, determining a plurality of classification accuracy rates corresponding to the sliding window lengths according to the classification of the classified time sequences contained in the second subset and the single reference classification of the classified time sequences contained in the second subset relative to the sliding window lengths.
Specifically, if the category of a classified time series included in the second subset and its single reference category with respect to a sliding window length are the same, the single classification accuracy of that classified time series with respect to that sliding window length may be determined to be 100%; otherwise, it may be determined to be 0%.
Further, according to the single classification accuracy of the classified time series included in the second subset with respect to the respective sliding window lengths, the single classification accuracy corresponding to the respective sliding window lengths may be determined. For example, a ratio of the sum of the single-pass classification accuracy of the classified time series included in the second subset with respect to the respective sliding window lengths to the number of classified time series included in the second subset is calculated, and the ratio is taken as the single-pass classification accuracy corresponding to the respective sliding window lengths.
After the single classification accuracy corresponding to each sliding window length is obtained in each calculation, a plurality of classification accuracies corresponding to each sliding window length are obtained across the multiple calculations.
Continuing the example in step S430, the classification accuracy corresponding to the sliding window length w1 calculated in step S430 is (100% + 0%)/2 = 50%, and the classification accuracy corresponding to the sliding window length w2 is (0% + 0%)/2 = 0%; this is the result of one calculation. If 5 calculations are performed in this embodiment, 5 classification accuracies are obtained for each sliding window length; schematically, the 5 calculation results can be shown in Table 1.
| Calculation | w1 = 1 | w2 = 2 |
| First       | 50%    | 0%     |
| Second      | 50%    | 50%    |
| Third       | 50%    | 0%     |
| Fourth      | 0%     | 0%     |
| Fifth       | 50%    | 0%     |

TABLE 1
Step S640, calculating the ratio between the sum of the classification accuracy rates corresponding to the sliding window lengths and the times, and taking the calculated ratio as the classification accuracy rate corresponding to the sliding window lengths.
After determining the multiple classification accuracy rates corresponding to the lengths of the sliding windows, the ratio between the sum of the multiple classification accuracy rates and the times can be further calculated, so that the calculated ratio is used as the classification accuracy rate corresponding to the lengths of the sliding windows.
For example, assuming that the classification accuracies shown in Table 1 above are obtained, the classification accuracy corresponding to the sliding window length w1 is (50% + 50% + 50% + 0% + 50%)/5 = 40%, and the classification accuracy corresponding to the sliding window length w2 is (0% + 50% + 0% + 0% + 0%)/5 = 10%.
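A sketch of steps S610 to S640 follows, under the assumption that each run uses a fresh random 7:3 split (the embodiment leaves the number of runs and the split procedure to the practitioner); it reuses the illustrative split_and_candidate_windows() and window_accuracy() helpers above and averages the per-run accuracies per window length, mirroring the 40% and 10% averages computed from Table 1.

```python
from typing import Dict, List, Sequence, Tuple

LabeledSeries = Tuple[Sequence[float], str]


def averaged_window_accuracies(classified_set: List[LabeledSeries],
                               window_lengths: List[int],
                               runs: int = 5) -> Dict[int, float]:
    """Average the per-run classification accuracy of each window length."""
    totals = {w: 0.0 for w in window_lengths}
    for run in range(runs):
        first_subset, second_subset, _ = split_and_candidate_windows(
            classified_set, seed=run)  # a fresh split per run (assumption)
        for w in window_lengths:
            totals[w] += window_accuracy(first_subset, second_subset, w)
    return {w: total / runs for w, total in totals.items()}
```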
With continued reference to fig. 3, in step S340, the target sliding window length is determined according to the classification accuracy corresponding to the respective sliding window lengths.
After the classification accuracy corresponding to each sliding window length is obtained through the above embodiment, the target sliding window length may be determined according to the classification accuracy corresponding to each sliding window length, for example, the sliding window length corresponding to the classification accuracy greater than the preset threshold value in the classification accuracy corresponding to each sliding window length may be used as the target sliding window length.
In one embodiment of the present application, after determining the classification accuracy corresponding to each sliding window length, the maximum classification accuracy in the classification accuracy corresponding to each sliding window length may also be obtained, and the sliding window length corresponding to the maximum classification accuracy is used as the target sliding window length.
In this embodiment, the sliding window length corresponding to the maximum classification accuracy may be used as the target sliding window length. For example, among the results obtained in step S640, the classification accuracy corresponding to the sliding window length w1 is the maximum classification accuracy, so the sliding window length w1 may be used as the target sliding window length.
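The selection in step S340 under the maximum-accuracy rule of this embodiment reduces to an argmax over the averaged accuracies, as in the small sketch below (names and the dictionary format are assumptions).

```python
from typing import Dict


def select_target_window(accuracies: Dict[int, float]) -> int:
    """Return the sliding window length with the highest classification accuracy."""
    return max(accuracies, key=accuracies.get)


print(select_target_window({1: 0.4, 2: 0.1}))  # -> 1, i.e. w1 in the example above
```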
The following describes an embodiment of the apparatus of the present application that may be used to perform the time series classification method of the above-described embodiment of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the time series classification method of the present application.
Fig. 7 shows a block diagram of a time series classification apparatus according to an embodiment of the application, and referring to fig. 7, a time series classification apparatus 700 according to an embodiment of the application includes: an acquisition unit 702, a first division unit 704, a calculation unit 706, and a first determination unit 708.
Wherein the obtaining unit 702 is configured to obtain a time sequence to be classified and a classified time sequence set, where the classified time sequence set includes a plurality of classified time sequences and categories of each classified time sequence, and each classified time sequence includes a plurality of first sub-sequences obtained by dividing by a target sliding window length; the first dividing unit 704 is configured to divide the time sequence to be classified according to the target sliding window length, so as to obtain a plurality of second subsequences; a calculating unit 706 configured to calculate a similarity between the time series to be classified and the respective classified time series from the plurality of second sub-series and the plurality of first sub-series included in the respective classified time series; a first determining unit 708 configured to determine a category of the time series to be classified according to a similarity between the time series to be classified and the respective classified time series.
In some embodiments of the present application, the first determining unit 708 is configured to: and obtaining the maximum similarity in the similarity between the time sequence to be classified and each classified time sequence, and taking the category of the classified time sequence corresponding to the maximum similarity as the category of the time sequence to be classified.
In some embodiments of the application, the apparatus further comprises: a generation unit configured to divide the set of classified time series into a first subset and a second subset, and generate a plurality of sliding window lengths according to the sequence lengths of the respective classified time series; a second dividing unit, configured to divide the classified time sequences included in the first subset according to the lengths of the sliding windows to obtain a plurality of third subsequences corresponding to the lengths of the sliding windows, and divide the classified time sequences included in the second subset according to the lengths of the sliding windows to obtain a plurality of fourth subsequences corresponding to the lengths of the sliding windows; a second determining unit configured to determine a classification accuracy corresponding to the respective sliding window lengths according to the plurality of third sub-sequences and the plurality of fourth sub-sequences; and the third determining unit is configured to determine the target sliding window length according to the classification accuracy corresponding to each sliding window length.
In some embodiments of the application, the third determining unit is configured to: and acquiring the maximum classification accuracy rate in the classification accuracy rates corresponding to the sliding window lengths, and taking the sliding window length corresponding to the maximum classification accuracy rate as a target sliding window length.
In some embodiments of the application, the second determining unit comprises: a calculating subunit configured to calculate, from the plurality of third subsequences and the plurality of fourth subsequences, a similarity of the classified time series included in the first subset and the classified time series included in the second subset with respect to the respective sliding window lengths; a first determining subunit configured to determine a reference category of the classified time series included in the second subset with respect to the respective sliding window lengths according to a similarity of the classified time series included in the first subset with the classified time series included in the second subset with respect to the respective sliding window lengths; and the second determining subunit is configured to determine the classification accuracy corresponding to each sliding window length according to the category of the classified time sequence contained in the second subset and the reference category of the classified time sequence contained in the second subset relative to each sliding window length.
In some embodiments of the application, the first determining subunit is configured to: and acquiring the maximum similarity of the similarity between the classified time sequences contained in the first subset and the classified time sequences contained in the second subset relative to the sliding window lengths, and taking the category of the classified time sequences corresponding to the maximum similarity as the reference category of the classified time sequences contained in the second subset relative to the sliding window lengths.
In some embodiments of the application, the second determining subunit is configured to: determining a classification accuracy of the classified time series contained in the second subset relative to the respective sliding window lengths according to the class of the classified time series contained in the second subset and the reference class of the classified time series contained in the second subset relative to the respective sliding window lengths; and determining the classification accuracy corresponding to each sliding window length according to the classification accuracy of the classified time sequences contained in the second subset relative to each sliding window length and the number of the classified time sequences contained in the second subset.
In some embodiments of the application, the second determining subunit is configured to: calculating the ratio of the sum of the classification accuracy of the classified time sequences contained in the second subset relative to the lengths of the sliding windows to the number of the classified time sequences contained in the second subset, and taking the ratio as the classification accuracy corresponding to the lengths of the sliding windows.
In some embodiments of the application, the second determining unit is configured to: calculating a plurality of times the similarity of the classified time series included in the first subset and the classified time series included in the second subset with respect to the respective sliding window lengths according to the plurality of third sub-sequences and the plurality of fourth sub-sequences; determining a single reference class of the classified time series contained in the second subset relative to the respective sliding window lengths according to the similarity calculated each time; determining a plurality of classification accuracy rates corresponding to the sliding window lengths according to the categories of the classified time sequences contained in the second subset and the single reference category of the classified time sequences contained in the second subset relative to the sliding window lengths; calculating the ratio between the sum of the classification accuracy rates corresponding to the lengths of the sliding windows and the times, and taking the calculated ratio as the classification accuracy rate corresponding to the lengths of the sliding windows.
Fig. 8 shows a schematic diagram of a computer system suitable for use in implementing an embodiment of the application.
It should be noted that, the computer system 800 of the electronic device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 8, the computer system 800 includes a central processing unit (Central Processing Unit, CPU) 801 that can perform various appropriate actions and processes, such as performing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 802 or a program loaded from a storage section 808 into a random access Memory (Random Access Memory, RAM) 803. In the RAM 803, various programs and data required for system operation are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An Input/Output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and the like, and a speaker, and the like; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN (Local Area Network ) card, modem, or the like. The communication section 809 performs communication processing via a network such as the internet. The drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is mounted into the storage section 808 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising a computer program for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. When executed by a Central Processing Unit (CPU) 801, performs the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.