HK40081297A


Info

Publication number: HK40081297A
Application number: HK42023070450.4A
Authority: HK
Inventors: 吴杰
Original assignee: 腾讯科技（深圳）有限公司
Filing date: 2023-03-23
Publication date: 2023-05-19

Description

Data processing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.

Background

With the development of internet technology, managing the objects that use a service has become increasingly important, and the accuracy of object management can greatly affect the stability of service operation. In the related art, an object is generally managed by analyzing it through its service-usage feature data. However, the feature data of such an object may change suddenly, which reduces the accuracy of the object analysis.

Disclosure of Invention

The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

Embodiments of the present invention provide a data processing method, an apparatus, an electronic device, and a storage medium, which can reduce an influence caused by a sudden change of feature data by using a variation trend of the feature data at a plurality of target time points, and can improve accuracy of object analysis.

In one aspect, an embodiment of the present invention provides a data processing method, including:

determining a plurality of target time points;

for each target time point, acquiring first characteristic data of a first object according to a target characteristic dimension;

for each target time point, acquiring second feature data of a target cluster where the first object is located according to the target feature dimension;

acquiring a first time sequence of the first characteristic data at a plurality of target time points, and acquiring a second time sequence of the second characteristic data at a plurality of target time points;

determining a first correlation value between the first time series and the second time series;

and when the first correlation value is smaller than or equal to a preset threshold value, determining the first object as a target object.

On the other hand, an embodiment of the present invention further provides a data processing apparatus, including:

a time point determination module for determining a plurality of target time points;

the first characteristic data acquisition module is used for acquiring first characteristic data of a first object according to a target characteristic dimension for each target time point;

a second feature data obtaining module, configured to, for each target time point, obtain, according to the target feature dimension, second feature data of a target cluster in which the first object is located;

a time series data determining module, configured to determine a first time series of the first feature data at a plurality of the target time points, and determine a second time series of the second feature data at a plurality of the target time points;

a correlation determination module for determining a first correlation value between the first time series and the second time series;

and the target object determining module is used for determining the first object as the target object when the first correlation value is less than or equal to a preset threshold value.

Further, the first characteristic data obtaining module is specifically configured to:

determining a plurality of member characteristics corresponding to the target characteristic dimension;

determining a target feature corresponding to the first object from a plurality of the member features;

assigning values to the target features by using first characters, and assigning values to the other member features except the target features by using second characters;

and obtaining a character string according to the first character and the second character, and taking the character string as first characteristic data of the first object.

Further, the target cluster includes a plurality of objects, and the second characteristic data obtaining module is specifically configured to:

acquiring third feature data of each object in a target cluster where the first object is located according to the target feature dimensions;

and calculating the average value of the third feature data of the plurality of objects to obtain second feature data of the target cluster.

Further, there are a plurality of target feature dimensions, and the correlation determination module is specifically configured to:

for each of the target feature dimensions, determining a second correlation value between the first time series and the second time series;

and obtaining a first correlation value between the first time sequence and the second time sequence according to the sum of the second correlation values of each target feature dimension.

Further, the correlation determination module is further configured to:

determining a first feature dimension;

culling the first feature dimension from a plurality of the target feature dimensions;

the determining, for each of the target feature dimensions, a second correlation value between the first time series and the second time series includes:

and determining a second correlation value between the first time series and the second time series for each target feature dimension remaining after the first feature dimension is removed.

Further, the correlation determination module is specifically configured to:

calculating an average value of the first time series to obtain first average characteristic data, and calculating a first difference value between the first characteristic data and the first average characteristic data of each target time point;

calculating an average value of the second time series to obtain second average characteristic data, and calculating a second difference value between the second characteristic data and the second average characteristic data of each target time point;

calculating the product of each first difference value and the second difference value corresponding to the same target time point;

and obtaining a second correlation value between the first time sequence and the second time sequence according to an average value of the products of the target time points.
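The calculation described above (mean of each series, per-time-point differences, average of the products, then a sum over target feature dimensions) can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names `second_correlation` and `first_correlation`, the dictionary layout keyed by dimension name, and the `culled` parameter are assumptions introduced here.

```python
from statistics import fmean


def second_correlation(first_series, second_series):
    """Average of the products of the deviations from each series' mean."""
    if len(first_series) != len(second_series) or len(first_series) < 2:
        raise ValueError("both series need the same length, at least two points")
    first_mean = fmean(first_series)    # first average feature data
    second_mean = fmean(second_series)  # second average feature data
    products = [
        (a - first_mean) * (b - second_mean)  # first difference * second difference
        for a, b in zip(first_series, second_series)
    ]
    return fmean(products)


def first_correlation(first_by_dim, second_by_dim, culled=()):
    """Sum the per-dimension values, skipping any culled feature dimensions."""
    return sum(
        second_correlation(first_by_dim[dim], second_by_dim[dim])
        for dim in first_by_dim
        if dim not in culled
    )
```

Note that this quantity is not normalised, so its scale depends on the units of the feature data.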

Further, the first object is any one of a plurality of second objects, and the data processing apparatus further includes a cluster partitioning module, where the cluster partitioning module is configured to:

acquiring fourth feature data of each second object according to the target feature dimension;

determining the number of target clusters, and determining a first cluster center corresponding to the number of the target clusters;

according to the target feature dimension, acquiring fourth feature data of each second object and fifth feature data of each first cluster center;

calculating a distance between each second object and each first cluster center according to the fourth feature data and the fifth feature data;

and determining a second cluster center corresponding to each second object from the first cluster centers according to the distance, and performing cluster division on each second object and the corresponding second cluster center to obtain target clusters corresponding to the number of the target clusters.

Further, the cluster partitioning module is specifically configured to:

determining the number of initial clusters;

incrementing the initial cluster number by a preset step length to obtain a plurality of to-be-determined cluster numbers;

performing cluster division on the plurality of second objects according to each to-be-determined cluster number, and determining a division error value of the cluster division for each to-be-determined cluster number;

and determining the number of target clusters from the to-be-determined cluster numbers according to the variation value between two adjacent division error values.
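One way to read the selection above is as an elbow-style rule: keep increasing the cluster number until the division error stops improving noticeably. The sketch below assumes exactly that rule, which the text does not spell out. The names `make_candidates` and `choose_cluster_count` and the `min_drop` tolerance are hypothetical.

```python
def make_candidates(initial, step, how_many):
    """Increment the initial cluster number by a preset step length."""
    return [initial + i * step for i in range(how_many)]


def choose_cluster_count(candidates, errors, min_drop):
    """Return the first candidate whose next increment lowers the error by less than min_drop.

    `errors[i]` is the division error obtained with `candidates[i]` clusters.
    The stopping rule is an assumption made for illustration.
    """
    if len(candidates) != len(errors) or not candidates:
        raise ValueError("need one error value per candidate cluster number")
    for i in range(len(errors) - 1):
        drop = errors[i] - errors[i + 1]  # variation between two adjacent error values
        if drop < min_drop:
            return candidates[i]
    return candidates[-1]
```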

Further, the cluster partitioning module is specifically configured to:

determining the number of the target feature dimensions;

and determining the initial cluster number corresponding to the number range according to the number range in which the number of the target feature dimensions is positioned.

Further, the cluster partitioning module is specifically configured to:

acquiring sixth characteristic data of each cluster after cluster division is carried out on the plurality of second objects;

calculating a third difference value between the fourth feature data of each second object and the sixth feature data of the cluster where the second object is located, and performing squaring processing on the third difference value to obtain a unit error value corresponding to each fourth feature data;

and summing the unit error values to obtain the division error value of the cluster division.
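The division error described above amounts to a sum of squared differences between each object's feature data and the feature data of its cluster. A minimal sketch follows, assuming each object is a list of numeric features and each cluster's sixth feature data is supplied in a dictionary keyed by cluster label. The function name and data layout are illustrative only.

```python
def division_error(objects, labels, cluster_features):
    """Sum of squared differences between objects and their cluster's feature data."""
    total = 0.0
    for obj, label in zip(objects, labels):
        center = cluster_features[label]
        # Third difference value, squared, gives one unit error value per feature.
        total += sum((x - c) ** 2 for x, c in zip(obj, center))
    return total
```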

Further, the number of the target feature dimensions is multiple, and the cluster partitioning module is further configured to:

determining a second feature dimension;

culling the second feature dimension from a plurality of the target feature dimensions;

the obtaining fourth feature data of each second object and fifth feature data of each first cluster center according to the target feature dimension includes:

and acquiring fourth feature data of each second object and fifth feature data of each first cluster center according to the target feature dimensions left after the second feature dimensions are removed.

Further, the cluster partitioning module is specifically configured to:

calculating a fourth difference between the fourth feature data and the fifth feature data, and performing squaring processing on the fourth difference to obtain a unit distance value of each target feature dimension;

and taking the square root of the sum of the unit distance values of the target feature dimensions to obtain the distance between each second object and each first cluster center.
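The distance described above is the Euclidean distance, and assigning each second object to its nearest first cluster center can be sketched as below. The function names and the list-of-lists data layout are assumptions made for illustration.

```python
import math


def distance(obj, center):
    """Square root of the summed squared per-dimension differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(obj, center)))


def assign_to_nearest(objects, centers):
    """Return, for each object, the index of the closest cluster center."""
    labels = []
    for obj in objects:
        dists = [distance(obj, center) for center in centers]
        labels.append(dists.index(min(dists)))  # ties go to the lowest index
    return labels
```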

On the other hand, an embodiment of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the data processing method when executing the computer program.

On the other hand, an embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the data processing method.

In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the data processing method described above.

The embodiments of the present invention provide at least the following beneficial effects. A plurality of target time points are determined; for each target time point, first feature data of a first object is obtained according to a target feature dimension, and second feature data of the target cluster where the first object is located is obtained according to the same target feature dimension. A first time series of the first feature data and a second time series of the second feature data at the plurality of target time points are then obtained. Because the two time series reflect the variation trends of the first feature data and the second feature data, the influence of abrupt data changes can be reduced. A first correlation value between the first time series and the second time series is then determined, and when the first correlation value is smaller than or equal to a preset threshold value, the first object is determined as a target object, that is, an object to be processed. By introducing the target cluster and comparing the correlation of the first time series and the second time series, the influence of abrupt data changes is reduced, and the difference between the first object and the target cluster can be determined accurately and reliably from the feature correlation between them, which helps improve the accuracy of object analysis.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide a further understanding of the present invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the example serve to explain the principles of the invention and do not constitute a limitation thereof.

FIG. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an implementation environment of a data processing method according to an embodiment of the present invention;

FIG. 3 is a flow chart of a data processing method according to an embodiment of the present invention;

FIG. 4 is a specific flowchart of acquiring first feature data of a first object according to a target feature dimension according to an embodiment of the present invention;

FIG. 5 is a flowchart of cluster partitioning according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of cluster partitioning according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating a complete flow of a data processing method according to an embodiment of the present invention;

FIG. 8 is a flowchart illustrating a data processing method according to an embodiment of the present invention;

FIG. 9 is a schematic diagram illustrating an application of the data processing method according to the embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 11 is a block diagram of a part of a server according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Before the embodiments of the present invention are described in further detail, the terms and expressions used in the embodiments are explained as follows:

Object: an object is anything to be managed, whether tangible or intangible, and may be, for example, a movie, an account, or a person.

Cluster: a cluster is a set of objects. A plurality of objects can be divided into a plurality of clusters according to a certain policy, where each cluster includes a plurality of objects.

Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making. Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

With the development of internet technology, managing the objects that use a service has become increasingly important, and the accuracy of object management can greatly affect the stability of service operation. For example, in the field of payment services, payment accounts are analyzed to find problematic accounts; in the field of advertisement services, a candidate object is analyzed to determine whether it is a suitable target for advertisement delivery; and in the field of instant messaging services, a social account is analyzed in order to make friend recommendations for that account.

In the related art, an object is generally managed by analyzing it through its service-usage feature data. However, the feature data may change suddenly, which reduces the accuracy of the object analysis. For example, the current transaction amount of an object is 1 yuan, and its transaction amount at the next time point suddenly changes to 1000 yuan, although a transaction amount of 1000 yuan actually still falls within the normal range. Analysis based on such feature data may therefore be less accurate.

Based on this, embodiments of the present invention provide a data processing method, an apparatus, an electronic device, and a storage medium, which can reduce the influence caused by abrupt change of feature data by using the change trend of feature data at a plurality of target time points, and can improve the accuracy of object analysis.

Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present invention. The data processing method analyzes the feature correlation between an object and the cluster where the object is located. On the one hand, it may be combined with other analysis methods for joint analysis, with object management performed according to the analysis result; on the other hand, the analysis result may be used to assist manual management, so as to determine the management handling manner for the object.

Referring to fig. 2, fig. 2 is a schematic diagram of an implementation environment of a data processing method according to an embodiment of the present invention, where the implementation environment includes a server 201 and an electronic device 202.

The server 201 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform.

In addition, the server 201 may also be a node server in a blockchain network.

The electronic device 202 may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The electronic device 202 and the server 201 may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present invention is not limited herein.

It should be added that the data processing method provided by the embodiment of the present invention may be executed by the server 201 shown in fig. 2, or may also be executed by the electronic device 202 shown in fig. 2. The following description will be made by taking an example in which the server 201 executes a data processing method.

Referring to fig. 3, and taking the implementation environment shown in fig. 2 as an example, an embodiment of the present invention provides a data processing method applied in the server 201. The data processing method includes, but is not limited to, the following steps 301 to 306.

Step 301: the server determines a plurality of target time points;

in a possible implementation manner, the target time points are used to determine when the feature data is acquired. In the embodiment of the present invention, there are at least two target time points, for example two, three, or five, which is not limited by the embodiment of the present invention. The server may determine the plurality of target time points according to a preset acquisition frequency. For example, the server may determine one time point every hour, so that the plurality of target time points may be 9:00, 10:00, and 11:00. The time intervals between two adjacent target time points need not be equal.

The target time points may be historical time points, or may be a current time point and future time points determined based on a preset time interval from the current time point.
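As a small illustration of determining target time points at a preset interval, the following sketch generates evenly spaced points from a start time. The function name, the start date, and the hourly interval are example values chosen here, not values prescribed by the text.

```python
from datetime import datetime, timedelta


def target_time_points(start, interval, count):
    """Return `count` time points spaced `interval` apart, beginning at `start`."""
    if count < 2:
        raise ValueError("at least two target time points are required")
    return [start + i * interval for i in range(count)]


# Example: three hourly points starting at 9:00 (illustrative values).
points = target_time_points(datetime(2023, 3, 23, 9, 0), timedelta(hours=1), 3)
```

For unequal intervals, an explicit list of `datetime` values can be used instead.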

Step 302: the server acquires first characteristic data of the first object according to the target characteristic dimension for each target time point;

in one possible implementation, the feature data may be used to characterize attribute features, operation features, and the like of the object, and different feature data may correspond to different feature dimensions, for example, if the feature data is "male", the feature dimension corresponding to the feature data is "gender". The target feature dimension is a dimension used for characterizing the object in the embodiment of the present invention, and the number of the target feature dimensions may be one, two, or more. In addition, the target characteristic dimensions can be classified according to different attributes, for example, the target characteristic dimensions can be classified into characteristic dimensions of natural attributes (age, sex, area, and the like), characteristic dimensions of social attributes (occupation, hobbies, and the like), characteristic dimensions of account attributes (registration time, account type, and the like), characteristic dimensions of fund attributes (fund balance, transaction amount, transaction number, and the like), characteristic dimensions of operational attributes (login time, login number, click number, forwarding time, forwarding number, friend adding time, friend adding number, and the like), and the like. The account type may be a merchant account, a personal account, or the like.

In this embodiment of the present invention, first feature data of the first object in the corresponding target feature dimension is obtained for each target time point. For example, assuming that the plurality of target time points are 9:00, 10:00, and 11:00 and the target feature dimension is the transaction amount, the transaction amount of the first object is acquired at each of these target time points, giving, for example, the values "1000", "2000", and "3000". It can be understood that, in the above example, there is one target feature dimension; when there are multiple target feature dimensions, the first feature data acquired at each target time point corresponds to the multiple target feature dimensions. In addition, in practical applications, the values "1000", "2000", and "3000" may be converted into corresponding vector representations.

Step 303: the server acquires second characteristic data of a target cluster where the first object is located according to the target characteristic dimension for each target time point;

in a possible implementation manner, the data processing of the embodiment of the present invention is directed at a plurality of second objects, and the first object is any one of the plurality of second objects. Before the data processing, cluster division is performed on the second objects. The target cluster is the cluster where the first object is located, and each target cluster may include a plurality of objects. Second feature data of the target cluster in the corresponding target feature dimension is then obtained at each target time point. That is, the first feature data refers to feature data of the first object, and the second feature data refers to feature data of the target cluster. The second feature data of the target cluster may be the feature data of the centroid of the target cluster, that is, the average value of the feature data of all objects in the target cluster. Based on this, in this step, obtaining the second feature data of the target cluster where the first object is located according to the target feature dimension may include: obtaining third feature data of each object in the target cluster according to the target feature dimension, and calculating the average value of the third feature data of the plurality of objects to obtain the second feature data of the target cluster. Here, the third feature data refers generally to the feature data of each object in the target cluster.

The above-described manner can be expressed by the following formula:

$$C_l = \frac{1}{|S_l|} \sum_{x_v \in S_l} x_v$$

where $C_l$ represents the second feature data of the $l$-th target cluster, $l$ being a constant; $x_v$ represents the third feature data of the $v$-th object in the target cluster $S_l$, $v$ being a constant; and $|S_l|$ represents the number of objects in the target cluster $S_l$.
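The averaging described above can be sketched as an element-wise mean over the feature vectors of the objects in a cluster. The function name and the list-of-lists layout are assumptions made for illustration.

```python
from statistics import fmean


def cluster_feature(member_features):
    """Element-wise mean of the third feature data of every object in the cluster."""
    if not member_features:
        raise ValueError("a cluster needs at least one object")
    # zip(*rows) groups the values of each feature dimension together.
    return [fmean(column) for column in zip(*member_features)]
```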

It is understood that, instead of using the average value of the feature data of all the objects in the target cluster as the second feature data of the target cluster, a feature model may be constructed: the feature data of all the objects in the target cluster is input to the pre-trained feature model, which outputs the second feature data of the entire target cluster.

As described above, the second feature data corresponds to the same target feature dimension as the first feature data. Continuing the example in step 302, the transaction amounts of the target cluster at the target time points 9:00, 10:00, and 11:00 are, for example, 10000, 20000, and 30000 yuan.

Step 304: the server acquires a first time sequence of the first characteristic data at a plurality of target time points and acquires a second time sequence of the second characteristic data at a plurality of target time points;

in the embodiment of the present invention, a time series includes feature data at two or more time points.

In the related art, the feature difference between an object and other objects is usually measured using feature data at a single time point. The drawback of this approach is that, when the feature data of the object changes abruptly, the changed feature data may still be within the normal range, yet the object may be judged to be abnormal. In the embodiment of the present invention, the first time series reflects the variation trend of the first feature data, and the second time series reflects the variation trend of the second feature data, so obtaining the first time series and the second time series reduces the influence of abrupt data changes.

Further exemplifying based on the examples in step 302 and step 303, the first time series of the first characteristic data is "1000, 2000, 3000", and the second time series of the second characteristic data is "10000, 20000, 30000".

Step 305: the server determines a first correlation value between the first time series and the second time series;

by determining a first correlation value between the first time series and the second time series, the degree of feature correlation between the first object and the target cluster where the first object is located can be determined, thereby realizing the analysis of the first object.

Step 306: when the first correlation value is smaller than or equal to a preset threshold value, the server determines the first object as a target object.

When the first correlation value is smaller than or equal to the preset threshold value, it is indicated that the feature correlation degree between the first object and the target cluster where the first object is located is small, and the feature of the first object deviates from the features of other objects in the target cluster where the first object is located to a certain degree, so that the first object is the target object, and the target object is the object to be processed. For example, in the field of payment business, the first object may be a payment account number, and when the first object is determined as a target object, the subsequent handling manner may be to limit the transaction amount, and the like; for another example, in the field of advertisement service, the first object may be a target account for advertisement delivery, and when the first object is determined as a target object, the subsequent process may be to no longer deliver a corresponding advertisement to the first object, and so on. For another example, in the instant messaging field, the first object may be a target account recommended by a friend, and when the first object is determined to be a target object, the subsequent processing may be to no longer recommend a related friend to the first object, and the like. It is to be understood that, after the first object is determined to be the target object, the corresponding management handling manner may be further determined in combination with the manual review, for example, after the first object is determined to be the target object, the first feature data and the correlation value of the target object are displayed, so as to facilitate subsequent manual review.

It should be added that the preset threshold may be set according to actual situations, for example, may be 0.3, 0.5, and the like, and the embodiment of the present invention is not limited.
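The threshold comparison in step 306 can be sketched as follows. The correlation here is normalised as a Pearson coefficient so that thresholds such as 0.3 or 0.5 are meaningful. That normalisation is an assumption made for this example and is not stated in the text. The function names and the handling of constant series are likewise illustrative.

```python
import math


def pearson(first_series, second_series):
    """Pearson correlation coefficient of two equally long series."""
    n = len(first_series)
    if n != len(second_series) or n < 2:
        raise ValueError("both series need the same length, at least two points")
    mean_a = sum(first_series) / n
    mean_b = sum(second_series) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first_series, second_series))
    var_a = sum((a - mean_a) ** 2 for a in first_series)
    var_b = sum((b - mean_b) ** 2 for b in second_series)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def is_target_object(first_series, second_series, threshold=0.5):
    """Flag the first object when its correlation with the cluster is at or below the threshold."""
    return pearson(first_series, second_series) <= threshold
```

With the example series "1000, 2000, 3000" and "10000, 20000, 30000", the object follows its cluster's trend and is not flagged. An object whose series runs opposite to the cluster's would be flagged.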

Therefore, by introducing the target cluster and comparing the correlation degree of the first time sequence and the second time sequence, the embodiment of the invention can reduce the influence caused by data mutation, and can also accurately and reliably determine the difference between the first object and the target cluster by utilizing the characteristic correlation degree between the first object and the target cluster, thereby being beneficial to improving the accuracy of object analysis.

It can be understood that the target feature dimensions may be divided into quantifiable feature dimensions and non-quantifiable feature dimensions. For example, feature dimensions such as "transaction amount", "transaction frequency", "login time", "click frequency", "forwarding frequency", "number of friends", "registration time", and "age" are quantifiable feature dimensions, that is, the feature data corresponding to them is a specific numerical value. Feature dimensions such as "gender", "region", "account type", "occupation", and "hobby" are non-quantifiable feature dimensions, that is, the feature data corresponding to them is not a specific numerical value. In this case, when the first feature data or the second feature data is obtained using a plurality of target feature dimensions, the feature data corresponding to the quantifiable and the non-quantifiable feature dimensions may not share a uniform scale.

Based on this, in a possible implementation manner, referring to fig. 4, fig. 4 is a specific flowchart of acquiring first feature data of a first object according to a target feature dimension according to an embodiment of the present invention, and in the step 302, acquiring the first feature data of the first object according to the target feature dimension may specifically include the following steps 401 to 404.

Step 401: the server determines a plurality of member features corresponding to the target feature dimension;

Step 402: the server determines, from the plurality of member features, a target feature corresponding to the first object;

Step 403: the server assigns the first character to the target feature and assigns the second character to the member features other than the target feature;

Step 404: the server obtains a character string from the first character and the second character, and uses the character string as the first feature data of the first object.

In a possible implementation manner, the foregoing steps 401 to 404 are directed to a non-quantifiable target feature dimension. In step 401, determining a plurality of member features corresponding to the target feature dimension may be listing the features that belong to the target feature dimension. For example, taking the target feature dimension "occupation" as an example, its member features may be "teacher", "doctor", "student", "worker", "attorney", "driver", and the like. All member features of a target feature dimension may be listed. When a target feature dimension has a large number of member features, the number of member features may be set in advance, or common member features may be selected according to the actual situation; for example, common occupations are used as the member features of the target feature dimension "occupation".

After the plurality of member features corresponding to the target feature dimension are obtained, the target feature corresponding to the first object is determined from the member features. Continuing the example of the target feature dimension "occupation" and taking an account as the first object, the occupation in the account data can be extracted; if the occupation in the account data is "student", then "student" is the target feature among the member features. The target feature is then assigned the first character, and the member features other than the target feature are assigned the second character. The first character may be "1" and, correspondingly, the second character may be "0", so that a non-quantifiable feature dimension can be converted into a quantifiable one, which facilitates subsequent vectorization processing. Of course, the first character and the second character are not limited in the embodiments of the present invention.

Then, after the target feature is assigned the first character and the remaining member features are assigned the second character, a character string can be obtained from the first character and the second character. For example, the character string can be obtained by arranging the first character and the second characters in a preset sequence, and the preset sequence can be determined according to the order of the initial letters, the number of characters contained, and the like. Continuing the example of the target feature dimension "occupation", the determined member features are ordered by initial letter as "worker", "teacher", "attorney", "driver", "student", and "doctor", and the target feature corresponding to the first object is "student", so that the obtained character string is "000010".

As can be seen, by processing the first feature data of the first object through the steps shown in fig. 4, the first feature data of different target feature dimensions of the first object can be unified into quantifiable feature data, which facilitates subsequent calculation of the correlation value.
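As an illustration only, steps 401 to 404 can be sketched in Python as follows. The function name `encode_member_feature` and its parameters are hypothetical and do not appear in the embodiment; the member features and their order are taken from the "occupation" example above, and rejecting an unlisted feature is an assumption.

```python
def encode_member_feature(member_features, target_feature,
                          first_char="1", second_char="0"):
    """Mark the target feature with first_char and all others with second_char."""
    if target_feature not in member_features:
        # Assumption: a feature that is not a listed member is treated as an error.
        raise ValueError(f"unknown member feature: {target_feature!r}")
    return "".join(
        first_char if member == target_feature else second_char
        for member in member_features
    )


# Member features in the preset sequence used in the example above.
occupations = ["worker", "teacher", "attorney", "driver", "student", "doctor"]
encoded = encode_member_feature(occupations, "student")
print(encoded)  # 000010
```

The resulting string has one position per member feature, so strings for the same target feature dimension always have the same length and can be compared position by position.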

In a possible implementation manner, when there are a plurality of target feature dimensions, the first correlation value between the first time series and the second time series may be determined as follows: for each target feature dimension, a second correlation value between the first time series and the second time series is determined, and the first correlation value is then obtained according to the sum of the second correlation values of the target feature dimensions. Specifically, the sum of the second correlation values of the target feature dimensions is used as the first correlation value, so that a plurality of different feature dimensions can be considered together when the correlation between the first object and the target cluster is determined, which improves the accuracy and reliability of object analysis. On this basis, corresponding weights can be given to different target feature dimensions, that is, the second correlation values are weighted and summed to obtain the first correlation value, which makes the first correlation value more reasonable and further improves the accuracy and reliability of object analysis.
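A minimal sketch of the plain and weighted summation is given below. The function name, the dictionary layout, and the numeric values are made up for illustration; the embodiment does not prescribe how weights are chosen.

```python
def first_correlation_value(second_values, weights=None):
    """Combine per-dimension second correlation values into one value."""
    if weights is None:
        # Plain sum: every target feature dimension has weight 1.
        weights = {dimension: 1.0 for dimension in second_values}
    return sum(weights[dimension] * value
               for dimension, value in second_values.items())


# Illustrative second correlation values (made-up numbers).
second_values = {"transaction amount": 0.5, "click frequency": 0.25}
plain = first_correlation_value(second_values)
weighted = first_correlation_value(
    second_values, {"transaction amount": 2.0, "click frequency": 1.0})
print(plain, weighted)  # 0.75 1.25
```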

In a possible implementation manner, before the second correlation value between the first time series and the second time series is determined for each target feature dimension, a first feature dimension may be determined and removed from the plurality of target feature dimensions, and the second correlation value between the first time series and the second time series is then determined for each target feature dimension remaining after the removal. The plurality of target feature dimensions may include relatively stable target feature dimensions; for example, "gender", "region", "occupation", and the like are relatively stable feature dimensions, that is, the specific features of the first object corresponding to these target feature dimensions are relatively stable and generally do not change. Such relatively stable feature dimensions may therefore be used as the first feature dimension and removed when the second correlation values are determined, so that the first correlation value obtained by summing the second correlation values distinguishes the difference between the first object and the target cluster more clearly, which is beneficial to further improving the accuracy and reliability of object analysis.

In a possible implementation manner, the second correlation value between the first time series and the second time series may be determined as follows: the average value of the first time series is calculated to obtain first average feature data, and a first difference between the first feature data at each target time point and the first average feature data is calculated; the average value of the second time series is calculated to obtain second average feature data, and a second difference between the second feature data at each target time point and the second average feature data is calculated; the product of the first difference and the second difference corresponding to the same target time point is calculated; and the second correlation value between the first time series and the second time series is obtained according to the average value of these products over the plurality of target time points.

The determination method of the second correlation value may be represented by the following formula:

$$r(X_{ki}, C_{li}) = \frac{E\left[\left(x_{ki} - E(X_{ki})\right)\left(c_{li} - E(C_{li})\right)\right]}{\sqrt{\mathrm{var}(X_{ki})\,\mathrm{var}(C_{li})}}$$

wherein r represents the second correlation value, E represents the mean, var represents the variance, c_li is the second feature data of a cluster l on a target feature dimension i at a target time point, C_li is the second time series of the cluster l on the target feature dimension i, x_ki is the first feature data of an object k on the target feature dimension i at a target time point, X_ki is the first time series of the object k on the target feature dimension i, E(X_ki) represents the first average feature data, x_ki - E(X_ki) represents the first difference, E(C_li) represents the second average feature data, and c_li - E(C_li) represents the second difference.

In the embodiment of the invention, the variance of the first time series and the second time series is further introduced to calculate the second correlation value, so that the influence of the dimensional difference between the first time series and the second time series on the accuracy of the second correlation value can be reduced.
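The calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the mean product of the two differences is normalised by the square root of the product of the two variances (the Pearson form, which the description of E and var suggests), population variance is used, and a constant series is given a value of 0. The function name and sample series are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def second_correlation_value(first_series, second_series):
    """Mean product of deviations, normalised by the two variances."""
    if len(first_series) != len(second_series):
        raise ValueError("both time series must cover the same target time points")
    mean_first = fmean(first_series)      # first average feature data
    mean_second = fmean(second_series)    # second average feature data
    mean_product = fmean([
        (x - mean_first) * (c - mean_second)   # first difference * second difference
        for x, c in zip(first_series, second_series)
    ])
    spread = sqrt(pvariance(first_series) * pvariance(second_series))
    if spread == 0:
        # Assumption: a constant series carries no trend, so report 0.
        return 0.0
    return mean_product / spread


object_series = [1.0, 2.0, 3.0, 4.0]    # made-up first time series
cluster_series = [2.0, 4.0, 6.0, 8.0]   # made-up second time series
r = second_correlation_value(object_series, cluster_series)
print(r)
```

Because of the normalisation, the two series in the example give the same value whether the cluster series is measured in units or in thousands of units, which is the dimensional effect the paragraph above refers to.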

Based on this, the first correlation value can be expressed by the following formula:

$$SS_k = \sum_{i=1}^{n} r(X_{ki}, C_{li})$$

wherein SS_k represents the first correlation value, 1 ≤ i ≤ n, and n is a constant.

In the embodiment of the present invention, before the second feature data of the target cluster in which the first object is located is obtained according to the target feature dimension, cluster division is performed on the second objects. The principle of performing cluster division on the second objects in the embodiment of the present invention is described in detail below.

Referring to fig. 5, fig. 5 is a flowchart of cluster division according to an embodiment of the present invention, where the cluster division flow includes the following steps 501 to 504.

Step 501: the server determines the number of target clusters and determines first cluster centers corresponding to the number of the target clusters;

Step 502: the server acquires fourth feature data of each second object and fifth feature data of each first cluster center according to the target feature dimension;

Step 503: the server calculates the distance between each second object and each first cluster center according to the fourth feature data and the fifth feature data;

Step 504: the server determines a second cluster center corresponding to each second object from the first cluster centers according to the distance, and performs cluster division on each second object and the corresponding second cluster center to obtain target clusters corresponding to the number of the target clusters.

The number of target clusters determines how many clusters are obtained after cluster division, and the first cluster centers correspond to the number of target clusters, that is, the number of first cluster centers is the same as the number of target clusters. When performing cluster division, a coordinate system may be established and each second object converted into a corresponding point in the coordinate system. The first cluster centers may be selected randomly; for example, a first cluster center may be any one of the second objects or any other point in the coordinate system, which is not limited in the embodiment of the present invention. The fourth feature data generally refers to the feature data of each second object, and the fifth feature data generally refers to the feature data of each first cluster center. After the first cluster centers are determined, the distance between each second object and each first cluster center is calculated according to the fourth feature data of each second object and the fifth feature data of each first cluster center. For example, referring to fig. 6, fig. 6 is a schematic diagram of cluster division provided by the embodiment of the present invention, which takes as an example the case where the number of second objects is three and the number of first cluster centers is two (that is, the number of target clusters is two). It can be understood that fig. 6 only illustrates the principle, and the embodiment of the present invention does not limit the specific numbers of second objects and first cluster centers.
In fig. 6, the feature data of the object A1 is M1, the feature data of the object A2 is M2, the feature data of the object A3 is M3, the feature data of the cluster center B1 is N1, and the feature data of the cluster center B2 is N2. The distance L1 between the object A1 and the cluster center B1 is calculated from the feature data M1 and N1, and the distance L2 between the object A1 and the cluster center B2 is calculated from the feature data M1 and N2. Similarly, the distance L3 between the object A2 and the cluster center B1 is calculated from M2 and N1, the distance L4 between the object A2 and the cluster center B2 is calculated from M2 and N2, the distance L5 between the object A3 and the cluster center B1 is calculated from M3 and N1, and the distance L6 between the object A3 and the cluster center B2 is calculated from M3 and N2. Then, L1 and L2 are compared: if L1 is smaller than L2, the object A1 is divided to the cluster center B1; otherwise, the object A1 is divided to the cluster center B2. L3 and L4, and L5 and L6, are compared in the same way to determine the cluster centers to which the object A2 and the object A3 are divided. Based on the above operations, the object A1, the object A2, and the object A3 may be divided into two target clusters.
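The comparison of distances in the fig. 6 example can be sketched as follows. The coordinates standing in for M1 to M3 and N1, N2 are made-up values, the function name is hypothetical, and the Euclidean distance from Python's `math.dist` is assumed as the distance measure.

```python
from math import dist


def assign_to_nearest_center(objects, centers):
    """Divide each object to the cluster center with the smallest distance."""
    clusters = {center_name: [] for center_name in centers}
    for object_name, features in objects.items():
        nearest = min(centers, key=lambda name: dist(features, centers[name]))
        clusters[nearest].append(object_name)
    return clusters


# Made-up feature data for the three objects and two cluster centers.
objects = {"A1": (1.0, 1.0), "A2": (1.5, 2.0), "A3": (8.0, 8.0)}
centers = {"B1": (1.0, 2.0), "B2": (9.0, 8.0)}
clusters = assign_to_nearest_center(objects, centers)
print(clusters)  # {'B1': ['A1', 'A2'], 'B2': ['A3']}
```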

In the embodiment of the present invention, the number of target clusters is determined first when performing cluster division, and different numbers of target clusters influence the cluster division differently. Therefore, in a possible implementation manner, the number of target clusters may be determined as follows: an initial cluster number is determined; the initial cluster number is increased incrementally by a preset step size to obtain a plurality of cluster numbers to be determined; the plurality of second objects are cluster-divided according to each cluster number to be determined; a division error value of the cluster division is determined for each cluster number to be determined; and the number of target clusters is determined from the plurality of cluster numbers to be determined according to the variation value between two adjacent division error values.

The initial cluster number may be 2, or may be set to be greater than 2 according to the actual situation. The preset step size may be 1, or may be set to be greater than 1 according to the actual situation. The initial cluster number is increased by the preset step size to obtain the plurality of cluster numbers to be determined. For example, if the initial cluster number is 2 and the preset step size is 1, the cluster numbers to be determined are 3, 4, 5, and so on in sequence; if the initial cluster number is 2 and the preset step size is 2, the cluster numbers to be determined are 4, 6, 8, and so on in sequence. After the plurality of cluster numbers to be determined are obtained, the plurality of second objects are cluster-divided according to each cluster number to be determined, and the division error value corresponding to each cluster number to be determined is then determined. It is understood that the division error value becomes smaller as the number of clusters increases; when the variation value between two adjacent division error values is the largest, the current cluster number to be determined can be taken as the number of target clusters.
For example, when the cluster number to be determined is 3, the corresponding division error value is SS1; when the cluster number to be determined is 4, the corresponding division error value is SS2; and when the cluster number to be determined is 5, the corresponding division error value is SS3. In the process of increasing the cluster number to be determined, the variation value between the division error values of cluster numbers 4 and 3 is SS2-SS1, and the variation value between the division error values of cluster numbers 5 and 4 is SS3-SS2; when SS3-SS2 is greater than SS2-SS1, the number of target clusters can be determined to be 5. The embodiment of the present invention determines the number of target clusters by using the division error value, so that the number of clusters is more reasonable when the second objects are cluster-divided, which is beneficial to improving the accuracy of subsequent object analysis.
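The selection just described can be sketched as follows. The function names and the error values are hypothetical, and the sketch assumes that the "variation value" between adjacent division error values is compared by magnitude, which the text does not state explicitly.

```python
def candidate_cluster_numbers(initial, step, count):
    """Increase the initial cluster number by the preset step size, count times."""
    return [initial + step * i for i in range(1, count + 1)]


def pick_target_cluster_number(error_by_number):
    """Return the candidate reached by the largest change in division error value."""
    numbers = sorted(error_by_number)
    if len(numbers) < 2:
        raise ValueError("need at least two candidate cluster numbers")
    best_number, best_change = None, -1.0
    for previous, current in zip(numbers, numbers[1:]):
        # Assumption: the variation value is compared by magnitude.
        change = abs(error_by_number[current] - error_by_number[previous])
        if change > best_change:
            best_number, best_change = current, change
    return best_number


print(candidate_cluster_numbers(2, 1, 3))  # [3, 4, 5]
errors = {3: 120.0, 4: 100.0, 5: 40.0}     # made-up division error values
print(pick_target_cluster_number(errors))  # 5
```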

It may be understood that the division error value becomes smaller as the number of clusters increases. In one possible implementation, the initial cluster number may therefore be determined by first determining the number of target feature dimensions and then determining the initial cluster number that corresponds to the number range in which the number of target feature dimensions falls. The larger the number of target feature dimensions, the richer the feature expression of each object, and the number of divided clusters can be increased accordingly. Determining the initial cluster number from the number range in which the number of target feature dimensions falls thus allows the determination of the division error value to start from a suitable initial cluster number, which improves the efficiency of cluster division. For example, when the number of target feature dimensions is in the range of 1 to 5, the corresponding initial cluster number may be 10, and when the number of target feature dimensions is in the range of 6 to 10, the corresponding initial cluster number may be 20.
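The mapping from number range to initial cluster number can be sketched as a simple lookup. Only the two ranges given in the example above are included; the table name, the function name, and the choice to raise an error for other counts are assumptions.

```python
# Ranges and values from the example in the text; other ranges are not given.
INITIAL_CLUSTER_NUMBER_BY_RANGE = [
    (range(1, 6), 10),    # 1 to 5 target feature dimensions
    (range(6, 11), 20),   # 6 to 10 target feature dimensions
]


def initial_cluster_number(dimension_count):
    """Look up the initial cluster number for a count of target feature dimensions."""
    for number_range, initial in INITIAL_CLUSTER_NUMBER_BY_RANGE:
        if dimension_count in number_range:
            return initial
    # Assumption: counts outside the configured ranges are rejected.
    raise ValueError(f"no initial cluster number configured for {dimension_count}")


print(initial_cluster_number(3), initial_cluster_number(8))  # 10 20
```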

In a possible implementation manner, the division error value of the cluster division may be determined as follows: after the plurality of second objects are cluster-divided, sixth feature data of each cluster is obtained; a third difference between the fourth feature data of each second object and the sixth feature data of the cluster in which that second object is located is calculated; the third difference is squared to obtain a unit error value corresponding to each fourth feature data; and the unit error values are summed to obtain the division error value of the cluster division.

The sixth feature data generally refers to the feature data of each cluster obtained by performing cluster division on the second objects; the sixth feature data of the cluster in which a second object is located may be the feature data of the centroid of that cluster. The determination manner of the division error value may be represented by the following formula:

$$SSE = \sum_{x=1}^{k} \sum_{p \in C_x} \left| p - m_x \right|^2$$

wherein SSE represents the division error value, x is the index of a cluster, 1 ≤ x ≤ k, k is a constant, C_x represents the x-th cluster, p is the fourth feature data of a second object in the cluster C_x, m_x is the sixth feature data of the centroid of the cluster C_x, and |p - m_x| represents the third difference. The lower the value of SSE, the better the cluster division effect. It will be appreciated that, during cluster division, the centroid of a cluster may change as new second objects are added.
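The summation of unit error values can be sketched as follows. The sketch assumes that the sixth feature data of a cluster is the arithmetic mean of its members (its centroid) and that the third difference is measured as a Euclidean distance; the function name and sample points are made up.

```python
from math import dist


def division_error_value(clusters):
    """Sum of squared distances between each member and its cluster centroid."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes no error
        # Centroid: per-dimension mean of the members' feature data.
        centroid = tuple(sum(column) / len(members) for column in zip(*members))
        total += sum(dist(point, centroid) ** 2 for point in members)
    return total


# Two made-up clusters of two-dimensional feature data.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(5.0, 5.0), (5.0, 7.0), (5.0, 9.0)],
]
sse = division_error_value(clusters)
print(sse)  # 10.0
```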

In a possible implementation manner, before the fourth feature data of each second object and the fifth feature data of each first cluster center are obtained according to the target feature dimension, a second feature dimension may be determined and removed from the plurality of target feature dimensions, and the fourth feature data of each second object and the fifth feature data of each first cluster center are then obtained according to the target feature dimensions remaining after the removal. The plurality of target feature dimensions may include relatively unstable target feature dimensions; for example, "transaction amount", "registration time", "registration number", and the like are relatively unstable feature dimensions, that is, the specific features of the second object corresponding to these target feature dimensions are relatively unstable and generally change at different time points. In other words, relatively stable target feature dimensions, such as "gender", "region", "occupation", and the like, are selected for cluster division. Such relatively unstable feature dimensions can therefore be used as the second feature dimension and removed before the fourth feature data and the fifth feature data are obtained, so that the feature differences between different second objects are more obvious during cluster division, which improves the reasonableness and accuracy of cluster division.
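The two removal steps (stable dimensions removed before the correlation calculation, unstable dimensions removed before cluster division) can be sketched together as a split of the target feature dimensions. The set of stable dimensions uses the examples named in the text; treating every other dimension as unstable is an assumption made for this illustration, and the names are hypothetical.

```python
# Stable dimensions named in the text; the set is illustrative, not exhaustive.
STABLE_DIMENSIONS = {"gender", "region", "occupation"}


def split_dimensions(target_dimensions, stable=STABLE_DIMENSIONS):
    """Return (dimensions for cluster division, dimensions for correlation)."""
    # Assumption: any dimension not listed as stable is treated as unstable.
    for_clustering = [d for d in target_dimensions if d in stable]
    for_correlation = [d for d in target_dimensions if d not in stable]
    return for_clustering, for_correlation


dimensions = ["gender", "transaction amount", "region", "occupation"]
for_clustering, for_correlation = split_dimensions(dimensions)
print(for_clustering)   # ['gender', 'region', 'occupation']
print(for_correlation)  # ['transaction amount']
```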

It can be understood that cluster division may also be performed according to preset target feature dimensions. For example, the preset target feature dimensions may be "age group", "gender", "city class", and "occupation"; on this basis, the obtained features of the second objects may be "young blue-collar men in third-tier cities", "middle-aged white-collar women in first-tier cities", and the like. Performing cluster division with preset common feature dimensions is beneficial to improving the interpretability of the divided clusters. It should be understood that the preset target feature dimensions above are only a schematic illustration, the embodiment of the present invention is not limited thereto, and the preset target feature dimensions may be determined according to the requirements of the actual application.

In a possible implementation manner, when there are a plurality of target feature dimensions, the distance between each second object and each first cluster center may be calculated according to the fourth feature data and the fifth feature data as follows: a fourth difference between the fourth feature data and the fifth feature data is calculated, the fourth difference is squared to obtain a unit distance value of each target feature dimension, and the square root of the sum of the unit distance values of the target feature dimensions is taken to obtain the distance between each second object and each first cluster center. The above distance may be calculated by the following formula:

$$\mathrm{dis}(X_i, C_j) = \sqrt{\sum_{t=1}^{m} \left( X_{it} - C_{jt} \right)^2}$$

wherein X_i represents the i-th second object, C_j represents the j-th first cluster center, i and j are constants, dis(X_i, C_j) represents the distance between the second object and the first cluster center, X_it represents the fourth feature data of the second object on the t-th target feature dimension, C_jt represents the fifth feature data of the first cluster center on the t-th target feature dimension, 1 ≤ t ≤ m, m is a constant representing the number of target feature dimensions, X_it - C_jt represents the fourth difference, and (X_it - C_jt)² represents a unit distance value.
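For illustration, the distance can be computed step by step as below, assuming the distance is the Euclidean distance, that is, the square root of the summed unit distance values. The function name and sample vectors are hypothetical.

```python
from math import sqrt


def euclidean_distance(object_features, center_features):
    """Square root of the sum of squared per-dimension differences."""
    if len(object_features) != len(center_features):
        raise ValueError("both inputs must cover the same target feature dimensions")
    unit_distances = [
        (x - c) ** 2  # fourth difference, squared
        for x, c in zip(object_features, center_features)
    ]
    return sqrt(sum(unit_distances))


# Made-up fourth and fifth feature data over three target feature dimensions.
distance = euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0))
print(distance)  # 5.0
```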

It is to be understood that the distance between each second object and the center of each first cluster may also be calculated by using a cosine distance.
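A minimal sketch of the cosine distance is shown below, using the common definition of one minus the cosine similarity; the embodiment does not specify the exact form, so this definition, the function name, and the handling of all-zero vectors are assumptions.

```python
from math import sqrt


def cosine_distance(object_features, center_features):
    """One minus the cosine of the angle between the two feature vectors."""
    dot = sum(x * c for x, c in zip(object_features, center_features))
    norm_object = sqrt(sum(x * x for x in object_features))
    norm_center = sqrt(sum(c * c for c in center_features))
    if norm_object == 0 or norm_center == 0:
        # Assumption: an all-zero vector has no direction, so reject it.
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_object * norm_center)


same_direction = cosine_distance((3.0, 4.0), (6.0, 8.0))
orthogonal = cosine_distance((1.0, 0.0), (0.0, 1.0))
print(same_direction, orthogonal)
```

Unlike the Euclidean distance, this measure ignores the magnitude of the feature vectors and compares only their direction.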

The complete flow of the data processing method provided by the embodiment of the present invention is described below. Referring to fig. 7, fig. 7 is a schematic diagram of the complete flow of the data processing method according to an embodiment of the present invention, where the data processing method mainly includes the steps of feature data standardization, cluster division, correlation value calculation, and target object determination. When the feature data is standardized, the feature data is divided mainly by the feature dimensions of natural attributes, social attributes, account attributes, capital attributes, and operation attributes, and the feature data corresponding to non-quantifiable target feature dimensions is converted. When the clusters are divided, the optimal number of target clusters is first determined by using the division error value, and the clusters are then divided according to the distance between each object and the centroid of each cluster; after the clusters are divided, the feature data of the centroid of each cluster is determined so that the correlation value between an object and its cluster can be calculated by comparing time series. When the correlation value is calculated, the feature data of the object at a plurality of target time points and the feature data of the cluster in which the object is located at the plurality of target time points are obtained respectively, the time series of the object and of the cluster in which it is located are obtained accordingly, and the correlation value between the two time series is calculated. When the target object is determined, the objects are ranked by their correlation values from large to small, a threshold for the correlation value is determined, the objects whose correlation values are smaller than the threshold are taken as target objects, and these objects are then further investigated and handled.

Based on the complete flow shown in fig. 7, referring to fig. 8, fig. 8 is a complete flow chart of a data processing method according to an embodiment of the present invention, wherein the data processing method includes the following steps 801 to 808.

Step 801: determining a plurality of second objects, and acquiring feature data of each second object according to a plurality of target feature dimensions;

Step 802: converting the feature data corresponding to the non-quantifiable target feature dimensions;

Step 803: determining the number of target clusters by using the division error value, and performing cluster division on the plurality of second objects according to the number of the target clusters;

Step 804: determining feature data of the centroid of each cluster after cluster division;

Step 805: selecting one of the plurality of second objects, obtaining a time series of the second object according to the feature data of the second object at a plurality of time points, and obtaining a time series of the cluster in which the second object is located according to the feature data of that cluster at the corresponding plurality of time points;

Step 806: calculating a correlation value between the time series of the second object and the time series of the cluster in which the second object is located;

Step 807: judging whether the correlation value is smaller than or equal to a preset threshold; if so, skipping to step 808, otherwise skipping to step 805;

Step 808: determining the second object as the target object.

The conversion of the feature data of non-quantifiable target feature dimensions in step 802 has already been explained with reference to the steps of fig. 4 and is not repeated herein. In step 805, the time series of the second object and of the cluster in which it is located may be extracted from a plurality of historical time points; alternatively, starting from the current time point, the feature data of the second object and of the cluster in which it is located at a plurality of future time points may be obtained at a preset time interval to obtain the two time series. In the above steps 801 to 807, by comparing the correlation value between the time series of the second object and the time series of the cluster in which it is located, not only can the influence caused by data mutation be reduced, but the difference between the second object and the cluster in which it is located can also be determined accurately and reliably, thereby facilitating the improvement of the accuracy of object analysis.
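The loop formed by steps 805 to 808 can be sketched for a single target feature dimension as follows. All names and sample series are hypothetical, the correlation is assumed to be of the Pearson form, and a constant series is assumed to give a correlation of 0.

```python
from math import sqrt


def correlation(first_series, second_series):
    """Pearson-style correlation of two equally long time series."""
    n = len(first_series)
    mean_first = sum(first_series) / n
    mean_second = sum(second_series) / n
    co_sum = sum((x - mean_first) * (c - mean_second)
                 for x, c in zip(first_series, second_series))
    sq_first = sum((x - mean_first) ** 2 for x in first_series)
    sq_second = sum((c - mean_second) ** 2 for c in second_series)
    if sq_first == 0 or sq_second == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return co_sum / sqrt(sq_first * sq_second)


def find_target_objects(object_series, cluster_of, cluster_series, threshold):
    """Flag objects whose correlation with their cluster is <= threshold."""
    targets = []
    for name, series in object_series.items():                         # step 805
        value = correlation(series, cluster_series[cluster_of[name]])  # step 806
        if value <= threshold:                                         # step 807
            targets.append(name)                                       # step 808
    return targets


# Made-up time series: u1 follows its cluster's trend, u2 does not.
object_series = {"u1": [1.0, 2.0, 3.0, 4.0], "u2": [5.0, 1.0, 6.0, 2.0]}
cluster_of = {"u1": "c1", "u2": "c1"}
cluster_series = {"c1": [10.0, 12.0, 14.0, 16.0]}
targets = find_target_objects(object_series, cluster_of, cluster_series, 0.5)
print(targets)  # ['u2']
```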

It will be understood that, although the steps in the above flowcharts are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated in the embodiment, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a part of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.

The application of the data processing method provided by the embodiment of the present invention is described below with an actual scenario. Referring to fig. 9, fig. 9 is an application schematic diagram of the data processing method provided by the embodiment of the present invention, where the data processing method may be exemplarily applied to scenarios such as payment account analysis, advertisement push, and friend recommendation.

Specifically, the method includes the steps of firstly obtaining feature data of an account to be analyzed, performing cluster division on the account to be analyzed, then obtaining the feature data of the account to be analyzed at multiple time points, obtaining a time sequence of the account to be analyzed, correspondingly obtaining the time sequence of a cluster where the account to be analyzed is located at a corresponding time point, and then analyzing the account to be analyzed by using a correlation value between the account to be analyzed and the time sequence of the cluster where the account to be analyzed is located.

In practical application, service scenes such as online payment, news browsing, and instant messaging can be implemented based on the same account to be analyzed, so that after the account to be analyzed is analyzed by using the correlation value between the time series of the account and the time series of the cluster in which it is located, the handling manners for different scenes can be determined from the analysis result. For example, referring to fig. 9, assume that the correlation value between the time series of the account to be analyzed and the time series of the cluster in which it is located is smaller than the preset threshold, that is, the account to be analyzed is a target account whose operation characteristics deviate from those of the cluster in which it is located. In an online payment scene, this indicates that the payment operation of the target account deviates and the target account may be abnormal, and corresponding control measures, such as limiting the transaction amount and the number of transactions, may then be applied to the target account to improve the stability of payment service operation. In a news browsing scene, this indicates that the browsing operation of the target account deviates, and the original advertisement push scheme can subsequently be adjusted and the push content re-determined for the target account to improve the accuracy of advertisement push. In an instant messaging scene, this indicates that the instant messaging operation of the target account deviates, and the original friend recommendation scheme can subsequently be adjusted and the recommended friends re-determined for the target account to improve the accuracy of friend recommendation.
Therefore, the above manner can combine the various feature data of the account to be analyzed and execute the handling manners of different scenes according to the analysis result, and thus has wide applicability.

It can be understood that, in the above three scenarios, when it is determined that the operation of the target account deviates, it may also be determined whether the cluster division of the target account is accurate, so as to avoid an error caused by inaccurate cluster division, and improve accuracy and reliability of analysis.

In a possible implementation, when different service scenarios are implemented based on different accounts to be analyzed, the accounts in each service scenario may also be analyzed independently. On this basis, the feature data of an account to be analyzed can be obtained in light of the specific service scenario. For example, in an online payment scenario, the feature data related to transactions, such as "transaction amount" and "number of transactions", can be focused on, and feature data with low relevance, such as "reading duration" and "content type", can be excluded accordingly. Similarly, in a news browsing scenario, the feature data highly relevant to that scenario, such as "reading duration" and "content type", can be focused on, and feature data with low relevance, such as "transaction amount" and "number of transactions", can be excluded accordingly. The method can therefore perform account analysis tailored to different service scenarios, which helps make the account analysis more targeted.
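The scenario-specific selection of feature data described above can be sketched as a simple filter. This is a minimal illustration only: the mapping `SCENARIO_FEATURES`, the function `select_features`, and the key names are hypothetical and merely mirror the example features named in the text.

```python
# Hypothetical mapping from service scenario to the feature dimensions
# considered relevant for it (names mirror the examples in the text).
SCENARIO_FEATURES = {
    "online_payment": {"transaction_amount", "transaction_count"},
    "news_browsing": {"reading_duration", "content_type"},
}


def select_features(feature_data, scenario):
    """Keep only the feature dimensions relevant to the given scenario."""
    relevant = SCENARIO_FEATURES[scenario]
    return {name: value for name, value in feature_data.items() if name in relevant}


account_features = {
    "transaction_amount": 120.0,
    "transaction_count": 3,
    "reading_duration": 45.0,
    "content_type": "sports",
}
payment_view = select_features(account_features, "online_payment")
browsing_view = select_features(account_features, "news_browsing")
```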

Referring to fig. 10, an embodiment of the present invention further provides a data processing apparatus 1000, including:

a time point determining module 1001 configured to determine a plurality of target time points;

a first feature data obtaining module 1002, configured to obtain, for each target time point, first feature data of a first object according to a target feature dimension;

a second feature data obtaining module 1003, configured to, for each target time point, obtain, according to a target feature dimension, second feature data of a target cluster where the first object is located;

a time series data determining module 1004 for determining a first time series of the first characteristic data at a plurality of target time points, and determining a second time series of the second characteristic data at a plurality of target time points;

a correlation determination module 1005, configured to determine a first correlation value between the first time series and the second time series;

a target object determining module 1006, configured to determine the first object as the target object when the first correlation value is smaller than or equal to a preset threshold.
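The cooperation of modules 1001 to 1006 can be illustrated with a short sketch for a single target feature dimension. It assumes a covariance-style correlation value (the mean of the products of each series' deviations from its own mean); the function names, time point labels, sample values, and the threshold of 0.0 are all hypothetical.

```python
from statistics import fmean


def covariance_score(first_series, second_series):
    """Mean of the products of each series' deviations from its own mean."""
    mean_first = fmean(first_series)
    mean_second = fmean(second_series)
    products = [
        (a - mean_first) * (b - mean_second)
        for a, b in zip(first_series, second_series)
    ]
    return fmean(products)


def is_target_object(object_by_time, cluster_by_time, time_points, threshold):
    """Return True when the correlation value is at or below the threshold."""
    # Build the first and second time series over the same target time points.
    first_series = [object_by_time[t] for t in time_points]
    second_series = [cluster_by_time[t] for t in time_points]
    return covariance_score(first_series, second_series) <= threshold


time_points = ["day1", "day2", "day3", "day4"]
cluster = {"day1": 10.0, "day2": 12.0, "day3": 14.0, "day4": 16.0}
follows = {"day1": 5.0, "day2": 6.0, "day3": 7.0, "day4": 8.0}
deviates = {"day1": 8.0, "day2": 7.0, "day3": 6.0, "day4": 5.0}

follows_flag = is_target_object(follows, cluster, time_points, threshold=0.0)
deviates_flag = is_target_object(deviates, cluster, time_points, threshold=0.0)
```

In this sketch the object whose series rises with the cluster is not flagged, while the object whose series moves in the opposite direction is flagged as a target object.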

Further, the first characteristic data obtaining module 1002 is specifically configured to:

determining a target feature corresponding to the first object from the plurality of member features;

assigning values to the target features by using the first characters, and assigning values to the other member features except the target features by using the second characters;
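The assignment step above resembles a one-hot encoding and can be sketched as follows. The choice of "1" as the first character and "0" as the second character is an assumption, as are the function name and the sample member features.

```python
def encode_member_features(member_features, target_feature,
                           first_char="1", second_char="0"):
    """Assign first_char to the target feature and second_char to all others."""
    if target_feature not in member_features:
        raise ValueError("target feature is not one of the member features")
    return [
        first_char if feature == target_feature else second_char
        for feature in member_features
    ]


# Hypothetical member features of a "content type" feature dimension.
content_types = ["sports", "finance", "technology"]
encoded = encode_member_features(content_types, "finance")
```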

Further, the target cluster includes a plurality of objects, and the second characteristic data obtaining module 1003 is specifically configured to:

acquiring third characteristic data of each object in a target cluster where the first object is located according to the target characteristic dimension;

and calculating the average value of the third characteristic data of the plurality of objects to obtain the second characteristic data of the target cluster.
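As a minimal sketch of this averaging step, the second feature data of the target cluster at each target time point can be computed from the third feature data of its member objects. The function names, account labels, and values below are hypothetical, and a single target feature dimension is assumed.

```python
from statistics import fmean


def cluster_feature_data(third_feature_data):
    """Average one feature dimension over all objects in the cluster."""
    values = list(third_feature_data.values())
    if not values:
        raise ValueError("the target cluster contains no objects")
    return fmean(values)


def cluster_time_series(third_feature_data_by_time, time_points):
    """Build the second time series, one cluster average per time point."""
    return [
        cluster_feature_data(third_feature_data_by_time[t]) for t in time_points
    ]


by_time = {
    "t1": {"acct_a": 10.0, "acct_b": 20.0, "acct_c": 30.0},
    "t2": {"acct_a": 12.0, "acct_b": 18.0, "acct_c": 36.0},
}
second_series = cluster_time_series(by_time, ["t1", "t2"])
```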

Further, the number of target feature dimensions is multiple, and the correlation determining module 1005 is specifically configured to:

for each target feature dimension, determining a second correlation value between the first time series and the second time series;

and obtaining a first correlation value between the first time series and the second time series according to the sum of the second correlation values of each target characteristic dimension.
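The aggregation over target feature dimensions can be sketched as a plain sum of the per-dimension values. The text says only that the first correlation value is obtained "according to the sum", so using the unweighted sum directly is an assumption; the function name, dimension names, and values are hypothetical.

```python
from math import fsum


def first_correlation_value(second_values_by_dimension):
    """Sum the second correlation values over all target feature dimensions."""
    return fsum(second_values_by_dimension.values())


second_values = {"transaction_amount": 2.5, "transaction_count": -0.5}
total = first_correlation_value(second_values)
```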

Further, the correlation determining module 1005 is further configured to:

determining a first feature dimension;

removing a first characteristic dimension from a plurality of target characteristic dimensions;

for each target feature dimension, determining a second correlation value between the first time series and the second time series, comprising:

Further, the correlation determining module 1005 is specifically configured to:

calculating the average value of the second time sequence to obtain second average characteristic data, and calculating a second difference value between the second characteristic data of each target time point and the second average characteristic data;

calculating the product of each first difference value and a second difference value corresponding to the same target time point;

and obtaining a second correlation value between the first time sequence and the second time sequence according to the average value of the products of the target time points.
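Written as a formula, and assuming that the first difference values are formed from the first time series in the same way that the second difference values are formed from the second time series, the steps above amount to the following. The symbols are introduced here for illustration only: x_t and y_t denote the first and second feature data at target time point t, n is the number of target time points, and c is the average of the products.

```latex
\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t, \qquad
\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t, \qquad
c = \frac{1}{n}\sum_{t=1}^{n} \left(x_t - \bar{x}\right)\left(y_t - \bar{y}\right)
```

The text states only that the second correlation value is obtained according to this average, so whether c is used directly or normalized further is left open here.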

The first object is any one of a plurality of second objects, and the data processing apparatus further includes a cluster dividing module 1007, where the cluster dividing module 1007 is configured to:

determining the number of target clusters, and determining first cluster centers corresponding to the number of the target clusters;

according to the target feature dimension, obtaining fourth feature data of each second object and fifth feature data of each first cluster center;

calculating the distance between each second object and each first cluster center according to the fourth feature data and the fifth feature data;
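A minimal sketch of the distance step follows. It assumes that the distance is the sum of squared per-dimension differences (a squared Euclidean distance) and that each second object is then assigned to its nearest first cluster center; both the aggregation and the assignment rule are assumptions, and all names and values are hypothetical.

```python
def squared_distance(fourth_feature_data, fifth_feature_data):
    """Sum of squared per-dimension differences between object and center."""
    return sum(
        (a - b) ** 2 for a, b in zip(fourth_feature_data, fifth_feature_data)
    )


def nearest_center(fourth_feature_data, centers):
    """Index of the cluster center closest to the given object."""
    distances = [squared_distance(fourth_feature_data, c) for c in centers]
    return distances.index(min(distances))


centers = [(0.0, 0.0), (10.0, 10.0)]
objects = {"acct_a": (1.0, 2.0), "acct_b": (9.0, 8.0)}
assignments = {name: nearest_center(data, centers) for name, data in objects.items()}
```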

Further, the cluster dividing module 1007 is specifically configured to:

determining the number of initial clusters;

and determining the number of the target clusters from the number of the clusters to be determined according to the variation value between two adjacent dividing error values.
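One possible way to pick the number of target clusters from the change between adjacent dividing error values is an elbow-style rule, sketched below. The specific criterion (stop at the first candidate count after which the error drops by less than `min_drop`) is an assumption rather than the rule of the embodiment, and the function name and sample error values are hypothetical.

```python
def choose_cluster_count(errors_by_count, min_drop):
    """Pick the first cluster count after which the error stops dropping much."""
    counts = sorted(errors_by_count)
    for current, following in zip(counts, counts[1:]):
        # Variation between two adjacent dividing error values.
        drop = errors_by_count[current] - errors_by_count[following]
        if drop < min_drop:
            return current
    # The error kept dropping sharply; fall back to the largest candidate.
    return counts[-1]


errors = {2: 100.0, 3: 40.0, 4: 35.0, 5: 33.0}
chosen = choose_cluster_count(errors, min_drop=10.0)
```

With these sample values the error falls by 60 from two to three clusters but only by 5 from three to four, so three clusters are chosen.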

Further, the cluster dividing module 1007 is specifically configured to:

determining the number of target feature dimensions;

Further, the cluster dividing module 1007 is specifically configured to:

acquiring sixth feature data of each cluster after cluster division is performed on the plurality of second objects;

calculating a third difference value between the fourth feature data of each second object and the sixth feature data of the cluster where the second object is located, and performing squaring processing on the third difference value to obtain a unit error value corresponding to each fourth feature data;

and summing the unit error values to obtain the dividing error values of the cluster division.
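The dividing error value described by these steps is a sum of squared differences and can be sketched as follows for a single target feature dimension. Treating the sixth feature data as one value per cluster (for example the cluster average) is an assumption, and the function name and sample data are hypothetical.

```python
def partition_error(fourth_feature_data, assignments, sixth_feature_data):
    """Sum of squared differences between each object and its cluster's value."""
    total = 0.0
    for name, value in fourth_feature_data.items():
        # Third difference: object value minus the value of its own cluster.
        third_difference = value - sixth_feature_data[assignments[name]]
        # Unit error value: the squared third difference.
        total += third_difference ** 2
    return total


fourth = {"acct_a": 1.0, "acct_b": 3.0, "acct_c": 10.0}
assignments = {"acct_a": 0, "acct_b": 0, "acct_c": 1}
sixth = {0: 2.0, 1: 10.0}
error = partition_error(fourth, assignments, sixth)
```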

Further, the number of target feature dimensions is multiple, and the cluster dividing module 1007 is further configured to:

determining a second feature dimension;

removing a second characteristic dimension from the plurality of target characteristic dimensions;

the acquiring, according to the target feature dimension, of fourth feature data of each second object and fifth feature data of each first cluster center comprises:

and acquiring fourth feature data of each second object and fifth feature data of each first cluster center according to the target feature dimensions remaining after the second feature dimensions are removed.

Further, the cluster dividing module 1007 is specifically configured to:

The data processing apparatus 1000 according to the embodiment of the present invention and the data processing method are based on the same inventive concept, and the principle and the beneficial effects of the data processing apparatus 1000 are not described herein again.

Referring to fig. 11, fig. 11 is a block diagram of a part of the structure of a server 1100 according to an embodiment of the present invention. The server 1100 may vary considerably depending on its configuration or performance, and may include one or more Central Processing Units (CPUs) 1122 (e.g., one or more processors), a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) for storing an application program 1142 or data 1144. The memory 1132 and the storage medium 1130 may be transient storage or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and to execute, on the server 1100, the series of instruction operations in the storage medium 1130.

The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.

A processor in the server may be used to perform the data processing method.

Embodiments of the present invention further provide a computer-readable storage medium, where the computer-readable storage medium is configured to store program code, and the program code is used to execute the data processing method according to the foregoing embodiments.

The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The processor of the computer device may read the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the data processing method of the foregoing embodiments.

The terms "first," "second," "third," "fourth," and the like in the description of the invention and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It is to be understood that, in the present invention, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate that only A is present, that only B is present, or that both A and B are present, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.

It should be understood that, in the description of the embodiments of the present invention, "a plurality" (or "multiple items") means two or more; terms such as "greater than", "less than", and "exceeding" are understood as excluding the stated number, and terms such as "at least", "at most", and "within" are understood as including the stated number.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. The division into units is only a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other media capable of storing program code.

It should also be appreciated that the various implementations provided by the embodiments of the present invention can be combined arbitrarily to achieve different technical effects.

While the preferred embodiments of the present invention have been described, those skilled in the art will understand that the present invention is not limited to the above embodiments. Various equivalent modifications or substitutions can be made without departing from the spirit of the present invention, and the scope of the present invention is defined by the appended claims.

Claims

1. A method of data processing, comprising:

determining a plurality of target time points;

2. The data processing method of claim 1, wherein the obtaining first feature data of the first object according to the target feature dimension comprises:

3. The data processing method according to claim 1, wherein the target cluster includes a plurality of objects, and the obtaining second feature data of the target cluster where the first object is located according to the target feature dimension includes:

4. The data processing method of claim 1, wherein the number of target feature dimensions is plural, and the determining a first correlation value between the first time series and the second time series comprises:

5. The data processing method of claim 4, wherein before determining, for each of the target feature dimensions, a second correlation value between the first time series and the second time series, the data processing method further comprises:

determining a first feature dimension;

6. The data processing method according to claim 4 or 5, wherein the determining a second correlation value between the first time series and the second time series comprises:

7. The data processing method according to claim 1, wherein the first object is any one of a plurality of second objects, and before the second feature data of the target cluster where the first object is located is obtained according to the target feature dimension, the data processing method further includes:

8. The data processing method of claim 7, wherein the determining the number of target clusters comprises:

determining the number of initial clusters;

9. The data processing method of claim 8, wherein the determining an initial number of clusters comprises:

determining the number of the target feature dimensions;

10. The data processing method of claim 8, wherein determining a partitioning error value for the cluster partitioning comprises:

11. The data processing method according to claim 7, wherein the number of the target feature dimensions is plural, and before the obtaining of the fourth feature data of each of the second objects and the fifth feature data of each of the first cluster centers according to the target feature dimensions, the data processing method further comprises:

determining a second feature dimension;

12. The data processing method of claim 7, wherein the number of target feature dimensions is plural, and the calculating the distance between each second object and each first cluster center according to the fourth feature data and the fifth feature data comprises:

calculating a fourth difference between the fourth feature data and the fifth feature data, and performing squaring on the fourth difference to obtain a unit distance value of each target feature dimension;

13. A data processing apparatus, characterized by comprising:

a second feature data acquisition module, configured to, for each target time point, acquire, according to the target feature dimension, second feature data of a target cluster in which the first object is located;

and the target object determining module is used for determining the first object as the target object when the first correlation value is smaller than or equal to a preset threshold value.

14. An electronic device comprising a memory storing a computer program, and a processor implementing the data processing method of any one of claims 1 to 12 when executing the computer program.

15. A computer-readable storage medium storing a program which is executed by a processor to implement the data processing method of any one of claims 1 to 12.