Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a user data processing method and system that solve the following technical problems: in the prior art, loss (churn) prediction is performed by analyzing changes in a single type of user behavior data, so its accuracy is low and it identifies only a small share of the users who are about to be lost.
To achieve the above object, in a first aspect, an embodiment of the present application provides a user data processing method, including the steps of:
Acquiring unique identification data of a user, wherein the attribute type of the user is either a non-lost user or a lost user; extracting a plurality of user behavior data from a report library based on the unique identification data, wherein each user behavior data comprises sub-behavior data for a plurality of consecutive months; and performing data cleaning on the user behavior data to form first standby behavior data, wherein the first standby behavior data comprises first standby sub-data for a plurality of consecutive months;
Performing data transformation on the first standby behavior data to obtain second standby behavior data, wherein the second standby behavior data comprises second standby sub-data for a plurality of consecutive months; and judging the relevance of each second standby behavior data to the user based on the second standby sub-data, so as to select a plurality of final behavior data from the plurality of second standby behavior data, wherein the final behavior data comprises final sub-data for a plurality of consecutive months;
Acquiring a plurality of key interaction data corresponding to the unique identification data, acquiring a single interaction index between the unique identification data and each key interaction data, and generating a calibrated interaction index corresponding to the unique identification data based on the plurality of single interaction indexes;
and acquiring a first behavior index and a second behavior index for each final behavior data from its final sub-data, generating a loss index corresponding to the unique identification data based on the calibrated interaction index, the first behavior indexes and the second behavior indexes, and determining the loss risk level of the non-lost user based on the loss index.
Compared with the prior art, the method has the following advantages. After data cleaning and transformation, the different user behavior data are normalized to the same scale, so whether a user is at risk of being lost can be judged from multiple user behavior data together, which improves both the accuracy and the coverage of loss prediction. Selecting the final behavior data, that is, removing part of the second standby behavior data according to its relevance to the user, reduces the data volume and prevents excessive or weakly relevant data from degrading the accuracy of loss prediction. Finally, the calibrated interaction index is introduced on top of the user behavior data, so that the stability of the interaction circle formed between the unique identification data and the key interaction data assists in predicting the loss risk, further improving the accuracy and coverage of loss prediction.
Further, the step of performing data cleaning on the user behavior data includes:
Acquiring the maximum number of the sub-behavior data in the user behavior data, and judging whether missing months exist in the user behavior data or not based on the maximum number;
if the missing months exist in the user behavior data, acquiring the missing quantity of the missing months, and comparing the missing quantity with a missing threshold value;
If the missing quantity is smaller than the missing threshold value, acquiring the sub-behavior data corresponding to two months adjacent to the missing month to generate filling behavior data, and filling the filling behavior data into the missing month;
if the missing quantity is larger than the missing threshold value, filling preset behavior data into the missing months;
and constructing a threshold range based on all the sub-behavior data in the user behavior data, and replacing abnormal data in the user behavior data based on the threshold range.
Still further, the step of performing data transformation on the first standby behavior data to obtain second standby behavior data, where the second standby behavior data includes second standby sub-data under several consecutive months includes:
Obtaining the maximum value and the minimum value of the first standby sub-data in the first standby behavior data;
converting each first standby sub-data into second standby sub-data based on the maximum value and the minimum value;
and combining the second standby sub-data for the consecutive months into the second standby behavior data.
Further, the calculation formula of the second standby sub-data is:
$$y_{i}^{j} = \frac{x_{i}^{j} - x_{\min}^{j}}{x_{\max}^{j} - x_{\min}^{j}},$$
where $y_{i}^{j}$ represents the second standby sub-data of the i-th month in the second standby behavior data corresponding to the j-th first standby behavior data, $x_{i}^{j}$ represents the first standby sub-data of the i-th month in the j-th first standby behavior data, $x_{\min}^{j}$ represents the minimum value of the first standby sub-data in the j-th first standby behavior data, and $x_{\max}^{j}$ represents the maximum value of the first standby sub-data in the j-th first standby behavior data.
Still further, the step of determining the association of the second standby behavior data with the user based on the second standby sub-data to select a number of final behavior data from a number of the second standby behavior data includes:
Generating behavior change features from adjacent second standby sub-data, generating change speed features from adjacent behavior change features, and generating change trend features from adjacent change speed features;
Fitting the plurality of change trend features to a trend line, acquiring the slope of the trend line, and checking the sign of the slope against the attribute type of the user;
If the slope is positive and the attribute type is a non-lost user, determining the second standby behavior data corresponding to the slope as final behavior data;
And if the slope is negative and the attribute type is a lost user, determining the second standby behavior data corresponding to the slope as final behavior data.
Still further, the step of obtaining a plurality of key interaction data corresponding to the unique identification data includes:
Acquiring a plurality of interaction identification data associated with the unique identification data, and acquiring the cumulative number of interactions between the unique identification data and each interaction identification data;
And selecting a plurality of key interaction data from the plurality of interaction identification data based on the cumulative numbers of interactions.
Still further, the step of obtaining a single interaction index between the unique identification data and the key interaction data comprises:
Acquiring a daily interaction value, a three-day interaction value, a week interaction value, a ten-day interaction value and a month interaction value between the unique identification data and the key interaction data in a preset time period;
and generating the single interaction index based on the daily interaction value, the three-day interaction value, the week interaction value, the ten-day interaction value and the month interaction value.
Further, the calculation formula of the daily interaction value is as follows:
$$V_{d} = \frac{N_{d}}{T},$$
where $V_{d}$ represents the daily interaction value, $N_{d}$ represents the number of interactions between the unique identification data and the key interaction data counted on a daily basis, and $T$ represents the total number of days in the preset time period;
the calculation formula of the three-day interaction value is as follows:
,
where $V_{3d}$ represents the three-day interaction value and $N_{3d}$ represents the number of interactions between the unique identification data and the key interaction data counted on a three-day basis;
The calculation formula of the single interaction index is as follows:
,
where $I_{a}$ represents the single interaction index between the unique identification data and the a-th key interaction data, $V_{w}$ represents the week interaction value, $V_{10d}$ represents the ten-day interaction value, and $V_{m}$ represents the month interaction value.
Still further, the step of determining the loss risk level of the non-lost user based on the loss index includes:
Setting a plurality of preset risk levels and index ranges corresponding to the preset risk levels;
And matching the loss index against the plurality of index ranges to select, from the plurality of preset risk levels, the loss risk level corresponding to the non-lost user.
In a second aspect, an embodiment of the present application provides a user data processing system, applied to the user data processing method described in the first aspect, where the system includes:
The processing module is used for acquiring unique identification data of a user, wherein the attribute type of the user is either a non-lost user or a lost user; extracting a plurality of user behavior data from a report library based on the unique identification data, wherein each user behavior data comprises sub-behavior data for a plurality of consecutive months; and performing data cleaning on the user behavior data to form first standby behavior data, wherein the first standby behavior data comprises first standby sub-data for a plurality of consecutive months;
The screening module is used for performing data transformation on the first standby behavior data to obtain second standby behavior data, wherein the second standby behavior data comprises second standby sub-data for a plurality of consecutive months, and for judging the relevance between each second standby behavior data and the user based on the second standby sub-data, so as to select a plurality of final behavior data from the plurality of second standby behavior data, wherein the final behavior data comprises final sub-data for a plurality of consecutive months;
The first analysis module is used for acquiring a plurality of key interaction data corresponding to the unique identification data, acquiring a single interaction index between the unique identification data and each key interaction data, and generating a calibrated interaction index corresponding to the unique identification data based on the plurality of single interaction indexes;
The second analysis module is used for acquiring a first behavior index and a second behavior index for each final behavior data from its final sub-data, generating a loss index corresponding to the unique identification data based on the calibrated interaction index, the first behavior indexes and the second behavior indexes, and determining the loss risk level of the non-lost user based on the loss index.
In a third aspect, an embodiment of the present application provides a computer, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the user data processing method according to the first aspect described above when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, implements a user data processing method as described in the first aspect above.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, a user data processing method provided in a first embodiment of the present invention includes the following steps:
S10, acquiring unique identification data of a user, wherein the attribute type of the user is either a non-lost user or a lost user; extracting a plurality of user behavior data from a report library based on the unique identification data, wherein each user behavior data comprises sub-behavior data for a plurality of consecutive months; and performing data cleaning on the user behavior data to form first standby behavior data, wherein the first standby behavior data comprises first standby sub-data for a plurality of consecutive months;
In this embodiment, the unique identification data is a user number. The report library contains a plurality of sub-report libraries, and each sub-report library stores the corresponding user behavior data. In this embodiment, the user behavior data include monthly data traffic consumption, monthly call duration, monthly number of calls, monthly number of messages sent, monthly usage fee, and the like. After the user behavior data are extracted, some sub-behavior data may be abnormal because of system errors or similar causes, so the user behavior data must be processed accordingly to ensure the accuracy of the loss risk level determined later.
The step S10 includes:
S110, obtaining the maximum number of the sub-behavior data in the user behavior data, and judging whether missing months exist in the user behavior data or not based on the maximum number;
It will be appreciated that the number of sub-behavior data equals the number of months and that each sub-behavior data corresponds to one month; for example, one sub-behavior data may record that a user used 1000 MB of data traffic in a given month.
S120, if missing months exist in the user behavior data, acquiring missing quantity of the missing months, and comparing the missing quantity with a missing threshold value;
The missing threshold may be adjusted according to the number of sub-behavior data in the user behavior data. In this embodiment, the missing threshold is 30% of the number of sub-behavior data in the user behavior data.
S130, if the missing quantity is smaller than the missing threshold value, acquiring the sub-behavior data corresponding to two months adjacent to the missing month to generate filling behavior data, and filling the filling behavior data into the missing month;
S140, if the missing quantity is larger than the missing threshold value, filling the preset behavior data into the missing months;
In this embodiment, the preset behavior data is 0.
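The missing-month handling of S110 to S140 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the embodiment itself: the function name fill_missing_months is hypothetical, missing months are assumed to be marked with None, the filling value is assumed to be the mean of the nearest observed months on either side (the source only says it is generated from the two adjacent months), and a missing count exactly equal to the threshold, which the source leaves open, is handled like the "larger" case.

```python
from typing import List, Optional

MISSING_RATIO = 0.3  # missing threshold = 30% of the months (this embodiment)
PRESET_VALUE = 0.0   # preset behavior data used in this embodiment


def fill_missing_months(months: List[Optional[float]]) -> List[float]:
    """Fill missing months (marked None) in one user behavior series.

    Assumptions not stated in the source: the filling value is the mean of
    the nearest observed months on either side, and a missing count equal
    to the threshold is handled like the "larger" case.
    """
    missing = [i for i, v in enumerate(months) if v is None]
    threshold = MISSING_RATIO * len(months)
    if missing and len(missing) >= threshold:
        # Too many gaps: patch every missing month with the preset value.
        return [PRESET_VALUE if v is None else float(v) for v in months]
    filled: List[float] = []
    for i, v in enumerate(months):
        if v is not None:
            filled.append(float(v))
            continue
        # Nearest observed neighbours to the left and to the right of month i.
        left = next((months[k] for k in range(i - 1, -1, -1) if months[k] is not None), None)
        right = next((months[k] for k in range(i + 1, len(months)) if months[k] is not None), None)
        neighbours = [float(x) for x in (left, right) if x is not None]
        filled.append(sum(neighbours) / len(neighbours))
    return filled
```

Because fewer than 30% of the months are missing whenever the filling branch runs, at least one observed month always exists, so the neighbour list is never empty there.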
S150, constructing a threshold range based on all sub-behavior data in the user behavior data, and replacing abnormal data of the user behavior data based on the threshold range;
Specifically, the plurality of sub-behavior data are sorted, and the upper quartile and the lower quartile are selected from them. A standard value is determined based on the upper and lower quartiles; the sum of the upper quartile and the standard value is taken as the high-point threshold, and the difference between the lower quartile and the standard value is taken as the low-point threshold; together, the low-point threshold and the high-point threshold form the threshold range. Each sub-behavior data is then compared with the threshold range, and any sub-behavior data outside the range is judged to be abnormal data and replaced. The abnormal data may be replaced with the average of the two adjacent sub-behavior data, or with the average of the data for the same month in other years.
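The threshold range of S150 can be sketched with the quartile function from the Python standard library. This is an illustrative sketch, not the embodiment's exact rule: the function name replace_outliers is hypothetical, the standard value is assumed to be 1.5 times the interquartile range (the familiar Tukey fence; the source only says it is determined from the two quartiles), abnormal data are replaced with the mean of the in-range adjacent months (the first replacement option in the text), and the median fallback is an added assumption.

```python
import statistics
from typing import List

IQR_FACTOR = 1.5  # hypothetical: the source only says the standard value is based on the quartiles


def replace_outliers(values: List[float]) -> List[float]:
    """Replace sub-behavior data outside [Q1 - s, Q3 + s], with s = IQR_FACTOR * (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    standard = IQR_FACTOR * (q3 - q1)
    low, high = q1 - standard, q3 + standard  # low-point and high-point thresholds

    def in_range(x: float) -> bool:
        return low <= x <= high

    cleaned = list(values)
    for i, v in enumerate(values):
        if in_range(v):
            continue
        # Average of the adjacent months that are themselves within the range.
        neighbours = [values[k] for k in (i - 1, i + 1) if 0 <= k < len(values) and in_range(values[k])]
        # Fallback (assumption): use the median when no adjacent month is usable.
        cleaned[i] = sum(neighbours) / len(neighbours) if neighbours else statistics.median(values)
    return cleaned
```

For a series such as [100, 110, 105, 5000, 95, 102, 108, 99], the value 5000 falls outside the range and is replaced with the mean of 105 and 95.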
S20, performing data transformation on the first standby behavior data to obtain second standby behavior data, wherein the second standby behavior data comprises second standby sub-data for a plurality of consecutive months; judging the relevance between each second standby behavior data and the user based on the second standby sub-data, so as to select a plurality of final behavior data from the plurality of second standby behavior data, wherein the final behavior data comprises final sub-data for a plurality of consecutive months;
The step S20 includes:
S210, obtaining the maximum value and the minimum value of the first standby sub-data in the first standby behavior data;
S220, converting each first standby sub-data into second standby sub-data based on the maximum value and the minimum value;
The calculation formula of the second standby sub-data is as follows:
$$y_{i}^{j} = \frac{x_{i}^{j} - x_{\min}^{j}}{x_{\max}^{j} - x_{\min}^{j}},$$
where $y_{i}^{j}$ represents the second standby sub-data of the i-th month in the second standby behavior data corresponding to the j-th first standby behavior data, $x_{i}^{j}$ represents the first standby sub-data of the i-th month in the j-th first standby behavior data, $x_{\min}^{j}$ represents the minimum value of the first standby sub-data in the j-th first standby behavior data, and $x_{\max}^{j}$ represents the maximum value of the first standby sub-data in the j-th first standby behavior data.
S230, combining the second standby sub-data under a plurality of continuous months into the second standby behavior data;
Because different first standby behavior data have different units and magnitudes (for example, data traffic versus call duration), using them directly for subsequent analysis and prediction would make unified integration of the data difficult. The first standby behavior data are therefore converted into the second standby behavior data.
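Steps S210 to S230 amount to rescaling each behavior series by its own minimum and maximum. The sketch below assumes min-max normalization, which is consistent with the maximum and minimum values the step uses. The function name min_max_normalize is hypothetical, and mapping a constant series to 0.0 is an added assumption, because the division is undefined when the maximum equals the minimum.

```python
from typing import List


def min_max_normalize(first_standby: List[float]) -> List[float]:
    """Convert first standby sub-data into second standby sub-data (min-max scaling)."""
    lo, hi = min(first_standby), max(first_standby)
    if hi == lo:
        # Assumption: a constant series has no spread, so every month maps to 0.0.
        return [0.0 for _ in first_standby]
    return [(x - lo) / (hi - lo) for x in first_standby]
```

After this step, monthly traffic in MB and monthly call counts share the same 0 to 1 scale, so they can be analysed together.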
S240, generating behavior change characteristics based on the adjacent second standby sub-data, generating change speed characteristics based on the adjacent behavior change characteristics, and generating change trend characteristics based on the adjacent change speed characteristics;
The behavior change feature is obtained by the following formula:
,
where $c_{i}^{j}$ represents the behavior change feature between the second standby sub-data of the i-th month and that of the (i-1)-th month, $y_{i}^{j}$ represents the second standby sub-data of the i-th month in the second standby behavior data corresponding to the j-th first standby behavior data, $y_{i-1}^{j}$ represents the second standby sub-data of the (i-1)-th month in the same second standby behavior data, and $\varepsilon$ represents a fixed value. The change speed feature and the change trend feature are calculated in the same way as the behavior change feature and are not described in detail here.
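The three feature levels of S240 are the same adjacent-pair operation applied three times. The exact formula did not survive, so the sketch below assumes a difference quotient: the difference between adjacent values divided by the fixed value, for example a one-month step. The names adjacent_feature, trend_features and FIXED_VALUE are hypothetical.

```python
from typing import List

FIXED_VALUE = 1.0  # hypothetical stand-in for the source's "fixed value" (e.g. a one-month step)


def adjacent_feature(series: List[float], c: float = FIXED_VALUE) -> List[float]:
    """Assumed form of the formula: (value_i - value_{i-1}) / c for each adjacent pair."""
    return [(series[i] - series[i - 1]) / c for i in range(1, len(series))]


def trend_features(second_standby: List[float]) -> List[float]:
    change = adjacent_feature(second_standby)  # behavior change features
    speed = adjacent_feature(change)           # change speed features
    return adjacent_feature(speed)             # change trend features
```

Each level is one element shorter than the one before it, so a series needs at least five months to produce the two or more trend features required to fit a line.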
S250, fitting the plurality of change trend features to a trend line, acquiring the slope of the trend line, and checking the sign of the slope against the attribute type;
S260, if the slope is a positive value and the attribute type is a non-lost user, judging the second standby behavior data corresponding to the slope as final behavior data;
S270, if the slope is a negative value and the attribute type is a lost user, judging the second standby behavior data corresponding to the slope as final behavior data;
It can be understood that if the slope is positive and the attribute type is a lost user, or the slope is negative and the attribute type is a non-lost user, the correlation between the second standby behavior data corresponding to the slope and the user is determined to be poor, and the second standby behavior data corresponding to the slope is rejected.
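The selection rule of S250 to S270 can be sketched as a least-squares line fit followed by a sign check. Ordinary least squares is one reasonable choice, since the source does not name a fitting method. The attribute-type strings "non_lost" and "lost" and the function names are hypothetical, and rejecting a slope of exactly zero is an added assumption, because the text defines only the positive and negative cases.

```python
from typing import List


def fit_slope(values: List[float]) -> float:
    """Ordinary least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two trend features to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def is_final_behavior(trend: List[float], attribute_type: str) -> bool:
    """Keep the behavior data only when the slope sign agrees with the user's type."""
    slope = fit_slope(trend)
    if attribute_type == "non_lost":
        return slope > 0
    if attribute_type == "lost":
        return slope < 0
    raise ValueError(f"unknown attribute type: {attribute_type}")
```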
S30, acquiring a plurality of key interaction data corresponding to the unique identification data, acquiring a single interaction index between the unique identification data and each key interaction data, and generating a calibrated interaction index corresponding to the unique identification data based on the plurality of single interaction indexes;
the step S30 includes:
S310, acquiring a plurality of interaction identification data associated with the unique identification data, and acquiring the cumulative number of interactions between the unique identification data and each interaction identification data;
S320, selecting a plurality of key interaction data from the plurality of interaction identification data based on the cumulative numbers of interactions;
In this embodiment, the interaction identification data and the key interaction data are the phone numbers of the user's contacts. Contact numbers with a low interaction frequency are screened out to improve the accuracy of the data analysis.
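Steps S310 and S320 can be sketched with collections.Counter. The source does not give the screening rule, so this sketch assumes that contacts with fewer than min_count cumulative interactions are dropped and that at most top_k of the remaining contacts are kept. The function name and both default values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def select_key_contacts(contact_log: Iterable[str], min_count: int = 5, top_k: int = 10) -> List[str]:
    """Pick key interaction numbers from a log with one entry per interaction."""
    counts = Counter(contact_log)  # cumulative number of interactions per contact number
    # most_common() orders by count, highest first; drop low-frequency numbers.
    ranked = [number for number, count in counts.most_common() if count >= min_count]
    return ranked[:top_k]
```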
S330, acquiring a daily interaction value, a three-day interaction value, a week interaction value, a ten-day interaction value and a month interaction value between the unique identification data and the key interaction data in a preset time period;
in this embodiment, the preset time period is 3 months.
The calculation formula of the daily interaction value is as follows:
$$V_{d} = \frac{N_{d}}{T},$$
where $V_{d}$ represents the daily interaction value, $N_{d}$ represents the number of interactions between the unique identification data and the key interaction data counted on a daily basis, and $T$ represents the total number of days in the preset time period. Taking 3 months as an example, $T$ is 90 days, and a day on which interaction occurs is counted only once, however many interactions take place on that day.
The calculation formula of the three-day interaction value is as follows:
,
where $V_{3d}$ represents the three-day interaction value and $N_{3d}$ represents the number of interactions between the unique identification data and the key interaction data counted on a three-day basis. Taking 3 months (90 days) as an example, the period is counted in three-day intervals, and an interval in which interaction occurs is counted only once. The week interaction value, the ten-day interaction value and the month interaction value are obtained in a similar manner and are not described in detail here.
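The counting rule of S330 can be sketched for all five window lengths at once. This sketch assumes that each interaction value is the number of windows containing at least one interaction divided by the total number of windows in the period, which reduces to interaction days divided by total days for the daily value. It also assumes a month is 30 days. The names WINDOWS and interaction_values are hypothetical, and interaction days are given as 0-based day indices within the preset period.

```python
import math
from typing import Dict, Iterable

# Window lengths in days; treating a month as 30 days is an assumption.
WINDOWS = {"day": 1, "three_day": 3, "week": 7, "ten_day": 10, "month": 30}


def interaction_values(interaction_days: Iterable[int], total_days: int = 90) -> Dict[str, float]:
    """Share of windows in the preset period that contain at least one interaction."""
    days = {d for d in interaction_days if 0 <= d < total_days}
    values: Dict[str, float] = {}
    for name, width in WINDOWS.items():
        active = {d // width for d in days}  # each window is counted once, however many interactions
        total = math.ceil(total_days / width)
        values[name] = len(active) / total
    return values
```

With interactions on days 0, 1, 2, 30 and 60 of a 90-day period, all three interactions in the first three days fall into one three-day window. The three-day value is therefore 3/30, while the month value is 3/3.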
S340, generating the single interaction index based on the daily interaction value, the three-day interaction value, the week interaction value, the ten-day interaction value and the month interaction value;
The calculation formula of the single interaction index is as follows:
,
where $I_{a}$ represents the single interaction index between the unique identification data and the a-th key interaction data, $V_{w}$ represents the week interaction value, $V_{10d}$ represents the ten-day interaction value, and $V_{m}$ represents the month interaction value.
After the plurality of single interaction indexes are obtained, they are averaged to obtain the calibrated interaction index.
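S340 and the averaging step can be sketched as follows. The averaging into the calibrated interaction index comes from the text. The single-index formula did not survive extraction, so single_interaction_index uses a weighted sum with equal weights by default. That weighting, the function names, and returning 0.0 when there are no key contacts are all assumptions.

```python
from statistics import fmean
from typing import Dict, List, Optional


def single_interaction_index(values: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Combine the five interaction values of one key contact (hypothetical weighted sum)."""
    if weights is None:
        # Assumption: equal weights, since the source formula did not survive extraction.
        weights = {name: 1.0 / len(values) for name in values}
    return sum(weights[name] * value for name, value in values.items())


def calibrated_interaction_index(single_indexes: List[float]) -> float:
    """Average of the single interaction indexes, as described in the text."""
    return fmean(single_indexes) if single_indexes else 0.0  # 0.0 for no key contacts is an assumption
```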
S40, acquiring a first behavior index and a second behavior index corresponding to the final behavior data through the final sub-data, generating a loss index corresponding to the unique identification data based on the calibrated interaction index, the first behavior index and the second behavior index, and determining a loss risk level of a non-lost user based on the loss index;
The first behavior index is obtained by the following formula:
$$K_{m}^{(1)} = \frac{1}{Q_{m}} \sum_{n=1}^{Q_{m}} z_{m,n},$$
where $K_{m}^{(1)}$ represents the first behavior index of the m-th final behavior data, $z_{m,n}$ represents the n-th final sub-data in the m-th final behavior data, and $Q_{m}$ represents the number of final sub-data in the m-th final behavior data;
the second behavior index is obtained by the following formula:
,
where $K_{m}^{(2)}$ represents the second behavior index of the m-th final behavior data, and $z_{m}^{\text{last}}$ represents the final sub-data of the last month in the m-th final behavior data.
The loss index is obtained by the following formula:
,
where $L_{b}$ represents the loss index corresponding to user b, $M_{b}$ represents the number of final behavior data corresponding to user b, $\alpha$, $\beta$ and $\gamma$ all represent weights whose sum is 1, and $C_{b}$ represents the calibrated interaction index corresponding to user b.
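The loss-index computation of S40 can be sketched as follows. Only these parts come from the text: the first behavior index is built from all final sub-data of one final behavior data, the second from its last month, the three weights sum to 1, and the calibrated interaction index enters the result. The rest is assumed. The first index is taken as the mean, the second as the last-month value itself, and the loss index as the weighted sum of the two behavior indexes (each averaged over the $M_b$ final behavior data) and the calibrated interaction index. The function names and the default weights 0.4, 0.3 and 0.3 are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Sequence


def first_behavior_index(final_sub_data: Sequence[float]) -> float:
    """Assumed form: mean of the final sub-data of one final behavior data."""
    return fmean(final_sub_data)


def second_behavior_index(final_sub_data: Sequence[float]) -> float:
    """Hypothetical: the final sub-data of the last month, used as-is."""
    return float(final_sub_data[-1])


def loss_index(final_behaviors: List[Sequence[float]], calibrated: float,
               alpha: float = 0.4, beta: float = 0.3, gamma: float = 0.3) -> float:
    """Hypothetical weighted combination; only 'weights sum to 1' comes from the text."""
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("alpha, beta and gamma must sum to 1")
    if not final_behaviors:
        raise ValueError("at least one final behavior data is required")
    m = len(final_behaviors)  # number of final behavior data of this user
    k1 = sum(first_behavior_index(s) for s in final_behaviors) / m
    k2 = sum(second_behavior_index(s) for s in final_behaviors) / m
    return alpha * k1 + beta * k2 + gamma * calibrated
```

Because all inputs are normalized or are shares of windows, this assumed combination stays within 0 to 1, which matches the index ranges used for the risk levels.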
The step S40 includes:
S410, setting a plurality of preset risk levels and the index range corresponding to each preset risk level;
In this embodiment, four preset risk levels are set: a high-risk level, two medium-risk levels, and a low-risk level. The index range for high risk is 0 to 0.25, for the higher of the two medium-risk levels 0.25 to 0.5, for the lower medium-risk level 0.5 to 0.75, and for low risk 0.75 to 1.
S420, matching the loss index against the plurality of index ranges to select, from the preset risk levels, the loss risk level corresponding to the non-lost user.
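The range matching of S410 and S420 can be sketched with the standard bisect module, using the four index ranges of this embodiment. The level labels and the function name are hypothetical, and assigning a value that falls exactly on a boundary (for example 0.25) to the upper range is an assumption, since the stated ranges share their endpoints.

```python
import bisect

# Upper bounds of the first three index ranges in this embodiment.
BOUNDARIES = [0.25, 0.5, 0.75]
# Hypothetical labels, ordered from the lowest index range to the highest.
LEVELS = ["high", "medium_high", "medium_low", "low"]


def loss_risk_level(loss_index: float) -> str:
    """Map a loss index in [0, 1] to a preset risk level.

    Assumption: a value on a boundary (e.g. 0.25) belongs to the upper range.
    """
    if not 0.0 <= loss_index <= 1.0:
        raise ValueError("loss index is expected to lie in [0, 1]")
    return LEVELS[bisect.bisect_right(BOUNDARIES, loss_index)]
```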
It can be understood that once the loss risk level is determined, the prediction of user loss is complete. A corresponding user retention strategy can then be chosen according to the loss risk level, so as to retain existing users.
In this method, a plurality of user behavior data are extracted from the report library and, after cleaning and transformation, normalized to the same scale. The loss risk of a user is therefore judged from multiple user behavior data together, which improves the accuracy and coverage of loss prediction. Selecting the final behavior data removes the second standby behavior data with low relevance to the user. This simplifies the data and prevents excessive or weakly relevant data from degrading prediction accuracy. The calibrated interaction index is further introduced on top of the user behavior data, so that the stability of the interaction circle formed between the unique identification data and the key interaction data assists in predicting the loss risk, further improving the accuracy and coverage of loss prediction.
Referring to fig. 2, a second embodiment of the present invention provides a user data processing system that applies the user data processing method of the above embodiment; details already described are not repeated. As used below, the terms "module," "unit," "sub-unit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the modules described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The system comprises:
The processing module 10 is configured to obtain unique identification data of a user, where the attribute type of the user is either a non-lost user or a lost user; extract a plurality of user behavior data from a report library based on the unique identification data, where each user behavior data includes sub-behavior data for a plurality of consecutive months; and perform data cleaning on the user behavior data to form first standby behavior data, where the first standby behavior data includes first standby sub-data for a plurality of consecutive months;
the processing module 10 includes:
the first unit is used for acquiring the maximum number of the sub-behavior data in the user behavior data, and judging whether missing months exist in the user behavior data or not based on the maximum number;
the second unit is used for acquiring the missing number of the missing months if the missing months exist in the user behavior data, and comparing the missing number with a missing threshold value;
a third unit, configured to obtain the sub-behavior data corresponding to two months adjacent to the missing month if the missing number is smaller than the missing threshold, so as to generate filling behavior data, and fill the filling behavior data into the missing month;
A fourth unit, configured to fill preset behavior data into the missing months if the missing number is greater than the missing threshold;
A fifth unit, configured to construct a threshold range based on all the sub-behavior data in the user behavior data, and replace abnormal data in the user behavior data based on the threshold range;
The screening module 20 is configured to perform data transformation on the first standby behavior data to obtain second standby behavior data, where the second standby behavior data includes second standby sub-data for a plurality of consecutive months, and to determine, based on the second standby sub-data, the relevance between each second standby behavior data and the user, so as to select a plurality of final behavior data from the plurality of second standby behavior data, where the final behavior data includes final sub-data for a plurality of consecutive months;
The screening module 20 includes:
A sixth unit, configured to obtain a maximum value and a minimum value of the first standby sub-data in the first standby behavior data;
A seventh unit, configured to convert each first standby sub-data into second standby sub-data based on the maximum value and the minimum value;
An eighth unit configured to combine the second standby sub-data for several consecutive months into the second standby behavior data;
A ninth unit, configured to generate a behavior change feature based on the adjacent second standby sub-data, generate a change speed feature based on the adjacent behavior change feature, and generate a change trend feature based on the adjacent change speed feature;
A tenth unit, configured to fit the plurality of change trend features to a trend line, obtain the slope of the trend line, and check the sign of the slope against the attribute type;
An eleventh unit, configured to determine that the second standby behavior data corresponding to the slope is final behavior data if the slope is positive and the attribute type is a non-lost user;
A twelfth unit, configured to determine that the second standby behavior data corresponding to the slope is final behavior data if the slope is a negative value and the attribute type is a lost user;
A first analysis module 30, configured to obtain a plurality of key interaction data corresponding to the unique identification data, obtain a single interaction index between the unique identification data and the key interaction data, and generate a calibrated interaction index corresponding to the unique identification data based on the plurality of single interaction indexes;
the first analysis module 30 includes:
A thirteenth unit, configured to obtain a plurality of interaction identification data associated with the unique identification data, and obtain the cumulative number of interactions between the unique identification data and each interaction identification data;
A fourteenth unit, configured to select a plurality of key interaction data from the plurality of interaction identification data based on the cumulative numbers of interactions;
A fifteenth unit, configured to obtain a daily interaction value, a three-day interaction value, a week interaction value, a ten-day interaction value and a month interaction value between the unique identification data and the key interaction data in a preset time period;
A sixteenth unit, configured to generate the single interaction index based on the daily interaction value, the three-day interaction value, the week interaction value, the ten-day interaction value and the month interaction value;
A second analysis module 40, configured to obtain a first behavior index and a second behavior index for each final behavior data from its final sub-data, generate a loss index corresponding to the unique identification data based on the calibrated interaction index, the first behavior indexes and the second behavior indexes, and determine a loss risk level of the non-lost user based on the loss index;
the second analysis module 40 includes:
A seventeenth unit, configured to set a plurality of preset risk levels and the index range corresponding to each preset risk level;
and an eighteenth unit, configured to match the loss index with a plurality of index ranges, so as to select a loss risk level corresponding to the non-lost user from a plurality of preset risk levels.
The invention also provides a computer, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the user data processing method in the technical scheme when executing the computer program.
The invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements a user data processing method as described in the above-mentioned technical solution.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.