Disclosure of Invention
In view of the problems in the related art, the invention provides a method by which an intelligent counter predicts user behavior and assists business handling, so as to overcome these technical problems.
To this end, the invention adopts the following technical solution:
A method for predicting user behavior and assisting business handling by an intelligent counter, comprising the following steps:
S1, acquiring user basic information and behavior buried point (event tracking) information, and preprocessing them;
S2, constructing a user behavior prediction model based on a naive Bayes classifier from the preprocessed user basic information and behavior buried point information, and training the model;
S3, classifying users with the user behavior prediction model in combination with a nearest-neighbor collaborative filtering algorithm, and predicting the service types of the classified users;
S4, screening the predicted service types, and pushing the screened service types to the client manager's client application.
Further, acquiring and preprocessing the user basic information and the behavior buried point information comprises the following steps:
S11, setting buried points on user behavior data according to the function keys and operation flow steps of the mobile banking application, and acquiring the behavior buried point information;
S12, extracting user basic information by means of the face recognition system in the bank's customer hall;
S13, collecting, via the big data platform, the user basic information and the corresponding in-bank behavior buried point information, and processing them.
Further, the behavior buried point information comprises the service type, the number of accesses and the access time;
The user basic information includes gender, age, native place, current residence, education, occupation, nature of employer, annual income, and whether the user is the legal representative of an enterprise.
Further, collecting, via the big data platform, the user basic information and the corresponding in-bank behavior buried point information and processing them comprises the following steps:
S131, preprocessing the user's behavior buried point information on the big data platform;
S132, encoding the discrete variables in the user basic information, and processing the continuous variables in the user basic information with a probability density function.
Further, the probability density function is expressed as:
$$p(x_i \mid c) \sim \mathcal{N}(\mu_{c,i}, \sigma_{c,i}^2), \qquad p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\!\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right)$$
where $p(x_i \mid c)$ is the probability density of the i-th attribute for samples of class c, $x_i$ is the value of the i-th attribute of the sample, c is the sample class, $\mathcal{N}$ denotes the normal (Gaussian) distribution, $\mu_{c,i}$ is the mean of the i-th attribute over the class-c samples, and $\sigma_{c,i}^2$ is the variance of the i-th attribute over the class-c samples.
Further, classifying the users with the user behavior prediction model in combination with the nearest-neighbor collaborative filtering algorithm, and predicting the service types of the classified users, comprises the following steps:
S31, searching the in-bank records for behavior buried point information corresponding to the user's basic information, and dividing users into new users and old (existing) users;
S32, predicting the service type of a new user from the user's basic information with the nearest-neighbor collaborative filtering algorithm;
S33, predicting the service type of an old user with the user behavior prediction model, using the user's basic information and the corresponding in-bank behavior buried point information.
Further, predicting the service type of a new user from the user's basic information with the nearest-neighbor collaborative filtering algorithm comprises the following steps:
S321, constructing word-embedding feature vectors from the preprocessed user basic information, and computing the Pearson similarity between users on these feature vectors;
S322, determining the most similar users from the Pearson similarity results, ranking the service types handled by those users in descending order of frequency, screening them, and using the screened service types as the predicted service types of the new user.
Further, the Pearson similarity is:
$$S_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\,\sigma(Y)}$$
where $S_{XY}$ is the Pearson similarity, cov(X, Y) is the covariance of X and Y, σ(X) is the standard deviation of X, and σ(Y) is the standard deviation of Y.
Further, the user behavior prediction model is expressed as:
$$P(C_j \mid X) = \frac{P(C_j)\,P(X \mid C_j)}{P(X)} = \frac{P(C_j)}{P(X)} \prod_{i=1}^{d} P(x_i \mid C_j)$$
where $P(C_j)$ is the prior probability of class $C_j$, $P(X \mid C_j)$ is the conditional probability of observing the feature vector X given class $C_j$, d is the number of attributes, $x_i$ is the value of X on the i-th attribute, and P(X) is the total probability of X.
Further, screening the predicted service types and pushing the screened service types to the client manager's client application comprises the following steps:
S41, selecting a preset number of predicted service types for the user in descending order of predicted probability;
S42, pushing the selected service types to the client manager's client application.
The beneficial effects of the invention are as follows:
Through the big data platform, the invention combines multi-dimensional client data such as personal information, historical transaction records and behavior habits to classify clients automatically and at a fine granularity, which improves classification accuracy and shortens the client manager's preparation time. In the bank's customer hall, clients are classified according to their basic information, and the business types they are likely to handle are predicted from their personal information, historical transaction records and behavior habits, which reduces the communication cost between the client manager and the client and improves business handling efficiency. Because a business demand prediction report for the client is generated and sent to the client manager before the client manager contacts the client, the client manager can prepare in advance, respond better to the client's business needs, and improve service quality and client satisfaction.
Detailed Description
To further illustrate the embodiments, the invention provides the accompanying drawings, which form part of this disclosure. They mainly illustrate the embodiments and, together with the description, explain their principles; with reference to them, a person skilled in the art will recognize other possible implementations and advantages of the invention. Elements in the drawings are not drawn to scale, and like reference numerals generally denote like elements.
According to the embodiment of the invention, a method for predicting user behavior and assisting business handling by an intelligent counter is provided.
The invention is further described below with reference to the drawings and the detailed description. As shown in FIG. 1 and FIG. 2, according to one embodiment of the invention, a method for predicting user behavior for intelligent-counter-assisted business handling is provided, comprising the following steps:
S1, acquiring user basic information and behavior buried point information, and preprocessing them.
In one embodiment, acquiring and preprocessing the user basic information and the behavior buried point information comprises the following steps:
S11, setting buried points on user behavior data according to the function keys and operation flow steps of the mobile banking application, and acquiring the behavior buried point information.
In one embodiment, the behavior buried point information comprises the business type, the number of accesses and the access time, and the user basic information comprises gender, age, native place, current residence, education, occupation, nature of employer, annual income, and whether the user is the legal representative of an enterprise.
S12, extracting user basic information by means of the face recognition system in the bank's customer hall.
S13, collecting, via the big data platform, the user basic information and the corresponding in-bank behavior buried point information, and processing them.
In one embodiment, collecting, via the big data platform, the user basic information and the corresponding in-bank behavior buried point information and processing them comprises the following steps:
S131, preprocessing the user's behavior buried point information on the big data platform;
S132, encoding the discrete variables in the user basic information, and processing the continuous variables in the user basic information with a probability density function.
In one embodiment, the probability density function is expressed as:
$$p(x_i \mid c) \sim \mathcal{N}(\mu_{c,i}, \sigma_{c,i}^2), \qquad p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\!\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right)$$
where $p(x_i \mid c)$ is the probability density of the i-th attribute for samples of class c, $x_i$ is the value of the i-th attribute of the sample, c is the sample class, $\mathcal{N}$ denotes the normal (Gaussian) distribution, $\mu_{c,i}$ is the mean of the i-th attribute over the class-c samples, and $\sigma_{c,i}^2$ is the variance of the i-th attribute over the class-c samples.
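As a minimal sketch of how the class-conditional density above might be estimated for one continuous attribute (for example age or annual income), the Python code below computes the per-class mean and variance from training samples and evaluates the normal density. The function names, the sample values and the small variance floor `eps` are illustrative assumptions, not part of the described method.

```python
import math
from collections import defaultdict

def fit_gaussian_params(values, labels):
    """Return {class: (mean, variance)} for one continuous attribute."""
    groups = defaultdict(list)
    for x, c in zip(values, labels):
        groups[c].append(float(x))
    params = {}
    for c, xs in groups.items():
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)  # population variance
        params[c] = (mu, var)
    return params

def gaussian_pdf(x, mu, var, eps=1e-9):
    """Normal density p(x | c) with mean mu and variance var (eps guards against var == 0)."""
    var = var + eps
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical ages of four users and the business category each one handled.
params = fit_gaussian_params([25, 35, 45, 30], ["loan", "loan", "deposit", "deposit"])
print(params)                                 # {'loan': (30.0, 25.0), 'deposit': (37.5, 56.25)}
print(gaussian_pdf(30, *params["loan"]))      # density of age 30 under the "loan" class
```

In a Gaussian naive Bayes model, this density takes the place of the counted probability $P(x_i \mid C_j)$ for continuous attributes.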
Specifically, buried points for user behavior data are set according to the function keys and operation flow steps of the mobile banking application, and basic client information such as facial features, gender, age, dress and mental state is extracted by the face recognition system in the customer hall. The big data platform collects the in-bank personal user information and the mobile banking behavior buried point information. The personal user information includes gender, age, native place, current residence, education, occupation, nature of employer, annual income, whether the user is the legal representative of an enterprise, and the like; the operation buried point data are recorded as operation sequences of the form [operation 1, operation 2, operation 3, operation 4, ...]. The client face data are preprocessed, and the user information data are encoded on the big data platform: discrete variables are mapped to integer codes by label encoding (in the example, the value "other" is mapped to 2), and continuous variables are processed with the probability density function described above.
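The encoding step can be sketched as follows, assuming hypothetical attribute values and operation names. `label_encode` assigns integer codes in order of first appearance (one possible convention; the exact mapping used in the embodiment is not specified), and `count_operations` turns a user's buried point event list into a fixed-order count vector that can be appended to the encoded basic information.

```python
from collections import Counter

def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values], mapping

def count_operations(events, operations):
    """Count how often each known operation occurs in a user's buried point events."""
    counts = Counter(events)
    return [counts[op] for op in operations]  # missing operations count as 0

codes, mapping = label_encode(["male", "female", "other", "female"])
print(codes, mapping)  # [0, 1, 2, 1] {'male': 0, 'female': 1, 'other': 2}
print(count_operations(["op2", "op1", "op2"], ["op1", "op2", "op3", "op4"]))  # [1, 2, 0, 0]
```

In practice the mapping built on the training data would be stored and reused when encoding new users, so that the same category always receives the same code.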
S2, constructing a user behavior prediction model based on a naive Bayes classifier from the preprocessed user basic information and behavior buried point information, and training the model.
S3, classifying the users with the user behavior prediction model in combination with a nearest-neighbor collaborative filtering algorithm, and predicting the service types of the classified users.
In one embodiment, classifying the users with the user behavior prediction model in combination with the nearest-neighbor collaborative filtering algorithm, and predicting the service types of the classified users, comprises the following steps:
S31, searching the in-bank records for behavior buried point information corresponding to the user's basic information, and dividing users into new users and old (existing) users.
S32, predicting the service type of a new user from the user's basic information with the nearest-neighbor collaborative filtering algorithm.
In one embodiment, predicting the service type of a new user from the user's basic information with the nearest-neighbor collaborative filtering algorithm comprises the following steps:
S321, constructing word-embedding feature vectors from the preprocessed user basic information, and computing the Pearson similarity between users on these feature vectors;
S322, determining the most similar users from the Pearson similarity results, ranking the service types handled by those users in descending order of frequency, screening them, and using the screened service types as the predicted service types of the new user.
S33, predicting the service type of an old user with the user behavior prediction model, using the user's basic information and the corresponding in-bank behavior buried point information.
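A minimal sketch of the new/old user split in S31, assuming the in-bank behavior records are available as a dictionary keyed by a hypothetical user identifier: a user with no records is treated as new and routed to S32 (nearest-neighbor collaborative filtering), otherwise as old and routed to S33 (naive Bayes model).

```python
def split_users(user_ids, behavior_records):
    """Partition users into (new, old) by whether any in-bank behavior records exist."""
    new_users, old_users = [], []
    for uid in user_ids:
        if behavior_records.get(uid):  # missing key or empty record list -> new user
            old_users.append(uid)
        else:
            new_users.append(uid)
    return new_users, old_users

# Hypothetical records: (business type, number of accesses) per user.
records = {"u1": [("transfer", 3)], "u3": []}
print(split_users(["u1", "u2", "u3"], records))  # (['u2', 'u3'], ['u1'])
```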
In one embodiment, the Pearson similarity is:
$$S_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\,\sigma(Y)}$$
where $S_{XY}$ is the Pearson similarity, cov(X, Y) is the covariance of X and Y, σ(X) is the standard deviation of X, and σ(Y) is the standard deviation of Y.
It should be noted that, for the user behavior data, feature vectors are formed with the word2vec algorithm, mapping [gender, age, native place, current residence, education, occupation, nature of employer, annual income, whether enterprise legal representative, operation 1, operation 2, operation 3, operation 4, ...] -> [category c].
In addition, if the client is a new client, the client is classified with the nearest-neighbor collaborative filtering algorithm by combining information on surrounding clients with in-bank client information; the algorithm uses Pearson similarity to find the most similar users, and the business types most frequently handled by them are ranked to produce the prediction for the classified user.
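The Pearson similarity between two vectors can be computed as:
```python
import numpy as np

def pearson(v1, v2):
    # Center both vectors on their own means
    v1_c = np.asarray(v1, dtype=float) - np.mean(v1)
    v2_c = np.asarray(v2, dtype=float) - np.mean(v2)
    # Dot product of the centered vectors divided by the product of their norms
    return np.dot(v1_c, v2_c) / (np.linalg.norm(v1_c) * np.linalg.norm(v2_c))
```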
1) Compute the Pearson similarity between the new client's feature vector New_user_V and the feature vector of each labeled historical user in the bank;
2) Sort the Pearson similarities in descending order;
3) Take the user categories corresponding to the top K similarities (K is generally 10);
4) Take the most frequent category among these K categories as the business category of the new user, and recommend the corresponding business types.
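Steps 1) to 4) can be sketched as follows. This is an illustrative implementation under stated assumptions: `history` is a hypothetical list of (feature vector, business category) pairs for labeled in-bank users, similarities are sorted in descending order so that the first K entries are the most similar users, and this `pearson` variant returns 0 when either vector is constant (zero standard deviation) instead of dividing by zero.

```python
from collections import Counter
import numpy as np

def pearson(v1, v2):
    """Pearson similarity; returns 0.0 if either vector is constant."""
    a = np.asarray(v1, dtype=float) - np.mean(v1)
    b = np.asarray(v2, dtype=float) - np.mean(v2)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def predict_new_user(new_user_v, history, k=10):
    """history: list of (feature_vector, business_category) pairs for labeled in-bank users."""
    # 1) similarity between the new user and every labeled historical user
    scored = [(pearson(new_user_v, vec), cat) for vec, cat in history]
    # 2) sort from most to least similar
    scored.sort(key=lambda t: t[0], reverse=True)
    # 3) categories of the K most similar users
    top_k = [cat for _, cat in scored[:k]]
    # 4) majority vote; Counter.most_common breaks ties by first occurrence
    votes = Counter(top_k)
    return votes.most_common(1)[0][0], votes

history = [([1, 2, 3, 4], "loan"), ([2, 4, 6, 8], "loan"),
           ([4, 3, 2, 1], "deposit"), ([1, 3, 2, 4], "fund")]
category, votes = predict_new_user([1, 2, 3, 5], history, k=3)
print(category, votes)  # loan Counter({'loan': 2, 'fund': 1})
```

Because ties go to the category encountered first, and the list is already sorted by similarity, a tied vote favors the category of the more similar neighbor.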
In addition, if the client is an existing (old) client of the bank, the user's basic information, historical operation behavior buried point data and the corresponding historical records of handled business types are combined into training samples to train the naive Bayes classifier prediction model. When the existing client later appears in the service hall, the user's basic information and recent operation behavior buried point data are acquired, the business types the user is likely to handle are predicted, the predicted business types are sorted in descending order of probability, and the top 10 are pushed to the client manager's client application.
In one embodiment, the user behavior prediction model is expressed as:
$$P(C_j \mid X) = \frac{P(C_j)\,P(X \mid C_j)}{P(X)} = \frac{P(C_j)}{P(X)} \prod_{i=1}^{d} P(x_i \mid C_j)$$
where $P(C_j)$ is the prior probability of class $C_j$, $P(X \mid C_j)$ is the conditional probability of observing the feature vector X given class $C_j$, d is the number of attributes, $x_i$ is the value of X on the i-th attribute, and P(X) is the total probability of X.
Since P(X) is the same for all classes, its calculation can be omitted when comparing the probabilities of different classes.
It should be noted that the naive Bayes classifier used for the user behavior prediction model computes the probability that a given sample to be classified belongs to each class, and assigns the sample to the class with the highest probability.
In addition, $P(C_j)$ is the prior probability of class $C_j$, and $P(X \mid C_j)$ is the conditional probability of observing the feature X under class $C_j$, also called the likelihood. The naive Bayes classifier is a classification model based on Bayes' theorem and the assumption that features are conditionally independent: given the class, the features are assumed to be mutually independent, i.e. the probability of one feature is not affected by the others, so
$$P(X \mid C_j) = \prod_{i=1}^{d} P(x_i \mid C_j)$$
where d is the number of attributes and $x_i$ is the value of X on the i-th attribute. P(X) is the total probability, also called the evidence, and acts as a normalizing constant over the sample space of X; since it is the same for all classes, its calculation can be omitted when comparing the probabilities of different classes.
When computing $\prod_{i=1}^{d} P(x_i \mid C_j)$, Laplace smoothing is applied. Without it, if any single factor $P(x_i \mid C_j)$ is 0, the whole product becomes 0 and that class can never be selected. To avoid this, when the probabilities for each class are calculated, the numerator count is initialized to 1 and the denominator to 2 (for a two-valued attribute):
$$P(x_i \mid C_j) = \frac{N_{C_j, x_i} + 1}{N_{C_j} + 2}$$
where $N_{C_j}$ is the number of training samples of class $C_j$ and $N_{C_j, x_i}$ is the number of those samples whose i-th attribute equals $x_i$. This method is Laplace smoothing, also called add-one smoothing. In addition, multiplying many small probabilities can cause numerical underflow, so the natural logarithm of the product is taken:
$$\ln\Big(P(C_j) \prod_{i=1}^{d} P(x_i \mid C_j)\Big) = \ln P(C_j) + \sum_{i=1}^{d} \ln P(x_i \mid C_j)$$
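The smoothing and log-domain scoring described above can be sketched for discrete (label-encoded) attributes as follows. This is an illustrative implementation, not the embodiment's code: the class and method names are hypothetical, add-one smoothing adds the number of distinct observed values of each attribute to the denominator (which equals the "+2" above when an attribute has two values), and continuous attributes, which the method handles with the normal density, are not covered. Posterior probabilities are recovered from the log scores by subtracting the maximum score before exponentiating, so P(X) is never computed directly and nothing underflows.

```python
import math
from collections import Counter

class CategoricalNaiveBayes:
    """Naive Bayes over discrete attributes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        self.d = len(X[0])
        # feature_counts[c][i][v] = number of class-c samples whose attribute i equals v
        self.feature_counts = {c: [Counter() for _ in range(self.d)] for c in self.classes}
        self.values = [set() for _ in range(self.d)]
        for row, c in zip(X, y):
            for i, v in enumerate(row):
                self.feature_counts[c][i][v] += 1
                self.values[i].add(v)
        return self

    def log_scores(self, x):
        """ln P(C_j) + sum_i ln P(x_i | C_j) for each class; P(X) is omitted."""
        scores = {}
        for c in self.classes:
            n_c = self.class_counts[c]
            s = math.log(n_c / self.n)
            for i, v in enumerate(x):
                # +1 in the numerator, +(number of distinct values) in the denominator
                s += math.log((self.feature_counts[c][i][v] + 1) / (n_c + len(self.values[i])))
            scores[c] = s
        return scores

    def predict_proba(self, x):
        """Normalize the log scores into posterior probabilities that sum to 1."""
        scores = self.log_scores(x)
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        total = sum(exp.values())
        return {c: e / total for c, e in exp.items()}

# Hypothetical training data: two binary attributes per user and the handled business type.
model = CategoricalNaiveBayes().fit([[0, 1], [0, 1], [1, 0], [1, 1]],
                                    ["loan", "loan", "deposit", "deposit"])
print(model.predict_proba([0, 1]))  # "loan" gets 9/11 (about 0.818)
```

For old users, each row would combine the encoded basic information with the encoded buried point features, and the label would be the business type actually handled.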
S4, screening the predicted service types, and pushing the screened service types to the client manager's client application.
In one embodiment, screening the predicted service types and pushing the screened service types to the client manager's client application comprises the following steps:
S41, selecting a preset number of predicted service types for the user in descending order of predicted probability;
S42, pushing the selected service types to the client manager's client application.
Specifically, when a user enters the service hall, a camera is called to identify the user and the user information is obtained; the user information and behavior operation data are received and fed into the naive Bayes model to predict user behavior; the predicted behaviors are sorted by probability from high to low, the top 10 are pushed to the intelligent client application, and the client manager can then see the predicted client behaviors, communicate with the client, and assist the client in handling business.
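Step S41 and the push in S42 might be sketched as follows, assuming the model output is a dictionary mapping service type to probability. `heapq.nlargest` returns the N highest-probability entries in descending order; the function `build_push_message` and its payload fields are hypothetical, since the message format of the client manager's application is not specified.

```python
import heapq

def select_top_services(probabilities, n=10):
    """Return up to n (service_type, probability) pairs, highest probability first."""
    return heapq.nlargest(n, probabilities.items(), key=lambda kv: kv[1])

def build_push_message(user_id, probabilities, n=10):
    """Hypothetical payload for the client manager's application (field names are illustrative)."""
    return {
        "user_id": user_id,
        "predicted_services": [
            {"service_type": s, "probability": round(p, 4)}
            for s, p in select_top_services(probabilities, n)
        ],
    }

probs = {"transfer": 0.1, "loan": 0.5, "deposit": 0.3, "fund": 0.1}
print(select_top_services(probs, 2))  # [('loan', 0.5), ('deposit', 0.3)]
```

With n = 10 this matches the "top 10" used in the embodiment; fewer entries are returned when fewer service types are predicted.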
In addition, "Bayesian algorithm" is a general name for a class of algorithms based on Bayes' theorem, which are widely used in statistics, machine learning, data mining and other fields. Bayes' theorem relates the conditional probabilities of random events A and B:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
where P(A | B) is the probability of event A given that event B has occurred, P(B | A) is the probability of event B given that event A has occurred, P(A) is the probability of event A, and P(B) is the probability of event B.
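A short worked example of the theorem, using purely hypothetical figures: let A be "the user applies for a loan" and B be "the user opened the loan page in mobile banking". The evidence P(B) is obtained with the law of total probability, and the posterior then follows directly from the formula.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_a = 0.2              # hypothetical prior P(A)
p_b_given_a = 0.6      # hypothetical P(B|A)
p_b_given_not_a = 0.1  # hypothetical P(B|not A)

# Evidence P(B) by the law of total probability: 0.6*0.2 + 0.1*0.8 = 0.2
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(f"P(A|B) = {posterior:.2f}")  # P(A|B) = 0.60
```

Observing B raises the probability of A from the prior 0.2 to a posterior of 0.6.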
The core idea of Bayesian algorithms is to use known probabilities to infer unknown ones. In practice, they are commonly used for classification, especially when the amount of data is limited, where predictions are made from prior and posterior probabilities.
In addition, the normal distribution, also known as the Gaussian distribution, is usually written X ~ N(μ, σ²), where μ is the mathematical expectation (mean) of the distribution and σ² is its variance (σ is the standard deviation). The normal distribution with μ = 0 and σ = 1 is called the standard normal distribution.
In probability theory and statistics, covariance measures the joint variability of two variables; variance is a special case of covariance in which the two variables are identical.
Whereas variance describes the variability of a single variable, covariance describes how two variables vary together. If the two variables tend to move in the same direction, that is, when one is greater than its expected value the other also tends to be greater than its own expected value, the covariance is positive. If they tend to move in opposite directions, that is, when one is greater than its expected value the other tends to be less than its own, the covariance is negative.
The covariance Cov(X, Y) between two real random variables X and Y with expected values E[X] and E[Y] is defined as:
$$\begin{aligned} \operatorname{Cov}(X, Y) &= E\big[(X - E[X])(Y - E[Y])\big] \\ &= E[XY] - 2E[X]E[Y] + E[X]E[Y] \\ &= E[XY] - E[X]E[Y] \end{aligned}$$
Pearson similarity, also known as the Pearson correlation coefficient, measures the degree of linear correlation between two variables. Its value lies in [-1, 1]: 1 indicates perfect positive correlation, 0 indicates no linear correlation, and -1 indicates perfect negative correlation.
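The covariance identity and the Pearson formula can be checked numerically. The sketch below, using made-up sample values, computes the population covariance through Cov(X, Y) = E[XY] - E[X]E[Y], divides it by the population standard deviations, and compares both results with NumPy's `np.cov(..., bias=True)` and `np.corrcoef`.

```python
import numpy as np

def covariance(x, y):
    """Population covariance via Cov(X, Y) = E[XY] - E[X]E[Y]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean(x * y) - np.mean(x) * np.mean(y)

def pearson_from_cov(x, y):
    """S_XY = Cov(X, Y) / (sigma(X) * sigma(Y)) with population standard deviations."""
    return covariance(x, y) / (np.std(x) * np.std(y))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]
# Each pair should agree up to floating-point rounding.
print(covariance(x, y), np.cov(x, y, bias=True)[0, 1])
print(pearson_from_cov(x, y), np.corrcoef(x, y)[0, 1])
```

The E[XY] - E[X]E[Y] form is convenient for derivations, but for large values with small spread the centered form E[(X - E[X])(Y - E[Y])] is numerically safer.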
To facilitate understanding of the above technical solution, it is further described below from the perspective of architecture and principle, as shown in FIG. 3:
The client sends a request to the access scheduling layer, in which the protocol access module communicates with the distribution scheduling module. The access scheduling layer forwards the request to the recommendation algorithm layer, where the recall module communicates in sequence with the filtering module, the fine-ranking module and the mixed-ranking module; the recall module also communicates with the recall service module in the public component, and the fine-ranking module communicates with the ranking service module in the public component.
The client reports logs to the reporting center, which delivers them through a message queue to the real-time processing module. The real-time processing module comprises a log processing module, a data statistics module, a click-rate updating module and a user-profile updating module. It sends persisted data to the data warehouse in the storage unit, log statistics to the index object data module in the storage unit, and feature updates to the cache database in the storage unit. The data warehouse sends offline data indexes to the index object data module, which communicates with the report system in the storage unit and sends user features to the ranking service module in the public component. The memory index module in the storage unit sends an index table to the recall service module in the public component. The public component communicates with the recommendation algorithm layer, the recommendation algorithm layer returns the scheduling result to the access scheduling layer, and the access scheduling layer returns the result to the client.
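The request path through the recommendation algorithm layer (recall, filtering, fine ranking, mixed ranking) can be illustrated with a toy in-process pipeline. Everything here is a hypothetical stand-in: the dictionary `index_table` plays the role of the index table consumed by the recall service, `scorer` stands in for the ranking service, and the mixed-ranking rule (keep the first occurrence of each type, then truncate) is an assumption, because the document does not specify it.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    service_type: str
    score: float = 0.0

def recall(user_features, index_table):
    """Recall: look up candidate service types for the user's segment in an index table."""
    return [Candidate(s) for s in index_table.get(user_features["segment"], [])]

def filter_candidates(cands, blocked):
    """Filtering: drop service types that must not be recommended."""
    return [c for c in cands if c.service_type not in blocked]

def fine_rank(cands, scorer):
    """Fine ranking: score every candidate and sort from high to low."""
    for c in cands:
        c.score = scorer(c.service_type)
    return sorted(cands, key=lambda c: c.score, reverse=True)

def mixed_rank(cands, n):
    """Mixed ranking (assumed rule): keep the first occurrence of each type, then truncate."""
    seen, out = set(), []
    for c in cands:
        if c.service_type not in seen:
            seen.add(c.service_type)
            out.append(c)
    return out[:n]

def recommend(user_features, index_table, blocked, scorer, n=10):
    cands = recall(user_features, index_table)
    cands = filter_candidates(cands, blocked)
    cands = fine_rank(cands, scorer)
    return [c.service_type for c in mixed_rank(cands, n)]

index_table = {"young": ["loan", "fund", "card", "loan"]}
scores = {"loan": 0.7, "fund": 0.4}
print(recommend({"segment": "young"}, index_table, {"card"}, scores.get))  # ['loan', 'fund']
```

In the described architecture these stages sit behind the access scheduling layer and call out to the public component's recall and ranking services rather than running in one process.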
In summary, with the above technical solution, the big data platform combines multi-dimensional client data such as personal information, historical transaction records and behavior habits to classify clients automatically and at a fine granularity, which improves classification accuracy and shortens the client manager's preparation time. In the bank's customer hall, clients are classified according to their basic information, and the business types they are likely to handle are predicted from their personal information, historical transaction records and behavior habits, which reduces the communication cost between the client manager and the client and improves business handling efficiency. A business demand prediction report for the client is generated in advance and sent to the client manager before the client manager contacts the client, so that the client manager can prepare in advance, better meet the client's business needs, and improve service quality and client satisfaction.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within its scope of protection.