Disclosure of Invention
The invention aims to provide a double-cluster hot recommendation method and device based on a user behavior sequence, which are used for solving the problems in the technical background.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the first aspect of the application provides a double-cluster hot recommendation method based on a user behavior sequence, which comprises the following steps:
S1, constructing a network Graph of articles based on user behavior sequence data, wherein each user behavior sequence comprises a plurality of user behaviors which sequentially occur to different articles according to time sequence, the network Graph of the articles is composed of article nodes clicked by the user, and one article node represents one article corresponding to the user behavior;
S2, vector representation of each article node is obtained by using Graph Embedding method;
S3, clustering vector data of the object nodes through a preset clustering algorithm to generate K class clusters, wherein K is a positive integer;
S4, dividing the object with the behavior generated by each user into M sets according to the class clusters in the step S3, and adding and averaging vector data of object nodes in each set to obtain vector representations of M class clusters of each user, wherein the range of M is [1, K ];
s5, carrying out attenuation accumulation on the days of the behavior of the user under each class cluster from the current date, and then carrying out proportion calculation to obtain preference scores of the user on each class cluster corresponding to the user;
S6, carrying out weighted average on vectors of the user under each class cluster according to preference scores of the user on each class cluster corresponding to the user, and obtaining unique interest vector representation of the user;
s7, clustering the unique interest vectors of the users generated in the step S6 again by using a preset clustering algorithm, and recording the class clusters of each user;
S8, counting user behaviors under various clusters, calculating the click rate of each article, and arranging the articles in a descending order according to the click rate to form a popular list of the various clusters;
S9, recommending a hot list under the class cluster for each user.
Preferably, the step S1 includes forming a directed and unauthorized network Graph of clicking actions of the user according to the time of the clicking actions, wherein the network Graph comprises a plurality of object nodes.
Preferably, in the step S2, the method Graph Embedding includes one or more of RandomWalk algorithm and Node2Vector algorithm, and the method Graph Embedding is used to obtain Vector representations of the nodes of each article, and specifically includes the following steps:
s21, taking any one article node in the network Graph as an initial wander point;
S22, performing random walk near the initial walk point, and performing single walk for L times to generate a sequence with the length of L, wherein L represents the number of steps of the single random walk, and L is a positive integer;
S23, repeating the steps S21 and S22 for N times for each article node in the network Graph, and finally obtaining a sequence with the length of L of N times and V nodes, wherein N represents the number of random walks at each article node, V represents the number of article nodes included in the network Graph, and N, V are positive integers;
s24, the generated sequence data is calculated by using a Word2Vec model to obtain vector representation of each article node.
More preferably, the random walk process in step S22 specifically includes:
starting from any initial wander point of the network Graph, each wander step randomly selects one from a plurality of article nodes connected with the current article node, and continuously repeats the process until the set wander length is reached, and wander is stopped, so that new user behavior sequence data is obtained.
Preferably, in the step S3, the preset clustering algorithm is a KMeans clustering algorithm.
Preferably, the step S5 specifically includes the following steps:
S51, setting parameters of a decay_rate, a database_i and a user_cluster_score, wherein,
The decay rate is represented by the decay rate, and the value range is (0, 1);
the day_i represents the number of days that the ith action date of the user under a certain class of clusters is away from the current date;
user_cluster_score represents the preference score of the user under a certain class of clusters;
s52, performing attenuation accumulation on the behaviors of the user under each corresponding class cluster by adopting the following formula:
user_cluster_score=sum(decay_rate^days_i);
and S53, respectively performing proportion calculation on each accumulated sum obtained by the calculation in the step S62 to obtain the preference score of the user under each corresponding cluster, wherein the proportion calculation is the proportion of the preference score of the user under a certain cluster to the sum of the preference scores of all clusters corresponding to the user.
The second aspect of the present application provides a dual-cluster popular recommendation device based on a user behavior sequence, comprising:
The network Graph construction module is used for constructing a network Graph of the article based on user behavior sequence data, wherein each user behavior sequence comprises a plurality of user behaviors which sequentially occur to different articles according to time sequence, the network Graph of the article is composed of article nodes clicked by the user, and one article node represents one article corresponding to the user behavior;
the article node vector generation module is used for obtaining vector representation of each article node by using a Graph Embedding method;
The first clustering processing module is used for clustering vector data of the object nodes through a preset clustering algorithm to generate K class clusters, wherein K is a positive integer;
The class cluster vector generation module is used for dividing the object of which each user generates behaviors into M sets according to the class clusters generated by the first clustering module, and adding and averaging vector data of object nodes in each set to obtain vector representations of M class clusters of each user, wherein the range of M is [1, K ];
the first calculation module is used for carrying out attenuation accumulation on the days of the current date of the behavior of the user under each class cluster, and then carrying out proportion calculation to obtain preference scores of the user on each class cluster corresponding to the user;
The second calculation module is used for carrying out weighted average on the vectors of the user under each class cluster according to the preference score of the user on each class cluster corresponding to the second calculation module, so as to obtain the unique interest vector representation of the user;
the second clustering processing module is used for clustering the unique interest vectors of the users generated by the second computing module again by using a preset clustering algorithm, and recording the class clusters of each user;
The cluster recommendation list generation module is used for counting the user behaviors under various clusters, calculating the click rate of each article, and arranging the articles in a descending order according to the click rate to form a popular list of various clusters;
and the hot list recommending module is used for recommending hot lists under the class clusters for each user.
Preferably, in the first clustering processing module and the second clustering processing module, the preset clustering algorithm is a KMeans clustering algorithm.
Preferably, the first computing module includes:
The parameter setting unit is used for setting parameters, namely a decay rate, a date_i and a user_cluster_score, wherein the decay rate is represented by the date_i, the value range of the decay rate is (0, 1), the date_i represents the number of days, from the current date, of the ith action date of the user under a certain cluster, and the user_cluster_score represents the preference score of the user under the certain cluster;
The attenuation accumulation unit is used for respectively carrying out attenuation accumulation on the behaviors of the user under each corresponding class cluster by adopting the following formula, wherein the user_cluster_score=sum (decay_rate ≡days_i);
The proportion calculation unit is used for respectively carrying out proportion calculation on each accumulated sum obtained by the attenuation accumulation unit to obtain the preference score of the user under each corresponding cluster, wherein the proportion calculation is the proportion of the preference score of the user under a certain cluster to the sum of the preference scores of all clusters corresponding to the user.
In a third aspect, the application discloses an electronic device comprising:
Processor, and
A memory having executable code stored thereon which, when executed by a processor, causes the processor to perform the method according to the first aspect of the application.
A fourth aspect of the application provides a computer readable storage medium having stored thereon executable code which when executed by a processor of an electronic device causes the processor to perform the method of the first aspect of the application.
In the above, the article may comprise text, pictures, audio or video.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
according to the application, the double-clustering technology is applied according to the behavior data of the user, the defects of low precision and large error of the traditional recommendation method are overcome, compared with the clustering result of a single clustering algorithm, the adopted double-clustering algorithm has stronger pertinence to the corresponding recommendation result, the recommendation effect is improved, and better article recommendation can be carried out for the target user.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more obvious, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It is noted that the terms "first," "second," and the like in the description and claims of the present invention and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order, and it is to be understood that the data so used may be interchanged where appropriate. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Examples:
In many scenarios, a sequence of user actions needs to be analyzed and processed. The user behavior sequence is the occurrence process of a series of events such as clicking, accessing, purchasing and the like generated by a user in daily operation and use, can be expressed as a time sequence of an event set, and has the characteristics of fine granularity habit preference and the like of the user.
Fig. 1 is a flow diagram of a dual-cluster popular recommendation method based on a user behavior sequence.
Referring to fig. 1, a dual-cluster popular recommendation method based on a user behavior sequence specifically includes the following steps:
and S1, constructing a network Graph of the article based on the user behavior sequence data.
Each user behavior sequence comprises a plurality of user behaviors which are sequentially generated on different articles by the user according to time sequence, wherein a network Graph of the articles is composed of article nodes clicked by the user, and one article node represents one article corresponding to the user behavior.
In the information flow product, a large number of user click actions are generated. These click actions may constitute a directed, unauthorized network Graph comprising a plurality of item nodes, depending on the time at which the action occurred.
It should be noted that the object that the user has performed the action may be text, a picture, audio or video.
Taking news product App as an example, in the news product App, behavior data of a large number of users, such as click behavior data, can be obtained, and the dual-cluster popular recommendation method based on the user behavior sequence is specifically described below in connection with the application scene.
Referring to fig. 2, for example, when the user a clicks on news a, news b, and news c in succession, two edges from news a to news b and from news b to news c are generated in the network Graph. The clicking actions of other users are similar, and then together form a network Graph containing V item nodes.
And S2, obtaining vector representations of all the article nodes by using a Graph Embedding method.
Classical Graph Embedding techniques mainly include RandomWalk, node2 Vector.
The vector representation of each article node is obtained by using Graph Embedding method, which comprises the following steps:
Step S21, taking any one article node in the network Graph as an initial wander point.
Step S22, performing random walk around the initial walk point, and performing single walk for L times to generate a sequence with a length of L, wherein L represents the number of steps of the single random walk, and L is a positive integer.
The random walk process specifically comprises the steps of starting from any initial walk point of a network Graph, randomly selecting one from a plurality of article nodes connected with the current article node in each walk step, and continuously repeating the process until the set walk length is reached, and stopping the walk, so that new user behavior sequence data are obtained.
Step S23, repeating steps S21 and S22 for N times for each article node in the network Graph, and finally obtaining a sequence with a length of L for n×v nodes, where N represents the number of random walks at each article node, V represents the number of article nodes included in the network Graph, and N, V is a positive integer.
Step S24, using the generated sequence data, a Word2Vec model is applied to calculate the vector representation of each item node.
The word2vector algorithm is an open source algorithm, is a method for generating word vectors based on text sentences, uses vectors with specified dimensions to represent phrase information, and measures the relationship between words by using the vectors. In the present embodiment, the present embodiment is applied to article sequence data.
In this embodiment, assume that the sequence obtained by the walk sampling is:
[[a,b,d,e,g,h,v,f],
[w,r,f,h,v,s,n,d,k],
......]
Vector representations of each news are calculated by word2vector model, for example:
The vector of news a is denoted as a: [0.22,0.45,0.88,0.06,0.01,0.32];
The vector of news c is denoted as c: [0.24,0.47,0.86,0.12,0.03,0.28];
......
Here, depending on the algorithm used, there are different random walk modes, and the randomness is controlled by the parameters of the related algorithm.
And step S3, clustering the obtained news vector data through KMeans clustering algorithm to generate K class clusters, wherein K is a positive integer and is usually 5-20.
Wherein the Kmeans algorithm is an unsupervised clustering algorithm that clusters data sets of N samples into K clusters such that the sum of the variances of the clusters is minimal.
In this embodiment, the obtained news vector data are clustered into K class clusters, so that the nodes with the closer vector distance are finally divided into one class cluster as far as possible, otherwise, the nodes with the closer vector distance are divided into different class clusters, that is, the nodes with the closer vector distance are clustered together, and the nodes with the far distance are divided. This is because similar vectors will often approximate in some way, such as title text, category, author, or some underlying semantics, etc., so different clusters of classes represent different types.
Referring to fig. 3, the vector of news a, the vector of news f, the vector of news w and the vector of news C are similar in distance, representing the same type, and are classified into a cluster C1 after clustering, while the vector of news b and the vector of news h are similar in distance, representing another type, and are classified into a cluster C2 after clustering.
And S4, dividing the object of which each user generates behaviors into M sets according to the class clusters in the step S3, and adding and averaging the object vectors in the sets to obtain vector representations of the M class clusters of each user, wherein the range of M is [1, K ].
Referring to FIG. 4, in the article behaving by user A, the vector of news a and the vector of news C are divided into the same cluster C1 and the vector of news b is divided into the cluster C2, then, the vector of user A under the cluster C1 is represented as the average of the vector of news a and the vector of news C, and the vector of the cluster C1 is calculated to be [0.23,0.46,0.87,0.09,0.02,0.30], and the vector of user A under the cluster C2 is only one news b, so the vector of news b is represented as the vector of the cluster C2, and the assumption is that [0.10,0.22,0.12,0.33,0.40,0.12].
And S5, calculating preference scores of the user for each corresponding class cluster.
The method specifically comprises the following steps:
In step S51, parameters including a decay_rate, a days_i, a user_cluster_score are set, wherein,
The decay rate is represented by the decay rate, and the value range is (0, 1);
the day_i represents the number of days that the ith action date of the user under a certain class of clusters is away from the current date;
user_cluster_score represents the preference score of the user under a certain class of clusters.
Step S52, the following formula is adopted to respectively carry out attenuation accumulation on the behaviors of the user under each corresponding class cluster:
user_cluster_score=sum(decay_rate^days_i);
and step S53, respectively performing proportion calculation on each accumulated sum obtained by the step S52 to obtain the preference score of the user under each corresponding cluster, wherein the proportion calculation is the proportion of the preference score of the user under a certain cluster to the sum of the preference scores of all clusters corresponding to the user.
Referring to fig. 5, the time of the user a's action of clicking news a is 1 day from the current time, the time of the action of clicking news b is 2 days from the current time, and the time of the action of clicking news c is 3 days from the current time.
Setting the decay rate decay_rate to 0.9, the preference score of the user A for the cluster C1 is 0.9≡1+0.9≡3= 1.629, and the preference score of the user A for the cluster C2 is 0.9≡2=0.81 according to the cluster division.
And then carrying out proportion calculation, wherein the proportion calculation is as follows:
cluster-like c1: 1.629/(1.629+0.81) =0.668;
Cluster-like c2:0.81/(1.629+0.81) =0.332.
And S6, carrying out weighted average on the vectors of the user under each class cluster according to the preference score of the user for each class cluster corresponding to the user, and obtaining the unique interest vector representation of the user.
Specifically, in step S4 described above, it has been calculated that the vector of the user a for the cluster C1 is represented by [0.23,0.46,0.87,0.09,0.02,0.30], and the vector of the user a for the cluster C2 is represented by [0.10,0.22,0.12,0.33,0.40,0.12]. In the above step S5, it has been calculated that the preference score of the user a for the cluster C1 is 0.668, and the preference score of the user a for the cluster C2 is 0.332. Thus, the interest vector of the user A can be calculated, and the calculation method comprises the following steps:
[0.23,0.46,0.87,0.09,0.02,0.30]*0.668+[0.10,0.22,0.12,0.33,0.40,0.12]*0.332=[0.19,0.38,0.62,0.17,0.15,0.24]
and S7, clustering the unique interest vectors of the users generated in the step S6 again by using the KMeans clustering algorithm in the step S3, wherein the users can be clustered under 1~X class clusters, each user is divided into one class cluster, and the user groups under each class cluster have similar behavior preference, such as information like entertainment, history and the like.
Step S8, the clicking quantity of the user clicking information under each of the X class clusters is calculated respectively. For example, in the class cluster X1, the number of clicks of the user on the news a is 500, the number of clicks on the news b is 450, and the number of clicks on the news c is 200, and the clicking amounts thereof are arranged in a descending order to form a popular list of the class cluster X1, namely [ a, b, c ].
Step S9, when a certain user accesses the product App, recommending a hot list under the corresponding class cluster for the user.
It should be noted that the above application scenario is only an example of the embodiment of the present invention, and the embodiment of the present invention is not limited to the application scenario described above, but may be applied to any application scenario to which the embodiment of the present invention is applicable.
On the other hand, the application also discloses a double-cluster hot recommendation device based on the user behavior sequence.
Referring to fig. 6, the dual-cluster popular recommendation device based on the user behavior sequence includes a network Graph construction module 100, an article node vector generation module 200, a first cluster processing module 300, a cluster vector generation module 400, a first calculation module 500, a second calculation module 600, a second cluster processing module 700, a cluster recommendation list generation module 800 and a popular list recommendation module 900.
The network Graph construction module 100 is configured to construct a network Graph of an article based on user behavior sequence data, where each user behavior sequence includes a plurality of user behaviors that the user sequentially generates for different articles according to a time sequence, and the network Graph of the article is composed of article nodes clicked by the user, and one article node represents an article corresponding to the user behavior.
The article node vector generation module 200 is configured to obtain a vector representation of each article node using a Graph Embedding method.
The first clustering module 300 is configured to cluster vector data of object nodes by using a preset clustering algorithm, so as to generate K class clusters, where K is a positive integer, and the preset clustering algorithm is preferably KMeans clustering algorithm.
The cluster vector generation module 400 is configured to divide the object whose behavior has been generated by each user into M sets according to the cluster generated by the first cluster processing module 300, and sum and average vector data of object nodes in each set to obtain vector representations of M clusters of each user, where M ranges from [1, k ].
The first calculation module 500 is configured to perform attenuation accumulation on the number of days of the user's behavior under each class cluster from the current date, and perform proportional calculation to obtain a preference score of the user for each class cluster corresponding to the user.
The second calculation module 600 is configured to perform weighted average on the vectors of the user under each class cluster according to the preference score of the user for each class cluster corresponding to the user, so as to obtain a unique interest vector representation of the user.
The second clustering module 700 is configured to cluster the unique interest vectors of the users generated by the second computing module 600 again by using a preset clustering algorithm (for example, using the KMeans clustering algorithm in the first clustering module 300), and record a cluster of each user.
The cluster recommendation list generation module 800 is configured to count user behaviors under various clusters, calculate a click rate of each item, and arrange the click rates in descending order to form a popular list of various clusters.
The hot list recommending module 900 is configured to recommend, to each user, a hot list under a cluster of the class where the user is located.
Wherein the first computing module 500 includes:
The parameter setting unit is used for setting parameters, namely a decay rate, a date_i and a user_cluster_score, wherein the decay rate is represented by the date_i, the value range of the decay rate is (0, 1), the date_i represents the number of days, from the current date, of the ith action date of the user under a certain cluster, and the user_cluster_score represents the preference score of the user under the certain cluster;
The attenuation accumulation unit is used for respectively carrying out attenuation accumulation on the behaviors of the user under each corresponding class cluster by adopting the following formula, wherein the user_cluster_score=sum (decay_rate ≡days_i);
The proportion calculation unit is used for respectively carrying out proportion calculation on each accumulated sum obtained by the attenuation accumulation unit to obtain the preference score of the user under each corresponding cluster, wherein the proportion calculation is the proportion of the preference score of the user under a certain cluster to the sum of the preference scores of all clusters corresponding to the user.
FIG. 7 illustrates a schematic diagram of a computing device in accordance with an embodiment of the present disclosure.
As shown in fig. 7, the computing device of the present disclosure may include a processor and a memory. And a memory for storing a computer program, wherein the processor executes the computer program in the memory to implement the methods provided by the method embodiments described above. The specific implementation process can be referred to the relevant description above, and will not be repeated here.
In an embodiment, an electronic device is used for illustrating the double-cluster hot recommending device based on the user behavior sequence. The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions.
The memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, random Access Memory (RAM) and/or cache memory (cache) and the like. The non-volatile memory may include, for example, read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer readable storage medium and a processor may execute the program instructions to implement the methods and/or other desired functions in the various embodiments of the present application above. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
Furthermore, the present invention provides a computer-readable storage medium having stored therein a computer program for implementing the method provided by the method embodiments described above when executed by a processor.
In practice, the computer program in this embodiment may write program code for performing operations of embodiments of the present application in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
In practice, a computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of a readable storage medium include an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of skill would further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus and methods according to embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description of the specific embodiments of the present invention has been given by way of example only, and the present invention is not limited to the above described specific embodiments. Any equivalent modifications and substitutions for the present invention will occur to those skilled in the art, and are also within the scope of the present invention. Accordingly, equivalent changes and modifications are intended to be included within the scope of the present invention without departing from the spirit and scope thereof.