Television program topic recommendation method and system based on non-negative matrix factorization
Technical Field
The invention relates to the technical field of big data, in particular to a television program topic recommendation method and system.
Background
In personalized recommendation on smart televisions, one problem has gone largely unnoticed: labels overlap within a field of interest. Some labels are favored together by users who share the same underlying interest. For example, a user who scores action programs highly generally also scores martial arts programs highly. The algorithm that generates the recommendation list, however, mainly sorts labels by the user's scores. If the user's scores for swordsman, action, ancient costume and similar labels are all high, most programs in the recommendation list end up covering the same labels. Yet other labels, such as inspirational, adventure and comedy, may also rank in the upper-middle range of the user's scores. Because programs on swordsman and action subjects take up too large a share of the list, very little space is left for inspirational, adventure and comedy programs, and they are very unlikely to be pushed to the user. As a result, the labels covered by the recommended programs do not reasonably cover the user portrait, and the recommendation effect is reduced.
Non-negative matrix factorization (NMF), proposed by Lee and Seung in the journal Nature in 1999, is a matrix factorization method in which all components after factorization are non-negative (requiring a purely additive description) and which at the same time achieves nonlinear dimensionality reduction. NMF has become one of the most popular tools for processing multidimensional data in research fields such as signal processing, biomedical engineering, pattern recognition, computer vision, and image engineering.
Disclosure of Invention
The invention aims to solve the problem that conventional television program recommendation cannot reasonably cover the user portrait, and provides a television program topic recommendation method and system based on non-negative matrix factorization.
The technical scheme adopted by the invention to solve this technical problem is as follows: the television program topic recommendation method based on non-negative matrix factorization comprises the following steps:
step 1, obtaining label information in real time when a television program is to be recommended to a user, wherein the label information comprises user portrait data, program label data and thematic label data; the user portrait data represents the user's score for each label, the program label data represents information on individual television programs, and the thematic label data represents information on one category of television programs;
step 2, constructing a non-negative matrix factorization model, converting the label information into a full label matrix, and decomposing and reducing the dimension of the full label matrix with the non-negative matrix factorization model to obtain a probability matrix, wherein the probability matrix is label information generated by merging labels according to the degree of association between thematic labels;
step 3, reconstructing the user portrait data and the thematic label data according to the label information in the probability matrix, and recommending television program topics according to the reconstructed user portrait data and thematic label data.
Further, in order to decompose the full label matrix, in step 2, decomposing and reducing the dimension of the full label matrix with the non-negative matrix factorization model to obtain the probability matrix specifically comprises:
training the non-negative matrix factorization model with the program label data as training data, and fitting with the user portrait data and the thematic label data as fitting data, to obtain the probability matrix resulting from decomposition and dimension reduction of the full label matrix.
Further, in order to recommend television program topics, in step 3, recommending television program topics according to the reconstructed user portrait data and thematic label data comprises:
calculating the cosine similarity between the reconstructed user portrait data and the thematic label data, sorting the cosine similarities in descending order, and selecting for recommendation the television program topics corresponding to the thematic label data ranked in the top N by cosine similarity, wherein N is a positive integer greater than or equal to 1.
Further, the cosine similarity is calculated by the following formula:
$$\cos(x, y) = \frac{x^{T} y}{\|x\|\,\|y\|}$$
in the formula, x represents the vector corresponding to the user portrait data, y represents the vector corresponding to the thematic label data, $\|x\|$ and $\|y\|$ represent the moduli (lengths) of those vectors, and T represents transposition.
The invention also provides a television program topic recommendation system based on non-negative matrix factorization, comprising:
an acquisition unit, configured to acquire label information in real time when a television program is to be recommended to a user, wherein the label information comprises user portrait data, program label data and thematic label data; the user portrait data represents the user's score for each label, the program label data represents information on individual television programs, and the thematic label data represents information on one category of television programs;
a non-negative matrix factorization model, configured to decompose and reduce the dimension of a full label matrix to obtain a probability matrix, wherein the full label matrix is obtained by converting the label information, and the probability matrix is label information generated by merging labels according to the degree of association between thematic labels;
and a recommending unit, configured to reconstruct the user portrait data and the thematic label data according to the label information in the probability matrix, and to recommend television program topics according to the reconstructed user portrait data and thematic label data.
Further, decomposing and reducing the dimension of the full label matrix to obtain the probability matrix specifically comprises:
training the non-negative matrix factorization model with the program label data as training data, and fitting with the user portrait data and the thematic label data as fitting data, to obtain the probability matrix resulting from decomposition and dimension reduction of the full label matrix.
Further, the recommending unit is further configured to:
calculate the cosine similarity between the reconstructed user portrait data and the thematic label data, sort the cosine similarities in descending order, and select for recommendation the television program topics corresponding to the thematic label data ranked in the top N by cosine similarity, wherein N is a positive integer greater than or equal to 1.
Further, the cosine similarity is calculated by the following formula:
$$\cos(x, y) = \frac{x^{T} y}{\|x\|\,\|y\|}$$
in the formula, x represents the vector corresponding to the user portrait data, y represents the vector corresponding to the thematic label data, $\|x\|$ and $\|y\|$ represent the moduli (lengths) of those vectors, and T represents transposition.
The beneficial effects of the invention are as follows: with the television program topic recommendation method and system based on non-negative matrix factorization, the full label matrix is decomposed by a non-negative matrix factorization model to obtain a probability matrix containing the dimension-reduced user portrait data and thematic label data. The labels are thereby optimized, and by merging labels within each field of interest, the user's recommendation list is finally distributed reasonably across fields of interest.
Drawings
FIG. 1 is a schematic diagram of a non-negative matrix factorization;
FIG. 2 is a schematic flow chart of a non-negative matrix factorization-based television program topic recommendation method according to the present invention;
FIG. 3 is a schematic structural diagram of the television program topic recommendation system based on non-negative matrix factorization according to the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Various dimension reduction methods were compared. In high-dimensional data, the nearest and farthest neighbors are in most cases almost equidistant, so distance-based dimension reduction methods perform poorly and are no longer applicable. NMF instead reduces dimension based on the relevance between features. The label data of television programs is high-dimensional data; by analyzing the relevance between labels, the overlap of labels within a field of interest can be identified and used to decide whether labels should be merged. The invention therefore uses NMF to compress the labels.
When the labels used for topic recommendation are compressed and reduced in dimension by NMF, the principle is as shown in FIG. 1. The label matrix before dimension reduction is the full label matrix V, which is a matrix of order F × N. The matrix W, of order F × K, is the scoring matrix holding each user's scores for all labels. H is the probability matrix, of order K × N, and reflects the label data after merging according to the overlap of labels within a field of interest.
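The relationship between the three matrices in FIG. 1 can be written compactly as below. The source does not state which objective or solver is used; the multiplicative update rules shown are those of Lee and Seung for the squared-error objective and are given only as one common choice. The symbol $\odot$ (element-wise product) and the element-wise reading of the fractions are notation introduced here.

$$V_{F \times N} \approx W_{F \times K}\, H_{K \times N}, \qquad W \ge 0,\quad H \ge 0$$

$$H \leftarrow H \odot \frac{W^{T} V}{W^{T} W H}, \qquad W \leftarrow W \odot \frac{V H^{T}}{W H H^{T}}$$

Choosing K smaller than N (an assumption implied by the dimension reduction) is what compresses the N original labels into K merged label groups.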
The television program topic recommendation method based on non-negative matrix factorization according to the invention, as shown in FIG. 2, comprises the following steps:
S1, obtaining label information in real time when a television program is to be recommended to a user, wherein the label information comprises user portrait data, program label data and thematic label data; the user portrait data represents the user's score for each label, the program label data represents information on individual television programs, and the thematic label data represents information on one category of television programs;
S2, constructing a non-negative matrix factorization model, converting the label information into a full label matrix, and decomposing and reducing the dimension of the full label matrix with the non-negative matrix factorization model to obtain a probability matrix, wherein the probability matrix is label information generated by merging labels according to the degree of association between thematic labels;
S3, reconstructing the user portrait data and the thematic label data according to the label information in the probability matrix, and recommending television program topics according to the reconstructed user portrait data and thematic label data.
First, building on the conventional approach of recommending television programs according to the user's scores for program labels, the label information is obtained when a television program is to be recommended to the user. The label information comprises user portrait data, program label data and thematic label data. The user portrait data represents the user's score for each label. The program label data represents information on individual television programs, such as Journey to the West and Dream of the Red Chamber. The thematic label data represents information on one category of television programs, such as martial arts, ancient costume, love and family. The label dimensions of the user portrait data, the program label data and the thematic label data grow dynamically and are updated in real time. If the label sets of the three are A, B and C respectively, their union A ∪ B ∪ C is taken as the label dimension, and all labels are numerically encoded according to the size of that union.
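The union-based encoding described above can be sketched as follows. This is a minimal illustration, not the source's implementation: the function names `build_label_index` and `encode`, the choice of sorted order for assigning column indices, and the example scores are all assumptions made here.

```python
def build_label_index(*label_sets):
    """Map every label in the union of the given label sets to a column index."""
    union = set().union(*label_sets)  # A ∪ B ∪ C
    # Sorting is an assumption made here so the encoding is deterministic.
    return {label: i for i, label in enumerate(sorted(union))}


def encode(scores, index):
    """Turn a {label: score} mapping into a dense non-negative vector."""
    row = [0.0] * len(index)
    for label, score in scores.items():
        row[index[label]] = float(score)
    return row


# Hypothetical example data, not taken from the source.
user_labels = {"martial arts": 0.9, "comedy": 0.4}
program_labels = {"martial arts": 1.0, "ancient costume": 1.0}
topic_labels = {"comedy": 1.0, "family": 1.0}

index = build_label_index(user_labels, program_labels, topic_labels)
user_row = encode(user_labels, index)
print(index)
print(user_row)
```

Because every vector is encoded against the same union, user, program and thematic rows share one column layout and can be stacked into a single full label matrix.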
Next, a non-negative matrix factorization model is constructed. All of the numerically encoded label information is converted into a full label matrix in array form, which is input into the constructed model so that the full label matrix is decomposed and reduced in dimension. Specifically, the overlap of labels within a field of interest is obtained by analyzing the relevance between labels, and labels are merged and reduced in dimension according to that overlap, yielding the corresponding scoring matrix and probability matrix. The probability matrix is label information generated by merging labels according to the degree of association between thematic labels.
Decomposing and reducing the dimension of the full label matrix with the non-negative matrix factorization model to obtain the probability matrix may specifically be: training the non-negative matrix factorization model with the program label data as training data, and fitting with the user portrait data and the thematic label data as fitting data, to obtain the probability matrix resulting from decomposition and dimension reduction of the full label matrix.
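One way to read "train on program label data, then fit user portrait and thematic label data" is sketched below: H is learned from the program rows, and new rows are then projected onto that fixed H by updating only their weights. This reading, the Lee and Seung multiplicative updates, the helper names (`fit`, `transform`, `update_w`, `update_h`), the iteration counts and the toy data are all assumptions made here, not details given by the source.

```python
import random

EPS = 1e-9  # guards against division by zero in the updates


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def update_w(v, w, h):
    """One multiplicative update of W with H held fixed."""
    ht = transpose(h)
    num = matmul(v, ht)             # V H^T
    den = matmul(matmul(w, h), ht)  # W H H^T
    return [[w[i][k] * num[i][k] / (den[i][k] + EPS) for k in range(len(h))]
            for i in range(len(w))]


def update_h(v, w, h):
    """One multiplicative update of H with W held fixed."""
    wt = transpose(w)
    num = matmul(wt, v)             # W^T V
    den = matmul(matmul(wt, w), h)  # W^T W H
    return [[h[k][j] * num[k][j] / (den[k][j] + EPS) for j in range(len(h[0]))]
            for k in range(len(h))]


def fit(v, k, iters=500, seed=0):
    """Learn W (rows x k) and H (k x labels) from the training rows."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(k)]
    for _ in range(iters):
        w = update_w(v, w, h)
        h = update_h(v, w, h)
    return w, h


def transform(rows, h, iters=500, seed=1):
    """Project new rows onto a fixed H by updating their weights only."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in h] for _ in rows]
    for _ in range(iters):
        w = update_w(rows, w, h)
    return w


# Hypothetical toy data: 4 programs over 4 labels forming two label groups.
programs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
user = [[0.9, 0.8, 0.1, 0.0]]
topics = [[1, 1, 0, 0], [0, 0, 1, 1]]

w, h = fit(programs, k=2)
user_reduced = transform(user, h)     # reconstructed user portrait, 1 x 2
topic_reduced = transform(topics, h)  # reconstructed thematic labels, 2 x 2
```

Projecting users and topics onto the same H places both in the same K-dimensional merged-label space, which is what makes a similarity comparison between them meaningful.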
Finally, television program topics are recommended according to the user portrait data and the thematic label data in the probability matrix. Specifically, the cosine similarity between the reconstructed user portrait data and the thematic label data is calculated, the cosine similarities are sorted in descending order, and the television program topics corresponding to the thematic label data ranked in the top N by cosine similarity are selected for recommendation, wherein N is a positive integer greater than or equal to 1. The reconstructed thematic label data contains new thematic labels, for example a martial arts category and an emotional category, where the martial arts category merges swordsman and ancient costume, and the emotional category merges love and family. The labels are thereby optimized, and by merging labels within each field of interest, the user's recommendation list is finally distributed reasonably across fields of interest.
The cosine similarity may be calculated as:
$$\cos(x, y) = \frac{x^{T} y}{\|x\|\,\|y\|}$$
in the formula, x represents the vector corresponding to the user portrait data, y represents the vector corresponding to the thematic label data, $\|x\|$ and $\|y\|$ represent the moduli (lengths) of those vectors, and T represents transposition.
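The ranking step can be sketched as follows, assuming the reconstructed user portrait and thematic label data are available as plain numeric vectors of equal length. The function names, the handling of all-zero vectors and the example vectors are assumptions made here for illustration.

```python
import math


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))     # x^T y
    norm_x = math.sqrt(sum(a * a for a in x))  # modulus of x
    norm_y = math.sqrt(sum(b * b for b in y))  # modulus of y
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_x * norm_y)


def top_n_topics(user_vec, topic_vecs, n):
    """Return the names of the n topics most similar to the user vector."""
    scored = [(cosine_similarity(user_vec, vec), name)
              for name, vec in topic_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # descending order
    return [name for _, name in scored[:n]]


# Hypothetical reduced vectors, not taken from the source.
user = [0.8, 0.1]
topics = {
    "martial arts": [1.0, 0.0],
    "emotional": [0.0, 1.0],
    "mixed": [0.5, 0.5],
}
print(top_n_topics(user, topics, 2))  # ['martial arts', 'mixed']
```

Cosine similarity compares direction rather than magnitude, so a user with uniformly low scores is ranked the same way as one with uniformly high scores as long as their relative preferences match.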
Based on the above technical solution, the present invention further provides a television program topic recommendation system based on non-negative matrix factorization, as shown in FIG. 3, comprising:
an acquisition unit, configured to acquire label information in real time when a television program is to be recommended to a user, wherein the label information comprises user portrait data, program label data and thematic label data; the user portrait data represents the user's score for each label, the program label data represents information on individual television programs, and the thematic label data represents information on one category of television programs;
a non-negative matrix factorization model, configured to decompose and reduce the dimension of a full label matrix to obtain a probability matrix, wherein the full label matrix is obtained by converting the label information, and the probability matrix is label information generated by merging labels according to the degree of association between thematic labels;
and a recommending unit, configured to reconstruct the user portrait data and the thematic label data according to the label information in the probability matrix, and to recommend television program topics according to the reconstructed user portrait data and thematic label data.
Optionally, decomposing and reducing the dimension of the full label matrix to obtain the probability matrix specifically comprises:
training the non-negative matrix factorization model with the program label data as training data, and fitting with the user portrait data and the thematic label data as fitting data, to obtain the probability matrix resulting from decomposition and dimension reduction of the full label matrix.
Optionally, the recommending unit is further configured to:
calculate the cosine similarity between the reconstructed user portrait data and the thematic label data, sort the cosine similarities in descending order, and select for recommendation the television program topics corresponding to the thematic label data ranked in the top N by cosine similarity, wherein N is a positive integer greater than or equal to 1.
Optionally, the cosine similarity is calculated by the following formula:
$$\cos(x, y) = \frac{x^{T} y}{\|x\|\,\|y\|}$$
in the formula, x represents the vector corresponding to the user portrait data, y represents the vector corresponding to the thematic label data, $\|x\|$ and $\|y\|$ represent the moduli (lengths) of those vectors, and T represents transposition.
It can be understood that the television program topic recommendation system based on non-negative matrix factorization of the present invention is a system for implementing the television program topic recommendation method based on non-negative matrix factorization. Because the disclosed system corresponds to the disclosed method, its description is kept brief; for relevant details, refer to the description of the method. Since the method solves the problem that existing television program recommendation cannot reasonably cover the user portrait, the system that implements the method solves the same problem.