Text Recommendation Method and System Based on a Large Language Model

Technical Field

The invention relates to the field of text recommendation, and in particular to a text recommendation method and system based on a large language model.
Background
With the rise of the Internet and changes in people's reading habits, digital reading has developed rapidly. Reading platforms attract large numbers of users to digital text, yet it is difficult for users to select texts of interest from the vast number available every day.
The existing text recommendation method proceeds as follows: the text input by a user is segmented into words, keywords are extracted, and each document is ranked according to the frequency with which the keywords occur in it, so that recommendation follows this ranking. Because the user's emotional state and personalized preferences are not considered and recommendation depends solely on the input text, accuracy is low and user experience is poor.
Disclosure of Invention
In order to solve the above problems, the invention aims to provide a text recommendation method based on a large language model, which achieves more personalized, accurate and efficient text recommendation and improves user experience and recommendation effect.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A text recommendation method based on a large language model comprises the following steps:
S1: acquiring historical browsing data and search data of a user, and constructing a user portrait based on the historical browsing data and the search data;
S2: acquiring labeled voice data and constructing an emotion recognition model based on a recurrent neural network, specifically:
Acquiring a labeled voice data set, wherein each sample comprises a voice signal and an emotion label;
Preprocessing the voice signal;
extracting the MFCC features of the speech signal using the MFCC algorithm:

$$C_t(k)=\sum_{f=1}^{N}\log\bigl(\lvert X_t(f)\rvert\bigr)\cos\Bigl[\frac{\pi k}{N}\Bigl(f-\frac{1}{2}\Bigr)\Bigr]$$

wherein $\lvert X_t(f)\rvert$ is the amplitude spectrum of the speech signal at time $t$ and frequency $f$, $N$ is the total number of frequencies, and $k$ is the index of the MFCC coefficients;
Combining the MFCC features with emotion tags to construct a training set;
Constructing a recurrent neural network and training it on the training set to obtain the emotion recognition model;
S3: constructing a text recommendation model based on a Transformer combined with a Bahdanau attention mechanism;
S4: acquiring voice data input by the user, and obtaining the user's current emotion features via the emotion recognition model;
S5: converting the user's voice data into text data, and extracting text features using a text embedding technique;
S6: fusing the user's current emotion features, text features and user portrait, and inputting the fused representation into the text recommendation model to obtain the recommended text;
S7: introducing reinforcement learning: modeling the text recommendation process as a reinforcement learning environment, defining a reward function, granting rewards according to user feedback, and optimizing the model's generation strategy with a reinforcement learning algorithm so that it gradually learns to generate higher-quality recommendations.
Further, the S1 specifically is:
Acquiring articles and page information browsed by a user from log records of websites and application programs, wherein the articles and page information comprises browsing time and browsing content; recording search operation performed by a user in a platform, wherein the search operation comprises search keywords and search result clicking conditions;
Performing text processing on the historical browsing data and the search data, including word segmentation, stop word removal and stem extraction;
Then extracting features based on the preprocessed data, wherein the features comprise word frequency features, TF-IDF features, word Embedding features and click times features;
Combining the extracted features into user feature vectors, using vector concatenation for feature representation to obtain a user feature vector set $U=\{u_1,u_2,\dots,u_i,\dots,u_n\}$, wherein $u_i$ is the feature vector of the $i$-th user and $n$ is the number of users;
Based on K-means clustering, cluster analysis is carried out on the user features; the objective of K-means is to minimize the sum of squared distances from each data point to the center of its cluster:

$$J(C)=\sum_{j=1}^{k}\sum_{u_i\in C_j}\lVert u_i-c_j\rVert^{2}$$

wherein $C$ denotes the cluster partition, $c_j$ is the center of the $j$-th cluster, and $k$ is the number of clusters;
And according to the clustering result, distributing a cluster label to each user, and constructing a user portrait.
Further, the preprocessing of the voice signal includes removing silent segments and noise reduction, specifically:
judging whether the voice signal is a silent segment by an energy detection method: let the voice signal be $x(t)$, its short-time energy is:

$$E(t)=\sum_{t_0=t-M+1}^{t}x(t_0)^{2}$$

wherein $M$ is the window length, $x(t_0)$ is the amplitude of the speech signal at time $t_0$, $t$ is the current time, and $t_0$ ranges from $t-M+1$ to the current time $t$;
according to the preset energy threshold, when the short-time energy is lower than the preset energy threshold, the time period is considered to be a mute period;
Noise reduction is carried out with a Wiener filter $H(f)$, calculated as:

$$H(f)=\frac{P_s(f)}{P_s(f)+P_n(f)}$$

wherein $P_s(f)$ is the power spectral density of the speech signal and $P_n(f)$ is the power spectral density of the noise.
Further, the recurrent neural network is constructed and trained on the training set to obtain the emotion recognition model, specifically:
building a GRU-based emotion recognition model comprising:
an input layer whose dimension matches the MFCC features;
a GRU layer comprising a plurality of GRU units;
an output layer using a fully connected layer to output the probability distribution over emotion categories;
an activation function: softmax is used at the output layer;
The GRU unit comprises a reset gate, an update gate and a candidate hidden state;
Let the input be the MFCC feature $x_t$, the hidden state at the previous moment be $h_{t-1}$, the reset gate be $r_t$, the update gate be $z_t$, the candidate hidden state be $\tilde{h}_t$, and the hidden state at the current moment be $h_t$; the update formulas of the GRU unit are:

$$r_t=\sigma\bigl(W_r[h_{t-1},x_t]+b_r\bigr)$$

$$z_t=\sigma\bigl(W_z[h_{t-1},x_t]+b_z\bigr)$$

$$\tilde{h}_t=\tanh\bigl(W_h[r_t\odot h_{t-1},x_t]+b_h\bigr)$$

$$h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$$

wherein $W_r$, $W_z$, $W_h$ are weight matrices, $b_r$, $b_z$, $b_h$ are bias vectors, $\sigma$ is the Sigmoid function, and $\odot$ denotes element-wise multiplication;
The hidden state $h_T$ of the last GRU unit is fed into the output layer, and the probability distribution over emotion categories is obtained from the fully connected layer with a softmax function:

$$y=\mathrm{softmax}(W_o h_T+b_o)$$

wherein $W_o$ and $b_o$ are the weight and bias of the output layer;
during training, model parameters are updated by a back-propagation algorithm to minimize the loss function.
Further, the step S3 specifically includes:
acquiring the user's current emotion features, text features and user portrait, converting them into vector representations, and taking the current emotion features and the user portrait as auxiliary feature vectors;
constructing an input feature vector comprising a text feature vector and an auxiliary feature vector to obtain a training data set;
using a Transformer model as the base architecture, comprising encoder and decoder parts;
The Transformer encoder is formed by stacking several identical encoding layers, each comprising a multi-head self-attention mechanism and a feedforward neural network; the self-attention mechanism is computed as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\Bigl(\frac{QK^{T}}{\sqrt{d_k}}\Bigr)V$$

wherein $Q$, $K$, $V$ are the matrices of queries, keys and values respectively, and $d_k$ is the dimension of the attention head;
A Bahdanau attention mechanism is introduced in the attention layer connecting the encoder and decoder to compute the attention weights between them; Bahdanau attention is calculated as:

$$e_{pq}=v_a^{T}\tanh\bigl(W_a s_{p-1}+U_a h_q\bigr),\qquad \alpha_{pq}=\frac{\exp(e_{pq})}{\sum_{q'}\exp(e_{pq'})}$$

wherein $s_{p-1}$ is the decoder hidden state at moment $p-1$, $h_q$ is the $q$-th hidden state of the encoder, and $\alpha_{pq}$ is the attention weight;
and training based on the training data set to obtain a text recommendation model.
Further, step S5 is specifically:
performing speech recognition with a deep-learning-based CTC speech recognition model to convert the user's voice data into text data;
text cleaning is carried out on the text data to remove special characters and punctuation noise, and the text is segmented into words with the jieba Chinese word segmenter;
Word2Vec is used to map words to vector representations in high-dimensional space, semantic information of the words is learned through context co-occurrence relations, and text features are extracted.
Further, the step S7 specifically includes:
Modeling the text recommendation process as a reinforcement learning environment, wherein the Agent selects actions according to the current state (i.e., generates text recommendation contents), the environment feeds back rewards according to the Agent's actions, and the Agent's policy is updated;
The current user's historical behavior sequence, text features and recommended content are used as the state representation describing the environment the Agent is in;
In each state, the Agent's actions are the selection and generation of different text recommendation contents, as well as the selection of different parameter settings;
defining a reward function $R(s,a)$ according to user feedback, including click-through rate and satisfaction, representing the reward obtained by the Agent after selecting action $a$ in state $s$:

$$R(s,a)=w_1\,(D-\bar{D})+w_2\,M$$

wherein $D$ is the click-through rate, $M$ is the satisfaction, $\bar{D}$ is the average click-through rate, and $w_1$ and $w_2$ are weights;
The generation strategy of the model is optimized based on a policy gradient method, so that the model gradually learns to generate better recommended content.
Furthermore, the generation strategy of the model is optimized based on the policy gradient method, specifically updating the policy gradient with the REINFORCE algorithm:

$$\nabla_{\theta}J(\theta)=\mathbb{E}_{\pi_{\theta}}\bigl[\nabla_{\theta}\log\pi_{\theta}(a\mid s)\,Q^{\pi_{\theta}}(s,a)\bigr]$$

wherein $\nabla_{\theta}J(\theta)$ is the gradient of the objective function $J(\theta)$ with respect to the policy network parameters $\theta$; $J(\theta)$ is the expected return under policy $\pi_{\theta}$; $Q^{\pi_{\theta}}(s,a)$ is the action-value function of taking action $a$ in state $s$ under policy $\pi_{\theta}$; and $\pi_{\theta}(a\mid s)$ is the policy function for selecting action $a$ in state $s$, expressed with a softmax function:

$$\pi_{\theta}(a\mid s)=\frac{\exp\bigl(h(s,a;\theta)\bigr)}{\sum_{a'}\exp\bigl(h(s,a';\theta)\bigr)}$$

wherein $h(s,a;\theta)$ is the score output by the neural network, which adopts a fully connected structure whose input is a representation of state $s$ and whose output is the score of each action $a$; the sum in the denominator runs over all possible actions $a'$ and normalizes the probability.
A text recommendation system based on a large language model comprises a user personalization module, a voice input module, an emotion recognition module, a voice conversion module, a text extraction module, a text recommendation module and a reinforcement learning module;
The user personalization module acquires the user's historical browsing data and search data and constructs a user portrait based on them;
The voice input module is used for acquiring voice data input by a user;
The emotion recognition module is provided with an emotion recognition model for acquiring current emotion characteristics of the user according to voice data;
the emotion recognition model is constructed as follows:
Acquiring a labeled voice data set, wherein each sample comprises a voice signal and an emotion label;
Preprocessing the voice signal;
extracting the MFCC features of the speech signal using the MFCC algorithm:

$$C_t(k)=\sum_{f=1}^{N}\log\bigl(\lvert X_t(f)\rvert\bigr)\cos\Bigl[\frac{\pi k}{N}\Bigl(f-\frac{1}{2}\Bigr)\Bigr]$$

wherein $\lvert X_t(f)\rvert$ is the amplitude spectrum of the speech signal at time $t$ and frequency $f$, $N$ is the total number of frequencies, and $k$ is the index of the MFCC coefficients;
Combining the MFCC features with emotion tags to construct a training set;
Constructing a recurrent neural network and training it on the training set to obtain the emotion recognition model;
The voice conversion module is used for converting voice data of a user into text data and inputting the converted text data to the text extraction module;
the text extraction module is used for extracting text characteristics by using a text embedding technology;
The text recommendation module is provided with a text recommendation model constructed from a Transformer combined with a Bahdanau attention mechanism, and obtains the recommended text based on the user's current emotion features, text features and user portrait;
The reinforcement learning module is used for introducing reinforcement learning, modeling a text recommendation process into a reinforcement learning environment, defining a reward function, giving rewards according to user feedback, and optimizing a generation strategy of a model by using a reinforcement learning algorithm so as to gradually learn and generate better recommended content.
The invention has the following beneficial effects:
1. According to the invention, the user portrait is constructed by analyzing the historical browsing data and the search data of the user, and the current emotion characteristics of the user are combined, so that highly personalized recommended content can be provided, the requirements and preferences of the user can be better met, and the user experience is improved; the current emotion state of the user is identified by constructing an emotion recognition model, so that a recommendation system can dynamically adjust a recommendation strategy according to the emotion change of the user, the recommendation content is more in accordance with the current emotion state of the user, and the emotion connection of the user is enhanced;
2. According to the invention, the MFCC algorithm is used for extracting the voice characteristics and the text characteristics are extracted by using the text embedding technology, so that key information of voice and text data can be efficiently captured, rich and effective characteristic representation is provided for model training, the accuracy and recommendation quality of a model are improved, and the text recommendation model constructed by combining a Transformer and a Bahdanau attention mechanism can effectively solve the problem of long-distance dependence, capture complex relations in the text and improve the accuracy and robustness of a recommendation system.
3. According to the invention, reinforcement learning is introduced, the text recommendation process is modeled as a reinforcement learning environment, and according to the generation strategy of the user feedback optimization model, a recommendation system can continuously learn and self-optimize, gradually generate better-quality recommendation contents, and the satisfaction degree and the retention rate of a user can be remarkably improved by providing more personalized and emotion-perceived recommendation contents.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram illustrating a system architecture according to an embodiment of the invention.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and specific examples:
Referring to fig. 1, the invention provides a text recommendation method based on a large language model, comprising the following steps:
S1: acquiring historical browsing data and search data of a user, and constructing a user portrait based on the historical browsing data and the search data;
S2: acquiring labeled voice data and constructing an emotion recognition model based on a recurrent neural network, specifically:
Acquiring a labeled voice data set, wherein each sample comprises a voice signal and an emotion label;
Preprocessing the voice signal;
extracting the MFCC features of the speech signal using the MFCC algorithm:

$$C_t(k)=\sum_{f=1}^{N}\log\bigl(\lvert X_t(f)\rvert\bigr)\cos\Bigl[\frac{\pi k}{N}\Bigl(f-\frac{1}{2}\Bigr)\Bigr]$$

wherein $\lvert X_t(f)\rvert$ is the amplitude spectrum of the speech signal at time $t$ and frequency $f$, $N$ is the total number of frequencies, and $k$ is the index of the MFCC coefficients;
Combining the MFCC features with emotion tags to construct a training set;
Constructing a recurrent neural network and training it on the training set to obtain the emotion recognition model;
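The cepstral step of the MFCC formula above can be sketched in a few lines of numpy. This is a minimal illustration that applies the formula directly to the magnitude spectrum; a production MFCC pipeline would insert a mel filterbank before the logarithm, and the helper name is hypothetical.

```python
import numpy as np

def mfcc_coefficients(frame, n_coeffs=13):
    """Cepstral coefficients for one frame (illustrative helper).

    Applies C(k) = sum_f log|X(f)| * cos(pi*k*(f-0.5)/N) to the
    magnitude spectrum. The mel filterbank of a full MFCC pipeline is
    omitted to keep the sketch close to the formula in the text.
    """
    spectrum = np.abs(np.fft.rfft(frame))
    spectrum = np.maximum(spectrum, 1e-10)   # avoid log(0)
    n = len(spectrum)                        # N, total number of frequencies
    f = np.arange(1, n + 1)                  # frequency index 1..N
    return np.array([
        np.sum(np.log(spectrum) * np.cos(np.pi * k * (f - 0.5) / n))
        for k in range(n_coeffs)
    ])

# Example: a 25 ms frame of a 440 Hz tone at 16 kHz
t = np.arange(400) / 16000.0
c = mfcc_coefficients(np.sin(2 * np.pi * 440.0 * t))
```

In practice one frame of coefficients would be computed per analysis window and the resulting sequence fed to the GRU layer.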
S3: constructing a text recommendation model based on a Transformer combined with a Bahdanau attention mechanism;
S4: acquiring voice data input by the user, and obtaining the user's current emotion features via the emotion recognition model;
S5: converting the user's voice data into text data, and extracting text features using a text embedding technique;
S6: fusing the user's current emotion features, text features and user portrait, and inputting the fused representation into the text recommendation model to obtain the recommended text;
S7: introducing reinforcement learning: modeling the text recommendation process as a reinforcement learning environment, defining a reward function, granting rewards according to user feedback, and optimizing the model's generation strategy with a reinforcement learning algorithm so that it gradually learns to generate higher-quality recommendations.
In this embodiment, S1 is specifically:
Acquiring articles and page information browsed by a user from log records of websites and application programs, wherein the articles and page information comprises browsing time and browsing content; recording search operation performed by a user in a platform, wherein the search operation comprises search keywords and search result clicking conditions;
Performing text processing on the historical browsing data and the search data, including word segmentation, stop word removal and stem extraction;
Then extracting features based on the preprocessed data, wherein the features comprise word frequency features, TF-IDF features, word Embedding features and click times features;
word frequency features: counting the number of occurrences of each word in the historical browsing data and the search data.
TF-IDF features: calculating the TF-IDF value of each word to measure its importance in the text.
Word Embedding features: converting words into dense vector representations using word embedding techniques (e.g., Word2Vec, BERT).
Click count features: counting the number of times the user clicks on search results, among other statistics.
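The word-frequency and TF-IDF features above can be sketched with the standard library alone. This is an illustrative stand-in (a real system would typically use a library implementation such as scikit-learn's vectorizers); the function name and the unsmoothed IDF variant are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a small corpus of tokenised documents.

    docs: list of token lists, one per document in the user's browsing
    and search history. Returns one {word: weight} dict per document,
    using tf = count/len(doc) and idf = log(n_docs/df).
    """
    n_docs = len(docs)
    df = Counter()                      # document frequency of each word
    for doc in docs:
        df.update(set(doc))
    return [
        {w: (tf[w] / len(doc)) * math.log(n_docs / df[w]) for w in tf}
        for doc in docs
        for tf in [Counter(doc)]
    ]

history = [["news", "sports"], ["news", "tech"]]
w = tf_idf(history)
```

A word occurring in every document ("news") receives weight 0, while history-specific words are up-weighted, which is exactly the "importance in the text" the TF-IDF feature measures.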
Combining the extracted features into user feature vectors, using vector concatenation for feature representation to obtain a user feature vector set $U=\{u_1,u_2,\dots,u_i,\dots,u_n\}$, wherein $u_i$ is the feature vector of the $i$-th user and $n$ is the number of users;
Based on K-means clustering, cluster analysis is carried out on the user features; the objective of K-means is to minimize the sum of squared distances from each data point to the center of its cluster:

$$J(C)=\sum_{j=1}^{k}\sum_{u_i\in C_j}\lVert u_i-c_j\rVert^{2}$$

wherein $C$ denotes the cluster partition, $c_j$ is the center of the $j$-th cluster, and $k$ is the number of clusters;
And according to the clustering result, distributing a cluster label to each user, and constructing a user portrait.
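The clustering step above can be sketched as a plain Lloyd-style K-means over the user feature vectors, minimizing the within-cluster sum of squares from the objective. A minimal sketch with assumed toy data; a real system would use a library implementation with proper initialization.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means: assign each user vector to its nearest centre,
    then recompute centres, iteratively reducing the within-cluster
    sum of squared distances."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    return labels, centres

# Toy user feature vectors: two obvious behavioural groups
users = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1]])
labels, centres = kmeans(users, k=2)
```

Each user then receives their cluster label, which becomes part of the user portrait.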
In this embodiment, preprocessing of the voice signal includes removing silent segments and noise reduction, specifically:
judging whether the voice signal is a silent segment by an energy detection method: let the voice signal be $x(t)$, its short-time energy is:

$$E(t)=\sum_{t_0=t-M+1}^{t}x(t_0)^{2}$$

wherein $M$ is the window length, $x(t_0)$ is the amplitude of the speech signal at time $t_0$, $t$ is the current time, and $t_0$ ranges from $t-M+1$ to the current time $t$;
according to the preset energy threshold, when the short-time energy is lower than the preset energy threshold, the time period is considered to be a mute period;
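The energy-based silence detection above can be sketched as a sliding sum of squared amplitudes compared against a threshold. A minimal sketch: the window is centred rather than trailing for simplicity, and the threshold value is illustrative (it would be tuned or estimated in practice).

```python
import numpy as np

def silence_mask(signal, window=160, threshold=1e-3):
    """Mark silent samples via short-time energy: sum x(t0)^2 over a
    `window`-sample neighbourhood (centred here for simplicity); energy
    below `threshold` marks the sample as silence."""
    energy = np.convolve(signal ** 2, np.ones(window), mode="same")
    return energy < threshold

# 25 ms of silence followed by 25 ms of a 440 Hz tone, 16 kHz
t = np.arange(400) / 16000.0
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
signal = np.concatenate([np.zeros(400), tone])
mask = silence_mask(signal)
```

Samples flagged by the mask are dropped before feature extraction, so the MFCC frames describe speech only.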
Noise reduction is carried out with a Wiener filter $H(f)$, calculated as:

$$H(f)=\frac{P_s(f)}{P_s(f)+P_n(f)}$$

wherein $P_s(f)$ is the power spectral density of the speech signal and $P_n(f)$ is the power spectral density of the noise.
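The Wiener gain above can be applied in the FFT domain. A hedged sketch under simplifying assumptions: the noise PSD is taken as known (e.g. measured on silent segments found by the energy detector), and the speech PSD is estimated by spectral subtraction; function names are illustrative.

```python
import numpy as np

def wiener_gain(signal_psd, noise_psd):
    """Wiener filter gain H(f) = Ps(f) / (Ps(f) + Pn(f))."""
    return signal_psd / (signal_psd + noise_psd + 1e-12)

def wiener_denoise(noisy, noise_psd):
    """Apply the Wiener gain bin-by-bin in the FFT domain. The speech
    PSD is estimated by subtracting the (assumed known) noise PSD from
    the observed power spectrum, clipped at zero."""
    spectrum = np.fft.rfft(noisy)
    signal_psd = np.maximum(np.abs(spectrum) ** 2 - noise_psd, 0.0)
    return np.fft.irfft(wiener_gain(signal_psd, noise_psd) * spectrum,
                        n=len(noisy))

t = np.arange(400) / 16000.0
noisy = np.sin(2 * np.pi * 440.0 * t)
denoised = wiener_denoise(noisy, np.full(201, 0.01))   # flat noise PSD
```

Bins where the estimated speech power dominates pass with gain near 1; noise-dominated bins are attenuated toward 0.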
In this embodiment, a recurrent neural network is constructed, and training is performed based on a training set to obtain an emotion recognition model, which is specifically as follows:
building a GRU-based emotion recognition model comprising:
an input layer whose dimension matches the MFCC features;
a GRU layer comprising a plurality of GRU units;
an output layer using a fully connected layer to output the probability distribution over emotion categories;
an activation function: softmax is used at the output layer;
The GRU unit comprises a reset gate, an update gate and a candidate hidden state;
Let the input be the MFCC feature $x_t$, the hidden state at the previous moment be $h_{t-1}$, the reset gate be $r_t$, the update gate be $z_t$, the candidate hidden state be $\tilde{h}_t$, and the hidden state at the current moment be $h_t$; the update formulas of the GRU unit are:

$$r_t=\sigma\bigl(W_r[h_{t-1},x_t]+b_r\bigr)$$

$$z_t=\sigma\bigl(W_z[h_{t-1},x_t]+b_z\bigr)$$

$$\tilde{h}_t=\tanh\bigl(W_h[r_t\odot h_{t-1},x_t]+b_h\bigr)$$

$$h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$$

wherein $W_r$, $W_z$, $W_h$ are weight matrices, $b_r$, $b_z$, $b_h$ are bias vectors, $\sigma$ is the Sigmoid function, and $\odot$ denotes element-wise multiplication;
The hidden state $h_T$ of the last GRU unit is fed into the output layer, and the probability distribution over emotion categories is obtained from the fully connected layer with a softmax function:

$$y=\mathrm{softmax}(W_o h_T+b_o)$$

wherein $W_o$ and $b_o$ are the weight and bias of the output layer;
during training, model parameters are updated by a back-propagation algorithm to minimize the loss function.
In this embodiment, S3 is specifically:
acquiring the user's current emotion features, text features and user portrait, converting them into vector representations, and taking the current emotion features and the user portrait as auxiliary feature vectors;
constructing an input feature vector comprising a text feature vector and an auxiliary feature vector to obtain a training data set;
using a Transformer model as the base architecture, comprising encoder and decoder parts;
The Transformer encoder is formed by stacking several identical encoding layers, each comprising a multi-head self-attention mechanism and a feedforward neural network; the self-attention mechanism is computed as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\Bigl(\frac{QK^{T}}{\sqrt{d_k}}\Bigr)V$$

wherein $Q$, $K$, $V$ are the matrices of queries, keys and values respectively, and $d_k$ is the dimension of the attention head;
A Bahdanau attention mechanism is introduced in the attention layer connecting the encoder and decoder to compute the attention weights between them; Bahdanau attention is calculated as:

$$e_{pq}=v_a^{T}\tanh\bigl(W_a s_{p-1}+U_a h_q\bigr),\qquad \alpha_{pq}=\frac{\exp(e_{pq})}{\sum_{q'}\exp(e_{pq'})}$$

wherein $s_{p-1}$ is the decoder hidden state at moment $p-1$, $h_q$ is the $q$-th hidden state of the encoder, and $\alpha_{pq}$ is the attention weight;
and training based on the training data set to obtain a text recommendation model.
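Both attention computations described above can be sketched in numpy: scaled dot-product self-attention for the encoder, and the additive Bahdanau score between a decoder state and the encoder states. Dimensions and parameter values are toy assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

def bahdanau_weights(s_prev, enc_states, W_a, U_a, v_a):
    """Additive attention weights of decoder state s_{p-1} over encoder
    states h_q: alpha = softmax(v^T tanh(W s + U h))."""
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h)
                       for h in enc_states])
    return softmax(scores)

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(4, 6)) for _ in range(3))  # 4 tokens, d_k = 6
out, w = scaled_dot_attention(Q, K, V)

enc = rng.normal(size=(5, 6))                          # 5 encoder states
alpha = bahdanau_weights(rng.normal(size=6), enc,
                         rng.normal(size=(6, 6)), rng.normal(size=(6, 6)),
                         rng.normal(size=6))
```

In both cases the weights form a probability distribution over the attended positions, which is what lets the decoder focus on the most relevant encoder states.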
In this embodiment, step S5 is specifically:
performing speech recognition with a deep-learning-based CTC speech recognition model to convert the user's voice data into text data;
text cleaning is carried out on the text data to remove special characters and punctuation noise, and the text is segmented into words with the jieba Chinese word segmenter;
Word2Vec is used to map words to vector representations in high-dimensional space, semantic information of the words is learned through context co-occurrence relations, and text features are extracted.
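The cleaning step above can be sketched with the standard library. Note the hedge: the method itself uses jieba for Chinese word segmentation; whitespace splitting is swapped in here only to keep the sketch dependency-free.

```python
import re

def clean_text(text):
    """Remove special characters and punctuation noise; \\w keeps word
    characters (including CJK, since Python 3 regexes are Unicode-aware
    by default)."""
    return re.sub(r"[^\w\s]", " ", text)

def tokenize(text):
    """Whitespace tokenisation stand-in for jieba segmentation."""
    return clean_text(text).split()
```

The resulting tokens are what Word2Vec maps to dense vectors in the embedding step.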
In this embodiment, S7 is specifically:
Modeling the text recommendation process as a reinforcement learning environment, wherein the Agent selects actions according to the current state (i.e., generates text recommendation contents), the environment feeds back rewards according to the Agent's actions, and the Agent's policy is updated;
The current user's historical behavior sequence, text features and recommended content are used as the state representation describing the environment the Agent is in;
In each state, the Agent's actions are the selection and generation of different text recommendation contents, as well as the selection of different parameter settings;
defining a reward function $R(s,a)$ according to user feedback, including click-through rate and satisfaction, representing the reward obtained by the Agent after selecting action $a$ in state $s$:

$$R(s,a)=w_1\,(D-\bar{D})+w_2\,M$$

wherein $D$ is the click-through rate, $M$ is the satisfaction, $\bar{D}$ is the average click-through rate, and $w_1$ and $w_2$ are weights;
The generation strategy of the model is optimized based on a policy gradient method, so that the model gradually learns to generate better recommended content.
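The reward described above reduces to a weighted combination of click-through lift and satisfaction. A one-line sketch; the equal default weights are an assumption, not a value from the text.

```python
def reward(click_rate, satisfaction, avg_click_rate, w1=0.5, w2=0.5):
    """R(s, a) = w1 * (D - D_bar) + w2 * M: a recommendation is rewarded
    for beating the average click-through rate and for user satisfaction.
    The weights w1, w2 are illustrative defaults to be tuned."""
    return w1 * (click_rate - avg_click_rate) + w2 * satisfaction
```

For example, a recommendation with a 0.3 click-through rate against a 0.2 average and 0.8 satisfaction earns a reward of 0.45 with equal weights.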
In this embodiment, the generation strategy of the model is optimized based on the policy gradient method, specifically updating the policy gradient with the REINFORCE algorithm:

$$\nabla_{\theta}J(\theta)=\mathbb{E}_{\pi_{\theta}}\bigl[\nabla_{\theta}\log\pi_{\theta}(a\mid s)\,Q^{\pi_{\theta}}(s,a)\bigr]$$

wherein $\nabla_{\theta}J(\theta)$ is the gradient of the objective function $J(\theta)$ with respect to the policy network parameters $\theta$; $J(\theta)$ is the expected return under policy $\pi_{\theta}$; $Q^{\pi_{\theta}}(s,a)$ is the action-value function of taking action $a$ in state $s$ under policy $\pi_{\theta}$; and $\pi_{\theta}(a\mid s)$ is the policy function for selecting action $a$ in state $s$, expressed with a softmax function:

$$\pi_{\theta}(a\mid s)=\frac{\exp\bigl(h(s,a;\theta)\bigr)}{\sum_{a'}\exp\bigl(h(s,a';\theta)\bigr)}$$

wherein $h(s,a;\theta)$ is the score output by the neural network, which adopts a fully connected structure whose input is a representation of state $s$ and whose output is the score of each action $a$; the sum in the denominator runs over all possible actions $a'$ and normalizes the probability.
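The REINFORCE update above can be sketched for the simplest case: a linear softmax policy over a handful of candidate recommendations, with the sampled reward standing in for $Q^{\pi_\theta}(s,a)$. The linear scores and toy dimensions are assumptions for illustration, not the patent's fully connected network.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_update(theta, state, action, reward, lr=0.1):
    """One REINFORCE step for a linear softmax policy.

    Preference scores h(s, a) = theta @ state (one theta row per action);
    grad log pi(a|s) = (one_hot(a) - pi(.|s)) outer state, scaled by the
    sampled reward as a stand-in for Q(s, a)."""
    probs = softmax(theta @ state)
    one_hot = np.zeros(len(probs))
    one_hot[action] = 1.0
    grad_log_pi = np.outer(one_hot - probs, state)
    return theta + lr * reward * grad_log_pi

rng = np.random.default_rng(0)
theta = np.zeros((3, 4))          # 3 candidate actions, 4 state features
state = rng.normal(size=4)
for _ in range(50):               # user feedback keeps rewarding action 0
    theta = reinforce_update(theta, state, action=0, reward=1.0)
probs = softmax(theta @ state)
```

After repeated positive feedback, the policy shifts probability mass toward the rewarded recommendation, which is the gradual self-optimization the section describes.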
Referring to fig. 2, this embodiment further provides a text recommendation system based on a large language model, comprising a user personalization module, a voice input module, an emotion recognition module, a voice conversion module, a text extraction module, a text recommendation module and a reinforcement learning module;
The user personalization module acquires the user's historical browsing data and search data and constructs a user portrait based on them;
The voice input module is used for acquiring voice data input by a user;
The emotion recognition module is provided with an emotion recognition model for acquiring current emotion characteristics of the user according to voice data;
the emotion recognition model is constructed as follows:
Acquiring a labeled voice data set, wherein each sample comprises a voice signal and an emotion label;
Preprocessing the voice signal;
extracting the MFCC features of the speech signal using the MFCC algorithm:

$$C_t(k)=\sum_{f=1}^{N}\log\bigl(\lvert X_t(f)\rvert\bigr)\cos\Bigl[\frac{\pi k}{N}\Bigl(f-\frac{1}{2}\Bigr)\Bigr]$$

wherein $\lvert X_t(f)\rvert$ is the amplitude spectrum of the speech signal at time $t$ and frequency $f$, $N$ is the total number of frequencies, and $k$ is the index of the MFCC coefficients;
Combining the MFCC features with emotion tags to construct a training set;
Constructing a recurrent neural network and training it on the training set to obtain the emotion recognition model;
The voice conversion module is used for converting voice data of a user into text data and inputting the converted text data to the text extraction module;
the text extraction module is used for extracting text characteristics by using a text embedding technology;
The text recommendation module is provided with a text recommendation model constructed from a Transformer combined with a Bahdanau attention mechanism, and obtains the recommended text based on the user's current emotion features, text features and user portrait;
The reinforcement learning module is used for introducing reinforcement learning, modeling a text recommendation process into a reinforcement learning environment, defining a reward function, giving rewards according to user feedback, and optimizing a generation strategy of a model by using a reinforcement learning algorithm so as to gradually learn and generate better recommended content.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description presents only preferred embodiments of the present invention and is not intended to limit it in any way; any person skilled in the art may use the disclosed technical content to make modifications or variations into equivalent embodiments. However, any simple modification, equivalent change or variation of the above embodiments made according to the technical substance of the present invention still falls within the protection scope of the technical solution of the present invention.