CN114363671B

Movatterモバイル変換

Info

Publication number: CN114363671B
Application number: CN202111676174.6A
Authority: CN
Inventors: 廖一桥
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2024-03-19
Anticipated expiration: 2041-12-31
Also published as: CN114363671A

Abstract

Disclosed are a multimedia resource pushing method, a model training method, a device and a storage medium. The method comprises the following steps: acquiring feature information of an object to be processed, and generating original dimension feature data and newly added dimension feature data; inputting the original dimension characteristic data and the newly added dimension characteristic data into a multimedia push model; based on the characteristics output by the original multimedia push model and the characteristics output by the personalized model, matching with multimedia resources in a multimedia resource library to obtain candidate multimedia resources; and determining the target multimedia resources to be pushed from the candidate multimedia resources. By the embodiment proposal provided by the disclosure, the Martai effect of the pushing model is effectively reduced, and the precision and the processing efficiency of the multimedia pushing model for pushing the multimedia resources are improved.

Description

Multimedia resource pushing method, model training method, device and storage medium

Technical Field

The disclosure relates to the technical field of computer data processing, in particular to a multimedia resource pushing method, a model training device and a storage medium.

Background

Currently, in short video push model training, the cold start of new users and new videos has a crucial impact on the ecology and retention of the entire model system. However, because the new user and the new video lack enough behavior data, the number of available behavior samples in training is small, and the push model can learn the new user and the new video by using limited behavior samples, the push model tends to learn the behavior samples of the old user, the Martai effect is obvious, and the push result is inaccurate.

Disclosure of Invention

The present disclosure provides a multimedia resource pushing method, a model training method, a device, and a storage medium, which can reduce the martai effect of a pushing model and improve the accuracy of multimedia resource pushing. The technical scheme of the present disclosure is as follows:

according to a first aspect of an embodiment of the present disclosure, there is provided a multimedia asset pushing method, including:

acquiring feature information of an object to be processed, and generating original dimension feature data and newly added dimension feature data of the object to be processed according to the feature information;

inputting the original dimension feature data and the newly added dimension feature data into a multimedia push model, wherein the multimedia push model comprises an original multimedia push model and a personalized model, the original dimension feature data is processed through the original multimedia push model to obtain the feature output by the original multimedia push model, and the newly added dimension feature data is processed through the personalized model to obtain the feature output by the personalized model;

Based on the characteristics output by the original multimedia push model and the characteristics output by the personalized model, matching with multimedia resources in a multimedia resource library to obtain candidate multimedia resources;

and determining the target multimedia resources to be pushed from the candidate multimedia resources.

Optionally, in the method, the candidate multimedia resources include a pushing value of a preset behavior, and determining, from the candidate multimedia resources, a target multimedia resource to be pushed includes:

and determining the target multimedia resources to be pushed from the candidate multimedia resources according to the pushing value of the preset behavior.

Optionally, in the method, the candidate multimedia resource includes a plurality of preset behaviors and push values corresponding to the preset behaviors, and determining, according to the push values of the preset behaviors, a target multimedia resource to be pushed from the candidate multimedia resource includes:

determining a target preset behavior in the preset behaviors;

obtaining a push value of the target preset behavior in the candidate multimedia resource;

and determining the target multimedia resources to be pushed from the candidate multimedia resources according to the pushing value of the target preset behavior.

Optionally, in the method, the object to be processed is a new account, the new account includes an account with a time length smaller than the first time length, and/or the multimedia resources in the multimedia resource library include new multimedia resources processed by the multimedia push model, and the new multimedia resources include multimedia resources with a time length smaller than the second time length.

Optionally, in the method, the matching between the characteristics output by the original multimedia push model and the characteristics output by the personalized model and multimedia resources in a multimedia resource library to obtain candidate multimedia resources includes:

combining the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain combined characteristics;

and matching with the multimedia resources in the multimedia resource library according to the combination characteristics to obtain candidate multimedia resources.

Optionally, in the method, the original multimedia push model is a multi-task learning model, and the personalized model is a multi-layer perceptron;

the matching between the characteristics output by the original multimedia push model and the characteristics output by the personalized model and multimedia resources in a multimedia resource library is performed to obtain candidate multimedia resources, and the method comprises the following steps:

Respectively converting original dimension characteristic data of each subtask network in the multitask learning model into first target characteristics with the same dimension as the newly added dimension characteristic data;

inputting the first target characteristics of each subtask network into the personalized model to obtain second target characteristics output by each subtask network;

respectively splicing the second target features output by each subtask network with the features output by the personalized model to obtain first combined features of each subtask network;

inputting the first combined characteristic into each subtask network in the multitask learning model to obtain a third target characteristic;

and matching with the multimedia resources in the multimedia resource library according to the third target characteristics to obtain candidate multimedia resources.

the processing the original dimension feature data through the original multimedia push model to obtain the output features of the original multimedia push model comprises the following steps: inputting the original dimension characteristic data into each subtask network in the multitask learning model to obtain a fourth target characteristic output by each subtask network in the multitask learning model;

The matching between the characteristics output by the original multimedia push model and the characteristics output by the personalized model and multimedia resources in a multimedia resource library to obtain candidate multimedia resources comprises the following steps:

respectively inputting the original dimension characteristic data of each subtask network in the multitask learning model into the personalized model to obtain a one-dimensional fifth target characteristic;

carrying out weighted summation on the one-dimensional fifth target feature and the fourth target feature to obtain second combined features output by all subtask networks in the multitask learning model;

and matching with the multimedia resources in the multimedia resource library according to the second combination characteristic to obtain candidate multimedia resources.

Optionally, in the method, the original multimedia push model is a multi-task learning model including an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks included in the multi-task learning model;

the processing the original dimension feature data through the original multimedia push model to obtain the output features of the original multimedia push model comprises the following steps: inputting the original dimension characteristic data into each expert network in the multi-task learning model to obtain a sixth target characteristic output by each expert network in the multi-task learning model;

The characteristics output based on the original multimedia push model and the characteristics output by the personalized model are matched with multimedia resources in a multimedia resource library, and candidate multimedia resources are obtained and include;

combining the characteristics output by the personalized model with the sixth target characteristics output by each expert network respectively to obtain third combined characteristics corresponding to each expert network;

inputting the third combined feature into a gating network in the multi-task learning model to obtain a seventh target feature;

and according to the seventh target characteristic, matching with the multimedia resources in the multimedia resource library to obtain candidate multimedia resources.

Optionally, in the method, the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models contained in the multi-head attention model;

the processing the original dimension feature data through the original multimedia push model to obtain the output features of the original multimedia push model comprises the following steps: inputting the original dimension characteristic data into each single-head attention model in the multi-head attention model to obtain a seventh target characteristic output by each single-head attention network in the multi-head attention model;

combining the characteristics output by the personalized model with seventh characteristics output by each single-head attention network in the multi-head attention model to obtain fourth combined characteristics corresponding to each single-head attention network;

and matching with the multimedia resources in the multimedia resource library according to the fourth combination characteristic to obtain candidate multimedia resources.

Optionally, in the method, in a case that the original multimedia push model is a multi-task learning model, different sub-task networks in the multi-task learning model use personalized models with different parameters to perform data processing.

According to a second aspect of the embodiments of the present disclosure, there is further provided a training method of a multimedia resource push model, including:

acquiring characteristic information of a training object, and generating an original training sample and a newly added training sample according to the characteristic information of the training object;

processing the newly added training sample through the personalized model to obtain the characteristics output by the personalized model, and processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model;

Combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics;

under the condition that the training object comprises a newly added account, matching the combination characteristic with multimedia resources in a multimedia resource library to obtain multimedia push information of the newly added account, wherein the newly added account comprises an account with a time length smaller than a first time length;

and comparing the multimedia pushing information of the newly added account with the characteristic information of the newly added account, and updating parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cut-off condition.

Optionally, in the method, in a case that the training object includes a new multimedia resource, after obtaining the combined feature, the method further includes:

inputting the combined features into the original multimedia pushing model, and matching the combined features with the features of the multimedia resources to obtain multimedia pushing information of the newly-added multimedia resources, wherein the newly-added multimedia resources comprise multimedia resources with the time length shorter than a second time length;

and comparing the multimedia pushing information of the newly-added multimedia resource with the characteristic information of the newly-added multimedia resource, and updating parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cut-off condition.

the processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises the following steps: respectively converting original training samples of all subtask networks in the multitask learning model into first training features with the same dimensionality as the newly added training samples of the personalized model;

combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics, wherein the step of obtaining the combined characteristics comprises the following steps:

inputting the first training characteristics of each subtask network into the personalized model to obtain second training characteristics output by each subtask network;

and respectively splicing the second training features output by each subtask network with the features output by the personalized model to obtain first training combination features of each subtask network.

The processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises the following steps: respectively inputting original training samples of all subtask networks in the multitask learning model into the personalized model to obtain one-dimensional training output characteristics;

combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics, wherein the step of obtaining the combined characteristics comprises the following steps: and carrying out weighted summation on the one-dimensional training output characteristics and the characteristics output by each sub-network model of the multi-task learning model to obtain second training combination characteristics.

Optionally, in the method, the original multimedia push model is a multitask learning model, and the personalized model is a convolutional neural network;

the processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises the following steps: training the multi-task learning model by using the original training sample to obtain third training characteristics of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks contained in the multi-task learning model;

Combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics, wherein the step of obtaining the combined characteristics comprises the following steps: combining the output characteristics of the personalized model with the third training characteristics of each expert network in the multi-task learning model to obtain third training combined characteristics corresponding to each expert network;

and inputting the third training combination characteristic into a gating network in the multi-task learning model to obtain a fourth training combination characteristic.

the processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises the following steps: inputting the original training sample into each single-head attention model in the multi-head attention model to obtain a fourth training characteristic output by each single-head attention network in the multi-head attention model;

Combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics, wherein the step of obtaining the combined characteristics comprises the following steps: and respectively combining the characteristics output by the personalized model with fourth training characteristics output by each single-head attention network in the multi-head attention model to obtain fifth combined characteristics corresponding to each single-head attention network.

According to a third aspect of the embodiments of the present disclosure, there is also provided a multimedia asset pushing device, including:

the feature generation module is used for acquiring feature information of the object to be processed and generating original dimension feature data and newly added dimension feature data of the object to be processed according to the feature information;

the feature processing module is used for inputting the original dimension feature data and the newly added dimension feature data into a multimedia pushing model, wherein the multimedia pushing model comprises an original multimedia pushing model and a personalized model, the original dimension feature data is processed through the original multimedia pushing model to obtain the feature output by the original multimedia pushing model, and the newly added dimension feature data is processed through the personalized model to obtain the feature output by the personalized model;

The combination matching module is used for matching with the multimedia resources in the multimedia resource library based on the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain candidate multimedia resources;

and the pushing resource determining module is used for determining target multimedia resources to be pushed from the candidate multimedia resources.

Optionally, in the device, the candidate multimedia resource includes a pushing value of a preset behavior, and the determining, from the candidate multimedia resource, the target multimedia resource to be pushed includes:

Optionally, in the device, the candidate multimedia resource includes a plurality of preset behaviors and push values corresponding to the preset behaviors, and determining, according to the push values of the preset behaviors, a target multimedia resource to be pushed from the candidate multimedia resource includes:

determining a target preset behavior in the preset behaviors;

Optionally, in the device, the object to be processed is a new account, the new account includes an account with a time length smaller than the first time length, and/or the multimedia resources in the multimedia resource library include new multimedia resources processed by the multimedia push model, and the new multimedia resources include multimedia resources with a time length smaller than the second time length.

Optionally, in the device, the matching between the characteristics output based on the original multimedia push model and the characteristics output based on the personalized model and multimedia resources in a multimedia resource library to obtain candidate multimedia resources includes:

Optionally, in the device, in a case that the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron, the combination matching module includes:

the first target feature unit is used for respectively converting the original dimension feature data of each subtask network in the multitask learning model into first target features with the same dimension as the newly added dimension feature data;

The second target feature unit is used for inputting the first target features of each subtask network into the personalized model to obtain second target features output by each subtask network;

the first combination unit is used for respectively splicing the second target features output by each subtask network with the features output by the personalized model to obtain first combination features of each subtask network;

the third target feature unit is used for inputting the first combined feature into each subtask network in the multitask learning model to obtain a third target feature;

and the first matching unit is used for matching with the multimedia resources in the multimedia resource library according to the third target characteristic to obtain candidate multimedia resources.

Optionally, in the device, in the case that the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron,

the feature processing module processes the original dimension feature data through the original multimedia push model, and the feature output by the original multimedia push model is obtained by the feature processing module comprises the following steps: inputting the original dimension characteristic data into each subtask network in the multitask learning model to obtain a fourth target characteristic output by each subtask network in the multitask learning model;

The combination matching module comprises:

a fifth target feature unit, configured to input original dimension feature data of each subtask network in the multitask learning model into the personalized model, to obtain a one-dimensional fifth target feature;

the second combination unit is used for carrying out weighted summation on the one-dimensional fifth target feature and the fourth target feature to obtain second combination features output by all subtask networks in the multitask learning model;

and the second matching unit is used for matching with the multimedia resources in the multimedia resource library according to the second combination characteristic to obtain candidate multimedia resources.

Optionally, in the device, when the original multimedia push model is a multi-task learning model including an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks included in the multi-task learning model;

the feature processing module processes the original dimension feature data through the original multimedia push model, and the feature output by the original multimedia push model is obtained by the feature processing module comprises the following steps: inputting the original dimension characteristic data into each expert network in the multi-task learning model to obtain a sixth target characteristic output by each expert network in the multi-task learning model;

The combination matching module comprises;

the third combination unit is used for respectively combining the characteristics output by the personalized model with the sixth target characteristics output by each expert network to obtain third combination characteristics corresponding to each expert network;

a seventh target feature unit, configured to input the third combined feature into a gating network in the multi-task learning model, to obtain a seventh target feature;

and the third matching unit is used for matching with the multimedia resources in the multimedia resource library according to the seventh target characteristic to obtain candidate multimedia resources.

Optionally, in the device, in the case that the original multimedia push model is a multi-head attention model and the personalized model is a convolutional neural network, the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models contained in the multi-head attention model respectively,

the feature processing module processes the original dimension feature data through the original multimedia push model, and the feature output by the original multimedia push model is obtained by the feature processing module comprises the following steps: inputting the original dimension characteristic data into each single-head attention model in the multi-head attention model to obtain a seventh target characteristic output by each single-head attention network in the multi-head attention model;

The combination matching module comprises;

the fourth combination unit is used for respectively combining the characteristics output by the personalized model with seventh characteristics output by each single-head attention network in the multi-head attention model to obtain fourth combination characteristics corresponding to each single-head attention network;

and the fourth matching unit is used for matching with the multimedia resources in the multimedia resource library according to the fourth combination characteristic to obtain candidate multimedia resources.

Optionally, in the device, in a case that the original multimedia push model is a multi-task learning model, different sub-task networks in the multi-task learning model use personalized models with different parameters to perform data processing.

According to a fourth aspect of embodiments of the present disclosure, there is further provided a training apparatus for a multimedia asset push model, the multimedia asset push model including an original multimedia push model and a personalized model, the apparatus comprising:

the training sample generation module is used for acquiring the characteristic information of the training object and generating an original training sample and a newly added training sample according to the characteristic information of the training object;

the training sample processing module is used for processing the newly added training samples through the personalized model to obtain the characteristics output by the personalized model, and processing the original training sample pairs through the original multimedia push model to obtain the characteristics output by the original multimedia push model;

The training feature processing module is used for combining the features output by the personalized model with the features output by the original multimedia push model to obtain combined features;

the account feature processing module is used for matching the combination features with multimedia resources in a multimedia resource library to obtain multimedia push information of the newly added account when the training object comprises the newly added account, wherein the newly added account comprises an account with a time length smaller than a first time length;

and the first parameter updating module is used for comparing the multimedia pushing information of the newly added account with the characteristic information of the newly added account, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cut-off condition.

Optionally, the apparatus further includes:

the multimedia resource feature processing module is used for inputting the combined features into the original multimedia pushing model and matching the combined features with the features of the multimedia resources to obtain multimedia pushing information of the new multimedia resources when the training object comprises the new multimedia resources, wherein the new multimedia resources comprise multimedia resources with the generation time length smaller than the second time length;

And the second parameter updating module is used for comparing the multimedia pushing information of the newly-added multimedia resource with the characteristic information of the newly-added multimedia resource, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cut-off condition.

Optionally, in the device, in a case that the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron;

the training sample processing module processes the original training sample pair through the original multimedia push model, and the obtaining of the characteristics output by the original multimedia push model comprises the following steps: respectively converting original training samples of all subtask networks in the multitask learning model into first training features with the same dimensionality as the newly added training samples of the personalized model;

the training feature processing module combines the features output by the personalized model with the features output by the original multimedia push model, and the obtaining of the combined features comprises the following steps:

the training sample processing module processes the original training sample pair through the original multimedia push model, and the obtaining of the characteristics output by the original multimedia push model comprises the following steps: respectively inputting original training samples of all subtask networks in the multitask learning model into the personalized model to obtain one-dimensional training output characteristics;

the training feature processing module combines the features output by the personalized model with the features output by the original multimedia push model, and the obtaining of the combined features comprises the following steps: and carrying out weighted summation on the one-dimensional training output characteristics and the characteristics output by each sub-network model of the multi-task learning model to obtain second training combination characteristics.

Optionally, in the device, in the case that the original multimedia push model is a multitask learning model and the personalized model is a convolutional neural network,

The training sample processing module processes the original training sample pair through the original multimedia push model, and the obtaining of the characteristics output by the original multimedia push model comprises the following steps: training the multi-task learning model by using the original training sample to obtain third training characteristics of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks contained in the multi-task learning model;

the training feature processing module combines the features output by the personalized model with the features output by the original multimedia push model, and the obtaining of the combined features comprises the following steps: combining the output characteristics of the personalized model with the third training characteristics of each expert network in the multi-task learning model to obtain third training combined characteristics corresponding to each expert network;

The training sample processing module processes the original training sample pair through the original multimedia push model, and the obtaining of the characteristics output by the original multimedia push model comprises the following steps: inputting the original training sample into each single-head attention model in the multi-head attention model to obtain a fourth training characteristic output by each single-head attention network in the multi-head attention model;

the training feature processing module combines the features output by the personalized model with the features output by the original multimedia push model, and the obtaining of the combined features comprises the following steps: and respectively combining the characteristics output by the personalized model with fourth training characteristics output by each single-head attention network in the multi-head attention model to obtain fifth combined characteristics corresponding to each single-head attention network.

In a fifth aspect of embodiments of the present disclosure, there is also provided a computer device, comprising:

At least one processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the method of any of the first and/or second aspects of the present disclosure.

A sixth aspect of embodiments of the present disclosure further provides a computer readable storage medium, which when executed by a processor of a computer device, causes the electronic device to perform the method of any one of the first and/or second aspects of the present disclosure.

A seventh aspect of the embodiments of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first and/or second aspects of the present disclosure.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

in the embodiment of the disclosure, multiple different dimensional characteristic data can be generated based on the characteristic information of the object to be processed and used as input of a multimedia push model. The original multimedia push model in the multimedia push model still uses the original dimension feature data generated based on the feature information to process, and meanwhile, another piece of newly added dimension feature data can be generated based on the feature information and can be used as input of a personalized model in the multimedia push model to process. Further, the personalized model output characteristics and the original multimedia recommendation model output characteristics can be combined and matched with multimedia resources in a multimedia resource library to obtain candidate multimedia resources, and then the target multimedia resources to be pushed are determined. The multimedia pushing model provided in the embodiment of the disclosure adopts the way that the personalized model is fused into the original multimedia recommendation model, and the characteristic data used by the personalized model is generated based on the original characteristic information, so that the characteristic information of the user is more fully utilized (trained or predicted), and the pushing effect of the whole multimedia pushing model is prompted. And the personalized model and the original multimedia push model are combined and then used on line, the original multimedia push model is still processed by using the original dimension characteristic data generated based on the characteristic information, so that the invasion of the personalized model to the original multimedia model is greatly reduced, and the iteration difficulty of the model is reduced or even not increased. The multimedia resource pushing is carried out by utilizing the multimedia pushing model provided by the disclosure, so that the problem that the prediction result of the multimedia pushing model always tends to learn the behavior of the old user (Martai effect) is effectively reduced, and the accuracy of the multimedia resource pushing is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.

Fig. 1 is an application environment diagram illustrating a multimedia asset pushing method according to an exemplary embodiment.

Fig. 2 is a flow chart illustrating a method of pushing multimedia assets according to an exemplary embodiment.

Fig. 3 is a flow chart illustrating a method of pushing multimedia assets according to an exemplary embodiment.

Fig. 4 is a flow chart illustrating a method of pushing multimedia assets according to an exemplary embodiment.

Fig. 5 is a flow chart illustrating a method of pushing multimedia assets according to an exemplary embodiment.

Fig. 6 is a flow chart illustrating a method of pushing multimedia assets according to an exemplary embodiment.

Fig. 7 is a flow chart illustrating a multimedia asset push model training method according to an exemplary embodiment.

Fig. 8 is a flow chart illustrating a multimedia asset push model training method according to an exemplary embodiment.

Fig. 9 is a flow chart illustrating a multimedia asset push model training method according to an exemplary embodiment.

Fig. 10 is a flow chart illustrating a multimedia asset push model training method according to an exemplary embodiment.

Fig. 11 is a flow chart illustrating a multimedia asset push model training method according to an exemplary embodiment.

Fig. 12 is a flow chart illustrating a multimedia asset push model training method according to an exemplary embodiment.

Fig. 13 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.

Fig. 14 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.

Fig. 15 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.

Fig. 16 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.

Fig. 17 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.

Fig. 18 is a block diagram of a training device of a multimedia asset push model, according to an example embodiment.

Fig. 19 is a block diagram of a training device for a multimedia asset push model, according to an example embodiment.

Fig. 20 is a schematic block diagram illustrating an internal structure of a computer device according to an exemplary embodiment.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, it is not excluded that additional identical or equivalent elements may be present in a process, method, article, or apparatus that comprises a described element. For example, if first, second, etc. words are used to indicate a name, but not any particular order.

It should be further noted that, information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) of a user or account, an object to be processed, a training object, etc. related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party. The multimedia described in this disclosure may be an integration of a variety of media, generally including one or more media forms of text, sound, images, video, animation, and the like. The multimedia push model of the embodiment of the present disclosure may be used for video push, and for convenience of description, in the following embodiments, the embodiment of the present disclosure is described with video push as an application scenario, but the embodiment of the present disclosure is not limited to the application scenario of video push.

To reduce the martai effect of the model, some solutions use a separate model alone for new users or new videos to deal with the problem of cold start of the push model. The cold start problem described in this disclosure may generally include business processes for business processes where the amount of business data is small (not meeting the requirements for normal business processes) or no business data. For example, a new user that is newly registered may be a cold start, because there is no historical data of the new user's viewing, praying, collection, etc. in the video application before how to match the new user's push video. Similarly, how the newly generated new video is more accurately described or classified, to which users the new video is suitable to push, etc., may also be a cold start for the new video. At present, aiming at the cold start problem of a new user or a new video, a technical scheme based on meta-learning can be used for processing, but the scheme generally only affects the initialization of the embedding or mapping, which is a mode of converting discrete variables into continuous vectors, and can be applied to the cold start of the new user and the new video with a small amount of sample characteristic information even without any data. However, sparse historical data exists in both a new user and a new video which are cold-started in an actual scene, service personnel usually ensure the accuracy of a model, generally use the historical data as far as possible under the condition of behavior sample characteristic information, and take the historical data as the sample characteristic information of the model, but the mode still cannot solve the Martai effect problem that the model tends to learn the behavior of an old user when a small amount of sample characteristic information exists. And the scheme of independently deploying the models of the new user or the new video requires that multiple sets of models, such as two sets of models of the new user (or the new video) and the old user, are deployed on line simultaneously, so that more demands are made on machine resources, and the iteration processing difficulty is higher.

Aiming at the cold start problem of a new user or a new video, the embodiment proposal provided by the present disclosure can construct a new user and/or a new video model in an original multimedia push model, and can enable each new user and/or each new video to have a separate personalized model for characterization. After the personalized model and the original multimedia push model are combined (can be regarded as a multimedia resource push model), the online use is carried out, so that only one model is needed to be deployed online, and the iteration difficulty of the model is rarely or even not increased. And the parameters of the personalized model can be obtained by more sufficient training based on the historical data of each user and video, and the personalized model is fused into the original multimedia push model fusion, so that the personalized model affects the output of the whole multimedia push model, and the problem that the final push model always tends to learn the behavior of the old user (Martai effect) is effectively reduced.

The method for pushing the multimedia assets provided by the disclosure can be applied to an application environment as shown in fig. 1. The server 120 may include a server for pushing multimedia resources, and may communicate with the terminal 110 to push the multimedia resources to the terminal 110. Of course, the servers (including the computer devices described below) described in the embodiments of the present disclosure may be single servers, server clusters, distributed subsystems, cloud processing platforms, servers containing blockchain nodes, and combinations thereof.

The following describes an implementation scenario of performing a multimedia resource pushing process by fusing a multimedia resource pushing model formed by adding a new user and/or a personalized model of a new video based on an original multimedia pushing model. It will be appreciated that the descriptions of the new user, the new video, the old user, the old video, etc. described in the embodiments of the present disclosure may be preset to determine whether the user is a new user or an old user and whether the video is a new video or an old video. For example, a user whose account registration time is less than one week from the training or predicted time of the multimedia push model may be defined as a new user. As described above, in the following embodiments, the embodiments of the present disclosure are described with video in a multimedia resource as an application scenario. Fig. 2 is a flow chart of a method for pushing multimedia assets, which may be implemented in the server 120 as shown in fig. 2, and may include the following steps according to an exemplary embodiment.

In step S20, feature information of an object to be processed is acquired, and original dimension feature data and newly added dimension feature data of the object to be processed are generated according to the feature information.

The object to be processed may be an account to which the multimedia resource is currently required to be pushed. In the model training stage, the training object can be an account or a multimedia resource. When the object to be processed is an account, the feature information of the object to be processed may include account identification, gender, age, geographic location, hobbies and interests, etc. Original dimension characteristic data and newly added dimension characteristic data can be generated according to characteristic information of an object to be processed. In the embodiment of the disclosure, the original dimension feature data and the newly added dimension feature data may be generated based on feature information and used for data processing in different models in different multimedia push models respectively.

The original dimension feature data can be expanded in a plurality of ways to obtain newly added dimension feature data. Such as new data obtained by using GAN (generation of an countermeasure network) or learning by embdding on the basis of original dimension feature data as newly added dimension feature data, etc. The method can realize the conversion of high-dimensional sparse original characteristic data into low-dimensional dense characteristic data, and the converted EMBedding characteristic data can be used as original or newly added dimensional characteristic data to participate in the training of the neural network. For example, in one example, feature information_fea1 such as identification, age, geographic location and the like of the user_user1 is acquired, and the feature information_fea1 is converted by the enabling device to generate 128-dimensional feature data_d1, where the feature data_d1 can be used as original dimension feature data of an original multimedia recommendation model. And based on the characteristic information of the user_use1, the characteristic information_Fea1 is converted into another 128-dimensional characteristic data_D2 through the ebedding, and the characteristic data_D2 can be used as newly added dimension characteristic data for a personalized model. Of course, the above is merely an example of generating the original dimension feature data and the newly added dimension feature data according to the feature information, and the specific processing may further include other processing steps or other processing manners such as deformation, transformation, and combination of the data.

In step S22, the original dimension feature data and the newly added dimension feature data are input into a multimedia push model, where the multimedia push model includes an original multimedia push model and a personalized model, the original dimension feature data are processed by the original multimedia push model to obtain features output by the original multimedia push model, and the newly added dimension feature data are processed by the personalized model to obtain features output by the personalized model.

In this embodiment, a pre-built multimedia push model may be used, where the push model may be obtained by combining an original multimedia push model with a newly generated personalized model. The original multimedia push model may include a multimedia asset push model that has been previously existing or used, such as a model that currently pushes video to an old user. The original multimedia push model can be a multi-task model or a single-task model, such as a pre-estimated model for judging only the probability of clicking by a user.

The personalized model may include a model used for selecting a new user or a new video in some embodiments of the present disclosure, and training a new training sample generated by using feature information used by an original multimedia push model. In the model training phase, a trained user or video may be referred to as a training object, and an original training sample and a newly added training sample may be generated based on feature information of the training object. The feature information of the training object may also include other data information, such as behavior data of the user on the pushed video, such as praise, attention, viewing duration, etc. Whether the training object is a user or a video, typically, the characteristic information acquired is typically the data information that is true of the user or video that was acquired.

After the original dimension feature data and the newly added dimension feature are obtained, the original dimension feature data can be processed through the original multimedia push model to obtain the feature output by the original multimedia push model, and the newly added dimension feature data can be processed through the personalized model to obtain the feature output by the personalized model.

In step S24, the candidate multimedia resources are obtained by matching with the multimedia resources in the multimedia resource library based on the features output by the original multimedia push model and the features output by the personalized model.

The characteristics output by the original multimedia push model and the characteristics output by the personalized model can be processed according to the model structure or other structures and parameter adjustment modes after the personalized model and the original push model are fused, for example, the characteristics output by the original multimedia push model and the characteristics output by the personalized model are combined into one or a group of new characteristics. And then matching the multimedia resources based on the new characteristics to obtain candidate multimedia resources.

In this way, when the multimedia resources are matched, the characteristics output by the original multimedia resource pushing model are not only used for matching, but also the characteristics output by the personalized model are fused, and the characteristics output by the original multimedia pushing model and the characteristics output by the personalized model are matched with the multimedia resources in the multimedia resource library, so that the subsequent multimedia resources with more tendency to the characteristics output by the personalized model can be obtained, the Martai effect generated when the original multimedia pushing model performs the multimedia resource matching only according to the original dimension characteristic data is effectively reduced, and the obtained candidate multimedia resources are more accurate.

In step S26, a target multimedia resource to be pushed is determined from the candidate multimedia resources.

The candidate multimedia assets typically include a plurality of multimedia assets. In some schemes of this embodiment, the candidate multimedia resources may be directly pushed to the user as target multimedia resources, or the target multimedia resources to be pushed may be determined from the subsequent multimedia resources according to other set manners, for example, the target multimedia resources are determined according to output results of other models, or the candidate multimedia resources are further screened by adopting other manners, so as to determine the target multimedia resources to be pushed.

In another implementation of the method provided by the present disclosure, the candidate multimedia resources include a pushing value of a preset behavior, and the determining, from the candidate multimedia resources, the target multimedia resources to be pushed includes:

s260: and determining the target multimedia resources to be pushed from the candidate multimedia resources according to the pushing value of the preset behavior.

In other implementations, the candidate multimedia asset may include a plurality of preset actions. Therefore, in another embodiment of the method provided by the present disclosure, the candidate multimedia resource includes a plurality of preset behaviors and push values corresponding to the preset behaviors, and determining, according to the push values of the preset behaviors, the target multimedia resource to be pushed from the candidate multimedia resource may include:

s2600: determining a target preset behavior in the preset behaviors;

s2602: obtaining a push value of the target preset behavior in the candidate multimedia resource;

s2604: and determining the target multimedia resources to be pushed from the candidate multimedia resources according to the pushing value of the target preset behavior.

In other embodiments of the method of the present disclosure, the matching the characteristics output by the original multimedia push model and the characteristics output by the personalized model with multimedia resources in a multimedia resource library to obtain candidate multimedia resources includes:

and matching the first combination characteristic with the multimedia resources in the multimedia resource library to obtain candidate multimedia resources.

The original dimension characteristic data is input into the original multimedia push model to obtain the characteristics output by the original multimedia push model, and the newly added dimension characteristic data is input into the personalized model to obtain the characteristics output by the personalized model. In this embodiment, the features output by the original multimedia push model and the features output by the personalized model may be combined to obtain a combined feature. Different combination modes can be set according to the type, structure and matching of multimedia resources of the original multimedia push model and/or the personalized model and/or push requirements, such as splicing N-dimensional features output by the personalized model with M-dimensional features output by the original multimedia push model to form (M+N) -dimensional combination features, or further weighting, normalizing, correcting and the like the features output by the original multimedia push model by utilizing N-dimensional features output by the personalized model. Thus, the combined features are fused with the features output by the personalized model, so that the Martai effect in the original multimedia push model is relieved, and the multimedia push accuracy of the whole multimedia push model for new users can be improved.

The personalized model in the multimedia push model provided by the present disclosure may include multiple types of models. For example, the personalized model may be an MLP model (MLP, multilayer Perceptron, multi-layer perceptron, also called artificial neural network), a CNN (Convolutional Neural Networks, CNN, model convolutional neural network), an RNN model (Recurrent Neural Network, RNN, recurrent neural network) or a transducer model. Taking an MLP model as an example, if 216-dimensional features are taken, the personalized model can be changed into three-layer MLP parameters, and each layer of input/output channels is 8, namely 216 dimensions (8×8+8) ×3=216 are shared. If the CNN model is a CNN model, if the 72-dimensional feature is taken, the CNN model can be changed into a layer 1-dimensional convolution layer parameter, the input channel and the output channel are both 8, and the convolution kernel size is 1, namely (8×1+1) ×8=72. Fig. 3 is a flow chart (not partially shown) of a multimedia asset pushing method according to an exemplary embodiment. Specifically, as shown in fig. 3, the original multimedia push model is a multi-task learning model, and the personalized model is a multi-layer perceptron;

S302: respectively converting original dimension characteristic data of each subtask network in the multitask learning model into first target characteristics with the same dimension as the newly added dimension characteristic data;

s304: inputting the first target characteristics of each subtask network into the personalized model to obtain second target characteristics output by each subtask network;

s306: respectively splicing the second target features output by each subtask network with the features output by the personalized model to obtain first combined features of each subtask network;

s308: and matching the first combination characteristic with the multimedia resources in the multimedia resource library to obtain candidate multimedia resources.

Fig. 3 is a flow chart (not partially shown) of a multimedia asset pushing method according to an exemplary embodiment. In another embodiment of the method provided by the present disclosure, the multimedia push model may be obtained by combining a personalized model into an original multimedia push model using an MLP model as the personalized model. Specifically, the matching between the characteristics output by the original multimedia push model and the characteristics output by the personalized model and the multimedia resources in the multimedia resource library to obtain candidate multimedia resources may include:

s308: inputting the first combined characteristic into each subtask network in the multitask learning model to obtain a third target characteristic;

s310: and matching with the multimedia resources in the multimedia resource library according to the third target characteristics to obtain candidate multimedia resources.

The original multimedia push model in this embodiment may be a multi-task learning model. The multitasking model typically includes a plurality of task networks, which can improve the learning efficiency and quality of each task by learning the associations and differences of the different tasks. The framework of the multi-task learning can be a shared-bottom structure, and hidden layers at the bottom are shared among different tasks, so that the risk of overfitting can be reduced. In this embodiment, the multimedia push model may use an MLP model (MLP, multilayer Perceptron, multi-layer perceptron) as a personalized model, and the MLP model may be added to a power (network or sub-network subtask) output by the multitasking model. Each network subtask may have corresponding input features and output features. The MLP corresponding to a network subtask is typically referred to as a power. There are typically several tasks in the multitasking model and several powers. In this embodiment, the input features (original dimension data) of each subtask network in the multitask learning model may be converted into first target features having dimensions identical to those of the input features (newly added dimension feature data) of the personalized model, and then the first target features of each subtask network are input into the personalized model, so as to obtain second target features of each subtask network. For example, a three-layer MLP with dimensions of 8 for the input and output features of the personalized model, then the total dimension of the personalized model is (8×8+8) ×3=216 dimensions. At this time, the input feature of the power may be transformed into a specific dimension (8 dimensions), and then input into a three-layer MLP composed of 216 dimensions of features, to obtain the second target feature of each subtask network. The personalized model can be obtained by training by using the newly added training sample. And inputting the first target characteristics of each subtask network into the personalized model to obtain the second target characteristics output by each subtask network. Further, the second target features output by the subtask networks are spliced with the features output by the personalized model respectively, so that the first combined features of the subtask networks are obtained.

And after the first combination feature is obtained, the third combination feature is used as a new input feature (third target feature) of each subtask network to be processed and matched with the multimedia resources in the multimedia resource library, so that candidate multimedia resources are obtained. In the implementation scenario, the 8-dimensional output features of the personalized model can be spliced to the first combined features obtained after three-layer MLP processing composed of 216-dimensional features, and then the first combined features are input into each power of the original multimedia recommendation model for operation processing. The stitching and combining described in the embodiments of the present disclosure include, but are not limited to, end-to-end connection between feature data, performing operations (addition, multiplication, etc.) on feature data at corresponding positions, performing feature vector operations, and the like. And e.g. splicing the 8-dimensional characteristics output by one power with the 8-dimensional characteristics output by the personalized model to obtain the 16-dimensional characteristics of the power.

In another embodiment of the multimedia push model, the input features of the power can be finally transformed into one-dimensional feature output through a three-layer MLP model. The output one-dimensional characteristics can be directly weighted and summed with the original power output to obtain the output result of the final push model. Fig. 4 is a flow chart illustrating a method of pushing multimedia assets according to an exemplary embodiment. As shown in fig. 4, the original multimedia push model is a multi-task learning model, and the personalized model is a multi-layer perceptron;

s402: the processing the original dimension feature data through the original multimedia push model to obtain the output features of the original multimedia push model comprises the following steps:

inputting the original dimension characteristic data into each subtask network in the multitask learning model to obtain a fourth target characteristic output by each subtask network in the multitask learning model;

s404: respectively inputting the original dimension characteristic data of each subtask network in the multitask learning model into the personalized model to obtain a one-dimensional fifth target characteristic;

S406: carrying out weighted summation on the one-dimensional fifth target feature and the fourth target feature to obtain second combined features output by all subtask networks in the multitask learning model;

s408: and matching with the multimedia resources in the multimedia resource library according to the second combination characteristic to obtain candidate multimedia resources.

The above embodiment provides another way of combining the MLP personalized model into each power in the multitask model, and the MLP personalized model can be effectively fused into the original multimedia push model, so that the parameters of the personalized model can enable the feature information of the newly added account or the newly added multimedia resource to be fully trained, the whole multimedia push model can process by combining the output features of the personalized model to reduce the martai effect, and the complexity of combining the personalized model and the original multimedia model can be reduced, and the demand on resources and the model calculation complexity are reduced.

It should be noted that, in the embodiment of the present disclosure, the combination or fusion of the features or models involved in the application and training phases of the multimedia push model is not limited to the connection manner between the models, and may include mutual operations of input data, output data, intermediate results and the like between different models. The processing of the weighted sum result as the feature of the multimedia push model for matching the multimedia resources is also one of the implementation manners of combining the original multimedia push model and the personalized model to obtain the multimedia push model according to some embodiments of the disclosure.

The disclosure also provides another implementation mode for obtaining candidate multimedia resources by utilizing the multimedia pushing model to perform multimedia resource matching. In some embodiments, a CNN model is used in the multimedia push model as the personalization model. Fig. 5 is a flow chart (not partially shown) of a multimedia asset pushing method according to an exemplary embodiment. As shown in fig. 5, the original multimedia push model is a multi-task learning model including an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks included in the multi-task learning model;

s502: the processing the original dimension feature data through the original multimedia push model to obtain the output features of the original multimedia push model comprises the following steps: inputting the original dimension characteristic data into each expert network in the multi-task learning model to obtain a sixth target characteristic output by each expert network in the multi-task learning model;

S504: combining the characteristics output by the personalized model with the sixth target characteristics output by each expert network respectively to obtain third combined characteristics corresponding to each expert network;

s506: inputting the third combined feature into a gating network in the multi-task learning model to obtain a seventh target feature;

s508: and according to the seventh target characteristic, matching with the multimedia resources in the multimedia resource library to obtain candidate multimedia resources.

In the case that the original multimedia push model may be a multi-task learning model, such as MMoE (Modeling Task Relationships in Multi-task Learning with Multi-gate, a multi-gate controlled hybrid expert network, a multi-task learning model), and the personalized model is a convolutional neural network, respectively combining output features of the personalized model with sixth target features output by each expert network in the multi-task learning model MMoE to obtain third combined features corresponding to each expert network, where the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks included in the multi-task learning model MMoE;

A general multitasking learning model is to share hidden layers close to the input layer as a whole. MMoE is based on an expert network, which is adapted to multitasking learning by sharing sub-models of the expert network among tasks. MMoE can divide the shared underlying presentation layer into multiple expertise, while gate is set so that different tasks can use the shared layer in a diversified way. Taking an MMoE with 8 experiments and an 8 task model as an example, in the conventional method, the output of the 8 experiments is combined through a gating network. However, the parameters of the gating network are the same for different video IDs, and the personalization cannot be realized, so that the parameters of the gating network are better learned for old users with more behavior data, and new users and new videos with less behaviors are ignored, thereby bringing the Martai effect. In the scheme provided by the disclosure, the number of the input channels and the output channels of the personalized model of the CNN model is the same, for example, the number of the one-dimensional convolution layers is 8, and the convolution kernel size is 1. Based on the convolution layer, the output characteristics of eight experiments and the characteristics of the newly added account and/or the newly added multimedia resource can be combined in a personalized way, and after being combined, the combination is processed by a gating network. Therefore, each newly added account and/or newly added multimedia resource has different experiment combination modes, on one hand, the Martai effect can be relieved, and the newly added account and/or newly added multimedia resource can learn the experiment combination modes suitable for the user. On the other hand, through the output fusion of a plurality of experiments, information can be transmitted among the experiments, and the situation that some experiments are degraded into noise and zero output is avoided.

The present disclosure also provides another embodiment of using a CNN model as a personalized model and adding to an original multimedia push model to generate a multimedia push model for pushing multimedia resources. Fig. 6 is a flow chart (not partially shown) of a multimedia asset pushing method according to an exemplary embodiment. As shown in fig. 6, in this embodiment, the original multimedia push model may be a multi-task learning model including a multi-head attention model, where the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models included in the multi-head attention model, respectively;

s602: the processing the original dimension feature data through the original multimedia push model to obtain the output features of the original multimedia push model comprises the following steps: inputting the original dimension characteristic data into each single-head attention model in the multi-head attention model to obtain a seventh target characteristic output by each single-head attention network in the multi-head attention model;

S604: combining the characteristics output by the personalized model with seventh characteristics output by each single-head attention network in the multi-head attention model to obtain fourth combined characteristics corresponding to each single-head attention network;

s606: and matching with the multimedia resources in the multimedia resource library according to the fourth combination characteristic to obtain candidate multimedia resources.

In the embodiment of the disclosure, a CNN model may be added to Multi-head Attention model. Taking Multi-head Attention with 8 heads as an example, attention weights obtained by 8 heads in the traditional method are independent, and in practical application, a situation that a plurality of heads are degenerated into a sub-pool (a pooling layer in a neural network) often exists. According to the scheme, the personalized multimedia resource pushing of the multimedia pushing model can be realized while the attribute degradation can be effectively avoided by combining the attribute weights of different heads based on one-dimensional convolution layers with the number of input and output channels of 8, so that the Martai effect is reduced, the complexity of the processing of the multimedia pushing model can be reduced, and the efficiency of the multimedia pushing model in the processing of the multimedia resource pushing is improved.

In each embodiment of the above method, in the case that the original multimedia push model is a multi-task learning model, different sub-task networks in the multi-task learning model may use personalized models with different parameters to perform data processing. If the original multimedia push model is a multi-task learning model, different powers (tasks or network subtasks) in the multi-task learning model can use personalized models with the same parameters. In a specific example scenario, only two personalized models based on video ID and user ID may be used, or more personalized models based on other video and user feature parameters may be used. Each personalized model can be combined with an original multimedia push model of an original recommendation model to form a multimedia push model, such as a new user multimedia push model or a new video multimedia push model is obtained, or a new user and a new video multimedia push model can be simultaneously aimed at. Different subtask networks in the multi-task learning model can use personalized models with different parameters to process data, so that each task has different personalized characteristic output, the Martai effect of the whole multi-media pushing model can be further reduced, personalized pushing of multi-media resource pushing can be realized, and the user multi-media resource pushing experience is improved.

Based on the foregoing description of the embodiments of the multimedia resource pushing method, the present disclosure further provides a training method of the multimedia resource pushing model. Specifically, fig. 7 is a flowchart illustrating a training method of a multimedia asset push model according to an exemplary embodiment. As shown in fig. 7, a training method of a multimedia resource pushing model, where the multimedia resource pushing model includes an original multimedia pushing model and a personalized model, includes:

s702: acquiring characteristic information of a training object, and generating an original training sample and a newly added training sample according to the characteristic information of the training object;

s704: processing the newly added training sample through the personalized model to obtain the characteristics output by the personalized model, and processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model;

s706: combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics;

s708: under the condition that the training object comprises a newly added account, matching the combination characteristic with multimedia resources in a multimedia resource library to obtain multimedia push information of the newly added account, wherein the newly added account comprises an account with a time length smaller than a first time length;

S710: and comparing the multimedia pushing information of the newly added account with the characteristic information of the newly added account, and updating parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cut-off condition.

The multimedia resource pushing model obtained by the multimedia resource pushing model training method provided by the embodiment of the disclosure can be applied to an implementation scene comprising the multimedia resource pushing method. The training method of the multimedia resource pushing model mainly comprises the steps of constructing a personalized model through a newly added training sample, and training the original multimedia pushing model and the personalized model through the newly added training sample and an original training sample, wherein the newly added training sample is generated based on characteristic information of the original training sample. The personalized model is fused into the original multimedia push model, and specific fusion can comprise adjustment of model structure and parameters, different combination processing of input and output characteristic data of each model and the like. The training object may include a newly added account or may include a newly added multimedia resource. In general, feature information of a real training object may be used, for example, a preset behavior that a certain newly added account acquired by collection actually praise a certain video may be used as feature information of the training object as the newly added account. In the model training process, the multimedia push information output by the model can be obtained, for example, the probability of praying the video by the newly added account is predicted, then the multimedia push information output by the model is compared with the preset behavior actually happening by the newly added account, and further the network parameters of the whole multimedia push model are updated according to the comparison result.

In the practical model training, user characteristics, video characteristics, user historical behaviors (such as praise, watching duration and other behaviors of a certain or a certain video pushing resource) and the like can be taken as input, and the score of a certain behavior, such as praise possibility, of the user on the video can be estimated. And comparing the estimated score with the actual behaviors of the user in the training sample, such as calculating a BCE loss function (a loss function), further updating the network parameters of the push model, and optimizing the push model.

According to the training method for the multimedia push model, provided by the embodiment of the disclosure, the personalized model of the new user can be constructed in the original multimedia push model, and each new user can be characterized by an independent personalized model. After the personalized model and the original multimedia push model are combined (can be regarded as a multimedia resource push model), the online use is carried out, so that only one model is needed to be deployed online, and the iteration difficulty of the model is rarely or even not increased. And the parameters of the personalized model can be obtained by more sufficient training based on the characteristic information of each user, and the personalized model is fused into the original multimedia push model fusion, so that the personalized model affects the output of the whole multimedia push model, and the problem that the original multimedia push model always tends to learn the behavior of the old user (Martai effect) is effectively reduced.

Fig. 8 is a flow chart illustrating a training method of a multimedia asset push model according to an exemplary embodiment. As shown in fig. 8, in the case that the training object includes a new multimedia resource, after obtaining the combined feature, the method may further include:

s802: inputting the combined features into the original multimedia pushing model, and matching the combined features with the features of the multimedia resources to obtain multimedia pushing information of the newly-added multimedia resources, wherein the newly-added multimedia resources comprise multimedia resources with the time length shorter than a second time length;

s804: and comparing the multimedia pushing information of the newly-added multimedia resource with the characteristic information of the newly-added multimedia resource, and updating parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cut-off condition.

According to the training method for the multimedia push model provided by the embodiment of the disclosure, the personalized model of the newly added multimedia (such as a new video) can be constructed in the original multimedia push model, and each newly added multimedia can be characterized by an independent personalized model. After the personalized model and the original multimedia push model are combined (can be regarded as a multimedia resource push model), the online use is carried out, so that only one model is needed to be deployed online, and the iteration difficulty of the model is rarely or even not increased. And the parameters of the personalized model can be obtained by more sufficient training based on the characteristic information of each newly added multimedia, and the personalized model is fused into the fusion of the original multimedia push model, so that the personalized model affects the output of the whole multimedia push model, and the Martai effect of the original multimedia push model is effectively reduced.

Fig. 9 is a flow chart (not shown in part) of a training method of a multimedia asset push model according to an exemplary embodiment. As shown in fig. 9, the original multimedia push model is a multi-task learning model, and the personalized model is a multi-layer perceptron;

s902: the processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises the following steps: respectively converting original training samples of all subtask networks in the multitask learning model into first training features with the same dimensionality as the newly added training samples of the personalized model;

s904: inputting the first training characteristics of each subtask network into the personalized model to obtain second training characteristics output by each subtask network;

s906: and respectively splicing the second training features output by each subtask network with the features output by the personalized model to obtain first training combination features of each subtask network.

In the multimedia push model training process, the characteristics output by the personalized model and the characteristics output by the original multimedia push model are combined, and the network model training is performed based on the combined characteristics, so that the Martai effect in the original multimedia push model is relieved, and the multimedia push accuracy of the whole multimedia push model for new users can be improved.

Fig. 10 is a flow chart (not shown in part) of a training method of a multimedia asset push model according to an exemplary embodiment. As shown in fig. 10, the original multimedia push model is a multi-task learning model, and the personalized model is a multi-layer perceptron;

s1002: the processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises the following steps: respectively inputting original training samples of all subtask networks in the multitask learning model into the personalized model to obtain one-dimensional training output characteristics;

s1004: combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics, wherein the step of obtaining the combined characteristics comprises the following steps: and carrying out weighted summation on the one-dimensional training output characteristics and the characteristics output by each sub-network model of the multi-task learning model to obtain second training combination characteristics.

Fig. 11 is a flow chart (not shown in part) of a training method of a multimedia asset push model according to an exemplary embodiment. As shown in fig. 11, the original multimedia push model is a multi-task learning model, and the personalized model is a multi-layer perceptron;

s1102: the processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises the following steps: training the multi-task learning model by using the original training sample to obtain third training characteristics of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks contained in the multi-task learning model;

s1104: combining the output characteristics of the personalized model with the third training characteristics of each expert network in the multi-task learning model to obtain third training combined characteristics corresponding to each expert network;

s1106: and inputting the third training combination characteristic into a gating network in the multi-task learning model to obtain a fourth training combination characteristic.

In this embodiment, when the original multimedia push model may be a multi-task learning model, such as MMoE (the personalized model is a convolutional neural network), based on the convolutional layer, the output features of multiple experiments and the features of the newly added accounts and/or the newly added multimedia resources may be personalized and combined, and then processed by a gating network

Fig. 12 is a flow chart illustrating a training method of a multimedia asset push model according to an exemplary embodiment. As shown in fig. 12, the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models contained in the multi-head attention model;

s1202: the processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises the following steps: inputting the original training sample into each single-head attention model in the multi-head attention model to obtain a fourth training characteristic output by each single-head attention network in the multi-head attention model;

s1204: combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics, wherein the step of obtaining the combined characteristics comprises the following steps: and respectively combining the characteristics output by the personalized model with fourth training characteristics output by each single-head attention network in the multi-head attention model to obtain fifth combined characteristics corresponding to each single-head attention network.

In the scheme of the embodiment of the disclosure, a CNN model can be added into Multi-head Attention models, and the Attention weights of different heads are combined based on one-dimensional convolution layers with the same number as that of single-head Attention models contained in the Multi-head Attention models, so that personalized multimedia resource pushing of the multimedia pushing model can be realized while the Attention degradation can be effectively avoided, the Martai effect is reduced, the complexity of processing of the multimedia pushing model can be reduced, and the efficiency of the multimedia pushing model for pushing the multimedia resource is provided.

As described above, in another embodiment of the training method of a multimedia push model, in the case where the original multimedia push model is a multi-task learning model, different sub-task networks in the multi-task learning model use personalized models with different parameters for data processing. Different subtask networks in the multi-task learning model can use personalized models with different parameters to process data, so that each task has different personalized characteristic output, the Martai effect of the whole multi-media pushing model can be further reduced, personalized pushing of multi-media resource pushing can be realized, and the user multi-media resource pushing experience is improved.

In the above embodiment of the method for training the multimedia push model, the specific manner of performing the same or similar operations as the multimedia resource push method during online application is described in detail in the embodiment related to the method, which will not be described in detail herein.

It should be understood that, in the present specification, each embodiment of the method is described in a progressive manner, and the same/similar parts of each embodiment are referred to each other, where each embodiment focuses on a difference from other embodiments. For relevance, reference should be made to the description of other method embodiments.

It should be understood that, although the steps in the flowcharts referred to in the drawings are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps in the figures may include a plurality of steps or stages that are not necessarily performed at the same time, but may be performed at different times, and the order in which the steps or stages are performed is not necessarily sequential, but may be performed in rotation or alternatively with at least a portion of the steps or stages of other steps or other steps.

Based on the description of the embodiments of the multimedia resource pushing method and the training method of the multimedia resource pushing model, the disclosure further provides a multimedia resource pushing device and a training device of the multimedia resource pushing model. The apparatus may comprise a system (including a distributed system), software (applications), modules, components, servers, clients, etc. that employ the methods described in the embodiments of the present specification in combination with the necessary apparatus to implement the hardware. Based on the same innovative concepts, embodiments of the present disclosure provide for devices in one or more embodiments as described in the following examples. Because the implementation scheme and the method for solving the problem by the device are similar, the implementation of the device in the embodiment of the present disclosure may refer to the implementation of the foregoing method, and the repetition is not repeated. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

Fig. 13 is a block diagram of a multimedia asset pushing device according to an exemplary embodiment. The apparatus may be the aforementioned server, or a module, component, device, unit, etc. integrated with the server. Referring specifically to fig. 13, the apparatus 100 may include:

The feature generation module 1302 may be configured to obtain feature information of an object to be processed, and generate original dimension feature data and newly added dimension feature data of the object to be processed according to the feature information;

the feature processing module 1304 may be configured to input the original dimension feature data and the newly added dimension feature data into a multimedia push model, where the multimedia push model includes an original multimedia push model and a personalized model, process the original dimension feature data through the original multimedia push model to obtain features output by the original multimedia push model, and process the newly added dimension feature data through the personalized model to obtain features output by the personalized model;

the combination matching module 1306 may be configured to match with a multimedia resource in a multimedia resource library based on the feature output by the original multimedia push model and the feature output by the personalized model, to obtain a candidate multimedia resource;

the pushing resource determining module 1308 may be configured to determine a target multimedia resource to be pushed from the candidate multimedia resources.

In another embodiment of the apparatus provided by the present disclosure, the candidate multimedia resources include a push value of a preset behavior, and the determining, from the candidate multimedia resources, a target multimedia resource to be pushed includes:

In another embodiment of the apparatus provided by the present disclosure, the candidate multimedia resources include a plurality of preset behaviors and push values corresponding to the preset behaviors, and determining, according to the push values of the preset behaviors, a target multimedia resource to be pushed from the candidate multimedia resources includes:

determining a target preset behavior in the preset behaviors;

In another embodiment of the device provided by the disclosure, the object to be processed is a new account, the new account includes an account with a time length smaller than a first time length, and/or the multimedia resources in the multimedia resource library include new multimedia resources processed by the multimedia push model, and the new multimedia resources include multimedia resources with a time length smaller than a second time length.

In another embodiment of the apparatus provided by the present disclosure, the matching, based on the characteristics output by the original multimedia push model and the characteristics output by the personalized model, with multimedia resources in a multimedia resource library to obtain candidate multimedia resources includes:

Fig. 14 is a block diagram of a multimedia asset pushing device (partially not shown) according to an exemplary embodiment. Referring to fig. 14, in the case where the original multimedia push model is a multitasking learning model and the personalized model is a multi-layered perceptron, the combination matching module 1306 may include:

a first target feature unit 1402, configured to convert original dimension feature data of each subtask network in the multitask learning model into first target features having the same dimension as the newly added dimension feature data;

A second target feature unit 1404, configured to input the first target feature of each subtask network into the personalized model, to obtain a second target feature output by each subtask network;

the first combination unit 1406 may be configured to splice the second target features output by each subtask network and the features output by the personalized model, to obtain first combination features of each subtask network;

a third target feature unit 1408, configured to input the first combined feature into each sub-task network in the multi-task learning model to obtain a third target feature;

the first matching unit 1410 may be configured to match with a multimedia resource in the multimedia resource library according to the third target feature, so as to obtain a candidate multimedia resource.

An exemplary embodiment is shown in fig. 15, and fig. 15 is a block diagram (partially not shown) of a multimedia asset pushing device according to an exemplary embodiment. Referring to fig. 15, in the case where the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron, the feature processing module 1304 processes the original dimension feature data through the original multimedia push model, and obtaining the feature output by the original multimedia push model includes: inputting the original dimension characteristic data into each subtask network in the multitask learning model to obtain a fourth target characteristic output by each subtask network in the multitask learning model;

The combination matching module 1306 includes:

a fifth target feature unit 1502, configured to input, to the personalized model, original dimension feature data of each subtask network in the multitask learning model, to obtain a one-dimensional fifth target feature;

a second combining unit 1504, configured to perform weighted summation on the one-dimensional fifth target feature and the fourth target feature to obtain second combined features output by all subtask networks in the multitask learning model;

the second matching unit 1506 may be configured to match the multimedia resources in the multimedia resource library according to the second combination feature to obtain candidate multimedia resources.

An exemplary embodiment is shown in fig. 16, and fig. 16 is a block diagram of a multimedia asset pushing device (partially not shown) according to an exemplary embodiment. Referring to fig. 16, in the case that the original multimedia push model is a multi-task learning model including an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks included in the multi-task learning model;

The feature processing module 1304 processes the original dimension feature data through the original multimedia push model, and obtaining the feature output by the original multimedia push model includes: inputting the original dimension characteristic data into each expert network in the multi-task learning model to obtain a sixth target characteristic output by each expert network in the multi-task learning model;

the combination matching module 1306 includes;

the third combination unit 1602 may be configured to combine the features output by the personalized model with the sixth target features output by the expert networks, to obtain third combined features corresponding to the expert networks;

a seventh target feature unit 1604, configured to input the third combined feature into a gating network in the multi-task learning model to obtain a seventh target feature;

the third matching unit 1606 may be configured to match with the multimedia resources in the multimedia resource library according to the seventh target feature, so as to obtain candidate multimedia resources.

An exemplary embodiment is shown in fig. 17, and fig. 17 is a block diagram of a multimedia asset delivery device (partially not shown) according to an exemplary embodiment. Referring to fig. 17, in the case where the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network in which the number of input channels and output channels is the same as the number of single-head attention models included in the multi-head attention model,

The feature processing module 1304 processes the original dimension feature data through the original multimedia push model, and obtaining the feature output by the original multimedia push model includes: inputting the original dimension characteristic data into each single-head attention model in the multi-head attention model to obtain a seventh target characteristic output by each single-head attention network in the multi-head attention model;

the combination matching module 1306 includes;

the fourth combination unit 1702 may be configured to combine the features output by the personalized model with seventh features output by each single-head attention network in the multi-head attention model to obtain fourth combination features corresponding to each single-head attention network;

and a fourth matching unit 1704, configured to match the multimedia resources in the multimedia resource library according to the fourth combination feature, to obtain candidate multimedia resources.

In another embodiment of the apparatus provided by the present disclosure, in a case where the original multimedia push model is a multi-task learning model, different sub-task networks in the multi-task learning model use personalized models of different parameters for data processing.

Exemplary embodiment fig. 18 is a block diagram of a training apparatus of a multimedia asset push model according to an exemplary embodiment. The apparatus may be the aforementioned server, or a module, component, device, unit, etc. integrated with the server. Referring specifically to fig. 18, the multimedia resource pushing model includes an original multimedia pushing model and a personalized model, and the apparatus 200 may include:

the training sample generation module 1802 may be configured to obtain feature information of a training object, and generate an original training sample and a newly added training sample according to the feature information of the training object;

the training sample processing module 1804 may be configured to process the newly added training sample through the personalized model to obtain a feature output by the personalized model, and process the original training sample through the original multimedia push model to obtain a feature output by the original multimedia push model;

the training feature processing module 1806 may be configured to combine the features output by the personalized model with the features output by the original multimedia push model to obtain combined features;

The account feature processing module 1808 may be configured to, when the training object includes a new account, match the combination feature with a multimedia resource in a multimedia resource library to obtain multimedia push information of the new account, where the new account includes an account with a time duration less than a first time duration;

the first parameter updating module 1810 may be configured to compare the multimedia pushing information of the new account with the feature information of the new account, and update parameters of the multimedia resource pushing model according to a comparison result until the comparison result meets a model training deadline condition.

An exemplary embodiment is shown in fig. 19, and fig. 19 is a block diagram of a training apparatus of a multimedia asset push model according to an exemplary embodiment. Referring to fig. 19, the apparatus 200 may further include:

the multimedia resource feature processing module 1902 may be configured to, when the training object includes a new multimedia resource, input the combined feature into the original multimedia push model, and match the combined feature with a feature of the multimedia resource to obtain multimedia push information of the new multimedia resource, where the new multimedia resource includes a multimedia resource that generates a duration less than a second duration;

The second parameter updating module 1904 may be configured to compare the multimedia pushing information of the newly added multimedia resource with the feature information of the newly added multimedia resource, and update parameters of the multimedia resource pushing model according to a comparison result until the comparison result meets a model training cutoff condition.

In another embodiment of the apparatus provided by the present disclosure, in a case where the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron;

the training sample processing module 1804 processes the original training sample pair through the original multimedia push model, and the obtaining the feature output by the original multimedia push model includes: respectively converting original training samples of all subtask networks in the multitask learning model into first training features with the same dimensionality as the newly added training samples of the personalized model;

the training feature processing module 1806 combines the features output by the personalized model with the features output by the original multimedia push model, and the obtaining the combined features includes:

the training sample processing module 1804 processes the original training sample pair through the original multimedia push model, and the obtaining the feature output by the original multimedia push model includes: respectively inputting original training samples of all subtask networks in the multitask learning model into the personalized model to obtain one-dimensional training output characteristics;

the training feature processing module 1806 combines the features output by the personalized model with the features output by the original multimedia push model, and the obtaining the combined features includes: and carrying out weighted summation on the one-dimensional training output characteristics and the characteristics output by each sub-network model of the multi-task learning model to obtain second training combination characteristics.

In another embodiment of the apparatus provided by the present disclosure, in the case where the original multimedia push model is a multitasking learning model, the personalized model is a convolutional neural network,

The training sample processing module 1804 processes the original training sample pair through the original multimedia push model, and the obtaining the feature output by the original multimedia push model includes: training the multi-task learning model by using the original training sample to obtain third training characteristics of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks contained in the multi-task learning model;

the training feature processing module 1806 combines the features output by the personalized model with the features output by the original multimedia push model, and the obtaining the combined features includes: combining the output characteristics of the personalized model with the third training characteristics of each expert network in the multi-task learning model to obtain third training combined characteristics corresponding to each expert network;

In another embodiment of the apparatus provided by the present disclosure, in the case that the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models included in the multi-head attention model,

the training sample processing module 1804 processes the original training sample pair through the original multimedia push model, and the obtaining the feature output by the original multimedia push model includes: inputting the original training sample into each single-head attention model in the multi-head attention model to obtain a fourth training characteristic output by each single-head attention network in the multi-head attention model;

the training feature processing module 1806 combines the features output by the personalized model with the features output by the original multimedia push model, and the obtaining the combined features includes: and respectively combining the characteristics output by the personalized model with fourth training characteristics output by each single-head attention network in the multi-head attention model to obtain fifth combined characteristics corresponding to each single-head attention network.

The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

It should be understood that the same/similar parts of the embodiments of the method and apparatus described above may be referred to each other, and each embodiment focuses on differences from other embodiments, and relevant points may be referred to in descriptions of other method embodiments.

Fig. 20 is a schematic block diagram showing an internal structure of a computer device S00 according to an exemplary embodiment. For example, device S00 may be a server. Referring to fig. 20, device S00 includes a processing component S20 that further includes one or more processors, and memory resources represented by memory S22, for storing instructions, such as applications, executable by processing component S20. The application program stored in the memory S22 may include one or more modules each corresponding to a set of instructions. Further, the processing component S20 is configured to execute instructions to perform the above-described method multimedia asset push method and/or the training method of the multimedia asset push model.

Device S00 can also include a power component S24 configured to perform power management of device S00, a wired or wireless network interface S26 configured to connect device S00 to a network, and an input/output (I/O) interface S28. Device S00 may operate based on an operating system stored in memory S22, such as Windows Server, mac OS X, unix, linux, freeBSD, or the like.

In an exemplary embodiment, a computer readable storage medium comprising instructions, such as a memory S22 comprising instructions, is also provided, the instructions being executable by a processor of the device S00 to perform the above-described multimedia resource pushing method and/or training method of the multimedia resource pushing model. The storage medium may be a computer readable storage medium, which may be, for example, ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, graphene storage device, etc.

In an exemplary embodiment, a computer program product is also provided, comprising instructions therein, which are executable by a processor of the electronic device S00 to perform the multimedia asset push method and/or the training method of the multimedia asset push model.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a hardware+program class embodiment, the description is relatively simple, as it is substantially similar to the method embodiment, as relevant see the partial description of the method embodiment.

It should be noted that the descriptions of the apparatus, the computer device, the server, and the like according to the method embodiments may further include other implementations, and specific implementations may refer to descriptions of related method embodiments. Meanwhile, new embodiments formed by combining features of the embodiments of the method, the device, the equipment and the server still fall within the implementation scope covered by the disclosure, and are not described in detail herein.

For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, when one or more of the present description is implemented, the functions of each module may be implemented in the same piece or pieces of software and/or hardware, or a module that implements the same function may be implemented by a plurality of sub-modules or a combination of sub-units, or the like. The above-described apparatus embodiments are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling, communication connection, etc. of the illustrated or described devices or units to each other may be implemented in a direct and/or indirect coupling/connection manner, and may be implemented in an electrical, mechanical or other form by some standard or custom interface, protocol, etc.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any adaptations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof.

Claims

1. A method for pushing multimedia resources, comprising:

determining a target multimedia resource to be pushed from the candidate multimedia resources;

in the foregoing, the object to be processed is a new account, the new account includes an account with a time length less than the first time length, and/or the multimedia resources in the multimedia resource library include new multimedia resources processed by the multimedia push model, and the new multimedia resources include multimedia resources with a time length less than the second time length.

2. The method according to claim 1, wherein the candidate multimedia resources include push values of preset behaviors, and the determining the target multimedia resource to be pushed from the candidate multimedia resources includes:

3. The method according to claim 2, wherein the candidate multimedia resources include a plurality of preset behaviors and push values corresponding to the preset behaviors, and the determining, from the candidate multimedia resources, the target multimedia resources to be pushed according to the push values of the preset behaviors includes:

Determining a target preset behavior in the preset behaviors;

4. The method according to claim 1, wherein the matching the multimedia resources in the multimedia resource library based on the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain candidate multimedia resources includes:

5. The method of claim 1, wherein the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron;

6. The method of claim 1, wherein the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron;

7. The method according to claim 1, wherein the original multimedia push model is a multi-task learning model comprising an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks included in the multi-task learning model;

8. The method of claim 1, wherein the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models contained in the multi-head attention model;

9. The method according to claim 1, wherein in case the original multimedia push model is a multi-tasking learning model, different sub-tasking networks in the multi-tasking learning model use personalized models of different parameters for data processing.

10. A training method of a multimedia resource push model, wherein the multimedia resource push model comprises an original multimedia push model and a personalized model, the method comprising:

11. The method of claim 10, wherein in the case where the training object comprises a newly added multimedia asset, after deriving the combined feature, the method further comprises:

12. The method of claim 10, wherein the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron;

the processing the original training sample through the original multimedia push model to obtain the output characteristics of the original multimedia push model comprises the following steps: respectively converting original training samples of all subtask networks in the multitask learning model into first training features with the same dimensionality as the newly added training samples of the personalized model;

13. The method of claim 10, wherein the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron;

the processing the original training sample through the original multimedia push model to obtain the output characteristics of the original multimedia push model comprises the following steps: respectively inputting original training samples of all subtask networks in the multitask learning model into the personalized model to obtain one-dimensional training output characteristics;

14. The method of claim 10, wherein the original multimedia push model is a multitasking learning model and the personalized model is a convolutional neural network;

the processing the original training sample through the original multimedia push model to obtain the output characteristics of the original multimedia push model comprises the following steps: training the multi-task learning model by using the original training sample to obtain third training characteristics of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks contained in the multi-task learning model;

15. The method of claim 10, wherein the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models contained in the multi-head attention model;

the processing the original training sample through the original multimedia push model to obtain the output characteristics of the original multimedia push model comprises the following steps: inputting the original training sample into each single-head attention model in the multi-head attention model to obtain a fourth training characteristic output by each single-head attention network in the multi-head attention model;

16. The method according to claim 10, wherein in case the original multimedia push model is a multi-tasking learning model, different sub-tasking networks in the multi-tasking learning model use personalized models of different parameters for data processing.

17. A multimedia asset pushing device, comprising:

the pushing resource determining module is used for determining target multimedia resources to be pushed from the candidate multimedia resources;

18. The apparatus of claim 17, wherein the candidate multimedia resources include push values of preset actions, and wherein the determining the target multimedia resource to be pushed from the candidate multimedia resources includes:

19. The apparatus of claim 18, wherein the candidate multimedia resources include a plurality of preset behaviors and push values corresponding to the preset behaviors, and wherein determining the target multimedia resources to be pushed from the candidate multimedia resources according to the push values of the preset behaviors includes:

determining a target preset behavior in the preset behaviors;

20. The apparatus of claim 17, wherein the matching the multimedia assets in the multimedia asset library based on the features output by the original multimedia push model and the features output by the personalized model to obtain candidate multimedia assets comprises:

21. The apparatus of claim 17, wherein the composition matching module comprises, in the case where the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron:

22. The apparatus of claim 17, wherein, in the case where the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron,

The combination matching module comprises:

23. The apparatus of claim 17, wherein in the case where the original multimedia push model is a multi-task learning model including an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer having a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks included in the multi-task learning model;

The combination matching module comprises;

24. The apparatus of claim 17, wherein in the case where the original multimedia push model is a multi-head attention model and the personalized model is a convolutional neural network, the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models contained in the multi-head attention model, respectively,

The combination matching module comprises;

25. The apparatus of claim 17, wherein in the case where the original multimedia push model is a multi-tasking learning model, different sub-tasking networks in the multi-tasking learning model use personalized models of different parameters for data processing.

26. A training device for a multimedia resource push model, wherein the multimedia resource push model comprises an original multimedia push model and a personalized model, the device comprising:

the training sample processing module is used for processing the newly added training samples through the personalized model to obtain the characteristics output by the personalized model, and processing the original training samples through the original multimedia push model to obtain the characteristics output by the original multimedia push model;

27. The apparatus of claim 26, wherein the apparatus further comprises:

28. The apparatus of claim 26, wherein in the case where the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron;

the training sample processing module processes the original training sample through the original multimedia push model, and the obtaining of the characteristics output by the original multimedia push model comprises the following steps: respectively converting original training samples of all subtask networks in the multitask learning model into first training features with the same dimensionality as the newly added training samples of the personalized model;

29. The apparatus of claim 26, wherein in the case where the original multimedia push model is a multi-task learning model and the personalized model is a multi-layer perceptron;

the training sample processing module processes the original training sample through the original multimedia push model, and the obtaining of the characteristics output by the original multimedia push model comprises the following steps: respectively inputting original training samples of all subtask networks in the multitask learning model into the personalized model to obtain one-dimensional training output characteristics;

30. The apparatus of claim 26, wherein, in the case where the original multimedia push model is a multitasking learning model and the personalized model is a convolutional neural network,

The training sample processing module processes the original training sample through the original multimedia push model, and the obtaining of the characteristics output by the original multimedia push model comprises the following steps: training the multi-task learning model by using the original training sample to obtain third training characteristics of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks contained in the multi-task learning model;

31. The apparatus of claim 26, wherein in the case where the original multimedia push model is a multi-head attention model and the personalized model is a convolutional neural network, the number of input channels and output channels in the convolutional neural network is the same as the number of single-head attention models contained in the multi-head attention model, respectively,

The training sample processing module processes the original training sample through the original multimedia push model, and the obtaining of the characteristics output by the original multimedia push model comprises the following steps: inputting the original training sample into each single-head attention model in the multi-head attention model to obtain a fourth training characteristic output by each single-head attention network in the multi-head attention model;

32. The apparatus of claim 26, wherein in the case where the original multimedia push model is a multi-tasking learning model, different sub-tasking networks in the multi-tasking learning model use personalized models of different parameters for data processing.

33. A computer device, comprising:

At least one processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the multimedia resource pushing method of any of claims 1 to 9 and/or to implement the training method of the multimedia resource pushing model of any of claims 10 to 16.

34. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the computer device to perform the multimedia resource push method of any one of claims 1 to 9 and/or to implement the training method of the multimedia resource push model of any one of claims 10 to 16.