CN110852047A - Text score method, device and computer storage medium - Google Patents

Text score method, device and computer storage medium

Info

Publication number
CN110852047A
Authority
CN
China
Prior art keywords
text
sample
music
information
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911089616.XA
Other languages
Chinese (zh)
Other versions
CN110852047B (en)
Inventor
缪畅宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911089616.XA
Publication of CN110852047A
Application granted
Publication of CN110852047B
Legal status: Active (current)
Anticipated expiration


Abstract

The embodiments of this application disclose a text soundtrack method, device, and computer storage medium. The method relates to the natural language processing direction in the field of artificial intelligence and includes the following steps: obtaining a sample text and multi-dimensional sample feature information corresponding to the sample text; predicting, based on a text soundtrack model and the sample feature information, multi-dimensional user feedback information of browsing users for the sample text; obtaining, based on the sample feature information and the user feedback information, a loss corresponding to the user feedback information of each dimension; training the text soundtrack model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text soundtrack model; and predicting a target soundtrack for a text to be soundtracked based on the trained text soundtrack model. By taking the multi-dimensional sample feature information corresponding to the sample text as model input and setting multiple optimization objectives, this scheme improves the accuracy of text soundtracking.

Description

Translated from Chinese
A text soundtrack method, device, and computer storage medium

Technical Field

The present application relates to the field of computer technology, and in particular to a text soundtrack method, device, and computer storage medium.

Background Art

When a user reads paragraphs or articles, or chats interactively in a chat application, playing suitable background music can create a good reading experience and significantly increase reading time and the number of user interactions. At present, however, adding music to a text requires the author to select, while editing the text, background music that suits the text or the user. This approach to text soundtracking is not only costly, but the selected background music may not be accurate.

Summary of the Invention

Embodiments of the present application provide a text soundtrack method, device, and computer storage medium, which can improve the accuracy of text soundtracking.

An embodiment of the present application provides a text soundtrack method, including:

obtaining a sample text and multi-dimensional sample feature information corresponding to the sample text;

predicting, based on a text soundtrack model and the sample feature information, multi-dimensional user feedback information of browsing users for the sample text;

obtaining, based on the sample feature information and the user feedback information, a loss corresponding to the user feedback information of each dimension;

training the text soundtrack model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text soundtrack model;

predicting a target soundtrack for a text to be soundtracked based on the trained text soundtrack model.

Correspondingly, an embodiment of the present application also provides a text soundtrack device, including:

an acquisition module, configured to obtain a sample text and multi-dimensional sample feature information corresponding to the sample text;

a first prediction module, configured to predict, based on a text soundtrack model and the sample feature information, multi-dimensional user feedback information of browsing users for the sample text;

a loss acquisition module, configured to obtain, based on the sample feature information and the user feedback information, a loss corresponding to the user feedback information of each dimension;

a training module, configured to train the text soundtrack model based on the loss corresponding to the user feedback information of each dimension, to obtain a trained text soundtrack model;

a second prediction module, configured to predict a target soundtrack for a text to be soundtracked based on the trained text soundtrack model.

Optionally, in some embodiments, the acquisition module may include an acquisition sub-module and an extraction sub-module, as follows:

an acquisition sub-module, configured to obtain a sample text and multiple kinds of sample soundtrack information corresponding to the sample text;

an extraction sub-module, configured to extract features of the sample text and of the sample soundtrack information, to obtain multi-dimensional sample feature information.

In this case, the extraction sub-module may be specifically configured to extract, based on a preset database, sample soundtrack feature information corresponding to the sample soundtrack information, and to extract features of the sample text to obtain sample text feature information corresponding to the sample text.

Optionally, in some embodiments, the first prediction module may include a first prediction sub-module, a second prediction sub-module, and a fusion sub-module, as follows:

a first prediction sub-module, configured to predict, based on the linear sub-model and the sample attribute information, attribute prediction information of browsing users for the sample text;

a second prediction sub-module, configured to predict, based on the deep neural network sub-model and the sample label information, label prediction information of browsing users for the sample text;

a fusion sub-module, configured to fuse the attribute prediction information and the label prediction information to obtain multi-dimensional user feedback information.

In this case, the second prediction sub-module may be specifically configured to convert the sample label information into a sample label feature vector, and to predict, based on the deep neural network sub-model and the sample label feature vector, label prediction information of browsing users for the sample text.

Optionally, in some embodiments, the second prediction module may include a third prediction sub-module and a determination sub-module, as follows:

a third prediction sub-module, configured to predict, based on the trained text soundtrack model, a music library, and a text to be soundtracked, multi-dimensional target user feedback information of each piece of music in the music library for the text to be soundtracked;

a determination sub-module, configured to determine, according to the target user feedback information, a target soundtrack for the text to be soundtracked from the music library.

In this case, the third prediction sub-module may be specifically configured to obtain the text to be soundtracked and multiple text features corresponding to it, obtain the music library and the music features corresponding to the pieces of music in the music library, and predict, based on the trained text soundtrack model, the text features, and the music features, the multi-dimensional target user feedback information of each piece of music in the music library for the text to be soundtracked.

In this case, the determination sub-module may be specifically configured to perform weighted fusion of the multi-dimensional target user feedback information corresponding to each piece of music in the music library to obtain fused user feedback information corresponding to each piece of music, and to determine, according to the fused user feedback information, the target soundtrack for the text to be soundtracked from the pieces of music in the music library.

In addition, an embodiment of the present application further provides a computer storage medium that stores a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the steps of any text soundtrack method provided by the embodiments of the present application.

In the embodiments of the present application, a sample text and multi-dimensional sample feature information corresponding to the sample text can be obtained; multi-dimensional user feedback information of browsing users for the sample text is predicted based on a text soundtrack model and the sample feature information; a loss corresponding to the user feedback information of each dimension is obtained based on the sample feature information and the user feedback information; the text soundtrack model is trained based on the loss corresponding to the user feedback information of each dimension to obtain a trained text soundtrack model; and a target soundtrack for a text to be soundtracked is predicted based on the trained text soundtrack model. By taking the multi-dimensional sample feature information corresponding to the sample text as model input and setting multiple optimization objectives, this scheme improves the accuracy of text soundtracking.

Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a scenario of a text soundtrack system provided by an embodiment of the present application;

FIG. 2 is a first flowchart of a text soundtrack method provided by an embodiment of the present application;

FIG. 3 is a second flowchart of a text soundtrack method provided by an embodiment of the present application;

FIG. 4 is a flowchart of applying a trained text soundtrack model, provided by an embodiment of the present application;

FIG. 5 is a flowchart of training a text soundtrack model, provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a text soundtrack device provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a network device provided by an embodiment of the present application.

Detailed Description

Please refer to the drawings, in which the same reference numerals represent the same components. The principles of the present application are illustrated by implementation in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be regarded as limiting other specific embodiments not detailed herein.

In the following description, specific embodiments of the present application are described with reference to steps and symbols performed by one or more computers, unless otherwise stated. These steps and operations will therefore be referred to several times as being performed by a computer; computer execution as referred to herein includes operations by a computer processing unit on electronic signals representing data in a structured form. Such operations transform the data or maintain it at locations in the computer's memory system, which can reconfigure or otherwise change the operation of the computer in a manner well known to those skilled in the art. The data structures maintained by the data are physical locations of the memory with specific characteristics defined by the data format. However, the principles of the present application described above are not meant to be limiting; those skilled in the art will understand that the various steps and operations described below can also be implemented in hardware.

As used herein, the term "module" can be regarded as a software object executed on the computing system. The different components, modules, engines, and services described herein can be regarded as objects implemented on the computing system. The apparatus and method described herein may be implemented in software, and may of course also be implemented in hardware, both of which fall within the protection scope of the present application.

The terms "first", "second", "third", and the like in this application are used to distinguish different objects rather than to describe a specific order. Furthermore, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or modules is not limited to the listed steps or modules; rather, some embodiments also include steps or modules that are not listed, or other steps or modules inherent to the process, method, product, or device.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

An embodiment of the present application provides a text soundtrack method. The execution body of the text soundtrack method may be the text soundtrack device provided by the embodiments of the present application, or a network device integrating the text soundtrack device, where the text soundtrack device may be implemented in hardware or software. The network device may be a smart phone, a tablet computer, a palmtop computer, a notebook computer, a desktop computer, or a similar device. Network devices include, but are not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers.

Please refer to FIG. 1, which is a schematic diagram of an application scenario of the text soundtrack method provided by an embodiment of the present application. Taking the case where the text soundtrack device is integrated into a network device as an example, the network device can obtain a sample text and multi-dimensional sample feature information corresponding to the sample text; predict, based on a text soundtrack model and the sample feature information, multi-dimensional user feedback information of browsing users for the sample text; obtain, based on the sample feature information and the user feedback information, a loss corresponding to the user feedback information of each dimension; train the text soundtrack model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text soundtrack model; and predict a target soundtrack for a text to be soundtracked based on the trained text soundtrack model.

The text soundtrack method provided by the embodiments of the present application relates to the natural language processing direction in the field of artificial intelligence: natural language processing technology is used to process the text and the candidate music information, and thereby determine background music that matches the text.

Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technology. Artificial intelligence software technology mainly includes directions such as computer vision technology and machine learning/deep learning.

Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, the language people use every day, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, question answering, knowledge graphs, and other technologies.

Please refer to FIG. 2, which is a schematic flowchart of the text soundtrack method provided by an embodiment of the present application, described in detail by the following embodiments:

201. Obtain a sample text and multi-dimensional sample feature information corresponding to the sample text.

The sample text may be a text that has already been given a soundtrack and can be used as a sample for model training. Text is a written form of language; a sentence, or a combination of sentences, with complete and systematic meaning can be called a text. A text can be a sentence, a paragraph, an article, and so on. There can be many kinds of sample text: for example, a sample text may be a complete article written by an author in an official account or a mini program, a paragraph excerpted or written by a user, or a conversation or chat carried out by users through a chat application.

The sample feature information may be feature information related to the sample text that can be used for model training. To enable the trained model to determine more accurately the music that matches a text, sample feature information of multiple dimensions can be used to train the model. For example, the sample feature information may include feature information corresponding to the sample music matched to the sample text, feature information corresponding to the user group browsing the sample text, feature information corresponding to publication details related to the sample text, and so on.

In practical applications, before training the model, the training samples can be determined: a number of texts are extracted from a database as sample texts, and the sample feature information corresponding to the sample music matched to each sample text, the sample feature information corresponding to the user group that has browsed the sample text, and the sample feature information corresponding to publication details such as the time and platform of the sample text's publication are obtained. To take into account factors such as the user's experience of reading the text with its soundtrack, various kinds of sample feedback information from users who have browsed the sample text can be collected as optimization targets for optimizing the model; this sample feedback information can be represented as labels of the training samples. For example, the sample feedback information may include one or more of the users' average browsing time, the number of user comments, the amount of user tips, the number of user shares, and so on.
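For illustration only, one training record of the kind described above could be represented as a plain Python structure; every field name below is an assumption made for this sketch rather than a schema defined by the patent.

```python
# A single training record for one sample text: multi-dimensional features
# plus the browsing users' feedback, which serves as the optimization targets
# (labels).  All field names here are illustrative, not part of the patent.
sample_record = {
    "text": "...sample text...",
    "music_tags": ["jazz", "sad", "instrumental"],            # music dimension
    "music_attrs": {"duration_min": 20, "key": "C major"},
    "reader_tags": ["middle-aged"],                            # user-group dimension
    "reader_attrs": {"avg_age": 28},
    "context_attrs": {"publish_date": "2019-01-01", "platform": "xx"},
    "feedback": {                                              # optimization targets
        "avg_read_time": 310.0,    # seconds
        "comment_count": 12,
        "reward_amount": 6.6,
        "share_count": 4,
    },
}
```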

In an embodiment, the sample feedback information is not limited to the kinds of feedback mentioned above; other kinds of sample feedback information may also be used as optimization targets for optimizing the model, according to the actual situation or as specified by the author.

In an embodiment, in order to improve the modeling ability of the model, sample feature information of multiple dimensions needs to be obtained. Therefore, multiple kinds of sample soundtrack information corresponding to the sample text can be obtained, and feature extraction can then be performed on the sample soundtrack information to obtain the sample feature information. Specifically, the step of "obtaining a sample text and multi-dimensional sample feature information corresponding to the sample text" may include:

obtaining a sample text and multiple kinds of sample soundtrack information corresponding to the sample text;

extracting features of the sample text and of the sample soundtrack information to obtain multi-dimensional sample feature information.

The sample soundtrack information may be soundtrack-related information corresponding to the sample text. To improve the modeling ability of the model, sample soundtrack information of multiple dimensions corresponding to the sample text can be obtained. For example, the sample soundtrack information may include sample soundtrack information of the music dimension, that is, the sample music matched to the sample text; sample soundtrack information of the user dimension, that is, the user group that has browsed the sample text; and sample soundtrack information of the context dimension, that is, information such as the time and platform of the sample text's publication.

In practical applications, for example, several texts can be extracted from a database as sample texts, and then the sample music matched to each sample text, the user group that has browsed the sample text, and sample soundtrack information such as the time and platform of the sample text's publication can be obtained. The number of sample music pieces matched to a sample text is not limited: when the sample text is "Happy New Year", the sample music can be several New Year-related songs. The user group that has browsed the sample text can consist of multiple users; by using the information corresponding to multiple users who have browsed the sample text as training samples, the users' overall experience of the text soundtrack can be optimized, so that the model is not perturbed by the behavior of a single user. After the sample soundtrack information is obtained, feature extraction can be performed on it to obtain the sample feature information.

In an embodiment, the sample soundtrack information may include not only the music dimension, the user dimension, and the context dimension; sample soundtrack information of other dimensions may also be added as training samples according to the actual situation, for training the model.

In an embodiment, since the sample soundtrack information includes multiple dimensions and the feature extraction method differs for different dimensions, different feature extraction methods can be used to extract the sample feature information depending on the kind of sample soundtrack information. Specifically, the step of "extracting features of the sample text and of the sample soundtrack information to obtain multi-dimensional sample feature information" may include:

extracting, based on a preset database, sample soundtrack feature information corresponding to the sample soundtrack information;

extracting features of the sample text to obtain sample text feature information corresponding to the sample text.

The preset database may be a pre-established database related to the sample soundtrack information; by querying the preset database, the sample feature information corresponding to the sample soundtrack information can be obtained. For example, the preset databases may include a music library, a music comment library, a user portrait library, and so on. The music library may contain multiple pieces of sample music. The music comment library may be a database containing users' comments on the sample music. The user portrait library may be a database containing user information corresponding to multiple users.

The sample feature information may include sample text feature information and sample soundtrack feature information.

The sample text feature information may be feature information extracted from the sample text; for example, it may include text label information, context attribute information, and so on.

The text label information may be a label that describes characteristics of the sample text in textual form. For example, the text label information corresponding to a sample text may be "emotional", indicating that the sample text is an emotional text; it may also be "history", "gossip", and so on.

The context attribute information may be attribute information that describes characteristics of the sample text in the form of concrete categories. For example, the context attribute information corresponding to a sample text may be "publication date: January 1, 2019", indicating that the sample text was published on January 1, 2019; or it may be "publication platform: xx website", indicating that the sample text was published on the xx website.

The sample soundtrack feature information may be feature information extracted from the sample soundtrack information; for example, it may include music label information, music attribute information, reader group label information, reader group attribute information, and so on.

The music label information may be a label that describes, in textual form, characteristics of the sample music matched to the sample text. For example, the music label information corresponding to a sample music piece may be "jazz", indicating that the sample music matched to the sample text is jazz; it may also be "sad", "instrumental", "for heartbreak", "quiet", and so on.

The music attribute information may be attribute information that describes, in the form of concrete categories, characteristics of the sample music matched to the sample text. For example, the music attribute information corresponding to a sample music piece may be "duration: 20 min", indicating that the sample music matched to the sample text is 20 minutes long; it may also be "category: Mandarin", "key: C major", and so on.

The reader group label information may be a label that describes, in textual form, characteristics of the user group browsing the sample text. For example, the reader group label information corresponding to a user group may be "middle-aged", indicating that the users browsing the sample text are mostly middle-aged; it may also be "anime fans", "cool", and so on.

The reader group attribute information may be attribute information that describes, in the form of concrete categories, characteristics of the user group browsing the sample text. For example, the reader group attribute information corresponding to a user group may be "average age: 28", indicating that the average age of the users browsing the sample text is 28; it may also be "average height: 170 cm", "main cities: Beijing/Shanghai", and so on.

In practical applications, for example, as shown in FIG. 5, after the sample text, the sample music matched to the sample text, and the user group browsing the sample text are obtained, the music label information can be obtained by looking up song titles, lyrics, music comments, and so on in the music library and the music comment library; the music attribute information can be obtained by looking up the music library; the reader group label information can be obtained by looking up, in the user portrait library, the articles users have read, the comments they have posted, and so on; and the reader group attribute information can be obtained from the user portrait library.
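As a rough illustration of this lookup step, the following sketch assembles the multi-dimensional sample feature information from the three preset databases. The database objects, their field names, and the text_tagger callable are all assumptions made for the sketch, not interfaces defined by the patent.

```python
def extract_sample_features(sample_text, sample_music_ids, reader_ids,
                            music_db, comment_db, user_profile_db, text_tagger):
    """Assemble multi-dimensional sample feature information for one sample text.

    music_db, comment_db and user_profile_db stand in for the preset music
    library, music comment library and user portrait library; text_tagger is
    a callable that mines tag words from the raw text.  All of these
    interfaces are assumptions made for this sketch.
    """
    music_tags, music_attrs = [], []
    for mid in sample_music_ids:
        track = music_db[mid]                      # title, lyrics, duration, key, ...
        comments = comment_db.get(mid, [])         # user comments on this track
        music_tags.append(track["tags"] + [c["tag"] for c in comments])
        music_attrs.append({"duration": track["duration"], "key": track["key"]})

    reader_tags = [user_profile_db[uid]["tags"] for uid in reader_ids]
    reader_attrs = [user_profile_db[uid]["attrs"] for uid in reader_ids]

    # Text labels are mined from the sample text itself rather than looked up
    # in a pre-built text library (see the following paragraph on mining).
    text_tags = text_tagger(sample_text)

    return {
        "music_tags": music_tags, "music_attrs": music_attrs,
        "reader_tags": reader_tags, "reader_attrs": reader_attrs,
        "text_tags": text_tags,
    }
```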

Because sample texts keep accumulating, the text label information is not obtained from a pre-built sample text library; instead, it is obtained by mining the sample texts used as training samples. Since the context attribute information is also derived from the sample text, it can likewise be obtained by mining the sample texts used as training samples.

202. Predict, based on a text soundtrack model and the sample feature information, multi-dimensional user feedback information of browsing users for the sample text.

The text soundtrack model may be a network model capable of matching target music to a text to be soundtracked. The embodiments of the present application do not limit the kind of text soundtrack model; any supervised model that can match target music to a text to be soundtracked can serve as the text soundtrack model in the embodiments of the present application. For example, the text soundtrack model may be a wide & deep model, a model for classification and regression that combines the memorization ability of a linear model with the generalization ability of a deep neural network model. The wide & deep model includes a linear sub-model and a deep neural network sub-model. The linear sub-model part is a simple shallow model, such as logistic regression or an SVM (Support Vector Machine), and can be used to process non-text features such as numerical and categorical features. The deep neural network sub-model part is a DNN (Deep Neural Network), which can be used to process label words, extract feature vectors, and perform forward propagation. During training of the wide & deep model, the parameters of both sub-models are optimized simultaneously, so that the prediction ability of the overall model is optimal.

The user feedback information is information related to users' feedback after reading a text with background music. For example, the user feedback information may include one or more of the users' average browsing time, the number of user comments, the amount of user tips, the number of user shares, and so on.

In practical applications, for example, after the sample feature information of multiple dimensions is obtained, it can be input into the text soundtrack model, and based on that model, the multi-dimensional user feedback information of users browsing the sample text can be predicted; the user feedback information may include one or more of the users' average browsing time, the number of user comments, the amount of user tips, the number of user shares, and so on.

In an embodiment, when the text soundtrack model is a wide & deep model, the sample feature information needs to be classified before being input into the model, because the text soundtrack model includes a linear sub-model and a deep neural network sub-model. Specifically, the step of "predicting, based on a text soundtrack model and the sample feature information, multi-dimensional user feedback information of browsing users for the sample text" may include:

predicting, based on the linear sub-model and the sample attribute information, attribute prediction information of browsing users for the sample text;

predicting, based on the deep neural network sub-model and the sample label information, label prediction information of browsing users for the sample text;

fusing the attribute prediction information and the label prediction information to obtain multi-dimensional user feedback information.

The sample feature information includes sample label information and sample attribute information.

The sample label information may be labels that describe sample characteristics in textual form; for example, it may include music label information, reader group label information, text label information, and so on.

The sample attribute information may be attribute information that describes sample characteristics in the form of concrete categories; for example, it may include music attribute information, reader group attribute information, context attribute information, and so on.

The text soundtrack model includes a linear sub-model and a deep neural network sub-model.

The linear sub-model part is a simple shallow model, such as logistic regression or an SVM, and can be used to process non-text features such as numerical and categorical features. The linear sub-model has memorization ability, that is, it can discover correlations between features from historical data.

The deep neural network sub-model part is a DNN, which can be used to process label words, extract feature vectors, and perform forward propagation. Using an embedding method, the deep neural network sub-model takes low-dimensional dense features as input and can better generalize to feature combinations that did not appear in the training samples.

In practical applications, for example, as shown in FIG. 5, the sample feature information can be divided into sample label information and sample attribute information. The sample attribute information is input into the linear sub-model to predict attribute prediction information of browsing users for the sample text; the sample label information is input into the deep neural network sub-model to predict label prediction information of browsing users for the sample text; the attribute prediction information and the label prediction information are then fused to obtain multi-dimensional user feedback information.
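As a rough, non-authoritative sketch of the wide & deep split described above, the following Keras code routes attribute features through a linear (wide) part and tag words through an embedding plus hidden layers (deep) part, then fuses the two into one output head per feedback dimension. Layer sizes, vocabulary size, feature dimensions, and output names are assumptions, not values taken from the patent.

```python
import tensorflow as tf

NUM_ATTR_FEATURES = 32    # numeric/categorical attribute features per sample (assumed)
TAG_VOCAB_SIZE = 10000    # size of the tag-word vocabulary (assumed)
MAX_TAGS = 20             # tag words per sample, padded to this length (assumed)

# Wide part: linear sub-model over the sample attribute information.
attr_in = tf.keras.layers.Input(shape=(NUM_ATTR_FEATURES,), name="sample_attrs")
wide = tf.keras.layers.Dense(16, activation=None, name="wide_linear")(attr_in)

# Deep part: embedding layer plus hidden layers over the sample label words.
tag_in = tf.keras.layers.Input(shape=(MAX_TAGS,), name="sample_tags")
deep = tf.keras.layers.Embedding(TAG_VOCAB_SIZE, 32)(tag_in)
deep = tf.keras.layers.GlobalAveragePooling1D()(deep)
deep = tf.keras.layers.Dense(64, activation="relu")(deep)
deep = tf.keras.layers.Dense(32, activation="relu")(deep)

# Fuse the two parts and emit one prediction head per user-feedback dimension.
fused = tf.keras.layers.Concatenate()([wide, deep])
outputs = {
    dim: tf.keras.layers.Dense(1, name=dim)(fused)
    for dim in ["avg_read_time", "comment_count", "reward_amount", "share_count"]
}
model = tf.keras.Model(inputs=[attr_in, tag_in], outputs=outputs)
model.summary()
```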

In an embodiment, in the deep neural network sub-model, the user feedback information can be predicted by converting the sample label information into vector form. Specifically, the step of "predicting, based on the deep neural network sub-model and the sample label information, label prediction information of browsing users for the sample text" may include:

converting the sample label information into a sample label feature vector;

predicting, based on the deep neural network sub-model and the sample label feature vector, label prediction information of browsing users for the sample text.

The deep neural network sub-model includes an embedding layer and hidden layers.

The embedding layer is a network structure in the deep neural network sub-model for processing sparse features; it reduces dimensionality through the weight-matrix computation of an embedding algorithm, thereby achieving dimensionality reduction of the sparse features.

The hidden layers are located in the deep neural network sub-model; every layer other than the input layer and the output layer is a hidden layer. Hidden layers neither directly receive signals from, nor directly send signals to, the outside world. By abstracting the input features at multiple levels, the hidden layers ultimately divide the input features linearly into different types of data.

In practical applications, for example, the sample label information can be input into the deep neural network sub-model; the embedding layer converts the sample label information into a sample label feature vector, which can be expressed in vector form, and the hidden layers then predict the label prediction information of browsing users for the sample text.
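To make the conversion step concrete, here is a minimal sketch of turning sample label words into a dense sample label feature vector through an index lookup and an embedding matrix; the toy vocabulary, the embedding dimension, and the mean-pooling step are assumptions of this sketch (in a real model the embedding matrix is learned rather than random).

```python
import numpy as np

# Assumed toy vocabulary of tag words and a small embedding dimension.
vocab = {"jazz": 0, "sad": 1, "instrumental": 2, "quiet": 3, "emotional": 4}
emb_dim = 8
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), emb_dim))   # learned in practice

def tags_to_vector(tags):
    """Map sample label words to indices, look up their embeddings, pool them."""
    idx = [vocab[t] for t in tags if t in vocab]
    if not idx:
        return np.zeros(emb_dim)
    return embedding_matrix[idx].mean(axis=0)   # one dense sample label vector

sample_label_vector = tags_to_vector(["jazz", "sad", "quiet"])
print(sample_label_vector.shape)   # (8,)
```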

203. Obtain, based on the sample feature information and the user feedback information, a loss corresponding to the user feedback information of each dimension.

An objective function is a function used in machine learning to reach a desired goal. In machine learning, to accomplish a certain goal, an objective function is constructed and then maximized or minimized, thereby obtaining the model parameters of the machine learning algorithm.

Since a supervised model needs an objective function during training, and in order to take multiple kinds of user feedback into account in the model, the embodiments of the present application can draw on the framework of multi-task learning and set up multiple objective functions. For example, the embodiments of the present application can construct objective functions around the core idea of "the user's feedback after reading a text with background music"; the constructed objective functions are all related to the user's post-reading behavior. For instance, objective functions can be constructed from the users' average browsing time, the number of user comments, the amount of user tips, the number of user shares, and so on. The embodiments of the present application do not limit the objective functions; any objective function satisfying the core idea can be used, and the kinds of objective functions can be adjusted according to the actual situation.

In practical applications, for example, after the multi-dimensional user feedback information is obtained based on the text soundtrack model, multiple objective functions corresponding to the user feedback information of the multiple dimensions can be constructed according to the predicted user feedback information and the sample feedback information corresponding to the sample feature information; the objective functions are then solved to obtain the corresponding results.
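A minimal sketch of constructing one loss per user feedback dimension, assuming each feedback quantity is treated as a regression target; the mean-squared-error choice and the dimension names are assumptions, since the patent leaves the exact objective functions open.

```python
import numpy as np

FEEDBACK_DIMS = ["avg_read_time", "comment_count", "reward_amount", "share_count"]

def per_dimension_losses(predicted, observed):
    """One loss per user-feedback dimension (here: mean squared error).

    predicted / observed: dicts mapping dimension name -> np.ndarray of values
    for a batch of samples.  Treating every dimension as a regression target
    is an assumption of this sketch.
    """
    return {
        dim: float(np.mean((predicted[dim] - observed[dim]) ** 2))
        for dim in FEEDBACK_DIMS
    }
```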

204. Train the text soundtrack model based on the loss corresponding to the user feedback information of each dimension, to obtain a trained text soundtrack model.

In practical applications, for example, after the multiple objective functions are solved, the parameters of the text soundtrack model can be adjusted according to the results, so as to train the text soundtrack model; when the text soundtrack model is trained to convergence, the trained text soundtrack model is obtained.
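Continuing the earlier Keras sketch from step 202 (and therefore assuming the model object built there), joint training over the per-dimension losses might look as follows; the optimizer, the regression losses, and the equal loss weights are assumptions.

```python
# Continuing the wide & deep sketch from step 202 (assumes `model` from there).
# One loss per output head, optimized jointly until the model converges.
FEEDBACK_DIMS = ["avg_read_time", "comment_count", "reward_amount", "share_count"]

model.compile(
    optimizer="adam",
    loss={dim: "mse" for dim in FEEDBACK_DIMS},         # assumed regression losses
    loss_weights={dim: 1.0 for dim in FEEDBACK_DIMS},   # assumed equal weighting
)
# model.fit(train_inputs, train_feedback_labels, epochs=10, batch_size=64)
```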

205. Predict a target soundtrack for a text to be soundtracked based on the trained text soundtrack model.

In practical applications, for example, after the text soundtrack model has been trained to obtain the trained text soundtrack model, the trained model can be used to predict the target soundtrack for a text to be soundtracked.

In an embodiment, the target soundtrack can be determined from a music library by predicting the target user feedback information of each piece of music in the library for the text to be soundtracked. Specifically, the step of "predicting a target soundtrack for a text to be soundtracked based on the trained text soundtrack model" may include:

predicting, based on the trained text soundtrack model, a music library, and the text to be soundtracked, multi-dimensional target user feedback information of each piece of music in the music library for the text to be soundtracked;

determining, according to the target user feedback information, the target soundtrack for the text to be soundtracked from the music library.

In practical applications, for example, a music library containing multiple pieces of music can be obtained, and the trained text soundtrack model can determine the target soundtrack from that library. The text that needs a soundtrack can also be obtained, and the trained text soundtrack model then predicts, according to the music library and the information in the text to be soundtracked, the target user feedback information that each piece of music in the library would produce for that text; the target soundtrack for the text is then determined from the music library according to the obtained target user feedback information.

In an embodiment, features of the text to be soundtracked and of the music library can be extracted, so that the trained text soundtrack model can conveniently predict the target user feedback information. Specifically, the step of "predicting, based on the trained text soundtrack model, a music library, and the text to be soundtracked, multi-dimensional target user feedback information of each piece of music in the music library for the text to be soundtracked" may include:

obtaining the text to be soundtracked and multiple text features corresponding to it;

obtaining the music library and the music features corresponding to the pieces of music in the music library;

predicting, based on the trained text soundtrack model, the text features, and the music features, multi-dimensional target user feedback information of each piece of music in the music library for the text to be soundtracked.

In practical applications, for example, as shown in FIG. 4, the music label information and music attribute information corresponding to each piece of music in the music library can be obtained through the preset music library and music comment library. After the author finishes writing a text to be soundtracked, the text features corresponding to it can be obtained in the same way as during model training; these text features may include text feature information, reader group feature information, author feature information, and context feature information, and specifically may include text label information, reader group label information, reader group attribute information, context attribute information, and author feature information. Then the music label information and music attribute information corresponding to each piece of music in the library, together with the text label information, reader group label information, and context attribute information corresponding to the text to be soundtracked, are input into the trained text soundtrack model, which predicts the multi-dimensional target user feedback information of each piece of music in the library for the text to be soundtracked, such as one or more of the users' average browsing time, the number of user comments, the amount of user tips, and the number of user shares.

In the model training stage, only the sample music matched to a sample text is input into the model together with the sample text for training. In the model use stage, every piece of music in the music library can be input into the model together with the text to be soundtracked to predict user feedback information; in this way, the target user feedback information that might result from playing each piece of music in the library as background music for the text to be soundtracked can be predicted.
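A minimal sketch of the inference loop described above: pair the text to be soundtracked with every track in the music library and collect the predicted feedback per track. The predict_feedback callable stands in for the trained text soundtrack model and is an assumption of this sketch.

```python
def score_music_library(text_features, music_library, predict_feedback):
    """Predict multi-dimensional target user feedback for every track.

    music_library: mapping music_id -> that track's feature dict (tags, attrs).
    predict_feedback: callable wrapping the trained text soundtrack model; it
    takes (text_features, music_features) and returns a dict such as
    {"avg_read_time": ..., "comment_count": ..., "reward_amount": ...,
     "share_count": ...}.  Both interfaces are assumptions of this sketch.
    """
    predictions = {}
    for music_id, music_features in music_library.items():
        predictions[music_id] = predict_feedback(text_features, music_features)
    return predictions
```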

在一实施例中,为了提升目标音乐选取的灵活性,可以根据作者的设置,基于作者更关心的用户反馈信息进行目标音乐的选取。具体地,步骤“根据所述目标用户反馈信息,从所述音乐库中确定所述待配乐文本的目标配乐”,可以包括:In one embodiment, in order to improve the flexibility of target music selection, target music may be selected based on user feedback information that the author cares more about according to the author's settings. Specifically, the step of "determining the target soundtrack of the text to be soundtracked from the music library according to the target user feedback information" may include:

对所述音乐库中每首音乐对应的多维度的目标用户反馈信息进行加权融合,得到每首音乐对应的融合后用户反馈信息;weighted fusion of the multi-dimensional target user feedback information corresponding to each piece of music in the music library, to obtain the post-fusion user feedback information corresponding to each piece of music;

根据所述融合后用户反馈信息,从所述音乐库的多首音乐中确定所述待配乐文本的目标配乐。According to the user feedback information after the fusion, the target soundtrack of the text to be soundtracked is determined from a plurality of pieces of music in the music library.

在实际应用中,比如,当基于训练后文本配乐模型,获取到音乐库中每首音乐对应的多维度的目标用户反馈信息后,如获取到音乐库中每首音乐对应的目标用户平均浏览时长、目标用户评论次数、目标用户打赏金额、目标用户分享次数之后,可以根据作者的指定,对音乐库中的多首音乐进行排序,并推荐目标配乐。In practical applications, for example, after obtaining the multi-dimensional target user feedback information corresponding to each music in the music library based on the trained text soundtrack model, such as obtaining the average browsing time of the target users corresponding to each music in the music library , the number of comments by the target user, the reward amount of the target user, and the number of shares shared by the target user, you can sort the music in the music library according to the author's designation, and recommend the target soundtrack.

比如，获取到音乐库中每首音乐对应的目标用户平均浏览时长、目标用户评论次数、目标用户打赏金额、目标用户分享次数之后，作者更希望提升用户的打赏金额，此时，作者可以设置为按照目标用户打赏金额进行推荐，系统可以根据音乐库中每首音乐预测出的目标用户打赏金额的数值，对音乐库中的多首音乐进行排序，然后将其中目标用户打赏金额数值高的一首或者几首音乐作为目标配乐推荐给作者，作者可以从推荐的目标配乐中选取待配乐文本的配乐。这样，作者能够轻易地了解到为待配乐文本匹配哪些背景音乐，可能会得到更多的打赏。目标用户反馈信息的类型可以根据实际情况进行调整，如可以删除掉作者不关心的用户反馈信息，或者添加新的用户反馈信息。For example, after obtaining the target user average browsing duration, the number of target user comments, the target user reward amount, and the number of target user shares corresponding to each piece of music in the music library, suppose the author most wants to increase the user reward amount. In this case, the author can set the recommendation to be made according to the target user reward amount. The system can sort the multiple pieces of music in the music library according to the target user reward amount predicted for each piece of music, and then recommend to the author, as the target soundtrack, the one or several pieces of music with the highest predicted reward amount. The author can select the soundtrack of the text to be scored from the recommended target soundtracks. In this way, the author can easily learn which background music, when matched to the text to be scored, is likely to bring in more rewards. The types of target user feedback information can be adjusted according to the actual situation; for example, user feedback information that the author does not care about can be removed, or new user feedback information can be added.

又比如，获取到音乐库中每首音乐对应的目标用户平均浏览时长、目标用户评论次数、目标用户打赏金额、目标用户分享次数之后，作者还可以设置作者本身对于每种用户反馈信息的关注程度，如作者可以将目标用户分享次数设置为最高的关注程度，其次为目标用户打赏金额，再其次为目标用户评论次数，再其次为目标用户平均浏览时长。然后，系统可以根据作者对每种用户反馈信息设置的关注程度，为每种用户反馈信息设置对应的权重，关注度高的用户反馈信息可以给予更高的权重，如可以将目标用户分享次数的权重设置为40%，将目标用户打赏金额的权重设置为30%，将目标用户评论次数的权重设置为20%，将目标用户平均浏览时长的权重设置为10%，然后根据权重，对多种目标用户反馈信息进行加权融合，得到每首音乐对应的融合后用户反馈信息，然后根据融合后用户反馈信息对音乐库中的多首音乐进行排序，并将融合后用户反馈信息较大的一首或几首音乐作为目标配乐推荐给作者，作者可以从推荐的目标配乐中选取待配乐文本的配乐。这样，作者能够根据自己对不同用户反馈信息的关注程度，进行目标音乐的选取。For another example, after obtaining the target user average browsing duration, the number of target user comments, the target user reward amount, and the number of target user shares corresponding to each piece of music in the music library, the author can also set the author's own degree of attention to each kind of user feedback information. For example, the author can set the number of target user shares as the highest level of attention, followed by the target user reward amount, then the number of target user comments, and then the target user average browsing duration. The system can then set a corresponding weight for each kind of user feedback information according to the degree of attention the author has set, giving a higher weight to the feedback information with a higher degree of attention; for example, the weight of the number of target user shares can be set to 40%, the weight of the target user reward amount to 30%, the weight of the number of target user comments to 20%, and the weight of the target user average browsing duration to 10%. The various kinds of target user feedback information are then weighted and fused according to these weights to obtain the fused user feedback information corresponding to each piece of music, the multiple pieces of music in the music library are sorted according to the fused user feedback information, and the one or several pieces of music with the largest fused user feedback information are recommended to the author as the target soundtrack, from which the author can select the soundtrack of the text to be scored. In this way, the author can select the target music according to his or her own degree of attention to different kinds of user feedback information.
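
A minimal sketch of this weighted fusion and ranking, using the example weights above (40/30/20/10) and made-up predicted values; the dictionary layout is an assumption of the sketch:

    # Made-up predictions for two pieces of music; keys mirror the four feedback dimensions.
    predictions = {
        "m1": {"shares": 0.5, "reward": 0.8, "comments": 1.4, "browse_min": 3.2},
        "m2": {"shares": 1.2, "reward": 1.6, "comments": 0.9, "browse_min": 2.1},
    }
    # Author-specified attention, expressed as weights (shares weighted highest).
    weights = {"shares": 0.4, "reward": 0.3, "comments": 0.2, "browse_min": 0.1}

    def fuse(feedback):
        # Weighted fusion of the multi-dimensional feedback into a single score.
        return sum(weights[k] * feedback[k] for k in weights)

    fused = {music_id: fuse(fb) for music_id, fb in predictions.items()}
    ranked = sorted(fused, key=fused.get, reverse=True)  # sort the library by fused score
    print(ranked)  # ['m2', 'm1'] -> m2 would be recommended as the target soundtrack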

在一实施例中，作者不仅可以指定作者本身针对每种用户反馈信息的关注程度的排序，还可以直接指定每种用户反馈信息的权重，如作者可以直接对每种用户反馈信息的权重进行设置。另外，系统对权重的分配不仅限于上述的权重数值，只要保证能够达到作者要求的权重都可以。In one embodiment, the author can not only specify the ranking of the author's own attention to each kind of user feedback information, but can also directly specify the weight of each kind of user feedback information; that is, the author can directly set the weight of each kind of user feedback information. In addition, the weights assigned by the system are not limited to the above-mentioned values; any weight distribution that meets the author's requirements is acceptable.

该文本配乐方法可以应用在文本配乐场景中，一方面能够节省作者的时间，辅助作者寻找到与待配乐文本最匹配的目标音乐，一方面还能够通过系统的智能推荐，为作者寻找到更多潜在的可以作为背景音乐的目标音乐。The text soundtrack method can be applied to text soundtrack scenarios. On the one hand, it can save the author's time and assist the author in finding the target music that best matches the text to be scored; on the other hand, through the system's intelligent recommendation, it can also find more potential target music for the author that can be used as background music.

在一实施例中，该文本配乐方法不仅能够应用于文本配乐的场景中，还可以应用于多模态闲聊系统中，比如，在用户通过聊天应用进行聊天时，系统可以自动地推荐音乐或者有声读物供用户选择，用户就可以根据系统的推荐，在聊天时播放与当前聊天内容匹配的音乐或者有声读物，如当用户在聊天系统中输入"生日快乐"时，系统可以自动推荐《祝你生日快乐》、《生日快乐》、《生日》等与生日快乐相关的歌曲，这样用户可以选择一首歌曲进行播放，大大提升了聊天时的用户体验。In one embodiment, the text soundtrack method can be applied not only to text soundtrack scenarios, but also to a multimodal chat system. For example, when a user chats through a chat application, the system can automatically recommend music or audiobooks for the user to choose from, and the user can then play music or audiobooks matching the current chat content according to the system's recommendation. For instance, when the user enters "Happy Birthday" in the chat system, the system can automatically recommend songs related to happy birthday such as "Happy Birthday to You", "Happy Birthday", and "Birthday", so that the user can select a song to play, which greatly improves the user experience when chatting.

由上可知,本申请实施例可以获取样本文本、以及样本文本对应多维度的样本特征信息,基于文本配乐模型、以及样本特征信息,预测浏览用户针对样本文本的多维度的用户反馈信息,基于样本特征信息、以及用户反馈信息,获取每个维度用户反馈信息对应的损失,基于每个维度用户反馈信息对应的损失,对文本配乐模型进行训练,得到训练后文本配乐模型,基于训练后文本配乐模型预测待配乐文本的目标配乐。该方案可以通过将样本文本对应的多维度的样本特征信息作为模型输入,并设定多个优化目标,提升了文本配乐模型的建模能力。根据该文本配乐模型对待配乐文本匹配音乐,不仅能够辅助作者寻找到合适的背景音乐,节省作者的时间,而且能够为作者寻找更多潜在的能够作为背景音乐的音乐,使得作者选取目标音乐的行为更加灵活,并且提升了为待配乐文本匹配音乐的准确性。It can be seen from the above that the embodiment of the present application can obtain the sample text and the multi-dimensional sample feature information corresponding to the sample text, and predict the multi-dimensional user feedback information of the browsing user for the sample text based on the text soundtrack model and the sample feature information, based on the sample text. Feature information and user feedback information, obtain the loss corresponding to the user feedback information of each dimension, train the text soundtrack model based on the loss corresponding to the user feedback information of each dimension, and obtain the post-training text soundtrack model, based on the post-training text soundtrack model Predict the target soundtrack of the text to be scored. This solution can improve the modeling ability of the text soundtrack model by taking the multi-dimensional sample feature information corresponding to the sample text as the model input and setting multiple optimization goals. Treating the soundtrack text matching music according to the text soundtrack model can not only assist the author to find suitable background music, save the author's time, but also find more potential music that can be used as background music for the author, so that the author can choose the target music behavior. More flexibility and improved accuracy in matching music to text to be scored.

根据前面实施例所描述的方法,以下将以该文本配乐装置具体集成在网络设备举例作进一步详细说明。According to the method described in the foregoing embodiment, the following will take the example that the text soundtrack device is specifically integrated in a network device for further detailed description.

参考图3,本申请实施例的文本配乐方法的具体流程可以如下:Referring to FIG. 3 , the specific process of the text soundtrack method according to the embodiment of the present application may be as follows:

301、网络设备获取训练样本,训练样本中包括样本文本、与样本文本匹配的样本音乐、以及浏览过样本文本的样本读者群。301. The network device acquires a training sample, where the training sample includes sample text, sample music matching the sample text, and sample reader groups who have browsed the sample text.

在实际应用中,比如,可以从数据库中抽取的已经配乐完成的文本作为样本文本,并获取与该样本文本匹配的歌曲作为样本音乐、以及浏览过该样本文本的样本读者群的信息,该样本文本、样本音乐、以及样本读者群可以共同构成一个训练样本。In practical applications, for example, the text that has been composed can be extracted from the database as sample text, and the song matching the sample text can be obtained as sample music, and the sample reader group who has browsed the sample text. The text, sample music, and sample readership can collectively form a training sample.

在一实施例中，可以通过丰富训练样本的方式，提升文本配乐模型的准确程度，因此，可以从数据库中获取多个样本文本，以便构成多个训练样本，进行文本配乐模型的训练。另外，与一个样本文本匹配的样本音乐不仅限于一首，也即样本音乐不仅限于历史作为样本文本背景音乐的一首音乐，还可以将与样本文本内容相关，能够作为该样本文本背景音乐的多首音乐都作为样本音乐，以丰富训练样本。同时，为了避免由于单个读者的个别行为，影响模型训练的效果，因此，样本读者群信息可以包括多个浏览过该样本文本的读者，通过扩大样本读者群的数目，保证样本读者群的一般性、以及准确性。In one embodiment, the accuracy of the text soundtrack model can be improved by enriching the training samples; therefore, multiple sample texts can be obtained from the database to form multiple training samples for training the text soundtrack model. In addition, the sample music matching one sample text is not limited to a single piece; that is, the sample music is not limited to the one piece of music that has historically served as the background music of the sample text, and multiple pieces of music that are related to the content of the sample text and could serve as its background music can all be used as sample music to enrich the training samples. At the same time, in order to prevent the individual behavior of a single reader from affecting the effect of model training, the sample reader group information can include multiple readers who have browsed the sample text; by enlarging the sample reader group, its generality and accuracy are guaranteed.

在一实施例中,为了在文本配乐模型训练过程中,可以根据经过文本配乐模型预测得到的多种用户反馈信息,与真实的样本反馈信息,进行多个维度上目标函数的构建,并通过多个维度上的目标函数进行目标优化,因此,在训练阶段,需要收集浏览过样本文本的读者阅读后的多种样本反馈信息,该收集到的信息为样本文本对应的真实的样本反馈信息,如读者平均浏览时长、读者评论次数、读者打赏金额、以及读者分享次数。这些真实的样本反馈信息可以作为训练模型的标签进行体现。In one embodiment, in the training process of the text soundtrack model, the objective function can be constructed in multiple dimensions according to various user feedback information predicted by the text soundtrack model and the real sample feedback information. Therefore, in the training phase, it is necessary to collect various sample feedback information after reading by readers who have browsed the sample text. The collected information is the real sample feedback information corresponding to the sample text, such as The average browsing time of readers, the number of reader comments, the amount of reader reward, and the number of reader shares. These real sample feedback information can be reflected as the labels of the training model.
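
For illustration, one training sample together with its collected labels might be laid out as below; the field names and values are assumptions of this sketch, while the label dimensions follow the feedback types listed above (average browsing duration, comment count, reward amount, share count):

    # Hypothetical shape of one training sample; values are invented for illustration.
    training_sample = {
        "sample_text": "a text that already has a matching soundtrack ...",
        "sample_music": ["song_a", "song_b"],        # one or more matching pieces of music
        "sample_readers": ["reader_1", "reader_2"],  # readers who have browsed the text
        "labels": {                                  # real collected sample feedback information
            "avg_browse_min": 4.6,                   # reader average browsing duration
            "comments": 12,                          # number of reader comments
            "reward": 35.0,                          # reader reward amount
            "shares": 7,                             # number of reader shares
        },
    }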

302、网络设备获取样本特征信息,样本特征信息包括音乐标签信息、读者群标签信息、文本标签信息、音乐属性信息、读者群属性信息、以及上下文属性信息。302. The network device acquires sample feature information, where the sample feature information includes music tag information, reader group tag information, text tag information, music attribute information, reader group attribute information, and context attribute information.

在实际应用中，比如，获取到样本文本、样本音乐、以及样本读者群之后，可以通过查找预先设定好的音乐曲库、音乐评论库、以及读者画像库，获取音乐标签信息、读者群标签信息、音乐属性信息、读者群属性信息。由于数据库中的样本文本是不停增加的，因此不需要预先建立数据库，而是直接对样本文本进行挖掘，得到文本标签信息、以及上下文属性信息。In practical applications, for example, after the sample text, sample music, and sample reader group are obtained, the music tag information, reader group tag information, music attribute information, and reader group attribute information can be obtained by looking up the preset music library, music comment library, and reader portrait library. Since the sample texts in the database keep increasing, there is no need to establish a database in advance; instead, the sample text is mined directly to obtain the text tag information and the context attribute information.

比如，如图5所示，可以查找音乐曲库、以及音乐评论库，通过音乐的音乐名、歌词、音乐评论等，对音乐标签信息进行挖掘；通过查找音乐曲库，得到音乐属性信息；查找读者画像库，通过读者看过的文章、发表的评论等文本中挖掘得到读者群标签信息、以及读者群属性信息。For example, as shown in FIG. 5, the music library and the music comment library can be searched, and the music tag information can be mined from the song titles, lyrics, music comments, and the like; the music attribute information can be obtained by looking up the music library; and the reader portrait library can be searched to mine the reader group tag information and the reader group attribute information from texts such as the articles readers have read and the comments they have posted.

其中,标签信息是利用文本形式表示的特征,标签信息可以表示为词语的形式,比如,音乐标签信息可以为“爵士乐”、“伤感”、“器乐”、“失恋时听”、“安静”,等等;读者群标签信息可以为“二次元”、“中年人”、“耍酷”,等等;文本标签信息可以为“情感类”、“历史类”、“八卦类”,等等。Among them, the tag information is a feature expressed in the form of text, and the tag information can be expressed in the form of words. For example, the music tag information can be "jazz", "sad", "instrumental music", "listen to when a love is broken", "quiet", And so on; the reader group tag information can be "two-dimensional", "middle-aged", "playing cool", etc.; the text tag information can be "emotional", "historical", "gossip", etc. .

其中，属性信息是利用具体类别形式表示的特征，可以通过"特征名特征值"的形式表示，比如，音乐属性信息可以为"时长20min"、"类别国语"、"调式C大调"，等等；读者群属性信息可以为"平均年龄28岁"、"平均身高170cm"、"主要城市北京/上海"，等等；上下文属性信息可以为"发表时间2019年1月1日"、"发表平台xx网站"，等等。Here, attribute information is a feature expressed in the form of a specific category, which can be represented as a "feature name: feature value" pair. For example, the music attribute information can be "duration: 20 min", "category: Mandarin", "key: C major", and so on; the reader group attribute information can be "average age: 28", "average height: 170 cm", "main cities: Beijing/Shanghai", and so on; and the context attribute information can be "publication date: January 1, 2019", "publication platform: xx website", and so on.
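
The two feature forms described above can be pictured as in the following sketch, with invented values: tag information as plain words, attribute information as feature-name/feature-value pairs.

    # Tag information: features expressed as words.
    music_tags   = ["jazz", "sad", "instrumental"]
    reader_tags  = ["two-dimensional", "middle-aged"]
    text_tags    = ["emotional", "historical"]
    # Attribute information: features expressed as "feature name: feature value".
    music_attrs   = {"duration_min": 20, "language": "Mandarin", "key": "C major"}
    reader_attrs  = {"average_age": 28, "main_city": "Beijing/Shanghai"}
    context_attrs = {"publish_date": "2019-01-01", "platform": "xx website"}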

303、网络设备将样本特征信息输入至文本配乐模型中,预测出样本文本对应多维度的用户反馈信息。303. The network device inputs the sample feature information into the text soundtrack model, and predicts that the sample text corresponds to multi-dimensional user feedback information.

在实际应用中，比如，当文本配乐模型为wide&deep模型时，由于wide&deep模型中包括wide部分、以及deep部分，其中，deep部分是深度神经网络，用于处理标签信息等文本特征，做embedding并前向传播，wide部分是简单的浅层模型，如逻辑回归、svm等，用于处理属性信息等非文本特征。因此，可以将样本特征信息划分为样本标签信息、以及样本属性信息，其中，样本标签信息中包括音乐标签信息、读者群标签信息、以及文本标签信息；样本属性信息中包括音乐属性信息、读者群属性信息、以及上下文属性信息。In practical applications, for example, when the text soundtrack model is a wide&deep model, the wide&deep model includes a wide part and a deep part. The deep part is a deep neural network used to process text features such as tag information: it embeds them and propagates them forward. The wide part is a simple shallow model, such as logistic regression or an SVM, used to process non-text features such as attribute information. Therefore, the sample feature information can be divided into sample tag information and sample attribute information, where the sample tag information includes the music tag information, the reader group tag information, and the text tag information, and the sample attribute information includes the music attribute information, the reader group attribute information, and the context attribute information.

然后,将样本标签信息输入至wide&deep模型中的deep部分,通过嵌入算法将样本标签信息转化为向量形式的样本标签特征向量,然后通过多层神经网络预测出样本标签信息对应的标签预测信息。将样本属性信息输入至wide&deep模型中的wide部分,通过wide部分的线性模型预测出样本属性信息对应的属性预测信息。然后将获取到的标签预测信息和属性预测信息进行融合,得到多维度的用户反馈信息。Then, input the sample label information into the deep part of the wide&deep model, convert the sample label information into the sample label feature vector in vector form through the embedding algorithm, and then predict the label prediction information corresponding to the sample label information through the multi-layer neural network. Input the sample attribute information into the wide part of the wide&deep model, and predict the attribute prediction information corresponding to the sample attribute information through the linear model of the wide part. Then, the obtained label prediction information and attribute prediction information are fused to obtain multi-dimensional user feedback information.
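
A compact sketch of such a wide&deep network in PyTorch is given below: the deep part embeds tag ids and passes them through a small multi-layer network, the wide part is a linear layer over numeric attributes, and the two outputs are fused into one prediction per feedback dimension. The layer sizes, the choice of PyTorch, and the four-dimensional output head are assumptions of the sketch, not details fixed by the patent.

    import torch
    import torch.nn as nn

    class WideAndDeepScorer(nn.Module):
        def __init__(self, tag_vocab=1000, emb_dim=16, attr_dim=8, n_feedback=4):
            super().__init__()
            self.embedding = nn.EmbeddingBag(tag_vocab, emb_dim)      # deep part: embed tag ids
            self.deep = nn.Sequential(nn.Linear(emb_dim, 32), nn.ReLU(),
                                      nn.Linear(32, 16), nn.ReLU())
            self.wide = nn.Linear(attr_dim, 4)                        # wide part: linear model over attributes
            self.head = nn.Linear(16 + 4, n_feedback)                 # fuse both parts, one output per dimension

        def forward(self, tag_ids, attrs):
            label_pred = self.deep(self.embedding(tag_ids))           # prediction path for tag (label) features
            attr_pred = self.wide(attrs)                              # prediction path for attribute features
            return self.head(torch.cat([label_pred, attr_pred], dim=1))

    model = WideAndDeepScorer()
    tag_ids = torch.randint(0, 1000, (2, 6))    # a batch of 2 samples, 6 tag ids each
    attrs = torch.randn(2, 8)                   # a batch of 2 samples, 8 numeric attributes
    print(model(tag_ids, attrs).shape)          # torch.Size([2, 4])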

其中,本申请实施例中不对文本配乐模型的结构或者类型进行限制,只要是有监督模型,能够预测出样本文本对应多维度的用户反馈信息的网络模型都可以。The embodiments of the present application do not limit the structure or type of the text soundtrack model, as long as it is a supervised model, any network model capable of predicting the multi-dimensional user feedback information corresponding to the sample text may be used.

304、网络设备根据多维度的用户反馈信息、以及训练样本对应的样本标签,构建多个目标函数。304. The network device constructs multiple objective functions according to the multi-dimensional user feedback information and the sample labels corresponding to the training samples.

在实际应用中,由于在网络模型进行训练的过程中,需要利用目标函数使得模型获取到需要的目标。因此,本申请实施例可以借鉴多任务学习的框架,围绕“用户在背景音乐下阅读文章后的反馈”这一核心思想,构建多个与用户阅读后行为相关的目标函数。In practical applications, in the process of training the network model, it is necessary to use the objective function to make the model obtain the required target. Therefore, the embodiments of the present application can draw on the framework of multi-task learning, and build a plurality of objective functions related to the user's behavior after reading around the core idea of "the user's feedback after reading the article in the background music".

比如，获取到多维度的用户反馈信息后，可以将其与训练样本标签对应的样本反馈信息进行比较，并根据多维度的用户反馈信息、以及训练样本对应的样本反馈信息，构建多个维度的目标函数，其中，每个目标函数代表对一种用户反馈信息进行优化，也即，多个目标函数分别代表对读者平均浏览时长、读者评论次数、读者打赏金额、读者分享次数等用户反馈信息进行优化。为了提升文本配乐模型的灵活性，目标函数的种类可以根据实际情况进行调整，只要满足"用户在背景音乐下阅读文章后的反馈"这一核心思想的目标函数，都可以纳入系统之中。For example, after the multi-dimensional user feedback information is obtained, it can be compared with the sample feedback information corresponding to the training sample labels, and multiple objective functions, one per dimension, can be constructed from the multi-dimensional user feedback information and the sample feedback information corresponding to the training samples. Each objective function represents the optimization of one kind of user feedback information; that is, the multiple objective functions respectively represent the optimization of user feedback information such as the reader average browsing duration, the number of reader comments, the reader reward amount, and the number of reader shares. To improve the flexibility of the text soundtrack model, the types of objective functions can be adjusted according to the actual situation, and any objective function that fits the core idea of "the user's feedback after reading the article with background music" can be incorporated into the system.
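
As a sketch of this multi-objective setup, each feedback dimension can get its own loss term against the collected labels; the use of a mean-squared-error loss here is an assumption, since the patent leaves the concrete form of the objective functions open.

    import torch
    import torch.nn.functional as F

    # Invented example: predicted vs. real feedback for one sample,
    # ordered as (browse duration, comments, reward, shares).
    predicted = torch.tensor([[3.1, 10.0, 30.0, 6.0]])
    labels    = torch.tensor([[4.6, 12.0, 35.0, 7.0]])

    # One objective per feedback dimension.
    dimension_losses = [F.mse_loss(predicted[:, i], labels[:, i])
                        for i in range(predicted.shape[1])]
    total_loss = sum(dimension_losses)  # the per-dimension objectives could also be weighted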

305、网络设备求解多个目标函数,并根据求解结果调整文本配乐模型的参数,得到训练后文本配乐模型。305. The network device solves the multiple objective functions, and adjusts the parameters of the text soundtrack model according to the solution results, so as to obtain the text soundtrack model after training.

在实际应用中，比如，构建出多个目标函数后，可以对多个目标函数进行求解，并根据求解目标函数的求解结果，对文本配乐模型中的参数进行调整，直至文本配乐模型收敛，得到训练后文本配乐模型。In practical applications, for example, after the multiple objective functions are constructed, they can be solved, and the parameters of the text soundtrack model can be adjusted according to the solution results until the text soundtrack model converges, yielding the trained text soundtrack model.
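
A minimal, self-contained sketch of this solve-and-adjust loop follows; the tiny linear stand-in model, the Adam optimiser, and the random data are placeholders used only to show the parameter-update cycle, not the patent's actual training procedure.

    import torch

    model = torch.nn.Linear(8, 4)            # stand-in for the text soundtrack model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    features = torch.randn(16, 8)            # stand-in sample feature vectors
    targets = torch.randn(16, 4)             # stand-in multi-dimensional sample feedback (labels)

    for step in range(200):                  # in practice: iterate until the model converges
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(features), targets)  # combined objective
        loss.backward()                      # solve: gradients of the objectives
        optimizer.step()                     # adjust the model parameters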

306、网络设备获取待配乐文本对应的文本特征、以及音乐库中每首音乐对应的音乐特征。306. The network device acquires the text feature corresponding to the text to be composed, and the music feature corresponding to each piece of music in the music library.

在实际应用中,比如,模型训练完毕得到训练后文本配乐模型后,可以利用该训练后文本配乐模型对待配乐文本进行目标配乐的预测。可以获取需要匹配背景音乐的待配乐文本,并挖掘该待配乐文本对应的文本特征,该文本特征可以包括待配乐文本对应的文本标签信息、读者群标签信息、读者群属性信息、上下文属性信息、以及作者特征信息。并且可以获取音乐库,该音乐库中包括多首音乐,然后根据预设的音乐曲库、音乐评论库、以及读者画像库,挖掘音乐库中每首音乐对应的音乐特征,该音乐特征可以包括音乐库中每首音乐对应的音乐标签信息、以及音乐属性信息。In practical applications, for example, after the model is trained to obtain a trained text soundtrack model, the post-trained text soundtrack model can be used to predict the target soundtrack for the soundtrack text. The text to be composed that needs to be matched with the background music can be obtained, and the text features corresponding to the text to be composed can be mined. The text features can include text label information corresponding to the text to be composed, reader group tag information, reader group attribute information, context attribute information, and author characteristics information. And a music library can be obtained, the music library includes multiple pieces of music, and then according to the preset music library, music comment library, and reader portrait library, the music characteristics corresponding to each music in the music library are excavated, and the music characteristics can include: Music tag information and music attribute information corresponding to each piece of music in the music library.

307、网络设备将文本特征和音乐特征输入至训练后文本配乐模型中,预测得到多维度的目标用户反馈信息。307. The network device inputs the text feature and the music feature into the trained text soundtrack model, and predicts to obtain multi-dimensional target user feedback information.

在实际应用中，比如，可以将待配乐文本对应的文本特征、以及音乐库中每首音乐对应的音乐特征输入至训练后文本配乐模型中，然后，基于训练后文本配乐模型预测出音乐库中每首音乐针对待配乐文本的多维度的目标用户反馈信息。In practical applications, for example, the text features corresponding to the text to be scored and the music features corresponding to each piece of music in the music library can be input into the trained text soundtrack model, and the trained text soundtrack model then predicts the multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored.

如音乐库中包括音乐1、音乐2、音乐3…等多首音乐，训练后文本配乐模型可以分别预测出音乐1作为待配乐文本的背景音乐时多维度的目标用户反馈信息；音乐2作为待配乐文本的背景音乐时多维度的目标用户反馈信息；音乐3作为待配乐文本的背景音乐时多维度的目标用户反馈信息...。其中，该多维度的目标用户反馈信息包括读者平均浏览时长、读者评论次数、读者打赏金额、以及读者分享次数。For example, if the music library includes music 1, music 2, music 3, and so on, the trained text soundtrack model can respectively predict the multi-dimensional target user feedback information when music 1 is used as the background music of the text to be scored, the multi-dimensional target user feedback information when music 2 is used as the background music of the text to be scored, the multi-dimensional target user feedback information when music 3 is used as the background music of the text to be scored, and so on. The multi-dimensional target user feedback information includes the reader average browsing duration, the number of reader comments, the reader reward amount, and the number of reader shares.

其中,在训练文本配乐模型时,作为训练样本输入的音乐为与样本文本匹配的样本音乐,而在应用训练后文本配乐模型进行信息预测时,输入的音乐不仅限于与待配乐文本匹配的音乐,而是可以将音乐库中所有的音乐都输入,此时,可以获取到音乐库中每首音乐针对待配乐文本的多维度的目标用户反馈信息。Among them, when training the text soundtrack model, the music input as the training sample is the sample music that matches the sample text, and when applying the post-training text soundtrack model for information prediction, the input music is not limited to the music that matches the text to be soundtracked. Instead, all the music in the music library can be input, and at this time, the multi-dimensional target user feedback information of each music in the music library for the text to be composed can be obtained.

308、网络设备根据目标用户反馈信息,从音乐库的多首音乐中确定目标配乐。308. The network device determines the target soundtrack from multiple pieces of music in the music library according to the feedback information of the target user.

在实际应用中，比如，获取到音乐库中每首音乐针对待配乐文本的多维度的目标用户反馈信息之后，可以为待配乐文本的作者提供多种排序方法，当待配乐文本的作者更关注用户打赏金额时，可以按照用户打赏金额的数值，对音乐库中的多首音乐进行排序，并将音乐库中的一首或者几首音乐作为目标配乐推荐给作者，以便作者从中选取更合适的音乐作为待配乐文本的背景音乐。In practical applications, for example, after the multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored is obtained, a variety of ranking methods can be provided to the author of the text to be scored. When the author is more concerned with the user reward amount, the multiple pieces of music in the music library can be sorted by the predicted value of the user reward amount, and one or several pieces of music in the music library can be recommended to the author as the target soundtrack, so that the author can select a more suitable piece as the background music of the text to be scored.

在一实施例中，又比如，获取到音乐库中每首音乐针对待配乐文本的多维度的目标用户反馈信息之后，可以按照待配乐文本的作者指定的权重，对多维度的目标用户反馈信息进行融合，得到融合后用户反馈信息，然后根据融合后用户反馈信息的数值，对音乐库中的多首音乐进行排序，并将音乐库中的一首或者几首音乐作为目标配乐推荐给作者，以便作者从中选取更合适的音乐作为待配乐文本的背景音乐。In one embodiment, for another example, after the multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored is obtained, the multi-dimensional target user feedback information can be fused according to weights specified by the author of the text to be scored to obtain the fused user feedback information; the multiple pieces of music in the music library are then sorted according to the value of the fused user feedback information, and one or several pieces of music in the music library are recommended to the author as the target soundtrack, so that the author can select a more suitable piece as the background music of the text to be scored.
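
Both ranking options described above can be sketched with one small helper: rank by a single feedback dimension chosen by the author, or by an author-weighted fused score. The names and numbers below are invented for the sketch.

    predictions = {
        "m1": {"browse_min": 3.2, "comments": 1.4, "reward": 0.8, "shares": 0.5},
        "m2": {"browse_min": 2.1, "comments": 0.9, "reward": 1.6, "shares": 1.2},
    }

    def rank(predictions, key=None, weights=None):
        # key: rank by one feedback dimension; weights: rank by a weighted fusion.
        if key is not None:
            score = lambda fb: fb[key]
        else:
            score = lambda fb: sum(w * fb[k] for k, w in weights.items())
        return sorted(predictions, key=lambda m: score(predictions[m]), reverse=True)

    print(rank(predictions, key="reward"))                             # ['m2', 'm1']
    print(rank(predictions, weights={"reward": 0.7, "shares": 0.3}))   # ['m2', 'm1']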

由上可知,本申请实施例可以通过网络设备获取训练样本,训练样本中包括样本文本、与样本文本匹配的样本音乐、以及浏览过样本文本的样本读者群,获取样本特征信息,样本特征信息包括音乐标签信息、读者群标签信息、文本标签信息、音乐属性信息、读者群属性信息、以及上下文属性信息,将样本特征信息输入至文本配乐模型中,预测出样本文本对应多维度的用户反馈信息,根据多维度的用户反馈信息、以及训练样本对应的样本标签,构建多个目标函数,求解多个目标函数,并根据求解结果调整文本配乐模型的参数,得到训练后文本配乐模型,获取待配乐文本对应的文本特征、以及音乐库中每首音乐对应的音乐特征,将文本特征和音乐特征输入至训练后文本配乐模型中,预测得到多维度的目标用户反馈信息,根据目标用户反馈信息,从音乐库的多首音乐中确定目标配乐。该方案可以通过将样本文本对应的多维度的样本特征信息作为模型输入,并设定多个优化目标,提升了文本配乐模型的建模能力。根据该文本配乐模型对待配乐文本匹配音乐,不仅能够辅助作者寻找到合适的背景音乐,节省作者的时间,而且能够为作者寻找更多潜在的能够作为背景音乐的音乐,使得作者选取目标音乐的行为更加灵活,并且提升了为待配乐文本匹配音乐的准确性。It can be seen from the above that in this embodiment of the present application, a training sample can be obtained through a network device, and the training sample includes sample text, sample music that matches the sample text, and sample reader groups who have browsed the sample text to obtain sample feature information. The sample feature information includes: Music tag information, reader group tag information, text tag information, music attribute information, reader group attribute information, and context attribute information, input the sample feature information into the text soundtrack model, and predict the multi-dimensional user feedback information corresponding to the sample text, According to the multi-dimensional user feedback information and the sample labels corresponding to the training samples, construct multiple objective functions, solve multiple objective functions, and adjust the parameters of the text soundtrack model according to the solution results, obtain the text soundtrack model after training, and obtain the text to be soundtracked. The corresponding text features and the music features corresponding to each piece of music in the music library, input the text features and music features into the text soundtrack model after training, and predict the multi-dimensional target user feedback information, according to the target user feedback information, from the music. Identify the target soundtrack among multiple music in the library. This solution can improve the modeling ability of the text soundtrack model by taking the multi-dimensional sample feature information corresponding to the sample text as the model input and setting multiple optimization goals. Treating the soundtrack text matching music according to the text soundtrack model can not only assist the author to find suitable background music, save the author's time, but also find more potential music that can be used as background music for the author, so that the author can choose the target music behavior. More flexibility and improved accuracy in matching music to text to be scored.

为了更好地实施以上方法,本申请实施例还可以提供一种文本配乐装置,该文本配乐装置具体可以集成在网络设备中,该网络设备可以包括服务器、终端等,其中,终端可以包括:手机、平板电脑、笔记本电脑或个人计算机(PC,Personal Computer)等。In order to better implement the above method, an embodiment of the present application may further provide a text soundtrack device, which may be specifically integrated in a network device, and the network device may include a server, a terminal, etc., wherein the terminal may include: a mobile phone , tablet computer, notebook computer or personal computer (PC, Personal Computer), etc.

例如,如图6所示,该文本配乐装置可以包括获取模块61、第一预测模块62、损失获取模块63、训练模块64和第二预测模块65,如下:For example, as shown in FIG. 6, the text soundtrack device may include an acquisition module 61, a first prediction module 62, a loss acquisition module 63, a training module 64 and a second prediction module 65, as follows:

获取模块61,用于获取样本文本、以及所述样本文本对应多维度的样本特征信息;an acquisition module 61, configured to acquire sample text and multi-dimensional sample feature information corresponding to the sample text;

第一预测模块62,用于基于文本配乐模型、以及所述样本特征信息,预测浏览用户针对所述样本文本的多维度的用户反馈信息;The first prediction module 62 is configured to predict the multi-dimensional user feedback information of the browsing user for the sample text based on the text soundtrack model and the sample feature information;

损失获取模块63,用于基于所述样本特征信息、以及所述用户反馈信息,获取每个维度用户反馈信息对应的损失;A loss obtaining module 63, configured to obtain the loss corresponding to the user feedback information of each dimension based on the sample feature information and the user feedback information;

训练模块64,用于基于所述每个维度用户反馈信息对应的损失,对所述文本配乐模型进行训练,得到训练后文本配乐模型;A training module 64, configured to train the text soundtrack model based on the loss corresponding to the user feedback information of each dimension, to obtain a trained text soundtrack model;

第二预测模块65,用于基于所述训练后文本配乐模型预测待配乐文本的目标配乐。The second prediction module 65 is configured to predict the target soundtrack of the text to be sounded based on the trained text soundtrack model.

在一实施例中,所述获取模块61可以包括获取子模块611和提取子模块612,如下:In one embodiment, the acquisition module 61 may include an acquisition sub-module 611 and an extraction sub-module 612, as follows:

获取子模块611,用于获取样本文本、以及所述样本文本对应的多种样本配乐信息;Obtaining sub-module 611, for obtaining sample text and various sample soundtrack information corresponding to the sample text;

提取子模块612,用于提取所述样本文本、以及所述样本配乐信息的特征,得到多维度的样本特征信息。The extraction sub-module 612 is configured to extract the features of the sample text and the sample soundtrack information to obtain multi-dimensional sample feature information.

在一实施例中,所述提取子模块612可以具体用于:In one embodiment, the extraction sub-module 612 may be specifically used for:

基于预设数据库,提取所述样本配乐信息对应的样本配乐特征信息;Based on the preset database, extract the sample soundtrack feature information corresponding to the sample soundtrack information;

提取所述样本文本的特征,得到所述样本文本对应的样本文本特征信息。Extracting features of the sample text to obtain sample text feature information corresponding to the sample text.

在一实施例中,所述第一预测模块62可以包括第一预测子模块621、第二预测子模块622和融合子模块623,如下:In an embodiment, the first prediction module 62 may include a first prediction sub-module 621, a second prediction sub-module 622 and a fusion sub-module 623, as follows:

第一预测子模块621,用于基于所述线性子模型、以及所述样本属性信息,预测浏览用户针对所述样本文本的属性预测信息;a first prediction sub-module 621, configured to predict the attribute prediction information of the browsing user for the sample text based on the linear sub-model and the sample attribute information;

第二预测子模块622,用于基于所述深度神经网络子模型、以及所述样本标签信息,预测浏览用户针对所述样本文本的标签预测信息;The second prediction sub-module 622 is configured to predict the tag prediction information of the browsing user for the sample text based on the deep neural network sub-model and the sample tag information;

融合子模块623,用于融合所述属性预测信息、以及所述标签预测信息,得到多维度的用户反馈信息。The fusion sub-module 623 is configured to fuse the attribute prediction information and the label prediction information to obtain multi-dimensional user feedback information.

在一实施例中,所述第二预测子模块622可以具体用于:In an embodiment, the second prediction sub-module 622 may be specifically used for:

将所述样本标签信息转换为样本标签特征向量;converting the sample label information into a sample label feature vector;

基于所述深度神经网络子模型、以及所述样本标签特征向量,预测浏览用户针对所述样本文本的标签预测信息。Based on the deep neural network sub-model and the sample tag feature vector, the tag prediction information of the browsing user for the sample text is predicted.

在一实施例中,所述第二预测模块65可以包括第三预测子模块651和确定子模块652,如下:In an embodiment, the second prediction module 65 may include a third prediction sub-module 651 and a determination sub-module 652, as follows:

第三预测子模块651,用于基于所述训练后文本配乐模型、音乐库、以及待配乐文本,预测所述音乐库中每首音乐针对所述待配乐文本的多维度的目标用户反馈信息;The third prediction sub-module 651 is used to predict the multi-dimensional target user feedback information of each piece of music in the music library for the text to be composed based on the trained text soundtrack model, the music library, and the text to be composed;

确定子模块652,用于根据所述目标用户反馈信息,从所述音乐库中确定所述待配乐文本的目标配乐。The determining sub-module 652 is configured to determine the target soundtrack of the text to be soundtracked from the music library according to the target user feedback information.

在一实施例中,所述第三预测子模块651可以具体用于:In an embodiment, the third prediction sub-module 651 may be specifically used for:

获取待配乐文本、以及所述待配乐文本对应的多个文本特征;Obtaining the text to be composed and a plurality of text features corresponding to the text to be composed;

获取音乐库、以及所述音乐库中多首音乐对应的音乐特征;Obtain a music library and the music features corresponding to multiple pieces of music in the music library;

基于所述训练后文本配乐模型、所述文本特征、以及所述音乐特征,预测所述音乐库中每首音乐针对所述待配乐文本的多维度的目标用户反馈信息。Based on the trained text soundtrack model, the text feature, and the music feature, predict the multi-dimensional target user feedback information of each piece of music in the music library for the text to be soundtracked.

在一实施例中,所述确定子模块652可以具体用于:In one embodiment, the determining sub-module 652 may be specifically used for:

对所述音乐库中每首音乐对应的多维度的目标用户反馈信息进行加权融合,得到每首音乐对应的融合后用户反馈信息;weighted fusion of the multi-dimensional target user feedback information corresponding to each piece of music in the music library, to obtain the post-fusion user feedback information corresponding to each piece of music;

根据所述融合后用户反馈信息,从所述音乐库的多首音乐中确定所述待配乐文本的目标配乐。According to the user feedback information after the fusion, the target soundtrack of the text to be soundtracked is determined from a plurality of pieces of music in the music library.

具体实施时,以上各个单元可以作为独立的实体来实现,也可以进行任意组合,作为同一或若干个实体来实现,以上各个单元的具体实施可参见前面的方法实施例,在此不再赘述。During specific implementation, the above units can be implemented as independent entities, or can be arbitrarily combined to be implemented as the same or several entities. The specific implementation of the above units can refer to the previous method embodiments, which will not be repeated here.

由上可知,本申请实施例可以通过获取模块61获取样本文本、以及样本文本对应多维度的样本特征信息,通过第一预测模块62基于文本配乐模型、以及样本特征信息,预测浏览用户针对样本文本的多维度的用户反馈信息,通过损失获取模块63基于样本特征信息、以及用户反馈信息,获取每个维度用户反馈信息对应的损失,通过训练模块64基于每个维度用户反馈信息对应的损失,对文本配乐模型进行训练,得到训练后文本配乐模型,通过第二预测模块65基于训练后文本配乐模型预测待配乐文本的目标配乐。该方案可以通过将样本文本对应的多维度的样本特征信息作为模型输入,并设定多个优化目标,提升了文本配乐模型的建模能力。根据该文本配乐模型对待配乐文本匹配音乐,不仅能够辅助作者寻找到合适的背景音乐,节省作者的时间,而且能够为作者寻找更多潜在的能够作为背景音乐的音乐,使得作者选取目标音乐的行为更加灵活,并且提升了为待配乐文本匹配音乐的准确性。It can be seen from the above that in this embodiment of the present application, the sample text and the multi-dimensional sample feature information corresponding to the sample text can be obtained through the obtaining module 61, and the first prediction module 62 can predict the sample text based on the text soundtrack model and the sample feature information. The multi-dimensional user feedback information, the loss acquisition module 63 obtains the loss corresponding to the user feedback information of each dimension based on the sample feature information and the user feedback information, and the training module 64 is based on the loss corresponding to the user feedback information of each dimension. The text soundtrack model is trained to obtain a trained text soundtrack model, and the second prediction module 65 predicts the target soundtrack of the text to be sounded based on the trained text soundtrack model. This solution can improve the modeling ability of the text soundtrack model by taking the multi-dimensional sample feature information corresponding to the sample text as the model input and setting multiple optimization goals. Treating the soundtrack text matching music according to the text soundtrack model can not only assist the author to find suitable background music, save the author's time, but also find more potential music that can be used as background music for the author, so that the author can choose the target music behavior. More flexibility and improved accuracy in matching music to text to be scored.

本申请实施例还提供一种网络设备,该网络设备可以集成本申请实施例所提供的任一种文本配乐装置。The embodiment of the present application further provides a network device, and the network device can integrate any text soundtrack apparatus provided by the embodiment of the present application.

例如,如图7所示,其示出了本申请实施例所涉及的网络设备的结构示意图,具体来讲:For example, as shown in FIG. 7 , which shows a schematic structural diagram of a network device involved in an embodiment of the present application, specifically:

该网络设备可以包括一个或者一个以上处理核心的处理器71、一个或一个以上计算机可读存储介质的存储器72、电源73和输入单元74等部件。本领域技术人员可以理解，图7中示出的网络设备结构并不构成对网络设备的限定，可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件布置。其中：The network device may include a processor 71 with one or more processing cores, a memory 72 with one or more computer-readable storage media, a power supply 73, an input unit 74, and other components. Those skilled in the art can understand that the network device structure shown in FIG. 7 does not constitute a limitation on the network device; it may include more or fewer components than shown, combine some components, or use a different arrangement of components. Wherein:

处理器71是该网络设备的控制中心，利用各种接口和线路连接整个网络设备的各个部分，通过运行或执行存储在存储器72内的软件程序和/或模块，以及调用存储在存储器72内的数据，执行网络设备的各种功能和处理数据，从而对网络设备进行整体监控。可选的，处理器71可包括一个或多个处理核心；优选的，处理器71可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信。可以理解的是，上述调制解调处理器也可以不集成到处理器71中。The processor 71 is the control center of the network device; it connects the various parts of the entire network device through various interfaces and lines, and performs the various functions of the network device and processes data by running or executing the software programs and/or modules stored in the memory 72 and invoking the data stored in the memory 72, thereby monitoring the network device as a whole. Optionally, the processor 71 may include one or more processing cores; preferably, the processor 71 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the above-mentioned modem processor may also not be integrated into the processor 71.

存储器72可用于存储软件程序以及模块，处理器71通过运行存储在存储器72的软件程序以及模块，从而执行各种功能应用以及数据处理。存储器72可主要包括存储程序区和存储数据区，其中，存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等；存储数据区可存储根据网络设备的使用所创建的数据等。此外，存储器72可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。相应地，存储器72还可以包括存储器控制器，以提供处理器71对存储器72的访问。The memory 72 can be used to store software programs and modules, and the processor 71 executes various functional applications and data processing by running the software programs and modules stored in the memory 72. The memory 72 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like, and the data storage area may store data created according to the use of the network device, and the like. In addition, the memory 72 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 72 may also include a memory controller to provide the processor 71 with access to the memory 72.

网络设备还包括给各个部件供电的电源73，优选的，电源73可以通过电源管理系统与处理器71逻辑相连，从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源73还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。The network device further includes a power supply 73 for supplying power to the various components. Preferably, the power supply 73 may be logically connected to the processor 71 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 73 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

该网络设备还可包括输入单元74，该输入单元74可用于接收输入的数字或字符信息，以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。The network device may further include an input unit 74, which can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.

尽管未示出，网络设备还可以包括显示单元等，在此不再赘述。具体在本实施例中，网络设备中的处理器71会按照如下的指令，将一个或一个以上的应用程序的进程对应的可执行文件加载到存储器72中，并由处理器71来运行存储在存储器72中的应用程序，从而实现各种功能，如下：Although not shown, the network device may further include a display unit and the like, which will not be described here. Specifically, in this embodiment, the processor 71 in the network device loads the executable files corresponding to the processes of one or more application programs into the memory 72 according to the following instructions, and the processor 71 runs the application programs stored in the memory 72, thereby implementing various functions, as follows:

获取样本文本、以及样本文本对应多维度的样本特征信息，基于文本配乐模型、以及样本特征信息，预测浏览用户针对样本文本的多维度的用户反馈信息，基于样本特征信息、以及用户反馈信息，获取每个维度用户反馈信息对应的损失，基于每个维度用户反馈信息对应的损失，对文本配乐模型进行训练，得到训练后文本配乐模型，基于训练后文本配乐模型预测待配乐文本的目标配乐。Obtain a sample text and multi-dimensional sample feature information corresponding to the sample text; predict, based on a text soundtrack model and the sample feature information, multi-dimensional user feedback information of browsing users for the sample text; obtain, based on the sample feature information and the user feedback information, a loss corresponding to the user feedback information of each dimension; train the text soundtrack model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text soundtrack model; and predict a target soundtrack of a text to be scored based on the trained text soundtrack model.

以上各个操作的具体实施可参见前面的实施例,在此不再赘述。For the specific implementation of the above operations, reference may be made to the foregoing embodiments, and details are not described herein again.

由上可知,本申请实施例可以获取样本文本、以及样本文本对应多维度的样本特征信息,基于文本配乐模型、以及样本特征信息,预测浏览用户针对样本文本的多维度的用户反馈信息,基于样本特征信息、以及用户反馈信息,获取每个维度用户反馈信息对应的损失,基于每个维度用户反馈信息对应的损失,对文本配乐模型进行训练,得到训练后文本配乐模型,基于训练后文本配乐模型预测待配乐文本的目标配乐。该方案可以通过将样本文本对应的多维度的样本特征信息作为模型输入,并设定多个优化目标,提升了文本配乐模型的建模能力。根据该文本配乐模型对待配乐文本匹配音乐,不仅能够辅助作者寻找到合适的背景音乐,节省作者的时间,而且能够为作者寻找更多潜在的能够作为背景音乐的音乐,使得作者选取目标音乐的行为更加灵活,并且提升了为待配乐文本匹配音乐的准确性。It can be seen from the above that the embodiment of the present application can obtain the sample text and the multi-dimensional sample feature information corresponding to the sample text, and predict the multi-dimensional user feedback information of the browsing user for the sample text based on the text soundtrack model and the sample feature information, based on the sample text. Feature information and user feedback information, obtain the loss corresponding to the user feedback information of each dimension, train the text soundtrack model based on the loss corresponding to the user feedback information of each dimension, and obtain the post-training text soundtrack model, based on the post-training text soundtrack model Predict the target soundtrack of the text to be scored. This solution can improve the modeling ability of the text soundtrack model by taking the multi-dimensional sample feature information corresponding to the sample text as the model input and setting multiple optimization goals. Treating the soundtrack text matching music according to the text soundtrack model can not only assist the author to find suitable background music, save the author's time, but also find more potential music that can be used as background music for the author, so that the author can choose the target music behavior. More flexibility and improved accuracy in matching music to text to be scored.

本领域普通技术人员可以理解,上述实施例的各种方法中的全部或部分步骤可以通过指令来完成,或通过指令控制相关的硬件来完成,该指令可以存储于一计算机可读存储介质中,并由处理器进行加载和执行。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by instructions, or by instructions that control relevant hardware, and the instructions can be stored in a computer-readable storage medium, and loaded and executed by the processor.

为此,本申请实施例提供一种计算机设备,其中存储有多条指令,该指令能够被处理器进行加载,以执行本申请实施例所提供的任一种文本配乐方法中的步骤。例如,该指令可以执行如下步骤:To this end, the embodiments of the present application provide a computer device, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute steps in any text music method provided by the embodiments of the present application. For example, the instruction can perform the following steps:

获取样本文本、以及样本文本对应多维度的样本特征信息，基于文本配乐模型、以及样本特征信息，预测浏览用户针对样本文本的多维度的用户反馈信息，基于样本特征信息、以及用户反馈信息，获取每个维度用户反馈信息对应的损失，基于每个维度用户反馈信息对应的损失，对文本配乐模型进行训练，得到训练后文本配乐模型，基于训练后文本配乐模型预测待配乐文本的目标配乐。Obtain a sample text and multi-dimensional sample feature information corresponding to the sample text; predict, based on a text soundtrack model and the sample feature information, multi-dimensional user feedback information of browsing users for the sample text; obtain, based on the sample feature information and the user feedback information, a loss corresponding to the user feedback information of each dimension; train the text soundtrack model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text soundtrack model; and predict a target soundtrack of a text to be scored based on the trained text soundtrack model.

以上各个操作的具体实施可参见前面的实施例,在此不再赘述。For the specific implementation of the above operations, reference may be made to the foregoing embodiments, and details are not described herein again.

其中,该存储介质可以包括:只读存储器(ROM,Read Only Memory)、随机存取记忆体(RAM,Random Access Memory)、磁盘或光盘等。Wherein, the storage medium may include: a read only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk, and the like.

由于该存储介质中所存储的指令,可以执行本申请实施例所提供的任一种文本配乐方法中的步骤,因此,可以实现本申请实施例所提供的任一种文本配乐方法所能实现的有益效果,详见前面的实施例,在此不再赘述。Because the instructions stored in the storage medium can execute the steps in any text soundtrack method provided by the embodiments of the present application, it is possible to implement any of the text soundtrack methods provided by the embodiments of the present application. For the beneficial effects, refer to the foregoing embodiments for details, which will not be repeated here.

以上对本申请实施例所提供的一种文本配乐方法、装置、以及计算机存储介质进行了详细介绍，本文中应用了具体个例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本申请的方法及其核心思想；同时，对于本领域的技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。The text soundtrack method, device, and computer storage medium provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. At the same time, those skilled in the art will, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as a limitation of the present application.

Claims (10)

CN201911089616.XA2019-11-082019-11-08 A text music matching method, device, and computer storage mediumActiveCN110852047B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN201911089616.XACN110852047B (en)2019-11-082019-11-08 A text music matching method, device, and computer storage medium

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN201911089616.XACN110852047B (en)2019-11-082019-11-08 A text music matching method, device, and computer storage medium

Publications (2)

Publication NumberPublication Date
CN110852047Atrue CN110852047A (en)2020-02-28
CN110852047B CN110852047B (en)2025-04-25

Family

ID=69600183

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN201911089616.XAActiveCN110852047B (en)2019-11-082019-11-08 A text music matching method, device, and computer storage medium

Country Status (1)

CountryLink
CN (1)CN110852047B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN104391980A (en)*2014-12-082015-03-04百度在线网络技术(北京)有限公司Song generating method and device
US20180032305A1 (en)*2016-07-292018-02-01Paul Charles CameronSystems and methods for automatic-creation of soundtracks for text
US20180190249A1 (en)*2016-12-302018-07-05Google Inc.Machine Learning to Generate Music from Text
CN109166564A (en)*2018-07-192019-01-08平安科技(深圳)有限公司For the method, apparatus and computer readable storage medium of lyrics text generation melody
CN109063163A (en)*2018-08-142018-12-21腾讯科技(深圳)有限公司A kind of method, apparatus, terminal device and medium that music is recommended
CN109388731A (en)*2018-08-312019-02-26昆明理工大学A kind of music recommended method based on deep neural network
CN109587554A (en)*2018-10-292019-04-05百度在线网络技术(北京)有限公司Processing method, device and the readable storage medium storing program for executing of video data
CN109299290A (en)*2018-12-072019-02-01广东小天才科技有限公司Knowledge graph-based score recommendation method and electronic equipment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN111782858A (en)*2020-03-312020-10-16北京沃东天骏信息技术有限公司Music matching method and device
CN111782858B (en)*2020-03-312024-04-05北京沃东天骏信息技术有限公司Music matching method and device
CN113377971A (en)*2021-05-312021-09-10北京达佳互联信息技术有限公司Multimedia resource generation method and device, electronic equipment and storage medium
CN113377971B (en)*2021-05-312024-02-27北京达佳互联信息技术有限公司Multimedia resource generation method and device, electronic equipment and storage medium
CN113573143A (en)*2021-07-212021-10-29维沃移动通信有限公司Audio playing method and electronic equipment
CN113573143B (en)*2021-07-212023-09-19维沃移动通信有限公司 Audio playback method and electronic device
CN113744071A (en)*2021-08-032021-12-03北京搜狗科技发展有限公司Comment information processing method and device, electronic equipment and storage medium
CN117093718A (en)*2023-10-202023-11-21联通沃音乐文化有限公司Knowledge graph mass unstructured integration method based on cloud computing power and big data technology
CN117093718B (en)*2023-10-202024-04-09联通沃音乐文化有限公司Knowledge graph mass unstructured integration method based on cloud computing power and big data technology

Also Published As

Publication numberPublication date
CN110852047B (en)2025-04-25

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
REGReference to a national code

Ref country code:HK

Ref legal event code:DE

Ref document number:40021493

Country of ref document:HK

SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant
TG01Patent term adjustment
TG01Patent term adjustment
