技术领域Technical Field
本公开涉及互联网直播领域,尤其涉及一种网络直播场景下的人脸融合方法和内容表达设备。The present disclosure relates to the field of Internet live broadcasting, and in particular to a face fusion method and content expression device in a network live broadcasting scenario.
背景技术Background Art
随着互联网技术的发展,越来越多的人喜欢通过观看直播来丰富自己的业余生活,直播在当代越发受到年轻人的欢迎。With the development of Internet technology, more and more people like to enrich their spare time by watching live broadcasts, and live broadcasts are becoming more and more popular among young people today.
在直播过程中,当直播主播的个人形象不符合主流审美,但是主播又想保留个人形象特色时,可以尝试对主播的个人形象进行修饰,以使主播在保留个人特色的情况下,又能提高外在形象,从而使直播间的受众范围更广。因此,如何修饰主播的个人形象是目前亟需解决的问题。During a live broadcast, when the personal image of the host does not conform to mainstream aesthetics but the host wants to retain his or her distinctive characteristics, the personal image can be modified so that the host keeps those characteristics while improving the external image, thereby broadening the audience of the live broadcast room. Therefore, how to modify the host's personal image is a problem that urgently needs to be solved.
发明内容Summary of the invention
为了解决上述技术问题,本公开提供了一种网络直播场景下的人脸融合方法和内容表达设备,用于修饰直播主播的个人形象。In order to solve the above technical problems, the present disclosure provides a face fusion method and content expression device in a live broadcast scenario, which are used to modify the personal image of the live broadcast host.
本公开的技术方案如下:The technical solution of the present disclosure is as follows:
第一方面,本公开提供一种网络直播场景下的人脸融合方法,包括:获取目标直播间的主播人脸图像数据;响应于用户操作,确定意向人脸数据;对主播人脸图像数据和意向人脸数据进行处理,得到主播人脸特征结果和意向人脸特征结果;对主播人脸特征结果和意向人脸特征结果进行融合,生成融合主播人脸图像;向直播视频生成设备发送融合主播人脸图像,以使直播视频生成设备基于融合主播人脸图像,生成直播视频流。In a first aspect, the present disclosure provides a face fusion method in a network live broadcast scenario, comprising: obtaining host face image data of a target live broadcast room; determining intended face data in response to a user operation; processing the host face image data and the intended face data to obtain a host face feature result and an intended face feature result; fusing the host face feature result and the intended face feature result to generate a fused host face image; and sending the fused host face image to a live video generation device, so that the live video generation device generates a live video stream based on the fused host face image.
结合第一方面,另一种可能的实现方式中,对主播人脸图像数据和意向人脸数据进行处理,得到主播人脸特征结果和意向人脸特征结果,包括:基于人脸检测模型,确定主播人脸图像数据对应主播人脸检测图像,以及意向人脸数据对应的意向人脸检测图像;基于人脸识别模型,确定主播人脸检测图像对应的主播人脸特征结果,以及意向人脸检测图像对应的意向人脸特征结果。In combination with the first aspect, in another possible implementation method, the host facial image data and the intended facial data are processed to obtain the host facial feature results and the intended facial feature results, including: based on the face detection model, determining the host facial detection image corresponding to the host facial image data, and the intended facial detection image corresponding to the intended facial data; based on the face recognition model, determining the host facial feature results corresponding to the host facial detection image, and the intended facial feature results corresponding to the intended face detection image.
结合第一方面,另一种可能的实现方式中,基于人脸检测模型,确定主播人脸图像数据对应主播人脸检测图像,以及意向人脸数据对应的意向人脸检测图像,包括:利用人脸检测模型,分别对主播人脸图像数据和意向人脸数据进行关键点提取,得到主播人脸图像数据对应的主播人脸检测特征,以及意向人脸数据对应的意向人脸检测特征,主播人脸检测特征包括主播人脸关键点,意向人脸检测特征包括意向人脸关键点;分别将主播人脸关键点和意向人脸关键点映射到预设模板,得到主播人脸映射数据和意向人脸映射数据;分别对主播人脸映射数据和意向人脸映射数据进行空间归一化处理,得到主播人脸检测图像和意向人脸检测图像。In combination with the first aspect, in another possible implementation method, based on a face detection model, determining the host face detection image corresponding to the host face image data and the intended face detection image corresponding to the intended face data, including: using the face detection model to extract key points of the host face image data and the intended face data, respectively, to obtain the host face detection features corresponding to the host face image data, and the intended face detection features corresponding to the intended face data, the host face detection features including the host face key points, and the intended face detection features including the intended face key points; mapping the host face key points and the intended face key points to preset templates, respectively, to obtain the host face mapping data and the intended face mapping data; performing spatial normalization processing on the host face mapping data and the intended face mapping data, respectively, to obtain the host face detection image and the intended face detection image.
结合第一方面,另一种可能的实现方式中,基于人脸识别模型,确定主播人脸检测图像对应的主播人脸特征结果,以及意向人脸检测图像对应的意向人脸特征结果,包括:利用人脸识别模型分别对主播人脸检测图像和意向人脸检测图像进行脸部特征提取,得到主播人脸检测图像对应的主播人脸特征结果,以及意向人脸检测图像对应的意向人脸特征结果,主播人脸特征结果包括主播人脸特征向量,意向人脸特征结果包括意向人脸特征向量。In combination with the first aspect, in another possible implementation method, based on a face recognition model, determining the host face feature results corresponding to the host face detection image and the intended face feature results corresponding to the intended face detection image, including: using the face recognition model to extract facial features of the host face detection image and the intended face detection image respectively, to obtain the host face feature results corresponding to the host face detection image and the intended face feature results corresponding to the intended face detection image, the host face feature results including the host face feature vector, and the intended face feature results including the intended face feature vector.
结合第一方面,另一种可能的实现方式中,对主播人脸特征结果和意向人脸特征结果进行融合,生成融合主播人脸图像,包括:确定主播人脸图像数据和意向人脸数据的融合权重;基于融合权重,对主播人脸特征结果和意向人脸特征结果进行调整,得到融合主播人脸图像。In combination with the first aspect, in another possible implementation method, fusing the host face feature result and the intended face feature result to generate a fused host face image includes: determining a fusion weight of the host face image data and the intended face data; and adjusting the host face feature result and the intended face feature result based on the fusion weight to obtain the fused host face image.
第二方面,本公开提供一种内容表达设备,包括获取模块、确定模块、处理模块、融合模块和发送模块。其中,获取模块,用于获取主播人脸图像数据;确定模块,用于响应于用户操作,确定意向人脸数据;处理模块,用于对主播人脸图像数据和意向人脸数据进行处理,得到主播人脸特征结果和意向人脸特征结果;融合模块,用于对主播人脸特征结果和意向人脸特征结果进行融合,生成融合主播人脸图像;发送模块,用于向直播视频生成设备发送融合主播人脸图像,以使直播视频生成设备基于融合主播人脸图像,生成直播视频流。In a second aspect, the present disclosure provides a content expression device, including an acquisition module, a determination module, a processing module, a fusion module and a sending module. The acquisition module is used to acquire the host face image data; the determination module is used to determine the intended face data in response to the user operation; the processing module is used to process the host face image data and the intended face data to obtain the host face feature results and the intended face feature results; the fusion module is used to fuse the host face feature results and the intended face feature results to generate a fused host face image; the sending module is used to send the fused host face image to the live video generation device, so that the live video generation device generates a live video stream based on the fused host face image.
结合第二方面,另一种可能的实现方式中,处理模块包括检测单元和识别单元;检测单元,用于基于人脸检测模型,确定主播人脸图像数据对应主播人脸检测图像,以及意向人脸数据对应的意向人脸检测图像;识别单元,用于基于人脸识别模型,确定主播人脸检测图像对应的主播人脸特征结果,以及意向人脸检测图像对应的意向人脸特征结果。In combination with the second aspect, in another possible implementation method, the processing module includes a detection unit and an identification unit; the detection unit is used to determine, based on a face detection model, the host face detection image corresponding to the host face image data, and the intended face detection image corresponding to the intended face data; the identification unit is used to determine, based on a face recognition model, the host face feature result corresponding to the host face detection image, and the intended face feature result corresponding to the intended face detection image.
结合第二方面,另一种可能的实现方式中,检测单元,还用于利用人脸检测模型,分别对主播人脸图像数据和意向人脸数据进行关键点提取,得到主播人脸图像数据对应的主播人脸检测特征,以及意向人脸数据对应的意向人脸检测特征,主播人脸检测特征包括主播人脸关键点,意向人脸检测特征包括意向人脸关键点;分别将主播人脸关键点和意向人脸关键点映射到预设模板,得到主播人脸映射数据和意向人脸映射数据;分别对主播人脸映射数据和意向人脸映射数据进行空间归一化处理,得到主播人脸检测图像和意向人脸检测图像。In combination with the second aspect, in another possible implementation method, the detection unit is also used to use the face detection model to extract key points of the host face image data and the intended face data, respectively, to obtain the host face detection features corresponding to the host face image data, and the intended face detection features corresponding to the intended face data, the host face detection features include the host face key points, and the intended face detection features include the intended face key points; respectively map the host face key points and the intended face key points to preset templates to obtain host face mapping data and intended face mapping data; respectively perform spatial normalization processing on the host face mapping data and the intended face mapping data to obtain the host face detection image and the intended face detection image.
结合第二方面,另一种可能的实现方式中,识别单元,还用于利用人脸识别模型分别对主播人脸检测图像和意向人脸检测图像进行脸部特征提取,得到主播人脸检测图像对应的主播人脸特征结果,以及意向人脸检测图像对应的意向人脸特征结果,主播人脸特征结果包括主播人脸特征向量,意向人脸特征结果包括意向人脸特征向量。In combination with the second aspect, in another possible implementation method, the recognition unit is also used to use the face recognition model to extract facial features of the host face detection image and the intended face detection image respectively, to obtain the host face feature results corresponding to the host face detection image, and the intended face feature results corresponding to the intended face detection image, the host face feature results including the host face feature vector, and the intended face feature results including the intended face feature vector.
结合第二方面,另一种可能的实现方式中,融合模块包括权重确定单元和调整单元,权重确定单元,用于确定主播人脸图像数据和意向人脸数据的融合权重;调整单元,用于基于融合权重,对主播人脸特征结果和意向人脸特征结果进行调整,得到融合主播人脸图像。In combination with the second aspect, in another possible implementation method, the fusion module includes a weight determination unit and an adjustment unit. The weight determination unit is used to determine the fusion weight of the host facial image data and the intended facial data; the adjustment unit is used to adjust the host facial feature results and the intended facial feature results based on the fusion weight to obtain a fused host facial image.
第三方面,本公开提供一种电子设备,包括:存储器和处理器,存储器用于存储计算机程序;处理器用于在执行计算机程序时,使得电子设备实现如第一方面提供的任一项的网络直播场景下的人脸融合方法。In a third aspect, the present disclosure provides an electronic device, comprising: a memory and a processor, the memory being used to store a computer program; the processor being used to enable the electronic device to implement a face fusion method in a live broadcast scenario as provided in any one of the first aspects when executing the computer program.
第四方面,本公开提供一种计算机可读存储介质,包括:计算机可读存储介质上存储有计算机程序,计算机程序被处理器执行时,实现如第一方面提供的任一项的网络直播场景下的人脸融合方法。In a fourth aspect, the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the face fusion method in a network live broadcast scenario as provided in any one of the first aspect is implemented.
第五方面,提供了一种包含计算机指令的计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行如第一方面提供的任一项的网络直播场景下的人脸融合方法。In a fifth aspect, a computer program product comprising computer instructions is provided. When the computer program product is run on a computer, the computer is enabled to execute a face fusion method in a live broadcast scenario as provided in any one of the first aspects.
第六方面,提供了一种装置(例如,该装置可以是芯片系统),该装置包括处理器,用于支持电子设备实现上述第一方面中所涉及的功能。在一种可能的设计中,该装置还包括存储器,该存储器,用于保存电子设备必要的程序指令和数据。该装置是芯片系统时,可以由芯片构成,也可以包含芯片和其他分立器件。In a sixth aspect, a device (for example, the device may be a chip system) is provided, the device including a processor for supporting an electronic device to implement the functions involved in the first aspect above. In one possible design, the device also includes a memory for storing program instructions and data necessary for the electronic device. When the device is a chip system, it may be composed of a chip, or may include a chip and other discrete devices.
需要说明的是,上述计算机指令可以全部或者部分存储在第一计算机可读存储介质上。其中,第一计算机可读存储介质可以与数据处理装置的处理器封装在一起的,也可以与数据处理装置的处理器单独封装,本公开对此不作限定。It should be noted that the above computer instructions may be stored in whole or in part on a first computer-readable storage medium, wherein the first computer-readable storage medium may be packaged together with the processor of the data processing device, or may be packaged separately from the processor of the data processing device, which is not limited in the present disclosure.
本公开中第二方面到第六方面的描述,可以参考第一方面的详细描述;并且,第二方面到第六方面的描述的有益效果,可以参考第一方面的有益效果分析,此处不再赘述。The descriptions of the second to sixth aspects of the present disclosure may refer to the detailed description of the first aspect; and for the beneficial effects of the second to sixth aspects, reference may be made to the analysis of the beneficial effects of the first aspect, which will not be repeated here.
在本公开中,上述数据处理装置的名字对设备或功能模块本身不构成限定,在实际实现中,这些设备或功能模块可以以其他名称出现。只要各个设备或功能模块的功能和本公开类似,属于本公开权利要求及其等同技术的范围之内。In the present disclosure, the name of the above-mentioned data processing device does not limit the device or functional module itself. In actual implementation, these devices or functional modules may appear with other names. As long as the functions of each device or functional module are similar to those of the present disclosure, they fall within the scope of the claims of the present disclosure and their equivalent technologies.
本公开的这些方面或其他方面在以下的描述中会更加简明易懂。These and other aspects of the present disclosure will become more apparent from the following description.
本公开提供的技术方案与现有技术相比具有如下优点:通过获取目标直播间的主播人脸图像数据和意向人脸数据,然后分别对主播人脸图像数据和意向人脸数据进行处理,得到主播人脸特征结果和意向人脸特征结果;最后对主播人脸特征结果和意向人脸特征结果进行融合,生成融合主播人脸图像,并将融合主播人脸图像发送至直播视频生成设备,以使直播视频生成设备生成直播视频流。应用本公开的技术方案,通过分别处理主播人脸图像数据和意向人脸数据,可以基于主播人脸图像数据和意向人脸数据的处理结果来修饰原始的主播人脸图像数据,以改变主播人脸图像数据对应的人脸形象。从而使直播主播的外在形象更符合用户期待,提高直播间观看用户的留存率。Compared with the prior art, the technical solution provided by the present disclosure has the following advantages: by obtaining the host face image data and intended face data of the target live broadcast room, and then processing the host face image data and intended face data respectively, the host face feature results and intended face feature results are obtained; finally, the host face feature results and intended face feature results are fused to generate a fused host face image, and the fused host face image is sent to the live video generation device, so that the live video generation device generates a live video stream. Applying the technical solution of the present disclosure, by processing the host face image data and intended face data respectively, the original host face image data can be modified based on the processing results of the host face image data and intended face data to change the facial image corresponding to the host face image data. Thereby, the external image of the live broadcast host is more in line with user expectations, and the retention rate of users watching the live broadcast room is improved.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
为了更清楚地说明本公开实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,对于本领域普通技术人员而言,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art may obtain other drawings based on these drawings without creative effort.
图1A为本申请实施例提供的一种实施环境的结构示意图。FIG1A is a schematic diagram of the structure of an implementation environment provided in an embodiment of the present application.
图1B为本申请实施例提供的一种直播系统的结构示意图。FIG1B is a schematic diagram of the structure of a live broadcast system provided in an embodiment of the present application.
图1C为本申请实施例提供的一种内容表达设备的硬件示意图。FIG. 1C is a hardware schematic diagram of a content expression device provided in an embodiment of the present application.
图2为本申请实施例提供的一种网络直播场景下的人脸融合方法的流程示意图之一。FIG. 2 is one of the flow charts of a face fusion method in a live broadcast scenario provided in an embodiment of the present application.
图3为本申请实施例提供的一种融合模型的选择示意图。FIG3 is a schematic diagram of selecting a fusion model provided in an embodiment of the present application.
图4为本申请实施例提供的一种网络直播场景下的人脸融合方法的流程示意图之二。FIG. 4 is a second flow chart of a face fusion method in a live broadcast scenario provided in an embodiment of the present application.
图5为本申请实施例提供的一种网络直播场景下的人脸融合方法的流程示意图之三。FIG5 is a third flow chart of a face fusion method in a live broadcast scenario provided in an embodiment of the present application.
图6为本申请实施例提供的一种人脸检测模型的应用示意图。FIG6 is a schematic diagram of an application of a face detection model provided in an embodiment of the present application.
图7为本申请实施例提供的一种网络直播场景下的人脸融合方法的流程示意图之四。FIG. 7 is a fourth flow chart of a face fusion method in a live broadcast scenario provided in an embodiment of the present application.
图8为本申请实施例提供的一种网络直播场景下的人脸融合方法的流程示意图之五。FIG8 is a fifth flow chart of a face fusion method in a live broadcast scenario provided in an embodiment of the present application.
图9为本申请实施例提供的一种融合界面的显示示意图。FIG. 9 is a schematic diagram showing a display of a fusion interface provided in an embodiment of the present application.
图10为本申请实施例提供的一种内容表达设备的结构示意图。FIG. 10 is a schematic diagram of the structure of a content expression device provided in an embodiment of the present application.
图11是本申请实施例提供的一种网络直播场景下的人脸融合方法的计算机程序产品的结构示意图。FIG11 is a schematic diagram of the structure of a computer program product of a face fusion method in a live broadcast scenario provided in an embodiment of the present application.
具体实施方式DETAILED DESCRIPTION
为了能够更清楚地理解本公开的上述目的、特征和优点,下面将对本公开的方案进行进一步描述。需要说明的是,在不冲突的情况下,本公开的实施例及实施例中的特征可以相互组合。In order to more clearly understand the above-mentioned objectives, features and advantages of the present disclosure, the scheme of the present disclosure will be further described below. It should be noted that the embodiments of the present disclosure and the features in the embodiments can be combined with each other without conflict.
在下面的描述中阐述了很多具体细节以便于充分理解本公开,但本公开还可以采用其他不同于在此描述的方式来实施;显然,说明书中的实施例只是本公开的一部分实施例,而不是全部的实施例。In the following description, many specific details are set forth to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in other ways different from those described herein; it is obvious that the embodiments in the specification are only part of the embodiments of the present disclosure, rather than all of the embodiments.
需要说明的是,在本文中,诸如“第一”和“第二”等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、物品或者设备中还存在另外的相同要素。It should be noted that, in this article, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or device. In the absence of further restrictions, the elements defined by the sentence "comprise a ..." do not exclude the existence of other identical elements in the process, method, article or device including the elements.
针对背景技术描述的问题,本申请实施例提供一种网络直播场景下的人脸融合方法,通过获取主播人脸图像数据和意向人脸数据,然后分别对主播人脸图像数据和意向人脸数据进行处理,得到主播人脸特征结果和意向人脸特征结果;最后对主播人脸特征结果和意向人脸特征结果进行融合,生成融合主播人脸图像,并将融合主播人脸图像发送至直播视频生成设备,以使直播视频生成设备生成直播视频流。应用本公开的技术方案,通过分别处理主播人脸图像数据和意向人脸数据,可以基于主播人脸图像数据和意向人脸数据的处理结果来修饰原始的主播人脸图像数据,以改变主播人脸图像数据对应的人脸形象。从而使直播主播的外在形象更符合用户期待,提高直播间的观看用户的留存率。In view of the problems described in the background art, the embodiments of the present application provide a face fusion method in a network live broadcast scenario: the host face image data and the intended face data are obtained and processed respectively to obtain a host face feature result and an intended face feature result; finally, the host face feature result and the intended face feature result are fused to generate a fused host face image, and the fused host face image is sent to a live video generation device, so that the live video generation device generates a live video stream. Applying the technical solution of the present disclosure, by processing the host face image data and the intended face data respectively, the original host face image data can be modified based on the processing results to change the facial image corresponding to the host face image data, so that the external image of the live broadcast host better matches user expectations and the retention rate of users watching the live broadcast room is improved.
下面对本申请实施例提供的一种网络直播场景下的人脸融合方法进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The following describes a face fusion method in a live broadcast scenario provided by an embodiment of the present application. A person skilled in the art will appreciate that, with the development of technology and the emergence of new scenarios, the technical solution provided by the embodiment of the present application is also applicable to similar technical problems.
本公开实施例提供的一种网络直播场景下的人脸融合方法的执行主体可以为本公开实施例提供的内容表达设备,也可以为包括该内容表达设备的电子设备,具体的可以根据实际使用需求确定,本公开实施例不作限定。The execution subject of the face fusion method in a live broadcast scenario provided by the embodiment of the present disclosure can be the content expression device provided by the embodiment of the present disclosure, or it can be an electronic device including the content expression device. The specific execution subject can be determined according to actual usage requirements, and the embodiment of the present disclosure is not limited thereto.
本申请实施例提供一种如图1A所示的实施环境,该实施环境中可以包括有直播系统01、直播平台02和用户终端03。其中,直播系统01和直播平台02之间可以通过有线或无线的通讯方式进行通信,直播平台02和用户终端03之间也可以通过有线或无线的通讯方式进行通信。The embodiment of the present application provides an implementation environment as shown in FIG1A , which may include a live broadcast system 01, a live broadcast platform 02, and a user terminal 03. The live broadcast system 01 and the live broadcast platform 02 may communicate with each other via a wired or wireless communication method, and the live broadcast platform 02 and the user terminal 03 may also communicate with each other via a wired or wireless communication method.
直播系统01主要用于生成直播视频,并将直播视频推送至直播平台02进行直播。本申请实施例中,直播平台02可以是一个或多个。图1A中仅以一个直播平台02进行示例,不对此做具体限制。The live broadcast system 01 is mainly used to generate live broadcast videos and push the live broadcast videos to the live broadcast platform 02 for live broadcast. In the embodiment of the present application, there can be one or more live broadcast platforms 02. FIG. 1A only uses one live broadcast platform 02 as an example, and no specific limitation is made thereto.
直播平台02主要用于在接收到来自直播系统01的直播视频后,将直播数据按照特定规则推流至用户终端03,以使用户终端03对直播视频对应的直播数据进行播放供用户观看。本申请实施例中,用户终端03可以存在一个或多个。用户终端03可以为安装有直播平台02对应的直播应用的电子设备。The live broadcast platform 02 is mainly used to push the live broadcast data to the user terminal 03 according to specific rules after receiving the live broadcast video from the live broadcast system 01, so that the user terminal 03 plays the live broadcast data corresponding to the live broadcast video for the user to watch. In the embodiment of the present application, there can be one or more user terminals 03. The user terminal 03 can be an electronic device with a live broadcast application corresponding to the live broadcast platform 02 installed.
如图1B所示,该直播系统01可以包括:数据感知设备10、流程驱动设备20、内容表达设备30、决策设备40和直播数据生成设备50。As shown in FIG. 1B , the live broadcast system 01 may include: a data perception device 10 , a process driving device 20 , a content expression device 30 , a decision device 40 and a live broadcast data generation device 50 .
其中,决策设备40分别可以与数据感知设备10、流程驱动设备20以及内容表达设备30通信。流程驱动设备20以及内容表达设备30均可以与直播数据生成设备50通信。直播数据生成设备50还可以与决策设备40通信。需要说明的是,数据感知设备10、流程驱动设备20、内容表达设备30、决策设备40和直播数据生成设备50可以是单独的多个设备,还可以是一个设备中的不同功能部件,或者数据中心中实现不同功能的模块。具体根据实际需求而定,本申请对此不做具体限制。Among them, the decision device 40 can communicate with the data perception device 10, the process driving device 20 and the content expression device 30 respectively. The process driving device 20 and the content expression device 30 can both communicate with the live data generation device 50. The live data generation device 50 can also communicate with the decision device 40. It should be noted that the data perception device 10, the process driving device 20, the content expression device 30, the decision device 40 and the live data generation device 50 can be multiple separate devices, or different functional components in a device, or modules that implement different functions in a data center. It depends on actual needs, and this application does not make specific restrictions on this.
其中,数据感知设备10用于获取用户的需求数据。流程驱动设备20可以通过数据感知设备10获取用户的需求数据,并根据用户的需求数据,生成直播流程数据和直播规则数据。内容表达设备30可以通过数据感知设备10获取用户的需求数据,并根据用户的需求数据,生成直播内容数据。其中,用户的需求数据可以包括意向人脸数据,直播内容数据包括融合主播人脸图像。直播数据生成设备50可以根据直播流程数据、直播规则数据和直播内容数据生成直播视频。决策设备40可以根据直播视频的播放数据生成直播决策数据,并基于直播决策数据调整流程驱动设备20生成的直播流程数据、直播规则数据和/或内容表达设备30生成的直播内容数据。The data perception device 10 is used to obtain the user's demand data. The process driving device 20 can obtain the user's demand data through the data perception device 10, and generate live broadcast process data and live broadcast rule data according to the user's demand data. The content expression device 30 can obtain the user's demand data through the data perception device 10, and generate live broadcast content data according to the user's demand data. The user's demand data may include the intended face data, and the live broadcast content data includes the fused host face image. The live broadcast data generation device 50 can generate a live broadcast video according to the live broadcast process data, the live broadcast rule data and the live broadcast content data. The decision device 40 can generate live broadcast decision data according to the playback data of the live broadcast video, and adjust, based on the live broadcast decision data, the live broadcast process data and live broadcast rule data generated by the process driving device 20 and/or the live broadcast content data generated by the content expression device 30.
图1C是本公开实施例提供的内容表达设备的硬件示意图。内容表达设备包括但不限于服务器、平板电脑、笔记本电脑、掌上电脑以及终端等。如图1C所示,内容表达设备包括处理器101、存储器102、网络接口103和总线104。其中,处理器101、存储器102以及网络接口103之间可以通过总线104连接,或采用其他方式相互连接。FIG1C is a hardware schematic diagram of a content expression device provided in an embodiment of the present disclosure. The content expression device includes, but is not limited to, a server, a tablet computer, a laptop computer, a PDA, and a terminal. As shown in FIG1C , the content expression device includes a processor 101, a memory 102, a network interface 103, and a bus 104. The processor 101, the memory 102, and the network interface 103 may be connected via a bus 104, or may be connected to each other in other ways.
处理器101是内容表达设备的控制中心,处理器101可以是通用中央处理单元(central processing unit,CPU),也可以是其他通用处理器等,其中,通用处理器可以是微处理器或者是任何常规的处理器等。示例性的,处理器101可以包括一个或多个CPU。该CPU为单核CPU(single-CPU)或多核CPU(multi-CPU)。The processor 101 is the control center of the content expression device. The processor 101 may be a general-purpose central processing unit (CPU) or other general-purpose processors, wherein the general-purpose processor may be a microprocessor or any conventional processor. Exemplarily, the processor 101 may include one or more CPUs. The CPU is a single-core CPU (single-CPU) or a multi-core CPU (multi-CPU).
存储器102包括但不限于是随机存取存储器(random access memory,RAM)、只读存储器(read only memory,ROM)、可擦除可编程只读存储器(erasable programmable read-only memory,EPROM)、快闪存储器、或光存储器、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质。The memory 102 includes, but is not limited to, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical memory, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer.
一种可能的实现方式中,存储器102可以独立于处理器101存在。存储器102可以通过总线104与处理器101相连接,用于存储数据、指令或者程序代码。处理器101调用并执行存储器102中存储的指令或程序代码时,能够实现本申请实施例提供的网络直播场景下的人脸融合方法。In one possible implementation, the memory 102 may exist independently of the processor 101. The memory 102 may be connected to the processor 101 via the bus 104 and used to store data, instructions, or program code. When the processor 101 calls and executes the instructions or program code stored in the memory 102, the face fusion method in a network live broadcast scenario provided in the embodiments of the present application can be implemented.
另一种可能的实现方式中,存储器102也可以和处理器101集成在一起。In another possible implementation, the memory 102 may also be integrated with the processor 101 .
网络接口103是有线接口(端口),例如光纤分布式数据接口(fiber distributed data interface,FDDI)、千兆以太网(gigabit ethernet,GE)接口。或者,网络接口103是无线接口。应理解,网络接口103包括多个物理端口,网络接口103可以用于接收或发送主播人脸图像数据、意向人脸数据或融合主播人脸图像等数据。The network interface 103 is a wired interface (port), such as a fiber distributed data interface (FDDI) or a gigabit Ethernet (GE) interface. Alternatively, the network interface 103 is a wireless interface. It should be understood that the network interface 103 includes a plurality of physical ports, and the network interface 103 can be used to receive or send data such as the host face image data, the intended face data, or the fused host face image.
可选地,内容表达设备还包括输入输出接口105,输入输出接口105用于与输入设备连接,接收用户通过输入设备输入的信息。输入设备包括但不限于键盘、触摸屏、麦克风等等。输入输出接口105还用于与输出设备连接,输出处理器101的处理结果。输出设备包括但不限于显示器、打印机等等。Optionally, the content expression device further includes an input/output interface 105, which is used to connect to an input device and receive information input by a user through the input device. The input device includes but is not limited to a keyboard, a touch screen, a microphone, etc. The input/output interface 105 is also used to connect to an output device and output the processing result of the processor 101. The output device includes but is not limited to a display, a printer, etc.
总线104,可以是工业标准体系结构(industry standard architecture,ISA)总线、外部设备互连(peripheral component interconnect,PCI)总线或扩展工业标准体系结构(extended industry standard architecture,EISA)总线等。该总线可以分为地址总线、数据总线、控制总线等。为便于表示,图1C中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。The bus 104 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, or an extended industry standard architecture (EISA) bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, FIG1C only uses one thick line, but does not mean that there is only one bus or one type of bus.
需要指出的是,图1C中示出的结构并不构成对该内容表达设备的限定,除图1C所示部件之外,该内容表达设备可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。It should be noted that the structure shown in FIG. 1C does not constitute a limitation on the content expression device. In addition to the components shown in FIG. 1C , the content expression device may include more or fewer components than shown in the figure, or a combination of certain components, or a different arrangement of components.
参考图2,图2为本申请实施例提供的一种网络直播场景下的人脸融合方法的流程图。如图2所示,该方法可以包括步骤201-步骤205。Referring to FIG. 2, FIG. 2 is a flow chart of a face fusion method in a live broadcast scenario provided in an embodiment of the present application. As shown in FIG. 2, the method may include steps 201 to 205.
步骤201、获取目标直播间的主播人脸图像数据。Step 201: Obtain the host's facial image data in the target live broadcast room.
其中,目标直播间为任一需要对主播人脸执行处理的直播间。Among them, the target live broadcast room is any live broadcast room that needs to process the host's face.
本实施例中,内容表达设备首先需要获取主播人脸图像数据。该主播人脸图像数据可以是一张包含人脸的图像。该主播人脸图像数据中的人脸可以是任一直播场景下的直播主播的人脸。In this embodiment, the content expression device first needs to obtain the host face image data. The host face image data can be an image containing a face. The face in the host face image data can be the face of a live broadcast host in any live broadcast scene.
示例性的,在网络直播场景下,该主播人脸图像数据可以是在生成直播间之前获取的。其中,该主播人脸图像数据可以是用户上传的包含直播主播人脸的图像,还可以是通过对用户上传的包含直播主播人脸的视频进行截取得到的。该人脸视频可以是由至少两帧连续的图像构成的视频,且该至少两帧连续的图像中存在一帧包含直播主播完整人脸的图像。该主播人脸图像数据还可以是在直播过程中通过直播间的直播摄像头采集的直播主播的人脸图像。Exemplarily, in a live broadcast scenario, the host face image data may be obtained before the live broadcast room is created. The host face image data may be an image uploaded by a user that contains the live broadcast host's face, or may be obtained by capturing frames from a user-uploaded video that contains the live broadcast host's face. The face video may be a video consisting of at least two consecutive frames of images, at least one of which contains the live broadcast host's complete face. The host face image data may also be a face image of the live broadcast host captured by the live broadcast camera of the live broadcast room during the live broadcast.
在一些实施例中,内容表达设备还可以从预处理装置获取主播人脸图像数据,预处理装置对待处理图像/待处理视频进行预处理,得到主播人脸图像数据。其中,预处理过程可以是先获取待处理图像,然后利用人脸检测技术从待处理图像中识别出人脸区域,并以此人脸区域为中心进行扩大,从而得到人脸图像。该人脸图像包含了人脸以及人脸周边的部分背景区域。类似的,对于待处理视频中的每帧待处理图像也可采用上述方式得到人脸图像,由此得到主播人脸图像数据。在本申请实施例中,预处理装置可以属于内容表达设备中的一部分,也可以是单独于内容表达设备之外的装置(或者其中的一部分),本申请对此不做具体限制。在预处理装置不属于内容表达设备的情况下,预处理装置或者预处理装置所在的装置可以与内容表达设备进行通信,以使内容表达设备通过预处理装置获取主播人脸图像数据。In some embodiments, the content expression device can also obtain the host face image data from the preprocessing device, and the preprocessing device preprocesses the image to be processed/video to be processed to obtain the host face image data. Among them, the preprocessing process can be to first obtain the image to be processed, and then use the face detection technology to identify the face area from the image to be processed, and expand it with the face area as the center to obtain the face image. The face image includes the face and part of the background area around the face. Similarly, for each frame of the image to be processed in the video to be processed, the face image can also be obtained in the above manner, thereby obtaining the host face image data. In the embodiment of the present application, the preprocessing device can be a part of the content expression device, or it can be a device (or a part thereof) separate from the content expression device, and the present application does not make specific restrictions on this. In the case where the preprocessing device does not belong to the content expression device, the preprocessing device or the device where the preprocessing device is located can communicate with the content expression device so that the content expression device obtains the host face image data through the preprocessing device.
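作为示例性说明,下面给出上述预处理过程的一段最简示意代码:先检测人脸区域,再以该区域为中心扩大后裁剪,使裁剪结果包含人脸及周边部分背景。其中OpenCV自带的Haar级联检测器和1.5倍扩大比例均为示例假设,并非本公开的限定实现。As an illustrative note, the following is a minimal sketch of the preprocessing described above: detect a face region, then expand the crop around its center so that the result contains the face and part of the surrounding background. The bundled OpenCV Haar cascade detector and the 1.5x expansion ratio are assumptions for illustration, not limitations of this disclosure.

```python
import cv2

def crop_face_region(frame, expand=1.5):
    # Detect faces with OpenCV's bundled Haar cascade (an illustrative
    # detector choice; any face detection technique could be substituted).
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    cx, cy = x + w / 2, y + h / 2
    half = max(w, h) * expand / 2
    # Expand around the face center and clamp to the frame boundaries,
    # so the crop keeps some background around the face.
    x0, y0 = max(int(cx - half), 0), max(int(cy - half), 0)
    x1 = min(int(cx + half), frame.shape[1])
    y1 = min(int(cy + half), frame.shape[0])
    return frame[y0:y1, x0:x1]
```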
步骤202、响应于用户操作,确定意向人脸数据。Step 202: Determine intended facial data in response to user operation.
示例性的,用户操作可以为选择操作以及删除操作等。例如,选择操作可以是用户在多个预设融合模型中选择第一融合模型,其中,第一融合模型为明星A的人脸图像模型。删除操作可以是用户在多个选中的预设融合模型中删除一些融合模型。Exemplarily, the user operation may be a selection operation and a deletion operation, etc. For example, the selection operation may be that the user selects a first fusion model from multiple preset fusion models, where the first fusion model is a face image model of celebrity A. The deletion operation may be that the user deletes some fusion models from multiple selected preset fusion models.
其中,意向人脸数据可以是基于用户操作后,得到的图像数据(即上述的融合模型)。Among them, the intended face data can be image data obtained after user operation (that is, the above-mentioned fusion model).
例如,用户操作是用户在多个预设融合模型中选择第一融合模型,则第一融合模型即为意向人脸数据。若用户操作是用户在多个选中的预设融合模型中删除一些融合模型,则意向人脸数据为用户在多个选中的预设融合模型中删除一些融合模型后剩下的唯一的融合模型。如图3所示,内容表达设备接收到用户的选择操作,该选择操作具体是用户在多个融合模型中选择融合模型04。响应于用户对融合模型04的选择操作,确定融合模型04为意向人脸数据。其中,选择操作的可操作对象详见图3所示的内容。For example, if the user operation is that the user selects the first fusion model from multiple preset fusion models, then the first fusion model is the intended face data. If the user operation is that the user deletes some fusion models from multiple selected preset fusion models, then the intended face data is the only fusion model remaining after the deletion. As shown in FIG. 3, the content expression device receives the user's selection operation, which specifically is that the user selects fusion model 04 from multiple fusion models. In response to the user's selection operation on fusion model 04, fusion model 04 is determined to be the intended face data. The operable objects of the selection operation are detailed in FIG. 3.
步骤203、对主播人脸图像数据和意向人脸数据进行处理,得到主播人脸特征结果和意向人脸特征结果。Step 203: Process the anchor face image data and the intended face data to obtain the anchor face feature results and the intended face feature results.
在一些实施例中,内容表达设备获取到主播人脸图像数据和意向人脸数据后,对主播人脸图像数据和意向人脸数据进行处理,得到主播人脸图像数据对应的主播人脸特征结果以及意向人脸数据对应的意向人脸特征结果。示例性的,主播人脸特征结果可以是一维向量。主播人脸特征结果用于表征主播人脸图像数据中的人脸特征。意向人脸特征结果也可以是一维向量。意向人脸特征结果用于表征意向人脸数据中的目标人脸特征。In some embodiments, after the content expression device obtains the host face image data and the intended face data, it processes the host face image data and the intended face data to obtain the host face feature result corresponding to the host face image data and the intended face feature result corresponding to the intended face data. Exemplarily, the host face feature result can be a one-dimensional vector. The host face feature result is used to characterize the facial features in the host face image data. The intended face feature result can also be a one-dimensional vector. The intended face feature result is used to characterize the target face features in the intended face data.
步骤204、对主播人脸特征结果和意向人脸特征结果进行融合,生成融合主播人脸图像。Step 204: fuse the host face feature result and the intended face feature result to generate a fused host face image.
在一些实施例中,内容表达设备得到主播人脸特征结果和意向人脸特征结果后,对主播人脸特征结果和意向人脸特征结果进行融合,生成融合主播人脸图像。其中,对主播人脸特征结果和意向人脸特征结果进行融合可以是根据用户需求对主播人脸特征结果和意向人脸特征结果进行融合。若用户需求为保留更多直播主播的个人特色,则融合主播人脸图像更接近主播人脸图像数据。若用户需求为保留更多的意向人脸数据的脸部特征,则融合主播人脸图像更接近意向人脸数据。In some embodiments, after obtaining the host face feature result and the intended face feature result, the content expression device fuses the host face feature result and the intended face feature result to generate a fused host face image. The fusion may be performed according to user needs. If the user wants to retain more of the live broadcast host's personal characteristics, the fused host face image is closer to the host face image data. If the user wants to retain more facial features of the intended face data, the fused host face image is closer to the intended face data.
步骤205、向直播视频生成设备发送融合主播人脸图像,以使直播视频生成设备基于融合主播人脸图像,生成直播视频流。Step 205: Send the fused host face image to the live video generation device, so that the live video generation device generates a live video stream based on the fused host face image.
在一些实施例中,当内容表达设备得到融合主播人脸图像后,可以将融合主播人脸图像发送至直播视频生成设备,以使直播视频生成设备将融合主播人脸图像和其他直播元素(例如,音频元素、场景元素等等)结合起来生成直播视频流。In some embodiments, after the content expression device obtains the fused host face image, it can send the fused host face image to the live video generation device, so that the live video generation device combines the fused host face image with other live elements (for example, audio elements, scene elements, etc.) to generate a live video stream.
本公开实施例提供的一种网络直播场景下的人脸融合方法,通过获取目标直播间的主播人脸图像数据和意向人脸数据,然后分别对主播人脸图像数据和意向人脸数据进行处理,得到主播人脸特征结果和意向人脸特征结果;最后对主播人脸特征结果和意向人脸特征结果进行融合,生成融合主播人脸图像,并将融合主播人脸图像发送至直播视频生成设备,以使直播视频生成设备生成直播视频流。应用本公开的技术方案,通过分别处理主播人脸图像数据和意向人脸数据,可以基于主播人脸图像数据和意向人脸数据的处理结果来修饰原始的主播人脸图像数据,以改变主播人脸图像数据对应的人脸形象。从而使直播主播的外在形象更符合用户期待,提高直播间观看用户的留存率。The disclosed embodiment provides a face fusion method in a network live broadcast scenario, which obtains the host face image data and intended face data of the target live broadcast room, and then processes the host face image data and intended face data respectively to obtain the host face feature results and intended face feature results; finally, the host face feature results and intended face feature results are fused to generate a fused host face image, and the fused host face image is sent to a live video generation device, so that the live video generation device generates a live video stream. Applying the technical solution of the disclosed invention, by processing the host face image data and intended face data respectively, the original host face image data can be modified based on the processing results of the host face image data and intended face data to change the facial image corresponding to the host face image data. Thereby, the external image of the live broadcast host is more in line with user expectations, and the retention rate of users watching the live broadcast room is improved.
在一些实施例中,如图4所示,步骤203包括:In some embodiments, as shown in FIG. 4 , step 203 includes:
步骤401、基于人脸检测模型,确定主播人脸图像数据对应主播人脸检测图像,以及意向人脸数据对应的意向人脸检测图像。Step 401: Based on the face detection model, determine the host face detection image corresponding to the host face image data, and the intended face detection image corresponding to the intended face data.
在得到主播人脸图像数据后,可以利用人脸检测模型对主播人脸图像数据进行处理,以输出主播人脸检测图像。类似的,利用人脸检测模型对意向人脸数据进行处理,以输出意向人脸检测图像。需要说明的是,本公开使用的人脸检测模型包含但不限于多任务级联卷积网络(Multi-task Cascaded Convolutional Networks,MTCNN)、采样和计算重分配人脸检测网络(Sample and Computation Redistribution for Efficient Face Detection,SCRFD)或者其他神经网络,此处不做限定。下述实施例以人脸检测模型为MTCNN为例进行示例性说明。After obtaining the host face image data, the host face image data can be processed using a face detection model to output a host face detection image. Similarly, the intended face data is processed using the face detection model to output an intended face detection image. It should be noted that the face detection model used in the present disclosure includes but is not limited to Multi-task Cascaded Convolutional Networks (MTCNN), Sample and Computation Redistribution for Efficient Face Detection (SCRFD), or other neural networks, which are not limited here. The following embodiments take MTCNN as the face detection model for illustrative purposes.
在一些实施例中,如图5所示,步骤401包括:In some embodiments, as shown in FIG. 5 , step 401 includes:
步骤501、利用人脸检测模型,分别对主播人脸图像数据和意向人脸数据进行关键点提取,得到主播人脸图像数据对应的主播人脸检测特征,以及意向人脸数据对应的意向人脸检测特征。Step 501: Use the face detection model to extract key points of the host face image data and the intended face data, respectively, to obtain the host face detection features corresponding to the host face image data, and the intended face detection features corresponding to the intended face data.
其中,主播人脸检测特征包括主播人脸关键点,意向人脸检测特征包括意向人脸关键点。Among them, the host face detection features include the host face key points, and the intended face detection features include the intended face key points.
在一些实施例中,人脸检测模型可以对主播人脸图像数据进行关键点提取,输出主播人脸图像数据对应的主播人脸检测特征。其中,主播人脸检测特征包括人脸框的坐标、主播人脸关键点的坐标以及人脸分类。人脸分类包括是人脸或不是人脸。示例性的,主播人脸关键点至少包括人脸的5个关键点。5个关键点分别对应人脸的左眼位置、人脸的右眼的位置、人脸的鼻子位置、人脸的左嘴角位置以及人脸的右嘴角位置。如图6所示,人脸检测模型对主播人脸图像数据进行关键点提取后,得到主播人脸图像数据对应的主播人脸检测特征,该主播人脸检测特征包括人脸框601和主播人脸关键点602。In some embodiments, the face detection model can extract key points from the host face image data and output the host face detection features corresponding to the host face image data. Among them, the host face detection features include the coordinates of the face frame, the coordinates of the host face key points and the face classification. The face classification includes whether it is a face or not a face. Exemplarily, the host face key points include at least 5 key points of the face. The 5 key points correspond to the left eye position of the face, the right eye position of the face, the nose position of the face, the left corner of the mouth position of the face and the right corner of the mouth position of the face. As shown in Figure 6, after the face detection model extracts key points from the host face image data, the host face detection features corresponding to the host face image data are obtained, and the host face detection features include a face frame 601 and host face key points 602.
人脸检测模型对意向人脸数据进行关键点提取后,得到意向人脸数据对应的意向人脸检测特征。其中,意向人脸检测特征包括意向人脸检测图像中目标人脸框的坐标、意向人脸检测图像中意向人脸关键点的坐标以及意向人脸检测图像对应的人脸分类。示例性的,意向人脸关键点至少包括目标人脸的5个关键点。5个关键点分别对应目标人脸的左眼位置、目标人脸的右眼位置、目标人脸的鼻子位置、目标人脸的左嘴角位置以及目标人脸的右嘴角位置。After the face detection model extracts the key points of the intended face data, the intended face detection features corresponding to the intended face data are obtained. Among them, the intended face detection features include the coordinates of the target face frame in the intended face detection image, the coordinates of the intended face key points in the intended face detection image, and the face classification corresponding to the intended face detection image. Exemplarily, the intended face key points include at least 5 key points of the target face. The 5 key points correspond to the left eye position of the target face, the right eye position of the target face, the nose position of the target face, the left corner of the mouth position of the target face, and the right corner of the mouth position of the target face.
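作为示例性说明,下面给出利用MTCNN提取上述5个人脸关键点的一段示意代码。示例基于第三方库facenet-pytorch中的MTCNN实现,该库的选择及输入文件名均为示例假设,本公开并不限定具体的人脸检测模型实现。As an illustrative note, the following is a sketch of extracting the five face key points described above with MTCNN, assuming the third-party facenet-pytorch implementation; the library choice and the input file name are assumptions for illustration, and this disclosure does not limit the concrete face detection model.

```python
from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(keep_all=False)       # keep only the most confident face
img = Image.open("host_face.jpg")   # hypothetical host face image path

# boxes: (n, 4) face frame coordinates; probs: face / not-face confidence;
# landmarks: (n, 5, 2) keypoints ordered as left eye, right eye, nose,
# left mouth corner, right mouth corner -- the five points named above.
boxes, probs, landmarks = mtcnn.detect(img, landmarks=True)
```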
步骤502、分别将主播人脸关键点和意向人脸关键点映射到预设模板,得到主播人脸映射数据和意向人脸映射数据。Step 502: Map the anchor face key points and the intended face key points to the preset templates respectively to obtain anchor face mapping data and intended face mapping data.
其中,该预设模板为预设的标准人脸模板。Among them, the preset template is a preset standard face template.
在得到主播人脸检测特征后,可以将主播人脸检测特征中的主播人脸关键点映射到预设模板。由于主播人脸图像数据和预设模板的尺寸有可能相同有可能不同,主播人脸图像数据中的人脸区域和预设模板中的人脸区域有可能相同有可能不同,主播人脸图像数据中的人脸角度和预设模板中的人脸角度也有可能不同,所以需要通过映射,将主播人脸关键点映射到预设模板,从而得到主播人脸映射数据。示例性的,可以通过仿射变换技术将主播人脸关键点映射到预设模板,还可以利用其他可实现的技术将主播人脸关键点映射到预设模板,本公开对此不作限制。After obtaining the host face detection features, the host face key points in the host face detection features can be mapped to the preset template. Since the size of the host face image data and that of the preset template may or may not be the same, the face area in the host face image data and that in the preset template may or may not be the same, and the face angle in the host face image data and that in the preset template may also differ, the host face key points need to be mapped to the preset template to obtain the host face mapping data. Exemplarily, the host face key points can be mapped to the preset template through an affine transformation technique, or through other feasible techniques, which is not limited in the present disclosure.
例如,主播人脸关键点包括左眼中心关键点、右眼中心关键点以及嘴唇中心关键点。左眼中心关键点的坐标为(x0,y0)、右眼中心关键点的坐标为(x1,y1)以及嘴唇中心关键点的坐标为(x2,y2)。预设模板的尺寸为600*600,预设模板中左眼中心的坐标为(200,200),右眼中心的坐标为(400,200),嘴唇中心的坐标为(300,400),则可以通过仿射变换将主播人脸图像数据中的左眼(x0,y0)映射到预设模板中左眼(200,200)上,将主播人脸图像数据中的右眼(x1,y1)映射到预设模板中右眼(400,200)上,将主播人脸图像数据中的嘴唇(x2,y2)映射到预设模板中嘴唇(300,400)上。通常主播人脸关键点包括多个关键点,可以按照该方式将剩余的其他关键点一并进行映射,从而得到主播人脸映射数据。For example, the host face key points include a left eye center key point, a right eye center key point and a lip center key point, whose coordinates are (x0,y0), (x1,y1) and (x2,y2) respectively. The size of the preset template is 600*600, and in the preset template the coordinates of the left eye center are (200,200), the coordinates of the right eye center are (400,200), and the coordinates of the lip center are (300,400). Then, through an affine transformation, the left eye (x0,y0) in the host face image data can be mapped to the left eye (200,200) in the preset template, the right eye (x1,y1) to the right eye (400,200), and the lips (x2,y2) to the lips (300,400). Usually the host face key points include multiple key points, and the remaining key points can be mapped together in this way, thereby obtaining the host face mapping data.
类似的,意向人脸数据和预设模板的尺寸也有可能相同有可能不同,意向人脸数据中的人脸区域和预设模板中的人脸区域也有可能相同有可能不同,意向人脸数据中的人脸角度和预设模板中的人脸角度也有可能不同。所以还可以采用上述映射方式将意向人脸数据中的意向人脸关键点也映射到预设模板,以得到意向人脸映射数据。Similarly, the sizes of the intended face data and the preset template may be the same or different, the face area in the intended face data and the face area in the preset template may be the same or different, and the face angle in the intended face data and the face angle in the preset template may also be different. Therefore, the above mapping method can also be used to map the intended face key points in the intended face data to the preset template to obtain the intended face mapping data.
又例如,意向人脸关键点包括目标左眼中心关键点、目标右眼中心关键点以及目标嘴唇中心关键点。目标左眼中心关键点的坐标为(m0,n0)、目标右眼中心关键点的坐标为(m1,n1)以及目标嘴唇中心关键点的坐标为(m2,n2)。预设模板的尺寸为600*600,预设模板中左眼中心的坐标为(200,200),右眼中心的坐标为(400,200),嘴唇中心的坐标为(300,400)。则可以通过仿射变换将意向人脸数据中的左眼(m0,n0)映射到预设模板中左眼(200,200)上,将意向人脸数据中的右眼(m1,n1)映射到预设模板中右眼(400,200)上,将意向人脸数据中的嘴唇(m2,n2)映射到预设模板中嘴唇(300,400)上。通常意向人脸关键点包括多个关键点,可以按照该方式将剩余的其它关键点一并进行映射,从而得到意向人脸映射数据。For another example, the intended face key points include a target left eye center key point, a target right eye center key point and a target lip center key point, whose coordinates are (m0,n0), (m1,n1) and (m2,n2) respectively. The size of the preset template is 600*600, and in the preset template the coordinates of the left eye center are (200,200), the coordinates of the right eye center are (400,200), and the coordinates of the lip center are (300,400). Then, through an affine transformation, the left eye (m0,n0) in the intended face data can be mapped to the left eye (200,200) in the preset template, the right eye (m1,n1) in the intended face data to the right eye (400,200), and the lips (m2,n2) in the intended face data to the lips (300,400). Usually the intended face key points include multiple key points, and the remaining key points can be mapped together in this way, thereby obtaining the intended face mapping data.
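作为示例性说明,下面给出将检测到的关键点映射到上述600*600预设模板的一段示意代码:利用OpenCV的estimateAffinePartial2D,由文中给出的左眼、右眼、嘴唇中心三组对应点估计出仿射(相似)变换矩阵。该函数选择仅为一种可行实现,并非本公开的限定。As an illustrative note, the following is a sketch of mapping detected key points onto the 600*600 preset template described above: OpenCV's estimateAffinePartial2D estimates an affine (similarity) transform matrix from the three point correspondences given in the text (left eye, right eye, lip center). This function choice is one feasible implementation, not a limitation of this disclosure.

```python
import numpy as np
import cv2

# Template coordinates from the text: left eye center, right eye center,
# lip center in the 600x600 preset template.
TEMPLATE = np.float32([[200, 200], [400, 200], [300, 400]])

def keypoints_to_template(detected_pts):
    # detected_pts: (3, 2) keypoints, e.g. [(x0, y0), (x1, y1), (x2, y2)]
    # for the host face, or [(m0, n0), (m1, n1), (m2, n2)] for the
    # intended face. Returns a 2x3 matrix M mapping image coordinates
    # into template coordinates.
    src = np.float32(detected_pts)
    M, _ = cv2.estimateAffinePartial2D(src, TEMPLATE)
    return M
```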
步骤503、分别对主播人脸映射数据和意向人脸映射数据进行空间归一化处理,得到主播人脸检测图像和意向人脸检测图像。Step 503: perform spatial normalization processing on the host face mapping data and the intended face mapping data respectively to obtain the host face detection image and the intended face detection image.
在得到主播人脸映射数据后,对主播人脸映射数据进行空间归一化处理,以提取到主播人脸图像数据中的人脸五官的形状纹理特征,从而得到主播人脸检测图像。该过程能够实现主播人脸图像数据中的人脸与预设模板中的人脸对齐。类似的,在得到意向人脸映射数据后,也可以对意向人脸映射数据进行空间归一化处理,以提取到意向人脸数据中的目标人脸五官的形状纹理特征,从而得到意向人脸检测图像。该过程也实现了意向人脸数据中的人脸与预设模板中的人脸对齐。After obtaining the host face mapping data, the host face mapping data is spatially normalized to extract the shape and texture features of the facial features in the host face image data, thereby obtaining the host face detection image. This process can achieve the alignment of the face in the host face image data with the face in the preset template. Similarly, after obtaining the intended face mapping data, the intended face mapping data can also be spatially normalized to extract the shape and texture features of the target facial features in the intended face data, thereby obtaining the intended face detection image. This process also achieves the alignment of the face in the intended face data with the face in the preset template.
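作为示例性说明,下面给出空间归一化处理的一段示意代码:将上一段示意代码估计出的变换矩阵M应用于原图像,把人脸重采样到模板坐标系,得到与预设模板对齐的检测图像。As an illustrative note, the following is a sketch of the spatial normalization step: the transform matrix M estimated in the previous sketch is applied to the source image, resampling the face into the template coordinate system to obtain a detection image aligned with the preset template.

```python
import cv2

def spatially_normalize(image, M, size=(600, 600)):
    # Warp the source image into the 600x600 template space; the result
    # corresponds to the aligned face detection image of step 503.
    return cv2.warpAffine(image, M, size, flags=cv2.INTER_LINEAR)
```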
步骤402、基于人脸识别模型,确定主播人脸检测图像对应的主播人脸特征结果,以及意向人脸检测图像对应的意向人脸特征结果。Step 402: Based on the face recognition model, determine the host face feature result corresponding to the host face detection image and the intended face feature result corresponding to the intended face detection image.
在得到主播人脸检测图像和意向人脸检测图像后,可以将主播人脸检测图像和意向人脸检测图像分别输入人脸识别模型,人脸识别模型输出主播人脸检测图像对应的主播人脸特征结果,以及意向人脸检测图像对应的意向人脸特征结果。After obtaining the host face detection image and the intended face detection image, the host face detection image and the intended face detection image can be input into the face recognition model respectively, and the face recognition model outputs the host face feature results corresponding to the host face detection image, and the intended face feature results corresponding to the intended face detection image.
需要说明的是,本公开使用的人脸识别模型包含但不限于ArcFace或者其他神经网络,此处不做限定。下述实施例以人脸识别模型为ArcFace为例进行示例性说明。It should be noted that the face recognition model used in the present disclosure includes but is not limited to ArcFace or other neural networks, which are not limited here. The following embodiments are illustrative in that the face recognition model is ArcFace.
在一些实施例中,如图7所示,步骤402包括:In some embodiments, as shown in FIG. 7 , step 402 includes:
步骤701、利用人脸识别模型分别对主播人脸检测图像和意向人脸检测图像进行脸部特征提取,得到主播人脸检测图像对应的主播人脸特征结果,以及意向人脸检测图像对应的意向人脸特征结果。Step 701: Use a face recognition model to extract facial features from the host face detection image and the intended face detection image, respectively, to obtain the host face feature results corresponding to the host face detection image, and the intended face feature results corresponding to the intended face detection image.
其中,主播人脸特征结果包括主播人脸特征向量,意向人脸特征结果包括意向人脸特征向量。主播人脸特征向量的长度和意向人脸特征向量的长度相同。The host face feature result includes a host face feature vector, and the intended face feature result includes an intended face feature vector. The length of the host face feature vector is the same as the length of the intended face feature vector.
示例性的,在得到主播人脸检测图像后,可以将主播人脸检测图像输入人脸识别模型,人脸识别模型输出主播人脸检测图像对应的主播人脸特征结果,该主播人脸特征结果可以是长度为512的向量。在得到意向人脸检测图像后,可以将意向人脸检测图像输入人脸识别模型,人脸识别模型输出意向人脸检测图像对应的意向人脸特征结果,该意向人脸特征结果是长度为512的向量。Exemplarily, after obtaining the host face detection image, the host face detection image can be input into the face recognition model, and the face recognition model outputs the host face feature result corresponding to the host face detection image, and the host face feature result can be a vector with a length of 512. After obtaining the intended face detection image, the intended face detection image can be input into the face recognition model, and the face recognition model outputs the intended face feature result corresponding to the intended face detection image, and the intended face feature result is a vector with a length of 512.
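作为示例性说明,下面给出利用ArcFace风格的ONNX模型提取512维人脸特征向量的一段示意代码。其中模型文件名"arcface.onnx"、112*112的输入尺寸以及像素归一化方式均为示例假设,本公开并不限定具体的人脸识别模型实现。As an illustrative note, the following is a sketch of extracting a 512-dimensional face feature vector with an ArcFace-style ONNX model; the model file name "arcface.onnx", the 112*112 input size and the pixel normalization scheme are assumptions for illustration, and this disclosure does not limit the concrete face recognition model.

```python
import numpy as np
import cv2
import onnxruntime as ort

sess = ort.InferenceSession("arcface.onnx")  # hypothetical model file
input_name = sess.get_inputs()[0].name

def embed_face(aligned_bgr):
    # Resize the aligned detection image to the 112x112 input commonly
    # used by ArcFace variants, and scale pixels to [-1, 1].
    face = cv2.resize(aligned_bgr, (112, 112))
    face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB).astype(np.float32)
    face = (face - 127.5) / 127.5
    blob = face.transpose(2, 0, 1)[None]             # NCHW batch of one
    feat = sess.run(None, {input_name: blob})[0][0]  # (512,) feature vector
    return feat / np.linalg.norm(feat)               # L2-normalize
```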
在一些实施例中,如图8所示,步骤204包括:In some embodiments, as shown in FIG8 , step 204 includes:
步骤801、确定主播人脸图像数据和意向人脸数据的融合权重。Step 801: Determine the fusion weight of the host face image data and the intended face data.
其中,融合权重用于控制融合主播人脸图像的融合方向。Among them, the fusion weight is used to control the fusion direction of the fused anchor face image.
示例性的,融合权重w的取值范围为[0,1]。若融合权重w越趋近于0,则融合方向更趋近于主播人脸图像数据中的人脸,若融合权重w越趋近于1,则融合方向更趋近于意向人脸数据中的人脸。Exemplarily, the value range of the fusion weight w is [0, 1]. The closer the fusion weight w is to 0, the closer the fusion direction is to the face in the host face image data; the closer the fusion weight w is to 1, the closer the fusion direction is to the face in the intended face data.
在一些实施例中,主播人脸图像数据和意向人脸数据的融合权重可以是预设的固定值。当用户上传了主播人脸图像数据,并选择了意向人脸数据后,可以利用预设的固定值进行融合并输出融合主播人脸图像。为了提高用户满意度,如图9所示,用户可以在融合界面900中设置融合参数。融合参数中包括融合方向,融合方向用于提示用户可以通过滑动融合权重滑动条控件901,来使融合后的融合主播人脸图像的融合方向更趋向于人脸方向(即主播人脸图像数据中的人脸),或者更趋向于目标方向(即意向人脸数据中的人脸)。其中,滑动融合权重滑动条控件901相当于调整主播人脸图像数据和意向人脸数据的融合权重。In some embodiments, the fusion weight of the host face image data and the intended face data can be a preset fixed value. After the user uploads the host face image data and selects the intended face data, the preset fixed value can be used for fusion and the fused host face image is output. In order to improve user satisfaction, as shown in FIG. 9, the user can set fusion parameters in the fusion interface 900. The fusion parameters include the fusion direction, which is used to prompt the user that, by sliding the fusion weight slider control 901, the fusion direction of the fused host face image can be made to tend more toward the face direction (i.e., the face in the host face image data) or more toward the target direction (i.e., the face in the intended face data). Sliding the fusion weight slider control 901 is equivalent to adjusting the fusion weight of the host face image data and the intended face data.
Step 802: Based on the fusion weight, adjust the host face feature result and the intended face feature result to obtain the fused host face image.
For example, once the fusion weight is determined, a fused feature vector C can be computed from the host face feature vector A in the host face feature result, the intended face feature vector B in the intended face feature result, and the fusion weight w. The relationship among the fused feature vector C, the host face feature vector A, the intended face feature vector B, and the fusion weight w satisfies the following expression:
fused feature vector C = (1 − fusion weight w) × host face feature vector A + fusion weight w × intended face feature vector B
From this expression: when the fusion weight w = 0, the fused feature vector C equals the host face feature vector A; when w = 1, C equals the intended face feature vector B; and when w takes any other value between 0 and 1, C is the weighted blend of A and B. The closer w is to 0, the closer C is to A; the closer w is to 1, the closer C is to B. After the fused feature vector C is obtained, the final fused host face image can be generated based on C.
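The vector arithmetic of step 802 is plain linear interpolation; a minimal sketch follows (decoding C back into an image is model-specific and not specified by the embodiment, so it is omitted here):

```python
import numpy as np


def fuse_features(host_vec: np.ndarray, intended_vec: np.ndarray,
                  w: float) -> np.ndarray:
    """Compute C = (1 - w) * A + w * B, per the expression above."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("fusion weight w must lie in [0, 1]")
    if host_vec.shape != intended_vec.shape:
        raise ValueError("A and B must have the same length")
    return (1.0 - w) * host_vec + w * intended_vec


# w = 0 reproduces A exactly, w = 1 reproduces B exactly,
# and intermediate values blend the two identities.
```

Turning C into the final fused host face image would require a decoder or generator network, which this embodiment leaves to the implementation.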
The above describes the solutions provided by the embodiments of the present application mainly from the perspective of the method. To implement the functions above, corresponding hardware structures and/or software modules for executing each function are required. Those skilled in the art will readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of the present application.
The embodiments of the present application may divide the electronic device into functional modules according to the method example above; for example, each functional module may be divided to correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is merely a logical-function division; other division manners are possible in actual implementations.
Corresponding to the method in the foregoing embodiments, an embodiment of the present application further provides a content expression device, which is used to implement the aforementioned face fusion method in a network live broadcast scenario. The functions of the content expression device may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions above.
For example, FIG. 10 shows a schematic structural diagram of a content expression device in a network live broadcast scenario. As shown in FIG. 10, the content expression device may include an acquisition module 1001, a determination module 1002, a processing module 1003, a fusion module 1004, and a sending module 1005. The acquisition module 1001 is configured to obtain the host face image data of the target live broadcast room; the determination module 1002 is configured to determine the intended face data in response to a user operation; the processing module 1003 is configured to process the host face image data and the intended face data to obtain the host face feature result and the intended face feature result; the fusion module 1004 is configured to fuse the host face feature result and the intended face feature result to generate the fused host face image; and the sending module 1005 is configured to send the fused host face image to the live video generation device, so that the live video generation device generates a live video stream based on the fused host face image.
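For orientation only, the FIG. 10 module split can be pictured as an interface whose methods correspond one-to-one with modules 1001-1005; the class and method names below are illustrative assumptions, not part of the disclosure:

```python
class ContentExpressionDevice:
    """Interface sketch mirroring modules 1001-1005 of FIG. 10."""

    def acquire_host_face(self, live_room_id: str):
        """Acquisition module 1001: obtain the host face image data."""
        raise NotImplementedError

    def determine_intended_face(self, user_operation):
        """Determination module 1002: resolve the intended face data in
        response to a user operation."""
        raise NotImplementedError

    def process(self, host_image, intended_image):
        """Processing module 1003: detection plus recognition, yielding
        the host and intended face feature results."""
        raise NotImplementedError

    def fuse(self, host_features, intended_features, w: float):
        """Fusion module 1004: blend the feature results into the fused
        host face image."""
        raise NotImplementedError

    def send(self, fused_image, video_generator):
        """Sending module 1005: push the fused image to the live video
        generation device, which builds the live video stream from it."""
        raise NotImplementedError
```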
In some implementable examples, the processing module 1003 includes a detection unit and a recognition unit. The detection unit is configured to determine, based on the face detection model, the host face detection image corresponding to the host face image data and the intended face detection image corresponding to the intended face data; the recognition unit is configured to determine, based on the face recognition model, the host face feature result corresponding to the host face detection image and the intended face feature result corresponding to the intended face detection image.
In some implementable examples, the detection unit is further configured to: use the face detection model to extract key points from the host face image data and the intended face data respectively, obtaining the host face detection features corresponding to the host face image data and the intended face detection features corresponding to the intended face data, where the host face detection features include host face key points and the intended face detection features include intended face key points; map the host face key points and the intended face key points to a preset template respectively, obtaining host face mapping data and intended face mapping data; and perform spatial normalization on the host face mapping data and the intended face mapping data respectively, obtaining the host face detection image and the intended face detection image.
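The key-point mapping and spatial normalization described here amount to standard face alignment. The sketch below uses OpenCV's similarity-transform estimation and assumes five key points (two eye centers, nose tip, two mouth corners) mapped onto a 112x112 alignment template; the actual preset template of this embodiment is not disclosed, so `TEMPLATE_112` is an assumption borrowed from widely used alignment practice:

```python
import cv2
import numpy as np

# Assumed 5-point template (eyes, nose tip, mouth corners) for a 112x112
# aligned crop; coordinates follow a commonly used alignment convention.
TEMPLATE_112 = np.array([
    [38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
    [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float32)


def align_to_template(image: np.ndarray, keypoints: np.ndarray,
                      size: int = 112) -> np.ndarray:
    """Map detected face key points onto the preset template and warp the
    image accordingly (the mapping + spatial-normalization step)."""
    assert keypoints.shape == TEMPLATE_112.shape
    # Estimate the similarity transform (rotation, uniform scale,
    # translation) that best carries the detected key points onto the
    # template positions.
    matrix, _ = cv2.estimateAffinePartial2D(
        keypoints.astype(np.float32), TEMPLATE_112, method=cv2.LMEDS)
    # Warp the source image into the normalized detection image.
    return cv2.warpAffine(image, matrix, (size, size))
```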
In some implementable examples, the recognition unit is further configured to use the face recognition model to extract facial features from the host face detection image and the intended face detection image respectively, obtaining the host face feature result corresponding to the host face detection image and the intended face feature result corresponding to the intended face detection image, where the host face feature result includes the host face feature vector and the intended face feature result includes the intended face feature vector.
In some implementable examples, the fusion module includes a weight determination unit and an adjustment unit. The weight determination unit is configured to determine the fusion weight of the host face image data and the intended face data; the adjustment unit is configured to adjust the host face feature result and the intended face feature result based on the fusion weight to obtain the fused host face image.
Of course, the content expression device provided by the embodiments of the present disclosure includes but is not limited to the modules above; for example, the content expression device may further include a storage module. The storage module may be used to store the program code of the content expression device, and may also be used to store data generated by the content expression device during operation.
The present disclosure further provides a computer-readable storage medium storing instructions. When the instructions in the computer-readable storage medium are executed by a processor of a computer device, the computer is enabled to perform the face fusion method provided by the embodiments shown above. For example, the computer-readable storage medium may be a memory including instructions, and the instructions may be executed by a processor of the content expression device to complete the method above. Optionally, the computer-readable storage medium may be a non-transitory computer-readable storage medium; for example, the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
FIG. 11 schematically shows a conceptual partial view of a computer program product provided by an embodiment of the present disclosure; the computer program product includes a computer program for executing a computer process on a computing device.
In one embodiment, the computer program product is provided using a signal bearing medium 1110. The signal bearing medium 1110 may include one or more program instructions that, when executed by one or more processors, may provide all or part of the functions described above with respect to FIG. 2. Thus, for example, referring to the embodiment shown in FIG. 3, one or more features of steps 31 to 34 may be undertaken by one or more instructions associated with the signal bearing medium 1110. The program instructions in FIG. 11 likewise describe example instructions.
In some examples, the signal bearing medium 1110 may include a computer-readable medium 1111, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (ROM), or a random access memory (RAM).
In some implementations, the signal bearing medium 1110 may include a computer-recordable medium 1112, such as, but not limited to, a memory, a read/write (R/W) CD, or an R/W DVD.
In some implementations, the signal bearing medium 1110 may include a communication medium 1113, such as, but not limited to, a digital and/or analog communication medium (for example, a fiber optic cable, a waveguide, a wired communication link, or a wireless communication link).
The signal bearing medium 1110 may be conveyed by a communication medium 1113 in wireless form. The one or more program instructions may be, for example, computer-executable instructions or logic-implementing instructions.
In some examples, a content expression device such as the one described with respect to FIG. 3 may be configured to provide various operations, functions, or actions in response to one or more program instructions in the computer-readable medium 1111, the computer-recordable medium 1112, and/or the communication medium 1113.
From the description of the implementations above, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the functional modules above is used as an example. In practical applications, the functions above may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed devices and methods may be implemented in other manners. For example, the device embodiments described above are merely schematic: the division of modules or units is only a logical-function division, and other divisions are possible in actual implementations; multiple units or components may be combined or integrated into another device, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present disclosure in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any change or substitution within the technical scope disclosed by the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.