技术领域Technical Field
本申请涉及图像处理技术领域,特别是涉及一种虚拟对象交互方法、装置、计算机设备、存储介质和计算机程序产品。The present application relates to the field of image processing technology, and in particular to a virtual object interaction method, apparatus, computer equipment, storage medium and computer program product.
背景技术Background Art
随着语音技术和图像处理技术的不断发展,目标对象可以通过语音与交互产品进行交互,如与智能机器人或应用程序中的虚拟对象进行沟通交流。在交互过程中,传统交互方案中虚拟对象所呈现的形象比较机械化,缺乏真实性,从而影响了交互效果。With the continuous development of voice technology and image processing technology, the target object can interact with interactive products through voice, such as communicating with intelligent robots or virtual objects in applications. In the interactive process, the image presented by virtual objects in traditional interactive solutions is relatively mechanical and lacks authenticity, which affects the interactive effect.
发明内容Summary of the invention
基于此,有必要针对上述技术问题,提供一种虚拟对象交互方法、装置、计算机设备、计算机可读存储介质和计算机程序产品,能够有效地提高交互效果。Based on this, it is necessary to provide a virtual object interaction method, device, computer equipment, computer readable storage medium and computer program product to effectively improve the interaction effect in response to the above technical problems.
第一方面,本申请提供了一种虚拟对象交互方法。所述方法包括:In a first aspect, the present application provides a virtual object interaction method. The method comprises:
在交互页面中显示处于第一交互姿态的虚拟对象;Displaying the virtual object in the first interaction posture in the interaction page;
响应音频指令触发,显示针对所述音频指令的反馈交互信息;In response to an audio instruction trigger, feedback interaction information for the audio instruction is displayed;
在所述交互页面中显示处于第二交互姿态的所述虚拟对象;Displaying the virtual object in a second interaction posture in the interaction page;
其中,所述反馈交互信息是基于所述音频指令的文本信息获得的;所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。The feedback interaction information is obtained based on the text information of the audio instruction; and the second interaction posture is determined according to the text information and acoustic information corresponding to the audio instruction.
第二方面,本申请还提供了一种虚拟对象交互装置。所述装置包括:In a second aspect, the present application also provides a virtual object interaction device. The device comprises:
第一显示模块,用于在交互页面中显示处于第一交互姿态的虚拟对象;A first display module, used for displaying a virtual object in a first interactive posture in an interactive page;
第二显示模块,用于响应音频指令触发,显示针对所述音频指令的反馈交互信息;在所述交互页面中显示处于第二交互姿态的所述虚拟对象;A second display module is configured to respond to the audio instruction trigger and display feedback interaction information for the audio instruction; and display the virtual object in the second interaction posture in the interaction page;
其中,所述反馈交互信息是基于所述音频指令的文本信息获得的;所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。The feedback interaction information is obtained based on the text information of the audio instruction; and the second interaction posture is determined according to the text information and acoustic information corresponding to the audio instruction.
第三方面,本申请还提供了一种计算机设备。所述计算机设备包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时实现以下步骤:In a third aspect, the present application further provides a computer device. The computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
在交互页面中显示处于第一交互姿态的虚拟对象;Displaying the virtual object in the first interaction posture in the interaction page;
响应音频指令触发,显示针对所述音频指令的反馈交互信息;In response to an audio instruction trigger, feedback interaction information for the audio instruction is displayed;
在所述交互页面中显示处于第二交互姿态的所述虚拟对象;Displaying the virtual object in a second interaction posture in the interaction page;
其中,所述反馈交互信息是基于所述音频指令的文本信息获得的;所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。The feedback interaction information is obtained based on the text information of the audio instruction; and the second interaction posture is determined according to the text information and acoustic information corresponding to the audio instruction.
第四方面,本申请还提供了一种计算机可读存储介质。所述计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现以下步骤:In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the following steps are implemented:
在交互页面中显示处于第一交互姿态的虚拟对象;Displaying the virtual object in the first interaction posture in the interaction page;
响应音频指令触发,显示针对所述音频指令的反馈交互信息;In response to an audio instruction trigger, feedback interaction information for the audio instruction is displayed;
在所述交互页面中显示处于第二交互姿态的所述虚拟对象;Displaying the virtual object in a second interaction posture in the interaction page;
其中,所述反馈交互信息是基于所述音频指令的文本信息获得的;所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。The feedback interaction information is obtained based on the text information of the audio instruction; and the second interaction posture is determined according to the text information and acoustic information corresponding to the audio instruction.
第五方面,本申请还提供了一种计算机程序产品。所述计算机程序产品,包括计算机程序,该计算机程序被处理器执行时实现以下步骤:In a fifth aspect, the present application further provides a computer program product. The computer program product includes a computer program, and when the computer program is executed by a processor, the following steps are implemented:
在交互页面中显示处于第一交互姿态的虚拟对象;Displaying the virtual object in the first interaction posture in the interaction page;
响应音频指令触发,显示针对所述音频指令的反馈交互信息;In response to an audio instruction trigger, feedback interaction information for the audio instruction is displayed;
在所述交互页面中显示处于第二交互姿态的所述虚拟对象;Displaying the virtual object in a second interaction posture in the interaction page;
其中,所述反馈交互信息是基于所述音频指令的文本信息获得的;所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。The feedback interaction information is obtained based on the text information of the audio instruction; and the second interaction posture is determined according to the text information and acoustic information corresponding to the audio instruction.
上述虚拟对象交互方法、装置、计算机设备、存储介质和计算机程序产品,在交互页面中显示处于第一交互姿态的虚拟对象,在收到音频指令时,显示针对音频指令的反馈交互信息,并且在交互页面中显示处于第二交互姿态的虚拟对象,从而在进行语音交互时,除了文本信息上的交互,而且还有虚拟对象通过不同交互姿态方面的交互,丰富了交互方式;而且,第二交互姿态是根据音频指令对应的文本信息和声学信息确定的,因此音频指令的声学信息不同时,虚拟对象所呈现的第二交互姿态不同,即便发出相同的指令内容,虚拟对象也可以呈现出相对应的不同的交互姿态,提升了目标对象的沉浸感以及与虚拟对象的共情,从而大大地提升了交互效果。The above-mentioned virtual object interaction method, device, computer equipment, storage medium and computer program product display the virtual object in a first interaction posture in the interaction page, display feedback interaction information for the audio command when receiving the audio command, and display the virtual object in a second interaction posture in the interaction page, so that when performing voice interaction, in addition to the interaction on text information, there is also interaction of virtual objects through different interaction postures, which enriches the interaction mode; moreover, the second interaction posture is determined according to the text information and acoustic information corresponding to the audio command, so when the acoustic information of the audio command is different, the second interaction posture presented by the virtual object is different, even if the same command content is issued, the virtual object can also present corresponding different interaction postures, which enhances the target object's sense of immersion and empathy with the virtual object, thereby greatly improving the interaction effect.
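The core mechanism described above — the same command text yielding different second interaction postures depending on the acoustic information (pitch, volume) of the audio command — can be sketched as follows. This is an illustrative sketch only; the function name, thresholds, and posture labels are hypothetical and are not part of the application.

```python
def select_second_posture(text: str, pitch_hz: float, volume_db: float) -> str:
    """Pick an interaction posture from the command's text information
    plus its acoustic information. Thresholds are hypothetical."""
    # Text information decides the base posture family.
    if "唱" in text or "sing" in text:
        base = "singing"
    else:
        base = "talking"
    # Acoustic information decides the variant: same text, different
    # acoustics -> different second interaction posture.
    if volume_db > 70:
        return base + "_excited"
    if pitch_hz > 300:
        return base + "_cheerful"
    return base + "_calm"

# The same command content maps to different postures:
print(select_second_posture("唱首歌", pitch_hz=350, volume_db=60))  # singing_cheerful
print(select_second_posture("唱首歌", pitch_hz=200, volume_db=60))  # singing_calm
```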
第六方面,本申请提供了一种虚拟对象交互方法。所述方法包括:In a sixth aspect, the present application provides a virtual object interaction method. The method comprises:
显示直播界面,所述直播界面包括处于第一交互姿态的虚拟对象;Displaying a live broadcast interface, wherein the live broadcast interface includes a virtual object in a first interaction posture;
响应音频指令触发,在所述直播界面显示反馈交互信息和处于第二交互姿态的所述虚拟对象;其中,所述反馈交互信息是根据所述音频指令对应的文本信息获得的,所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。In response to an audio command trigger, feedback interaction information and the virtual object in a second interaction posture are displayed on the live broadcast interface; wherein the feedback interaction information is obtained based on text information corresponding to the audio command, and the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command.
第七方面,本申请还提供了一种虚拟对象交互装置。所述装置包括:In a seventh aspect, the present application further provides a virtual object interaction device. The device comprises:
第一显示模块,用于显示直播界面,所述直播界面包括处于第一交互姿态的虚拟对象;A first display module, configured to display a live broadcast interface, wherein the live broadcast interface includes a virtual object in a first interactive posture;
第二显示模块,用于响应音频指令触发,在所述直播界面显示反馈交互信息和处于第二交互姿态的所述虚拟对象;其中,所述反馈交互信息是根据所述音频指令对应的文本信息获得的,所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。The second display module is used to respond to the audio command trigger and display feedback interaction information and the virtual object in a second interaction posture on the live broadcast interface; wherein the feedback interaction information is obtained based on the text information corresponding to the audio command, and the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command.
第八方面,本申请还提供了一种计算机设备。所述计算机设备包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时实现以下步骤:In an eighth aspect, the present application further provides a computer device. The computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
显示直播界面,所述直播界面包括处于第一交互姿态的虚拟对象;Displaying a live broadcast interface, wherein the live broadcast interface includes a virtual object in a first interaction posture;
响应音频指令触发,在所述直播界面显示反馈交互信息和处于第二交互姿态的所述虚拟对象;其中,所述反馈交互信息是根据所述音频指令对应的文本信息获得的,所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。In response to an audio command trigger, feedback interaction information and the virtual object in a second interaction posture are displayed on the live broadcast interface; wherein the feedback interaction information is obtained based on text information corresponding to the audio command, and the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command.
第九方面,本申请还提供了一种计算机可读存储介质。所述计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现以下步骤:In a ninth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the following steps are implemented:
显示直播界面,所述直播界面包括处于第一交互姿态的虚拟对象;Displaying a live broadcast interface, wherein the live broadcast interface includes a virtual object in a first interaction posture;
响应音频指令触发,在所述直播界面显示反馈交互信息和处于第二交互姿态的所述虚拟对象;其中,所述反馈交互信息是根据所述音频指令对应的文本信息获得的,所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。In response to an audio command trigger, feedback interaction information and the virtual object in a second interaction posture are displayed on the live broadcast interface; wherein the feedback interaction information is obtained based on text information corresponding to the audio command, and the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command.
第十方面,本申请还提供了一种计算机程序产品。所述计算机程序产品,包括计算机程序,该计算机程序被处理器执行时实现以下步骤:In a tenth aspect, the present application further provides a computer program product. The computer program product includes a computer program, and when the computer program is executed by a processor, the following steps are implemented:
显示直播界面,所述直播界面包括处于第一交互姿态的虚拟对象;Displaying a live broadcast interface, wherein the live broadcast interface includes a virtual object in a first interaction posture;
响应音频指令触发,在所述直播界面显示反馈交互信息和处于第二交互姿态的所述虚拟对象;其中,所述反馈交互信息是根据所述音频指令对应的文本信息获得的,所述第二交互姿态是根据所述音频指令对应的文本信息和声学信息确定的。In response to an audio command trigger, feedback interaction information and the virtual object in a second interaction posture are displayed on the live broadcast interface; wherein the feedback interaction information is obtained based on text information corresponding to the audio command, and the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command.
上述虚拟对象交互方法、装置、计算机设备、存储介质和计算机程序产品,显示包括处于第一交互姿态的虚拟对象的直播界面,在收到音频指令时,在该直播界面中显示反馈交互信息和处于第二交互姿态的虚拟对象,从而即使在直播过程中,也可以与直播的虚拟对象之间进行交互,在交互过程中除了文本信息上的交互,而且还有虚拟对象通过不同交互姿态方面的交互,丰富了交互方式;而且,第二交互姿态是根据音频指令对应的文本信息和声学信息确定的,因此音频指令的声学信息不同时,虚拟对象所呈现的第二交互姿态不同,即便发出相同的指令内容,虚拟对象也可以呈现出相对应的不同的交互姿态,提升了目标对象观看直播的沉浸感以及与虚拟对象的共情,从而大大地提升了交互效果。The above-mentioned virtual object interaction method, device, computer equipment, storage medium and computer program product display a live broadcast interface including a virtual object in a first interaction posture. When an audio instruction is received, feedback interaction information and a virtual object in a second interaction posture are displayed in the live broadcast interface, so that even during the live broadcast, interaction can be carried out with the virtual object of the live broadcast. In addition to the interaction on text information, there is also interaction between virtual objects through different interaction postures during the interaction process, which enriches the interaction mode; moreover, the second interaction posture is determined according to the text information and acoustic information corresponding to the audio instruction. Therefore, when the acoustic information of the audio instruction is different, the second interaction posture presented by the virtual object is different. Even if the same instruction content is issued, the virtual object can also present corresponding different interaction postures, which enhances the target object's immersion in watching the live broadcast and empathy with the virtual object, thereby greatly enhancing the interaction effect.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1为一个实施例中虚拟对象交互方法的应用环境图;FIG1 is a diagram of an application environment of a virtual object interaction method according to an embodiment;
图2为一个实施例中虚拟对象交互方法的流程示意图;FIG2 is a schematic diagram of a flow chart of a virtual object interaction method in one embodiment;
图3为一个实施例中呈现第一交互姿态的虚拟对象的页面示意图;FIG3 is a schematic diagram of a page showing a virtual object in a first interaction posture in one embodiment;
图4为一个实施例中呈现第三交互姿态的虚拟对象和显示声波图的页面示意图;FIG4 is a schematic diagram of a virtual object presenting a third interaction posture and a page displaying a sound wave graph in one embodiment;
图5为一个实施例中显示声波图的示意图;FIG5 is a schematic diagram showing a sonic wave diagram in one embodiment;
图6为一个实施例中在交互页面显示反馈交互信息的页面示意图;FIG6 is a schematic diagram of a page displaying feedback interaction information on an interaction page in one embodiment;
图7为一个实施例中呈现第二交互姿态的虚拟对象的页面示意图;FIG7 is a schematic diagram of a page showing a virtual object in a second interaction posture in one embodiment;
图8为一个实施例中请求使用麦克风权限以及检测音频信号的流程示意图;FIG8 is a schematic diagram of a process of requesting permission to use a microphone and detecting an audio signal in one embodiment;
图9为一个实施例中在交互页面显示提示信息的页面示意图;FIG9 is a schematic diagram of a page showing prompt information on an interactive page in one embodiment;
图10为一个实施例中在交互页面显示请求获取麦克风权限的页面示意图;FIG10 is a schematic diagram of a page displaying a request for microphone permission on an interactive page in one embodiment;
图11为一个实施例中呈现第四交互姿态的虚拟对象的页面示意图;FIG11 is a schematic diagram of a page showing a virtual object in a fourth interaction posture in one embodiment;
图12为一个实施例中根据不同的声学信息显示不同交互姿态的虚拟对象的流程示意图;FIG12 is a schematic diagram of a process of displaying virtual objects in different interaction postures according to different acoustic information in one embodiment;
图13为一个实施例中呈现第五交互姿态的虚拟对象的页面示意图;FIG13 is a schematic diagram of a page showing a virtual object in a fifth interaction posture in one embodiment;
图14为一个实施例中虚拟对象演唱目标音乐、目标对象演唱目标音乐与其进行挑战以及根据挑战结果进行交互的流程示意图;FIG14 is a schematic diagram of a process in which a virtual object sings a target music, a target object sings the target music to challenge it, and an interaction is performed according to the challenge result in one embodiment;
图15为一个实施例中虚拟对象拒绝演唱的页面示意图;FIG15 is a schematic diagram of a page in which a virtual object refuses to sing in one embodiment;
图16为一个实施例中呈现第六交互姿态的虚拟对象的页面示意图;FIG16 is a schematic diagram of a page showing a virtual object with a sixth interaction gesture in one embodiment;
图17为一个实施例中呈现第七交互姿态的虚拟对象的页面示意图;FIG17 is a schematic diagram of a page showing a virtual object of a seventh interaction posture in one embodiment;
图18为一个实施例中目标音乐的旋律示意图;FIG18 is a schematic diagram of the melody of target music in one embodiment;
图19为一个实施例中目标音乐的音高表格示意图;FIG19 is a schematic diagram of a pitch table of target music in one embodiment;
图20为一个实施例中显示虚拟对象和演唱提示面板的页面示意图;FIG20 is a schematic diagram of a page showing a virtual object and a singing prompt panel in one embodiment;
图21为另一个实施例中虚拟对象交互方法的流程示意图;FIG21 is a schematic diagram of a flow chart of a virtual object interaction method in another embodiment;
图22为一个实施例中通过虚拟对象进行游戏直播的页面示意图;FIG22 is a schematic diagram of a page for live broadcasting of a game through a virtual object in one embodiment;
图23为一个实施例中演唱提示面板的示意图;FIG23 is a schematic diagram of a singing prompt panel in one embodiment;
图24为一个实施例中虚拟对象交互装置的结构框图;FIG24 is a block diagram of a virtual object interaction device in one embodiment;
图25为一个实施例中计算机设备的内部结构图。FIG. 25 is a diagram showing the internal structure of a computer device in one embodiment.
具体实施方式DETAILED DESCRIPTION
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solution and advantages of the present application more clearly understood, the present application is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not used to limit the present application.
需要说明的是,在以下的描述中,所涉及的术语“第一、第二、…、第八”仅仅是区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一、第二、…、第八”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。It should be noted that in the following description, the terms "first, second, ..., eighth" are merely used to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that "first, second, ..., eighth" can be interchanged with a specific order or sequence where permitted, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described here.
本申请实施例提供的虚拟对象交互方法,可以应用于如图1所示的应用环境中。其中,终端102通过网络与服务器104进行通信。数据存储系统可以存储服务器104需要处理的数据。数据存储系统可以集成在服务器104上,也可以放在云上或其他网络服务器上。在进行交互时,终端102可以在本地或服务器104对应的数据存储系统获取相应交互姿态的虚拟对象,然后显示该相应交互姿态的虚拟对象。例如,用户在进行交互时,终端102在交互页面中显示处于欢迎姿态的虚拟对象,如图1中名为小瞳的虚拟人物;用户可以发出音频指令,如音频指令对应的文本信息为"小瞳唱首歌呗",此时终端102播放音乐,并控制该虚拟对象做出唱歌的姿态和口型(即说话或发音时的口部形状),从而使用户从视觉上感知该虚拟对象在唱歌。此外,用户也可以发出其它语音,如:"天王盖地虎",此时可以以弹窗方式显示"你发现了秘密"的反馈交互信息,此外还可以在交互页面中显示得意小表情的虚拟对象。The virtual object interaction method provided in the embodiments of the present application can be applied in the application environment shown in Figure 1. The terminal 102 communicates with the server 104 through a network. The data storage system can store the data that the server 104 needs to process; it can be integrated on the server 104, or placed on the cloud or another network server. When interacting, the terminal 102 can obtain a virtual object in the corresponding interaction posture locally or from the data storage system corresponding to the server 104, and then display it. For example, during interaction the terminal 102 displays a virtual object in a welcome posture in the interaction page, such as the virtual character named Xiaotong in Figure 1. The user can issue an audio instruction, for example one whose corresponding text information is "小瞳唱首歌呗" ("Xiaotong, sing a song"); at this point the terminal 102 plays music and controls the virtual object to make a singing posture and mouth shape (i.e., the shape of the mouth when speaking or pronouncing), so that the user visually perceives the virtual object as singing. In addition, the user may also utter other speech, such as "天王盖地虎"; in that case, the feedback interaction information "你发现了秘密" ("You have discovered the secret") can be displayed in a pop-up window, and a virtual object with a proud little expression can also be displayed in the interaction page.
其中,终端102可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表、物联网设备和便携式可穿戴设备,物联网设备可为智能音箱、智能电视、智能空调和智能车载设备等。便携式可穿戴设备可为智能手表、智能手环、头戴设备等。The terminal 102 may be a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, an IoT device, and a portable wearable device. The IoT device may be a smart speaker, a smart TV, a smart air conditioner, and a smart car device, etc. The portable wearable device may be a smart watch, a smart bracelet, a head-mounted device, etc.
服务器104可以是独立的物理服务器,也可以是区块链系统中的服务节点,该区块链系统中的各服务节点之间形成点对点(P2P,Peer To Peer)网络,P2P协议是一个运行在传输控制协议(TCP,Transmission Control Protocol)协议之上的应用层协议。The server 104 may be an independent physical server or a service node in a blockchain system. A peer-to-peer (P2P) network is formed between the service nodes in the blockchain system. The P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP).
此外,服务器104还可以是多个物理服务器构成的服务器集群,可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。In addition, server 104 can also be a server cluster composed of multiple physical servers, and can be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), as well as big data and artificial intelligence platforms.
终端102与服务器104之间可以通过蓝牙、USB(Universal Serial Bus,通用串行总线)或者网络等通讯连接方式进行连接,本申请在此不做限制。The terminal 102 and the server 104 may be connected via Bluetooth, USB (Universal Serial Bus) or a network or other communication connection methods, which are not limited in this application.
在一个实施例中,如图2所示,提供了一种虚拟对象交互方法,以该方法应用于图1中的终端102为例进行说明,包括以下步骤:In one embodiment, as shown in FIG. 2 , a virtual object interaction method is provided, which is described by taking the method applied to the terminal 102 in FIG. 1 as an example, and includes the following steps:
S202,在交互页面中显示处于第一交互姿态的虚拟对象。S202: Displaying a virtual object in a first interaction posture in an interaction page.
其中,交互页面可以指交互产品的用于交互的前端页面,可用于显示图像、模型以及播放视频;在显示图像时,该交互页面可称为图像交互页面;在显示模型时,该交互页面可称为模型交互页面;在播放视频时,该交互页面可称为视频交互页面,特别地,当视频为直播视频时,该交互页面也可称为直播页面(或直播界面)。该图像可以是二维静态图像或二维动态图像,该模型可以是三维模型,如三维的仿生模型。该交互产品可以是虚拟对象交互方法所应用的硬件或软件产品(如应用程序、web平台或网页),又或者是软件产品中的一个小模块或小程序。例如,当交互产品为硬件时,该交互产品具体可以是智能机器人,各交互姿态的虚拟对象可以在该智能机器人的显示屏上进行显示,该显示屏中的用于显示虚拟对象的区域或页面可视为交互页面;当交互产品为软件产品时,该交互产品具体可以是游戏应用、视频应用或其它交互类应用,各交互姿态的虚拟对象可以在该游戏应用、视频应用或其它交互类应用的交互页面进行显示;当交互产品为软件产品中的一个小模块或小程序时,该交互产品具体可以是游戏应用中的一种互动玩法或小程序,各交互姿态的虚拟对象可以在该互动玩法或小程序的交互页面进行显示,此时的交互页面可以是HTML(HyperText Markup Language,超文本标记语言)5页面。The interactive page may refer to the front-end page of the interactive product used for interaction, which can be used to display images and models and to play videos. When displaying images, the interactive page may be called an image interactive page; when displaying models, a model interactive page; when playing videos, a video interactive page. In particular, when the video is a live video, the interactive page may also be called a live page (or live interface). The image may be a two-dimensional static image or a two-dimensional dynamic image, and the model may be a three-dimensional model, such as a three-dimensional bionic model. The interactive product may be a hardware or software product (such as an application, a web platform or a webpage) to which the virtual object interaction method is applied, or it may be a small module or applet in a software product. For example, when the interactive product is hardware, it may specifically be an intelligent robot, and virtual objects in various interactive postures may be displayed on the display screen of the intelligent robot, where the area or page of the display screen used for displaying virtual objects may be regarded as the interactive page; when the interactive product is a software product, it may specifically be a game application, a video application or another interactive application, and virtual objects in various interactive postures may be displayed on the interactive page of that application; when the interactive product is a small module or applet in a software product, it may specifically be an interactive gameplay mode or applet in a game application, and virtual objects in various interactive postures may be displayed on the interactive page of that gameplay mode or applet, in which case the interactive page may be an HTML (HyperText Markup Language) 5 page.
姿态可以是虚拟对象所呈现的形象,如容貌、神态(或神情)、表情、风格、手势以及姿势中的至少一种;该表情可以指面部表情、言语表情和身段表情。交互姿态可以是在进行交互过程中虚拟对象所呈现的形象(如期待的神情),处于不同的交互情况下,该交互姿态可以是发生变化的,可以根据目标对象(如用户)所发出语音的音高、音量和内容呈现出相对应的交互姿态。第一交互姿态可以是虚拟对象所呈现出的表示欢迎目标对象或期待与目标对象交互的姿态。A gesture may be an image presented by a virtual object, such as at least one of appearance, demeanor (or expression), expression, style, gesture, and posture; the expression may refer to facial expression, verbal expression, and body expression. An interactive gesture may be an image presented by a virtual object during the interaction process (such as an expected expression). In different interactive situations, the interactive gesture may change, and a corresponding interactive gesture may be presented according to the pitch, volume, and content of the voice emitted by the target object (such as the user). The first interactive gesture may be a gesture presented by the virtual object to welcome the target object or to expect to interact with the target object.
虚拟对象可以是虚拟的人物对象,或者是拟人化的动物对象、生物对象以及其它的物体对象。该虚拟对象在交互页面中显示时,可以通过二维静态图像、二维动态图像、三维模型或视频的方式来呈现。当虚拟对象的呈现方式为二维静态图像或三维模型时,交互姿态为一种静态的姿态,如摆出OK的静态手势或微笑的静态表情;当虚拟对象的呈现方式为二维动态图像、三维模型或视频时,交互姿态为一种动态的姿态,如点头或摇头的动态姿态。The virtual object may be a virtual character object, or an anthropomorphic animal object, biological object, or other object object. When the virtual object is displayed in the interactive page, it may be presented in the form of a two-dimensional static image, a two-dimensional dynamic image, a three-dimensional model, or a video. When the virtual object is presented in the form of a two-dimensional static image or a three-dimensional model, the interactive gesture is a static gesture, such as a static gesture of "OK" or a static expression of a smile; when the virtual object is presented in the form of a two-dimensional dynamic image, a three-dimensional model, or a video, the interactive gesture is a dynamic gesture, such as a dynamic gesture of nodding or shaking the head.
需要指出的是,各种交互姿态的虚拟对象可以是预先制作出来并进行了保存,也可以是在默认姿态的虚拟对象或其它当前显示的虚拟对象的基础上实时生成的。例如,目标对象在发出语音之后,依据该目标对象的音高、音量或文字内容(即文本信息)中的至少一种,对处于默认姿态的虚拟对象进行姿态控制,使该虚拟对象的姿态发生变化,如使虚拟对象的口型发生变化,从而得到相应交互姿态的虚拟对象。因此,在显示不同交互姿态的虚拟对象时,可以从数据库获取相应交互姿态的虚拟对象(如获取包含相应交互姿态的二维静态图像、二维动态图像、三维模型或视频)进行显示,也可以根据音频信号对应的音高、音量或文字内容中的至少一种,对虚拟对象进行姿态控制,使该虚拟对象呈现相应的交互姿态,然后进行显示。It should be noted that the virtual objects of various interactive postures can be pre-made and saved, or can be generated in real time based on the virtual objects of the default posture or other currently displayed virtual objects. For example, after the target object makes a speech, the virtual object in the default posture is posture-controlled according to at least one of the pitch, volume or text content (i.e., text information) of the target object, so that the posture of the virtual object changes, such as changing the mouth shape of the virtual object, thereby obtaining a virtual object of the corresponding interactive posture. Therefore, when displaying virtual objects of different interactive postures, the virtual object of the corresponding interactive posture can be obtained from the database (such as obtaining a two-dimensional static image, a two-dimensional dynamic image, a three-dimensional model or a video containing the corresponding interactive posture) for display, or the virtual object can be posture-controlled according to at least one of the pitch, volume or text content corresponding to the audio signal, so that the virtual object presents the corresponding interactive posture, and then displayed.
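The two strategies just described — fetching a pre-made posture asset from a database, or generating the posture in real time by controlling the default posture with the audio's pitch, volume, or text content — can be sketched as follows. All class, method, and field names here are illustrative assumptions, not the application's actual interfaces.

```python
class PostureProvider:
    """Sketch of the two strategies described above: look up a pre-made
    posture asset, or fall back to adjusting the default posture in real
    time from the audio's pitch, volume, or text content."""

    def __init__(self, asset_db):
        # posture name -> pre-made image / model / video asset
        self.asset_db = asset_db

    def get(self, posture, pitch=None, volume=None, text=None):
        # Strategy 1: the posture was pre-made and saved.
        if posture in self.asset_db:
            return self.asset_db[posture]
        # Strategy 2: generate from the default posture in real time.
        return self._control_default(posture, pitch, volume, text)

    def _control_default(self, posture, pitch, volume, text):
        # Hypothetical control rule: louder audio opens the mouth shape.
        mouth = "open" if (volume or 0) > 50 else "closed"
        return {"base": "default", "posture": posture, "mouth": mouth}

provider = PostureProvider({"welcome": "welcome.mp4"})
print(provider.get("welcome"))         # pre-made asset is returned directly
print(provider.get("nod", volume=60))  # generated from the default posture
```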
在一个实施例中,终端可以确定交互信息,在本地库或网络侧的服务器中,依据该交互信息查找处于第一交互姿态的虚拟对象;或者,根据该交互信息对处于默认姿态的虚拟对象进行姿态控制,得到处于第一交互姿态的虚拟对象;然后,在交互页面中显示处于该第一交互姿态的虚拟对象,如图3所示。In one embodiment, the terminal can determine the interaction information, and search for the virtual object in the first interaction posture in the local library or the server on the network side according to the interaction information; or, perform posture control on the virtual object in the default posture according to the interaction information to obtain the virtual object in the first interaction posture; then, display the virtual object in the first interaction posture in the interaction page, as shown in Figure 3.
其中,该交互信息可以是目标对象未发出语音、或发出了语音但没有实际文字内容时的特定交互信息,如进入了交互页面之后,目标对象还未发出交互语音,但快要发出交互语音时所确定的交互信息,或者目标对象发出了语音但没有实际的内容,如清嗓子的声音,此时虚拟对象可以呈现出期待的神情。Among them, the interaction information can be specific interaction information when the target object has not made any speech, or has made speech but has no actual text content. For example, after entering the interaction page, the target object has not made any interaction speech but is about to make any interaction speech, or the target object has made speech but has no actual content, such as the sound of clearing the throat. At this time, the virtual object can show an expectant expression.
S204,响应音频指令触发,显示针对音频指令的反馈交互信息。S204, in response to the audio instruction trigger, displaying feedback interaction information for the audio instruction.
其中,音频指令可以是音频内容属于指令(即命令)类的音频信号。例如,假设"天王盖地虎"为指令类的信息(简称指令信息或命令信息),那么音频内容为"天王盖地虎"或包含"天王盖地虎"的音频信号即为音频指令。The audio instruction may be an audio signal whose audio content belongs to the instruction (i.e., command) category. For example, assuming that "天王盖地虎" is instruction-category information (referred to as instruction information or command information for short), then an audio signal whose audio content is "天王盖地虎" or contains "天王盖地虎" is an audio instruction.
反馈交互信息可以是基于音频指令的文本信息获得的,是交互产品响应于音频指令的文本信息而显示的交互信息,可以是表示赞扬的文字内容,或俏皮话。该反馈交互信息可以以子页面的形式进行显示。其中,该子页面可以是弹窗、浮层或在交互页面上创建的HTML5页面。对于以弹窗的形式显示交互信息,可以参考图4。The feedback interaction information can be obtained based on the text information of the audio instruction, and is the interactive information displayed by the interactive product in response to the text information of the audio instruction, and can be text content expressing praise, or a witty remark. The feedback interaction information can be displayed in the form of a subpage. Among them, the subpage can be a pop-up window, a floating layer, or an HTML5 page created on the interactive page. For displaying the interactive information in the form of a pop-up window, please refer to Figure 4.
在一个实施例中,当获得音频指令时,对音频指令进行语音识别得到对应的文本信息,该文本信息与预设命令信息之间信息匹配时,显示针对文本信息的反馈交互信息。In one embodiment, when an audio instruction is obtained, voice recognition is performed on the audio instruction to obtain corresponding text information, and when the text information matches the preset command information, feedback interaction information for the text information is displayed.
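The match-then-feedback step in this embodiment — recognized text is checked against preset command information, and feedback interaction information is displayed on a hit — can be sketched as below. The matching rule (substring containment) and the command table are assumptions for illustration; the application does not specify the matching method.

```python
# Hypothetical preset command information -> feedback interaction information.
PRESET_COMMANDS = {
    "天王盖地虎": "你发现了秘密",
}

def feedback_for(text: str):
    """Return feedback interaction information when the recognized text
    matches preset command information; None when nothing matches.
    Simple containment matching is an assumption, not the claimed method."""
    for command, feedback in PRESET_COMMANDS.items():
        if command in text:
            return feedback
    return None

print(feedback_for("天王盖地虎"))  # 你发现了秘密
print(feedback_for("你好"))        # None
```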
预设命令信息可以是预先配置在交互产品中的命令信息。交互产品可以具有语音命令交互功能、语音关键词交互功能、语音分析加强交互功能以及语音分析音高唱歌玩法功能。其中,语音命令交互功能的优先级高于语音关键词交互功能,即音频指令对应的文本信息命中语音命令交互功能对应的预设命令信息、同时也是针对虚拟对象的描述信息时,会根据音频指令对应的文本信息实现语音命令交互功能;而语音分析加强交互功能是在语音命令交互功能和语音关键词交互功能的基础上,增加了对音频指令的音高和音量的分析,使虚拟对象做出不同姿态的反应。其中,音高由音频指令的频率和波长决定,即频率高波长短,则音"高",反之,频率低波长长,则音"低"。The preset command information may be command information pre-configured in the interactive product. The interactive product may have a voice command interaction function, a voice keyword interaction function, a voice-analysis enhanced interaction function, and a voice-analysis pitch singing gameplay function. The voice command interaction function takes priority over the voice keyword interaction function: that is, when the text information corresponding to the audio instruction hits the preset command information corresponding to the voice command interaction function and is at the same time description information for the virtual object, the voice command interaction function is carried out according to that text information. The voice-analysis enhanced interaction function builds on the voice command and voice keyword interaction functions by additionally analyzing the pitch and volume of the audio instruction, so that the virtual object reacts with different postures. Pitch is determined by the frequency and wavelength of the audio instruction: a high frequency and short wavelength gives a "high" pitch, while a low frequency and long wavelength gives a "low" pitch.
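The frequency-wavelength relationship behind pitch mentioned above is an inverse proportion, which can be illustrated numerically. The speed of sound in air (~343 m/s at about 20 °C) is an assumed constant for the illustration; the application itself does not specify a medium.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed for air at ~20 °C

def wavelength(frequency_hz: float) -> float:
    """Wavelength is inversely proportional to frequency: the higher the
    frequency (shorter the wavelength), the higher the perceived pitch."""
    return SPEED_OF_SOUND / frequency_hz

# Higher frequency -> shorter wavelength -> higher pitch.
print(wavelength(440.0))  # ~0.78 m (A4, a higher pitch)
print(wavelength(110.0))  # ~3.12 m (A2, a lower pitch)
```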
对于语音命令交互功能,在进行交互时使用命令进行交互。该预设命令信息可以是用于对虚拟对象进行控制的命令信息,例如预设命令信息可以是接头暗号,当目标对象说出该接头暗号的具体内容时,交互产品会触发特定交互逻辑,从而使虚拟对象呈现相应的交互姿态,如为目标对象呈现静态的点赞姿态、静态的得意小表情,又或者呈现动态的点赞姿势;此外还可以控制虚拟对象根据该文本信息进行其它的交互操作。For the voice command interaction function, commands are used during interaction. The preset command information may be command information used to control the virtual object. For example, the preset command information may be a secret code phrase; when the target object speaks the exact content of the code phrase, the interactive product triggers specific interaction logic so that the virtual object presents a corresponding interaction posture, such as a static thumbs-up posture, a static smug expression, or a dynamic thumbs-up gesture. In addition, the virtual object may be controlled to perform other interactive operations according to the text information.
在一个实施例中,当获得音频指令对应的文本信息时,终端在音频检测标识中显示文本信息;在显示文本信息时,可以是实时显示每次识别出来的小段文本信息,或者一次性显示完整的文本信息。In one embodiment, when the text information corresponding to the audio instruction is obtained, the terminal displays the text information in the audio detection mark; when displaying the text information, it can display the small segment of text information recognized each time in real time, or display the complete text information at one time.
对于音频指令的识别,终端可以对每次收到的一小段音频指令进行语音识别,也可以对收到的完整音频指令进行语音识别。在对每次收到的一小段音频指令进行语音识别时,可以得到对应的小段文本信息,然后将该小段文本信息显示在音频检测标识的文本框中;或者,在对收到的完整音频指令进行语音识别后,可以得到完整的文本信息,然后将完整的文本信息显示在音频检测标识的文本框中。For the recognition of audio instructions, the terminal may perform speech recognition on each short segment of the audio instruction as it is received, or perform speech recognition on the complete audio instruction once received. When recognizing each short segment, a corresponding short piece of text information is obtained and displayed in the text box of the audio detection mark; alternatively, after recognizing the complete audio instruction, the complete text information is obtained and displayed in the text box of the audio detection mark.
具体地,终端可以使用语音识别的JS(JavaScript)库对每次收到的一小段音频指令进行语音识别,或对收到的完整音频指令进行语音识别,得到完整的文本信息。其中,该JS库可以是基于annyang.js形成的JS库;在进行语音识别时,annyang.js可以通过Web Speech API的SpeechRecognition(语音识别)接口实现。通过上述语音识别方式,无需依赖或接入第三方AI(Artificial Intelligence,人工智能)模型的SDK接口,能够大大降低语音识别的成本。Specifically, the terminal may use a speech-recognition JS (JavaScript) library to perform speech recognition on each short segment of the audio instruction received, or on the complete audio instruction received, to obtain the complete text information. The JS library may be built on annyang.js, which can perform speech recognition through the SpeechRecognition interface of the Web Speech API. With this speech recognition approach, there is no need to rely on or access the SDK interface of a third-party AI (Artificial Intelligence) model, which can greatly reduce the cost of speech recognition.
其中,Web Speech API能够将音频指令合并到Web应用程序中。Web Speech API有两个部分:SpeechSynthesis语音合成(文本到语音TTS)和SpeechRecognition语音识别(异步语音识别)。其中,语音识别通过SpeechRecognition接口进行访问,它提供了从音频输入(通常是设备默认的语音识别服务)中识别语音情景的能力。可使用该接口的构造函数构造一个新的SpeechRecognition对象,该对象包含一系列事件处理函数,用于检测和识别设备麦克风中的语音输入。The Web Speech API enables audio instructions to be incorporated into Web applications. It has two parts: SpeechSynthesis (text-to-speech, TTS) and SpeechRecognition (asynchronous speech recognition). Speech recognition is accessed through the SpeechRecognition interface, which provides the ability to recognize speech context from audio input (usually the device's default speech recognition service). A new SpeechRecognition object is constructed via the interface's constructor; the object exposes a series of event handler functions for detecting and recognizing speech input from the device microphone.
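作为一个示意性例子(并非对本申请方案的限定),下面的JavaScript代码勾勒了如何通过SpeechRecognition接口接收实时小段文本与完整文本,其中createRecognizer为假设的函数名,浏览器中构造函数通常为window.SpeechRecognition或window.webkitSpeechRecognition。As an illustrative sketch (not limiting), the following JavaScript outlines how real-time text segments and complete text may be received through the SpeechRecognition interface; createRecognizer is a hypothetical function name, and in a browser the constructor is usually window.SpeechRecognition or window.webkitSpeechRecognition.

```javascript
// 示意性代码:通过传入的构造函数创建识别对象,便于在不同环境中复用。
// continuous、interimResults、onresult 均为 Web Speech API 的标准属性。
function createRecognizer(SpeechRecognitionCtor, onText) {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = 'zh-CN';        // 识别语言
  recognition.continuous = true;     // 持续监听,支持断断续续的语音
  recognition.interimResults = true; // 返回每次识别出的小段文本
  recognition.onresult = (event) => {
    const result = event.results[event.results.length - 1];
    // isFinal 为 true 表示得到完整文本,否则为实时小段文本
    onText(result[0].transcript, result.isFinal);
  };
  return recognition;
}
```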
在一个实施例中,在获得完整的文本信息之后,终端还可以将该完整的文本信息与预设命令信息进行相似度计算,得到相似度;当相似度大于或等于相似度阈值时,表示文本信息与预设命令信息之间信息匹配。In one embodiment, after obtaining the complete text information, the terminal may also calculate the similarity between the complete text information and the preset command information to obtain the similarity; when the similarity is greater than or equal to the similarity threshold, it indicates that the text information matches the preset command information.
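上述相似度匹配可用如下示意性代码表示,其中以字符重合度(Dice系数)作为一种假设的相似度度量,textSimilarity、matchesCommand及阈值0.8均为示例性设定,实际实现可替换为任意文本相似度算法。The similarity matching above can be sketched as follows, using character overlap (the Dice coefficient) as an assumed similarity metric; textSimilarity, matchesCommand and the threshold 0.8 are illustrative choices, and any text similarity algorithm may be substituted.

```javascript
// 示意性代码:以去重后字符的重合度作为相似度(Dice 系数)。
function textSimilarity(text, command) {
  const a = new Set(text);
  const b = new Set(command);
  let common = 0;
  for (const ch of a) if (b.has(ch)) common++;
  return (2 * common) / (a.size + b.size);
}

// 相似度大于或等于相似度阈值时,认为文本信息与预设命令信息匹配。
function matchesCommand(text, command, threshold = 0.8) {
  return textSimilarity(text, command) >= threshold;
}
```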
例如,终端将该完整的文本信息传参给annyang的addCommands函数进行载入;其中,annyang会识别到该文本信息与设定的命令字符串一一对应、且一字不差时,执行相应的交互逻辑,从而显示针对文本信息的反馈交互信息。For example, the terminal passes the complete text information to annyang's addCommands function for loading; when annyang recognizes that the text information corresponds one-to-one with the set command string and is word for word identical, it executes the corresponding interaction logic to display feedback interaction information for the text information.
在一个实施例中,在进行交互时,交互产品可以根据音频指令对应的文本信息实现语音命令交互功能。具体地:当获得音频指令对应的文本信息、且文本信息与预设命令信息之间信息全局匹配时,终端在交互页面中显示子页面,然后在子页面中显示针对文本信息的反馈交互信息,从而交互产品可以以文本形式与目标对象进行交互,丰富了产品的交互途径。In one embodiment, during interaction, the interactive product can implement the voice command interaction function based on the text information corresponding to the audio instruction. Specifically, when the text information corresponding to the audio instruction is obtained and globally matches the preset command information, the terminal displays a sub-page in the interaction page and then displays feedback interaction information for the text information in the sub-page. The interactive product can thus interact with the target object in text form, enriching the product's interaction channels.
其中,信息全局匹配可以指文本信息与预设命令信息一一匹配,此时实现语音命令交互功能。此外,文本信息与预设命令信息之间信息局部匹配时,可以实现语音关键词交互功能。The global information matching may refer to the one-to-one matching of the text information and the preset command information, in which case the voice command interaction function is realized. In addition, when the information between the text information and the preset command information is partially matched, the voice keyword interaction function can be realized.
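信息全局匹配与局部匹配的判定可用如下示意性代码表示,其中matchKind为假设的函数名,以完全一致表示全局匹配、以包含关系近似表示局部匹配。The decision between global and partial matching can be sketched as follows; matchKind is a hypothetical function name, with exact equality standing for global matching and a containment relation approximating partial matching.

```javascript
// 示意性代码:区分全局匹配(语音命令交互)与局部匹配(语音关键词交互)。
function matchKind(text, preset) {
  if (text === preset) return 'command'; // 全局匹配:文本信息与预设命令信息一一匹配
  if (text.includes(preset) || preset.includes(text)) {
    return 'keyword';                    // 局部匹配:触发语音关键词交互功能
  }
  return 'none';                         // 未匹配
}
```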
例如,对于反馈交互信息的显示,终端可以显示弹窗或浮层,并在弹窗或浮层中显示反馈交互信息。For example, to display the feedback interaction information, the terminal may display a pop-up window or a floating layer, and display the feedback interaction information therein.
在显示反馈交互信息之前,终端可以从配置的交互信息中获取用于反馈该文本信息的反馈交互信息,然后显示该反馈交互信息,如图4所示;或者,终端可以根据该文本信息生成反馈交互信息,如将该文本信息与预设交互信息进行组合得到反馈交互信息,具体可以是“主人真厉害,天王盖地虎这么有难度的接头暗语都被你发现了”。Before displaying the feedback interaction information, the terminal may obtain, from the configured interaction information, the feedback interaction information used to respond to the text information and then display it, as shown in FIG. 4; alternatively, the terminal may generate the feedback interaction information from the text information, for example by combining the text information with preset interaction information, such as "Master is really amazing, you have discovered such a difficult code phrase as Tianwang Gaidihu".
在一个实施例中,在显示反馈交互信息之后,终端还可以响应于针对反馈交互信息的确认操作,播放第一语音;或者,播放包含虚拟对象的视频动画;其中,视频动画对应的语音包括第一语音,第一语音是依据文本信息和预设交互信息合成的语音。In one embodiment, after displaying the feedback interaction information, the terminal may also play a first voice in response to a confirmation operation on the feedback interaction information; or, play a video animation containing a virtual object; wherein the voice corresponding to the video animation includes the first voice, and the first voice is a voice synthesized based on text information and preset interaction information.
例如,在交互页面上显示“你发现了秘密”弹窗,目标对象点击确认控件后,可以播放“哈哈被你发现啦,天王盖地虎”的可爱类型的语音;或者,播放包含虚拟对象的视频动画,并在播放该视频动画的同时,播放“哈哈被你发现啦,天王盖地虎”的可爱类型的语音。其中,视频动画中的虚拟对象,口型和表情与播放的语音相匹配,从而在视觉上感觉该语音是虚拟对象发出的。For example, a pop-up window "You found a secret" is displayed on the interaction page; after the target object clicks the confirmation control, a cute-style voice saying "Haha, you found it, Tianwang Gaidihu" may be played. Alternatively, a video animation containing the virtual object is played, and the same cute-style voice is played at the same time. The mouth shape and expression of the virtual object in the video animation match the played voice, so that the voice visually appears to be uttered by the virtual object.
S206,在交互页面中显示处于第二交互姿态的虚拟对象。S206: Display the virtual object in the second interaction posture in the interaction page.
其中,第一交互姿态和第二交互姿态,是虚拟对象所呈现出的用于反映交互过程的不同姿态。The first interaction posture and the second interaction posture are different postures presented by the virtual object to reflect the interaction process.
第二交互姿态是根据音频指令对应的文本信息和声学信息确定的。因此,文本信息或声学信息中的至少一种信息发生变化时,虚拟对象所呈现的第二交互姿态也可以不同。例如,音频指令的声学信息(如音高和音量)发生变化时,第二交互姿态也会具有一定的差异。不同的声学信息可以反映出目标对象所处的不同状态或情绪,因此在目标对象处于不同状态或情绪时,虚拟对象可以呈现出相匹配的交互姿态;即便目标对象发出相同的内容,虚拟对象也可以呈现出相对应的不同的交互姿态。The second interaction posture is determined from the text information and acoustic information corresponding to the audio instruction. Therefore, when at least one of the text information and the acoustic information changes, the second interaction posture presented by the virtual object may also differ. For example, when the acoustic information of the audio instruction (such as pitch and volume) changes, the second interaction posture varies accordingly. Different acoustic information can reflect the different states or emotions of the target object, so the virtual object can present a matching interaction posture for each state or emotion; even if the target object utters the same content, the virtual object can present correspondingly different interaction postures.
例如,第二交互姿态可以是音频指令对应的文本信息与预设命令信息之间信息匹配时,虚拟对象所表现出的得意小表情(如图5所示)或对目标对象表示赞扬的表情;当目标对象的声学信息满足预设条件时,虚拟对象可以呈现出更加喜悦或活泼的交互姿态。For example, the second interactive gesture may be a smug expression (as shown in FIG. 5 ) or an expression of praise for the target object shown by the virtual object when the text information corresponding to the audio instruction matches the preset command information; when the acoustic information of the target object meets the preset conditions, the virtual object may present a more joyful or lively interactive gesture.
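根据文本匹配结果与声学信息选择交互姿态的过程可示意如下,其中selectPosture为假设的函数名,音高、音量阈值与姿态名称均为示例值。The selection of an interaction posture from the text match result and the acoustic information can be sketched as follows; selectPosture is a hypothetical name, and the pitch/volume thresholds and posture names are example values.

```javascript
// 示意性代码:声学信息满足预设条件时呈现更加喜悦的姿态,阈值均为假设值。
function selectPosture(textMatched, pitchHz, volume,
                       pitchThreshold = 300, volumeThreshold = 0.6) {
  if (!textMatched) return '第一交互姿态';
  if (pitchHz >= pitchThreshold || volume >= volumeThreshold) {
    return '更加喜悦的第二交互姿态'; // 声学信息满足预设条件
  }
  return '得意小表情的第二交互姿态'; // 仅文本信息匹配
}
```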
上述实施例中,在交互页面中显示处于第一交互姿态的虚拟对象;在收到音频指令时,显示针对音频指令的反馈交互信息,并在交互页面中显示处于第二交互姿态的虚拟对象。因此,在进行语音交互时,除了文本信息上的交互之外,还有虚拟对象通过不同交互姿态进行的交互,丰富了交互方式。而且,第二交互姿态是根据音频指令对应的文本信息和声学信息确定的,因此音频指令的声学信息不同时,虚拟对象所呈现的第二交互姿态不同;即便发出相同的指令内容,虚拟对象也可以呈现出相对应的不同的交互姿态,提升了目标对象的沉浸感以及与虚拟对象的共情,从而大大地提升了交互效果。In the above embodiment, a virtual object in the first interaction posture is displayed on the interaction page; when an audio instruction is received, feedback interaction information for the audio instruction is displayed, and the virtual object in the second interaction posture is displayed on the interaction page. Thus, during voice interaction, besides the interaction through text information, the virtual object also interacts through different interaction postures, which enriches the interaction modes. Moreover, since the second interaction posture is determined from both the text information and the acoustic information of the audio instruction, different acoustic information leads to different second interaction postures; even when the same instruction content is uttered, the virtual object can present correspondingly different interaction postures. This enhances the target object's sense of immersion and empathy with the virtual object, thereby greatly improving the interaction effect.
在一个实施例中,音频指令是音频内容为指令类的音频信号,因此在收到音频信号时,在交互页面中显示处于第三交互姿态的虚拟对象和音频检测标识;在对应于音频检测标识的位置,显示音频信号对应的声波图;当获得音频信号对应的文本信息时,在音频检测标识中显示文本信息。In one embodiment, an audio instruction is an audio signal whose audio content is an instruction type. Therefore, when an audio signal is received, a virtual object in a third interactive posture and an audio detection identifier are displayed in the interactive page; a sound wave graph corresponding to the audio signal is displayed at a position corresponding to the audio detection identifier; and when text information corresponding to the audio signal is obtained, the text information is displayed in the audio detection identifier.
其中,音频信号可以是目标对象发出语音时所形成的声波信号,也可以指目标对象发出的声音信息(如语音信息)。考虑到目标对象在发出语音时,一段话可能断断续续,或持续一小段时间才说完,因此上述的收到音频信号可以是正在接收音频信号、接收完音频信号,或接收到但未接收完音频信号。接收到但未接收完音频信号的情况可以是:终端收到较小一段音频信号、且该音频信号未包含有效内容,并且在间隔预设时长(如一两秒)后目标对象未继续发出语音。对应地,第三交互姿态可以是表示正在接收音频信号、接收完音频信号或接收到但未接收完音频信号的交互姿态。The audio signal may be the sound wave signal formed when the target object speaks, or may refer to the sound information (such as voice information) emitted by the target object. Considering that a passage of speech may be intermittent or take a short while to finish, "receiving an audio signal" above may mean that the audio signal is being received, has been fully received, or has been received but not fully received. The last case may be, for example: the terminal receives a small segment of audio signal that contains no valid content, and the target object does not continue speaking within a preset interval (such as one or two seconds). Correspondingly, the third interaction posture may be a posture indicating that the audio signal is being received, has been fully received, or has been received but not fully received.
例如,终端通过声音接收器接收外界的音频信号,在接收外界的音频信号的过程中,可以在交互页面中显示表示处于正在接收音频信号的交互姿态的虚拟对象,如虚拟对象把手放在耳朵边做出聆听的姿势;在接收完外界的音频信号时,可以显示表示处于接收完音频信号的交互姿态的虚拟对象,如虚拟对象把手放在耳朵边做出聆听的姿势;在接收到但未接收完音频信号时,可以显示表示处于接收到但未接收完音频信号的交互姿态的虚拟对象,如处于正在钓鱼状态的虚拟对象做出疑问表情,或在该虚拟对象的旁边显示表示疑问的信息,如图6所示。For example, the terminal receives an external audio signal through a sound receiver. During the process of receiving the external audio signal, a virtual object indicating that it is in an interactive posture of receiving the audio signal can be displayed in the interactive page, such as a virtual object putting its hand next to its ear to make a listening gesture; when the external audio signal is received, a virtual object indicating that it is in an interactive posture of completing the reception of the audio signal can be displayed, such as a virtual object putting its hand next to its ear to make a listening gesture; when the audio signal is received but not fully received, a virtual object indicating that it is in an interactive posture of receiving but not fully receiving the audio signal can be displayed, such as a virtual object in a fishing state making a questioning expression, or information expressing questioning can be displayed next to the virtual object, as shown in Figure 6.
上述的音频检测标识可以是由麦克风图标和文本框组成的标识,一方面可以反映交互产品处于音频接收(或检测)状态,另一方面可以用于显示检测到的语音内容,如显示音频信号对应的文本信息。此外,该音频检测标识可以显示在交互页面的下方,如左下侧、右下侧或下方的中间位置。该文本框也可称为语音检测框,具体可以参考图6左下侧的类似于胶囊的框,该文本信息即为目标对象发出的语音信号中的文字内容。The above-mentioned audio detection logo can be a logo composed of a microphone icon and a text box. On the one hand, it can reflect that the interactive product is in an audio reception (or detection) state, and on the other hand, it can be used to display the detected voice content, such as displaying the text information corresponding to the audio signal. In addition, the audio detection logo can be displayed at the bottom of the interactive page, such as the lower left side, the lower right side, or the middle position below. The text box can also be called a voice detection box. For details, please refer to the capsule-like box on the lower left side of Figure 6. The text information is the text content in the voice signal emitted by the target object.
关于上述在对应于音频检测标识的位置,显示音频信号对应的声波图的步骤,具体可以包括:终端在交互页面中,依据音频检测标识的位置确定声波显示区;确定音频信号的频率;音频信号的频率是动态变化的;将音频信号的频率作为高度参数传递至绘制函数;通过绘制函数,在声波显示区绘制高度参数对应的波形柱,得到音频信号对应的由各波形柱组成的声波图。Regarding the above-mentioned step of displaying the sound wave graph corresponding to the audio signal at the position corresponding to the audio detection identifier, it can specifically include: the terminal determines the sound wave display area according to the position of the audio detection identifier in the interactive page; determines the frequency of the audio signal; the frequency of the audio signal changes dynamically; passes the frequency of the audio signal as a height parameter to the drawing function; through the drawing function, draws the waveform column corresponding to the height parameter in the sound wave display area, and obtains the sound wave graph composed of the waveform columns corresponding to the audio signal.
例如,通过AudioContext(音频上下文)接口或AudioContext对象确定音频信号的频率,然后将音频信号的频率作为高度(height)传参给Web API(Application Programming Interface,网络应用程序接口)中的绘制函数,如CanvasRenderingContext2D.fillRect()函数,最后在交互页面中绘画出一个个音频信号的波形柱,音频信号的频率高低作为波形柱的高度,这些波形柱组合在一起可以形成音频信号的声波图,如图7所示。当音频检测标识显示于交互页面的下方时,可以在交互页面的下方划分出一块区域,在该区域显示音频信号对应的声波图,如图6所示;需要指出的是,当音频检测标识显示于交互页面的其它方位时,也可以采用类似于上述划分方式划分出一块显示声波图的区域。For example, the frequency of the audio signal is determined through the AudioContext interface or an AudioContext object, and the frequency is then passed as the height parameter to a drawing function in the Web API (Application Programming Interface), such as the CanvasRenderingContext2D.fillRect() function. Waveform bars of the audio signal are then drawn one by one in the interaction page, with the frequency of the audio signal as the bar height; combined together, these bars form a sound wave graph of the audio signal, as shown in FIG. 7. When the audio detection mark is displayed at the bottom of the interaction page, an area can be divided out at the bottom of the page to display the corresponding sound wave graph, as shown in FIG. 6; it should be noted that when the audio detection mark is displayed in other positions of the interaction page, a similar approach can be used to divide out an area for displaying the sound wave graph.
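将频率映射为波形柱绘制参数的步骤可示意如下,其中buildWaveBars为假设的函数名,声波显示区尺寸、柱宽与最大频率均为示例参数;浏览器中可将返回的矩形参数逐个交给CanvasRenderingContext2D.fillRect()绘制。The mapping from frequencies to waveform-bar drawing parameters can be sketched as follows; buildWaveBars is a hypothetical name, and the display-area size, bar width and maximum frequency are example parameters. In a browser, each returned rectangle may be passed to CanvasRenderingContext2D.fillRect().

```javascript
// 示意性代码:频率高低作为波形柱的高度,各波形柱组合形成声波图。
function buildWaveBars(frequencies, areaHeight, barWidth, maxFreq) {
  return frequencies.map((freq, i) => {
    const height = Math.round((freq / maxFreq) * areaHeight); // 频率作为高度参数
    return { x: i * barWidth, y: areaHeight - height, width: barWidth, height };
  });
}
// 浏览器中的用法(示意):
// buildWaveBars(freqs, 100, 4, 1000).forEach(b => ctx.fillRect(b.x, b.y, b.width, b.height));
```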
其中,音频上下文接口可控制它包含的节点的创建、音频处理以及解码的执行,在上述的操作(如节点的创建、音频处理或解码的执行)之前,可以先创建一个AudioContext对象,然后开始上述的操作。之后在需要用到AudioContext对象时,可以进行复用而不是每次初始化一个新的AudioContext对象,并且可以对多个不同的音频信号和管道同时使用一个AudioContext对象。The audio context interface controls the creation of nodes, audio processing, and decoding. Before the above operations (such as node creation, audio processing, or decoding), you can create an AudioContext object and then start the above operations. When the AudioContext object is needed, it can be reused instead of initializing a new AudioContext object each time, and an AudioContext object can be used for multiple different audio signals and pipelines at the same time.
Web API主要用于JavaScript。JavaScript是一种函数优先的轻量级、解释型或即时编译型的编程语言,是一种属于网络的脚本语言,已经被广泛用于Web(网络)应用开发,常用来为网页添加各式各样的动态功能,为用户提供更流畅美观的浏览效果;通常JavaScript脚本通过嵌入在HTML中实现自身的功能。Web APIs are mainly used from JavaScript. JavaScript is a function-first, lightweight, interpreted or just-in-time compiled programming language and a scripting language of the Web. It has been widely used in Web application development, often to add a variety of dynamic functions to web pages and to provide users with a smoother and more attractive browsing experience. JavaScript scripts usually realize their functions by being embedded in HTML.
上述实施例中,在收到音频信号时,在交互页面中显示处于第三交互姿态的虚拟对象,从而在收到音频信号前后,虚拟对象所呈现出的交互姿态是不同的,丰富了虚拟对象的交互方式;此外,在对应于音频检测标识的位置,显示音频信号对应的声波图,并在当获得音频信号对应的文本信息时,在音频检测标识中显示文本信息,因此目标对象可以从视觉上看到音频信号的声波图以及对应的文本信息,有利于提高交互效果。In the above embodiment, when an audio signal is received, a virtual object in a third interaction posture is displayed in the interaction page, so that the interaction posture presented by the virtual object is different before and after the audio signal is received, thereby enriching the interaction method of the virtual object; in addition, a sound wave graph corresponding to the audio signal is displayed at a position corresponding to the audio detection identifier, and when text information corresponding to the audio signal is obtained, the text information is displayed in the audio detection identifier, so that the target object can visually see the sound wave graph of the audio signal and the corresponding text information, which is conducive to improving the interaction effect.
在一个实施例中,S202之前,该方法还包括:In one embodiment, before S202, the method further includes:
S802,显示交互页面,并在交互页面中显示处于默认姿态的虚拟对象。S802, displaying an interactive page, and displaying a virtual object in a default posture on the interactive page.
在一个实施例中,终端在检测到目标对象触发的产品开启操作或交互操作时,进入交互产品的交互页面,并在交互页面中显示处于默认姿态的虚拟对象,如图9所示。In one embodiment, when the terminal detects a product start-up operation or an interactive operation triggered by a target object, it enters an interactive page of the interactive product and displays a virtual object in a default posture on the interactive page, as shown in FIG. 9 .
S804,在对应于虚拟对象的位置,显示关于虚拟对象的用于表示进入交互页面的提示信息。S804: Display prompt information about the virtual object for indicating entering an interactive page at a position corresponding to the virtual object.
其中,该位置可以是虚拟对象的显示位置。The position may be a display position of a virtual object.
在一个实施例中,终端可以先确定虚拟对象的显示位置,在该显示位置处显示子页面,如弹窗或浮层,在该子页面中显示关于虚拟对象的用于表示进入交互页面的提示信息,如图9所示。此外,除了可以显示图9中的提示信息外,还可以显示“欢迎进入小瞳的互动世界”。In one embodiment, the terminal may first determine the display position of the virtual object, display a sub-page at the display position, such as a pop-up window or a floating layer, and display prompt information about the virtual object for indicating entering the interactive page in the sub-page, as shown in FIG9. In addition, in addition to displaying the prompt information in FIG9, "Welcome to Xiaotong's interactive world" may also be displayed.
S806,显示用于表示使用麦克风的权限请求信息。S806: Display permission request information for using the microphone.
其中,该权限请求信息可以是请求使用麦克风权限的请求信息,如A想要使用您的麦克风,如图10所示。需要指出的是,A可以是交互产品的名称,为了提升目标对象的沉浸感以及与虚拟对象共情,A也可以是虚拟对象的名称,如小瞳想要使用您的麦克风。The permission request information may be a request for permission to use a microphone, such as "A wants to use your microphone", as shown in FIG10. It should be noted that A may be the name of an interactive product. In order to enhance the immersion of the target object and empathy with the virtual object, A may also be the name of a virtual object, such as "Xiaotong wants to use your microphone".
在一个实施例中,当目标对象未在交互产品的设置页面中开启麦克风的使用权限时,终端可以在交互页面上显示子页面,如弹窗或浮层,在该子页面上显示用于表示使用麦克风的权限请求信息,如图10所示。需要指出的是,该子页面上除了显示权限请求信息之外,还会显示相应的控件,如禁止控件和允许控件,具体可参考图10。In one embodiment, when the target object does not enable the microphone usage permission in the setting page of the interactive product, the terminal can display a sub-page, such as a pop-up window or a floating layer, on the interactive page, and display permission request information for using the microphone on the sub-page, as shown in Figure 10. It should be noted that in addition to displaying the permission request information, the sub-page also displays corresponding controls, such as a prohibition control and an allowance control, for details, please refer to Figure 10.
S808,响应于对权限请求信息的确认操作,自动开启麦克风的使用权限。S808, in response to the confirmation operation on the permission request information, automatically enable the permission to use the microphone.
其中,该麦克风的使用权限可以是长期使用麦克风的权限(如在未来一周或一个月内无需再次授权),也可以是在需要时使用麦克风的权限;当麦克风的使用权限是长期使用麦克风的权限时,可以在请求获取麦克风的使用权限时,该权限请求信息中可以包含长期使用麦克风的提示文字。The permission to use the microphone can be permission to use the microphone for a long time (such as no need to re-authorize within the next week or month), or it can be permission to use the microphone when needed; when the permission to use the microphone is permission to use the microphone for a long time, when requesting permission to use the microphone, the permission request information can include prompt text for long-term use of the microphone.
具体地,终端检测目标对象对请求信息的确认操作,如目标对象点击或触摸了允许控件,则会自动开启交互产品的关于麦克风的使用权限。Specifically, the terminal detects the target object's confirmation operation on the requested information, such as the target object clicking or touching the permission control, and then automatically enables the interactive product's permission to use the microphone.
S810,实时进行音频检测,以获得音频指令。S810: Perform audio detection in real time to obtain an audio instruction.
在开启麦克风的使用权限之后,终端便可实时检测外界发出的音频指令,然后执行S204。After enabling the permission to use the microphone, the terminal can detect the audio instructions sent from the outside in real time, and then execute S204.
上述实施例中,在进入交互产品的交互页面时,显示处于默认姿态的虚拟对象,以及显示关于虚拟对象的用于表示进入交互页面的提示信息,可以让目标对象对虚拟对象有一定的了解,起到了提示作用;此外,在未获得麦克风权限时,可以请求使用麦克风权限,可以让目标对象知道交互产品获取麦克风权限,在授予麦克风权限之后,目标对象可以通过语音方式与交互产品进行交互,使人机交互变得简单方便。In the above embodiment, when entering the interactive page of the interactive product, a virtual object in a default posture is displayed, and prompt information about the virtual object is displayed to indicate entering the interactive page, so that the target object can have a certain understanding of the virtual object and serve as a prompt; in addition, when the microphone permission is not obtained, the use of the microphone permission can be requested, so that the target object knows that the interactive product obtains the microphone permission. After granting the microphone permission, the target object can interact with the interactive product through voice, making human-computer interaction simple and convenient.
在一个实施例中,在进行交互时,交互产品还可以根据音频信号对应的文本信息实现语音关键词交互功能,具体可以包括:当获得音频信号对应的文本信息、且文本信息为针对虚拟对象的描述信息时,终端通过交互产品播放第二语音;在交互页面中显示处于第四交互姿态的虚拟对象。In one embodiment, during interaction, the interactive product can also implement a voice keyword interaction function based on the text information corresponding to the audio signal, which may specifically include: when the text information corresponding to the audio signal is obtained, and the text information is description information for the virtual object, the terminal plays a second voice through the interactive product; and displays the virtual object in the fourth interactive posture in the interactive page.
其中,第二语音是依据描述信息对应的答复文本信息生成的语音,第四交互姿态为虚拟对象所呈现的具有第一喜悦度的姿态。Among them, the second voice is a voice generated according to the reply text information corresponding to the description information, and the fourth interactive posture is a posture with a first degree of joy presented by the virtual object.
在获得音频信号对应的文本信息之后,终端首先会查找与文本信息匹配的预设命令信息,若未查找到匹配的预设命令信息,则查找与文本信息匹配的预设关键词信息,若查找到与文本信息匹配的预设关键词信息,表示该文本信息为针对虚拟对象的描述信息,此时会播放第二语音,如播放“谢谢”、“谢谢夸奖”或“谢谢主人对小瞳的认可”等;此外,终端还会在交互页面中显示表达喜悦的虚拟对象,如显示的小瞳呈现喜悦(如可爱或微笑)的表情,如图11所示。After obtaining the text information corresponding to the audio signal, the terminal will first search for preset command information that matches the text information. If no matching preset command information is found, it will search for preset keyword information that matches the text information. If the preset keyword information that matches the text information is found, it indicates that the text information is a description of the virtual object. At this time, a second voice will be played, such as "Thank you", "Thank you for the compliment" or "Thank you for the host's recognition of Xiaotong", etc.; in addition, the terminal will also display a virtual object expressing joy in the interactive page, such as displaying Xiaotong with an expression of joy (such as cute or smiling), as shown in Figure 11.
举例来说,当音频信号对应的文本信息没有命中所有预设命令信息时,可以使用annyang.js提供的回调函数,获取annyang.js识别到的文本信息(记为userSaid),然后将userSaid作为参数传参给关键词交互逻辑,通过CheckUserSaid函数进行遍历以判定userSaid中是否含有预设关键词信息。如果userSaid中含有预设关键词信息,则执行关键词命令函数,让交互产品执行相应的交互操作,如播放第二语音,以及在交互页面中显示表达喜悦的虚拟对象。For example, when the text information corresponding to the audio signal does not hit all the preset command information, the callback function provided by annyang.js can be used to obtain the text information recognized by annyang.js (recorded as userSaid), and then userSaid is passed as a parameter to the keyword interaction logic, and the CheckUserSaid function is used to traverse to determine whether userSaid contains the preset keyword information. If userSaid contains the preset keyword information, the keyword command function is executed to let the interactive product perform the corresponding interactive operation, such as playing the second voice and displaying a virtual object expressing joy in the interactive page.
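上述关键词遍历逻辑可示意如下,其中checkUserSaid与onNotCommandMatched为假设的函数名,关键词表、语音文案与姿态名称均为示例数据。The keyword traversal logic above can be sketched as follows; checkUserSaid and onNotCommandMatched are hypothetical names, and the keyword list, voice text and posture name are example data.

```javascript
// 示意性代码:遍历预设关键词信息,判定 userSaid 中是否含有关键词。
function checkUserSaid(userSaid, presetKeywords) {
  return presetKeywords.find((kw) => userSaid.includes(kw)) || null;
}

// 命中关键词时执行相应的交互操作:播放第二语音并显示第四交互姿态。
function onNotCommandMatched(userSaid, presetKeywords, playVoice, showPosture) {
  const hit = checkUserSaid(userSaid, presetKeywords);
  if (hit !== null) {
    playVoice('谢谢夸奖');
    showPosture('第四交互姿态');
  }
  return hit;
}
```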
此外,交互产品还具有语音分析加强交互功能,即在语音关键词交互功能或语音命令交互功能的基础上,增加对音频信号的音高和音量的分析,使虚拟对象做出不同姿态的反应,具体可参考图12,包括:In addition, the interactive product also has a voice analysis enhanced interaction function, that is, on the basis of the voice keyword interaction function or the voice command interaction function, the pitch and volume of the audio signal are additionally analyzed so that the virtual object reacts with different postures. For details, refer to FIG. 12, including:
S1202,当获得音频信号对应的文本信息、且文本信息为针对虚拟对象的描述信息时,确定音频信号的声学信息。S1202: When text information corresponding to the audio signal is obtained, and the text information is description information for the virtual object, determine acoustic information of the audio signal.
其中,声学信息包括音高或音量中的至少一种。The acoustic information includes at least one of pitch or volume.
通过aubio.js库对音频信号进行分析得到音高;然后从音频信号中检测出PCM数据,根据PCM数据计算出音量。将音高或音量中的至少一个参数融合到语音交互命令逻辑中,当音高和音量的大小不同时,可以让虚拟对象做出不同的姿态反应,具体如S1204和S1206。Aubio.js是在Aubio基础上拓展的js库文件,即把Aubio用Emscripten编译器编译成js或wasm文件,以便在浏览器中运行。The audio signal is analyzed through the aubio.js library to obtain the pitch; PCM data is then detected from the audio signal, and the volume is calculated from the PCM data. At least one of the pitch and volume parameters is integrated into the voice interaction command logic, so that different pitches and volumes cause the virtual object to react with different postures, as in S1204 and S1206. Aubio.js is a JS library extended from Aubio, produced by compiling Aubio into a js or wasm file with the Emscripten compiler so that it can run in the browser.
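根据PCM数据计算音量的一种常见做法是取均方根(RMS),可示意如下;volumeFromPcm为假设的函数名,音高检测如正文所述由aubio.js完成,此处不展开。A common way to calculate volume from PCM data is the root mean square (RMS), sketched below; volumeFromPcm is a hypothetical name, and pitch detection is performed by aubio.js as described above and is not expanded here.

```javascript
// 示意性代码:对 PCM 采样值求均方根,作为音量的一种度量。
function volumeFromPcm(pcmSamples) {
  if (pcmSamples.length === 0) return 0;
  const sumSquares = pcmSamples.reduce((sum, s) => sum + s * s, 0);
  return Math.sqrt(sumSquares / pcmSamples.length);
}
```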
S1204,当声学信息满足预设条件时,在交互页面中显示处于第五交互姿态的虚拟对象。S1204: When the acoustic information meets a preset condition, display the virtual object in the fifth interaction posture in the interaction page.
其中,第五交互姿态为虚拟对象所呈现的具有第二喜悦度的姿态,第二喜悦度大于第一喜悦度,可参考图13。Among them, the fifth interactive posture is a posture with a second joy degree presented by the virtual object, and the second joy degree is greater than the first joy degree. Please refer to Figure 13.
S1206,当声学信息不满足预设条件时,播放第二语音;以及在交互页面中显示处于第四交互姿态的虚拟对象。S1206: When the acoustic information does not meet the preset condition, play the second voice; and display the virtual object in the fourth interaction posture in the interaction page.
其中,S1206的具体实现过程可参考上述实现语音关键词交互功能的实施例。从图11和图13可以看出,当声学信息满足预设条件时,虚拟对象所呈现的喜悦度比声学信息不满足预设条件时的喜悦度更大。The specific implementation process of S1206 may refer to the above-mentioned embodiment for implementing the voice keyword interaction function. It can be seen from Figures 11 and 13 that when the acoustic information meets the preset conditions, the joy degree presented by the virtual object is greater than the joy degree when the acoustic information does not meet the preset conditions.
上述实施例中,目标对象在与虚拟对象进行交互时,目标对象的声学信息不同,可以使虚拟对象所呈现出不同喜悦度的交互姿态,即便是目标对象发出相同的内容,虚拟对象也可以呈现出相对应的不同的交互姿态,提升了目标对象的沉浸感以及与虚拟对象的共情,从而大大地提升了交互效果。In the above embodiment, when the target object interacts with the virtual object, the target object's acoustic information is different, which can make the virtual object present interactive postures with different degrees of joy. Even if the target object sends the same content, the virtual object can also present corresponding different interactive postures, which enhances the target object's sense of immersion and empathy with the virtual object, thereby greatly improving the interaction effect.
在一个实施例中,交互产品还具有语音分析音高唱歌玩法功能,即在进行交互时,可以命令虚拟对象唱歌,如图14所示,具体如下:In one embodiment, the interactive product also has a voice-analysis pitch singing gameplay function, that is, during the interaction, the virtual object can be commanded to sing, as shown in FIG. 14, specifically as follows:
S1402,当获得音频信号对应的文本信息、且文本信息与预设演唱信息之间信息匹配时,播放目标音乐。S1402: When the text information corresponding to the audio signal is obtained and the text information matches the preset singing information, the target music is played.
其中,预设演唱信息可以是预设的用于请求虚拟对象唱歌的关键词,该关键词可以包括:唱歌、唱首歌、来一曲或唱GQ等。其中,GQ可以指歌曲名称,也可以是一句歌词。The preset singing information may be a preset keyword for requesting the virtual object to sing, and the keyword may include: "sing", "sing a song", "play a tune", or "sing GQ", etc. GQ may refer to the name of a song or a line of lyrics.
具体地,当预设演唱信息为唱歌、唱首歌或来一曲等模糊演唱信息时,终端可以获取交互产品默认的目标音乐,然后进行播放;当预设演唱信息为目标演唱信息(如唱小星星)时,终端可以获取目标演唱信息对应的目标音乐(如名称为一闪一闪亮晶晶的音乐),然后进行播放。Specifically, when the preset singing information is ambiguous singing information such as singing, singing a song or playing a song, the terminal can obtain the default target music of the interactive product and then play it; when the preset singing information is target singing information (such as singing Twinkle Twinkle Little Star), the terminal can obtain the target music corresponding to the target singing information (such as music named Twinkle Twinkle Little Star) and then play it.
S1404,在交互页面中,显示演唱目标音乐的虚拟对象,或显示随目标音乐发生姿态变化的虚拟对象。S1404, displaying a virtual object that sings the target music, or displaying a virtual object that changes posture with the target music, in the interactive page.
在播放目标音乐的过程中,为了从视觉上看出该目标音乐是虚拟对象演唱的,可以在交互页面中显示演唱目标音乐的虚拟对象,如该虚拟对象的口型与目标音乐的歌词对应的发音相匹配。此外,为了增强视觉效果,可以在播放目标音乐的过程中,使虚拟对象随目标音乐发生姿态变化,如跟随音乐跳舞。During the playing of the target music, in order to visually show that the target music is sung by the virtual object, the virtual object singing the target music can be displayed in the interactive page, for example, the lip shape of the virtual object matches the pronunciation corresponding to the lyrics of the target music. In addition, in order to enhance the visual effect, during the playing of the target music, the virtual object can be made to change its posture with the target music, such as dancing to the music.
具体地,终端获取音乐信号的音乐要素;根据音乐要素生成对应的交互指令;基于交互指令生成虚拟对象的姿态变化数据;依据姿态变化数据控制虚拟对象执行演唱操作,或者控制虚拟对象随目标音乐发生姿态变化。Specifically, the terminal obtains music elements of a music signal; generates corresponding interaction instructions according to the music elements; generates posture change data of a virtual object based on the interaction instructions; controls the virtual object to perform a singing operation according to the posture change data, or controls the virtual object to change posture with the target music.
在一个实施例中,在播放目标音乐的过程中,当接收到包含音乐停止命令的语音时,停止播放目标音乐;停止虚拟对象的演唱操作或随目标音乐发生姿态变化。In one embodiment, during the playing of the target music, when a voice including a music stop command is received, the playing of the target music is stopped; the singing operation of the virtual object or the posture change following the target music is stopped.
S1406,在播放完目标音乐之后,接收目标对象发出的音乐信号。S1406, after playing the target music, receiving a music signal emitted by the target object.
在一个实施例中,在播放完目标音乐之后,终端可以接收目标对象发出的音乐信号;此外,终端还可以继续播放其它音乐,或者接收其它表示继续请求虚拟对象唱歌的音频信号(如再来一曲),此时会继续播放对应的音乐。当播放音乐的次数达到预设次数时,若再次接收到表示继续请求虚拟对象唱歌的音频信号,终端可以显示表示拒绝唱歌的交互姿态的虚拟对象,可以参考图15。In one embodiment, after playing the target music, the terminal can receive a music signal sent by the target object; in addition, the terminal can continue to play other music, or receive another audio signal indicating a continued request for the virtual object to sing (such as "one more song"), in which case the corresponding music will continue to be played. When the number of times music has been played reaches a preset number, if an audio signal indicating a continued request for the virtual object to sing is received again, the terminal can display the virtual object in an interactive posture indicating refusal to sing, as shown in FIG. 15.
S1408,当音乐信号中的歌词信息与播放的目标音乐中的歌词信息一致、但音乐信号的音乐要素与播放的目标音乐的音乐要素不一致时,播放表示对目标对象正向激励的第三语音;以及,在交互页面中,显示表示对目标对象正向激励的第六交互姿态的虚拟对象。S1408, when the lyric information in the music signal is consistent with the lyric information in the target music being played, but the musical elements of the music signal are inconsistent with the musical elements of the target music being played, a third voice representing positive motivation for the target object is played; and, in the interactive page, a virtual object of a sixth interactive gesture representing positive motivation for the target object is displayed.
其中,音乐要素可以包括旋律、音高或其它乐理信息中的至少一种。正向激励表示对目标对象进行鼓励,对应的第三语音可以是“唱得还不错,继续加油”,第六交互姿态可以是卖萌或其它表示鼓励的姿态,可参考图16。The music elements may include at least one of melody, pitch, or other music theory information. Positive encouragement means encouraging the target object; the corresponding third voice may be "You sing well, keep it up", and the sixth interactive gesture may be a cute gesture or another gesture expressing encouragement, as shown in FIG. 16.
S1410,当音乐信号中的歌词信息与播放的目标音乐中的歌词信息一致、且音乐信号的音乐要素与播放的目标音乐的音乐要素一致时,播放表示对目标对象赞扬的第四语音;在交互页面中,显示表示对目标对象赞扬的第七交互姿态的虚拟对象。S1410, when the lyrics information in the music signal is consistent with the lyrics information in the target music being played, and the music elements of the music signal are consistent with the music elements of the target music being played, a fourth voice expressing praise for the target object is played; in the interactive page, a virtual object of a seventh interactive gesture expressing praise for the target object is displayed.
在一个实施例中,当音乐信号中的歌词信息与播放的目标音乐中的歌词信息一致时,终端继续判断音乐信号的音乐要素与播放的目标音乐的音乐要素是否一致,若音乐信号的音高与播放的目标音乐的音高不一致,或者音乐信号的旋律与播放的目标音乐的旋律不一致,则播放表示鼓励目标对象的语音;以及,在交互页面中,显示表示鼓励目标对象的交互姿态的虚拟对象。此外,若音乐信号的音高与播放的目标音乐的音高一致,且音乐信号的旋律与播放的目标音乐的旋律一致,则播放表示赞扬目标对象的语音;以及,在交互页面中,显示表示赞扬目标对象的交互姿态的虚拟对象,如图17所示。In one embodiment, when the lyrics information in the music signal is consistent with the lyrics information in the target music being played, the terminal continues to determine whether the music elements of the music signal are consistent with the music elements of the target music being played. If the pitch of the music signal is inconsistent with the pitch of the target music being played, or the melody of the music signal is inconsistent with the melody of the target music being played, a voice expressing encouragement to the target object is played; and, in the interactive page, a virtual object expressing an interactive gesture that encourages the target object is displayed. In addition, if the pitch of the music signal is consistent with the pitch of the target music being played, and the melody of the music signal is consistent with the melody of the target music being played, a voice expressing praise for the target object is played; and, in the interactive page, a virtual object expressing an interactive gesture that praises the target object is displayed, as shown in FIG17 .
其中,音乐信号的音高和旋律指的是目标对象演唱时所呈现出的音高和旋律。Among them, the pitch and melody of the music signal refer to the pitch and melody presented when the target object sings.
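上述S1408与S1410的判定逻辑可以概括为如下示意函数(函数名与返回值为示例,非原文限定):The decision logic of S1408 and S1410 above can be summarized by the following illustrative function (the name and return values are examples, not limitations from the original):

```javascript
// S1408/S1410 判定逻辑示意:歌词一致但音乐要素不一致 → 正向激励;
// 歌词与音乐要素均一致 → 赞扬;歌词不一致 → 走其它交互逻辑。
// Illustrative decision: lyrics match + elements mismatch → encourage;
// lyrics and elements both match → praise; lyrics mismatch → other logic.
function singingFeedback({ lyricsMatch, pitchMatch, melodyMatch }) {
  if (!lyricsMatch) return null; // 歌词不一致,交由其它逻辑处理
  if (pitchMatch && melodyMatch) {
    // 播放第四语音,显示第七交互姿态(赞扬)
    return { voice: 'fourth', gesture: 'seventh', meaning: 'praise' };
  }
  // 播放第三语音,显示第六交互姿态(正向激励)
  return { voice: 'third', gesture: 'sixth', meaning: 'encourage' };
}
```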
例如,虚拟对象演唱完《一闪一闪亮晶晶》,或演唱了《一闪一闪亮晶晶》中的一部分歌曲之后,目标对象可以挑战该虚拟对象,也演唱《一闪一闪亮晶晶》,此时终端会自动检测目标对象的哼唱,首先匹配目标对象是否唱对了歌词,如果演唱对了歌词,则进入语音分析音高和旋律的交互逻辑。对于《一闪一闪亮晶晶》这首歌,旋律可以参考图18,音高可参考图19。For example, after the virtual object sings Twinkle Twinkle Little Star or a part of the song, the target object can challenge the virtual object and sing Twinkle Twinkle Little Star. At this time, the terminal will automatically detect the target object's humming, first match whether the target object sings the lyrics correctly, and if the lyrics are correct, enter the interactive logic of voice analysis pitch and melody. For the song Twinkle Twinkle Little Star, the melody can refer to Figure 18, and the pitch can refer to Figure 19.
在进行旋律匹配时,可以将目标对象的旋律与图18中的旋律简谱进行对比,从而可以确定旋律是否匹配。在进行音高匹配时,可以参考以下匹配逻辑:When performing melody matching, the melody of the target object can be compared with the melody score in Figure 18 to determine whether the melody matches. When performing pitch matching, the following matching logic can be referred to:
传统音乐理论中使用前七个拉丁字母:A、B、C、D、E、F、G(按此顺序则音高循序而上)以及一些变化(升音和降音符号)来标示不同的音符。两个音符间若相差一倍的频率,则称两者之间相差一个八度。为了表示同名但不同高度的音符,科学音调记号法利用字母及一个用来表示所在八度的阿拉伯数字,明确指出音符的位置。比如说,现在的标准调音音高440赫兹名为A4,往上高八度则为A5,继续向上可无限延伸;至于A4往下,则为A3、A2…等。把音高转成对应的MIDI(Musical Instrument Digital Interface,乐器数字接口)编号,对照即可找到音名,比如82.41Hz计算得到MIDI编号40,即音名E2,按照上述方法可以确定音高是否匹配。In traditional music theory, the first seven Latin letters A, B, C, D, E, F, G (in this order the pitch ascends) together with some variations (sharp and flat symbols) are used to denote different notes. If the frequency of one note is double that of another, the two are said to be one octave apart. To denote notes of the same name but different heights, scientific pitch notation uses a letter plus an Arabic numeral indicating the octave to specify the exact position of a note. For example, the current standard tuning pitch of 440 Hz is named A4; one octave up is A5, extending upward indefinitely, while below A4 are A3, A2, and so on. By converting a pitch into the corresponding MIDI (Musical Instrument Digital Interface) number and looking it up, the note name can be found; for example, 82.41 Hz yields MIDI number 40, i.e., the note name E2. Pitch matching can then be determined according to the above method.
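按上述方法,频率到MIDI编号及科学音调记号音名的换算可以示意如下(A4=440Hz=MIDI 69为标准约定,函数命名为示例):Following the method above, the frequency-to-MIDI-number and note-name conversion can be sketched as follows (A4 = 440 Hz = MIDI 69 is the standard convention; function names are examples):

```javascript
// 十二平均律下的音名表;每个八度含 12 个半音。
// Note names under twelve-tone equal temperament; 12 semitones per octave.
const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

// 频率 → MIDI 编号:每升高一个八度频率翻倍,A4 = 440Hz 对应 MIDI 69。
function freqToMidi(freq) {
  return Math.round(69 + 12 * Math.log2(freq / 440));
}

// MIDI 编号 → 科学音调记号音名(MIDI 0 对应 C-1)。
function midiToNoteName(midi) {
  const octave = Math.floor(midi / 12) - 1;
  return NOTE_NAMES[midi % 12] + octave;
}
```

将目标对象演唱的频率与目标音乐的频率各自换算为MIDI编号后对比,即可判断音高是否匹配。Converting both the target object's sung frequency and the target music's frequency to MIDI numbers and comparing them determines whether the pitch matches.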
在一个实施例中,终端在接收音乐信号的过程中,当交互页面上的纠错管理控件处于开启状态时,在交互页面的第一区域中,显示表示接听音乐信号的第八交互姿态的虚拟对象;在交互页面的第二区域中,显示演唱提示面板。In one embodiment, while the terminal is receiving the music signal, when the error correction management control on the interactive page is in the turned-on state, a virtual object in an eighth interactive posture representing listening to the music signal is displayed in the first area of the interactive page; and a singing prompt panel is displayed in the second area of the interactive page.
其中,交互页面可以包括两个区域,即第一区域和第二区域,第一区域用于显示虚拟对象,第二区域可以显示演唱提示面板。该演唱提示面板显示目标对象的音高、音量或目标对象的音高与目标音乐的音高之间的对比图中的至少一种,例如该演唱提示面板显示目标对象的音高、音量以及目标对象的音高与目标音乐的音高之间的对比图,如图20所示。其中,图20中的Debug选项属于纠错管理控件。The interactive page may include two areas, namely a first area and a second area; the first area is used to display the virtual object, and the second area may display a singing prompt panel. The singing prompt panel displays at least one of the target object's pitch, the target object's volume, or a comparison chart between the target object's pitch and the pitch of the target music; for example, the singing prompt panel displays the target object's pitch, the volume, and a comparison chart between the target object's pitch and the pitch of the target music, as shown in Figure 20. The Debug option in Figure 20 belongs to the error correction management control.
上述实施例中,在进行交互时,虚拟对象还可以演唱歌曲以及在演唱歌曲过程中呈现出对应的交互姿态,丰富了交互方式;此外,在虚拟对象演唱过程中随时对虚拟对象的演唱进行控制,而且目标对象还可以与虚拟对象进行唱歌挑战,然后根据目标对象演唱时的歌词信息与音乐要素与虚拟对象演唱时的歌词信息与音乐要素是否一致,播放不同的语音以及呈现出不同的交互姿态,大大地丰富了交互方式,有效地提高了交互效果;而且,在目标对象演唱过程中,还可以显示演唱提示面板,用来对目标对象进行纠错管理,使目标对象可以有效地纠正自己的演唱方式,进一步提高了交互效果。In the above embodiment, when interacting, the virtual object can also sing songs and present corresponding interactive gestures during the singing process, which enriches the interaction method; in addition, the singing of the virtual object can be controlled at any time during the singing process of the virtual object, and the target object can also challenge the virtual object to sing, and then different voices are played and different interactive gestures are presented according to whether the lyrics information and music elements when the target object sings are consistent with the lyrics information and music elements when the virtual object sings, which greatly enriches the interaction method and effectively improves the interaction effect; moreover, during the singing process of the target object, a singing prompt panel can also be displayed to perform error correction management on the target object, so that the target object can effectively correct its own singing method, further improving the interaction effect.
在一个实施例中,如图21所示,提供了一种虚拟对象交互方法,以该方法应用于图1中的终端102为例进行说明,包括以下步骤:In one embodiment, as shown in FIG. 21 , a virtual object interaction method is provided, which is described by taking the method applied to the terminal 102 in FIG. 1 as an example, and includes the following steps:
S2102,显示直播界面,直播界面包括处于第一交互姿态的虚拟对象。S2102, displaying a live broadcast interface, where the live broadcast interface includes a virtual object in a first interactive posture.
其中,直播界面可以是利用虚拟对象进行视频直播的直播页面,如在该直播界面中,利用虚拟对象直播游戏比赛和体育赛事,从而目标对象可以观看到虚拟对象直播的游戏比赛和体育赛事的视频,此外目标对象还可以在观看直播视频的过程中,与虚拟对象进行交互。Among them, the live broadcast interface can be a live broadcast page that uses virtual objects for live video broadcast. For example, in this live broadcast interface, virtual objects are used to broadcast game competitions and sports events, so that the target object can watch the video of the game competitions and sports events broadcast by the virtual objects. In addition, the target object can also interact with the virtual object while watching the live video.
姿态可以是虚拟对象所呈现的形象,如容貌、神态(或神情)、表情、风格、手势以及姿势中的至少一种;该表情可以指面部表情、言语表情和身段表情。交互姿态可以是在进行交互过程中虚拟对象所呈现的形象,处于不同的交互情况下,该交互姿态可以是发生变化的,可以根据目标对象所发出语音的音高、音量和内容呈现出相对应的交互姿态。第一交互姿态可以是虚拟对象所呈现出的表示欢迎目标对象或期待与目标对象交互的姿态。A gesture may be an image presented by a virtual object, such as at least one of appearance, demeanor (or expression), expression, style, gesture, and posture; the expression may refer to facial expression, verbal expression, and body expression. An interactive gesture may be an image presented by a virtual object during an interaction process. In different interactive situations, the interactive gesture may change, and a corresponding interactive gesture may be presented according to the pitch, volume, and content of the speech emitted by the target object. A first interactive gesture may be a gesture presented by a virtual object to express a welcome to the target object or to expect interaction with the target object.
虚拟对象可以是虚拟的人物对象,或者是拟人化的动物对象、生物对象以及其它的物体对象。该虚拟对象在交互页面中显示时,可以通过二维静态图像、二维动态图像、三维模型或视频的方式来呈现。当虚拟对象的呈现方式为二维静态图像或三维模型时,交互姿态为一种静态的姿态,如摆出OK的静态手势或微笑的静态表情;当虚拟对象的呈现方式为二维动态图像、三维模型或视频时,交互姿态为一种动态的姿态,如点头或摇头的动态姿态。The virtual object may be a virtual character object, or an anthropomorphic animal object, biological object, or other object object. When the virtual object is displayed in the interactive page, it may be presented in the form of a two-dimensional static image, a two-dimensional dynamic image, a three-dimensional model, or a video. When the virtual object is presented in the form of a two-dimensional static image or a three-dimensional model, the interactive gesture is a static gesture, such as a static gesture of "OK" or a static expression of a smile; when the virtual object is presented in the form of a two-dimensional dynamic image, a three-dimensional model, or a video, the interactive gesture is a dynamic gesture, such as a dynamic gesture of nodding or shaking the head.
需要指出的是,各种交互姿态的虚拟对象可以是预先制作出来并进行了保存,也可以是在默认姿态的虚拟对象或其它当前显示的虚拟对象的基础上实时生成的。例如,目标对象在发出语音之后,依据该目标对象的音高、音量或文字内容(即文本信息)中的至少一种,对处于默认姿态的虚拟对象进行姿态控制,使该虚拟对象的姿态发生变化,如使虚拟对象的口型发生变化,从而得到相应交互姿态的虚拟对象。因此,在显示不同交互姿态的虚拟对象时,可以从数据库获取相应交互姿态的虚拟对象(如获取包含相应交互姿态的二维静态图像、二维动态图像、三维模型或视频)进行显示,也可以根据音频信号对应的音高、音量或文字内容中的至少一种,对虚拟对象进行姿态控制,使该虚拟对象呈现相应的交互姿态,然后进行显示。It should be noted that the virtual objects of various interactive postures can be pre-made and saved, or can be generated in real time based on the virtual objects of the default posture or other currently displayed virtual objects. For example, after the target object makes a speech, the virtual object in the default posture is posture-controlled according to at least one of the pitch, volume or text content (i.e., text information) of the target object, so that the posture of the virtual object changes, such as changing the mouth shape of the virtual object, thereby obtaining a virtual object of the corresponding interactive posture. Therefore, when displaying virtual objects of different interactive postures, the virtual object of the corresponding interactive posture can be obtained from the database (such as obtaining a two-dimensional static image, a two-dimensional dynamic image, a three-dimensional model or a video containing the corresponding interactive posture) for display, or the virtual object can be posture-controlled according to at least one of the pitch, volume or text content corresponding to the audio signal, so that the virtual object presents the corresponding interactive posture, and then displayed.
在一个实施例中,S2102之前,终端可以在交互产品(如视频应用)的入口页面显示默认姿态的虚拟对象,当接收到包含目标关键词(如看游戏直播)的语音指令时,显示直播界面。此外,当接收到包含目标关键词的语音指令时,终端可以控制虚拟对象发出针对语音指令的反馈语音,然后进入直播界面。In one embodiment, before S2102, the terminal can display a virtual object in a default posture on the entry page of an interactive product (such as a video application), and when a voice command containing a target keyword (such as watching a live game) is received, the live broadcast interface is displayed. In addition, when a voice command containing a target keyword is received, the terminal can control the virtual object to issue a feedback voice for the voice command, and then enter the live broadcast interface.
在一个实施例中,终端在显示直播界面之后,可以在该直播界面中播放直播视频,如播放游戏比赛、体育比赛、颁奖礼、晚会、会议或论坛等的直播视频。在播放直播视频的过程中,终端可以根据直播视频的音频数据或文本数据进行语音播报,并控制虚拟对象呈现与音频数据或文本数据对应的交互姿态;此外,在播放直播视频的过程中,终端还可以根据视频画面中的目标事件播放语音,并控制虚拟对象呈现视频画面中的目标事件对应的交互姿态。In one embodiment, after displaying the live broadcast interface, the terminal can play live videos in the live broadcast interface, such as live videos of game matches, sports matches, award ceremonies, parties, conferences, or forums. During the live broadcast video playback, the terminal can make voice broadcasts based on the audio data or text data of the live video, and control the virtual object to present an interactive gesture corresponding to the audio data or text data; in addition, during the live broadcast video playback, the terminal can also play voices based on the target event in the video screen, and control the virtual object to present an interactive gesture corresponding to the target event in the video screen.
例如,终端获取音频数据或文本数据,控制虚拟对象呈现出与音频数据或文本数据对应的口型,以及呈现出对应的表情或动作等,如游戏队伍a控制的游戏角色击杀了游戏队伍b控制的游戏角色时,终端可以播放“游戏队伍a真厉害”的语音,此外还可以控制虚拟对象呈现出与“游戏队伍a真厉害”对应的口型,以及控制虚拟对象呈现出钦佩的表情。此外,终端检测到视频画面中出现了目标事件,如游戏队伍a控制的游戏角色击杀了游戏队伍b控制的游戏角色,终端可以播放“游戏队伍a真厉害”的语音,此外还可以控制虚拟对象呈现出与“游戏队伍a真厉害”对应的口型,以及控制虚拟对象呈现出钦佩的表情。For example, the terminal obtains audio data or text data, controls the virtual object to present the lip shape corresponding to the audio data or text data, and presents the corresponding expression or action, etc. For example, when the game character controlled by game team a kills the game character controlled by game team b, the terminal can play the voice of "game team a is really awesome", and can also control the virtual object to present the lip shape corresponding to "game team a is really awesome", and control the virtual object to present the expression of admiration. In addition, when the terminal detects the appearance of the target event in the video screen, such as the game character controlled by game team a kills the game character controlled by game team b, the terminal can play the voice of "game team a is really awesome", and can also control the virtual object to present the lip shape corresponding to "game team a is really awesome", and control the virtual object to present the expression of admiration.
其中,该音频数据可以是直播员发出的语音数据。文本数据可以是直播员发出的语音数据对应的文字内容,或根据视频画面生成的直播文稿。The audio data may be the voice data sent by the live broadcaster, and the text data may be the text content corresponding to the voice data sent by the live broadcaster, or a live broadcast script generated according to the video screen.
在另一个实施例中,当直播时长满足预设条件时,终端未检测到目标事件时,可以控制虚拟对象呈现相应的交互姿态或发出相应的语音,如处于正在钓鱼状态的虚拟对象做出疑问表情,或在该虚拟对象的旁边显示表示疑问的信息,表示在等待目标事件的发生,如等待某个球队踢进首个球。In another embodiment, when the live broadcast duration meets the preset conditions and the terminal does not detect the target event, the virtual object can be controlled to present a corresponding interactive gesture or make a corresponding voice, such as a virtual object in a fishing state making a questioning expression, or displaying questioning information next to the virtual object, indicating that it is waiting for the target event to occur, such as waiting for a team to score the first goal.
S2104,响应音频指令触发,在直播界面显示反馈交互信息和处于第二交互姿态的虚拟对象。S2104, in response to the audio command trigger, displaying the feedback interaction information and the virtual object in the second interaction posture on the live broadcast interface.
其中,第一交互姿态和第二交互姿态,是虚拟对象所呈现出的用于反映交互过程的不同姿态。The first interaction posture and the second interaction posture are different postures presented by the virtual object to reflect the interaction process.
音频指令可以是音频内容属于指令类的音频信号。例如,音频指令的音频内容可以是“哪个队会赢”、“你猜哪个队会赢”或“你觉得哪个队会赢”;此外,音频指令的音频内容也可以是关于虚拟对象的描述信息或评论信息,如“小瞳的直播水平真厉害”。The audio instruction may be an audio signal whose audio content belongs to the instruction category. For example, the audio content of the audio instruction may be "Which team will win", "Guess which team will win" or "Which team do you think will win"; in addition, the audio content of the audio instruction may also be descriptive information or commentary information about the virtual object, such as "Xiaotong's live broadcasting skills are amazing".
反馈交互信息可以是基于音频指令的文本信息获得的,是交互产品响应于音频指令的文本信息而显示的交互信息。例如,目标对象询问“哪个队会赢”时,该反馈交互信息可以是“小瞳觉得游戏队伍a会赢”或“小瞳觉得b国足球队会赢”;又如目标对象夸奖“小瞳的直播水平真厉害”时,该反馈交互信息可以是“害羞啦,小瞳依然会继续努力直播”。该反馈交互信息可以以子页面的形式进行显示。其中,该子页面可以是弹窗、浮层或在交互页面上创建的HTML5页面。Feedback interaction information can be obtained based on the text information of the audio instruction, and is the interactive information displayed by the interactive product in response to the text information of the audio instruction. For example, when the target object asks "which team will win", the feedback interaction information can be "Xiaotong thinks that the game team A will win" or "Xiaotong thinks that the football team of country B will win"; for example, when the target object praises "Xiaotong's live broadcast level is really amazing", the feedback interaction information can be "I'm shy, but Xiaotong will continue to work hard on the live broadcast." The feedback interaction information can be displayed in the form of a sub-page. Among them, the sub-page can be a pop-up window, a floating layer, or an HTML5 page created on the interactive page.
第二交互姿态是根据音频指令对应的文本信息和声学信息确定的。因此,文本信息或声学信息中的至少一种信息发生变化时,虚拟对象所呈现的第二交互姿态也可以不同。例如,音频指令的声学信息不同时,虚拟对象所呈现的第二交互姿态不同,即音频指令的声学信息(如音高和音量)发生变化时,第二交互姿态也会具有一定的差异,不同的声学信息可以反映出目标对象处于的不同状态或情绪,从而在目标对象处于不同状态或情绪时,虚拟对象可以呈现出相匹配的交互姿态,从而即便是目标对象发出相同的内容,虚拟对象也可以呈现出相对应的不同的交互姿态。The second interaction posture is determined based on the text information and acoustic information corresponding to the audio instruction. Therefore, when at least one of the text information or the acoustic information changes, the second interaction posture presented by the virtual object may also be different. For example, when the acoustic information of the audio instruction is different, the second interaction posture presented by the virtual object is different, that is, when the acoustic information (such as pitch and volume) of the audio instruction changes, the second interaction posture will also have certain differences. Different acoustic information can reflect the different states or emotions of the target object, so that when the target object is in different states or emotions, the virtual object can present a matching interaction posture, so that even if the target object sends the same content, the virtual object can present corresponding different interaction postures.
对于S2104的具体实现过程,可以参考图2实施例中的S204和S206。此外,在直播过程中,目标对象还可以与虚拟对象进行其它内容的交互,具体可以参考本申请上述所阐述的语音命令交互功能、语音关键词交互功能、语音分析加强交互功能以及语音分析音高唱歌玩法功能等这些方面的交互。For the specific implementation process of S2104, please refer to S204 and S206 in the embodiment of Figure 2. In addition, during the live broadcast, the target object can also interact with the virtual object in other content, and specifically refer to the voice command interaction function, voice keyword interaction function, voice analysis enhanced interaction function, and voice analysis pitch singing gameplay function described above in this application.
在一个实施例中,终端在收到音频指令之后,除了可以在直播界面显示反馈交互信息和处于第二交互姿态的虚拟对象,还可以控制虚拟对象通过语音方式与目标对象进行语音交流。In one embodiment, after receiving the audio instruction, the terminal can not only display the feedback interaction information and the virtual object in the second interaction posture on the live broadcast interface, but also control the virtual object to communicate with the target object through voice.
为了更加清楚地了解虚拟对象进行直播时的交互方案,这里结合游戏直播的应用场景进行说明,如图22所示,具体交互方案如下:In order to more clearly understand the interaction scheme when the virtual object performs live streaming, the application scenario of game live broadcast is used here for illustration, as shown in Figure 22. The specific interaction scheme is as follows:
用户想要观看射击类的游戏直播时,可以在游戏直播的入口页面找到对应的射击类游戏,点击进入该游戏直播的直播界面,或者通过语音控制虚拟对象“小瞳”打开该游戏直播的直播界面,然后在该直播界面中显示该射击类游戏的直播画面,并在该直播界面的右上角显示虚拟对象“小瞳”,“小瞳”此时呈现出期待直播和观看游戏的神情,如图22的(a)图所示;用户在观看直播的过程中发出“小瞳,你觉得哪个队会赢”的语音指令时,可以控制“小瞳”呈现表示回复用户的报告姿态,并在直播界面显示“我觉得a队会赢”的反馈交互信息,如图22的(b)图所示;当a队的游戏角色击杀了b队的游戏角色时,触发了目标事件,此时小瞳呈现出开心和得意的小表情,如图22的(c)图所示。When the user wants to watch a live broadcast of a shooting game, the user can find the corresponding shooting game on the entry page of the game live broadcast and click to enter the live broadcast interface of the game live broadcast, or use voice control to open the live broadcast interface of the game live broadcast through the virtual object "Xiaotong", and then display the live screen of the shooting game in the live broadcast interface, and display the virtual object "Xiaotong" in the upper right corner of the live broadcast interface. At this time, "Xiaotong" shows an expression of looking forward to the live broadcast and watching the game, as shown in Figure 22 (a); when the user issues a voice command "Xiaotong, which team do you think will win" while watching the live broadcast, "Xiaotong" can be controlled to present a reporting gesture to reply to the user, and the feedback interaction information "I think team a will win" is displayed in the live broadcast interface, as shown in Figure 22 (b); when the game character of team a kills the game character of team b, the target event is triggered, and Xiaotong shows a happy and proud expression at this time, as shown in Figure 22 (c).
上述实施例中,显示包括处于第一交互姿态的虚拟对象的直播界面,在收到音频指令时,在该直播界面中显示反馈交互信息和处于第二交互姿态的虚拟对象,从而即使在直播过程中,也可以与直播的虚拟对象之间进行交互,在交互过程中除了文本信息上的交互,而且还有虚拟对象通过不同交互姿态方面的交互,丰富了交互方式;而且,第二交互姿态是根据音频指令对应的文本信息和声学信息确定的,因此音频指令的声学信息不同时,虚拟对象所呈现的第二交互姿态不同,即便发出相同的指令内容,虚拟对象也可以呈现出相对应的不同的交互姿态,提升了目标对象的沉浸感以及与虚拟对象的共情,从而大大地提升了交互效果。In the above embodiment, a live broadcast interface including a virtual object in a first interaction posture is displayed, and when an audio command is received, feedback interaction information and a virtual object in a second interaction posture are displayed in the live broadcast interface, so that even during the live broadcast, interaction can be carried out with the virtual object of the live broadcast. In addition to the interaction of text information during the interaction process, there is also interaction between virtual objects through different interaction postures, which enriches the interaction mode; moreover, the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command. Therefore, when the acoustic information of the audio command is different, the second interaction posture presented by the virtual object is different. Even if the same command content is issued, the virtual object can also present corresponding different interaction postures, which enhances the target object's sense of immersion and empathy with the virtual object, thereby greatly enhancing the interaction effect.
为了更加清楚地了解本申请的方案,这里结合交互产品和底层逻辑进行说明,具体如下:In order to understand the solution of this application more clearly, the interactive product and the underlying logic are explained here, as follows:
(1)进入产品的前端页面,请求获取麦克风权限。(1) Access the product front page and request microphone permission.
用户进入产品的前端页面,在该前端页面的中间展示与用户交互的虚拟人物,此外还会展示包含欢迎语的弹窗。该虚拟人物可以有多种形态,包括但不限于二维动态图像、二维静态图像、三维模型和视频播放等。When a user enters the front-end page of the product, a virtual character that interacts with the user is displayed in the middle of the page, and a pop-up window containing a welcome message is also displayed. The virtual character can take various forms, including but not limited to two-dimensional dynamic images, two-dimensional static images, three-dimensional models, and video playback.
点击该弹窗中的按钮“ok”后,请求获取麦克风权限,该麦克风权限请求过程包括:调用Web API的AudioContext(音频上下文)对象判断前端页面是否能够成功使用各类音频模块,如果音频模块检测证明用户有麦克风设备,则使用Web API的navigator.mediaDevices.getUserMedia()弹出获取麦克风权限的请求。用户允许后,产品将获得麦克风权限,自动开启实时语音检测,此外返回一个MediaStream(包含媒体类型的数据流)对象。对于Web API获取用户接入的设备权限以及最终PCM数据流总流程,可以参考如下程序:After clicking the "ok" button in the pop-up window, a request is made to obtain microphone permission. The microphone permission request process includes: calling the AudioContext (audio context) object of the Web API to determine whether the front-end page can successfully use various audio modules. If the audio module detection proves that the user has a microphone device, the Web API's navigator.mediaDevices.getUserMedia() pops up a request to obtain microphone permission. After the user allows it, the product will obtain microphone permission and automatically start real-time voice detection. In addition, a MediaStream (data stream containing media types) object is returned. For the Web API to obtain user access device permissions and the final PCM data stream overall process, you can refer to the following program:
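下面给出该权限请求流程的一个示意性草图(非原始程序清单;浏览器环境;函数命名为示例):An illustrative sketch of this permission-request flow (not the original listing; browser environment; function names are examples):

```javascript
// 构造 getUserMedia 的约束对象:仅请求音频输入。
// Build the getUserMedia constraints: request audio input only.
function buildAudioConstraints() {
  return { audio: true, video: false };
}

// 获取麦克风权限并取得 MediaStream 的流程示意(浏览器环境)。
// Illustrative microphone-permission flow (runs in a browser only).
async function requestMicrophone() {
  // 先创建 AudioContext,判断页面能否使用各类音频模块
  const audioContext = new (window.AudioContext || window.webkitAudioContext)();
  // 弹出麦克风权限请求,用户允许后返回 MediaStream 对象
  const stream = await navigator.mediaDevices.getUserMedia(buildAudioConstraints());
  // 将麦克风流接入音频处理图,后续可经 AnalyserNode 取得频率/PCM 数据
  const source = audioContext.createMediaStreamSource(stream);
  const analyser = audioContext.createAnalyser();
  source.connect(analyser);
  return { audioContext, stream, analyser };
}
```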
(2)在开启实时语音检测之后,前端页面的下方会展示实时声波图。(2) After real-time voice detection is turned on, a real-time sound wave graph will be displayed at the bottom of the front-end page.
前端页面下方展示实时的声波图,该声波图随用户麦克风输入的语音频率发生变化。此外,前端页面的左下方展示胶囊状的语音检测框,实时检测用户的语音,如从上方audioContext的Analyser中实时获取到用户的语音频率,将该语音频率作为高度传参给Web API中的CanvasRenderingContext2D.fillRect()函数,最后在页面中绘画出一个个波形柱,语音频率的大小值作为波形柱的高度,从而得到对应的声波图。对于绘制声波图的波形柱的具体过程,可参考以下程序:A real-time sound wave graph is displayed at the bottom of the front-end page, and the sound wave graph changes with the voice frequency input by the user's microphone. In addition, a capsule-shaped voice detection box is displayed at the bottom left of the front-end page to detect the user's voice in real time. For example, the user's voice frequency is obtained in real time from the Analyser of the audioContext above, and the voice frequency is passed as the height parameter to the CanvasRenderingContext2D.fillRect() function in the Web API. Finally, waveform columns are drawn on the page, and the size of the voice frequency is used as the height of the waveform column to obtain the corresponding sound wave graph. For the specific process of drawing the waveform column of the sound wave graph, please refer to the following program:
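下面给出绘制波形柱过程的一个示意性草图(非原始程序清单;参数命名与默认值为示例):An illustrative sketch of drawing the waveform bars (not the original listing; parameter names and defaults are examples):

```javascript
// 以频率大小作为波形柱高度绘制声波图的示意实现。
// Illustrative waveform-bar drawing: frequency magnitude becomes bar height.
function drawWaveformBars(ctx, freqData, { barWidth = 3, gap = 2, canvasHeight = 60 } = {}) {
  let drawn = 0;
  for (let i = 0; i < freqData.length; i++) {
    const height = Math.min(freqData[i], canvasHeight); // 频率值作为柱高并裁剪到画布内
    const x = i * (barWidth + gap);
    // 从画布底部向上绘制每个波形柱 / draw each bar from the bottom of the canvas
    ctx.fillRect(x, canvasHeight - height, barWidth, height);
    drawn++;
  }
  return drawn; // 返回绘制的柱数,便于调试 / number of bars drawn
}
```

其中ctx为画布的CanvasRenderingContext2D上下文,freqData为从AnalyserNode实时取得的频率数组。Here ctx is the canvas's CanvasRenderingContext2D and freqData is the frequency array obtained in real time from the AnalyserNode.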
通过上述程序可以绘制出声波图,具体可参考图5。此外,除了绘制声波图,还会识别用户的语音得到对应的文字内容,本申请使用annyang.js作为语音识别的js库,使用Speech KITT作为annyang.js语音识别的结果展示,即在语音检测框中展示识别所得的文字内容,如图4所示。Annyang是基于SpeechRecognition Web API的js(JavaScript)库,可以通过语音控制相应的web或虚拟对象。The above program can draw the sound wave graph, as shown in Figure 5. In addition to drawing the sound wave graph, the user's voice is recognized to obtain the corresponding text content. This application uses annyang.js as the js library for speech recognition, and uses Speech KITT to display the annyang.js speech recognition results, that is, the recognized text content is displayed in the voice detection box, as shown in Figure 4. Annyang is a js (JavaScript) library based on the SpeechRecognition Web API, which can control the corresponding web page or virtual objects through voice.
对于识别过程,可参考以下程序:For the identification process, refer to the following procedure:
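A hedged sketch of that recognition wiring follows. The element id `speechBox` and the prompt wording are assumptions; the annyang and Speech KITT calls follow their public documentation.

```javascript
// annyang passes an array of candidate transcripts, best match first.
function pickBestPhrase(phrases) {
  return Array.isArray(phrases) && phrases.length > 0 ? phrases[0] : '';
}

if (typeof window !== 'undefined' && window.annyang) {
  // Show each recognized sentence inside the capsule-shaped detection box.
  window.annyang.addCallback('result', function (phrases) {
    document.getElementById('speechBox').textContent = pickBestPhrase(phrases);
  });
  SpeechKITT.annyang();                         // render the detection UI over annyang
  SpeechKITT.setInstructionsText('说点什么吧…'); // idle prompt (assumed wording)
  SpeechKITT.vroom();                           // mount the UI and start listening
}
```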
针对不同的文字内容,以及用户是否说完话,虚拟对象的姿态(如表情和姿势等)会有所不同。例如,用户说"啊",虚拟人物以钓鱼的姿态进行反应,表示用户还未说完话、语音未识别出结果或用户长时间未进行语音输入的一种待机状态。用户也可以通过点击语音检测框,实现主动暂时关闭语音检测/再次开启语音检测的功能。The posture of the virtual object (such as expression and pose) differs according to the text content and whether the user has finished speaking. For example, when the user says "ah", the virtual character responds with a fishing posture, a standby state indicating that the user has not finished speaking, no recognition result has been received, or the user has not made any voice input for a long time. The user can also actively turn voice detection off temporarily/turn it on again by clicking the voice detection box.
(3)语音命令交互功能(3) Voice command interaction function
该语音命令交互功能根据用户的语音进行实时语音识别,需要命令信息与用户所说的话语一一对应,且优先级高于语音关键词交互功能。例如,用户说"天王盖地虎",则完全匹配设置的命令信息,此时弹出"你发现了秘密"弹窗,用户点击ok按钮后则展示小瞳(虚拟人物)得意的小表情与播放内容为"哈哈被你发现啦~天王盖地虎"的可爱语音。如果用户说出的是"天王盖地虎哼哼哈嘿",虽然此句也包含命令信息"天王盖地虎",但不满足一字不差地一一对应,则不会触发秘密交互逻辑,而是走其他的逻辑路线,如实现语音关键词交互功能。The voice command interaction function performs real-time voice recognition on the user's speech; it requires a word-for-word correspondence between the command information and what the user says, and it has a higher priority than the voice keyword interaction function. For example, if the user says "天王盖地虎", the set command information is matched exactly, and a "You discovered the secret" pop-up window appears. After the user clicks the OK button, Xiaotong (a virtual character) shows a proud little expression and the cute voice "哈哈被你发现啦~天王盖地虎" is played. If the user says "天王盖地虎 hum hum ha hey", although this sentence also contains the command information "天王盖地虎", it is not a word-for-word match, so the secret interaction logic is not triggered; instead, other logic paths are followed, such as the voice keyword interaction function.
例如,设定好commands(命令)常量作为记录各命令信息与相应函数逻辑,传参给annyang的addCommands进行载入,annyang确定识别所得的文字内容与设定的命令信息一一对应、且一字不差的时候,执行相应的函数逻辑。其中,对于识别用户发出的语音以及执行相应的函数逻辑这些过程,可以参考以下程序:For example, set the commands constant as a record of each command information and corresponding function logic, pass it to annyang's addCommands for loading, and when annyang determines that the recognized text content corresponds to the set command information word for word, execute the corresponding function logic. For the process of recognizing the user's voice and executing the corresponding function logic, you can refer to the following program:
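For instance, the command table could look like the following sketch. The popup helper is an assumed name standing in for the dialog/expression/voice logic described above; `addCommands` and `start` are annyang's documented API.

```javascript
// Each key must match the recognized transcript word for word before its
// handler runs (annyang's exact-match command form).
const commands = {
  '天王盖地虎': function () {
    showSecretPopup(); // assumed helper: open the "你发现了秘密" dialog, then
                       // show the proud expression and play the reply voice
  },
};

function showSecretPopup() {
  // placeholder for the popup + expression + voice logic described above
}

if (typeof window !== 'undefined' && window.annyang) {
  window.annyang.addCommands(commands);        // load the command table
  window.annyang.start({ autoRestart: true }); // keep listening continuously
}
```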
(4)语音关键词交互功能(4) Voice keyword interaction function
当识别所得的文字内容与(3)中的所有命令信息不匹配时,会继续检测用户是否说出了相应的关键词,也就是从用户角度看来,当虚拟人物聆听到用户说出相关的关键词时,根据编写的相应逻辑展示不同姿态的虚拟人物进行交互。例如,用户说出"小瞳可爱"或"你今天可可爱爱"时,则会识别到含有"可爱"关键词,从而小瞳呈现开心的表情,并播放相应的语音如"谢谢小星星"等。When the recognized text content does not match any of the command information in (3), the system continues to detect whether the user has spoken a corresponding keyword. That is, from the user's perspective, when the virtual character hears the user say a relevant keyword, the virtual character is displayed in different postures to interact with the user according to the corresponding logic. For example, when the user says "Xiaotong is cute" or "You are so cute today", the keyword "cute" is recognized, so Xiaotong shows a happy expression and plays a corresponding voice such as "Thank you, little star".
实现语音关键词交互功能的过程,具体可以是:本申请使用annyang提供的参数为“resultNoMatch”的回调函数,获取annyang识别到的文字内容(以及相近的字符串数组)为userSaid,把userSaid作为参数传参给关键词交互逻辑,通过CheckUserSaid函数,去遍历userSaid来判定是否含有相应的关键词,若包含,则执行相应的关键词命令函数,让前端页面做相应的交互展示,具体可参考以下程序:The process of implementing the voice keyword interaction function can be specifically as follows: this application uses the callback function with the parameter "resultNoMatch" provided by annyang to obtain the text content (and similar string arrays) recognized by annyang as userSaid, and passes userSaid as a parameter to the keyword interaction logic. Through the CheckUserSaid function, userSaid is traversed to determine whether it contains the corresponding keywords. If it does, the corresponding keyword command function is executed to allow the front-end page to perform the corresponding interactive display. For details, please refer to the following program:
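A hedged sketch of that fallback path follows. The keyword table contents are assumptions based on the example above; `resultNoMatch` is annyang's documented callback for transcripts that matched no command.

```javascript
// Keyword table (contents assumed from the example above).
const keywordCommands = {
  '可爱': function () { /* show happy expression, play "谢谢小星星" */ },
};

// Traverse every candidate transcript in userSaid and return the first
// keyword contained in any of them, or null when nothing matches.
function CheckUserSaid(userSaid, table) {
  table = table || keywordCommands;
  for (let i = 0; i < userSaid.length; i++) {
    const keys = Object.keys(table);
    for (let j = 0; j < keys.length; j++) {
      if (userSaid[i].indexOf(keys[j]) !== -1) return keys[j];
    }
  }
  return null;
}

if (typeof window !== 'undefined' && window.annyang) {
  // Runs only when no exact command from (3) matched the transcript.
  window.annyang.addCallback('resultNoMatch', function (userSaid) {
    const kw = CheckUserSaid(userSaid);
    if (kw) keywordCommands[kw](); // run the matching keyword command
  });
}
```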
(5)语音分析加强交互功能(5) Voice analysis enhances interactive functions
在语音识别后,本申请可以进行进一步的语音分析,分析出用户语音的音高以及音量大小,根据音高和音量进行相应的玩法结合互动。如结合(4),当用户夸赞虚拟人物小瞳形象"可爱"或"你知道你很可爱吗"等时,会分析用户的音高以及音量大小,如果检测到用户的音高较高或音量较大(如C5以上约500赫兹,或音量高于20分贝),则视为用户更加开朗、诚心诚意的一种展现,虚拟人物小瞳收到夸赞也会更加开心,如图13所示。After voice recognition, the present application can perform further voice analysis to determine the pitch and volume of the user's voice, and combine gameplay and interaction accordingly. For example, in combination with (4), when the user praises the virtual character Xiaotong with "cute" or "Do you know you are cute?", the user's pitch and volume are analyzed. If the user's pitch is high or the volume is large (e.g., above C5, about 500 Hz, or a volume above 20 decibels), this is regarded as a sign that the user is more cheerful and sincere, and the virtual character Xiaotong will be even happier to receive the praise, as shown in Figure 13.
对于语音分析加强交互功能,是在语音命令交互功能与语音关键词交互功能的基础上的交互玩法,即增加了语音识别后的分析,从用户语音输入的语音中检测pcm数据,通过aubio.js库分析出语音的音高,以及通过pcm数据计算出音量。根据音高和音量这两个参数融合到语音交互的命令逻辑中,让虚拟人物做出不同的姿态反应。对于音高和音量的计算方式,可参考以下程序:The voice analysis enhanced interactive function is an interactive gameplay based on the voice command interactive function and the voice keyword interactive function. That is, it adds analysis after voice recognition, detects PCM data from the voice input of the user, analyzes the pitch of the voice through the aubio.js library, and calculates the volume through PCM data. The two parameters of pitch and volume are integrated into the command logic of voice interaction to allow the virtual character to react with different postures. For the calculation method of pitch and volume, please refer to the following program:
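Both measurements can be sketched as follows. The RMS-to-decibel volume formula and the buffer/hop sizes are assumptions; the `Pitch` constructor and `detector.do()` follow the aubio.js documentation.

```javascript
// Volume from raw PCM samples: root-mean-square level converted to decibels.
function computeVolumeDb(pcm) {
  let sum = 0;
  for (let i = 0; i < pcm.length; i++) sum += pcm[i] * pcm[i];
  const rms = Math.sqrt(sum / pcm.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Pitch detector via aubio.js; detector.do(pcmFrame) returns the pitch in Hz.
async function makePitchDetector(sampleRate) {
  const { Pitch } = await aubio();                    // aubio.js entry point
  return new Pitch('default', 2048, 512, sampleRate); // buffer/hop sizes assumed
}
```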
至此已经成功通过如上方法得到pcm数据的音高以及音量大小,可以执行相应的命令交互逻辑,从而根据音高与分贝使虚拟人物做出不同的反应,具体可参考以下程序:So far, the pitch and volume of the PCM data have been successfully obtained through the above method, and the corresponding command interaction logic can be executed so that the virtual character reacts differently according to the pitch and decibel level. For details, please refer to the following program:
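One possible shape of that command logic is sketched below. The thresholds come from the example above (around 500 Hz / 20 dB); the reaction helper names are assumptions.

```javascript
// Decide which reaction the praise triggers, based on pitch and volume.
function pickPraiseReaction(pitchHz, volumeDb, pitchTh = 500, volTh = 20) {
  return (pitchHz >= pitchTh || volumeDb >= volTh) ? 'veryHappy' : 'happy';
}

function reactToPraise(pitchHz, volumeDb) {
  const mood = pickPraiseReaction(pitchHz, volumeDb);
  // showPose(mood);  // assumed helper: switch Xiaotong's expression/posture
  // playVoice(mood === 'veryHappy' ? 'thanks_big.mp3' : 'thanks.mp3'); // assumed
  return mood;
}
```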
(6)语音分析音高唱歌玩法功能(6) Voice analysis and pitch singing function
用户可以对虚拟人物小瞳说“唱首歌吧”或者唱出一闪一闪亮晶晶的歌词片段,此时开始唱歌互动玩法,虚拟人物会开始进行演唱。当用户说出“别唱了”的命令时,虚拟人物停止唱歌/停止唱歌玩法。The user can say "Sing a song" or sing a segment of the lyrics of Twinkle, Twinkle Little Star to the virtual character Xiao Tong, and the singing interactive gameplay will begin, and the virtual character will start singing. When the user says the command "Stop singing", the virtual character stops singing/stops singing gameplay.
对于语音分析音高唱歌玩法功能,首先在html页面增加audio标签,控制相关语音在web前端页面的播放,class与Id标识页面的audio标签元素,src则是播放的语音地址,具体可参考以下程序:For the voice analysis pitch singing function, first add the audio tag to the html page to control the playback of related voices on the web front-end page. The class and ID identify the audio tag element of the page, and src is the address of the voice to be played. For details, please refer to the following program:
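Such an audio tag might look like the following fragment; the id, class, and src path are illustrative assumptions.

```html
<!-- Controls playback of the related voice/song on the web front-end page -->
<audio id="songAudio" class="song-audio" src="./assets/twinkle-star.mp3" preload="auto"></audio>
```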
当用户对虚拟人物说出唱首歌吧,以及当虚拟人物需要做出语音反馈的时候,可通过(3)中所述的语音命令交互实现,载入虚拟人物唱歌的交互逻辑,并通过标签id锁定该audio元素,通过volume设定声音的大小,通过play()函数播放该语音/歌曲。其中,虚拟人物唱歌的交互逻辑可参考以下程序:When the user says "Let's sing a song" to the virtual character, and when the virtual character needs to give voice feedback, the voice command interaction described in (3) can be implemented, the interactive logic of the virtual character singing is loaded, and the audio element is locked through the tag id, the volume is set through the volume, and the voice/song is played through the play() function. The interactive logic of the virtual character singing can refer to the following program:
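A minimal sketch of that singing command follows, assuming the `songAudio` element id from the fragment above; the volume value and the posture helper are assumptions, while `volume` and `play()` are the standard HTMLMediaElement API.

```javascript
// Exact-match singing command: lock onto the audio element by id, set the
// volume, and start playback.
const singCommands = {
  '唱首歌吧': function () {
    const audio = document.getElementById('songAudio');
    audio.volume = 0.6;     // playback volume, 0.0 - 1.0
    audio.play();           // play the voice/song
    // showPose('singing'); // assumed helper: switch to the singing posture
  },
};

if (typeof window !== 'undefined' && window.annyang) {
  window.annyang.addCommands(singCommands);
}
```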
当用户说出“别唱了”或“停止”等词语时,可以通过(4)中的关键词命令交互逻辑,来让虚拟形象停止播放语音。其中,关键词命令交互逻辑如下:When the user says "Stop singing" or "Stop", the avatar can stop playing the voice through the keyword command interaction logic in (4). The keyword command interaction logic is as follows:
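That stop logic can be sketched as follows; the keyword list is taken from the text above, and the element id is an assumption matching the earlier fragment.

```javascript
const stopKeywords = ['别唱了', '停止'];

// True when any candidate transcript contains a stop keyword.
function shouldStopSinging(userSaid) {
  return userSaid.some(function (phrase) {
    return stopKeywords.some(function (kw) { return phrase.indexOf(kw) !== -1; });
  });
}

function stopSinging() {
  const audio = document.getElementById('songAudio');
  audio.pause();
  audio.currentTime = 0; // rewind so the next play starts from the beginning
}
```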
如用户等待虚拟人物唱完片段"一闪一闪亮晶晶"之后,用户模仿虚拟人物,自行哼唱"一闪一闪亮晶晶",本申请则会自动检测用户的哼唱,首先匹配用户是否唱对了歌词,如果歌词一一对应,则进入用户语音分析音高和旋律等交互逻辑。在进行旋律匹配时,可以将目标对象的旋律与图17中的旋律简谱进行对比,从而可以确定旋律是否匹配。For example, after the user waits for the virtual character to finish singing the segment "Twinkle, Twinkle Little Star", the user imitates the virtual character and hums "Twinkle, Twinkle Little Star" on their own. The present application automatically detects the user's humming: it first checks whether the user sang the correct lyrics, and if the lyrics correspond one to one, it enters the interactive logic of analyzing the pitch, melody, etc. of the user's voice. When performing melody matching, the melody of the target object can be compared with the melody notation in Figure 17 to determine whether the melody matches.
此外,在进行音高匹配时,可以参考以下匹配逻辑:传统音乐理论中使用前七个拉丁字母:A、B、C、D、E、F、G(按此顺序则音高循序而上)以及一些变化(升音和降音符号)来标示不同的音符。两个音符间若相差一倍的频率,则称两者之间相差一个八度,为了表示同名但不同高度的音符,科学音调记号法利用字母及一个用来表示所在八度的阿拉伯数字,明确指出音符的位置。比如说,现在的标准音高440赫兹名为A4,往上高八度则为A5,继续向上可无限延伸;至于A4往下,则为A3、A2…等。把音高转成对应的MIDI(Musical Instrument Digital Interface,乐器数字接口)编号,对照即可找到音名,比如82.41Hz计算得到MIDI编号40,即数字法音名E2,按照上述方法可以确定音高是否匹配。In addition, when matching pitches, the following matching logic can be used: traditional music theory uses the first seven Latin letters, A, B, C, D, E, F, G (in this order the pitch rises), plus some modifications (sharp and flat symbols) to denote different notes. If the frequency of one note is double that of another, the two are said to be one octave apart. To distinguish notes with the same name but different heights, scientific pitch notation uses a letter plus an Arabic numeral indicating the octave to specify the exact position of a note. For example, the current standard pitch of 440 Hz is named A4; one octave up is A5, and this can be extended upward indefinitely; going down from A4 gives A3, A2, and so on. Converting a pitch to its corresponding MIDI (Musical Instrument Digital Interface) number and looking it up yields the note name; for example, 82.41 Hz yields MIDI number 40, that is, the note name E2. Following the above method, it can be determined whether the pitch matches.
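The letter-and-octave mapping above can be sketched as follows, using the standard convention A4 = 440 Hz = MIDI 69 (under which 82.41 Hz rounds to MIDI 40, i.e. E2).

```javascript
const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

// A doubling of frequency is one octave = 12 MIDI steps.
function freqToMidi(freq) {
  return Math.round(69 + 12 * Math.log2(freq / 440));
}

// Scientific pitch notation: letter from the pitch class, digit from the octave.
function midiToNoteName(midi) {
  return NOTE_NAMES[midi % 12] + (Math.floor(midi / 12) - 1);
}
```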
此时,已经获得用户刚刚吟唱小星星片段中的所有音名,且已知小星星音名为CCGGAAG,则使用fuzzball库来计算两个字符串中的音名匹配度,并应用在虚拟人物跟唱玩法的交互逻辑中。fuzzball是fuzzywuzzy的js库版本,用于计算两段字符串之间的相似度,拥有多种算法。At this point, all the note names in the Twinkle Twinkle Little Star segment that the user just sang have been obtained, and the note names of the song are known to be CCGGAAG. The fuzzball library is then used to calculate the matching degree of the note names in the two strings, which is applied in the interactive logic of the virtual character sing-along gameplay. fuzzball is the JavaScript version of the fuzzywuzzy library; it calculates the similarity between two strings and offers multiple algorithms.
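That scoring step might be sketched as below. `fuzz.ratio` (returning 0–100) is fuzzball's documented API; the threshold value and the dependency-injected `fuzz` handle are assumptions.

```javascript
const TARGET_NOTES = 'CCGGAAG'; // opening note names of Twinkle Twinkle Little Star

// Score the note names extracted from the user's humming against the target.
function sungNotesMatch(fuzz, sungNotes, threshold) {
  threshold = threshold === undefined ? 70 : threshold;
  return fuzz.ratio(sungNotes, TARGET_NOTES) >= threshold;
}

// Browser/bundler usage (assumed): const fuzz = require('fuzzball');
// sungNotesMatch(fuzz, 'CCGGAG') then drives the sing-along reaction logic.
```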
本申请提供纠错管理方案,给用户查看自己的音高与音量,以及音准偏差,如图23所示,在代码方面的实现则是:This application provides an error correction management solution to allow users to view their pitch and volume, as well as pitch deviation, as shown in Figure 23. The code implementation is:
在app.js可以设定仪表盘中的指针偏差角度,具体如下:In app.js, you can set the pointer deviation angle in the dashboard as follows:
//通过上述函数getCents计算音高偏差//Calculate the pitch deviation through the above function getCents
const cents = this.getCents(frequency, noteMIDI);
//计算指针偏差角度//Calculate the pointer deviation angle
const centsDeg = (cents / 50) * 45;
至此获得了所有的数据,只需在获取用户实时发出语音时计算并更新前端页面的数据展示即可,具体如下:Now that all the data is available, we only need to calculate and update the data display on the front-end page when the user speaks in real time, as follows:
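The per-frame update can be sketched as follows. The `getCents` formula (100 cents per semitone, against A4 = 440 Hz = MIDI 69) and the dashboard element ids are assumptions consistent with the snippet above.

```javascript
// Deviation in cents between the measured frequency and the nearest note.
function getCents(frequency, noteMIDI) {
  const refFreq = 440 * Math.pow(2, (noteMIDI - 69) / 12); // exact pitch of that note
  return Math.floor(1200 * Math.log2(frequency / refFreq));
}

// Recompute the needle angle and refresh the dashboard on each voice frame.
function updateTunerDisplay(frequency, noteMIDI, volumeDb) {
  const cents = getCents(frequency, noteMIDI);
  const centsDeg = (cents / 50) * 45; // ±50 cents maps to ±45 degrees
  if (typeof document !== 'undefined') {
    document.getElementById('needle').style.transform = 'rotate(' + centsDeg + 'deg)';
    document.getElementById('volumeLabel').textContent = volumeDb.toFixed(1) + ' dB';
  }
  return centsDeg;
}
```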
通过本申请的实施例,可以具有以下技术效果:The embodiments of the present application can achieve the following technical effects:
1)无需依赖/接入第三方AI模型的SDK接口,大大减少语音识别的成本。1) No need to rely on/access the SDK interface of third-party AI models, greatly reducing the cost of speech recognition.
2)能根据音高/音量进行进一步的语音分析识别,让虚拟人物对用户在不同场景下说出的同一个命令做出更细化的反应,提升用户的沉浸感以及与交互对象的共情。相同的话语在不同的场景下,如用户音调低沉/音量较小,或音调上扬/分贝正常偏大等多种情况,所反映的用户情绪特点能够被捕捉;在检测到用户说出相同内容后,通过语音分析输出不同的交互结果。从用户角度来看,这增加了交互过程/对象的真实感,提升了用户与虚拟对象的情感共鸣。2) Further voice analysis can be performed on pitch/volume, allowing the virtual character to give a more fine-grained response to the same command spoken by the user in different scenarios, enhancing the user's sense of immersion and empathy with the interactive object. When the same words are spoken in different scenarios, such as a low pitch/low volume versus a rising pitch/normal-to-high decibel level, the emotional characteristics they reflect can be captured; after detecting that the user said the same content, voice analysis produces different interaction results. From the user's perspective, this makes the interaction process/object more realistic and strengthens the emotional resonance between the user and the virtual object.
3)通过音高匹配等唱歌交互玩法,更能符合虚拟人物的虚拟身份,在低成本低能耗的情况下,可以满足用户的需求,达成与虚拟人物的趣味交互体验。3) Singing interaction gameplay such as pitch matching better fits the virtual character's virtual identity, and at low cost and with low energy consumption it can meet users' needs and deliver a fun interactive experience with the virtual character.
4)方案灵活,载体可以是各类web平台、html5页面或小程序等,代码体积小,互动逻辑灵活。4) The solution is flexible. The carrier can be various web platforms, HTML5 pages or applets, etc. The code size is small and the interactive logic is flexible.
应该理解的是,虽然如上所述的各实施例所涉及的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,如上所述的各实施例所涉及的流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that, although the various steps in the flowcharts involved in the above-mentioned embodiments are displayed in sequence according to the indication of the arrows, these steps are not necessarily executed in sequence according to the order indicated by the arrows. Unless there is a clear explanation in this article, the execution of these steps does not have a strict order restriction, and these steps can be executed in other orders. Moreover, at least a part of the steps in the flowcharts involved in the above-mentioned embodiments can include multiple steps or multiple stages, and these steps or stages are not necessarily executed at the same time, but can be executed at different times, and the execution order of these steps or stages is not necessarily carried out in sequence, but can be executed in turn or alternately with other steps or at least a part of the steps or stages in other steps.
基于同样的发明构思,本申请实施例还提供了一种用于实现上述所涉及的虚拟对象交互方法的虚拟对象交互装置。该装置所提供的解决问题的实现方案与上述方法中所记载的实现方案相似,故下面所提供的一个或多个虚拟对象交互装置实施例中的具体限定可以参见上文中对于虚拟对象交互方法的限定,在此不再赘述。Based on the same inventive concept, the embodiment of the present application also provides a virtual object interaction device for implementing the virtual object interaction method involved above. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme recorded in the above method, so the specific limitations in the one or more virtual object interaction device embodiments provided below can refer to the limitations of the virtual object interaction method above, and will not be repeated here.
在一个实施例中,如图24所示,提供了一种虚拟对象交互装置,包括:第一显示模块2402、第二显示模块2404,其中:In one embodiment, as shown in FIG. 24 , a virtual object interaction device is provided, including: a first display module 2402 and a second display module 2404, wherein:
第一显示模块2402,用于在交互页面中显示处于第一交互姿态的虚拟对象;A first display module 2402, configured to display a virtual object in a first interaction posture in an interaction page;
第二显示模块2404,用于响应音频指令触发,显示针对音频指令的反馈交互信息;在交互页面中显示处于第二交互姿态的虚拟对象;The second display module 2404 is used to respond to the audio instruction trigger and display the feedback interaction information for the audio instruction; and display the virtual object in the second interaction posture in the interaction page;
其中,反馈交互信息是基于音频指令的文本信息获得的;第二交互姿态是根据音频指令对应的文本信息和声学信息确定的。The feedback interaction information is obtained based on the text information of the audio instruction; and the second interaction posture is determined according to the text information and acoustic information corresponding to the audio instruction.
上述实施例中,在交互页面中显示处于第一交互姿态的虚拟对象,在收到音频指令时,显示针对音频指令的反馈交互信息,并且在交互页面中显示处于第二交互姿态的虚拟对象,从而在进行语音交互时,除了文本信息上的交互,而且还有虚拟对象通过不同交互姿态方面的交互,丰富了交互方式;而且,第二交互姿态是根据音频指令对应的文本信息和声学信息确定的,因此音频指令的声学信息不同时,虚拟对象所呈现的第二交互姿态不同,即便发出相同的指令内容,虚拟对象也可以呈现出相对应的不同的交互姿态,提升了目标对象的沉浸感以及与虚拟对象的共情,从而大大地提升了交互效果。In the above embodiment, a virtual object in a first interaction posture is displayed on the interaction page, and when an audio command is received, feedback interaction information for the audio command is displayed, and a virtual object in a second interaction posture is displayed on the interaction page. Therefore, when voice interaction is performed, in addition to the interaction on text information, there is also interaction of virtual objects through different interaction postures, which enriches the interaction mode. Moreover, the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command. Therefore, when the acoustic information of the audio command is different, the second interaction posture presented by the virtual object is different. Even if the same command content is issued, the virtual object can also present corresponding different interaction postures, which enhances the target object's sense of immersion and empathy with the virtual object, thereby greatly improving the interaction effect.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
第三显示模块,用于显示交互页面,并在交互页面中显示处于默认姿态的虚拟对象;在对应于虚拟对象的位置,显示关于虚拟对象的用于表示进入交互页面的提示信息。The third display module is used to display the interactive page and display the virtual object in a default posture in the interactive page; at the position corresponding to the virtual object, display prompt information about the virtual object for indicating entering the interactive page.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
第三显示模块,还用于显示用于表示使用麦克风的权限请求信息;The third display module is further used to display permission request information for using the microphone;
检测模块,用于响应于对权限请求信息的确认操作,自动开启麦克风的使用权限;实时进行音频检测,以获得音频指令。The detection module is used to automatically enable the use permission of the microphone in response to the confirmation operation of the permission request information; and perform audio detection in real time to obtain audio instructions.
上述实施例中,在进入交互产品的交互页面时,显示默认姿态的虚拟对象,以及显示关于虚拟对象的用于表示进入交互页面的提示信息,可以让目标对象对虚拟对象有一定的了解,起到了提示作用;此外,在未获得麦克风权限时,可以请求使用麦克风权限,可以让目标对象知道交互产品获取麦克风权限,在授予麦克风权限之后,目标对象可以通过语音方式与交互产品进行交互,使人机交互变得简单方便。In the above embodiment, when entering the interactive page of the interactive product, a virtual object with a default posture is displayed, and prompt information about the virtual object is displayed to indicate entering the interactive page, so that the target object can have a certain understanding of the virtual object and play a prompt role; in addition, when the microphone permission is not obtained, the use of the microphone permission can be requested, so that the target object knows that the interactive product obtains the microphone permission. After granting the microphone permission, the target object can interact with the interactive product through voice, making human-computer interaction simple and convenient.
在一个实施例中,音频指令为音频信号;该装置还包括:In one embodiment, the audio instruction is an audio signal; the device further comprises:
第四显示模块,用于在收到音频信号时,在交互页面中显示处于第三交互姿态的虚拟对象和音频检测标识;在对应于音频检测标识的位置,显示音频信号对应的声波图;当获得音频信号对应的文本信息时,在音频检测标识中显示文本信息。The fourth display module is used to display the virtual object and the audio detection mark in the third interaction posture in the interaction page when the audio signal is received; display the sound wave graph corresponding to the audio signal at the position corresponding to the audio detection mark; when the text information corresponding to the audio signal is obtained, the text information is displayed in the audio detection mark.
在一个实施例中,第四显示模块,还用于在交互页面中,依据音频检测标识的位置确定声波显示区;确定音频信号的频率;音频信号的频率是动态变化的;将音频信号的频率作为高度参数传递至绘制函数;通过绘制函数,在声波显示区绘制高度参数对应的波形柱,得到音频信号对应的由各波形柱组成的声波图。In one embodiment, the fourth display module is also used to determine the sound wave display area in the interactive page according to the position of the audio detection identifier; determine the frequency of the audio signal; the frequency of the audio signal changes dynamically; pass the frequency of the audio signal as a height parameter to the drawing function; through the drawing function, draw a waveform column corresponding to the height parameter in the sound wave display area, and obtain a sound wave graph composed of various waveform columns corresponding to the audio signal.
在一个实施例中,第二显示模块,还用于当获得音频指令对应的文本信息、且文本信息与预设命令信息之间信息全局匹配时,在交互页面中显示子页面;在子页面中显示针对文本信息的反馈交互信息。In one embodiment, the second display module is also used to display a sub-page in the interactive page when text information corresponding to the audio instruction is obtained and the information between the text information and the preset command information is globally matched; and feedback interaction information for the text information is displayed in the sub-page.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
第一播放模块,用于响应于针对反馈交互信息的确认操作,播放第一语音;或者,播放包含虚拟对象的视频动画;其中,视频动画对应的语音包括第一语音,第一语音是依据文本信息和预设交互信息合成的语音。The first playback module is used to play a first voice in response to a confirmation operation on feedback interaction information; or, to play a video animation containing a virtual object; wherein the voice corresponding to the video animation includes the first voice, and the first voice is a voice synthesized based on text information and preset interaction information.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
第二播放模块,用于当获得音频指令对应的文本信息、且文本信息为针对虚拟对象的描述信息时,播放第二语音;A second playing module, configured to play a second voice when text information corresponding to the audio instruction is obtained, and the text information is description information for the virtual object;
第五显示模块,用于在交互页面中显示处于第四交互姿态的虚拟对象;其中,第二语音是依据描述信息对应的答复文本信息生成的语音,第四交互姿态为虚拟对象所呈现的具有第一喜悦度的姿态。The fifth display module is used to display the virtual object in the fourth interactive posture in the interactive page; wherein the second voice is a voice generated based on the reply text information corresponding to the description information, and the fourth interactive posture is a posture with a first degree of joy presented by the virtual object.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
确定模块,用于当获得音频指令对应的文本信息、且文本信息为针对虚拟对象的描述信息时,确定音频信号的声学信息;声学信息包括音高或音量中的至少一种;A determination module, configured to determine acoustic information of the audio signal when text information corresponding to the audio instruction is obtained, and the text information is description information for the virtual object; the acoustic information includes at least one of pitch or volume;
第五显示模块,还用于当声学信息满足预设条件时,在交互页面中显示处于第五交互姿态的虚拟对象;第五交互姿态为虚拟对象所呈现的具有第二喜悦度的姿态,第二喜悦度大于第一喜悦度;The fifth display module is further used to display the virtual object in a fifth interactive posture in the interactive page when the acoustic information meets a preset condition; the fifth interactive posture is a posture with a second joy degree presented by the virtual object, and the second joy degree is greater than the first joy degree;
第二播放模块,还用于当声学信息不满足预设条件时,播放第二语音;The second playing module is further used to play the second voice when the acoustic information does not meet the preset condition;
第五显示模块,还用于在交互页面中显示处于第四交互姿态的虚拟对象。The fifth display module is further used to display the virtual object in the fourth interaction posture in the interaction page.
在一个实施例中,虚拟对象在交互页面中显示时,以二维静态图像、二维动态图像、三维模型或视频的方式进行显示。In one embodiment, when a virtual object is displayed in an interactive page, it is displayed in the form of a two-dimensional static image, a two-dimensional dynamic image, a three-dimensional model or a video.
上述实施例中,目标对象在与虚拟对象进行交互时,目标对象的声学信息不同,可以使虚拟对象所呈现出不同喜悦度的交互姿态,即便是目标对象发出相同的内容,虚拟对象也可以呈现出相对应的不同的交互姿态,提升了目标对象的沉浸感以及与虚拟对象的共情,从而大大地提升了交互效果。In the above embodiment, when the target object interacts with the virtual object, the target object's acoustic information is different, which can make the virtual object present interactive postures with different degrees of joy. Even if the target object sends the same content, the virtual object can also present corresponding different interactive postures, which enhances the target object's sense of immersion and empathy with the virtual object, thereby greatly improving the interaction effect.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
第三播放模块,用于当获得音频指令对应的文本信息、且文本信息与预设演唱信息之间信息匹配时,播放目标音乐;The third playing module is used to play the target music when the text information corresponding to the audio instruction is obtained and the information between the text information and the preset singing information matches;
第六显示模块,用于在交互页面中,显示演唱目标音乐的虚拟对象,或显示随目标音乐发生姿态变化的虚拟对象。The sixth display module is used to display a virtual object singing the target music, or a virtual object whose posture changes with the target music, in the interactive page.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
控制模块,用于获取音乐信号的音乐要素;根据音乐要素生成对应的交互指令;基于交互指令生成虚拟对象的姿态变化数据;依据姿态变化数据控制虚拟对象执行演唱操作,或者控制虚拟对象随目标音乐发生姿态变化。A control module is used to obtain music elements of a music signal; generate corresponding interaction instructions according to the music elements; generate posture change data of a virtual object based on the interaction instructions; and control the virtual object to perform a singing operation according to the posture change data, or control the virtual object to change posture according to the target music.
在一个实施例中,控制模块,还用于在播放目标音乐的过程中,当接收到包含音乐停止命令的语音时,停止播放目标音乐;停止虚拟对象的演唱操作或随目标音乐发生姿态变化。In one embodiment, the control module is also used to stop playing the target music when receiving a voice containing a music stop command during the playing of the target music; stop the virtual object's singing operation or posture changes with the target music.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
接收模块,用于在播放完目标音乐之后,接收目标对象发出的音乐信号;A receiving module is used to receive a music signal emitted by a target object after playing the target music;
第四播放模块,用于当音乐信号中的歌词信息与播放的目标音乐中的歌词信息一致、但音乐信号的音乐要素与播放的目标音乐的音乐要素不一致时,播放表示对目标对象正向激励的第三语音;以及,在交互页面中,显示表示对目标对象正向激励的第六交互姿态的虚拟对象。The fourth playback module is used to play a third voice representing positive motivation for the target object when the lyrics information in the music signal is consistent with the lyrics information in the target music being played, but the music elements of the music signal are inconsistent with the music elements of the target music being played; and, in the interaction page, display a virtual object of a sixth interaction gesture representing positive motivation for the target object.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
第五播放模块,用于当音乐信号中的歌词信息与播放的目标音乐中的歌词信息一致、且音乐信号的音乐要素与播放的目标音乐的音乐要素一致时,播放表示对目标对象赞扬的第四语音;a fifth playing module, configured to play a fourth voice expressing praise for the target object when the lyrics information in the music signal is consistent with the lyrics information in the target music being played, and the music elements of the music signal are consistent with the music elements of the target music being played;
第七显示模块,用于在交互页面中,显示表示对目标对象赞扬的第七交互姿态的虚拟对象。The seventh display module is used to display a virtual object of a seventh interactive gesture expressing praise for the target object in the interactive page.
在一个实施例中,该装置还包括:In one embodiment, the apparatus further comprises:
第八显示模块,用于在接收音乐信号的过程中,当交互页面上的纠错管理控件处于开启状态时,在交互页面的第一区域中,显示表示接听音乐信号的第八交互姿态的虚拟对象;在交互页面的第二区域中,显示演唱提示面板;其中,演唱提示面板显示目标对象的音高、音量或音高与目标音乐的音高之间的对比图中的至少一种。The eighth display module is used to display a virtual object representing an eighth interactive posture for answering a music signal in the first area of the interactive page when the error correction management control on the interactive page is in the turned-on state during the process of receiving a music signal; and to display a singing prompt panel in the second area of the interactive page; wherein the singing prompt panel displays at least one of the pitch, volume, or a comparison diagram between the pitch of the target object and the pitch of the target music.
上述实施例中,在进行交互时,虚拟对象还可以演唱歌曲以及在演唱歌曲过程中呈现出对应的交互姿态,丰富了交互方式;此外,在虚拟对象演唱过程中随时对虚拟对象的演唱进行控制,而且目标对象还可以与虚拟对象进行唱歌挑战,然后根据目标对象演唱时的歌词信息与音乐要素与虚拟对象演唱时的歌词信息与音乐要素是否一致,播放不同的语音以及呈现出不同的交互姿态,大大地丰富了交互方式,有效地提高了交互效果;而且,在目标对象演唱过程中,还可以显示演唱提示面板,用来对目标对象进行纠错管理,使目标对象可以有效地纠正自己的演唱方式,进一步提高了交互效果。In the above embodiment, when interacting, the virtual object can also sing songs and present corresponding interactive gestures during the singing process, which enriches the interaction method; in addition, the singing of the virtual object can be controlled at any time during the singing process of the virtual object, and the target object can also challenge the virtual object to sing, and then different voices are played and different interactive gestures are presented according to whether the lyrics information and music elements when the target object sings are consistent with the lyrics information and music elements when the virtual object sings, which greatly enriches the interaction method and effectively improves the interaction effect; moreover, during the singing of the target object, a singing prompt panel can also be displayed to perform error correction management on the target object, so that the target object can effectively correct its own singing method, further improving the interaction effect.
在一个实施例中,提供了一种虚拟对象交互装置,包括:第一显示模块、第二显示模块,其中:In one embodiment, a virtual object interaction device is provided, comprising: a first display module and a second display module, wherein:
第一显示模块,用于显示直播界面,直播界面包括处于第一交互姿态的虚拟对象;A first display module, used to display a live broadcast interface, where the live broadcast interface includes a virtual object in a first interactive posture;
第二显示模块,用于响应音频指令触发,在直播界面显示反馈交互信息和处于第二交互姿态的虚拟对象;其中,反馈交互信息是根据音频指令对应的文本信息获得的,第二交互姿态是根据音频指令对应的文本信息和声学信息确定的。The second display module is used to respond to the audio command trigger and display feedback interaction information and a virtual object in a second interaction posture on the live broadcast interface; wherein the feedback interaction information is obtained based on the text information corresponding to the audio command, and the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command.
上述实施例中,显示包括处于第一交互姿态的虚拟对象的直播界面,在收到音频指令时,在该直播界面中显示反馈交互信息和处于第二交互姿态的虚拟对象,从而即使在直播过程中,也可以与直播的虚拟对象之间进行交互,在交互过程中除了文本信息上的交互,而且还有虚拟对象通过不同交互姿态方面的交互,丰富了交互方式;而且,第二交互姿态是根据音频指令对应的文本信息和声学信息确定的,因此音频指令的声学信息不同时,虚拟对象所呈现的第二交互姿态不同,即便发出相同的指令内容,虚拟对象也可以呈现出相对应的不同的交互姿态,提升了目标对象观看直播的沉浸感以及与虚拟对象的共情,从而大大地提升了交互效果。In the above embodiment, a live broadcast interface including a virtual object in a first interaction posture is displayed. When an audio command is received, feedback interaction information and a virtual object in a second interaction posture are displayed in the live broadcast interface. Therefore, even during the live broadcast, interaction can be performed with the virtual object in the live broadcast. In addition to the interaction of text information, there is also interaction of virtual objects through different interaction postures during the interaction process, which enriches the interaction mode. Moreover, the second interaction posture is determined based on the text information and acoustic information corresponding to the audio command. Therefore, when the acoustic information of the audio command is different, the second interaction posture presented by the virtual object is different. Even if the same command content is issued, the virtual object can present corresponding different interaction postures, which enhances the target object's immersion in watching the live broadcast and empathy with the virtual object, thereby greatly enhancing the interaction effect.
上述虚拟对象交互装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。Each module in the above virtual object interaction device can be implemented in whole or in part by software, hardware, or a combination thereof. Each module can be embedded in or independent of a processor in a computer device in the form of hardware, or can be stored in a memory in a computer device in the form of software, so that the processor can call and execute operations corresponding to each module.
在一个实施例中,提供了一种计算机设备,该计算机设备可以是终端,其内部结构图可以如图25所示。该计算机设备包括处理器、存储器、输入/输出接口、通信接口、显示单元和输入装置。其中,处理器、存储器和输入/输出接口通过系统总线连接,通信接口、显示单元和输入装置通过输入/输出接口连接到系统总线。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机程序。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的输入/输出接口用于处理器与外部设备之间交换信息。该计算机设备的通信接口用于与外部的终端进行有线或无线方式的通信,无线方式可通过WIFI、移动蜂窝网络、NFC(近场通信)或其他技术实现。该计算机程序被处理器执行时以实现一种虚拟对象交互方法。该计算机设备的显示单元用于形成视觉可见的画面,可以是显示屏、投影装置或虚拟现实成像装置,显示屏可以是液晶显示屏或电子墨水显示屏,该计算机设备的输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。In one embodiment, a computer device is provided, which may be a terminal; its internal structure may be as shown in FIG. 25. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, memory, and input/output interface are connected via a system bus, while the communication interface, display unit, and input device are connected to the system bus through the input/output interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The input/output interface is used to exchange information between the processor and external devices. The communication interface communicates with external terminals in a wired or wireless manner; the wireless manner may be implemented through Wi-Fi, a mobile cellular network, NFC (Near Field Communication), or other technologies. When executed by the processor, the computer program implements a virtual object interaction method. The display unit forms a visible picture and may be a display screen, a projection device, or a virtual-reality imaging device; the display screen may be a liquid-crystal display or an electronic-ink display. The input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the device casing, or an external keyboard, touchpad, mouse, etc.
本领域技术人员可以理解,图25中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。Those skilled in the art will understand that the structure shown in FIG. 25 is merely a block diagram of a partial structure related to the scheme of the present application, and does not constitute a limitation on the computer device to which the scheme of the present application is applied. The specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器中存储有计算机程序,该处理器执行计算机程序时实现上述虚拟对象交互方法的步骤。In one embodiment, a computer device is provided, including a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the above-mentioned virtual object interaction method when executing the computer program.
在一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现上述虚拟对象交互方法的步骤。In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the steps of the above-mentioned virtual object interaction method are implemented.
在一个实施例中,提供了一种计算机程序产品,包括计算机程序,该计算机程序被处理器执行时实现上述虚拟对象交互方法的步骤。In one embodiment, a computer program product is provided, including a computer program, which implements the steps of the above-mentioned virtual object interaction method when executed by a processor.
需要说明的是,本申请所涉及的用户信息(包括但不限于用户设备信息、用户个人信息等)和数据(包括但不限于用于分析的数据、存储的数据、展示的数据等),均为经用户授权或者经过各方充分授权的信息和数据,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of relevant data must comply with relevant laws, regulations and standards of relevant countries and regions.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、数据库或其它介质的任何引用,均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、磁带、软盘、闪存、光存储器、高密度嵌入式非易失性存储器、阻变存储器(ReRAM)、磁变存储器(Magnetoresistive Random Access Memory,MRAM)、铁电存储器(Ferroelectric Random Access Memory,FRAM)、相变存储器(Phase Change Memory,PCM)、石墨烯存储器等。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或外部高速缓冲存储器等。作为说明而非局限,RAM可以是多种形式,比如静态随机存取存储器(Static Random Access Memory,SRAM)或动态随机存取存储器(Dynamic Random Access Memory,DRAM)等。本申请所提供的各实施例中所涉及的数据库可包括关系型数据库和非关系型数据库中至少一种。非关系型数据库可包括基于区块链的分布式数据库等,不限于此。本申请所提供的各实施例中所涉及的处理器可为通用处理器、中央处理器、图形处理器、数字信号处理器、可编程逻辑器、基于量子计算的数据处理逻辑器等,不限于此。Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, databases, or other media used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory may include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The database involved in each embodiment provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include distributed databases based on blockchains, etc., but are not limited to this. The processor involved in each embodiment provided in this application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic device, a data processing logic device based on quantum computing, etc., but are not limited to this.
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。The technical features of the above embodiments may be arbitrarily combined. To make the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请的保护范围应以所附权利要求为准。The above-described embodiments only express several implementation methods of the present application, and the descriptions thereof are relatively specific and detailed, but they cannot be construed as limiting the scope of the present application. It should be noted that, for a person of ordinary skill in the art, several modifications and improvements can be made without departing from the concept of the present application, and these all belong to the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the attached claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310012875.2A | 2023-01-05 | 2023-01-05 | Virtual object interaction method, virtual object interaction device, computer equipment, storage medium and program product |
| Publication Number | Publication Date |
|---|---|
| CN116974507A | 2023-10-31 |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117198291A | 2023-11-08 | 2023-12-08 | 四川蜀天信息技术有限公司 | Method, device and system for controlling terminal interface by voice |
| CN117198291B | 2023-11-08 | 2024-01-23 | 四川蜀天信息技术有限公司 | Method, device and system for controlling terminal interface by voice |
| CN119653156A | 2024-11-20 | 2025-03-18 | 北京字跳网络技术有限公司 | Interface interaction method, device, equipment and storage medium |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |