CN116614557A - Voice data transmission method, device, equipment, storage medium and product - Google Patents

Voice data transmission method, device, equipment, storage medium and product
Info

Publication number
CN116614557A
Authority
CN
China
Prior art keywords
control data
format
data
language
json
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310754031.5A
Other languages
Chinese (zh)
Inventor
秦超
周黎明
胡孙强
吴磊
刘东东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd
Priority to CN202310754031.5A
Publication of CN116614557A
Legal status: Pending

Abstract

Translated from Chinese

The present application provides a voice data transmission method, device, equipment, storage medium and product, applied in the field of data transmission. The method is applied to an electronic device and includes: obtaining control data in markup language format to be transmitted, the control data including voice broadcast control data; translating the control data in markup language format into control data in json format according to a predefined format specification; and sending the control data in json format to a user terminal, so that the user terminal translates the control data in json format into control data in markup language format based on the predefined format specification and performs voice broadcast based on it. Because the json format specification is predefined to describe the markup language, control data can be translated quickly between the markup language format and the json format, so the control data can be transmitted in json format, which improves the transmission speed of control data in markup language format.

Description

Translated from Chinese
Voice data transmission method, device, equipment, storage medium and product

Technical field

The present application relates to the field of data transmission, and in particular to a voice data transmission method, device, equipment, storage medium and product.

Background

With the continuous progress of speech synthesis and streaming media technology, applications of synthesized speech and synthesized digital humans keep emerging, for example voice-to-text communication in social software applications, or synchronized live broadcasting by digital humans in short-video applications.

At present, digital humans are usually driven by the DRML/SSML markup languages, but DRML/SSML markup cannot be used to transmit data. Therefore, after the server generates the control data in markup language format that drives the digital human, it needs to parse that control data into a character string and send the string to the user terminal, and the user terminal parses the string to regenerate the control data in markup language format that drives the digital human. Because the control data that drives the digital human may include voice data, parsing control data that includes voice data into a string, and parsing the string back into control data that includes voice data, is a complicated process, which makes the transmission of control data in markup language format inefficient.

Summary of the invention

The present application provides a voice data transmission method, device, equipment, storage medium and product, to solve the problem that parsing control data that includes voice data into a character string, and parsing the string back into control data that includes voice data, is complicated and makes the transmission of control data in markup language format inefficient.

In a first aspect, the present application provides a voice data transmission method, applied to an electronic device, including:

obtaining control data in markup language format to be transmitted, the control data including voice broadcast control data;

translating the control data in markup language format into control data in json format according to a predefined format specification;

sending the control data in json format to a user terminal, so that the user terminal translates the control data in json format into control data in markup language format based on the predefined format specification, and performs voice broadcast based on the control data in markup language format.

In a second aspect, the present application provides a voice data transmission method, applied to a user terminal, including:

obtaining control data in json format sent by an electronic device, the control data in json format being formed by the electronic device translating control data in markup language format to be transmitted according to a predefined format specification, and the control data including voice broadcast control data;

translating the control data in json format into control data in markup language format based on the predefined format specification;

performing voice broadcast based on the control data in markup language format.

In a third aspect, the present application provides a voice data transmission device, applied to an electronic device, including:

an acquisition module, configured to obtain control data in markup language format to be transmitted, the control data including voice broadcast control data;

a translation module, configured to translate the control data in markup language format into control data in json format according to a predefined format specification;

a sending module, configured to send the control data in json format to a user terminal, so that the user terminal translates the control data in json format into control data in markup language format based on the predefined format specification, and performs voice broadcast based on the control data in markup language format.

In a fourth aspect, the present application provides a voice data transmission device, applied to a user terminal, including:

an acquisition module, configured to obtain control data in json format sent by an electronic device, the control data in json format being formed by the electronic device translating control data in markup language format to be transmitted according to a predefined format specification, and the control data including voice broadcast control data;

a translation module, configured to translate the control data in json format into control data in markup language format based on the predefined format specification;

a control module, configured to perform voice broadcast based on the control data in markup language format.

In a fifth aspect, the present application provides an electronic device, including: a processor, and a memory and a transceiver communicatively connected to the processor;

the memory stores computer-executable instructions; the transceiver is used to send and receive data;

the processor executes the computer-executable instructions stored in the memory to implement the voice data transmission method described in the first aspect above.

In a sixth aspect, the present application provides a user terminal, including: a processor, and a memory and a transceiver communicatively connected to the processor;

the memory stores computer-executable instructions; the transceiver is used to send and receive data;

the processor executes the computer-executable instructions stored in the memory to implement the voice data transmission method described in the second aspect above.

In a seventh aspect, the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the voice data transmission method described in the first or second aspect above.

In an eighth aspect, the present application provides a computer program product including computer-executable instructions which, when executed by a processor, implement the voice data transmission method described in the first or second aspect above.

The voice data transmission method, device, equipment, storage medium and product provided by the present application are applied to an electronic device and include: obtaining control data in markup language format to be transmitted, the control data including voice broadcast control data; translating the control data in markup language format into control data in json format according to a predefined format specification; and sending the control data in json format to a user terminal, so that the user terminal translates the control data in json format into control data in markup language format based on the predefined format specification and performs voice broadcast based on it. Because the json format specification is predefined to describe the markup language, control data can be translated quickly between the markup language format and the json format, so the control data can be transmitted in json format, which improves the transmission speed of control data in markup language format.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain its principles.

FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;

FIG. 2 is a flowchart of the voice data transmission method provided in Embodiment 1 of the present application;

FIG. 3 is a flowchart of the voice data transmission method provided in Embodiment 2 of the present application;

FIG. 4 is a signaling diagram of voice data transmission provided by the present application;

FIG. 5 is a diagram of an underlying architecture provided by the present application;

FIG. 6 is a schematic structural diagram of the voice data transmission device provided in Embodiment 3 of the present application;

FIG. 7 is a schematic structural diagram of the voice data transmission device provided in Embodiment 4 of the present application;

FIG. 8 is a schematic structural diagram of the electronic device provided in Embodiment 5 of the present application;

FIG. 9 is a schematic structural diagram of the user terminal provided in Embodiment 5 of the present application.

The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the concept of the present application in any way, but to explain the concept to those skilled in the art by reference to specific embodiments.

Detailed description

Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.

The terms "first", "second" and so on are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. In the description of the following embodiments, "a plurality of" means two or more, unless otherwise specifically defined.

First, the prior art related to the present application is described and analyzed in detail.

Because the control data that drives the digital human may include voice data, the control data in markup language format has to be parsed into a character string, and the string then parsed back into control data that includes voice data. One current way of parsing the string text is as follows: perform text analysis on the string text to determine text features; model them with an acoustic model to determine the corresponding mel spectrogram; have a vocoder generate a sequence of speech sample points from the mel spectrogram; and finally adjust the sampling rate, speech rate, volume and pitch of the sample sequence in post-processing to obtain the synthesized voice data. Parsing the string into control data that includes voice data is therefore a complicated process, which makes the transmission of control data in markup language format inefficient.

In their research, the inventors found that data in json format can be transmitted. If a json format specification for describing the markup language is defined in advance, the control data in markup language format can be translated into control data in json format using that specification and transmitted in json format. After the user terminal receives the control data in json format, it uses the same specification to translate it back into control data in markup language format, which can then drive the digital human to act.

FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application. As shown in FIG. 1, after obtaining the control data in markup language format to be transmitted, the electronic device translates it into control data in json format and sends the control data in json format to the user terminal. After receiving the control data in json format, the user terminal translates it according to the predefined format specification and obtains the control data in markup language format.

In the technical solution of the present application, the collection, storage, use, processing, transmission, provision and disclosure of financial data, user data and other information all comply with the relevant laws and regulations and do not violate public order and good morals.

The technical solution of the present application, and how it solves the above technical problem, are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the accompanying drawings.

Embodiment 1

FIG. 2 is a flowchart of the voice data transmission method provided in Embodiment 1 of the present application. This embodiment provides a voice data transmission method that addresses the low transmission efficiency of control data in markup language format. The method in this embodiment is applied to a voice data transmission device, which may be located in an electronic device. The electronic device may be any of various forms of digital computer, such as a laptop computer, desktop computer, workstation, personal digital assistant, server, blade server, mainframe computer or other suitable computer.

As shown in FIG. 2, the specific steps of the method are as follows:

Step S101: obtain control data in markup language format to be transmitted.

The control data includes voice broadcast control data, and the voice broadcast control data includes the content to be broadcast by voice. The control data may also include motion control data.

In this embodiment, the control data in markup language format can be used to drive the digital human displayed on the user terminal, for example to make the digital human perform an action or a voice broadcast. The voice broadcast control data is the data used to drive the digital human to perform a voice broadcast.

Step S102: translate the control data in markup language format into control data in json format according to a predefined format specification.

The predefined format specification describes the markup language in json format.

In this embodiment, a json format specification (schema) can be defined in advance to describe the markup language, which may be the DRML markup language or the SSML markup language. After the control data in markup language format to be transmitted is obtained, it can be translated into control data in json format according to the predefined format specification.

Step S103: send the control data in json format to the user terminal, so that the user terminal translates the control data in json format into control data in markup language format based on the predefined format specification, and performs voice broadcast based on the control data in markup language format.

In this embodiment, after the control data in json format is sent to the user terminal, the user terminal can translate it into control data in markup language format according to the predefined format specification, and can use the control data in markup language format to drive the digital human to perform a voice broadcast.
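
The terminal-side translation in step S103 could be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each json node carries the `name`, `props` and `children` keys that the format specification lists later in this embodiment, and that the markup is XML-like. The function names `node_to_markup` and `json_to_markup` and the attribute names in the sample payload are hypothetical.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def node_to_markup(node) -> str:
    """Rebuild markup text from one json node (assumed shape: name/props/children)."""
    if isinstance(node, str):
        # Plain text child: escape characters that are special in markup.
        return escape(node)
    attrs = "".join(
        f" {key}={quoteattr(str(value))}"
        for key, value in node.get("props", {}).items()
    )
    inner = "".join(node_to_markup(child) for child in node.get("children", []))
    name = node["name"]
    # Emit a self-closing tag when the node has no content.
    return f"<{name}{attrs}>{inner}</{name}>" if inner else f"<{name}{attrs}/>"


def json_to_markup(payload: str) -> str:
    """Translate a received json payload back into markup text."""
    return node_to_markup(json.loads(payload))


# Hypothetical payload; "subtitle" and "time" values are illustrative only.
received = (
    '{"name": "speak", "props": {"subtitle": "off"}, '
    '"children": ["Hello", {"name": "break", "props": {"time": "500ms"}}]}'
)
markup = json_to_markup(received)
```

The rebuilt `markup` string is what the terminal would hand to whatever component drives the digital human; that component is outside the scope of this sketch.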

The voice data transmission method provided by this embodiment obtains control data in markup language format to be transmitted, translates it into control data in json format according to a predefined format specification, and sends the control data in json format to the user terminal, so that the user terminal translates it into control data in markup language format based on the predefined format specification and performs voice broadcast based on it. Because the json format specification is predefined to describe the markup language, control data can be translated quickly between the markup language format and the json format, so the control data can be transmitted in json format, which improves the transmission speed of control data in markup language format.

On the basis of the above embodiment, step S101 of obtaining the control data in markup language format to be transmitted further includes:

Step S201: obtain demand data or user operation data sent by the user terminal.

Step S202: identify the demand data or user operation data to determine the corresponding response data.

Step S203: generate the control data in markup language format to be transmitted based on the response data.

This embodiment does not limit how the demand data or user operation data is identified to determine the corresponding response data.

For example, if the data sent by the user terminal is demand data, intent recognition can be performed on the demand data and the corresponding response data determined from the recognition result. Alternatively, keywords can be extracted from the demand data and the corresponding response data determined from the keywords.

As another example, if the data sent by the user terminal is operation data, the response data corresponding to the user operation data can be determined from a pre-configured mapping between operations and response modes.

The voice data transmission method provided by this embodiment obtains demand data or user operation data sent by the user terminal, identifies it to determine the corresponding response data, and generates the control data in markup language format to be transmitted based on the response data. It can be used in scenarios that involve interaction with the user: the response mode is determined from the user's operation or demand, and control data in markup language format is generated to control the user terminal so that the terminal responds accordingly.
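
Steps S201 to S203 could be sketched roughly as below. The text deliberately leaves the recognition method open, so this sketch substitutes a naive keyword match for intent recognition and a plain dictionary for the pre-configured operation mapping. Every table entry, function name and the bare `<speak>` wrapper is an assumption made for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical lookup tables; the patent does not specify their contents.
KEYWORD_RESPONSES = {
    "balance": "Your balance is shown on the screen.",
    "transfer": "Please choose the account to transfer from.",
}
OPERATION_RESPONSES = {
    "tap_help_button": "How can I help you?",
}
DEFAULT_RESPONSE = "Sorry, I did not catch that."


def determine_response(kind: str, content: str) -> str:
    """Step S202: map demand data or operation data to response text."""
    if kind == "operation":
        # Operation data: pre-configured operation -> response mapping.
        return OPERATION_RESPONSES.get(content, DEFAULT_RESPONSE)
    # Demand data: a naive keyword match stands in for intent recognition.
    lowered = content.lower()
    for keyword, response in KEYWORD_RESPONSES.items():
        if keyword in lowered:
            return response
    return DEFAULT_RESPONSE


def build_control_markup(response_text: str) -> str:
    """Step S203: wrap the response text in a speak tag."""
    return f"<speak>{escape(response_text)}</speak>"


markup = build_control_markup(determine_response("demand", "What is my balance?"))
```

A real system would replace `determine_response` with whatever intent model or rule engine it uses; only the shape of the flow (data in, response text out, markup generated) comes from the text.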

On the basis of the above embodiment, this embodiment refines one implementation of step S102, translating the control data in markup language format into control data in json format according to the predefined format specification.

The markup language format may be the Digital Human Rich Text Markup Language (DRML) format. Before step S102, the json format specification corresponding to the attributes of each tag in the DRML format can be determined, so as to predefine the json format specification corresponding to the DRML format.

For example, json format specifications can be defined for the following tag attributes:

1. Tag name (name): this attribute is presented as the node name/type. For example, the tag name may be speak, phoneme or break.

1) The speak tag controls the process of turning text output into speech and actions, and can add various components. It has four basic attributes: subtitle switch, speech rate adjustment, pitch adjustment and volume adjustment.

2) The phoneme tag marks the pronunciation of the text inside the tag. Hanyu Pinyin and English phonetic symbols are currently supported. The pronunciation is given by the basic attribute py, with syllables separated by a single space; the digit after each pinyin syllable indicates its tone.

3) The break tag adds a pause to the text and supports setting the pause length in seconds or milliseconds. It has one basic attribute, time, which sets the duration of the pause.

The DRML format also includes the tags ue4event and uievent. The ue4event tag describes the actions of the digital human; the uievent tag describes content inserted alongside the digital human, such as the display of pictures, lists and documents.

2. Tag type (type): type is the type under an Event object tag and is an enumerated attribute.

Together with the ue4event and uievent tags, it describes what kind of action the tag represents or what kind of content it inserts, for example an image, a list or a doc document.

3. Basic tag attributes (props): props is the collection of a tag's inline attribute descriptions. It is the key attribute of every DRML tag configuration, and all basic attributes of the tag content must be described in it, for example the subtitle switch of the speak tag or the py pinyin pronunciation of the phoneme tag.

4. Tag subset (children): children is the set of child nodes inside a tag. It contains text and nodes, is parsed in rendering order, and describes the nested structure inside the tag. There is currently no nesting limit on this attribute; a maximum nesting depth of up to 3 levels can be configured.

5. Tag data (data): data is the independent attribute collection of an Event object and holds its specific attribute configuration. It has several special types: news, bar (bar chart), pie (pie chart), choice (options) and addresses. The Event object is a main kind of tag, chiefly the ue4event and uievent tags.

6. Tag interaction (action): action is the interactive button configuration of an Event object, and can configure the tag content for interactive elements.
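
The six attributes above suggest a node shape that can be checked before sending. The following is a minimal sketch under stated assumptions: the key names come from the list above, but the validation rules, the depth counting (nodes only, text excluded) and the sample `subtitle` and `time` values are hypothetical and not taken from the patent.

```python
from typing import Any

# Key names follow the six attributes listed above; the rules below are
# illustrative assumptions rather than the patent's actual specification.
ALLOWED_KEYS = {"name", "type", "props", "children", "data", "action"}
MAX_DEPTH = 3  # the text says a maximum nesting of 3 levels can be configured


def validate_node(node: Any, depth: int = 1) -> None:
    """Raise ValueError if `node` does not fit the sketched node shape."""
    if isinstance(node, str):
        # Plain text is a legal child and does not count as a nesting level.
        return
    if not isinstance(node, dict):
        raise ValueError(f"node must be a dict or str, got {type(node).__name__}")
    if depth > MAX_DEPTH:
        raise ValueError(f"nesting deeper than {MAX_DEPTH} levels")
    unknown = set(node) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if not isinstance(node.get("name"), str):
        raise ValueError("every node needs a string 'name'")
    if not isinstance(node.get("props", {}), dict):
        raise ValueError("'props' must be an object")
    children = node.get("children", [])
    if not isinstance(children, list):
        raise ValueError("'children' must be a list")
    for child in children:
        validate_node(child, depth + 1)


# Hypothetical node: a speak tag containing text and a break tag.
example = {
    "name": "speak",
    "props": {"subtitle": "off"},
    "children": ["Hello", {"name": "break", "props": {"time": "500ms"}}],
}
validate_node(example)  # passes silently
```

Checking the shape on the sending side keeps malformed nodes from reaching the terminal, where a failed translation would be harder to diagnose.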

本申请实施例提供的语音数据传输方法,确定DRML格式包括的各标签的属性对应的json格式规范,以预先定义DRML格式对应的json格式规范,可以实现按预先定义的格式规范将标签语言格式的控制数据转译为json格式的控制数据,可以确保各标签的属性均可以由DRML格式转换为json格式,或可以由json格式转换为DRML格式。具体可以包括以下步骤:The voice data transmission method provided by the embodiment of the present application determines the json format specification corresponding to the attributes of each label included in the DRML format, and pre-defines the json format specification corresponding to the DRML format, which can realize the label language format according to the pre-defined format specification. The control data is converted into control data in json format, which can ensure that the attributes of each tag can be converted from DRML format to json format, or can be converted from json format to DRML format. Specifically, the following steps may be included:

Step S301: determine the JSON format specification corresponding to the attributes of each tag in the markup-language-format control data.

Step S302: translate the markup-language-format control data into JSON-format control data according to the JSON format specification corresponding to the attributes of each tag in the control data.

In this embodiment, the attributes of each tag in the markup-language-format control data can be parsed and the corresponding JSON format specification determined, so that the control data can be translated into JSON-format control data according to that specification.

For example, a piece of control data in DRML markup-language format may be: <speak 字幕开关="off" 语速调节="1" 音调调节="1" 音量调节="1">你好</speak>, where the attribute names stand for subtitle switch, speech rate adjustment, pitch adjustment, and volume adjustment, and the text means "Hello". According to the JSON format specification corresponding to the attributes of each tag in the control data, it can be converted into: {"标签名称":"speak","标签基本属性":{"字幕开关":"off","语速调节":"1","音调调节":"1","音量调节":"1"},"标签子集":["你好"]}, where the keys stand for tag name, basic tag attributes, and tag children.
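The conversion in steps S301 and S302 can be sketched as a recursive walk over the parsed markup. This is an illustrative sketch only: it assumes the DRML fragment is well-formed XML, uses ASCII attribute names (`subtitle`, `rate`, `pitch`, `volume`) as hypothetical stand-ins for the descriptive names in the example, and uses `name` as an assumed key for the tag name alongside `props` and `children`.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Convert one parsed markup element into a JSON-style dict."""
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(element_to_json(child))
        # Text that follows a nested tag belongs to the parent's children.
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return {"name": elem.tag, "props": dict(elem.attrib), "children": children}


drml = '<speak subtitle="off" rate="1" pitch="1" volume="1">Hello</speak>'
control = element_to_json(ET.fromstring(drml))
print(json.dumps(control, ensure_ascii=False))
```

Text and nested tags are appended in document order, which matches the statement that children are parsed in rendering order.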

Embodiment 2

FIG. 3 is a flowchart of the voice data transmission method provided in Embodiment 2 of the present application. This embodiment provides a voice data transmission method that addresses the low transmission efficiency of control data in markup-language format. The method in this embodiment is applied to a voice data transmission device, which may be located in a user terminal. The user terminal may be any of various forms of digital computer or mobile device, such as a smart terminal, laptop computer, desktop computer, workstation, personal digital assistant, server, blade server, mainframe computer, or other suitable computer.

As shown in FIG. 3, the method includes the following steps:

Step S401: obtain the JSON-format control data sent by the electronic device.

The JSON-format control data is produced by the electronic device by translating the markup-language-format control data to be transmitted according to the predefined format specification; the control data includes voice broadcast control data.

Step S402: translate the JSON-format control data into markup-language control data based on the predefined format specification.

The predefined format specification is used to describe the markup language in JSON format.

In this embodiment, a JSON format specification (schema) can be defined in advance to describe the markup language, which may be the DRML markup language or the SSML markup language. After the JSON-format control data is obtained, it can be translated into markup-language control data according to the predefined format specification.
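The reverse translation in step S402 can be sketched as serializing each JSON node back into a tag. As before, this is a sketch under assumptions rather than the application's implementation: the `name` key and the attribute names `subtitle` and `rate` are hypothetical, and the markup is assumed to follow XML escaping rules.

```python
from xml.sax.saxutils import escape, quoteattr


def json_to_markup(node):
    """Serialize one JSON tag node (or a text child) back to markup text."""
    if isinstance(node, str):
        return escape(node)
    name = node["name"]
    attrs = "".join(
        f" {key}={quoteattr(str(value))}"
        for key, value in node.get("props", {}).items()
    )
    inner = "".join(json_to_markup(child) for child in node.get("children", []))
    return f"<{name}{attrs}>{inner}</{name}>"


control = {
    "name": "speak",
    "props": {"subtitle": "off", "rate": "1"},
    "children": ["Hello"],
}
markup = json_to_markup(control)
print(markup)
```

Because both sides share the same format specification, the user terminal needs no information beyond the JSON payload to rebuild the markup.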

Step S403: perform voice broadcast based on the markup-language control data.

In this embodiment, the markup-language-format control data can be used to drive a digital human displayed on the user terminal. After the markup-language control data is obtained, it can be used to control the digital human to perform voice broadcast and actions. The voice broadcast control data is the data used to drive the digital human to perform voice broadcast.

Optionally, the control data may include at least one of the following: display control data and voice broadcast control data. The user terminal may perform display and/or voice broadcast based on the markup-language control data.

Specifically, if the control data includes display control data, the user terminal can perform display based on the markup-language control data; if it includes voice broadcast control data, the user terminal can perform voice broadcast; if it includes both, the user terminal can perform both display and voice broadcast.
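The three cases above amount to inspecting which kinds of control data are present and choosing the outputs accordingly. The sketch below illustrates this on the JSON form; the mapping of tag names to output kinds (`DISPLAY_TAGS`, `SPEECH_TAGS`) and the helper names are assumptions for illustration, since the application does not state which tags carry display control data.

```python
DISPLAY_TAGS = {"UIevent", "UE4event"}  # assumed to carry display control data
SPEECH_TAGS = {"speak"}                 # assumed to carry voice broadcast control data


def tag_names(node):
    """Collect the names of a node and all of its nested tag nodes."""
    names = {node["name"]}
    for child in node.get("children", []):
        if isinstance(child, dict):
            names |= tag_names(child)
    return names


def plan_output(node):
    """Return the outputs the terminal should produce for this control data."""
    names = tag_names(node)
    actions = []
    if names & DISPLAY_TAGS:
        actions.append("display")
    if names & SPEECH_TAGS:
        actions.append("voice")
    return actions


control = {"name": "speak", "props": {}, "children": ["Hello", {"name": "UIevent"}]}
print(plan_output(control))
```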

In this embodiment, the user terminal can also perform display and/or voice broadcast based on the markup-language control data, which adds ways of presenting the data and improves the user experience.

The voice data transmission method provided by this embodiment obtains the JSON-format control data sent by the electronic device, where the JSON-format control data is produced by the electronic device by translating the markup-language-format control data to be transmitted according to the predefined format specification and the control data includes voice broadcast control data; translates the JSON-format control data into markup-language control data based on the predefined format specification; and performs voice broadcast based on the markup-language control data. Because the predefined JSON format specification describes the markup language, control data can be translated quickly between the markup-language format and the JSON format. The control data can therefore be transmitted in JSON format, which increases the transmission speed of markup-language-format control data.

On the basis of the above embodiment, before step S401 (obtaining the JSON-format control data sent by the electronic device), the method further includes:

Step S501: generate requirement data in response to a requirement that the user enters by voice or text on the interactive interface, or generate user operation data in response to the user's operation on the interactive interface; the requirement data is voice data or text data.

Step S502: send the requirement data or user operation data to the electronic device, so that the electronic device determines the response data corresponding to the requirement data or user operation data and generates, based on the response data, the markup-language-format control data to be transmitted.

In this embodiment, the user can enter a requirement on the interactive interface by voice or text. The user terminal generates requirement data from that input and sends it to the electronic device, so that the electronic device determines the response to the requirement data and generates the corresponding markup-language-format control data. For example, the requirement data may be a request for product information, and the response may be to display that product information.

In this embodiment, the user can also perform interactive operations on the interactive interface. User operation data is generated from the user's operation and sent to the electronic device, so that the electronic device determines the response to the user operation data and generates the corresponding markup-language-format control data. For example, the user operation data may be an operation such as a like, and the response may be a spoken thank-you.

The voice data transmission method provided by this embodiment generates requirement data in response to a requirement that the user enters by voice or text on the interactive interface, or generates user operation data in response to the user's operation on the interactive interface, where the requirement data is voice data or text data; and sends the requirement data or user operation data to the electronic device, so that the electronic device determines the corresponding response data and generates, based on the response data, the markup-language-format control data to be transmitted. The method is applicable to scenarios that involve interaction with the user: the response is determined from the user's operation or requirement, markup-language-format control data for controlling the user terminal is generated, and the user terminal carries out the corresponding response.

Optionally, after the display and/or voice broadcast based on the markup-language control data, the method may further include:

Step S601: update the attributes in the JSON-format control data in response to the user's adjustment operation on the display and/or voice broadcast.

Step S602: translate the updated JSON-format control data into updated markup-language control data based on the predefined format specification.

Step S603: perform display and/or voice broadcast based on the updated markup-language control data.

Specifically, an attribute of a tag can be updated as follows: look up the tag in the JSON-format control data, obtain the specified attribute of the tag, and change that attribute. Optionally, the specified attribute of the tag, or the tag itself, can also be deleted.

For example, if the user says "slow down the speech rate", the speak tag can be located, the speech rate adjustment attribute among its basic tag attributes obtained, and that attribute lowered.

It should be understood that the DRML markup-language format does not support modifying a specified attribute. Therefore, after the user performs an adjustment operation on the display and/or voice broadcast, the user terminal applies the adjustment by updating the attributes in the JSON-format control data.
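The attribute update in step S601 can be sketched as a lookup followed by an in-place change on the JSON object. The sketch assumes the same hypothetical field names as a plain dict representation (`name`, `props`, `children`), a hypothetical `rate` attribute for the speech rate adjustment, and an arbitrary step size and lower bound; none of these values come from the application.

```python
def find_tag(node, name):
    """Return the first node with the given tag name, searching depth-first."""
    if node.get("name") == name:
        return node
    for child in node.get("children", []):
        if isinstance(child, dict):
            found = find_tag(child, name)
            if found is not None:
                return found
    return None


def slow_down(control, step=0.2, floor=0.5):
    """Lower the assumed 'rate' attribute of the speak tag, never below floor."""
    speak = find_tag(control, "speak")
    if speak is None:
        return control
    props = speak.setdefault("props", {})
    rate = float(props.get("rate", "1"))
    props["rate"] = str(round(max(floor, rate - step), 2))
    return control


control = {"name": "speak", "props": {"rate": "1"}, "children": ["Hello"]}
slow_down(control)
print(control["props"]["rate"])
```

Deleting an attribute is the same lookup followed by `props.pop("rate", None)`; the updated object is then translated back to markup as in step S602.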

The voice data transmission method provided by this embodiment updates the attributes in the JSON-format control data in response to the user's adjustment operation on the display and/or voice broadcast; translates the updated JSON-format control data into updated markup-language control data based on the predefined format specification; and performs display and/or voice broadcast based on the updated markup-language control data. Because JSON-format control data can be configured dynamically, the user's adjustment operation data does not need to be sent to the electronic device for it to regenerate the updated markup-language control data. The display and voice broadcast can be reconfigured dynamically on the user terminal itself, which makes adjusting the display and voice broadcast configuration more efficient.

The voice data transmission method of the above embodiments is described below with a specific example. FIG. 4 is a signaling diagram of voice data transmission provided by the present application. As shown in FIG. 4, it includes the following steps:

Step S701: the user terminal generates requirement data in response to a requirement that the user enters by voice or text on the interactive interface, and sends the requirement data to the electronic device.

Step S702: the electronic device determines the response data corresponding to the requirement data, and generates, based on the response data, the markup-language-format control data to be transmitted.

Step S703: the electronic device translates the markup-language-format control data into JSON-format control data according to the predefined format specification, and sends the JSON-format control data to the user terminal.

Step S704: the user terminal translates the JSON-format control data into markup-language control data based on the predefined format specification.

Step S705: the user terminal performs voice broadcast based on the markup-language control data.
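Steps S703 and S704 can be sketched end to end as a round trip through a JSON string. This is an illustrative sketch in which the network transfer is replaced by passing the string directly; the function names `server_send` and `client_receive`, the `name` key, and the `rate` attribute are hypothetical, and the markup is assumed to be well-formed XML.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def to_json(elem):
    """Markup element -> JSON-style dict (electronic device side)."""
    children = [elem.text] if elem.text else []
    for child in elem:
        children.append(to_json(child))
        if child.tail:
            children.append(child.tail)
    return {"name": elem.tag, "props": dict(elem.attrib), "children": children}


def to_markup(node):
    """JSON-style dict -> markup text (user terminal side)."""
    if isinstance(node, str):
        return escape(node)
    attrs = "".join(f" {k}={quoteattr(str(v))}" for k, v in node["props"].items())
    inner = "".join(to_markup(c) for c in node["children"])
    return f"<{node['name']}{attrs}>{inner}</{node['name']}>"


def server_send(markup):
    """S703: translate markup to JSON and serialize it for transmission."""
    return json.dumps(to_json(ET.fromstring(markup)), ensure_ascii=False)


def client_receive(payload):
    """S704: parse the received JSON and translate it back to markup."""
    return to_markup(json.loads(payload))


markup = '<speak rate="1">Hello</speak>'
payload = server_send(markup)
restored = client_receive(payload)
print(payload)
print(restored)
```

In this simple case the restored markup is identical to the original, which is the property the shared format specification is meant to guarantee.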

The voice data transmission method provided by this embodiment enables uniform communication between the front end and the back end. The data is usable on both ends without special processing, and its configuration can be modified quickly and dynamically by reading and setting object values, which considerably improves the flexibility and generality of voice digital human synthesis processing.

FIG. 5 is an underlying architecture diagram provided by the present application. As shown in FIG. 5, the present application drives the digital human based on the DRML/SSML markup language. A data layer can be encapsulated at the bottom of the digital human to interface with the application layer and the data interfaces, forming a complete data-driven pipeline; the data layer connects the conversion logic between the underlying data API and DRML. At its outermost level, the data layer can define the JSON format specification (schema) that drives the digital human. For example, the schema can describe the types of the digital human's behavior tags, so that the markup-language format and the JSON format can be converted into each other according to the predefined schema, and the voice data can then be transmitted in JSON format.

Embodiment 3

FIG. 6 is a schematic structural diagram of the voice data transmission device provided in Embodiment 3 of the present application. The device can execute the processing flow provided in Embodiment 1 of the voice data transmission method. As shown in FIG. 6, the voice data transmission device 80 includes an acquisition module 801, a translation module 802, and a sending module 803.

Specifically, the acquisition module 801 is configured to obtain the markup-language-format control data to be transmitted; the control data includes voice broadcast control data.

The translation module 802 is configured to translate the markup-language-format control data into JSON-format control data according to the predefined format specification.

The sending module 803 is configured to send the JSON-format control data to the user terminal, so that the user terminal translates the JSON-format control data into markup-language control data based on the predefined format specification and performs voice broadcast based on the markup-language control data.

The device provided in this embodiment can be used to execute the method embodiment provided in Embodiment 1 above; its specific functions are not repeated here.

Optionally, the acquisition module 801 is specifically configured to: obtain the requirement data or user operation data sent by the user terminal, where the requirement data is voice data or text data; recognize the requirement data or user operation data to determine the corresponding response data; and generate, based on the response data, the markup-language-format control data to be transmitted.

Optionally, the markup-language format is the Digital Human Rich Text Markup Language (DRML) format, and the voice data transmission device 80 further includes a predefinition module. The predefinition module is configured to determine the JSON format specification corresponding to the attributes of each tag in the DRML format, so as to define in advance the JSON format specification corresponding to the DRML format.

Optionally, the translation module 802 is specifically configured to: determine the JSON format specification corresponding to the attributes of each tag in the markup-language-format control data; and translate the markup-language-format control data into JSON-format control data according to that specification.

The device provided in this embodiment can be used to execute method Embodiment 1 above; its specific functions are not repeated here.

Embodiment 4

FIG. 7 is a schematic structural diagram of the voice data transmission device provided in Embodiment 4 of the present application. The device can execute the processing flow provided in Embodiment 2 of the voice data transmission method. As shown in FIG. 7, the voice data transmission device 90 includes an acquisition module 901, a translation module 902, and a control module 903.

Specifically, the acquisition module 901 is configured to obtain the JSON-format control data sent by the electronic device. The JSON-format control data is produced by the electronic device by translating the markup-language-format control data to be transmitted according to the predefined format specification; the control data includes voice broadcast control data.

The translation module 902 is configured to translate the JSON-format control data into markup-language control data based on the predefined format specification.

The control module 903 is configured to perform voice broadcast based on the markup-language control data.

The device provided in this embodiment can be used to execute the method embodiment provided in Embodiment 2 above; its specific functions are not repeated here.

Optionally, the voice data transmission device 90 further includes a response module. The response module is configured to: generate requirement data in response to a requirement that the user enters by voice or text on the interactive interface, or generate user operation data in response to the user's operation on the interactive interface, where the requirement data is voice data or text data; and send the requirement data or user operation data to the electronic device, so that the electronic device determines the corresponding response data and generates, based on the response data, the markup-language-format control data to be transmitted.

Optionally, the control data includes at least one of the following: display control data and voice broadcast control data. The control module 903 is further configured to perform display and/or voice broadcast based on the markup-language control data.

Optionally, the voice data transmission device 90 further includes an adjustment module. The adjustment module is configured to: update the attributes in the JSON-format control data in response to the user's adjustment operation on the display and/or voice broadcast; translate the updated JSON-format control data into updated markup-language control data based on the predefined format specification; and perform display and/or voice broadcast based on the updated markup-language control data.

The device provided in this embodiment can be used to execute method Embodiment 2 above; its specific functions are not repeated here.

Embodiment 5

FIG. 8 is a schematic structural diagram of the electronic device provided in Embodiment 5 of the present application. As shown in FIG. 8, the present application further provides an electronic device 100, including a processor 1001, and a memory 1002 and a transceiver 1003 communicatively connected to the processor 1001. The memory 1002 stores computer-executable instructions; the transceiver 1003 is used to send and receive data; and the processor 1001 executes the computer-executable instructions stored in the memory 1002 to implement the method provided by any embodiment of the present application.

Specifically, the program may include program code, and the program code includes computer-executable instructions. The memory 1002 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory. The computer-executable instructions are stored in the memory 1002 and are configured to be executed by the processor 1001 to implement the method provided by any embodiment of the present application. For related explanations, refer to the descriptions and effects of the corresponding steps in the accompanying drawings; they are not repeated here.

In this embodiment, the memory 1002 and the processor 1001 are connected by a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is drawn in FIG. 8, but this does not mean that there is only one bus or one type of bus.

FIG. 9 is a schematic structural diagram of the user terminal provided in Embodiment 5 of the present application. As shown in FIG. 9, the present application further provides a user terminal 110, including a processor 1101, and a memory 1102 and a transceiver 1103 communicatively connected to the processor 1101. The memory 1102 stores computer-executable instructions; the transceiver 1103 is used to send and receive data; and the processor 1101 executes the computer-executable instructions stored in the memory 1102 to implement the method provided by any embodiment of the present application.

Specifically, the program may include program code, and the program code includes computer-executable instructions. The memory 1102 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory. The computer-executable instructions are stored in the memory 1102 and are configured to be executed by the processor 1101 to implement the method provided by any embodiment of the present application. For related explanations, refer to the descriptions and effects of the corresponding steps in the accompanying drawings; they are not repeated here.

In this embodiment, the memory 1102 and the processor 1101 are connected by a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is drawn in FIG. 9, but this does not mean that there is only one bus or one type of bus.

An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method provided by any embodiment of the present application.

An embodiment of the present application further provides a computer program product including computer-executable instructions which, when executed by a processor, implement the method provided by any embodiment of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation. Multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through interfaces, devices, or modules, and may be electrical, mechanical, or of another form.

Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. An integrated module may be implemented in hardware, or in hardware combined with software functional modules.

Program code for implementing the method of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable full-path trajectory fusion device, so that when executed by the processor or controller, the program code causes the functions and operations specified in the flowcharts and/or block diagrams to be carried out. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of this application, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in conjunction with, an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In addition, although operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve the desired results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the discussion above contains several specific implementation details, these should not be construed as limiting the scope of this application. Certain features described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations, separately or in any suitable subcombination.

Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed in this application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this application indicated by the following claims.

It should be understood that this application is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims (14)

CN202310754031.5A | Priority date 2023-06-25 | Filing date 2023-06-25 | Voice data transmission method, device, equipment, storage medium and product | Pending | CN116614557A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310754031.5A, CN116614557A (en) | 2023-06-25 | 2023-06-25 | Voice data transmission method, device, equipment, storage medium and product

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310754031.5A, CN116614557A (en) | 2023-06-25 | 2023-06-25 | Voice data transmission method, device, equipment, storage medium and product

Publications (1)

Publication Number | Publication Date
CN116614557A (en) | 2023-08-18

Family

ID=87680175

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202310754031.5A | Pending | CN116614557A (en) | 2023-06-25 | 2023-06-25 | Voice data transmission method, device, equipment, storage medium and product

Country Status (1)

Country | Link
CN (1) | CN116614557A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9367570B1 (en) * | 2012-04-09 | 2016-06-14 | Google Inc. | Ad hoc queryable JSON with audit trails
CN109976702A (en) * | 2019-03-20 | 2019-07-05 | 青岛海信电器股份有限公司 | A kind of audio recognition method, device and terminal
WO2020253389A1 (en) * | 2019-06-19 | 2020-12-24 | 深圳壹账通智能科技有限公司 | Page translation method and apparatus, medium, and electronic device
CN113066473A (en) * | 2021-03-31 | 2021-07-02 | 建信金融科技有限责任公司 | Voice synthesis method and device, storage medium and electronic equipment
US20220261261A1 (en) * | 2021-02-16 | 2022-08-18 | Cisco Technology, Inc. | Generating application programming interface based on object models from network devices
CN115550696A (en) * | 2022-09-20 | 2022-12-30 | 中国建设银行股份有限公司 | Multimedia data transmission method, device, equipment, storage medium and program product

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
