CN110880330A

Info

Publication number: CN110880330A
Application number: CN201911033600.7A
Authority: CN
Inventors: 刘秋菊
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2019-10-28
Filing date: 2019-10-28
Publication date: 2020-03-13

Abstract

Translated from Chinese

Embodiments of the present invention provide an audio conversion method and a terminal device, applied in the field of communication technologies, to solve the problem in the related art that important call content is missed because the user does not act in time. The method includes: acquiring a target voice emotion feature of a first call voice during a call; and, when the target voice emotion feature matches a predetermined voice emotion feature, saving a second call voice of the call and performing semantic analysis on the second call voice to obtain audio text of the second call voice. The second call voice is the call audio after a target call time, and the target call time is a predetermined time before the first call voice.

Description

Audio conversion method and terminal device

Technical field

Embodiments of the present invention relate to the field of communication technologies, and in particular to an audio conversion method and a terminal device.

Background

With the development of terminal device technology, users use terminal devices more and more frequently. When a user makes a call on a terminal device, important information needs to be recorded in real time.

In the related art, a user who wants to record important information during a call has to turn on the recording function manually during the call. The call recording is then saved, and after the call ends the user can replay the recording repeatedly to extract the important information from it.

However, when the user turns on the recording function manually during the call, the user may well act too late, so that important call content is not recorded in time and is therefore missed.

Summary of the invention

Embodiments of the present invention provide an audio conversion method and a terminal device, so as to solve the problem in the related art that important call content is missed because the user does not act in time.

To solve the above technical problem, the present application is implemented as follows:

In a first aspect, an embodiment of the present invention provides an audio conversion method. The method includes: acquiring a target voice emotion feature of a first call voice during a call; and, when the target voice emotion feature matches a predetermined voice emotion feature, saving a second call voice of the call and performing semantic analysis on the second call voice to obtain audio text of the second call voice. The second call voice is the call audio after a target call time, and the target call time is a predetermined time before the first call voice.

In a second aspect, an embodiment of the present invention further provides a terminal device. The terminal device includes: an acquisition module, configured to acquire a target voice emotion feature of a first call voice during a call; a storage module, configured to save a second call voice of the call when the target voice emotion feature acquired by the acquisition module matches a predetermined voice emotion feature; and an analysis module, configured to perform semantic analysis on the second call voice stored by the storage module to obtain audio text of the second call voice. The second call voice is the call audio after a target call time, and the target call time is a predetermined time before the first call voice.

In a third aspect, an embodiment of the present invention provides a terminal device including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the audio conversion method according to the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the audio conversion method according to the first aspect.

In the embodiments of the present invention, the voice emotion feature of a voice can characterize the emotional change of the user who uttered it, and an important event is usually stated within a certain period of time before the moment the user's emotion changes. The terminal device therefore detects whether the target voice emotion feature of the first call voice during the user's call matches a predetermined voice emotion feature, in order to determine whether the second call voice is an important call voice of that call, where the second call voice is the call audio from a predetermined time before the first call voice onward. As a result, when the terminal device detects that the user's emotion has changed, it can automatically save the second call voice without manual operation, perform semantic analysis on it, and finally obtain its audio text. The user can then learn the important call content stated in the second call voice directly from the audio text or from the second call voice itself, which prevents the user from missing important call content.

Brief description of the drawings

FIG. 1 is a schematic architecture diagram of a possible Android operating system provided by an embodiment of the present invention;

FIG. 2 is the first schematic flowchart of an audio conversion method provided by an embodiment of the present invention;

FIG. 3 is the second schematic flowchart of an audio conversion method provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present invention;

FIG. 5 is a schematic hardware diagram of a terminal provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present application.

It should be noted that "/" herein means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A alone, both A and B, or B alone.

It should be noted that "a plurality of" herein means two or more.

It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present the related concept in a concrete manner.

It should be noted that, to describe the technical solutions clearly, the embodiments of the present invention use words such as "first" and "second" to distinguish between identical or similar items that have basically the same function or effect. A person skilled in the art will understand that "first" and "second" do not limit quantity or execution order. For example, the first call voice and the second call voice are named so as to distinguish different call voices, not to describe a particular order of the call voices.

The audio conversion method provided by the embodiments of the present invention may be executed by the terminal device described above (including mobile and non-mobile terminal devices), or by a functional module and/or functional entity in the terminal device that is capable of implementing the method. This may be determined according to actual usage requirements and is not limited in the embodiments of the present invention. The method is described below by way of example with a terminal device as the executing entity.

The terminal device in the embodiments of the present invention may be a mobile terminal device or a non-mobile terminal device. The mobile terminal device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted terminal device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like. The non-mobile terminal device may be a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like. The embodiments of the present invention impose no specific limitation.

The terminal device in the embodiments of the present invention may be a terminal device with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present invention.

The following takes the Android operating system as an example to introduce the software environment in which the audio conversion method provided by the embodiments of the present invention is applied.

FIG. 1 is a schematic architecture diagram of a possible Android operating system provided by an embodiment of the present invention. In FIG. 1, the architecture of the Android operating system includes four layers: an application layer, an application framework layer, a system runtime library layer, and a kernel layer (which may specifically be a Linux kernel layer).

The application layer includes the applications in the Android operating system (both system applications and third-party applications).

The application framework layer is the framework for applications. Developers can develop applications on top of it, provided they follow its development principles.

The system runtime library layer includes libraries (also called system libraries) and the Android runtime environment. The libraries mainly provide the various resources that the Android operating system needs. The Android runtime environment provides the software environment for the Android operating system.

The kernel layer is the operating system layer of the Android operating system and is the lowest layer of its software stack. Based on the Linux kernel, the kernel layer provides core system services and hardware-related drivers for the Android operating system.

Taking the Android operating system as an example, in the embodiments of the present invention a developer can, based on the system architecture of the Android operating system shown in FIG. 1, develop a software program that implements the audio conversion method provided by the embodiments of the present invention, so that the audio conversion method can run on the Android operating system shown in FIG. 1. That is, a processor or terminal device can implement the audio conversion method by running the software program in the Android operating system.

The audio conversion method of the embodiments of the present invention is described below with reference to the flowchart shown in FIG. 2. FIG. 2 is a schematic flowchart of an audio conversion method provided by an embodiment of the present invention, including step 201 and step 202:

Step 201: The terminal device acquires the target voice emotion feature of the first call voice during the call.

In the embodiments of the present invention, the first call voice may be all of the call voice in the call or the call voice in a certain period of the call, which is not limited in the embodiments of the present invention.

In the embodiments of the present invention, during a voice call the terminal device may monitor the user's call voice in real time or at a first predetermined time interval, which is not limited in the embodiments of the present invention. While monitoring the user's call voice, the terminal device may detect the voice emotion feature of the current call voice in real time or at a second predetermined time interval, which is likewise not limited.

Step 202: When the target voice emotion feature matches the predetermined voice emotion feature, the terminal device saves the second call voice of the call and performs semantic analysis on the second call voice to obtain the audio text of the second call voice.

In the embodiments of the present invention, the second call voice is the call audio after the target call time, and the target call time is a predetermined time before the first call voice.
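Saving audio that begins before the trigger implies that recent call audio is held temporarily until a match occurs. The patent text does not say how this is done; the following is a minimal sketch that assumes a rolling buffer of audio frames. The class name `PreTriggerRecorder` and the parameter `pre_frames` (the predetermined time expressed as a number of frames) are hypothetical.

```python
from collections import deque


class PreTriggerRecorder:
    """Keeps a rolling window of recent frames so a saved clip can start before the trigger."""

    def __init__(self, pre_frames: int):
        # pre_frames: hypothetical stand-in for the "predetermined time", counted in frames
        self.buffer = deque(maxlen=pre_frames)  # oldest frames are dropped automatically
        self.saved = []                         # frames of the second call voice
        self.triggered = False

    def push(self, frame):
        """Feed one audio frame from the ongoing call."""
        if self.triggered:
            self.saved.append(frame)   # after the match, keep everything
        else:
            self.buffer.append(frame)  # before the match, keep only the recent window

    def trigger(self):
        """Call when the target voice emotion feature matches the predetermined one."""
        if not self.triggered:
            self.saved = list(self.buffer)  # clip starts pre_frames before the trigger
            self.triggered = True
```

With `pre_frames=3`, pushing frames 0 to 4, triggering, and then pushing frame 5 leaves frames 2 to 5 in `saved`; frames older than the window are discarded.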

Optionally, in the embodiments of the present invention, the target voice emotion feature is used to characterize the emotion of the calling user who uttered the first call voice.

For example, the target voice emotion feature includes at least one of the following audio features of the first call voice: an audio feature characterizing pitch, an audio feature characterizing speaking rate, an audio feature characterizing rhythm, and an audio feature characterizing volume. In general, voice features such as pitch, volume, rhythm, and speaking rate can reflect the personal emotion of the user who uttered the voice.

For example, when angry, a user speaks louder and at a higher pitch, with the speaking rate unchanged or somewhat faster; when worried, the user speaks more slowly, at a lower pitch, and more quietly; when happy, the user's voice rises and falls expressively, the speaking rate is brisk, and the volume stays steady; when the user's emotion is roughly neutral, the voice is steady and the speaking rate and pitch stay unchanged; when nervous or agitated, the user speaks faster and more quietly, with an irregular rhythm. Therefore, since a user typically starts a conversation in a neutral or happy mood, the terminal device can examine the user's intonation to determine how the user's emotion is changing, and can then record the key events in the voice based on the detected moment of emotional change.
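The cues listed above can be read as a small lookup table of expected changes relative to the speaker's baseline. The sketch below is only an illustration under stated assumptions: the table `CUES`, the 10% tolerance, and the function names are hypothetical, rhythm is not modeled, and a real system would more likely use a trained classifier.

```python
# Expected direction of change versus baseline: +1 higher, 0 unchanged, -1 lower.
# Cues not listed for an emotion are left unconstrained.
CUES = {
    "angry":   {"volume": {+1}, "pitch": {+1}, "rate": {0, +1}},
    "worried": {"volume": {-1}, "pitch": {-1}, "rate": {-1}},
    "happy":   {"volume": {0}, "rate": {+1}},
    "neutral": {"volume": {0}, "pitch": {0}, "rate": {0}},
    "nervous": {"volume": {-1}, "rate": {+1}},
}


def direction(current: float, baseline: float, tolerance: float = 0.10) -> int:
    """Classify a relative change as +1, 0, or -1 (tolerance is an assumed value)."""
    change = (current - baseline) / baseline
    if change > tolerance:
        return 1
    if change < -tolerance:
        return -1
    return 0


def candidate_emotions(current: dict, baseline: dict) -> list:
    """Return every emotion whose listed cues all agree with the observed changes."""
    observed = {name: direction(current[name], baseline[name]) for name in baseline}
    return [
        emotion
        for emotion, cues in CUES.items()
        if all(observed[name] in allowed for name, allowed in cues.items())
    ]
```

For instance, a voice that is 20% louder and 20% higher in pitch than the baseline, at an unchanged speaking rate, yields `["angry"]`.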

Optionally, in the embodiments of the present invention, the predetermined voice emotion feature is a voice emotion feature that characterizes a specific user emotion. For example, the predetermined voice emotion feature may be at least one voice emotion feature preset in the terminal device or in a predetermined database, with each predetermined voice emotion feature corresponding to one user emotion.

Optionally, in the embodiments of the present invention, the target voice emotion feature matching the predetermined voice emotion feature means that the similarity between the two is greater than or equal to a predetermined threshold (for example, 80%).
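The text does not say how similarity is measured. As a minimal sketch, assuming that each voice emotion feature is a numeric vector and that cosine similarity is the measure (both are assumptions, not stated in the source), the threshold check could look like this:

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length numeric vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def features_match(target, predetermined, threshold: float = 0.8) -> bool:
    """True when similarity is greater than or equal to the predetermined threshold."""
    return cosine_similarity(target, predetermined) >= threshold
```

The default `threshold=0.8` mirrors the 80% example in the text; any other similarity measure could be substituted without changing the comparison.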

For example, after acquiring the target voice emotion feature of the first call voice, the terminal device may use it as an index to search a voice emotion feature library (for example, the FAU AIBO children's emotion database) for a predetermined voice emotion feature that matches it. The voice emotion feature library includes at least one predetermined voice emotion feature.

Optionally, in the embodiments of the present invention, the first call voice includes at least one first voice emotion feature and at least one call moment, each of the at least one first voice emotion feature corresponds to one call moment, and the at least one first voice emotion feature includes the target voice emotion feature. The target call time is a predetermined time before the target call moment corresponding to the target voice emotion feature.

For example, suppose the first call voice includes three call moments (T1, T2, and T3), and the voice emotion feature corresponding to each call moment is extracted: feature 1 at T1, feature 2 at T2, and feature 3 at T3. After acquiring feature 1, feature 2, and feature 3, the terminal device matches each of them against the predetermined voice emotion feature. If feature 2 matches the predetermined voice emotion feature, the moment T2 corresponding to feature 2 is taken as the target call moment. That is, the terminal device captures the call voice from a predetermined time before T2 onward.
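The T1/T2/T3 example can be sketched as follows. The function name, the `is_match` callback, and the use of seconds as the time unit are illustrative assumptions; the source only says that the target call time is a predetermined time before the matching call moment.

```python
def find_target_call_time(moments, is_match, predetermined_seconds: float):
    """Return the start time of the second call voice, or None if no feature matches.

    moments: list of (call_moment_in_seconds, feature) pairs in time order.
    is_match: callable deciding whether a feature matches the predetermined feature.
    """
    for call_moment, feature in moments:
        if is_match(feature):
            # Target call time = target call moment minus the predetermined time,
            # clamped so it never precedes the start of the call.
            return max(0.0, call_moment - predetermined_seconds)
    return None
```

With moments at 10 s, 25 s, and 40 s, a match on the second feature, and a predetermined time of 15 s, the saved audio would start at 10 s.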

Optionally, in the embodiments of the present invention, the audio text of the second call voice includes at least one of the following: event information of a target event, and identity information of the calling user at the opposite end.

Further optionally, in the embodiments of the present invention, the event information of the target event includes at least one of the following: a keyword of the target event, the time at which the target event occurred, and what the opposite-end calling user stated about the target event (for example, that user's opinions, ideas, suggestions, or replies concerning the target event).

Optionally, in the embodiments of the present invention, after the terminal device obtains the audio text of the second call voice, it performs a related search using the keywords in the audio text as an index and displays the search results in a first interface. For example, suppose the audio text of a call voice records that the power is out at the opposite-end calling user's home. Because power needs to be restored quickly, the terminal device can display contact information stored in the address book that includes "electrician" in the first interface for the user's reference.
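The source does not explain how a keyword such as a power outage leads to contacts labeled "electrician". The sketch below assumes a small hand-written mapping from event keywords to search terms; `RELATED_TERMS`, the contact dictionary layout, and the function name are all hypothetical.

```python
# Hypothetical mapping from an event keyword to terms worth searching for.
RELATED_TERMS = {
    "power outage": ["electrician"],
}


def search_contacts(keywords, contacts):
    """Return contacts whose stored fields mention a keyword or one of its related terms."""
    terms = []
    for keyword in keywords:
        terms.append(keyword.lower())
        terms.extend(t.lower() for t in RELATED_TERMS.get(keyword.lower(), []))

    results = []
    for contact in contacts:
        # Search across every stored field (name, note, and so on).
        text = " ".join(str(value) for value in contact.values()).lower()
        if any(term in text for term in terms):
            results.append(contact)
    return results
```

The returned list is what would be shown in the first interface; ranking and display are left out of the sketch.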

In the audio conversion method provided by the embodiments of the present invention, the voice emotion feature of a voice can characterize the emotional change of the user who uttered it, and an important event is usually stated within a certain period of time before the moment the user's emotion changes. The terminal device therefore detects whether the target voice emotion feature of the first call voice during the user's call matches a predetermined voice emotion feature, in order to determine whether the second call voice is an important call voice of that call, where the second call voice is the call audio from a predetermined time before the first call voice onward. As a result, when the terminal device detects that the user's emotion has changed, it can automatically save the second call voice without manual operation, perform semantic analysis on it, and finally obtain its audio text. The user can then learn the important call content stated in the second call voice directly from the audio text or from the second call voice itself, which prevents the user from missing important call content.

Optionally, in the embodiments of the present invention, when the audio text of the second call voice includes event information of a target event, as shown in FIG. 3, the audio conversion method further includes the following steps after step 202:

Step A1: The terminal device generates event description text of the target event according to the event information.

Step A2: The terminal device displays the event description text of the target event on the first interface.

For example, after obtaining the audio text of the second call voice, the terminal device may directly generate the corresponding event description text for the target event based on that audio text and display it on the first interface.

As another example, after obtaining the audio text of the second call voice, the terminal device may wait until it receives a first input from the user and then, in response to the first input, generate the corresponding event description text for the target event based on the audio text and display it on the first interface.

In one example, the first input may include a user input on a specific interface, which may be set according to actual requirements and is not limited in the embodiments of the present invention. For example, if the specific interface is the text interface of the audio text, the user input on the specific interface may include an input on a first control in that text interface. The first control is used to trigger the terminal device to display the event description text of the target event on the first interface.

Further optionally, in the embodiments of the present invention, when the audio text of the second call voice includes event information of a target event, as shown in FIG. 3, step A1 may include the following steps:

Step B1: The terminal device acquires identity information of the opposite-end calling user who is talking with the terminal device user in the second call voice.

Step B2: The terminal device generates the event description text of the target event according to the event information of the target event and the identity information of the opposite-end calling user.

For example, the terminal device may use the opposite-end calling user's phone number and/or contact name as an index to automatically obtain that user's identity information from the address book stored on the terminal device or from contact information stored in communication software. The contact information includes at least one of the following: contact name, contact note, contact position, contact professional title, the relationship between the contact and the user, and so on.

For example, when the terminal device cannot obtain the identity information of the opposite-end calling user from the address book stored on the terminal device or from contact information stored in communication software, the terminal device obtains it from the audio text of the second call voice.
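The lookup-then-fallback order described above can be sketched as follows. The address book layout (a dictionary keyed by phone number) and the self-introduction pattern used on the audio text are assumptions for illustration; the source does not specify how identity is extracted from the text.

```python
import re

# Assumed pattern for a spoken self-introduction in the transcribed text.
INTRO_PATTERN = re.compile(r"(?:[Tt]his is|[Mm]y name is)\s+([A-Z][a-z]+)")


def resolve_identity(phone_number, address_book, audio_text):
    """Look up the caller in the address book first; fall back to the audio text."""
    contact = address_book.get(phone_number)
    if contact:
        return contact

    # Fallback: try to find a self-introduction in the transcribed call audio.
    match = INTRO_PATTERN.search(audio_text)
    if match:
        return {"name": match.group(1)}
    return None
```

When neither source yields an identity, the function returns `None`, and the event description text can still be generated from the event information alone.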

For example, the event description text of the target event includes at least one of the following: a keyword of the target event, the time at which the target event occurred, and what the opposite-end calling user stated about the target event (for example, that user's opinions, ideas, suggestions, or replies concerning the target event).

In this way, the terminal device can form the event description text of the target event from the acquired identity information of the opposite-end calling user and the event information, so that the user directly obtains a condensed version of what was stated in the second call voice. This lets the user quickly grasp the key information in the second call voice, saves the user considerable time, and improves the user's efficiency at work and in daily life.
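As a minimal sketch of steps B1 and B2, the event description text can be assembled from whichever fields are available. The `EventInfo` structure, its field names, and the output wording are hypothetical; the source lists the fields but not a format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventInfo:
    keyword: str                       # keyword of the target event
    occurred_at: Optional[str] = None  # time at which the event occurred, if stated
    statement: Optional[str] = None    # what the opposite-end user said about it


def describe_event(event: EventInfo, caller_name: Optional[str] = None) -> str:
    """Build a short, human-readable summary from the available fields."""
    who = caller_name or "The caller"
    parts = [f"{who} mentioned: {event.keyword}"]
    if event.occurred_at:
        parts.append(f"time: {event.occurred_at}")
    if event.statement:
        parts.append(f"statement: {event.statement}")
    return "; ".join(parts)
```

Missing fields are simply omitted, so the same function covers the case where the caller's identity could not be resolved.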

Optionally, in this embodiment of the present invention, after the terminal device obtains the audio texts of the call voices of M different peer call users (M being a positive integer greater than 1), the terminal device may rank the M peer call users by identity urgency and importance, and may also determine the importance of each event in the X event description texts corresponding to the M peer call users.

Exemplarily, the terminal device may determine the priority of each of the X event description texts corresponding to the M peer call users according to the identity urgency and importance ranking of the M peer call users.

Exemplarily, when ranking the M peer call users by identity urgency and importance, the terminal device may rank them according to the identity information of the peer call users. The identity information of a peer call user includes, but is not limited to: name, note, position, professional title, relationship with the user, and the like.

Example 1: The terminal device may sort the M peer call users according to their positions. For example, the position priority, from high to low, may be: senior leader > direct leader > business-related person.

Example 2: The terminal device may sort the M peer call users according to the relationship between each peer call user and the user of the terminal device. For example, the relationship priority, from high to low, may be: relatives > friends > colleagues.
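The two orderings in Example 1 and Example 2 can be expressed as lookup tables feeding a stable sort. The sketch below is illustrative: the dictionary keys, the table names, and the choice to place users with unknown values last are assumptions, not part of the embodiment.

```python
# Priority tables taken from Example 1 and Example 2 (lower number = higher priority).
POSITION_RANK = {"senior leader": 0, "direct leader": 1, "business-related person": 2}
RELATIONSHIP_RANK = {"relative": 0, "friend": 1, "colleague": 2}


def rank_peers(peers, key, order):
    """Sort peer call users by one identity field; unknown values go last (an assumption)."""
    unknown = len(order)
    # sorted() is stable, so users with equal priority keep their original order.
    return sorted(peers, key=lambda peer: order.get(peer.get(key), unknown))


peers = [
    {"name": "A", "position": "business-related person", "relationship": "colleague"},
    {"name": "B", "position": "senior leader", "relationship": "friend"},
    {"name": "C", "position": "direct leader", "relationship": "relative"},
]
by_position = [p["name"] for p in rank_peers(peers, "position", POSITION_RANK)]
by_relationship = [p["name"] for p in rank_peers(peers, "relationship", RELATIONSHIP_RANK)]
```

With this sample data, ranking by position puts B first, while ranking by relationship puts C first, which shows that the chosen identity field determines the outcome.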

Exemplarily, the terminal device may, according to the text priority of each of the X event description texts, set a corresponding reminder policy for the corresponding event so as to remind the user of that event.

In this way, the terminal device can set a reminder policy for each event by ranking the peer call users by identity urgency and importance, thereby reminding the user to handle important events in time and preventing the user from overlooking them.
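One way to carry the peer ranking through to event priorities and reminder policies is sketched below. The embodiment does not define what a reminder policy contains, so the three tiers and their fields (`notify`, `repeat`) are purely hypothetical placeholders.

```python
def prioritize_events(ranked_peer_names, events):
    """Order (peer_name, description) pairs by the rank of the peer who raised them."""
    rank = {name: index for index, name in enumerate(ranked_peer_names)}
    # Events from peers missing from the ranking are placed last (an assumption).
    return sorted(events, key=lambda event: rank.get(event[0], len(rank)))


def reminder_policy(priority_index):
    """Hypothetical tiers: the higher the priority, the more insistent the reminder."""
    if priority_index == 0:
        return {"notify": "immediately", "repeat": True}
    if priority_index == 1:
        return {"notify": "within 1 hour", "repeat": False}
    return {"notify": "daily digest", "repeat": False}


events = [("A", "send report"), ("B", "approve budget"), ("C", "family dinner")]
ordered = prioritize_events(["C", "B", "A"], events)
plan = [(text, reminder_policy(i)["notify"]) for i, (_, text) in enumerate(ordered)]
```

Here the event description text inherits its priority from the position of its peer call user in the ranking, which is one reading of the relationship described above.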

FIG. 4 is a schematic structural diagram of a terminal device for audio conversion provided by an embodiment of the present invention. As shown in FIG. 4, the terminal device 600 includes an acquisition module 601, a storage module 602, and an analysis module 603, wherein:

The acquisition module 601 is configured to acquire the target voice emotion feature of the first call voice during a call.

The storage module 602 is configured to save the second call voice during the call when the target voice emotion feature acquired by the acquisition module 601 matches a predetermined voice emotion feature.

The analysis module 603 is configured to perform semantic analysis on the second call voice stored by the storage module 602 to obtain the audio text of the second call voice.

The second call voice is the call audio after a target call time, and the target call time is a predetermined time before the first call voice.

Optionally, in this embodiment of the present invention, the target voice emotion feature is used to represent the emotion of the call user who uttered the first call voice.

Optionally, in this embodiment of the present invention, the first call voice includes at least one first voice emotion feature, each of which corresponds to one call moment, and the at least one first voice emotion feature includes the target voice emotion feature. The target call time is a predetermined time before the target call moment corresponding to the target voice emotion feature.
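Because the second call voice starts a predetermined time before the moment at which the target voice emotion feature is detected, the device has to keep recent audio available before any match occurs. A common way to do this is a fixed-length rolling buffer, sketched below. This is an assumption about one possible implementation, not the method stated in the embodiment; the `CallRecorder` class, the frame abstraction, and expressing the predetermined time as a number of frames are all hypothetical.

```python
from collections import deque


class CallRecorder:
    """Keeps the last N frames so that saving can start before the trigger moment."""

    def __init__(self, pre_roll_frames):
        # Rolling window covering the predetermined time before a trigger.
        self._pre_roll = deque(maxlen=pre_roll_frames)
        self._saved = None  # stays None until an emotion match occurs

    def push(self, frame, emotion_matches):
        """Feed one audio frame plus the result of the emotion-feature comparison."""
        if self._saved is not None:
            self._saved.append(frame)  # already triggered: keep everything
        elif emotion_matches:
            # Trigger: start from the buffered pre-roll, then add the current frame.
            self._saved = list(self._pre_roll)
            self._saved.append(frame)
        else:
            self._pre_roll.append(frame)

    @property
    def saved(self):
        return list(self._saved or [])


recorder = CallRecorder(pre_roll_frames=2)
for index in range(6):
    recorder.push(index, emotion_matches=(index == 3))
```

In this run the match occurs at frame 3 with a two-frame pre-roll, so the saved audio starts at frame 1 and continues to the end of the call.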

Optionally, in this embodiment of the present invention, as shown in FIG. 4, the terminal device 600 further includes a display module 604 and a generation module 605, wherein: the generation module 605 is configured to generate the event description text of the target event according to the event information; and the display module 604 is configured to display the event description text of the target event on a first interface. The audio text includes the event information of the target event.

Optionally, in this embodiment of the present invention, as shown in FIG. 4, the generation module 605 is specifically configured to: acquire the identity information of the peer call user who is talking with the user of the terminal device in the second call voice; and generate the event description text of the target event according to the event information of the target event and the identity information of the peer call user.

The terminal device provided in this embodiment of the present invention can implement each process implemented by the terminal device in the foregoing method embodiments. To avoid repetition, details are not described here again.

In the terminal device provided by this embodiment of the present invention, the voice emotion feature of a voice can represent the emotional change of the user who uttered it, and an important event is usually stated within a certain period of time before the moment at which the user's emotion changes. Therefore, the terminal device determines whether the second call voice is an important call voice in the call by detecting whether the target voice emotion feature of the first call voice matches the predetermined voice emotion feature. The second call voice is the call audio after a predetermined time before the first call voice, and this audio is usually an important call voice. Accordingly, when the terminal device detects that the user's emotion has changed, it can automatically save the second call voice without manual operation and perform semantic analysis on it to obtain the audio text of the second call voice. The user can then learn the important call content stated in the second call voice directly from the audio text or from the second call voice, which prevents the user from missing important call content.

FIG. 5 is a schematic diagram of the hardware structure of a terminal device implementing various embodiments of the present invention. The terminal device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, a power supply 111, and other components. Those skilled in the art can understand that the structure of the terminal device 100 shown in FIG. 5 does not constitute a limitation on the terminal device; the terminal device 100 may include more or fewer components than shown, combine some components, or use a different component arrangement. In this embodiment of the present invention, the terminal device 100 includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted terminal device, a wearable device, a pedometer, and the like.

The processor 110 is configured to acquire the target voice emotion feature of the first call voice during a call. The memory 109 is configured to save the second call voice during the call when the target voice emotion feature acquired by the processor 110 matches a predetermined voice emotion feature. The processor 110 is further configured to perform semantic analysis on the second call voice stored in the memory 109 to obtain the audio text of the second call voice. The second call voice is the call audio after a target call time, and the target call time is a predetermined time before the first call voice.

It should be understood that, in this embodiment of the present invention, the radio frequency unit 101 may be used to receive and send signals while sending and receiving information or during a call. Specifically, it receives downlink data from a base station and passes the data to the processor 110 for processing; in addition, it sends uplink data to the base station. Generally, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. The radio frequency unit 101 can also communicate with a network and other devices through a wireless communication system.

The terminal device 100 provides the user with wireless broadband Internet access through the network module 102, for example helping the user send and receive emails, browse web pages, and access streaming media.

The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the network module 102, or stored in the memory 109, into an audio signal and output it as sound. The audio output unit 103 may also provide audio output related to a specific function performed by the terminal device 100 (for example, a call signal reception sound or a message reception sound). The audio output unit 103 includes a speaker, a buzzer, a receiver, and the like.

The input unit 104 is used to receive audio or video signals. The input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or another storage medium) or transmitted via the radio frequency unit 101 or the network module 102. The microphone 1042 can receive sound and process it into audio data. In a telephone call mode, the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 101 and then output.

The terminal device 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the terminal device 100 is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used to identify the posture of the terminal device (for example, switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (for example, a pedometer or tap detection). The sensor 105 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described here again.

The display unit 106 is used to display information input by the user or information provided to the user. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.

The user input unit 107 may be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the terminal device 100. Specifically, the user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch panel 1071 with a finger, a stylus, or any other suitable object or accessory). The touch panel 1071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 110, and receives and executes commands sent by the processor 110. The touch panel 1071 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Specifically, the other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and power keys), a trackball, a mouse, and a joystick, which are not described here again.

Further, the touch panel 1071 may cover the display panel 1061. When the touch panel 1071 detects a touch operation on or near it, it passes the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although in FIG. 5 the touch panel 1071 and the display panel 1061 are shown as two independent components implementing the input and output functions of the terminal device 100, in some embodiments the touch panel 1071 and the display panel 1061 may be integrated to implement those functions, which is not limited here.

The interface unit 108 is an interface through which an external device is connected to the terminal device 100. For example, the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (for example, data or power) from an external device and transmit the received input to one or more elements within the terminal device 100, or may be used to transfer data between the terminal device 100 and an external device.

The memory 109 may be used to store software programs and various data. The memory 109 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function). The data storage area may store data created through use of the mobile phone (such as audio data and a phone book). In addition, the memory 109 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The processor 110 is the control center of the terminal device 100. It connects the various parts of the entire terminal device 100 using various interfaces and lines, and performs the various functions of the terminal device 100 and processes data by running or executing the software programs and/or modules stored in the memory 109 and calling the data stored in the memory 109, thereby monitoring the terminal device 100 as a whole. The processor 110 may include one or more processing units. Optionally, the processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 110.

The terminal device 100 may further include a power supply 111 (such as a battery) that supplies power to the various components. Optionally, the power supply 111 may be logically connected to the processor 110 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.

In addition, the terminal device 100 includes some functional modules that are not shown, which are not described here again.

Optionally, an embodiment of the present invention further provides a terminal device, including a processor, a memory, and a computer program that is stored in the memory and executable on the processor 110. When executed by the processor, the computer program implements each process of the foregoing audio conversion method embodiments and can achieve the same technical effect. To avoid repetition, details are not described here again.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the foregoing audio conversion method embodiments and can achieve the same technical effect. To avoid repetition, details are not described here again. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

It should be noted that, herein, the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.