CN111210826A - Voice information processing method and device, storage medium and intelligent terminal - Google Patents

Voice information processing method and device, storage medium and intelligent terminal

Info

Publication number
CN111210826A
Authority
CN
China
Prior art keywords
voice information
uploading
information
voice
upload
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911363138.7A
Other languages
Chinese (zh)
Other versions
CN111210826B (en)
Inventor
许锋刚
熊友军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youbixuan Intelligent Robot Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd
Priority to CN201911363138.7A
Publication of CN111210826A
Application granted
Publication of CN111210826B
Legal status: Active
Anticipated expiration

Abstract

Translated from Chinese


The present application is applicable to the technical field of voice information processing, and provides a voice information processing method, device, storage medium and intelligent terminal. The method includes: acquiring attribute information of an intelligent terminal and voice information sent by the intelligent terminal; performing voice recognition on the voice information to determine the operation type of the processing operation corresponding to the voice information; determining the upload mode of the voice information according to the attribute information of the intelligent terminal and/or the operation type; and uploading the voice information to a cloud server based on the determined upload mode. The present application can release the storage resources of the local server, reduce its processing pressure, and improve its execution efficiency.


Description

Voice information processing method and device, storage medium and intelligent terminal
Technical Field
The present application belongs to the field of voice information processing technologies, and in particular, to a voice information processing method, apparatus, storage medium, and intelligent terminal.
Background
With the rapid development of artificial intelligence in recent years, the robot industry is also rising quickly. The voice recording system is increasingly important as the key system through which a robot or other terminal uploads voice and receives analysis-result instructions, and the high-frequency, highly concurrent access of clients puts great pressure on it.
ASR (automatic speech recognition) technology has long played an important role as an integral part of recognizing user speech in services. However, as the number of users grows, variations in pronunciation and volume prevent ASR from reaching a high recognition rate. Many enterprises and research institutions therefore need large numbers of voice files to test their speech algorithms; unfortunately, the standard speech currently available is limited in quantity, low in quality, and does not cover multiple dialects. It is therefore highly necessary to store high-quality, complete, and fluent standard speech.
However, as the number of users grows, data loss caused by insufficient server storage space occurs from time to time; storing too many files on the server reduces its execution efficiency, while uploading files to the cloud in real time increases its processing pressure and easily causes storage failures.
Disclosure of Invention
The embodiments of the present application provide a voice information processing method and apparatus, a storage medium, and an intelligent terminal, to solve the problems in the prior art that storing too many files on a server reduces its execution efficiency, while uploading files to the cloud in real time increases the server's processing pressure and easily causes storage failures.
In a first aspect, an embodiment of the present application provides a method for processing voice information, including:
acquiring attribute information of intelligent equipment and voice information sent by the intelligent equipment;
performing voice recognition on the voice information, and determining the operation type of processing operation corresponding to the voice information;
determining an uploading mode of the voice information according to the attribute information and/or the operation type of the intelligent equipment;
and uploading the voice information to a cloud server based on the determined uploading mode.
In a possible implementation manner of the first aspect, the step of determining an uploading manner of the voice information according to the attribute information of the smart device includes:
determining an application scene type corresponding to the attribute information of the intelligent equipment according to the attribute information of the intelligent equipment;
and determining the uploading mode of the voice information corresponding to the application scene type of the intelligent equipment according to the application scene type and a preset scene uploading mode comparison table.
In a possible implementation manner of the first aspect, the determining, according to the operation type, an uploading manner of the voice information includes:
performing voice recognition on the voice information, and determining the operation type of processing operation corresponding to the voice information according to the voice recognition result;
and determining an uploading mode corresponding to the operation type according to the operation type and a preset type uploading mode comparison table.
In a possible implementation manner of the first aspect, the uploading the voice information to a cloud server based on the determined uploading manner includes:
reading voice byte information with a specified size from the voice information, and storing the voice byte information into a predefined message queue, wherein the message queue is used for storing the voice byte information to be uploaded;
and starting a sub-thread to upload the voice byte information in the message queue to a cloud server based on the determined uploading manner.
In a possible implementation manner of the first aspect, the uploading the voice information to a cloud server based on the determined uploading manner includes:
if the uploading mode of the voice information is determined to be additional uploading, acquiring the packaging parameters of the additional uploading;
and packaging the voice information according to the additionally uploaded packaging parameters and uploading the voice information to a cloud server.
In a possible implementation manner of the first aspect, the uploading the voice information to a cloud server based on the determined uploading manner includes:
if the uploading mode of the voice information is one-time uploading, determining whether the voice information carries an end identifier;
if the voice information does not carry an end identifier, storing the voice information into a temporary folder;
if the voice information carries an end identifier, storing the voice information into the temporary folder, and acquiring a one-time uploaded packaging parameter;
and, according to the one-time upload packaging parameters, packaging the voice information in the temporary folder and uploading the packaged voice information to a cloud server.
In a second aspect, an embodiment of the present application provides a speech information processing apparatus, including:
the information acquisition unit is used for acquiring attribute information of the intelligent equipment and voice information sent by the intelligent equipment;
the information identification unit is used for carrying out voice identification on the voice information and determining the operation type of the processing operation corresponding to the voice information;
the uploading mode determining unit is used for determining the uploading mode of the voice information according to the attribute information and/or the operation type of the intelligent equipment;
and the information uploading unit is used for uploading the voice information to a cloud server based on the determined uploading mode.
In a third aspect, an embodiment of the present application provides an intelligent terminal, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for processing voice information according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when executed by a processor, the computer program implements the method for processing voice information according to the first aspect.
In a fifth aspect, the present application provides a computer program product, which, when running on an intelligent terminal, causes the intelligent terminal to execute the voice information processing method according to the first aspect.
In the embodiments of the present application, the attribute information of the smart device and the voice information sent by the smart device are acquired, voice recognition is performed on the voice information to determine the operation type of the corresponding processing operation, the uploading mode of the voice information is then determined according to the attribute information of the smart device and/or the operation type, and the voice information is uploaded to the cloud server based on the determined uploading mode. The present application not only releases the storage resources of the local server and improves the execution efficiency of the server, but also uploads the voice information from the local server to the cloud server in a flexibly determined manner, which guarantees the integrity of the uploaded voice information while effectively reducing the execution pressure on the server.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a flowchart of an implementation of a voice information processing method provided in an embodiment of the present application;
fig. 2 is a flowchart of a specific implementation of the voice information processing method S103 according to an embodiment of the present application;
fig. 3 is a flowchart illustrating a specific implementation of uploading in an additional uploading manner in step S104 of the voice information processing method according to the embodiment of the present application;
fig. 4 is a flowchart illustrating a specific implementation of uploading in a one-time uploading manner in step S104 of a voice information processing method according to an embodiment of the present application;
fig. 5 is a block diagram of a voice information processing apparatus according to an embodiment of the present application;
fig. 5a is a block diagram of a speech information processing apparatus according to another embodiment of the present application;
fig. 5b is a block diagram of a structure of an information uploading unit provided in an embodiment of the present application;
fig. 6 is a schematic diagram of an intelligent terminal provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The voice information processing method provided by the embodiment of the present application can be applied to intelligent terminals such as servers and ultra-mobile personal computers (UMPCs), and the specific type of the intelligent terminal is not limited in any way in the embodiment of the present application.
Fig. 1 shows an implementation flow of a voice information processing method provided by an embodiment of the present application, where the method flow includes steps S101 to S104. The specific realization principle of each step is as follows:
s101: acquiring attribute information of the intelligent device and voice information sent by the intelligent device,
the attribute information of the intelligent equipment with different service processing functions is different, the attribute information comprises model information, equipment identification information and the like, and the attribute information of the intelligent equipment can be used for determining the application scene of the intelligent equipment. The execution end in this embodiment is a server, and the intelligent device may be a robot or an intelligent sound box. Illustratively, model information of the robot is acquired when the robot establishes a communication connection with the server. In the embodiment of the application, a voice access system runs on the server, the user wakes up the intelligent device, and the intelligent device establishes communication with the server through a token issued by the voice access system running on the server.
Specifically, a user enters the attribute information of the smart device (for example, a robot) in an APP on a mobile terminal, and the mobile terminal sends the attribute information to the server. The mobile terminal receives a verification code fed back by the server based on that attribute information and forwards it to the smart device. The smart device then requests a token from the server with the verification code, where the token is used to establish communication between the smart device and the server; the server generates the token from the attribute information of the smart device and the verification code and sends it to the smart device. After the smart device obtains the token, it sends the user's voice information to the server in real time over the token-based connection, and the server acquires the voice information sent by the smart device in real time.
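To make the registration flow above concrete, the following sketch walks through the verification-code and token exchange from the device side. It is a minimal illustration only: the server address, endpoint paths, and payload field names are assumptions, not details taken from this application.

```python
import requests  # third-party HTTP client, used here purely for illustration

SERVER = "https://voice-access.example.com"  # hypothetical voice access system address


def register_device(attributes: dict) -> str:
    """Mobile APP submits the smart device's attribute information; server replies with a verification code."""
    resp = requests.post(f"{SERVER}/register", json=attributes, timeout=5)
    resp.raise_for_status()
    return resp.json()["verification_code"]


def request_token(attributes: dict, verification_code: str) -> str:
    """Smart device exchanges the forwarded verification code for a token used on later voice uploads."""
    resp = requests.post(
        f"{SERVER}/token",
        json={"attributes": attributes, "code": verification_code},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["token"]


if __name__ == "__main__":
    attrs = {"model": "robot-x1", "device_id": "dev-001"}  # hypothetical attribute information
    code = register_device(attrs)
    token = request_token(attrs, code)
    print("token for real-time voice upload:", token)
```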
S102: and performing voice recognition on the voice information, and determining the operation type of the processing operation corresponding to the voice information.
In this embodiment, a main thread is started to perform voice recognition on the voice information, the corresponding processing operation is executed according to the recognition result, the result of the processing operation is fed back to the smart device, and the operation type of the processing operation corresponding to the voice information is determined at the same time. Specifically, processing operations corresponding to voice information are classified by the number of interactions, for example as a single-interaction type or a multi-round-interaction type: based on the recognition result, it is determined whether the corresponding processing operation can be completed in one step or whether voice information needs to be acquired several more times to execute multiple processing operations.
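As a rough illustration of this classification step, the sketch below stands in for the main thread's recognizer and labels the processing operation as single-interaction or multi-round-interaction. The recognizer stub and the set of multi-round intents are assumptions made only for the example.

```python
MULTI_ROUND_INTENTS = {"set_schedule", "order_food"}  # hypothetical intents that need several rounds


def recognize_intent(voice_bytes: bytes) -> str:
    """Placeholder for the ASR engine run by the main thread; returns an intent label."""
    return "set_schedule"


def operation_type(voice_bytes: bytes) -> str:
    """Classify the processing operation by the number of interactions it requires."""
    intent = recognize_intent(voice_bytes)
    return "multi_round_interaction" if intent in MULTI_ROUND_INTENTS else "single_interaction"


print(operation_type(b"\x00\x01"))  # -> multi_round_interaction
```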
S103: and determining the uploading mode of the voice information according to the model information and/or the operation type of the intelligent equipment.
In order to improve the efficiency of information uploading and ensure the integrity of the uploaded information, in this embodiment the uploading mode of the voice information may be determined according to the attribute information of the intelligent device and/or the operation type. The uploading mode includes, but is not limited to, additional (append-style) uploading and one-time uploading, where one-time uploading means that several pieces of voice information are packaged together and uploaded at once. Voice information sent by robots of different models may correspond to different uploading modes, and voice information of different operation types may likewise be uploaded to the cloud server in different ways.
As an embodiment of the present application, the attribute information includes model information, and the step of determining the uploading manner of the voice information according to the attribute information of the smart device specifically includes:
a1: and determining the application scene type corresponding to the attribute information of the intelligent equipment according to the attribute information of the intelligent equipment. The application scene type refers to a service function of the intelligent device, for example, the robot is determined to be a floor sweeping robot, a child learning robot or a companion robot according to the model information of the robot.
A2: determining, according to the application scene type and a preset scene uploading mode comparison table, the uploading mode of the voice information corresponding to the application scene type of the intelligent device. A preset scene uploading mode comparison table is configured, which records the correspondence between application scene types and uploading modes of voice information. For example, if the robot is a floor-sweeping robot, the uploading mode corresponding to the voice information it sends is additional uploading; if the robot is a child-learning robot, the uploading mode is one-time uploading. It should be noted that, in addition to the two uploading modes of one-time uploading and additional uploading, the scene uploading mode comparison table may also include a random uploading mode, that is, a randomly selected uploading mode.
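A minimal sketch of steps A1 and A2 is shown below: the device's attribute information is mapped to an application scene type, and the scene uploading mode comparison table then yields the uploading mode. The model names and table entries are invented for illustration and are not taken from the application.

```python
MODEL_TO_SCENE = {  # hypothetical mapping from model information to application scene type
    "sweeper-100": "floor_sweeping_robot",
    "edu-200": "child_learning_robot",
}

SCENE_UPLOAD_TABLE = {  # preset scene uploading mode comparison table (example entries)
    "floor_sweeping_robot": "additional",
    "child_learning_robot": "one_time",
    "companion_robot": "random",
}


def upload_mode_by_scene(attribute_info: dict) -> str:
    """A1/A2: attribute information -> application scene type -> uploading mode."""
    scene = MODEL_TO_SCENE.get(attribute_info.get("model"), "companion_robot")
    return SCENE_UPLOAD_TABLE[scene]


print(upload_mode_by_scene({"model": "sweeper-100"}))  # -> additional
```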
As an embodiment of the present application, the step of determining an uploading manner of the voice information according to the operation type specifically includes:
b1: and performing voice recognition on the voice information, and determining the operation type of the processing operation corresponding to the voice information according to the voice recognition result.
B2: determining the uploading mode corresponding to the operation type according to the operation type and a preset type uploading mode comparison table. The preset type uploading mode comparison table records the correspondence between operation types and uploading modes. For example, if the operation type of the processing operation corresponding to the voice information is the single-interaction type, the corresponding uploading mode is additional uploading; if the operation type is the multi-round-interaction type, the corresponding uploading mode is one-time uploading. It should be noted that, in addition to the two uploading modes of one-time uploading and additional uploading, the type uploading mode comparison table may also include a random uploading mode, that is, a randomly selected uploading mode.
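The type-based lookup of steps B1 and B2 can be sketched the same way; the table entries and the fallback value below are assumptions for illustration only.

```python
TYPE_UPLOAD_TABLE = {  # preset type uploading mode comparison table (example entries)
    "single_interaction": "additional",
    "multi_round_interaction": "one_time",
}


def upload_mode_by_type(operation_type: str) -> str:
    """B1/B2: operation type -> uploading mode, falling back to a random choice for unknown types."""
    return TYPE_UPLOAD_TABLE.get(operation_type, "random")
```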
As an embodiment of the present application, as shown in fig. 2, a specific implementation flow of step S103 of a voice information processing method provided in an embodiment of the present application specifically includes:
c1: and reading voice byte information with a specified size from the voice information, and storing the voice byte information into a predefined message queue, wherein the message queue is used for storing the voice byte information to be uploaded. In this embodiment, the server obtains the voice information uploaded by the intelligent device in real time, reads the voice byte information with the specified size from the voice information according to the set frequency, and sequentially stores the read voice byte information into a predefined message queue. The message queue is preset and used for storing voice byte information to be uploaded.
C2: starting sub-threads to upload the voice byte information in the message queue to a cloud server based on the determined uploading mode. Specifically, a set number of sub-threads are started to consume the message queue, and the voice byte information in the message queue is uploaded to the cloud server according to the determined uploading mode.
In this embodiment, a main thread is started to perform voice recognition on the voice information and execute the corresponding processing operation according to the recognition result, while a set number of sub-threads are started to consume the message queue and upload the voice byte information in it to the cloud server based on the determined uploading mode. Because the sub-threads handle the upload, the main thread can process the voice information and feed back the processing result regardless of whether the upload succeeds, so the voice information is uploaded to the cloud server for storage while still being processed efficiently, which improves the server's processing efficiency. Introducing the message queue for uploading also avoids upload congestion and relieves the pressure on the server.
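The following sketch illustrates the producer/consumer arrangement described in C1 and C2: a reader pushes fixed-size chunks of voice bytes into a queue while a set number of sub-threads consume the queue and upload each chunk. The chunk size, worker count, and upload call are assumptions chosen for the example.

```python
import io
import queue
import threading

CHUNK_SIZE = 4096   # "specified size" of each piece of voice byte information (assumed)
NUM_WORKERS = 4     # "set number" of sub-threads consuming the queue (assumed)


def upload_chunk(chunk: bytes, mode: str) -> None:
    """Placeholder for the cloud-storage API call selected by the determined uploading mode."""
    print(f"uploaded {len(chunk)} bytes via {mode} upload")


def producer(stream: io.BufferedIOBase, q: queue.Queue) -> None:
    """Read voice byte information of the specified size and enqueue it for upload."""
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            q.put(None)          # sentinel: no more voice byte information
            return
        q.put(chunk)


def consumer(q: queue.Queue, mode: str) -> None:
    """Sub-thread: consume the message queue and upload each chunk."""
    while True:
        chunk = q.get()
        if chunk is None:
            q.put(None)          # pass the sentinel on so the other workers also stop
            return
        upload_chunk(chunk, mode)


if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    voice_stream = io.BytesIO(b"\x00" * 20000)  # stand-in for the device's real-time voice stream
    workers = [threading.Thread(target=consumer, args=(q, "additional")) for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    producer(voice_stream, q)
    for w in workers:
        w.join()
```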
Optionally, if the uploading mode corresponding to the application scene type of the smart device is determined to be the random uploading mode according to the application scene type and the preset scene uploading mode comparison table, the uploading mode corresponding to the operation type is determined according to the operation type and the preset type uploading mode comparison table, and that uploading mode is taken as the uploading mode of the voice information.
S104: and uploading the voice information to a cloud server based on the determined uploading mode.
In the embodiment of the application, an API (application programming interface) corresponding to an additional uploading mode, a one-time uploading mode or other uploading modes is called, and the voice information is uploaded to a cloud server to be stored. Note that the upload method in the embodiment of the present application is not limited to the additional upload and the one-time upload.
As an embodiment of the present application, fig. 3 shows a specific implementation flow of uploading in an additional uploading manner in step S104 of the voice information processing method provided in the embodiment of the present application, which is detailed as follows:
D1: if the uploading mode of the voice information is determined to be additional uploading, acquiring the packaging parameters for additional uploading. Specifically, different uploading modes have different packaging parameters, and the packaging parameters include an identifier of the uploading mode. When the uploading mode of the voice information is determined to be additional uploading, the packaging parameters for additionally uploading the voice information are generated according to a preset rule from the user identifier, the smart device identifier, and the additional-uploading mode identifier carried by the voice information.
D2: and packaging the voice information according to the additionally uploaded packaging parameters and uploading the voice information to a cloud server.
In this embodiment, the voice information is packaged according to the corresponding packaging parameters and then uploaded to the cloud server. Because the packaging parameters carry the uploading mode identifier, the cloud server can easily determine the uploading mode of the stored voice information. Optionally, the cloud server may dynamically generate a folder name dedicated to the voice information from the user identifier and the smart device identifier in the packaging information, and the server uploads the voice information additionally uploaded by the smart device within a certain time period to the same folder for storage.
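A small sketch of the additional-upload path (D1 and D2) is given below. The layout of the packaging parameters, the folder-naming rule, and the upload call are illustrative assumptions; the application only specifies that the parameters are generated from the user identifier, the device identifier, and the uploading mode identifier.

```python
import time


def build_additional_params(user_id: str, device_id: str) -> dict:
    """D1: generate packaging parameters for additional uploading from the carried identifiers."""
    return {
        "mode": "additional",                 # uploading mode identifier
        "folder": f"{user_id}_{device_id}",   # cloud folder name derived from the identifiers
        "timestamp": int(time.time()),
    }


def additional_upload(voice_bytes: bytes, params: dict) -> None:
    """D2: placeholder for the cloud server's append-style storage API."""
    print(f"appending {len(voice_bytes)} bytes to cloud folder {params['folder']}")


params = build_additional_params("user-42", "dev-001")   # hypothetical identifiers
additional_upload(b"voice-segment", params)
```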
As an embodiment of the present application, fig. 4 shows a specific implementation flow of uploading in a one-time uploading manner in step S104 of a voice information processing method provided in an embodiment of the present application, which is detailed as follows:
E1: if the uploading mode of the voice information is one-time uploading, determining whether the voice information carries an end identifier. Specifically, if the voice information is determined to be the first segment of voice information uploaded by the intelligent device, a start identifier is generated for it; if it is determined to be the last segment of voice information of the intelligent device in the current interaction round, an end identifier is generated for it. In this embodiment, the voice information acquired first in the current interaction round is taken as the first segment. The server reads voice information of a specified length each time; when the length of the voice information read is smaller than the specified length, that voice information is taken as the last segment. Alternatively, if no voice information uploaded by the intelligent device is detected within a preset time, the user is considered to have gone silent, and the voice information uploaded by the intelligent device before the silence is taken as the last segment.
E2: if the voice information does not carry an end identifier, storing the voice information in a temporary folder. In this case it is judged that the user has not finished recording, and the server continues to acquire the voice information uploaded by the intelligent device. Each segment of voice information uploaded by the intelligent device carries an information number, and the segments are stored in the temporary folder in order of their numbers. The temporary folder is used to cache voice information. In one embodiment, the storage space is cleaned periodically: temporary folders whose contents have already been packaged and uploaded are deleted to release storage space.
E3: if the voice information carries an end identifier, storing the voice information in the temporary folder and acquiring the packaging parameters for one-time uploading. Specifically, when the uploading mode of the voice information is determined to be one-time uploading, the packaging parameters for uploading the voice information at one time are generated according to a preset rule from the user identifier, the intelligent device identifier, and the one-time-uploading mode identifier carried by the voice information.
E4: according to the one-time-uploading packaging parameters, packaging the voice information in the temporary folder and uploading the package to the cloud server. The server may dynamically generate the folder name of the temporary folder from the user identifier, the intelligent device identifier, and the one-time-uploading mode identifier carried by the voice information.
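The one-time-upload path (E1 to E4) can be sketched as follows: numbered voice segments are cached in a temporary folder until a segment carrying the end identifier arrives, at which point the folder's contents are packaged and uploaded once. File formats, paths, and the packaging step are assumptions; the upload call is a placeholder.

```python
import os
import tempfile
import zipfile


def handle_segment(temp_dir: str, seq_no: int, segment: bytes, is_end: bool) -> None:
    """E2/E3: cache each numbered segment in the temporary folder; package when the end identifier arrives."""
    with open(os.path.join(temp_dir, f"{seq_no:06d}.pcm"), "wb") as f:
        f.write(segment)
    if is_end:
        package_and_upload(temp_dir)


def package_and_upload(temp_dir: str) -> None:
    """E4: package the cached segments and upload the package to the cloud server (placeholder)."""
    archive = os.path.join(temp_dir, "session.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        for name in sorted(os.listdir(temp_dir)):
            if name.endswith(".pcm"):
                zf.write(os.path.join(temp_dir, name), arcname=name)
    print(f"one-time upload of {archive}")


temp_dir = tempfile.mkdtemp()
handle_segment(temp_dir, 1, b"first segment", is_end=False)
handle_segment(temp_dir, 2, b"last segment", is_end=True)
```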
In the embodiment of the application, the voice information originally stored in the local server is uploaded to the cloud server, so that the space resource of the server is released, and the execution efficiency of the server can be improved.
In this embodiment, the attribute information of the intelligent device and the voice information sent by the intelligent device are acquired, voice recognition is performed on the voice information to determine the operation type of the corresponding processing operation, the uploading mode of the voice information is then determined according to the attribute information of the intelligent device and/or the operation type, and the voice information is uploaded to the cloud server based on the determined uploading mode. The present application not only releases the storage resources of the local server and improves the execution efficiency of the server, but also uploads the voice information from the local server to the cloud server in a flexibly determined manner, which guarantees the integrity of the uploaded voice information while effectively reducing the execution pressure on the server.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 5 shows a block diagram of a voice information processing apparatus according to an embodiment of the present application, which corresponds to the voice information processing method described in the foregoing embodiment, and only shows portions related to the embodiment of the present application for convenience of description.
Referring to fig. 5, the voice information processing apparatus includes: an information acquisition unit 51, an information identification unit 52, an upload mode determination unit 53, and an information upload unit 54, wherein:
an information obtaining unit 51, configured to obtain attribute information of an intelligent device and voice information sent by the intelligent device;
an information recognition unit 52, configured to perform speech recognition on the speech information, and determine an operation type of a processing operation corresponding to the speech information;
an upload mode determining unit 53, configured to determine an upload mode of the voice information according to the attribute information of the smart device and/or the operation type;
and an information uploading unit 54, configured to upload the voice information to a cloud server based on the determined uploading manner.
Optionally, as shown in fig. 5a, the upload mode determining unit 53 includes:
the scene type determining module 531 is configured to determine, according to the attribute information of the intelligent device, an application scene type corresponding to the attribute information of the intelligent device;
a first mode determining module 532, configured to determine, according to the application scene type and a preset scene uploading mode comparison table, an uploading mode of the voice information corresponding to the application scene type of the smart device.
Optionally, the upload mode determination unit 53 includes:
the operation type determining module is used for carrying out voice recognition on the voice information and determining the operation type of processing operation corresponding to the voice information according to the voice recognition result;
and the second mode determining module is used for determining the uploading mode corresponding to the operation type according to the operation type and a preset type uploading mode comparison table.
Optionally, the upload mode determining unit 53 further includes:
and the third mode determining module is used for determining the uploading mode corresponding to the operation type according to the operation type and a preset type uploading mode comparison table if the uploading mode of the voice information corresponding to the application scene type of the intelligent device is determined to be a random uploading mode according to the application scene type and a preset scene uploading mode comparison table.
Optionally, the information uploading unit 54 includes:
the information reading module is used for reading voice byte information with specified size from the voice information and storing the voice byte information into a predefined message queue, and the message queue is used for storing the voice byte information to be uploaded;
and the information uploading module is used for starting the sub thread and uploading the voice byte information in the message queue to a cloud server based on the determined uploading mode.
Optionally, the information uploading unit 54 includes:
the first parameter acquisition module is used for acquiring the packaging parameters of the additional uploading if the uploading mode of the voice information is determined to be the additional uploading;
and the first uploading module is used for uploading the voice information to a cloud server after packaging according to the additionally uploaded packaging parameters.
Optionally, as shown in fig. 5b, the information uploading unit 54 includes:
an identifier determining module 541, configured to determine whether the voice information carries an end identifier if the voice information is uploaded in a one-time uploading manner;
an information caching module 542, configured to store the voice information into a temporary folder if the voice information does not carry an end identifier;
a second parameter obtaining module 543, configured to store the voice information in the temporary folder and obtain the one-time-upload encapsulation parameters if the voice information carries an end identifier;
and a second uploading module 544, configured to package the voice information in the temporary folder according to the one-time-upload encapsulation parameters and then upload it to a cloud server.
In this embodiment, the attribute information of the smart device and the voice information sent by the smart device are acquired, voice recognition is performed on the voice information to determine the operation type of the corresponding processing operation, the uploading mode of the voice information is then determined according to the attribute information of the smart device and/or the operation type, and the voice information is uploaded to the cloud server based on the determined uploading mode. The present application not only releases the storage resources of the local server and improves the execution efficiency of the server, but also uploads the voice information from the local server to the cloud server in a flexibly determined manner, which guarantees the integrity of the uploaded voice information while effectively reducing the execution pressure on the server.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Embodiments of the present application further provide a computer-readable storage medium, which stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the steps of any one of the voice information processing methods shown in fig. 1 to 4 are implemented.
An embodiment of the present application further provides an intelligent terminal, which includes a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor executes the computer readable instructions to implement any one of the steps of the voice information processing method shown in fig. 1 to 4.
The embodiment of the present application further provides a computer program product, which, when running on a server, causes the server to execute the steps of implementing any one of the voice information processing methods shown in fig. 1 to 4.
Fig. 6 is a schematic diagram of an intelligent terminal according to an embodiment of the present application. As shown in fig. 6, the intelligent terminal 6 of this embodiment includes: a processor 60, a memory 61, and computer readable instructions 62 stored in the memory 61 and executable on the processor 60. The processor 60, when executing the computer readable instructions 62, implements the steps in the various speech information processing method embodiments described above, such as steps S101 to S104 shown in fig. 1. Alternatively, the processor 60, when executing the computer readable instructions 62, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the units 51 to 54 shown in fig. 5.
Illustratively, the computer readable instructions 62 may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to accomplish the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer readable instructions 62 in the intelligent terminal 6.
The intelligent terminal 6 may be a server. The intelligent terminal 6 may include, but is not limited to, a processor 60 and a memory 61. It will be understood by those skilled in the art that fig. 6 is merely an example of the intelligent terminal 6 and does not constitute a limitation of it; the intelligent terminal 6 may include more or fewer components than those shown, combine some components, or use different components, and may, for example, further include an input-output device, a network access device, a bus, etc.
The processor 60 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the intelligent terminal 6, such as a hard disk or a memory of the intelligent terminal 6. The memory 61 may also be an external storage device of the intelligent terminal 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory Card (Flash Card) equipped on the intelligent terminal 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the intelligent terminal 6. The memory 61 is used for storing the computer readable instructions and other programs and data required by the intelligent terminal, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the voice information processing apparatus/terminal device, a recording medium, computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

Translated from Chinese
1. A voice information processing method, comprising:
obtaining attribute information of a smart device and voice information sent by the smart device;
performing voice recognition on the voice information, and determining an operation type of a processing operation corresponding to the voice information;
determining an upload mode of the voice information according to the attribute information of the smart device and/or the operation type; and
uploading the voice information to a cloud server based on the determined upload mode.
2. The voice information processing method according to claim 1, wherein the step of determining the upload mode of the voice information according to the attribute information of the smart device comprises:
determining, according to the attribute information of the smart device, an application scene type corresponding to the attribute information of the smart device; and
determining the upload mode of the voice information corresponding to the application scene type of the smart device according to the application scene type and a preset scene upload-mode comparison table.
3. The voice information processing method according to claim 1, wherein the step of determining the upload mode of the voice information according to the operation type comprises:
performing voice recognition on the voice information, and determining the operation type of the processing operation corresponding to the voice information according to a result of the voice recognition; and
determining the upload mode corresponding to the operation type according to the operation type and a preset type upload-mode comparison table.
4. The voice information processing method according to claim 1, wherein the step of uploading the voice information to a cloud server based on the determined upload mode comprises:
reading voice byte information of a specified size from the voice information and storing it in a predefined message queue, wherein the message queue is used to store voice byte information to be uploaded; and
starting a sub-thread to upload the voice byte information in the message queue to the cloud server based on the determined upload mode.
5. The voice information processing method according to claim 1, wherein the step of uploading the voice information to a cloud server based on the determined upload mode comprises:
if the upload mode of the voice information is determined to be additional uploading, obtaining packaging parameters for the additional uploading; and
packaging the voice information according to the additional-upload packaging parameters and uploading it to the cloud server.
6. The voice information processing method according to claim 1, wherein the step of uploading the voice information to a cloud server based on the determined upload mode comprises:
if the upload mode of the voice information is one-time uploading, determining whether the voice information carries an end identifier;
if the voice information does not carry an end identifier, storing the voice information in a temporary folder;
if the voice information carries an end identifier, storing the voice information in the temporary folder and obtaining packaging parameters for the one-time uploading; and
packaging the voice information in the temporary folder according to the one-time-upload packaging parameters and uploading the package to the cloud server.
7. A voice information processing apparatus, comprising:
an information obtaining unit, configured to obtain attribute information of a smart device and voice information sent by the smart device;
an information recognition unit, configured to perform voice recognition on the voice information and determine an operation type of a processing operation corresponding to the voice information;
an upload mode determining unit, configured to determine an upload mode of the voice information according to the attribute information of the smart device and/or the operation type; and
an information uploading unit, configured to upload the voice information to a cloud server based on the determined upload mode.
8. The voice information processing apparatus according to claim 7, wherein the upload mode determining unit comprises:
an application scene determining module, configured to determine, according to the attribute information of the smart device, an application scene type corresponding to the attribute information of the smart device; and
a first mode determining module, configured to determine the upload mode of the voice information corresponding to the application scene type of the smart device according to the application scene type and a preset scene upload-mode comparison table.
9. An intelligent terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice information processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the voice information processing method according to any one of claims 1 to 6.
CN201911363138.7A | Priority date: 2019-12-26 | Filing date: 2019-12-26 | Voice information processing method and device, storage medium and intelligent terminal | Active | Granted as CN111210826B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911363138.7A (CN111210826B) | 2019-12-26 | 2019-12-26 | Voice information processing method and device, storage medium and intelligent terminal

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911363138.7A (CN111210826B) | 2019-12-26 | 2019-12-26 | Voice information processing method and device, storage medium and intelligent terminal

Publications (2)

Publication Number | Publication Date
CN111210826A (en) | 2020-05-29
CN111210826B (en) | 2022-08-05

Family ID: 70786468

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911363138.7A (Active; CN111210826B) | Voice information processing method and device, storage medium and intelligent terminal | 2019-12-26 | 2019-12-26

Country Status (1)

Country | Link
CN (1) | CN111210826B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20110118861A1 (en)* | 2009-11-16 | 2011-05-19 | Yamaha Corporation | Sound processing apparatus
JP2012247679A (en)* | 2011-05-30 | 2012-12-13 | Nippon Telegr & Teleph Corp <Ntt> | Text and voice feature amount collection method, system therefor, and program
US20130096925A1 (en)* | 2011-10-13 | 2013-04-18 | Kia Motors Corporation | System for providing a sound source information management service
CN103617797A (en)* | 2013-12-09 | 2014-03-05 | 腾讯科技(深圳)有限公司 | Voice processing method and device
JP2017107198A (en)* | 2015-12-02 | 2017-06-15 | 悠之介 北 | Voice collection method and voice transplantation method
US20190166423A1 (en)* | 2016-07-27 | 2019-05-30 | Sound Devices Llc | Network system for reliable reception of wireless audio
CN107342079A (en)* | 2017-07-05 | 2017-11-10 | 谌勋 | A kind of acquisition system of the true voice based on internet
CN108010518A (en)* | 2017-12-13 | 2018-05-08 | 腾讯科技(深圳)有限公司 | A kind of voice acquisition method, system and the storage medium of interactive voice equipment
CN109961781A (en)* | 2017-12-22 | 2019-07-02 | 深圳市优必选科技有限公司 | Robot-based voice information receiving method and system and terminal equipment
US20190198041A1 (en)* | 2017-12-27 | 2019-06-27 | Toyota Jidosha Kabushiki Kaisha | Information providing apparatus
CN108510290A (en)* | 2018-03-12 | 2018-09-07 | 平安科技(深圳)有限公司 | Customer information amending method, device, computer equipment and storage medium in call
CN109192205A (en)* | 2018-09-12 | 2019-01-11 | 深圳市酷搏创新科技有限公司 | A kind of intelligent speech interactive system and its control method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG YONGXIA: "FPGA-based High-speed Data Acquisition and Transmission of Voice Logging System", Electronic Science and Technology, 15 September 2011, pages 42-44 *
SHI RONGRONG: "Design and Implementation of a Voice Acquisition System Based on the USB 2.0 Interface", China Master's Theses Full-text Database, Information Science and Technology Series, 30 April 2007, pages 140-315 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113782021A (en)* | 2021-09-14 | 2021-12-10 | 海信电子科技(武汉)有限公司 | Display device and prompt tone playing method
CN113782021B (en)* | 2021-09-14 | 2023-10-24 | Vidaa(荷兰)国际控股有限公司 | Display equipment and prompt tone playing method
CN115866012A (en)* | 2021-09-24 | 2023-03-28 | 中移(杭州)信息技术有限公司 | Set top box control method and device, electronic equipment and storage medium
CN113886138A (en)* | 2021-09-29 | 2022-01-04 | 深信服科技股份有限公司 | A user profile management method, apparatus, device and computer medium
CN114157523A (en)* | 2021-11-24 | 2022-03-08 | 珠海格力电器股份有限公司 | Data reporting method and device, intelligent household equipment and storage medium
CN114157523B (en)* | 2021-11-24 | 2022-10-11 | 珠海格力电器股份有限公司 | Data reporting method and device, intelligent household equipment and storage medium

Also Published As

Publication number | Publication date
CN111210826B (en) | 2022-08-05

Similar Documents

Publication | Title
CN111210826B (en)Voice information processing method and device, storage medium and intelligent terminal
CN113656503A (en)Data synchronization method, device and system and computer readable storage medium
CN108259318A (en) A method and device for distributing information
CN110474794B (en) An information conversion method and system for SDN architecture
CN106383764A (en)Data acquisition method and device
WO2017107514A1 (en)Offline transcoding method and system
CN109992252B (en)Data analysis method, terminal, device and storage medium based on Internet of things
CN103414762B (en)cloud backup method and device
CN111382241A (en)Session scene switching method and device
CN112350986B (en) Fragmentation shaping method and system for audio and video network transmission
CN109089174B (en) Method and device for processing multimedia data stream, and computer storage medium
CN112182046A (en)Information recommendation method, device, equipment and medium
CN113746932B (en) Network request merging method, device and electronic device, computer program product
CN112182327A (en)Data processing method, device, equipment and medium
CN110096612A (en)The acquisition methods and system of the online audio analysis data of voice log
CN108874994A (en)A kind of piecemeal reads the method, apparatus and computer storage medium of data
CN117041825A (en)Multi-audio output method, system, computer equipment and medium
CN106230880A (en)A kind of storage method of data and application server
CN115460446A (en) Alignment method, device and electronic equipment for multi-channel video signals
CN110516043A (en) Answer generation method and device for question answering system
CN112182047A (en)Information recommendation method, device, equipment and medium
CN115766977A (en)Video processing method, device, equipment and storage medium
CN113206997B (en)Method and device for simultaneously detecting quality of multi-service recorded audio data
CN115982160A (en)Data processing method, server, electronic device, and computer storage medium
CN115589479A (en) Debugging and testing method and system for camera interface of vehicle infotainment system

Legal Events

Date | Code | Title
— | PB01 | Publication
— | SE01 | Entry into force of request for substantive examination
— | GR01 | Patent grant
2023-12-04 | TR01 | Transfer of patent right

Transfer of patent right — effective date of registration: 2023-12-04
Patentee after: Beijing Youbixuan Intelligent Robot Co., Ltd.
Address after: Room 601, 6th Floor, Building 13, No. 3 Jinghai Fifth Road, Beijing Economic and Technological Development Zone (Tongzhou), Tongzhou District, Beijing, 100176
Patentee before: Shenzhen UBTECH Technology Co., Ltd.
Address before: 518000, 16th and 22nd Floors, C1 Building, Nanshan Zhiyuan, 1001 Xueyuan Avenue, Nanshan District, Shenzhen City, Guangdong Province

