CN111210826A - Voice information processing method and device, storage medium and intelligent terminal - Google Patents

Voice information processing method and device, storage medium and intelligent terminal

Info

Publication number
CN111210826A
Authority
CN
China
Prior art keywords
voice information
uploading
information
voice
upload
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911363138.7A
Other languages
Chinese (zh)
Other versions
CN111210826B (en)
Inventor
许锋刚
熊友军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youbixuan Intelligent Robot Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd
Priority to CN201911363138.7A
Publication of CN111210826A
Application granted
Publication of CN111210826B
Legal status: Active
Anticipated expiration

Abstract

Translated from Chinese


The present application is applicable to the technical field of voice information processing, and provides a voice information processing method, device, storage medium and intelligent terminal. The method includes: acquiring attribute information of an intelligent terminal and voice information sent by the intelligent terminal; performing voice recognition on the voice information to determine the operation type of the processing operation corresponding to the voice information; determining the upload mode of the voice information according to the attribute information of the intelligent terminal and/or the operation type; and uploading the voice information to a cloud server based on the determined upload mode. The present application can release the storage resources of the local server, reduce its processing pressure, and improve its execution efficiency.


Description

Voice information processing method and device, storage medium and intelligent terminal
Technical Field
The present application belongs to the field of voice information processing technologies, and in particular, to a voice information processing method, apparatus, storage medium, and intelligent terminal.
Background
With the rapid development of artificial intelligence in recent years, the robot industry is also rising quickly. The voice recording system is increasingly important as the key system through which a robot or other terminal uploads voice and receives analysis-result instructions, and the high-frequency, highly concurrent access of clients puts great pressure on it.
ASR (automatic speech recognition) technology has long played an important role as an integral part of recognizing user speech in services. However, as the number of users grows, variations in pronunciation and volume prevent ASR from reaching a high recognition rate. Many enterprises and research institutions therefore need large numbers of voice files to test their speech algorithms; unfortunately, the standard speech currently available is limited in quantity, low in quality, and does not cover multiple dialects. It is therefore highly necessary to store high-quality, complete, and fluent standard speech.
However, as the number of users grows, data loss caused by insufficient server storage space occurs from time to time; storing too many files on the server reduces its execution efficiency, while uploading files to the cloud in real time increases its processing pressure and easily causes storage failures.
Disclosure of Invention
The embodiments of the present application provide a voice information processing method and apparatus, a storage medium, and an intelligent terminal, to solve the problems in the prior art that storing too many files on a server reduces its execution efficiency, while uploading files to the cloud in real time increases the server's processing pressure and easily causes storage failures.
In a first aspect, an embodiment of the present application provides a method for processing voice information, including:
acquiring attribute information of intelligent equipment and voice information sent by the intelligent equipment;
performing voice recognition on the voice information, and determining the operation type of processing operation corresponding to the voice information;
determining an uploading mode of the voice information according to the attribute information and/or the operation type of the intelligent equipment;
and uploading the voice information to a cloud server based on the determined uploading mode.
In a possible implementation manner of the first aspect, the step of determining an uploading manner of the voice information according to the attribute information of the smart device includes:
determining an application scene type corresponding to the attribute information of the intelligent equipment according to the attribute information of the intelligent equipment;
and determining the uploading mode of the voice information corresponding to the application scene type of the intelligent equipment according to the application scene type and a preset scene uploading mode comparison table.
In a possible implementation manner of the first aspect, the determining, according to the operation type, an uploading manner of the voice information includes:
performing voice recognition on the voice information, and determining the operation type of processing operation corresponding to the voice information according to the voice recognition result;
and determining an uploading mode corresponding to the operation type according to the operation type and a preset type uploading mode comparison table.
In a possible implementation manner of the first aspect, the uploading the voice information to a cloud server based on the determined uploading manner includes:
reading voice byte information with a specified size from the voice information, and storing the voice byte information into a predefined message queue, wherein the message queue is used for storing the voice byte information to be uploaded;
and starting a sub-thread to upload the voice byte information in the message queue to a cloud server based on the determined uploading manner.
In a possible implementation manner of the first aspect, the uploading the voice information to a cloud server based on the determined uploading manner includes:
if the uploading mode of the voice information is determined to be additional uploading, acquiring the packaging parameters of the additional uploading;
and packaging the voice information according to the additionally uploaded packaging parameters and uploading the voice information to a cloud server.
In a possible implementation manner of the first aspect, the uploading the voice information to a cloud server based on the determined uploading manner includes:
if the uploading mode of the voice information is one-time uploading, determining whether the voice information carries an end identifier;
if the voice information does not carry an end identifier, storing the voice information into a temporary folder;
if the voice information carries an end identifier, storing the voice information into the temporary folder, and acquiring a one-time uploaded packaging parameter;
and, according to the one-time upload packaging parameters, packaging the voice information in the temporary folder and uploading the packaged voice information to a cloud server.
In a second aspect, an embodiment of the present application provides a speech information processing apparatus, including:
the information acquisition unit is used for acquiring attribute information of the intelligent equipment and voice information sent by the intelligent equipment;
the information identification unit is used for carrying out voice identification on the voice information and determining the operation type of the processing operation corresponding to the voice information;
the uploading mode determining unit is used for determining the uploading mode of the voice information according to the attribute information and/or the operation type of the intelligent equipment;
and the information uploading unit is used for uploading the voice information to a cloud server based on the determined uploading mode.
In a third aspect, an embodiment of the present application provides an intelligent terminal, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for processing voice information according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when executed by a processor, the computer program implements the method for processing voice information according to the first aspect.
In a fifth aspect, the present application provides a computer program product, which, when running on an intelligent terminal, causes the intelligent terminal to execute the voice information processing method according to the first aspect.
In the embodiments of the present application, the attribute information of the smart device and the voice information sent by the smart device are acquired, voice recognition is performed on the voice information to determine the operation type of the corresponding processing operation, the uploading mode of the voice information is then determined according to the attribute information of the smart device and/or the operation type, and the voice information is uploaded to the cloud server based on the determined uploading mode. The present application not only releases the storage resources of the local server and improves the execution efficiency of the server, but also uploads the voice information from the local server to the cloud server in a flexibly determined manner, which guarantees the integrity of the uploaded voice information while effectively reducing the execution pressure on the server.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a flowchart of an implementation of a voice information processing method provided in an embodiment of the present application;
fig. 2 is a flowchart of a specific implementation of the voice information processing method S103 according to an embodiment of the present application;
fig. 3 is a flowchart illustrating a specific implementation of uploading in an additional uploading manner in step S104 of the voice information processing method according to the embodiment of the present application;
fig. 4 is a flowchart illustrating a specific implementation of uploading in a one-time uploading manner in step S104 of a voice information processing method according to an embodiment of the present application;
fig. 5 is a block diagram of a voice information processing apparatus according to an embodiment of the present application;
fig. 5a is a block diagram of a speech information processing apparatus according to another embodiment of the present application;
fig. 5b is a block diagram of a structure of an information uploading unit provided in an embodiment of the present application;
fig. 6 is a schematic diagram of an intelligent terminal provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The voice information processing method provided by the embodiment of the present application can be applied to intelligent terminals such as servers and ultra-mobile personal computers (UMPCs), and the specific type of the intelligent terminal is not limited in any way in the embodiment of the present application.
Fig. 1 shows an implementation flow of a voice information processing method provided by an embodiment of the present application, where the method flow includes steps S101 to S104. The specific realization principle of each step is as follows:
s101: acquiring attribute information of the intelligent device and voice information sent by the intelligent device,
the attribute information of the intelligent equipment with different service processing functions is different, the attribute information comprises model information, equipment identification information and the like, and the attribute information of the intelligent equipment can be used for determining the application scene of the intelligent equipment. The execution end in this embodiment is a server, and the intelligent device may be a robot or an intelligent sound box. Illustratively, model information of the robot is acquired when the robot establishes a communication connection with the server. In the embodiment of the application, a voice access system runs on the server, the user wakes up the intelligent device, and the intelligent device establishes communication with the server through a token issued by the voice access system running on the server.
Specifically, a user enters the attribute information of the smart device (for example, a robot) in an APP on a mobile terminal, and the mobile terminal sends the attribute information to the server. The mobile terminal receives a verification code fed back by the server based on that attribute information and forwards it to the smart device. The smart device then requests a token from the server with the verification code, where the token is used to establish communication between the smart device and the server; the server generates the token from the attribute information of the smart device and the verification code and sends it to the smart device. After the smart device obtains the token, it sends the user's voice information to the server in real time over the token-based connection, and the server acquires the voice information sent by the smart device in real time.
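To make the registration flow above concrete, the following sketch walks through the verification-code and token exchange from the device side. It is a minimal illustration only: the server address, endpoint paths, and payload field names are assumptions, not details taken from this application.

```python
import requests  # third-party HTTP client, used here purely for illustration

SERVER = "https://voice-access.example.com"  # hypothetical voice access system address


def register_device(attributes: dict) -> str:
    """Mobile APP submits the smart device's attribute information; server replies with a verification code."""
    resp = requests.post(f"{SERVER}/register", json=attributes, timeout=5)
    resp.raise_for_status()
    return resp.json()["verification_code"]


def request_token(attributes: dict, verification_code: str) -> str:
    """Smart device exchanges the forwarded verification code for a token used on later voice uploads."""
    resp = requests.post(
        f"{SERVER}/token",
        json={"attributes": attributes, "code": verification_code},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["token"]


if __name__ == "__main__":
    attrs = {"model": "robot-x1", "device_id": "dev-001"}  # hypothetical attribute information
    code = register_device(attrs)
    token = request_token(attrs, code)
    print("token for real-time voice upload:", token)
```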
S102: and performing voice recognition on the voice information, and determining the operation type of the processing operation corresponding to the voice information.
In this embodiment, a main thread is started to perform voice recognition on the voice information, the corresponding processing operation is executed according to the recognition result, the result of the processing operation is fed back to the smart device, and the operation type of the processing operation corresponding to the voice information is determined at the same time. Specifically, processing operations corresponding to voice information are classified by the number of interactions, for example as a single-interaction type or a multi-round-interaction type: based on the recognition result, it is determined whether the corresponding processing operation can be completed in one step or whether voice information needs to be acquired several more times to execute multiple processing operations.
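As a rough illustration of this classification step, the sketch below stands in for the main thread's recognizer and labels the processing operation as single-interaction or multi-round-interaction. The recognizer stub and the set of multi-round intents are assumptions made only for the example.

```python
MULTI_ROUND_INTENTS = {"set_schedule", "order_food"}  # hypothetical intents that need several rounds


def recognize_intent(voice_bytes: bytes) -> str:
    """Placeholder for the ASR engine run by the main thread; returns an intent label."""
    return "set_schedule"


def operation_type(voice_bytes: bytes) -> str:
    """Classify the processing operation by the number of interactions it requires."""
    intent = recognize_intent(voice_bytes)
    return "multi_round_interaction" if intent in MULTI_ROUND_INTENTS else "single_interaction"


print(operation_type(b"\x00\x01"))  # -> multi_round_interaction
```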
S103: and determining the uploading mode of the voice information according to the model information and/or the operation type of the intelligent equipment.
In order to improve the efficiency of information uploading and ensure the integrity of the uploaded information, in this embodiment the uploading mode of the voice information may be determined according to the attribute information of the intelligent device and/or the operation type. The uploading mode includes, but is not limited to, additional (append-style) uploading and one-time uploading, where one-time uploading means that several pieces of voice information are packaged together and uploaded at once. Voice information sent by robots of different models may correspond to different uploading modes, and voice information of different operation types may likewise be uploaded to the cloud server in different ways.
As an embodiment of the present application, the attribute information includes model information, and the step of determining the uploading manner of the voice information according to the attribute information of the smart device specifically includes:
a1: and determining the application scene type corresponding to the attribute information of the intelligent equipment according to the attribute information of the intelligent equipment. The application scene type refers to a service function of the intelligent device, for example, the robot is determined to be a floor sweeping robot, a child learning robot or a companion robot according to the model information of the robot.
A2: determining, according to the application scene type and a preset scene uploading mode comparison table, the uploading mode of the voice information corresponding to the application scene type of the intelligent device. A preset scene uploading mode comparison table is configured, which records the correspondence between application scene types and uploading modes of voice information. For example, if the robot is a floor-sweeping robot, the uploading mode corresponding to the voice information it sends is additional uploading; if the robot is a child-learning robot, the uploading mode is one-time uploading. It should be noted that, in addition to the two uploading modes of one-time uploading and additional uploading, the scene uploading mode comparison table may also include a random uploading mode, that is, a randomly selected uploading mode.
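A minimal sketch of steps A1 and A2 is shown below: the device's attribute information is mapped to an application scene type, and the scene uploading mode comparison table then yields the uploading mode. The model names and table entries are invented for illustration and are not taken from the application.

```python
MODEL_TO_SCENE = {  # hypothetical mapping from model information to application scene type
    "sweeper-100": "floor_sweeping_robot",
    "edu-200": "child_learning_robot",
}

SCENE_UPLOAD_TABLE = {  # preset scene uploading mode comparison table (example entries)
    "floor_sweeping_robot": "additional",
    "child_learning_robot": "one_time",
    "companion_robot": "random",
}


def upload_mode_by_scene(attribute_info: dict) -> str:
    """A1/A2: attribute information -> application scene type -> uploading mode."""
    scene = MODEL_TO_SCENE.get(attribute_info.get("model"), "companion_robot")
    return SCENE_UPLOAD_TABLE[scene]


print(upload_mode_by_scene({"model": "sweeper-100"}))  # -> additional
```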
As an embodiment of the present application, the step of determining an uploading manner of the voice information according to the operation type specifically includes:
b1: and performing voice recognition on the voice information, and determining the operation type of the processing operation corresponding to the voice information according to the voice recognition result.
B2: determining the uploading mode corresponding to the operation type according to the operation type and a preset type uploading mode comparison table. The preset type uploading mode comparison table records the correspondence between operation types and uploading modes. For example, if the operation type of the processing operation corresponding to the voice information is the single-interaction type, the corresponding uploading mode is additional uploading; if the operation type is the multi-round-interaction type, the corresponding uploading mode is one-time uploading. It should be noted that, in addition to the two uploading modes of one-time uploading and additional uploading, the type uploading mode comparison table may also include a random uploading mode, that is, a randomly selected uploading mode.
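The type-based lookup of steps B1 and B2 can be sketched the same way; the table entries and the fallback value below are assumptions for illustration only.

```python
TYPE_UPLOAD_TABLE = {  # preset type uploading mode comparison table (example entries)
    "single_interaction": "additional",
    "multi_round_interaction": "one_time",
}


def upload_mode_by_type(operation_type: str) -> str:
    """B1/B2: operation type -> uploading mode, falling back to a random choice for unknown types."""
    return TYPE_UPLOAD_TABLE.get(operation_type, "random")
```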
As an embodiment of the present application, as shown in fig. 2, a specific implementation flow of step S103 of a voice information processing method provided in an embodiment of the present application specifically includes:
c1: and reading voice byte information with a specified size from the voice information, and storing the voice byte information into a predefined message queue, wherein the message queue is used for storing the voice byte information to be uploaded. In this embodiment, the server obtains the voice information uploaded by the intelligent device in real time, reads the voice byte information with the specified size from the voice information according to the set frequency, and sequentially stores the read voice byte information into a predefined message queue. The message queue is preset and used for storing voice byte information to be uploaded.
C2: starting sub-threads to upload the voice byte information in the message queue to a cloud server based on the determined uploading mode. Specifically, a set number of sub-threads are started to consume the message queue, and the voice byte information in the message queue is uploaded to the cloud server according to the determined uploading mode.
In this embodiment, a main thread is started to perform voice recognition on the voice information and execute the corresponding processing operation according to the recognition result, while a set number of sub-threads are started to consume the message queue and upload the voice byte information in it to the cloud server based on the determined uploading mode. Because the sub-threads handle the upload, the main thread can process the voice information and feed back the processing result regardless of whether the upload succeeds, so the voice information is uploaded to the cloud server for storage while still being processed efficiently, which improves the server's processing efficiency. Introducing the message queue for uploading also avoids upload congestion and relieves the pressure on the server.
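The following sketch illustrates the producer/consumer arrangement described in C1 and C2: a reader pushes fixed-size chunks of voice bytes into a queue while a set number of sub-threads consume the queue and upload each chunk. The chunk size, worker count, and upload call are assumptions chosen for the example.

```python
import io
import queue
import threading

CHUNK_SIZE = 4096   # "specified size" of each piece of voice byte information (assumed)
NUM_WORKERS = 4     # "set number" of sub-threads consuming the queue (assumed)


def upload_chunk(chunk: bytes, mode: str) -> None:
    """Placeholder for the cloud-storage API call selected by the determined uploading mode."""
    print(f"uploaded {len(chunk)} bytes via {mode} upload")


def producer(stream: io.BufferedIOBase, q: queue.Queue) -> None:
    """Read voice byte information of the specified size and enqueue it for upload."""
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            q.put(None)          # sentinel: no more voice byte information
            return
        q.put(chunk)


def consumer(q: queue.Queue, mode: str) -> None:
    """Sub-thread: consume the message queue and upload each chunk."""
    while True:
        chunk = q.get()
        if chunk is None:
            q.put(None)          # pass the sentinel on so the other workers also stop
            return
        upload_chunk(chunk, mode)


if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    voice_stream = io.BytesIO(b"\x00" * 20000)  # stand-in for the device's real-time voice stream
    workers = [threading.Thread(target=consumer, args=(q, "additional")) for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    producer(voice_stream, q)
    for w in workers:
        w.join()
```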
Optionally, if the uploading mode corresponding to the application scene type of the smart device is determined to be the random uploading mode according to the application scene type and the preset scene uploading mode comparison table, the uploading mode corresponding to the operation type is determined according to the operation type and the preset type uploading mode comparison table, and that uploading mode is taken as the uploading mode of the voice information.
S104: and uploading the voice information to a cloud server based on the determined uploading mode.
In the embodiment of the application, an API (application programming interface) corresponding to an additional uploading mode, a one-time uploading mode or other uploading modes is called, and the voice information is uploaded to a cloud server to be stored. Note that the upload method in the embodiment of the present application is not limited to the additional upload and the one-time upload.
As an embodiment of the present application, fig. 3 shows a specific implementation flow of uploading in an additional uploading manner in step S104 of the voice information processing method provided in the embodiment of the present application, which is detailed as follows:
D1: if the uploading mode of the voice information is determined to be additional uploading, acquiring the packaging parameters for additional uploading. Specifically, different uploading modes have different packaging parameters, and the packaging parameters include an identifier of the uploading mode. When the uploading mode of the voice information is determined to be additional uploading, the packaging parameters for additionally uploading the voice information are generated according to a preset rule from the user identifier, the smart device identifier, and the additional-uploading mode identifier carried by the voice information.
D2: and packaging the voice information according to the additionally uploaded packaging parameters and uploading the voice information to a cloud server.
In this embodiment, the voice information is packaged according to the corresponding packaging parameters and then uploaded to the cloud server. Because the packaging parameters carry the uploading mode identifier, the cloud server can easily determine the uploading mode of the stored voice information. Optionally, the cloud server may dynamically generate a folder name dedicated to the voice information from the user identifier and the smart device identifier in the packaging information, and the server uploads the voice information additionally uploaded by the smart device within a certain time period to the same folder for storage.
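A small sketch of the additional-upload path (D1 and D2) is given below. The layout of the packaging parameters, the folder-naming rule, and the upload call are illustrative assumptions; the application only specifies that the parameters are generated from the user identifier, the device identifier, and the uploading mode identifier.

```python
import time


def build_additional_params(user_id: str, device_id: str) -> dict:
    """D1: generate packaging parameters for additional uploading from the carried identifiers."""
    return {
        "mode": "additional",                 # uploading mode identifier
        "folder": f"{user_id}_{device_id}",   # cloud folder name derived from the identifiers
        "timestamp": int(time.time()),
    }


def additional_upload(voice_bytes: bytes, params: dict) -> None:
    """D2: placeholder for the cloud server's append-style storage API."""
    print(f"appending {len(voice_bytes)} bytes to cloud folder {params['folder']}")


params = build_additional_params("user-42", "dev-001")   # hypothetical identifiers
additional_upload(b"voice-segment", params)
```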
As an embodiment of the present application, fig. 4 shows a specific implementation flow of uploading in a one-time uploading manner in step S104 of a voice information processing method provided in an embodiment of the present application, which is detailed as follows:
E1: if the uploading mode of the voice information is one-time uploading, determining whether the voice information carries an end identifier. Specifically, if the voice information is determined to be the first segment of voice information uploaded by the intelligent device, a start identifier is generated for it; if it is determined to be the last segment of voice information of the intelligent device in the current interaction round, an end identifier is generated for it. In this embodiment, the voice information acquired first in the current interaction round is taken as the first segment. The server reads voice information of a specified length each time; when the length of the voice information read is smaller than the specified length, that voice information is taken as the last segment. Alternatively, if no voice information uploaded by the intelligent device is detected within a preset time, the user is considered to have gone silent, and the voice information uploaded by the intelligent device before the silence is taken as the last segment.
E2: if the voice information does not carry an end identifier, storing the voice information in a temporary folder. In this case it is judged that the user has not finished recording, and the server continues to acquire the voice information uploaded by the intelligent device. Each segment of voice information uploaded by the intelligent device carries an information number, and the segments are stored in the temporary folder in order of their numbers. The temporary folder is used to cache voice information. In one embodiment, the storage space is cleaned periodically: temporary folders whose contents have already been packaged and uploaded are deleted to release storage space.
E3: if the voice information carries an end identifier, storing the voice information in the temporary folder and acquiring the packaging parameters for one-time uploading. Specifically, when the uploading mode of the voice information is determined to be one-time uploading, the packaging parameters for uploading the voice information at one time are generated according to a preset rule from the user identifier, the intelligent device identifier, and the one-time-uploading mode identifier carried by the voice information.
E4: according to the one-time-uploading packaging parameters, packaging the voice information in the temporary folder and uploading the package to the cloud server. The server may dynamically generate the folder name of the temporary folder from the user identifier, the intelligent device identifier, and the one-time-uploading mode identifier carried by the voice information.
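The one-time-upload path (E1 to E4) can be sketched as follows: numbered voice segments are cached in a temporary folder until a segment carrying the end identifier arrives, at which point the folder's contents are packaged and uploaded once. File formats, paths, and the packaging step are assumptions; the upload call is a placeholder.

```python
import os
import tempfile
import zipfile


def handle_segment(temp_dir: str, seq_no: int, segment: bytes, is_end: bool) -> None:
    """E2/E3: cache each numbered segment in the temporary folder; package when the end identifier arrives."""
    with open(os.path.join(temp_dir, f"{seq_no:06d}.pcm"), "wb") as f:
        f.write(segment)
    if is_end:
        package_and_upload(temp_dir)


def package_and_upload(temp_dir: str) -> None:
    """E4: package the cached segments and upload the package to the cloud server (placeholder)."""
    archive = os.path.join(temp_dir, "session.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        for name in sorted(os.listdir(temp_dir)):
            if name.endswith(".pcm"):
                zf.write(os.path.join(temp_dir, name), arcname=name)
    print(f"one-time upload of {archive}")


temp_dir = tempfile.mkdtemp()
handle_segment(temp_dir, 1, b"first segment", is_end=False)
handle_segment(temp_dir, 2, b"last segment", is_end=True)
```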
In the embodiment of the application, the voice information originally stored in the local server is uploaded to the cloud server, so that the space resource of the server is released, and the execution efficiency of the server can be improved.
In this embodiment, the attribute information of the intelligent device and the voice information sent by the intelligent device are acquired, voice recognition is performed on the voice information to determine the operation type of the corresponding processing operation, the uploading mode of the voice information is then determined according to the attribute information of the intelligent device and/or the operation type, and the voice information is uploaded to the cloud server based on the determined uploading mode. The present application not only releases the storage resources of the local server and improves the execution efficiency of the server, but also uploads the voice information from the local server to the cloud server in a flexibly determined manner, which guarantees the integrity of the uploaded voice information while effectively reducing the execution pressure on the server.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 5 shows a block diagram of a voice information processing apparatus according to an embodiment of the present application, which corresponds to the voice information processing method described in the foregoing embodiment, and only shows portions related to the embodiment of the present application for convenience of description.
Referring to fig. 5, the voice information processing apparatus includes: an information acquisition unit 51, an information identification unit 52, an upload mode determination unit 53, and an information upload unit 54, wherein:
an information obtaining unit 51, configured to obtain attribute information of an intelligent device and voice information sent by the intelligent device;
an information recognition unit 52, configured to perform speech recognition on the speech information, and determine an operation type of a processing operation corresponding to the speech information;
an upload mode determining unit 53, configured to determine an upload mode of the voice information according to the attribute information of the smart device and/or the operation type;
and an information uploading unit 54, configured to upload the voice information to a cloud server based on the determined uploading manner.
Optionally, as shown in fig. 5a, the upload mode determining unit 53 includes:
the scene type determining module 531 is configured to determine, according to the attribute information of the intelligent device, an application scene type corresponding to the attribute information of the intelligent device;
a first mode determining module 532, configured to determine, according to the application scene type and a preset scene uploading mode comparison table, an uploading mode of the voice information corresponding to the application scene type of the smart device.
Optionally, the upload mode determination unit 53 includes:
the operation type determining module is used for carrying out voice recognition on the voice information and determining the operation type of processing operation corresponding to the voice information according to the voice recognition result;
and the second mode determining module is used for determining the uploading mode corresponding to the operation type according to the operation type and a preset type uploading mode comparison table.
Optionally, the upload mode determining unit 53 further includes:
and the third mode determining module is used for determining the uploading mode corresponding to the operation type according to the operation type and a preset type uploading mode comparison table if the uploading mode of the voice information corresponding to the application scene type of the intelligent device is determined to be a random uploading mode according to the application scene type and a preset scene uploading mode comparison table.
Optionally, the information uploading unit 54 includes:
the information reading module is used for reading voice byte information with specified size from the voice information and storing the voice byte information into a predefined message queue, and the message queue is used for storing the voice byte information to be uploaded;
and the information uploading module is used for starting the sub thread and uploading the voice byte information in the message queue to a cloud server based on the determined uploading mode.
Optionally, the information uploading unit 54 includes:
the first parameter acquisition module is used for acquiring the packaging parameters of the additional uploading if the uploading mode of the voice information is determined to be the additional uploading;
and the first uploading module is used for uploading the voice information to a cloud server after packaging according to the additionally uploaded packaging parameters.
Optionally, as shown in fig. 5b, the information uploading unit 54 includes:
an identifier determining module 541, configured to determine whether the voice information carries an end identifier if the voice information is uploaded in a one-time uploading manner;
an information caching module 542, configured to store the voice information into a temporary folder if the voice information does not carry an end identifier;
a second parameter obtaining module 543, configured to store the voice information in the temporary folder and obtain the one-time-upload encapsulation parameters if the voice information carries an end identifier;
and a second uploading module 544, configured to package the voice information in the temporary folder according to the one-time-upload encapsulation parameters and then upload it to a cloud server.
In this embodiment, the attribute information of the smart device and the voice information sent by the smart device are acquired, voice recognition is performed on the voice information to determine the operation type of the corresponding processing operation, the uploading mode of the voice information is then determined according to the attribute information of the smart device and/or the operation type, and the voice information is uploaded to the cloud server based on the determined uploading mode. The present application not only releases the storage resources of the local server and improves the execution efficiency of the server, but also uploads the voice information from the local server to the cloud server in a flexibly determined manner, which guarantees the integrity of the uploaded voice information while effectively reducing the execution pressure on the server.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Embodiments of the present application further provide a computer-readable storage medium, which stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the steps of any one of the voice information processing methods shown in fig. 1 to 4 are implemented.
An embodiment of the present application further provides an intelligent terminal, which includes a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor executes the computer readable instructions to implement any one of the steps of the voice information processing method shown in fig. 1 to 4.
The embodiment of the present application further provides a computer program product, which, when running on a server, causes the server to execute the steps of implementing any one of the voice information processing methods shown in fig. 1 to 4.
Fig. 6 is a schematic diagram of an intelligent terminal according to an embodiment of the present application. As shown in fig. 6, the intelligent terminal 6 of this embodiment includes: a processor 60, a memory 61, and computer readable instructions 62 stored in the memory 61 and executable on the processor 60. The processor 60, when executing the computer readable instructions 62, implements the steps in the various speech information processing method embodiments described above, such as steps S101 to S104 shown in fig. 1. Alternatively, the processor 60, when executing the computer readable instructions 62, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the units 51 to 54 shown in fig. 5.
Illustratively, the computer readable instructions 62 may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to accomplish the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer readable instructions 62 in the intelligent terminal 6.
The intelligent terminal 6 may be a server. The intelligent terminal 6 may include, but is not limited to, a processor 60 and a memory 61. It will be understood by those skilled in the art that fig. 6 is merely an example of the intelligent terminal 6 and does not constitute a limitation of it; the intelligent terminal 6 may include more or fewer components than those shown, combine some components, or use different components, and may, for example, further include an input-output device, a network access device, a bus, etc.
The processor 60 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the intelligent terminal 6, such as a hard disk or a memory of the intelligent terminal 6. The memory 61 may also be an external storage device of the intelligent terminal 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory Card (Flash Card) equipped on the intelligent terminal 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the intelligent terminal 6. The memory 61 is used for storing the computer readable instructions and other programs and data required by the intelligent terminal, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the voice information processing apparatus/terminal device, a recording medium, computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

Translated from Chinese
1. A voice information processing method, comprising:
obtaining attribute information of a smart device and voice information sent by the smart device;
performing voice recognition on the voice information, and determining an operation type of a processing operation corresponding to the voice information;
determining an upload mode of the voice information according to the attribute information of the smart device and/or the operation type; and
uploading the voice information to a cloud server based on the determined upload mode.
2. The voice information processing method according to claim 1, wherein the step of determining the upload mode of the voice information according to the attribute information of the smart device comprises:
determining, according to the attribute information of the smart device, an application scene type corresponding to the attribute information of the smart device; and
determining the upload mode of the voice information corresponding to the application scene type of the smart device according to the application scene type and a preset scene upload-mode comparison table.
3. The voice information processing method according to claim 1, wherein the step of determining the upload mode of the voice information according to the operation type comprises:
performing voice recognition on the voice information, and determining the operation type of the processing operation corresponding to the voice information according to a result of the voice recognition; and
determining the upload mode corresponding to the operation type according to the operation type and a preset type upload-mode comparison table.
4. The voice information processing method according to claim 1, wherein the step of uploading the voice information to a cloud server based on the determined upload mode comprises:
reading voice byte information of a specified size from the voice information and storing it in a predefined message queue, wherein the message queue is used to store voice byte information to be uploaded; and
starting a sub-thread to upload the voice byte information in the message queue to the cloud server based on the determined upload mode.
5. The voice information processing method according to claim 1, wherein the step of uploading the voice information to a cloud server based on the determined upload mode comprises:
if the upload mode of the voice information is determined to be additional uploading, obtaining packaging parameters for the additional uploading; and
packaging the voice information according to the additional-upload packaging parameters and uploading it to the cloud server.
6. The voice information processing method according to claim 1, wherein the step of uploading the voice information to a cloud server based on the determined upload mode comprises:
if the upload mode of the voice information is one-time uploading, determining whether the voice information carries an end identifier;
if the voice information does not carry an end identifier, storing the voice information in a temporary folder;
if the voice information carries an end identifier, storing the voice information in the temporary folder and obtaining packaging parameters for the one-time uploading; and
packaging the voice information in the temporary folder according to the one-time-upload packaging parameters and uploading the package to the cloud server.
7. A voice information processing apparatus, comprising:
an information obtaining unit, configured to obtain attribute information of a smart device and voice information sent by the smart device;
an information recognition unit, configured to perform voice recognition on the voice information and determine an operation type of a processing operation corresponding to the voice information;
an upload mode determining unit, configured to determine an upload mode of the voice information according to the attribute information of the smart device and/or the operation type; and
an information uploading unit, configured to upload the voice information to a cloud server based on the determined upload mode.
8. The voice information processing apparatus according to claim 7, wherein the upload mode determining unit comprises:
an application scene determining module, configured to determine, according to the attribute information of the smart device, an application scene type corresponding to the attribute information of the smart device; and
a first mode determining module, configured to determine the upload mode of the voice information corresponding to the application scene type of the smart device according to the application scene type and a preset scene upload-mode comparison table.
9. An intelligent terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice information processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the voice information processing method according to any one of claims 1 to 6.
CN201911363138.7A | Priority date: 2019-12-26 | Filing date: 2019-12-26 | Voice information processing method and device, storage medium and intelligent terminal | Active | Granted as CN111210826B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911363138.7A (CN111210826B) | 2019-12-26 | 2019-12-26 | Voice information processing method and device, storage medium and intelligent terminal

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911363138.7A (CN111210826B) | 2019-12-26 | 2019-12-26 | Voice information processing method and device, storage medium and intelligent terminal

Publications (2)

Publication Number | Publication Date
CN111210826A (en) | 2020-05-29
CN111210826B (en) | 2022-08-05

Family ID: 70786468

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911363138.7A (Active; CN111210826B) | Voice information processing method and device, storage medium and intelligent terminal | 2019-12-26 | 2019-12-26

Country Status (1)

Country | Link
CN (1) | CN111210826B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20110118861A1 (en)* | 2009-11-16 | 2011-05-19 | Yamaha Corporation | Sound processing apparatus
JP2012247679A (en)* | 2011-05-30 | 2012-12-13 | Nippon Telegr & Teleph Corp <Ntt> | Text and voice feature amount collection method, system therefor, and program
US20130096925A1 (en)* | 2011-10-13 | 2013-04-18 | Kia Motors Corporation | System for providing a sound source information management service
CN103617797A (en)* | 2013-12-09 | 2014-03-05 | 腾讯科技(深圳)有限公司 | Voice processing method and device
JP2017107198A (en)* | 2015-12-02 | 2017-06-15 | 悠之介 北 | Voice collection method and voice transplantation method
US20190166423A1 (en)* | 2016-07-27 | 2019-05-30 | Sound Devices Llc | Network system for reliable reception of wireless audio
CN107342079A (en)* | 2017-07-05 | 2017-11-10 | 谌勋 | A kind of acquisition system of the true voice based on internet
CN108010518A (en)* | 2017-12-13 | 2018-05-08 | 腾讯科技(深圳)有限公司 | A kind of voice acquisition method, system and the storage medium of interactive voice equipment
CN109961781A (en)* | 2017-12-22 | 2019-07-02 | 深圳市优必选科技有限公司 | Robot-based voice information receiving method and system and terminal equipment
US20190198041A1 (en)* | 2017-12-27 | 2019-06-27 | Toyota Jidosha Kabushiki Kaisha | Information providing apparatus
CN108510290A (en)* | 2018-03-12 | 2018-09-07 | 平安科技(深圳)有限公司 | Customer information amending method, device, computer equipment and storage medium in call
CN109192205A (en)* | 2018-09-12 | 2019-01-11 | 深圳市酷搏创新科技有限公司 | A kind of intelligent speech interactive system and its control method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG YONGXIA: "FPGA-based High-speed Data Acquisition and Transmission of Voice Logging System", Electronic Science and Technology, 15 September 2011, pages 42-44 *
SHI RONGRONG: "Design and Implementation of a Voice Acquisition System Based on the USB 2.0 Interface", China Master's Theses Full-text Database, Information Science and Technology Series, 30 April 2007, pages 140-315 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113782021A (en)* | 2021-09-14 | 2021-12-10 | 海信电子科技(武汉)有限公司 | Display device and prompt tone playing method
CN113782021B (en)* | 2021-09-14 | 2023-10-24 | Vidaa(荷兰)国际控股有限公司 | Display equipment and prompt tone playing method
CN115866012A (en)* | 2021-09-24 | 2023-03-28 | 中移(杭州)信息技术有限公司 | Set top box control method and device, electronic equipment and storage medium
CN113886138A (en)* | 2021-09-29 | 2022-01-04 | 深信服科技股份有限公司 | A user profile management method, apparatus, device and computer medium
CN114157523A (en)* | 2021-11-24 | 2022-03-08 | 珠海格力电器股份有限公司 | Data reporting method and device, intelligent household equipment and storage medium
CN114157523B (en)* | 2021-11-24 | 2022-10-11 | 珠海格力电器股份有限公司 | Data reporting method and device, intelligent household equipment and storage medium

Also Published As

Publication number | Publication date
CN111210826B (en) | 2022-08-05

Similar Documents

Publication | Title
CN111210826B (en)Voice information processing method and device, storage medium and intelligent terminal
CN113656503A (en)Data synchronization method, device and system and computer readable storage medium
CN108259318A (en) A method and device for distributing information
CN110474794B (en) An information conversion method and system for SDN architecture
CN106383764A (en)Data acquisition method and device
WO2017107514A1 (en)Offline transcoding method and system
CN109992252B (en)Data analysis method, terminal, device and storage medium based on Internet of things
CN103414762B (en)cloud backup method and device
CN111382241A (en)Session scene switching method and device
CN112350986B (en) Fragmentation shaping method and system for audio and video network transmission
CN109089174B (en) Method and device for processing multimedia data stream, and computer storage medium
CN112182046A (en)Information recommendation method, device, equipment and medium
CN113746932B (en) Network request merging method, device and electronic device, computer program product
CN112182327A (en)Data processing method, device, equipment and medium
CN110096612A (en)The acquisition methods and system of the online audio analysis data of voice log
CN108874994A (en)A kind of piecemeal reads the method, apparatus and computer storage medium of data
CN117041825A (en)Multi-audio output method, system, computer equipment and medium
CN106230880A (en)A kind of storage method of data and application server
CN115460446A (en) Alignment method, device and electronic equipment for multi-channel video signals
CN110516043A (en) Answer generation method and device for question answering system
CN112182047A (en)Information recommendation method, device, equipment and medium
CN115766977A (en)Video processing method, device, equipment and storage medium
CN113206997B (en)Method and device for simultaneously detecting quality of multi-service recorded audio data
CN115982160A (en)Data processing method, server, electronic device, and computer storage medium
CN115589479A (en) Debugging and testing method and system for camera interface of vehicle infotainment system

Legal Events

Date | Code | Title
— | PB01 | Publication
— | SE01 | Entry into force of request for substantive examination
— | GR01 | Patent grant
2023-12-04 | TR01 | Transfer of patent right

Transfer of patent right — effective date of registration: 2023-12-04
Patentee after: Beijing Youbixuan Intelligent Robot Co., Ltd.
Address after: Room 601, 6th Floor, Building 13, No. 3 Jinghai Fifth Road, Beijing Economic and Technological Development Zone (Tongzhou), Tongzhou District, Beijing, 100176
Patentee before: Shenzhen UBTECH Technology Co., Ltd.
Address before: 518000, 16th and 22nd Floors, C1 Building, Nanshan Zhiyuan, 1001 Xueyuan Avenue, Nanshan District, Shenzhen City, Guangdong Province

