CN115171672B - Voice processing method, device and computer storage medium - Google Patents

Voice processing method, device and computer storage medium

Info

Publication number
CN115171672B
Authority
CN
China
Prior art keywords
voice
voice operation
operation command
user
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110359571.4A
Other languages
Chinese (zh)
Other versions
CN115171672A (en)
Inventor
应臻恺
时红仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qwik Smart Technology Co Ltd
Original Assignee
Shanghai Qwik Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qwik Smart Technology Co Ltd
Priority to CN202110359571.4A
Publication of CN115171672A
Application granted
Publication of CN115171672B
Legal status: Active (Current)
Anticipated expiration

Abstract

Translated from Chinese


The present application discloses a speech processing method, device, and computer storage medium, the method comprising the following steps: obtaining at least one second voice operation command for a target operation object that has been input by a user before the user inputs a first voice operation command for the target operation object; wherein the operation performed based on the first voice operation command meets the user's needs, while the operation performed based on the second voice operation command does not meet the user's needs; performing a speech recognition satisfaction analysis based on the number of the second voice operation commands to obtain an analysis result for characterizing the user's satisfaction with speech recognition. The speech processing method, device, and computer storage medium provided in the present application can provide assistance for speech recognition optimization by analyzing the user's use of speech recognition and obtaining the user's satisfaction with speech recognition.

Description

Speech processing method, device and computer storage medium
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a speech processing method, apparatus, and computer storage medium.
Background
With the rapid development of voice recognition technology, voice recognition functions are increasingly common on vehicles and increasingly used by drivers. In order to optimize speech recognition, it is generally necessary to analyze the user's satisfaction with it. Existing ways of analyzing this satisfaction mainly rely on checking whether the speech input by the user is successfully recognized. However, even when the input speech is successfully recognized, the operation the vehicle performs based on it may not be the one the user wanted, and this also affects the user's satisfaction with speech recognition.
The foregoing description is provided for general background information and does not necessarily constitute prior art.
Disclosure of Invention
It is an object of the present invention to provide a voice processing method, apparatus and computer storage medium that obtain a user's satisfaction with voice recognition by analyzing how the user uses voice recognition, thereby providing assistance for voice recognition optimization.
Another object of the present invention is to provide a voice processing method, apparatus and computer storage medium that accurately obtain the satisfaction with voice recognition of the corresponding voice operation commands by analyzing the input times of different voice operation commands, in a way that is simple and convenient to operate.
It is another object of the present invention to provide a voice processing method, apparatus and computer storage medium that further facilitate voice recognition optimization by improving on voice operation commands whose recognition results do not satisfy the user's needs.
Other advantages and features of the present invention will become more fully apparent from the following detailed description, and may be learned by the practice of the invention as set forth hereinafter.
According to one aspect of the present invention, the foregoing and other objects and advantages are achieved by a speech processing method comprising the following steps:
acquiring at least one second voice operation command input by a user to a target operation object before the user inputs a first voice operation command to the target operation object, wherein the operation executed based on the first voice operation command meets the user's requirement and the operation executed based on the second voice operation command does not meet the user's requirement;
performing voice recognition satisfaction analysis according to the number of the second voice operation commands to obtain an analysis result representing the user's satisfaction with voice recognition.
According to one embodiment of the present invention, the step of obtaining at least one second voice operation command input by the user to the target operation object before inputting the first voice operation command to the target operation object includes the following steps:
Acquiring a historical voice operation command set consisting of historical voice operation commands with adjacent input times and time intervals meeting preset conditions;
And determining the historical voice operation commands corresponding to the latest input time in the historical voice operation command set as a first voice operation command, and determining the historical voice operation commands except the historical voice operation commands corresponding to the latest input time as a second voice operation command.
According to one embodiment of the present invention, after the voice recognition satisfaction analysis is performed according to the number of the second voice operation commands, the method further includes the following steps:
And taking the second voice operation command as the input of a set voice recognition model, and training the voice recognition model based on the operation executed by the first voice operation command as the output of the voice recognition model.
Correspondingly, the present invention provides a device for executing the above voice processing method, comprising a memory, a processor, and a computer program stored in the memory and capable of running on the processor. When executing the computer program, the processor implements the following steps: acquiring at least one second voice operation command input by the user to a target operation object before the user inputs a first voice operation command to the target operation object, wherein the operation executed based on the first voice operation command meets the user's requirement and the operation executed based on the second voice operation command does not; and performing voice recognition satisfaction analysis according to the number of the second voice operation commands to obtain an analysis result representing the user's satisfaction with voice recognition.
Accordingly, the present invention provides a computer storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described speech processing method.
Drawings
Fig. 1 is a schematic flow chart of a voice processing method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a speech processing device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element. Unless otherwise defined, terms used in different embodiments of the present application have the meaning explained in the particular embodiment in which they appear or a meaning determined from the context of that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information, without departing from the scope herein. The term "if" as used herein may be interpreted as "upon", "when", or "in response to a determination", depending on the context. Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, steps, operations, elements, components, items, categories, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, categories, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition occurs only when a combination of elements, functions, steps or operations is in some way inherently mutually exclusive.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of the steps is not strictly limited and they may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It should be noted that, in this document, step numbers such as S101 and S102 are adopted, and the purpose of the present application is to more clearly and briefly describe the corresponding content, and not to constitute a substantial limitation on the sequence, and those skilled in the art may execute S102 before executing S101 in the implementation, which are all within the scope of the present application.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In the following description, suffixes such as "module", "part" or "unit" for representing elements are used only for facilitating the description of the present application, and have no specific meaning per se. Thus, "module," "component," or "unit" may be used in combination.
Referring to fig. 1, a schematic flow chart of a voice processing method provided by an embodiment of the present invention is shown. The method is suitable for analyzing a user's satisfaction with voice recognition and may be performed by a voice processing device provided by an embodiment of the present invention. The voice processing device may be implemented in software and/or hardware and may specifically be a terminal such as a mobile phone, a vehicle head unit (car machine), a wearable device, or a server. In this embodiment, the method is described by taking its application to a vehicle head unit as an example. The method includes the following steps:
Step S101, acquiring at least one second voice operation command input by a user to a target operation object before inputting the first voice operation command to the target operation object, wherein the operation executed based on the first voice operation command meets the user requirement, and the operation executed based on the second voice operation command does not meet the user requirement;
That the operation performed based on the first voice operation command meets the user's requirement means that this operation is the one the user wanted to perform or achieve. For example, if the user wants to turn on the air conditioner and the operation performed based on the first voice operation command is to turn on the air conditioner, the command meets the user's requirement; if the operation performed is not to turn on the air conditioner but, for example, to open the window, the command does not meet the user's requirement. It should be noted that the head unit successfully recognizes both the first voice operation command and the second voice operation command; only the recognition results differ, i.e., the head unit performs different operations based on the first and the second voice operation command. In addition, the input time interval between the first voice operation command and the second voice operation command should be less than a preset duration threshold. The target operation object may be a concrete object, such as an air conditioner or a car window, or an abstract object, such as an on-board radio or a multimedia application.
In other words, when the operation performed based on the first voice operation command satisfies the user's requirement, the head unit recognized the first voice operation command successfully and the corresponding operation is the one the user wanted; when the operation performed based on the second voice operation command does not satisfy the user's requirement, the head unit recognized the second voice operation command successfully but the corresponding operation is not the one the user wanted. It will be appreciated that when the user performs voice control through the head unit and wants it to perform operation A, the user inputs a corresponding voice operation command; due to technical factors such as recognition accuracy and/or factors such as accent, the head unit may recognize the command as requesting operation B. The user then continues to input the same voice operation command, or an adjusted one, until the head unit successfully performs operation A based on an input command, at which point the voice input stops. The last input voice operation command is then determined as the first voice operation command, and the voice operation commands whose input times precede it are determined as second voice operation commands.
In one embodiment, the step of obtaining at least one second voice operation command input by the user to the target operation object before inputting the first voice operation command to the target operation object includes the following steps:
Acquiring a historical voice operation command set consisting of historical voice operation commands with adjacent input times and time intervals meeting preset conditions;
And determining the historical voice operation commands corresponding to the latest input time in the historical voice operation command set as a first voice operation command, and determining the historical voice operation commands except the historical voice operation commands corresponding to the latest input time as a second voice operation command.
It can be understood that a user typically continues to input a voice operation command right after the head unit fails to act correctly on the previous one, so the commands are related in time: the interval between every two successive commands is not too long and is smaller than the interval at which the user normally uses the voice recognition function. The first voice operation command and the second voice operation commands can therefore be acquired based on input time and time interval. It should be noted that, because the operations to be performed by the head unit differ, there may be multiple acquired first voice operation commands and, correspondingly, multiple second voice operation commands. The preset condition may include that the time interval is smaller than the minimum time interval or the average time interval at which the user uses the voice recognition function. Here, historical data of the user using the voice recognition function may be collected and analyzed to obtain the characteristics or habits of use, such as the maximum, minimum, or average time interval. In this way, the first and second voice operation commands are acquired based on input time and time interval, and the satisfaction with voice recognition of the corresponding commands is accurately obtained by analyzing the input times of different commands, which is simple and convenient to implement.
In an embodiment, the step of obtaining the historical voice operation command set including the historical voice operation commands with adjacent input times and time intervals meeting the preset condition includes the following steps:
sequencing historical voice operation commands input by a user within a preset duration according to the sequence of the input time from front to back;
determining a target historical voice operation command from the sorted historical voice operation commands, wherein the time interval between the input time of the target historical voice operation command and the input time of the previous historical voice operation command does not meet the preset condition, while the time interval between its input time and the input time of the next historical voice operation command meets the preset condition;
And taking the target historical voice operation command as a starting point, sequentially selecting the historical voice operation commands with adjacent input time and time intervals meeting preset conditions from the sequenced historical voice operation commands, and adding the historical voice operation commands into a historical voice operation command set.
The preset duration may be set according to actual needs, for example to 30 days or 60 days. When the time interval between the input time of a historical voice operation command and that of the previous historical command does not satisfy the preset condition, while the interval to the next historical command does, the user intends this command and the next one to perform the same operation (their content should be the same), whereas this command and the previous one are intended to perform different operations (their content should be different). It can be appreciated that when the operation performed by the head unit based on the currently input voice operation command does not meet the user's requirement, the user will quickly input the next command; that is, to make the head unit perform the same operation, the input interval between successive commands is short. Therefore, taking the target historical voice operation command as a starting point, historical voice operation commands with adjacent input times whose intervals meet the preset condition are selected in sequence from the sorted historical commands and added to the historical voice operation command set. In this way, the required voice operation commands can be accurately extracted by analyzing the input times of the historical voice operation commands, further improving the accuracy of acquiring the voice recognition satisfaction.
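To make the grouping described above concrete, the following is a minimal Python sketch. It is not part of the patent text: the function names, the VoiceCommand structure, and the 10-second threshold are illustrative assumptions standing in for the "preset condition".

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class VoiceCommand:
        text: str          # recognized text of the command
        input_time: float  # input timestamp, in seconds

    def group_history_commands(history: List[VoiceCommand],
                               max_gap_s: float = 10.0) -> List[List[VoiceCommand]]:
        """Split the time-ordered history into sets whose adjacent input times
        differ by at most max_gap_s (the assumed preset condition)."""
        ordered = sorted(history, key=lambda c: c.input_time)  # earliest first
        groups: List[List[VoiceCommand]] = []
        current: List[VoiceCommand] = []
        for cmd in ordered:
            if current and cmd.input_time - current[-1].input_time > max_gap_s:
                if len(current) > 1:        # keep only sets that contain retries
                    groups.append(current)
                current = []
            current.append(cmd)
        if len(current) > 1:
            groups.append(current)
        return groups

    def split_first_and_second(group: List[VoiceCommand]) -> Tuple[VoiceCommand, List[VoiceCommand]]:
        """The command with the latest input time is the first voice operation
        command (it satisfied the user); all earlier ones are second commands."""
        return group[-1], group[:-1]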
Step S102, performing voice recognition satisfaction analysis according to the number of the second voice operation commands to obtain an analysis result representing the user's satisfaction with voice recognition.
Here, a smaller number of second voice operation commands means that the user input fewer voice operation commands, that the head unit's voice recognition accuracy is higher, and that the user's satisfaction with voice recognition is higher; a larger number of second voice operation commands means that the user input more commands, that the recognition accuracy is lower, and that the satisfaction is lower. Therefore, the analysis result obtained by analyzing satisfaction according to the number of second voice operation commands can represent the user's satisfaction with voice recognition, i.e., it reflects the accuracy of voice recognition, and can thus assist in optimizing voice recognition.
In one embodiment, performing the voice recognition satisfaction analysis according to the number of the second voice operation commands to obtain an analysis result representing the user's satisfaction with voice recognition includes the following step: determining a target satisfaction level corresponding to the number of second voice operation commands according to that number and a preset correspondence between different numbers of voice operation commands and satisfaction levels, so as to obtain the analysis result. The preset correspondence can be set according to actual needs; for example, when the number of second voice operation commands is 0, the satisfaction level may be set to level ten; when the number is 1, to level nine; and so on. In addition, the voice recognition satisfaction may also be scored according to the number of second voice operation commands. In this way, the user's satisfaction with voice recognition can be evaluated quickly, improving analysis efficiency.
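As a minimal sketch of such a correspondence, assuming the illustrative ten-level scale mentioned above (the exact table is a design choice, not fixed by the patent):

    def satisfaction_level(num_second_commands: int) -> int:
        """Map the number of second voice operation commands to a satisfaction
        level: 0 retries -> level 10, 1 retry -> level 9, ..., floored at 1."""
        return max(1, 10 - num_second_commands)

    # Example: two failed attempts before the successful command -> level 8
    assert satisfaction_level(0) == 10
    assert satisfaction_level(2) == 8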
In summary, in the voice processing method provided in the above embodiment, by analyzing the situation that the user uses voice recognition, the satisfaction degree of the user on voice recognition is obtained, so that the voice processing method can provide assistance for voice recognition optimization.
In one embodiment, after the voice recognition satisfaction analysis is performed according to the number of the second voice operation commands to obtain the analysis result representing the user's satisfaction with voice recognition, the method further includes the following step: taking the second voice operation command as the input of a set voice recognition model and the operation executed based on the first voice operation command as the output of the model, and training the voice recognition model. It can be appreciated that, since the operation performed based on the first voice operation command meets the user's requirement while the operation performed based on the second voice operation command does not, the set voice recognition model can be considered to have mis-recognized the second voice operation command. The factors affecting the recognition result can be varied, such as regional accent differences, homophones, or polyphonic characters. Therefore, the second voice operation command is used as the input of the set voice recognition model and the operation executed based on the first voice operation command as its output, and the model is trained to improve its adaptability, which correspondingly improves recognition accuracy. It should be noted that the voice recognition model may be established using an artificial intelligence algorithm, such as a genetic algorithm or a neural network algorithm, based on historical voice operation commands of different users and the corresponding recognition results. Training the model with voice operation commands actually input by the user thus effectively improves its adaptability and, correspondingly, the recognition precision.
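Purely as an illustration of how such training pairs could be assembled (reusing the split_first_and_second helper from the earlier sketch; the downstream model interface is hypothetical and would be replaced by whatever recognizer is actually in use):

    from typing import List, Tuple

    def build_training_pairs(groups: List[List["VoiceCommand"]]) -> List[Tuple[str, str]]:
        """Pair every mis-handled (second) command with the operation that was
        finally executed for the first command, as an (input, target) sample."""
        pairs: List[Tuple[str, str]] = []
        for group in groups:
            first, second = split_first_and_second(group)
            desired_operation = first.text   # stand-in for the executed operation
            for cmd in second:
                pairs.append((cmd.text, desired_operation))
        return pairs

    # The pairs can then be fed to whatever speech/intent model is in use, e.g.:
    # model.fit(inputs=[p for p, _ in pairs], targets=[t for _, t in pairs])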
In an embodiment, after the voice recognition satisfaction analysis is performed according to the number of the second voice operation commands to obtain an analysis result for characterizing the user satisfaction of voice recognition, the method further includes the following steps:
carrying out semantic recognition on the second voice operation command to obtain at least one keyword;
and establishing an association relation between the at least one keyword and an operation executed based on the first voice operation command, and storing the association relation into a set voice command library.
It can be understood that, by performing semantic recognition on the second voice operation command, at least one keyword contained in it can be obtained, and whether the keywords contained in a voice operation command are correctly recognized affects the accuracy of the voice recognition result. To improve the satisfaction with voice recognition, an association is established between the keywords contained in a second voice operation command, whose executed operation did not meet the user's requirement, and the operation executed based on the first voice operation command, and the association is stored in a set voice command library, so that when the user subsequently inputs the second voice operation command, the operation executed based on it meets the user's requirement. For example, a recognition error may arise from a polyphonic character: the user's utterance corresponds to the character's second-tone reading, but the head unit recognizes the word corresponding to its fourth-tone reading, so the operation it performs based on that input may not meet the user's requirement. In that case, the word corresponding to the fourth-tone reading can be associated with the operation that is correctly triggered by the second-tone reading, thereby improving the user's satisfaction with voice recognition. In this way, by analyzing the different voice operation commands directed at the target operation object, an operation that meets the user's requirement can be correctly triggered even when the user later inputs a command whose execution previously failed to meet the requirement, further improving the user's satisfaction with voice recognition.
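A minimal sketch of the keyword-to-operation association described above; the keyword extraction and the dictionary-based command library are illustrative assumptions (a real system would use its own semantic recognizer and storage):

    from typing import Dict, List, Optional

    voice_command_library: Dict[str, str] = {}   # keyword -> operation identifier

    def extract_keywords(command_text: str) -> List[str]:
        # Placeholder for real semantic recognition; here just whitespace tokens.
        return command_text.split()

    def learn_association(second_command_text: str, executed_operation: str) -> None:
        """Associate keywords of a previously mis-handled command with the
        operation that finally satisfied the user, so the same wording works later."""
        for kw in extract_keywords(second_command_text):
            voice_command_library[kw] = executed_operation

    def lookup_operation(command_text: str) -> Optional[str]:
        """Return a stored operation if any keyword of the input is already known."""
        for kw in extract_keywords(command_text):
            if kw in voice_command_library:
                return voice_command_library[kw]
        return None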
In an embodiment, after the association relationship is stored in the voice command library, the method further includes the following steps:
outputting a prompt message, wherein the prompt message is used to indicate that inputting the second voice operation command can now trigger the operation executed based on the first voice operation command.
It can be appreciated that after a second voice operation command is input to the head unit and the operation performed based on it does not meet the user's requirement, some users will keep trying to input voice while others will stop using the voice function. Therefore, after the association between the at least one keyword and the operation executed based on the first voice operation command has been established, a prompt message can be output to indicate that inputting the second voice operation command will now execute the operation executed based on the first voice operation command, encouraging the user to use the voice function and improving the convenience and accuracy of the voice recognition function. For instance, suppose the user once said "blow air on the windows" and the head unit replied "I do not understand what you said"; the user may then stop using the voice function and think poorly of it. A month later, the head unit can prompt: "You can say 'blow air on the windows' to open the defrosting function."
Based on the same inventive concept as the previous embodiments, the present embodiment describes in detail, by different examples, the technical solution of the speech processing method provided in the previous embodiments.
Example one
The voice processing method provided by this example aims to provide an overall index characterizing the maturity of the voice recognition capability by collecting effectiveness statistics on the user's voice operation behavior, establishing a data model of the voice function, and monitoring and improving user satisfaction.
The implementation principle of the voice processing method provided in this example is as follows:
Firstly, when voice recognition is started, the digital signal of the speech is collected, recorded, and stored as a binary voice file;
Then, the recognition result is recorded and analyzed. If no operation function is correspondingly triggered, the user's speech was not recognized correctly. The correctness of a voice-triggered operation can also be judged by looking at the manual operations that follow. For example, if the user starts navigation directly after inputting the destination by voice, the operation executed based on the voice command was correct, i.e., the navigation destination was right; otherwise the user is very likely to input voice again, possibly several times, or even fall back to text input, which means the operation executed based on the voice command was incorrect and the recognition rate is poor. Similarly, if the user controls song playback by voice and the corresponding song plays normally with no further user operation, the requirement was met; if the user keeps inputting related voice instructions, the requirement was not met, i.e., the song identified from the speech was incorrect.
Finally, scoring is performed according to the number of voice operations: for example, if the voice operation is completed on the first attempt, the score is 100 points; if the second voice operation is recognized successfully, 90 points; and so on. Of course, the success rate of the user's voice command operations can also be counted and graded on a scale of 1 to 10.
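A minimal sketch of the attempt-based scoring described in this example (the 10-points-per-extra-attempt rule follows the example's own illustration and is not a fixed specification):

    def attempt_score(num_attempts: int) -> int:
        """100 points if the voice operation succeeds on the first attempt,
        90 on the second, and so on, floored at 0."""
        return max(0, 100 - 10 * (num_attempts - 1))

    assert attempt_score(1) == 100
    assert attempt_score(2) == 90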
Example two
The voice processing method provided by this example aims to analyze the timing of voice instructions, detect whether the user still uses the voice functions, improve the cloud model for speech that was previously not recognized and for commands that were not executed, remotely upgrade the command library, and prompt the user that these voice instructions can now be used; the prompts differ for different users and for different voice inputs.
The implementation principle of the voice processing method provided in this example is as follows:
Firstly, the times and frequency at which the user uses voice recognition are counted, and the user's voice commands are arranged in time order;
then, the time-ordered voice commands are analyzed to obtain the maximum interval time, the time of the last command, and the time of the one before it;
then, the user's prior voice input frequency is examined: if the interval between successive commands is within a few seconds, the user is inputting voice repeatedly, which indicates that the recognition accuracy of the user's voice commands is low, and the user is judged to be unsatisfied at that moment;
then, voice analysis and statistics are performed on the user's original voice files, and the speech that was not correctly recognized is detected, analyzed, and incrementally improved;
finally, these users are individually prompted about the previously unrecognized speech, indicating that these voice commands can now be recognized and have been added.
In this way, the maturity of the user's use of the voice function can be monitored to analyze and improve the user's satisfaction with voice commands and to capture the user's intent, improving the user experience. It also makes it convenient to collect statistics on the main purposes for which the user uses voice, improves the convenience and accuracy of the functions, provides a basis for diagnosing why a user's instructions fail, and facilitates debugging and improvement.
Based on the same inventive concept as the previous embodiments, an embodiment of the present invention provides a speech processing apparatus. As shown in fig. 2, the apparatus includes a processor 110 and a memory 111 for storing a computer program capable of running on the processor 110. The single processor 110 shown in fig. 2 is not meant to limit the number of processors but only to indicate the positional relationship of the processor 110 with respect to the other components; in practical applications there may be one or more processors 110. The same applies to the memory 111 shown in fig. 2: in practical applications there may be one or more memories 111. The processor 110 is configured to implement the steps of the above speech processing method when running the computer program.
The speech processing device can also include at least one network interface 112. The various components of the speech processing device are coupled together by a bus system 113. It is understood that the bus system 113 is used to enable connected communications between these components. The bus system 113 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled in fig. 2 as bus system 113.
The memory 111 may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 111 described in embodiments of the present invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory 111 in the embodiment of the present invention is used to store various types of data to support the operation of the speech processing device. Examples of such data include any computer program for operating on the speech processing device, such as an operating system and application programs, as well as contact data, phonebook data, messages, pictures, videos, etc. The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. The application programs may include various applications, such as a media player and a browser, for implementing various application services. A program implementing the method of the embodiments of the present invention may be included in an application program.
Based on the same inventive concept as the previous embodiments, this embodiment further provides a computer storage medium in which a computer program is stored. The computer storage medium may be a Ferromagnetic Random Access Memory (FRAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM), or may be any of various devices including one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant. The computer program, when executed by a processor, implements the steps of the above speech processing method.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of these technical features, they should be considered to be within the scope of this description.
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed.
The foregoing is merely illustrative of the present invention and is not intended to limit it; any variation or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

Translated from Chinese
1. A speech processing method, characterized in that the method comprises the following steps:
obtaining at least one second voice operation command for a target operation object that has been input by a user before the user inputs a first voice operation command for the target operation object; wherein an operation performed based on the first voice operation command meets the user's needs, and an operation performed based on the second voice operation command does not meet the user's needs;
performing a voice recognition satisfaction analysis based on the number of the second voice operation commands to obtain an analysis result representing the user's satisfaction with voice recognition; wherein the number of the second voice operation commands is negatively correlated with the voice recognition satisfaction;
the step of obtaining at least one second voice operation command for the target operation object that has been input by the user before the user inputs the first voice operation command for the target operation object comprises the following steps:
obtaining a historical voice operation command set consisting of historical voice operation commands whose input times are adjacent and whose time intervals meet a preset condition;
determining the historical voice operation command corresponding to the latest input time in the historical voice operation command set as the first voice operation command, and determining the historical voice operation commands other than the one corresponding to the latest input time as second voice operation commands.
2. The method according to claim 1, wherein the preset condition includes that the time interval is less than a minimum time interval or an average time interval of the user using the voice recognition function.
3. The method according to claim 1 or 2, wherein obtaining the historical voice operation command set consisting of historical voice operation commands whose input times are adjacent and whose time intervals meet the preset condition comprises the following steps:
sorting the historical voice operation commands input by the user within a preset time period in order of input time from earliest to latest;
determining a target historical voice operation command from the sorted historical voice operation commands, wherein the time interval between the input time of the target historical voice operation command and the input time of the previous historical voice operation command does not meet the preset condition, while the time interval between its input time and the input time of the next historical voice operation command meets the preset condition;
taking the target historical voice operation command as a starting point, selecting in sequence, from the sorted historical voice operation commands, historical voice operation commands whose input times are adjacent and whose time intervals meet the preset condition, and adding them to the historical voice operation command set.
4. The method according to claim 1, further comprising, after the voice recognition satisfaction analysis is performed based on the number of the second voice operation commands to obtain the analysis result representing the user's satisfaction with voice recognition, the following step:
using the second voice operation command as the input of a set voice recognition model and the operation performed based on the first voice operation command as the output of the voice recognition model, training the voice recognition model.
5. The method according to claim 1, further comprising, after the voice recognition satisfaction analysis is performed based on the number of the second voice operation commands to obtain the analysis result representing the user's satisfaction with voice recognition, the following steps:
performing semantic recognition on the second voice operation command to obtain at least one keyword;
establishing an association between the at least one keyword and the operation executed based on the first voice operation command, and storing the association in a set voice command library.
6. The method according to claim 5, further comprising, after storing the association in the set voice command library, the following step:
outputting a prompt message, wherein the prompt message is used to indicate that inputting the second voice operation command can execute the operation performed based on the first voice operation command.
7. The method according to claim 1, wherein performing the voice recognition satisfaction analysis based on the number of the second voice operation commands to obtain the analysis result representing the user's satisfaction with voice recognition comprises the following step:
determining, based on the number of the second voice operation commands and a preset correspondence between different numbers of voice operation commands and satisfaction levels, a target satisfaction level corresponding to the number of the second voice operation commands, so as to obtain the analysis result representing the user's satisfaction with voice recognition.
8. A speech processing apparatus, comprising:
a memory configured to store one or more computer programs; and
a processor coupled to the memory and configured to execute the one or more computer programs to cause the speech processing apparatus to perform the steps of the speech processing method according to any one of claims 1 to 7.
9. A computer storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the speech processing method according to any one of claims 1 to 7 are implemented.
CN202110359571.4A | 2021-04-02 | 2021-04-02 | Voice processing method, device and computer storage medium | Active | CN115171672B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110359571.4A | 2021-04-02 | 2021-04-02 | Voice processing method, device and computer storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110359571.4A | 2021-04-02 | 2021-04-02 | Voice processing method, device and computer storage medium

Publications (2)

Publication Number | Publication Date
CN115171672A (en) | 2022-10-11
CN115171672B | 2025-08-26

Family

ID=83476373

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110359571.4A (Active, CN115171672B (en)) | Voice processing method, device and computer storage medium | 2021-04-02 | 2021-04-02

Country Status (1)

Country | Link
CN (1) | CN115171672B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN118447848B (en)* | 2024-07-08 | 2024-09-10 | 北京华信有道科技有限公司 | Voice interaction control method and system for central control host and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
FR3102603A1 (en)* | 2019-10-24 | 2021-04-30 | Psa Automobiles Sa | Method and device for evaluating a voice recognition system
WO2023226700A1 (en)* | 2022-05-27 | 2023-11-30 | 京东方科技集团股份有限公司 | Voice interaction method and apparatus, electronic device, and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7724888B1 (en)* | 2005-03-18 | 2010-05-25 | Bevocal Llc | Automated method for determining caller satisfaction
KR101556594B1 (en)* | 2009-01-14 | 2015-10-01 | 삼성전자 주식회사 | Speech recognition method in signal processing apparatus and signal processing apparatus
JP6751658B2 (en)* | 2016-11-15 | 2020-09-09 | クラリオン株式会社 | Voice recognition device, voice recognition system
US20180315415A1 (en)* | 2017-04-26 | 2018-11-01 | Soundhound, Inc. | Virtual assistant with error identification
US10504514B2 (en)* | 2017-09-29 | 2019-12-10 | Visteon Global Technologies, Inc. | Human machine interface system and method for improving user experience based on history of voice activity
US10872604B2 (en)* | 2018-05-17 | 2020-12-22 | Qualcomm Incorporated | User experience evaluation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
FR3102603A1 (en)* | 2019-10-24 | 2021-04-30 | Psa Automobiles Sa | Method and device for evaluating a voice recognition system
WO2023226700A1 (en)* | 2022-05-27 | 2023-11-30 | 京东方科技集团股份有限公司 | Voice interaction method and apparatus, electronic device, and storage medium

Also Published As

Publication number | Publication date
CN115171672A (en) | 2022-10-11

Similar Documents

Publication | Publication Date | Title
CN108121795B (en) User behavior prediction method and device
CN110415705B (en) Hot word recognition method, system, device and storage medium
CN111797632B (en) Information processing method and device and electronic equipment
US11133002B2 (en) Systems and methods of real-time vehicle-based analytics and uses thereof
CN110544473B (en) Voice interaction method and device
CN112825249B (en) Speech processing method and device
CN110400563A (en) Vehicle voice command recognition method, device, computer equipment and storage medium
US20180166103A1 (en) Method and device for processing speech based on artificial intelligence
CN110675862A (en) Corpus acquisition method, electronic device and storage medium
CN112631850A (en) Fault scene simulation method and device
JP6495792B2 (en) Speech recognition apparatus, speech recognition method, and program
CN107256428A (en) Data processing method, data processing equipment, storage device and the network equipment
CN111125658A (en) Method, device, server and storage medium for identifying fraudulent users
CN112017663B (en) Voice generalization method and device and computer storage medium
US11664015B2 (en) Method for searching for contents having same voice as voice of target speaker, and apparatus for executing same
CN111540363B (en) Keyword model and decoding network construction method, detection method and related equipment
KR102737990B1 (en) Electronic device and method for training an artificial intelligence model related to a chatbot using voice data
CN114155860A (en) Abstract recording method, apparatus, computer equipment and storage medium
CN115171672B (en) Voice processing method, device and computer storage medium
CN111883109B (en) Voice information processing and verification model training method, device, equipment and medium
CN119943034A (en) Voice command recognition method, system, device and medium based on intention prediction
CN110704592B (en) Statement analysis processing method, apparatus, computer equipment and storage medium
CN115240659B (en) Classification model training method and device, computer equipment and storage medium
GB2557710A (en) Identifying contacts using speech recognition
CN113392009B (en) Automated testing exception handling method and device

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
