Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the accompanying drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with aspects of the application as recited in the appended claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element. Unless otherwise defined, a term used in different embodiments of the application has the meaning given where it is explained, or a meaning determined from the context of the particular embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope herein. The term "if" as used herein may be interpreted as "when," "upon," or "in response to a determination," depending on the context. Furthermore, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, steps, operations, elements, components, items, categories, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, categories, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of A, B, C, A and B, A and C, B and C, or A, B and C." An exception to this definition occurs only when a combination of elements, functions, steps, or operations is in some way inherently mutually exclusive.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It should be noted that step numbers such as S101 and S102 are used in this document only so that the corresponding content can be described more clearly and concisely; they do not constitute a substantive limitation on the sequence. Those skilled in the art may, in practice, execute S102 before S101, and such variations are all within the scope of the present application.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In the following description, suffixes such as "module," "part," or "unit" used to denote elements serve only to facilitate the description of the present application and have no specific meaning in themselves. Thus, "module," "part," and "unit" may be used interchangeably.
Referring to fig. 1, a flowchart of a voice processing method provided by an embodiment of the present invention is shown. The method is suitable for analyzing a user's satisfaction with voice recognition and may be performed by a voice processing device provided by an embodiment of the present invention. The voice processing device may be implemented in software and/or hardware and may specifically be a terminal such as a mobile phone, a vehicle machine (in-vehicle head unit), a wearable device, or a server. In this embodiment, the method is described by taking its application to a vehicle machine as an example, and the method includes the following steps:
Step S101, acquiring at least one second voice operation command input by a user to a target operation object before a first voice operation command is input to the target operation object, wherein the operation executed based on the first voice operation command meets the user requirement and the operation executed based on the second voice operation command does not meet the user requirement;
That the operation performed based on the first voice operation command meets the user requirement means that this operation is the operation the user wants performed. For example, if the user wants the air conditioner turned on and the operation performed based on the first voice operation command is turning on the air conditioner, the operation meets the user requirement; if instead the operation performed is something else, for example opening the window, the operation does not meet the user requirement. It should be noted that the vehicle machine successfully recognizes both the first voice operation command and the second voice operation command; only the recognition results differ, that is, the vehicle machine performs different operations based on the first and second voice operation commands. In addition, the input time interval between the first voice operation command and the second voice operation command should be less than a preset duration threshold. The target operation object may be a concrete object, such as an air conditioner or a car window, or an abstract object, such as an on-board radio or a multimedia application.
That is, when the operation performed based on the first voice operation command satisfies the user requirement, the vehicle machine recognized the first voice operation command successfully and the correspondingly performed operation is the one the user desired; when the operation performed based on the second voice operation command does not satisfy the user requirement, the vehicle machine recognized the second voice operation command successfully but the correspondingly performed operation is not the one the user desired. It will be appreciated that, when the user performs voice control through the vehicle machine and wants it to perform operation A, the user inputs a corresponding voice operation command, but the vehicle machine may recognize the operation to be performed as operation B due to technical factors such as recognition accuracy and/or factors such as accents. The user may then continue to input the same voice operation command, or an adjusted one, until the vehicle machine successfully performs operation A based on an input command, at which point the voice input stops. The last input voice operation command is then determined to be the first voice operation command, and the voice operation commands input before it are determined to be second voice operation commands.
In one embodiment, the step of obtaining at least one second voice operation command input by the user to the target operation object before inputting the first voice operation command to the target operation object includes the following steps:
acquiring a historical voice operation command set consisting of historical voice operation commands whose input times are adjacent and whose time intervals meet a preset condition; and
determining the historical voice operation command with the latest input time in the historical voice operation command set as the first voice operation command, and determining the historical voice operation commands other than that command as second voice operation commands.
It can be understood that a user typically continues to input voice operation commands after the vehicle machine misrecognizes an input command, so the commands are related in time: the interval between every two consecutive commands is not long, and in particular is smaller than the interval at which the user normally uses the voice recognition function. The first voice operation command and the second voice operation commands can therefore be acquired based on input times and time intervals. It should be noted that, since the operations to be performed by the vehicle machine differ, multiple first voice operation commands may be obtained and, correspondingly, multiple second voice operation commands. The preset condition may include that the time interval is smaller than the minimum or the average time interval at which the user uses the voice recognition function. Here, historical data of the user's use of the voice recognition function may be collected and analyzed to obtain the user's characteristics or habits, such as the maximum, minimum, or average time interval. In this way, the first and second voice operation commands are acquired based on input times and time intervals, and the satisfaction with voice recognition of the corresponding commands is accurately obtained by analyzing the input times of different commands; the approach is simple and convenient to operate.
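For illustration only, the splitting of a retry burst into the first (successful) and second (unsatisfying) commands described above may be sketched as follows. The class and function names and the 5-second threshold are assumptions for the sketch, not part of the embodiment:

```python
from dataclasses import dataclass

@dataclass
class VoiceCommand:
    text: str
    input_time: float  # seconds on a common clock

def split_retry_session(commands, max_interval=5.0):
    """Given commands sorted by input_time, walk backward from the last
    command, collecting commands whose adjacent gaps are below
    max_interval; the last command of the burst is the first (successful)
    voice operation command, the rest are second commands."""
    if not commands:
        return None, []
    session = [commands[-1]]
    for cmd in reversed(commands[:-1]):
        if session[0].input_time - cmd.input_time < max_interval:
            session.insert(0, cmd)
        else:
            break
    return session[-1], session[:-1]
```

A command far outside the burst (e.g. an hour earlier) is excluded, matching the requirement that the interval between the first and second commands stays below the preset threshold.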
In an embodiment, the step of acquiring the historical voice operation command set consisting of historical voice operation commands whose input times are adjacent and whose time intervals meet the preset condition includes the following steps:
sorting historical voice operation commands input by the user within a preset duration in order of input time from earliest to latest;
determining a target historical voice operation command from the sorted historical voice operation commands, wherein the time interval between the input time of the target historical voice operation command and that of the previous historical voice operation command does not meet the preset condition, while the time interval between its input time and that of the next historical voice operation command meets the preset condition; and
taking the target historical voice operation command as a starting point, sequentially selecting, from the sorted historical voice operation commands, the commands whose input times are adjacent and whose time intervals meet the preset condition, and adding them to the historical voice operation command set.
The preset duration may be set according to actual needs, for example to 30 days or 60 days. When the time interval between the input time of a historical voice operation command and that of the previous command does not meet the preset condition while the time interval with the next command does, it indicates that the user intends this command and the next command to perform the same operation, that is, their content should be identical, whereas this command and the previous command should perform different operations, that is, their content should differ. It can be appreciated that when the operation performed by the vehicle machine based on the currently input voice operation command does not meet the user requirement, the user rapidly and continuously inputs the next command; that is, in order to get the vehicle machine to perform the same operation, the input interval between every two such commands is not long. The target historical voice operation command is therefore taken as the starting point, and the commands whose input times are adjacent and whose intervals meet the preset condition are sequentially selected from the sorted commands and added to the historical voice operation command set. In this way, the required voice operation commands can be accurately extracted by analyzing the input times of historical voice operation commands, further improving the accuracy of acquiring the voice recognition satisfaction.
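The three-step scan above (sort, find the target start point, extend while the interval condition holds) may be sketched as follows. This is a simplified illustration; the function name and the 5-second threshold are assumptions:

```python
def collect_command_set(times, threshold=5.0):
    """times: input times sorted ascending. Find the target command:
    its gap to the previous command does NOT meet the condition (>=
    threshold, or it is the very first command), while its gap to the
    next command does (< threshold). From there, extend the set while
    adjacent gaps stay within the threshold. Returns indices."""
    n = len(times)
    for i in range(n - 1):
        prev_fails = (i == 0) or (times[i] - times[i - 1] >= threshold)
        next_meets = times[i + 1] - times[i] < threshold
        if prev_fails and next_meets:
            group = [i]
            j = i
            while j + 1 < n and times[j + 1] - times[j] < threshold:
                group.append(j + 1)
                j += 1
            return group
    return []
```

The last index of the returned group corresponds to the first voice operation command; the earlier indices correspond to the second voice operation commands.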
Step S102, performing voice recognition satisfaction analysis according to the number of second voice operation commands to obtain an analysis result used for representing the user's satisfaction with voice recognition.
Here, a smaller number of second voice operation commands means that the user input fewer voice operation commands, the voice recognition accuracy of the vehicle machine is higher, and the user's satisfaction with voice recognition is higher; a larger number of second voice operation commands means that the user input more voice operation commands, the recognition accuracy of the vehicle machine is lower, and the user's satisfaction is lower. Therefore, the analysis result obtained by analyzing satisfaction according to the number of second voice operation commands can represent the user's satisfaction with voice recognition, that is, the accuracy of voice recognition can be learned, which in turn can assist in optimizing voice recognition.
In one embodiment, performing voice recognition satisfaction analysis according to the number of second voice operation commands to obtain an analysis result representing the user's satisfaction with voice recognition includes: determining a target satisfaction level corresponding to the number of second voice operation commands according to that number and a preset correspondence between different numbers of voice operation commands and satisfaction levels, thereby obtaining the analysis result. The preset correspondence may be set according to actual requirements: for example, when the number of second voice operation commands is 0, the satisfaction level may be set to level ten; when the number is 1, to level nine; and so on. In addition, the voice recognition satisfaction may also be scored according to the number of second voice operation commands. In this way, the user's satisfaction with voice recognition can be evaluated rapidly, improving analysis efficiency.
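The ten-level correspondence in the example above can be sketched as a simple mapping (the function name and the floor at level 1 are assumptions for illustration):

```python
def satisfaction_level(num_second_commands):
    """Map the number of second (unsatisfying) voice operation commands
    to a satisfaction level mirroring the example: 0 commands -> level
    10, 1 -> level 9, and so on, floored at level 1."""
    return max(10 - num_second_commands, 1)
```

In practice the correspondence could equally be a lookup table set per deployment, as the embodiment leaves the mapping to actual requirements.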
In summary, in the voice processing method provided by the above embodiment, the user's satisfaction with voice recognition is obtained by analyzing how the user uses voice recognition, so that the method can assist in optimizing voice recognition.
In one embodiment, after the voice recognition satisfaction analysis is performed according to the number of second voice operation commands to obtain the analysis result representing the user's satisfaction with voice recognition, the method further includes: taking the second voice operation commands as input of a set voice recognition model, and training the voice recognition model with the operation executed based on the first voice operation command as its output. It can be appreciated that, since the operation executed based on the first voice operation command meets the user requirement while the operations executed based on the second voice operation commands do not, the set voice recognition model's recognition results for the second voice operation commands may be considered erroneous. The factors affecting a recognition result are various, such as accent differences between speakers from different regions, homophones, and polyphones. Therefore, using the second voice operation commands as input of the set voice recognition model and training the model with the operation executed based on the first voice operation command as its output improves the adaptability of the voice recognition model and correspondingly improves its recognition accuracy. It should be noted that the voice recognition model may be established using an artificial intelligence algorithm, such as a genetic algorithm or a neural network algorithm, based on historical voice operation commands of different users and the corresponding recognition results. Training the voice recognition model with voice operation commands actually input by users thus effectively improves its adaptability and, correspondingly, its recognition accuracy.
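As a minimal sketch of assembling the training data described above (the function name and data shapes are assumptions; the recognition model itself is out of scope and not shown), each misrecognized command is labeled with the operation the successful command actually triggered:

```python
def build_training_pairs(second_commands, executed_operation):
    """Assemble supervised (input, target) examples: each second
    (misrecognized) voice operation command is paired with the operation
    that the first (successful) command actually triggered. Any
    trainable command-to-operation model could consume these pairs."""
    return [(cmd, executed_operation) for cmd in second_commands]
```

Aggregating such pairs across many users gives the model examples of accents, homophones, and polyphones mapped to the operations the users actually intended.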
In an embodiment, after the voice recognition satisfaction analysis is performed according to the number of second voice operation commands to obtain an analysis result representing the user's satisfaction with voice recognition, the method further includes the following steps:
performing semantic recognition on the second voice operation command to obtain at least one keyword; and
establishing an association relationship between the at least one keyword and the operation executed based on the first voice operation command, and storing the association relationship in a set voice command library.
It can be understood that performing semantic recognition on the second voice operation command yields at least one keyword it contains, and whether the keywords contained in a voice operation command can be correctly recognized affects the accuracy of the voice recognition result. To improve satisfaction with voice recognition, an association is established between the keywords contained in a second voice operation command, whose executed operation did not meet the user requirement, and the operation executed based on the first voice operation command, and the association is stored in a set voice command library, so that when the user subsequently inputs the second voice operation command, the executed operation meets the user requirement. For example, a user may mispronounce a polyphonic character, say using its fourth-tone reading where the second-tone reading is intended; the operation then performed by the vehicle machine may not meet the user requirement, so the word corresponding to the fourth-tone reading can be associated with the operation correctly triggered by the second-tone reading, thereby improving the user's satisfaction with voice recognition. Thus, by analyzing the different voice operation commands directed at a target operation object, when a subsequently input command would otherwise trigger an operation that does not meet the user requirement, the operation that does meet the requirement can be triggered and executed correctly, further improving the user's satisfaction with voice recognition.
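The voice command library described above can be sketched as a keyword-to-operation store (an illustrative sketch; the class name, method names, and the semantic-recognition step producing the keywords are assumptions):

```python
class VoiceCommandLibrary:
    """Minimal association store: keywords extracted from a misrecognized
    (second) command map to the operation the successful (first) command
    triggered, so a repeat of the misrecognized phrasing can hit the
    intended operation directly."""

    def __init__(self):
        self._by_keyword = {}

    def associate(self, keywords, operation):
        # Store the association for each keyword of the second command.
        for kw in keywords:
            self._by_keyword[kw] = operation

    def lookup(self, keywords):
        # Return the associated operation for the first matching keyword,
        # or None if the phrasing is unknown.
        for kw in keywords:
            if kw in self._by_keyword:
                return self._by_keyword[kw]
        return None
```

On a subsequent input, recognition would first consult `lookup` before falling back to the general model, so the previously failing phrasing now triggers the intended operation.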
In an embodiment, after the association relationship is stored in the voice command library, the method further includes the following step:
outputting a prompt message, wherein the prompt message is used for indicating that inputting the second voice operation command can trigger the operation executed based on the first voice operation command.
It may be appreciated that after the second voice operation command is input to the vehicle machine, if the operation performed based on it does not meet the user requirement, some users will continue to attempt voice input while others will stop using the voice functions. Therefore, after the association between the at least one keyword and the operation executed based on the first voice operation command is established, a prompt message can be output indicating that inputting the second voice operation command will now execute the operation executed based on the first voice operation command, thereby encouraging the user to use the voice function and improving the convenience and accuracy of the voice recognition function. For example, suppose the user inputs the voice "blow the glass window" and the vehicle machine replies "I do not understand what you are saying"; the user may then stop using the voice function and form a poor impression of it. A month later, the vehicle machine can prompt: "You can say 'blow the glass window' to turn on the defrosting function."
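Composing such a reminder from a stored association can be sketched as follows (the function name and the exact wording are illustrative assumptions):

```python
def build_prompt(second_command_text, operation_description):
    """Compose the reminder described above: tell the user that the
    previously unrecognized phrasing now triggers the intended
    operation. The wording is illustrative only."""
    return (f'You can say "{second_command_text}" and the '
            f'{operation_description} will be performed.')
```

The vehicle machine would surface this message (by voice or on screen) the next time the user interacts with the relevant target operation object.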
Based on the same inventive concept as the previous embodiments, the present embodiment describes in detail, by different examples, the technical solution of the speech processing method provided in the previous embodiments.
Example one
The voice processing method provided by this example aims to provide an overall index representing the maturity of the voice recognition capability, by collecting validity statistics on the user's voice operation behavior, establishing a data model of the voice function, and monitoring and improving user satisfaction.
The implementation principle of the voice processing method provided in this example is as follows:
Firstly, when voice recognition is started, the digital signal of the voice is collected, recorded, and stored as a binary voice file;
Then, the recognition result is recorded and analyzed, that is, whether an operation function was correspondingly triggered and whether the user's voice was recognized correctly; the correctness of the voice operation can further be judged in combination with the user's subsequent manual operations. For example, if the user directly starts navigation after inputting a navigation destination by voice, the operation executed based on the voice command was correct. Otherwise, the user is very likely to resume voice input, even several voice inputs, or may fall back to text input, indicating that the operation executed based on the voice command was incorrect and the recognition quality was poor. Similarly, if the user uses voice to control song playback and the corresponding song plays normally with no further user operation, the command was recognized correctly; but if the user keeps inputting related voice instructions, the requirement was not hit, that is, the song title contained in the voice was recognized incorrectly.
Finally, scoring is performed according to the number of voice operations: for example, if a voice operation is completed in one attempt, the score is 100 points; if the second voice operation is recognized successfully, 90 points; and so on. Of course, the success rate of the user's voice command operations can also be counted, for example on a scale of 1 to 10.
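The attempt-based scoring in this example can be sketched directly (the function name and the floor at zero are assumptions):

```python
def score_session(num_attempts):
    """Score as in the example: success on the first attempt scores 100,
    on the second 90, and so on, never going below zero."""
    return max(100 - 10 * (num_attempts - 1), 0)
```

Averaging `score_session` over many sessions would give the overall index of voice recognition maturity that this example sets out to provide.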
Example two
The voice processing method provided by this example aims to analyze the timing of voice instructions to detect whether the user still uses the voice functions, to improve the cloud model for previously unrecognized voices and unexecuted commands, to remotely upgrade the command library, and to prompt the user about the voice instructions that can now be used; the voice input differs for different users, and so do the prompts.
The implementation principle of the voice processing method provided in this example is as follows:
Firstly, the times and frequency at which the user uses voice recognition are counted, and a time-ordered list of the user's voice commands is established;
Then, the time-ordered user voice commands are analyzed to obtain the maximum interval time and the interval between each command and the previous one;
Then, the user's earlier voice input frequency is examined: if the interval between consecutive voice inputs is within a few seconds, the user is continuously re-inputting voice, which indicates that the recognition accuracy of the user's voice commands is low, and the user is judged to be unsatisfied;
Then, voice analysis and statistics are performed on the user's original voice files, and the voices that were not correctly recognized are detected, analyzed, and incrementally improved;
Finally, the users whose voice commands previously went unrecognized are individually prompted that these voice commands can now be recognized and have been added.
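The interval-based dissatisfaction check in the steps above can be sketched as a burst detector over the time-ordered command timestamps (an illustrative sketch; the function name, the 3-second gap, and the minimum burst length are assumptions):

```python
def find_dissatisfied_bursts(times, burst_gap=3.0, min_len=2):
    """Scan ascending timestamps for bursts of rapid re-input: adjacent
    gaps within burst_gap seconds are grouped together, and a group of
    min_len or more commands is read as a sign the user was not
    satisfied with recognition. Returns lists of indices per burst."""
    bursts = []
    current = [0] if times else []
    for i in range(1, len(times)):
        if times[i] - times[i - 1] <= burst_gap:
            current.append(i)
        else:
            if len(current) >= min_len:
                bursts.append(current)
            current = [i]
    if len(current) >= min_len:
        bursts.append(current)
    return bursts
```

The original voice files of the commands inside each burst are the natural candidates for the incremental improvement and the later "now recognized" prompts described above.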
In this way, the maturity of the user's voice function can be monitored to analyze and improve the user's satisfaction with voice commands and the fit to the user's intent, improving the user experience. It is also convenient to count the main intentions for which the user uses voice, improving the convenience and accuracy of the functions, and a basis is provided for diagnosing the causes of failed user instructions, facilitating debugging and improvement.
Based on the same inventive concept as the previous embodiments, an embodiment of the present invention provides a speech processing apparatus. As shown in fig. 2, the speech processing apparatus includes a processor 110 and a memory 111 for storing a computer program capable of running on the processor 110. The number of processors 110 shown in fig. 2 does not mean there is only one processor 110; it merely indicates the positional relationship of the processor 110 with respect to other devices. In practical applications, the number of processors 110 may be one or more; likewise, the memory 111 shown in fig. 2 carries the same meaning, that is, it merely indicates the positional relationship of the memory 111 with respect to other devices, and in practical applications the number of memories 111 may be one or more. The processor 110 is configured to implement the steps of the above speech processing method when running the computer program.
The speech processing device may also include at least one network interface 112. The various components of the speech processing device are coupled together by a bus system 113. It is understood that the bus system 113 is used to enable connection and communication between these components. In addition to the data bus, the bus system 113 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are labeled in fig. 2 as the bus system 113.
The memory 111 may be a volatile memory, a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 111 described in embodiments of the present invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory 111 in the embodiment of the present invention is used to store various types of data to support the operation of the speech processing device. Examples of such data include any computer programs for operating on the speech processing device, such as operating systems and applications, as well as contact data, phonebook data, messages, pictures, videos, etc. The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. The application programs may include various applications, such as a Media Player or a Browser, for implementing various application services. A program implementing the method of the embodiment of the present invention may be included among the application programs.
Based on the same inventive concept as the previous embodiments, this embodiment further provides a computer storage medium in which a computer program is stored. The computer storage medium may be a Ferroelectric Random Access Memory (FRAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also be any of various devices including one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant. The computer program, when executed by a processor, implements the steps of the above speech processing method.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of technical features, it should be considered within the scope of this description.
In this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed.
The foregoing is merely a description of embodiments of the present invention and does not limit the invention; any person skilled in the art can readily conceive of variations or substitutions within the disclosed technical scope, and these fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.