Detailed Description
To make the objects, technical solutions, and advantages of the present application more apparent, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the accompanying drawings. It is apparent that the described embodiments are only some, and not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present application.
As described above, to address the low recognition accuracy of the general speech recognition models currently adopted in IVR systems, an embodiment of the present application provides a consultation service processing method based on speech recognition, so as to improve the accuracy with which speech input is recognized during a service consultation conducted through automatic speech interaction.
It should be noted that the consultation service processing method based on speech recognition in the embodiment of the present application may adopt the architecture shown in FIG. 1a. As can be seen from FIG. 1a, a user performs voice input through a terminal, the corresponding voice data is transmitted to an IVR system in the background of a service provider, the IVR system performs voice recognition on the voice data, and a consultation service is then provided based on the consultation question obtained through recognition. The terminal includes, but is not limited to, devices with a voice signal acquisition function, such as a smartphone, a smartwatch, a tablet computer, or a computer. The IVR system may be hosted on a corresponding server (or server cluster); for convenience of the following description, the server (or server cluster) hosting the IVR system is simply referred to as the server.
More specifically, as shown in FIG. 1b, the IVR system in FIG. 1a includes, in addition to the general speech recognition model, dedicated speech recognition models corresponding to different business systems. For any dedicated speech recognition model, an acoustic model (AM) and a language model (LM) related to the corresponding business system are arranged inside the model.
Then, in the actual speech recognition process, the decoder calls the corresponding speech recognition model to perform speech recognition processing on the voice data to be recognized.
As a feasible manner in the embodiment of the present application, a dedicated speech recognition model may be obtained by training on the service terms related to a certain type of service, using stored consultation question templates as training samples and an algorithm such as a deep neural network (DNN), a convolutional neural network (CNN), or a recurrent neural network (RNN). That is, a dedicated speech recognition model matched with a given service type can more accurately recognize voice data associated with that type of service. Of course, no limitation to the present application is intended thereby.
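As an illustrative sketch only (the templates, tokenization, and model are hypothetical simplifications; a production dedicated model would use a trained DNN/CNN/RNN acoustic and language model), the idea of biasing a language model toward the terms of one service type can be shown with a toy bigram model trained on stored consultation question templates:

```python
import math
from collections import Counter

def train_bigram_lm(templates):
    """Train a toy add-one-smoothed bigram language model from question
    templates and return a sentence-scoring function."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in templates:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))

    def score(sentence, alpha=1.0, vocab=10000):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(p, c)] + alpha) / (unigrams[p] + alpha * vocab))
            for p, c in zip(tokens[:-1], tokens[1:])
        )

    return score

# Hypothetical stored consultation question templates for a "loan" service type.
templates = [
    "how do i repay my loan early",
    "what is the interest rate of my loan",
    "how do i check my loan balance",
]
score = train_bigram_lm(templates)
in_domain = score("how do i repay my loan")
out_of_domain = score("how do i cook my dinner")
```

An in-domain sentence scores strictly higher than an out-of-domain one of the same length, which is exactly the bias a dedicated model exploits to recognize business terms more accurately.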
The service processing method in the present application will now be described with reference to the architecture shown in FIGS. 1a and 1b. Specifically, as shown in FIG. 1c, the service processing process includes the following steps:
Step S101: receiving voice data to be recognized.
In an actual voice consultation scenario, a question spoken by the user through voice input can be collected by the terminal used by the user to generate corresponding voice data. In other words, the voice data can be regarded as audio data generated after collection by the terminal. It should be noted that, during the terminal's voice acquisition process, the analog signals obtained by filtering and denoising the voice input, the digital signals obtained by analog-to-digital conversion, and the like, are all considered within the scope of the voice data in the present application.
It can be understood that, in actual operation, the voice data is sent to the IVR system in the background of the service provider for voice recognition. Therefore, in the embodiment of the present application, the voice data is the voice data on which voice recognition processing needs to be performed, that is, the voice data to be recognized.
As one feasible way, the voice data to be recognized, uttered by the user, is transmitted through the terminal to the server in the background of the service provider for voice recognition; at this time, the server receives the voice data to be recognized.
As another feasible way, the terminal used by the user may have an application program of the service provider installed, and the application program carries the corresponding speech recognition model; that is, in this way, the recognition process of the voice data can be executed locally at the terminal. In this case, the entity receiving the voice data to be recognized can be regarded as the terminal used by the user.
Step S102: determining the service type corresponding to the voice data to be recognized.
Existing IVR systems usually perform the recognition process on the voice data to be recognized based only on the general speech recognition model, and it is therefore difficult to accurately recognize certain specific business terms. Therefore, in the embodiment of the present application, after the voice data to be recognized is received, the corresponding service type is determined first.
In the embodiment of the present application, the service type may be determined in multiple manners. Specifically, in one manner, the user may click the service type to be consulted on a corresponding consultation interface and then perform voice input; in this case, the service type corresponding to the voice data to be recognized can be determined according to the service type selected by the user.
In another manner, the user initiates a consultation service from a certain service interface; the service question the user wants to consult about is then likely related to that service interface, and in this case, the service type corresponding to the service interface can be determined as the service type corresponding to the voice data to be recognized.
Step S103: performing voice recognition on the voice data to be recognized according to the voice recognition model matched with the service type, and generating a recognition result.
As described above, each dedicated speech recognition model in the embodiment of the present application is matched with a corresponding service type, so once the service type of the voice data to be recognized has been determined, the corresponding dedicated speech recognition model can be used to perform the speech recognition processing.
It should be noted that the dedicated speech recognition model used in the embodiment of the present application can more accurately recognize business terms belonging to a certain service type, thereby improving the recognition accuracy of the user's speech.
The recognition result is usually text information including words, numbers, or a combination thereof. The recognition result obtained through the above process can be regarded as the consultation question spoken by the user in voice form, converted into text format.
Of course, step S103 also covers the case where the service type is not determined; in this case, the general speech recognition model is selected to perform the recognition processing on the voice data to be recognized. This should not be construed as limiting the application.
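The routing described in steps S102 and S103 can be sketched as a simple dispatch; the registry, model names, and transcripts below are hypothetical placeholders standing in for real recognizers:

```python
# Hypothetical registry of dedicated recognizers keyed by service type.
DEDICATED_MODELS = {
    "loan": lambda audio: f"loan-model transcript of {audio}",
    "insurance": lambda audio: f"insurance-model transcript of {audio}",
}

def generic_model(audio):
    """Stand-in for the general speech recognition model."""
    return f"generic transcript of {audio}"

def recognize(audio, service_type=None):
    """Use the dedicated model matched with the service type when one was
    determined; otherwise fall back to the general model."""
    model = DEDICATED_MODELS.get(service_type, generic_model)
    return model(audio)
```

The fallback behavior when no service type is determined (or an unknown type is given) is handled by `dict.get` with the generic model as default.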
Step S104: performing consultation processing according to the recognition result.
Based on the recognition result, the server can process the consultation service using a preset consultation policy. In one practical application scenario, the consultation policy may be: if the user confirms the generated recognition result, select an answer matched with the consultation question using a preset question-answer model. In another practical application scenario, the consultation policy may be: if the user does not confirm the generated recognition result, transfer the user to a manual customer service system.
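A minimal sketch of such a consultation policy (the question-answer pairs and the return values are hypothetical illustrations, not the actual question-answer model):

```python
# Hypothetical pre-established question-answer library.
QA_LIBRARY = {
    "how do i repay my loan early":
        "Open the loan detail page and choose early repayment.",
}

def consult(recognized_question, user_confirmed):
    """If the user confirms the recognition result, answer from the QA
    library; otherwise transfer to the manual customer service system."""
    if user_confirmed and recognized_question in QA_LIBRARY:
        return ("answer", QA_LIBRARY[recognized_question])
    return ("manual_customer_service", None)
```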
Of course, the above service processing manner is only an example; in practical application, it will be adjusted and configured according to actual requirements, and should not be construed as limiting the present application.
Through the above steps, after the server receives the voice data to be recognized, it does not directly perform speech recognition processing using the general speech recognition model, but first determines the service type corresponding to the voice data to be recognized. After the service type is determined, the server can use the dedicated speech recognition model matched with that service type to perform speech recognition processing on the voice data to be recognized. Since the dedicated speech recognition model can accurately recognize the special business terms related to the service type, the speech recognition accuracy can be improved to a certain extent.
It should be noted that, in actual operation, which speech recognition model the server uses after receiving the voice data to be recognized often depends on the service type it determines. The process of determining the service type corresponding to the voice data to be recognized in the embodiment of the present application is described in detail below.
Specifically, the process of determining the service type corresponding to the voice data to be recognized may be: and acquiring user information corresponding to the voice data to be recognized, and determining the service type corresponding to the voice data to be recognized according to the user information.
The user information at least comprises: account information of the user, page information corresponding to the voice consultation operation initiated by the user, and/or service information of the service in use when the user initiates the voice consultation operation.
While using a service, the user may initiate a corresponding consultation operation and enter a corresponding consultation page for voice consultation. In this case, the user is very likely consulting about the service in use, so the server can determine the type of the service the user has been using as the service type corresponding to the voice data to be recognized.
As a more specific scenario, suppose a user starts a service APP on a terminal, enters the page of a certain service product in the APP, and at that point initiates a consultation operation, triggers the voice consultation function, and speaks the corresponding question. After receiving the user's spoken question, the server can obtain the page from which the user initiated the consultation operation, and determine the service type of that page as the service type corresponding to the question the user is consulting about by voice.
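Under the assumption of a hypothetical page-to-type mapping (the page paths, keys, and field names are illustrative), the determination from user information described above might look like:

```python
# Hypothetical mapping from the page on which the consultation was initiated
# to the service type it most likely concerns.
PAGE_TO_SERVICE_TYPE = {
    "/app/products/loan": "loan",
    "/app/products/insurance": "insurance",
}

def service_type_from_user_info(user_info):
    """Infer the service type from page info first, then from the service in
    use; return None when neither determines a type."""
    service_type = PAGE_TO_SERVICE_TYPE.get(user_info.get("page"))
    if service_type is not None:
        return service_type
    return user_info.get("service_in_use")
```

Returning `None` corresponds to the "service type not determined" branch, which triggers the general speech recognition model.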
The above content describes the process of determining the service type corresponding to the voice data to be recognized in the embodiment of the present application; in practical application, the specific manner may be selected according to the actual situation, and the present application is not limited herein.
The service type determination process can yield two results: either the service type corresponding to the voice data to be recognized is determined, or it is not. The manner in which these two results are handled is described in detail below.
Case 1: the service type corresponding to the voice data to be recognized is determined.
In practical applications, the service type determined through the above process is only a likely candidate and does not necessarily represent the service type to which the question actually being consulted belongs. Therefore, in the embodiment of the present application, after the recognition result is obtained, the recognition result (i.e., the consultation question) is also displayed to the user for confirmation. Only after the user confirms it is the consultation question processed.
It should be noted that, in practical applications, the manner shown in FIG. 2 may be adopted to present the recognition result. As can be seen from FIG. 2, the recognized consultation question is presented in the consultation interface, along with confirmation options (the two options "Yes" and "No" in FIG. 2) on which the user can operate. Of course, the display manner shown in FIG. 2 is only one possible implementation; in practical applications, other manners such as a pop-up window or a voice prompt may also be adopted, and this should not be construed as limiting the present application.
If the user confirms that the recognition result is the question to be consulted, the server processes the recognized consultation question. As a feasible way, the server can search a pre-established service database for a consultation answer matched with the consultation question and provide it to the user.
If the user confirms that the current recognition result is not the question to be consulted, the server determines that the dedicated speech recognition model may have erred. In this case, in the embodiment of the present application, the server performs speech recognition using the general speech recognition model and generates a corresponding recognition result (the subsequent processing of a recognition result obtained through the general speech recognition model is the same as that described in Case 2 below, which may be referred to directly).
Case 2: the service type corresponding to the voice data to be recognized is not determined.
When the service type is not determined, the server performs voice recognition on the voice data to be recognized using the general speech recognition model to obtain a corresponding recognition result. Similarly, the recognition result obtained through this recognition process may or may not be the consultation question the user intended, so in the embodiment of the present application the server also presents the recognition result to the user for confirmation.
If the user confirms that the recognition result is the intended consultation question, the server can directly process the consultation question.
If the user confirms that the recognition result is not the intended consultation question, the server calculates, for the recognition result, the credibility of each service type, so as to determine the service type to which the recognition result belongs.
It should be noted that, to calculate the credibility of the service types to which the recognition result may belong, a corresponding credibility algorithm may be adopted, performing the calculation based on aspects such as language expression, semantic judgment, and keywords, so as to obtain the credibility corresponding to each service type. For a given service type, if the credibility exceeds the credibility threshold of that service type (each service type can be considered to correspond to its own credibility threshold, simply referred to as the threshold hereinafter for convenience), the recognition result is more likely to belong to that service type. Therefore, in the embodiment of the present application, after the credibility corresponding to each service type is determined, if a credibility exceeds the preset threshold of the corresponding service type, the server uses the dedicated speech recognition model of that service type to recognize the voice data again. As before, the newly generated recognition result is still displayed to the user for confirmation; the confirmation process can refer to the foregoing and is not described again.
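A toy keyword-based credibility calculation with per-type thresholds can illustrate the idea (the keyword sets and threshold values are hypothetical; a real credibility algorithm would also weigh language expression and semantic judgment):

```python
# Hypothetical keyword sets and per-service-type credibility thresholds.
TYPE_KEYWORDS = {
    "loan": {"loan", "repay", "interest"},
    "insurance": {"insurance", "policy", "claim"},
}
TYPE_THRESHOLDS = {"loan": 0.3, "insurance": 0.3}

def credibility_scores(text):
    """Score each service type as the fraction of its keywords in the text."""
    words = set(text.lower().split())
    return {t: len(words & kws) / len(kws) for t, kws in TYPE_KEYWORDS.items()}

def candidate_types(text):
    """Return the service types whose credibility meets their own threshold."""
    scores = credibility_scores(text)
    return [t for t, s in scores.items() if s >= TYPE_THRESHOLDS[t]]
```

Each returned type would trigger re-recognition with the corresponding dedicated model; an empty list corresponds to the failure-result case.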
In practical application, for the same recognition result, the credibilities corresponding to several service types may all exceed their thresholds. In this case, the server uses the dedicated speech recognition models corresponding to each of these service types to re-recognize the voice data to be recognized, then selects any one of the resulting recognition results and displays it to the user.
When every credibility is below its threshold, the server can treat the recognition result as a failure result and perform corresponding subsequent processing. For example, a failure result may be manually classified under the corresponding service type, and after a certain number of failure results have accumulated, the collected failure results may be input as training samples to the general speech recognition model for training and optimization. Of course, no limitation to the present application is intended thereby.
If, after the above process, the user still confirms that the recognition result is not the intended consultation question, the server can adopt manual intervention and transfer the user to a manual customer service system for consultation. Of course, this is only a possible service decision in the embodiments of the present application and should not be construed as limiting the present application.
As can be seen from the foregoing, in the consultation service processing method based on speech recognition in the embodiment of the present application, the general speech recognition model is used only when the corresponding service type is not recognized or the user confirms that the recognition result is not the intended consultation question. If the user then still confirms that the recognition result is not the intended consultation question, the server determines the credibility corresponding to each service type according to the recognition result and selects the dedicated speech recognition model of the most credible service type (i.e., one whose credibility is not less than its threshold) to obtain a new recognition result. Such a manner corrects the recognition result and achieves a certain degree of fault tolerance. Compared with the prior-art manner of using only a general speech recognition model, the method in the embodiment of the present application improves both recognition accuracy and fault tolerance.
In the embodiment of the present application, after the speech recognition model performs speech recognition on the voice data to be recognized, the voice data is usually converted into data in text format. In practical application, the user's spoken description is highly variable and contains colloquial content, so the converted text data may also contain non-standardized content and may not exactly hit a consultation question. Based on this, in the embodiment of the present application, a question model may also be adopted to convert non-standardized text data into a standard consultation question.
The question model may include language algorithms such as a semantic recognition algorithm and a synonymous text replacement algorithm, and is associated with the established consultation question bank, so that it can convert text data into a standardized consultation question according to the corresponding algorithms and the question bank. Of course, no limitation to the present application is intended thereby.
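As a sketch of the normalization step, assuming a hypothetical question bank and using plain string similarity as a stand-in for the semantic recognition and synonym replacement algorithms:

```python
import difflib

# Hypothetical pre-established bank of standardized consultation questions.
QUESTION_BANK = [
    "how do i repay my loan early",
    "what is the interest rate of my loan",
]

def normalize_question(raw_text):
    """Map a colloquial transcript to the closest standardized question in
    the bank, or None when nothing is similar enough."""
    matches = difflib.get_close_matches(raw_text, QUESTION_BANK, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

The `cutoff` plays the role of a minimum similarity; transcripts below it would be left as non-standardized text for further handling.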
Based on the foregoing, the consultation service process based on speech recognition in practical application, with the server as the executing entity, is shown in FIG. 3:
Step S301: the server receives the voice data to be recognized sent by the terminal.
Here, the voice data to be recognized is generated by the terminal collecting the user's voice input.
Step S302: acquiring user information, and determining the service type corresponding to the voice data to be recognized according to the user information. If the service type is determined, execute step S303; if not, execute step S306.
Step S303: performing recognition processing on the voice data to be recognized according to the dedicated speech recognition model matched with the service type, generating a recognition result, and displaying it.
Step S304: receiving the user's confirmation of the displayed recognition result, and judging whether the recognition result is the user's intended consultation question; if so, execute step S305; otherwise, execute step S306.
Step S305: querying the established question-answer library for the answer corresponding to the recognition result, and feeding the answer back to the user as the consultation result.
Step S306: performing recognition processing on the voice data to be recognized according to the general speech recognition model, generating a recognition result, and displaying it.
Step S307: receiving the user's confirmation of the displayed recognition result, and judging whether the recognition result is the user's intended consultation question; if so, execute step S305; otherwise, execute step S308.
Step S308: calculating the credibility of the recognition result corresponding to each service type, determining the service types whose credibility is not less than the preset threshold, and executing step S303.
In the processing procedure shown in FIG. 3, it should be noted that the server may monitor the number of cycles from step S308 back to step S303, and may automatically transfer the user to the manual customer service system once a set number of cycles is reached. This should not be construed as limiting the application.
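The flow of steps S301 through S308, including the cycle-count guard just described, can be sketched as follows; every callback here is a hypothetical stand-in for the real recognizers, confirmation interface, and question-answer library:

```python
def handle_consultation(audio, user_info, *, infer_type, recognize_dedicated,
                        recognize_generic, credible_types, confirm, answer,
                        max_cycles=2):
    """Sketch of steps S301-S308 with a bound on the S308 -> S303 loop before
    transferring to the manual customer service system."""
    service_type = infer_type(user_info)                    # S302
    if service_type is not None:
        result = recognize_dedicated(audio, service_type)   # S303
        if confirm(result):                                 # S304
            return answer(result)                           # S305
    result = recognize_generic(audio)                       # S306
    if confirm(result):                                     # S307
        return answer(result)
    for _ in range(max_cycles):                             # bounded S308 loop
        for t in credible_types(result):                    # S308
            result = recognize_dedicated(audio, t)          # back to S303
            if confirm(result):
                return answer(result)
    return "manual_customer_service"
```

With stubs that confirm the dedicated result, the call returns an answer; with stubs that never confirm and yield no credible types, it falls through to the manual customer service system.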
Based on the same idea, the present application also provides an embodiment of a consultation service processing apparatus based on speech recognition, as shown in FIG. 4. The consultation service processing apparatus based on speech recognition in FIG. 4 includes:
a receiving module 401, configured to receive voice data to be recognized;
a determining module 402, configured to determine a service type corresponding to the voice data to be recognized;
a recognition module 403, configured to perform voice recognition on the voice data to be recognized according to the voice recognition model matched with the service type and generate a recognition result; and
a consultation processing module 404, configured to perform consultation processing according to the recognition result.
More specifically, the determining module 402 acquires user information corresponding to the voice data to be recognized and determines the service type corresponding to the voice data to be recognized according to the user information.
The user information at least comprises: account information of the user, page information corresponding to the voice consultation operation initiated by the user, and/or service information of the service in use when the user initiates the voice consultation operation.
When the determining module 402 determines the service type of the voice data to be recognized, the recognition module 403 performs voice recognition on the voice data to be recognized using the dedicated speech recognition model matched with the service type; when the determining module 402 does not determine the service type, the recognition module 403 performs voice recognition on the voice data to be recognized using the general speech recognition model.
The apparatus further comprises: a confirmation module 405, which displays the generated recognition result and receives the user's confirmation of the displayed recognition result.
When the user confirms that the recognition result is an erroneous result, the recognition module 403 processes the voice data to be recognized as follows:
if the recognition result was generated by the dedicated speech recognition model, the general speech recognition model is used to re-recognize the voice data to be recognized;
if the recognition result was generated by the general speech recognition model, the credibility of the recognition result corresponding to each service type is determined, and a dedicated speech recognition model is selected according to the credibility to re-recognize the voice data to be recognized.
The recognition module 403 selects the service types whose credibility is not less than the corresponding credibility threshold, and re-recognizes the voice data to be recognized using the dedicated speech recognition models corresponding to the selected service types, where each service type corresponds to its own credibility threshold.
The recognition module 403 converts the voice data to be recognized into text data and determines, in a pre-established question bank, the consultation question matched with the text data as the recognition result.
The consultation processing module 404 queries a pre-established answer library for the answer matched with the consultation question as the consultation result.
In the 1990s, an improvement in a technology could be clearly distinguished as an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). However, as technology develops, many of today's improvements in method flows can be regarded as direct improvements in hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement in a method flow cannot be realized by hardware entity modules. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by a user through programming the device. A designer "integrates" a digital system onto a single PLD by programming it, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is now mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); at present, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used.
It should also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by briefly programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for implementing various functions may also be regarded as structures within the hardware component. Indeed, means for implementing various functions may even be regarded both as software modules implementing a method and as structures within a hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as divided into various units by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
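The typical configuration described above (one or more processors, input/output interfaces, network interfaces, and memory) can be sketched as a minimal data model. This is only an illustrative sketch, not part of the claimed subject matter; all names and default values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputingDevice:
    """Illustrative model of the typical configuration: one or more
    processors (CPUs), I/O interfaces, network interfaces, and memory."""
    cpu_count: int = 1                      # one or more processors
    io_interfaces: List[str] = field(       # input/output interfaces
        default_factory=lambda: ["keyboard", "display"])
    network_interfaces: List[str] = field(  # network interfaces
        default_factory=lambda: ["eth0"])
    memory_bytes: int = 4 * 1024 ** 3       # memory, e.g. 4 GiB


# Example: a server hosting the IVR system might expose several CPUs.
server = ComputingDevice(cpu_count=8, memory_bytes=32 * 1024 ** 3)
print(server.cpu_count, len(server.network_interfaces))
```

The sketch merely groups the four components named in the paragraph; an actual device would, of course, realize them in hardware and an operating system rather than a data class.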
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
The embodiments in the present specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points, reference may be made to the corresponding description of the method embodiment.
The above description provides only examples of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc., made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.