Background
With the popularization and development of artificial intelligence (AI) technology, intelligent voice interaction with terminal devices is gradually entering users' daily lives. The basic flow of the intelligent question-answering AI robot assistants and customer service systems currently on the market is as follows: the user asks the AI robot customer service system a question by voice on a terminal, and, after the question has been processed intelligently, the client displays the question together with the corresponding service answer.
Normally this flow works well, but only on the premise that the speech is converted into text accurately. Suppose the user speaks with a particular accent. For example, the user intends to say "help me look up Zhang San's mail", but the speech-to-text system recognizes "help me look up Zhang Shan's mail". Every subsequent step, including the extraction of keywords (such as the name Zhang San) and the result finally returned by the query service, is then bound to be wrong. A person's accent is formed in childhood and is difficult to change, so the same recognition error may recur every time the user asks, and the user can never obtain the desired answer.
At present, there is no good solution to the technical problem that, in intelligent question answering, the user's intention is difficult to recognize accurately because of the user's accent, so that correct service answers cannot be provided to the user.
Disclosure of Invention
The embodiments of the invention provide a method and an apparatus for processing voice data in intelligent question answering, a computer device and a storage medium, which are used to solve the technical problem that correct service answers cannot be provided to a user because the user's accent makes the user's intention difficult to recognize accurately.
A processing method of voice data in intelligent question answering is applied to a server and comprises the following steps:
receiving voice data to be recognized sent by an intelligent terminal, and converting the voice data into characters;
extracting a plurality of original keywords from the characters through a semantic intention recognition system to obtain an original intention keyword set comprising the original keywords;
when a target keyword corresponding to at least one original keyword is acquired, replacing the corresponding original keyword in the original intention keyword set with the target keyword to obtain a target intention keyword set comprising the target keyword;
acquiring a service question and a service answer associated with the target intention keyword set;
and sending the service question and the service answer to the intelligent terminal.
An apparatus for processing voice data in intelligent question answering, the apparatus comprising:
the voice receiving module is used for receiving voice data to be recognized, which is sent by the intelligent terminal, and converting the voice data into characters;
the keyword extraction module is used for extracting a plurality of original keywords from the characters through a semantic intention recognition system to obtain an original intention keyword set comprising the original keywords;
a keyword replacing module, configured to replace, when a target keyword corresponding to at least one original keyword is acquired, the corresponding original keyword in the original intention keyword set with the target keyword, so as to obtain a target intention keyword set comprising the target keyword;
the acquisition module is used for acquiring the service question and the service answer associated with the target intention keyword set;
and the sending module is used for sending the service question and the service answer to the intelligent terminal.
A computer device, comprising a memory, a processor and a computer program stored in the memory and operable on the processor, wherein the processor implements the steps of the method for processing voice data in smart question answering when executing the computer program.
A computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the above method for processing voice data in smart question answering.
The invention provides a method, an apparatus, a computer device and a storage medium for processing voice data in intelligent question answering. When voice data to be recognized is received from an intelligent terminal, the voice data is converted into text, and a plurality of original keywords are extracted from the text by a semantic intention recognition system to obtain an original intention keyword set comprising the original keywords. When a target keyword corresponding to at least one original keyword is acquired, the corresponding original keyword in the original intention keyword set is replaced with the target keyword to obtain a target intention keyword set comprising the target keyword. A service question and a service answer associated with the target intention keyword set are then acquired and sent to the intelligent terminal for display. Acquiring a target keyword for an original keyword indicates that the text converted from the voice contains a keyword that needs correction, namely a word misrecognized because of the user's accent. Because the associated service question and service answer are acquired from the corrected target intention keyword set, they are determined from keywords corrected for the user's accent and therefore better match the user's intention and expectation.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for processing voice data in intelligent question answering can be applied to the application environment shown in fig. 1, in which a server communicates with an intelligent terminal through a network. The intelligent terminal includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and an AI robot customer service system is installed on the intelligent terminal. The server may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an embodiment, as shown in fig. 2, a method for processing voice data in an intelligent question answering is provided, which is described by taking the method applied to the server in fig. 1 as an example, and includes the following steps S101 to S105.
S101, receiving voice data to be recognized sent by the intelligent terminal, and converting the voice data into characters.
The voice data to be recognized is input by the user on the intelligent terminal, on which the AI robot customer service system is installed. The intelligent terminal receives the voice data through the AI robot customer service system and sends it to the local server.
In one embodiment, a voice recognition system is deployed on the local server, and the voice data can be converted into text through the voice recognition system.
The voice data may be a service request spoken by the user, such as "help me look up Zhang San's mail", "play me a song by Zhou Jielun", "help me call Li Si", and so on.
S102, extracting a plurality of original keywords from the characters through a semantic intention recognition system to obtain an original intention keyword set comprising the original keywords.
Furthermore, a semantic intention recognition system is also deployed on the local server, and a plurality of original keywords can be extracted from the text through it. Intention recognition assigns a sentence of text to a corresponding intention category by means of a classification method. The kinds of questions that current chat robots, intelligent customer service systems and smart speakers can handle are limited. When a user sends an instruction to a chat robot, the chat robot first uses intention recognition to assign the user's question to one or more skills and then carries out the subsequent processing. If this initial recognition of the user's intention is wrong, all subsequent work is wasted and the user experience is very poor.
In one embodiment, the categories of words that can serve as keywords are preset. The converted text is then divided into words of different categories by an intention recognition classifier, and a word is determined to be an original keyword when the category to which it belongs is one of the preset keyword categories. The word categories that can serve as keywords include, but are not limited to, names of people, verbs, names of devices, and the like.
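By way of illustration only, the category-based selection of original keywords described above might be sketched as follows. The category labels, the function name `extract_original_keywords` and the example tagging are assumptions made for this sketch; the embodiment does not prescribe a particular classifier or label set.

```python
from typing import List, Tuple

# Hypothetical category labels. The description only names people, verbs
# and devices as examples of categories that can serve as keywords.
KEYWORD_CATEGORIES = {"person_name", "verb", "device_name"}


def extract_original_keywords(tagged_words: List[Tuple[str, str]]) -> List[str]:
    """Keep the words whose category is one of the preset keyword categories."""
    return [word for word, category in tagged_words if category in KEYWORD_CATEGORIES]


# Assumed classifier output for a misrecognized request; names are illustrative.
tagged = [
    ("help", "auxiliary"),
    ("me", "pronoun"),
    ("look up", "verb"),
    ("Zhang Shan", "person_name"),
    ("mail", "object"),
]
original_keywords = extract_original_keywords(tagged)
```

In this sketch the classifier output is supplied directly as (word, category) pairs, so that only the preset-category filter is shown.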
S103, when a target keyword corresponding to at least one original keyword is acquired, replacing the corresponding original keyword in the original intention keyword set with the target keyword to obtain a target intention keyword set comprising the target keyword.
In one embodiment, the target keyword may be obtained from a keyword input by the user, or by querying a history record. One usage scenario according to the present embodiment is, for example:
the user intends to say "help me look up Zhang San's mail", but because the user speaks with an accent, the voice recognition system recognizes "help me look up Zhang Shan's mail". After recognition, the target keyword "Zhang San" corresponding to the original keyword "Zhang Shan" is acquired, and the original intention keyword set "help me look up Zhang Shan's mail" is updated to "help me look up Zhang San's mail".
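The replacement in step S103 can be pictured with a minimal sketch such as the one below. It assumes the keyword set is held as an ordered list and the known corrections as a dictionary; both representations and the name `replace_keywords` are hypothetical.

```python
from typing import Dict, List


def replace_keywords(original_set: List[str], corrections: Dict[str, str]) -> List[str]:
    """Build the target intention keyword set.

    Each original keyword that has a known target keyword is replaced;
    every other keyword is carried over unchanged.
    """
    return [corrections.get(keyword, keyword) for keyword in original_set]


# Illustrative data: "Zhang Shan" is the misrecognized name.
original_intent = ["look up", "Zhang Shan", "mail"]
target_intent = replace_keywords(original_intent, {"Zhang Shan": "Zhang San"})
```

The function returns a new list, so the original intention keyword set remains available, for example for display to the user.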
S104, acquiring the service question and the service answer associated with the target intention keyword set.
Specifically, in this step the question and answer corresponding to the target intention keyword set are searched for, either locally or on other devices over the network, according to the target intention keyword set. For example, when the target intention keyword set is "help me look up Zhang San's mail", the mails whose sender is "Zhang San" are queried on the local server or on a corresponding other server.
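As a hedged illustration of the mail example, the query in step S104 could look like the sketch below. The `Mail` record, the in-memory mailbox and the function `find_mails_by_sender` are assumptions for this sketch; an actual deployment would query a mail service or database instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mail:
    sender: str
    subject: str


def find_mails_by_sender(mailbox: List[Mail], sender: str) -> List[Mail]:
    """Return every mail in the mailbox whose sender matches exactly."""
    return [mail for mail in mailbox if mail.sender == sender]


# Illustrative mailbox contents.
mailbox = [
    Mail("Zhang San", "Quarterly report"),
    Mail("Li Si", "Lunch"),
    Mail("Zhang San", "Follow-up"),
]
answers = find_mails_by_sender(mailbox, "Zhang San")
```

Had the uncorrected keyword "Zhang Shan" been used as the sender, the same query would return no mails, which is the failure the method is meant to avoid.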
S105, sending the service question and the service answer to the intelligent terminal, so that the intelligent terminal displays the service question and the service answer.
According to one usage scenario of this embodiment, when the mails whose sender is Zhang San have been queried on the local server or on a corresponding other server, all the queried mails and the target intention keyword set are sent to the intelligent terminal, and the intelligent terminal displays all the queried mails together with the target intention keyword set (that is, the service question).
In the method for processing voice data in intelligent question answering provided by this embodiment, voice data to be recognized sent by an intelligent terminal is converted into text, and a plurality of original keywords are extracted from the text by a semantic intention recognition system to obtain an original intention keyword set comprising the original keywords. When a target keyword corresponding to at least one of the original keywords is acquired, the corresponding original keyword in the original intention keyword set is replaced with the target keyword to obtain a target intention keyword set comprising the target keyword. A service question and a service answer associated with the target intention keyword set are acquired and sent to the intelligent terminal for display. The target keyword corresponding to an original keyword can be obtained from a correction made by the user or by querying a history record. Acquiring a target keyword for an original keyword indicates that the text converted from the voice contains a keyword that needs correction, namely a word misrecognized because of the user's accent. Because the associated service question and service answer are acquired from the corrected target intention keyword set, they are determined from keywords corrected for the user's accent and therefore better match the user's intention and expectation.
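The flow of steps S101 to S105 summarized above can be sketched end to end as follows. This is only a sketch under stated assumptions: speech recognition, keyword extraction and answer lookup are passed in as callables and replaced by stubs, and all names (`handle_voice_request`, `fake_stt`, `fake_extract`, `fake_lookup`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def handle_voice_request(
    voice_data: bytes,
    speech_to_text: Callable[[bytes], str],
    extract_keywords: Callable[[str], List[str]],
    corrections: Dict[str, str],
    lookup_answer: Callable[[List[str]], str],
) -> Tuple[List[str], str]:
    text = speech_to_text(voice_data)                            # S101
    original_set = extract_keywords(text)                        # S102
    target_set = [corrections.get(k, k) for k in original_set]   # S103
    answer = lookup_answer(target_set)                           # S104
    # S105: the caller sends the question (keyword set) and answer to the terminal.
    return target_set, answer


# Stubs standing in for the real recognition, extraction and query systems.
def fake_stt(_: bytes) -> str:
    return "help me look up Zhang Shan's mail"


def fake_extract(_: str) -> List[str]:
    return ["look up", "Zhang Shan", "mail"]


def fake_lookup(keywords: List[str]) -> str:
    return "mails from " + keywords[1]


question, answer = handle_voice_request(
    b"\x00\x01", fake_stt, fake_extract, {"Zhang Shan": "Zhang San"}, fake_lookup
)
```

Passing the three systems in as parameters keeps the sketch independent of any particular speech recognition or intention recognition product.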
Fig. 3 is a flowchart of a method for processing voice data in intelligent question answering according to another embodiment of the present invention, which is described in detail below. As shown in fig. 3, in addition to steps S101, S102, S104 and S105, step S103, "when a target keyword corresponding to at least one original keyword is acquired, replacing the corresponding original keyword in the original intention keyword set with the target keyword", further includes the following steps S301 to S303.
S301, acquiring a pre-stored mapping relation table. The mapping relation table can be stored in a database of the local server.
S302, when the original keyword exists in the mapping relation table, acquiring a target keyword mapped with the original keyword.
In one embodiment, the mapping relation table is created when the user first interacts with the AI robot customer service system on the intelligent terminal, and it stores original keywords together with the target keyword corresponding to each original keyword. An original keyword stored in the mapping relation table is the text as recognized from the user's accented speech, and the corresponding target keyword is the text the user actually meant.
S303, replacing the corresponding original keyword in the original intention keyword set with the target keyword to obtain a target intention keyword set comprising the target keyword.
The embodiment provides a method for acquiring a target keyword corresponding to an original keyword, which can acquire the target keyword corresponding to the original keyword by querying a mapping relation table.
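Steps S301 to S303 might be sketched as below, assuming for illustration that the mapping relation table is kept in an SQLite database. The table name `keyword_mapping`, its columns and the function `apply_mapping_table` are assumptions of this sketch; the embodiment only states that the table can be stored in a database of the local server.

```python
import sqlite3
from typing import List

# In-memory database standing in for the local server's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keyword_mapping (original TEXT PRIMARY KEY, target TEXT NOT NULL)"
)
conn.execute("INSERT INTO keyword_mapping VALUES (?, ?)", ("Zhang Shan", "Zhang San"))


def apply_mapping_table(db: sqlite3.Connection, original_set: List[str]) -> List[str]:
    # S301: load the pre-stored mapping relation table.
    mapping = dict(db.execute("SELECT original, target FROM keyword_mapping"))
    # S302 and S303: where an original keyword exists in the table, use its
    # mapped target keyword; otherwise keep the original keyword.
    return [mapping.get(keyword, keyword) for keyword in original_set]


target_set = apply_mapping_table(conn, ["look up", "Zhang Shan", "mail"])
```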
Fig. 4 is a flowchart of a method for processing voice data in intelligent question answering according to still another embodiment of the present invention, which is described in detail below. As shown in fig. 4, in addition to steps S101, S102, S104 and S105, step S103, "when a target keyword corresponding to at least one original keyword is acquired, replacing the corresponding original keyword in the original intention keyword set with the target keyword", further includes the following steps S401 and S402.
S401, sending the extracted original keywords to the intelligent terminal, and displaying the original keywords by the intelligent terminal.
S402, when a replacing instruction of the original keyword and a corresponding target keyword sent by the intelligent terminal are received, replacing the corresponding original keyword in the original intention keyword set by the target keyword to obtain a target intention keyword set comprising the target keyword.
This embodiment provides another way of acquiring the target keyword corresponding to an original keyword: the target keyword is obtained from a modification entered by the user. It is applicable both to the first dialog and to later dialogs between the user and the AI robot customer service system on the intelligent terminal.
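A minimal sketch of step S402 is given below. The shape of the replacement instruction (a record holding the original keyword and the target keyword) and the names `ReplaceInstruction` and `apply_user_correction` are assumptions; the embodiment does not define a message format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplaceInstruction:
    """Hypothetical message sent by the terminal when the user edits a keyword."""
    original_keyword: str
    target_keyword: str


def apply_user_correction(
    original_set: List[str], instruction: Optional[ReplaceInstruction]
) -> List[str]:
    # No instruction received: the keyword set is used as recognized.
    if instruction is None:
        return list(original_set)
    # S402: replace the corresponding original keyword with the target keyword.
    return [
        instruction.target_keyword if k == instruction.original_keyword else k
        for k in original_set
    ]


corrected = apply_user_correction(
    ["look up", "Zhang Shan", "mail"], ReplaceInstruction("Zhang Shan", "Zhang San")
)
```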
Optionally, the method for processing voice data in intelligent question answering further includes:
when a replacement instruction for the original keyword and the corresponding target keyword sent by the intelligent terminal are received, receiving the unique identifier of the intelligent terminal and the user identity identifier sent by the intelligent terminal. The user identity identifier may be the user's identification number, the user's account, or the like. The unique identifier of the intelligent terminal may be the MAC (Media Access Control) address of the intelligent terminal or the IMEI (International Mobile Equipment Identity) code of the intelligent terminal;
judging whether a corresponding mapping relation table is created in advance according to the unique identifier of the intelligent terminal and the user identity identifier, and if not, creating the mapping relation table corresponding to the unique identifier of the intelligent terminal and the user identity identifier;
and storing the original keyword and the corresponding target keyword in the mapping relation table.
This embodiment may be repeated if the user needs to make several corrections. Because each person's accent and vocabulary are relatively stable, the more often the user uses the system and takes part in corrections, the more closely the voice recognition is tailored to that user, and the better the intelligent question-answering AI robot assistant and customer service system come to understand the user.
The unique identifier of the intelligent terminal together with the user identity identifier uniquely identifies the user and the intelligent terminal the user is using. This embodiment provides a way to create the mapping relation table and to store the original keyword and the corresponding target keyword. If no corresponding mapping relation table exists locally, the user is talking with the AI robot customer service system on the intelligent terminal for the first time; the mapping relation table is created at that point, and the original keyword and the corresponding target keyword are stored in it. If the corresponding mapping relation table already exists, the original keyword and the corresponding target keyword are stored in it directly.
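The create-if-absent behaviour described above can be sketched as follows, assuming for illustration that the mapping relation tables are held in a dictionary keyed by the pair (terminal identifier, user identifier). The function name `store_correction` and the example identifiers are hypothetical.

```python
from typing import Dict, Tuple

MappingTable = Dict[str, str]  # original keyword -> target keyword


def store_correction(
    tables: Dict[Tuple[str, str], MappingTable],
    terminal_id: str,
    user_id: str,
    original: str,
    target: str,
) -> bool:
    """Store one correction; return True if a new mapping table was created."""
    key = (terminal_id, user_id)
    created = key not in tables
    if created:
        # First correction for this user on this terminal: create the table.
        tables[key] = {}
    tables[key][original] = target
    return created


tables: Dict[Tuple[str, str], MappingTable] = {}
# Illustrative identifiers (a MAC-style address and an account name).
first = store_correction(tables, "00:1A:2B:3C:4D:5E", "user-42", "Zhang Shan", "Zhang San")
second = store_correction(tables, "00:1A:2B:3C:4D:5E", "user-42", "Li Shi", "Li Si")
```

Keying on both identifiers keeps the corrections of different users of the same terminal, and of the same user on different terminals, separate.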
Optionally, the method for processing voice data in intelligent question answering further includes:
acquiring a preset editable prompt message;
the step of sending the extracted original keywords to the intelligent terminal includes:
and sending the extracted original keywords and the editable prompt message to the intelligent terminal, so that the intelligent terminal can display the original keywords and the editable prompt message.
In other embodiments, the intelligent terminal may also highlight the original keyword that can be edited or modified in a manner of highlighting or bolding or underlining.
The embodiment provides a method for prompting a user that an original keyword is editable, so that the user can immediately modify the original keyword displayed on an intelligent terminal when seeing the original keyword and finding an identification error, and the user experience is improved.
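One possible shape for the message that carries the original keywords and the editable prompt message to the intelligent terminal is sketched below. The JSON layout, the field names and the prompt wording are assumptions made for illustration and are not specified by the embodiment.

```python
import json
from typing import List

# Hypothetical preset prompt text.
EDITABLE_PROMPT = "Tap a keyword to correct it."


def build_keyword_message(original_keywords: List[str], prompt: str = EDITABLE_PROMPT) -> str:
    """Serialize the original keywords and the editable prompt for the terminal."""
    payload = {
        "keywords": [{"text": k, "editable": True} for k in original_keywords],
        "prompt": prompt,
    }
    # ensure_ascii=False keeps non-ASCII keywords readable in the message.
    return json.dumps(payload, ensure_ascii=False)


message = build_keyword_message(["look up", "Zhang Shan", "mail"])
```

A per-keyword "editable" flag would also let the terminal decide which keywords to highlight, bold or underline.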
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
The method for processing voice data in intelligent question answering is of great help to users who speak with an accent, and effectively avoids the embarrassment of such users failing to obtain the expected answers and results when they put questions to an intelligent question-answering AI robot assistant or customer service system. The more often the user uses the system and takes part in corrections, the more closely the voice recognition is tailored to that user and the better the intelligent question-answering AI robot assistant and customer service system understand the user. This improves user satisfaction and user stickiness, increases how often the user uses the intelligent question-answering AI robot assistant and customer service system, reduces how often the user switches to a human agent, improves working efficiency, and reduces the labor cost of human customer service.
In an embodiment, an apparatus for processing voice data in intelligent question answering is provided, which corresponds one-to-one to the method for processing voice data in intelligent question answering in the above embodiments. As shown in fig. 5, the apparatus 100 for processing voice data in intelligent question answering includes a voice receiving module 11, a keyword extracting module 12, a keyword replacing module 13, an obtaining module 14 and a sending module 15. The functional modules are explained in detail as follows:
The voice receiving module 11 is configured to receive voice data to be recognized, which is sent by the intelligent terminal, and convert the voice data into text.
The keyword extracting module 12 is configured to extract a plurality of original keywords from the text through a semantic intention recognition system, so as to obtain an original intention keyword set including the plurality of original keywords.
The keyword replacing module 13 is configured to, when a target keyword corresponding to at least one original keyword is obtained, replace the corresponding original keyword in the original intention keyword set with the target keyword, so as to obtain a target intention keyword set including the target keyword.
The obtaining module 14 is configured to obtain a service question and a service answer associated with the target intention keyword set.
The sending module 15 is configured to send the service question and the service answer to the intelligent terminal.
Optionally, the apparatus 100 for processing voice data in intelligent question answering further includes:
a mapping relation table obtaining unit, configured to obtain a pre-stored mapping relation table;
a keyword obtaining unit, configured to obtain a target keyword mapped with the original keyword when the original keyword exists in the mapping relationship table;
the keyword replacing module is specifically configured to replace the original keyword corresponding to the original intention keyword set with the target keyword.
Further, the apparatus 100 for processing voice data in intelligent question answering further includes:
the keyword sending unit is used for sending the extracted original keywords to the intelligent terminal so that the intelligent terminal can display the original keywords;
the keyword receiving unit is used for receiving a replacement instruction of the original keyword and a corresponding target keyword sent by the intelligent terminal;
the keyword replacing module is specifically configured to replace the original keyword corresponding to the original intention keyword set with the target keyword.
Optionally, the apparatus 100 for processing voice data in intelligent question answering further includes:
the identity identification receiving unit is used for receiving the unique identification and the user identity identification of the intelligent terminal sent by the intelligent terminal when receiving the replacement instruction of the original keyword and the corresponding target keyword sent by the intelligent terminal;
a judging unit, configured to judge whether a mapping relation table corresponding to the unique identifier of the intelligent terminal and the user identity identifier is created in advance according to the unique identifier of the intelligent terminal and the user identity identifier, and if not, create a mapping relation table corresponding to the unique identifier of the intelligent terminal and the user identity identifier;
and the storage unit is used for storing the original keyword and the corresponding target keyword in the mapping relation table.
Optionally, the apparatus 100 for processing voice data in intelligent question answering further includes:
the prompt message acquisition unit is used for acquiring a preset editable prompt message;
the keyword sending unit is specifically configured to send the extracted multiple original keywords and the editable prompt message to the intelligent terminal, so that the intelligent terminal displays the multiple original keywords and the editable prompt message.
Wherein the meaning of "first" and "second" in the above modules/units is only to distinguish different modules/units, and is not used to define which module/unit has higher priority or other defining meaning. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not explicitly listed or inherent to such process, method, article, or apparatus, and such that a division of modules presented in this application is merely a logical division and may be implemented in a practical application in a further manner.
For the specific limitation of the processing device for the voice data in the intelligent question answering, reference may be made to the above limitation on the processing method for the voice data in the intelligent question answering, and details are not described here. All or part of each module in the processing device of the voice data in the intelligent question answering can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data involved in the processing method of voice data in intelligent question answering. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of processing voice data in an intelligent question answering.
In one embodiment, a computer device is provided, which includes a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the method for processing voice data in intelligent question answering in the above embodiments, such as steps S101 to S105 shown in fig. 2, as well as the other extensions of the method and their related steps. Alternatively, when executing the computer program, the processor implements the functions of the modules/units of the apparatus for processing voice data in intelligent question answering in the above embodiment, for example, the functions of the modules 11 to 15 shown in fig. 5. To avoid repetition, further description is omitted here.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor or the like. The processor is the control center of the computer device and connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and by invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, video data, etc.) created through use of the computer device.
The memory may be integrated in the processor or may be provided separately from the processor.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the method for processing voice data in intelligent question answering in the above embodiments, such as steps S101 to S105 shown in fig. 2, as well as the other extensions of the method and their related steps. Alternatively, when executed by the processor, the computer program implements the functions of the modules/units of the apparatus for processing voice data in intelligent question answering in the above embodiment, for example, the functions of the modules 11 to 15 shown in fig. 5. To avoid repetition, further description is omitted here.
In the method, the apparatus, the computer device and the storage medium for processing voice data in intelligent question answering provided by this embodiment, when voice data to be recognized is received from an intelligent terminal, the voice data is converted into text, and a plurality of original keywords are extracted from the text by a semantic intention recognition system to obtain an original intention keyword set comprising the original keywords. When a target keyword corresponding to at least one of the original keywords is acquired, the corresponding original keyword in the original intention keyword set is replaced with the target keyword to obtain a target intention keyword set comprising the target keyword. A service question and a service answer associated with the target intention keyword set are acquired and sent to the intelligent terminal for display. Acquiring a target keyword for an original keyword indicates that the text converted from the voice contains a keyword that needs correction, namely a word misrecognized because of the user's accent. Because the associated service question and service answer are acquired from the corrected target intention keyword set, they are determined from keywords corrected for the user's accent and therefore better match the user's intention and expectation.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.