Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting the invention.
A search method, apparatus, electronic device, and computer-readable storage medium based on speech recognition according to embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a search method based on speech recognition according to an embodiment of the present invention. It should be noted that the search method based on speech recognition according to the embodiment of the present invention is applicable to the search apparatus based on speech recognition according to the embodiment of the present invention, and the search apparatus may be configured in an electronic device.
As shown in fig. 1, the search method based on speech recognition may include:
S110, when it is detected that the user starts to input voice, acquiring current voice data input by the user in real time.
For example, assume that the search method based on speech recognition according to the embodiment of the present invention is applied to an electronic device that provides a voice input module for the user. The voice input module may be, for example, a microphone or a component with a voice acquisition function, such as a sound box, so that the user can input voice through the voice input module. When it is detected that the user starts inputting voice through the voice input module, the current voice data input by the user can be acquired in real time. That is, because speech is produced sequentially over time, the current voice data can be acquired in real time while the user is still inputting voice.
S120, performing voice recognition on the current voice data acquired in real time to obtain corresponding current intermediate text information.
Optionally, the current voice data obtained in real time may be subjected to voice recognition by a voice recognition technology to obtain a corresponding text, and the text is used as the current intermediate text information corresponding to the current voice data.
S130, performing result prediction according to the current intermediate text information to obtain a target text result.
Optionally, the voice input intention of the user, that is, which search the user wants to carry out by voice, may be predicted according to the current intermediate text information, and a corresponding target text result may be predicted according to that intention, so that a search operation can subsequently be performed according to the target text result.
As an example implementation, result prediction may be performed on the current intermediate text information according to a pre-established prediction model to obtain the corresponding search keyword sample with the highest usage rate, and that search keyword sample is used as the target text result. In an embodiment of the present invention, the prediction model is obtained by training on a plurality of search keyword samples and the usage rates corresponding to the search keyword samples.
That is, the prediction model may be established in advance by training on a plurality of search keyword samples and their corresponding usage rates. In practical application, result prediction can then be performed on the current intermediate text information through the prediction model to obtain the corresponding search keyword sample with the highest usage rate, which can be understood as the search keyword sample most likely to be searched for. Finally, that search keyword sample is used as the target text result.
For example, taking the case where the current intermediate text information corresponding to the current voice data is "weather", assume that the prediction model includes search keyword samples such as "weather forecast", "weather forecast 15-day query", "Beijing weather", and "Shanghai weather", whose usage rates are 90%, 85%, 50%, and 40%, respectively. The prediction model can perform result prediction on the current intermediate text information "weather" to obtain the search keyword sample with the highest usage rate, "weather forecast", which can then be used as the target text result.
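As a non-limiting illustration, the selection of the search keyword sample with the highest usage rate can be sketched as follows. The sketch replaces the trained prediction model with a plain lookup table and a substring match, both of which are assumptions made only for illustration; the names `USAGE_RATES` and `predict_target_text` are hypothetical, and the figures are those of the "weather" example above.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the trained prediction model:
# search keyword samples mapped to their usage rates.
USAGE_RATES: Dict[str, float] = {
    "weather forecast": 0.90,
    "weather forecast 15-day query": 0.85,
    "Beijing weather": 0.50,
    "Shanghai weather": 0.40,
}


def predict_target_text(
    intermediate_text: str,
    usage_rates: Dict[str, float] = USAGE_RATES,
) -> Optional[str]:
    """Return the keyword sample that contains the intermediate text and
    has the highest usage rate, or None when no sample matches."""
    needle = intermediate_text.strip().lower()
    if not needle:
        return None
    # Assumption: a sample is a candidate if it contains the recognized text.
    candidates = [sample for sample in usage_rates if needle in sample.lower()]
    if not candidates:
        return None
    return max(candidates, key=lambda sample: usage_rates[sample])
```

With this table, `predict_target_text("weather")` selects "weather forecast", matching the example above. A real model could rank candidates by other signals as well; the lookup is kept deliberately simple.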
In order to ensure the accuracy of speech recognition, optionally, in an embodiment of the present invention, in the process of performing result prediction according to the current intermediate text information to obtain the target text result, next speech data input by the user may be further acquired, speech recognition is performed on the next speech data to obtain corresponding intermediate text information, and the result prediction is calibrated according to the intermediate text information corresponding to the next speech data.
Optionally, in the process of performing result prediction according to the current intermediate text information, the next voice data input by the user may be obtained in real time, the next voice data is subjected to voice recognition through a voice recognition technology to obtain corresponding intermediate text information, and the prediction result when performing result prediction on the current intermediate text information is calibrated according to the intermediate text information.
For example, taking the current intermediate text information "weather", assume that the result predicted for it is "weather forecast". At this time, the next voice data input by the user may also be acquired, and voice recognition may be performed on the next voice data to obtain the corresponding intermediate text information "early warning". The result "weather forecast", predicted for the earlier intermediate text information "weather", may then be calibrated according to the intermediate text information "early warning" to obtain the text result "weather early warning". Therefore, in the process of performing result prediction according to the current intermediate text information, the previous prediction result can be calibrated through the intermediate text information corresponding to the next voice data, so that the voice recognition efficiency is improved and the accuracy of the voice recognition is guaranteed.
S140, searching according to the target text result to acquire a corresponding search result, and providing the corresponding search result to the user.
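As a non-limiting illustration, one possible calibration rule is sketched below. It assumes that a prediction is kept while it still contains every recognized fragment, that it is otherwise replaced by the highest-usage keyword sample containing all fragments, and that the recognized fragments themselves are used when no sample fits. This rule, the function name `calibrate`, and the usage-rate table in the usage note are hypothetical and are not stated in the embodiments.

```python
from typing import Dict, List


def calibrate(
    prediction: str,
    recognized: List[str],
    usage_rates: Dict[str, float],
) -> str:
    """Calibrate an earlier prediction against all intermediate texts
    recognized so far (given in spoken order)."""
    fragments = [text.strip() for text in recognized if text.strip()]
    lowered = [fragment.lower() for fragment in fragments]

    # Keep the prediction while it is consistent with everything heard.
    if all(fragment in prediction.lower() for fragment in lowered):
        return prediction

    # Otherwise look for keyword samples that contain every fragment.
    candidates = [
        sample
        for sample in usage_rates
        if all(fragment in sample.lower() for fragment in lowered)
    ]
    if candidates:
        return max(candidates, key=lambda sample: usage_rates[sample])

    # Fallback (assumption): use the recognized text itself.
    return " ".join(fragments)
```

For instance, with a table containing only "weather forecast", calling `calibrate("weather forecast", ["weather", "early warning"], {"weather forecast": 0.9})` falls back to the recognized text and yields "weather early warning".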
As an example implementation manner, when a target text result is obtained, a search may be performed according to the target text result to obtain a corresponding search result, and then a format type of the search result may be determined, a corresponding presentation manner may be determined according to the format type, and the search result may be presented to the user according to the corresponding presentation manner.
For example, when the format type is the MP3 format, determining that the corresponding presentation mode is a playing mode, and playing the search result to the user through an audio playing module; when the format type is a TTS (text to speech) format (such as weather forecast), determining that the corresponding presentation mode is a voice broadcast and text presentation mode, and providing the search result to the user through the voice broadcast and text presentation modes.
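As a non-limiting illustration, the mapping from format type to presentation mode can be sketched as a small lookup. Only the two format types named above (MP3 and TTS) are covered; the enum `Presentation`, the function `choose_presentation`, and the error raised for other formats are hypothetical choices made for this sketch.

```python
from enum import Enum


class Presentation(Enum):
    PLAY_AUDIO = "play through the audio playing module"
    SPEAK_AND_SHOW_TEXT = "voice broadcast and text presentation"


# Format types taken from the example above; other formats are not defined.
_PRESENTATION_BY_FORMAT = {
    "mp3": Presentation.PLAY_AUDIO,
    "tts": Presentation.SPEAK_AND_SHOW_TEXT,
}


def choose_presentation(format_type: str) -> Presentation:
    """Determine the presentation mode for a search result's format type."""
    key = format_type.strip().lower()
    try:
        return _PRESENTATION_BY_FORMAT[key]
    except KeyError:
        # Assumption: unknown formats are rejected rather than guessed at.
        raise ValueError(
            f"no presentation mode defined for format {format_type!r}"
        ) from None
```

A table-driven mapping keeps the determination of the format type separate from the presentation itself, so further format types could be added without changing the lookup logic.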
For example, as shown in fig. 2, assume that the search method based on speech recognition according to the embodiment of the present invention is applied to an intelligent robot that has a sound box, through which sound of the surrounding environment can be collected. When it is detected that the user starts inputting voice, the current voice data input by the user can be acquired in real time through the sound box, and the voice recognition system performs voice recognition on the current voice data to obtain corresponding current intermediate text information. Result prediction is then performed on the current intermediate text information to obtain a target text result, and a search can be performed in a resource library according to the target text result to obtain a corresponding search result. Finally, the format type of the search result is determined, a corresponding presentation manner is determined according to the format type, and the search result is presented to the user through the sound box according to the corresponding presentation manner.
In order to improve the usability and feasibility of the present invention, optionally, in an embodiment of the present invention, before the search is performed according to the target text result, it may be determined whether the user ends the voice input, and when the user ends the voice input, the search is performed according to the target text result.
In the embodiment of the present invention, a specific implementation manner of determining whether the user has ended the voice input may be as follows: when it is detected that the user starts inputting voice, the voice feature of the user can be extracted from the voice that has begun to be input. In the process of acquiring the voice input by the user, whether the collected audio contains the sound emitted by the user is then determined in real time according to the voice feature, and if it is determined that the currently collected audio does not contain the sound emitted by the user, it can be determined that the user has ended the voice input.
In order to further improve the accuracy of the determination, optionally, in an embodiment of the present invention, when it is detected that the user starts inputting voice, the voice feature of the user may be extracted from the voice that has begun to be input. In the process of acquiring the voice input by the user, whether the collected audio contains the sound emitted by the user is determined in real time according to the voice feature, and if it is determined that the currently collected audio does not contain the sound emitted by the user and that audio not containing the sound emitted by the user has been collected for a certain time, it may be determined that the user has ended the voice input.
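As a non-limiting illustration, the end-of-input determination can be sketched as follows. The sketch assumes that each audio frame has already been classified, according to the user's voice feature, as containing or not containing the user's voice, and it reads the "certain time" condition as a minimum continuous duration of audio without the user's voice; that reading, the frame length, the threshold, and the name `find_end_of_input` are all assumptions.

```python
from typing import Iterable, Optional


def find_end_of_input(
    user_voice_flags: Iterable[bool],
    frame_ms: int = 20,
    silence_ms: int = 600,
) -> Optional[int]:
    """Return the index of the frame at which the voice input is judged
    to have ended, or None if no end is detected.

    user_voice_flags: one flag per audio frame, True when the frame is
    judged to contain the user's voice.
    """
    # Number of consecutive frames without the user's voice that is required.
    needed = max(1, -(-silence_ms // frame_ms))  # ceiling division
    started = False
    silent_run = 0
    for index, has_user_voice in enumerate(user_voice_flags):
        if has_user_voice:
            started = True
            silent_run = 0  # the user is still speaking, restart the count
        elif started:
            silent_run += 1
            if silent_run >= needed:
                return index
    return None
```

Requiring a sustained run of frames, rather than a single frame, avoids treating a short pause between words as the end of the input.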
According to the search method based on speech recognition of the embodiment of the present invention, when it is detected that the user starts to input voice, the current voice data input by the user is acquired in real time, and voice recognition is performed on the current voice data acquired in real time to obtain corresponding current intermediate text information. Result prediction is performed according to the current intermediate text information to obtain a target text result, a search is then performed according to the target text result to obtain a corresponding search result, and the corresponding search result is provided to the user. Because the voice data input by the user is recognized and responded to in real time, there is no need to wait until the user has completed all voice input and the microphone has been closed, which effectively reduces the response time of the device for voice recognition processing, improves voice search efficiency, and improves user experience.
Corresponding to the search methods based on speech recognition provided in the above embodiments, an embodiment of the present invention further provides a search apparatus based on speech recognition. Since the search apparatus corresponds to the search methods provided in the above embodiments, the implementation manners of the search method based on speech recognition are also applicable to the search apparatus provided in this embodiment and are not described in detail here. Fig. 3 is a schematic structural diagram of a search apparatus based on speech recognition according to an embodiment of the present invention. As shown in fig. 3, the speech recognition-based search apparatus 300 may include: an obtaining module 310, a speech recognition module 320, a text result prediction module 330, a search module 340, and a providing module 350.
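As a non-limiting illustration, the overall flow of S110 to S140 can be sketched as a loop over incoming audio chunks. The recognizer, the predictor, and the search backend are passed in as callables because the embodiments do not prescribe them; the function name `voice_search` and its parameters are hypothetical, and the end of the chunk stream stands in for the determination that the user has ended the voice input.

```python
from typing import Callable, Iterable, List, Optional


def voice_search(
    chunks: Iterable[bytes],
    recognize: Callable[[bytes], str],
    predict: Callable[[List[str]], Optional[str]],
    search: Callable[[str], str],
) -> Optional[str]:
    """Recognize and predict while audio is still arriving, then search."""
    fragments: List[str] = []
    target: Optional[str] = None
    for chunk in chunks:              # S110: voice data acquired in real time
        text = recognize(chunk)       # S120: intermediate text information
        if text:
            fragments.append(text)
            # S130: predict again on each new fragment, which also lets
            # later fragments correct an earlier prediction.
            target = predict(fragments)
    if target is None:
        return None
    return search(target)             # S140: search once the input has ended
```

Because prediction runs inside the loop, the target text result is already available when the input ends, so only the search itself remains to be done at that point.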
Specifically, the obtaining module 310 is configured to obtain current voice data input by the user in real time when it is detected that the user starts inputting voice.
The speech recognition module 320 is configured to perform speech recognition on the current speech data acquired in real time to obtain corresponding current intermediate text information.
The text result prediction module 330 is configured to perform result prediction according to the current intermediate text information to obtain a target text result. As an example implementation, the text result prediction module 330 may perform result prediction on the current intermediate text information according to a pre-established prediction model to obtain the corresponding search keyword sample with the highest usage rate, and take that search keyword sample as the target text result, where the prediction model is obtained by training on a plurality of search keyword samples and the usage rates corresponding to the plurality of search keyword samples.
The search module 340 is configured to perform a search according to the target text result to obtain a corresponding search result.
The providing module 350 is used for providing the corresponding search results to the user. As an example, as shown in fig. 4, the providing module 350 may include a determining unit 351 and a providing unit 352. The determining unit 351 is configured to determine a format type of the search result. The providing unit 352 is configured to determine a corresponding presentation manner according to the format type, and present the search result to the user according to the corresponding presentation manner.
For example, when the format type is MP3 format, the providing unit 352 may determine that the corresponding presentation mode is a playing mode, and play the search result to the user through an audio playing module; when the format type is a TTS format, the providing unit 352 may determine that the corresponding presentation manner is a voice broadcast and text presentation manner, and provide the search result to the user through the voice broadcast and text presentation manner.
In order to guarantee the accuracy of the speech recognition, optionally, in an embodiment of the present invention, as shown in fig. 5, the speech recognition-based search apparatus 300 may further include: prediction result calibration module 360. In an embodiment of the present invention, the obtaining module 310 is further configured to obtain next voice data input by the user; the voice recognition module 320 is further configured to perform voice recognition on the next voice data to obtain corresponding intermediate text information; the prediction result calibration module 360 is configured to calibrate the result prediction according to the intermediate text information corresponding to the next speech data in the process of performing the result prediction according to the current intermediate text information to obtain the target text result.
According to the search apparatus based on speech recognition of the embodiment of the present invention, the obtaining module acquires the current voice data input by the user in real time when it is detected that the user starts to input voice, the speech recognition module performs voice recognition on the current voice data acquired in real time to obtain corresponding current intermediate text information, the text result prediction module performs result prediction according to the current intermediate text information to obtain a target text result, the search module performs a search according to the target text result to obtain a corresponding search result, and the providing module provides the corresponding search result to the user. Because the voice data input by the user is recognized and responded to in real time, there is no need to wait until the user has completed all voice input and the microphone has been closed, which effectively reduces the response time of the device for voice recognition processing, improves voice search efficiency, and improves user experience.
In order to implement the above embodiments, the present invention further provides an electronic device.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the invention. It should be noted that, in the embodiment of the present invention, the electronic device may be a device having a speech recognition system and a search function, so as to implement a voice search function. For example, the electronic device may be an intelligent robot that realizes human-computer voice interaction with a user; as another example, the electronic device may be a search server with a voice search function.
As shown in fig. 6, the electronic device 600 may include: a memory 610, a processor 620, and a computer program 630 stored in the memory 610 and operable on the processor 620, wherein the processor 620 executes the program 630 to implement the search method based on speech recognition according to any of the above embodiments of the present invention.
In order to implement the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the speech recognition based search method according to any of the above embodiments of the present invention.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.