Disclosure of Invention
In view of the above, the present invention provides a control method and apparatus using offline speech, and a readable device, and mainly aims to solve the problem in the prior art that, in special application scenarios where the user's hands are occupied or restricted, the user is unable to operate the readable device manually, so that manual operation is inconvenient or prone to misoperation.
In order to solve the above problems, embodiments of the present invention mainly provide the following technical solutions:
in a first aspect, an embodiment of the present invention provides a control method using offline speech, including:
receiving a voice request, and authenticating the identity of a user according to the voiceprint of the voice request;
if the identity authentication is passed, determining an operation target carried in the voice request according to an offline database, and displaying the operation target;
receiving a voice selection instruction for the operation target, and if the voice selection instruction is a determination instruction, operating the operation target according to the voice selection instruction;
and receiving a voice control instruction directed at the operation target, and running the voice control instruction based on the offline database.
Optionally, running the voice control instruction based on the offline database includes:
recognizing the voice control instruction based on a voice recognition model, and determining the similarity between a preset instruction in the voice recognition model and the voice control instruction;
and when the similarity is determined to be greater than or equal to a preset similarity threshold, generating a keyword according to the recognition result, and searching and executing an operation corresponding to the keyword in the offline database.
Optionally, the method further includes:
and when the similarity is determined to be smaller than the preset similarity threshold, discarding the voice control instruction.
Optionally, after receiving a voice selection instruction for the operation target, the method further includes:
if the voice selection instruction is a cancel instruction, outputting prompt information for modifying the voice selection instruction;
and/or if the voice selection instruction is a cancel instruction, confirming the operation target in the offline database again according to the voice request until the received voice selection instruction is a confirmation instruction.
Optionally, before the voice control instruction is executed based on the offline database, the method further includes:
constructing at least two levels of menu applications for common operation targets in the offline database;
executing the voice control instruction based on the offline database comprises:
and running the voice control instruction based on at least two levels of menu applications in the offline database.
In a second aspect, an embodiment of the present invention further provides a control device using offline speech, including:
a first receiving unit for receiving a voice request;
the authentication unit is used for authenticating the identity of the user according to the voiceprint of the voice request;
the determining unit is used for determining an operation target carried in the voice request according to an offline database when the identity authentication of the authentication unit passes;
the display unit is used for displaying the operation target determined by the determining unit;
the second receiving unit is used for receiving a voice selection instruction of the operation target;
the first operation unit is used for operating an operation target according to the voice selection instruction when the voice selection instruction is a determination instruction;
a third receiving unit, configured to receive a voice control instruction executed for the operation target;
and the second operation unit is used for operating the voice control instruction based on the off-line database.
Optionally, the second operation unit includes:
the recognition module is used for recognizing the voice control instruction based on a voice recognition model;
the determining module is used for determining the similarity between a preset instruction in the voice recognition model and the voice control instruction;
the generating module is used for generating a keyword according to the identification result when the similarity is determined to be greater than or equal to a preset similarity threshold;
and the searching module is used for searching and executing the operation corresponding to the keyword in the offline database.
Optionally, the second operation unit further includes:
and the discarding module is used for discarding the voice control instruction when the determining module determines that the similarity is smaller than the preset similarity threshold.
Optionally, the apparatus further comprises:
the output unit is used for outputting prompt information for modifying the voice selection instruction after the second receiving unit receives the voice selection instruction of the operation target and when the voice selection instruction is a cancel instruction;
and the confirming unit is used for confirming the operation target in the offline database again according to the voice request when the voice selection instruction is a cancel instruction until the received voice selection instruction is a confirmation instruction.
Optionally, the apparatus further comprises:
the construction unit is used for constructing at least two levels of menu applications of common operation targets in the offline database before the second operation unit operates the voice control instruction based on the offline database;
the second operation unit is further configured to operate the voice control instruction based on at least two levels of menu applications in the offline database.
In a third aspect, an embodiment of the present invention further provides a readable device, where the readable device includes:
at least one processor;
and at least one memory and a bus, each connected to the processor; wherein,
the processor and the memory complete mutual communication through the bus;
the processor is used for calling the program instructions in the memory so as to execute the above control method using offline speech.
By means of the above technical solution, the control method and apparatus using offline speech and the readable device provided by the present invention receive a voice request and authenticate the identity of the user according to the voiceprint of the voice request; if the identity authentication is passed, determine an operation target carried in the voice request according to an offline database, and display the operation target; receive a voice selection instruction for the operation target and, if the voice selection instruction is a determination instruction, operate the operation target according to the voice selection instruction; and receive a voice control instruction directed at the operation target, and run the voice control instruction based on the offline database. Compared with the prior art, the embodiment of the invention authenticates the identity of the user through the voiceprint and completes the operation of the readable device through voice, which improves the operability and security of the readable device, shortens the operation time during use, and improves the user experience.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides a control method using offline speech, which mainly aims to solve the problem in the prior art that, when the user's hands are occupied or restricted, the user is often unable to operate a readable device (such as a mobile phone, an iPad, or another touch-screen electronic device) manually, so that operating the device while holding it is inconvenient or prone to misoperation. In order to solve the above problem, an embodiment of the present invention provides a control method using offline speech, as shown in fig. 1, the method including:
101. Receiving a voice request, and authenticating the identity of the user according to the voiceprint of the voice request.
In practical applications, the control method using offline speech is mainly applied to readable devices, such as a smartphone, a smart speaker, a computer, a smart wearable device, an iPad, and the like. The embodiment of the present invention does not limit the specific type or model of the readable device. The following embodiments take a smartphone as an example, but it should be understood that this is not intended to limit the specific type of the readable device.
After the readable device receives the voice request, identity authentication is completed by voiceprint recognition. This recognition is a fuzzy matching process: any utterance, regardless of its content, can be verified by comparison against an offline voiceprint library to determine whether the speaker is a user of the readable device; if not, the request is treated as invalid. Compared with entering a password or a gesture instruction, this approach is more convenient and faster, and compared with voice recognition, it adapts more easily to complex environments and helps improve recognition accuracy. To better authenticate the user, an acoustic model, a language model, and the like may be used to identify the user identity information from the voiceprint information of the voice request; for the identification process itself, refer to specific implementations in the prior art.
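As a non-limiting illustration of comparing a voiceprint against an offline voiceprint library, the following Python sketch may be considered. It assumes that a voiceprint has already been reduced to a fixed-length feature vector and that cosine similarity with an acceptance threshold is used for matching. The names `cosine_similarity` and `authenticate`, the vector representation, and the value 0.8 are hypothetical and are not taken from the embodiment.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def authenticate(voiceprint, offline_voiceprints, threshold=0.8):
    """Return the best-matching enrolled user, or None for an invalid request.

    offline_voiceprints maps a user name to an enrolled feature vector
    (an assumed layout for the offline voiceprint library).
    """
    best_user, best_score = None, 0.0
    for user, enrolled in offline_voiceprints.items():
        score = cosine_similarity(voiceprint, enrolled)
        if score > best_score:
            best_user, best_score = user, score
    return best_user if best_score >= threshold else None


library = {"owner": [1.0, 0.0, 0.0]}
print(authenticate([0.9, 0.1, 0.0], library))
print(authenticate([0.0, 1.0, 0.0], library))
```

Because the comparison works on the speaker's features rather than on the words spoken, the same check applies to any utterance, which corresponds to the fuzzy matching described above.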
102. And if the identity authentication is passed, determining an operation target carried in the voice request according to an offline database, and displaying the operation target.
In the embodiment of the present invention, the offline database is mainly used for authenticating the identity of the user and for matching voice commands so as to realize voice control of the readable device. The offline database may include user voiceprint information, all application programs in the readable device, interface selection key information corresponding to commonly used application programs of the readable device, and the like, which is not specifically limited.
When the speaker is authenticated as a user of the readable device, the operation target involved in the voice request needs to be identified according to the offline database. The operation target may be specific application software, such as browser software, music software, video software, social software, and the like. For example, if the content of the voice request is to open the music software, the readable device may directly show an icon or the text name of the music software on the display screen, or directly announce the music software by voice, as shown in fig. 2, which is a schematic diagram of a display mode of an operation target provided by an embodiment of the present invention.
103. And receiving a voice selection instruction for the operation target, and if the voice selection instruction is a determination instruction, operating the operation target according to the voice selection instruction.
After the operation target is displayed, the readable device waits for a selection instruction by which the user indicates whether the displayed operation target is correct, and proceeds to the next operation only when it receives a determination (that is, confirming) selection instruction. For example, if the content of the voice request is to open the music software, the operation target displayed by the readable device is the music software, and the received voice selection instruction is a determination instruction, the readable device proceeds to locate the music software icon and to open and run the music software.
104. Receiving a voice control instruction directed at the operation target, and running the voice control instruction based on the offline database.
In the embodiment of the present invention, recognition of the voice control instruction mainly applies to operations on a specific operation target (such as music software) in the readable device. Continuing the example in step 103, after the music software is opened, the user issues a voice control instruction to search for a song, and the readable device performs intent judgment and keyword extraction on the voice control instruction according to the offline database to complete the song search.
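The intent judgment and keyword extraction mentioned above can be sketched, under simplifying assumptions, as a prefix match against trigger phrases stored offline. The function `parse_command`, the trigger phrase "search for", and the intent name `search_song` are hypothetical illustrations; the embodiment does not specify how intents are represented, and the sketch assumes the speech has already been transcribed to text.

```python
def parse_command(text, offline_intents):
    """Return (intent, keyword) for a transcribed voice control instruction.

    offline_intents maps a trigger phrase to an intent name (assumed layout).
    The keyword is whatever follows the trigger phrase.
    """
    lowered = text.lower().strip()
    for trigger, intent in offline_intents.items():
        if lowered.startswith(trigger):
            return intent, lowered[len(trigger):].strip()
    return None, ""


intents = {"search for": "search_song"}
print(parse_command("Search for Yesterday", intents))
```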
The control method using offline speech provided by the embodiment of the invention receives a voice request and authenticates the identity of the user according to the voiceprint of the voice request; determines an operation target carried in the voice request according to an offline database, and displays the operation target; and receives a voice control instruction directed at the operation target, and runs the voice control instruction based on the offline database. Compared with the prior art, the embodiment of the invention authenticates the identity of the user through the voiceprint, which improves the accuracy of identity authentication, and completes the operation of the readable device through voice, which shortens the operation time during use and enhances the user experience.
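Steps 101 to 104 can be summarized in the following minimal Python sketch. It assumes that speech has already been converted to text and that the speaker has already been reduced to an identifier; the function `run_session`, the dictionary layout of `offline_db`, and the returned action strings are all hypothetical and serve only to show the order of the checks.

```python
def run_session(request, selection, command, offline_db):
    """Return the list of actions the device would take for one session."""
    # Step 101: authenticate the speaker against the offline voiceprint set.
    if request["speaker"] not in offline_db["voiceprints"]:
        return ["reject request"]
    # Step 102: find the operation target named in the request and display it.
    target = next(
        (app for app in offline_db["apps"] if app in request["text"]), None
    )
    if target is None:
        return ["no operation target found"]
    actions = [f"display {target}"]
    # Step 103: operate the target only on a confirming selection instruction.
    if selection != "confirm":
        return actions + ["wait for new selection"]
    actions.append(f"open {target}")
    # Step 104: look up the control instruction for this target offline.
    operation = offline_db["commands"].get(target, {}).get(command)
    if operation is not None:
        actions.append(operation)
    return actions


db = {
    "voiceprints": {"owner"},
    "apps": ["music"],
    "commands": {"music": {"search song": "run song search"}},
}
print(run_session({"speaker": "owner", "text": "open music"},
                  "confirm", "search song", db))
```

No step in the sketch contacts a network service; every lookup is made against the local `offline_db`, which is the property the method relies on.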
As a refinement and extension of the above embodiment, in the embodiment of the present invention, when the voice control instruction is run according to the offline database, the voice control instruction is recognized based on a voice recognition model, and the similarity between a preset instruction in the voice recognition model and the voice control instruction is determined, so that the operation corresponding to the keyword is searched for and executed in the offline database only when the two are sufficiently similar, which improves the accuracy of voice control. In order to implement the above functions, an embodiment of the present invention further provides a control method using offline speech, as shown in fig. 3, where the method includes:
201. Receiving a voice request, and authenticating the identity of the user according to the voiceprint of the voice request.
For the description of step 201, please refer to the detailed description of step 101, which is not repeated here.
202. And if the identity authentication is passed, determining an operation target carried in the voice request according to an offline database, and displaying the operation target.
For the description of step 202, please refer to the detailed description of step 102, which is not repeated here.
203. And receiving a voice selection instruction for the operation target, and if the voice selection instruction is a determination instruction, operating the operation target according to the voice selection instruction.
For the description of step 203, please refer to the detailed description of step 103, which is not repeated here.
It should be noted that, when the displayed operation target is wrong, the user can issue a voice cancellation instruction. When the readable device receives the voice cancellation instruction, it recognizes the voice information and outputs prompt information for modifying the voice selection instruction; and/or, if the voice selection instruction is a cancel instruction, it confirms the operation target in the offline database again according to the voice request until the received voice selection instruction is a confirmation instruction. Example 1: the content of the voice request is to open the WeChat software and the operation target displayed by the readable device is the WeChat icon, but the user issues a voice cancellation instruction by mistake; on receiving the voice cancellation instruction, the readable device displays the prompt information for modifying the voice selection instruction, and the user can then issue the confirmation voice instruction again. Example 2: the content of the voice request is to open the WeChat software, but the operation target displayed by the readable device is the microblog icon, so the user needs to issue a voice cancellation instruction; after receiving and recognizing the voice cancellation instruction, the readable device compares the voice request against the offline database again, and reconfirms and displays the operation target until the displayed icon is WeChat.
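The re-confirmation behaviour of Example 2 can be sketched as a loop over candidate operation targets. The sketch assumes that the offline database yields a ranked list of candidates and that each cancel instruction advances to the next candidate; this ranking, the function `confirm_target`, and the string values "confirm" and "cancel" are hypothetical, since the embodiment only states that the target is confirmed again until a confirmation instruction is received.

```python
def confirm_target(candidates, selections):
    """Show candidates in order until one is confirmed.

    candidates: ranked operation targets from the offline database (assumed).
    selections: the user's voice selection instructions, in order.
    Returns the confirmed target, or None if no candidate is confirmed.
    """
    answers = iter(selections)
    for candidate in candidates:
        # The device would display `candidate` here and wait for an answer.
        answer = next(answers, None)
        if answer == "confirm":
            return candidate
        # Any other answer is treated as a cancel: try the next candidate.
    return None


print(confirm_target(["Weibo", "WeChat"], ["cancel", "confirm"]))
```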
204. Receiving a voice control instruction directed at the operation target, recognizing the voice control instruction based on a voice recognition model, and determining the similarity between a preset instruction in the voice recognition model and the voice control instruction.
In the embodiment of the present invention, the voice control instruction is recognized based on the voice recognition model to determine its content, and the similarity between a preset instruction in the voice recognition model and the voice control instruction is then determined. The similarity may be determined according to the number of identical words and the semantic similarity. For example, the identical-word measure may be the ratio of the number of words shared by the control instruction and the preset instruction to the total number of words in the control instruction, which is not specifically limited. The similarity may be set to range from 0 to 1, where 1 is the maximum similarity and 0 is the minimum.
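The identical-word ratio described above can be illustrated with the following Python sketch. It assumes whitespace-separated words and case-insensitive comparison, and it covers only the identical-word measure, not semantic similarity; the function name `word_overlap_similarity` is hypothetical.

```python
from collections import Counter


def word_overlap_similarity(control, preset):
    """Shared-word count divided by the word count of the control instruction."""
    control_words = control.lower().split()
    if not control_words:
        return 0.0
    # Multiset intersection: a repeated word counts only as often as it
    # appears in both instructions.
    shared = Counter(control_words) & Counter(preset.lower().split())
    return sum(shared.values()) / len(control_words)


print(word_overlap_similarity("open the hot chart", "open hot chart"))  # 0.75
```

Three of the four words in the control instruction also occur in the preset instruction, so the result is 3/4, which lies in the 0 to 1 range stated above.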
205. And when the similarity is determined to be greater than or equal to a preset similarity threshold, generating a keyword according to the recognition result, searching and executing the operation corresponding to the keyword in the offline database, and when the similarity is determined to be less than the preset similarity threshold, discarding the voice control instruction.
In the embodiment of the invention, a similarity threshold needs to be preset to decide whether to perform the next operation. The threshold should not be too large, such as 1: although this guarantees accuracy, the user's instruction will often differ from the preset instruction because of inaccurate pronunciation or environmental interference, and the user will have to repeat the instruction. Nor should the threshold be too small, such as 0.1, because instructions with very low similarity would then be accepted, which does not help improve the accuracy of the operation. In the embodiment of the invention, the similarity threshold may be set to, for example, 0.7 or 0.8, which reduces the number of times an instruction must be repeated while preserving accuracy; this is not specifically limited.
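The threshold decision of step 205 can be sketched as follows. The sketch assumes that a similarity score has already been computed for each preset instruction and that the best-scoring preset instruction itself serves as the keyword for the offline lookup; the function `dispatch`, the dictionary layouts, and the operation names are hypothetical.

```python
def dispatch(scores, offline_operations, threshold=0.7):
    """Return the operation to execute, or None if the instruction is discarded.

    scores: preset instruction -> similarity in [0, 1] (assumed precomputed).
    offline_operations: keyword -> operation stored in the offline database.
    """
    if not scores:
        return None
    keyword, score = max(scores.items(), key=lambda item: item[1])
    if score < threshold:
        return None  # below the preset threshold: discard the instruction
    return offline_operations.get(keyword)


operations = {"open charts": "OPEN_CHARTS"}
print(dispatch({"open charts": 0.75, "play radio": 0.2}, operations))
print(dispatch({"open charts": 0.5}, operations))
```

With a threshold of 0.7, the first instruction is executed and the second is discarded; a score exactly equal to the threshold is accepted, matching the "greater than or equal to" condition of step 205.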
206. Running the voice control instruction based on the offline database.
In the embodiment of the present invention, before the offline voice control function of the readable device is used, the offline database needs to be completed, which includes constructing at least two levels of menu applications for common operation targets in the offline database, so that the corresponding voice control instruction can be run based on the at least two levels of menu applications in the offline database. When the readable device is a mobile phone, the common operation targets may be WeChat, QQ, calls, short messages, music, video, and the like. The two-level menu application is shown in fig. 4, which is a schematic diagram of the two-level menu application provided in the embodiment of the present invention. For example, when the common operation target is music software, the first-level menu of the music software contains Recommendation, Radio, Selection, Charts, Video, and My; the second-level menu under the first-level Charts item contains the new song chart, the hot song chart, the rising chart, and the like; and the corresponding voice control instruction may be "open the charts", "listen to the new song chart", and the like, which is not specifically limited.
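A two-level menu application of this kind can be represented, for illustration, as a nested mapping from operation target to first-level items to second-level items. The structure `MENUS`, the function `resolve`, and the simple substring matching below are hypothetical; the menu names loosely follow the music software example above.

```python
# Operation target -> first-level menu item -> second-level menu items.
MENUS = {
    "music": {
        "charts": ["new songs", "hot songs", "rising songs"],
        "radio": [],
    }
}


def resolve(target, command, menus=MENUS):
    """Map a transcribed command to a (first_level, second_level) menu path."""
    first_level = menus.get(target, {})
    # Prefer the more specific second-level match.
    for first, seconds in first_level.items():
        for second in seconds:
            if second in command:
                return (first, second)
    # Otherwise fall back to a first-level match.
    for first in first_level:
        if first in command:
            return (first, None)
    return None


print(resolve("music", "listen to new songs"))
print(resolve("music", "open charts"))
```

Checking second-level items first lets a command such as "listen to new songs" reach its entry directly, without the user first opening the parent menu by voice.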
It should be noted that, to suit the different page layouts encountered in actual operation, the embodiment of the present invention may support, for common operation targets, selecting multiple items and selecting items jointly across pages. For example, in the album of a mobile phone, several photos may be deleted or forwarded together.
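Cross-page joint selection can be illustrated by keeping the checked items as (page, item) pairs, so that the selection survives page changes and one operation can be applied to all of them. The album layout and the function `delete_checked` are hypothetical and are given only as a sketch.

```python
def delete_checked(album, checked):
    """Return a copy of the album without the checked photos.

    album: page number -> list of photo names (assumed layout).
    checked: set of (page, photo) pairs collected across pages.
    """
    return {
        page: [photo for photo in photos if (page, photo) not in checked]
        for page, photos in album.items()
    }


album = {1: ["a.jpg", "b.jpg"], 2: ["c.jpg"]}
checked = {(1, "a.jpg"), (2, "c.jpg")}
print(delete_checked(album, checked))
```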
In conclusion, a voice request is received, the user is authenticated according to the voiceprint of the voice request, and the offline database is completed in advance. When the voice control instruction is run according to the offline database, the voice control instruction is recognized based on the voice recognition model, and the similarity between a preset instruction in the voice recognition model and the voice control instruction is determined, so that the operation corresponding to the keyword is searched for and executed in the offline database. In this way, the user of the readable device can be authenticated under special conditions and the readable device can be operated by voice, which improves the operability and security of the readable device, shortens the operation time during use, and improves the user experience.
Further, as an implementation of the method shown in the foregoing embodiment, another embodiment of the present invention further provides a control device using offline speech. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method.
An embodiment of the present invention provides a control device using offline speech, as shown in fig. 5, including:
a first receiving unit 31, configured to receive a voice request;
an authentication unit 32, configured to authenticate the identity of the user according to the voiceprint of the voice request;
a determining unit 33, configured to determine, according to an offline database, an operation target carried in the voice request when the identity authentication of the authentication unit 32 passes;
a display unit 34, configured to display the operation target determined by the determining unit 33;
a second receiving unit 35, configured to receive a voice selection instruction for the operation target;
a first operation unit 36, configured to, when the voice selection instruction is a determination instruction, operate the operation target according to the voice selection instruction;
a third receiving unit 37, configured to receive a voice control instruction directed at the operation target;
a second operation unit 38, configured to run the voice control instruction based on the offline database.
The control device using offline speech provided by the embodiment of the invention receives a voice request and authenticates the identity of the user according to the voiceprint of the voice request; determines an operation target carried in the voice request according to an offline database, and displays the operation target; and receives a voice control instruction directed at the operation target, and runs the voice control instruction based on the offline database. Compared with the prior art, the embodiment of the invention authenticates the identity of the user through the voiceprint, which improves the accuracy of identity authentication, and completes the operation of the readable device through voice, which shortens the operation time during use and enhances the user experience.
Further, as shown in fig. 6, the second operation unit 38 includes:
a recognition module 381, configured to recognize the voice control instruction based on a voice recognition model;
a determining module 382, configured to determine the similarity between a preset instruction in the voice recognition model and the voice control instruction;
a generating module 383, configured to generate a keyword according to the recognition result when it is determined that the similarity is greater than or equal to a preset similarity threshold;
a searching module 384, configured to search for and execute the operation corresponding to the keyword in the offline database.
Further, as shown in fig. 6, the second operation unit 38 further includes:
a discarding module 385, configured to discard the voice control instruction when the determining module 382 determines that the similarity is smaller than the preset similarity threshold.
Further, as shown in fig. 6, the apparatus further includes:
an output unit 39, configured to output prompt information for modifying the voice selection instruction after the second receiving unit 35 receives the voice selection instruction for the operation target and when the voice selection instruction is a cancel instruction;
a confirming unit 310, configured to, when the voice selection instruction is a cancel instruction, confirm the operation target in the offline database again according to the voice request until the received voice selection instruction is a confirmation instruction.
Further, as shown in fig. 6, the apparatus further includes:
a constructing unit 311, configured to construct at least two levels of menu applications for common operation targets in the offline database before the second operation unit 38 runs the voice control instruction based on the offline database;
the second operation unit 38 is further configured to run the voice control instruction based on the at least two levels of menu applications in the offline database.
In conclusion, a voice request is received, the user is authenticated according to the voiceprint of the voice request, and the offline database is completed in advance. When the voice control instruction is run according to the offline database, the voice control instruction is recognized based on the voice recognition model, and the similarity between a preset instruction in the voice recognition model and the voice control instruction is determined, so that the operation corresponding to the keyword is searched for and executed in the offline database. In this way, the user of the readable device can be authenticated under special conditions and the readable device can be operated by voice, which improves the operability and security of the readable device, shortens the operation time during use, and improves the user experience.
An embodiment of the present invention provides a readable device, as shown in fig. 7, including:
at least one processor 41;
and at least one memory 42 and a bus 43, each connected to the processor 41; wherein,
the processor 41 and the memory 42 communicate with each other through the bus 43;
the processor 41 is configured to call program instructions in the memory 42 so as to execute the steps in the above-described method embodiments.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are for distinguishing the embodiments, and do not represent merits of the embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the control method and apparatus using offline speech according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.