Disclosure of Invention
The invention provides a voice recognition processing method and device applied to intelligent power applications, which can improve the quality of voice signals and the extraction accuracy of voiceprint features, thereby helping to improve the voiceprint recognition accuracy, the identity verification accuracy of power system operators, and the safety of the power system.
To solve the above technical problem, a first aspect of the present invention discloses a voice recognition processing method applied to intelligent power application, the method comprising:
acquiring a first voice signal of a current operator of the power system;
processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal;
extracting features of the second voice signal to obtain voiceprint feature information corresponding to the second voice signal;
and identifying the voiceprint feature information based on a preset deep neural network algorithm to complete the identity verification of the current operator.
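For illustration only, the following minimal sketch (in Python) shows how the four claimed steps could be chained; the three callables are hypothetical interfaces, not part of the disclosure:

```python
from typing import Callable, Sequence

def authenticate_operator(
    first_signal: Sequence[float],   # step 1: acquired voice signal of the current operator
    enhance: Callable,               # preset voice signal processing algorithm (assumed interface)
    extract_features: Callable,      # selected voiceprint feature extraction mode (assumed interface)
    recognize: Callable,             # deep-neural-network based recognizer (assumed interface)
) -> bool:
    """Illustrative chaining of the four claimed steps."""
    second_signal = enhance(first_signal)         # step 2: obtain the higher-quality second signal
    voiceprint = extract_features(second_signal)  # step 3: voiceprint feature information
    return recognize(voiceprint)                  # step 4: True if identity verification passes
```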
In an optional implementation manner, in the first aspect of the present invention, the performing feature extraction on the second voice signal to obtain voiceprint feature information corresponding to the second voice signal includes:
analyzing the second voice signal to obtain a signal analysis result;
screening out a target feature extraction mode meeting a preset feature extraction condition from a plurality of preset feature extraction modes according to the signal analysis result;
and performing feature extraction on the second voice signal based on the target feature extraction mode to obtain the voiceprint feature information corresponding to the second voice signal.
In the first aspect of the present invention, the signal analysis result includes at least the number of users corresponding to the second voice signal, or includes both the number of users corresponding to the second voice signal and the speaking time period of each user;
and the screening out a target feature extraction mode meeting a preset feature extraction condition from a plurality of preset feature extraction modes according to the signal analysis result includes:
judging, according to the number of users contained in the signal analysis result, whether the number of users is smaller than or equal to a first preset number;
when it is judged that the number of users is smaller than or equal to the first preset number, determining a first feature extraction mode based on GMM-UBM as the target feature extraction mode meeting the preset feature extraction condition;
when it is judged that the number of users is larger than the first preset number, comparing the speaking time periods of all the users contained in the signal analysis result to obtain a speaking overlapping time period of each user, wherein the time length of the speaking overlapping time period is larger than or equal to zero;
judging, according to the number of users and the speaking overlapping time periods of all the users, whether the current condition of the second voice signal meets a preset user scene condition;
when it is judged that the current condition of the second voice signal meets the user scene condition, determining a second feature extraction mode based on a deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
when it is judged that the current condition of the second voice signal does not meet the user scene condition, determining a third feature extraction mode based on the i-vector as the target feature extraction mode meeting the preset feature extraction condition;
wherein each of the preset feature extraction modes is one of the first feature extraction mode, the second feature extraction mode and the third feature extraction mode.
In an optional implementation manner, in the first aspect of the present invention, the determining, according to the number of users and the speaking overlapping time periods of all the users, whether the current condition of the second voice signal meets a preset user scene condition includes:
judging, according to the number of users, whether the number of users is larger than or equal to a second preset number, and recording this judgment as a first condition;
judging, according to the speaking overlapping time periods of all the users, whether at least a third preset number of users have speaking overlapping time periods whose time length is larger than zero, and recording this judgment as a second condition;
when the first condition and/or the second condition are/is met, determining that the current condition of the second voice signal meets the preset user scene condition; and when neither the first condition nor the second condition is met, determining that the current condition of the second voice signal does not meet the preset user scene condition.
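A minimal sketch of this mode-selection rule is given below; the mode names are shorthand labels and the first preset number of 1 is only illustrative (taken from the worked example later in the description), while the scene condition is assumed to be evaluated separately from the user count and the per-user speaking-overlap durations:

```python
def select_extraction_mode(num_users: int, scene_condition_met: bool,
                           first_preset: int = 1) -> str:
    """Map the signal analysis result to one of the three preset feature extraction modes."""
    if num_users <= first_preset:
        return "GMM-UBM feature extraction"              # first mode: few speakers
    if scene_condition_met:
        return "deep-neural-network feature extraction"  # second mode: complex multi-speaker scene
    return "i-vector feature extraction"                 # third mode: several speakers, simpler scene
```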
In the first aspect of the present invention, the identifying the voiceprint feature information based on a preset deep neural network algorithm to complete the identity verification of the current operator includes:
recognizing, based on a voice recognition model trained by a preset deep neural network algorithm, the voiceprint feature information input into the voice recognition model to obtain probability scores of the voiceprint feature information for the different initial operators, wherein the voice recognition model is obtained by training the voice sample information of all the initial operators through the deep neural network algorithm;
and performing identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators.
As an optional implementation manner, in the first aspect of the present invention, the authenticating the current operator according to the probability scores of the voiceprint feature information among all the initial operators includes:
comparing the probability scores of the voiceprint feature information among all the initial operators to obtain a target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
when the probability score of the target operator is greater than or equal to the preset score, determining that the identity verification of the current operator is passed;
and when the probability score of the target operator is smaller than the preset score, determining that the identity verification of the current operator is not passed.
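A sketch of this threshold check is shown below; the default preset score of 0.8 mirrors the 80% example given later in the description and is not fixed by the claims:

```python
from typing import Dict, Optional, Tuple

def verify_by_scores(scores: Dict[str, float],     # probability score per initial operator, 0-1
                     preset_score: float = 0.8) -> Tuple[bool, Optional[str]]:
    """Pick the highest-scoring initial operator and accept only if it reaches the preset score."""
    target_operator = max(scores, key=scores.get)   # operator with the highest probability score
    passed = scores[target_operator] >= preset_score
    return passed, target_operator if passed else None
```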
As an optional implementation manner, in the first aspect of the present invention, after the performing identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators, the method further includes:
when the verification fails, outputting prompt information of verification failure to the current operator;
when the verification is passed, determining the identity tag of the target operator as the identity tag of the current operator;
determining, according to the identity tag of the current operator, an operation authority matched with the identity tag from the operation authorities of all initial operators preset in the power system, as a target operation authority of the current operator, wherein each operation authority has a corresponding identity tag;
and controlling, according to the target operation authority, the power system to unlock the functional module corresponding to the target operation authority for the current operator, so that the current operator can execute corresponding control operations on the functional module.
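A sketch of the tag-to-authority unlocking step follows; the identity tags and the table itself are hypothetical examples, while the module names are taken from the functional modules listed later in the description:

```python
from typing import Dict, Set

# Hypothetical preset table: identity tag -> function modules covered by that operation authority.
PRESET_AUTHORITIES: Dict[str, Set[str]] = {
    "dispatcher-01": {"scheduling and optimization", "load management and demand response"},
    "maintainer-02": {"fault detection and protection", "device management and maintenance"},
}

def unlock_modules(identity_tag: str) -> Set[str]:
    """Return the function modules the power system unlocks for the verified operator."""
    target_authority = PRESET_AUTHORITIES.get(identity_tag, set())
    return target_authority   # the system would enable only these modules for control operations
```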
A second aspect of the present invention discloses a voice recognition processing device for smart power applications, the device comprising:
the acquisition module is used for acquiring a first voice signal of a current operator of the power system;
the processing module is used for processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal;
the extraction module is used for performing feature extraction on the second voice signal to obtain voiceprint feature information corresponding to the second voice signal;
and the recognition module is used for identifying the voiceprint feature information based on a preset deep neural network algorithm so as to complete the identity verification of the current operator.
In the second aspect of the present invention, the manner in which the extraction module performs feature extraction on the second voice signal to obtain the voiceprint feature information corresponding to the second voice signal specifically includes:
analyzing the second voice signal to obtain a signal analysis result;
screening out a target feature extraction mode meeting a preset feature extraction condition from a plurality of preset feature extraction modes according to the signal analysis result;
and performing feature extraction on the second voice signal based on the target feature extraction mode to obtain the voiceprint feature information corresponding to the second voice signal.
In the second aspect of the present invention, the signal analysis result includes at least the number of users corresponding to the second voice signal, or includes both the number of users corresponding to the second voice signal and the speaking time period of each user;
and the manner in which the extraction module screens out a target feature extraction mode meeting the preset feature extraction condition from a plurality of preset feature extraction modes according to the signal analysis result specifically includes:
judging, according to the number of users contained in the signal analysis result, whether the number of users is smaller than or equal to a first preset number;
when it is judged that the number of users is smaller than or equal to the first preset number, determining a first feature extraction mode based on GMM-UBM as the target feature extraction mode meeting the preset feature extraction condition;
when it is judged that the number of users is larger than the first preset number, comparing the speaking time periods of all the users contained in the signal analysis result to obtain a speaking overlapping time period of each user, wherein the time length of the speaking overlapping time period is larger than or equal to zero;
judging, according to the number of users and the speaking overlapping time periods of all the users, whether the current condition of the second voice signal meets a preset user scene condition;
when it is judged that the current condition of the second voice signal meets the user scene condition, determining a second feature extraction mode based on a deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
when it is judged that the current condition of the second voice signal does not meet the user scene condition, determining a third feature extraction mode based on the i-vector as the target feature extraction mode meeting the preset feature extraction condition;
wherein each of the preset feature extraction modes is one of the first feature extraction mode, the second feature extraction mode and the third feature extraction mode.
In the second aspect of the present invention, the manner in which the extraction module determines, according to the number of users and the speaking overlapping time periods of all the users, whether the current condition of the second voice signal meets the preset user scene condition specifically includes:
judging, according to the number of users, whether the number of users is larger than or equal to a second preset number, and recording this judgment as a first condition;
judging, according to the speaking overlapping time periods of all the users, whether at least a third preset number of users have speaking overlapping time periods whose time length is larger than zero, and recording this judgment as a second condition;
when the first condition and/or the second condition are/is met, determining that the current condition of the second voice signal meets the preset user scene condition; and when neither the first condition nor the second condition is met, determining that the current condition of the second voice signal does not meet the preset user scene condition.
In the second aspect of the present invention, the manner in which the recognition module identifies the voiceprint feature information based on a preset deep neural network algorithm so as to complete the identity verification of the current operator specifically includes:
recognizing, based on a voice recognition model trained by a preset deep neural network algorithm, the voiceprint feature information input into the voice recognition model to obtain probability scores of the voiceprint feature information for the different initial operators, wherein the voice recognition model is obtained by training the voice sample information of all the initial operators through the deep neural network algorithm;
and performing identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators.
As an optional implementation manner, in the second aspect of the present invention, the manner in which the recognition module performs identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators specifically includes:
comparing the probability scores of the voiceprint feature information among all the initial operators to obtain a target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
when the probability score of the target operator is greater than or equal to the preset score, determining that the identity verification of the current operator is passed;
and when the probability score of the target operator is smaller than the preset score, determining that the identity verification of the current operator is not passed.
As an alternative embodiment, in the second aspect of the present invention, the apparatus further includes:
the output module is used for outputting prompt information of verification failure to the current operator when the verification fails, after the recognition module performs identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators;
the determining module is used for determining the identity tag of the target operator as the identity tag of the current operator when the verification is passed;
the determining module is further used for determining, according to the identity tag of the current operator, an operation authority matched with the identity tag from the operation authorities of all initial operators preset in the power system, as a target operation authority of the current operator, wherein each operation authority has a corresponding identity tag;
and the control module is used for controlling, according to the target operation authority, the power system to unlock the functional module corresponding to the target operation authority for the current operator, so that the current operator can execute corresponding control operations on the functional module.
A third aspect of the present invention discloses another voice recognition processing device for smart power applications, the device comprising:
a memory storing executable program code;
a processor coupled to the memory;
The processor invokes the executable program code stored in the memory to execute the voice recognition processing method for intelligent power application disclosed in the first aspect of the present invention.
A fourth aspect of the present invention discloses a computer storage medium storing computer instructions for performing the voice recognition processing method for smart power application disclosed in the first aspect of the present invention when the computer instructions are called.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
According to the embodiment of the invention, a first voice signal of a current operator of the power system is acquired, and the first voice signal is processed based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal, where the quality of the second voice signal is higher than that of the first voice signal. Feature extraction is then performed on the second voice signal to obtain voiceprint feature information corresponding to the second voice signal, and the voiceprint feature information is identified based on a preset deep neural network algorithm to complete the identity verification of the current operator. Therefore, the quality of the voice signal and the extraction accuracy of the voiceprint features can be improved, which in turn improves the voiceprint recognition accuracy, the identity verification accuracy of the current operator of the power system, and the operation safety of the power system.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without making any inventive effort shall fall within the scope of protection of the present invention.
The terms "first", "second" and the like in the description, the claims and the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product or device that comprises a list of steps or elements is not limited to the steps or elements expressly listed, but may optionally include other steps or elements not expressly listed or inherent to such process, method, product or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The invention discloses a voice recognition processing method and device applied to intelligent power application, which can improve the quality of voice signals so as to improve the extraction accuracy of voiceprint features, and is beneficial to improving the accuracy of voiceprint recognition so as to improve the accuracy of identity verification of current operators of a power system and further improve the safety of the operation of the power system. The following will describe in detail.
Example 1
Referring to fig. 1, fig. 1 is a flow chart of a voice recognition processing method applied to intelligent power application according to an embodiment of the invention. The voice recognition processing method applied to the smart power application described in fig. 1 may be applied to a voice recognition processing device of the smart power application, where the device may include a recognition processing device or a recognition processing server, where the recognition processing server may include a cloud server or a local server, and embodiments of the present invention are not limited. As shown in fig. 1, the voice recognition processing method applied to the smart power application may include the following operations:
101. A first voice signal of a current operator of the power system is obtained.
In the embodiment of the invention, specifically, when the current operator triggers an identity verification instruction for the power system, the power system outputs prompt information of voiceprint authentication to the current operator so as to trigger the current operator to perform voiceprint authentication, and the power system receives the voice signal sent by the current operator as the first voice signal of the current operator.
102. And processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal.
In the embodiment of the present invention, the quality of the second voice signal is higher than that of the first voice signal. Optionally, the preset voice signal processing algorithm may be an audio enhancement algorithm based on machine learning or Digital Signal Processing (DSP) technology, which enhances the low-quality audio signal (i.e., the first voice signal) to improve its recognizability, or it may be a deep learning model (such as a GAN) that synthesizes a high-quality audio signal (i.e., the second voice signal) from the low-quality audio signal, so as to facilitate the subsequent voiceprint recognition.
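The patent does not prescribe a specific enhancement algorithm; as one possible DSP-style instance, a basic spectral-subtraction sketch is shown below, assuming the recording begins with a short non-speech segment from which the noise spectrum can be estimated:

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(first_signal: np.ndarray, fs: int, noise_frames: int = 10) -> np.ndarray:
    """Very basic spectral-subtraction enhancement (illustrative only, not the claimed algorithm)."""
    _, _, spec = stft(first_signal, fs=fs, nperseg=512)
    magnitude, phase = np.abs(spec), np.angle(spec)
    # Estimate the noise spectrum from the first few frames (assumed to contain no speech).
    noise_profile = magnitude[:, :noise_frames].mean(axis=1, keepdims=True)
    cleaned = np.maximum(magnitude - noise_profile, 0.0)        # subtract and floor at zero
    _, second_signal = istft(cleaned * np.exp(1j * phase), fs=fs, nperseg=512)
    return second_signal
```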
Specifically, after acquiring the first voice signal of the current operator of the power system, the method may further include:
analyzing the voice content of the first voice signal, and judging whether the voice content matches verification content preset in the power system;
when it is judged that the voice content does not match the verification content preset in the power system, outputting prompt information of a voice content error to the current operator, and marking the voice content as abnormal in the power system;
when it is judged that the voice content matches the verification content preset in the power system, triggering and executing the operation of step 102 of processing the first voice signal based on the preset voice signal processing algorithm to obtain the second voice signal corresponding to the first voice signal. The voice content matches the verification content when the voice content is identical to the verification content or the content similarity between the two is greater than or equal to a preset similarity (such as 90%); the voice content does not match the verification content when the content similarity between the two is smaller than the preset similarity. In this way, before the voiceprint of the current operator is verified, the voice content uttered by the current operator is first checked against the content required by the preset verification, and the subsequent voiceprint feature verification operation is triggered only when they match. This double verification flow of voice content verification plus voiceprint feature verification improves the accuracy and reliability of operator identity verification and thus the use safety of the power system.
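The similarity metric is not specified in the disclosure; a minimal sketch using a generic string-similarity ratio, with the 90% threshold from the example above, could look like this:

```python
from difflib import SequenceMatcher

def content_matches(voice_content: str, verification_content: str,
                    preset_similarity: float = 0.9) -> bool:
    """Match recognized voice content against the preset verification content (assumed metric)."""
    if voice_content == verification_content:
        return True
    similarity = SequenceMatcher(None, voice_content, verification_content).ratio()
    return similarity >= preset_similarity   # e.g. 0.9 corresponds to the 90% example
```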
103. And extracting the characteristics of the second voice signal to obtain voiceprint characteristic information corresponding to the second voice signal.
In the embodiment of the invention, specifically, feature extraction is performed on the second voice signal based on the determined target feature extraction mode, so as to obtain the voiceprint feature information corresponding to the second voice signal. The target feature extraction mode may be obtained by screening, from a plurality of preset feature extraction modes according to the signal analysis result of the second voice signal, the mode that satisfies the preset feature extraction condition.
104. And identifying voiceprint characteristic information based on a preset deep neural network algorithm to finish the identity verification of the current operator.
It can be seen that implementing the voice recognition processing method applied to intelligent power application described in fig. 1 can acquire the first voice signal of the current operator of the power system and process it based on the preset voice signal processing algorithm to obtain the second voice signal, then perform feature extraction on the second voice signal to obtain the corresponding voiceprint feature information, and finally identify the voiceprint feature information based on the preset deep neural network algorithm to complete the identity verification of the current operator. Because the first voice signal is processed into a second voice signal of higher quality, the quality and accuracy of the voice signal are improved, which improves the extraction accuracy of the voiceprint feature information and, in turn, its recognition accuracy. This helps to improve the identity verification accuracy and safety for the current operator of the power system, to achieve accurate identification of the user identity, to improve the safety and effectiveness of power system operation, to raise the safety protection level of the power system, and to ensure its stable and safe operation.
In an optional embodiment, the extracting the features of the second voice signal in step 103 to obtain voiceprint feature information corresponding to the second voice signal may include:
Analyzing the second voice signal to obtain a signal analysis result;
Screening target feature extraction modes meeting preset feature extraction conditions from a plurality of preset feature extraction modes according to a signal analysis result;
And carrying out feature extraction on the second voice signal based on the target feature extraction mode to obtain voiceprint feature information corresponding to the second voice signal.
In the embodiment of the invention, the signal analysis result at least comprises the number of users corresponding to the second voice signal, or the signal analysis result comprises the number of users corresponding to the second voice signal, and further comprises the speaking time period of each user, wherein the users at least comprise the current operator. Specifically, when the power system receives the voice signal sent by the current operator, the actually received voice signal source at least includes the current operator, and may also include other people in the scene where the current operator is located.
Optionally, each feature extraction mode may include one of a first feature extraction mode based on GMM-UBM (where GMM represents a gaussian mixture model, UBM represents a generic background model, and GMM-UBM represents a model that combines the gaussian mixture model with the generic background model), a second feature extraction mode based on a deep neural network (such as DNN, CNN, RNN, etc.), and a third feature extraction mode based on i-vector (where i-vector is a technology for speaker verification and is mainly used for extracting feature vectors of a speaker), which is not limited by the embodiment of the present invention.
Therefore, in this optional embodiment, the second voice signal is analyzed to obtain a signal analysis result, and the target feature extraction mode meeting the preset feature extraction condition is screened out from the preset feature extraction modes according to that result, which improves the accuracy and efficiency of selecting a target feature extraction mode for the second voice signal. Feature extraction is then performed on the second voice signal based on the target feature extraction mode to obtain the corresponding voiceprint feature information, so the accurately screened extraction mode improves the accuracy and efficiency of voiceprint feature extraction. Providing a diversified set of selectable feature extraction modes also improves the diversity and flexibility of voiceprint feature extraction for the second voice signal.
In this optional embodiment, as an optional implementation manner, according to a signal analysis result, selecting, from a plurality of preset feature extraction manners, a target feature extraction manner that satisfies a preset feature extraction condition may include:
judging, according to the number of users contained in the signal analysis result, whether the number of users is smaller than or equal to a first preset number;
when it is judged that the number of users is smaller than or equal to the first preset number, determining the first feature extraction mode based on the GMM-UBM as the target feature extraction mode meeting the preset feature extraction condition;
when it is judged that the number of users is larger than the first preset number, comparing the speaking time periods of all the users contained in the signal analysis result to obtain a speaking overlapping time period of each user, wherein the time length of the speaking overlapping time period is larger than or equal to zero;
judging, according to the number of users and the speaking overlapping time periods of all the users, whether the current condition of the second voice signal meets a preset user scene condition;
when it is judged that the current condition of the second voice signal meets the user scene condition, determining the second feature extraction mode based on the deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
and when it is judged that the current condition of the second voice signal does not meet the user scene condition, determining the third feature extraction mode based on the i-vector as the target feature extraction mode meeting the preset feature extraction condition.
In the embodiment of the invention, optionally, the preset feature extraction condition is used for indicating that the speaker corresponding to the voice signal is in a corresponding speaking environment (such as a small number of people, a multi-person conversation, etc.), and the user scene condition is used for indicating that the scene complexity of the environment where the speaker corresponding to the voice signal is located falls within a preset scene complexity interval, with the speaking overlapping time periods of all the users serving as the basis for judging that scene complexity. Specifically, when the time length of the speaking overlapping period of a certain user is zero, the user does not speak at the same time as any other user; when the time length of the speaking overlapping period of a certain user is greater than zero, the user speaks at the same time as other users. The embodiment of the present invention is not limited in this respect.
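The disclosure does not specify how the per-user overlap is computed; a straightforward pairwise interval-intersection sketch, assuming each user's speaking time period is given as a list of (start, end) segments in seconds, could be:

```python
from typing import Dict, List, Tuple

Period = Tuple[float, float]   # (start, end) of one speaking segment, in seconds

def overlap_durations(speaking_periods: Dict[str, List[Period]]) -> Dict[str, float]:
    """Total time each user speaks simultaneously with at least one other user (pairwise sum)."""
    totals = {user: 0.0 for user in speaking_periods}
    users = list(speaking_periods)
    for i, a in enumerate(users):
        for b in users[i + 1:]:
            for sa, ea in speaking_periods[a]:
                for sb, eb in speaking_periods[b]:
                    overlap = min(ea, eb) - max(sa, sb)   # intersection length of the two segments
                    if overlap > 0:
                        totals[a] += overlap
                        totals[b] += overlap
    return totals   # a value of zero means the user never speaks at the same time as others
```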
For example, when the number of users corresponding to the second voice signal is small, the first feature extraction mode based on the GMM-UBM can be selected directly as the target feature extraction mode, so that the extracted voiceprint features are well represented and remain interpretable on small amounts of data. When the number of users corresponding to the second voice signal is large, the scene complexity of the environment where the speakers are located can be analyzed further, and either the second feature extraction mode based on the deep neural network or the third feature extraction mode based on the i-vector is selected as the target feature extraction mode. Selecting the second or third feature extraction mode when the amount of data is large improves speaker recognition accuracy, and thus the recognition accuracy and reliability for the current operator.
It can be seen that this optional implementation manner can judge, according to the number of users contained in the signal analysis result, whether the number of users is smaller than or equal to the first preset number, and, when it is, determine the first feature extraction mode based on the GMM-UBM as the target feature extraction mode meeting the preset feature extraction condition, which improves the accuracy and efficiency of that determination and helps to improve the efficiency and speed of extracting the voiceprint features of the second voice signal. When the number of users is larger than the first preset number, the speaking time periods of all the users contained in the signal analysis result are compared to obtain the speaking overlapping time period of each user (whose time length is larger than or equal to zero), and whether the current condition of the second voice signal meets the preset user scene condition is judged according to the number of users and those overlapping time periods, which improves the accuracy of this judgment when many users are present. When the condition is met, the second feature extraction mode based on the deep neural network is determined as the target feature extraction mode, improving the accuracy of that determination; when the condition is not met, the third feature extraction mode based on the i-vector is determined as the target feature extraction mode, improving the accuracy and reliability of that determination and enabling the speakers to be distinguished accurately.
In this optional embodiment, optionally, the determining, according to the number of users and the speaking overlapping time periods of all the users, whether the current condition of the second voice signal meets the preset user scene condition may include:
judging, according to the number of users, whether the number of users is larger than or equal to a second preset number, and recording this judgment as a first condition;
judging, according to the speaking overlapping time periods of all the users, whether at least a third preset number of users have speaking overlapping time periods whose time length is larger than zero, and recording this judgment as a second condition;
when the first condition and/or the second condition are/is satisfied, determining that the current condition of the second voice signal satisfies the preset user scene condition; and when neither the first condition nor the second condition is satisfied, determining that the current condition of the second voice signal does not satisfy the preset user scene condition.
In the embodiment of the invention, specifically, when the number of users is larger than or equal to the second preset number, and/or when at least the third preset number of users have speaking overlapping time periods whose time length is larger than zero, the scene complexity of the environment where the speakers corresponding to the second voice signal are located falls within the preset complexity interval; in this case, the second feature extraction mode, which can effectively improve recognition precision on large data sets, can be selected for voiceprint extraction. When the number of users is smaller than the second preset number and fewer than the third preset number of users have speaking overlapping time periods whose time length is larger than zero, the scene complexity does not fall within the preset complexity interval; in this case, the third feature extraction mode, which has relatively lower computational complexity and is suitable for multi-speaker environments, can be selected for voiceprint extraction.
For example, assume the first preset number is 1, the second preset number is 5 and the third preset number is 3. When the number of users (speakers) corresponding to the second voice signal is determined to be greater than or equal to 5, and/or at least 3 users speak at the same time as other users, it may be determined that the current condition of the second voice signal meets the preset user scene condition. When the number of users is less than 5 and fewer than 3 users speak at the same time as other users, it may be determined that the current condition of the second voice signal does not meet the preset user scene condition.
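The worked example above can be checked with a short sketch; the thresholds 5 and 3 come from the example, and the function is only an illustration of the two conditions:

```python
def scene_condition_met(num_users: int, overlap_durations: list,
                        second_preset: int = 5, third_preset: int = 3) -> bool:
    """Scene check: first condition (many users) and/or second condition (enough overlapping users)."""
    first_condition = num_users >= second_preset
    second_condition = sum(1 for d in overlap_durations if d > 0) >= third_preset
    return first_condition or second_condition

# 6 speakers, none overlapping: the first condition holds, so the DNN-based mode would be chosen.
print(scene_condition_met(6, [0, 0, 0, 0, 0, 0]))   # True
# 3 speakers, only 2 of them overlap: neither condition holds, so the i-vector mode would be chosen.
print(scene_condition_met(3, [4.2, 4.2, 0]))        # False
```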
It can be seen that this optional implementation manner can also judge, according to the number of users, whether the number of users is greater than or equal to the second preset number (recorded as the first condition), and judge, according to the speaking overlapping time periods of all the users, whether at least the third preset number of users have speaking overlapping time periods longer than zero (recorded as the second condition). When the first condition and/or the second condition is met, it is determined that the current condition of the second voice signal meets the preset user scene condition; providing this diversified scene judging manner improves the accuracy, diversity and flexibility of that determination. When neither condition is met, it is determined that the current condition of the second voice signal does not meet the preset user scene condition; the multiple judging criteria for the scene where the users are located improve the accuracy and reliability of that determination, which in turn helps to determine the subsequent feature extraction mode accurately and reliably on the basis of an accurate judgment result.
Example 2
Referring to fig. 2, fig. 2 is a flow chart of a voice recognition processing method applied to intelligent power application according to an embodiment of the invention. The voice recognition processing method applied to the smart power application described in fig. 2 may be applied to a voice recognition processing device of the smart power application, where the device may include a recognition processing device or a recognition processing server, where the recognition processing server may include a cloud server or a local server, and embodiments of the present invention are not limited. As shown in fig. 2, the voice recognition processing method applied to the smart power application may include the following operations:
201. A first voice signal of a current operator of the power system is obtained.
202. And processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal.
203. And extracting the characteristics of the second voice signal to obtain voiceprint characteristic information corresponding to the second voice signal.
In the embodiment of the present invention, for other descriptions of step 201 to step 203, please refer to the detailed descriptions of step 101 to step 103 in the first embodiment, and the description of the embodiment of the present invention is omitted.
204. And identifying, based on a voice recognition model trained by a preset deep neural network algorithm, the voiceprint feature information input into the voice recognition model to obtain probability scores of the voiceprint feature information for different initial operators.
In the embodiment of the invention, the voice recognition model is obtained by training the voice sample information of all the initial operators through a deep neural network algorithm. Specifically, feature extraction is performed on the voice sample information of each initial operator to obtain voiceprint feature sample information of each initial operator, and the voiceprint feature sample information of all the initial operators is input into a pre-built model based on the preset deep neural network algorithm for training to obtain a training result. The recognition accuracy of the model is then evaluated according to the training result, and it is judged whether the recognition accuracy reaches a preset recognition accuracy: when it does, the model is determined as the voice recognition model trained based on the preset deep neural network algorithm; when it does not, the model training operation is executed again until the recognition accuracy of the trained model reaches the preset recognition accuracy.
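A minimal training sketch is shown below; the use of PyTorch, the fixed-length voiceprint feature vectors, the two-layer network, and the 0.95 accuracy target are all assumptions for illustration, since the patent only requires a deep neural network trained until a preset recognition accuracy is reached:

```python
import torch
from torch import nn

def train_speaker_model(features: torch.Tensor,        # (num_samples, feature_dim) voiceprint samples
                        labels: torch.Tensor,           # (num_samples,) initial-operator indices
                        num_operators: int,
                        preset_accuracy: float = 0.95,  # illustrative preset recognition accuracy
                        max_rounds: int = 50) -> nn.Module:
    """Train a small DNN classifier over the initial operators until the preset accuracy is reached."""
    model = nn.Sequential(
        nn.Linear(features.shape[1], 128), nn.ReLU(),
        nn.Linear(128, num_operators),      # one output per initial operator
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(max_rounds):
        optimizer.zero_grad()
        logits = model(features)
        loss_fn(logits, labels).backward()
        optimizer.step()
        accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
        if accuracy >= preset_accuracy:     # retrain until the preset recognition accuracy is reached
            break
    return model
```

At verification time, `model(voiceprint).softmax(dim=-1)` would yield the per-operator probability scores used in step 205.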
205. And according to probability scores of voiceprint characteristic information in all initial operators, carrying out identity verification on the current operator.
In the embodiment of the invention, optionally, the initial operator with the highest probability score among all the initial operators is directly determined as the current operator, so that the identity verification of the current operator is completed, which improves the efficiency and speed of the identity verification; alternatively, the initial operator with the highest probability score is selected from all the initial operators as a target operator suspected to be the current operator, and whether the current operator is the target operator is further judged according to the probability score of the target operator, so that the identity verification of the current operator is completed, which improves the accuracy and reliability of the identity verification.
It can be seen that implementing the voice recognition processing method applied to intelligent power application described in fig. 2 can acquire the first voice signal of the current operator of the power system and process it based on the preset voice signal processing algorithm to obtain the second voice signal, then perform feature extraction on the second voice signal to obtain the corresponding voiceprint feature information, and finally identify the voiceprint feature information based on the preset deep neural network algorithm to complete the identity verification of the current operator. Because the first voice signal is processed into a second voice signal of higher quality, the quality and accuracy of the voice signal are improved, which improves the extraction accuracy of the voiceprint feature information and, in turn, its recognition accuracy. This helps to improve the identity verification accuracy and safety for the current operator of the power system, to achieve accurate identification of the user identity, to improve the safety and effectiveness of power system operation, to raise the safety protection level of the power system, and to ensure its stable and safe operation. In addition, the voiceprint feature information input into the voice recognition model trained by the preset deep neural network algorithm can be recognized to obtain probability scores of the voiceprint feature information for the different initial operators; accurate recognition of the voiceprint feature information based on the voice recognition model improves the accuracy and efficiency of these probability scores. Identity verification is then performed on the current operator according to the probability scores of the voiceprint feature information among all the initial operators, so the verification can be carried out accurately on the basis of accurately recognized probability scores, improving the accuracy and efficiency of the identity verification of the current operator.
In an alternative embodiment, the step 205 of authenticating the current operator according to the probability scores of the voiceprint feature information among all the initial operators may include:
comparing the probability scores of the voiceprint feature information among all the initial operators to obtain a target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
when the probability score of the target operator is greater than or equal to the preset score, determining that the identity verification of the current operator is passed;
and when the probability score of the target operator is smaller than the preset score, determining that the identity verification of the current operator is not passed.
In the embodiment of the present invention, optionally, the probability score of the voiceprint feature information in each initial operator may be in a percentage form, and the preset score may be 70%, or 80%, or any other preset value.
Therefore, this optional embodiment can compare the probability scores of the voiceprint feature information among all the initial operators to obtain the target operator with the highest probability score, and judge whether the probability score of the target operator is greater than or equal to the preset score. When it is, the identity verification of the current operator is determined to have passed; when it is not, the identity verification is determined to have failed. In both cases, the accuracy and reliability of the identity verification result for the current operator are improved.
In this optional embodiment, as an optional implementation manner, after the identity verification of the current operator according to the probability scores of the voiceprint feature information among all the initial operators, the method may further include:
when the verification fails, outputting prompt information of verification failure to the current operator;
when the verification is passed, determining the identity tag of the target operator as the identity tag of the current operator;
determining, according to the identity tag of the current operator, the operation authority matched with the identity tag from the operation authorities of all initial operators preset in the power system, as the target operation authority of the current operator, wherein each operation authority has a corresponding identity tag;
and controlling, according to the target operation authority, the power system to unlock the functional module corresponding to the target operation authority for the current operator, so that the current operator can execute corresponding control operations on the functional module.
In the embodiment of the invention, the operation authorities of a plurality of initial operators are preset in the electric power system, each initial operator corresponds to one identity tag, and the identity tag is associated with the operation authority of the corresponding initial operator, so that the corresponding operation authority of the initial operator can be directly determined through the identity tag of the initial operator. The power system may be provided with a plurality of functional modules for operator control, such as at least one of a monitoring and data acquisition module, a scheduling and optimization module, a fault detection and protection module, a device management and maintenance module, a load management and demand response module, and a communication and data management module.
Therefore, when the identity verification of the current operator fails, this optional implementation manner can directly output prompt information of verification failure to the current operator, which helps to prompt the current operator to verify again and makes re-verification easier to carry out. When the identity verification of the current operator passes, the identity tag of the target operator is determined as the identity tag of the current operator, and the operation authority matched with that identity tag is determined from the operation authorities of all initial operators preset in the power system as the target operation authority of the current operator, which improves the accuracy of determining the identity tag and operation authority of the current operator. The power system is then controlled, according to the target operation authority, to unlock the corresponding functional module for the current operator so that the current operator can execute the corresponding control operations on it. In this way, the current operator is accurately granted only the operation authority matching the verified identity and only the corresponding functional modules are unlocked, which helps to improve the safety of operating the power system.
Example 3
Referring to fig. 3, fig. 3 is a schematic structural diagram of a voice recognition processing device for smart power application according to an embodiment of the present invention. The voice recognition processing device applied to the smart power application described in fig. 3 may include a recognition processing device or a recognition processing server, where the recognition processing server may include a cloud server or a local server, and the embodiment of the invention is not limited. As shown in fig. 3, the voice recognition processing device applied to the smart power application may include:
The acquisition module 301 is configured to acquire a first voice signal of a current operator of the power system.
The processing module 302 is configured to process the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal.
The extraction module 303 is configured to perform feature extraction on the second voice signal to obtain voiceprint feature information corresponding to the second voice signal.
The recognition module 304 is configured to recognize the voiceprint feature information based on a preset deep neural network algorithm, so as to complete the identity verification of the current operator.
It can be seen that implementing the voice recognition processing device applied to intelligent power application described in fig. 3 can acquire the first voice signal of the current operator of the power system and process it based on the preset voice signal processing algorithm to obtain the second voice signal, then perform feature extraction on the second voice signal to obtain the corresponding voiceprint feature information, and finally identify the voiceprint feature information based on the preset deep neural network algorithm to complete the identity verification of the current operator. Because the first voice signal is processed into a second voice signal of higher quality, the quality and accuracy of the voice signal are improved, which improves the extraction accuracy of the voiceprint feature information and, in turn, its recognition accuracy. This helps to improve the identity verification accuracy and safety for the current operator of the power system, to achieve accurate identification of the user identity, to improve the safety and effectiveness of power system operation, to raise the safety protection level of the power system, and to ensure its stable and safe operation.
In an optional embodiment, the manner in which the extraction module 303 performs feature extraction on the second voice signal to obtain the voiceprint feature information corresponding to the second voice signal may specifically include:
Analyzing the second voice signal to obtain a signal analysis result;
Screening target feature extraction modes meeting preset feature extraction conditions from a plurality of preset feature extraction modes according to a signal analysis result;
And carrying out feature extraction on the second voice signal based on the target feature extraction mode to obtain voiceprint feature information corresponding to the second voice signal.
Therefore, in this optional embodiment, the second voice signal is analyzed to obtain a signal analysis result, and the target feature extraction mode meeting the preset feature extraction condition is screened out from the preset feature extraction modes according to that result, which improves the accuracy and efficiency of selecting a target feature extraction mode for the second voice signal. Feature extraction is then performed on the second voice signal based on the target feature extraction mode to obtain the corresponding voiceprint feature information, so the accurately screened extraction mode improves the accuracy and efficiency of voiceprint feature extraction. Providing a diversified set of selectable feature extraction modes also improves the diversity and flexibility of voiceprint feature extraction for the second voice signal.
In this optional embodiment, as an optional implementation manner, the signal analysis result includes at least the number of users corresponding to the second voice signal, or includes both the number of users corresponding to the second voice signal and the speaking time period of each user. The manner in which the extraction module 303 screens out, from the preset feature extraction modes according to the signal analysis result, the target feature extraction mode that meets the preset feature extraction condition may specifically include:
judging whether the number of users is less than or equal to a first preset number according to the number of users contained in the signal analysis result;
when the number of users is smaller than or equal to a first preset number, determining a first feature extraction mode based on the GMM-UBM as a target feature extraction mode meeting preset feature extraction conditions;
When the number of users is judged to be larger than the first preset number, comparing the speaking time periods of all the users contained in the signal analysis result to obtain a speaking overlapping time period of each user, wherein the time length of the speaking overlapping time period is larger than or equal to zero, and judging whether the current condition of the second voice signal meets a preset user scene condition according to the number of users and the speaking overlapping time periods of all the users;
when the current condition of the second voice signal meets the user scene condition, determining a second feature extraction mode based on the deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
when the current condition of the second voice signal does not meet the user scene condition, determining a third feature extraction mode based on the i-vector as the target feature extraction mode meeting the preset feature extraction condition;
Wherein the plurality of preset feature extraction modes comprise the first feature extraction mode, the second feature extraction mode and the third feature extraction mode.
It can be seen that this alternative implementation manner can judge whether the number of users is smaller than or equal to the first preset number according to the number of users contained in the signal analysis result, and when the number of users is smaller than or equal to the first preset number, determine the first feature extraction mode based on the GMM-UBM as the target feature extraction mode meeting the preset feature extraction condition, which improves the accuracy and efficiency of determining the first feature extraction mode as the target feature extraction mode and is beneficial to improving the extraction efficiency and speed of the voiceprint features of the second voice signal through the first feature extraction mode; when the number of users is larger than the first preset number, the speaking time periods of all the users contained in the signal analysis result are compared to obtain the speaking overlapping time period of each user, the time length of which is larger than or equal to zero, and whether the current condition of the second voice signal meets the preset user scene condition is judged according to the number of users and the speaking overlapping time periods of all the users, which is beneficial to improving the accuracy of judging the scene in which the second voice signal is collected when there are more users; when the current condition meets the user scene condition, the second feature extraction mode based on the deep neural network is determined as the target feature extraction mode, which improves the accuracy and reliability of extracting voiceprint features in multi-speaker scenes with overlapping speech; and when the current condition does not meet the user scene condition, the third feature extraction mode based on the i-vector is determined as the target feature extraction mode, which improves the accuracy and reliability of determining the third feature extraction mode as the target feature extraction mode and is beneficial to accurately distinguishing the speakers.
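As a hedged illustration of the screening rule summarized above, the sketch below picks one of the three modes from the number of users and the per-user speaking overlap; the threshold default, the single-interval representation of each speaking period and the helper meets_user_scene_condition (sketched after the next list of conditions) are assumptions made only for this example.

```python
from typing import Dict, List, Tuple

def pairwise_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Length of the intersection of two (start, end) speaking periods."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def overlap_durations(speaking_periods: List[Tuple[float, float]]) -> List[float]:
    """Total speaking-overlap duration of each user with all other users (>= 0)."""
    return [
        sum(pairwise_overlap(p, q) for j, q in enumerate(speaking_periods) if j != i)
        for i, p in enumerate(speaking_periods)
    ]

def select_extraction_mode(analysis_result: Dict, first_preset_number: int = 2) -> str:
    num_users = analysis_result["num_users"]
    if num_users <= first_preset_number:
        return "gmm_ubm"        # first feature extraction mode (GMM-UBM)
    overlaps = overlap_durations(analysis_result["speaking_periods"])
    # meets_user_scene_condition is sketched after the next list of conditions.
    if meets_user_scene_condition(num_users, overlaps):
        return "dnn"            # second feature extraction mode (deep neural network)
    return "i_vector"           # third feature extraction mode (i-vector)
```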
In this optional embodiment, optionally, the manner in which the extraction module 303 judges whether the current condition of the second voice signal meets the preset user scene condition according to the number of users and the speaking overlapping time periods of all the users may specifically include:
Judging whether the number of users is larger than or equal to a second preset number according to the number of users, and recording this as a first condition;
Judging, according to the speaking overlapping time periods of all the users, whether at least a third preset number of users have speaking overlapping time periods whose time length is larger than zero, and recording this as a second condition;
When the first condition and/or the second condition is satisfied, it is determined that the current condition of the second voice signal satisfies the preset user scene condition; when neither the first condition nor the second condition is satisfied, it is determined that the current condition of the second voice signal does not satisfy the preset user scene condition.
It can be seen that this optional implementation manner can further judge whether the number of users is greater than or equal to the second preset number according to the number of users and record this as the first condition, and judge, according to the speaking overlapping time periods of all the users, whether at least the third preset number of users have speaking overlapping time periods whose time length is greater than zero and record this as the second condition. When the first condition and/or the second condition is satisfied, it is confirmed that the current condition of the second voice signal meets the preset user scene condition, and providing such a diversified scene judgment can improve the determination accuracy, diversity and flexibility of whether the current condition meets the preset user scene condition; when neither the first condition nor the second condition is satisfied, it is confirmed that the current condition of the second voice signal does not meet the preset user scene condition, and this multi-factor judgment of the scene in which the user is located can improve the determination accuracy and reliability of that confirmation, which is beneficial to improving the determination accuracy and reliability of the subsequent feature extraction mode based on the accurate judgment result.
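Continuing the assumption-laden sketch above, one possible reading of the two conditions is shown below; the threshold defaults are placeholders, not values taken from the disclosure.

```python
from typing import List

def meets_user_scene_condition(
    num_users: int,
    overlap_durations: List[float],
    second_preset_number: int = 3,
    third_preset_number: int = 2,
) -> bool:
    # First condition: the number of users reaches the second preset number.
    first_condition = num_users >= second_preset_number
    # Second condition: at least a third preset number of users have a
    # speaking overlapping time period longer than zero.
    second_condition = sum(1 for d in overlap_durations if d > 0) >= third_preset_number
    # The scene condition is met when the first and/or the second condition holds.
    return first_condition or second_condition
```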
In another alternative embodiment, the manner in which the identification module 304 identifies the voiceprint feature information based on the preset deep neural network algorithm to complete the identity verification of the current operator may specifically include:
Identifying, based on a voiceprint recognition model trained by the preset deep neural network algorithm, the voiceprint feature information input into the voiceprint recognition model to obtain probability scores of the voiceprint feature information among different initial operators, wherein the voiceprint recognition model is obtained by training the deep neural network algorithm with voice sample information of all the initial operators;
and performing identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators.
Therefore, this optional embodiment can identify the voiceprint feature information input into the voiceprint recognition model based on the voiceprint recognition model trained by the preset deep neural network algorithm to obtain the probability scores of the voiceprint feature information among different initial operators, so that accurate identification of the voiceprint feature information can be realized based on the voiceprint recognition model, which is beneficial to improving the accuracy and efficiency of obtaining the probability scores among different initial operators; identity verification is then performed on the current operator according to the probability scores of the voiceprint feature information among all the initial operators, so that the identity verification of the current operator can be realized accurately based on the accurately identified probability scores, which is beneficial to improving the identity verification accuracy and efficiency for the current operator.
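As an illustrative sketch only, the scoring step could look as follows; the recognition model is assumed to be any callable that returns one logit per enrolled (initial) operator, which is not an API named in the disclosure.

```python
from typing import Callable, Dict, List, Sequence
import numpy as np

def score_operators(
    voiceprint_features: Sequence[float],
    recognition_model: Callable[[np.ndarray], np.ndarray],
    operator_ids: List[str],
) -> Dict[str, float]:
    """Return a probability score of the voiceprint features for each initial operator."""
    logits = recognition_model(np.asarray(voiceprint_features, dtype=np.float32))
    exp = np.exp(logits - np.max(logits))     # numerically stable softmax
    probs = exp / exp.sum()
    return dict(zip(operator_ids, probs.tolist()))
```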
In this optional embodiment, as an optional implementation manner, the manner in which the identification module 304 performs identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators may specifically include:
Comparing probability scores of the voiceprint feature information in all initial operators to obtain a target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
when the probability score of the target operator is larger than or equal to a preset score, determining that the identity verification of the current operator is passed;
and when the probability score of the target operator is smaller than the preset score, determining that the identity verification of the current operator is not passed.
Therefore, this optional implementation manner can compare the probability scores of the voiceprint feature information among all the initial operators to obtain the target operator with the highest probability score and judge whether the probability score of the target operator is greater than or equal to the preset score; when the probability score of the target operator is judged to be greater than or equal to the preset score, it is determined that the identity verification of the current operator passes, and when the probability score of the target operator is judged to be smaller than the preset score, it is determined that the identity verification of the current operator does not pass, which improves the determination accuracy and reliability of the identity verification result of the current operator in both cases.
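A minimal sketch of this threshold decision, assuming the scores come from a mapping like the one in the previous sketch and that the preset score default is a placeholder value:

```python
from typing import Dict, Tuple

def verify_operator(operator_scores: Dict[str, float], preset_score: float = 0.8) -> Tuple[bool, str]:
    # Target operator: the initial operator with the highest probability score.
    target_operator = max(operator_scores, key=operator_scores.get)
    # Verification passes only if that score reaches the preset score.
    return operator_scores[target_operator] >= preset_score, target_operator
```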
In this alternative implementation, as shown in fig. 4, which is a schematic structural diagram of another voice recognition processing device applied to intelligent power application according to an embodiment of the present invention, the device may further include:
An output module 305, configured to output prompt information of verification failure to the current operator when the verification fails after the identification module 304 performs identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators.
A determining module 306, configured to determine the identity tag of the target operator as the identity tag of the current operator when the verification is passed.
The determining module 306 is further configured to determine, according to the identity tag of the current operator, an operation right matching the identity tag from the operation rights of all the initial operators preset in the power system as a target operation right of the current operator, wherein each operation right has a corresponding identity tag.
The control module 307 is configured to control, according to the target operation right, the power system to unlock the function module corresponding to the target operation right to the current operator, so that the current operator can perform a corresponding control operation on the function module.
It can be seen that the device described in fig. 4 can further output prompt information of verification failure directly to the current operator when the identity verification of the current operator does not pass, so as to prompt the current operator to perform verification again and facilitate re-verification; when the identity verification of the current operator passes, the identity tag of the target operator is determined as the identity tag of the current operator, and the operation right matching the identity tag is determined, according to the identity tag of the current operator, from the operation rights of all the initial operators preset in the power system as the target operation right of the current operator, which improves the accuracy of determining the identity tag of the current operator and thus the accuracy and efficiency of determining the target operation right; the power system is then controlled, according to the target operation right, to unlock the function module corresponding to the target operation right to the current operator, so that the current operator can perform the corresponding control operation on the function module, the corresponding operation right is accurately granted to the current operator, and the corresponding function module is accurately unlocked for the current operator.
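To illustrate the post-verification flow of modules 305-307 under stated assumptions, the sketch below uses a hypothetical power_system object exposing output_prompt and unlock methods and a plain mapping from identity tags to operation rights; none of these names are taken from the disclosure.

```python
from typing import Any, Dict, Optional

def handle_verification_result(
    passed: bool,
    target_operator_tag: str,
    operation_rights: Dict[str, Any],
    power_system: Any,
) -> Optional[Any]:
    if not passed:
        # Output module 305: prompt the current operator that verification failed.
        power_system.output_prompt("Identity verification failed, please verify again.")
        return None
    # Determining module 306: the target operator's identity tag becomes the
    # current operator's tag, and the matching operation right is looked up.
    target_right = operation_rights[target_operator_tag]
    # Control module 307: unlock the function module covered by the target right.
    power_system.unlock(target_right)
    return target_right
```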
Example four
Referring to fig. 5, fig. 5 is a schematic structural diagram of still another voice recognition processing device applied to intelligent power application according to an embodiment of the present invention. As shown in fig. 5, the voice recognition processing device applied to intelligent power application may include:
A memory 401 storing executable program code;
A processor 402 coupled with the memory 401;
The processor 402 invokes the executable program code stored in the memory 401 to perform the steps of the voice recognition processing method applied to intelligent power application described in the first or second embodiment of the present invention.
Example five
The embodiment of the invention discloses a computer storage medium which stores computer instructions for executing the steps in the voice recognition processing method applied to intelligent power application described in the first or second embodiment of the invention when the computer instructions are called.
Example six
An embodiment of the present invention discloses a computer program product comprising a non-transitory computer readable storage medium storing a computer program, wherein the computer program is operable to cause a computer to perform the steps of the voice recognition processing method applied to intelligent power application described in the first or second embodiment of the present invention.
The apparatus embodiments described above are merely illustrative, wherein the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially, or in the part contributing to the prior art, in the form of a software product that may be stored in a computer-readable storage medium, including Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disc memory, tape memory, or any other computer-readable medium that can be used for carrying or storing data.
Finally, it should be noted that the voice recognition processing method and device applied to intelligent power application disclosed in the embodiments of the present invention are only preferred embodiments of the present invention, and are only intended to illustrate the technical scheme of the present invention rather than to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that the technical schemes described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not make the essence of the corresponding technical schemes deviate from the spirit and scope of the technical schemes of the embodiments of the present invention.