CN120600034A - Sound recognition processing method and device for smart power applications - Google Patents

Sound recognition processing method and device for smart power applications

Info

Publication number
CN120600034A
CN120600034A (application CN202510772303.3A)
Authority
CN
China
Prior art keywords
voice signal
preset
feature extraction
condition
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510772303.3A
Other languages
Chinese (zh)
Inventor
陈梓涛
李家樑
张皓月
王蓬辉
李科君
邓尧骏
柳致远
汤晟源
李沁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State-owned Assets Supervision and Administration Commission of the State Council
Original Assignee
State-owned Assets Supervision and Administration Commission of the State Council
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State-owned Assets Supervision and Administration Commission of the State Council
Priority to CN202510772303.3A
Publication of CN120600034A
Status: Pending

Abstract

The invention relates to the technical field of communication and discloses a voice recognition processing method and device applied to intelligent power application. The method comprises: obtaining a first voice signal of a current operator of a power system; processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal, wherein the quality of the second voice signal is higher than that of the first voice signal; extracting features of the second voice signal to obtain voiceprint feature information corresponding to the second voice signal; and identifying the voiceprint feature information based on a preset deep neural network algorithm to complete identity verification of the current operator. In this way, the voice signal quality, the extraction accuracy of the voiceprint features, the voiceprint recognition accuracy, and therefore the identity verification accuracy for the current operator and the operation safety of the power system can all be improved.

Description

Voice recognition processing method and device applied to intelligent power application
Technical Field
The invention relates to the technical field of communication, in particular to a voice recognition processing method and device applied to intelligent power application.
Background
With the rapid development of global energy transformation and new generation information technology, the network security problem of the power industry is increasingly prominent, so that identity verification is required to ensure safe and stable operation of the system, for example, verification of the identity of a speaker through voiceprint recognition.
However, it is found in practice that in the prior art, the sampling rate of the voice signal to be recognized in the application of the power system is generally low, and the monitored voice signal may be affected by factors such as interference of environmental noise and simultaneous speaking of both parties, which make the quality of the voice signal low, so that it is difficult to extract obvious voiceprint features, the difficulty of voiceprint recognition is easily increased, and the effect and accuracy of voiceprint recognition are affected. It is important to propose a new information processing scheme to improve the accuracy of authentication, so as to improve the operation safety of the power system.
Disclosure of Invention
The invention provides a voice recognition processing method and device applied to intelligent power application, which can improve the quality of voice signals and the extraction accuracy of voiceprint features, are beneficial to improving the voiceprint recognition accuracy, are beneficial to improving the identity verification accuracy of power system operators, and are beneficial to improving the safety of a power system.
To solve the above technical problem, a first aspect of the present invention discloses a voice recognition processing method applied to intelligent power application, the method comprising:
Acquiring a first voice signal of a current operator of the power system;
Processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal;
extracting features of the second voice signal to obtain voiceprint feature information corresponding to the second voice signal;
and identifying the voiceprint characteristic information based on a preset deep neural network algorithm to finish the identity verification of the current operator.
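The four-step flow of the first aspect can be sketched as a minimal pipeline. The function names and stub interfaces below are illustrative, not from the patent:

```python
def authenticate_operator(raw_signal, enhance, extract_features, verify):
    """Sketch of the claimed flow: acquire -> enhance -> extract -> verify.

    enhance: the preset voice signal processing algorithm (signal -> signal)
    extract_features: voiceprint feature extraction (signal -> features)
    verify: deep-neural-network identification (features -> bool)
    """
    second_signal = enhance(raw_signal)           # second voice signal
    voiceprint = extract_features(second_signal)  # voiceprint feature information
    return verify(voiceprint)                     # identity verification result
```

Each stage is pluggable, matching the patent's use of "preset" algorithms chosen outside the flow itself.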
In an optional implementation manner, in a first aspect of the present invention, the feature extracting the second voice signal to obtain voiceprint feature information corresponding to the second voice signal includes:
Analyzing the second voice signal to obtain a signal analysis result;
screening target feature extraction modes meeting preset feature extraction conditions from a plurality of preset feature extraction modes according to the signal analysis result;
And carrying out feature extraction on the second voice signal based on the target feature extraction mode to obtain voiceprint feature information corresponding to the second voice signal.
In a first aspect of the present invention, the signal analysis result includes at least the number of users corresponding to the second voice signal, or
The signal analysis result comprises the number of users corresponding to the second voice signal and also comprises the speaking time period of each user;
And the screening of a target feature extraction mode meeting a preset feature extraction condition from a plurality of preset feature extraction modes according to the signal analysis result comprises the following steps:
Judging whether the number of users is smaller than or equal to a first preset number according to the number of users contained in the signal analysis result;
When the number of users is smaller than or equal to the first preset number, determining a first feature extraction mode based on GMM-UBM as a target feature extraction mode meeting preset feature extraction conditions;
Comparing the speaking time periods of all the users contained in the signal analysis result when the number of the users is larger than the first preset number, so as to obtain the speaking overlapping time period of each user, wherein the time length of a speaking overlapping time period is larger than or equal to zero;
judging whether the current condition of the second voice signal meets a preset user scene condition according to the number of users and the speaking overlapping time periods of all the users;
when judging that the current condition of the second voice signal meets the user scene condition, determining a second feature extraction mode based on a deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
when judging that the current condition of the second voice signal does not meet the user scene condition, determining a third feature extraction mode based on the i-vector as the target feature extraction mode meeting the preset feature extraction condition;
each of the plurality of preset feature extraction modes is one of the first feature extraction mode, the second feature extraction mode and the third feature extraction mode.
In an optional implementation manner, in a first aspect of the present invention, the determining, according to the number of users and the speaking overlapping time periods of all the users, whether the current condition of the second speech signal meets a preset user scene condition includes:
Judging whether the number of users is larger than or equal to a second preset number according to the number of users, and recording the number of users as a first condition;
Judging whether the time length of the speaking overlapping time periods of the users with the number of at least a third preset number is larger than zero in all the users according to the speaking overlapping time periods of the users, and recording the time length as a second condition;
When either or both of the first condition and the second condition are met, determining that the current condition of the second voice signal meets the preset user scene condition; when neither the first condition nor the second condition is met, determining that the current condition of the second voice signal does not meet the preset user scene condition.
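Taken together, the mode-selection steps above amount to a small decision procedure. The sketch below assumes illustrative values for the first, second and third preset numbers, which the patent leaves unspecified:

```python
def select_extraction_mode(num_users, overlap_durations,
                           first_preset=1, second_preset=3, third_preset=2):
    """Pick a feature extraction mode from the signal analysis result.

    overlap_durations: speaking overlap time length per user (>= 0 each).
    Threshold defaults are illustrative, not fixed by the patent.
    """
    if num_users <= first_preset:
        return "GMM-UBM"            # first mode: few speakers
    # user scene condition: many users (first condition) and/or enough
    # users whose speech overlaps (second condition)
    cond1 = num_users >= second_preset
    cond2 = sum(1 for d in overlap_durations if d > 0) >= third_preset
    if cond1 or cond2:
        return "DNN"                # second mode: complex multi-speaker scene
    return "i-vector"               # third mode: scene condition not met
```

Note the complex-scene branches (many users, overlapping speech) fall to the deep-neural-network mode, while the simpler multi-user case uses i-vector extraction.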
In a first aspect of the present invention, the identifying the voiceprint feature information based on a preset deep neural network algorithm to complete the authentication of the current operator includes:
based on a voice recognition model trained by a preset deep neural network algorithm, recognizing the voiceprint feature information input into the voice recognition model to obtain probability scores of the voiceprint feature information among different initial operators, wherein the voice recognition model is obtained by training voice sample information of all the initial operators through the deep neural network algorithm;
and carrying out identity verification on the current operator according to probability scores of the voiceprint characteristic information in all the initial operators.
As an optional implementation manner, in the first aspect of the present invention, the authenticating the current operator according to the probability scores of the voiceprint feature information among all the initial operators includes:
Comparing the probability scores of the voiceprint feature information in all the initial operators to obtain a target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
When the probability score of the target operator is larger than or equal to the preset score, determining that the identity verification of the current operator is passed;
And when the probability score of the target operator is smaller than the preset score, determining that the identity verification of the current operator is not passed.
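The comparison-and-threshold step above can be sketched as follows; the score threshold is an assumed example value, since the patent does not fix one:

```python
def verify_identity(scores, preset_score=0.8):
    """scores: mapping of initial operator id -> probability score from the
    voice recognition model. Returns (passed, target_operator).
    preset_score = 0.8 is an illustrative threshold."""
    target, best = max(scores.items(), key=lambda kv: kv[1])
    if best >= preset_score:
        return True, target   # verification passed; target operator identified
    return False, None        # best score below threshold: verification fails
```

The max-then-threshold shape rejects unknown speakers even when one enrolled operator happens to score highest.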
As an optional implementation manner, in the first aspect of the present invention, after the authenticating the current operator according to the probability score of the voiceprint feature information among all the initial operators, the method further includes:
when the verification fails, outputting prompt information of verification failure to the current operator;
when the verification is passed, determining the identity label of the target operator as the identity label of the current operator;
Determining operation authorities matched with the identity tags from operation authorities of all initial operators preset in the power system according to the identity tags of the current operators, wherein the operation authorities are used as target operation authorities of the current operators, and each operation authority has a corresponding identity tag;
And controlling the power system to unlock the functional module corresponding to the target operation authority to the current operator according to the target operation authority, so that the current operator can execute corresponding control operation on the functional module.
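The post-verification permission lookup can be sketched as a tag-to-permissions table. The table contents below are a hypothetical example, not from the patent:

```python
def unlock_modules(identity_tag, permissions_by_tag):
    """After verification passes, look up the operation permissions matched to
    the operator's identity tag and return the functional modules to unlock."""
    return sorted(permissions_by_tag.get(identity_tag, set()))

# Hypothetical preset permission table: identity tag -> functional modules
permissions_by_tag = {
    "dispatcher": {"load_control", "switchgear"},
    "inspector": {"monitoring"},
}
```

An unknown tag unlocks nothing, which matches the fail-closed intent of the verification flow.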
A second aspect of the present invention discloses a voice recognition processing device for smart power applications, the device comprising:
the acquisition module is used for acquiring a first voice signal of a current operator of the power system;
The processing module is used for processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal;
the extraction module is used for carrying out feature extraction on the second voice signal to obtain voiceprint feature information corresponding to the second voice signal;
And the identification module is used for identifying the voiceprint characteristic information based on a preset deep neural network algorithm so as to finish the identity verification of the current operator.
In a second aspect of the present invention, the extracting module performs feature extraction on the second voice signal, and the manner of obtaining voiceprint feature information corresponding to the second voice signal specifically includes:
Analyzing the second voice signal to obtain a signal analysis result;
screening target feature extraction modes meeting preset feature extraction conditions from a plurality of preset feature extraction modes according to the signal analysis result;
And carrying out feature extraction on the second voice signal based on the target feature extraction mode to obtain voiceprint feature information corresponding to the second voice signal.
In a second aspect of the present invention, the signal analysis result includes at least the number of users corresponding to the second voice signal, or
The signal analysis result comprises the number of users corresponding to the second voice signal and also comprises the speaking time period of each user;
And the extraction module screens out a target feature extraction mode meeting the preset feature extraction conditions from a plurality of preset feature extraction modes according to the signal analysis result, wherein the method specifically comprises the following steps:
Judging whether the number of users is smaller than or equal to a first preset number according to the number of users contained in the signal analysis result;
When the number of users is smaller than or equal to the first preset number, determining a first feature extraction mode based on GMM-UBM as a target feature extraction mode meeting preset feature extraction conditions;
Comparing the speaking time periods of all the users contained in the signal analysis result when the number of the users is larger than the first preset number, so as to obtain the speaking overlapping time period of each user, wherein the time length of a speaking overlapping time period is larger than or equal to zero;
judging whether the current condition of the second voice signal meets a preset user scene condition according to the number of users and the speaking overlapping time periods of all the users;
when judging that the current condition of the second voice signal meets the user scene condition, determining a second feature extraction mode based on a deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
when judging that the current condition of the second voice signal does not meet the user scene condition, determining a third feature extraction mode based on the i-vector as the target feature extraction mode meeting the preset feature extraction condition;
each of the plurality of preset feature extraction modes is one of the first feature extraction mode, the second feature extraction mode and the third feature extraction mode.
In a second aspect of the present invention, the extraction module determines whether the current condition of the second voice signal meets the preset user scene condition according to the number of users and the speaking overlapping time periods of all the users specifically by:
Judging whether the number of users is larger than or equal to a second preset number according to the number of users, and recording the number of users as a first condition;
Judging whether the time length of the speaking overlapping time periods of the users with the number of at least a third preset number is larger than zero in all the users according to the speaking overlapping time periods of the users, and recording the time length as a second condition;
When either or both of the first condition and the second condition are met, determining that the current condition of the second voice signal meets the preset user scene condition; when neither the first condition nor the second condition is met, determining that the current condition of the second voice signal does not meet the preset user scene condition.
In a second aspect of the present invention, the identifying module identifies the voiceprint feature information based on a preset deep neural network algorithm, so as to complete the authentication of the current operator specifically includes:
based on a voice recognition model trained by a preset deep neural network algorithm, recognizing the voiceprint feature information input into the voice recognition model to obtain probability scores of the voiceprint feature information among different initial operators, wherein the voice recognition model is obtained by training voice sample information of all the initial operators through the deep neural network algorithm;
and carrying out identity verification on the current operator according to probability scores of the voiceprint characteristic information in all the initial operators.
As an optional implementation manner, in the second aspect of the present invention, the manner in which the recognition module performs identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators specifically comprises the following steps:
Comparing the probability scores of the voiceprint feature information in all the initial operators to obtain a target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
When the probability score of the target operator is larger than or equal to the preset score, determining that the identity verification of the current operator is passed;
And when the probability score of the target operator is smaller than the preset score, determining that the identity verification of the current operator is not passed.
As an alternative embodiment, in the second aspect of the present invention, the apparatus further includes:
The output module is used for outputting prompt information of failure verification to the current operator when the verification fails after the identification module performs identity verification on the current operator according to probability scores of the voiceprint characteristic information in all the initial operators;
the determining module is used for determining the identity label of the target operator as the identity label of the current operator when the verification is passed;
The determining module is further configured to determine, according to the identity tag of the current operator, an operation right matched with the identity tag from operation rights of all initial operators preset in the electric power system, where each operation right has a corresponding identity tag as a target operation right of the current operator;
And the control module is used for controlling the power system to unlock the functional module corresponding to the target operation authority to the current operator according to the target operation authority so that the current operator can execute corresponding control operation on the functional module.
A third aspect of the present invention discloses another voice recognition processing device for smart power applications, the device comprising:
a memory storing executable program code;
a processor coupled to the memory;
The processor invokes the executable program code stored in the memory to execute the voice recognition processing method for intelligent power application disclosed in the first aspect of the present invention.
A fourth aspect of the present invention discloses a computer storage medium storing computer instructions for performing the voice recognition processing method for smart power application disclosed in the first aspect of the present invention when the computer instructions are called.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
According to the embodiment of the invention, a first voice signal of a current operator of the power system is obtained, the first voice signal is processed based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal, the quality of the second voice signal is higher than that of the first voice signal, feature extraction is performed on the second voice signal to obtain voiceprint feature information corresponding to the second voice signal, and the voiceprint feature information is identified based on a preset deep neural network algorithm to finish identity verification of the current operator. Therefore, the voice signal quality can be improved, the extraction accuracy of the voiceprint features can be improved, the voiceprint recognition accuracy can be improved, the identity verification accuracy of the current operator of the power system can be improved, and the operation safety of the power system can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a voice recognition processing method applied to intelligent power application according to an embodiment of the invention;
FIG. 2 is a flow chart of another voice recognition processing method applied to intelligent power application according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a voice recognition processing device for smart power application according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another voice recognition processing device for smart power applications according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a voice recognition processing device applied to smart power application according to another embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms first, second and the like in the description, in the claims, and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, or article that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, apparatus, or article.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The invention discloses a voice recognition processing method and device applied to intelligent power application, which can improve the quality of voice signals so as to improve the extraction accuracy of voiceprint features, and is beneficial to improving the accuracy of voiceprint recognition so as to improve the accuracy of identity verification of current operators of a power system and further improve the safety of the operation of the power system. The following will describe in detail.
Example 1
Referring to fig. 1, fig. 1 is a flow chart of a voice recognition processing method applied to intelligent power application according to an embodiment of the invention. The voice recognition processing method applied to the smart power application described in fig. 1 may be applied to a voice recognition processing device of the smart power application, where the device may include a recognition processing device or a recognition processing server, where the recognition processing server may include a cloud server or a local server, and embodiments of the present invention are not limited. As shown in fig. 1, the voice recognition processing method applied to the smart power application may include the following operations:
101. A first voice signal of a current operator of the power system is obtained.
In the embodiment of the invention, specifically, when a current operator triggers an identity verification instruction for a power system, a prompt message of voiceprint authentication is output to the current operator through the power system so as to trigger the current operator to carry out voiceprint authentication, and the power system receives a voice signal sent by the current operator and is used as a first voice signal of the current operator.
102. And processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal.
In the embodiment of the present invention, the quality of the second speech signal is higher than that of the first speech signal, and optionally, the preset speech signal processing algorithm may be an audio enhancement algorithm of machine learning or Digital Signal Processing (DSP) technology, so as to enhance a low-quality audio signal (i.e., the first speech signal) to improve the recognition capability of the audio signal, or may be an algorithm of a deep learning model (such as GAN), so that, based on the low-quality audio signal, the algorithm of the deep learning model is used to synthesize a high-quality audio signal (i.e., the second speech signal), so as to facilitate subsequent voiceprint recognition training.
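As a toy stand-in for the unspecified "preset voice signal processing algorithm", classic DSP spectral subtraction illustrates the enhancement step: estimate a noise magnitude spectrum from a noise-only clip and subtract it frame by frame (rectangular window, no overlap; a sketch, not production audio enhancement):

```python
import numpy as np

def spectral_subtract(signal, noise, frame=256):
    """Subtract an estimated noise magnitude spectrum from each frame of
    `signal`, keeping the original phase. `noise` is a noise-only clip of
    at least `frame` samples."""
    noise_mag = np.abs(np.fft.rfft(noise[:frame]))
    out = np.zeros_like(signal, dtype=float)
    for start in range(0, len(signal) - frame + 1, frame):
        spec = np.fft.rfft(signal[start:start + frame])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)   # floor at zero
        phase = np.exp(1j * np.angle(spec))
        out[start:start + frame] = np.fft.irfft(mag * phase, frame)
    return out
```

A real enhancement front end would add windowing, overlap-add, and a smoother noise estimate, or use a learned model as the text suggests.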
Specifically, after acquiring the first voice signal of the current operator of the power system, the method may further include:
analyzing the voice content of the first voice signal and judging whether the voice content is matched with verification content preset in the power system or not;
When the voice content is not matched with the verification content preset in the power system, outputting prompt information of voice content errors to current operators, and carrying out abnormal marking on the voice content in the power system;
When the voice content is judged to match the verification content preset in the power system, the operation of step 102 is triggered: the first voice signal is processed based on the preset voice signal processing algorithm to obtain the corresponding second voice signal. The voice content matches the verification content when the two are identical, or when the content similarity between them is greater than or equal to a preset similarity (such as 90%); when the content similarity is smaller than the preset similarity, the voice content does not match the verification content. In this way, before the voiceprint of the current operator is verified, it is first verified that the voice content sent by the current operator matches the preset verification content, and the subsequent voiceprint feature verification is triggered only on a match. This double verification flow of voice content verification plus voiceprint feature verification improves the accuracy and reliability of operator identity verification, thereby improving the use safety of the power system.
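The content-matching rule (exact match, or similarity at or above the preset threshold) can be sketched with a generic string-similarity measure; `difflib` here stands in for whatever similarity metric an implementation would actually use:

```python
import difflib

def content_matches(spoken, expected, preset_similarity=0.9):
    """Step one of the double verification: check recognized voice content
    against the preset verification content. preset_similarity = 0.9
    corresponds to the 90% example in the text."""
    if spoken == expected:
        return True
    ratio = difflib.SequenceMatcher(None, spoken, expected).ratio()
    return ratio >= preset_similarity
```

Only on a match would the flow proceed to voiceprint feature verification; otherwise the system outputs the content-error prompt and marks the attempt as abnormal.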
103. And extracting the characteristics of the second voice signal to obtain voiceprint characteristic information corresponding to the second voice signal.
In the embodiment of the invention, specifically, based on the determined target feature extraction mode, feature extraction is performed on the second voice signal, so as to obtain voiceprint feature information corresponding to the second voice signal. The target feature extraction mode may be obtained by screening out a plurality of preset feature extraction modes according to a signal analysis result of the second voice signal, wherein the preset feature extraction conditions are satisfied.
104. And identifying voiceprint characteristic information based on a preset deep neural network algorithm to finish the identity verification of the current operator.
It can be seen that implementing the voice recognition processing method described in fig. 1 obtains the first voice signal of the current operator of the power system and processes it, based on the preset voice signal processing algorithm, into a higher-quality second voice signal, improving the quality and accuracy of the voice signal and therefore the extraction accuracy of the voiceprint feature information obtained from it. Identifying that voiceprint feature information based on the preset deep neural network algorithm then completes the identity verification of the current operator with improved recognition accuracy. This helps improve the identity verification accuracy and safety for current operators of the power system, enables accurate identification of user identity, and thereby improves the safety, effectiveness, and protection level of power system operation, helping ensure its stable and safe running.
In an optional embodiment, the extracting the features of the second voice signal in step 103 to obtain voiceprint feature information corresponding to the second voice signal may include:
Analyzing the second voice signal to obtain a signal analysis result;
Screening target feature extraction modes meeting preset feature extraction conditions from a plurality of preset feature extraction modes according to a signal analysis result;
And carrying out feature extraction on the second voice signal based on the target feature extraction mode to obtain voiceprint feature information corresponding to the second voice signal.
In the embodiment of the invention, the signal analysis result at least includes the number of users corresponding to the second voice signal, and may further include the speaking time period of each user, wherein the users at least include the current operator. Specifically, when the power system receives the voice signal sent by the current operator, the actually received voice signal may originate not only from the current operator but also from other people in the scene where the current operator is located.
Optionally, each feature extraction mode may include one of a first feature extraction mode based on GMM-UBM (where GMM represents a gaussian mixture model, UBM represents a generic background model, and GMM-UBM represents a model that combines the gaussian mixture model with the generic background model), a second feature extraction mode based on a deep neural network (such as DNN, CNN, RNN, etc.), and a third feature extraction mode based on i-vector (where i-vector is a technology for speaker verification and is mainly used for extracting feature vectors of a speaker), which is not limited by the embodiment of the present invention.
Therefore, according to this alternative embodiment, the second voice signal can be analyzed to obtain a signal analysis result, and a target feature extraction mode meeting the preset feature extraction condition can be screened out from the plurality of preset feature extraction modes according to that result, which improves the accuracy and efficiency of screening the target feature extraction mode for the second voice signal. Feature extraction is then performed on the second voice signal based on the target feature extraction mode to obtain the corresponding voiceprint feature information, so the accurately screened mode improves the extraction accuracy and efficiency of the voiceprint feature information, and providing a diversified choice of feature extraction modes improves the diversity and flexibility of voiceprint feature extraction for the second voice signal.
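The three-step extraction flow above (analyze, screen a target mode, extract with it) can be sketched as a small dispatch pipeline. Everything here is illustrative: the `(user, start, end)` segment format, the diarization-style `analyze_signal` helper, the stub extractors, and the injected `select_mode` callable stand in for real components and are not part of the described method.

```python
def analyze_signal(segments):
    """Toy analysis step: segments is a list of (user_id, start_s, end_s)
    tuples, e.g. from a diarization front end. Returns the analysis result
    with the number of users and each user's speaking periods."""
    periods = {}
    for user, start, end in segments:
        periods.setdefault(user, []).append((start, end))
    return {"num_users": len(periods), "speaking_periods": periods}

# Registry of the three candidate extraction modes (stubs for illustration).
EXTRACTORS = {
    "gmm_ubm": lambda signal: ("gmm_ubm_features", signal),
    "dnn": lambda signal: ("dnn_features", signal),
    "i_vector": lambda signal: ("i_vector_features", signal),
}

def extract_voiceprint(signal, segments, select_mode):
    """Analyze -> screen a target mode -> extract features with that mode."""
    analysis = analyze_signal(segments)
    mode = select_mode(analysis)  # screening against the preset condition
    return mode, EXTRACTORS[mode](signal)
```

Injecting `select_mode` keeps the pipeline independent of any one screening rule, so different preset feature extraction conditions can be swapped in.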
In this optional embodiment, as an optional implementation manner, according to a signal analysis result, selecting, from a plurality of preset feature extraction manners, a target feature extraction manner that satisfies a preset feature extraction condition may include:
judging whether the number of users is less than or equal to a first preset number according to the number of users contained in the signal analysis result;
when the number of users is less than or equal to the first preset number, determining the first feature extraction mode based on the GMM-UBM as the target feature extraction mode meeting the preset feature extraction condition;
when the number of users is greater than the first preset number, comparing the speaking time periods of all the users contained in the signal analysis result to obtain the speaking overlapping time period of each user, wherein the time length of a speaking overlapping time period is greater than or equal to zero, and judging whether the current condition of the second voice signal meets a preset user scene condition according to the number of users and the speaking overlapping time periods of all the users;
when the current condition of the second voice signal meets the user scene condition, determining the second feature extraction mode based on the deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
and when the current condition of the second voice signal does not meet the user scene condition, determining the third feature extraction mode based on the i-vector as the target feature extraction mode meeting the preset feature extraction condition.
In the embodiment of the invention, optionally, the preset feature extraction condition is used for indicating that the speaker corresponding to the voice signal is in a corresponding speaking environment (such as a small number of people, a multi-person conversation, etc.), and the user scene condition is used for indicating that the scene complexity of the environment where the speaker is located falls within a preset scene complexity interval, with the speaking overlapping time periods of all users serving as the basis for determining that scene complexity. Specifically, when the time length of a user's speaking overlapping time period is zero, that user never speaks at the same time as other users; when it is greater than zero, that user speaks at the same time as other users, which is not limited by the embodiment of the present invention.
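The overlap computation described above can be sketched as follows, assuming each user's speech is given as a list of `(start, end)` intervals in seconds (an assumed representation; simultaneous speech among three or more users is counted pairwise in this simplification).

```python
def overlap_length(a, b):
    """Length of the intersection of two intervals (start, end); >= 0."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def speaking_overlaps(periods):
    """periods: {user: [(start, end), ...]} -> {user: total overlap seconds}.

    A total of zero means the user never spoke simultaneously with others;
    a value greater than zero means the user overlapped with someone."""
    overlaps = {user: 0.0 for user in periods}
    users = list(periods)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            shared = sum(overlap_length(p, q)
                         for p in periods[u] for q in periods[v])
            overlaps[u] += shared  # credit the overlap to both speakers
            overlaps[v] += shared
    return overlaps
```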
For example, when the number of users corresponding to the second voice signal is small, the first feature extraction mode based on the GMM-UBM can be directly selected as the target feature extraction mode of the second voice signal; on small amounts of data, the GMM-UBM-based mode represents the extracted voiceprint features well and offers better interpretability. When the number of users corresponding to the second voice signal is large, the scene complexity of the environment where the speaker is located can be further analyzed to decide between the second feature extraction mode based on the deep neural network and the third feature extraction mode based on the i-vector as the target feature extraction mode of the second voice signal; selecting the second or third feature extraction mode on large amounts of data improves the recognition accuracy of the speaker, and thus the recognition accuracy and reliability of the current operator.
It can be seen that this alternative implementation can judge whether the number of users is less than or equal to the first preset number according to the number of users contained in the signal analysis result, and when it is, determine the first feature extraction mode based on the GMM-UBM as the target feature extraction mode meeting the preset feature extraction condition, which improves the accuracy and efficiency of that determination and helps improve the extraction efficiency and speed of the voiceprint features of the second voice signal through the first feature extraction mode. When the number of users is greater than the first preset number, the speaking time periods of all users contained in the signal analysis result are compared to obtain the speaking overlapping time period of each user (with a time length greater than or equal to zero), and whether the current condition of the second voice signal meets the preset user scene condition is judged according to the number of users and the speaking overlapping time periods of all users, which improves the accuracy of that judgment when there are many users. When the current condition meets the user scene condition, the second feature extraction mode based on the deep neural network is determined as the target feature extraction mode meeting the preset feature extraction condition, improving the accuracy and reliability of that determination; and when it does not, the third feature extraction mode based on the i-vector is determined as the target feature extraction mode meeting the preset feature extraction condition, improving the accuracy and reliability of that determination and allowing the speaker to be accurately distinguished.
In this optional embodiment, optionally, according to the number of users and the speaking overlapping time period of all the users, determining whether the current condition of the second speech signal meets the preset user scene condition may include:
Judging whether the number of users is greater than or equal to a second preset number according to the number of users, and recording this as a first condition;
Judging, according to the speaking overlapping time periods of all users, whether at least a third preset number of users have a speaking overlapping time period whose time length is greater than zero, and recording this as a second condition;
When the first condition and/or the second condition is satisfied, determining that the current condition of the second voice signal meets the preset user scene condition; and when neither the first condition nor the second condition is satisfied, determining that the current condition of the second voice signal does not meet the preset user scene condition.
In the embodiment of the invention, specifically, when the number of users is greater than or equal to the second preset number, and/or when at least a third preset number of users have speaking overlapping time periods of length greater than zero, the scene complexity of the environment where the speaker corresponding to the second voice signal is located falls within the preset complexity interval; in this case, the second feature extraction mode, which can effectively improve recognition precision on large data sets, can be selected for voiceprint extraction. When the number of users is smaller than the second preset number and fewer than the third preset number of users have speaking overlapping time periods of length greater than zero, that scene complexity does not fall within the preset complexity interval; in this case, the third feature extraction mode, which has relatively lower computational complexity while remaining suitable for multi-speaker environments, can be selected for voiceprint extraction.
For example, assuming that the first preset number is 1, the second preset number is 5 and the third preset number is 3: when the number of users corresponding to the second voice signal is greater than or equal to 5, and/or when at least 3 users speak simultaneously with other users, it can be determined that the current condition of the second voice signal meets the preset user scene condition; and when the number of users is less than 5 and fewer than 3 users speak simultaneously with other users, it can be determined that the current condition of the second voice signal does not meet the preset user scene condition.
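Using the illustrative thresholds from this example (first preset number 1, second preset number 5, third preset number 3), the whole screening rule can be sketched as one function. The mode names and the `overlaps` mapping (user to total speaking-overlap length) are placeholders for this sketch.

```python
def select_mode(num_users, overlaps, first=1, second=5, third=3):
    """Screen a target feature extraction mode per the described rule.

    overlaps: {user: total speaking-overlap time length (>= 0)}."""
    if num_users <= first:
        return "gmm_ubm"  # few users: GMM-UBM-based first mode
    cond1 = num_users >= second  # first condition: enough users
    # second condition: at least `third` users overlap with someone
    cond2 = sum(1 for t in overlaps.values() if t > 0) >= third
    if cond1 or cond2:
        return "dnn"      # scene condition met: deep-network second mode
    return "i_vector"     # scene condition not met: i-vector third mode
```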
It can be seen that this optional implementation can also judge whether the number of users is greater than or equal to the second preset number (recorded as the first condition) and, according to the speaking overlapping time periods of all users, whether at least a third preset number of users have speaking overlapping time periods of length greater than zero (recorded as the second condition). When the first condition and/or the second condition is met, the current condition of the second voice signal is determined to meet the preset user scene condition; providing this diversified choice of scene judging modes improves the accuracy, diversity and flexibility of that determination. When neither condition is met, the current condition of the second voice signal is determined not to meet the preset user scene condition; the multiple judging modes of the user's scene improve the accuracy and reliability of that determination, which in turn helps improve the accuracy and reliability of the subsequent determination of the feature extraction mode based on the accurate judging result.
Example two
Referring to fig. 2, fig. 2 is a flow chart of a voice recognition processing method applied to intelligent power application according to an embodiment of the invention. The voice recognition processing method applied to the smart power application described in fig. 2 may be applied to a voice recognition processing device of the smart power application, where the device may include a recognition processing device or a recognition processing server, where the recognition processing server may include a cloud server or a local server, and embodiments of the present invention are not limited. As shown in fig. 2, the voice recognition processing method applied to the smart power application may include the following operations:
201. A first voice signal of a current operator of the power system is obtained.
202. And processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal.
203. And extracting the characteristics of the second voice signal to obtain voiceprint characteristic information corresponding to the second voice signal.
In the embodiment of the present invention, for other descriptions of step 201 to step 203, please refer to the detailed descriptions of step 101 to step 103 in the first embodiment, and the description of the embodiment of the present invention is omitted.
205. Identifying, based on a voice recognition model trained by a preset deep neural network algorithm, the voiceprint feature information input into the voice recognition model, so as to obtain probability scores of the voiceprint feature information among different initial operators.
In the embodiment of the invention, the voice recognition model is obtained by training on the voice sample information of all initial operators through a deep neural network algorithm. Specifically, feature extraction is performed on the voice sample information of each initial operator to obtain the voiceprint feature sample information of each initial operator, and the voiceprint feature sample information of all initial operators is input into a pre-built model based on the preset deep neural network algorithm for training to obtain a training result. The recognition accuracy of the model is then evaluated according to the training result, and it is judged whether that accuracy reaches a preset recognition accuracy: when it does, the model is determined as the voice recognition model trained based on the preset deep neural network algorithm; when it does not, the model training operation is re-executed until the recognition accuracy of the trained model reaches the preset recognition accuracy.
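The train-evaluate-retrain loop described above can be sketched generically. The `train_once` and `evaluate` callables are injected stubs standing in for a real deep-network training pass and a validation-set accuracy check; nothing here implements an actual neural network.

```python
def train_until_accurate(train_once, evaluate, target_acc, max_rounds=10):
    """Repeat training until the evaluated recognition accuracy reaches
    the preset value target_acc, or give up after max_rounds rounds."""
    model = None
    for _ in range(max_rounds):
        model = train_once(model)          # (re-)train on voiceprint samples
        if evaluate(model) >= target_acc:  # preset recognition accuracy met
            return model
    raise RuntimeError("accuracy target not reached within max_rounds")
```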
205. And according to probability scores of voiceprint characteristic information in all initial operators, carrying out identity verification on the current operator.
In the embodiment of the invention, optionally, the initial operator with the highest probability score among all initial operators is determined as the current operator, completing the identity verification of the current operator and improving its efficiency and speed. Alternatively, the initial operator with the highest probability score is selected from all initial operators as the target operator suspected to be the current operator, and whether the current operator is the target operator is further judged according to the probability score of the target operator, completing the identity verification of the current operator and improving its accuracy and reliability.
It can be seen that implementing the voice recognition processing method applied to intelligent power application described in fig. 2 can likewise obtain the first voice signal of the current operator of the power system and process it based on the preset voice signal processing algorithm to obtain the corresponding second voice signal of higher quality, improving the quality and accuracy of the voice signal and thus the extraction accuracy of the voiceprint feature information obtained by feature extraction on the second voice signal. Recognizing the voiceprint feature information based on the preset deep neural network algorithm then completes the identity verification of the current operator, which helps improve the recognition accuracy of the voiceprint feature information and therefore the accuracy and safety of the identity verification of the current operator of the power system, facilitates accurate identification of the user identity, and helps improve the safety, effectiveness and protection level of power system operation while ensuring its stable and safe operation.
In addition, the voice recognition model trained by the preset deep neural network algorithm can further be used to recognize the voiceprint feature information input into it, obtaining probability scores of the voiceprint feature information among different initial operators; accurate recognition of the voiceprint feature information based on the voice recognition model improves the accuracy and efficiency of those probability scores. Identity verification of the current operator can then be performed according to the probability scores of the voiceprint feature information among all initial operators, so the verification is carried out accurately on the basis of accurately recognized scores, improving the accuracy and efficiency of the identity verification of the current operator.
In an alternative embodiment, the step 205 of authenticating the current operator according to the probability scores of the voiceprint feature information among all the initial operators may include:
Comparing probability scores of the voiceprint feature information in all initial operators to obtain a target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
when the probability score of the target operator is larger than or equal to a preset score, determining that the identity verification of the current operator is passed;
and when the probability score of the target operator is smaller than the preset score, determining that the identity verification of the current operator is not passed.
In the embodiment of the present invention, optionally, the probability score of the voiceprint feature information in each initial operator may be in a percentage form, and the preset score may be 70%, or 80%, or any other preset value.
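A minimal sketch of this threshold check, assuming per-operator probability scores in [0, 1] and using the 70% preset score mentioned above as one possible default:

```python
def verify_operator(scores, preset_score=0.7):
    """scores: {operator_id: probability}. Picks the highest-scoring initial
    operator as the target operator and returns (passed, target_operator)."""
    target = max(scores, key=scores.get)            # highest probability score
    return scores[target] >= preset_score, target   # pass only above threshold
```

Returning the target operator even on failure lets a caller log which identity was the closest match when verification does not pass.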
Therefore, this optional embodiment can compare the probability scores of the voiceprint feature information among all initial operators to obtain the target operator with the highest probability score and judge whether that score is greater than or equal to the preset score. When it is, the identity verification of the current operator is determined to have passed; when it is not, the identity verification is determined to have failed. In both cases the accuracy and reliability of the verification decision for the current operator are improved.
In this optional embodiment, as an optional implementation manner, after the identity verification of the current operator according to the probability scores of the voiceprint feature information among all the initial operators, the method may further include:
When the verification fails, outputting prompt information of failure in verification to the current operator;
when the verification is passed, determining the identity label of the target operator as the identity label of the current operator;
according to the identity tag of the current operator, determining, from the operation authorities of all initial operators preset in the power system, the operation authority matched with that identity tag as the target operation authority of the current operator, wherein each operation authority is provided with a corresponding identity tag;
according to the target operation authority, the power system is controlled to unlock the functional module corresponding to the target operation authority to the current operator, so that the current operator can execute corresponding control operation on the functional module.
In the embodiment of the invention, the operation authorities of a plurality of initial operators are preset in the electric power system, each initial operator corresponds to one identity tag, and the identity tag is associated with the operation authority of the corresponding initial operator, so that the corresponding operation authority of the initial operator can be directly determined through the identity tag of the initial operator. The power system may be provided with a plurality of functional modules for operator control, such as at least one of a monitoring and data acquisition module, a scheduling and optimization module, a fault detection and protection module, a device management and maintenance module, a load management and demand response module, and a communication and data management module.
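The tag-to-authority lookup and module unlocking flow can be sketched as two table lookups. The tags, authority names and module names below are illustrative assumptions, loosely following the functional-module list above.

```python
# Hypothetical tables: identity tag -> operation authority -> modules.
OPERATOR_AUTHORITY = {
    "dispatcher": "scheduling",
    "maintainer": "maintenance",
}
AUTHORITY_MODULES = {
    "scheduling": ["scheduling_and_optimization", "load_management"],
    "maintenance": ["fault_detection_and_protection", "device_management"],
}

def unlock_modules(identity_tag):
    """Resolve the operator's target operation authority from the identity
    tag and return the functional modules to unlock for that operator."""
    authority = OPERATOR_AUTHORITY.get(identity_tag)
    if authority is None:
        return []  # unknown or unverified tag: nothing is unlocked
    return AUTHORITY_MODULES[authority]
```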
Therefore, this optional implementation can directly output a verification-failure prompt to the current operator when the identity verification fails, which helps prompt the current operator to verify again and facilitates the re-verification. When the identity verification passes, the identity tag of the target operator is determined as the identity tag of the current operator, and the operation authority matched with that tag is determined from the operation authorities of all initial operators preset in the power system as the target operation authority of the current operator, improving the accuracy of determining both the identity tag and the operation authority of the current operator. The power system is then controlled, according to the target operation authority, to unlock the corresponding functional module for the current operator so that the current operator can execute the corresponding control operation on it; the matching operation authority can thus be granted accurately and the corresponding functional module unlocked, which helps the current operator operate the power system safely and effectively.
Example III
Referring to fig. 3, fig. 3 is a schematic structural diagram of a voice recognition processing device for smart power application according to an embodiment of the present invention. The voice recognition processing device applied to the smart power application described in fig. 3 may include a recognition processing device or a recognition processing server, where the recognition processing server may include a cloud server or a local server, and the embodiment of the invention is not limited. As shown in fig. 3, the voice recognition processing device applied to the smart power application may include:
the acquiring module 301 is configured to acquire a first voice signal of a current operator of the power system.
The processing module 302 is configured to process the first voice signal based on a preset voice signal processing algorithm, so as to obtain a second voice signal corresponding to the first voice signal.
The extracting module 303 is configured to perform feature extraction on the second voice signal, so as to obtain voiceprint feature information corresponding to the second voice signal.
The recognition module 304 is configured to recognize voiceprint feature information based on a preset deep neural network algorithm, so as to complete authentication of a current operator.
It can be seen that implementing the voice recognition processing device applied to intelligent power application described in fig. 3 can obtain the first voice signal of the current operator of the power system and process it based on the preset voice signal processing algorithm to obtain the corresponding second voice signal of higher quality, improving the quality and accuracy of the voice signal and thus the extraction accuracy of the voiceprint feature information obtained by feature extraction on the second voice signal. Recognizing the voiceprint feature information based on the preset deep neural network algorithm then completes the identity verification of the current operator, which helps improve the recognition accuracy of the voiceprint feature information and therefore the accuracy and safety of the identity verification of the current operator of the power system, facilitates accurate identification of the user identity, and helps improve the safety, effectiveness and protection level of power system operation while ensuring its stable and safe operation.
In an alternative embodiment, the extracting module 303 performs feature extraction on the second voice signal, and the manner of obtaining voiceprint feature information corresponding to the second voice signal may specifically include:
Analyzing the second voice signal to obtain a signal analysis result;
Screening a target feature extraction mode meeting a preset feature extraction condition from a plurality of preset feature extraction modes according to the signal analysis result;
And carrying out feature extraction on the second voice signal based on the target feature extraction mode to obtain voiceprint feature information corresponding to the second voice signal.
Therefore, according to this alternative embodiment, the second voice signal can be analyzed to obtain a signal analysis result, and a target feature extraction mode meeting the preset feature extraction condition can be screened out from the plurality of preset feature extraction modes according to that result, which improves the accuracy and efficiency of screening the target feature extraction mode for the second voice signal. Feature extraction is then performed on the second voice signal based on the target feature extraction mode to obtain the corresponding voiceprint feature information, so the accurately screened mode improves the extraction accuracy and efficiency of the voiceprint feature information, and providing a diversified choice of feature extraction modes improves the diversity and flexibility of voiceprint feature extraction for the second voice signal.
In this alternative embodiment, as an alternative implementation manner, the signal analysis result at least includes the number of users corresponding to the second voice signal, and may further include the speaking time period of each user. The manner in which the extracting module 303 screens, according to the signal analysis result, a target feature extraction mode meeting the preset feature extraction condition from the plurality of preset feature extraction modes may specifically include:
judging whether the number of users is less than or equal to a first preset number according to the number of users contained in the signal analysis result;
when the number of users is smaller than or equal to a first preset number, determining a first feature extraction mode based on the GMM-UBM as a target feature extraction mode meeting preset feature extraction conditions;
when the number of users is greater than the first preset number, comparing the speaking time periods of all the users contained in the signal analysis result to obtain the speaking overlapping time period of each user, wherein the time length of a speaking overlapping time period is greater than or equal to zero, and judging whether the current condition of the second voice signal meets a preset user scene condition according to the number of users and the speaking overlapping time periods of all the users;
when the current condition of the second voice signal meets the user scene condition, determining the second feature extraction mode based on the deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
when the current condition of the second voice signal does not meet the user scene condition, determining the third feature extraction mode based on the i-vector as the target feature extraction mode meeting the preset feature extraction condition;
wherein each of the plurality of preset feature extraction modes is one of the first feature extraction mode, the second feature extraction mode and the third feature extraction mode.
It can be seen that this alternative implementation can judge, from the number of users contained in the signal analysis result, whether that number is less than or equal to the first preset number. When it is, the first feature extraction mode based on the GMM-UBM is determined as the target feature extraction mode meeting the preset feature extraction condition, which improves the accuracy and efficiency of that determination and helps to improve the efficiency and speed of extracting voiceprint features from the second voice signal through the first feature extraction mode. When the number of users is greater than the first preset number, the speaking time periods of all users contained in the signal analysis result are compared to obtain the speaking overlap period of each user, the length of which is greater than or equal to zero, and whether the current condition of the second voice signal meets the preset user scene condition is then judged according to the number of users and the speaking overlap periods of all users, which improves the accuracy of that judgment when many users are present. When the current condition meets the user scene condition, the second feature extraction mode based on the deep neural network is determined as the target feature extraction mode, improving extraction accuracy in complex multi-speaker scenes; when it does not, the third feature extraction mode based on the i-vector is determined as the target feature extraction mode meeting the preset feature extraction condition, which improves the accuracy and reliability of that determination and allows speakers to be accurately distinguished.
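As an illustrative sketch only (not code from the patent), the three-way branching described above could be implemented as follows; the threshold `FIRST_PRESET_NUMBER` and the mode labels are assumptions introduced here for demonstration:

```python
# Hypothetical sketch of the target-feature-extraction-mode selection.
# FIRST_PRESET_NUMBER and the returned mode labels are illustrative
# assumptions, not values taken from the patent.

FIRST_PRESET_NUMBER = 1  # assumed "first preset number" of users

def select_extraction_mode(num_users: int, scene_condition_met: bool) -> str:
    """Return which voiceprint feature extraction mode to use."""
    if num_users <= FIRST_PRESET_NUMBER:
        # Few speakers: first mode, based on GMM-UBM.
        return "GMM-UBM"
    if scene_condition_met:
        # Complex multi-speaker scene: second mode, deep-neural-network based.
        return "DNN"
    # Many speakers but the scene condition is not met: third mode, i-vector.
    return "i-vector"
```

With this sketch, a single-speaker signal selects the GMM-UBM mode, while a multi-speaker signal selects the DNN mode or the i-vector mode depending on the scene-condition judgment.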
In this optional embodiment, optionally, the manner in which the extraction module 303 determines whether the current condition of the second voice signal meets the preset user scene condition, according to the number of users and the speaking overlap periods of all the users, may specifically include:
Judging, according to the number of users, whether the number of users is greater than or equal to a second preset number, and recording this as a first condition;
Judging, according to the speaking overlap periods of all the users, whether at least a third preset number of the users have speaking overlap periods whose length is greater than zero, and recording this as a second condition;
When the first condition and/or the second condition is satisfied, it is determined that the current condition of the second voice signal satisfies the preset user scene condition; when neither the first condition nor the second condition is satisfied, it is determined that the current condition of the second voice signal does not satisfy the preset user scene condition.
It can be seen that this alternative implementation can also judge, from the number of users, whether that number is greater than or equal to the second preset number (the first condition), and judge, from the speaking overlap periods of all users, whether at least a third preset number of users have speaking overlap periods longer than zero (the second condition). When the first condition and/or the second condition is met, the current condition of the second voice signal is confirmed to meet the preset user scene condition; offering these alternative, diversified scene-judging criteria improves the accuracy, diversity, and flexibility of that determination. When neither condition is met, the current condition of the second voice signal is confirmed not to meet the preset user scene condition; this multi-criteria judgment of the user's scene improves the accuracy and reliability of that determination, which in turn helps the subsequent feature extraction mode to be chosen accurately and reliably on the basis of an accurate judgment result.
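The two-condition check above can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the preset numbers are invented (chosen to respect the ordering first < third < second stated in the claims), and each user is simplified to a single `(start, end)` speaking period:

```python
# Hypothetical sketch of the user-scene-condition judgment.
# SECOND_PRESET_NUMBER and THIRD_PRESET_NUMBER are assumed values
# satisfying first (1) < third (2) < second (4); the patent fixes no numbers.

SECOND_PRESET_NUMBER = 4
THIRD_PRESET_NUMBER = 2

def overlap_length(period_a, period_b):
    """Length of the overlap between two (start, end) speaking periods."""
    start = max(period_a[0], period_b[0])
    end = min(period_a[1], period_b[1])
    return max(0.0, end - start)

def scene_condition_met(speaking_periods):
    """speaking_periods: {user_id: (start, end)} from the signal analysis."""
    users = list(speaking_periods)
    cond_1 = len(users) >= SECOND_PRESET_NUMBER  # first condition
    # Per-user total overlap with every other user (>= 0 by construction).
    overlaps = {
        u: sum(overlap_length(speaking_periods[u], speaking_periods[v])
               for v in users if v != u)
        for u in users
    }
    overlapping_users = sum(1 for t in overlaps.values() if t > 0)
    cond_2 = overlapping_users >= THIRD_PRESET_NUMBER  # second condition
    return cond_1 or cond_2
```

For example, two users whose speech overlaps satisfies the second condition even when the user count stays below the second preset number.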
In another alternative embodiment, the manner in which the identifying module 304 identifies the voiceprint feature information based on a preset deep neural network algorithm, so as to complete the identity verification of the current operator, may specifically include:
Identifying, based on a voice recognition model trained by a preset deep neural network algorithm, the voiceprint feature information input into the voice recognition model, to obtain probability scores of the voiceprint feature information among different initial operators, wherein the voice recognition model is obtained by training the deep neural network algorithm on the voice sample information of all the initial operators;
and carrying out identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators.
Therefore, this alternative embodiment can identify, based on the voice recognition model trained by the preset deep neural network algorithm, the voiceprint feature information input into the model, obtaining probability scores of the voiceprint feature information among different initial operators. This allows accurate recognition of the voiceprint feature information and helps to improve the accuracy and efficiency of obtaining those probability scores. Identity verification of the current operator is then performed according to the probability scores among all the initial operators, so that verification can be carried out accurately on the basis of precisely identified scores, improving both its accuracy and its efficiency.
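One plausible way to obtain per-operator probability scores (an assumption, since the patent does not disclose the model internals) is a softmax over one logit per enrolled initial operator; the operator names and logit values below are invented for illustration:

```python
import math

def probability_scores(logits):
    """Softmax over per-operator logits -> probability score per operator.

    Subtracting the max logit first keeps math.exp numerically stable.
    """
    m = max(logits.values())
    exps = {name: math.exp(x - m) for name, x in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}
```

Usage: `probability_scores({"op_a": 4.1, "op_b": 0.3, "op_c": -1.2})` yields scores that sum to 1, with `op_a` receiving the highest probability.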
In this optional embodiment, as an optional implementation, the manner in which the identifying module 304 performs identity verification on the current operator according to the probability scores of the voiceprint feature information among all initial operators may specifically include:
Comparing the probability scores of the voiceprint feature information among all initial operators to obtain the target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
When the probability score of the target operator is greater than or equal to the preset score, determining that the identity verification of the current operator has passed;
When the probability score of the target operator is less than the preset score, determining that the identity verification of the current operator has failed.
Therefore, this alternative implementation can compare the probability scores of the voiceprint feature information among all initial operators to obtain the target operator with the highest probability score, and judge whether that score is greater than or equal to a preset score. When it is, the identity verification of the current operator is determined to have passed; when it is less than the preset score, the verification is determined to have failed. In both cases, the accuracy and reliability of the verification decision are improved.
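The argmax-plus-threshold decision described above can be sketched as follows; the value of `PRESET_SCORE` is an assumption, since the patent leaves the preset score open:

```python
PRESET_SCORE = 0.8  # assumed threshold; not specified by the patent

def verify_operator(scores):
    """Pick the operator with the highest probability score and compare it
    against the preset score. Returns (target_operator, passed)."""
    target, best = max(scores.items(), key=lambda item: item[1])
    return target, best >= PRESET_SCORE
```

The threshold rejects impostors whose best match is weak: even the highest-scoring operator fails verification when its probability score falls below the preset score.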
In this alternative implementation, referring to fig. 4, fig. 4 is a schematic structural diagram of another voice recognition processing device applied to smart power applications according to an embodiment of the present invention. As shown in fig. 4, the device may further include:
An output module 305, configured to output prompt information indicating verification failure to the current operator when, after the identifying module 304 performs identity verification on the current operator according to the probability scores of the voiceprint feature information among all initial operators, the verification fails.
A determining module 306, configured to determine the identity tag of the target operator as the identity tag of the current operator when the verification is passed.
The determining module 306 is further configured to determine, according to the identity tag of the current operator, the operation right matching that identity tag from the operation rights of all initial operators preset in the power system, as the target operation right of the current operator, where each operation right has a corresponding identity tag.
The control module 307 is configured to control the power system to unlock the function module corresponding to the target operation authority to the current operator according to the target operation authority, so that the current operator can perform a corresponding control operation on the function module.
It can be seen that the device described in fig. 4 can also, when the identity verification of the current operator fails, directly output prompt information of the verification failure to the current operator, prompting re-verification and making it convenient to proceed. When the verification passes, the identity tag of the target operator is determined as the identity tag of the current operator, and, according to that tag, the matching operation right is determined from the operation rights of all initial operators preset in the power system as the target operation right of the current operator. This improves the accuracy of determining the current operator's identity tag and, in turn, the accuracy and efficiency of determining the target operation right. The power system is then controlled, according to the target operation right, to unlock the corresponding function module to the current operator, so that the current operator can perform the corresponding control operations on that module, accurately granting the current operator the matching operation right and unlocking the matching function module.
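The post-verification flow of fig. 4 (identity tag to operation right to unlocked function modules) can be sketched as below. All tags, rights, and module names are invented for illustration; the patent does not enumerate them:

```python
# Hypothetical sketch of the permission lookup and module unlocking.
# The tables below are illustrative assumptions only.

OPERATION_RIGHTS = {           # identity tag -> preset operation right
    "dispatcher": "grid-control",
    "inspector": "read-only",
}
RIGHT_TO_MODULES = {           # operation right -> unlockable function modules
    "grid-control": {"breaker-control", "load-dispatch", "telemetry"},
    "read-only": {"telemetry"},
}

def unlock_modules(identity_tag):
    """Return the set of function modules unlocked for this identity tag,
    or an empty set when no preset operation right matches the tag."""
    right = OPERATION_RIGHTS.get(identity_tag)
    return RIGHT_TO_MODULES.get(right, set())
```

Keeping the tag-to-right and right-to-module tables separate mirrors the description: rights are preset per initial operator, while modules are unlocked per right.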
Example IV
Referring to fig. 5, fig. 5 is a schematic structural diagram of a voice recognition processing device for smart power application according to another embodiment of the present invention. As shown in fig. 5, the voice recognition processing device applied to the smart power application may include:
A memory 401 storing executable program code;
A processor 402 coupled with the memory 401;
The processor 402 invokes the executable program code stored in the memory 401 to perform the steps of the voice recognition processing method applied to smart power applications described in the first or second embodiment of the present invention.
Example V
An embodiment of the present invention discloses a computer storage medium storing computer instructions which, when invoked, are used to execute the steps of the voice recognition processing method applied to smart power applications described in the first or second embodiment of the present invention.
Example VI
An embodiment of the present invention discloses a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform the steps of the voice recognition processing method applied to smart power applications described in the first or second embodiment.
The apparatus embodiments described above are merely illustrative. The modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially or in part in the form of a software product that may be stored in a computer-readable storage medium, including Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disc memory, tape memory, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the voice recognition processing method and apparatus for smart power applications disclosed in the embodiments of the present invention are only preferred embodiments, used solely to illustrate the technical solution of the present invention rather than to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described therein may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

Translated from Chinese
1. A sound recognition processing method applied to smart power applications, characterized in that the method comprises:
acquiring a first voice signal of a current operator of a power system;
processing the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal;
performing feature extraction on the second voice signal to obtain voiceprint feature information corresponding to the second voice signal;
identifying the voiceprint feature information based on a preset deep neural network algorithm to complete identity verification of the current operator.
2. The sound recognition processing method applied to smart power applications according to claim 1, characterized in that performing feature extraction on the second voice signal to obtain the voiceprint feature information corresponding to the second voice signal comprises:
analyzing the second voice signal to obtain a signal analysis result;
selecting, according to the signal analysis result, a target feature extraction mode that meets a preset feature extraction condition from a plurality of preset feature extraction modes;
performing feature extraction on the second voice signal based on the target feature extraction mode to obtain the voiceprint feature information corresponding to the second voice signal.
3. The sound recognition processing method applied to smart power applications according to claim 2, characterized in that the signal analysis result at least includes the number of users corresponding to the second voice signal; or the signal analysis result includes the number of users corresponding to the second voice signal and the speaking time period of each user;
and selecting, according to the signal analysis result, the target feature extraction mode that meets the preset feature extraction condition from the plurality of preset feature extraction modes comprises:
judging, according to the number of users included in the signal analysis result, whether the number of users is less than or equal to a first preset number;
when it is determined that the number of users is less than or equal to the first preset number, determining a first feature extraction mode based on GMM-UBM as the target feature extraction mode meeting the preset feature extraction condition;
when it is determined that the number of users is greater than the first preset number, comparing the speaking time periods of all the users included in the signal analysis result to obtain a speaking overlap period for each user, the length of the speaking overlap period being greater than or equal to zero; and judging, according to the number of users and the speaking overlap periods of all the users, whether the current condition of the second voice signal meets a preset user scene condition;
when it is determined that the current condition of the second voice signal meets the user scene condition, determining a second feature extraction mode based on a deep neural network as the target feature extraction mode meeting the preset feature extraction condition;
when it is determined that the current condition of the second voice signal does not meet the user scene condition, determining a third feature extraction mode based on i-vector as the target feature extraction mode meeting the preset feature extraction condition;
wherein each of the feature extraction modes is one of the first feature extraction mode, the second feature extraction mode, and the third feature extraction mode.
4. The sound recognition processing method applied to smart power applications according to claim 3, characterized in that judging, according to the number of users and the speaking overlap periods of all the users, whether the current condition of the second voice signal meets the preset user scene condition comprises:
judging, according to the number of users, whether the number of users is greater than or equal to a second preset number, recorded as a first condition;
judging, according to the speaking overlap periods of all the users, whether at least a third preset number of the users have speaking overlap periods whose length is greater than zero, recorded as a second condition; the first preset number is less than the third preset number, and the third preset number is less than the second preset number;
wherein, when the first condition and/or the second condition is met, it is determined that the current condition of the second voice signal meets the preset user scene condition; when neither the first condition nor the second condition is met, it is determined that the current condition of the second voice signal does not meet the preset user scene condition.
5. The sound recognition processing method applied to smart power applications according to any one of claims 1-4, characterized in that identifying the voiceprint feature information based on the preset deep neural network algorithm to complete the identity verification of the current operator comprises:
identifying, based on a voice recognition model trained by the preset deep neural network algorithm, the voiceprint feature information input into the voice recognition model, to obtain probability scores of the voiceprint feature information among different initial operators, the voice recognition model being obtained by training the deep neural network algorithm on the voice sample information of all the initial operators;
performing identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators.
6. The sound recognition processing method applied to smart power applications according to claim 5, characterized in that performing identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators comprises:
comparing the probability scores of the voiceprint feature information among all the initial operators to obtain the target operator with the highest probability score, and judging whether the probability score of the target operator is greater than or equal to a preset score;
when it is determined that the probability score of the target operator is greater than or equal to the preset score, determining that the identity verification of the current operator has passed;
when it is determined that the probability score of the target operator is less than the preset score, determining that the identity verification of the current operator has failed.
7. The sound recognition processing method applied to smart power applications according to claim 6, characterized in that, after performing identity verification on the current operator according to the probability scores of the voiceprint feature information among all the initial operators, the method further comprises:
when the verification fails, outputting prompt information of the verification failure to the current operator;
when the verification passes, determining the identity tag of the target operator as the identity tag of the current operator;
determining, according to the identity tag of the current operator, the operation right matching that identity tag from the operation rights of all the initial operators preset in the power system, as the target operation right of the current operator, each operation right having a corresponding identity tag;
controlling, according to the target operation right, the power system to unlock the function module corresponding to the target operation right to the current operator, so that the current operator can perform the corresponding control operations on the function module.
8. A sound recognition processing device applied to smart power applications, characterized in that the device comprises:
an acquisition module, configured to acquire a first voice signal of a current operator of a power system;
a processing module, configured to process the first voice signal based on a preset voice signal processing algorithm to obtain a second voice signal corresponding to the first voice signal;
an extraction module, configured to perform feature extraction on the second voice signal to obtain voiceprint feature information corresponding to the second voice signal;
an identifying module, configured to identify the voiceprint feature information based on a preset deep neural network algorithm to complete identity verification of the current operator.
9. A sound recognition processing device applied to smart power applications, characterized in that the device comprises:
a memory storing executable program code;
a processor coupled with the memory;
the processor invokes the executable program code stored in the memory to perform the sound recognition processing method applied to smart power applications according to any one of claims 1-7.
10. A computer storage medium, characterized in that the computer storage medium stores computer instructions which, when invoked, are used to perform the sound recognition processing method applied to smart power applications according to any one of claims 1-7.
CN202510772303.3A | 2025-06-10 | 2025-06-10 | Sound recognition processing method and device for smart power applications | Pending | CN120600034A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510772303.3A | CN120600034A (en) | 2025-06-10 | 2025-06-10 | Sound recognition processing method and device for smart power applications

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510772303.3A | CN120600034A (en) | 2025-06-10 | 2025-06-10 | Sound recognition processing method and device for smart power applications

Publications (1)

Publication Number | Publication Date
CN120600034A | 2025-09-05

Family

ID=96884894

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510772303.3A | Sound recognition processing method and device for smart power applications | 2025-06-10 | 2025-06-10 | Pending | CN120600034A (en)

Country Status (1)

Country | Link
CN (1) | CN120600034A (en)

Similar Documents

Publication | Publication Date | Title
JP7109634B2 (en) Identity authentication method and device
US11468901B2 (en) End-to-end speaker recognition using deep neural network
US11646038B2 (en)Method and system for separating and authenticating speech of a speaker on an audio stream of speakers
Chen et al. Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge.
US10476872B2 (en)Joint speaker authentication and key phrase identification
WO2010047817A1 (en)Speaker verification methods and systems
WO2010047816A1 (en)Speaker verification methods and apparatus
WO2018129869A1 (en)Voiceprint verification method and apparatus
CN110379433A (en)Method, apparatus, computer equipment and the storage medium of authentication
KR20190085731A (en)Method for user authentication
KR20240132372A (en) Speaker Verification Using Multi-Task Speech Models
CN110931020A (en)Voice detection method and device
CN115346532B (en) Optimization method, terminal device and storage medium of voiceprint recognition system
CN120600034A (en) Sound recognition processing method and device for smart power applications
CN112530441A (en)Method and device for authenticating legal user, computer equipment and storage medium
Das et al. Comparison of DTW score and warping path for text dependent speaker verification system
CN113178196B (en)Audio data extraction method and device, computer equipment and storage medium
JP3828099B2 (en) Personal authentication system, personal authentication method, and personal authentication program
JP2000148187A (en) Speaker recognition method, apparatus using the method, and program recording medium therefor
Liu et al. Mtbv: Multi-trigger backdoor attacks on speaker verification
US20250252968A1 (en)Synthetic voice fraud detection
US10789960B2 (en)Method and system for user authentication by voice biometrics
Panda et al.Automatic Speaker Verification Under Spoofing Attack
CN119577727A (en) Identity authentication method and related equipment
CN120148490A (en) Method, device, computer equipment and medium for improving speech recognition accuracy

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
