Disclosure of Invention
The embodiments of the invention provide a data processing method and device, a storage medium, and an electronic device, to solve problems such as the excessive cost of training a speech recognition system to handle accented speech. According to an embodiment of the present invention, there is provided a data processing method including: inputting acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data; acquiring the second voice test data output by the first model; and inputting the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, wherein the second model is used for recognizing voice information, and the voice information comprises: the first voice test data and the second voice test data.
In an embodiment of the present invention, before the second voice test data is input into the second model, the method further includes: acquiring third voice test data output by the first model for the first voice test data; and using the third voice test data as the input of the first model, so that the proportion of specified content in the second voice test data output by the first model to the second model exceeds a preset threshold.
In an embodiment of the present invention, after inputting the second speech test data into the second model to instruct the second model to train parameters of the second model according to the second speech test data, the method further includes: determining a second model corresponding to the trained parameters; recognizing the voice information according to the second model corresponding to the trained parameters to obtain a recognition result; and displaying the recognition result.
In an embodiment of the invention, the first model comprises: a feature transformation network; the second model includes: a two-class neural network.
In an embodiment of the invention, the first voice test data comprises test data corresponding to standard Mandarin speech, and the second voice test data comprises test data corresponding to non-standard Mandarin speech.
According to another embodiment of the present invention, there is also provided a data processing apparatus including: the first input module is used for inputting the acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data; the first acquisition module is used for acquiring second voice test data output by the first model; a second input module, configured to input the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, where the second model is used to recognize voice information, and the voice information includes: the first voice test data and the second voice test data.
In an embodiment of the present invention, the apparatus further includes: the second obtaining module is used for obtaining third voice test data output by the first model aiming at the first voice test data; and the processing module is used for taking the third voice test data as the input of the first model so as to enable the proportion of the specified content in the second voice test data output to the second model by the first model to exceed a preset threshold in the second voice test data.
In an embodiment of the present invention, the apparatus further includes: a determining module, configured to determine a second model corresponding to the trained parameters; a recognition module, configured to recognize the voice information according to the second model corresponding to the trained parameters to obtain a recognition result; and a display module, configured to display the recognition result.
According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when executed.
According to another embodiment of the present invention, there is also provided an electronic apparatus including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the data processing method according to any one of the above.
According to the invention, the acquired first voice test data is input into a first model, wherein the first model is used for converting the first voice test data into second voice test data; the second voice test data output by the first model is acquired; and the second voice test data is input into a second model to instruct the second model to train parameters of the second model according to the second voice test data, wherein the second model is used for recognizing voice information, and the voice information comprises: the first voice test data and the second voice test data. This technical solution solves problems in the related art such as a model being unable to effectively recognize two kinds of voice test data during training of a voice model. The first model converts the first voice test data into the second voice test data, so that the first voice test data and the second voice test data are similar, and the converted second voice test data is then used for training. This avoids labeling the second voice test data, which not only reduces the cost of labeling the second voice test data but also enables effective recognition of both the first voice test data and the second voice test data.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Fig. 1 is a flow chart of an alternative data processing method according to an embodiment of the present invention, and as shown in fig. 1, the flow chart includes the following steps:
step S102, inputting the acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data;
step S104, acquiring second voice test data output by the first model;
step S106, inputting the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, where the second model is used to recognize voice information, and the voice information includes: the first voice test data, the second voice test data.
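Steps S102 to S106 can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions: the names `run_pipeline`, `toy_first_model`, and `toy_train` are hypothetical and do not appear in the embodiment, features are represented as plain lists of floats, and the toy first model simply shifts each feature by a fixed offset rather than performing any real accent conversion.

```python
from typing import Callable, List, Sequence

Features = List[float]


def run_pipeline(
    first_data: Sequence[Features],
    first_model: Callable[[Features], Features],
    train_second_model: Callable[[Sequence[Features]], None],
) -> List[Features]:
    """Hypothetical outline of steps S102, S104 and S106."""
    # S102: input the acquired first voice test data into the first model.
    # S104: acquire the second voice test data output by the first model.
    second_data = [first_model(x) for x in first_data]
    # S106: input the second voice test data into the second model for training.
    train_second_model(second_data)
    return second_data


# Toy stand-ins (assumptions for illustration only).
def toy_first_model(x: Features) -> Features:
    # Shift every feature by a fixed offset to mimic a "conversion".
    return [v + 0.5 for v in x]


seen_by_second_model: List[Features] = []


def toy_train(batch: Sequence[Features]) -> None:
    # A real second model would update its parameters here.
    seen_by_second_model.extend(batch)


converted = run_pipeline([[0.0, 1.0], [2.0, 3.0]], toy_first_model, toy_train)
```

The point of the sketch is the data flow: the second model only ever receives the output of the first model, so no separately labeled second voice test data is required at this stage.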
According to the invention, the acquired first voice test data is input into the first model, wherein the first model is used for converting the first voice test data into the second voice test data; the second voice test data output by the first model is acquired; and the second voice test data is input into a second model to instruct the second model to train parameters of the second model according to the second voice test data, wherein the second model is used for recognizing voice information, and the voice information comprises: the first voice test data and the second voice test data. This technical solution solves problems in the related art such as a model being unable to effectively recognize two kinds of voice test data during training of a voice model. The first model converts the first voice test data into the second voice test data, so that the first voice test data and the second voice test data are similar, and the converted second voice test data is then used for training. This avoids labeling the second voice test data, which not only reduces the cost of labeling the second voice test data but also enables effective recognition of both the first voice test data and the second voice test data.
In an embodiment of the present invention, before inputting the second speech test data into the second model, the method further includes: acquiring third voice test data output by the first model aiming at the first voice test data; and taking the third voice test data as the input of the first model, so that the specified content in the second voice test data output to the second model by the first model accounts for more than a preset threshold in the second voice test data.
The third voice test data (for example, accented data) may be obtained by inputting the first voice test data (for example, standard accent data) into the first model. This is repeated until the proportion of the specified content (for example, accented data) in the obtained third voice test data exceeds the preset threshold (for example, 95%), at which point the third voice test data is input into the second model as the second voice test data.
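The repeat-until-threshold behaviour described above can be illustrated with a small loop. This is a minimal sketch, not the embodiment's implementation: `refine_until_threshold`, `toy_first_model`, and `toy_accent_score` are hypothetical names, each sample is reduced to a single float that stands for how "accented" it is, and the `max_rounds` guard is an added assumption so the loop cannot run forever.

```python
from typing import Callable, List, Tuple


def refine_until_threshold(
    first_data: List[float],
    first_model: Callable[[float], float],
    accent_score: Callable[[List[float]], float],
    threshold: float = 0.95,
    max_rounds: int = 100,
) -> Tuple[List[float], int]:
    """Feed the first model's output back into it until the score exceeds the threshold."""
    current = list(first_data)
    for rounds in range(1, max_rounds + 1):
        # Output of this pass plays the role of the "third voice test data".
        current = [first_model(x) for x in current]
        if accent_score(current) > threshold:
            # At this point the data is passed on as the "second voice test data".
            return current, rounds
    raise RuntimeError("threshold not reached within max_rounds")


# Toy stand-ins (assumptions): each pass moves a sample halfway towards 1.0,
# and the score is the mean of the samples.
def toy_first_model(x: float) -> float:
    return x + 0.5 * (1.0 - x)


def toy_accent_score(data: List[float]) -> float:
    return sum(data) / len(data)


second_data, rounds_used = refine_until_threshold([0.0], toy_first_model, toy_accent_score)
```

With these toy functions the single sample takes the values 0.5, 0.75, 0.875, 0.9375 and 0.96875, so the loop stops on the fifth pass, the first one whose score exceeds 0.95.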
In an embodiment of the present invention, after inputting the second speech test data into the second model to instruct the second model to train parameters of the second model according to the second speech test data, the method further includes: determining a second model corresponding to the trained parameters; recognizing the voice information according to the second model corresponding to the trained parameters to obtain a recognition result; and displaying the recognition result.
In an embodiment of the invention, the first model comprises: a feature transformation network; the second model includes: a two-class neural network.
In an embodiment of the invention, the first voice test data comprises test data corresponding to standard Mandarin speech, and the second voice test data comprises test data corresponding to non-standard Mandarin speech.
Optionally, inputting the acquired first voice test data into the first model, including: acquiring voice test data of a preset region as the first voice test data; and inputting the acquired first voice test data into the first model.
The preset region may be a region where relatively standard Mandarin is spoken, such as Beijing.
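Selecting the first voice test data by region amounts to a simple filter over a corpus. The sketch below assumes, purely for illustration, that each utterance carries a `region` metadata field; the `Utterance` class and the function `select_first_voice_test_data` are hypothetical and not part of the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    region: str            # hypothetical metadata: where the speech was collected
    features: List[float]  # hypothetical acoustic features


def select_first_voice_test_data(corpus: List[Utterance], preset_region: str) -> List[Utterance]:
    """Keep only utterances collected in the preset region."""
    return [u for u in corpus if u.region == preset_region]


corpus = [
    Utterance("Beijing", [0.1, 0.2]),
    Utterance("Other", [0.9, 0.8]),
    Utterance("Beijing", [0.0, 0.3]),
]
first_voice_test_data = select_first_voice_test_data(corpus, "Beijing")
```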
The following explains the above data processing procedure with an example, which is not intended to limit the technical solution of the embodiments of the present invention. The technical solution of the example is as follows:
Fig. 2 is a flowchart of an alternative training method of a speech recognition system according to an embodiment of the present invention. As shown in Fig. 2, the training method includes:
Step 1: train a two-class neural network using standard accent data and accented data, where the two-class neural network may be a Deep Neural Network (DNN). The standard accent data corresponds to the first voice test data, and the accented data corresponds to the second voice test data.
Step 2: train a feature transformation network using the standard accent data, and use the output of this network as the input of the two-class neural network (for example, the DNN). The parameters of the feature transformation network are then trained iteratively: after each iteration, the standard accent data is input into the feature transformation network to obtain accented data, and iteration stops once the probability that the output is accented data reaches 95% or more (that is, the preset threshold). It should be noted that this process trains only the parameters of the feature transformation network and does not train the parameters of the two-class neural network. The feature transformation network may be understood as a neural network that converts the first voice test data into the second voice test data, that is, converts standard accent data into accented data.
Step 3: input the standard accent data into the trained feature transformation network, use the output as the features for the speech recognition system, and train the system. The speech recognition system is then applied to recognition in accented scenarios to obtain a recognition result.
With this technical solution, labeling of accented data can be avoided. The features of the standard accent data are enhanced so that they closely resemble those of accented data, and the enhanced data is used for training. This not only avoids the excessive cost of labeling accented data but also improves the robustness of the speech recognition system to accented data.
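The three training steps can be illustrated with a deliberately tiny, pure-Python stand-in. The assumptions are substantial and should be read as such: each utterance is reduced to one scalar feature drawn from a synthetic Gaussian, a logistic regression replaces the two-class DNN, and the feature transformation network is reduced to a single learned offset `shift`. None of these choices comes from the embodiment; they only make the control flow (train the classifier, freeze it, train the transformation until the 95% threshold, then convert) concrete.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic 1-D features (assumption): standard accent near -1, accented near +1.
rng = random.Random(0)
standard = [rng.gauss(-1.0, 0.3) for _ in range(200)]
accented = [rng.gauss(1.0, 0.3) for _ in range(200)]

# Step 1: train the two-class model (here a logistic regression, label 1 = accented).
samples = [(x, 0.0) for x in standard] + [(x, 1.0) for x in accented]
w, b = 0.0, 0.0
for _ in range(300):
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in samples) / len(samples)
    grad_b = sum(sigmoid(w * x + b) - y for x, y in samples) / len(samples)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b


def mean_accent_probability(features, shift: float) -> float:
    """Mean probability, under the frozen classifier, that shifted features are accented."""
    return sum(sigmoid(w * (x + shift) + b) for x in features) / len(features)


# Step 2: train only the transformation x -> x + shift; w and b stay frozen.
shift = 0.0
for _ in range(1000):
    if mean_accent_probability(standard, shift) >= 0.95:  # preset threshold
        break
    # Gradient of the mean of -log p(accented) with respect to shift.
    grad_shift = -w * sum(1.0 - sigmoid(w * (x + shift) + b) for x in standard) / len(standard)
    shift -= 0.5 * grad_shift

# Step 3: convert the standard accent data; the result would serve as
# training features for the speech recognition system.
converted = [x + shift for x in standard]
```

Because only `shift` is updated in step 2, the classifier acts as a fixed judge of how accented the converted features look, which mirrors the note that the parameters of the two-class neural network are not trained during that step.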
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method according to the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a data processing apparatus is further provided, and the data processing apparatus is used to implement the foregoing embodiments and preferred embodiments, and the description that has been already made is omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of an alternative data processing apparatus according to an embodiment of the present invention, as shown in fig. 3, the apparatus including:
a first input module 30, configured to input the acquired first voice test data into a first model, where the first model is configured to convert the first voice test data into second voice test data;
a first obtaining module 32, configured to obtain second voice test data output by the first model;
a second input module 34, configured to input the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, where the second model is used to recognize voice information, and the voice information includes: the first voice test data and the second voice test data.
According to the invention, the acquired first voice test data is input into a first model, wherein the first model is used for converting the first voice test data into second voice test data; the second voice test data output by the first model is acquired; and the second voice test data is input into a second model to instruct the second model to train parameters of the second model according to the second voice test data, wherein the second model is used for recognizing voice information, and the voice information comprises: the first voice test data and the second voice test data. This technical solution solves problems in the related art such as a model being unable to effectively recognize two kinds of voice test data during training of a voice model. The first model converts the first voice test data into the second voice test data, so that the first voice test data and the second voice test data are similar, and the converted second voice test data is then used for training. This avoids labeling the second voice test data, which not only reduces the cost of labeling the second voice test data but also enables effective recognition of both the first voice test data and the second voice test data.
Fig. 4 is another structural block diagram of an optional data processing apparatus according to an embodiment of the present invention. As shown in Fig. 4, the apparatus further includes:
a second obtaining module 36, configured to obtain third voice test data output by the first model for the first voice test data;
and a processing module 38, configured to use the third voice test data as an input of the first model, so that the proportion of the specified content in the second voice test data output by the first model to the second model exceeds a preset threshold.
In the embodiment of the present invention, as shown in fig. 4, the apparatus further includes:
a determining module 40, configured to determine a second model corresponding to the trained parameters;
a recognition module 42, configured to recognize the voice information according to the second model corresponding to the trained parameters, so as to obtain a recognition result;
and a display module 44, configured to display the recognition result.
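One way to picture how the modules fit together is as methods on a single object. The sketch below is a hypothetical arrangement, not the apparatus itself: the class names, the scalar features, and the `ToySecondModel` (which merely remembers the centroid of its training data) are assumptions for illustration, and only modules 30, 32, 34, 40, 42 and 44 are represented.

```python
from typing import Callable, List


class ToySecondModel:
    """Hypothetical recognizer: remembers the centroid of its training data."""

    def __init__(self) -> None:
        self.centroid = 0.0

    def train(self, data: List[float]) -> None:
        self.centroid = sum(data) / len(data)

    def recognize(self, x: float) -> str:
        return "recognized" if abs(x - self.centroid) <= 1.0 else "rejected"


class DataProcessingApparatus:
    """Each method stands in for one module of the apparatus."""

    def __init__(self, first_model: Callable[[float], float], second_model: ToySecondModel) -> None:
        self.first_model = first_model
        self.second_model = second_model
        self._converted: List[float] = []

    def first_input(self, first_data: List[float]) -> None:    # first input module 30
        self._converted = [self.first_model(x) for x in first_data]

    def first_obtain(self) -> List[float]:                      # first obtaining module 32
        return list(self._converted)

    def second_input(self, second_data: List[float]) -> None:  # second input module 34
        self.second_model.train(second_data)

    def determine(self) -> ToySecondModel:                      # determining module 40
        return self.second_model

    def recognize(self, voice_information: List[float]) -> List[str]:  # recognition module 42
        model = self.determine()
        return [model.recognize(x) for x in voice_information]

    def display(self, results: List[str]) -> None:              # display module 44
        for result in results:
            print(result)


apparatus = DataProcessingApparatus(lambda x: x + 1.0, ToySecondModel())
apparatus.first_input([0.0, 0.2, 0.4])
second_data = apparatus.first_obtain()
apparatus.second_input(second_data)
results = apparatus.recognize([0.5, 1.2, 5.0])
apparatus.display(results)
```

Whether the modules live in one class, one processor, or several is an implementation choice; the sketch uses a single class only to keep the example short.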
In an embodiment of the invention, the first model comprises: a feature transformation network; the second model includes: a two-class neural network.
It should be noted that the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, inputting acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data;
S2, acquiring second voice test data output by the first model;
S3, inputting the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, wherein the second model is used for recognizing voice information, and the voice information comprises: the first voice test data and the second voice test data.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, inputting acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data;
S2, acquiring second voice test data output by the first model;
S3, inputting the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, wherein the second model is used for recognizing voice information, and the voice information comprises: the first voice test data and the second voice test data.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.