CN110400560B - Data processing method and device, storage medium and electronic device - Google Patents

Data processing method and device, storage medium and electronic device

Info

Publication number
CN110400560B
CN110400560B (application CN201910673507.6A; publication of application CN110400560A)
Authority
CN
China
Prior art keywords
test data
model
voice
voice test
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910673507.6A
Other languages
Chinese (zh)
Other versions
CN110400560A (en)
Inventor
郭欣
唐大闰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910673507.6A
Publication of CN110400560A
Application granted
Publication of CN110400560B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention provides a data processing method and device, a storage medium, and an electronic device. The method includes: inputting acquired first voice test data into a first model, where the first model is used to convert the first voice test data into second voice test data; acquiring the second voice test data output by the first model; and inputting the second voice test data into a second model to instruct the second model to train its parameters according to the second voice test data, where the second model is used to recognize voice information, and the voice information includes the first voice test data and the second voice test data.

Description

Data processing method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of computers, and in particular, to a data processing method and apparatus, a storage medium, and an electronic apparatus.
Background
In the related art, labeling speech recognition training data is expensive, and the data that can be collected cheaply tends to be homogeneous. Suppose there is a batch of labeled standard-accent speech data; a speech recognition system trained on it recognizes accented speech at a low rate. However, acquiring and labeling accented data and training an accent-specific speech recognition system is comparatively costly.
For the problem in the related art that, in the process of training a voice model, the model cannot effectively recognize both of two kinds of voice test data, no effective solution currently exists.
Disclosure of Invention
The embodiments of the present invention provide a data processing method and device, a storage medium, and an electronic device, to at least solve the problem that training an accent-specific speech recognition system is too expensive, among other problems. According to an embodiment of the present invention, there is provided a data processing method including: inputting acquired first voice test data into a first model, where the first model is used to convert the first voice test data into second voice test data; acquiring the second voice test data output by the first model; and inputting the second voice test data into a second model to instruct the second model to train its parameters according to the second voice test data, where the second model is used to recognize voice information, and the voice information includes the first voice test data and the second voice test data.
In an embodiment of the present invention, third voice test data output by the first model for the first voice test data is acquired, and the third voice test data is used as input to the first model, so that the proportion of the specified content in the second voice test data that the first model outputs to the second model exceeds a preset threshold.
In an embodiment of the present invention, after inputting the second speech test data into the second model to instruct the second model to train parameters of the second model according to the second speech test data, the method further includes: determining a second model corresponding to the trained parameters; recognizing the voice information according to the second model corresponding to the trained parameters to obtain a recognition result; and displaying the recognition result.
In an embodiment of the invention, the first model comprises: a feature transformation network; the second model includes: a two-class neural network.
In an embodiment of the invention, the first voice test data includes test data corresponding to standard Mandarin speech, and the second voice test data includes test data corresponding to non-standard Mandarin speech.
According to another embodiment of the present invention, there is also provided a data processing apparatus including: the first input module is used for inputting the acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data; the first acquisition module is used for acquiring second voice test data output by the first model; a second input module, configured to input the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, where the second model is used to recognize voice information, and the voice information includes: the first voice test data and the second voice test data.
In an embodiment of the present invention, the apparatus further includes: the second obtaining module is used for obtaining third voice test data output by the first model aiming at the first voice test data; and the processing module is used for taking the third voice test data as the input of the first model so as to enable the proportion of the specified content in the second voice test data output to the second model by the first model to exceed a preset threshold in the second voice test data.
In an embodiment of the present invention, the apparatus further includes: the determining module is used for determining a second model corresponding to the trained parameters; the recognition module is used for recognizing the voice information according to the second model corresponding to the trained parameters to obtain a recognition result; and the display module is used for displaying the identification result.
According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when executed.
According to another embodiment of the present invention, there is also provided an electronic device including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the data processing method according to any one of the above.
According to the invention, the acquired first voice test data is input into a first model, where the first model is used to convert the first voice test data into second voice test data; the second voice test data output by the first model is acquired; and the second voice test data is input into a second model to instruct the second model to train its parameters according to the second voice test data, where the second model is used to recognize voice information including the first voice test data and the second voice test data. This technical solution solves the problem in the related art that, during training of a voice model, the model cannot effectively recognize both of two kinds of voice test data. The first model converts the first voice test data into second voice test data, so that the two are similar, and the converted second voice test data is then used for training. This avoids labeling the second voice test data, which not only reduces the labeling cost but also enables effective recognition of both the first and the second voice test data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
FIG. 1 is a flow diagram of an alternative data processing method according to an embodiment of the invention;
FIG. 2 is a flow diagram of an alternative speech recognition system training method according to an embodiment of the present invention;
FIG. 3 is a block diagram of an alternative data processing apparatus according to an embodiment of the present invention;
fig. 4 is another block diagram of an alternative data processing apparatus according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Fig. 1 is a flow chart of an alternative data processing method according to an embodiment of the present invention, and as shown in fig. 1, the flow chart includes the following steps:
step S102, inputting the acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data;
step S104, acquiring second voice test data output by the first model;
step S106, inputting the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, where the second model is used to recognize voice information, and the voice information includes: the first voice test data, the second voice test data.
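Steps S102 to S106 can be sketched end to end. The following is a minimal, hypothetical illustration in Python/NumPy: `first_model`, `train_second_model`, and the feature shapes are all invented stand-ins for the patent's networks, not an actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_model(features: np.ndarray) -> np.ndarray:
    """Hypothetical feature-conversion network: maps first-voice-style
    features toward second-voice-style features (here a fixed affine map)."""
    return 0.9 * features + 0.1

def train_second_model(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Hypothetical recognizer training step: fits least-squares weights
    on the converted second voice test data."""
    w, *_ = np.linalg.lstsq(features, labels, rcond=None)
    return w

# Step S102: input the acquired first voice test data into the first model
first_voice = rng.normal(size=(32, 8))   # 32 utterances, 8-dim features (toy)
# Step S104: acquire the second voice test data output by the first model
second_voice = first_model(first_voice)
# Step S106: train the second model's parameters on the second voice test data
labels = rng.normal(size=(32, 1))
params = train_second_model(second_voice, labels)
print(params.shape)  # (8, 1)
```

The point of the sketch is only the data flow: the second model never sees labeled second voice data directly; it trains on the first model's converted output.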
According to the invention, the acquired first voice test data is input into the first model, where the first model is used to convert the first voice test data into the second voice test data; the second voice test data output by the first model is acquired; and the second voice test data is input into a second model to instruct the second model to train its parameters according to the second voice test data, where the second model is used to recognize voice information including the first voice test data and the second voice test data. This technical solution solves the problem in the related art that, during training of a voice model, the model cannot effectively recognize both of two kinds of voice test data. The first model converts the first voice test data into second voice test data, so that the two are similar, and the converted second voice test data is then used for training. This avoids labeling the second voice test data, which not only reduces the labeling cost but also enables effective recognition of both the first and the second voice test data.
In an embodiment of the present invention, before inputting the second speech test data into the second model, the method further includes: acquiring third voice test data output by the first model aiming at the first voice test data; and taking the third voice test data as the input of the first model, so that the specified content in the second voice test data output to the second model by the first model accounts for more than a preset threshold in the second voice test data.
For example, the first voice test data (e.g., standard-accent data) may be input into the first model, and the output fed back in as the third voice test data, until the output is judged to be accented data with a probability exceeding the preset threshold (e.g., 95% or more); the output at that point is then input into the second model as the second voice test data.
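This feed-back loop can be sketched in a few lines of plain Python. Everything below is a toy stand-in: `first_model` and `accent_probability` are invented illustrations of the conversion network and the binary classifier, with 0.95 as the preset threshold from the text.

```python
THRESHOLD = 0.95  # the preset threshold from the text (95%)

def first_model(x: float) -> float:
    """Hypothetical feature conversion: each pass moves the feature
    value 30% closer to the 'accented' target value 1.0."""
    return x + 0.3 * (1.0 - x)

def accent_probability(x: float) -> float:
    """Hypothetical binary-classifier output P(accented | x): simply
    how close x is to the accented target."""
    return 1.0 - abs(1.0 - x)

x = 0.0      # first voice test data (standard-accent features, toy scalar)
passes = 0
while accent_probability(x) < THRESHOLD:
    x = first_model(x)   # the output becomes third voice test data, fed back in
    passes += 1

second_voice_test_data = x   # now judged accented with >= 95% probability
print(passes)  # 9
```

With these toy numbers the loop needs nine passes before the classifier's probability crosses the threshold; the structure, not the count, is what mirrors the text.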
In an embodiment of the present invention, after inputting the second speech test data into the second model to instruct the second model to train parameters of the second model according to the second speech test data, the method further includes: determining a second model corresponding to the trained parameters; recognizing the voice information according to the second model corresponding to the trained parameters to obtain a recognition result; and displaying the recognition result.
In an embodiment of the invention, the first model comprises: a feature transformation network; the second model includes: a two-class neural network.
In the embodiment of the invention, the first voice test data comprises test data corresponding to standard Mandarin voice, and the second voice test data comprises: non-standard mandarin corresponding test data.
Optionally, inputting the acquired first voice test data into the first model, including: acquiring voice test data of a preset region as the first voice test data; and inputting the acquired first voice test data into the first model.
The preset region may be a region where relatively standard Mandarin is spoken, such as Beijing.
The following explains the above data processing procedure with an example, which is not intended to limit the technical solution of the embodiments of the present invention. The example is as follows:
fig. 2 is a flowchart of an alternative training method of a speech recognition system according to an embodiment of the present invention, as shown in fig. 2, the training method includes:
step 1, standard accent data and accent data are used for training a two-classification Neural Network, wherein the two-classification Neural Network can be a Deep Neural Network (DNN for short). Wherein the standard accent data corresponds to the first speech test data; the accent data corresponds to the second speech test data.
And 2, training a feature transformation network by using standard accent data, and taking the output of the network as the input of a two-classification neural network (such as DNN). Then, continuously and iteratively training the parameters of the feature conversion network, inputting standard accent data into the feature conversion network after iteration to obtain accent data, and stopping iteration until the probability of the accent data reaches more than 95% (namely the preset threshold). It should be noted that, the process only trains the parameters of the feature transformation network, and does not train the parameters of the binary neural network. The feature conversion network may be understood as a neural network, which may implement a function of converting the first voice test data into the second voice test data, that is, a function of converting the standard accent data into the accent data.
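Step 2's key idea, freezing the binary classifier and iterating only the feature conversion network's parameters until its outputs are judged accented with more than 95% probability, can be illustrated with a toy gradient loop. The logistic "classifier", the scalar shift parameter `a`, and all the numbers below are invented for illustration; the patent's actual networks are a DNN classifier and a neural feature conversion network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 stand-in: a frozen binary classifier (the DNN in the text); here a
# fixed logistic model that scores features near 1.0 as "accented".
w_frozen, b_frozen = 4.0, -2.0
def p_accent(feat: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-(w_frozen * feat + b_frozen)))

# Step 2 stand-in: train ONLY the conversion parameter `a`; the classifier
# stays frozen throughout, exactly as the text specifies.
a = 0.0                                      # toy conversion parameter
standard = rng.normal(0.0, 0.1, size=256)    # standard-accent features (toy)
lr = 0.5
for _ in range(200):
    converted = standard + a                 # toy feature conversion network
    p = p_accent(converted)
    if p.mean() > 0.95:                      # preset threshold reached: stop
        break
    grad = np.mean((1.0 - p) * w_frozen)     # d mean log P(accented) / d a
    a += lr * grad

print(round(float(p_accent(standard + a).mean()), 3))
```

The gradient ascent pushes the converted features toward the region the frozen classifier labels "accented", which is the adversarial-style mechanism the iteration in step 2 describes.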
And 3, inputting the standard accent data into the trained feature conversion network, using the network's output as the features of the voice recognition system, and training the system. The voice recognition system is then applied to recognition in accented scenes, and a recognition result is obtained.
With this technical scheme, labeling of accent data can be avoided: the features of the standard accent data are transformed so that they are highly similar to the features of accent data, and the transformed data is used for training. This not only avoids the excessive cost of labeling accent data, but also improves the robustness of the voice recognition system to accented speech.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method according to the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a data processing apparatus is further provided. The apparatus is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of an alternative data processing apparatus according to an embodiment of the present invention, as shown in fig. 3, the apparatus including:
The first input module 30 is configured to input the acquired first voice test data into a first model, where the first model is configured to convert the first voice test data into second voice test data;
a first obtaining module 32 is configured to obtain the second voice test data output by the first model;
a second input module 34 is configured to input the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, where the second model is used to recognize voice information, and the voice information includes: the first voice test data and the second voice test data.
According to the invention, the acquired first voice test data is input into a first model, where the first model is used to convert the first voice test data into second voice test data; the second voice test data output by the first model is acquired; and the second voice test data is input into a second model to instruct the second model to train its parameters according to the second voice test data, where the second model is used to recognize voice information including the first voice test data and the second voice test data. This technical solution solves the problem in the related art that, during training of a voice model, the model cannot effectively recognize both of two kinds of voice test data. The first model converts the first voice test data into second voice test data, so that the two are similar, and the converted second voice test data is then used for training. This avoids labeling the second voice test data, which not only reduces the labeling cost but also enables effective recognition of both the first and the second voice test data.
In this embodiment of the present invention, fig. 4 is another structural block diagram of an optional data processing apparatus according to an embodiment of the present invention, and as shown in fig. 4, the apparatus further includes:
a second obtaining module 36, configured to obtain third voice test data output by the first model for the first voice test data;
and a processing module 38, configured to use the third voice test data as an input of the first model, so that the proportion of the specified content in the second voice test data output by the first model to the second model exceeds a preset threshold.
In the embodiment of the present invention, as shown in fig. 4, the apparatus further includes:
a determining module 40, configured to determine the second model corresponding to the trained parameters;
a recognition module 42, configured to recognize the voice information according to the second model corresponding to the trained parameters, to obtain a recognition result;
and a display module 44, configured to display the recognition result.
In an embodiment of the invention, the first model comprises: a feature transformation network; the second model includes: a two-class neural network.
It should be noted that the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, inputting acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data;
s2, acquiring second voice test data output by the first model;
s3, inputting the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, wherein the second model is used for recognizing voice information, and the voice information comprises: the first voice test data and the second voice test data.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, inputting acquired first voice test data into a first model, wherein the first model is used for converting the first voice test data into second voice test data;
s2, acquiring second voice test data output by the first model;
s3, inputting the second voice test data into a second model to instruct the second model to train parameters of the second model according to the second voice test data, wherein the second model is used for recognizing voice information, and the voice information comprises: the first voice test data, the second voice test data.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

CN201910673507.6A (priority 2019-07-24, filed 2019-07-24) Data processing method and device, storage medium and electronic device — Active — CN110400560B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910673507.6A | 2019-07-24 | 2019-07-24 | Data processing method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910673507.6A | 2019-07-24 | 2019-07-24 | Data processing method and device, storage medium and electronic device

Publications (2)

Publication Number | Publication Date
CN110400560A (en) | 2019-11-01
CN110400560B (en) | 2022-10-18 (grant)

Family

ID=68325913

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910673507.6A (Active, CN110400560B (en)) | Data processing method and device, storage medium and electronic device | 2019-07-24 | 2019-07-24

Country Status (1)

Country | Link
CN (1) | CN110400560B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111696551A (*) | 2020-06-05 | 2020-09-22 | 海尔优家智能科技(北京)有限公司 | Device control method, device, storage medium, and electronic apparatus
CN111916105B (*) | 2020-07-15 | 2022-07-15 | 北京声智科技有限公司 | Voice signal processing method, device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7457745B2 (*) | 2002-12-03 | 2008-11-25 | Hrl Laboratories, Llc | Method and apparatus for fast on-line automatic speaker/environment adaptation for speech/speaker recognition in the presence of changing environments
CN104732976A (*) | 2013-12-20 | 2015-06-24 | 上海伯释信息科技有限公司 | Voice recognition method for converting mandarin into dialects
CN109616101A (*) | 2019-02-12 | 2019-04-12 | 百度在线网络技术(北京)有限公司 | Acoustic model training method, apparatus, computer equipment and readable storage medium
CN109859737A (*) | 2019-03-28 | 2019-06-07 | 深圳市升弘创新科技有限公司 | Communication encryption method, system and computer readable storage medium
CN109887497A (*) | 2019-04-12 | 2019-06-14 | 北京百度网讯科技有限公司 | Modeling method, device and equipment for speech recognition
CN109949808A (*) | 2019-03-15 | 2019-06-28 | 上海华镇电子科技有限公司 | Speech recognition home appliance control system and method compatible with Mandarin and dialects

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106023985A (*) | 2016-05-19 | 2016-10-12 | 北京捷通华声科技股份有限公司 | Linguistic model training method and system and speech recognition system
US10460040B2 (*) | 2016-06-27 | 2019-10-29 | Facebook, Inc. | Language model using reverse translations
RU2693324C2 (*) | 2017-11-24 | 2019-07-02 | Общество С Ограниченной Ответственностью "Яндекс" | Method and a server for converting a categorical factor value into its numerical representation
CN109359309B (*) | 2018-12-11 | 2023-02-03 | 成都金山互动娱乐科技有限公司 | Translation method and device, and translation model training method and device


Also Published As

Publication number | Publication date
CN110400560A (en) | 2019-11-01

Similar Documents

Publication | Publication Date | Title
US10777207B2 (en) Method and apparatus for verifying information
CN107909065B (en) Method and device for detecting face occlusion
CN111190939B (en) User portrait construction method and device
CN108830235B (en) Method and apparatus for generating information
US11270099B2 (en) Method and apparatus for generating facial feature
CN109034069B (en) Method and apparatus for generating information
CN109993150B (en) Method and device for identifying age
CN110890088B (en) Voice information feedback method and device, computer equipment and storage medium
CN110046254B (en) Method and apparatus for generating a model
CN110287318B (en) Service operation detection method and device, storage medium and electronic device
CN109582825B (en) Method and apparatus for generating information
CN112836521A (en) Question-answer matching method, device, computer equipment and storage medium
CN110400560B (en) Data processing method and device, storage medium and electronic device
CN109766422A (en) Information processing method, apparatus and system, storage medium, terminal
CN113782022A (en) Communication method, device, equipment and storage medium based on intention recognition model
CN110750637A (en) Text abstract extraction method and device, computer equipment and storage medium
CN111625636B (en) Method, device, equipment and medium for rejecting man-machine conversation
CN114841287A (en) Training method of classification model, image classification method and device
CN119739817A (en) Intelligent dialogue method, device, computer equipment, storage medium and product
CN111435418B (en) Method and device for identifying personalized object of robot, storage medium and robot
CN111259698B (en) Method and device for acquiring image
CN116894192A (en) Large model training method, and related method, device, equipment, system and medium
CN116166858A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence
CN110888971B (en) Multi-round interaction method and device for robot customer service and user
CN115248843A (en) Method and device for assisting in generating record and record generating system

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
