CN113901992B - Training data screening method, system, device and medium - Google Patents

Training data screening method, system, device and medium

Info

Publication number
CN113901992B
CN113901992B
Authority
CN
China
Prior art keywords
decoding
audio data
data
accuracy
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111109966.5A
Other languages
Chinese (zh)
Other versions
CN113901992A (en)
Inventor
袁正鹏
王强强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baige Feichi Technology Co ltd
Original Assignee
Beijing Baige Feichi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baige Feichi Technology Co ltd
Priority to CN202111109966.5A
Publication of CN113901992A
Application granted
Publication of CN113901992B
Status: Active
Anticipated expiration

Abstract

(Translated from Chinese)

The present invention relates to the field of speech recognition processing, and is particularly suited to obtaining training data for the machine learning models used in speech recognition and transcription. Training models for different scenarios requires large volumes of data and extensive manual labeling, which makes data acquisition costly and resource-intensive, yields data of uneven quality and accuracy, and leaves existing pseudo-label screening methods ineffective. To address these drawbacks, the present invention proposes a training data screening method, system, device, and medium, aimed at solving the technical problem of how to screen high-quality training data for speech recognition, search, and transcription models based on the pseudo-label accuracy of semi-supervised learning. To this end, the method of the present invention ranks decoding results by the mean number of node links along the decoding paths produced during decoding, and selects the top-ranked pseudo-labeled speech data as model training data. This improves screening efficiency and data quality while reducing cost and consumption.

Description

Training data screening method, system, device and medium
Technical Field
The invention belongs to the technical field of speech recognition processing, and particularly relates to a method, system, device, and medium for screening training data, especially suited to acquiring training data for the machine learning models used in speech recognition and transcription.
Background
Speech recognition, speech conversion, speech-recognition search, and similar processing are used in intelligent voice interaction, human-machine interaction, and related contexts, and may be implemented with machine learning, for example via various speech/audio recognition models. As speech recognition technology has progressed, models trained on large volumes of data have been able to surpass humans in certain specific scenarios. However, training a speech model requires a large amount of manually labeled data to improve its predictive recognition performance (precision and accuracy); that is, supervised learning is needed to achieve good recognition and interaction, and without sufficient data even a good model architecture cannot learn enough. Manually labeled data is high in quality but expensive, costly to produce, and slow to obtain, so it is preferable to also improve model performance using unlabeled data. Training a model with the unlabeled portion of the available data is called semi-supervised learning: for example, an existing speech model is used directly to recognize audio data, producing a "pseudo label" for that audio; the relatively high-quality pseudo-labeled data is then added to the labeled data, and the model is retrained.
However, whether the resulting pseudo-labeled audio data meets the quality requirements of training data is uncertain. The accuracy of the pseudo-labeled data, i.e., the recognition/transcription accuracy of the audio, therefore has to be assessed by some index; the pseudo-labeled data judged to be more accurate is then screened out, added to the training data, and fed back to the speech recognition model for retraining, ensuring an improvement in the model's recognition efficiency and recognition quality.
Conventional training data screening schemes either filter by sentence-level confidence or compute word-level confidence with MBR (Minimum Bayes Risk).
The former approach is often used in online systems: it measures sentence confidence from the difference in total cost between the best and second-best paths in each sentence's lattice
sentence_confidence=1-exp(-(best_cost-second_best_cost))
The confidence computed this way is coarse and inaccurate and does not reflect the actual quality of sentence recognition. In particular, for long sentences the language model does not score long sequences well, and the criterion breaks down entirely.
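A minimal sketch of this sentence-confidence computation (the function and variable names are illustrative, not from any production system). With costs taken as negative log-probabilities, the best path has the lower cost, so the exponent is the cost gap between the runner-up and the best path:

```python
import math

def sentence_confidence(best_cost: float, second_best_cost: float) -> float:
    """Sentence confidence from the gap between the best and
    second-best lattice paths. Costs are negative log-probabilities,
    so second_best_cost >= best_cost; a zero gap gives confidence 0,
    and a large gap pushes confidence toward 1."""
    return 1.0 - math.exp(-(second_best_cost - best_cost))
```

As the text notes, this collapses for long utterances: the cost gap stops tracking actual recognition quality once the language model scores long sequences poorly.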
The latter scheme decodes with a decoding algorithm, computes word-level confidence from the MBR value of the decoding result, and uses that confidence as the screening condition. The drawback of filtering data by MBR value is that the MBR computation is based on the result of the full decoding pipeline, including the language model, so it is affected by the acoustic-scale and by LM rescoring, and different decoding strategies yield different results; screening by an index such as the MBR value is therefore not stable. For the specifics of MBR-based screening, see Xu H., Povey D., Mangu L., et al., "Minimum Bayes Risk decoding and system combination based on a recursion for edit distance", Computer Speech & Language, 2011, 25(4): 802-828.
The existing approach to screening pseudo-labeled data therefore needs improvement, so that screening is subject to less interference, more stable, and more efficient, determines the accuracy of pseudo-labeled data more precisely and quickly, and efficiently selects valuable high-quality pseudo-labeled data, yielding more and better training data.
Disclosure of Invention
First, the technical problem to be solved
The invention aims to solve the technical problem of how to screen training data based on the pseudo-label accuracy of semi-supervised learning; further, how to effectively utilize/screen existing data for semi-supervised training in a K12-scenario AI teaching system to improve the recognition quality of an intelligent speech model; and further, how to screen high-quality training data for speech models in different scenarios.
(II) technical scheme
To solve the above technical problems, a first aspect of the present invention provides a training data screening method. The method takes, over all decoding paths produced during speech recognition, the mean number of nodes that each node of each decoding path may link to as the index for judging accuracy; labels the audio data corresponding to high-accuracy decoding results with a pseudo tag; and uses the pseudo-tagged audio data as the selected training data.
According to a preferred embodiment of the invention, the speech recognition over all decoding paths specifically comprises: performing speech recognition on the input unlabeled audio data with a speech recognition model from the Kaldi toolkit, carrying out beam-search decoding with a preset beam width (beam size) during recognition, and storing all candidate decoding paths searched during decoding.
According to a preferred embodiment of the invention, storing all candidate decoding paths searched during decoding specifically comprises: searching decoding paths in the WFST state network during decoding to obtain a decoding graph, storing all decoding paths searched in the decoding graph in a lattice, and outputting the optimal decoding path as the decoding result corresponding to the audio data.
According to a preferred embodiment of the invention, taking the mean number of nodes that each node of each decoding path may link to, over all decoding paths, as the index for judging accuracy specifically comprises: obtaining, from all decoding paths stored in the lattice, the number of paths each node can link to in each decoding path, computing the mean, using the mean as the accuracy index LATTICE DEPTH, and sorting and comparing the LATTICE DEPTH indexes obtained when the multiple audio data in the dataset are decoded by speech recognition, in order to judge accuracy.
According to a preferred embodiment of the invention, sorting and comparing the LATTICE DEPTH indexes obtained when the multiple audio data in the dataset are decoded specifically comprises: sorting the LATTICE DEPTH indexes corresponding to the respective audio data and comparing them with a preset threshold; if an index LATTICE DEPTH is below the preset threshold, the decoding result of the corresponding audio data is more accurate.
According to a preferred embodiment of the invention, labeling the audio data corresponding to a high-accuracy decoding result with a pseudo tag specifically comprises: obtaining the unlabeled audio data corresponding to the high-accuracy decoding result, and labeling that audio data with the pseudo tag according to the text label in the decoding result output by speech recognition.
According to a preferred embodiment of the invention, using the pseudo-tagged audio data as the selected training data specifically comprises: screening out the unlabeled audio data corresponding to all high-accuracy decoding results, labeling it with pseudo tags, adding the pseudo-tagged audio data to the speech recognition model's training set, and retraining the speech recognition model.
According to a preferred embodiment of the invention, the method further comprises: based on a semi-supervised learning mode, a speech recognition model trained with a small amount of manually labeled data, or a speech recognition model initialized with preset text labels, directly takes multiple unlabeled audio data from the dataset as input and outputs the corresponding decoding results; when one or more high-accuracy decoding results are screened out, the unlabeled audio data corresponding to those decoding results are labeled directly with the text labels in the decoding results.
To solve the above technical problems, a second aspect of the present invention provides a training data screening system comprising an index setting module, a data screening module, and a training data selection module. The index setting module takes the mean number of nodes that each node in a decoding path may link to during speech recognition as the index for judging accuracy; the data screening module labels the audio data corresponding to high-accuracy decoding results with pseudo tags; and the training data selection module uses the pseudo-tagged audio data as the selected training data.
To solve the above technical problem, a third aspect of the present invention proposes an electronic device comprising a processor and a memory for storing a computer executable program, which when executed by the processor performs the method according to any of the first aspects.
To solve the above technical problem, a fourth aspect of the present invention proposes a computer-readable medium storing a computer-executable program that, when executed, implements the method according to any one of the first aspects.
To solve the above technical problem, a fifth aspect of the present invention proposes a computer executable program that, when executed, implements the method according to any one of the first aspects.
(III) beneficial effects
Aiming at the technical problem of how to screen high-quality training data for models such as speech recognition, search, and transcription based on the pseudo-label accuracy of semi-supervised learning, and further at how to use (screen) existing transcription data for semi-supervised training when labeled training data is insufficient in the intelligent speech recognition of a K12-scenario AI teaching system, a new screening index is adopted: decoding results are ranked by the mean link count of the nodes along their decoding paths to judge pseudo-label accuracy, and the top-ranked pseudo-labeled speech data is taken as model training data. This improves screening efficiency and data quality, reduces cost and consumption, and, since model training proceeds from high-quality training data, improves training efficiency and recognition quality.
Furthermore, the index LATTICE DEPTH is determined from the nodes of the decoding graph obtained while searching decoding paths in the state network of a speech recognition system such as the Kaldi toolkit. It can serve as a screening index for reflowed data with essentially no added compute, so high-quality unlabeled data (such as unlabeled audio) can be screened out efficiently and cheaply, and the model retrained to improve its performance. The method introduces no extra parameters, classes, or other processing, and offers simplicity, ease of use, high audio quality, and low noise.
Furthermore, the index-based screening method adopted by embodiments of the invention can be used in any speech recognition scenario: any speech recognition decoder that builds its decoding graph with an FST can produce such a graph, so this is a fairly general screening scheme.
Drawings
FIG. 1 is a main flow diagram of one embodiment of a method of screening training data according to the present invention;
FIG. 2 is a block diagram of the main structure of one embodiment of a screening system for training data in accordance with the present invention;
FIG. 3 is a block diagram of the primary structure of one embodiment of an electronic device in accordance with the present invention;
FIG. 4 is a schematic diagram of the main structure of one embodiment of a more specific electronic device according to the present invention;
fig. 5 is a main structural diagram of one embodiment of a computer-readable medium according to the present invention.
Detailed Description
In describing particular embodiments, specific details of construction, performance, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by those skilled in the art. It is not excluded that one skilled in the art may implement the present invention in a particular case in a solution that does not include the structures, properties, effects, or other characteristics described above.
The flow diagrams in the figures are merely exemplary flow illustrations and do not represent that all of the elements, operations, and steps in the flow diagrams must be included in the aspects of the invention, nor that the steps must be performed in the order shown in the figures. For example, some operations/steps in the flowcharts may be decomposed, some operations/steps may be combined or partially combined, etc., and the order of execution shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different network and/or processing unit means and/or microcontroller means.
The same reference numerals in the drawings denote the same or similar elements, components or portions, and thus repeated descriptions of the same or similar elements, components or portions may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various devices, elements, components or portions, these devices, elements, components or portions should not be limited by these terms. That is, these phrases are merely intended to distinguish one from the other. For example, a first device may also be referred to as a second device without departing from the spirit of the invention. Furthermore, the term "and/or" is meant to include all combinations of any one or more of the items listed.
Some technical terms that may be involved in the related content of the present invention are described below:
ASR, Automatic Speech Recognition: an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize and translate spoken language into text.
Kaldi: an open-source speech recognition toolkit originating from the 2009 summer workshop at Johns Hopkins University; Kaldi is one of the most popular speech technology toolkits of recent years.
FA, Finite Automata: composed of a finite set of states and transitions between states, each transition carrying at least one label. The most basic FA is the finite state acceptor (Finite State Acceptor, FSA). For a given input sequence, an FSA returns either an "accept" or a "reject" state.
FST, Finite State Transducer: an extension of the FSA in which each state transition carries an output label in addition to its input label, forming an input-output label pair. Through these pairs, an FST can describe a regular mapping from one set of symbol sequences to another.
Lattice: a form of weighted finite state transducer (WFST) whose input and output labels can be those of any FST (usually transition ids and words), and whose weights combine acoustic-model, language-model, and transition weights. During decoding, the WFST state network formed from the FST-based decoding graph is searched for the paths (e.g., sentences or words) that best match the sound.
CER, Character Error Rate: the edit distance between the recognition result of the audio and its reference label, divided by the number of characters in the reference label and expressed as a percentage.
Acoustic scale: the scaling factor applied to the log-probability score of the acoustic model. In Kaldi speech recognition, audio may be recognized jointly by an acoustic model and a language model, so that the recognized output better conforms to the habits and logic of human speech. The scaling factor scales the probabilistic output of the acoustic model in order to balance the acoustic and language models.
LM rescore: language model re-scoring. After a speech recognition model is obtained, if one wants it to recognize data in a particular domain better, or to combine the strengths of different language models (such as n-gram + RNNLM), the recognition result is often scored a second time using data from different domains or language models trained with different structures, and the second-pass score is taken as the final result.
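The CER defined above can be sketched in a few lines of self-contained Python (a character-level Levenshtein distance; the function names are illustrative):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance: minimum number of
    substitutions, insertions, and deletions turning ref into hyp."""
    dp = list(range(len(hyp) + 1))          # distances for the empty ref prefix
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i              # dp[0]: hyp prefix is empty
        for j, h in enumerate(hyp, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # delete r
                        dp[j - 1] + 1,      # insert h
                        prev + (r != h))    # substitute (free if chars match)
            prev = cur
    return dp[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance as a percentage of
    the reference-label length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

For example, a four-character reference with one substituted character gives a CER of 25%.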
The method mainly achieves training data acquisition or screening through a pseudo-label accuracy screening scheme based on semi-supervised learning. The flow is: S1, take the mean number of nodes that each node of each decoding path may link to, over all decoding paths in speech recognition decoding, as the index for judging accuracy; S2, label the audio data corresponding to high-accuracy decoding results with pseudo tags, and use the pseudo-tagged audio data as the selected training data.
[ Example 1]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the implementation of the method of the present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings.
The main flow of one embodiment of the method of the present invention, shown in fig. 1, is described here. The method is based on semi-supervised learning: a speech recognition model trained with a small amount of manually labeled data, or initialized with preset text labels, directly takes multiple unlabeled audio data from a dataset as input and outputs the corresponding decoding results, so that when one or more high-accuracy decoding results are screened out, the corresponding unlabeled audio data are labeled directly with the text labels in those results. The main flow comprises the following steps:
Step S110, perform speech recognition on the audio data with the speech recognition model trained by semi-supervised learning.
In one embodiment, the pseudo-label accuracy of the training data is used in model training for speech recognition in a semi-supervised learning mode. In particular, speech recognition models are constructed, including acoustic models and language models, such as the speech recognition tools implemented in the Kaldi toolkit.
Further, speech recognition models in different application scenarios often require training data from the specific domain to achieve the best results. For example, in bullet-screen speech recognition and quality-inspection speech transcription tasks, the models need targeted enhancement training with data reflowed from each scenario to meet its demands. In one embodiment, semi-supervised training of the speech recognition model may proceed as follows: after the model is trained on a very small amount of audio with manually labeled text, it outputs the corresponding text labels together with its decoding/recognition results, so the text labels in the screened high-accuracy results can be attached directly to the corresponding audio as reflowed training data for retraining. Alternatively, text labels can be preset for the model, for example according to the characteristics of scenarios in actual applications such as bullet-screen speech recognition, quality-inspection speech transcription, and spoken search-query recognition; unlabeled audio is fed directly into the model, its accuracy is evaluated, the data is screened, and the predicted text labels in the decoding results are used for labeling.
Therefore, when the labeled data available to the speech recognition model is insufficient in the intelligent speech recognition of the AI teaching system under the K12 scenario, the existing transcription data (audio data with corresponding text labels) can be effectively utilized (screened) for semi-supervised training, yielding training data to train the model and improving its recognition performance, quality, and effect. To avoid the influence of noise, the existing transcription data must also meet certain quality requirements before it can be used to expand the training data: high-quality data must be screened out, ensuring that the transcription accuracy of the selected data is high enough and its noise small enough that training on it genuinely helps model performance.
Step S120, take the mean number of nodes that each node of each decoding path may link to, over all decoding paths during speech recognition, as the index for judging accuracy.
In one embodiment, a speech recognition model from the Kaldi toolkit performs speech recognition on the input unlabeled audio data, carries out beam-search decoding with a preset beam width (beam size) during recognition, and stores all candidate decoding paths searched during decoding.
Further, storing all candidate decoding paths searched during decoding specifically comprises: searching decoding paths in the WFST state network during decoding to obtain a decoding graph, storing all decoding paths searched in the decoding graph in a lattice, and outputting the optimal decoding path as the decoding result corresponding to the audio data.
Further, taking the mean number of nodes that each node of each decoding path may link to, over all decoding paths, as the index for judging accuracy specifically comprises: obtaining, from all decoding paths stored in the lattice, the number of paths each node can link to in each decoding path, computing the mean, using the mean as the accuracy index LATTICE DEPTH, and sorting and comparing the LATTICE DEPTH indexes obtained when the multiple audio data in the dataset are decoded by speech recognition, in order to judge accuracy.
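A minimal sketch of the LATTICE DEPTH computation under a simplifying assumption: the stored lattice is represented as a plain adjacency list mapping each node id to the nodes it links to (a hypothetical stand-in for a real Kaldi lattice, not the toolkit's API):

```python
def lattice_depth(links: dict[int, list[int]]) -> float:
    """Mean number of linked nodes per node, over every node of the
    stored decoding paths. A low value means few competing paths,
    i.e. a less ambiguous (likely more accurate) decode."""
    if not links:
        return 0.0
    return sum(len(succ) for succ in links.values()) / len(links)

# A nearly linear lattice (one dominant path) vs. an ambiguous one.
linear = {0: [1], 1: [2], 2: [3], 3: []}
branchy = {0: [1, 2], 1: [3, 4], 2: [3, 4], 3: [5], 4: [5], 5: []}
```

Here `lattice_depth(linear)` is 0.75 while `lattice_depth(branchy)` is higher, matching the intuition that ambiguous decodes expand more paths per node.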
Further, sorting and comparing the LATTICE DEPTH indexes obtained when the multiple audio data in the dataset are decoded specifically comprises: sorting the LATTICE DEPTH indexes corresponding to the respective audio data and comparing them with a preset threshold; if an index LATTICE DEPTH is below the preset threshold, the decoding result of the corresponding audio data is more accurate.
Specifically, a speech recognition system implemented with the Kaldi toolkit, e.g., an HMM model, constructs a WFST state network (a state network based on FSTs and lattices) and decodes by searching that network for the path that best matches the sound to be recognized. For example, a beam search (for a globally optimal path) can be run with a given beam size during decoding, producing a decoding graph that contains many possible decoding paths, from which the optimal, best-matching path is taken as the decoding result. All candidate decoding paths can be stored in a lattice, which amounts to storing the decoding graph. The optimal path is the decoding result, i.e., the model's recognition of the audio, which can then be output.
The mean number of nodes that each node in the decoding path may link to is taken as the ranking or judging index LATTICE DEPTH for evaluating pseudo-label accuracy. Computing this mean only averages the number of possibly linked nodes, i.e., the number of paths that may unfold; the algorithm is simple, efficient, and light on computing resources, and the quantity is obtained directly during decoding without introducing any interference or uncertainty.
When recognizing and training with the semi-supervised speech recognition model, a dataset of multiple unlabeled audio inputs is fed to the model for decoding, yielding the decoding results and the corresponding LATTICE DEPTH indexes. The indexes are ranked from high to low or from low to high, preferably low to high, and a suitable threshold is preset, e.g., a threshold of 5 in a bullet-screen speech recognition scenario. All audio whose LATTICE DEPTH value is below the threshold 5 is then reflowed; this is high-quality audio, and the decoding results whose LATTICE DEPTH index is below the threshold 5 are the high-accuracy results.
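The rank-and-threshold step can be sketched as follows (the utterance ids and the default threshold of 5 follow the bullet-screen example above; the function name is illustrative):

```python
def select_reflow_utts(depths: dict[str, float],
                       threshold: float = 5.0) -> list[str]:
    """Keep utterance ids whose LATTICE DEPTH is below the threshold,
    ranked from lowest (least ambiguous decode) upward; these are the
    high-accuracy decodes whose audio flows back into training."""
    kept = [utt for utt, depth in depths.items() if depth < threshold]
    return sorted(kept, key=lambda utt: depths[utt])

depths = {"utt1": 2.1, "utt2": 7.8, "utt3": 4.9, "utt4": 1.3}
```

With these (made-up) depths, `utt2` is dropped and the rest are returned from lowest depth upward.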
Thus, using LATTICE DEPTH as the accuracy ranking index for audio decoding results requires no additional operations or parameter handling: no acoustic-scale scaling factor, no LM rescoring, no confidence score computation, no coarse sentence-level confidence, and no interference from different decoding strategies. Unaffected by the acoustic-scale and LM rescore, it is more stable than indexes such as MBR, so accurately decoded audio can be picked out of the full set more reliably. Moreover, in speech recognition, audio with little noise that is easy to recognize does not tend to produce many decoding paths during decoding; a small LATTICE DEPTH for a decoded recording therefore indicates high audio quality, i.e., recognizing audio A as language A is very accurate, the noise is small, and the model is unlikely to be negatively affected. In practice, the index LATTICE DEPTH discriminates recognized sentences very well. Further, obtaining the index consumes very little computing power, effectively avoiding redundant resource consumption without adding to the decoding burden.
Step S130: label the audio data corresponding to the high-accuracy decoding results with pseudo tags, and take the pseudo-tagged audio data as the selected training data.
In one embodiment, the unlabeled audio data corresponding to a high-accuracy decoding result is obtained, and the audio data is pseudo-tagged with the text label contained in the decoding result output by speech recognition.
In one embodiment, the pseudo-tagged audio data is used as the selected training data. For example, the unlabeled audio data corresponding to all high-accuracy decoding results is screened out, pseudo-tagged, added to the training set of the speech recognition model, and the speech recognition model is retrained.
Specifically, after the audio is decoded by the speech recognition model, a decoding result is output, for example the recognized text data. The text label/text data of a decoding result judged accurate directly by the index LATTICE DEPTH is taken as the pseudo label of the corresponding audio; that is, the model's recognition result becomes the pseudo label of the high-quality audio. The reflowed audio data (i.e. the audio marked with pseudo labels) is then added to the labeled data as training data in the training set, and the model is retrained. In this way, LATTICE DEPTH-based screening of ASR semi-supervised learning reflow corpora selects suitable training data and continuously supplies large amounts of data for improving the recognition performance, effect and quality of the model.
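The reflow step can be sketched as follows; `decoded`, the pair layout and the example texts are hypothetical stand-ins for the recognizer's output, not an API from the patent:

```python
def reflow_pseudo_labels(decoded, selected):
    """Attach the decoder's text hypothesis to each selected utterance as a
    pseudo label, producing (audio_id, label) pairs ready for the training set."""
    return [(utt, decoded[utt]) for utt in selected]

labeled_set = [("utt0", "manually labeled text")]      # small seed set
decoded = {"utt1": "good morning", "utt2": "see you"}  # model hypotheses
labeled_set += reflow_pseudo_labels(decoded, ["utt2"]) # only high-quality audio
# labeled_set now also contains ("utt2", "see you"); the model is retrained on it
```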
The pseudo label is thus obtained directly and the scheme is simple to implement, while the screened training data still guarantees sufficiently high transcription accuracy and low noise. In addition, the data selected by LATTICE DEPTH can also be used to enhance model training in scenarios such as speech synthesis and noise elimination. More broadly, the index LATTICE DEPTH indicates how ambiguous pattern recognition is, i.e. how cleanly the data can be classified, and in other artificial intelligence fields (such as image classification in computer vision) similar indices can be designed along the same lines to screen high-quality data for training and enhancing existing models, so the index may find wider application in the future.
[ Example 2]
For the purposes of promoting an understanding of the principles and technical aspects of the invention, reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings.
The main structural block diagram of one embodiment of the system of the present invention is described here with reference to fig. 2. The system is likewise based mainly on a semi-supervised learning mode: with a speech recognition model trained on a small number of manually labeled tags, or a speech recognition model initialized with text labels, a plurality of unlabeled audio data in a data set is input directly into the model for speech recognition and a plurality of corresponding decoding results is output; when one or more high-accuracy decoding results are screened out, the unlabeled audio data corresponding to those decoding results is labeled directly with the text labels in the decoding results. This embodiment mainly includes:
The training recognition module B210 trains a model of speech recognition based on semi-supervised learning, and performs speech recognition on the audio data.
In one embodiment, the pseudo-tag accuracy of the training data is used in semi-supervised model training for speech recognition. In particular, a speech recognition model is constructed, including an acoustic model and a language model, for example a speech recognition tool implemented with the Kaldi toolkit.
Further, speech recognition models in different application scenarios often require training data from the specific field to achieve the best results. For example, in bullet-screen speech recognition and quality-inspection speech transcription tasks, the models must be enhanced with data reflowed from the respective scenario to adapt to the requirements of each scene. In one embodiment of training the speech recognition model with semi-supervised learning, a very small amount of audio data with manually labeled text labels is first used to train the speech recognition model; the model then outputs the corresponding text label with each decoding/recognition result, so the text label in the result can be attached directly to the audio data of a screened high-accuracy decoding result, which is reflowed as training data to retrain the model. Alternatively, text labels can be set for the speech recognition model in advance, for example according to the characteristics of different practical application scenarios such as speech bullet-screen recognition, quality-inspection speech transcription and spoken search-question recognition; the unlabeled audio data is input directly into the model, its accuracy is evaluated, the data is screened, and the corresponding text labels predicted in the decoding results are used for labeling.
Thus, when the labeling data used by the speech recognition model is insufficient, as in intelligent speech recognition for an AI teaching system in a K12 scenario, the existing transcription data (audio data with corresponding text labels) can be screened effectively for semi-supervised training, yielding training data that improves the recognition performance, effect and quality of the model. To avoid the influence of noise, the existing transcription data must also meet certain quality requirements before it can be used to expand the training data: high-quality data must be screened out so that the transcription accuracy of the screened data is high enough and its noise small enough, and training with this data genuinely helps improve the performance and effect of the model.
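One reflow round of the workflow described above could be sketched as below; `decode` and `train` are caller-supplied stand-ins for the real Kaldi decoding and training pipeline, not functions defined in the patent:

```python
def semi_supervised_round(decode, train, labeled, unlabeled, threshold):
    """decode(utt) -> (hypothesis_text, lattice_depth); train(pairs) retrains
    the model on the enlarged labeled set and returns it."""
    kept = []
    for utt in unlabeled:
        text, depth = decode(utt)
        if depth < threshold:          # small depth => confident decoding
            kept.append((utt, text))   # pseudo-label with the decoded text
    return train(labeled + kept)
```

Repeating such rounds as new unlabeled audio reflows from production is what keeps supplying the model with fresh training data.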
The index module B220 uses, as the index for judging accuracy, the mean number of nodes that each node of each decoding path is expected to link to, taken over all decoding paths stored during decoding in speech recognition.
In one embodiment, the speech recognition model of the Kaldi toolkit performs speech recognition on the input unlabeled audio data, carries out beam search decoding according to a preset beam width (beam size), and stores all decoding paths found to be possible during decoding.

Further, storing all possible decoding paths found during decoding specifically includes: searching decoding paths in the WFST state network during decoding to obtain a decoding graph, storing all decoding paths of the decoding graph in a lattice, and outputting the optimal decoding path as the decoding result corresponding to the audio data.

Further, using the mean number of nodes each node of each decoding path is expected to link to, over all decoding paths, as the index for judging accuracy specifically includes: obtaining, from all decoding paths stored in the lattice, the number of paths each node can link to backward in each decoding path; calculating the mean value and using it as the index LATTICE DEPTH for judging accuracy; and judging accuracy by sorting and comparing the plurality of indices LATTICE DEPTH obtained when the plurality of audio data in the data set is decoded by speech recognition.

Further, judging accuracy by sorting the indices LATTICE DEPTH corresponding to the respective audio data and comparing them with a preset threshold specifically includes: if an index LATTICE DEPTH ranks below the preset threshold, the accuracy of the decoding result of the corresponding audio data is higher.
Specifically, a speech recognition tool implemented with the Kaldi toolkit, such as an HMM model, constructs a WFST state network, i.e. a state network based on FSTs and lattices. Decoding searches this state network for the path that best matches the sound to be recognized; for example, a beam search (a search for the globally optimal path) can be performed with a given beam width during decoding, producing a decoding graph with many possible decoding paths, from which the optimal, best-matching path is taken as the decoding result. All possible decoding paths can be stored in a lattice at this point, which is equivalent to storing the decoding graph. The optimal path is the decoding result, and the model's decoding result, i.e. the recognition result of the audio, can then be output.

The mean number of nodes each node in the decoding paths may link to is taken as the sorting or judging index LATTICE DEPTH for evaluating pseudo-tag accuracy. Computing the mean only averages the number of possibly linked nodes or possibly expanded paths; the algorithm is simple, efficient and light on computational resources, and the value is obtained directly during decoding without any processing that would introduce interference or uncertainty.
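Treating the stored lattice as a successor mapping (an illustrative layout, not Kaldi's actual lattice format), the index could be computed as:

```python
def lattice_depth(successors):
    """Mean number of nodes each lattice node can link to, over all stored
    decoding paths. `successors` maps node -> list of reachable next nodes."""
    counts = [len(nexts) for nexts in successors.values()]
    return sum(counts) / len(counts)

# A tiny lattice with a single branch point (A splits to B and C):
print(lattice_depth({"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}))  # 1.0
```

Note the computation is a single pass over the stored arcs, consistent with the text's claim that the index adds almost no decoding overhead.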
When performing recognition and training based on semi-supervised learning of a speech recognition model, a data set of unlabeled audio is fed into the model for decoding, yielding a decoding result and an index LATTICE DEPTH for each audio input. The indices LATTICE DEPTH are ranked from high to low or from low to high, preferably low to high, and a suitable threshold is preset, for example a threshold of 5 in a bullet-screen speech recognition scenario. All audio whose LATTICE DEPTH value is below the threshold of 5 is then recovered; this audio is high-quality audio, and the decoding results whose index LATTICE DEPTH is below the threshold of 5 are the high-accuracy decoding results.
Using LATTICE DEPTH as the accuracy-ranking index of an audio decoding result therefore requires no additional operation steps or parameter processing: no acoustic-scale factor, no LM rescoring, no (coarse, sentence-level) confidence computation, and no interference from particular decoding strategies. The index is unaffected by the acoustic scale and LM rescoring, and it is more stable than indices such as MBR, so the accurately decoded portion of the audio can be picked out of the whole set more reliably. Moreover, in speech recognition, audio with little noise that is easy to recognize tends not to spawn many decoding paths during decoding, so a small LATTICE DEPTH for the decoded audio indicates high-quality audio: audio A is recognized as utterance A very accurately with little noise, and such data is unlikely to affect the model negatively. When the index LATTICE DEPTH is inspected in practice, it discriminates well between recognized sentences. Finally, computing LATTICE DEPTH consumes very little computational power, so it effectively avoids redundant resource consumption and adds no extra burden to the computer's decoding.
The screening data module B230 labels the audio data corresponding to the high-accuracy decoding results with pseudo tags and takes the pseudo-tagged audio data as the selected training data.
In one embodiment, the unlabeled audio data corresponding to a high-accuracy decoding result is obtained, and the audio data is pseudo-tagged with the text label contained in the decoding result output by speech recognition.

In one embodiment, the pseudo-tagged audio data is used as the selected training data. For example, the unlabeled audio data corresponding to all high-accuracy decoding results is screened out, pseudo-tagged, added to the training set of the speech recognition model, and the speech recognition model is retrained.
Specifically, after the audio is decoded by the speech recognition model, a decoding result is output, for example the recognized text data. The text label/text data of a decoding result judged accurate directly by the index LATTICE DEPTH is taken as the pseudo label of the corresponding audio; that is, the model's recognition result becomes the pseudo label of the high-quality audio. The reflowed audio data (i.e. the audio marked with pseudo labels) is then added to the labeled data as training data in the training set, and the model is retrained. In this way, LATTICE DEPTH-based screening of ASR semi-supervised learning reflow corpora selects suitable training data and continuously supplies large amounts of data for improving the recognition performance, effect and quality of the model.

The pseudo label is thus obtained directly and the scheme is simple to implement, while the screened training data still guarantees sufficiently high transcription accuracy and low noise. In addition, the data selected by LATTICE DEPTH can also be used to enhance model training in scenarios such as speech synthesis and noise elimination. More broadly, the index LATTICE DEPTH indicates how ambiguous pattern recognition is, i.e. how cleanly the data can be classified, and in other artificial intelligence fields (such as image classification in computer vision) similar indices can be designed along the same lines to screen high-quality data for training and enhancing existing models, so the index may find wider application in the future.
[ Example 3]
The implementation of the present invention is further described below with reference to embodiments 1 and 2 in an overall application scenario:

Speech recognition is implemented with the Kaldi toolkit. The speech recognition model configured in Kaldi is trained based on semi-supervised learning. Training can proceed directly on unlabeled audio, combined with autonomous labeling, screening of the training data (audio) and retraining, so that labeled training data is obtained in a simple, feasible way with high efficiency, low cost and low resource consumption, and the approach extends to models in a variety of scenarios to improve model performance and enhance the recognition effect and quality of the model.
The speech recognition model of the Kaldi toolkit can use an acoustic model such as an HMM together with a language model to recognize input audio: the input audio is split into frames, states and state combinations are recognized, phonemes are formed, and phonemes are composed into words. Kaldi's HMM supports FSTs and lattices, so when decoding the recognized states with the HMM, a WFST state network is built from FSTs and, with a given beam width (beam size), a beam search finds the best path in the WFST, i.e. the decoding path that best matches the sound. During the search, all possible decoding paths are stored in a lattice (see the explanation of the lattice/FST relationship above). The number of nodes each node in the decoding paths may link to, or the number of paths that may be expanded, is obtained and its mean is calculated as the accuracy index LATTICE DEPTH (lattice depth) of the decoding result. For example, node A links 1 node in path 1, node B links 2 nodes and node C links 1 node; node A links 1 node in path 2, node C links 2 nodes and node D links 1 node; the mean is 8/4 = 2, i.e. the index value is 2 (this is a conceptual illustration only, not an actual calculation).
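The conceptual example above can be checked numerically; the per-path link counts below are copied from the text, and the totals are pooled over the four distinct nodes A, B, C, D:

```python
from collections import defaultdict

paths = [{"A": 1, "B": 2, "C": 1},   # path 1: A links 1 node, B links 2, C links 1
         {"A": 1, "C": 2, "D": 1}]   # path 2: A links 1 node, C links 2, D links 1

totals = defaultdict(int)            # total links per distinct node
for path in paths:
    for node, n_links in path.items():
        totals[node] += n_links

depth = sum(totals.values()) / len(totals)  # 8 links over 4 distinct nodes
print(depth)  # 2.0
```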
The smaller the value of the index, the higher the accuracy and the higher the quality of the corresponding audio as training data; that is, the recognition accuracy of the audio better meets the actual requirements of the current application scenario.
The LATTICE DEPTH indices of the decoding results of the input unlabeled audios are ranked from small to large and compared with a preset threshold, for example a threshold of 5 for bullet-screen speech recognition. The threshold setting likewise follows the requirements of the practical application field, so the same index can be used in many different scenarios that need speech recognition.
Suppose the LATTICE DEPTH index values for the three audio inputs A, B and C are 6, 5.5 and 4 respectively; among 4, 5.5 and 6, only the index of audio C is below the threshold of 5. The text label in the decoding result of C can therefore be assigned directly to C as its label: C is screened out, the labeled data is added to the training data set, and when the semi-supervised model (i.e. the speech recognition model configured in Kaldi) is trained, training continues with the new training data C.
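The A/B/C walkthrough reduces to a few lines (the index values and threshold are the ones from the example):

```python
depths = {"A": 6.0, "B": 5.5, "C": 4.0}   # LATTICE DEPTH values from the example
threshold = 5.0                            # bullet-screen scenario threshold
selected = [utt for utt, d in sorted(depths.items(), key=lambda kv: kv[1])
            if d < threshold]
print(selected)  # ['C'] -- only audio C is reflowed into the training set
```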
[ Example 4]
Fig. 3 is a structural block diagram of an electronic device according to an embodiment of the invention, comprising a processor and a memory for storing a computer executable program; when the computer program is executed by the processor, the processor performs the steps of the method described in the preceding embodiments 1 and 3.
As shown in fig. 3, the electronic apparatus is in the form of a general purpose computing device. The processor may be one or a plurality of processors and work cooperatively. The invention does not exclude that the distributed processing is performed, i.e. the processor may be distributed among different physical devices. The electronic device of the present invention is not limited to a single entity, and may be a sum of a plurality of entity devices.
The memory stores a computer executable program, typically machine readable code. The computer readable program may be executable by the processor to enable an electronic device to perform the method of the present invention, or at least some of the steps of the method.
The memory includes volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may be non-volatile memory, such as Read Only Memory (ROM).
Optionally, in this embodiment, the electronic device further includes an I/O interface, which is used for exchanging data between the electronic device and an external device. The I/O interface may be a bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
More specifically, referring to a block diagram of a more specific example of the electronic apparatus according to the embodiment shown in fig. 4. The electronic apparatus 200 of this exemplary embodiment is in the form of a general-purpose data processing device. The components of the electronic device 200 may include, but are not limited to, at least one processing unit 210, at least one memory unit 220, a bus 230 connecting the different system components (including the memory unit 220 and the processing unit 210), a display unit 240, and the like.
The storage unit 220 stores therein a computer readable program, which may be a source program or code of a program that is read only. The program may be executed by the processing unit 210 such that the processing unit 210 performs the steps of various embodiments of the present invention. For example, the processing unit 210 may perform the respective steps of the methods of the foregoing embodiments 2 to 5.
The memory unit 220 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 2201 and/or cache memory 2202, and may further include Read Only Memory (ROM) 2203. The storage unit 220 may also include a program/utility 2204 having a set (at least one) of program modules 2205, such program modules 2205 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 230 may be a bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic apparatus 200 may also be in communication with one or more external devices 300 (e.g., a keyboard, a display, a network device, a bluetooth device, etc.), such that a user can interact with the electronic apparatus 200 via the external devices 300, and/or such that the electronic apparatus 200 can communicate with one or more other data processing devices (e.g., a router, a modem, etc.). Such communication may occur through an input/output (I/O) interface 250, and may also occur through a network adapter 260 to one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet. Network adapter 260 may communicate with other modules of electronic device 200 via bus 230. It should be appreciated that although not shown, other hardware and/or software modules may be used in the electronic apparatus 200, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be understood that the electronic device shown in fig. 3 and 4 is only one example of the present invention, and the electronic device of the present invention may further include elements or components not shown in the above examples. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a man-machine interaction element such as a button, a keyboard, and the like. The electronic device may be considered as covered by the invention as long as the electronic device is capable of executing a computer readable program in a memory for carrying out the method or at least part of the steps of the method.
[ Example 5]
Fig. 5 is a schematic diagram of a computer-readable recording medium of an embodiment of the present invention. As shown in fig. 5, the computer-readable recording medium stores a computer-executable program that, when executed, implements the training data screening method described above. The computer readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable storage medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
From the above description of embodiments, those skilled in the art will readily appreciate that the present invention may be implemented by hardware capable of executing a specific computer program, such as the system of the present invention, as well as the electronic processing units, servers, clients, handsets, control units, processors, etc. included in the system, or by a device comprising at least a portion of the above system or components. The invention may also be implemented by computer software executing the method of the invention, for example by control software executed by a microprocessor, an electronic control unit, a client, a server, etc. It should be noted that the computer software for performing the method of the present invention is not limited to execution by one specific hardware entity but may be implemented in a distributed manner by unspecified hardware; for example, some method steps executed by the computer program may run on a server while another part runs on a mobile terminal or another device. For computer software, the software product may be stored on a computer readable storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or distributed over a network, as long as it enables the electronic device to perform the method according to the invention.
From the above description of embodiments, those skilled in the art will readily appreciate that the exemplary embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware.
The foregoing description of the specific embodiments provides further details of the objects, aspects and advantages of the present invention, and it should be understood that the present invention is not inherently related to any particular computer, virtual device or electronic device, and that various general purpose devices may also implement the present invention. The foregoing description of the embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

The method comprises: obtaining, from all decoding paths stored in a lattice, the number of paths each node in each decoding path can link to backward; calculating the mean value and using it as the index LATTICE DEPTH for judging accuracy; and judging accuracy by sorting and comparing a plurality of indices LATTICE DEPTH obtained when a plurality of audio data in a data set is decoded by speech recognition, wherein sorting the indices LATTICE DEPTH corresponding to the respective audio data and comparing them with a preset threshold specifically comprises: if an index LATTICE DEPTH ranks below the preset threshold, the accuracy of the decoding result of the corresponding audio data is higher;
CN202111109966.5A | 2021-09-17 | 2021-09-17 | Training data screening method, system, device and medium | Active | CN113901992B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111109966.5A | CN113901992B (en) | 2021-09-17 | 2021-09-17 | Training data screening method, system, device and medium


Publications (2)

Publication Number | Publication Date
CN113901992A (en) | 2022-01-07
CN113901992B (en) | 2025-02-25

Family

ID=79028780

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111109966.5A | Active | CN113901992B (en) | 2021-09-17 | 2021-09-17 | Training data screening method, system, device and medium

Country Status (1)

Country | Link
CN (1) | CN113901992B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114399995B (en)* | 2022-01-20 | 2025-07-15 | 腾讯科技(深圳)有限公司 | Speech model training method, device, equipment and computer-readable storage medium
CN116705001B (en)* | 2023-05-04 | 2024-06-14 | 内蒙古工业大学 | Mongolian voice data selection method and system
CN118447828B (en)* | 2024-07-08 | 2024-09-20 | 上海弋途科技有限公司 | Vehicle-mounted human-computer interaction model optimization method and system based on voice data reflow
CN118711573B (en)* | 2024-07-19 | 2025-08-29 | 摩尔线程智能科技(北京)股份有限公司 | Speech recognition model training method, speech recognition method, device and storage medium
CN119993196B (en)* | 2025-02-11 | 2025-07-04 | 北京云上曲率科技有限公司 | Voice training data acquisition method, device, equipment and medium

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN108831439A (en)* | 2018-06-27 | 2018-11-16 | 广州视源电子科技股份有限公司 | Voice recognition method, device, equipment and system
CN111883110A (en)* | 2020-07-30 | 2020-11-03 | 上海携旅信息技术有限公司 | Acoustic model training method, system, device and medium for speech recognition

Family Cites Families (3)

Publication number | Priority date | Publication date | Assignee | Title
CN112802461B (en)* | 2020-12-30 | 2023-10-24 | 深圳追一科技有限公司 | Speech recognition method and device, server and computer readable storage medium
CN113066480B (en)* | 2021-03-26 | 2023-02-17 | 北京达佳互联信息技术有限公司 | Voice recognition method and device, electronic equipment and storage medium
CN113327597B (en)* | 2021-06-23 | 2023-08-22 | 网易(杭州)网络有限公司 | Speech recognition method, medium, device and computing equipment



Similar Documents

Publication | Publication Date | Title
CN113901992B (en) Training data screening method, system, device and medium
CN108170749B (en) Dialog method, device and computer readable medium based on artificial intelligence
CN108763510B (en) Intention recognition method, device, equipment and storage medium
CN1667699B (en) Generating large units of graphonemes with mutual information criterion for letter to sound conversion
JP5901001B1 (en) Method and device for acoustic language model training
CN108710704B (en) Method, device, electronic device and storage medium for determining dialog state
CN108052499B (en) Text error correction method and device based on artificial intelligence and computer readable medium
CN112711948A (en) Named entity recognition method and device for Chinese sentences
CN112883193A (en) Training method, device and equipment of text classification model and readable medium
CN1571013A (en) Method and device for predicting word error rate from text
CN114091595B (en) Sample processing method, device and computer readable storage medium
CN115293139A (en) Training method of voice transcription text error correction model and computer equipment
CN110751234B (en) OCR error correction method, device and equipment
CN113191133B (en) A Doc2Vec-based audio-text alignment method and system
CN115274086B (en) Intelligent diagnosis guiding method and system
CN112908359A (en) Voice evaluation method and device, electronic equipment and computer readable medium
CN115563959A (en) Self-supervised pre-training method, system and medium for Chinese Pinyin spelling error correction
JP2006113570A (en) Hidden conditional random field model for phonetic classification and speech recognition
CN113378569A (en) Model generation method, entity identification method, model generation device, entity identification device, electronic equipment and storage medium
Luo et al. Loss prediction: End-to-end active learning approach for speech recognition
CN118378623A (en) A global vision-guided image description generation method based on a cross-modal large model
CN116561592A (en) Training method of text emotion recognition model, text emotion recognition method and device
CN114091449A (en) A Chinese word segmentation method and Chinese word segmentation device in the medical field
CN119312819A (en) A method, device and storage medium for translating entries
CN113627563A (en) Label labeling method, device and medium

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
TA01 | Transfer of patent application right

Effective date of registration: 2023-07-10

Address after: 6001, 6th Floor, No. 1 Kaifeng Road, Shangdi Information Industry Base, Haidian District, Beijing, 100085

Applicant after: Beijing Baige Feichi Technology Co., Ltd.

Address before: 100085 4002, 4th floor, No. 1 Kaifa Road, Shangdi Information Industry Base, Haidian District, Beijing

Applicant before: ZUOYEBANG EDUCATION TECHNOLOGY (BEIJING) CO., LTD.

GR01 | Patent grant
