Detailed Description
In describing particular embodiments, specific details of construction, performance, effects, and other features are set forth to provide those skilled in the art with a thorough understanding of the embodiments. This does not exclude that, in particular cases, one skilled in the art may implement the present invention in a solution that omits some of the structures, properties, effects, or other characteristics described above.
The flow diagrams in the figures are merely exemplary flow illustrations and do not represent that all of the elements, operations, and steps in the flow diagrams must be included in the aspects of the invention, nor that the steps must be performed in the order shown in the figures. For example, some operations/steps in the flowcharts may be decomposed, some operations/steps may be combined or partially combined, etc., and the order of execution shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different network and/or processing unit means and/or microcontroller means.
The same reference numerals in the drawings denote the same or similar elements, components, or portions, so repeated descriptions of the same or similar elements, components, or portions may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various devices, elements, components, or portions, these devices, elements, components, or portions should not be limited by these terms; the terms merely distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the invention. Furthermore, the term "and/or" is meant to include all combinations of any one or more of the listed items.
Some technical terms that may be involved in the related content of the present invention are described below:
ASR (Automatic Speech Recognition): speech recognition is an interdisciplinary sub-field of computer science and computational linguistics that develops methods and techniques enabling computers to recognize and translate spoken language into text.
Kaldi: an open-source speech recognition toolkit originating from a 2009 summer workshop at Johns Hopkins University; Kaldi is one of the most popular speech technology toolkits of recent years.
FA (Finite Automaton): composed of a finite set of states and state transitions, each transition carrying at least one label. The most basic FA is the finite state acceptor (FSA). For a given input sequence, the FSA returns either an "accept" or a "reject".
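The accept/reject behavior of an FSA described above can be sketched in a few lines. The states, labels, and transition table below are illustrative assumptions, not taken from any particular toolkit:

```python
def fsa_accepts(transitions, start, finals, sequence):
    """Return True if the FSA accepts the label sequence."""
    state = start
    for label in sequence:
        key = (state, label)
        if key not in transitions:
            return False  # no transition for this label: reject
        state = transitions[key]
    return state in finals  # accept only if we end in a final state

# Toy FSA over labels "a"/"b" that accepts sequences ending in "b".
trans = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
print(fsa_accepts(trans, start=0, finals={1}, sequence=["a", "b"]))  # True
print(fsa_accepts(trans, start=0, finals={1}, sequence=["b", "a"]))  # False
```

An FST extends this by attaching an output label to each transition, turning acceptance into transduction of one symbol sequence into another.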
FST (Finite State Transducer): an extension of the FSA in which each state transition carries an output label in addition to the input label, forming an input-output label pair. Through these pairs, an FST describes a regular relation, i.e., a mapping from one set of symbol sequences to another set of symbol sequences.
Lattice: a form of FST whose input and output labels may be the labels of any FST (usually transition ids and words) and whose weights include acoustic-model, language-model, and transition weights. A WFST (weighted FST), in turn, is the state network formed from FSTs for decoding; decoding searches this network for the path (e.g., a sentence or word sequence) that best or most closely matches the sound.
CER (Character Error Rate): the edit distance between the recognition result of an audio and its real (reference) label, divided by the number of characters in the real label, expressed as a percentage.
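A minimal sketch of this CER definition, assuming plain Python strings for both the recognition result and the reference label:

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (rolling-array DP)."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[-1]

def cer(hypothesis, reference):
    """Edit distance divided by reference length, as a percentage."""
    return 100.0 * edit_distance(hypothesis, reference) / len(reference)

print(cer("recognized", "recognised"))  # 1 substitution over 10 chars -> 10.0
```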
Acoustic scale: the scaling factor applied to the log-probability score of the acoustic model. In Kaldi speech recognition, audio may be jointly recognized by an acoustic model and a language model to ensure that the recognized output better conforms to the habits and logic of human speech. The scaling factor is a coefficient that scales the probabilistic output of the acoustic model in order to balance the acoustic and language models.
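The balancing role of this scaling factor can be illustrated as follows; the scale value 0.1 and the score magnitudes are assumptions for illustration only, not values prescribed by the text:

```python
def combined_score(acoustic_logprob, lm_logprob, acoustic_scale=0.1):
    """Scale the acoustic log score before adding the LM log score."""
    return acoustic_scale * acoustic_logprob + lm_logprob

# Acoustic log scores are typically much larger in magnitude than LM
# scores; scaling keeps one model from dominating the combination.
print(round(combined_score(acoustic_logprob=-120.0, lm_logprob=-8.0), 6))  # -20.0
```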
LM rescore (language model re-scoring): after a speech recognition model is obtained, if it is desired to make the model better recognize data in a certain domain, or to integrate the advantages of different language models (such as n-gram + RNNLM) to enhance the model's effect, the recognition result is often scored a second time using in-domain data or language models trained with different model structures, and the second-pass score is taken as the final result.
The method realizes training data acquisition and screening through a pseudo-label accuracy screening scheme based on semi-supervised learning. The flow is: S1, the average number of nodes that each node of each decoding path can link to, taken over all decoding paths produced during speech recognition decoding, is used as an index for judging accuracy; S2, pseudo labels are attached to the audio data corresponding to high-accuracy decoding results, so that the pseudo-labeled audio data serve as the selected training data.
[Example 1]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the implementation of the method of the present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings.
The main flow of one embodiment of the method of the present invention, shown in the flow diagram of fig. 1, is described here. The method is mainly based on semi-supervised learning: using a speech recognition model trained on a small amount of manually labeled data, or a speech recognition model initialized with text labels, a plurality of unlabeled audio data in a data set are input directly into the model, and a plurality of corresponding decoding results are output. When one or more high-accuracy decoding results are screened out, the unlabeled audio data corresponding to those results are labeled directly with the text labels in the decoding results. The main flow comprises the following steps:
Step S110, performing speech recognition on audio data using a speech recognition model trained via semi-supervised learning.
In one embodiment, the pseudo-label accuracy of training data is used in semi-supervised training of the speech recognition model. Specifically, a speech recognition model is constructed, comprising an acoustic model and a language model, for example a speech recognition tool implemented with the Kaldi toolkit.
Further, speech recognition models in different application scenarios often require training data from the specific field to achieve the best results. For example, in bullet-screen speech recognition and quality-inspection speech transcription tasks, the models must be given targeted enhancement training with the data flowing back from each scenario to adapt to the requirements of the different scenes. One embodiment of the invention trains the speech recognition model based on semi-supervised learning as follows: after the model is trained on a very small amount of audio data with manually labeled text labels, it outputs the corresponding text labels together with each decoding/recognition result, so the text labels in the screened high-accuracy decoding results can be attached directly to the corresponding audio data, which then flows back as training data to retrain the model. Alternatively, text labels can be set for the speech recognition model in advance, for example according to the characteristics of actual application scenarios such as voice bullet-screen recognition, quality-inspection speech transcription, and voice search-question recognition; unlabeled audio data are input directly into the model, accuracy is evaluated, the data are screened, and the corresponding text labels predicted in the decoding results are used for labeling.
Therefore, when the labeled data available to the speech recognition model is insufficient, as in the intelligent speech recognition of an AI teaching system in a K12 scenario, the existing transcription data (audio data with corresponding text labels) can be effectively screened and used for semi-supervised training, yielding training data that improves the recognition performance, effect, and quality of the model. To avoid the influence of noise, the existing transcription data must also meet certain quality requirements before being used to expand the training data: high-quality data must be screened out so that the transcription accuracy of the selected data is high enough and its noise small enough, ensuring that training on these data genuinely helps improve the performance and effect of the model.
Step S120, using as the index for judging accuracy the average number of nodes that each node of each decoding path is expected to link to, taken over all decoding paths produced during speech recognition decoding.
In one embodiment, the speech recognition model of the Kaldi toolkit can be used to perform speech recognition on the input unlabeled audio data, performing beam search decoding according to a preset beam width (beam size) during recognition and storing all candidate decoding paths searched during decoding.
Further, storing all candidate decoding paths searched during decoding specifically comprises: searching for decoding paths in the WFST state network during decoding to obtain a decoding graph; storing all decoding paths searched in the decoding graph in a lattice; and outputting the optimal decoding path as the decoding result corresponding to the audio data.
Further, using the average number of linked nodes as the index for judging accuracy specifically comprises: obtaining, from all decoding paths stored in the lattice, the number of onward paths that each node in each decoding path can link to; calculating the average value and using it as the accuracy index, hereinafter the lattice depth; and judging accuracy by sorting and comparing the lattice-depth indexes obtained when a plurality of audio data in the data set are decoded during speech recognition.
Further, judging accuracy by sorting and comparing specifically comprises: comparing the lattice-depth index of each audio with a preset threshold, according to the plurality of lattice-depth indexes obtained when the plurality of audio data in the data set are decoded during speech recognition; if an audio's lattice-depth index is below the preset threshold, the decoding result of that audio has higher accuracy.
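The index computation described in the steps above can be sketched as follows. The adjacency representation of the stored lattice is an assumption for illustration; in practice the paths would be read from the lattice produced by the decoder:

```python
def lattice_depth(successors):
    """successors: dict mapping a node id to the collection of node ids
    it links to onward in the stored decoding paths.
    Returns the mean number of linked nodes per node."""
    if not successors:
        return 0.0
    return sum(len(s) for s in successors.values()) / len(successors)

# Toy lattice: n0 branches to two nodes, which rejoin at n3.
lat = {"n0": ["n1", "n2"], "n1": ["n3"], "n2": ["n3"]}
print(round(lattice_depth(lat), 2))  # (2 + 1 + 1) / 3 -> 1.33
```

A smaller value means fewer competing paths survived decoding, which the method takes as evidence of a more confident, higher-accuracy decoding result.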
Specifically, a speech recognition tool implemented with the Kaldi toolkit, for example an HMM model, constructs a WFST state network, i.e., a state network based on FSTs and lattices. Decoding searches this state network for the path that best matches the sound to be recognized; for example, a beam search (an approximate, pruned search for the best path) can be performed during decoding according to a given beam width (beam size), producing a decoding graph that contains multiple possible decoding paths, from which the optimal, i.e., best-matching, path is taken as the decoding result. All possible decoding paths can then be stored in a lattice, which is equivalent to storing the decoding graph. The optimal path is the decoding result, and the decoding result of the model, i.e., the recognition result of the audio, can be output.
The average number of nodes that each node in the decoding paths may link to is taken as the sorting or judging index, the lattice depth, for evaluating pseudo-label accuracy. Computing the mean only averages the number of possibly linked nodes (or the number of possibly expanded paths); the algorithm is simple, efficient, and light on computing resources, and the quantity is obtained directly during decoding without introducing any interference or uncertainty.
When performing recognition and training with the semi-supervised speech recognition model, a data set of, e.g., multiple unlabeled audios is input to the model for decoding, yielding the decoding results and the lattice-depth indexes corresponding to those audios. The indexes are ranked from high to low or from low to high, preferably low to high, and a suitable threshold is preset, e.g., a threshold of 5 in a bullet-screen speech recognition scenario. All audios whose lattice-depth values are below the threshold of 5 are then retained; these are the high-quality audios, and the indexes below the threshold correspond to high-accuracy decoding results.
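The ranking-and-threshold screening just described can be sketched as follows; the utterance ids and depth values are illustrative assumptions:

```python
def select_high_accuracy(depth_by_audio, threshold=5.0):
    """Rank audios by lattice depth (low to high) and keep those below
    the preset threshold, i.e., the most confident decodings."""
    selected = [(d, a) for a, d in depth_by_audio.items() if d < threshold]
    selected.sort()  # smallest depth first
    return [a for _, a in selected]

depths = {"utt1": 3.2, "utt2": 7.9, "utt3": 4.5}
print(select_high_accuracy(depths))  # ['utt1', 'utt3']
```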
Thus, using the lattice depth as the accuracy ranking index of audio decoding results requires no additional operation steps or parameter processing: no acoustic-scale scaling factor, no LM-rescore second-pass scoring, no confidence-score computation, no coarse sentence-level confidence computation, and no interference from the various decoding strategies. The ranking index is unaffected by the acoustic scale and LM rescoring and is more stable than indexes such as MBR, so the subset of accurately decoded audio can be found more reliably among all the audio. In addition, in speech recognition, audio with little noise that is easy to recognize tends not to generate many decoding paths during decoding, so a small lattice depth for a decoded audio indicates high audio quality; that is, recognizing audio A as utterance A is very accurate and the noise is small, and the model is therefore unlikely to be negatively affected. When the lattice-depth index is inspected in practice, it discriminates well among recognized sentences. Further, computing the lattice-depth index consumes very little computational power, effectively avoiding redundant resource consumption and adding no computational burden to decoding.
Step S130, attaching pseudo labels to the audio data corresponding to high-accuracy decoding results, and taking the pseudo-labeled audio data as the selected training data.
In one embodiment, the unlabeled audio data corresponding to a high-accuracy decoding result is obtained, and the audio data is pseudo-labeled according to the text label in the decoding result output by speech recognition.
In one embodiment, taking the pseudo-labeled audio data as the selected training data comprises, for example: screening out the unlabeled audio data corresponding to all high-accuracy decoding results, pseudo-labeling the audio data, adding the pseudo-labeled audio data to the training set of the speech recognition model, and retraining the model.
Specifically, after an audio is decoded by the speech recognition model, a decoding result is output, e.g., the recognized text data. For decoding results evaluated as high-accuracy by the lattice-depth index, the text label/text data in the result directly serves as the pseudo label of the corresponding audio; that is, the model's recognition result becomes the pseudo label of the high-quality audio. The reflowed audio data (i.e., the audio marked with pseudo labels) is then added to the labeled data as training data in the training set to retrain the model. Thus, lattice-depth-based screening of reflowed corpora for ASR semi-supervised learning selects suitable training data and continuously supplies large amounts of training data for improving the recognition performance, effect, and quality of the model.
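The overall S110-S130 flow can be sketched as below, where `recognize` is a hypothetical stand-in for the decoder, assumed to return each audio's decoded text together with its lattice-depth index:

```python
def screen_pseudo_labels(unlabeled_audio, recognize, threshold=5.0):
    """Decode each unlabeled audio; keep (audio, decoded text) pairs
    whose lattice depth falls below the threshold as pseudo-labeled
    training data."""
    training_data = []
    for audio in unlabeled_audio:
        text, depth = recognize(audio)           # S110 + S120
        if depth < threshold:                    # high-accuracy decoding
            training_data.append((audio, text))  # S130: pseudo label
    return training_data

# Stub recognizer for demonstration only.
fake = {"a.wav": ("hello", 2.0), "b.wav": ("noisy", 9.0)}
print(screen_pseudo_labels(fake, lambda a: fake[a]))  # [('a.wav', 'hello')]
```

The returned pairs would then be appended to the training set before retraining, closing the semi-supervised loop.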
Thus, pseudo labels are provided directly and the implementation is simple and feasible, while the screened training data still guarantees sufficiently high transcription accuracy and low noise. In addition, data selected by lattice depth can also be used to enhance model training in scenarios such as speech synthesis and noise elimination. More broadly, the lattice-depth index indicates how many recognition hypotheses are plausible, i.e., how cleanly the data can be classified; in other artificial intelligence fields (such as image classification in computer vision), similar indexes can be designed with this idea to screen high-quality data for training and to enhance the performance of existing models, so the index may find wider application in the future.
[Example 2]
For the purposes of promoting an understanding of the principles and technical aspects of the invention, reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings.
The main structural block diagram of one embodiment of the system of the present invention is described here with reference to fig. 2. The system is likewise mainly based on semi-supervised learning: using a speech recognition model trained on a small amount of manually labeled data, or a speech recognition model initialized with text labels, a plurality of unlabeled audio data in a data set are input directly into the model for speech recognition, and a plurality of corresponding decoding results are output; when one or more high-accuracy decoding results are screened out, the unlabeled audio data corresponding to those results are labeled directly with the text labels in the decoding results. This embodiment mainly includes:
The training and recognition module B210 trains the speech recognition model based on semi-supervised learning and performs speech recognition on the audio data.
In one embodiment, the pseudo-label accuracy of training data is used in semi-supervised training of the speech recognition model. Specifically, a speech recognition model is constructed, comprising an acoustic model and a language model, for example a speech recognition tool implemented with the Kaldi toolkit.
Further, speech recognition models in different application scenarios often require training data from the specific field to achieve the best results. For example, in bullet-screen speech recognition and quality-inspection speech transcription tasks, the models must be given targeted enhancement training with the data flowing back from each scenario to adapt to the requirements of the different scenes. One embodiment of the invention trains the speech recognition model based on semi-supervised learning as follows: after the model is trained on a very small amount of audio data with manually labeled text labels, it outputs the corresponding text labels together with each decoding/recognition result, so the text labels in the screened high-accuracy decoding results can be attached directly to the corresponding audio data, which then flows back as training data to retrain the model. Alternatively, text labels can be set for the speech recognition model in advance, for example according to the characteristics of actual application scenarios such as voice bullet-screen recognition, quality-inspection speech transcription, and voice search-question recognition; unlabeled audio data are input directly into the model, accuracy is evaluated, the data are screened, and the corresponding text labels predicted in the decoding results are used for labeling.
Therefore, when the labeled data available to the speech recognition model is insufficient, as in the intelligent speech recognition of an AI teaching system in a K12 scenario, the existing transcription data (audio data with corresponding text labels) can be effectively screened and used for semi-supervised training, yielding training data that improves the recognition performance, effect, and quality of the model. To avoid the influence of noise, the existing transcription data must also meet certain quality requirements before being used to expand the training data: high-quality data must be screened out so that the transcription accuracy of the selected data is high enough and its noise small enough, ensuring that training on these data genuinely helps improve the performance and effect of the model.
The index module B220 uses as the index for judging accuracy the average number of nodes that each node of each decoding path is expected to link to, taken over all decoding paths produced during speech recognition decoding.
In one embodiment, the speech recognition model of the Kaldi toolkit can be used to perform speech recognition on the input unlabeled audio data, performing beam search decoding according to a preset beam width (beam size) during recognition and storing all candidate decoding paths searched during decoding.
Further, storing all candidate decoding paths searched during decoding specifically comprises: searching for decoding paths in the WFST state network during decoding to obtain a decoding graph; storing all decoding paths searched in the decoding graph in a lattice; and outputting the optimal decoding path as the decoding result corresponding to the audio data.
Further, using the average number of linked nodes as the index for judging accuracy specifically comprises: obtaining, from all decoding paths stored in the lattice, the number of onward paths that each node in each decoding path can link to; calculating the average value and using it as the accuracy index, the lattice depth; and judging accuracy by sorting and comparing the lattice-depth indexes obtained when a plurality of audio data in the data set are decoded during speech recognition.
Further, judging accuracy by sorting and comparing specifically comprises: comparing the lattice-depth index of each audio with a preset threshold, according to the plurality of lattice-depth indexes obtained when the plurality of audio data in the data set are decoded during speech recognition; if an audio's lattice-depth index is below the preset threshold, the decoding result of that audio has higher accuracy.
Specifically, a speech recognition tool implemented with the Kaldi toolkit, for example an HMM model, constructs a WFST state network, i.e., a state network based on FSTs and lattices. Decoding searches this state network for the path that best matches the sound to be recognized; for example, a beam search (an approximate, pruned search for the best path) can be performed during decoding according to a given beam width (beam size), producing a decoding graph that contains multiple possible decoding paths, from which the optimal, i.e., best-matching, path is taken as the decoding result. All possible decoding paths can then be stored in a lattice, which is equivalent to storing the decoding graph. The optimal path is the decoding result, and the decoding result of the model, i.e., the recognition result of the audio, can be output.
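The pruned search just described can be illustrated with a toy beam search over per-frame token scores. This is a conceptual sketch of beam pruning only, not Kaldi's actual WFST decoder; the token scores are assumed log-probabilities:

```python
def beam_search(steps, beam_size=2):
    """steps: list of {token: log_prob} dicts, one per frame.
    Keeps only the beam_size best partial paths at each step.
    Returns the (path, score) pair with the highest final score."""
    beams = [([], 0.0)]  # (partial path, cumulative log score)
    for scores in steps:
        candidates = [(path + [tok], s + lp)
                      for path, s in beams
                      for tok, lp in scores.items()]
        # prune: keep only the beam_size highest-scoring partial paths
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]

steps = [{"h": -0.1, "x": -2.0}, {"i": -0.2, "o": -0.9}]
path, score = beam_search(steps)
print(path, round(score, 6))  # ['h', 'i'] -0.3
```

Because pruning discards low-scoring hypotheses at every step, the result is the best path found within the beam, not a guaranteed global optimum.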
The average number of nodes that each node in the decoding paths may link to is taken as the sorting or judging index, the lattice depth, for evaluating pseudo-label accuracy. Computing the mean only averages the number of possibly linked nodes (or the number of possibly expanded paths); the algorithm is simple, efficient, and light on computing resources, and the quantity is obtained directly during decoding without introducing any interference or uncertainty.
When performing recognition and training with the semi-supervised speech recognition model, a data set of, e.g., multiple unlabeled audios is input to the model for decoding, yielding the decoding results and the lattice-depth indexes corresponding to those audios. The indexes are ranked from high to low or from low to high, preferably low to high, and a suitable threshold is preset, e.g., a threshold of 5 in a bullet-screen speech recognition scenario. All audios whose lattice-depth values are below the threshold of 5 are then retained; these are the high-quality audios, and the indexes below the threshold correspond to high-accuracy decoding results.
Thus, using the lattice depth as the accuracy ranking index of audio decoding results requires no additional operation steps or parameter processing: no acoustic-scale scaling factor, no LM-rescore second-pass scoring, no confidence-score computation, no coarse sentence-level confidence computation, and no interference from the various decoding strategies. The ranking index is unaffected by the acoustic scale and LM rescoring and is more stable than indexes such as MBR, so the subset of accurately decoded audio can be found more reliably among all the audio. In addition, in speech recognition, audio with little noise that is easy to recognize tends not to generate many decoding paths during decoding, so a small lattice depth for a decoded audio indicates high audio quality; that is, recognizing audio A as utterance A is very accurate and the noise is small, and the model is therefore unlikely to be negatively affected. When the lattice-depth index is inspected in practice, it discriminates well among recognized sentences. Further, computing the lattice-depth index consumes very little computational power, effectively avoiding redundant resource consumption and adding no computational burden to decoding.
The screening data module B230 attaches pseudo labels to the audio data corresponding to high-accuracy decoding results and takes the pseudo-labeled audio data as the selected training data.
In one embodiment, the unlabeled audio data corresponding to a high-accuracy decoding result is obtained, and the audio data is pseudo-labeled according to the text label in the decoding result output by speech recognition.
In one embodiment, taking the pseudo-labeled audio data as the selected training data comprises, for example: screening out the unlabeled audio data corresponding to all high-accuracy decoding results, pseudo-labeling the audio data, adding the pseudo-labeled audio data to the training set of the speech recognition model, and retraining the model.
Specifically, after an audio is decoded by the speech recognition model, a decoding result is output, e.g., the recognized text data. For decoding results evaluated as high-accuracy by the lattice-depth index, the text label/text data in the result directly serves as the pseudo label of the corresponding audio; that is, the model's recognition result becomes the pseudo label of the high-quality audio. The reflowed audio data (i.e., the audio marked with pseudo labels) is then added to the labeled data as training data in the training set to retrain the model. Thus, lattice-depth-based screening of reflowed corpora for ASR semi-supervised learning selects suitable training data and continuously supplies large amounts of training data for improving the recognition performance, effect, and quality of the model.
Thus, pseudo labels are provided directly and the implementation is simple and feasible, while the screened training data still guarantees sufficiently high transcription accuracy and low noise. In addition, data selected by lattice depth can also be used to enhance model training in scenarios such as speech synthesis and noise elimination. More broadly, the lattice-depth index indicates how many recognition hypotheses are plausible, i.e., how cleanly the data can be classified; in other artificial intelligence fields (such as image classification in computer vision), similar indexes can be designed with this idea to screen high-quality data for training and to enhance the performance of existing models, so the index may find wider application in the future.
[Example 3]
The implementation of the present invention is further described below with reference to embodiments 1 and 2 in an overall application scenario:
Speech recognition is implemented using the Kaldi toolkit, and the speech recognition model configured in Kaldi is trained based on semi-supervised learning. Training can proceed directly on unlabeled audio, combined with autonomous labeling, screening of training data (audio), and retraining, so that labeled training data is obtained in a simple, feasible, efficient, low-cost, and low-resource-consumption way; the approach also extends and adapts to models and training in a variety of different scenarios to improve model performance and enhance the recognition effect and quality of the model.
The speech recognition model of the Kaldi toolkit can use acoustic models such as HMMs combined with language models to recognize input audio. The process includes splitting the input audio into frames, recognizing states and state combinations to form phonemes, and synthesizing words from the phonemes. The HMM in Kaldi supports FSTs and lattices, so when decoding the recognized states with the HMM, a WFST state network is constructed from FSTs and, according to a given beam width (beam size), a beam search finds the best path in the WFST, i.e., the path (decoding path) that best matches the sound. During the search, all possible decoding paths are stored in a lattice (see the explanation of the relationship between lattices and FSTs above). The number of nodes each node in the decoding paths may link to, or the number of paths it may expand, is obtained, and the mean is computed as the accuracy index of the decoding result, the lattice depth. For example, if node A links 1 node in path 1, node B links 2 nodes, and node C links 1 node, while in path 2 node A links 1 node, node C links 2 nodes, and node D links 1 node, the mean is 8/4 = 2, i.e., the index value is 2 (this is a conceptual illustration only, not an actual calculation).
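The conceptual count above can be reproduced directly: per-node link counts are summed across both paths, and the total is divided by the number of distinct nodes (A, B, C, D):

```python
link_counts = {
    "A": 1 + 1,  # A links 1 node in path 1 and 1 node in path 2
    "B": 2,      # B links 2 nodes in path 1
    "C": 1 + 2,  # C links 1 node in path 1 and 2 nodes in path 2
    "D": 1,      # D links 1 node in path 2
}
depth = sum(link_counts.values()) / len(link_counts)
print(depth)  # 8 / 4 = 2.0
```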
The smaller the value of this index, the higher the accuracy and the higher the quality of the corresponding audio as training data; that is, the recognition accuracy for that audio better meets the actual requirements of the current application scene.
The LATTICEDEPTH indexes of the decoding results of the multiple unlabeled input audios are ranked from small to large and compared with a preset threshold, for example a threshold of 5 for bullet-screen speech recognition. Because the threshold can be set to meet the requirements of the practical application field, a single index can be used in many different scenes requiring speech recognition.
Assume the LATTICEDEPTH index values for three input audios A, B, and C are 6, 5.5, and 4, respectively; after ranking (4, 5.5, 6), only the index of audio C is below the threshold of 5. The text in the decoding result of C can then be directly assigned to C as its label, i.e., C is screened out, and C together with its text label is added to the training data set. When the model based on semi-supervised learning (i.e., the speech recognition model configured in Kaldi) is trained, model training continues using the new training data C.
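The screening step above can be sketched as follows. The function name, data layout, and threshold value are illustrative assumptions following the bullet-screen example, not part of Kaldi:

```python
# Hypothetical sketch of the screening step: rank unlabeled audios by the
# LATTICEDEPTH index of their decoding results, keep those below the
# scene-specific threshold, and adopt the decoded text as a pseudo-label.

THRESHOLD = 5  # e.g., the bullet-screen speech recognition setting

def screen_for_training(decoded, threshold=THRESHOLD):
    """decoded: dict mapping audio id -> (lattice_depth_index, decoded_text).

    Returns {audio_id: pseudo_label} for audios whose decoding is judged
    accurate enough to add to the training data set.
    """
    ranked = sorted(decoded.items(), key=lambda kv: kv[1][0])  # small -> large
    return {aid: text for aid, (depth, text) in ranked if depth < threshold}

decoded = {"A": (6.0, "..."), "B": (5.5, "..."), "C": (4.0, "transcript of C")}
print(screen_for_training(decoded))  # only C passes: {'C': 'transcript of C'}
```

The selected pseudo-labeled audios would then be appended to the training data set before the next retraining round.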
[Example 4]
Fig. 3 is a schematic block diagram of the structure of an electronic device according to an embodiment of the invention, comprising a processor and a memory for storing a computer-executable program; when the program is executed by the processor, the processor performs the steps of the methods referred to in the foregoing Embodiments 1 and 3.
As shown in fig. 3, the electronic apparatus is in the form of a general-purpose computing device. The processor may be one processor or a plurality of processors working cooperatively. The invention does not exclude distributed processing, i.e., the processors may be distributed among different physical devices. Likewise, the electronic device of the present invention is not limited to a single entity and may be the sum of a plurality of physical devices.
The memory stores a computer-executable program, typically machine-readable code. The program may be executed by the processor to enable the electronic device to perform the method of the present invention, or at least some steps of the method.
The memory includes volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also include non-volatile memory, such as Read-Only Memory (ROM).
Optionally, in this embodiment, the electronic device further includes an I/O interface, which is used for exchanging data between the electronic device and an external device. The I/O interface may be a bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
More specifically, reference is made to fig. 4, which shows a block diagram of a more specific example of the electronic apparatus according to this embodiment. The electronic apparatus 200 of this exemplary embodiment is in the form of a general-purpose data processing device. The components of the electronic device 200 may include, but are not limited to, at least one processing unit 210, at least one memory unit 220, a bus 230 connecting the different system components (including the memory unit 220 and the processing unit 210), a display unit 240, and the like.
The storage unit 220 stores therein a computer-readable program, which may be a source program or read-only program code. The program may be executed by the processing unit 210 such that the processing unit 210 performs the steps of various embodiments of the present invention. For example, the processing unit 210 may perform the respective steps of the methods of the foregoing embodiments 2 to 5.
The memory unit 220 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 2201 and/or cache memory 2202, and may further include Read Only Memory (ROM) 2203. The storage unit 220 may also include a program/utility 2204 having a set (at least one) of program modules 2205, such program modules 2205 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 230 may be a bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic apparatus 200 may also be in communication with one or more external devices 300 (e.g., a keyboard, a display, a network device, a bluetooth device, etc.), such that a user can interact with the electronic apparatus 200 via the external devices 300, and/or such that the electronic apparatus 200 can communicate with one or more other data processing devices (e.g., a router, a modem, etc.). Such communication may occur through an input/output (I/O) interface 250, and may also occur through a network adapter 260 to one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet. Network adapter 260 may communicate with other modules of electronic device 200 via bus 230. It should be appreciated that although not shown, other hardware and/or software modules may be used in the electronic apparatus 200, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be understood that the electronic device shown in fig. 3 and 4 is only one example of the present invention, and the electronic device of the present invention may further include elements or components not shown in the above examples. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a man-machine interaction element such as a button, a keyboard, and the like. The electronic device may be considered as covered by the invention as long as the electronic device is capable of executing a computer readable program in a memory for carrying out the method or at least part of the steps of the method.
[ Example 5]
Fig. 5 is a schematic diagram of a computer-readable recording medium of an embodiment of the present invention. As shown in fig. 5, the computer-readable recording medium stores a computer-executable program that, when executed, implements the method described above. The computer-readable medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic form, optical form, or any suitable combination of the foregoing. A readable medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
From the above description of embodiments, those skilled in the art will readily appreciate that the present invention may be implemented by hardware capable of executing a specific computer program, such as the system of the present invention and the electronic processing units, servers, clients, handsets, control units, processors, etc. included in the system, as well as by a vehicle comprising at least a portion of the above system or components. The invention may also be implemented by computer software executing the method of the invention, for example by control software executed by a microprocessor, an electronic control unit, a client, a server, etc. on the locomotive side. It should be noted that the computer software for performing the method of the present invention is not limited to execution by one specific hardware entity, but may be implemented in a distributed manner by unspecified hardware; for example, some method steps executed by the computer program may be executed at the locomotive end, while another part may be executed in a mobile terminal or a smart helmet, etc. As for the computer software, the software product may be stored on a computer-readable storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or may be stored in a distributed manner over a network, as long as it enables the electronic device to perform the method according to the invention.
From the above description of embodiments, those skilled in the art will readily appreciate that the exemplary embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware.
The foregoing description of the specific embodiments provides further details of the objects, aspects and advantages of the present invention, and it should be understood that the present invention is not inherently related to any particular computer, virtual device or electronic device, and that various general purpose devices may also implement the present invention. The foregoing description of the embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.