Disclosure of Invention
In order to distinguish different sounds in a class, the present application provides a method, a device, a computer device and a storage medium for distinguishing different sounds in a class.
In a first aspect, the present application provides a training method for a sound classification model, which adopts the following technical scheme:
a method of training a sound classification model, comprising:
Converting each training sample in a training sample set into a first spectrogram, wherein the first spectrogram is a two-dimensional Mel spectrogram;
and inputting the first spectrogram into the sound classification model, and training the sound classification model by utilizing a VGG11 network structure.
By adopting the above technical scheme: two-dimensional Mel spectrograms are at present used mainly for speech recognition and voiceprint recognition, and the present application applies them to sound classification. Speech recognition refers to recognizing the content of speech, voiceprint recognition refers to recognizing who is speaking, and sound classification refers to distinguishing the specific category of a sound regardless of its content and of who produces it. At present there are few, if any, network structures dedicated to training a sound classification model, so the VGG11 network structure from image recognition is used to train the sound classification model. A large number of experiments show that the two-dimensional Mel spectrogram is very suitable for being combined with the VGG11 network structure as the training sample of the sound classification model. By combining sound processing with image recognition on the basis of the Mel spectrogram and the VGG11 network structure, the sound classification model is trained well, so that an accurate sound classification model can be obtained.
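For illustration only, the following is a minimal sketch of converting one training sample into a two-dimensional Mel spectrogram, assuming the open-source librosa library is used; the sample rate, FFT size, hop length and number of Mel bands are illustrative assumptions and are not values fixed by the present application.

```python
# Hypothetical sketch: convert a sound clip (a WAV file) into a 2-D Mel spectrogram.
import librosa
import numpy as np

def to_mel_spectrogram(wav_path, sr=16000, n_fft=1024, hop_length=256, n_mels=64):
    y, _ = librosa.load(wav_path, sr=sr)                      # load and resample the clip
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)             # log scale for usable dynamic range
    return mel_db                                             # 2-D array of shape (n_mels, time frames)
```

The resulting two-dimensional array can then be fed to an image-recognition network such as VGG11, as sketched later in the detailed description.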
In a second aspect, the present application provides a method for distinguishing different sounds in a class, which adopts the following technical scheme:
a method of distinguishing between different sounds in a class, comprising:
Collecting classroom sound, and inputting the classroom sound into a trained voiceprint model to obtain voiceprint vectors of a plurality of sections of sound fragments;
Judging whether a sound fragment corresponding to the voiceprint vector is non-teacher sound or not according to the voiceprint vector;
If yes, extracting features of the sound fragments corresponding to the voiceprint vectors to obtain a second spectrogram, inputting the second spectrogram into a trained sound classification model, and classifying the sound fragments corresponding to the voiceprint vectors according to the voiceprint vectors, wherein the second spectrogram is a two-dimensional Mel spectrogram;
The training method of the sound classification model comprises the following steps:
Converting each training sample in a training sample set into a first spectrogram, wherein the first spectrogram is a two-dimensional Mel spectrogram;
And inputting the first spectrogram into the sound classification model, and training the sound classification model by utilizing a VGG11 network structure.
By adopting the above technical scheme: in a classroom scene there is generally only one teacher, so the teacher sound is simple, whereas there are many students and their sounds are complex; the teacher usually moves within the range of the lecture platform, while the students are concentrated in the middle and rear of the classroom; the sound of the teacher lecturing generally runs through the class, and the occasions on which students and the teacher speak at the same time are few, so the collected classroom sound can be well separated into teacher sound and non-teacher sound. Therefore, it is first identified whether a sound clip is a non-teacher sound; teacher sound is not classified further, and only non-teacher sound is further classified, so as to distinguish the sound of a single student, the sound of several students reading aloud together, discussion sound, noise and the like.
Preferably, the training method of the voiceprint model comprises the following steps:
Acquiring an open-source sound data set, preparing pre-acquired classroom sound into a classroom sound data set, and taking the open-source sound data set and the classroom sound data set together as a sample set;
And inputting the samples in the sample set into the voiceprint model, and training the voiceprint model by using a deep learning algorithm.
According to the above technical scheme: a conventional voiceprint model is trained only on an open-source sound data set, and most open-source sound data are collected from near-field recordings and from the sound of videos on video websites, whereas most sound in a classroom environment is collected by microphones hanging from the ceiling and is far-field sound, so the collection environment and the use environment belong to different domains and a conventional voiceprint model performs slightly worse when applied to the classroom environment. In addition, the cost of collecting sound data sets is high and the collection standards are not uniform. Therefore, a classroom sound data set made from a large amount of classroom sound is added to the sample set, and the voiceprint model used here is trained on this sample set, so that the voiceprint model is suitable for use in the classroom environment and the accuracy of the voiceprint vectors it outputs is improved.
Preferably, the collecting classroom sound, inputting the classroom sound into a trained voiceprint model to obtain voiceprint vectors of a plurality of segments of sound clips, includes:
Dividing the classroom sound into a plurality of sound clips;
and respectively carrying out voiceprint extraction on the plurality of sections of sound fragments to obtain the voiceprint vector.
Preferably, the dividing the classroom sound into a plurality of sound clips includes:
Dividing the classroom sound into a plurality of segments, wherein a shared part and a non-shared part are arranged between adjacent segments;
respectively calculating the matching degree of voiceprint characteristics of the shared part and the non-shared part of the adjacent segment;
acquiring a switching point based on the voiceprint feature matching degree;
And dividing the classroom sound into a plurality of sound clips according to the switching point.
By adopting the above technical scheme, the switching points are detected based on the voiceprint feature matching degree and the classroom sound is divided into a plurality of sound clips, each of which contains a single type of sound (for example, one clip is teacher sound and another is noise), which facilitates the later classification of each sound clip.
Preferably, the determining, according to the voiceprint vector, whether the sound clip corresponding to the voiceprint vector is a non-teacher sound includes:
And adopting a method combining the BIRCH clustering algorithm and the Calinski-Harabasz index, performing voiceprint clustering based on the voiceprint vector, and judging whether the sound clip corresponding to the voiceprint vector is a non-teacher sound.
By adopting the above technical scheme, the voiceprint vectors are clustered with the BIRCH clustering algorithm, and the Calinski-Harabasz index is used to evaluate the quality of the clustering effect and improve the clustering accuracy, so that the clustering result is more accurate.
Preferably, the adopting a method combining the BIRCH clustering algorithm and the Calinski-Harabasz index, performing voiceprint clustering based on the voiceprint vector, and judging whether the sound clip corresponding to the voiceprint vector is a non-teacher sound includes:
clustering all voiceprint vectors by adopting the BIRCH clustering algorithm, and dividing all voiceprint vectors into a first class and a second class;
performing secondary clustering on all voiceprint vectors in the first class and on all voiceprint vectors in the second class respectively by adopting the BIRCH clustering algorithm;
respectively obtaining a first index and a second index, wherein the first index is the Calinski-Harabasz index after the secondary clustering of all voiceprint vectors in the first class, and the second index is the Calinski-Harabasz index after the secondary clustering of all voiceprint vectors in the second class;
judging whether the first index is larger than the second index;
if yes, judging the sound segment corresponding to the voiceprint vector in the first class as non-teacher sound;
if not, judging the sound segment corresponding to the voiceprint vector in the second class as non-teacher sound.
By adopting the above technical scheme: in a classroom environment there is a certain difference between teacher sound and non-teacher sound, so the BIRCH clustering algorithm is first used to cluster all the voiceprint vectors, and this difference allows them to be clustered into two classes. After the first clustering, however, it is not yet clear which class of voiceprint vectors corresponds to non-teacher sound and which to teacher sound. Therefore a second clustering is performed on all the voiceprint vectors in each of the two classes, and the first index and the second index are obtained. The Calinski-Harabasz index evaluates the clustering effect: the teacher is essentially a single speaker, so the voiceprint vectors of teacher sound differ little from one another, the second clustering of them produces a poor effect and the corresponding Calinski-Harabasz index is small, whereas non-teacher sound is diverse, so the second clustering of its voiceprint vectors produces a better effect and a larger index. Accordingly, the sound clips corresponding to the class whose Calinski-Harabasz index is smaller are judged to be teacher sound, and the sound clips corresponding to the class whose Calinski-Harabasz index is larger are judged to be non-teacher sound.
In a third aspect, the present application provides a device for distinguishing different sounds in a class, which adopts the following technical scheme:
A device for distinguishing different sounds in a class, comprising:
the collection module, which is used for collecting classroom sound and inputting the classroom sound into the trained voiceprint model to obtain voiceprint vectors of a plurality of sound clips;
The judging module is used for judging whether the sound fragment corresponding to the voiceprint vector is non-teacher sound according to the voiceprint vector, if so, transferring to the classifying module, and
The classification module is used for extracting characteristics of sound fragments corresponding to the voiceprint vectors to obtain a second spectrogram, inputting the second spectrogram into a trained sound classification model, and classifying the sound fragments corresponding to the voiceprint vectors according to the voiceprint vectors, wherein the second spectrogram is a two-dimensional Mel spectrogram;
The training method of the sound classification model comprises the following steps:
Converting each training sample in a training sample set into a first spectrogram, wherein the first spectrogram is a two-dimensional Mel spectrogram;
And inputting the first spectrogram into the sound classification model, and training the sound classification model by utilizing a VGG11 network structure.
In a fourth aspect, the present application provides a computer device, which adopts the following technical scheme:
A computer device comprising a memory and a processor, the memory having stored thereon a computer program capable of being loaded by the processor to perform the method of distinguishing different sounds in a class according to the second aspect.
In a fifth aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:
A computer-readable storage medium storing a computer program capable of being loaded by a processor to execute the method of distinguishing different sounds in a class according to the second aspect.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.
The embodiment provides a method for distinguishing different sounds in a class, as shown in fig. 1, the main flow of the method is described as follows (steps S101 to S103):
Step S101, collecting classroom sound, and inputting the classroom sound into a trained voiceprint model to obtain voiceprint vectors of a plurality of sound clips.
The training method of the voiceprint model specifically comprises the following steps:
Open-source sound data are acquired from the network; such data are generally collected from near-field recordings and from the sound of videos on video websites, and the acquired data are consolidated into an open-source sound data set. Sound data in the classroom are acquired in real time through audio acquisition equipment installed in the classroom, and the collected classroom sound data are sorted and merged into a classroom sound data set. The open-source sound data set and the classroom sound data set together serve as the sample set.
And inputting the samples in the sample set into a voiceprint model, and training the voiceprint model by using a deep learning algorithm. The audio acquisition device can be a hanging microphone in a classroom.
In this embodiment, when a class starts, the audio collection device collects the classroom sound, inputs the classroom sound into the trained voiceprint model, and divides the classroom sound into a plurality of segments of sound clips by using the voiceprint model, and extracts voiceprints of the segments of sound clips to obtain voiceprint vectors of each segment of sound clip.
Specifically, the classroom sound is divided into a plurality of segments, with a shared part and non-shared parts between adjacent segments; the voiceprint feature matching degrees between the shared part and each non-shared part of adjacent segments are calculated respectively; a switching point is acquired based on the voiceprint feature matching degrees; and the classroom sound is divided into a plurality of sound clips according to the switching points.
The specific method for acquiring a switching point is as follows: the voiceprint feature matching degrees between the shared part of two adjacent segments and each of their non-shared parts are compared; the non-shared part with the higher matching degree to the shared part is judged to belong to the same speaker as the shared part, so the switching point is the boundary between the shared part and the other non-shared part.
The above is illustrated with reference to fig. 2, in which the axis is the time axis. Collection of the classroom sound starts at the beginning of the class, which is marked as 0 seconds on the axis. Every 2 seconds of classroom sound is taken as one segment, 1 second of sound is repeated between adjacent segments, and the repeated sound serves as the shared part between the adjacent segments.
In fig. 2, 4 segments are shown: segment A [0s, 2s], segment B [1s, 3s], segment C [2s, 4s] and segment D [3s, 5s]. Taking segments B and C, which together cover 1s to 4s, the shared part of segments B and C is [2s, 3s], the non-shared part of segment B is [1s, 2s], and the non-shared part of segment C is [3s, 4s].
Voiceprint extraction is performed on the [1s, 2s], [2s, 3s] and [3s, 4s] parts respectively to obtain 3 voiceprint feature vectors. The voiceprint feature matching degree between the voiceprint feature vector of the [2s, 3s] part and that of the [1s, 2s] part is calculated and taken as the first matching degree, and the voiceprint feature matching degree between the voiceprint feature vector of the [2s, 3s] part and that of the [3s, 4s] part is calculated and taken as the second matching degree.
It is then judged whether the first matching degree is larger than the second matching degree; if so, the 3rd second is taken as the switching point, and if not, the 2nd second is taken as the switching point.
It is noted that if the first matching degree is equal to the second matching degree, segment B and segment C are judged to be segments of the same person speaking and there is no switching point between segment B and segment C. In practice, however, the first matching degree is rarely exactly equal to the second matching degree, so with a high probability the classroom sound is considered to contain switching points and is cut into a plurality of sound clips.
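For illustration only, a minimal sketch of the switching-point decision described above, using segments B and C as the example. The matching degree is taken here to be cosine similarity and `extract_voiceprint` stands in for the trained voiceprint model; both are assumptions, since the present application does not fix the matching measure or the model interface.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity used as the voiceprint feature matching degree (assumption)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_switch_point(non_shared_left, shared, non_shared_right,
                      shared_start, shared_end, extract_voiceprint):
    """E.g. non_shared_left = [1s,2s] of B, shared = [2s,3s], non_shared_right = [3s,4s] of C."""
    v_shared = extract_voiceprint(shared)
    first_degree = cosine(v_shared, extract_voiceprint(non_shared_left))
    second_degree = cosine(v_shared, extract_voiceprint(non_shared_right))
    if first_degree == second_degree:
        return None                       # same speaker in both segments: no switching point
    # Shared part matches the left side better: the switch happens at the 3rd second;
    # otherwise it happens at the 2nd second.
    return shared_end if first_degree > second_degree else shared_start
```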
It is judged whether the class is over; if so, the method proceeds to the following step S102.
The specific method for judging that the class has ended is as follows: judge whether classroom sound is present; if not, start timing, and judge whether the timed duration exceeds a preset duration; if so, the class is judged to have ended.
The application also provides another possible embodiment for dividing classroom sound into multiple segments of sound, specifically as follows:
The method comprises the steps of dividing the classroom sound into a plurality of segments, determining the adjacent segments in which a sound change occurs, calculating the voiceprint feature matching degrees between the shared part and the non-shared parts of those adjacent segments, acquiring switching points based on the voiceprint feature matching degrees, and dividing the classroom sound into a plurality of sound clips according to the switching points.
The specific method for determining the adjacent segments in which a sound change occurs is as follows. For 4 consecutive segments there are, in order, 3 groups of adjacent segments; the voiceprint feature matching degree between each group of adjacent segments is calculated, with the matching degree of the first group defined as the first voiceprint feature matching degree, that of the second group as the second voiceprint feature matching degree, and that of the third group as the third voiceprint feature matching degree. It is then judged whether the second voiceprint feature matching degree minus the first voiceprint feature matching degree is smaller than a first preset value and the third voiceprint feature matching degree minus the second voiceprint feature matching degree is larger than a second preset value; if so, the second group of adjacent segments is judged to be the adjacent segments in which a sound change occurs. The switching point within these adjacent segments is then obtained in the same way as in the first embodiment for dividing the classroom sound into sound clips, and is not repeated here.
For example, referring to fig. 2, segment A and segment B are the first group of adjacent segments, segment B and segment C are the second group of adjacent segments, and segment C and segment D are the third group of adjacent segments. The first voiceprint feature matching degree is calculated based on the voiceprint feature vectors of segment A and segment B, the second based on those of segment B and segment C, and the third based on those of segment C and segment D; by comparing the first, second and third voiceprint feature matching degrees, segment B and segment C are judged to be the adjacent segments in which a sound change occurs.
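For illustration only, a small sketch of the decision rule in this alternative embodiment, flagging the second group of adjacent segments when its matching degree drops and the following group's matching degree rises again. The two preset values are illustrative placeholders, not values fixed by the present application.

```python
def sound_change_in_second_pair(m1, m2, m3, first_preset=-0.1, second_preset=0.1):
    """m1, m2, m3: matching degrees of the 1st, 2nd and 3rd groups of adjacent segments."""
    # A sound change is judged to occur in the second group (e.g. segments B and C)
    # when the matching degree falls and then recovers.
    return (m2 - m1) < first_preset and (m3 - m2) > second_preset
```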
It is noted that, the classroom sound may be obtained in real time during the course of a classroom, and the obtained classroom sound may be input into the voiceprint model for processing, or the classroom sound of a whole class may be input into the voiceprint model for processing after the end of the classroom.
Step S102, judging whether the sound clip corresponding to the voiceprint vector is a non-teacher sound according to the voiceprint vector; if so, proceeding to step S103.
In this embodiment, a method combining the BIRCH clustering algorithm and the Calinski-Harabasz index is adopted to perform voiceprint clustering on all the voiceprint vectors obtained in step S101 and to judge whether the sound clip corresponding to each voiceprint vector is a non-teacher sound.
The role of the Calinski-Harabasz index is to measure the quality of the voiceprint clustering effect: the larger the Calinski-Harabasz index, the more compact each class is and the more separated the classes are from one another, and the better the clustering result.
Specifically, the BIRCH clustering algorithm is adopted to cluster all voiceprint vectors and divide them into a first class and a second class; the BIRCH clustering algorithm is then adopted to perform secondary clustering on all the voiceprint vectors in the first class and on all the voiceprint vectors in the second class respectively, and a first index and a second index are obtained, wherein the first index is the Calinski-Harabasz index after the secondary clustering of all voiceprint vectors in the first class, and the second index is the Calinski-Harabasz index after the secondary clustering of all voiceprint vectors in the second class.
Judging whether the first index is larger than the second index, if so, judging that the sound segment corresponding to the voiceprint vector in the first class is non-teacher sound, and if not, judging that the sound segment corresponding to the voiceprint vector in the first class is teacher sound, and the sound segment corresponding to the voiceprint vector in the second class is non-teacher sound.
It is noted that since there is a relatively significant difference between teacher sound and non-teacher sound, there is substantially no case where the first index is equal to the second index.
The principle of the BIRCH clustering algorithm is to build a tree structure (a clustering feature tree) based on its parameters; when judging whether two voiceprint vectors belong to the same class, the sample distance threshold parameter used here is the voiceprint feature similarity.
The voiceprint feature similarity is preset, and all voiceprint vectors are clustered into two classes, the first class and the second class, according to this voiceprint feature similarity. At this point it is not yet clear which class of voiceprint vectors corresponds to sound clips that are non-teacher sound. Therefore, secondary clustering is performed on all the voiceprint vectors in the first class and on all the voiceprint vectors in the second class respectively, and the first index and the second index are obtained; the sound clips corresponding to the voiceprint vectors in the class with the larger index are non-teacher sound, and the sound clips corresponding to the voiceprint vectors in the class with the smaller index are teacher sound.
The first index and the second index are computed with the same general formula, specifically:

S = \frac{SSB/(K-1)}{SSW/(N-K)}

where S denotes the Calinski-Harabasz index, K denotes the number of clustering classes (the present application clusters all voiceprint vectors into 2 classes, so K = 2), N denotes the total number of samples (a sample is a voiceprint vector, so the total number of samples is the number of voiceprint vectors), SSB denotes the between-class separation, and SSW denotes the within-class compactness.

The expression of the within-class compactness is:

SSW = \sum_{k=1}^{K} \sum_{c=1}^{|C_k|} \left\| X_{kc} - \bar{X}_k \right\|^2

The expression of the between-class separation is:

SSB = \sum_{k=1}^{K} |C_k| \left\| \bar{X}_k - \bar{X} \right\|^2

where |C_k| denotes the number of samples in the k-th class, X_{kc} denotes the feature vector of the c-th sample of the k-th class, \bar{X}_k denotes the mean feature vector of the k-th class, and \bar{X} denotes the mean feature vector of all samples.
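For illustration only, the index defined above can be computed directly from these expressions; a small NumPy sketch follows, and sklearn.metrics.calinski_harabasz_score computes the same quantity.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """X: (N, dim) voiceprint vectors; labels: class label of each sample."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    K, N = len(classes), len(X)
    overall_mean = X.mean(axis=0)                      # mean feature vector of all samples
    ssb = ssw = 0.0
    for k in classes:
        Xk = X[labels == k]
        class_mean = Xk.mean(axis=0)                   # mean feature vector of class k
        ssb += len(Xk) * np.sum((class_mean - overall_mean) ** 2)   # between-class separation
        ssw += np.sum((Xk - class_mean) ** 2)                       # within-class compactness
    return (ssb / (K - 1)) / (ssw / (N - K))
```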
The application also provides another possible embodiment, in particular as follows:
A plurality of voiceprint feature similarities are preset, and all voiceprint vectors are clustered several times with these similarities to obtain several clustering results. The Calinski-Harabasz index of each clustering result is calculated. The clustering result corresponding to the largest Calinski-Harabasz index is selected as the optimal clustering result, and the two classes in the optimal clustering result are the first class and the second class respectively.
For example, referring to fig. 3, four voiceprint feature similarities are preset based on manual experience, and all the voiceprint vectors are clustered four times with these four similarities. With a voiceprint feature similarity of 0.35, all voiceprint vectors are clustered into 2 classes, class a and class b, and the Calinski-Harabasz index of this clustering is the first clustering index; with a similarity of 0.5, all voiceprint vectors are clustered into class c and class d, and the Calinski-Harabasz index of this clustering is the second clustering index; with a similarity of 0.65, all voiceprint vectors are clustered into class e and class f, and the Calinski-Harabasz index of this clustering is the third clustering index; with a similarity of 0.8, all voiceprint vectors are clustered into class g and class h, and the Calinski-Harabasz index of this clustering is the fourth clustering index. By comparison, the second clustering index is the largest, so a voiceprint feature similarity of 0.5 is the best of the four similarities 0.35, 0.5, 0.65 and 0.8; class c is taken as the first class and class d as the second class.
It should be noted that fig. 3 illustrates only one case of a method for determining non-teacher sound and teacher sound, and the numerical values and the calculation results are merely used for illustration, and do not limit the protection scope of the present application.
Through the above content, a first round of selection is performed on the voiceprint feature similarity. Further, a second round of selection, or even more rounds, may be performed so as to select a better voiceprint feature similarity. For example, if a voiceprint feature similarity of 0.5 is selected in the first round, voiceprint feature similarities of 0.44, 0.47, 0.5, 0.53 and 0.56 are set for the second round; the principle of the second round is the same as that of the first round and is not repeated here. Assuming the result of the second round is a voiceprint feature similarity of 0.53, the clustering result obtained by voiceprint clustering with a similarity of 0.53 is accordingly taken as the optimal clustering result, and the two classes in the optimal clustering result are the first class and the second class respectively.
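For illustration only, a minimal sketch of this multi-round selection, again assuming scikit-learn's Birch (whose threshold is a distance radius standing in for the voiceprint feature similarity) and using the candidate values from the example above; a second, finer round simply calls the same function with a narrower candidate list.

```python
from sklearn.cluster import Birch
from sklearn.metrics import calinski_harabasz_score

def best_clustering(voiceprints, candidates=(0.35, 0.5, 0.65, 0.8)):
    """Cluster with each candidate threshold and keep the result with the largest index."""
    scored = []
    for t in candidates:
        labels = Birch(n_clusters=2, threshold=t).fit_predict(voiceprints)
        scored.append((calinski_harabasz_score(voiceprints, labels), t, labels))
    return max(scored)    # (largest clustering index, best threshold, its labels)

# Second round around the first-round winner, e.g. 0.5:
# best_clustering(voiceprints, candidates=(0.44, 0.47, 0.5, 0.53, 0.56))
```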
Secondary clustering is then performed on all the voiceprint vectors in the first class and on all the voiceprint vectors in the second class respectively, and it is judged which class is non-teacher sound and which is teacher sound; the judging method follows the same principle as judging non-teacher sound and teacher sound by the first index and the second index described above, and is not repeated here.
Step S103, extracting features of the sound fragments corresponding to the voiceprint vectors to obtain a second spectrogram, inputting the second spectrogram into a trained sound classification model, and classifying the sound fragments corresponding to the voiceprint vectors according to the voiceprint vectors, wherein the second spectrogram is a two-dimensional Mel spectrogram.
The training method of the sound classification model specifically comprises the following steps:
Each training sample in the training sample set is converted into a first spectrogram, wherein the first spectrogram is a two-dimensional Mel spectrogram, the first spectrogram is input into a sound classification model, and the sound classification model is trained by utilizing a VGG11 network structure.
The method for acquiring the training sample set is as follows: sound in the classroom is collected by the audio acquisition equipment installed in the classroom, and the classroom sound is extracted into a plurality of sound segments, each of which serves as a training sample. Each training sample is labeled manually, and the label of each training sample is one of the sound of a single student, the sound of several students reading aloud together, discussion sound, noise, and the like. The final number of labeled samples in this embodiment is 69000, of which 35000 are the sound of a single student, 9000 are the sound of several students reading aloud together, 12000 are discussion sound, and 13000 are noise.
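For illustration only, a hedged sketch of the training step: a torchvision VGG11 whose final classifier layer is replaced so that it outputs the four labels above, with each Mel spectrogram fed as a single-channel image repeated to three channels. A recent torchvision version, the optimizer, the learning rate and the number of epochs are all assumptions, not requirements of the present application.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg11

def build_sound_classifier(num_classes=4):
    model = vgg11(weights=None)                         # VGG11 trained from scratch
    model.classifier[6] = nn.Linear(4096, num_classes)  # replace the 1000-class head
    return model

def train(model, loader, epochs=10, lr=1e-4, device="cpu"):
    """loader yields (mel, label); mel is a (batch, 1, n_mels, frames) float tensor."""
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for mel, label in loader:
            x = mel.repeat(1, 3, 1, 1).to(device)       # 1 channel -> 3 channels for VGG
            loss = loss_fn(model(x), label.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```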
In this embodiment, feature extraction is performed on each sound clip judged to be non-teacher sound to obtain a second spectrogram; the second spectrogram is input into the trained sound classification model, which classifies the sound according to the second spectrogram and outputs the classification result, the classification result being one of the sound of a single student, the sound of several students reading aloud together, discussion sound and noise.
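For illustration only, a minimal inference sketch tying the steps together for one sound clip already judged to be non-teacher sound; `to_mel_spectrogram` and the model are the sketches above, and the label order is an assumption that must match the labels used during training.

```python
import torch

LABELS = ["single student", "students reading together", "discussion", "noise"]

def classify_clip(wav_path, model, to_mel_spectrogram, device="cpu"):
    mel = torch.tensor(to_mel_spectrogram(wav_path)).float()        # second spectrogram
    x = mel.unsqueeze(0).unsqueeze(0).repeat(1, 3, 1, 1).to(device)  # (1, 3, n_mels, frames)
    model.eval()
    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()
    return LABELS[pred]
```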
In summary, the application utilizes the voice data of the whole class and combines the voiceprint model and the voice classification model to automatically distinguish different voices in the class in a zero-interaction manner, thereby more efficiently and accurately carrying out teaching analysis of the class.
In order to better implement the method, the embodiment of the application also provides a device for distinguishing different sounds in a class, which can be particularly integrated in computer equipment, such as a terminal or a server, and the terminal can include, but is not limited to, mobile phones, tablet computers or desktop computers.
Fig. 4 is a block diagram of a device for distinguishing different sounds in a class according to an embodiment of the present application, and as shown in fig. 4, the device mainly includes:
the collection module 201, configured to collect classroom sound and input the classroom sound into the trained voiceprint model to obtain voiceprint vectors of a plurality of sound clips;
a judging module 202, configured to judge, according to the voiceprint vector, whether the sound clip corresponding to the voiceprint vector is a non-teacher sound, and if so, transfer to the classification module; and
The classification module 203 is configured to perform feature extraction on a sound segment corresponding to the voiceprint vector to obtain a second spectrogram, input the second spectrogram into a trained sound classification model, and classify the sound segment corresponding to the voiceprint vector according to the voiceprint vector;
the training method of the sound classification model comprises the following steps:
Converting each training sample in the training sample set into a first spectrogram, wherein the first spectrogram is a two-dimensional Mel spectrogram;
and inputting the first spectrogram into the sound classification model, and training the sound classification model by utilizing the VGG11 network structure.
The various modifications and specific examples of the method provided in the foregoing embodiments are also applicable to the device for distinguishing different sounds in a class of this embodiment. From the foregoing detailed description of the method for distinguishing different sounds in a class, those skilled in the art can clearly know the implementation of the device of this embodiment, so for brevity it is not described in detail here.
To better execute the program of the above method, the embodiment of the present application further provides a computer device, as shown in fig. 5, where the computer device 300 includes a memory 301 and a processor 302.
The computer device 300 may be implemented in a variety of forms including a cell phone, tablet computer, palmtop computer, notebook computer, desktop computer, and the like.
Wherein the memory 301 may be used to store instructions, programs, code sets, or instruction sets. The memory 301 may include a stored program area that may store instructions for implementing an operating system, instructions for at least one function (such as determining whether a sound clip corresponding to a voiceprint vector is a non-teacher sound, classifying a sound clip corresponding to a voiceprint vector, and the like), instructions for implementing a method of distinguishing different sounds in a class provided in the above-described embodiment, and a stored data area that may store data involved in the method of distinguishing different sounds in a class provided in the above-described embodiment, and the like.
Processor 302 may include one or more processing cores. The processor 302 performs the various functions of the present application and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 301 and invoking the data stored in the memory 301. The processor 302 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller and a microprocessor. It will be appreciated that, for different devices, the electronics implementing the functions of the processor 302 described above may be different, and the embodiments of the present application are not particularly limited in this respect.
The embodiment of the present application provides a computer-readable storage medium, which includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk. The computer-readable storage medium stores a computer program that can be loaded by a processor to execute the method of distinguishing different sounds in a class of the above-described embodiments.
The present application is not limited to the specific embodiments described above. Having read the present specification, those skilled in the art may make modifications to the embodiments as necessary without creative contribution, and such modifications are protected by patent law as long as they fall within the scope of the claims of the present application.