
Teaching mode analysis method and system

Info

Publication number
CN112599135A
Authority
CN
China
Prior art keywords
classroom
teacher
ubm
vector
speaking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011473387.4A
Other languages
Chinese (zh)
Inventor
刘三女牙
陈增照
陈荣
易宝林
戴志诚
郑秋雨
张婧
王梦珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202011473387.4A
Publication of CN112599135A
Legal status: Pending


Abstract

The invention provides a teaching mode analysis method and system, comprising: detecting active sound in classroom audio and marking the start time and end time of each active-sound segment; cutting the classroom audio according to these start and end times to obtain multiple segments of active audio; extracting the features of the different speakers and their speaking durations in each active-audio segment based on a combined Mel-frequency cepstral coefficient (MFCC) vector; classifying the speaker features as teacher speech or student speech with a pre-trained universal background model (UBM) and determining the corresponding teacher and student speaking durations; and judging, from the proportion of teacher speaking time in the total class time, whether the teaching mode of the class is a practice-oriented classroom, a lecture-oriented classroom or a mixed classroom. The invention uses artificial intelligence techniques to extract teacher-student spoken interaction data from classroom audio and analyzes the teaching mode of the teacher's classroom.

Description

Teaching mode analysis method and system
Technical Field
The invention belongs to the field where teaching activities and artificial intelligence intersect, and particularly relates to a teaching mode analysis method and system.
Background
With the rapid development of education informatization, teaching activities and artificial intelligence technology are ever more closely integrated, yet the teaching evaluation stage is still stuck in traditional manual labeling and counting, lacking intelligent strategies as well as convenience, effectiveness and objectivity. Real-time teaching mode analysis can help the lecturing teacher reflect on teaching behaviors and teaching methods in time and summarize and correct problems and deficiencies in the teaching process, so that deep, direct and effective teaching activities can be carried out, the teacher's professional development is supported, and teaching quality improves. Against the background of the education informatization era, combining teaching analysis with artificial intelligence technology addresses the problems of traditional teaching mode analysis methods and at the same time serves the goals of improving teachers' professional ability and promoting teaching quality.
Teaching mode analysis is highly significant for teaching evaluation. Although many researchers at home and abroad have proposed a series of mature quantitative analysis methods for teacher and student behavior, such as the S-T analysis method, the teacher-student speech interaction behaviors that must be distinguished during the data processing stage still have to be manually identified and labeled in the classroom teaching audio, and a system tool that can automatically analyze the teaching mode is lacking.
In summary, the defects of the existing teaching mode analysis method mainly include the following points:
1) Teaching mode analysis based on traditional measurement and evaluation methods has a rich theoretical basis, but it cannot be popularized because its data processing is complex, subjective and time-consuming; as a result, related authoritative scales and evaluation indexes are few, and research is difficult to advance.
2) Teaching mode analysis is costly and highly subjective, and lacks objective, automated data processing models and analysis tools. Traditional teaching mode analysis requires comprehensive manual judgment of classroom audio and, in particular, the processing of a large amount of data, which reflects the complexity and difficulty of the existing approaches.
3) Traditional teaching behavior analysis systems need to call third-party API tools in the speech detection and cutting stage to convert the audio into a JSON file of timestamp nodes, from which audio time breakpoints are obtained for cutting. This approach cannot guarantee the accuracy of the cutting, is usually a paid service, and is neither economical nor sustainable.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a teaching mode analysis method and a teaching mode analysis system, intended to solve the problems that existing teaching mode analysis methods require comprehensive manual judgment of classroom audio, in particular the processing of a large amount of data, and cannot guarantee the accuracy and economy of the cutting in the voice detection and cutting stage.
In order to achieve the above object, in a first aspect, the present invention provides a teaching mode analysis method, including the following steps:
detecting active sounds in classroom audio, and marking the starting time and the ending time of each section of active sound; cutting the teaching audio according to the starting time and the ending time of each section of active audio to obtain a plurality of sections of active audio; the active tone refers to non-silent audio;
extracting different speaker characteristics and the time lengths of different speakers in each section of active audio based on the combined Mel frequency cepstrum characteristic MFCC vector; the combined MFCC vector is obtained by transversely splicing an MFCC, a first-order differential MFCC vector and a second-order differential MFCC vector;
respectively judging the characteristics of different speakers as teacher speaking and student speaking based on a pre-trained universal background model UBM, and determining the corresponding speaking time of the teacher and the speaking time of the students; the pre-trained UBM can fit the characteristics of different speakers, including teachers and students;
judging whether the teaching mode of the classroom is an exercise classroom, a lecture classroom or a mixed classroom according to the proportion of the speaking time of the teacher to the total classroom time; when the ratio of the speaking time of the teacher to the total classroom time is lower than a first threshold, the teaching mode is an exercise classroom; when the ratio of the speaking time of the teacher to the total classroom time is greater than a second threshold, the teaching mode is a lecture classroom; otherwise, the teaching mode is considered as a mixed classroom; the first threshold is less than a second threshold.
In an alternative embodiment, a Gaussian mixture model GMM is used to detect both spoken and non-spoken portions of classroom audio; wherein, the speaking part is active sound, and the non-speaking part is mute.
In an optional embodiment, classifying the different speaker features as teacher speech or student speech based on the pre-trained universal background model UBM specifically comprises:
extracting different speaker characteristics of a plurality of real classroom audios based on the combined MFCC vector, and training a UBM based on the different speaker characteristics of the real classroom audios; the UBM can fit the characteristics of a large number of speakers, the characteristic data of a target speaker is scattered around the Gaussian distribution of the UBM, and each Gaussian distribution in the UBM is shifted to the characteristic data of the target speaker through a MAP adaptive algorithm;
extracting collected teacher voice fragments of a plurality of real classroom audios based on the combined MFCC vector, and training a corresponding teacher GMM model on the basis of UBM; the teacher GMM model is a Gaussian mixture model trained by features extracted from audio of a teacher and used for simulating continuous probability distribution of voice vector features of the teacher;
and scoring the different speaker features against the teacher GMM model with the scoring method provided by the GMM and the UBM; if the score is higher than a preset threshold, the corresponding speaker feature is judged to be teacher speech, otherwise it is judged to be student speech.
In an optional embodiment, the feature data of the target speaker is scattered around the gaussian distribution of UBM, and each gaussian distribution in UBM is shifted to the feature data of the target speaker by a MAP adaptive algorithm, specifically:
calculating, for the target speaker feature data vector set $X = (x_1, x_2, \ldots, x_T)$, the similarity $\Pr(i \mid x_t)$ of each feature vector $x_t$ to the $i$-th Gaussian component:

$$\Pr(i \mid x_t) = \frac{\omega_i \, p_i(x_t)}{\sum_{j=1}^{M} \omega_j \, p_j(x_t)}$$

wherein $x_t$ denotes the feature vector of the target speaker at time $t$, $\omega_i$ denotes the weight of the $i$-th Gaussian mixture component, $p_i(x_t)$ denotes the probability score of the vector in the target speaker's feature vector sequence with respect to the $i$-th UBM mixture component, $M$ denotes the number of Gaussian mixture components, $\omega_j$ denotes the weight of the $j$-th mixture component, and $p_j(x_t)$ denotes the probability score with respect to the $j$-th UBM mixture component;

obtaining the new universal background model UBM mean $E_i(x)$ and variance $E_i(x^2)$ parameters according to the similarity:

$$n_i = \sum_{t=1}^{T} \Pr(i \mid x_t)$$

$$E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t$$

$$E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t^2$$

wherein $n_i$ denotes the number of frames of the target speech assigned to the $i$-th Gaussian mixture component;

and fusing the new parameters obtained in the previous step with the original parameters of the UBM model to obtain the final GMM Gaussian mixture model of the target speaker:

$$\hat{\omega}_i = \left[ a_i^{\omega} \frac{n_i}{T} + \left(1 - a_i^{\omega}\right) \omega_i \right] \gamma$$

$$\hat{\mu}_i = a_i^{m} E_i(x) + \left(1 - a_i^{m}\right) \mu_i$$

$$\hat{\sigma}_i^2 = a_i^{v} E_i(x^2) + \left(1 - a_i^{v}\right) \left( \sigma_i^2 + \mu_i^2 \right) - \hat{\mu}_i^2$$

wherein $a_i^{\omega}$, $a_i^{m}$ and $a_i^{v}$ denote the weight, mean and variance correction factors of the Gaussian components of the universal background model, $\mu_i$ and $\sigma_i^2$ denote the mean and variance of the universal background model before the parameter update, $\hat{\omega}_i$, $\hat{\mu}_i$ and $\hat{\sigma}_i^2$ denote the weight, mean and variance after the parameter update, $T$ denotes the number of frames of the training speech, and $\gamma$ denotes a relation factor constraining the scale of the correction so that all mixture weights sum to 1; $a_i^{\omega}$, $a_i^{m}$, $a_i^{v}$ adjust the new parameters of the UBM so that each Gaussian distribution in the UBM is shifted towards the feature data of the target speaker.
In an alternative embodiment, the judgment result of the teaching mode is visualized through a PyQt5 interactive visualization GUI design tool; the visual result comprises a classroom utterance timing diagram and a classroom utterance distribution diagram;
in the class speaking timing sequence diagram, the horizontal axis represents the duration of a class in minutes, and the vertical axis represents the speaking duration of a teacher or a student in each minute of the class in seconds;
in the class utterance profile, the total time and respective proportion of the teacher utterance, the student utterance, and silence of the whole class are shown in the form of a pie chart.
In a second aspect, the present invention provides a teaching mode analysis system, comprising:
the classroom audio detection unit is used for detecting active sounds in classroom audio and marking the starting time and the ending time of each section of active sound; cutting the teaching audio according to the starting time and the ending time of each section of active audio to obtain a plurality of sections of active audio; the active tone refers to non-silent audio;
the speaker characteristic extraction unit is used for extracting different speaker characteristics and the time lengths of different speakers in each section of active audio based on the combined Mel cepstrum characteristic MFCC vector; the combined MFCC vector is obtained by transversely splicing an MFCC, a first-order differential MFCC vector and a second-order differential MFCC vector;
the speaking duration determining unit is used for respectively judging the characteristics of different speakers as teacher speaking and student speaking based on a pre-trained universal background model UBM and determining the corresponding teacher speaking duration and student speaking duration; the pre-trained UBM can fit the characteristics of different speakers, including teachers and students;
the teaching mode judging unit is used for judging whether the teaching mode of the classroom is an exercise classroom, a lecture classroom or a mixed classroom according to the proportion of the speaking time of the teacher to the total classroom time; when the ratio of the speaking time of the teacher to the total classroom time is lower than a first threshold, the teaching mode is an exercise classroom; when the ratio of the speaking time of the teacher to the total classroom time is greater than a second threshold, the teaching mode is a lecture classroom; otherwise, the teaching mode is considered as a mixed classroom; the first threshold is less than a second threshold.
In an optional embodiment, the classroom audio detection unit adopts a Gaussian mixture model GMM to detect an utterance part and a non-utterance part in classroom audio; wherein, the speaking part is active sound, and the non-speaking part is mute.
In an alternative embodiment, the speaker feature extraction unit extracts different speaker features of the collected multiple real classroom audios based on the combined MFCC vectors;
the speaking duration determining unit trains the UBM based on the different speaker features of the multiple real classroom audios; the UBM can fit the features of a large number of speakers, the feature data of a target speaker is scattered around the Gaussian distributions of the UBM, and each Gaussian distribution in the UBM is shifted towards the feature data of the target speaker through a MAP adaptive algorithm; teacher voice segments are extracted from the collected real classroom audios based on the combined MFCC vector, and a corresponding teacher GMM model is trained on the basis of the UBM; the teacher GMM model is a Gaussian mixture model trained on features extracted from the teacher's audio and used to model the continuous probability distribution of the teacher's voice feature vectors; the different speaker features are scored against the teacher GMM model with the scoring method provided by the GMM and the UBM; if the score is higher than a preset threshold the corresponding speaker feature is judged to be teacher speech, otherwise it is judged to be student speech.
In an alternative embodiment, the speaking duration determining unit calculates, for the target speaker feature data vector set $X = (x_1, x_2, \ldots, x_T)$, the similarity $\Pr(i \mid x_t)$ of each feature vector $x_t$ to the $i$-th Gaussian component:

$$\Pr(i \mid x_t) = \frac{\omega_i \, p_i(x_t)}{\sum_{j=1}^{M} \omega_j \, p_j(x_t)}$$

wherein $x_t$ denotes the feature vector of the target speaker at time $t$, $\omega_i$ denotes the weight of the $i$-th Gaussian mixture component, $p_i(x_t)$ denotes the probability score of the vector in the target speaker's feature vector sequence with respect to the $i$-th UBM mixture component, $M$ denotes the number of Gaussian mixture components, $\omega_j$ denotes the weight of the $j$-th mixture component, and $p_j(x_t)$ denotes the probability score with respect to the $j$-th UBM mixture component;

obtains the new universal background model UBM mean $E_i(x)$ and variance $E_i(x^2)$ parameters according to the similarity:

$$n_i = \sum_{t=1}^{T} \Pr(i \mid x_t)$$

$$E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t$$

$$E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t^2$$

wherein $n_i$ denotes the number of frames of the target speech assigned to the $i$-th Gaussian mixture component;

and fuses the new parameters obtained in the previous step with the original parameters of the UBM model to obtain the final GMM Gaussian mixture model of the target speaker:

$$\hat{\omega}_i = \left[ a_i^{\omega} \frac{n_i}{T} + \left(1 - a_i^{\omega}\right) \omega_i \right] \gamma$$

$$\hat{\mu}_i = a_i^{m} E_i(x) + \left(1 - a_i^{m}\right) \mu_i$$

$$\hat{\sigma}_i^2 = a_i^{v} E_i(x^2) + \left(1 - a_i^{v}\right) \left( \sigma_i^2 + \mu_i^2 \right) - \hat{\mu}_i^2$$

wherein $a_i^{\omega}$, $a_i^{m}$ and $a_i^{v}$ denote the weight, mean and variance correction factors of the Gaussian components of the universal background model, $\mu_i$ and $\sigma_i^2$ denote the mean and variance of the universal background model before the parameter update, $\hat{\omega}_i$, $\hat{\mu}_i$ and $\hat{\sigma}_i^2$ denote the weight, mean and variance after the parameter update, $T$ denotes the number of frames of the training speech, and $\gamma$ denotes a relation factor constraining the scale of the correction so that all mixture weights sum to 1; $a_i^{\omega}$, $a_i^{m}$, $a_i^{v}$ adjust the new parameters of the UBM so that each Gaussian distribution in the UBM is shifted towards the feature data of the target speaker.
In an optional embodiment, the system further comprises: the visualization unit is used for visualizing the judgment result of the teaching mode through a PyQt5 interactive visualization GUI design tool; the visual result comprises a classroom utterance timing diagram and a classroom utterance distribution diagram; in the class speaking timing sequence diagram, the horizontal axis represents the duration of a class in minutes, and the vertical axis represents the speaking duration of a teacher or a student in each minute of the class in seconds; in the class utterance profile, the total time and respective proportion of the teacher utterance, the student utterance, and silence of the whole class are shown in the form of a pie chart.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
the invention provides a teaching mode analysis method and a teaching mode analysis system. The voice activity detection algorithm and the speaker recognition algorithm are utilized to detect, cut and recognize teaching audio, the recognition and analysis result is visualized through a PyQt5 interactive visual GUI design tool, the analysis result can help teachers and students to find the interactive frequency of words in the classroom teaching process, the teaching method is improved, and the teaching effect is improved.
The invention provides a teaching mode analysis method and a system, based on a traditional GMM-UNM speaker recognition model, a VAD activity voice detection algorithm is adopted to detect words and non-words in the process of preprocessing classroom audio to obtain the starting and ending time stamps of each segment, the classroom audio is cut according to the time stamps, a method which combines an artificial intelligence technology and can visualize the teaching mode is provided, and the distribution of words of teachers and students in the classroom and the word change curve chart of the teachers and students can be visually displayed by leading in the classroom audio.
The invention provides a teaching mode analysis method and a system, and provides a teaching mode analysis method, which is characterized by taking lesson time as a horizontal axis and teacher or student speaking time length in unit time as a vertical axis, and a class teacher-student speaking proportion pie chart.
Drawings
FIG. 1 is a flow chart of a teaching mode analysis method provided by an embodiment of the present invention;
FIG. 2 is a flow chart of teaching mode analysis provided by an embodiment of the present invention;
FIG. 3 is a flow chart of speaker identification provided by an embodiment of the present invention;
FIG. 4 is a timing diagram of classroom utterances provided by embodiments of the present invention;
FIG. 5 is a classroom utterance profile provided by an embodiment of the present invention;
fig. 6 is an architecture diagram of a teaching mode analysis system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention detects, cuts and identifies speech and non-speech in classroom teaching audio based on a Voice Activity Detection (VAD) algorithm and a Gaussian mixture-universal background model (GMM-UBM), and classifies the recognition results into three identities: Q (silence), S (student speech) and T (teacher speech), thereby automatically analyzing the classroom teaching mode; the final result is displayed as a pie chart and a curve chart of teacher speech, student speech and silence.
Fig. 1 is a flowchart of a teaching mode analysis method provided in an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
s101, detecting active sounds in classroom audio, and marking the starting time and the ending time of each section of active sound; cutting the teaching audio according to the starting time and the ending time of each section of active audio to obtain a plurality of sections of active audio; the active tone refers to non-silent audio;
s102, extracting different speaker characteristics and durations of different speakers in each section of active audio based on the combined Mel frequency cepstrum characteristics MFCC vector; the combined MFCC vector is obtained by transversely splicing an MFCC, a first-order differential MFCC vector and a second-order differential MFCC vector;
s103, respectively judging the characteristics of different speakers as teacher speaking and student speaking based on a pre-trained universal background model UBM, and determining the corresponding teacher speaking duration and student speaking duration; the pre-trained UBM can fit the characteristics of different speakers, including teachers and students;
s104, judging whether the teaching mode of the classroom is an exercise classroom, a lecture classroom or a mixed classroom according to the proportion of the speaking time of the teacher to the total classroom time; when the ratio of the speaking time of the teacher to the total classroom time is lower than a first threshold, the teaching mode is an exercise classroom; when the ratio of the speaking time of the teacher to the total classroom time is greater than a second threshold, the teaching mode is a lecture classroom; otherwise, the teaching mode is considered as a mixed classroom; the first threshold is less than a second threshold.
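As a minimal sketch of the decision rule in S104 (the threshold values below are illustrative assumptions; the invention only requires that the first threshold be smaller than the second):

```python
def classify_teaching_mode(teacher_seconds: float, total_seconds: float,
                           first_threshold: float = 0.3,    # assumed value
                           second_threshold: float = 0.7):  # assumed value
    """Classify a lesson from the proportion of teacher speaking time (step S104)."""
    ratio = teacher_seconds / total_seconds
    if ratio < first_threshold:
        return "exercise classroom"
    if ratio > second_threshold:
        return "lecture classroom"
    return "mixed classroom"

# Example: 28 minutes of teacher speech in a 45-minute lesson -> "mixed classroom"
print(classify_teaching_mode(28 * 60, 45 * 60))
```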
The teaching mode analysis method of the invention is divided into three parts: voice activity detection and segmentation, speaker recognition, and classroom mode analysis and visualization. The speaker recognition process comprises three steps: feature extraction, model building and result prediction. The overall processing flow is shown in figure 2. The classroom teaching audio is first imported into the teaching mode analysis system; the classroom audio is detected and cut with the voice activity detection algorithm; speaker recognition is performed on each audio segment; the segments are divided into three types, namely silence, teacher speech and student speech, according to the recognition results; and finally the teaching mode analysis result is visualized.
1. VAD voice activity detection algorithm
The VAD detection algorithm detects the active and silent parts of the audio frame by frame, marking active sound as 1 and silence as 0 and recording the start and end times. In the algorithm implementation, an active speech collector with a padded sliding window over the audio frames is designed: the collector triggers and starts collecting audio frames when more than 90% of the frames in the window are voiced, and stops collecting when more than 90% of the frames in the window are unvoiced. The timestamps are then stored in a txt file, producing a list of start and end timestamps used in the subsequent audio cutting and result visualization stages.
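A minimal sketch of such a padded sliding-window collector, assuming 16 kHz 16-bit mono PCM input and the third-party webrtcvad package as the frame-level voiced/unvoiced detector (the patent does not name a particular VAD implementation):

```python
import collections
import webrtcvad

def voiced_segments(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 30,
                    window_frames: int = 10, ratio: float = 0.9):
    """Return (start_s, end_s) pairs of active sound, using a padded sliding window:
    start collecting when >90% of window frames are voiced, stop when >90% are not."""
    vad = webrtcvad.Vad(2)                          # aggressiveness 0-3
    step = int(sample_rate * frame_ms / 1000) * 2   # bytes per frame (16-bit mono)
    frames = [pcm[i:i + step] for i in range(0, len(pcm) - step + 1, step)]
    window = collections.deque(maxlen=window_frames)
    triggered, start, segments = False, 0.0, []
    for n, frame in enumerate(frames):
        t = n * frame_ms / 1000.0
        window.append(vad.is_speech(frame, sample_rate))
        if not triggered and sum(window) > ratio * window.maxlen:
            triggered, start = True, t              # window mostly voiced: open a segment
        elif triggered and (len(window) - sum(window)) > ratio * window.maxlen:
            triggered = False                       # window mostly unvoiced: close it
            segments.append((start, t))
    if triggered:
        segments.append((start, len(frames) * frame_ms / 1000.0))
    return segments
```

The returned list plays the role of the start/end timestamp list written to the txt file and reused by the cutting and visualization stages.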
1.1 Activity tone detection
Active sound detection typically employs a Gaussian mixture model (GMM), which is essentially a linear superposition of multiple Gaussian models; in general, the signal distributions of the spoken and non-spoken segments in speech can each be represented by a weighted superposition of multiple Gaussians:

$$P(O_t \mid \lambda) = \sum_{i=1}^{M} \omega_i \, P_i(O_t)$$

where the probability distribution function $P_i(O_t)$ of the Gaussian model in each dimension is:

$$P_i(O_t) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\!\left( -\frac{1}{2} \left( O_t - \mu_i \right)^{\mathsf{T}} \Sigma_i^{-1} \left( O_t - \mu_i \right) \right)$$

wherein $\mu_i$ and $\Sigma_i$ are respectively the mean vector and covariance matrix of the $i$-th Gaussian model and $D$ is the feature dimension; together with the number of Gaussian mixtures $M$ and the weight $\omega_i$ of each component, they form the Gaussian mixture model:

$$\lambda = \{ M, \omega_i, \mu_i, \Sigma_i \}$$
and establishing respective Gaussian mixture models for the spoken and non-spoken parts in the classroom audio according to the algorithm, and then carrying out frame-by-frame detection on the whole audio and judging the similarity of the whole audio and the generated spoken and non-spoken models, thereby achieving the effect of distinguishing the spoken and non-spoken parts. And dividing the utterances (1) and the non-utterances (0) in the whole classroom audio frame by frame through a voice activity detection algorithm.
1.2 Audio cutting
Audio cutting determines the cut boundaries from the detected speech change points: the whole class audio is cut according to the start/end timestamp file obtained by the VAD voice activity detection algorithm, yielding the cut audio segments and the start and end time nodes corresponding to each segment.
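A sketch of the cutting step, assuming the (start, end) timestamp list produced by the VAD stage and the third-party pydub package (the patent does not name a cutting library):

```python
from pydub import AudioSegment

def cut_audio(wav_path, segments, out_prefix="segment"):
    """Cut the class recording into one wav file per active-sound segment."""
    audio = AudioSegment.from_wav(wav_path)
    paths = []
    for i, (start_s, end_s) in enumerate(segments):
        clip = audio[int(start_s * 1000):int(end_s * 1000)]  # pydub slices in milliseconds
        path = f"{out_prefix}_{i:03d}.wav"
        clip.export(path, format="wav")
        paths.append((path, start_s, end_s))
    return paths  # segment files plus their start/end time nodes
```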
2. Speaker recognition
Fig. 3 is a flowchart of speaker recognition according to an embodiment of the present invention, as shown in fig. 3, including the following steps:
2.1 feature extraction
After the classroom audio is cut, features are extracted from each audio segment separately. Feature extraction uses the Mel cepstral feature MFCC, which best matches the auditory characteristics of the human ear; the MFCC, the first-order difference MFCC and the second-order difference MFCC are spliced transversely into a 39-dimensional feature vector, so that the speaker's characteristics are retained to the greatest extent and better features are obtained.
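A sketch of the 39-dimensional combined feature described above, with librosa assumed as the feature-extraction library (the patent does not name one):

```python
import librosa
import numpy as np

def combined_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Return an (n_frames, 39) matrix: 13 MFCCs plus first- and second-order differences."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (13, n_frames)
    d1 = librosa.feature.delta(mfcc)                        # first-order difference
    d2 = librosa.feature.delta(mfcc, order=2)               # second-order difference
    return np.vstack([mfcc, d1, d2]).T                      # transverse splice -> 39 dims
```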
2.2 model training
Ten collected real classroom audio recordings (each class lasting 45 minutes, with 30-35 students and one teacher, which satisfies the data volume requirement of a universal background model) are preprocessed, Mel cepstral features are extracted, and a universal background model (UBM) is trained. Note that when training the universal background model, increasing the amount of non-target training data improves the training effect and the generalization ability of the model.
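A sketch of UBM training on the pooled classroom features, with sklearn's GaussianMixture standing in for the UBM (the library and the number of mixture components are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(feature_list, n_components=64):
    """feature_list: list of (n_frames, 39) arrays, one per classroom recording."""
    X = np.vstack(feature_list)  # pool frames from all classes and speakers
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           max_iter=200, reg_covar=1e-4).fit(X)
```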
In the Gaussian mixture-general background model, the UBM can fit the characteristics of a large number of speakers, the characteristic data of a target speaker is scattered around some Gaussian distributions of the UBM, and each Gaussian distribution in the UBM is shifted to target user data through a MAP adaptive algorithm. The specific calculation method is as follows:
Compute, for the training vector set $X = (x_1, x_2, \ldots, x_T)$, the similarity of each $x_t$ to the $i$-th Gaussian component:

$$\Pr(i \mid x_t) = \frac{\omega_i \, p_i(x_t)}{\sum_{j=1}^{M} \omega_j \, p_j(x_t)}$$

Then update the weight, mean and variance statistics according to the similarity:

$$n_i = \sum_{t=1}^{T} \Pr(i \mid x_t), \qquad E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t, \qquad E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t^2$$

Fuse the updated statistics obtained in the previous step with the UBM parameters to obtain the final target speaker model:

$$\hat{\omega}_i = \left[ \alpha_i^{w} \frac{n_i}{T} + \left(1 - \alpha_i^{w}\right) \omega_i \right] \gamma, \qquad \hat{\mu}_i = \alpha_i^{m} E_i(x) + \left(1 - \alpha_i^{m}\right) \mu_i, \qquad \hat{\sigma}_i^2 = \alpha_i^{v} E_i(x^2) + \left(1 - \alpha_i^{v}\right) \left( \sigma_i^2 + \mu_i^2 \right) - \hat{\mu}_i^2$$

wherein the adaptive parameters $\alpha_i^{w}$, $\alpha_i^{m}$, $\alpha_i^{v}$ adjust the influence of the new statistics and the UBM parameters on the final model, and $\gamma$ normalizes the weights so that they sum to 1.
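A sketch of this MAP adaptation in NumPy, operating on a diagonal-covariance UBM as trained above; the relevance factor value is an assumption:

```python
import numpy as np

def map_adapt(ubm, X, relevance=16.0):
    """MAP-adapt a diagonal-covariance GaussianMixture `ubm` towards target-speaker
    frames X of shape (T, D), following the update equations above.
    Returns the adapted (weights, means, variances)."""
    T = X.shape[0]
    resp = ubm.predict_proba(X)              # Pr(i | x_t), shape (T, M)
    n = resp.sum(axis=0) + 1e-10             # n_i
    E_x = (resp.T @ X) / n[:, None]          # E_i(x)
    E_x2 = (resp.T @ (X ** 2)) / n[:, None]  # E_i(x^2)

    a = n / (n + relevance)                  # data-dependent correction factors
    w = a * n / T + (1.0 - a) * ubm.weights_
    w /= w.sum()                             # gamma: renormalise so weights sum to 1
    mu = a[:, None] * E_x + (1.0 - a)[:, None] * ubm.means_
    var = (a[:, None] * E_x2
           + (1.0 - a)[:, None] * (ubm.covariances_ + ubm.means_ ** 2)
           - mu ** 2)
    return w, mu, var
```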
Teacher voice segments of the target classroom are extracted and a corresponding teacher GMM model is trained on the basis of the UBM. Each audio segment is then examined in turn: non-speech segments are skipped; otherwise features are extracted and scored against the teacher GMM model with the scoring method provided by the GMM and the UBM. If the score is higher than the set threshold the segment is judged to be teacher speech, otherwise it is judged to be student speech.
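A sketch of the segment-level decision, assuming `teacher_gmm` and `ubm` are fitted sklearn GaussianMixture models (the teacher model may be rebuilt from the MAP-adapted parameters above, or simply fit on teacher frames); the threshold value is an assumption:

```python
def classify_segment(feats, teacher_gmm, ubm, threshold=0.0):
    """feats: (n_frames, 39) combined MFCCs of one active-audio segment.
    Returns 'T' (teacher speech) or 'S' (student speech) from the
    average log-likelihood ratio of the teacher model over the UBM."""
    llr = teacher_gmm.score(feats) - ubm.score(feats)  # score() averages per frame
    return "T" if llr > threshold else "S"
```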
3. Result visualization
A window program is built with PyQt5, an interactive GUI toolkit for Python, to visualize the teaching mode analysis results, which are displayed as a classroom utterance timing diagram and a classroom utterance distribution diagram. The two analysis charts below show the result for a high-quality primary school Chinese class: the class is teacher-guided, the classroom atmosphere is active, and teacher-student interaction is strong.
As shown in fig. 4, in the class speaking timing chart, the horizontal axis represents the duration of a class in minutes, and the vertical axis represents the speaking duration of the teacher or the student in the ith minute of the target class in seconds, so that the speaking situation of the teacher or the student in a certain time period can be intuitively and clearly observed in the visualization mode.
As shown in fig. 5, in the class speech distribution diagram, the total time and percentage of the teacher speech, the student speech and the silence of the whole class are displayed in the form of a pie chart, so that the whole degree of participation of the teacher and the students in the whole class can be grasped and recognized at a glance, and a series of subsequent teaching analyses can be facilitated.
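The patent renders these two charts inside a PyQt5 window; the sketch below reproduces the same two views with matplotlib only, as an illustrative assumption about the plotting layer:

```python
import matplotlib.pyplot as plt

def plot_analysis(teacher_per_min, student_per_min, totals):
    """teacher_per_min / student_per_min: seconds of speech in each minute of the class;
    totals: dict with total seconds for 'teacher', 'student' and 'silence'."""
    minutes = range(1, len(teacher_per_min) + 1)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    # classroom utterance timing diagram
    ax1.plot(minutes, teacher_per_min, label="teacher")
    ax1.plot(minutes, student_per_min, label="student")
    ax1.set_xlabel("class time (min)")
    ax1.set_ylabel("speaking time per minute (s)")
    ax1.legend()
    # classroom utterance distribution diagram
    ax2.pie([totals["teacher"], totals["student"], totals["silence"]],
            labels=["teacher", "student", "silence"], autopct="%1.1f%%")
    plt.tight_layout()
    plt.show()
```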
Fig. 6 is an architecture diagram of a teaching mode analysis system according to an embodiment of the present invention, as shown in fig. 6, including:
the classroomaudio detection unit 610 is used for detecting active voices in classroom audio and marking the starting time and the ending time of each active voice; cutting the teaching audio according to the starting time and the ending time of each section of active audio to obtain a plurality of sections of active audio; the active tone refers to non-silent audio;
a speakerfeature extraction unit 620, configured to extract different speaker features and durations of different speakers in each segment of active audio based on the combined mel-frequency cepstrum feature MFCC vector; the combined MFCC vector is obtained by transversely splicing an MFCC, a first-order differential MFCC vector and a second-order differential MFCC vector;
the speakingduration determining unit 630 is used for respectively judging the characteristics of different speakers as teacher speaking and student speaking based on the pre-trained universal background model UBM, and determining the corresponding teacher speaking duration and student speaking duration; the pre-trained UBM can fit the characteristics of different speakers, including teachers and students;
a teachingmode judging unit 640, configured to judge that the teaching mode of the classroom is an exercise classroom, a lecture classroom, or a mixed classroom according to a ratio of the teacher speaking time to the total classroom time; when the ratio of the speaking time of the teacher to the total classroom time is lower than a first threshold, the teaching mode is an exercise classroom; when the ratio of the speaking time of the teacher to the total classroom time is greater than a second threshold, the teaching mode is a lecture classroom; otherwise, the teaching mode is considered as a mixed classroom; the first threshold is less than a second threshold.
Thevisualization unit 650 is used for visualizing the judgment result of the teaching mode through a PyQt5 interactive visualization GUI design tool; the visual result comprises a classroom utterance timing diagram and a classroom utterance distribution diagram; in the class speaking timing sequence diagram, the horizontal axis represents the duration of a class in minutes, and the vertical axis represents the speaking duration of a teacher or a student in each minute of the class in seconds; in the class utterance profile, the total time and respective proportion of the teacher utterance, the student utterance, and silence of the whole class are shown in the form of a pie chart.
Specifically, the functions of each unit in fig. 6 can be referred to the description in the foregoing method embodiment, and are not described herein again.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A teaching mode analysis method is characterized by comprising the following steps:
detecting active sounds in classroom audio, and marking the starting time and the ending time of each section of active sound; cutting the teaching audio according to the starting time and the ending time of each section of active audio to obtain a plurality of sections of active audio; the active tone refers to non-silent audio;
extracting different speaker characteristics and the time lengths of different speakers in each section of active audio based on the combined Mel frequency cepstrum characteristic MFCC vector; the combined MFCC vector is obtained by transversely splicing an MFCC, a first-order differential MFCC vector and a second-order differential MFCC vector;
respectively judging the characteristics of different speakers as teacher speaking and student speaking based on a pre-trained universal background model UBM, and determining the corresponding speaking time of the teacher and the speaking time of the students; the pre-trained UBM can fit the characteristics of different speakers, including teachers and students;
judging whether the teaching mode of the classroom is an exercise classroom, a lecture classroom or a mixed classroom according to the proportion of the speaking time of the teacher to the total classroom time; when the ratio of the speaking time of the teacher to the total classroom time is lower than a first threshold, the teaching mode is an exercise classroom; when the ratio of the speaking time of the teacher to the total classroom time is greater than a second threshold, the teaching mode is a lecture classroom; otherwise, the teaching mode is considered as a mixed classroom; the first threshold is less than a second threshold.
2. The teaching mode analysis method of claim 1, wherein the spoken and non-spoken parts in the classroom audio are detected using a gaussian mixture model GMM; wherein, the speaking part is active sound, and the non-speaking part is mute.
3. The teaching mode analysis method of claim 2, wherein classifying the different speaker features as teacher speech or student speech based on the pre-trained universal background model UBM specifically comprises:
extracting different speaker characteristics of a plurality of real classroom audios based on the combined MFCC vector, and training a UBM based on the different speaker characteristics of the real classroom audios; the UBM can fit the characteristics of a large number of speakers, the characteristic data of a target speaker is scattered around the Gaussian distribution of the UBM, and each Gaussian distribution in the UBM is shifted to the characteristic data of the target speaker through a MAP adaptive algorithm;
extracting collected teacher voice fragments of a plurality of real classroom audios based on the combined MFCC vector, and training a corresponding teacher GMM model on the basis of UBM; the teacher GMM model is a Gaussian mixture model trained by features extracted from audio of a teacher and used for simulating continuous probability distribution of voice vector features of the teacher;
and scoring the different speaker features against the teacher GMM model with the scoring method provided by the GMM and the UBM; if the score is higher than a preset threshold, the corresponding speaker feature is judged to be teacher speech, otherwise it is judged to be student speech.
4. The pedagogical pattern analysis method of claim 2 wherein the feature data of the targeted speaker is scattered around a gaussian distribution of UBMs, each gaussian distribution in the UBMs being shifted towards the feature data of the targeted speaker by a MAP adaptation algorithm, specifically:
calculating, for the target speaker feature data vector set $X = (x_1, x_2, \ldots, x_T)$, the similarity $\Pr(i \mid x_t)$ of each feature vector $x_t$ to the $i$-th Gaussian component:

$$\Pr(i \mid x_t) = \frac{\omega_i \, p_i(x_t)}{\sum_{j=1}^{M} \omega_j \, p_j(x_t)}$$

wherein $x_t$ denotes the feature vector of the target speaker at time $t$, $\omega_i$ denotes the weight of the $i$-th Gaussian mixture component, $p_i(x_t)$ denotes the probability score of the vector in the target speaker's feature vector sequence with respect to the $i$-th UBM mixture component, $M$ denotes the number of Gaussian mixture components, $\omega_j$ denotes the weight of the $j$-th mixture component, and $p_j(x_t)$ denotes the probability score with respect to the $j$-th UBM mixture component;

obtaining the new universal background model UBM mean $E_i(x)$ and variance $E_i(x^2)$ parameters according to the similarity:

$$n_i = \sum_{t=1}^{T} \Pr(i \mid x_t)$$

$$E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t$$

$$E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t^2$$

wherein $n_i$ denotes the number of frames of the target speech assigned to the $i$-th Gaussian mixture component;

and fusing the new parameters obtained in the previous step with the original parameters of the UBM model to obtain the final GMM Gaussian mixture model of the target speaker:

$$\hat{\omega}_i = \left[ a_i^{\omega} \frac{n_i}{T} + \left(1 - a_i^{\omega}\right) \omega_i \right] \gamma$$

$$\hat{\mu}_i = a_i^{m} E_i(x) + \left(1 - a_i^{m}\right) \mu_i$$

$$\hat{\sigma}_i^2 = a_i^{v} E_i(x^2) + \left(1 - a_i^{v}\right) \left( \sigma_i^2 + \mu_i^2 \right) - \hat{\mu}_i^2$$

wherein $a_i^{\omega}$, $a_i^{m}$ and $a_i^{v}$ denote the weight, mean and variance correction factors of the Gaussian components of the universal background model, $\mu_i$ and $\sigma_i^2$ denote the mean and variance of the universal background model before the parameter update, $\hat{\omega}_i$, $\hat{\mu}_i$ and $\hat{\sigma}_i^2$ denote the weight, mean and variance after the parameter update, $T$ denotes the number of frames of the training speech, and $\gamma$ denotes a relation factor constraining the scale of the correction so that all mixture weights sum to 1; $a_i^{\omega}$, $a_i^{m}$, $a_i^{v}$ adjust the new parameters of the UBM so that each Gaussian distribution in the UBM is shifted towards the feature data of the target speaker.
5. An instructional pattern analysis method according to any one of claims 1 to 4 wherein the judgment result of the instructional pattern is visualized by PyQt5 interactive visualization GUI design tool; the visual result comprises a classroom utterance timing diagram and a classroom utterance distribution diagram;
in the class speaking timing sequence diagram, the horizontal axis represents the duration of a class in minutes, and the vertical axis represents the speaking duration of a teacher or a student in each minute of the class in seconds;
in the class utterance profile, the total time and respective proportion of the teacher utterance, the student utterance, and silence of the whole class are shown in the form of a pie chart.
6. A teaching mode analysis system, comprising:
the classroom audio detection unit is used for detecting active sounds in classroom audio and marking the starting time and the ending time of each section of active sound; cutting the teaching audio according to the starting time and the ending time of each section of active audio to obtain a plurality of sections of active audio; the active tone refers to non-silent audio;
the speaker characteristic extraction unit is used for extracting different speaker characteristics and the time lengths of different speakers in each section of active audio based on the combined Mel cepstrum characteristic MFCC vector; the combined MFCC vector is obtained by transversely splicing an MFCC, a first-order differential MFCC vector and a second-order differential MFCC vector;
the speaking duration determining unit is used for respectively judging the characteristics of different speakers as teacher speaking and student speaking based on a pre-trained universal background model UBM and determining the corresponding teacher speaking duration and student speaking duration; the pre-trained UBM can fit the characteristics of different speakers, including teachers and students;
the teaching mode judging unit is used for judging whether the teaching mode of the classroom is an exercise classroom, a lecture classroom or a mixed classroom according to the proportion of the speaking time of the teacher to the total classroom time; when the ratio of the speaking time of the teacher to the total classroom time is lower than a first threshold, the teaching mode is an exercise classroom; when the ratio of the speaking time of the teacher to the total classroom time is greater than a second threshold, the teaching mode is a lecture classroom; otherwise, the teaching mode is considered as a mixed classroom; the first threshold is less than a second threshold.
7. The tutorial pattern analysis system of claim 6, wherein the classroom audio detection unit employs a gaussian mixture model GMM to detect spoken and non-spoken portions of classroom audio; wherein, the speaking part is active sound, and the non-speaking part is mute.
8. The pedagogical pattern analysis system of claim 7 wherein the speaker feature extraction unit extracts different speaker features of the captured plurality of real classroom audios based on the combined MFCC vectors;
the speaking duration determining unit trains the UBM based on the different speaker features of the multiple real classroom audios; the UBM can fit the features of a large number of speakers, the feature data of a target speaker is scattered around the Gaussian distributions of the UBM, and each Gaussian distribution in the UBM is shifted towards the feature data of the target speaker through a MAP adaptive algorithm; teacher voice segments are extracted from the collected real classroom audios based on the combined MFCC vector, and a corresponding teacher GMM model is trained on the basis of the UBM; the teacher GMM model is a Gaussian mixture model trained on features extracted from the teacher's audio and used to model the continuous probability distribution of the teacher's voice feature vectors; the different speaker features are scored against the teacher GMM model with the scoring method provided by the GMM and the UBM; if the score is higher than a preset threshold the corresponding speaker feature is judged to be teacher speech, otherwise it is judged to be student speech.
9. The instructional mode analysis system of claim 7, wherein the speaking duration determining unit calculates, for the target speaker feature data vector set $X = (x_1, x_2, \ldots, x_T)$, the similarity $\Pr(i \mid x_t)$ of each feature vector $x_t$ to the $i$-th Gaussian component:

$$\Pr(i \mid x_t) = \frac{\omega_i \, p_i(x_t)}{\sum_{j=1}^{M} \omega_j \, p_j(x_t)}$$

wherein $x_t$ denotes the feature vector of the target speaker at time $t$, $\omega_i$ denotes the weight of the $i$-th Gaussian mixture component, $p_i(x_t)$ denotes the probability score of the vector in the target speaker's feature vector sequence with respect to the $i$-th UBM mixture component, $M$ denotes the number of Gaussian mixture components, $\omega_j$ denotes the weight of the $j$-th mixture component, and $p_j(x_t)$ denotes the probability score with respect to the $j$-th UBM mixture component;

obtains the new universal background model UBM mean $E_i(x)$ and variance $E_i(x^2)$ parameters according to the similarity:

$$n_i = \sum_{t=1}^{T} \Pr(i \mid x_t)$$

$$E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t$$

$$E_i(x^2) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t) \, x_t^2$$

wherein $n_i$ denotes the number of frames of the target speech assigned to the $i$-th Gaussian mixture component;

and fuses the new parameters obtained in the previous step with the original parameters of the UBM model to obtain the final GMM Gaussian mixture model of the target speaker:

$$\hat{\omega}_i = \left[ a_i^{\omega} \frac{n_i}{T} + \left(1 - a_i^{\omega}\right) \omega_i \right] \gamma$$

$$\hat{\mu}_i = a_i^{m} E_i(x) + \left(1 - a_i^{m}\right) \mu_i$$

$$\hat{\sigma}_i^2 = a_i^{v} E_i(x^2) + \left(1 - a_i^{v}\right) \left( \sigma_i^2 + \mu_i^2 \right) - \hat{\mu}_i^2$$

wherein $a_i^{\omega}$, $a_i^{m}$ and $a_i^{v}$ denote the weight, mean and variance correction factors of the Gaussian components of the universal background model, $\mu_i$ and $\sigma_i^2$ denote the mean and variance of the universal background model before the parameter update, $\hat{\omega}_i$, $\hat{\mu}_i$ and $\hat{\sigma}_i^2$ denote the weight, mean and variance after the parameter update, $T$ denotes the number of frames of the training speech, and $\gamma$ denotes a relation factor constraining the scale of the correction so that all mixture weights sum to 1; $a_i^{\omega}$, $a_i^{m}$, $a_i^{v}$ adjust the new parameters of the UBM so that each Gaussian distribution in the UBM is shifted towards the feature data of the target speaker.
10. An instructional pattern analysis system according to any one of claims 6 to 9 further comprising:
the visualization unit is used for visualizing the judgment result of the teaching mode through a PyQt5 interactive visualization GUI design tool; the visual result comprises a classroom utterance timing diagram and a classroom utterance distribution diagram; in the class speaking timing sequence diagram, the horizontal axis represents the duration of a class in minutes, and the vertical axis represents the speaking duration of a teacher or a student in each minute of the class in seconds; in the class utterance profile, the total time and respective proportion of the teacher utterance, the student utterance, and silence of the whole class are shown in the form of a pie chart.
CN202011473387.4A, priority date 2020-12-15, filing date 2020-12-15: Teaching mode analysis method and system, Pending, published as CN112599135A (en)

Priority Applications (1)

CN202011473387.4A (priority date 2020-12-15, filing date 2020-12-15): Teaching mode analysis method and system, published as CN112599135A (en)

Applications Claiming Priority (1)

CN202011473387.4A (priority date 2020-12-15, filing date 2020-12-15): Teaching mode analysis method and system, published as CN112599135A (en)

Publications (1)

CN112599135A (en), publication date 2021-04-02

Family

ID=75195403

Family Applications (1)

CN202011473387.4A (priority date 2020-12-15, filing date 2020-12-15), Pending: Teaching mode analysis method and system, published as CN112599135A (en)

Country Status (1)

CN: CN112599135A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party

CN103077491A (en)*, priority 2012-11-10, published 2013-05-01, 南昌大学: Classroom teaching model analytical method
CN107918821A (en)*, priority 2017-03-23, published 2018-04-17, 广州思涵信息科技有限公司: Teachers' classroom teaching process analysis method and system based on artificial intelligence technology
CN109378014A (en)*, priority 2018-10-22, published 2019-02-22, 华中师范大学: A method and system for source identification of mobile devices based on convolutional neural network
CN109614934A (en)*, priority 2018-12-12, published 2019-04-12, 易视腾科技股份有限公司: Online teaching quality assessment parameter generation method and device
CN110534101A (en)*, priority 2019-08-27, published 2019-12-03, 华中师范大学: A kind of mobile device source discrimination and system based on multimodality fusion depth characteristic
CN110544481A (en)*, priority 2019-08-27, published 2019-12-06, 华中师范大学: A S-T classification method, device and equipment terminal based on voiceprint recognition

Cited By (14)

* Cited by examiner, † Cited by third party

CN113743250A (en)*, priority 2021-08-16, published 2021-12-03, 华中师范大学: Method and system for constructing classroom teaching behavior event description model
US12254692B2, priority 2021-08-16, published 2025-03-18, Central China Normal University: Construction method and system of descriptive model of classroom teaching behavior events
CN113743250B (en)*, priority 2021-08-16, published 2024-02-13, 华中师范大学: Construction method and system of classroom teaching behavior event description model
WO2023019652A1 (en)*, priority 2021-08-16, published 2023-02-23, 华中师范大学: Method and system for constructing classroom teaching behavior event description model
CN113743263B (en)*, priority 2021-08-23, published 2024-02-13, 华中师范大学: Teacher nonverbal behavior measurement method and system
CN113743263A (en)*, priority 2021-08-23, published 2021-12-03, 华中师范大学: Method and system for measuring non-verbal behaviors of teacher
CN114550721A (en)*, priority 2022-03-03, published 2022-05-27, 深圳地平线机器人科技有限公司: Method, device, electronic device and storage medium for detecting user conversation state
CN114550721B (en)*, priority 2022-03-03, published 2025-09-02, 深圳地平线机器人科技有限公司: Method, device, electronic device and storage medium for detecting user conversation status
CN116578755B (en)*, priority 2022-03-30, published 2024-01-09, 张家口微智网络科技有限公司: Information analysis system and method based on artificial intelligence and big data
CN116578755A (en)*, priority 2022-03-30, published 2023-08-11, 江苏控智电子科技有限公司: Information analysis system and method based on artificial intelligence and big data
CN116884436A (en)*, priority 2023-08-31, published 2023-10-13, 南京览众智能科技有限公司: Classroom teaching mode recognition method and device based on voice analysis
CN117079655A (en)*, priority 2023-10-16, published 2023-11-17, 华南师范大学: Audio analysis method, device, equipment and readable storage medium
CN117079655B (en)*, priority 2023-10-16, published 2023-12-22, 华南师范大学: An audio analysis method, device, equipment and readable storage medium
CN118016073A (en)*, priority 2023-12-27, published 2024-05-10, 华中科技大学: Classroom coarse granularity sound event detection method based on audio and video feature fusion


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication

Application publication date: 2021-04-02

