CN112885168A - Immersive speech feedback training system based on AI - Google Patents

Immersive speech feedback training system based on AI

Info

Publication number
CN112885168A
CN112885168A
Authority
CN
China
Prior art keywords
information
module
voice
shape coefficient
character information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110081356.2A
Other languages
Chinese (zh)
Other versions
CN112885168B (en)
Inventor
范虹
刘蓝冰
尉泽民
严晓波
茹文亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaoxing Peoples Hospital
Original Assignee
Shaoxing Peoples Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaoxing Peoples Hospital
Priority to CN202110081356.2A
Publication of CN112885168A
Application granted
Publication of CN112885168B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses an AI-based immersive speech feedback training system comprising an ability rating module, a graded learning module, a standard library, a video playback module, a sound recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module, and a score display module. The ability rating module rates the speech ability of patients with language disorders and generates speech rating information, which is sent to the graded learning module; on receiving it, the graded learning module retrieves learning videos of the corresponding level from the standard library, which stores training video information of different levels. The video playback module receives the video information of the corresponding level that the graded learning module directs the standard library to send, and starts playback once the information is received. The invention can better assist and promote the rehabilitation training of persons with language disorders.

Description

Immersive speech feedback training system based on AI
Technical Field
The invention relates to the field of language training, in particular to an immersive speech feedback training system based on AI.
Background
Speech and language developmental disorders refer to disturbances of the normal pattern of language acquisition at an early stage of development, manifested as delays and abnormalities in pronunciation, language comprehension, or the development of expressive language that affect learning, occupation, and social function. These conditions are not caused by abnormalities of the nervous system or speech mechanisms, sensory impairment, delayed mental development, or surrounding environmental factors; during recovery from a language disorder, a speech feedback training system is used to assist rehabilitation training.
Existing speech feedback training systems perform only a single function, so their training effect is poor and they cannot meet users' requirements, which limits their usefulness; the AI-based immersive speech feedback training system is therefore proposed.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: existing speech feedback training systems perform only a single function, which leads to a poor training effect, fails to meet users' requirements, and limits the usefulness of such systems. The invention provides an AI-based immersive speech feedback training system to address this problem.
The invention solves the above technical problem through the following technical scheme: the system comprises an ability rating module, a graded learning module, a standard library, a video playback module, a sound recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module, and a score display module;
the ability rating module rates the speech ability of a patient with a language disorder and generates speech rating information; the speech rating information is sent to the graded learning module, which retrieves learning videos of the corresponding level from the standard library, the standard library storing training video information of different levels;
the video playback module receives the video information of the corresponding level that the graded learning module directs the standard library to send, and starts playback once this information is received; the sound recognition module then collects the voice information produced by the patient, while the image acquisition module captures the mouth-motion information as the patient produces the voice information;
the voice information and the accompanying mouth-motion information are both sent to the data receiving module, which processes them to generate voice comparison information and action comparison information;
the voice comparison information and the action comparison information are both sent to the learning scoring module, which processes them to generate training score information; the training score information is sent to the score display module, which displays the training score.
Preferably, the specific process by which the ability rating module rates the patient's speech ability is as follows (a sketch follows the steps):
step one: the ability rating module is preset with text content information of different levels, comprising primary text, intermediate text, advanced text, and normal text, ordered by difficulty as primary text < intermediate text < advanced text < normal text;
step two: select at least x groups of text from each of the primary, intermediate, advanced, and normal texts in order of difficulty from low to high, where x ≥ 5;
step three: display the selected x groups of text for each level; the patient reads aloud, in turn, the x groups of primary, intermediate, advanced, and normal text, and the readings are labeled K1, K2, K3, and K4 respectively, from the lowest level to the highest;
step four: extract the preset pronunciation information of the x groups of text selected for each level, labeled M1, M2, M3, and M4 respectively in the same order;
step five: match K1 against M1 to obtain similarity Km1, K2 against M2 to obtain Km2, K3 against M3 to obtain Km3, and K4 against M4 to obtain Km4;
step six: when exactly one of Km1, Km2, Km3, and Km4 exceeds a preset value, the patient is judged to belong to the corresponding level; when two or more of the similarities exceed the preset value, the highest corresponding level is taken as the final judgment.
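To make the rating flow concrete, the following minimal Python sketch illustrates steps one to six. It is an illustration rather than the patented implementation: the patent does not specify the similarity-matching algorithm, so the `SequenceMatcher` metric and the 0.8 threshold are placeholder assumptions, and all function names are invented for the example.

```python
# Illustrative sketch of the ability-rating decision (steps one to six).
from difflib import SequenceMatcher

LEVELS = ["primary", "intermediate", "advanced", "normal"]

def similarity(reading: str, preset: str) -> float:
    """Placeholder for the unspecified similarity matching (Km1..Km4)."""
    return SequenceMatcher(None, reading, preset).ratio()

def rate_ability(readings, presets, threshold=0.8):
    """readings ~ [K1..K4], presets ~ [M1..M4]; returns the judged level."""
    passed = [level for level, k, m in zip(LEVELS, readings, presets)
              if similarity(k, m) > threshold]
    # Step six: if several levels exceed the threshold, the highest wins.
    return passed[-1] if passed else None

# Hypothetical readings for the primary and intermediate texts only.
print(rate_ability(["ba ba bo", "ma ma shu"], ["ba ba bo", "da da hua"]))  # primary
```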
Preferably, the video playback module plays the audio synchronously while playing the video.
Preferably, the standard library stores training video information of different levels, the training video information including the mouth-shape coefficient information corresponding to the text; the data processing module converts the mouth-motion information collected by the image acquisition module into real-time mouth-shape coefficient information and compares it with the pre-stored mouth-shape coefficient information to obtain the action comparison information.
Preferably, the mouth-shape coefficient comprises a first mouth-shape coefficient and a second mouth-shape coefficient, obtained as follows (see the sketch after step S3 below):
step one: label the key point of the upper lip as point A1 and the two corners of the upper lip as points A2 and A3, and fit an arc segment L1 through points A1, A2, and A3;
step two: label the key point of the lower lip as point B1 and the two corners of the lower lip as points B2 and B3, and fit an arc segment L2 through points B1, B2, and B3;
step three: connect point A1 to point A2 to obtain line segment L3; measure the arc lengths of arc segments L1 and L2 and the length of segment L3;
step four: compute (L1 + L2)/(L1 - L2) = L_ratio; L_ratio is the first mouth-shape coefficient, and the length of L3 is the second mouth-shape coefficient;
the specific process by which the data processing module compares the real-time mouth-shape coefficient information with the pre-stored mouth-shape coefficient information is as follows:
S1: extract the real-time first mouth-shape coefficient, the real-time second mouth-shape coefficient, the preset first mouth-shape coefficient, and the preset second mouth-shape coefficient, labeled P1, P2, Q1, and Q2 respectively;
S2: compute the difference Pq1_diff between the real-time first mouth-shape coefficient P1 and the preset first mouth-shape coefficient Q1, then compute the difference Pq2_diff between the real-time second mouth-shape coefficient P2 and the preset second mouth-shape coefficient Q2;
S3: compute the sum of the absolute values of Pq1_diff and Pq2_diff, Pq_sum, thereby obtaining the action comparison information Pq_sum.
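The coefficient construction and the comparison above can be summarized in one short sketch. This is one plausible reading under stated assumptions: lip-landmark detection is out of scope, fitting a circular arc through the three labeled points is an assumed interpretation of "arc segment", and the first-coefficient formula is degenerate when L1 equals L2.

```python
# Sketch of the mouth-shape coefficients (steps one to four) and the
# action comparison (S1-S3); the circular-arc fit is an assumption.
import math

def arc_length(key, corner1, corner2):
    """Length of the circular arc through a lip key point and the two corners."""
    (x1, y1), (x2, y2), (x3, y3) = key, corner1, corner2
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:                        # collinear points: use the chord
        return math.dist(corner1, corner2)
    ux = ((x1**2 + y1**2) * (y2 - y3) + (x2**2 + y2**2) * (y3 - y1)
          + (x3**2 + y3**2) * (y1 - y2)) / d  # circumcenter x
    uy = ((x1**2 + y1**2) * (x3 - x2) + (x2**2 + y2**2) * (x1 - x3)
          + (x3**2 + y3**2) * (x2 - x1)) / d  # circumcenter y
    r = math.hypot(x1 - ux, y1 - uy)          # circumradius
    half_chord = math.dist(corner1, corner2) / 2
    return r * 2 * math.asin(min(1.0, half_chord / r))

def mouth_coefficients(a1, a2, a3, b1, b2, b3):
    l1 = arc_length(a1, a2, a3)               # upper-lip arc L1
    l2 = arc_length(b1, b2, b3)               # lower-lip arc L2
    l3 = math.dist(a1, a2)                    # segment L3 from A1 to A2
    return (l1 + l2) / (l1 - l2), l3          # degenerate when L1 == L2

def action_comparison(p1, p2, q1, q2):
    """S1-S3: Pq_sum = |P1 - Q1| + |P2 - Q2|, the action comparison info."""
    return abs(p1 - q1) + abs(p2 - q2)

# Hypothetical coefficient values for a real-time frame and a preset frame.
print(action_comparison(1.5, 42.0, 2.0, 40.0))  # -> 2.5
```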
Preferably, the specific process by which the data processing module generates the voice comparison information is as follows (a sketch follows):
SS1: extract the standard voice information of the pre-stored video information in the standard library and perform voiceprint processing on it to obtain the standard voiceprint, labeled F_std;
SS2: perform voiceprint processing on the voice information collected by the sound recognition module as the patient reads the preset text, obtaining the real-time voiceprint, labeled F_real, i.e. the voice comparison information F_real;
SS3: compare the real-time voiceprint F_real with the standard voiceprint F_std to obtain the similarity F_ratio.
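A hedged sketch of SS1 to SS3 follows. The patent does not name the voiceprint algorithm, so a normalized magnitude spectrum stands in for the voiceprint and cosine similarity for the comparison; both are placeholder choices, not the patented method.

```python
# Placeholder voiceprint pipeline for SS1-SS3: F_std and F_real are
# normalized magnitude spectra; F_ratio is their cosine similarity.
import numpy as np

def voiceprint(signal: np.ndarray) -> np.ndarray:
    spectrum = np.abs(np.fft.rfft(signal))
    return spectrum / (np.linalg.norm(spectrum) + 1e-12)

def voice_similarity(standard: np.ndarray, realtime: np.ndarray) -> float:
    """F_ratio: cosine similarity between F_std and F_real."""
    f_std, f_real = voiceprint(standard), voiceprint(realtime)
    n = min(len(f_std), len(f_real))          # align lengths for the sketch
    return float(np.dot(f_std[:n], f_real[:n]))

t = np.linspace(0, 1, 8000, endpoint=False)   # one second at 8 kHz
tone = lambda f: np.sin(2 * np.pi * f * t)
print(voice_similarity(tone(440), tone(440)))  # identical signals -> ~1.0
print(voice_similarity(tone(440), tone(880)))  # disjoint spectra -> ~0.0
```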
Preferably, the specific process by which the learning scoring module processes the voice comparison information and the action comparison information to generate the training score information is as follows (a sketch follows):
S01: extract the voice comparison information and the action comparison information, labeled M and N respectively;
S02: to emphasize the importance of the voice comparison, assign a correction weight U1 to the voice comparison information and a correction weight U2 to the action comparison information, where U1 > U2 and U1 + U2 = 1;
S03: compute M × U1 + N × U2 = Mn_sum, obtaining the training score information Mn_sum.
preferably, the scoring display module ranks all the received training scoring information from high to low, and displays the personnel information corresponding to the first three maximum training scoring information after being amplified by a preset font.
Compared with the prior art, the invention has the following advantages: the AI-based immersive speech feedback training system evaluates the patient's language disorder level before speech training begins, so the system can supply training content matched to the patient and arranged from easy to difficult. This effectively improves the user experience and avoids the frustration caused by training content that is too difficult. By playing video content and sound synchronously, the patient can watch the mouth shape of each pronunciation and pronounce by imitating it, which accelerates rehabilitation training. Meanwhile, the combined analysis of mouth shape and pronunciation allows the patient's rehabilitation progress to be evaluated more accurately, and the various differentiated settings meet patients' different usage requirements, making the system well worth popularizing.
Drawings
FIG. 1 is a system block diagram of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
As shown in FIG. 1, the present embodiment provides a technical solution: an AI-based immersive speech feedback training system comprising an ability rating module, a graded learning module, a standard library, a video playback module, a sound recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module, and a score display module;
the ability rating module rates the speech ability of a patient with a language disorder and generates speech rating information; the speech rating information is sent to the graded learning module, which retrieves learning videos of the corresponding level from the standard library, the standard library storing training video information of different levels;
the video playback module receives the video information of the corresponding level that the graded learning module directs the standard library to send, and starts playback once this information is received; while playing the video information, the video playback module shows the mouths of the on-screen speakers enlarged in close-up so that the patient can imitate the mouth shapes more easily; the sound recognition module then collects the voice information produced by the patient, while the image acquisition module captures the mouth-motion information as the patient produces the voice information;
the voice information and the accompanying mouth-motion information are both sent to the data receiving module, which processes them to generate voice comparison information and action comparison information;
the voice comparison information and the action comparison information are both sent to the learning scoring module, which processes them to generate training score information; the training score information is sent to the score display module, which displays the training score.
The specific process by which the ability rating module rates the patient's speech ability is as follows:
step one: the ability rating module is preset with text content information of different levels, comprising primary text, intermediate text, advanced text, and normal text, ordered by difficulty as primary text < intermediate text < advanced text < normal text;
step two: select at least x groups of text from each of the primary, intermediate, advanced, and normal texts in order of difficulty from low to high, where x ≥ 5;
step three: display the selected x groups of text for each level; the patient reads aloud, in turn, the x groups of primary, intermediate, advanced, and normal text, and the readings are labeled K1, K2, K3, and K4 respectively, from the lowest level to the highest;
step four: extract the preset pronunciation information of the x groups of text selected for each level, labeled M1, M2, M3, and M4 respectively in the same order;
step five: match K1 against M1 to obtain similarity Km1, K2 against M2 to obtain Km2, K3 against M3 to obtain Km3, and K4 against M4 to obtain Km4;
step six: when exactly one of Km1, Km2, Km3, and Km4 exceeds a preset value, the patient is judged to belong to the corresponding level; when two or more of the similarities exceed the preset value, the highest corresponding level is taken as the final judgment.
By assessing the patient's language disorder level before speech training begins, the system can provide speech training content matched to the patient and arranged from easy to difficult; this effectively improves the user experience and avoids the frustration that overly difficult training content would cause.
The video playback module plays the audio synchronously with the video, which effectively prevents pronunciation errors caused by audio-video desynchronization; with video content and sound played together, the patient can watch the mouth shape of each pronunciation while listening and pronounce by imitating the observed mouth shape, accelerating the patient's rehabilitation training.
The standard library stores training video information of different levels, the training video information including the mouth-shape coefficient information corresponding to the text; the data processing module converts the mouth-motion information collected by the image acquisition module into real-time mouth-shape coefficient information and compares it with the pre-stored mouth-shape coefficient information to obtain the action comparison information; the mouth-shape coefficients are defined so as to better evaluate the patient's rehabilitation training state.
The mouth-shape coefficient comprises a first mouth-shape coefficient and a second mouth-shape coefficient, obtained as follows:
step one: label the key point of the upper lip as point A1 and the two corners of the upper lip as points A2 and A3, and fit an arc segment L1 through points A1, A2, and A3;
step two: label the key point of the lower lip as point B1 and the two corners of the lower lip as points B2 and B3, and fit an arc segment L2 through points B1, B2, and B3;
step three: connect point A1 to point A2 to obtain line segment L3; measure the arc lengths of arc segments L1 and L2 and the length of segment L3;
step four: compute (L1 + L2)/(L1 - L2) = L_ratio; L_ratio is the first mouth-shape coefficient, and the length of L3 is the second mouth-shape coefficient;
using two mouth-shape coefficients further improves the judgment accuracy.
The specific process by which the data processing module compares the real-time mouth-shape coefficient information with the pre-stored mouth-shape coefficient information is as follows:
S1: extract the real-time first mouth-shape coefficient, the real-time second mouth-shape coefficient, the preset first mouth-shape coefficient, and the preset second mouth-shape coefficient, labeled P1, P2, Q1, and Q2 respectively;
S2: compute the difference Pq1_diff between the real-time first mouth-shape coefficient P1 and the preset first mouth-shape coefficient Q1, then compute the difference Pq2_diff between the real-time second mouth-shape coefficient P2 and the preset second mouth-shape coefficient Q2;
S3: compute the sum of the absolute values of Pq1_diff and Pq2_diff, Pq_sum, thereby obtaining the action comparison information Pq_sum.
This arrangement allows the action comparison information to be acquired reliably.
The specific process by which the data processing module generates the voice comparison information is as follows:
SS1: extract the standard voice information of the pre-stored video information in the standard library and perform voiceprint processing on it to obtain the standard voiceprint, labeled F_std;
SS2: perform voiceprint processing on the voice information collected by the sound recognition module as the patient reads the preset text, obtaining the real-time voiceprint, labeled F_real, i.e. the voice comparison information F_real;
SS3: compare the real-time voiceprint F_real with the standard voiceprint F_std to obtain the similarity F_ratio.
The specific process by which the learning scoring module processes the voice comparison information and the action comparison information to generate the training score information is as follows:
S01: extract the voice comparison information and the action comparison information, labeled M and N respectively;
S02: to emphasize the importance of the voice comparison, assign a correction weight U1 to the voice comparison information and a correction weight U2 to the action comparison information, where U1 > U2 and U1 + U2 = 1;
S03: compute M × U1 + N × U2 = Mn_sum, obtaining the training score information Mn_sum.
Through the combined analysis of mouth shape and pronunciation, the patient's rehabilitation training progress can be evaluated more accurately, and the various differentiated settings meet patients' different usage requirements, making the system all the more worth popularizing.
The score display module ranks all received training score information from high to low and displays the personal information of the top three scorers enlarged in a preset font;
the ranking information lets a patient see the recovery status of fellow patients, which builds confidence in rehabilitation training and thus accelerates recovery.
In conclusion, when the system is used, the ability rating module rates the patient's speech ability and generates speech rating information, which is sent to the graded learning module; the graded learning module retrieves learning videos of the corresponding level from the standard library, which stores training video information of different levels, so the patient's language disorder level is evaluated before speech training begins. Through this arrangement, the system provides speech training content matched to the patient, ordered from easy to difficult, which effectively improves the user experience and avoids the frustration caused by overly difficult training content. The video playback module receives the video information of the corresponding level that the graded learning module directs the standard library to send and starts playback on receipt; with video content and sound played synchronously, the patient can watch the mouth shape of each pronunciation and pronounce by imitating it, accelerating rehabilitation training. The sound recognition module collects the patient's voice information while the image acquisition module captures the mouth-motion information as the patient speaks; both are sent to the data receiving module, which processes them to generate the voice comparison information and the action comparison information. These are sent to the learning scoring module, which processes them to generate the training score information; the combined analysis of mouth shape and pronunciation evaluates the patient's rehabilitation progress more accurately, and the various differentiated settings meet different usage requirements, making the system well worth popularizing. The training score information is sent to the score display module, which displays the training score.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. An AI-based immersive speech feedback training system, characterized by comprising an ability rating module, a graded learning module, a standard library, a video playback module, a sound recognition module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module, and a score display module;
wherein the ability rating module rates the speech ability of a patient with a language disorder and generates speech rating information; the speech rating information is sent to the graded learning module, which retrieves learning videos of the corresponding level from the standard library, the standard library storing training video information of different levels;
the video playback module receives the video information of the corresponding level that the graded learning module directs the standard library to send, and starts playback once this information is received; the sound recognition module then collects the voice information produced by the patient, while the image acquisition module captures the mouth-motion information as the patient produces the voice information;
the voice information and the accompanying mouth-motion information are both sent to the data receiving module, which processes them to generate voice comparison information and action comparison information;
the voice comparison information and the action comparison information are both sent to the learning scoring module, which processes them to generate training score information; the training score information is sent to the score display module, which displays the training score.
2. The AI-based immersive speech feedback training system of claim 1, wherein the specific process by which the ability rating module rates the patient's speech ability is as follows:
step one: the ability rating module is preset with text content information of different levels, comprising primary text, intermediate text, advanced text, and normal text, ordered by difficulty as primary text < intermediate text < advanced text < normal text;
step two: select at least x groups of text from each of the primary, intermediate, advanced, and normal texts in order of difficulty from low to high, where x ≥ 5;
step three: display the selected x groups of text for each level; the patient reads aloud, in turn, the x groups of primary, intermediate, advanced, and normal text, and the readings are labeled K1, K2, K3, and K4 respectively, from the lowest level to the highest;
step four: extract the preset pronunciation information of the x groups of text selected for each level, labeled M1, M2, M3, and M4 respectively in the same order;
step five: match K1 against M1 to obtain similarity Km1, K2 against M2 to obtain Km2, K3 against M3 to obtain Km3, and K4 against M4 to obtain Km4;
step six: when exactly one of Km1, Km2, Km3, and Km4 exceeds a preset value, the patient is judged to belong to the corresponding level; when two or more of the similarities exceed the preset value, the highest corresponding level is taken as the final judgment.
3. The AI-based immersive speech feedback training system of claim 1, wherein the video playback module plays the audio synchronously while playing the video.
4. The AI-based immersive speech feedback training system of claim 1, wherein the standard library stores training video information of different levels, the training video information including the mouth-shape coefficient information corresponding to the text; the data processing module converts the mouth-motion information collected by the image acquisition module into real-time mouth-shape coefficient information and compares it with the pre-stored mouth-shape coefficient information to obtain the action comparison information.
5. The AI-based immersive speech feedback training system of claim 4, wherein the mouth-shape coefficient comprises a first mouth-shape coefficient and a second mouth-shape coefficient, obtained as follows:
step one: label the key point of the upper lip as point A1 and the two corners of the upper lip as points A2 and A3, and fit an arc segment L1 through points A1, A2, and A3;
step two: label the key point of the lower lip as point B1 and the two corners of the lower lip as points B2 and B3, and fit an arc segment L2 through points B1, B2, and B3;
step three: connect point A1 to point A2 to obtain line segment L3; measure the arc lengths of arc segments L1 and L2 and the length of segment L3;
step four: compute (L1 + L2)/(L1 - L2) = L_ratio; L_ratio is the first mouth-shape coefficient, and the length of L3 is the second mouth-shape coefficient;
the specific process by which the data processing module compares the real-time mouth-shape coefficient information with the pre-stored mouth-shape coefficient information is as follows:
S1: extract the real-time first mouth-shape coefficient, the real-time second mouth-shape coefficient, the preset first mouth-shape coefficient, and the preset second mouth-shape coefficient, labeled P1, P2, Q1, and Q2 respectively;
S2: compute the difference Pq1_diff between the real-time first mouth-shape coefficient P1 and the preset first mouth-shape coefficient Q1, then compute the difference Pq2_diff between the real-time second mouth-shape coefficient P2 and the preset second mouth-shape coefficient Q2;
S3: compute the sum of the absolute values of Pq1_diff and Pq2_diff, Pq_sum, thereby obtaining the action comparison information Pq_sum.
6. The AI-based immersive speech feedback training system of claim 1, wherein the specific process by which the data processing module generates the voice comparison information is as follows:
SS1: extract the standard voice information of the pre-stored video information in the standard library and perform voiceprint processing on it to obtain the standard voiceprint, labeled F_std;
SS2: perform voiceprint processing on the voice information collected by the sound recognition module as the patient reads the preset text, obtaining the real-time voiceprint, labeled F_real, i.e. the voice comparison information F_real;
SS3: compare the real-time voiceprint F_real with the standard voiceprint F_std to obtain the similarity F_ratio.
7. The AI-based immersive speech feedback training system of claim 1, wherein the specific process by which the learning scoring module processes the voice comparison information and the action comparison information to generate the training score information is as follows:
S01: extract the voice comparison information and the action comparison information, labeled M and N respectively;
S02: to emphasize the importance of the voice comparison, assign a correction weight U1 to the voice comparison information and a correction weight U2 to the action comparison information, where U1 > U2 and U1 + U2 = 1;
S03: compute M × U1 + N × U2 = Mn_sum, obtaining the training score information Mn_sum.
8. The AI-based immersive speech feedback training system of claim 1, wherein the score display module ranks all received training score information from high to low and displays the personal information of the top three scorers enlarged in a preset font.
CN202110081356.2A, priority and filing date 2021-01-21: An AI-based Immersive Speech Feedback Training System (Active; granted as CN112885168B)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110081356.2A (granted as CN112885168B) | 2021-01-21 | 2021-01-21 | An AI-based Immersive Speech Feedback Training System

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110081356.2A (granted as CN112885168B) | 2021-01-21 | 2021-01-21 | An AI-based Immersive Speech Feedback Training System

Publications (2)

Publication Number | Publication Date
CN112885168A | 2021-06-01
CN112885168B | 2022-09-09

Family

ID=76051484

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110081356.2A (Active; granted as CN112885168B) | An AI-based Immersive Speech Feedback Training System | 2021-01-21 | 2021-01-21

Country Status (1)

Country | Link
CN (1) | CN112885168B (en)



Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JPH0713046U (en)* | 1993-07-19 | 1995-03-03 | 武盛 豊永 | Dictation word processor
US6356868B1 (en)* | 1999-10-25 | 2002-03-12 | Comverse Network Systems, Inc. | Voiceprint identification system
US20080004879A1 (en)* | 2006-06-29 | 2008-01-03 | Wen-Chen Huang | Method for assessing learner's pronunciation through voice and image
CN101751809A (en)* | 2010-02-10 | 2010-06-23 | 长春大学 | Deaf children speech rehabilitation method and system based on three-dimensional head portrait
CN102063903A (en)* | 2010-09-25 | 2011-05-18 | 中国科学院深圳先进技术研究院 | Speech interactive training system and speech interactive training method
KR20140075994A (en)* | 2012-12-12 | 2014-06-20 | 주홍찬 | Apparatus and method for language education by using native speaker's pronunciation data and thought unit
CN105982641A (en)* | 2015-01-30 | 2016-10-05 | 上海泰亿格康复医疗科技股份有限公司 | Speech and language hypoacousie multi-parameter diagnosis and rehabilitation apparatus and cloud rehabilitation system
US9548048B1 (en)* | 2015-06-19 | 2017-01-17 | Amazon Technologies, Inc. | On-the-fly speech learning and computer model generation using audio-visual synchronization
CN109872714A (en)* | 2019-01-25 | 2019-06-11 | 广州富港万嘉智能科技有限公司 | A kind of method, electronic equipment and storage medium improving accuracy of speech recognition
CN111081080A (en)* | 2019-05-29 | 2020-04-28 | 广东小天才科技有限公司 | Voice detection method and learning device
CN110349565A (en)* | 2019-07-02 | 2019-10-18 | 长春大学 | A kind of auxiliary word pronunciation learning method and its system towards hearing-impaired people
CN110379221A (en)* | 2019-08-09 | 2019-10-25 | 陕西学前师范学院 | A kind of pronunciation of English test and evaluation system
CN110853624A (en)* | 2019-11-29 | 2020-02-28 | 杭州南粟科技有限公司 | Speech rehabilitation training system
CN112233679A (en)* | 2020-10-10 | 2021-01-15 | 安徽讯呼信息科技有限公司 | Artificial intelligence speech recognition system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴大江: "基于深度学习的唇读识别研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》, 31 December 2018*

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113744880A (en)* | 2021-09-08 | 2021-12-03 | 邵阳学院 | Child language barrier degree management and analysis system
CN113744880B (en)* | 2021-09-08 | 2023-11-17 | 邵阳学院 | A management and analysis system for the degree of children's language impairment
CN114306871A (en)* | 2021-12-30 | 2022-04-12 | 首都医科大学附属北京天坛医院 | Artificial intelligence-based aphasia patient rehabilitation training method and system
CN116343996A (en)* | 2023-03-13 | 2023-06-27 | 杭州南粟科技有限公司 | Assisted rehabilitation system and method for speech and language barriers
CN116631238A (en)* | 2023-04-10 | 2023-08-22 | 北京青牛技术股份有限公司 | Conversation training method and system based on seat conversation analysis
WO2024261626A1 (en)* | 2023-06-22 | 2024-12-26 | Technology Innovation Institute – Sole Proprietorship LLC | Visual speech recognition based communication training system
CN117672024A (en)* | 2023-11-29 | 2024-03-08 | 杭州惠耳听力技术设备有限公司 | A children's language rehabilitation training method and system based on speech and mouth shape recognition
CN120089285A (en)* | 2025-04-27 | 2025-06-03 | 山东特殊教育职业学院 | A multi-logic interactive system for speech disorder rehabilitation training

Also Published As

Publication number | Publication date
CN112885168B | 2022-09-09

Similar Documents

Publication | Title
CN112885168A (en) | Immersive speech feedback training system based on AI
CN114578969B (en) | Method, apparatus, device and medium for man-machine interaction
Bänziger et al. | Introducing the Geneva multimodal emotion portrayal (GEMEP) corpus
US10706738B1 (en) | Systems and methods for providing a multi-modal evaluation of a presentation
CN109887349B (en) | Dictation auxiliary method and device
US20200286396A1 (en) | Following teaching system having voice evaluation function
Davies et al. | Facial composite production: A comparison of mechanical and computer-driven systems
CN111212317A (en) | A jump navigation method for video playback
CN102063903B (en) | Speech interactive training system and speech interactive training method
CN111709358A (en) | Teacher-student behavior analysis system based on classroom video
WO2019075826A1 (en) | Internet teaching platform-based accompanying teaching system
CN111048095A (en) | Voice transcription method, equipment and computer readable storage medium
CN105118354A (en) | Data processing method for language learning and device thereof
CN113936236B (en) | A video entity relationship and interaction recognition method based on multimodal features
TWI771632B (en) | Learning support device, learning support method, and recording medium
TW202042172A (en) | Intelligent teaching consultant generation method, system and device and storage medium
CN116088675A (en) | Virtual image interaction method, related device, equipment, system and medium
CN111460220A (en) | Method for making word flash card video and video product
Li et al. | Multi-stream deep learning framework for automated presentation assessment
CN111554303A (en) | User identity recognition method and storage medium in song singing process
CN113505604B (en) | Online auxiliary experiment method, device and equipment for psychological education
CN108428458A (en) | A kind of vocality study electron assistant articulatory system
CN114936952A (en) | Digital education internet learning system
Sümer et al. | Estimating presentation competence using multimodal nonverbal behavioral cues
CN109710735B (en) | Reading content recommendation method and electronic device based on multiple social channels

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
