Disclosure of Invention
The technical problem to be solved by the invention is as follows: current speech feedback training systems are single-function in use, which leads to relatively poor training results, cannot meet users' needs, and adversely affects the use of speech feedback training systems. The invention provides an AI-based immersive speech feedback training system to solve this problem.
The invention solves the technical problem through the following technical scheme: the system comprises an ability rating module, a grading learning module, a standard library, a video playing module, a sound identification module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module;
the ability rating module is used for rating the voice ability of a patient with a language disorder and generating voice rating information; the voice rating information is sent to the grading learning module; the grading learning module receives the voice rating information and calls learning videos of the corresponding level from the standard library, the standard library storing training video information of different levels;
the video playing module is used for receiving the video information of the corresponding level that the grading learning module instructs the standard library to send; the video playing module starts playing after receiving the video information of the corresponding level; at this time the sound identification module collects the voice information uttered by the patient, and meanwhile the image acquisition module operates to collect the mouth action information of the patient while the voice information is uttered;
the voice information uttered by the patient and the accompanying mouth action information are both sent to the data receiving module, and the data receiving module processes the received voice information and mouth action information to generate voice comparison information and action comparison information;
the voice comparison information and the action comparison information are both sent to the learning scoring module; the learning scoring module processes the voice comparison information and the action comparison information to generate training scoring information; the training scoring information is sent to the scoring display module, and the scoring display module is used for displaying the training score.
Preferably, the specific process by which the ability rating module rates the ability of a patient with a language disorder is as follows:
step one: the ability rating module is preset with text information of different levels, including primary text information, intermediate text information, advanced text information and normal text information, whose difficulty increases in the order: primary text information < intermediate text information < advanced text information < normal text information;
step two: at least x groups of text information are selected in turn from the primary text information, the intermediate text information, the advanced text information and the normal text information, in order of difficulty from low to high, where x ≥ 5;
step three: the x groups of text information selected from each of the primary, intermediate, advanced and normal text information are displayed; the patient reads aloud, in turn, the x groups of primary text information, the x groups of intermediate text information, the x groups of advanced text information and the x groups of normal text information, and the readings are marked, in order of level from low to high, as K1, K2, K3 and K4 respectively;
step four: the preset pronunciation information of the x groups of text information selected from the primary, intermediate, advanced and normal text information is extracted and marked, in the same order of level, as M1, M2, M3 and M4 respectively;
step five: similarity matching is carried out between K1 and M1 to obtain similarity Km1, between K2 and M2 to obtain similarity Km2, between K3 and M3 to obtain similarity Km3, and between K4 and M4 to obtain similarity Km4;
step six: when exactly one of the similarities Km1, Km2, Km3 and Km4 is greater than a preset value, the patient is judged to belong to the corresponding level; when two or more similarities are greater than the preset value, the highest of the corresponding levels is taken as the final judgment result.
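A minimal sketch of the level judgment in steps five and six, assuming each similarity Km1..Km4 has already been computed as a value in [0, 1]; the threshold of 0.8 is an illustrative placeholder, not a value given in the source:

```python
# Hypothetical sketch of steps five and six; the similarity computation itself
# (reading vs. preset pronunciation) is left abstract here.

LEVELS = ["primary", "intermediate", "advanced", "normal"]  # ordered low -> high

def rate_level(similarities, threshold=0.8):
    """Return the highest level whose similarity exceeds the preset value.

    similarities: [Km1, Km2, Km3, Km4], ordered from lowest to highest level.
    threshold: the preset value (0.8 is an assumed placeholder).
    """
    passed = [i for i, km in enumerate(similarities) if km > threshold]
    if not passed:
        return None  # no level reached; this case is left open by the source
    return LEVELS[max(passed)]  # two or more passed: take the highest level

print(rate_level([0.91, 0.85, 0.62, 0.40]))  # -> "intermediate"
```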
Preferably, the video playing module plays the audio information synchronously while playing the video.
Preferably, the standard library stores training video information of different levels, wherein the training video information includes mouth shape coefficient information corresponding to the text information; the data processing module processes the mouth action information collected by the image acquisition module into real-time mouth shape coefficient information, and the real-time mouth shape coefficient information is compared with the pre-stored mouth shape coefficient information to obtain the action comparison information.
Preferably, the mouth shape coefficients comprise a first mouth shape coefficient and a second mouth shape coefficient, and the specific process of obtaining the mouth shape coefficients is as follows:
step one: the key point of the upper lip is marked as point A1, the two corner points of the upper lip are marked as point A2 and point A3 respectively, and an arc segment L1 is obtained through the points A1, A2 and A3;
step two: the key point of the lower lip is marked as point B1, the two corner points of the lower lip are marked as point B2 and point B3 respectively, and an arc segment L2 is obtained through the points B1, B2 and B3;
step three: point A1 is connected with point A2 to obtain a line segment L3; the arc lengths of the arc segments L1 and L2 are measured, and the length of the line segment L3 is measured;
step four: the first mouth shape coefficient Lratio is obtained by the formula (L1 + L2)/(L1 − L2) = Lratio, and the length of the line segment L3 is the second mouth shape coefficient;
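A minimal sketch of step four, assuming the arc lengths L1 and L2 and the segment length L3 have already been measured from the lip key points; note that the formula as given is undefined when L1 = L2:

```python
def mouth_shape_coefficients(l1: float, l2: float, l3: float):
    """Step four: compute the two mouth shape coefficients.

    l1: arc length of the upper-lip arc segment L1
    l2: arc length of the lower-lip arc segment L2
    l3: length of the line segment L3 (point A1 to point A2)
    """
    if l1 == l2:
        raise ValueError("L1 == L2: the ratio (L1 + L2)/(L1 - L2) is undefined")
    first = (l1 + l2) / (l1 - l2)  # Lratio, the first mouth shape coefficient
    second = l3                    # the second mouth shape coefficient
    return first, second
```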
the specific process by which the data processing module compares the real-time mouth shape coefficient information with the pre-stored mouth shape coefficient information is as follows:
S1: the real-time first mouth shape coefficient, the real-time second mouth shape coefficient, the preset first mouth shape coefficient and the preset second mouth shape coefficient are extracted and marked as P1, P2, Q1 and Q2 respectively;
S2: the difference Pq1diff between the real-time first mouth shape coefficient P1 and the preset first mouth shape coefficient Q1 is calculated, i.e. Pq1diff = P1 − Q1; then the difference Pq2diff between the real-time second mouth shape coefficient P2 and the preset second mouth shape coefficient Q2 is calculated, i.e. Pq2diff = P2 − Q2;
S3: the sum of the absolute value of Pq1diff and the absolute value of Pq2diff is calculated as Pqsum = |Pq1diff| + |Pq2diff|, and Pqsum is the action comparison information.
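A minimal sketch of S1–S3, reusing the coefficient function above; what threshold, if any, is applied to the result is not specified in the source:

```python
def action_comparison(p1: float, p2: float, q1: float, q2: float) -> float:
    """S2-S3: Pqsum = |P1 - Q1| + |P2 - Q2| (lower means closer to the preset)."""
    pq1_diff = p1 - q1  # real-time vs. preset first mouth shape coefficient
    pq2_diff = p2 - q2  # real-time vs. preset second mouth shape coefficient
    return abs(pq1_diff) + abs(pq2_diff)
```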
Preferably, the specific process by which the data processing module generates the voice comparison information is as follows:
SS1: the standard voice information of the video information in the standard library is extracted, voiceprint processing is performed on the standard voice information to obtain a standard voiceprint, and the standard voiceprint is marked as Fstd;
SS2: voiceprint processing is performed on the voice information, collected by the sound identification module, of the patient reading the preset text content, to obtain a real-time voiceprint, which is marked as Freal, i.e. the voice comparison information Freal;
SS3: similarity comparison is carried out between the real-time voiceprint Freal and the standard voiceprint Fstd to obtain a similarity Fsim.
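The source does not specify how the voiceprints or their similarity are computed. A minimal sketch under the assumption that a voiceprint is a mean MFCC vector and the similarity is cosine similarity, using the librosa library; both choices are illustrative, not prescribed by the patent:

```python
import numpy as np
import librosa  # assumed dependency for MFCC extraction

def voiceprint(wav_path: str) -> np.ndarray:
    """Reduce an utterance to a fixed-length mean-MFCC vector (one possible 'voiceprint')."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # shape (20, n_frames)
    return mfcc.mean(axis=1)                            # shape (20,)

def voice_similarity(f_std: np.ndarray, f_real: np.ndarray) -> float:
    """SS3: cosine similarity between the standard and real-time voiceprints."""
    return float(np.dot(f_std, f_real) /
                 (np.linalg.norm(f_std) * np.linalg.norm(f_real)))

# f_sim = voice_similarity(voiceprint("standard.wav"), voiceprint("patient.wav"))
```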
Preferably, the specific process by which the learning scoring module processes the voice comparison information and the action comparison information to generate the training scoring information is as follows:
S01: the voice comparison information and the action comparison information are extracted and marked as M and N respectively;
S02: to reflect the greater importance of the voice comparison, a correction value U1 is assigned to the voice comparison information and a correction value U2 to the action comparison information, where U1 > U2 and U1 + U2 = 1;
S03: the training scoring information Mnsum is obtained by the formula M × U1 + N × U2 = Mnsum.
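A minimal sketch of S01–S03; the concrete weights 0.6 and 0.4 are assumptions consistent with U1 > U2 and U1 + U2 = 1, not values given in the source:

```python
def training_score(m: float, n: float, u1: float = 0.6, u2: float = 0.4) -> float:
    """S03: Mnsum = M * U1 + N * U2, weighting voice comparison above action comparison."""
    assert u1 > u2 and abs(u1 + u2 - 1.0) < 1e-9, "requires U1 > U2 and U1 + U2 = 1"
    return m * u1 + n * u2
```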
Preferably, the scoring display module ranks all the received training scoring information from high to low, and the personnel information corresponding to the three highest training scores is displayed enlarged in a preset font.
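A minimal sketch of this leaderboard behaviour, assuming scores arrive as (name, score) pairs; the font enlargement is a display concern and is represented here only by a flag:

```python
def leaderboard(entries):
    """Rank (name, score) pairs from high to low; flag the top three for enlarged display."""
    ranked = sorted(entries, key=lambda e: e[1], reverse=True)
    return [(name, score, rank < 3) for rank, (name, score) in enumerate(ranked)]

for name, score, enlarged in leaderboard([("A", 82.5), ("B", 91.0), ("C", 77.0), ("D", 88.0)]):
    print(f"{name}: {score}" + ("  <- enlarged font" if enlarged else ""))
```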
Compared with the prior art, the invention has the following advantages: before speech training begins, the AI-based immersive speech feedback training system can better evaluate the level of a patient's language disorder; through this arrangement, the system can better provide speech training content for the patient, set from easy to difficult, which effectively improves the use experience of the system and avoids the frustration caused to the patient when the training content is too difficult. Through the synchronous playing of video content and sound, the patient can observe the mouth shape of the pronunciation at the same time and pronounce by imitating the observed mouth shape, which accelerates the patient's rehabilitation training progress. Meanwhile, through the double analysis of mouth shape and pronunciation, the patient's rehabilitation training progress can be evaluated more accurately, and the various different arrangements meet the different use requirements of patients with language disorders, making the system more worthy of popularization and application.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
As shown in fig. 1, the present embodiment provides a technical solution: an AI-based immersive speech feedback training system comprises an ability rating module, a grading learning module, a standard library, a video playing module, a sound identification module, an image acquisition module, a data receiving module, a data processing module, a learning scoring module and a scoring display module;
the ability rating module is used for rating the voice ability of a patient with a language disorder and generating voice rating information; the voice rating information is sent to the grading learning module; the grading learning module receives the voice rating information and calls learning videos of the corresponding level from the standard library, the standard library storing training video information of different levels;
the video playing module is used for receiving the video information of the corresponding level that the grading learning module instructs the standard library to send; the video playing module starts playing after receiving the video information of the corresponding level, and when playing the video information it enlarges the mouth of the person in the image as a close-up, which makes mouth shape imitation easier for the patient; at this time the sound identification module collects the voice information uttered by the patient, and meanwhile the image acquisition module operates to collect the mouth action information of the patient while the voice information is uttered;
the voice information uttered by the patient and the accompanying mouth action information are both sent to the data receiving module, and the data receiving module processes the received voice information and mouth action information to generate voice comparison information and action comparison information;
the voice comparison information and the action comparison information are both sent to the learning scoring module; the learning scoring module processes the voice comparison information and the action comparison information to generate training scoring information; the training scoring information is sent to the scoring display module, and the scoring display module is used for displaying the training score.
The specific process by which the ability rating module rates the ability of a patient with a language disorder is as follows:
step one: the ability rating module is preset with text information of different levels, including primary text information, intermediate text information, advanced text information and normal text information, whose difficulty increases in the order: primary text information < intermediate text information < advanced text information < normal text information;
step two: at least x groups of text information are selected in turn from the primary text information, the intermediate text information, the advanced text information and the normal text information, in order of difficulty from low to high, where x ≥ 5;
step three: the x groups of text information selected from each of the primary, intermediate, advanced and normal text information are displayed; the patient reads aloud, in turn, the x groups of primary text information, the x groups of intermediate text information, the x groups of advanced text information and the x groups of normal text information, and the readings are marked, in order of level from low to high, as K1, K2, K3 and K4 respectively;
step four: the preset pronunciation information of the x groups of text information selected from the primary, intermediate, advanced and normal text information is extracted and marked, in the same order of level, as M1, M2, M3 and M4 respectively;
step five: similarity matching is carried out between K1 and M1 to obtain similarity Km1, between K2 and M2 to obtain similarity Km2, between K3 and M3 to obtain similarity Km3, and between K4 and M4 to obtain similarity Km4;
step six: when exactly one of the similarities Km1, Km2, Km3 and Km4 is greater than a preset value, the patient is judged to belong to the corresponding level; when two or more similarities are greater than the preset value, the highest of the corresponding levels is taken as the final judgment result;
before speech training is carried out, the patient's level of language disorder is thus evaluated; through this arrangement, the system can better provide speech training content for the patient, set from easy to difficult, which effectively improves the use experience of the system and avoids the frustration caused to the patient when the training content is too difficult.
When the video playing module plays images, the audio information is played synchronously, which effectively prevents the patient from mispronouncing because sound and picture are out of sync; through the synchronous playing of video content and sound, the patient can observe the mouth shape of the pronunciation at the same time and pronounce by imitating the observed mouth shape, which accelerates the patient's rehabilitation training progress.
The standard library stores training video information of different levels, wherein the training video information includes mouth shape coefficient information corresponding to the text information. The data processing module processes the mouth action information collected by the image acquisition module into real-time mouth shape coefficient information, and the real-time mouth shape coefficient information is compared with the pre-stored mouth shape coefficient information to obtain the action comparison information; the mouth shape coefficients are introduced to better evaluate the patient's rehabilitation training state.
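The source leaves the key point extraction unspecified. One way to obtain the lip key points A1–A3 and B1–B3 from a camera frame is a face landmark detector; the sketch below assumes MediaPipe FaceMesh, and the landmark indices chosen for the lip key points are illustrative, not taken from the patent:

```python
import cv2            # assumed dependencies: opencv-python, mediapipe
import mediapipe as mp

# Illustrative FaceMesh landmark indices (not specified by the source):
# 0 ~ upper-lip key point A1, 17 ~ lower-lip key point B1,
# 61/291 ~ the mouth corners, shared here as A2/A3 and B2/B3.
LIP_IDS = {"A1": 0, "A2": 61, "A3": 291, "B1": 17, "B2": 61, "B3": 291}

def lip_keypoints(frame_bgr):
    """Return {name: (x, y)} pixel coordinates of the lip key points, or None."""
    h, w = frame_bgr.shape[:2]
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as fm:
        res = fm.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not res.multi_face_landmarks:
        return None  # no face detected in this frame
    lms = res.multi_face_landmarks[0].landmark
    return {k: (lms[i].x * w, lms[i].y * h) for k, i in LIP_IDS.items()}
```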
The mouth shape coefficients comprise a first mouth shape coefficient and a second mouth shape coefficient, and the specific process of obtaining the mouth shape coefficients is as follows:
step one: the key point of the upper lip is marked as point A1, the two corner points of the upper lip are marked as point A2 and point A3 respectively, and an arc segment L1 is obtained through the points A1, A2 and A3;
step two: the key point of the lower lip is marked as point B1, the two corner points of the lower lip are marked as point B2 and point B3 respectively, and an arc segment L2 is obtained through the points B1, B2 and B3;
step three: point A1 is connected with point A2 to obtain a line segment L3; the arc lengths of the arc segments L1 and L2 are measured, and the length of the line segment L3 is measured;
step four: the first mouth shape coefficient Lratio is obtained by the formula (L1 + L2)/(L1 − L2) = Lratio, and the length of the line segment L3 is the second mouth shape coefficient;
the use of two mouth shape coefficients further improves the judgment accuracy; one possible geometric construction of the arc segments is sketched below;
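Steps one to three presuppose fitting a circular arc through three lip points and measuring its length. A minimal geometric sketch, assuming 2-D pixel coordinates and non-collinear points; the atan2 branch cut at ±π is glossed over for brevity:

```python
import math

def circumcenter(p1, p2, p3):
    """Center of the unique circle through three non-collinear 2-D points."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; no unique circle exists")
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ux, uy

def arc_length(mid, corner1, corner2):
    """Length of the circular arc from corner1 to corner2 that passes through mid."""
    ox, oy = circumcenter(mid, corner1, corner2)
    r = math.hypot(mid[0] - ox, mid[1] - oy)
    a1 = math.atan2(corner1[1] - oy, corner1[0] - ox)
    a2 = math.atan2(corner2[1] - oy, corner2[0] - ox)
    am = math.atan2(mid[1] - oy, mid[0] - ox)
    lo, hi = min(a1, a2), max(a1, a2)
    theta = hi - lo
    if not (lo <= am <= hi):      # mid lies on the complementary arc
        theta = 2 * math.pi - theta
    return r * theta
```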
The specific process by which the data processing module compares the real-time mouth shape coefficient information with the pre-stored mouth shape coefficient information is as follows:
S1: the real-time first mouth shape coefficient, the real-time second mouth shape coefficient, the preset first mouth shape coefficient and the preset second mouth shape coefficient are extracted and marked as P1, P2, Q1 and Q2 respectively;
S2: the difference Pq1diff between the real-time first mouth shape coefficient P1 and the preset first mouth shape coefficient Q1 is calculated, i.e. Pq1diff = P1 − Q1; then the difference Pq2diff between the real-time second mouth shape coefficient P2 and the preset second mouth shape coefficient Q2 is calculated, i.e. Pq2diff = P2 − Q2;
S3: the sum of the absolute value of Pq1diff and the absolute value of Pq2diff is calculated as Pqsum = |Pq1diff| + |Pq2diff|, and Pqsum is the action comparison information;
through this arrangement, the action comparison information can be acquired more reliably.
The specific process by which the data processing module generates the voice comparison information is as follows:
SS1: the standard voice information of the video information in the standard library is extracted, voiceprint processing is performed on the standard voice information to obtain a standard voiceprint, and the standard voiceprint is marked as Fstd;
SS2: voiceprint processing is performed on the voice information, collected by the sound identification module, of the patient reading the preset text content, to obtain a real-time voiceprint, which is marked as Freal, i.e. the voice comparison information Freal;
SS3: similarity comparison is carried out between the real-time voiceprint Freal and the standard voiceprint Fstd to obtain a similarity Fsim.
The specific process by which the learning scoring module processes the voice comparison information and the action comparison information to generate the training scoring information is as follows:
S01: the voice comparison information and the action comparison information are extracted and marked as M and N respectively;
S02: to reflect the greater importance of the voice comparison, a correction value U1 is assigned to the voice comparison information and a correction value U2 to the action comparison information, where U1 > U2 and U1 + U2 = 1;
S03: the training scoring information Mnsum is obtained by the formula M × U1 + N × U2 = Mnsum;
through the double analysis of mouth shape and pronunciation, the patient's rehabilitation training progress can be evaluated more accurately, and the various different settings meet the different use requirements of patients with language disorders, making the system more worthy of popularization and application.
The scoring display module ranks all the received training scoring information from high to low, and the personnel information corresponding to the three highest training scores is displayed enlarged in a preset font;
setting the ranking information lets the patient learn the recovery state of other patients, which can stimulate the patient's confidence in rehabilitation training and thereby accelerate recovery.
In conclusion, when the system is used, the ability rating module rates the voice ability of the patient with a language disorder and generates voice rating information; the voice rating information is sent to the grading learning module, which receives it and calls learning videos of the corresponding level from the standard library, the standard library storing training video information of different levels. The system thus evaluates the patient's level of language disorder before speech training; through this arrangement, the system can better provide speech training content for the patient, set from easy to difficult, which effectively improves the use experience of the system and avoids the frustration caused when the training content is too difficult. The video playing module receives the video information of the corresponding level that the grading learning module instructs the standard library to send, and starts playing after receiving it; through the synchronous playing of video content and sound, the patient can observe the mouth shape of the pronunciation at the same time and pronounce by imitating the observed mouth shape, accelerating the rehabilitation training progress. At this time the sound identification module collects the voice information uttered by the patient, and meanwhile the image acquisition module operates to collect the mouth action information while the voice information is uttered. Both are sent to the data receiving module, which processes them to generate voice comparison information and action comparison information. The voice comparison information and the action comparison information are sent to the learning scoring module, which processes them to generate training scoring information; through the double analysis of mouth shape and pronunciation, the patient's rehabilitation training progress can be evaluated more accurately, and the various different settings meet the different use requirements of patients with language disorders, making the system more worthy of popularization and use. The training scoring information is sent to the scoring display module, which displays the training score.
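Tying the preceding sketches together, a hypothetical end-to-end scoring pass for one utterance could look as follows; every function used here comes from the illustrative sketches above, not from the patent itself:

```python
import math  # for the L3 segment length

def score_one_utterance(patient_wav, standard_wav, frame_bgr, preset_q1, preset_q2):
    """One training round: voice similarity plus mouth shape comparison -> training score."""
    m = voice_similarity(voiceprint(standard_wav), voiceprint(patient_wav))
    pts = lip_keypoints(frame_bgr)
    if pts is None:
        raise RuntimeError("no face detected in the frame")
    l1 = arc_length(pts["A1"], pts["A2"], pts["A3"])   # upper-lip arc L1
    l2 = arc_length(pts["B1"], pts["B2"], pts["B3"])   # lower-lip arc L2
    l3 = math.hypot(pts["A1"][0] - pts["A2"][0],
                    pts["A1"][1] - pts["A2"][1])        # segment L3
    p1, p2 = mouth_shape_coefficients(l1, l2, l3)
    n = action_comparison(p1, p2, preset_q1, preset_q2)  # lower = closer to preset
    return training_score(m, n)
```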
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g. two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.