CN112232083A - Man-machine conversation spoken language evaluation system - Google Patents

Man-machine conversation spoken language evaluation system

Info

Publication number
CN112232083A
Authority
CN
China
Prior art keywords
module
dialogue
evaluation
user
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011100849.8A
Other languages
Chinese (zh)
Other versions
CN112232083B (en)
Inventor
王鑫
许昭慧
Current Assignee
Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Original Assignee
Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Priority to CN202011100849.8A
Publication of CN112232083A
Application granted
Publication of CN112232083B
Status: Active
Anticipated expiration

Abstract

The present application relates to a human-machine spoken-language evaluation system: a scenario-driven, task-oriented dialogue system built on human-machine dialogue and speech-evaluation technologies and applied to spoken-language assessment. The evaluation system of the present application has three main characteristics: it is conversational, scenario-driven, and task-oriented. Through a task-oriented dialogue system that communicates with the user in natural language, the system can gauge a student user's ability to use the language in practice and to communicate comprehensively in English, producing a washback effect on the student user's spoken-language learning and the teacher's spoken-language teaching.

Description

Man-machine conversation spoken language evaluation system
Technical Field
The present application relates to the technical field of human-computer interaction, and in particular to a conversational human-computer interactive spoken-language evaluation system.
Background
There are two main types of spoken-language testing: the face-to-face interview and the recorded oral test. The interview has high validity but is time-consuming and labor-intensive to organize. Large-scale oral tests therefore adopt a human-computer interaction mode: examinees complete listening and speaking questions through a computer and headset, scoring is fully automatic across dimensions such as sentence prosody, completeness, and accuracy, and a written evaluation report can be generated.
In online speech-learning products, speech recognition and speech evaluation technologies are widely used. The student user's pronunciation is compared against the machine's and scored in a cycle of "listen to the original audio, read aloud or repeat, receive a system score with color-coded visual feedback, and adjust", improving students' English listening comprehension and pronunciation through repeated practice.
Disclosure of Invention
Through long-term observation and research, the inventors found that spoken English differs from other courses: its main purpose is not to impart knowledge. English is a carrier of knowledge and culture, and a student user needs to use language to express ideas and communicate with others in order to achieve genuine communication. Developing students' ability to use the language in practice and to communicate comprehensively in English has therefore become the main teaching task of spoken English. Examination and evaluation should serve teaching; however, existing English evaluation technologies applied to human-computer interaction have the following disadvantages:
First, the student's spoken-language level is examined through prerecorded voice test questions. The form is monotonous: the questions are specified in advance and delivered as instructions, and the student passively receives questions and scores. In a face-to-face oral exam, the student generally speaks while the examiner listens and then assigns a score, which cannot comprehensively reflect the state of teaching and learning; in the interview, emotional interaction between examiner and examinee can also distort the evaluation result.
Second, traditional classroom or online oral assessment is summative and exam-driven: it judges a student's learning outcome for a term through a single end-of-term examination, or determines the student's class level through a diagnostic test before the term begins, and then promotes students stage by stage.
Third, through read-aloud and repeat activities, the student user compares his or her own pronunciation with the machine's and repeatedly revises it based on score feedback, which helps English listening ability and pronunciation. However, the student's ability to actually apply the language and to communicate interpersonally in English cannot be measured by existing techniques, still less can those techniques produce a positive washback effect on spoken-English learning.
In view of the above defects in the prior art, the present application provides a conversational human-computer interactive spoken-language evaluation system: a scenario-driven, task-oriented dialogue system based on human-computer dialogue and speech-evaluation technologies and applied to spoken-language assessment. The evaluation system of the present application has three main features: it is conversational, scenario-driven, and task-oriented. Through a task-oriented dialogue system that communicates with the user in natural language, the system can gauge the student user's ability to use the language in practice and to communicate comprehensively in English, and produces a washback effect on the student user's spoken-language learning and the teacher's spoken-language teaching.
The present application provides a conversational human-computer interactive spoken-language evaluation system comprising a dialogue system, the dialogue system including: a speech recognition module configured to recognize a user's speech input and convert it into text; an intent understanding module configured to perform semantic understanding of the converted text to identify the user's intent; a dialogue management module configured to generate a corresponding system action based on the understanding result of the intent understanding module; a language generation module configured to convert the system action generated by the dialogue management module into natural language; and a language synthesis module configured to convert the natural language into speech and feed it back to the user.
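The five-module pipeline described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class and function names are assumptions, and lambdas stand in for real speech and language models.

```python
class DialogSystem:
    """Chains the five modules: speech recognition (ASR) -> intent
    understanding (NLU) -> dialogue management (DM) -> language
    generation (NLG) -> speech synthesis (TTS)."""

    def __init__(self, asr, nlu, dm, nlg, tts):
        self.asr, self.nlu, self.dm, self.nlg, self.tts = asr, nlu, dm, nlg, tts

    def turn(self, audio):
        text = self.asr(audio)      # speech input -> text
        intent = self.nlu(text)     # text -> user intent
        action = self.dm(intent)    # intent -> system action
        reply = self.nlg(action)    # system action -> natural language
        return self.tts(reply)      # natural language -> speech, fed back


# Stub components standing in for real models:
system = DialogSystem(
    asr=lambda audio: "my name is carol",
    nlu=lambda text: {"intent": "self_introduction", "name": "carol"},
    dm=lambda nlu_out: {"act": "greet", "name": nlu_out["name"]},
    nlg=lambda action: f"Nice to meet you, {action['name'].title()}!",
    tts=lambda reply: ("AUDIO", reply),
)

print(system.turn(b"<wav bytes>"))  # ('AUDIO', 'Nice to meet you, Carol!')
```

Each stub would be replaced by a trained model in practice; the point is only that each module's output is the next module's input.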
In some embodiments, optionally, the intent understanding module is further configured to enable slot filling, wherein a slot is information that needs to be completed to translate the user intent into an explicit user instruction during the session.
In some embodiments, optionally, the intent understanding module is further configured to perform user intent understanding based on the user profile and/or the scenario information.
In some embodiments, optionally, the dialog management module further includes a dialog state tracking module configured to be able to represent the phase of the dialog and to fuse context information of the dialog process.
In some embodiments, optionally, the dialog management module further comprises a dialog policy learning module configured to generate a next operation of the system based on the current dialog state.
In some embodiments, the system optionally further comprises an evaluation system, the evaluation system including: a scenario-dialogue speech and semantic evaluation module configured to compare the similarity of the text converted from the user's speech against standard speech and semantic content and obtain a speech evaluation score and a semantic evaluation score; a grammar evaluation and error-checking module configured to perform a grammar check on the text converted from the user's speech and obtain a grammar evaluation score; and a confusable-sound evaluation module configured to mark confusable-sound errors in the text converted from the user's speech, so as to evaluate easily confused pronunciations.
In some embodiments, optionally, the dialog management module is further configured to generate a corresponding system action based on the evaluation result of the evaluation system.
In some embodiments, optionally, the speech evaluation score is higher when the similarity between the user's speech and the phonemes of the standard speech is higher; and the semantic evaluation score is higher when the similarity between the content expressed by the user and the reference answer is higher.
In some embodiments, optionally, the grammar evaluation and error-checking module is further configured to examine logical relationships in the sentence, the logical relationships including one or more of the following: subject-verb agreement, tense expression, syntactic structure, and singular/plural agreement.
In some embodiments, the conversational human-computer interactive spoken-language evaluation system optionally runs on stand-alone and/or networked computer systems to carry out evaluation of language-learning content.
Compared with the prior art, the beneficial effects of the present application include at least the following:
First, the present application is a conversational human-computer interactive spoken-language evaluation system. Through human-computer dialogue it provides numerous opportunities to communicate with different virtual persons and creates communication scenarios. Repeated communication practice can produce a positive washback effect on student users' learning and on teaching: the washback effect of being tested can change students' attitude toward learning and stimulate their enthusiasm for learning and using spoken language in everyday life. Furthermore, the conversational system avoids the emotional interaction that occurs between human examiners and examinees.
Second, the present application is a scenario-driven spoken-language assessment system, a meaningful technique that reflects the content taught as well as the learning content and the learning process. In the course of completing a learning task, the student not only receives detailed evaluation feedback; the system also discovers the student user's problems in pronunciation, intonation, communication, and expression, analyzes their causes, and collects rich samples of student speech and the communication strategies adopted, which is of great value for teachers who later provide personalized guidance. Moreover, scenario-driven assessment reduces student users' tension and anxiety and more truly reflects their actual level and performance.
Third, the present application is a task-oriented spoken-language evaluation system. Task-oriented spoken activities emphasize meaning rather than linguistic form, so student users can more easily experience success and achievement, which stimulates intrinsic interest and desire to learn and leads to better performance. Interactive spoken English emphasizes giving student users first-hand experience: by participating in realistic, natural, interactive activities they seek knowledge, discover problems, and construct their own communication patterns, concepts, and strategies, and by completing tasks they learn to convey information and express ideas.
The conception, specific structure and technical effects of the present application will be further described in conjunction with the accompanying drawings to fully understand the purpose, characteristics and effects of the present application.
Drawings
The present application will become more readily understood from the following detailed description when read in conjunction with the accompanying drawings, wherein like reference numerals designate like parts throughout the figures, and in which:
Fig. 1 is a schematic structural diagram of a functional module according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a program module according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The present application may be embodied in many different forms of embodiments and the scope of the present application is not limited to only the embodiments set forth herein. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without making any creative effort shall fall within the protection scope of the present application.
Ordinal terms such as "first" and "second" are used herein only for distinguishing and identifying, and do not have any other meanings, unless otherwise specified, either by indicating a particular sequence or by indicating a particular relationship. For example, the term "first component" does not itself imply the presence of a "second component", nor does the term "second component" itself imply the presence of a "first component".
Fig. 1 is a schematic structural diagram of a functional module according to an embodiment of the present application. As shown in Fig. 1, the conversational human-computer interactive spoken-language evaluation system may be based on stand-alone and/or networked computer systems to carry out evaluation of language-learning content, and includes a dialogue system and an evaluation system.
The dialogue system includes a speech recognition module, an intent understanding module, a dialogue management module, a language generation module, and a language synthesis module. The speech recognition module recognizes the user's speech input and converts it into text; the intent understanding module performs semantic understanding of the converted text to identify the user's intent; the dialogue management module generates a corresponding system action based on the understanding result of the intent understanding module; the language generation module converts the system action generated by the dialogue management module into natural language; and the language synthesis module converts the natural language into speech and feeds it back to the user.
In some embodiments, the speech recognition module is responsible for recognizing the student user's speech input and converting it into text. The intent understanding module is responsible for semantic understanding of that text, including user intent recognition and slot filling, where a slot is the information that must be completed during the dialogue to convert the user's intent into an explicit user instruction. The dialogue management module is responsible for managing the whole dialogue, including dialogue state tracking and dialogue strategy learning. The language generation module is responsible for converting the system action selected by the dialogue strategy module into natural language, and the language synthesis module is responsible for converting that text into speech and feeding it back to the student user. The intent understanding module is also capable of intent understanding based on the user profile and/or the scenario information.
Intent recognition can be regarded as a multi-class text classification problem: the corresponding category is determined from the user's expression. An intent can be understood as a function or flow of an application that serves the user's request and purpose; for example, when the student user says "My name is Carol" or "This is Carol", the self-introduction intent is triggered. A slot is the information that must be completed, over multiple dialogue turns, to convert a preliminary user intent into an explicit user instruction; each slot corresponds to one piece of information required to handle one matter. In the student's utterance "My name is Carol", "Carol" fills the name slot. The intent understanding module takes not only the speech as input but also considers the user profile and scenario information; this fuller context improves the accuracy of intent understanding.
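As a rough illustration of the intent classification and slot filling just described, the sketch below uses hypothetical regular-expression patterns in place of a trained multi-class classifier; the intent names and patterns are assumptions for illustration only.

```python
import re

# Hypothetical intent patterns; a real system would use a trained
# multi-class text classifier rather than regular expressions.
INTENT_PATTERNS = {
    "self_introduction": re.compile(r"\b(?:my name is|this is)\s+(?P<name>\w+)", re.I),
    "greeting": re.compile(r"\b(?:hello|hi|good (?:morning|afternoon))\b", re.I),
}

def understand(utterance):
    """Return (intent, slots) for an utterance; ('unknown', {}) if nothing matches."""
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            # Named groups act as slots: the information needed to turn
            # the preliminary intent into an explicit instruction.
            slots = {k: v for k, v in match.groupdict().items() if v}
            return intent, slots
    return "unknown", {}

print(understand("My name is Carol"))  # ('self_introduction', {'name': 'Carol'})
```

In a production system the user profile and scene information would be additional inputs to `understand`, narrowing the candidate intents.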
The user profile may include the student user's name, grade, and location; spoken-language proficiency dimensions such as pronunciation accuracy, completeness, and fluency; and behavioral characteristics, interests, and hobbies. The user profile can be updated in real time in every dialogue turn and, combined with the context information, influences the next turn, giving the virtual person the appearance of memory: as the number of dialogues increases, the system understands the student user better and the virtual person's responses become smoother.
The dialogue management module may also include a dialogue state tracking module and/or a dialogue strategy learning module. The dialogue state tracking module represents the stage the dialogue has reached and fuses the context information of the dialogue process; in some embodiments it represents the current dialogue state information, i.e., the current overall stage of the dialogue within the dialogue system. The dialogue strategy learning module generates the system's next operation according to the current dialogue state.
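State tracking and strategy learning can be illustrated with the following minimal sketch; the stage names and policy rules are assumptions, not taken from the patent, and a real strategy module would typically be learned rather than hand-written.

```python
# Illustrative dialogue state tracker and policy.
class DialogState:
    def __init__(self):
        self.stage = "greeting"   # current phase of the dialogue
        self.slots = {}           # fused information from all turns
        self.history = []         # context of the dialogue process

    def update(self, intent, slots):
        self.history.append(intent)
        self.slots.update(slots)
        if intent == "self_introduction":
            self.stage = "acquainted"

def next_action(state):
    """Dialogue strategy: choose the system's next operation from the state."""
    if state.stage == "greeting":
        return "ask_name"
    if "name" in state.slots and "origin" not in state.slots:
        return "ask_origin"
    return "continue_chat"

state = DialogState()
print(next_action(state))                            # ask_name
state.update("self_introduction", {"name": "Ray"})
print(next_action(state))                            # ask_origin
```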
The evaluation system may comprise a scenario-dialogue speech and semantic evaluation module, a grammar evaluation and error-checking module, and a confusable-sound evaluation module. The scenario-dialogue speech and semantic evaluation module compares the similarity of the text converted from the user's speech against standard speech and semantic content and obtains a speech evaluation score and a semantic evaluation score; the grammar evaluation and error-checking module performs a grammar check on that text and obtains a grammar evaluation score; and the confusable-sound evaluation module marks confusable-sound errors in the text, so as to evaluate easily confused pronunciations.
In some embodiments, the evaluation system may include three modules: a scenario-dialogue speech and semantic evaluation module, a grammar evaluation and error-checking module, and a confusable-sound evaluation module. The scenario-dialogue speech and semantic evaluation module compares the similarity of the text converted from the student user's speech against standard speech and semantic content: the higher the similarity between the user's speech and the phonemes of the standard speech, the higher the speech evaluation score; the higher the similarity between the content the user expresses and the reference answer, the higher the semantic evaluation score. The grammar evaluation and error-checking module scores the text converted from the student's speech and points out grammatical errors, mainly examining the logical relationships within the sentence, including singular/plural agreement, subject-verb agreement, tense expression, and the use of syntactic structures; the fewer the grammatical errors, the higher the score. The confusable-sound evaluation module marks confusable-sound errors in the text converted from the student's speech; for this, the errors commonly made by Chinese students must be included in the training corpus of the model in the speech recognition module, so that the speech recognition module does not silently correct them.
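The scoring relationships just described — phoneme similarity for the speech score, reference-answer similarity for the semantic score, and an inverse relationship between error count and grammar score — can be sketched as follows. This is a hedged illustration: a real system would align phoneme sequences and use trained models, not the plain string similarity used here.

```python
from difflib import SequenceMatcher

def speech_score(user_phonemes, standard_phonemes):
    """Higher when the user's phonemes are closer to the reference."""
    ratio = SequenceMatcher(None, user_phonemes, standard_phonemes).ratio()
    return round(100 * ratio)

def semantic_score(user_text, reference_answer):
    """Crude word-overlap similarity against the reference answer."""
    user_words = set(user_text.lower().split())
    ref_words = set(reference_answer.lower().split())
    return round(100 * len(user_words & ref_words) / max(len(ref_words), 1))

def grammar_score(error_count, max_errors=5):
    """Fewer grammatical errors give a higher score."""
    return max(0, round(100 * (1 - error_count / max_errors)))

print(speech_score("HH EH L OW", "HH AH L OW"))                  # 90
print(semantic_score("I am from Shanghai", "I come from Shanghai"))  # 75
print(grammar_score(1))                                          # 80
```

The phoneme strings here use ARPAbet-style symbols purely for illustration.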
The dialogue management module can generate a corresponding system action according to the evaluation result of the evaluation system. In some embodiments, the evaluation results of the three evaluation modules are passed into the dialogue management module, which, after obtaining the evaluation of the user's speech, responds in accordance with the evaluation target and the dialogue strategy.
Fig. 2 is a schematic structural diagram of a program module according to an embodiment of the present application. As shown in Fig. 2, the system first retrieves the first test point; the test point corresponds to a task to be completed in a scene, and the student user sees the description of the task on the front-end interface.
In some embodiments of the conversational human-computer interactive spoken-language evaluation system, the task description provides the student user with dialogue background and scene information, and the student user is asked to complete a realistic, natural, interpersonal task-based activity. The front end renders the virtual scene in real time, so that from rich, three-dimensional information the student user gains an experience close to a conversation with a real person.
With this technical scheme, the system begins the dialogue according to the contextual information; either the user or the system may open with a statement or a question, depending on the requirements of different test points. After the student user's speech has been converted into text by speech recognition and the intent has been recognized by the intent recognition module, the evaluation modules score the text along the dimensions of speech, semantics, grammar, and confusable sounds, and the new information is used to update the user profile.
In some embodiments of the conversational human-computer interactive spoken-language evaluation system, the evaluation modules comprise: scenario-dialogue speech and semantic evaluation, grammar evaluation and error checking, and confusable-sound evaluation. The evaluation results are needed for the evaluation report displayed after the assessment ends, and can also serve as information for the virtual person's next response, so that the language complexity, speed, or intelligibility of the virtual person's dialogue is adjusted automatically for different interlocutors.
With this technical scheme, after the student user's speech is converted into text by speech recognition, the dialogue intent is obtained through intent recognition, slots are extracted from the student's expression, the student's speech is understood, and the content of the next turn is determined; language generation then makes the virtual person speak. The whole process cycles through a number of test points until the assessment ends, at which point an evaluation report is generated.
In some embodiments of the conversational human-computer interactive spoken-language evaluation system, the evaluation report includes the student's basic information and the process-oriented evaluation of the student's spoken-language level. It can indicate the locations of the student user's speech and grammar errors, such as abnormal pronunciation, inaccurate intonation, and frequently made grammatical mistakes, and can further analyze, from the student user's behavioral characteristics, the student's ability to use the language comprehensively and the communication strategies employed.
In some embodiments, the conversational human-computer interactive spoken-language evaluation system may include a dialogue system and an evaluation system. In practice, as an example, the working process is as follows:
The system first retrieves the first test point; the test point corresponds to a task to be completed in a scene, and the student user sees the description of the task on the front-end interface. For example, the test point is getting acquainted with a stranger in English: the system displays an appropriate dialogue scene through rich text or virtual reality, and the student user sees the following task description: meet a new friend, greet them politely, and ask for their name and where they are from.
The system begins the dialogue based on the contextual information; this test point is set up to have the user open with a question. The student user says, "Hello, I'm Ray." After the student user's speech is converted into text by speech recognition, intent recognition determines that the dialogue intent is a greeting, and the evaluation modules score the text along the dimensions of speech, semantics, grammar, and confusable sounds; the results are updated to the user profile.
The recognized intent is a greeting, and a slot is extracted from the student user's expression: the slot is the name, with parameter value "Ray". After the student user's speech is understood, the content of the next turn is determined and language generation makes the virtual person speak. The whole process cycles through the test points until the assessment ends, and an evaluation report is generated.
In some embodiments, the method further comprises a fallback: when the system asks "Where do you come from?" and the student user answers with the name of a small hometown city that is beyond the system's comprehensible range, the dialogue state tracking module represents the current stage of the whole dialogue fused with the context information of the dialogue process, the dialogue strategy learning module adopts a general response strategy, and the system responds through the virtual person with "Wow! This is a nice place!" to keep the conversation going.
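The general-response fallback above can be sketched as a small lookup keyed by the dialogue stage; the reply table and stage names are illustrative assumptions rather than the patent's actual strategy.

```python
# Stage-appropriate generic replies used when the user's utterance is
# beyond the system's comprehensible range.
GENERIC_REPLIES = {
    "ask_origin": "Wow! This is a nice place!",
    "default": "I see! Please tell me more.",
}

def respond(intent, stage):
    """Fall back to a generic reply for unrecognized intents, so the
    conversation keeps going instead of stalling."""
    if intent == "unknown":
        return GENERIC_REPLIES.get(stage, GENERIC_REPLIES["default"])
    return f"<handle intent: {intent}>"

print(respond("unknown", "ask_origin"))  # Wow! This is a nice place!
```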
In some embodiments, the method may further comprise: in an airplane-boarding scenario, when the student user says something that disregards the rule against using a mobile phone on board, the user profile records a low social-etiquette score for that student, and the dialogue strategy selection will preferentially choose a serious, admonishing response.
In some embodiments, the various methods, processes, modules, apparatuses, devices, or systems described above may be implemented or performed in one or more processing devices (e.g., digital processors, analog processors, digital circuits designed to process information, analog circuits designed to process information, state machines, computing devices, computers, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices that perform some or all of the operations of a method in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for performing one or more operations of a method. The above description covers only preferred embodiments of the present application, but the scope of the present application is not limited thereto; equivalent alternatives or modifications made according to the technical solutions and the inventive concept of the present application by any person skilled in the art are encompassed within the scope of the present application.
Embodiments of the present application may be implemented in hardware, firmware, software, or various combinations thereof. The present application may also be implemented as instructions stored on a machine-readable medium, which may be read and executed using one or more processing devices. In one implementation, a machine-readable medium may include various mechanisms for storing and/or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable storage medium may include read-only memory, random-access memory, magnetic disk storage media, optical storage media, flash-memory devices, and other media for storing information, and a machine-readable transmission medium may include various forms of propagated signals (including carrier waves, infrared signals, digital signals), and other media for transmitting information. While firmware, software, routines, or instructions may be described in the above disclosure in terms of performing certain exemplary aspects and embodiments of certain actions, it will be apparent that such descriptions are merely for convenience and that such actions in fact result from a machine device, computing device, processing device, processor, controller, or other device or machine executing the firmware, software, routines, or instructions.
This specification discloses the application using examples in which one or more examples are described or illustrated in the specification and drawings. Each example is provided by way of explanation of the application, not limitation of the application. In fact, it will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope or spirit of the application. For instance, features illustrated or described as part of one embodiment, can be used with another embodiment to yield a still further embodiment. It is therefore intended that the present application cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents. The above description is only for the specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are intended to be covered by the scope of the present application.

Claims (10)

Translated from Chinese

1. A human-machine dialogue spoken language evaluation system, characterized by comprising:
a speech recognition module, configured to recognize a student user's speech input and convert it into text;
an intent understanding module, configured to perform semantic understanding of the converted text in combination with a user portrait and scenario information, so as to identify the student user's intent in the spoken dialogue, wherein the user portrait includes a spoken-language proficiency dimension, and the scenario information includes the virtual scene in which the current dialogue takes place;
a dialogue management module, configured to make a corresponding speech response based on the understanding result of the intent understanding module;
a language generation module, configured to convert the system actions produced by the dialogue management module into natural language; and
a speech synthesis module, configured to convert the natural language into speech and feed it back to the student user;
wherein the dialogue management module includes a dialogue state tracking module and a dialogue policy learning module; when the student user's response exceeds the range that the intent understanding module can understand, the dialogue state tracking module represents the current dialogue state according to the stage the overall dialogue has reached, fused with the context information of the dialogue process, and the dialogue policy learning module adopts a general response strategy according to the current dialogue state and has a virtual person reply with generic sentences to keep the conversation going.
2. The system according to any preceding claim, characterized in that:
the intent understanding module is further configured to perform slot filling, wherein a slot is the information that must be completed during the dialogue in order to convert the user intent into an explicit user instruction.
3. The system according to any preceding claim, characterized in that:
the dialogue management module further includes a dialogue state tracking module configured to represent the stage the dialogue has reached and to fuse the context information of the dialogue process.
4. The system according to any preceding claim, characterized by further comprising an evaluation system, the evaluation system comprising:
a situational-dialogue speech and semantic evaluation module, configured to compare the text converted from the user's speech for similarity against standard speech and semantic content, and to obtain a speech evaluation score and a semantic evaluation score.
5. The system according to any preceding claim, characterized in that the evaluation system further comprises:
a grammar evaluation and error checking module, configured to perform a grammar check on the text converted from the user's speech and to obtain a grammar evaluation score.
6. The system according to any preceding claim, characterized in that the evaluation system further comprises:
a confusable-sound evaluation module, configured to mark confusable-sound errors in the text converted from the user's speech, so as to evaluate easily confused pronunciations.
7. The system according to any preceding claim, characterized in that:
the dialogue management module is further configured to produce corresponding system actions according to the evaluation results of the evaluation system.
8. The system according to any preceding claim, characterized in that:
the higher the similarity between the user's speech and the standard speech phonemes, the higher the speech evaluation score; and
the higher the similarity between the content expressed by the user and the comparison reference answer, the higher the semantic evaluation score.
9. The system according to any preceding claim, characterized in that:
the grammar evaluation and error checking module is further configured to examine logical relationships in sentences, the logical relationships including one or more of the following: subject-verb agreement, tense expression, syntactic structure, and singular/plural forms.
10. The system according to any preceding claim, characterized in that:
the conversational human-computer interaction spoken language evaluation system is a computer system in a stand-alone and/or online configuration, for carrying out the evaluation of language content.
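The fallback behaviour in claim 1 — when an utterance falls outside what the intent understanding module can handle, the state tracker keeps the dialogue stage and context while the policy module emits a generic reply to keep the conversation going — can be sketched roughly as follows. All class names, intents, and canned responses here are illustrative assumptions, not the patented implementation:

```python
# Minimal sketch of the dialogue-management fallback described in claim 1.
# GENERAL_RESPONSES, intent labels, and class names are invented for illustration.

GENERAL_RESPONSES = {
    "greeting": "Hi! Shall we keep practicing?",
    "ordering_food": "I see. Could you tell me more about what you'd like to order?",
}

class DialogueStateTracker:
    """Tracks the stage of the dialogue and fuses turn-by-turn context."""
    def __init__(self, stage="greeting"):
        self.stage = stage
        self.history = []          # context of the dialogue so far

    def update(self, user_text, intent):
        self.history.append((user_text, intent))
        if intent is not None:     # understood turns may advance the stage
            self.stage = intent
        return {"stage": self.stage, "history": tuple(self.history)}

class DialoguePolicy:
    """Chooses a system action; falls back to a generic reply when intent is unknown."""
    def respond(self, state, intent):
        if intent is None:         # out-of-scope utterance -> general response strategy
            return GENERAL_RESPONSES.get(state["stage"], "Interesting! Please go on.")
        return f"ACTION[{intent}]"  # normal path: a concrete system action

def manage_turn(tracker, policy, user_text, intent):
    state = tracker.update(user_text, intent)
    return policy.respond(state, intent)
```

A turn with a recognized intent advances the dialogue normally; an out-of-scope turn still updates the context but is answered with a stage-appropriate generic sentence, so the session never stalls.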
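Claim 2's slot filling — completing the information needed to turn an intent into an explicit instruction — might look like this minimal keyword-matching sketch. The `order_drink` intent, the slot names, and the vocabulary are invented for the example, not taken from the patent:

```python
# Illustrative slot filling for claim 2: a slot is information that must be
# completed before the user's intent becomes an executable instruction.

REQUIRED_SLOTS = {"order_drink": ["drink", "size"]}

def fill_slots(intent, filled, user_text):
    """Fill any required slot mentioned in the utterance; report what is still missing."""
    vocab = {"drink": ["tea", "coffee"], "size": ["small", "large"]}
    words = user_text.lower().split()
    for slot in REQUIRED_SLOTS.get(intent, []):
        if slot not in filled:
            for value in vocab[slot]:
                if value in words:
                    filled[slot] = value
    missing = [s for s in REQUIRED_SLOTS.get(intent, []) if s not in filled]
    return filled, missing
```

When `missing` is non-empty, the dialogue manager would ask a follow-up question for the missing slot instead of executing the instruction.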
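Claims 4 and 8 tie both scores to similarity: pronunciation against standard phonemes, and meaning against a reference answer. A toy version using off-the-shelf similarity measures could look like the following — `SequenceMatcher` and token overlap are stand-ins chosen for the sketch, not the patented scoring algorithms:

```python
# Sketch of the scoring rules in claims 4 and 8: the closer the user's phoneme
# sequence is to the standard one, the higher the speech score; the closer the
# user's wording is to the reference answer, the higher the semantic score.
from difflib import SequenceMatcher

def speech_score(user_phonemes, standard_phonemes):
    """0-100 score from phoneme-sequence similarity."""
    ratio = SequenceMatcher(None, user_phonemes, standard_phonemes).ratio()
    return round(100 * ratio, 1)

def semantic_score(user_text, reference_answer):
    """0-100 score from token overlap (Jaccard) with the reference answer."""
    a, b = set(user_text.lower().split()), set(reference_answer.lower().split())
    return round(100 * len(a & b) / len(a | b), 1) if a | b else 0.0
```

Both functions are monotone in similarity, which is all claim 8 actually requires; a production system would use acoustic-model posteriors and sentence embeddings instead.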
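For the grammar module of claims 5 and 9, a single agreement rule already shows the shape of the check: flag the logical relation, then derive a score. The rule set and the 25-point penalty below are arbitrary assumptions for the sketch; a real module would use a parser or a trained model:

```python
# Toy grammar check in the spirit of claims 5 and 9: flag simple subject-verb
# agreement errors and turn the error count into a grammar evaluation score.

SINGULAR_SUBJECTS = {"he", "she", "it"}

def grammar_check(sentence):
    """Return (score, errors) using one tiny subject-verb agreement rule."""
    words = sentence.lower().rstrip(".!?").split()
    errors = []
    for subj, verb in zip(words, words[1:]):
        # e.g. "he go" -> singular subject followed by a bare verb form
        if subj in SINGULAR_SUBJECTS and verb in {"go", "have", "do", "like"}:
            errors.append(f"subject-verb agreement: '{subj} {verb}'")
    score = max(0, 100 - 25 * len(errors))
    return score, errors
```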
CN202011100849.8A | 2019-08-23 | 2019-08-23 | Man-machine dialogue spoken language evaluation system | Active | CN112232083B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011100849.8A / CN112232083B (en) | 2019-08-23 | 2019-08-23 | Man-machine dialogue spoken language evaluation system

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
CN201910781649.4A / CN110489756B (en) | 2019-08-23 | 2019-08-23 | Conversational Human-Computer Interaction Oral Assessment System
CN202011100849.8A / CN112232083B (en) | 2019-08-23 | 2019-08-23 | Man-machine dialogue spoken language evaluation system

Related Parent Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910781649.4A (Division) / CN110489756B (en) | Conversational Human-Computer Interaction Oral Assessment System | 2019-08-23 | 2019-08-23

Publications (2)

Publication Number | Publication Date
CN112232083A (en) | 2021-01-15
CN112232083B (en) | 2025-09-16

Family

ID=68553024

Family Applications (3)

Application NumberTitlePriority DateFiling Date
CN202011101041.1AActiveCN112307742B (en)2019-08-232019-08-23 Conversational human-computer interaction oral evaluation method, device and storage medium
CN202011100849.8AActiveCN112232083B (en)2019-08-232019-08-23Man-machine dialogue spoken language evaluation system
CN201910781649.4AActiveCN110489756B (en)2019-08-232019-08-23 Conversational Human-Computer Interaction Oral Assessment System

Family Applications Before (1)

Application NumberTitlePriority DateFiling Date
CN202011101041.1AActiveCN112307742B (en)2019-08-232019-08-23 Conversational human-computer interaction oral evaluation method, device and storage medium

Family Applications After (1)

Application NumberTitlePriority DateFiling Date
CN201910781649.4AActiveCN110489756B (en)2019-08-232019-08-23 Conversational Human-Computer Interaction Oral Assessment System

Country Status (1)

Country | Link
CN (3) | CN112307742B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112951207A (en)* | 2021-02-10 | 2021-06-11 | 网易有道信息技术(北京)有限公司 | Spoken language evaluation method and device and related product
CN114065773A (en)* | 2021-11-22 | 2022-02-18 | 山东新一代信息产业技术研究院有限公司 | A Semantic Representation Method of Historical Context for Multi-round Question Answering System

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110956142A (en)* | 2019-12-03 | 2020-04-03 | 中国太平洋保险(集团)股份有限公司 | Intelligent interactive training system
CN110910687A (en)* | 2019-12-04 | 2020-03-24 | 深圳追一科技有限公司 | Teaching method and device based on voice information, electronic equipment and storage medium
CN111368191B (en)* | 2020-02-29 | 2021-04-02 | 重庆百事得大牛机器人有限公司 | User portrait system based on legal consultation interaction process
CN111767718B (en)* | 2020-07-03 | 2021-12-07 | 北京邮电大学 | Chinese grammar error correction method based on weakened grammar error feature representation
CN111768667A (en)* | 2020-07-15 | 2020-10-13 | 唐山劳动技师学院 | Interactive cycle demonstration method and system for English teaching
CN114020894B (en)* | 2021-11-08 | 2024-03-26 | 桂林电子科技大学 | Intelligent evaluation system capable of realizing multi-wheel interaction
CN114170864B (en)* | 2021-11-11 | 2024-03-29 | 卡斯柯信号有限公司 | Scenario comprehensive management and verification method and device for fully automatic operation of smart subway
CN115602004A (en)* | 2021-12-27 | 2023-01-13 | 沈阳理工大学 | Conversion method of automatic spoken language learning system
CN114339303A (en)* | 2021-12-31 | 2022-04-12 | 北京有竹居网络技术有限公司 | Interactive evaluation method and device, computer equipment and storage medium
CN115497455B (en)* | 2022-11-21 | 2023-05-05 | 山东山大鸥玛软件股份有限公司 | Intelligent evaluating method, system and device for oral English examination voice
CN118800215A (en)* | 2023-04-13 | 2024-10-18 | 科大讯飞股份有限公司 | Oral learning method, device, equipment and storage medium
CN118535683A (en)* | 2024-07-18 | 2024-08-23 | 杭州菲助科技有限公司 | Artificial intelligence driven multifunctional English language learning and assessment method and its application

Citations (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103151042A (en)* | 2013-01-23 | 2013-06-12 | 中国科学院深圳先进技术研究院 | Full-automatic oral language evaluating management and scoring system and scoring method thereof
CN105094315A (en)* | 2015-06-25 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus for smart man-machine chat based on artificial intelligence
KR20160008949A (en)* | 2014-07-15 | 2016-01-25 | 한국전자통신연구원 | Apparatus and method for foreign language learning based on spoken dialogue
CN105513593A (en)* | 2015-11-24 | 2016-04-20 | 南京师范大学 | Intelligent human-computer interaction method drove by voice
US20170092151A1 (en)* | 2015-09-24 | 2017-03-30 | Wei Xi | Second language instruction system and methods
CN106557464A (en)* | 2016-11-18 | 2017-04-05 | 北京光年无限科技有限公司 | A kind of data processing method and device for talking with interactive system
CN107230173A (en)* | 2017-06-07 | 2017-10-03 | 南京大学 | A kind of spoken language exercise system and method based on mobile terminal
CN109547331A (en)* | 2018-11-22 | 2019-03-29 | 大连智讯科技有限公司 | Multi-round voice chat model construction method
CN109785698A (en)* | 2017-11-13 | 2019-05-21 | 上海流利说信息技术有限公司 | Method, apparatus, electronic equipment and medium for spoken language proficiency evaluation and test

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2002061658A2 (en)* | 2001-01-30 | 2002-08-08 | Personal Genie, Inc. | System and method for matching consumers with products
CN104050966B (en)* | 2013-03-12 | 2019-01-01 | 百度国际科技(深圳)有限公司 | The voice interactive method of terminal device and the terminal device for using this method
CN103594087B (en)* | 2013-11-08 | 2016-10-12 | 科大讯飞股份有限公司 | Improve the method and system of oral evaluation performance
CN106326307A (en)* | 2015-06-30 | 2017-01-11 | 芋头科技(杭州)有限公司 | Language interaction method
CN105068661B (en)* | 2015-09-07 | 2018-09-07 | 百度在线网络技术(北京)有限公司 | Man-machine interaction method based on artificial intelligence and system
CN106558252B (en)* | 2015-09-28 | 2020-08-21 | 百度在线网络技术(北京)有限公司 | Spoken language practice method and device realized by computer
CN106558309B (en)* | 2015-09-28 | 2019-07-09 | 中国科学院声学研究所 | A kind of spoken dialog strategy-generating method and spoken dialog method
CN105741831B (en)* | 2016-01-27 | 2019-07-16 | 广东外语外贸大学 | A method and system for oral language evaluation based on grammatical analysis
JP2018206055A (en)* | 2017-06-05 | 2018-12-27 | コニカミノルタ株式会社 | Conversation recording system, conversation recording method, and care support system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG Di, WANG Shaomin, REN Hua: "Architecture of an intelligent interaction system based on artificial intelligence", Telecommunications Science (电信科学), vol. 34, no. 12, 23 January 2019 (2019-01-23), pages 92-101 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112951207A (en)* | 2021-02-10 | 2021-06-11 | 网易有道信息技术(北京)有限公司 | Spoken language evaluation method and device and related product
CN112951207B (en)* | 2021-02-10 | 2022-01-07 | 网易有道信息技术(北京)有限公司 | Spoken language evaluation method and device and related product
CN114065773A (en)* | 2021-11-22 | 2022-02-18 | 山东新一代信息产业技术研究院有限公司 | A Semantic Representation Method of Historical Context for Multi-round Question Answering System

Also Published As

Publication number | Publication date
CN112232083B (en) | 2025-09-16
CN110489756B (en) | 2020-10-27
CN112307742B (en) | 2021-10-22
CN110489756A (en) | 2019-11-22
CN112307742A (en) | 2021-02-02

Similar Documents

Publication | Title
CN112307742B (en) | Conversational human-computer interaction oral evaluation method, device and storage medium
Litman et al. | Speech technologies and the assessment of second language speaking: Approaches, challenges, and opportunities
McCrocklin | Learners' feedback regarding ASR-based dictation practice for pronunciation learning
Ward et al. | My science tutor: A conversational multimedia virtual tutor for elementary school science
Michael | Automated Speech Recognition in language learning: Potential models, benefits and impact
CN111833853A (en) | Voice processing method and device, electronic equipment and computer readable storage medium
Evanini et al. | Overview of automated speech scoring
US10607504B1 (en) | Computer-implemented systems and methods for a crowd source-bootstrapped spoken dialog system
CN114255759B (en) | Machine-implemented oral training method, device, and readable storage medium
Lai et al. | An exploratory study on the accuracy of three speech recognition software programs for young Taiwanese EFL learners
Ureta et al. | At home with Alexa: a tale of two conversational agents
CN111078010A (en) | Man-machine interaction method and device, terminal equipment and readable storage medium
Halimah et al. | Cello As a Language Teaching Method in Industrial Revolution 4.0 Era
Bachan | Communicative alignment of synthetic speech
Stativă et al. | Assessment of Pronunciation in Language Learning Applications
JP2015060056A (en) | Education device and IC and medium for education device
Liu | Application of speech recognition technology in pronunciation correction of college oral English teaching
Shukla | Development of a human-AI teaming based mobile language learning solution for dual language learners in early and special educations
Dalton et al. | Using speech analysis to unmask perceptual bias: Dialect, difference, and tolerance
León-Montaño et al. | Design of the architecture for text recognition and reading in an online assessment applied to visually impaired students
KR102689260B1 (en) | Server and method for operating a lecture translation platform based on real-time speech recognition
Alsabaan | Pronunciation support for Arabic learners
Zou | An experimental evaluation of grounding strategies for conversational agents
US20240321131A1 (en) | Method and system for facilitating AI-based language learning partner
Souici et al. | The Potential of Using Siri to Practice Pronunciation - A case study of EFL first year LMD students at Biskra University

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
