Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of protection of the application.
The terms "first," "second," and the like in the description, in the claims, and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1, fig. 1 is a block diagram of an intelligent teaching system according to an embodiment of the present application. As shown in fig. 1, the intelligent teaching system 100 includes a terminal device 110 and a server 120. The terminal device 110 is a common type of electronic device on the market that can perform man-machine interaction with the user 130 and carries a display screen, including but not limited to a computer, a smart television, a mobile phone, and the like. The terminal device 110 may interact with the user 130 to start the intelligent teaching service. After detecting that the intelligent teaching service is started, the terminal device 110 may first obtain the target teaching identity and the target teaching scene of the user 130; the purpose of this operation is to provide a basis for the subsequent selection of the reference agent, so that the functions that the reference agent can implement conform to the user requirements, the efficiency and stability of data processing by the server are improved, and the data volume of call data is reduced. The selected reference agent is then invoked for man-machine interaction with the user 130: during the interaction, the user 130 inputs user request information, and the reference agent responds to the user through data interaction with the server.
One intelligent teaching system 100 may correspond to a plurality of servers 120 at the same time, one server 120 corresponds to one or more terminal devices 110, and each terminal device 110 performs man-machine interaction with one user 130. The number of reference agents corresponds to the number of reference teaching scenes: the reference teaching scenes are preset according to the differences between teacher identities and student identities in actual application scenarios, the corresponding reference agents are set according to the preset reference teaching scenes, and the function types of the preset application functions that each reference agent can implement are associated with the scene requirements of the corresponding reference teaching scene.
Based on this, the embodiment of the application provides a data processing method based on a multi-mode intelligent agent, and the embodiment of the application is described in detail below with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a flow chart of a data processing method based on a multi-mode agent according to an embodiment of the present application, where the method is applied to a terminal device 110 in an intelligent teaching system 100, and the intelligent teaching system further includes a server 120, and the method includes:
step S201, a target teaching identity of a user and a target teaching scene of the user are obtained.
The target teaching identity is used for indicating that the teaching identity corresponding to the user is a teacher identity or a student identity, and the target teaching scene is any one of a plurality of reference teaching scenes matched with the teacher identity or the student identity.
In one possible embodiment, the reference teaching scenes comprise a teacher teaching scene, a teacher lesson preparation scene and a student learning scene, and obtaining the target teaching identity of the user and the target teaching scene where the user is located comprises: detecting an operation trigger instruction of the user for the target application program; running the target application program to display an initial login interface of the target application program and obtain agent information stored in the target application program; responding to an information input operation of the user on the initial login interface to obtain identity verification information; if the identity verification passes, obtaining user identity information corresponding to the user from a preset local database, and requesting and caching relevant data of each reference agent according to the interface information; determining the target teaching identity according to the user identity information corresponding to the user; when the target teaching identity is determined to be the student identity, determining the target teaching scene to be the student learning scene; when the target teaching identity is determined to be the teacher identity, calling the current system time and the teacher course time bound to the user identity information; if the current system time falls within the teacher course time, determining the target teaching scene to be the teacher teaching scene; and if the current system time does not fall within the teacher course time, determining the target teaching scene to be the teacher lesson preparation scene.
The agent information comprises interface information of a calling interface for calling a reference agent, the operation trigger instruction is generated by triggering a preset trigger event, and the local database is used for storing user identity information associated with all teachers and students in the school where the user is located.
In the process of man-machine interaction between the terminal device and the user, the terminal device first detects an operation trigger instruction (generated by triggering a preset trigger event) of the user for the target application program, and runs the target application program to display the initial login interface. While running the target application program, the terminal device can obtain the interface information of the calling interface of each reference agent in advance, so that each preset reference agent can be called through the calling interface in advance; this improves the speed at which the terminal device calls the target agent in response to the user identity information, effectively reduces the response time, and improves teaching efficiency. In addition, in order to guarantee the service quality and practicality of the intelligent teaching service, the terminal device can retrieve the target teaching identity of the user and the corresponding target teaching scene through identity verification of the user. It should be noted that this step also ensures the security of the intelligent teaching system, preventing staff of other schools from invoking the relevant data of the server in the intelligent teaching system of this school.
Further, the method of determining the target teaching scene differs for different target teaching identities. The principle is as follows: if the target teaching identity of the user is a student, the scene in which the student needs to call the reference agent, whether in school or out of school, is usually the student learning scene. If the target teaching identity of the user is a teacher, there are different scenes in which the reference agent needs to be called, namely the teacher lesson preparation scene and the teacher teaching scene. In that case, the target teaching scene can be further confirmed according to the time at which the user starts the target application program and the teacher course time bound to the user's identity information. The design principle of this embodiment is that, based on a detailed analysis of teaching scenes and user identities, identity verification and time judgment mechanisms are used to accurately classify users into different teaching scenes. Through the agent information and interaction with the large model base, customized services are provided for users in different scenes, aiming to break the limitations of traditional teaching systems, meet diversified teaching demands, and improve the overall teaching experience and quality. Compared with a traditional teaching system, by making identity verification the precondition for using the corresponding agent, the terminal device can realize accurate pushing for different scenes and improve the efficiency of teaching resource integration and utilization.
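The scene-determination rule described above can be sketched as follows. This is a minimal illustrative sketch only: the identity labels, scene names, and course timetable are assumptions made for the example and are not prescribed by the embodiment.

```python
from datetime import time

# Hypothetical teacher course time bound to the user identity information;
# in practice this would be retrieved from the local database.
TEACHER_COURSE_PERIODS = [(time(8, 0), time(12, 0)), (time(14, 0), time(17, 30))]

def determine_target_scene(teaching_identity: str, now: time) -> str:
    """Map the verified teaching identity (and, for teachers, the current
    system time) onto one of the three reference teaching scenes."""
    if teaching_identity == "student":
        # A student identity always maps to the student learning scene.
        return "student_learning"
    if teaching_identity == "teacher":
        # A teacher maps to the teaching scene during bound course time,
        # otherwise to the lesson preparation scene.
        in_course = any(start <= now <= end
                        for start, end in TEACHER_COURSE_PERIODS)
        return "teacher_teaching" if in_course else "teacher_lesson_preparation"
    raise ValueError(f"unknown teaching identity: {teaching_identity}")
```

The time comparison stands in for the "current system time falls within the teacher course time" judgment; any equivalent timetable lookup would serve.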
It can be seen that, in this example, by presetting the operation trigger instruction and identity verification, the terminal device quickly and accurately identifies the user's identity and teaching scene, realizes intelligent matching, greatly improves the efficiency of teaching resource allocation, improves the accuracy of scene recognition, and ensures that teaching activities are carried out in an orderly and efficient manner.
In one possible embodiment, the trigger event comprises at least one of the following reference events: detecting that the user performs a touch operation on a selectable control mapped to the target application program in a display interface of the terminal device; detecting that a currently collected user input voice contains an operation voice instruction for the target application program; detecting that target feature information of a currently collected user action gesture matches reference feature information of a preset opening gesture for the target application program; or detecting that the terminal device is not running other application programs and that the current system time falls within a high-frequency operation period of the target application program.
The reference feature information comprises spatial feature information and temporal feature information of the preset opening gesture, and the high-frequency operation period is a period in which the operation frequency of the target application program, counted over a preset number of days, is greater than a preset frequency.
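The statistical derivation of the high-frequency operation period can be illustrated with a short sketch. The hour-of-day granularity and the data representation are assumptions made for the example; the embodiment only requires that operation frequency counted over the preset days exceed the preset frequency.

```python
from collections import Counter

def high_frequency_hours(launch_hours: list[int], preset_frequency: int) -> set[int]:
    """Given the hour of day of each launch of the target application
    recorded over the preset number of days, return the hours whose
    launch count exceeds the preset frequency threshold."""
    counts = Counter(launch_hours)
    return {hour for hour, n in counts.items() if n > preset_frequency}
```

For example, a user who repeatedly opens the application around 19:00 would yield 19 as a high-frequency hour, so the device could auto-trigger the application in that period when no other application is running.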
Referring to fig. 3, fig. 3 is a schematic view of trigger events according to an embodiment of the present application. As shown in fig. 3, the terminal device in fig. 3 is a mobile phone, and the three diagrams from left to right visualize the interaction scenes corresponding to three different trigger events supported by the mobile phone. The first diagram shows the user performing a touch operation on the target application program in the display interface of the terminal device; after detecting the touch operation, the terminal device starts the target application program and enters the subsequent identity verification operation. The second diagram shows the user staying on the display interface displaying the target application program and selecting, with a finger or a stylus, the voice input control reserved on the display interface while inputting an operation voice instruction; correspondingly, when the terminal device receives the operation voice instruction for the target application program, it parses the content of the instruction and prepares to trigger the subsequent operation. The third diagram shows the user staying on the display interface displaying the target application program and drawing the preset opening gesture on the display interface through a gesture input operation; correspondingly, the terminal device (i.e., the mobile phone) collects the user's action gesture through a camera or a sensor, extracts the target feature information of the gesture, and starts the target application program if the target feature information matches the reference feature information of the preset opening gesture.
The trigger event comprises a plurality of reference events in multiple dimensions, which are set on the basis of common man-machine interaction methods between a user and a terminal device. The terminal device monitors the trigger events set in this embodiment as follows. 1. Touch operation monitoring: the system monitors the display interface of the terminal device in real time and detects touch operations of the user on the selectable control mapped to the target application program; once a relevant touch behavior is captured, information such as the touch position and time is recorded, and whether the touch is an effective trigger is judged. 2. Voice instruction monitoring: the voice input by the user is continuously collected by means of voice recognition technology, and the voice content is analyzed to identify whether it contains an operation voice instruction for the target application program; if a valid instruction is identified, the instruction content is parsed and the subsequent operation is prepared. 3. Gesture feature matching monitoring: the user's action gesture is collected through a camera or a sensor, and the target feature information of the gesture is extracted, including spatial features (such as gesture shape, trajectory, and angle) and temporal features (such as gesture completion time and the duration of each stage); these features are compared with the reference feature information of the preset opening gesture for the target application program to judge whether they match. 4. Device state and time monitoring: the list of application programs running on the terminal device is periodically checked to confirm whether no other application programs are running; at the same time, the current system time is obtained and compared with the high-frequency operation period of the target application program counted over a preset number of days, to judge whether the current time falls within that period. Based on the above multi-dimensional monitoring, as soon as any one of the above reference events is monitored to meet its trigger condition, the terminal device immediately generates a corresponding trigger instruction to start the target application program.
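The any-one-of trigger decision across the four monitoring dimensions can be condensed into a small predicate. The boolean inputs are a simplifying assumption: in a real device each would be produced by the corresponding monitoring pipeline described above.

```python
def should_trigger(touch_on_app: bool,
                   voice_has_run_command: bool,
                   gesture_matches: bool,
                   no_other_apps_running: bool,
                   in_high_frequency_period: bool) -> bool:
    """Return True when any one reference event satisfies its trigger
    condition. The fourth dimension is compound: the device must both be
    running no other applications and be inside the high-frequency
    operation period."""
    device_idle_trigger = no_other_apps_running and in_high_frequency_period
    return (touch_on_app or voice_has_run_command
            or gesture_matches or device_idle_trigger)
```

Note that an idle device outside the high-frequency period does not trigger on its own, matching the compound condition of the fourth reference event.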
Further, the design principle of this embodiment is multi-modal interaction fusion: multiple interaction modes such as touch, voice and gestures are comprehensively utilized, so that the advantages of different interaction modes are fully exploited, the limitations of any single interaction mode are compensated for, and diversified, natural human-computer interaction is realized. By providing multiple trigger modes, the operation habits of different users in different scenarios can be accommodated. For example, in a driving scenario where both hands are occupied, the user can quickly start an application through a voice instruction, and gesture operation is more convenient when the user's hands are busy or speaking is inconvenient, greatly improving the personalization and comfort of the user experience. The terminal device can also perform an additional data analysis process to analyze and learn user behavior, automatically starting the application according to the high-frequency operation period (i.e., statistically analyzing the user's application usage frequency over a preset number of days, learning the user's usage habits and determining the high-frequency operation period) and the device running state (i.e., perceiving context information such as the device running state and the current time in real time, and making an intelligent decision in combination with the results of the user behavior analysis), so as to realize intelligent, automated operation. The system can learn the user's usage habits and start automatically at the times when the user is most likely to need the application, reducing the user's manual operation steps and improving usage efficiency.
In addition, because multi-dimensional trigger events are set, the terminal device can quickly capture the user's intention to start the application through accurate recognition of multi-dimensional information such as touch, voice and gestures and accurate judgment of device state and time, so that the application program responds quickly, the user's waiting time is shortened, and the response speed of the system and the user's service experience are improved.
In this example, through multi-modal interaction fusion, the terminal device provides the user with multiple trigger modes for starting the target application program, accommodating different user habits, thereby improving operation convenience and reducing time loss. In addition, the terminal device can trigger automatically according to the device running state and the high-frequency operation period, improving the intelligence of the teaching system and the continuity of the teaching service. Moreover, by combining the trigger events with the subsequent identity and scene recognition, the user can be quickly matched with a suitable teaching scene, further improving teaching resource allocation efficiency and scene recognition accuracy.
Step S202, determining and calling a reference agent corresponding to the target teaching scene as a target agent.
Each reference agent corresponds to one reference teaching scene; the reference agents are used for interacting with the user to determine a plurality of preset application functions required in the corresponding reference teaching scene, and are in communication connection with a large model base preset in the server.
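The one-to-one correspondence between reference teaching scenes and reference agents in step S202 can be sketched as a simple registry lookup. The agent names and function lists below are illustrative assumptions for the sketch, not fixed by the embodiment.

```python
# Illustrative registry: each reference teaching scene maps to its
# reference agent and the preset application functions that agent
# can implement for that scene.
REFERENCE_AGENTS = {
    "teacher_teaching": {
        "agent": "TeacherTeachingAgent",
        "functions": ["generate_courseware", "design_classroom_questions"],
    },
    "teacher_lesson_preparation": {
        "agent": "LessonPreparationAgent",
        "functions": ["collect_materials", "build_lesson_plan"],
    },
    "student_learning": {
        "agent": "StudentLearningAgent",
        "functions": ["answer_questions", "post_class_exercise"],
    },
}

def select_target_agent(target_scene: str) -> str:
    """Determine the reference agent corresponding to the target
    teaching scene, to be invoked as the target agent (step S202)."""
    entry = REFERENCE_AGENTS.get(target_scene)
    if entry is None:
        raise KeyError(f"no reference agent for scene: {target_scene}")
    return entry["agent"]
```

Because the registry is keyed by scene rather than by user, the same lookup serves every terminal device connected to a server.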
Step S203, user request information input by a user is collected through the target intelligent agent, and a target response result corresponding to the target application function is determined based on data interaction operation between the target intelligent agent and the large model base.
The target application function is determined by the user request information, and the target application function is any one of a plurality of preset application functions.
In one possible embodiment, collecting the user request information input by the user through the target agent and determining the target response result corresponding to the target application function based on the data interaction operation between the target agent and the large model base comprises: collecting content data input by the user through the target agent and determining a target input format corresponding to the content data; determining the user request information according to the content data and the target input format; sending a result request message carrying the user request information to the corresponding large model base through the target agent; and receiving the target response result sent by the output control module.
The result request message is used for instructing the large model base to call its built-in modules to execute data analysis operations on the user request information, so as to return the target response result and the target application function. The built-in modules comprise a special large model, a local knowledge base, a third-party plug-in and an output control module: the local knowledge base is used for storing teaching materials corresponding to the various disciplines of the school where the user is located; the special large model is used for analyzing the content data according to the target input format in combination with the local knowledge base; the third-party plug-in is used for calling teaching tools associated with the plurality of preset application functions corresponding to the target teaching scene; and the output control module is used for integrating the sub-response results output by the special large model, the local knowledge base and the third-party plug-in into the target response result according to the target application function indicated by the user request information.
Referring to fig. 4, fig. 4 is a schematic diagram of a target agent and a large model base according to an embodiment of the application. As shown in fig. 4, the user inputs user input information in a target input format by performing a user input operation on the terminal device, where the target input format is any one of a plurality of reference input formats supported by the large model base, the types of reference input formats including but not limited to voice, text, image and video. Voice is collected as audio through a digital microphone, text is collected through the user's input-method text entry, and images and video are collected through a camera. Because the multi-mode large model accepts user input in multiple formats, the target agent can send user input information conforming to a reference input format directly to the large model for processing, without additional format conversion. Further, the terminal device, by calling the target agent (which is one of a teacher teaching agent, a teacher lesson preparation agent and a student learning agent, according to the target teaching scene) as the man-machine interaction interface, is responsible for collecting the content data input by the user, determining the corresponding target input format, and sending the result request message carrying the user request information to the large model base. The large model base comprises a special large model, which analyzes the content data according to the target input format and performs analysis processing in combination with data provided by the local knowledge base. The local knowledge base stores teaching materials corresponding to the various disciplines of the school and provides data support for the special large model and the output control module.
The third-party plug-in invokes teaching tools associated with the preset application functions of the target teaching scene to assist in generating response results. The output control module integrates the sub-response results output by each module according to the target application function indicated by the user request information to form the target response result and returns it to the target agent. Through this system architecture of target agent and large model base, the terminal device can interact with the large model base in the server by calling the target agent, so as to realize the intelligent teaching service required by the user.
In this embodiment, the interaction between the target agent invoked by the terminal device and the server can realize multi-source data integration and processing, as well as function-oriented result generation. Multi-source data integration and processing means that the server obtains the content data input by the user through the target agent and adapts it according to the reference input formats supported by the large model base, so that the data can be processed effectively. Then, by integrating the capabilities of the special large model, the local knowledge base and the third-party plug-in, the data is analyzed by the special large model, teaching data support is provided by the local knowledge base, and related teaching tools are obtained by calling the third-party plug-in, realizing multi-source data fusion processing and meeting the diversified demands of users. Function-oriented result generation means that the target application function is determined according to the user request information, the sub-response results output by each part are integrated by the output control module around that target application function, and the final target response result is generated, achieving accurate matching from user requirements to function realization.
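The request/response exchange just described can be sketched in two small functions: one wrapping the collected content data into the result request message, and one standing in for the output control module that merges the sub-response results. The message field names and sub-response keys are assumptions made for the sketch.

```python
SUPPORTED_FORMATS = {"voice", "text", "image", "video"}

def build_result_request(user_request: str, input_format: str) -> dict:
    """Wrap the collected content data and its target input format into a
    result request message for the large model base; no format conversion
    is needed because the multi-mode large model accepts all of these."""
    if input_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported input format: {input_format}")
    return {"user_request": user_request, "input_format": input_format}

def integrate_sub_responses(target_function: str, sub_responses: dict) -> dict:
    """Sketch of the output control module: merge the sub-response results
    of the special large model, the local knowledge base and the
    third-party plug-in into one target response result, oriented to the
    target application function indicated by the user request."""
    return {
        "target_function": target_function,
        "model_analysis": sub_responses.get("large_model"),
        "knowledge_material": sub_responses.get("knowledge_base"),
        "tool_output": sub_responses.get("third_party_plugin"),
    }
```

Missing sub-responses simply integrate as empty entries, reflecting that not every built-in module contributes to every target application function.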
Specifically, based on the system architecture of the target agent and the large model base shown in fig. 4, the process by which a user uses the intelligent teaching service through the terminal device in the teacher teaching scene, the teacher lesson preparation scene and the student learning scene is as follows. 1. Teacher teaching scene: during teaching, the teacher inputs content data such as knowledge point explanation demands and interaction link designs through the target agent. The system determines the target application functions according to the input, such as generating teaching courseware and designing classroom questions. The special large model works with the teaching materials in the local knowledge base, the third-party plug-in invokes tools such as interactive games, and the output control module integrates the results, providing complete teaching auxiliary content for the teacher. 2. Teacher lesson preparation scene: the teacher inputs content such as the lesson subject and teaching targets, and the system determines that the target application function is collecting and organizing lesson preparation materials. The special large model analyzes the content, related materials are obtained from the local knowledge base, the third-party plug-in calls tools such as lesson plan templates and material search tools, and finally the output control module integrates everything to generate a detailed lesson preparation scheme. 3. Student learning scene: the student inputs content data such as learning questions and homework problems, and the system determines that the target application function is answering. The special large model solves the problems in combination with the knowledge base, the third-party plug-in invokes tools such as solution-idea analysis tools, and the output control module integrates and generates detailed answer content and learning advice.
It can be seen that, in this example, the terminal device accurately collects user data by calling the target agent, and then matches the large model input format, so that the large model base in the server can quickly understand the user requirements. And through the cooperation of the built-in modules of the large model base, the data are analyzed, the tool is called and the result is integrated, so that the requirements of various teaching functions are met, and the practicability and the data processing efficiency of the terminal equipment are improved.
Step S204, after the target agent receives the target response result returned by the large model base, realizing the target application function according to the target response result.
In one possible embodiment, the target teaching scene is the student learning scene and the target application function is a post-class exercise function, and realizing the target application function according to the target response result after the target agent receives the target response result returned by the large model base comprises: determining the current grade of the user and the course progress corresponding to the target subject according to the user identity information; performing a content screening operation on the target response result according to the current grade and the course progress to determine reference screening results of a plurality of difficulty levels; outputting the reference screening results corresponding to the first level, the second level and the third level respectively according to a preset exercise strategy, so as to sequentially display the corresponding post-class exercises; detecting the answer input operation of the user for the post-class exercises to obtain the user's answer to each post-class exercise; determining the reference answer to each post-class exercise according to the target response result and the plurality of reference screening results; and outputting an exercise correction result according to the user's answer and the reference answer to each post-class exercise.
Each reference screening result is used for displaying the post-class exercises matched with the corresponding difficulty level, and the difficulty degrees of the post-class exercises represented by the difficulty levels differ from one another. The difficulty levels comprise a first level, a second level and a third level, with the difficulty degree of the represented post-class exercises decreasing from high to low in the order of first level, second level, third level. The exercise correction result is used for displaying the score information of the user's current post-class exercises and the answer analysis content of the post-class exercises answered incorrectly.
In this embodiment, the target response result can be understood as the comprehensive result generated by the large model base through a series of processes on the user request information, and it is the key data basis for realizing the post-class exercise function. The large model base cooperates through its multiple modules to generate a target response result associated with the post-class exercise function. Specifically, the target response result covers various information related to the post-class exercise function, meeting the requirements of each subsequent exercise link. For example, it may contain exercise resources of different difficulty levels, screened from the local knowledge base according to conditions such as the student's grade and course progress and analyzed and arranged by the special large model, and it may also contain contents such as exercise solutions and knowledge point prompts, used to generate reference answers and answer analyses. This provides the basis for each step of realizing the post-class exercise function: content screening is performed on the target response result according to the student's grade and course progress, reference screening results of different difficulty levels are determined, and the corresponding post-class exercises are output. Likewise, the target response result plays an important guiding role when the reference answers to the exercises are determined and the exercise correction result is generated.
In this embodiment, after receiving the target response result returned by the large model base, the target agent determines the identity and scene of the user, that is, accurately locates the student's current grade and the course progress of the target subject according to the user identity information. Further, the target agent performs a content screening operation on the target response result by combining its content with the determined grade and course progress, screens out content matched with different difficulty levels, and generates a first-level reference screening result, a second-level reference screening result and a third-level reference screening result, which correspond to high-difficulty, medium-difficulty and low-difficulty post-class exercises respectively. Then, according to a preset exercise strategy, the exercises of the different levels are displayed to the student in sequence. Finally, the target agent realizes the answering process through interaction with the user; after the student has answered, the target agent takes the place of the teacher in correcting the post-class exercises and generates a practice correction result, which covers the score information and the detailed answer analysis content of the exercise questions answered incorrectly.
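The three-level screening described above can be sketched as a simple filter over the target response result. This is an illustrative assumption only: the field names (`grade`, `chapter`, `difficulty`) and the dictionary representation are not taken from the embodiment, which leaves the data layout unspecified.

```python
# Hypothetical sketch of screening the target response result into three
# reference screening results by grade, course progress and difficulty level.

DIFFICULTY_LEVELS = ("first", "second", "third")  # high -> low difficulty

def screen_exercises(response_items, grade, course_progress):
    """Split the target response result into three reference screening
    results, keeping only items matching the student's grade whose chapter
    has already been reached in the course progress."""
    matched = [item for item in response_items
               if item["grade"] == grade and item["chapter"] <= course_progress]
    return {level: [item for item in matched if item["difficulty"] == level]
            for level in DIFFICULTY_LEVELS}

items = [
    {"id": 1, "grade": 7, "chapter": 3, "difficulty": "first"},
    {"id": 2, "grade": 7, "chapter": 2, "difficulty": "third"},
    {"id": 3, "grade": 8, "chapter": 1, "difficulty": "second"},  # wrong grade
    {"id": 4, "grade": 7, "chapter": 5, "difficulty": "second"},  # not reached yet
]
screened = screen_exercises(items, grade=7, course_progress=3)
```

Under a preset exercise strategy, the three lists would then be displayed in sequence, e.g. first-level problems before third-level ones.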
Further, compared with traditional education services, the target agent in this embodiment can provide learning resources and practice content matched to the individual differences of students, so that each student can develop fully within his or her zone of proximal development, improving the pertinence and effectiveness of learning. Moreover, by taking over part of the teacher's work, homework is corrected automatically and scores and wrong-question analyses are produced quickly, so students can learn of their study situation in time and adjust their learning strategies, while teachers can devote their energy to teaching guidance, improving teaching efficiency. In addition, by setting practice problems of different difficulty levels, students are guided to improve their knowledge and skills step by step, avoiding problems that are too hard or too easy and improving the user experience.
It can be seen that, in this example, the terminal device determines the grade and course progress according to the user identity information through the target agent, accurately filters the post-class exercise content, and generates practice problems of multiple difficulty levels to satisfy the learning demands of different students, improving the flexibility with which the agent realizes the target application function. Moreover, the target agent can automatically complete screening, correction and analysis, reducing the load on teachers and improving the intelligence level and quality of teaching.
In one possible embodiment, the target teaching scene is a student learning scene, and the target agent further comprises an emotion recognition module. After determining the practice correction result according to the user answers and the reference answers of the post-class practice problems, the method further comprises: collecting, through the emotion recognition module, audio and video information of the user during the answer input operation; determining a first emotion recognition result according to intonation features and speech speed features in the voice input information; determining a second emotion recognition result according to facial expression feature vectors and limb action time sequence features in the image input information; performing fusion processing on the first emotion recognition result and the second emotion recognition result according to a preset fusion algorithm to generate a current emotion state score; if the negative dimension score corresponding to the negative dimension exceeds a first score threshold, calling a preset incentive corpus in the local knowledge base to generate and display personalized encouragement information, and reducing the difficulty level of the post-class practice problem to be displayed next; or, if the negative dimension score exceeds a second score threshold for a consecutive preset number of early warning times, obtaining the device information of the guardian's terminal device through a third-party plug-in and sending an early warning report to that terminal device.
The audio and video information comprises voice input information and image input information, the current emotion state score is used for representing the degrees of emotional tendency along the positive, neutral and negative dimensions reflected by the user in the current state, and the early warning report carries an emotion fluctuation curve and learning state information.
The target teaching scene applicable to this embodiment is the student learning scene, the target application function is the post-class exercise function, and the target agent determines the practice correction result according to the user answers and the reference answers of the post-class practice problems. Then, the target agent performs emotion information acquisition, that is, it collects audio and video information during the student's answer input operation. It determines a first emotion recognition result according to the intonation features and speech speed features extracted from the voice input information, and extracts facial expression feature vectors and limb action time sequence features from the image input information to determine a second emotion recognition result. The two results are then fused according to a preset fusion algorithm to generate a current emotion state score representing the student's emotional tendencies along the positive, neutral and negative dimensions. Finally, the target agent analyzes the current emotion state score so as to protect the student's enthusiasm for learning: if it detects that the negative dimension score exceeds the first score threshold (that is, the student is currently in a negative emotional state), a preset incentive corpus in the local knowledge base is called to generate and display personalized encouragement information, and at the same time the difficulty level of the next displayed post-class practice problem is reduced, thereby improving the student's learning enthusiasm and confidence in subsequent learning.
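The preset fusion algorithm is not specified in the embodiment; one minimal reading is a weighted average of the two per-dimension recognition results. The sketch below assumes that reading, with illustrative 0.4/0.6 weights and results expressed as probability-like scores over the three dimensions.

```python
# Hypothetical sketch of the preset fusion algorithm: a weighted average of
# the speech-based (first) and vision-based (second) emotion recognition
# results over the positive/neutral/negative dimensions.

DIMENSIONS = ("positive", "neutral", "negative")

def fuse(first_result, second_result, speech_weight=0.4, vision_weight=0.6):
    """Combine two per-dimension scores into the current emotion state score."""
    return {dim: speech_weight * first_result[dim] + vision_weight * second_result[dim]
            for dim in DIMENSIONS}

first = {"positive": 0.2, "neutral": 0.3, "negative": 0.5}   # from speech
second = {"positive": 0.1, "neutral": 0.2, "negative": 0.7}  # from video
state = fuse(first, second)
```

When both inputs sum to 1 and the weights sum to 1, the fused score also sums to 1, so the negative dimension score can be compared directly against the score thresholds.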
Furthermore, the terminal device is also provided with a guardian early warning mechanism: if the negative dimension score exceeds a second score threshold (the second score threshold being greater than the first score threshold) for a consecutive preset number of early warning times, the device information of the guardian's terminal device is obtained through a third-party plug-in, and the emotion fluctuation curve of the student during learning, together with the learning state information determined from the practice correction result, is transmitted to the guardian, so that the guardian can learn of the student's situation in time and intervene. Compared with traditional teaching, the data processing of the target agent in this embodiment not only evaluates the student's learning results but also adds real-time monitoring of the emotional state during learning, increasing attention to the student's physical and mental health. By notifying the guardian in time when the student's emotion is continuously negative and providing a detailed emotion fluctuation curve and learning state information, the guardian can conveniently understand the student's condition, cooperate with the school, and adopt more effective educational measures, forming a joint educational force between family and school and improving the flexibility with which the intelligent teaching system executes data processing.
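The two-threshold response logic above can be sketched as follows. The concrete threshold values and the consecutive early-warning count are illustrative assumptions; the embodiment only requires that the second threshold be greater than the first.

```python
# Hypothetical sketch of the response logic: a mild negative score triggers
# encouragement and an easier next problem; a score above the second
# threshold for a consecutive preset number of times triggers the guardian
# early warning report.

FIRST_THRESHOLD = 0.5
SECOND_THRESHOLD = 0.8   # must be greater than FIRST_THRESHOLD
WARN_COUNT = 3           # preset number of consecutive exceedances

def react(negative_scores):
    """Decide the action for the latest negative dimension score given
    the recent history of scores (oldest first)."""
    latest = negative_scores[-1]
    recent = negative_scores[-WARN_COUNT:]
    if len(recent) == WARN_COUNT and all(s > SECOND_THRESHOLD for s in recent):
        return "send_guardian_report"
    if latest > FIRST_THRESHOLD:
        return "encourage_and_lower_difficulty"
    return "no_action"
```

In the sketch, a single spike above the second threshold only produces encouragement; the report is sent only when the exceedance is sustained.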
In one possible embodiment, intonation features and speech speed features of the speech signal are extracted and a pre-trained speech emotion classification model is used to determine the first emotion recognition result; facial expression feature vectors and limb motion time sequence features are extracted, and a convolutional neural network together with a time sequence classification model is used to determine the second emotion recognition result.
The specific implementation of determining the first emotion recognition result is as follows: the terminal device first preprocesses the collected voice signal, removing interference such as background noise to improve signal quality. Intonation features such as pitch fluctuation and pitch rise-and-fall patterns are then extracted from the preprocessed voice signal using digital signal processing techniques, while the speech speed feature is determined by calculating the number of syllables or words per unit time. The extracted intonation and speech speed features are fed as input into a pre-trained speech emotion classification model, which has been trained on a large amount of emotion-labelled speech data using a deep learning algorithm, so that the first emotion recognition result is output and the emotional tendency in the student's speech is judged.
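The speech speed feature described above reduces to a simple rate computation. The sketch below shows that, plus a toy rule standing in for the pre-trained classification model, which the embodiment assumes exists but does not specify; the rate and pitch-variance cutoffs are invented for illustration.

```python
# Hypothetical sketch of the speech speed feature and a rule-based stand-in
# for the pre-trained speech emotion classification model.

def speech_rate(syllable_count, duration_seconds):
    """Syllables uttered per second: the speech speed feature."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return syllable_count / duration_seconds

def classify_speech_emotion(rate, pitch_variance):
    """Illustrative stand-in for the trained model: slow, flat speech is
    read as a negative tendency, fast speech as a positive one."""
    if rate < 2.0 and pitch_variance < 10.0:
        return "negative"
    if rate > 5.0:
        return "positive"
    return "neutral"

rate = speech_rate(syllable_count=18, duration_seconds=12.0)
label = classify_speech_emotion(rate, pitch_variance=4.0)
```

A real implementation would replace `classify_speech_emotion` with inference over the deep-learning model mentioned in the text.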
The specific implementation of determining the second emotion recognition result is as follows: for the collected image information, facial expression feature vectors are extracted by means of a dedicated facial feature extraction algorithm using computer vision techniques, accurately capturing expression details such as raised eyebrows and downturned mouth corners. Meanwhile, motion capture techniques are used to analyze the order and speed changes of the limb motions and determine the limb motion time sequence features. The facial expression feature vectors and limb motion time sequence features are input into a model framework consisting of a convolutional neural network and a time sequence classification model. The convolutional neural network is good at processing the spatial features of images and can effectively extract the key features of facial expressions, while the time sequence classification model can process limb motion data with temporal characteristics and mine the time-dependence relationships. The two work cooperatively to finally output the second emotion recognition result, judging the student's emotional state from the visual perspective.
It can be seen that, in this example, student audio and video information is collected through the emotion recognition module, multi-dimensional features are analyzed to judge the emotional state, and the learning strategy is adjusted dynamically according to the degree of negative emotion: when it is mild, the student is encouraged and the difficulty is reduced; when it is continuously severe, the guardian's contact information is called to realize home-school linkage and an early warning report including the emotional and learning state is sent to the guardian. This comprehensively optimizes the learning experience and improves the practicality with which the agent realizes the application function.
In one possible embodiment, the target teaching scene is a teacher teaching scene, the target application function is a classroom auxiliary function, the target agent further comprises an emotion recognition module, and the intelligent teaching system further comprises a classroom terminal in the classroom corresponding to the teacher teaching scene. After the target agent receives the target response result returned by the large model base, realizing the target application function according to the target response result comprises: determining, according to the target response result, a classroom recorded video collected in real time for the teacher teaching scene; performing a video analysis operation on the classroom recorded video to determine facial expression data corresponding to the students and limb action data corresponding to the teacher; determining a class attention score according to the facial expression data and a teacher teaching pressure value according to the limb action data; if the duration for which the class attention score is lower than a third score threshold exceeds a preset duration, calling a third-party plug-in through the target agent to send an in-class test subject or a group discussion task to the classroom terminal; or, if the teacher teaching pressure value is greater than or equal to a fourth score threshold, calling a classroom teaching plan designed in advance for the current course from the local knowledge base and updating the current classroom teaching plan so that the teaching difficulty level of the updated plan is lower than that of the plan before updating.
The class attention score is used for representing the average value of attention concentration in the target class recorded by the classroom recorded video, the teacher teaching pressure value is used for representing the degree of tension of the user in the current teacher teaching scene, and the in-class test subjects and group discussion tasks are associated with the target subject corresponding to the current teacher teaching scene.
The target response result is the content returned by the large model base after a series of data analysis and processing performed on the request information sent by the target agent. In this embodiment, it is an important basis for implementing the classroom auxiliary function and includes key information related to the teacher teaching scene, such as an identifier or data path pointing to the classroom recorded video collected in real time. This information enables the target agent to determine and acquire the classroom recorded video, and then carry out a series of operations such as subsequent video analysis, index calculation and execution of targeted measures, so as to meet the various requirements of the classroom auxiliary function.
For example, referring to fig. 5, fig. 5 is a schematic view of a teacher teaching scene according to an embodiment of the present application. As shown in fig. 5, cameras in the classroom where the teacher gives lessons can record video of each class in real time and upload it to the large model base in the server of the intelligent teaching system. After a user whose teaching identity is teacher invokes the teacher-teaching agent and enables the classroom auxiliary function as the target application function, the teacher-teaching agent calls the classroom recorded video from the large model base and performs the corresponding video analysis operations for the students and the teacher respectively. Specifically, facial expressions of the students are collected through the video analysis operation to obtain facial expression data, and limb motions of the teacher are collected to obtain limb motion data; the subsequent steps of this embodiment are then performed, that is, the class attention score is determined from the facial expression data, the teacher teaching pressure value is determined from the limb motion data, and classroom auxiliary operations are performed on the teacher's teaching according to the assessment along these two dimensions.
The design principle of this embodiment is as follows. Student attention (that is, the evaluation yielding the class attention score) and teacher teaching pressure (that is, the evaluation yielding the teacher teaching pressure value) are quantified by analyzing the facial expression and limb action data in the classroom recorded video, and decisions are made based on these data, making teaching adjustment more scientific and targeted. Through the cooperative work of multiple modules such as the target agent, the third-party plug-in and the local knowledge base, different resources and functions are integrated, realizing comprehensive support and optimization for complex teaching scenes: for different teaching situations, such as poor student concentration or excessive teacher pressure, personalized solutions are provided, such as pushing suitable tasks or adjusting the difficulty of the teaching plan, meeting the diversified demands of actual teaching. By focusing on the states of teachers and students in the classroom, support is provided to teachers through technical means, helping them better cope with teaching challenges and improving the students' learning experience.
It can be seen that, in this example, by means of the emotion recognition module, the students' facial expressions and the teacher's limb action data in the classroom recorded video are analyzed, the class attention score and the teacher teaching pressure value are accurately grasped, and the teaching strategy is dynamically optimized accordingly, attracting student attention, relieving teacher pressure and improving the quality of classroom teaching.
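The two-indicator classroom decision described in this embodiment can be sketched as follows. The class attention score is the mean per-student attention, and each indicator is checked against its own threshold; the threshold values and the preset duration are illustrative assumptions, as is the 0-to-1 scale.

```python
# Hypothetical sketch of the classroom auxiliary decision: low sustained
# class attention pushes an in-class task, and high teaching pressure
# lowers the difficulty of the classroom teaching plan.

THIRD_THRESHOLD = 0.6     # class attention score threshold
FOURTH_THRESHOLD = 0.7    # teacher teaching pressure threshold
PRESET_DURATION = 300     # seconds the low score must persist

def class_attention_score(per_student_attention):
    """Average attention concentration over the students in the class."""
    return sum(per_student_attention) / len(per_student_attention)

def classroom_actions(attention_score, low_duration, pressure_value):
    """Return the auxiliary actions triggered by the two indicators."""
    actions = []
    if attention_score < THIRD_THRESHOLD and low_duration > PRESET_DURATION:
        actions.append("push_in_class_test_or_discussion")
    if pressure_value >= FOURTH_THRESHOLD:
        actions.append("lower_teaching_plan_difficulty")
    return actions

score = class_attention_score([0.4, 0.5, 0.6, 0.5])
acts = classroom_actions(score, low_duration=400, pressure_value=0.75)
```

Note that the two branches are independent: either, both, or neither action may fire in a given class.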
Therefore, by arranging different types of reference agents and a large model base inside the server, corresponding intelligent assistance can be realized for the different links of the teaching flow, improving the intelligence and practicality of the teaching system. Moreover, the terminal device connects to the large model through the agents, improving the efficiency and flexibility of teaching for teachers and learning for students, the flexibility with which the agents process the to-be-processed data input by the user, and the user's feedback experience.
The following are apparatus embodiments of the present application, which share the same concept as the method embodiments of the present application and are used for performing the methods described in the method embodiments. For convenience of explanation, only the parts related to the apparatus embodiments are shown; for specific technical details not disclosed, please refer to the description of the method embodiments of the present application, which is not repeated here.
The data processing apparatus based on a multi-modal agent provided by the embodiment of the application is applied to the terminal device 110 in the intelligent teaching system 100 shown in fig. 1, the intelligent teaching system 100 further comprising a server 120. Specifically, the data processing apparatus based on a multi-modal agent is used for executing the steps executed by the terminal device in the data processing method based on a multi-modal agent. The data processing apparatus based on a multi-modal agent provided by the embodiment of the application may comprise modules corresponding to the respective steps.
The embodiment of the application may divide the data processing apparatus based on a multi-modal agent into functional modules according to the above method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module can be realized in the form of hardware or in the form of a software functional module. The division of the modules in the embodiment of the application is schematic and is merely a division by logical function; other division manners can be adopted in actual implementation.
FIG. 6 is a block diagram showing functional units of a data processing apparatus based on a multi-modal agent according to an embodiment of the present application, in which the functional modules are divided according to their respective functions. The data processing apparatus 60 based on a multi-modal agent is applied to the terminal device 110 in the intelligent teaching system 100, the intelligent teaching system 100 further comprising a server 120. The data processing apparatus 60 comprises an information acquisition unit 601, an agent invocation unit 602, a data interaction unit 603 and a function implementation unit 604. The information acquisition unit 601 is used for acquiring the target teaching identity of a user and the target teaching scene in which the user is located, the target teaching identity being used for indicating that the teaching identity corresponding to the user is a teacher identity or a student identity, and the target teaching scene being any one of a plurality of reference teaching scenes matched with the teacher identity or the student identity. The agent invocation unit 602 is used for determining, among a plurality of preset reference agents, the target agent corresponding to the target teaching identity and the target teaching scene, each reference agent corresponding to one reference teaching scene and being used for realizing, through interaction with the user, the plurality of preset application functions required in the corresponding reference teaching scene, the reference agents being in communication connection with the large model base preset in the server. The data interaction unit 603 is used for acquiring the user request information input by the user through the target agent and determining, based on the data interaction operation between the target agent and the large model base, the target response result corresponding to the target application function. The function implementation unit 604 is used for receiving, through the target agent, the target response result returned by the large model base and realizing the target application function according to the target response result.
In one possible embodiment, the reference teaching scenes comprise a teacher teaching scene, a teacher lesson preparation scene and a student learning scene. In terms of acquiring the target teaching identity of the user and the target teaching scene in which the user is located, the information acquisition unit 601 is specifically configured to: detect an operation triggering instruction of the user for a target application program, the operation triggering instruction being generated by a preset triggering event; run the target application program to display an initial login interface in the target application program and acquire the agent information stored in the target application program, the agent information comprising interface information of the calling interfaces for the reference agents; respond to an information input operation of the user on the initial login interface to acquire identity verification information; if the identity verification passes, acquire the user identity information corresponding to the user from a preset local database, and request and cache the relevant data of each reference agent according to the interface information; determine the target teaching identity according to the user identity information; when the target teaching identity is the student identity, determine that the target teaching scene is the student learning scene; and when the target teaching identity is the teacher identity, determine that the target teaching scene is the teacher teaching scene if the current system time falls within the teacher's class period, and otherwise determine that the target teaching scene is the teacher lesson preparation scene.
In one possible embodiment, the triggering event comprises at least one of the following reference events: detecting a touch operation by the user on a selectable control, mapped to the target application program, in the display interface of the terminal device; detecting that the currently acquired user voice input contains an operation voice instruction for the target application program; detecting that the target feature information of the currently acquired user action gesture matches the reference feature information of a preset opening gesture for the target application program, the reference feature information comprising the spatial feature information and time feature information of the preset opening gesture; or detecting that the terminal device is not running any other application program and that the current system time is in a high-frequency operation period of the target application program, the high-frequency operation period being a period in which the operation frequency of the target application program counted over preset days is greater than a preset frequency.
In one possible embodiment, in terms of acquiring the user request information input by the user through the target agent and determining, based on the data interaction operation between the target agent and the large model base, the target response result corresponding to the target application function, the data interaction unit 603 is specifically configured to: acquire the content data input by the user through the target agent and determine the target input format corresponding to the content data, the target input format being any one of a plurality of reference input formats supported by the large model base; determine the user request information according to the content data and the target input format; and send a result request message carrying the user request information to the corresponding large model base through the target agent, instructing the large model base to execute, based on the data interaction operation, a data analysis operation on the user request information through its built-in modules so as to return the target response result associated with the target application function. The built-in modules comprise a dedicated large model, a local knowledge base, a third-party plug-in and an output control module; the dedicated large model is used for analyzing the content data according to the target input format in combination with the local knowledge base, the third-party plug-in is used for invoking preset external teaching resources or services, and the output control module is used for integrating the analysis results to generate the target response result.
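The request assembly performed by the data interaction unit can be sketched as wrapping the content data and its input format into a single message. The format names, field names and function identifier below are illustrative assumptions; the embodiment does not fix a wire format.

```python
# Hypothetical sketch of building the result request message that the target
# agent sends to the large model base.

SUPPORTED_FORMATS = ("text", "voice", "image")  # assumed reference input formats

def build_request(content_data, target_format, application_function):
    """Wrap the user's content data and its target input format into the
    result request message carrying the user request information."""
    if target_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported input format: {target_format}")
    return {
        "user_request": {"content": content_data, "format": target_format},
        "function": application_function,
    }

msg = build_request("explain fractions", "text", "post_class_exercise")
```

The large model base would then route the message to its built-in modules according to the declared format and function.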
In one possible embodiment, the target teaching scene is a student learning scene, and the target application function is a post-class exercise function. After the target agent receives the target response result returned by the large model base, the function implementation unit 604 is specifically configured to: determine, according to the user identity information, the current grade of the user and the course progress corresponding to the target subject; execute a content screening operation on the target response result according to the current grade and course progress to determine reference screening results of a plurality of difficulty levels, each reference screening result being used for displaying the post-class practice problems matched with the corresponding difficulty level, the difficulty degrees of the post-class practice problems represented by the difficulty levels differing from each other, and the difficulty levels comprising a first level, a second level and a third level in decreasing order of difficulty; output the reference screening results of the first level, the second level and the third level respectively according to a preset exercise strategy so as to display the corresponding post-class practice problems in sequence; detect the answer input operations of the user for the post-class practice problems to obtain the user answer for each post-class practice problem; determine the reference answer of each post-class practice problem according to the target response result and the plurality of reference screening results; and determine a practice correction result according to the user answers and the reference answers of the post-class practice problems, the practice correction result being used for displaying the score information of the user's current post-class practice and the answer analysis content of the post-class practice problems answered incorrectly.
In one possible embodiment, the target teaching scene is a student learning scene, and the target agent further comprises an emotion recognition module. After the practice correction result is determined according to the user answers and the reference answers of the post-class practice problems, the function implementation unit 604 is specifically further configured to: collect, through the emotion recognition module, the audio and video information of the user during the answer input operation, the audio and video information comprising voice input information and image input information; determine a first emotion recognition result according to the intonation features and speech speed features in the voice input information; determine a second emotion recognition result according to the facial expression feature vectors and limb action time sequence features in the image input information; perform fusion processing on the first emotion recognition result and the second emotion recognition result according to a preset fusion algorithm to generate a current emotion state score, the current emotion state score being used for representing the degrees of emotional tendency along the positive, neutral and negative dimensions reflected by the user in the current state; if the negative dimension score corresponding to the negative dimension exceeds a first score threshold, call the preset incentive corpus in the local knowledge base to generate and display personalized encouragement information, and reduce the difficulty level of the post-class practice problem to be displayed next; or, if the negative dimension score exceeds a second score threshold (the second score threshold being greater than the first score threshold) for a consecutive preset number of early warning times, obtain the device information of the guardian's terminal device through a third-party plug-in and send an early warning report to that terminal device, the early warning report carrying an emotion fluctuation curve and learning state information.
In one possible embodiment, the target teaching scene is a teacher teaching scene, the target application function is a classroom auxiliary function, the target agent further comprises an emotion recognition module, and the intelligent teaching system further comprises a classroom terminal located in a classroom corresponding to the teacher teaching scene. After the target agent receives the target response result returned by the large model base, the function implementation unit 604 is specifically configured to: determine, according to the target response result, a class recording video collected in real time for the teacher teaching scene; perform a video analysis operation on the class recording video to determine facial expression data corresponding to students and limb motion data corresponding to the teacher; determine a class attention score according to the facial expression data and a teacher teaching pressure value according to the limb motion data, where the class attention score is used to characterize the average attention concentration in the target class recorded by the class recording video, and the teacher teaching pressure value is used to characterize the stress level of the user in the current teacher teaching scene; if the class attention score remains lower than a third score threshold for longer than a preset duration, call a third-party plug-in through the target agent to push, to the classroom terminal, an in-class quiz or group discussion task associated with the current class subject; and if the teacher teaching pressure value is greater than or equal to a fourth pressure threshold, update the current classroom teaching plan, which is designed in advance for the current teacher teaching scene, so that the teaching difficulty level of the updated classroom teaching plan is lower than that of the classroom teaching plan before updating.
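The threshold logic described above can be sketched as follows. This is a minimal illustrative sketch only: the threshold values, score scales, data shapes, and function names are assumptions, not the implementation of the embodiment.

```python
from dataclasses import dataclass

# Illustrative thresholds; the embodiment does not fix concrete values.
THIRD_SCORE_THRESHOLD = 0.6      # class attention score threshold (assumed 0-1 scale)
FOURTH_PRESSURE_THRESHOLD = 0.8  # teacher teaching pressure threshold (assumed)
PRESET_DURATION_S = 120.0        # how long low attention must persist, in seconds

@dataclass
class ClassroomState:
    attention_scores: list        # per-frame attention values from facial-expression analysis
    pressure_value: float         # teaching pressure derived from limb-motion data
    low_attention_seconds: float  # how long attention has stayed below the threshold

def decide_interventions(state: ClassroomState) -> list:
    """Return the interventions the target agent would trigger for this state."""
    actions = []
    # Class attention score: average attention concentration over the recorded class.
    attention = sum(state.attention_scores) / len(state.attention_scores)
    if attention < THIRD_SCORE_THRESHOLD and state.low_attention_seconds > PRESET_DURATION_S:
        # Push an interactive task tied to the current class subject via a third-party plug-in.
        actions.append("push_quiz_or_group_discussion")
    if state.pressure_value >= FOURTH_PRESSURE_THRESHOLD:
        # Update the classroom teaching plan so its difficulty level is lowered.
        actions.append("update_plan_lower_difficulty")
    return actions
```

A distracted, high-pressure class would trigger both interventions, while an attentive class with a relaxed teacher triggers none.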
In the case of using integrated units, fig. 7 is a block diagram illustrating functional units of another data processing apparatus based on a multi-modal agent according to an embodiment of the present application. In fig. 7, the multi-modal agent based data processing apparatus 60 includes a processing module 720 and a communication module 710. The processing module 720 is configured to control and manage the actions of the multi-modal agent based data processing apparatus 60, for example, to perform the steps of the information acquisition unit 601, the agent invocation unit 602, the data interaction unit 603, and the function implementation unit 604, and/or other processes of the techniques described herein. The communication module 710 is configured to support interactions between the multi-modal agent based data processing apparatus and other devices. As shown in fig. 7, the multi-modal agent based data processing apparatus may further include a storage module 730, which stores program code and data of the multi-modal agent based data processing apparatus.
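The division of labor among the processing, communication, and storage modules can be illustrated with a minimal sketch. All class and method names below are hypothetical stand-ins for the units 601-604 and modules 710-730; they are not defined by the embodiment.

```python
# Illustrative sketch of the integrated-unit structure of fig. 7: a processing
# module drives the four functional units, a communication module handles
# device interaction, and a storage module holds program data.

class CommunicationModule:
    """Stand-in for module 710: supports interaction with other devices."""
    def send(self, target, payload):
        print(f"-> {target}: {payload}")

class StorageModule:
    """Stand-in for module 730: stores program code and data."""
    def __init__(self):
        self.data = {}

class ProcessingModule:
    """Stand-in for module 720: controls the acquisition, invocation,
    interaction, and implementation steps (units 601-604)."""
    def __init__(self, comm, storage):
        self.comm = comm
        self.storage = storage

    def run(self, user_request):
        context = self.acquire_information(user_request)   # unit 601
        agent = self.invoke_agent(context)                 # unit 602
        response = self.interact(agent, user_request)      # unit 603
        return self.implement_function(response)           # unit 604

    # The four units are collapsed into trivial stubs for illustration.
    def acquire_information(self, req):
        return {"identity": "teacher", "scene": req.get("scene")}

    def invoke_agent(self, context):
        return f"agent-for-{context['scene']}"

    def interact(self, agent, req):
        return {"agent": agent, "result": "ok"}

    def implement_function(self, response):
        return response["result"]
```

A caller would construct the apparatus from its three modules and drive it through `run`, which threads one request through the four unit steps in order.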
The processing module 720 may be a processor or controller, such as a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination that performs computing functions, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 710 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 730 may be a memory.
For all relevant contents of each scenario related to the above method embodiments, reference may be made to the functional descriptions of the corresponding functional modules, which are not repeated herein. The multi-modal agent-based data processing apparatus 60 may perform the multi-modal agent-based data processing method shown in fig. 2.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired or wireless means from one website, computer, server, or data center to another. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk).
Fig. 8 is a block diagram of a terminal device according to an embodiment of the present application. As shown in fig. 8, the terminal device 110 may include one or more processors 810 and a memory 820 coupled to the processor 810, where the memory 820 may store one or more computer programs 821, which, when executed by the one or more processors 810, may be configured to implement the methods described in the embodiments above. The terminal device here is the terminal device 110 in the above embodiments.
The processor 810 may include one or more processing cores. The processor 810 connects various parts within the overall terminal device 110 using various interfaces and lines, and performs various functions of the terminal device 110 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 820, and by invoking data stored in the memory 820. Alternatively, the processor 810 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), or programmable logic array (Programmable Logic Array, PLA). The processor 810 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing display contents; and the modem is used for processing wireless communication. It will be appreciated that the modem may not be integrated into the processor 810 and may instead be implemented by a separate communication chip.
The memory 820 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). The memory 820 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 820 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The data storage area may also store data created by the terminal device 110 in use, and the like.
It will be appreciated that the terminal device 110 may include more or fewer structural elements than those shown in the above block diagram, which is not limited herein.
The present application also provides a computer storage medium having stored thereon a computer program/instruction which, when executed by a processor, performs part or all of the steps of any of the methods described in the method embodiments above.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the partitioning of units is merely a logical functional partitioning, and other partitioning manners may exist in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be in electrical, mechanical, or other form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated units implemented in the form of software functional units described above may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform part of the steps of the methods according to the embodiments of the present application. The storage medium includes a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a volatile memory, or a nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (Programmable ROM, PROM), an erasable programmable ROM (Erasable PROM, EPROM), an electrically erasable programmable ROM (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct rambus random access memory (Direct Rambus RAM, DR RAM), among other media that can store program code.
Although the present application is disclosed above, it is not limited thereto. Variations and modifications, including combinations of the different functions and implementation steps, as well as embodiments of the software and hardware, will be readily apparent to those skilled in the art without departing from the spirit and scope of the application.