CN119690250A - Data processing method and device based on multimodal intelligent agent - Google Patents
CN119690250A - Data processing method and device based on multimodal intelligent agent - Google Patents

Data processing method and device based on multimodal intelligent agent
Download PDF

Info

Publication number
CN119690250A
Authority
CN
China
Prior art keywords
target
teaching
user
agent
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510206205.3A
Other languages
Chinese (zh)
Inventor
农长霖
李辉亮
张文晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Shenlei Semiconductor Co ltd
Original Assignee
Shenzhen Qianhai Shenlei Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Shenlei Semiconductor Co ltd
Priority to CN202510206205.3A
Publication of CN119690250A
Legal status: Pending

Abstract

Translated from Chinese


The present application provides a data processing method and device based on a multimodal agent, applied to a terminal device in an intelligent teaching system. The method includes: obtaining the user's target teaching identity and the target teaching scene in which the user is located; determining and invoking the reference agent corresponding to the target teaching scene as the target agent; collecting, through the target agent, the user request information input by the user, and determining the target response result corresponding to the target application function based on the data interaction between the target agent and the large-model base; and, after the target agent receives the target response result returned by the large-model base, realizing the target application function according to that result. In this way, the terminal device can match the user's identity to the agent for the corresponding teaching scene and then provide the corresponding application function services, improving scene adaptability as well as the intelligence and practicality of the teaching system.

Description

Data processing method and device based on multi-mode agent
Technical Field
The application belongs to the technical field of general data processing in the Internet industry, and particularly relates to a data processing method and device based on a multi-mode intelligent agent.
Background
At present, an intelligent teaching system plays an important role in the education field, brings new convenience and possibility for teaching activities, and helps education development to a certain extent.
However, because teaching scenes are complicated and varied and the needs of teachers and students differ, current intelligent teaching systems have obvious technical shortcomings. On the one hand, teaching services are heavily homogenized and struggle to provide functions adapted to the different identities of teachers and students, so teaching equipment remains insufficiently intelligent and cannot accurately meet user needs. On the other hand, teaching resources are integrated and utilized inefficiently, so the improvement in teaching quality that the equipment can deliver is limited; this fails to meet the urgent demands of the digital transformation of education and restricts educational innovation and the improvement of teaching outcomes.
Disclosure of Invention
The embodiments of the present application provide a data processing method and device based on a multimodal agent. By acquiring the user's teaching identity and corresponding teaching scene and matching the corresponding reference agent to interact with the user, the terminal device provides the application functions appropriate to that teaching scene, improving the scene adaptability of the intelligent teaching services it offers and increasing the flexibility and practicality of data processing.
In a first aspect, an embodiment of the present application provides a data processing method based on a multi-mode agent, which is applied to a terminal device in an intelligent teaching system, where the intelligent teaching system further includes a server, and the method includes:
acquiring a target teaching identity of a user and a target teaching scene of the user, wherein the target teaching identity is used for indicating that the teaching identity corresponding to the user is a teacher identity or a student identity, and the target teaching scene is any one of a plurality of reference teaching scenes matched with the teacher identity or the student identity;
determining and calling a reference intelligent agent corresponding to a target teaching scene as a target intelligent agent, wherein each reference intelligent agent corresponds to one reference teaching scene, and the reference intelligent agent is used for realizing interaction with a user so as to determine a plurality of preset application functions required to be used in the corresponding reference teaching scene, and is in communication connection with a large model base preset in a server;
Acquiring user request information input by a user through a target intelligent agent, and determining a target response result corresponding to a target application function based on data interaction operation between the target intelligent agent and a large model base, wherein the target application function is determined by the user request information, and the target application function is any one of a plurality of preset application functions;
And after the target intelligent agent receives a target response result returned by the large model base, realizing a target application function according to the target response result.
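As a rough illustration only (not part of the claims), the four claimed steps can be sketched as the following Python flow; every name below is a hypothetical stand-in, not an identifier from the patent.

```python
# Hypothetical, minimal sketch of the claimed four-step flow; all names
# below are illustrative stand-ins, not from the patent.

REFERENCE_AGENTS = {
    "teacher_teaching": "TeacherTeachingAgent",
    "teacher_lesson_prep": "TeacherLessonPrepAgent",
    "student_learning": "StudentLearningAgent",
}

def large_model_base(request):
    # Stand-in for the server-side large-model base: returns a response
    # tagged with the target application function named in the request.
    return {"function": request["function"],
            "result": f"answer to {request['content']}"}

def handle_session(scene, user_input, function):
    # Step 2: the reference agent bound to the target scene becomes the target agent.
    target_agent = REFERENCE_AGENTS[scene]
    # Step 3: the target agent packages the user request and sends it to the base.
    request = {"agent": target_agent, "content": user_input, "function": function}
    response = large_model_base(request)
    # Step 4: realize the target application function from the returned result.
    return f"{target_agent} performs {response['function']}: {response['result']}"
```

Step 1 (obtaining the identity and scene) is assumed to have happened before `handle_session` is called; the scene-determination logic is described later in the detailed description.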
In a second aspect, an embodiment of the present application provides a data processing device based on a multi-mode agent, which is applied to a terminal device in an intelligent teaching system, where the intelligent teaching system further includes a server, where the device includes:
The information acquisition unit is used for acquiring a target teaching identity of a user and a target teaching scene where the user is located, wherein the target teaching identity is used for indicating that the teaching identity corresponding to the user is a teacher identity or a student identity, and the target teaching scene is any one of a plurality of reference teaching scenes matched with the teacher identity or the student identity;
The agent invocation unit is used for determining and invoking the reference agent corresponding to the target teaching scene as the target agent, where each reference agent corresponds to one reference teaching scene, the reference agent is used to interact with the user to determine the several preset application functions required in the corresponding reference teaching scene, and the reference agent is communicatively connected to a large-model base preset in the server;
The data interaction unit is used for acquiring user request information input by a user through the target intelligent agent, and determining a target response result corresponding to a target application function based on data interaction operation between the target intelligent agent and the large model base, wherein the target application function is determined by the user request information, and the target application function is any one of a plurality of preset application functions;
And the function realizing unit is used for realizing the target application function according to the target response result after the target intelligent agent receives the target response result returned by the large model base.
In a third aspect, an embodiment of the present application provides a terminal device comprising a processor, a memory and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing steps as in the first aspect of the embodiment of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program/instructions which, when executed by a processor, perform the steps of the first aspect of the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product comprising computer programs/instructions which when executed by a processor implement some or all of the steps as described in the first aspect of the embodiments of the present application.
It can be seen that, in the embodiments of the present application, the terminal device can match the user's identity to the agent for the corresponding teaching scene and thereby provide the corresponding application function services. This improves scene adaptability, delivers intelligent assistance for the different stages of the teaching workflow, and increases the intelligence and practicality of the teaching system. Matching users with agents for different teaching scenes also makes the system's data processing more flexible.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of an intelligent teaching system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data processing method based on a multi-modal agent according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a trigger event scenario provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a target agent and a large model base according to an embodiment of the present application;
Fig. 5 is a schematic view of a teacher teaching scene provided in an embodiment of the present application;
FIG. 6 is a block diagram showing functional units of a multi-modal agent-based data processing apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram showing functional units of another multi-modal agent-based data processing apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms first, second and the like in the description and in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1, fig. 1 is a block diagram of an intelligent teaching system according to an embodiment of the present application. As shown in fig. 1, the intelligent teaching system 100 includes a terminal device 110 and a server 120. The terminal device 110 is a common type of electronic device with a display screen that supports human-machine interaction with the user 130, including but not limited to a computer, a smart television, or a mobile phone. The terminal device 110 may interact with the user 130 to start the intelligent teaching service. After detecting that the service has started, the terminal device 110 first obtains the target teaching identity and target teaching scene of the user 130; the purpose of this step is to provide a basis for the subsequent selection of the reference agent, so that the functions the reference agent can implement match the user's needs, improving the efficiency and stability of the server's data processing and reducing the volume of call data. The selected reference agent is then invoked for human-machine interaction with the user 130: during the interaction the user 130 inputs user request information, and the reference agent responds to the user through data interaction with the server.
One intelligent teaching system 100 may correspond to a plurality of servers 120 at the same time, one server 120 corresponds to one or more terminal devices 110, and each terminal device 110 performs human-machine interaction with one user 130. The number of reference agents equals the number of reference teaching scenes: the reference teaching scenes are preset according to the differing needs of teacher and student identities in actual application scenarios, a corresponding reference agent is configured for each preset reference teaching scene, and the function types of the preset application functions each reference agent can implement are associated with the scene requirements of its reference teaching scene.
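The one-to-many relationships just described (system → servers → terminals → one user each, one reference agent per reference scene) might be modeled as follows; this is a hypothetical sketch whose class names are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class TerminalDevice:
    device_id: str
    user_id: str                      # each terminal interacts with one user

@dataclass
class Server:
    server_id: str
    terminals: list = field(default_factory=list)   # one server, many terminals

@dataclass
class IntelligentTeachingSystem:
    servers: list = field(default_factory=list)     # one system, many servers
    # one reference agent is configured per preset reference teaching scene
    reference_scenes: tuple = ("teacher_teaching", "teacher_lesson_prep",
                               "student_learning")

    @property
    def reference_agent_count(self):
        # number of reference agents equals number of reference scenes
        return len(self.reference_scenes)
```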
Based on this, the embodiment of the application provides a data processing method based on a multi-mode intelligent agent, and the embodiment of the application is described in detail below with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a flow chart of a data processing method based on a multi-mode agent according to an embodiment of the present application, where the method is applied to a terminal device 110 in an intelligent teaching system 100, and the intelligent teaching system further includes a server 120, and the method includes:
step S201, a target teaching identity of a user and a target teaching scene of the user are obtained.
The target teaching identity is used for indicating that the teaching identity corresponding to the user is a teacher identity or a student identity, and the target teaching scene is any one of a plurality of reference teaching scenes matched with the teacher identity or the student identity.
In one possible embodiment, the reference teaching scenes include a teacher teaching scene, a teacher lesson preparation scene, and a student learning scene. Acquiring the target teaching identity of the user and the target teaching scene in which the user is located includes: detecting an operation trigger instruction of the user for the target application program; running the target application program to display its initial login interface and acquire the agent information stored in the target application program; responding to the user's information input on the initial login interface to obtain identity verification information; if the identity verification passes, acquiring the user identity information corresponding to the user from a preset local database, and requesting and caching the relevant data of each reference agent according to the interface information; determining the target teaching identity from the user identity information; when the target teaching identity is the student identity, determining the target teaching scene to be the student learning scene; and when the target teaching identity is the teacher identity, retrieving the current system time and the teacher course time bound to the user identity information, determining the target teaching scene to be the teacher teaching scene if the current system time falls within the teacher course time, and determining it to be the teacher lesson preparation scene otherwise.
The intelligent agent information comprises interface information for calling a calling interface of a reference intelligent agent, the operation trigger instruction is triggered and generated by a preset trigger event, and the local database is used for storing user identity information associated with all teachers and students in a school where the user is located.
During human-machine interaction between the terminal device and the user, the terminal device first detects the user's operation trigger instruction for the target application program (generated by a preset trigger event) and runs the target application program to display the initial login interface. While running the target application program, the terminal device can acquire in advance the interface information of the reference agents' invocation interfaces, so that each preset reference agent can be pre-invoked through its interface; this speeds up invoking the target agent in response to the user identity information, effectively reduces response time, and improves teaching efficiency. In addition, to guarantee the service quality and practicality of the intelligent teaching service, the terminal device retrieves the user's target teaching identity and corresponding target teaching scene only after the user passes identity verification. It should be noted that this step also ensures the security of the intelligent teaching system, preventing staff of other schools from invoking the relevant server data in this school's intelligent teaching system.
Further, the method of determining the target teaching scene differs by target teaching identity. The principle is as follows: if the user's target teaching identity is student, the scene in which the student needs to invoke a reference agent, whether in or out of school, is usually the student learning scene. If the target teaching identity is teacher, there are two different scenes that require invoking a reference agent: the teacher lesson preparation scene and the teacher teaching scene. In that case, the target teaching scene can be further confirmed from the time at which the user starts the target application program and the teacher course time bound to the user's identity information. The design principle of this embodiment is to classify users accurately into different teaching scenes using identity verification and time judgment mechanisms, based on a detailed analysis of teaching scenes and user identities. Through the interaction between the agents and the large-model base, customized services are provided for users in different scenes, aiming to break the limitations of traditional teaching systems, meet diversified teaching demands, and improve the overall teaching experience and quality. Compared with a traditional teaching system, by making identity verification the precondition for using the corresponding agent, the terminal device can push content accurately for different scenes and improve the integration and utilization efficiency of teaching resources.
It can be seen that, in this example, by presetting the operation trigger instruction and identity verification, the terminal device quickly and accurately identifies the user's identity and teaching scene, achieves intelligent matching, substantially improves the efficiency of teaching resource allocation, raises the accuracy of scene recognition, and ensures that teaching activities proceed in an orderly and efficient manner.
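The identity-to-scene decision described in this embodiment reduces to a small branch, sketched below. All names are hypothetical, and the teacher's course timetable is an assumed stand-in for the course time bound to the user's identity information.

```python
from datetime import datetime, time

# Hypothetical timetable: the course periods bound to the teacher's identity.
TEACHER_COURSE_PERIODS = [(time(8, 0), time(12, 0)), (time(14, 0), time(17, 0))]

def determine_scene(identity, now=None):
    """Map a verified teaching identity to a target teaching scene."""
    if identity == "student":
        # A student identity always maps to the student learning scene.
        return "student_learning"
    # For a teacher identity, compare the current system time with the
    # teacher course time bound to the user identity information.
    current = (now or datetime.now()).time()
    in_class = any(start <= current <= end for start, end in TEACHER_COURSE_PERIODS)
    return "teacher_teaching" if in_class else "teacher_lesson_prep"
```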
In one possible embodiment, the trigger event includes at least one of the following reference events: detecting the user's touch operation on the selectable control mapped to the target application program in the display interface of the terminal device; detecting that currently captured user speech contains an operating voice instruction for the target application program; detecting that the target feature information of a currently captured user gesture matches the reference feature information of a preset opening gesture for the target application program; or detecting that the terminal device is not running other applications and the current system time falls within a high-frequency operation period of the target application program.
The reference feature information includes the spatial and temporal feature information of the preset opening gesture, and the high-frequency operation period is a period during which the operation frequency of the target application program, counted over a preset number of days, exceeds a preset frequency.
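A dispatcher over these four reference events could look like the sketch below; the event payloads and field names are assumptions made for illustration, not part of the patent.

```python
# Hypothetical dispatcher over the four reference trigger events.
def should_launch(event):
    kind = event.get("kind")
    if kind == "touch":
        # Touch on the app's selectable control in the display interface.
        return event.get("target") == "target_app_control"
    if kind == "voice":
        # Recognized speech contains an operating instruction for the app.
        return "open target app" in event.get("transcript", "").lower()
    if kind == "gesture":
        # Spatial and temporal features must match the preset opening gesture.
        return (event.get("spatial") is not None
                and event.get("spatial") == event.get("reference_spatial")
                and abs(event.get("duration", 0.0)
                        - event.get("reference_duration", 0.0)) < 0.5)
    if kind == "idle_schedule":
        # No other app running, and current time is in a high-frequency period.
        return bool(event.get("no_other_apps") and event.get("in_high_freq_period"))
    return False
```

In line with the embodiment, satisfying any single reference event is enough to generate the trigger instruction.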
Referring to fig. 3, fig. 3 is a schematic view of trigger events according to an embodiment of the present application. As shown in fig. 3, the terminal device in fig. 3 is a mobile phone, and the three diagrams from left to right depict the interaction scenarios for three different trigger events the phone supports. In the first diagram, the user performs a touch operation on the target application program on the terminal device's display interface; after detecting the touch, the terminal device starts the target application program and proceeds to the subsequent authentication operation. In the second diagram, the user stays on the display interface showing the target application program and, with a finger or stylus, selects the voice input control reserved on the interface while speaking an operating voice instruction. Correspondingly, when the terminal device receives the operating voice instruction for the target application program, it parses the instruction content and prepares to trigger the subsequent operations. In the third diagram, the user stays on the display interface showing the target application program and draws the preset opening gesture on the interface. Correspondingly, the terminal device (i.e., the mobile phone) captures the user's gesture through a camera or sensor, extracts the gesture's target feature information, and starts the target application program if it matches the feature information of the preset opening gesture.
The trigger event comprises reference events of multiple dimensions, built on common human-machine interaction methods between a user and a terminal device. The terminal device monitors the trigger events set in this embodiment as follows. 1. Touch operation monitoring: the system monitors the terminal device's display interface in real time and detects the user's touch operations on the selectable control mapped to the target application program; once a relevant touch is captured, information such as the touch position and time is recorded to judge whether it is a valid trigger. 2. Voice instruction monitoring: speech recognition continuously collects the user's voice input and analyzes its content to identify whether it contains an operating voice instruction for the target application; if a valid instruction is identified, its content is parsed and the subsequent operations are prepared. 3. Gesture feature matching monitoring: a camera or sensor captures the user's gesture and extracts its target feature information, including spatial features (such as gesture shape, trajectory, and angle) and temporal features (such as completion time and the duration of each stage); these are compared with the reference feature information of the preset opening gesture to judge whether they match. 4. Device state and time monitoring: the list of running applications is checked periodically to confirm that no other applications are running, and the current system time is compared with the high-frequency operation period of the target application program counted over a preset number of days to judge whether the current time falls within it. With this multi-dimensional monitoring, as soon as any one of the above reference events meets its trigger condition, the terminal device immediately generates the corresponding trigger instruction to start the target application program.
Further, the design principle of this embodiment is multimodal interaction fusion: by comprehensively using touch, voice, gesture, and other interaction modes, the advantages of each mode are exploited and the limitations of any single mode are compensated for, making human-computer interaction diverse and natural. Providing multiple trigger modes accommodates the operating habits of different users in different scenarios. For example, while driving, when both hands are occupied, the user can quickly start an application by voice instruction, and when speaking is inconvenient, gesture operation is more practical; this greatly improves the personalization and comfort of the user experience. The terminal device can also run an additional data analysis process to analyze and learn user behavior, automatically starting the application according to the high-frequency operation period (i.e., statistically analyzing the user's application usage frequency over a preset number of days, learning the user's habits, and determining the high-frequency period) and the device's operating state (i.e., sensing contextual information such as the device state and the current time in real time and making intelligent decisions in combination with the behavior analysis), achieving intelligent and automatic operation. The system learns the user's habits and starts automatically at the times the user is most likely to need the application, reducing manual steps and improving usage efficiency.
In addition, because multi-dimensional trigger events are set, the terminal device can quickly capture the user's intent to start the application through accurate recognition of touch, voice, gesture, and other multi-dimensional information, together with accurate judgment of the device state and time. The application program can therefore respond quickly, shortening the user's waiting time and improving both system responsiveness and service experience.
In this example, through multimodal interaction fusion, the terminal device offers the user multiple ways to trigger the target application program, accommodating different user habits, improving operating convenience, and reducing lost time. In addition, the terminal device can trigger automatically according to the device's operating state and the high-frequency operation period, which improves the intelligence of the teaching system and the continuity of the teaching service. Combined with the subsequent identity and scene recognition, the trigger events let users be quickly matched with a suitable teaching scene, further improving the efficiency of teaching resource allocation and the accuracy of scene recognition.
Step S202, determining and calling a reference agent corresponding to the target teaching scene as a target agent.
Each reference agent corresponds to one reference teaching scene; the reference agents are used to interact with the user to determine the several preset application functions required in the corresponding reference teaching scene, and each is communicatively connected to a large-model base preset in the server.
Step S203, user request information input by a user is collected through the target intelligent agent, and a target response result corresponding to the target application function is determined based on data interaction operation between the target intelligent agent and the large model base.
The target application function is determined by the user request information, and the target application function is any one of a plurality of preset application functions.
In one possible embodiment, collecting the user request information input by the user through the target agent and determining the target response result corresponding to the target application function based on the data interaction between the target agent and the large-model base includes: collecting, through the target agent, the content data input by the user and determining the target input format corresponding to the content data; determining the user request information from the content data and the target input format; sending, through the target agent, a result request message carrying the user request information to the corresponding large-model base; and receiving the target response result sent by the output control module.
The result request message is used for instructing the large model base to invoke its built-in modules to perform a data analysis operation on the user request information, so as to return the target response result for the target application function. The built-in modules comprise a specialized large model, a local knowledge base, third-party plug-ins and the output control module. The local knowledge base is used for storing teaching materials corresponding to the various disciplines of the school where the user is located; the specialized large model is used for analyzing the content data according to the target input format in combination with the local knowledge base; the third-party plug-ins are used for invoking teaching tools associated with the plurality of preset application functions corresponding to the target teaching scene; and the output control module is used for integrating the sub-response results output by the specialized large model, the local knowledge base and the third-party plug-ins into the target response result according to the target application function indicated by the user request information.
Referring to fig. 4, fig. 4 is a schematic diagram of a target agent and a large model base according to an embodiment of the application. As shown in fig. 4, a user produces user input information in a target input format by performing an input operation on the terminal device, wherein the target input format is any one of a plurality of reference input formats supported by the large model base; the reference input formats include, but are not limited to, voice, text, image and video. Voice is captured as audio by a digital microphone, text is entered by the user through an input method, and images and video are captured by a camera. Because the multimodal large model accepts user input in multiple formats, the target agent can send user input information that meets a reference input format directly to the large model for processing, without additional format conversion. Further, the terminal device, acting as the human-machine interaction interface, invokes the target agent (which, according to the target teaching scene, is one of a teacher teaching agent, a teacher lesson-preparation agent and a student learning agent) to collect the content data input by the user, determines the corresponding target input format, and sends a result request message carrying the user request information to the large model base. The large model base comprises a specialized large model, which analyzes the content data according to the target input format in combination with the data provided by the local knowledge base. The local knowledge base stores teaching materials corresponding to the various subjects of the school and provides data support for the specialized large model and the output control module.
The third-party plug-ins invoke teaching tools associated with the preset application functions of the target teaching scene to assist in generating response results. The output control module integrates the sub-response results output by each module according to the target application function indicated by the user request information to form the target response result, and returns it to the target agent. Through this architecture of target agent plus large model base, the terminal device can interact with the large model base in the server by invoking the target agent, so as to deliver the intelligent teaching service required by the user.
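The request-and-integration flow between the target agent and the large model base described above can be sketched as follows. This is a minimal illustration only: the patent does not specify message schemas, so all class, field and function names here are assumptions.

```python
from dataclasses import dataclass

# Reference input formats supported by the multimodal large model base.
SUPPORTED_FORMATS = {"voice", "text", "image", "video"}

@dataclass
class ResultRequest:
    content_data: object   # raw user input (text string, audio bytes, frames, ...)
    input_format: str      # target input format, one of SUPPORTED_FORMATS
    teaching_scene: str    # lets the base select scene-appropriate tools

def build_result_request(content_data, input_format, teaching_scene):
    """Target-agent side: wrap the collected content data into the result
    request message sent to the large model base. No format conversion is
    needed, since the multimodal model accepts all reference formats."""
    if input_format not in SUPPORTED_FORMATS:
        raise ValueError("unsupported input format: %s" % input_format)
    return ResultRequest(content_data, input_format, teaching_scene)

def integrate_sub_responses(target_function, sub_responses):
    """Output-control-module side: merge the sub-response results from the
    specialized large model, the local knowledge base and the third-party
    plug-ins into one target response for the requested application function."""
    return {"target_function": target_function,
            "sections": [r for r in sub_responses if r]}  # drop empty sub-results
```

In use, the agent would serialize the `ResultRequest` onto its connection to the server, and the base would return the dictionary produced by `integrate_sub_responses` as the target response result.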
In this embodiment, the interaction between the target agent invoked by the terminal device and the server implements multi-source data integration and processing, and function-oriented result generation. Multi-source data integration and processing means that the server acquires the content data input by the user through the target agent and adapts it to a reference input format supported by the large model base, so that the data can be processed effectively. Then, the capabilities of the specialized large model, the local knowledge base and the third-party plug-ins are combined: the specialized large model analyzes the data, the local knowledge base provides teaching data support, and the third-party plug-ins supply related teaching tools, so that multi-source data fusion is achieved and the diversified demands of users are met. Function-oriented result generation means that the target application function is determined according to the user request information, and the output control module integrates the sub-response results output by each component around the target application function to generate the final target response result, achieving accurate matching from user requirement to function realization.
Specifically, based on the system architecture of the target agent and the large model base shown in fig. 4, the specific process by which a user obtains the intelligent teaching service through the terminal device is as follows, for the teacher teaching scene, the teacher lesson-preparation scene and the student learning scene respectively. 1. Teacher teaching scene: during teaching, the teacher inputs content data such as knowledge-point explanation demands and interaction-link designs through the target agent. The system determines the target application function from this input, for example generating teaching courseware or designing classroom questions. The specialized large model draws on the teaching materials in the local knowledge base, the third-party plug-ins invoke tools such as interactive games, and the output control module integrates the results, providing the teacher with complete teaching auxiliary content. 2. Teacher lesson-preparation scene: the teacher inputs the lesson subject, teaching targets and other content, and the system determines that the target application function is collecting and organizing lesson-preparation materials. The specialized large model analyzes the content, relevant materials are retrieved from the local knowledge base, the third-party plug-ins invoke lesson-plan templates, material search tools and the like, and the output control module finally integrates a detailed lesson-preparation scheme. 3. Student learning scene: the student inputs content data such as learning questions or homework problems, and the system determines that the target application function is question answering. The specialized large model solves the problem in combination with the knowledge base, the third-party plug-ins invoke solution-idea analysis tools and the like, and the output control module integrates detailed solution content and learning advice.
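The three-way dispatch just described, from teaching scene to agent and its preset application functions, can be sketched as a simple lookup table. The scene keys, agent names and function names are illustrative assumptions, not identifiers from the patent.

```python
# Hypothetical mapping of the three reference teaching scenes to their
# agents and preset application functions (names are illustrative).
SCENE_AGENTS = {
    "teacher_teaching": {
        "agent": "teacher_teaching_agent",
        "functions": ["generate_courseware", "design_classroom_questions"],
    },
    "teacher_lesson_prep": {
        "agent": "teacher_lesson_prep_agent",
        "functions": ["collect_lesson_prep_materials"],
    },
    "student_learning": {
        "agent": "student_learning_agent",
        "functions": ["answer_questions", "post_class_exercise"],
    },
}

def select_target_agent(target_scene):
    """Step S202: determine and call the reference agent matching the
    target teaching scene as the target agent."""
    if target_scene not in SCENE_AGENTS:
        raise ValueError("no reference agent for scene: %s" % target_scene)
    return SCENE_AGENTS[target_scene]["agent"]
```

A table-driven dispatch like this keeps adding a new reference teaching scene to a one-line change.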
It can be seen that, in this example, the terminal device accurately collects user data by invoking the target agent and matches it to the large model's input formats, so that the large model base in the server can quickly understand the user's requirements. Through the cooperation of the built-in modules of the large model base in analyzing data, invoking tools and integrating results, the requirements of various teaching functions are met, and the practicability and data-processing efficiency of the terminal device are improved.
Step S204, after the target agent receives the target response result returned by the large model base, realizing the target application function according to the target response result.
In one possible embodiment, the target teaching scene is the student learning scene and the target application function is a post-class exercise function. Realizing the target application function according to the target response result after the target agent receives the target response result returned by the large model base comprises: determining the current grade of the user and the course progress of the target subject according to the user identity information; performing a content screening operation on the target response result according to the current grade and the course progress, to determine a reference screening result for each of a plurality of difficulty levels; outputting the reference screening results corresponding to the first level, the second level and the third level respectively according to a preset exercise strategy, so as to display the corresponding post-class exercise problems in sequence; detecting the user's answer input operations for the post-class exercise problems, to obtain the user's answer to each post-class exercise problem; determining the reference answer to each post-class exercise problem according to the target response result and the plurality of reference screening results; and determining an exercise correction result according to the user answers and the reference answers to the post-class exercise problems.
The reference screening result for each difficulty level is used for displaying the post-class exercise problems matched with that level, and the difficulty represented by each level differs from the others. The difficulty levels comprise a first level, a second level and a third level, with the difficulty of the corresponding post-class exercise problems decreasing from the first level through the second level to the third level. The exercise correction result is used for displaying the user's score for the current post-class exercise and the answer analysis content for the wrongly answered post-class exercise problems.
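The screening and correction steps above can be sketched as follows, assuming each problem in the target response result carries grade, chapter and difficulty-level fields (these field names, and the chapter-based notion of course progress, are illustrative assumptions).

```python
def screen_by_difficulty(problems, grade, course_progress):
    """Content screening: keep problems matching the student's current grade
    and already-covered chapters, then split them into the three reference
    screening results (level 1 = hardest, level 3 = easiest)."""
    eligible = [p for p in problems
                if p["grade"] == grade and p["chapter"] <= course_progress]
    return {lvl: [p for p in eligible if p["level"] == lvl] for lvl in (1, 2, 3)}

def correct_exercises(user_answers, reference_answers):
    """Exercise correction result: a percentage score plus the ids of the
    wrongly answered problems, for which answer analysis is then shown."""
    wrong = [pid for pid, ans in user_answers.items()
             if ans != reference_answers.get(pid)]
    score = round(100 * (len(user_answers) - len(wrong)) / len(user_answers))
    return {"score": score, "wrong_problems": wrong}
```

The preset exercise strategy would then iterate over levels 1 to 3 of the returned dictionary and display each group in turn.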
In this embodiment, the target response result may be understood as a comprehensive result generated by the large model base through a series of processing steps on the user request information; it is the key data basis for realizing the post-class exercise function. The large model base generates the target response result associated with the post-class exercise function through the cooperation of its modules. Specifically, the target response result covers the various information required by each subsequent exercise step: for example, exercise-problem resources of different difficulty levels, screened from the local knowledge base according to conditions such as the student's grade and course progress and analyzed and organized by the specialized large model, as well as problem solutions and knowledge-point prompts used for generating reference answers and answer analyses. It thus provides the basis for each step of the post-class exercise function: content screening is performed on it according to the student's grade and course progress to determine the reference screening results of different difficulty levels, from which the corresponding post-class exercise problems are output, and it likewise guides the determination of reference answers and the generation of the exercise correction result.
In this embodiment, after receiving the target response result returned by the large model base, the target agent determines the identity and scene of the user, that is, it accurately locates the student's current grade and the course progress of the target subject according to the user identity information. The target agent then performs the content screening operation on the target response result in combination with the determined grade and course progress, filtering out content matched to the different difficulty levels and generating the first-level, second-level and third-level reference screening results, corresponding to high-, medium- and low-difficulty post-class exercises respectively. Next, according to the preset exercise strategy, the exercise problems of the different levels are displayed to the student in sequence. Finally, the target agent carries out the answering process through interaction with the user; after the student has answered, the target agent takes the place of the teacher in correcting the post-class exercises and outputs the exercise correction result, which covers the score information and detailed answer analysis for the wrongly answered problems.
Further, compared with traditional education services, the target agent in this embodiment can provide learning resources and exercise content matched to each student's individual differences, so that every student can develop fully within their zone of proximal development, improving the pertinence and effectiveness of learning. By taking over the teacher's correction work, it automatically corrects homework and quickly produces scores and wrong-answer analyses: students can learn their study situation in time and adjust their learning strategies, while teachers can put their energy into teaching guidance, improving teaching efficiency. In addition, by setting exercise problems of different difficulty levels, students are guided to improve their knowledge and skills step by step, avoiding frustration from problems that are too hard or too easy, and improving the user experience.
It can be seen that, in this example, the terminal device, through the target agent, determines the grade and course progress from the user identity information, accurately screens the post-class exercise content, and generates exercise problems of multiple difficulties to satisfy different students' learning demands, improving the flexibility with which the agent realizes the target application function. Moreover, the target agent automatically completes the screening, correcting and analyzing, reducing the teacher's load and improving the intelligence level and teaching quality of the teaching system.
In one possible embodiment, the target teaching scene is the student learning scene and the target agent further comprises an emotion recognition module. After determining the exercise correction result according to the user answers and the reference answers to the post-class exercise problems, the method further comprises: collecting, through the emotion recognition module, audio and video information of the user during the answer input operations; determining a first emotion recognition result according to the intonation features and speech-speed features in the voice input information; determining a second emotion recognition result according to the facial expression feature vectors and limb-action time-sequence features in the image input information; fusing the first emotion recognition result and the second emotion recognition result according to a preset fusion algorithm to generate a current emotional state score; if the negative-dimension score corresponding to the negative dimension exceeds a first score threshold, invoking a preset incentive corpus in the local knowledge base to generate and display personalized encouragement information, and reducing the difficulty level of the post-class exercise problems to be displayed next; or, if the negative-dimension score exceeds a second score threshold for a preset number of consecutive early-warning times, invoking a third-party plug-in to obtain the device information of the guardian's terminal device and sending an early-warning report to the guardian's terminal device, wherein the second score threshold is greater than the first score threshold.
The audio and video information comprises the voice input information and the image input information. The current emotional state score is used for representing the degree of emotional tendency of the user in the current state along the positive, neutral and negative dimensions, and the early-warning report carries an emotion fluctuation curve and learning state information.
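The fusion step and the two-threshold policy described above can be sketched as follows. The patent only says "preset fusion algorithm", so the weighted average and the concrete threshold values here are assumptions for illustration.

```python
def fuse_emotions(speech_scores, visual_scores, speech_weight=0.5):
    """Fuse the first (speech) and second (visual) emotion recognition
    results into the current emotional state score over the positive /
    neutral / negative dimensions. A weighted average is assumed here."""
    w = speech_weight
    return {dim: w * speech_scores[dim] + (1 - w) * visual_scores[dim]
            for dim in ("positive", "neutral", "negative")}

class NegativeEmotionMonitor:
    """Two-threshold policy: encourage and lower difficulty past the first
    threshold; alert the guardian after `alarm_times` consecutive readings
    past the (higher) second threshold. Threshold values are illustrative."""
    def __init__(self, first_threshold=0.6, second_threshold=0.8, alarm_times=3):
        assert second_threshold > first_threshold
        self.first = first_threshold
        self.second = second_threshold
        self.alarm_times = alarm_times
        self.consecutive = 0   # consecutive readings above the second threshold

    def update(self, negative_score):
        if negative_score > self.second:
            self.consecutive += 1
            if self.consecutive >= self.alarm_times:
                return "send_guardian_report"
        else:
            self.consecutive = 0
        if negative_score > self.first:
            return "encourage_and_lower_difficulty"
        return "no_action"
```

Keeping the consecutive counter inside the monitor means a single brief spike in negative emotion only triggers encouragement, while a sustained run escalates to the guardian report.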
The target teaching scene applicable to this embodiment is the student learning scene, the target application function is the post-class exercise function, and the target agent has determined the exercise correction result according to the user answers and the reference answers to the post-class exercise problems. The target agent then performs emotion information acquisition, that is, it collects audio and video information during the student's answer input operations. From the voice input information it determines the first emotion recognition result, and from the facial expression feature vectors and limb-action time-sequence features extracted from the image input information it determines the second emotion recognition result. The two results are then fused according to the preset fusion algorithm to generate the current emotional state score, representing the student's emotional tendency along the positive, neutral and negative dimensions. Finally, the target agent analyzes the current emotional state score so as to safeguard the student's enthusiasm for learning. In this last step, if the target agent detects that the negative-dimension score exceeds the first score threshold (that is, the student is currently in a negative emotional state), the preset incentive corpus in the local knowledge base is invoked to generate and display personalized encouragement information to the student, and at the same time the difficulty level of the next displayed post-class exercise problems is reduced, thereby improving the student's learning enthusiasm and confidence in subsequent learning.
Furthermore, the terminal device also provides a guardian early-warning mechanism: if the negative-dimension score exceeds the second score threshold (which is greater than the first score threshold) for a preset number of consecutive early-warning times, the device information of the guardian's terminal device is obtained through a third-party plug-in. The student's emotion fluctuation curve during learning and the learning state information determined from the exercise correction result are then transmitted together to the guardian, so that the guardian can learn of the student's situation in time and intervene. Compared with traditional teaching, the data processing of the target agent in this embodiment not only evaluates the student's learning results but also adds real-time monitoring of the emotional state during learning, raising the attention paid to students' physical and mental health. By notifying the guardian promptly when the student's emotion is continuously negative and providing a detailed emotion fluctuation curve and learning state information, the guardian can conveniently understand the student's condition, cooperate with the school and adopt more effective educational measures, forming a joint home-school educational effort and improving the flexibility of the intelligent teaching system's data processing.
In one possible embodiment, intonation features and speech-speed features are extracted from the speech signal and a pre-trained speech emotion classification model is used to determine the first emotion recognition result; facial expression feature vectors and limb-action time-sequence features are extracted and a convolutional neural network together with a time-sequence classification model is used to determine the second emotion recognition result.
The first emotion recognition result is determined as follows. The terminal device first preprocesses the collected voice signal to remove interference such as background noise and improve signal quality. From the preprocessed voice signal, digital signal processing techniques extract intonation features such as pitch fluctuation and rising or falling tone patterns, while the speech-speed feature is determined by counting the number of syllables or words per unit time. The extracted intonation and speech-speed features are fed into a pre-trained speech emotion classification model, trained with a deep learning algorithm on a large amount of emotion-labeled speech data, which outputs the first emotion recognition result and thereby judges the student's emotional tendency in speech.
The second emotion recognition result is determined as follows. For the collected image information, computer vision techniques with a dedicated facial feature extraction algorithm extract the facial expression feature vectors, accurately capturing expression details such as raised eyebrows and downturned mouth corners. Meanwhile, motion capture techniques analyze the sequence and speed changes of limb actions to determine the limb-action time-sequence features. The facial expression feature vectors and limb-action time-sequence features are input into a model framework consisting of a convolutional neural network and a time-sequence classification model. The convolutional neural network is good at processing the spatial features of images and effectively extracts the key features of facial expressions, while the time-sequence classification model handles limb-action data with temporal characteristics and captures temporal dependencies. Working together, they output the second emotion recognition result, judging the student's emotional state from the visual perspective.
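The two simple speech features named above, speech speed as syllables per unit time and intonation as pitch variation, can be computed as below. These are deliberately minimal stand-ins for the richer contour statistics a production system would use; the zero-Hz convention for unvoiced frames is an assumption.

```python
def speech_rate(syllable_count, duration_seconds):
    """Speech-speed feature: syllables per second, as described above."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return syllable_count / duration_seconds

def pitch_variation(pitch_track_hz):
    """A simple intonation feature: the range of the fundamental frequency
    over the voiced frames of an utterance. Frames with 0 Hz are treated
    as unvoiced and skipped (an assumed convention)."""
    voiced = [f for f in pitch_track_hz if f > 0]
    return (max(voiced) - min(voiced)) if voiced else 0.0
```

Both scalars would then be concatenated with other features as the input vector of the pre-trained speech emotion classification model.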
It can be seen that, in this example, the emotion recognition module collects the student's audio and video information, analyzes multidimensional features to judge the emotional state, and dynamically adjusts the learning strategy according to the degree of negative emotion: when it is mild, the system encourages the student and reduces the difficulty; when it is severe and persistent, the system performs home-school linkage by obtaining the guardian's contact information and sending the guardian an early-warning report covering the emotional and learning state. This comprehensively optimizes the learning experience and improves the practicality of the agent in realizing the application function.
In one possible embodiment, the target teaching scene is the teacher teaching scene, the target application function is a classroom auxiliary function, the target agent further comprises an emotion recognition module, and the intelligent teaching system further comprises a classroom terminal in the classroom corresponding to the teacher teaching scene. Realizing the target application function according to the target response result after the target agent receives the target response result returned by the large model base comprises: determining, according to the target response result, the classroom recorded video collected in real time for the teacher teaching scene; performing a video analysis operation on the classroom recorded video to determine facial expression data for the students and limb-action data for the teacher; determining a class attention score according to the facial expression data, and determining a teacher teaching pressure value according to the limb-action data; and, if the class attention score stays below a third score threshold for a preset duration, invoking a third-party plug-in through the target agent to push an in-class test question or a group discussion task to the classroom terminal; or, if the teacher teaching pressure value indicates excessive pressure, invoking the local knowledge base to reduce the difficulty level of the teacher's current lesson plan and update the lesson plan designed for the current class accordingly.
The class attention score is used for representing the average attention concentration in the target class recorded by the classroom recorded video; the teacher teaching pressure value is used for representing the degree of tension of the user in the current teacher teaching scene; and the in-class test questions and group discussion tasks are associated with the target subject corresponding to the current teacher teaching scene.
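The scoring and decision logic above can be sketched as follows. The third score threshold, the pressure threshold and the sustained-window length are illustrative assumptions; the patent fixes the policy but not the numbers.

```python
def class_attention_score(per_student_attention):
    """Class attention score: the average attention concentration over the
    students detected in the classroom recorded video."""
    return sum(per_student_attention) / len(per_student_attention)

def classroom_assist_action(attention_window, pressure_value,
                            third_threshold=0.5, low_duration=3,
                            pressure_threshold=0.7):
    """Decision sketch: push an in-class quiz or group discussion task to
    the classroom terminal when attention stays below the third score
    threshold for a sustained window of readings, or lower the lesson-plan
    difficulty when the teacher's teaching pressure value is too high."""
    if (len(attention_window) >= low_duration and
            all(s < third_threshold for s in attention_window[-low_duration:])):
        return "push_quiz_or_group_discussion"
    if pressure_value > pressure_threshold:
        return "lower_lesson_plan_difficulty"
    return "no_action"
```

Checking the last `low_duration` readings, rather than a single frame, prevents the system from reacting to momentary dips in attention.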
The target response result is the content returned by the large model base after a series of data analysis and processing of the request information sent by the target agent. In this embodiment it is the key basis for implementing the classroom auxiliary function, and contains key information related to the teacher teaching scene, such as an identifier or data path pointing to the classroom recorded video collected in real time. With this information, the target agent can locate and obtain the classroom recorded video and then carry out the subsequent operations of video analysis, index calculation and targeted measures, thereby meeting the various requirements of the classroom auxiliary function.
For example, referring to fig. 5, fig. 5 is a schematic view of a teacher teaching scene according to an embodiment of the present application. As shown in fig. 5, cameras in the classroom where the teacher gives lessons capture the recorded video of each class in real time and upload it to the large model base in the server of the intelligent teaching system. After a user whose teaching identity is teacher invokes the teacher teaching agent and enables the classroom auxiliary function as the target application function, the teacher teaching agent retrieves the classroom recorded video from the large model base and performs the corresponding video analysis operations for the students and the teacher respectively. Specifically, the video analysis collects facial expressions from the students to obtain the facial expression data and collects limb actions from the teacher to obtain the limb-action data; the subsequent steps of this embodiment then follow, that is, the class attention score is determined from the facial expression data, the teacher teaching pressure value is determined from the limb-action data, and classroom assistance for the teacher's teaching is performed according to the assessment along these two dimensions.
The design principle of this embodiment is to quantify student attention (the evaluation yielding the class attention score) and teacher teaching pressure (the evaluation yielding the teacher teaching pressure value) by analyzing the facial expression and limb-action data in the classroom recorded video, and to make decisions based on these data, so that teaching adjustment is more scientific and targeted. Through the cooperative work of the target agent, the third-party plug-ins, the local knowledge base and other modules, different resources and functions are integrated to provide comprehensive support and optimization for a complex teaching scene: for different situations, such as poor student concentration or excessive teacher pressure, personalized solutions are provided, such as pushing suitable tasks or adjusting the difficulty of the teaching scheme, meeting the diversified demands of actual teaching. By focusing on the states of both teachers and students in the classroom, technical support is provided for teachers, helping them better cope with teaching challenges and improving the students' learning experience.
It can be seen that, in this example, by means of the emotion recognition module, the students' facial expressions and the teacher's limb-action data in the classroom recorded video are analyzed, the class attention score and the teacher teaching pressure value are accurately grasped, and the teaching strategy is dynamically optimized accordingly, attracting student attention, relieving teacher pressure, and improving classroom teaching quality.
Therefore, by providing different types of reference agents and a large model base in the server, corresponding intelligent assistance can be realized for the different links of the teaching flow, improving the intelligence and practicability of the teaching system. The terminal device, connected to the large model through the agents, improves the efficiency and flexibility of teachers' teaching and students' learning, as well as the flexibility with which the agents process the user's input data and the user's feedback experience.
The following are apparatus embodiments of the present application, which share the same concept as the method embodiments and are used for performing the methods described therein. For convenience of explanation, only the parts related to the embodiments of the present application are shown; for specific technical details not disclosed, please refer to the description of the method embodiments, which is not repeated here.
The multimodal-agent-based data processing apparatus provided by the embodiment of the application is applied to the terminal device 110 in the intelligent teaching system 100 shown in fig. 1, and the intelligent teaching system 100 further comprises the server 120. Specifically, the apparatus is used for performing the steps performed by the terminal device in the multimodal-agent-based data processing method, and may comprise modules corresponding to those steps.
The embodiment of the application may divide the functional modules of the apparatus according to the method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. The division of modules in the embodiment of the application is schematic and represents only a logical functional division; other division manners may be adopted in actual implementation.
FIG. 6 is a block diagram showing the functional units of a multimodal-agent-based data processing apparatus according to an embodiment of the present application, in which the functional modules are divided according to their corresponding functions. The multimodal-agent-based data processing device 60 is applied to the terminal device 110 in the intelligent teaching system 100, and the intelligent teaching system 100 further comprises the server 120. The device 60 comprises an information acquisition unit 601, an agent calling unit 602, a data interaction unit 603 and a function realization unit 604. The information acquisition unit 601 is used for acquiring the target teaching identity of the user and the target teaching scene where the user is located, the target teaching identity indicating whether the teaching identity corresponding to the user is a teacher identity or a student identity, and the target teaching scene being any one of a plurality of reference teaching scenes matched with the teacher identity or the student identity. The agent calling unit 602 is used for determining and calling the reference agent corresponding to the target teaching scene as the target agent, wherein each reference agent corresponds to one reference teaching scene, the reference agents are used for interacting with the user so as to determine the plurality of preset application functions required in the corresponding reference teaching scene, and the reference agents are in communication connection with the large model base preset in the server. The data interaction unit 603 is used for collecting the user request information input by the user through the target agent, and determining the target response result corresponding to the target application function based on the data interaction operation between the target agent and the large model base, the target application function being determined by the user request information and being any one of the plurality of preset application functions. The function realization unit 604 is used for realizing the target application function according to the target response result after the target agent receives the target response result returned by the large model base.
In one possible embodiment, the reference teaching scenes comprise a teacher teaching scene, a teacher lesson preparation scene and a student learning scene. In the aspect of acquiring the target teaching identity of the user and the target teaching scene in which the user is located, the information acquisition unit 601 is specifically configured to: upon detecting a run triggering instruction of the user for a target application program, run the target application program to display an initial login interface of the target application program, and acquire agent information stored in the target application program, wherein the agent information comprises interface information of a calling interface for invoking the reference agents, and the run triggering instruction is generated by a preset triggering event; respond to an information input operation of the user on the initial login interface to acquire identity verification information; if the identity verification information is verified, acquire user identity information corresponding to the user from a preset local database, and request and cache relevant data of each reference agent according to the interface information, wherein the local database is used for storing the user identity information associated with all teachers and students of the school where the user is located; determine the target teaching identity according to the user identity information corresponding to the user; when it is determined that the target teaching identity is the student identity, determine that the target teaching scene is the student learning scene; when it is determined that the target teaching identity is the teacher identity, retrieve the teacher course time of the user according to the user identity information and acquire the current system time; if the current system time falls within the teacher course time, determine that the target teaching scene is the teacher teaching scene; and if the current system time does not fall within the teacher course time, determine that the target teaching scene is the teacher lesson preparation scene.
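The scene-selection step above can be sketched as follows. This is a hedged illustration only: the identity labels, scene names, and course-period representation are assumptions for demonstration, not taken from the patent's actual implementation.

```python
from datetime import time

# Illustrative labels; the patent does not specify concrete identifiers.
TEACHER, STUDENT = "teacher", "student"

def determine_target_scene(identity, course_periods, now):
    """Map a verified teaching identity onto one of the three reference scenes."""
    if identity == STUDENT:
        return "student_learning"
    # Teacher identity: compare the current system time against class periods.
    for start, end in course_periods:
        if start <= now <= end:
            return "teacher_teaching"
    return "teacher_lesson_preparation"
```

For example, a teacher queried at 8:30 during an 8:00-8:45 period would be placed in the teacher teaching scene, and outside any period in the lesson preparation scene.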
In one possible embodiment, the triggering event comprises at least one of the following reference events: detecting a touch operation of the user on a selectable control of the target application program mapped in a display interface of the terminal device; or detecting that the currently collected user input voice contains a run voice instruction for the target application program; or detecting that target feature information of a currently collected user action gesture matches reference feature information of a preset opening gesture for the target application program, wherein the reference feature information comprises spatial feature information and temporal feature information of the preset opening gesture; or detecting that the terminal device is not running any other application program and that the current system time falls within a high-frequency running period of the target application program, wherein the high-frequency running period is a period in which the running frequency of the target application program, counted over a preset number of days, is greater than a preset frequency.
In one possible embodiment, in the aspects of collecting the user request information input by the user through the target agent and determining the target response result corresponding to the target application function based on the data interaction operation between the target agent and the large model base, the data interaction unit 603 is specifically configured to: collect, through the target agent, content data input by the user, and determine a target input format corresponding to the content data, wherein the target input format is any one of a plurality of reference input formats supported by the large model base; determine the user request information according to the content data and the target input format; send, through the target agent, a result request message carrying the user request information to the corresponding large model base, wherein the result request message is used for instructing the large model base to invoke built-in modules to perform a data analysis operation on the user request information so as to return the target response result and the target application function, the built-in modules comprise a dedicated large model, a local knowledge base, a third-party plug-in and an output control module, the local knowledge base is used for storing teaching materials corresponding to the various subjects of the school where the user is located, the dedicated large model is used for parsing the content data according to the target input format in combination with the local knowledge base, the third-party plug-in is used for invoking teaching tools associated with the plurality of preset application functions corresponding to the target teaching scene, and the output control module is used for integrating sub-response results output by the dedicated large model, the local knowledge base and the third-party plug-in into the target response result according to the target application function indicated by the user request information; and receive the target response result sent by the output control module.
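The construction of the result request message can be sketched as below. The field names ("agent_id", "content", "input_format") and the set of supported formats are illustrative assumptions, not the patent's actual wire format.

```python
import json

# Assumed reference input formats; the patent only says the large model base
# supports a plurality of them.
SUPPORTED_FORMATS = {"text", "voice", "image", "video"}

def build_result_request(agent_id, content, input_format):
    """Wrap user content and its detected input format into a request message."""
    if input_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported input format: {input_format}")
    user_request = {"content": content, "input_format": input_format}
    return json.dumps({"agent_id": agent_id, "user_request": user_request})
```

The target agent would send such a serialized message to the large model base, which dispatches it to the dedicated large model, local knowledge base, and third-party plug-ins.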
In one possible embodiment, the target teaching scene is the student learning scene, and the target application function is an after-class exercise function. After the target agent receives the target response result returned by the large model base, the function implementation unit 604 is specifically configured to: determine, according to the user identity information, the current grade of the user and the course progress corresponding to a target subject; perform a content screening operation on the target response result according to the current grade and the course progress to determine reference screening results of a plurality of difficulty levels, wherein the reference screening results are used for displaying after-class exercises matching the corresponding difficulty levels, the difficulty of the after-class exercises represented by each difficulty level differs from one another, the difficulty levels comprise a first level, a second level and a third level, and the difficulty of the after-class exercises represented by the difficulty levels decreases from the first level to the second level to the third level; output, according to a preset exercise strategy, the reference screening results corresponding to the first level, the second level and the third level respectively, so as to sequentially display the corresponding after-class exercises; detect answer input operations of the user for the after-class exercises to obtain the user answer for each after-class exercise; determine the reference answer of each after-class exercise according to the target response result and the plurality of reference screening results; and output an exercise marking result according to the user answers and the reference answers of the after-class exercises, wherein the exercise marking result is used for displaying the score information of the user's current after-class exercise and the answer analysis content of the after-class exercises answered incorrectly.
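A minimal sketch of the three-level screening and marking flow is given below. The level labels and the scoring scheme are illustrative assumptions; the patent only specifies three difficulty levels ordered from hard to easy and a comparison of user answers against reference answers.

```python
# Assumed level labels, ordered hard -> easy as in the description.
LEVELS = ("level_1_hard", "level_2_medium", "level_3_easy")

def screen_by_level(candidate_exercises):
    """Group candidate exercises from the target response result by difficulty."""
    buckets = {level: [] for level in LEVELS}
    for exercise in candidate_exercises:
        buckets[exercise["level"]].append(exercise)
    return buckets

def mark_exercises(user_answers, reference_answers):
    """Compare user answers with reference answers; report score and wrong items."""
    wrong = [qid for qid, answer in user_answers.items()
             if reference_answers.get(qid) != answer]
    return {"score": len(user_answers) - len(wrong),
            "total": len(user_answers),
            "wrong": wrong}
```

The wrong-item list would then drive the display of answer analysis content for the incorrectly answered exercises.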
In one possible embodiment, the target teaching scene is the student learning scene, and the target agent further comprises an emotion recognition module. After the exercise marking result is determined according to the user answers and the reference answers of the after-class exercises, the function implementation unit 604 is further specifically configured to: collect, through the emotion recognition module, audio and video information of the user during the answer input operation, wherein the audio and video information comprises voice input information and image input information; determine a first emotion recognition result according to intonation features and speech speed features in the voice input information, and determine a second emotion recognition result according to facial expression feature vectors and limb action time sequence features in the image input information; fuse the first emotion recognition result and the second emotion recognition result according to a preset fusion algorithm to generate a current emotion state score, wherein the current emotion state score is used for characterizing the degree of emotional tendency of the user in the positive, neutral and negative dimensions in the current state; if the negative dimension score corresponding to the negative dimension exceeds a first score threshold, invoke an incentive corpus preset in the local knowledge base to generate and display personalized encouragement information, and lower the difficulty level of the after-class exercises displayed next time; or, if the negative dimension score is detected to exceed a second score threshold for a preset number of consecutive warning times, retrieve, through the third-party plug-in, device information of a terminal device of the user's guardian, wherein the second score threshold is greater than the first score threshold; determine an emotion fluctuation curve according to the current emotion state scores corresponding to the warning times, and determine learning state information according to the exercise marking results corresponding to the warning times; and send an early warning report to the guardian's terminal device according to the device information, wherein the early warning report carries the emotion fluctuation curve and the learning state information.
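The fusion and escalation logic can be sketched as follows. The patent only states that a "preset fusion algorithm" combines the audio-based and image-based recognition results, so a simple weighted average over the three dimensions is assumed here, with illustrative threshold values and action names.

```python
DIMENSIONS = ("positive", "neutral", "negative")

def fuse_emotion(audio_result, image_result, audio_weight=0.4):
    """Weighted fusion of the first (audio) and second (image) results."""
    image_weight = 1.0 - audio_weight
    return {d: audio_weight * audio_result[d] + image_weight * image_result[d]
            for d in DIMENSIONS}

def choose_action(negative_score, consecutive_hits,
                  first_threshold=0.6, second_threshold=0.8, warning_count=3):
    """Pick a response from the fused negative-dimension score (thresholds assumed)."""
    if consecutive_hits >= warning_count and negative_score > second_threshold:
        # Sustained high negativity: escalate to the guardian's terminal.
        return "send_guardian_warning_report"
    if negative_score > first_threshold:
        # Mild negativity: encourage the student and ease the next exercises.
        return "encourage_and_lower_difficulty"
    return "no_action"
```

Note the second threshold is strictly greater than the first, matching the escalation order in the description.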
In one possible embodiment, the target teaching scene is the teacher teaching scene, the target application function is a classroom assistance function, the target agent further comprises an emotion recognition module, and the intelligent teaching system further comprises a classroom terminal in the classroom corresponding to the teacher teaching scene. After the target agent receives the target response result returned by the large model base, the function implementation unit 604 is specifically configured to: determine, according to the target response result, a classroom recording video collected in real time for the teacher teaching scene; perform a video analysis operation on the classroom recording video to determine facial expression data corresponding to the students and limb motion data corresponding to the teacher; determine a classroom attention score according to the facial expression data, and determine a teacher teaching pressure value according to the limb motion data, wherein the classroom attention score is used for characterizing the average attention concentration in the target classroom recorded by the classroom recording video, and the teacher teaching pressure value is used for characterizing the stress level of the user in the current teacher teaching scene; if the duration for which the classroom attention score is lower than a third score threshold exceeds a preset duration, invoke, through the target agent, the third-party plug-in to push to the classroom terminal an interactive quiz or a group discussion task associated with the current classroom subject; or, if the teacher teaching pressure value is greater than or equal to a fourth score threshold, update the classroom teaching plan designed in advance for the current teacher teaching scene, so that the teaching difficulty level of the updated classroom teaching plan is lower than that of the classroom teaching plan before updating.
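The classroom intervention decision above can be sketched as below. The threshold values, the representation of attention as one score per time slice, and the action names are assumptions for illustration, not taken from the patent.

```python
def classroom_action(attention_scores, pressure_value,
                     attention_threshold=60.0, min_low_samples=5,
                     pressure_threshold=80.0):
    """Choose an intervention from recent attention samples and teacher pressure."""
    low_run = 0
    for score in attention_scores:  # samples ordered oldest -> newest
        low_run = low_run + 1 if score < attention_threshold else 0
    if low_run >= min_low_samples:
        # Sustained low attention: push an interactive quiz or group
        # discussion task for the current subject to the classroom terminal.
        return "push_interactive_task"
    if pressure_value >= pressure_threshold:
        # High teaching pressure: switch to a lower-difficulty teaching plan.
        return "lower_teaching_plan_difficulty"
    return "continue"
```

Counting only the trailing run of low samples approximates the "duration longer than a preset duration" condition in the description.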
In the case of using integrated units, as shown in FIG. 7, FIG. 7 is a block diagram illustrating functional units of another multi-modal agent-based data processing apparatus according to an embodiment of the present application. In FIG. 7, the multi-modal agent-based data processing apparatus 60 comprises a processing module 720 and a communication module 710. The processing module 720 is configured to control and manage the actions of the multi-modal agent-based data processing apparatus 60, for example, to perform the steps of the information acquisition unit 601, the agent invocation unit 602, the data interaction unit 603 and the function implementation unit 604, and/or other processes of the techniques described herein. The communication module 710 is configured to support interaction between the multi-modal agent-based data processing apparatus and other devices. As shown in FIG. 7, the multi-modal agent-based data processing apparatus may further comprise a storage module 730, and the storage module 730 stores program codes and data of the multi-modal agent-based data processing apparatus.
The processing module 720 may be a processor or a controller, such as a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, which may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor may also be a combination that performs computing functions, for example, a combination comprising one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 710 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 730 may be a memory.
For all relevant contents of each scenario involved in the above method embodiments, reference may be made to the functional description of the corresponding functional module, and details are not repeated herein. The multi-modal agent-based data processing apparatus 60 may perform the multi-modal agent-based data processing method shown in FIG. 2.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired or wireless means from one website site, computer, server, or data center. Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc. that contain one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
FIG. 8 is a block diagram of a terminal device according to an embodiment of the present application. As shown in FIG. 8, the terminal device 110 may include one or more processors 810 and a memory 820 coupled to the processors 810, wherein the memory 820 may store one or more computer programs 821 which, when executed by the one or more processors 810, implement the methods described in the embodiments above. The terminal device here is the terminal device 110 in the above embodiments.
The processor 810 may include one or more processing cores. The processor 810 connects various parts of the terminal device 110 using various interfaces and lines, and performs various functions of the terminal device 110 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 820 and invoking the data stored in the memory 820. Optionally, the processor 810 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor 810 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing display content; and the modem is used for handling wireless communication. It can be understood that the modem may not be integrated into the processor 810 and may instead be implemented by a separate communication chip.
The memory 820 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). The memory 820 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 820 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The data storage area may also store data created by the terminal device 110 during use, etc.
It will be appreciated that terminal device 110 may include more or fewer structural elements than those shown in the block diagrams described above and is not limiting herein.
The present application also provides a computer storage medium having stored thereon a computer program/instruction which, when executed by a processor, performs part or all of the steps of any of the methods described in the method embodiments above.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus and system may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the partitioning of elements is merely a logical functional partitioning, and there may be additional ways in which it may be actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may be physically included separately, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform part of the steps of the methods according to the embodiments of the present invention. The storage medium includes a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a volatile memory or a non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (Programmable ROM, PROM), an erasable programmable ROM (Erasable PROM, EPROM), an electrically erasable programmable ROM (Electrically EPROM, EEPROM) or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM) and direct rambus random access memory (Direct Rambus RAM, DR RAM), among other media capable of storing program code.
Although the present invention is disclosed above, the present invention is not limited thereto. Variations and modifications, including combinations of the different functions and implementation steps, as well as embodiments of the software and hardware, may be readily apparent to those skilled in the art without departing from the spirit and scope of the invention.

Claims (10)

1. A data processing method based on a multi-modal agent, characterized in that the method is applied to a terminal device in an intelligent teaching system, and the intelligent teaching system further comprises a server; the method comprises:
acquiring a target teaching identity of a user and a target teaching scene in which the user is located, wherein the target teaching identity is used for indicating that the teaching identity corresponding to the user is a teacher identity or a student identity, and the target teaching scene is any one of a plurality of reference teaching scenes matching the teacher identity or the student identity;
determining and invoking a reference agent corresponding to the target teaching scene as a target agent, wherein each reference agent corresponds to one reference teaching scene, the reference agent is used for interacting with the user so as to determine a plurality of preset application functions required in the corresponding reference teaching scene, and the reference agent is communicatively connected with a large model base preset in the server;
collecting, through the target agent, user request information input by the user, and determining, based on a data interaction operation between the target agent and the large model base, a target response result corresponding to a target application function, wherein the target application function is determined by the user request information, and the target application function is any one of the plurality of preset application functions;
after the target agent receives the target response result returned by the large model base, implementing the target application function according to the target response result.
2. The method according to claim 1, characterized in that the reference teaching scenes comprise a teacher teaching scene, a teacher lesson preparation scene and a student learning scene; the acquiring of the target teaching identity of the user and the target teaching scene in which the user is located comprises:
upon detecting a run triggering instruction of the user for a target application program, running the target application program to display an initial login interface of the target application program, and acquiring agent information stored in the target application program, wherein the agent information comprises interface information of a calling interface for invoking the reference agents, and the run triggering instruction is generated by a preset triggering event;
responding to an information input operation of the user on the initial login interface to acquire identity verification information;
if the identity verification information is verified, acquiring user identity information corresponding to the user from a preset local database, and requesting and caching relevant data of each reference agent according to the interface information, wherein the local database is used for storing the user identity information associated with all teachers and students of the school where the user is located;
determining the target teaching identity according to the user identity information corresponding to the user; and,
when it is determined that the target teaching identity is the student identity, determining that the target teaching scene is the student learning scene;
when it is determined that the target teaching identity is the teacher identity, retrieving the teacher course time of the user according to the user identity information, and acquiring the current system time;
if the current system time falls within the teacher course time, determining that the target teaching scene is the teacher teaching scene;
if the current system time does not fall within the teacher course time, determining that the target teaching scene is the teacher lesson preparation scene.
3. The method according to claim 2, characterized in that the triggering event comprises at least one of the following reference events:
detecting a touch operation of the user on a selectable control of the target application program mapped in a display interface of the terminal device; or,
detecting that the currently collected user input voice contains a run voice instruction for the target application program; or,
detecting that target feature information of a currently collected user action gesture matches reference feature information of a preset opening gesture for the target application program, wherein the reference feature information comprises spatial feature information and temporal feature information of the preset opening gesture; or,
detecting that the terminal device is not running any other application program and that the current system time falls within a high-frequency running period of the target application program, wherein the high-frequency running period is a period in which the running frequency of the target application program, counted over a preset number of days, is greater than a preset frequency.
4. The method according to claim 3, characterized in that the collecting, through the target agent, of the user request information input by the user, and the determining, based on the data interaction operation between the target agent and the large model base, of the target response result corresponding to the target application function comprise:
collecting, through the target agent, content data input by the user, and determining a target input format corresponding to the content data, wherein the target input format is any one of a plurality of reference input formats supported by the large model base;
determining the user request information according to the content data and the target input format;
sending, through the target agent, a result request message carrying the user request information to the corresponding large model base, wherein the result request message is used for instructing the large model base to invoke built-in modules to perform a data analysis operation on the user request information so as to return the target response result and the target application function, the built-in modules comprise a dedicated large model, a local knowledge base, a third-party plug-in and an output control module, the local knowledge base is used for storing teaching materials corresponding to the various subjects of the school where the user is located, the dedicated large model is used for parsing the content data according to the target input format in combination with the local knowledge base, the third-party plug-in is used for invoking teaching tools associated with the plurality of preset application functions corresponding to the target teaching scene, and the output control module is used for integrating sub-response results output by the dedicated large model, the local knowledge base and the third-party plug-in into the target response result according to the target application function indicated by the user request information;
receiving the target response result sent by the output control module.
5. The method according to claim 4, characterized in that the target teaching scene is the student learning scene, and the target application function is an after-class exercise function; the implementing, after the target agent receives the target response result returned by the large model base, of the target application function according to the target response result comprises:
determining, according to the user identity information, the current grade of the user and the course progress corresponding to a target subject;
performing a content screening operation on the target response result according to the current grade and the course progress to determine reference screening results of a plurality of difficulty levels, wherein the reference screening results are used for displaying after-class exercises matching the corresponding difficulty levels, the difficulty of the after-class exercises represented by each difficulty level differs from one another, the difficulty levels comprise a first level, a second level and a third level, and the difficulty of the after-class exercises represented by the difficulty levels decreases from the first level to the second level to the third level;
outputting, according to a preset exercise strategy, the reference screening results corresponding to the first level, the second level and the third level respectively, so as to sequentially display the corresponding after-class exercises; and,
detecting answer input operations of the user for the after-class exercises to obtain the user answer of the user for each after-class exercise;
determining the reference answer of each after-class exercise according to the target response result and the plurality of reference screening results;
outputting an exercise marking result according to the user answers of the after-class exercises and the reference answers of the after-class exercises, wherein the exercise marking result is used for displaying the score information of the user's current after-class exercise and the answer analysis content of the after-class exercises answered incorrectly.
6. The method according to claim 5, characterized in that the target teaching scene is the student learning scene, and the target agent further comprises an emotion recognition module; after the exercise marking result is determined according to the user answers of the after-class exercises and the reference answers of the after-class exercises, the method further comprises:
collecting, through the emotion recognition module, audio and video information of the user during the answer input operation, wherein the audio and video information comprises voice input information and image input information;
determining a first emotion recognition result according to intonation features and speech speed features in the voice input information, and determining a second emotion recognition result according to facial expression feature vectors and limb action time sequence features in the image input information;
fusing the first emotion recognition result and the second emotion recognition result according to a preset fusion algorithm to generate a current emotion state score, wherein the current emotion state score is used for characterizing the degree of emotional tendency of the user in the positive, neutral and negative dimensions in the current state; and,
if the negative dimension score corresponding to the negative dimension exceeds a first score threshold, invoking an incentive corpus preset in the local knowledge base to generate and display personalized encouragement information, and lowering the difficulty level of the after-class exercises displayed next time; or,
if the negative dimension score is detected to exceed a second score threshold for a preset number of consecutive warning times, retrieving, through the third-party plug-in, device information of a terminal device of the user's guardian, wherein the second score threshold is greater than the first score threshold;
determining an emotion fluctuation curve according to the current emotion state scores corresponding to the warning times, and determining learning state information according to the exercise marking results corresponding to the warning times;
sending an early warning report to the guardian's terminal device according to the device information, wherein the early warning report carries the emotion fluctuation curve and the learning state information.
7. The method according to claim 4, characterized in that the target teaching scene is the teacher teaching scene, the target application function is a classroom assistance function, the target agent further comprises an emotion recognition module, and the intelligent teaching system further comprises a classroom terminal in the classroom corresponding to the teacher teaching scene; the implementing, after the target agent receives the target response result returned by the large model base, of the target application function according to the target response result comprises:
The method according to claim 4 is characterized in that the target teaching scene is the teacher teaching scene, the target application function is a classroom auxiliary function, the target intelligent agent further includes an emotion recognition module, and the intelligent teaching system further includes a classroom terminal in the classroom corresponding to the teacher teaching scene; after the target intelligent agent receives the target response result returned by the large model base, the target application function is realized according to the target response result, comprising:根据所述目标响应结果,确定针对所述教师授课场景实时采集的课堂录制视频;According to the target response result, determine the classroom recording video collected in real time for the teacher's teaching scene;根据所述课堂录制视频执行视频分析操作,以确定学生对应的面部表情数据和教师对应的肢体动作数据;Performing a video analysis operation based on the classroom recording video to determine the facial expression data corresponding to the student and the body movement data corresponding to the teacher;根据所述面部表情数据确定班级注意力评分,以及根据所述肢体动作数据确定教师授课压力值,所述班级注意力评分用于表征所述课堂录制视频所记录的目标班级中的注意力集中度的平均数值,所述教师授课压力值用于表征所述用户在当前次的教师授课场景中的紧张程度;以及,Determine a class attention score according to the facial expression data, and determine a teacher teaching stress value according to the body movement data, wherein the class attention score is used to represent an average value of the concentration of the target class recorded by the classroom recording video, and the teacher teaching stress value is used to represent the tension of the user in the current teacher teaching scene; and若所述班级注意力评分低于第三分数阈值的持续时长大于预设时长,则通过所述目标智能体调用所述第三方插件以向所述教室终端推送随堂测试题目或者分组讨论任务,所述随堂测试题目和所述分组讨论任务与当前次的所述教师授课场景对应的所述目标学科相关联;或者,If the duration of the class attention score being lower than the third score threshold is longer than a preset duration, the target agent calls the third-party plug-in to push in-class test questions or group discussion tasks to the classroom terminal, and the in-class test questions and the group 
discussion tasks are associated with the target subject corresponding to the current teacher teaching scene; or若所述教师授课压力值大于或者等于第四分数阈值,则通过所述目标智能体调用所述本地知识库,以匹配所述目标学科的教学设计案例,并获取所述用户预先上传的当前次的所述教师授课场景的课堂教案;If the teacher's teaching pressure value is greater than or equal to a fourth score threshold, the local knowledge base is called through the target agent to match the teaching design case of the target subject, and the classroom lesson plan of the current teacher's teaching scenario uploaded in advance by the user is obtained;根据所述教学设计案例,对当前次的所述课堂教案进行更新,以使更新后的所述课堂教案的教学难度等级低于更新前的所述课堂教案的教学难度等级。According to the teaching design case, the current classroom lesson plan is updated so that the teaching difficulty level of the updated classroom lesson plan is lower than the teaching difficulty level of the classroom lesson plan before the update.8.一种基于多模态智能体的数据处理装置,其特征在于,应用于智能教学系统中的终端设备,所述智能教学系统中还包括服务器;所述装置包括:8. A data processing device based on a multimodal agent, characterized in that it is applied to a terminal device in an intelligent teaching system, wherein the intelligent teaching system also includes a server; the device comprises:信息获取单元,用于获取用户的目标教学身份和所述用户所处的目标教学场景,所述目标教学身份用于指示所述用户对应的教学身份为教师身份或者学生身份,所述目标教学场景为与所述教师身份或者所述学生身份匹配的多个参考教学场景中的任意一个;An information acquisition unit, used for acquiring a target teaching identity of a user and a target teaching scenario of the user, wherein the target teaching identity is used to indicate that the teaching identity corresponding to the user is a teacher identity or a student identity, and the target teaching scenario is any one of a plurality of reference teaching scenarios matching the teacher identity or the student identity;智能体调用单元,用于确定并调用与所述目标教学场景对应的参考智能体为目标智能体,每一所述参考智能体对应一个所述参考教学场景,所述参考智能体用于实现与所述用户交互,以确定处于对应的所述参考教学场景中所需使用的多个预设应用功能,所述参考智能体与所述服务器中预先设置的大模型底座通信连接;An agent calling unit, used to determine and call a reference agent corresponding to the target teaching scene as a target agent, each of the reference agents 
corresponds to one reference teaching scene, the reference agent is used to interact with the user to determine a plurality of preset application functions required to be used in the corresponding reference teaching scene, and the reference agent is communicatively connected with a large model base pre-set in the server;数据交互单元,用于通过所述目标智能体采集所述用户输入的用户请求信息,以及基于所述目标智能体与所述大模型底座之间的数据交互操作,以确定目标应用功能对应的目标响应结果,所述目标应用功能由所述用户请求信息确定,所述目标应用功能为所述多个预设应用功能中的任意一个;a data interaction unit, configured to collect user request information input by the user through the target agent, and determine a target response result corresponding to a target application function based on a data interaction operation between the target agent and the large model base, wherein the target application function is determined by the user request information and is any one of the plurality of preset application functions;功能实现单元,用于待所述目标智能体接收到所述大模型底座返回的所述目标响应结果后,根据所述目标响应结果实现所述目标应用功能。The function realization unit is used to realize the target application function according to the target response result after the target agent receives the target response result returned by the large model base.9.一种终端设备,其特征在于,包括处理器、存储器、通信接口,以及一个或多个程序,所述一个或多个程序被存储在所述存储器中,并且被配置由所述处理器执行,所述程序包括用于执行如权利要求1-7中任一项所述的方法中的步骤的指令。9. A terminal device, characterized in that it comprises a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the processor, and the program includes instructions for executing the steps in the method as described in any one of claims 1-7.10.一种计算机可读存储介质,其特征在于,存储用于电子数据交换的计算机程序,其中,所述计算机程序使得计算机执行如权利要求1-7中任一项所述的方法。10. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program enables a computer to execute the method according to any one of claims 1 to 7.
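The three-level screening and marking flow of claim 5 can be sketched as follows. This is an illustrative sketch only: the `Exercise` fields, the 1-3 difficulty encoding (1 = hardest, mirroring the first level), and the percentage scoring are assumptions, not details specified by the patent.

```python
from dataclasses import dataclass

@dataclass
class Exercise:
    question: str
    answer: str       # reference answer from the target response result
    difficulty: int   # assumed encoding: 1 (first/hardest) .. 3 (third/easiest)

def screen_by_level(exercises: list) -> dict:
    """Split the target response result into first/second/third level
    reference screening results, keyed hardest-to-easiest."""
    levels = {1: [], 2: [], 3: []}
    for ex in exercises:
        levels[ex.difficulty].append(ex)
    return levels

def mark(exercises: list, user_answers: dict) -> dict:
    """Compare user answers against reference answers and build a simple
    marking result: an overall score plus the questions answered wrong."""
    correct = [ex for ex in exercises if user_answers.get(ex.question) == ex.answer]
    wrong = [ex for ex in exercises if ex not in correct]
    return {
        "score": round(100 * len(correct) / max(len(exercises), 1)),
        "wrong": [ex.question for ex in wrong],  # candidates for answer analysis
    }
```

A "preset practice strategy" would then iterate `screen_by_level(...)[1]`, `[2]`, `[3]` in order before calling `mark` on the attempted set.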
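The score-fusion and two-branch reaction logic of claim 6 can be sketched as below. The linear weighting, the dimension labels, and every threshold value are illustrative assumptions; the patent only requires some preset fusion algorithm and two ordered thresholds.

```python
def fuse_emotion_scores(audio_scores: dict, video_scores: dict,
                        audio_weight: float = 0.4) -> dict:
    """Fuse the speech-based (first) and image-based (second) recognition
    results into one positive/neutral/negative emotional-state score."""
    video_weight = 1.0 - audio_weight
    return {dim: audio_weight * audio_scores[dim] + video_weight * video_scores[dim]
            for dim in ("positive", "neutral", "negative")}

def react_to_negative(fused: dict, history: list,
                      first_threshold: float = 0.6,
                      second_threshold: float = 0.8,
                      warning_count: int = 3) -> str:
    """Map the fused score onto the two branches of claim 6: encourage and
    lower difficulty past the first threshold, alert the guardian after
    `warning_count` consecutive scores past the (higher) second threshold."""
    history.append(fused["negative"])
    if len(history) >= warning_count and all(
            s > second_threshold for s in history[-warning_count:]):
        return "alert_guardian"  # send report with emotion curve + learning state
    if fused["negative"] > first_threshold:
        return "encourage_and_lower_difficulty"
    return "no_action"
```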
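The classroom-attention branch of claim 7 (push a quiz or group task once the class attention score stays below a threshold for longer than a preset duration) reduces to a duration-gated threshold check. The threshold of 60 and the 120-second window below are assumed values for illustration.

```python
def monitor_attention(samples, threshold=60.0, min_low_duration=120.0):
    """samples: chronological list of (timestamp_seconds, class_attention_score).
    Returns True when the score has been below `threshold` continuously for
    more than `min_low_duration`, i.e. when a push should be triggered."""
    low_since = None  # timestamp at which the current low streak began
    for t, score in samples:
        if score < threshold:
            if low_since is None:
                low_since = t
            if t - low_since > min_low_duration:
                return True
        else:
            low_since = None  # attention recovered; reset the streak
    return False
```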
CN202510206205.3A  2025-02-25  2025-02-25  Data processing method and device based on multimodal intelligent agent  Pending  CN119690250A (en)

Priority Applications (1)

Application Number  Priority Date  Filing Date  Title
CN202510206205.3A  2025-02-25  2025-02-25  Data processing method and device based on multimodal intelligent agent

Publications (1)

Publication Number  Publication Date
CN119690250A (en)  2025-03-25

Family

ID=95043064

Family Applications (1)

Application Number  Status  Publication  Priority Date  Filing Date  Title
CN202510206205.3A  Pending  CN119690250A (en)  2025-02-25  2025-02-25  Data processing method and device based on multimodal intelligent agent

Country Status (1)

Country  Link
CN (1)  CN119690250A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number  Priority date  Publication date  Assignee  Title
CN120218755A (en) *  2025-05-28  2025-06-27  北京翌特视讯科技有限公司  AI-driven classroom real-time interaction and teaching quality intelligent optimization system and method
CN120299318A (en) *  2025-06-11  2025-07-11  中国人民解放军中部战区总医院  A scenario-based multi-role casualty treatment virtual training system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number  Priority date  Publication date  Assignee  Title
CN108805009A (en) *  2018-04-20  2018-11-13  华中师范大学  Classroom learning state monitoring method and system based on multimodal information fusion
CN108877336A (en) *  2018-03-26  2018-11-23  深圳市波心幻海科技有限公司  Teaching method, cloud service platform and tutoring system based on augmented reality
CN113095969A (en) *  2021-03-11  2021-07-09  华中师范大学  Immersive flipped-classroom teaching system based on multiple virtualization entities and working method thereof
CN119418725A (en) *  2024-11-07  2025-02-11  华中师范大学  A multimodal classroom emotion recognition method and system based on modality adaptive learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Tao et al.: "Research Trends of Learner Modeling in the Field of Intelligent Education", 远程教育杂志 (Journal of Distance Education), no. 01, 20 January 2020 *

Similar Documents

Publication  Publication Date  Title
CN110991381B (en) A real-time classroom student status analysis and instruction reminder system and method based on behavior and voice intelligent recognition
CN107030691B (en) Data processing method and device for nursing robot
CN111833861B (en) Event evaluation report generation based on artificial intelligence
CN119690250A (en) Data processing method and device based on multimodal intelligent agent
CN111027486A (en)Auxiliary analysis and evaluation system and method for big data of teaching effect of primary and secondary school classroom
CN108563780A (en) Course content recommendation method and apparatus
CN110009537B (en)Information processing method, device, equipment and storage medium
US20220150287A1 (en)System and method for an interactive digitally rendered avatar of a subject person
CN117788239A (en)Multi-mode feedback method, device, equipment and storage medium for talent training
CN113377200B (en)Interactive training method and device based on VR technology and storage medium
CN117541445B (en) A virtual environment interactive eloquence training method, system, device and medium
CN117541444B (en) An interactive virtual reality eloquence expression training method, device, equipment and medium
CN111209817A (en)Assessment method, device and equipment based on artificial intelligence and readable storage medium
CN115131867A (en)Student learning efficiency detection method, system, device and medium
KR20090094576A (en) An apparatus and method for evaluating spoken ability by speech recognition through computer-led interaction
CN118173119A (en) A method for training eloquence based on dynamic adjustment mechanism
CN110767005A (en)Data processing method and system based on intelligent equipment special for children
CN116825288A (en)Autism rehabilitation course recording method and device, electronic equipment and storage medium
CN118941165A (en) A smart education classroom evaluation platform, electronic device and readable storage medium
US20240202634A1 (en)Dialogue training device, dialogue training system, dialogue training method, and computer-readable medium
CN118864189A (en) Data processing method and related device based on classroom video
CN118349641A (en)Investigation inquiry scene simulation exercise AI intelligent system
KR102644989B1 (en)Method for providing psychological counseling service using voice data of the deceased based on artificial intelligence algorithm
CN116416839A (en)Training auxiliary teaching method based on Internet of things training system
KR20230124237A (en)Method for progressing non face to face exercise lecture using speech recogntion

Legal Events

Date  Code  Title  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
