CN120029517B - Intelligent agent service system based on a domestic operating system - Google Patents

Intelligent agent service system based on a domestic operating system

Info

Publication number
CN120029517B
CN120029517B (application CN202510503462.3A)
Authority
CN
China
Prior art keywords
behavior
user
reinforcement learning
decision
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510503462.3A
Other languages
Chinese (zh)
Other versions
CN120029517A (en)
Inventor
李照川
林一伟
王冠军
张尧臣
林杰
王金超
张庆鑫
王珂琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Technology Co Ltd
Priority to CN202510503462.3A
Publication of CN120029517A
Application granted
Publication of CN120029517B
Status: Active
Anticipated expiration

Abstract

(Translated from Chinese)


The present invention discloses an intelligent agent service system based on a domestic operating system, belonging to the technical field of the fusion of operating systems and artificial intelligence. The technical problems to be solved are the poor behavior-perception sensitivity, delayed response to preference changes, and low decision interpretability of conventional agent systems in the domestic ecosystem. The technical solution adopted is a system that fuses operation behavior perception with reinforcement learning, forming a complete perception-modeling-optimization-interpretation technical closed loop through an operation perception engine, a human-feedback reinforcement learning center, an MCP protocol adapter, and a thinking chain visualization designer, thereby realizing the agent's paradigm transition from passive execution to active collaboration. The operation perception engine captures and parses user behavior data in real time to obtain operation sequence data; the operation sequence data are input as features to the reinforcement learning center, which outputs operation instructions.

Description

Intelligent agent service system based on a domestic operating system
Technical Field
The invention relates to the technical field of the fusion of operating systems and artificial intelligence, and in particular to an intelligent agent service system based on a domestic operating system.
Background
As intelligent agent technology penetrates the desktop at an accelerating pace, the shortcomings of existing systems in domestic adaptation, security, reliability, and ecosystem coordination have been exposed, severely constraining the autonomy of China's information technology. These shortcomings manifest as three core bottlenecks:
1. Protocol ecosystem fragmentation: current agent tools commonly adopt proprietary communication protocols (such as MCP's JSON-RPC and ANP's semantic-web model), which creates data islands and greatly increases development cost. Taking enterprise ERP system integration as an example, the traditional approach must develop a separate interface for each tool, stretching the integration cycle 3 to 8 times and raising maintenance cost more than fourfold. The fragmentation is embodied not only in the technical architecture but also at the security-mechanism level: MCP's ACL whitelists and ANP's zero-knowledge proofs are difficult to coordinate, easily creating privacy-leakage risks.
2. Shallow behavior perception: traditional systems capture only surface events such as clicks and scrolls, with insufficient accuracy in understanding GUI operation semantics. For example, in WPS office scenarios, the collaborative-editing requirement behind a user's frequent switching of "revision mode" is often ignored, and association analysis of multi-window overlay operations (e.g., Excel linked to PPT) is weak.
3. Lagging response to preference changes: the cycle of conventional reinforcement learning from human feedback (RLHF) is too long. Traditional thinking chains rely on manually annotated data, RLHF training usually exceeds 72 hours, and changes in user preference cannot be responded to in real time. When a user adjusts a document-format standard, the static model requires full-parameter retraining, easily causing catastrophic forgetting.
Therefore, how to overcome the poor behavior-perception sensitivity, delayed response to preference changes, and low decision interpretability of conventional agent systems in the domestic ecosystem is a technical problem to be solved urgently.
Disclosure of Invention
The technical task of the invention is to provide an intelligent agent service system based on a domestic operating system, so as to solve the problems of poor behavior-perception sensitivity, delayed response to preference changes, and low decision interpretability of conventional agent systems in the domestic ecosystem.
The technical task of the invention is realized in the following way: an intelligent agent service system based on a domestic operating system, which fuses operation behavior perception with reinforcement learning and forms a complete perception-modeling-optimization-interpretation technical closed loop through an operation perception engine, a human-feedback reinforcement learning center, an MCP protocol adapter, and a thinking chain visualization designer, realizing the agent's paradigm transition from passive execution to active collaboration;
The operation perception engine captures and parses user behavior data in real time to obtain operation sequence data, which are input as features to the reinforcement learning center; the reinforcement learning center outputs operation instructions. The MCP protocol adapter passes the operation instructions recommended by the reinforcement learning center through a standardized interface to interact with different data sources and services, obtains the results corresponding to the operation instructions, and transmits feedback from external systems (such as user ratings and eye-tracking data) back to the human-feedback reinforcement learning center, continuously optimizing its policy network. The thinking chain visualization designer converts the center's complex decision processes and data relationships into intuitive visual views, helping the user understand the agent's decision logic and behavior patterns, improving the system's interpretability and user trust, and guiding the adjustment and optimization of the human-feedback reinforcement learning center to form a closed-loop optimization process. A minimal orchestration sketch of this loop follows.
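For orientation, the following Python sketch traces the perception-modeling-optimization-interpretation loop just described. All class and method names here are hypothetical illustrations of the data flow, not interfaces defined by the invention.

```python
# Hypothetical orchestration of the four components; names are illustrative.
class AgentServiceSystem:
    def __init__(self, perception, rl_hub, mcp_adapter, cot_designer):
        self.perception = perception   # operation perception engine
        self.rl_hub = rl_hub           # human-feedback RL center
        self.mcp = mcp_adapter         # MCP protocol adapter
        self.cot = cot_designer        # thinking chain visualization designer

    def step(self):
        # 1. Perceive: capture and parse user behavior into operation sequence data.
        features = self.perception.capture_operation_sequence()
        # 2. Decide: the RL center maps behavior features to an operation instruction.
        instruction = self.rl_hub.recommend(features)
        # 3. Execute: the MCP adapter routes the instruction to external services
        #    and collects external feedback (ratings, eye-tracking data).
        result, external_feedback = self.mcp.execute(instruction)
        # 4. Optimize: external feedback updates the policy network.
        self.rl_hub.update_policy(external_feedback)
        # 5. Explain: render the decision chain for the user and feed the
        #    insights back into policy adjustment, closing the loop.
        self.cot.render_decision_chain(self.rl_hub.last_decision_trace())
        return result
```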
Preferably, the operation perception engine includes:
An operation sequence data acquisition module, used to capture user operation behavior in real time based on the kernel-level hook mechanism of the domestic operating system to form operation sequence data, where user operation behavior includes GUI operation events and file-system access traces;
A user multidimensional profile construction module, used to construct a multidimensional user behavior profile through event-sourcing technology, realizing contextual analysis of operation semantics.
Preferably, the operation perception engine also provides the following functions:
① Multi-device synchronous perception: unified capture of mobile, desktop, and cloud operation behavior;
② Abnormal behavior detection: real-time alerts for atypical operation patterns;
③ Encrypted storage and transmission of behavior data, ensuring data security;
④ Plug-in extension, allowing third-party developers to register custom behavior-perception rules.
Preferably, the reinforcement learning center includes:
A model training module, used to train the behavior-file multimodal joint probability model, where the behavior-file multimodal joint probability model combines behavior data with file data (such as text and images) and models their multimodal association within a probabilistic framework;
An optimization engine building module, used to build a dual-channel feedback-driven policy optimization engine that optimizes the behavior-file multimodal joint probability model;
A privacy protection and security module, used to realize privacy protection within the reinforcement learning center via privacy-protection and security mechanisms.
More preferably, the model training module works as follows:
(1) For each operation behavior, computing feature values along three dimensions: operation frequency, duration, and path complexity. Operation frequency counts how many times the user executes each operation behavior within a given period, reflecting how often different operation behaviors are used; duration records the time the user spends on each operation behavior, reflecting the user's attention to and investment in different operations; path complexity analyzes how complex the user's path is when executing operations (for example, the directory depth and number of jumps of accessed files), measuring the complexity of the user's operation path;
(2) Treating the user's different operation behaviors as vocabulary and the user's operation sequences as documents, and quantifying the user's preference weight W for different types of operation behaviors using the TF-IDF algorithm, with the following formula:
$$W_{t,i} = \frac{n_{t,i}}{\sum_{k} n_{k,i}} \times \log\frac{N}{\sum_{j=1}^{N} \mathbb{1}(t \in D_j)}$$

where $n_{t,i}$ denotes the number of occurrences of vocabulary item $t$ in document $D_i$; $\sum_{k} n_{k,i}$ denotes the total number of vocabulary items in document $D_i$; $N$ denotes the total number of documents; and $\mathbb{1}(t \in D_j)$ indicates whether document $D_j$ contains $t$ (1 if it does, 0 otherwise);
TF-IDF is a statistical method for evaluating how important a word is to a document within a document set or corpus: a word's importance increases in proportion to its frequency within the document but decreases in inverse proportion to its frequency across the corpus, which effectively suppresses the influence of common words on keywords and improves the relevance between keywords and the document;
(3) Using the user's preference weights for different types of operation behaviors and the three-dimensional feature values, converting all operation sequence data into structured feature vectors, which serve as the input for training the behavior-file multimodal joint probability model;
(4) Computing the joint probability distribution of operation behavior and file-access behavior through a dynamic Bayesian network, modeling the user's operation sequences and file-access behavior at different time points as conditional probability distributions, and capturing the causal relationship between operation behavior and file access;
(5) Feeding the structured feature vectors into a multi-layer neural network that outputs a probability distribution over operation suggestions; training the behavior-file joint probability model on historical behavior data with supervised learning, building the spatio-temporal association between operation sequences and file access, and dynamically updating the probability distribution to reflect the temporal and contextual dependencies of user behavior;
(6) Validating the performance of the behavior-file multimodal joint probability model through cross-validation and metric evaluation (such as accuracy, recall, and F1 score), ensuring the model's generalization ability. A minimal sketch of the feature construction in steps (1)-(3) follows.
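The following sketch shows the TF-IDF preference weighting of steps (2)-(3) over operation types. The data layout (`sessions` as lists of operation-type strings) and the way the three per-behavior dimension values are appended are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_weights(sessions):
    """Compute per-session TF-IDF preference weights for operation types.

    `sessions` is a list of operation sequences ("documents"), each a list
    of operation-type strings ("vocabulary"), mirroring steps (2)-(3).
    """
    n_docs = len(sessions)
    doc_freq = Counter()
    for ops in sessions:
        doc_freq.update(set(ops))  # counts documents containing each type t

    weights = []
    for ops in sessions:
        tf = Counter(ops)
        total = len(ops)           # total vocabulary count of document D_i
        w = {t: (n / total) * math.log(n_docs / doc_freq[t])
             for t, n in tf.items()}
        weights.append(w)
    return weights

def feature_vector(op, w, freq, duration, path_complexity):
    # Step (3): combine the preference weight with the three dimension
    # values (frequency, duration, path complexity) into one vector.
    return [w.get(op, 0.0), freq, duration, path_complexity]
```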
More preferably, the working process of the optimization engine building module is specifically as follows:
(1) Designing feedback channels, comprising an explicit feedback channel and an implicit feedback channel. The explicit channel receives the user's 1-5 star rating of intelligent suggestions through the user interface; the implicit channel uses eye tracking to record the user's gaze points, saccade paths, and pupil changes during operation, records dwell time on specific operations or interface elements, and computes cognitive-load indicators (such as fixation count and average dwell time) from the eye-movement and dwell-time data; the rating and the cognitive-load indicators are then converted into numeric feedback signals that serve as the reward-function input for reinforcement learning;
(2) Designing a reward function that combines the explicit and implicit feedback signals: the explicit feedback signal is used directly as the reward value, while the implicit feedback signal influences the reward indirectly through the cognitive-load indicators;
(3) Using the PPO algorithm to compute gradient updates for the policy network of the behavior-file multimodal joint probability model, with clipped (truncated) policy updates ensuring training stability. The PPO algorithm adopts multi-objective optimization: it optimizes the user's operation path by minimizing path entropy, improving operation efficiency, and it maximizes the confusion degree of sensitive operations, making them harder to identify and protecting user privacy. A sketch of the clipped multi-objective loss follows.
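The clipped PPO update in step (3) can be sketched as below. The multi-objective weighting coefficients `alpha` and `beta` are assumptions; the invention does not fix numeric values or the exact form of the path-entropy and confusion terms.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate loss: the clipping ("truncation")
    bounds the policy-update ratio to stabilize training."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

def multi_objective_loss(logp_new, logp_old, advantage,
                         path_entropy, sensitive_confusion,
                         alpha=0.1, beta=0.1):
    # Multi-objective form sketched from the description: add the path
    # entropy (to be minimized, for efficiency) and subtract the
    # sensitive-operation confusion (to be maximized, for privacy).
    return (ppo_clipped_loss(logp_new, logp_old, advantage)
            + alpha * path_entropy
            - beta * sensitive_confusion)
```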
More preferably, the privacy protection and security module works as follows:
(1) During backpropagation, gradient masking is applied to gradients that involve sensitive data, ensuring private data are not leaked;
(2) By adding noise or transforming feature vectors, the behavioral signatures of sensitive operations are obfuscated, reducing their identifiability;
(3) The system's privacy-protection effect is evaluated periodically, ensuring the privacy-protection mechanism remains effective. A gradient-masking sketch follows.
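A minimal PyTorch sketch of steps (1) and (2): zeroing gradients of parameters tied to sensitive data via tensor hooks, and blurring sensitive behavior features with noise. Which parameters count as sensitive, and the noise scale, are application-specific assumptions.

```python
import torch

def mask_sensitive_gradients(model, sensitive_param_names):
    """Step (1): zero out the gradients of parameters associated with
    sensitive data during backpropagation, so they never leave the host."""
    for name, param in model.named_parameters():
        if name in sensitive_param_names:      # membership list is assumed
            param.register_hook(lambda grad: torch.zeros_like(grad))

def obfuscate_features(x, noise_scale=0.05):
    # Step (2): add calibrated noise to the feature vector to reduce the
    # identifiability of sensitive operations; the scale is illustrative.
    return x + noise_scale * torch.randn_like(x)
```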
Preferably, the MCP protocol adapter includes a protocol gateway deployment module and a dynamic service discovery module;
The protocol gateway deployment module is used to deploy the MCP protocol gateway using a client-server architecture: the client receives the operation strategy generated by the reinforcement learning center, while the server side fronts external data sources and tools; the client and server perform capability negotiation to determine which functions and services each provides to the other. The protocol gateway deployment module integrates the JSON-RPC 2.0 standard protocol and supports two communication modes:
① A local pipe (stdio) mode, achieving <10 ms low-latency responses, suited to processing local operation behavior data;
② A network streaming (SSE) mode, supporting high-concurrency calls, suited to processing behavior data in distributed systems;
The dynamic service discovery module identifies available MCP servers through an automatic scanning mechanism; available MCP servers include local IDE plug-ins, enterprise ERP system interfaces, and cloud AI services (such as a Claude reasoning engine). The client sends requests to the server according to the user's request or the needs of the AI model; the server processes the request, possibly interacting with local or remote resources; after execution completes, the server returns the processing result to the client, and the client passes the information back to the host application. Parameterized resource addressing is realized with dynamic URI templates while maintaining the JSON-RPC 2.0 standard protocol, ensuring the flexibility and dynamism of service discovery. A bare request sketch follows.
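The sketch below sends a single JSON-RPC 2.0 request over the local pipe (stdio) mode. A real MCP session additionally performs an initialization handshake and capability negotiation, and message framing varies by server, so this only illustrates the request/response shape; `server_cmd` and `method` are placeholders.

```python
import json
import subprocess

def call_mcp_tool(server_cmd, method, params, req_id=1):
    """Send one JSON-RPC 2.0 request to an MCP server process over stdio.

    Bare sketch: no initialization handshake, no error handling, and the
    one-line-per-message framing is an assumption about the server.
    """
    request = {"jsonrpc": "2.0", "id": req_id,
               "method": method, "params": params}
    proc = subprocess.Popen(server_cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate(json.dumps(request) + "\n")
    return json.loads(out.splitlines()[0])   # JSON-RPC response object
```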
More preferably, the MCP protocol adapter has the following functions:
① Cross-platform compatibility: Windows, Linux, macOS, and mobile;
② Protocol version management, supporting seamless switching between different versions of the MCP protocol;
③ Service health monitoring, tracking the availability of MCP servers in real time;
④ A service circuit-breaker mechanism, preventing a single-point failure from crashing the system.
Preferably, the thinking chain visualization designer includes:
A decision traceability model construction module, used to build a decision traceability model based on a multi-head attention mechanism, fusing time-series behavior data with system-state features, tracking and recording the formation process of every decision generated by the reinforcement learning center, and building a complete decision chain; the formation process of each decision includes its key influencing factors and the context at decision time;
A decision path reconstruction module, used to reconstruct decision paths with an LSTM network weighted by a time-decay factor, modeling and analyzing the time-series data of the decision process and highlighting decision trends and patterns that change over time, helping to understand how decisions evolve;
A causal relationship analysis module, used to generate an interpretable view with causal relationships by combining knowledge-graph technology, associating each event and operation in the decision process with its outcome to form a causal graph, revealing the logic and motivation behind decisions;
A visualization output module, used to produce a behavior heat map, a file association network, and a strategy evolution timeline; the behavior heat map presents the distribution of operation patterns, showing the user's operation frequency and patterns at different times and in different scenarios and helping identify high-frequency operation areas and behavior habits; the file association network reveals the implicit knowledge structure, showing direct references, indirect associations, and content similarity among files; the strategy evolution timeline presents how the reinforcement learning strategy evolves and is optimized over time, helping assess the learning effect and the strategy's convergence;
A feedback and optimization module, used to feed the visualization results back to the reinforcement learning center, providing a reference for further optimization of the reinforcement learning strategy; by analyzing the visualization output, potential problems and room for improvement in the strategy are found, guiding the adjustment and optimization of the reinforcement learning algorithm and forming a closed-loop optimization process. A decision-path reconstruction sketch follows.
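The core idea of the decision path reconstruction module can be sketched as an LSTM whose outputs are weighted by an exponential time-decay factor, so that recent decision steps dominate the reconstructed path. The exponential form and the decay constant are assumptions; the invention only specifies a "time-decay-factor-weighted LSTM".

```python
import torch
import torch.nn as nn

class TimeDecayLSTM(nn.Module):
    """LSTM whose step outputs are scaled by decay**(age of the step),
    emphasizing recent decisions when reconstructing the decision path."""
    def __init__(self, input_dim, hidden_dim, decay=0.9):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.decay = decay                       # illustrative constant

    def forward(self, x):                        # x: (batch, seq, input_dim)
        out, _ = self.lstm(x)                    # (batch, seq, hidden_dim)
        seq_len = x.size(1)
        # newest step gets weight decay**0 = 1, oldest decay**(seq_len-1)
        ages = torch.arange(seq_len - 1, -1, -1,
                            dtype=out.dtype, device=out.device)
        weights = self.decay ** ages             # (seq,)
        return out * weights.view(1, -1, 1)      # broadcast over batch/hidden
```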
The intelligent agent service system based on a domestic operating system of the present invention has the following advantages:
First, through multimodal behavior perception, behavior-file multimodal reinforcement-learning optimization, and lightweight thinking-chain generation, the invention innovatively fuses MCP protocol access, dynamic behavior modeling, and a human-machine co-evolution mechanism, solving the poor behavior-perception sensitivity, delayed response to preference changes, and low decision interpretability of conventional agent systems in the domestic ecosystem, and realizing autonomous, controllable, and efficient reasoning for agent services in a domestic environment;
Second, the invention realizes dynamic access to cross-platform agent tools through the Model Context Protocol (MCP), and combines operation behavior perception, file semantic understanding, and a human-feedback reinforcement learning algorithm to build a user-personalized thinking-chain system;
Third, the invention breaks through the core bottlenecks of deploying agent technology on domestic operating systems via three technical paths: domestic adaptation, security enhancement, and ecosystem coordination;
Fourth, the invention constructs an intelligent agent service system based on a domestic operating system and realizes the paradigm transition from passive execution to active collaboration by building an operation-feedback-optimization technical closed loop. Specifically, a unified tool-access framework is built on the MCP (Model Context Protocol): heterogeneous tools are seamlessly integrated on a client-server architecture; tool invocation requests are encapsulated with the JSON-RPC 2.0 protocol; a dynamic service discovery mechanism automatically matches local tools or server-side APIs; and fine-grained permission control ensures that only authenticated devices and users can access system resources. A three-dimensional user operation-file-feedback behavior modeling system is established: a bidirectional LSTM analyzes window-focus trajectories and gesture operation sequences, and a multimodal feedback interface converts user ratings into reinforcement-learning reward signals. A hierarchical PPO-driven thinking-chain optimization engine is deployed, in which a meta-policy network fuses operation time-series features with file TF-IDF vectors to optimize long-term goals, and a task-policy network achieves rapid strategy updates through a real-time data pipeline;
Fifth, the invention guarantees data sovereignty through domestic kernel-level monitoring and realizes transmission encryption with national cryptographic algorithms; it builds a dynamic behavior-file association model that breaks the static limits of traditional log analysis, and it fuses a human-feedback mechanism with explainable-AI technology so that the agent's decision process is both evolvable and transparent. Experiments show the invention improves operation efficiency in common office scenarios by 37%, reduces the misoperation rate by 62%, and provides privacy-protection capability conforming to the GB/T 35273 standard.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block diagram of an agent service system based on a domestic operating system.
Detailed Description
An intelligent agent service system based on a domestic operating system according to the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment:
As shown in FIG. 1, this embodiment provides an intelligent agent service system based on a domestic operating system, which fuses operation behavior perception with reinforcement learning and forms a complete perception-modeling-optimization-interpretation technical closed loop through an operation perception engine, a human-feedback reinforcement learning center, an MCP protocol adapter, and a thinking chain visualization designer, realizing the agent's transition from passive execution to active collaboration;
The operation perception engine captures and parses user behavior data in real time to obtain operation sequence data, which are input as features to the reinforcement learning center; the reinforcement learning center outputs operation instructions. The MCP protocol adapter passes the operation instructions recommended by the reinforcement learning center through a standardized interface to interact with different data sources and services, obtains the results corresponding to the operation instructions, and transmits feedback from external systems (such as user ratings and eye-tracking data) back to the human-feedback reinforcement learning center, continuously optimizing its policy network. The thinking chain visualization designer converts the center's complex decision processes and data relationships into intuitive visual views, helping the user understand the agent's decision logic and behavior patterns, improving the system's interpretability and user trust, and guiding the adjustment and optimization of the human-feedback reinforcement learning center to form a closed-loop optimization process.
The operation perception engine in this embodiment includes:
An operation sequence data acquisition module, used to capture user operation behavior in real time based on the kernel-level hook mechanism of the domestic operating system to form operation sequence data, where user operation behavior includes GUI operation events and file-system access traces;
A user multidimensional profile construction module, used to construct a multidimensional user behavior profile through event-sourcing technology, realizing contextual analysis of operation semantics.
The operation perception engine in this embodiment also provides the following functions:
① Multi-device synchronous perception: unified capture of mobile, desktop, and cloud operation behavior;
② Abnormal behavior detection: real-time alerts for atypical operation patterns;
③ Encrypted storage and transmission of behavior data, ensuring data security;
④ Plug-in extension, allowing third-party developers to register custom behavior-perception rules.
The reinforcement learning center in the present embodiment includes:
A model training module, used to train the behavior-file multimodal joint probability model, where the behavior-file multimodal joint probability model combines behavior data with file data (such as text and images) and models their multimodal association within a probabilistic framework;
An optimization engine building module, used to build a dual-channel feedback-driven policy optimization engine that optimizes the behavior-file multimodal joint probability model;
A privacy protection and security module, used to realize privacy protection within the reinforcement learning center via privacy-protection and security mechanisms.
The working process of the model training module in this embodiment is specifically as follows:
(1) For each operation behavior, computing feature values along three dimensions: operation frequency, duration, and path complexity. Operation frequency counts how many times the user executes each operation behavior within a given period, reflecting how often different operation behaviors are used; duration records the time the user spends on each operation behavior, reflecting the user's attention to and investment in different operations; path complexity analyzes how complex the user's path is when executing operations (for example, the directory depth and number of jumps of accessed files), measuring the complexity of the user's operation path;
(2) Treating the user's different operation behaviors as vocabulary and the user's operation sequences as documents, and quantifying the user's preference weight W for different types of operation behaviors using the TF-IDF algorithm, with the following formula:
$$W_{t,i} = \frac{n_{t,i}}{\sum_{k} n_{k,i}} \times \log\frac{N}{\sum_{j=1}^{N} \mathbb{1}(t \in D_j)}$$

where $n_{t,i}$ denotes the number of occurrences of vocabulary item $t$ in document $D_i$; $\sum_{k} n_{k,i}$ denotes the total number of vocabulary items in document $D_i$; $N$ denotes the total number of documents; and $\mathbb{1}(t \in D_j)$ indicates whether document $D_j$ contains $t$ (1 if it does, 0 otherwise);
(3) Using the user's preference weights for different types of operation behaviors and the three-dimensional feature values, converting all operation sequence data into structured feature vectors, which serve as the input for training the behavior-file multimodal joint probability model;
(4) Computing the joint probability distribution of operation behavior and file-access behavior through a dynamic Bayesian network, modeling the user's operation sequences and file-access behavior at different time points as conditional probability distributions, and capturing the causal relationship between operation behavior and file access;
(5) Feeding the structured feature vectors into a multi-layer neural network that outputs a probability distribution over operation suggestions; training the behavior-file joint probability model on historical behavior data with supervised learning, building the spatio-temporal association between operation sequences and file access, and dynamically updating the probability distribution to reflect the temporal and contextual dependencies of user behavior;
(6) Validating the performance of the behavior-file multimodal joint probability model through cross-validation and metric evaluation (such as accuracy, recall, and F1 score), ensuring the model's generalization ability. A minimal counting sketch of the dynamic-Bayesian-network step (4) follows.
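For illustration, the dynamic Bayesian network of step (4) can be approximated by counting conditional frequencies between successive operations and their file accesses; a full DBN would also carry hidden state across time slices. The `(operation, file_access)` pair encoding of the sequences is an assumption.

```python
from collections import defaultdict

def fit_transition_probs(sequences):
    """Estimate P(file_access | operation) and P(operation_t | operation_{t-1})
    by counting over observed sequences, a minimal stand-in for the DBN.

    `sequences` is a list of (operation, file_access) pair lists.
    """
    op_file = defaultdict(lambda: defaultdict(int))
    op_op = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        prev_op = None
        for op, access in seq:
            op_file[op][access] += 1
            if prev_op is not None:
                op_op[prev_op][op] += 1
            prev_op = op

    def normalize(table):
        return {k: {v: c / sum(row.values()) for v, c in row.items()}
                for k, row in table.items()}

    return normalize(op_file), normalize(op_op)
```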
The working process of the optimization engine building module in this embodiment is specifically as follows:
(1) Designing feedback channels, comprising an explicit feedback channel and an implicit feedback channel. The explicit channel receives the user's 1-5 star rating of intelligent suggestions through the user interface; the implicit channel uses eye tracking to record the user's gaze points, saccade paths, and pupil changes during operation, records dwell time on specific operations or interface elements, and computes cognitive-load indicators (such as fixation count and average dwell time) from the eye-movement and dwell-time data; the rating and the cognitive-load indicators are then converted into numeric feedback signals that serve as the reward-function input for reinforcement learning;
(2) Designing a reward function that combines the explicit and implicit feedback signals: the explicit feedback signal is used directly as the reward value, while the implicit feedback signal influences the reward indirectly through the cognitive-load indicators;
(3) Using the PPO algorithm to compute gradient updates for the policy network of the behavior-file multimodal joint probability model, with clipped (truncated) policy updates ensuring training stability. The PPO algorithm adopts multi-objective optimization: it optimizes the user's operation path by minimizing path entropy, improving operation efficiency, and it maximizes the confusion degree of sensitive operations, making them harder to identify and protecting user privacy.
The working process of the privacy protection and security module in this embodiment is specifically as follows:
(1) During backpropagation, gradient masking is applied to gradients that involve sensitive data, ensuring private data are not leaked;
(2) By adding noise or transforming feature vectors, the behavioral signatures of sensitive operations are obfuscated, reducing their identifiability;
(3) The system's privacy-protection effect is evaluated periodically, ensuring the privacy-protection mechanism remains effective.
The MCP protocol adapter in the embodiment comprises a protocol gateway deployment module and a dynamic service discovery module;
The protocol gateway deployment module is used to deploy the MCP protocol gateway using a client-server architecture: the client receives the operation strategy generated by the reinforcement learning center, while the server side fronts external data sources and tools; the client and server perform capability negotiation to determine which functions and services each provides to the other. The protocol gateway deployment module integrates the JSON-RPC 2.0 standard protocol and supports two communication modes:
① A local pipe (stdio) mode, achieving <10 ms low-latency responses, suited to processing local operation behavior data;
② A network streaming (SSE) mode, supporting high-concurrency calls, suited to processing behavior data in distributed systems;
The dynamic service discovery module identifies available MCP servers through an automatic scanning mechanism; available MCP servers include local IDE plug-ins, enterprise ERP system interfaces, and cloud AI services (such as a Claude reasoning engine). The client sends requests to the server according to the user's request or the needs of the AI model; the server processes the request, possibly interacting with local or remote resources; after execution completes, the server returns the processing result to the client, and the client passes the information back to the host application. Parameterized resource addressing is realized with dynamic URI templates while maintaining the JSON-RPC 2.0 standard protocol, ensuring the flexibility and dynamism of service discovery.
The MCP protocol adapter in this embodiment has the following functions:
① Cross-platform compatibility: Windows, Linux, macOS, and mobile;
② Protocol version management, supporting seamless switching between different versions of the MCP protocol;
③ Service health monitoring, tracking the availability of MCP servers in real time;
④ A service circuit-breaker mechanism, preventing a single-point failure from crashing the system.
The thinking chain visualization designer in this embodiment includes:
A decision traceability model construction module, used to build a decision traceability model based on a multi-head attention mechanism, fusing time-series behavior data with system-state features, tracking and recording the formation process of every decision generated by the reinforcement learning center, and building a complete decision chain; the formation process of each decision includes its key influencing factors and the context at decision time;
A decision path reconstruction module, used to reconstruct decision paths with an LSTM network weighted by a time-decay factor, modeling and analyzing the time-series data of the decision process and highlighting decision trends and patterns that change over time, helping to understand how decisions evolve;
A causal relationship analysis module, used to generate an interpretable view with causal relationships by combining knowledge-graph technology, associating each event and operation in the decision process with its outcome to form a causal graph, revealing the logic and motivation behind decisions;
A visualization output module, used to produce a behavior heat map, a file association network, and a strategy evolution timeline; the behavior heat map presents the distribution of operation patterns, showing the user's operation frequency and patterns at different times and in different scenarios and helping identify high-frequency operation areas and behavior habits; the file association network reveals the implicit knowledge structure, showing direct references, indirect associations, and content similarity among files; the strategy evolution timeline presents how the reinforcement learning strategy evolves and is optimized over time, helping assess the learning effect and the strategy's convergence;
A feedback and optimization module, used to feed the visualization results back to the reinforcement learning center, providing a reference for further optimization of the reinforcement learning strategy; by analyzing the visualization output, potential problems and room for improvement in the strategy are found, guiding the adjustment and optimization of the reinforcement learning algorithm and forming a closed-loop optimization process.
The working process of this embodiment is specifically as follows:
S1, capturing the user's operation behavior in real time via the operation perception engine, this behavior being the input basis for subsequent modules (such as the reinforcement learning center and the MCP protocol adapter): user operation behavior is captured in real time based on the kernel-level hook mechanism of the domestic operating system, including GUI operation events (such as window focus switching, control clicks, and shortcut-key triggering) and file-system access traces (such as create/read/write/delete operations), forming operation sequence data;
S2, inputting the user operation behavior data captured by the operation perception engine as features into the reinforcement learning center and training the behavior-file multimodal joint probability model, which receives the user's task instructions and outputs a probability distribution over suggested operation instructions, specifically comprising the following steps:
S201, computing feature values along three dimensions for each operation behavior: operation frequency, duration, and path complexity, where operation frequency counts how many times the user executes each operation behavior within a given period, reflecting how often different operation behaviors are used; duration records the time the user spends on each operation behavior, reflecting the user's attention to and investment in different operations; and path complexity analyzes how complex the user's path is when executing operations, measuring the complexity of the user's operation path;
S202, treating the user's different operation behaviors as vocabulary and the user's operation sequences as documents, and quantifying the user's preference weight W for different types of operation behaviors using the TF-IDF algorithm, with the following formula:
$$W_{t,i} = \frac{n_{t,i}}{\sum_{k} n_{k,i}} \times \log\frac{N}{\sum_{j=1}^{N} \mathbb{1}(t \in D_j)}$$

where $n_{t,i}$ denotes the number of occurrences of vocabulary item $t$ in document $D_i$; $\sum_{k} n_{k,i}$ denotes the total number of vocabulary items in document $D_i$; $N$ denotes the total number of documents; and $\mathbb{1}(t \in D_j)$ indicates whether document $D_j$ contains $t$ (1 if it does, 0 otherwise);
S203, converting all operation sequence data into structured feature vectors using the computed preference weights and the extracted three-dimensional features, as the input for model training;
S204, computing the joint probability distribution of operation behavior and file-access behavior through a dynamic Bayesian network, modeling the user's operation sequences and file-access behavior at different time points as conditional probability distributions, and capturing the causal relationship between operation behavior and file access;
S205, building a multi-layer neural network whose input is the structured feature vector and whose output is a probability distribution over operation suggestions; training the behavior-file joint probability model on historical behavior data with supervised learning, building the spatio-temporal association between operation sequences and file access, and dynamically updating the probability distribution to reflect the temporal and contextual dependencies of user behavior;
S206, validating model performance through cross-validation and metric evaluation (such as accuracy, recall, and F1 score), ensuring the model's generalization ability. A minimal sketch of the suggestion network in S205 follows;
S3, receiving the user's task instruction, obtaining the optimal operation instruction from the trained behavior-file multimodal joint probability model, invoking external data sources and services through the standardized interface provided by the MCP protocol adapter, executing the corresponding instructions, and obtaining the operation result corresponding to the operation instruction: based on the operation strategy generated by the reinforcement learning center, the operation instruction is passed through the MCP protocol adapter to external systems (such as ERP and AI services), which automatically execute the corresponding operations. An MCP protocol adaptation framework is built, comprising two parts, protocol gateway deployment and dynamic service discovery. Protocol gateway deployment uses a client-server architecture to deploy the MCP protocol gateway: the client receives the operation strategy generated by the reinforcement learning center, the server side fronts external data sources and tools, and the two sides perform capability negotiation over the functions and services they provide to each other; the gateway integrates the JSON-RPC 2.0 standard protocol and supports two communication modes:
① The local pipe (stdio) mode achieves <10 ms low-latency responses and is suited to processing local operation behavior data;
② The network streaming (SSE) mode supports high-concurrency calls and is suited to processing behavior data in distributed systems;
Dynamic service discovery identifies available MCP servers through an automatic scanning mechanism, including but not limited to local IDE plug-ins, enterprise ERP system interfaces, and cloud AI services (such as a Claude reasoning engine); the client sends requests to the server according to the user's request or the needs of the AI model, and the server processes the request, possibly interacting with local or remote resources;
S4, the dual-channel feedback-driven policy optimization engine in the reinforcement learning center collects the user's feedback on operation results (such as user ratings and eye-tracking data) and further optimizes the strategy, improving the model's operation efficiency and privacy-protection capability, as follows:
S401, designing feedback channels, comprising an explicit feedback channel and an implicit feedback channel. The explicit channel receives the user's star rating of intelligent suggestions through the user interface, on a 1-5 scale; the implicit channel uses eye tracking to record the user's gaze points, saccade paths, and pupil changes during operation, records dwell time on specific operations or interface elements, and computes cognitive-load indicators (such as fixation count and average dwell time) from the eye-movement and dwell-time data; the rating and the cognitive-load indicators are then converted into numeric feedback signals that serve as the reward-function input for reinforcement learning;
S402, designing a reward function that combines the explicit and implicit feedback signals: the explicit feedback signal is used directly as the reward value, while the implicit feedback signal influences the reward indirectly through the cognitive-load indicators;
S403, using the PPO algorithm to compute gradient updates for the policy network of the behavior-file multimodal joint probability model, with clipped (truncated) policy updates ensuring training stability; the PPO algorithm adopts multi-objective optimization, optimizing the user's operation path by minimizing path entropy to improve operation efficiency, and maximizing the confusion degree of sensitive operations to make them harder to identify, protecting user privacy. A reward-fusion sketch for S401-S402 follows;
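A minimal sketch of the S401-S402 reward design: fusing the explicit 1-5 star rating with implicit cognitive-load indicators into a scalar reward. The normalization constants and weights are illustrative assumptions, not values fixed by the invention.

```python
def reward_signal(star_rating, fixation_count, avg_dwell_ms,
                  w_explicit=1.0, w_load=0.5):
    """Fuse explicit and implicit feedback into one scalar reward.

    star_rating: explicit 1-5 rating; fixation_count and avg_dwell_ms:
    implicit eye-tracking indicators. Scales and weights are assumptions.
    """
    explicit = (star_rating - 3) / 2.0                 # map 1..5 to [-1, 1]
    # Higher fixation count / longer dwell suggest higher cognitive load,
    # which should lower the reward.
    load = min(1.0, fixation_count / 20.0) * min(1.0, avg_dwell_ms / 2000.0)
    return w_explicit * explicit - w_load * load
```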
S5, using the thinking chain visualization designer to convert the decision process and data relationships of the behavior-file multimodal joint probability model into intuitive visual views, helping the user understand the system's decision logic and behavior patterns, improving the system's interpretability and user trust, and guiding the adjustment and optimization of the reinforcement learning algorithm to form a closed-loop optimization process, as follows:
S501, building a decision traceability model based on a multi-head attention mechanism, fusing time-series behavior data with system-state features, and tracking and recording the formation process of every decision generated by the reinforcement learning center, including the decision's key influencing factors and the context at decision time, thereby building a complete decision chain;
S502, decision-path reconstruction: reconstructing decision paths with an LSTM network weighted by a time-decay factor, modeling and analyzing the time-series data of the decision process, and highlighting decision trends and patterns that change over time, helping to understand how decisions evolve;
S503, causal relationship analysis: generating an interpretable view with causal relationships by combining knowledge-graph technology, associating each event and operation in the decision process with its outcome to form a causal graph, revealing the logic and motivation behind decisions;
S504, visualization output, specifically as follows:
S50401, behavior heat map: presents the distribution of operation patterns, showing the user's operation frequency and patterns at different times and in different scenarios, helping identify high-frequency operation areas and user behavior habits;
S50402, file association network: reveals the implicit knowledge structure, showing the association relationships among files, including direct references, indirect associations, and content similarity, helping discover latent knowledge structures and information flows;
S50403, strategy evolution timeline: shows the learning process, presenting how the reinforcement learning strategy evolves and is optimized over time, including the key nodes of strategy adjustment and the change trends of performance indicators, helping assess the learning effect and the strategy's convergence;
S505, feedback and optimization: feeding the visualization results back to the reinforcement learning center to provide a reference for further strategy optimization; by analyzing the visualization output, potential problems and room for improvement in the strategy are found, guiding the adjustment and optimization of the reinforcement learning algorithm and forming a closed-loop optimization process. A causal-graph construction sketch for S503 follows.
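For S503, the causal-relationship graph can be assembled with the third-party `networkx` library, as sketched below; the `(cause, effect, weight)` trace schema is an assumption, and the invention does not prescribe a particular graph library.

```python
import networkx as nx

def build_causal_graph(decision_trace):
    """Link each event/operation in the decision trace to its outcome as a
    directed, weighted edge, yielding the causal graph the designer renders.

    `decision_trace` is assumed to be a list of (cause, effect, weight)
    triples; the schema is illustrative.
    """
    g = nx.DiGraph()
    for cause, effect, weight in decision_trace:
        g.add_edge(cause, effect, weight=weight)
    return g
```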
It should be noted that the above embodiment merely illustrates, rather than limits, the technical solution of the present invention. Although the invention has been described in detail with reference to the above embodiment, those skilled in the art should understand that the described technical solution may be modified, or some or all of its technical features may be equivalently replaced, without departing the essence of the corresponding technical solution from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

(Translated from Chinese)
1.一种基于国产化操作系统的智能体服务系统,其特征在于,该系统是融合操作行为感知与强化学习技术,通过操作感知引擎、人类反馈强化学习中枢、MCP协议适配器以及思维链可视化设计器形成感知-建模-优化-解释的完整技术闭环,实现智能体从被动执行向主动协同的范式跃迁;1. An intelligent agent service system based on a domestically produced operating system. This system integrates operational behavior perception and reinforcement learning technologies. Through an operational perception engine, a human feedback reinforcement learning hub, an MCP protocol adapter, and a mind chain visual designer, it forms a complete closed-loop technology of perception-modeling-optimization-interpretation, enabling a paradigm shift from passive execution to active collaboration for intelligent agents.其中,通过操作感知引擎实时捕获并解析用户行为数据,获取操作序列数据,将操作序列数据作为特征输入到强化学习中枢,强化学习中枢输出操作指令,MCP协议适配器将强化学习中枢推荐的操作指令通过标准化的接口与不同的数据源和服务进行交互,获取操作指令对应的结果,并将外部系统的反馈传递回人类反馈强化学习中枢,不断优化人类反馈强化学习中枢的策略网络;思维链可视化设计器将人类反馈强化学习中枢复杂的决策过程和数据关系转换为直观的可视化视图,并指导人类反馈强化学习中枢的调整和优化,形成闭环优化过程;Among them, the operation perception engine captures and analyzes user behavior data in real time, obtains operation sequence data, and inputs the operation sequence data as features into the reinforcement learning center. The reinforcement learning center outputs operation instructions. The MCP protocol adapter interacts with different data sources and services through standardized interfaces for the operation instructions recommended by the reinforcement learning center to obtain the results corresponding to the operation instructions, and transmits the feedback of the external system back to the human feedback reinforcement learning center, continuously optimizing the strategy network of the human feedback reinforcement learning center; the thinking chain visual designer converts the complex decision-making process and data relationship of the human feedback reinforcement learning center into an intuitive visual view, and guides the adjustment and optimization of the human feedback reinforcement learning center to form a closed-loop optimization process;强化学习中枢包括:The reinforcement learning hub includes:模型训练模块,用于进行行为-文件多模态联合概率模型训练;Model training module, used for behavior-document multimodal joint probability model training;优化引擎建立模块,用于建立双通道反馈驱动的策略优化引擎,实现行为-文件多模态联合概率模型的优化;The optimization engine building module is used to build a dual-channel feedback-driven policy optimization engine to optimize the behavior-document multimodal joint probability model;隐私保护与安全模块,用于利用隐私保护与安全机制实现强化学习中枢中的隐私保护;Privacy and security module, which is used to implement privacy protection in reinforcement learning hubs using privacy and security mechanisms;模型训练模块工作过程具体如下:The working process of the model training module is as follows:(1)对每个操作行为分别计算相应的操作频率、持续时间和路径复杂度三个维度的特征值;其中,操作频率是指统计用户在特定时间段内对每个操作行为的执行次数,反映用户对不同操作行为的使用频率;持续时间是指记录用户在每个操作行为上花费的时间,体现用户对不同操作的关注程度和投入时间的多少;路径复杂度是指分析用户在执行操作时的路径复杂程度,衡量用户操作路径的复杂性;(1) Calculate the corresponding characteristic values of the three dimensions of operation frequency, duration and path complexity for each operation behavior; among them, operation frequency refers to the number of times the user performs each operation behavior in a specific time period, reflecting the frequency of users' use of different operation behaviors; duration refers to the time the user spends on each operation behavior, reflecting the degree of attention and time invested by the user in different operations; path complexity refers to analyzing the complexity of the path when the user performs the operation, measuring the complexity of the user's operation 
path;(2)将用户的不同操作行为视为词汇,将用户的一系列操作序列视为文档,利用TF-IDF算法量化用户对不同类型操作行为的偏好权重W,具体公式如下;(2) Considering different user operations as words and a series of user operations as documents, the TF-IDF algorithm is used to quantify the user's preference weight W for different types of operations. The specific formula is as follows; ;其中,表示词汇t在文档Di中出现次数;表示文档中所有词汇的词数;N表示文档总数;表示文档是否包含词汇t,若包含为1,不包含为0;in, Indicates the number of times the word t appears in the document Di; Represents a document The number of words in all the vocabulary; N represents the total number of documents; Represents a document Whether the word t is included, if included, it is 1, if not included, it is 0;(3)利用用户对不同类型操作行为的偏好权重和三维特征值,将所有操作序列数据转化为结构化的特征向量,作为行为-文件多模态联合概率模型训练的输入;(3) Using the user’s preference weights and three-dimensional feature values for different types of operation behaviors, all operation sequence data are converted into structured feature vectors as input for training the behavior-file multimodal joint probability model;(4)通过动态贝叶斯网络计算操作行为与文件访问行为的联合概率分布,将用户在不同时间点的操作序列与文件访问行为建模为条件概率分布,捕捉操作行为与文件访问之间的因果关系;(4) The joint probability distribution of operation behavior and file access behavior is calculated through a dynamic Bayesian network, and the user's operation sequence and file access behavior at different time points are modeled as a conditional probability distribution to capture the causal relationship between operation behavior and file access;(5)将结构化的特征向量输入多层神经网络,多层神经网络输出为操作建议的概率分布,采用监督学习方法,通过历史行为数据训练行为-文件联合概率模型,构建操作序列与文件访问的时空关联,动态更新概率分布以反映用户行为的时序性和上下文依赖性;(5) The structured feature vector is input into a multi-layer neural network, and the output of the multi-layer neural network is the probability distribution of the operation suggestion. A supervised learning method is used to train the behavior-file joint probability model through historical behavior data, construct the spatiotemporal association between operation sequence and file access, and dynamically update the probability distribution to reflect the temporal nature and contextual dependence of user behavior;(6)通过交叉验证和指标评估验证行为-文件多模态联合概率模型性能,确保模型的泛化能力;(6) Verify the performance of the behavior-document multimodal joint probability model through cross-validation and indicator evaluation to ensure the generalization ability of the model;优化引擎建立模块的工作过程具体如下:The working process of the optimization engine establishment module is as follows:(1)设计反馈通道,反馈通道包括显式反馈通道和隐式反馈通道;其中,显式通道通过用户界面接收用户对智能建议的星级评分,评分范围为1-5级;隐式通道通过眼动追踪记录用户在操作过程中的注视点、扫视路径和瞳孔变化,并记录用户在特定操作或界面元素上的停留时间,通过眼动和停留时长数据计算认知负荷指标;再将评分与认知负荷指标转化为数值化的反馈信号,作为强化学习的奖励函数输入;(1) Design a feedback channel, which includes an explicit feedback channel and an implicit feedback channel. The explicit channel receives the user's star rating of the intelligent suggestion through the user interface, with a rating range of 1-5. The implicit channel records the user's gaze point, scanning path, and pupil changes during the operation through eye tracking, and records the user's dwell time on a specific operation or interface element. The cognitive load index is calculated based on the eye movement and dwell time data. 
The rating and cognitive load index are then converted into numerical feedback signals as the input of the reward function of reinforcement learning.(2)结合显式和隐式反馈信号设计奖励函数,显式反馈直接作为奖励值,隐式反馈通过认知负荷指标间接影响奖励;(2) Designing a reward function by combining explicit and implicit feedback signals, where explicit feedback directly serves as the reward value, and implicit feedback indirectly affects the reward through cognitive load indicators;(3)采用PPO算法计算行为-文件多模态联合概率模型策略网络的梯度更新,通过截断策略更新确保训练的稳定性;其中,PPO算法采用多目标优化,通过最小化路径熵优化用户操作路径,并通过最大化敏感操作混淆度,提高敏感操作的不可识别性,保护用户隐私。(3) The PPO algorithm is used to calculate the gradient update of the behavior-file multimodal joint probability model policy network, and the stability of training is ensured by truncating the policy update. Among them, the PPO algorithm adopts multi-objective optimization to optimize the user operation path by minimizing the path entropy, and to improve the unrecognizableness of sensitive operations by maximizing the confusion of sensitive operations, thereby protecting user privacy.2.根据权利要求1所述的基于国产化操作系统的智能体服务系统,其特征在于,操作感知引擎包括:2. The intelligent agent service system based on a domestic operating system according to claim 1, wherein the operation perception engine comprises:操作序列数据获取模块,用于基于国产操作系统内核级钩子机制实时捕获用户操作行为,形成操作序列数据;其中,用户操作行为包括GUI操作事件和文件系统访问轨迹;GUI操作事件包括窗口焦点切换、控件点击及快捷键触发;文件系统访问轨迹包括创建、读写及删除操作;The operation sequence data acquisition module is used to capture user operation behaviors in real time based on the kernel-level hook mechanism of the domestic operating system to form operation sequence data; user operation behaviors include GUI operation events and file system access traces; GUI operation events include window focus switching, control clicks, and shortcut key triggering; file system access traces include create, read, write, and delete operations;用户多维画像构建模块,用于通过事件溯源技术构建用户多维行为画像,实现操作语义的上下文关系解析。The user multi-dimensional portrait construction module is used to build a multi-dimensional user behavior portrait through event tracing technology and realize the contextual relationship analysis of operational semantics.3.根据权利要求1或2所述的基于国产化操作系统的智能体服务系统,其特征在于,操作感知引擎还具有如下功能:3. The intelligent agent service system based on a domestic operating system according to claim 1 or 2, wherein the operation perception engine further has the following functions:①支持多设备同步感知:移动端、桌面端、云端操作行为的统一捕获;①Supports multi-device synchronous perception: unified capture of mobile, desktop, and cloud operation behaviors;②增加异常行为检测功能:非典型操作模式的实时告警;②Add abnormal behavior detection function: real-time alarm for atypical operation modes;③提供行为数据的加密存储与传输,确保数据安全;③ Provide encrypted storage and transmission of behavioral data to ensure data security;④支持插件化扩展,允许第三方开发者接入自定义行为感知规则。④Support plug-in extensions, allowing third-party developers to access customized behavior perception rules.4.根据权利要求1所述的基于国产化操作系统的智能体服务系统,其特征在于,隐私保护与安全模块的工作过程具体如下:4. 
2. The intelligent agent service system based on a domestic operating system according to claim 1, wherein the operation perception engine comprises:

an operation sequence data acquisition module, configured to capture user operation behavior in real time through the kernel-level hook mechanism of the domestic operating system, forming operation sequence data; the user operation behavior includes GUI operation events and file-system access traces, where GUI operation events include window focus switching, control clicks, and shortcut-key triggers, and file-system access traces include create, read, write, and delete operations;

a user multi-dimensional portrait construction module, configured to build a multi-dimensional user behavior portrait through event-sourcing technology, realizing contextual analysis of operation semantics.

3. The intelligent agent service system based on a domestic operating system according to claim 1 or 2, wherein the operation perception engine further has the following functions:

① multi-device synchronous perception: unified capture of mobile, desktop, and cloud operation behavior;
② abnormal-behavior detection: real-time alerts for atypical operation patterns;
③ encrypted storage and transmission of behavior data, ensuring data security;
④ plug-in extension, allowing third-party developers to register custom behavior-perception rules.

4. The intelligent agent service system based on a domestic operating system according to claim 1, wherein the working process of the privacy protection and security module is as follows:

(1) during back-propagation, gradient masking is applied to gradients that involve sensitive data, ensuring that private data are not leaked (a masking sketch follows claim 5 below);
(2) the behavioral signature of sensitive operations is obfuscated by adding noise or transforming feature vectors, reducing the identifiability of sensitive operations;
(3) the privacy-protection effect of the system is evaluated periodically to ensure that the privacy-protection mechanism remains effective.

5. The intelligent agent service system based on a domestic operating system according to claim 1, wherein the MCP protocol adapter includes a protocol gateway deployment module and a dynamic service discovery module;

the protocol gateway deployment module is configured to deploy the MCP protocol gateway in a client-server architecture: the client receives the operation policies generated by the reinforcement learning center, the servers are external data sources and tools, and the client and servers negotiate capabilities to determine which functions and services each provides to the other; the protocol gateway deployment module integrates the JSON-RPC 2.0 standard protocol (a request/response sketch follows this claim) and supports two communication modes, as follows:

① local pipe mode: achieves low-latency responses (<10 ms), suitable for processing local operation behavior data;
② network stream mode: supports high-concurrency calls, suitable for processing behavior data in distributed systems;

the dynamic service discovery module is configured to identify available MCP servers through an automatic scanning mechanism; available MCP servers include local IDE plug-ins, enterprise ERP system interfaces, and cloud AI services. The client sends requests to a server according to user requests or the needs of the AI model; the server processes the request, possibly interacting with local or remote resources; after execution, the server returns the result to the client, and the client passes the information back to the host application. URI dynamic templates then provide parameterized resource addressing, with the JSON-RPC 2.0 standard protocol supported throughout, ensuring flexible and dynamic service discovery.
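A minimal sketch of the gradient masking of step (1) of the privacy protection and security module in claim 4: gradient components tied to sensitive features are zeroed before any parameter update. Which indices count as sensitive, and zeroing as the masking operation, are illustrative assumptions.

```python
import numpy as np

def mask_sensitive_gradients(grad, sensitive_idx):
    """Zero out gradient components that correspond to sensitive features,
    so no update (and no leakage through updates) flows through them."""
    masked = grad.copy()
    masked[..., sensitive_idx] = 0.0
    return masked

grad = np.arange(8, dtype=float)   # stand-in for a parameter gradient
sensitive = [2, 5]                 # hypothetical sensitive feature indices
print(mask_sensitive_gradients(grad, sensitive))
# [0. 1. 0. 3. 4. 0. 6. 7.]
```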
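Since the gateway of claim 5 speaks JSON-RPC 2.0, a request/response pair has the shape sketched below. The method name follows the MCP convention for tool invocation, while the tool name and arguments are hypothetical, not taken from the patent.

```python
import json

# JSON-RPC 2.0 request of the kind an MCP client would send; the tool
# "file_search" and its arguments are illustrative of the shape only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "file_search",               # hypothetical tool
        "arguments": {"pattern": "*.docx"},
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,                                  # must echo the request id
    "result": {"matches": ["report.docx"]},
}

print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))
```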
6. The intelligent agent service system based on a domestic operating system according to claim 1, wherein the MCP protocol adapter has the following functions:

① cross-platform compatibility: Windows, Linux, macOS, and mobile;
② protocol version management, supporting seamless switching between different versions of the MCP protocol;
③ service health monitoring, tracking the availability of MCP servers in real time;
④ a service circuit-breaker mechanism, preventing a single point of failure from bringing down the system (a sketch follows this claim).
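The service circuit breaker of item ④ is a standard resilience pattern; a minimal sketch follows. The failure threshold and recovery timeout are illustrative parameters, not values given in the patent.

```python
import time

class CircuitBreaker:
    """Open the circuit after `max_failures` consecutive errors; while open,
    calls fail fast; after `reset_after` seconds, allow one trial call."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures, self.reset_after = max_failures, reset_after
        self.failures, self.opened_at = 0, None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None            # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                    # success resets the counter
        return result

# Usage: breaker.call(check_server_health) raises fast while the circuit is open.
```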
7. The intelligent agent service system based on a domestic operating system according to claim 1, wherein the thinking chain visual designer comprises:

a decision traceability model construction module, configured to build a decision traceability model based on the multi-head attention mechanism, fusing time-series behavior data with system-state features, tracking and recording the formation process of every decision generated by the reinforcement learning center, and building a complete decision chain; the formation process of each decision includes the key factors influencing the decision and the context at decision time;

a decision path reconstruction module, configured to reconstruct decision paths with an LSTM network weighted by a time-decay factor, modeling and analyzing the time-series data of the decision process, highlighting decision trends and patterns as they change over time, and helping to understand how decisions evolve (a sketch follows this claim);

a causal association analysis module, configured to combine knowledge-graph technology to generate an explainable view with causal associations, linking each event and operation in the decision process to its outcome and forming a causal graph that reveals the logic and motivation behind a decision;

a visualization output module, configured to produce behavior heat maps, file relationship networks, and a strategy evolution timeline; the behavior heat map presents the distribution of operation patterns, showing the frequency and patterns of user operations at different times and in different scenarios and helping identify high-frequency operation areas and user habits; the file relationship network reveals implicit knowledge structures, showing direct references, indirect associations, and content-similarity relations between files and helping discover latent knowledge structures and information flows; the strategy evolution timeline shows the learning process, presenting how the reinforcement learning strategy evolves and is optimized over time and helping evaluate learning effectiveness and strategy convergence; this evolution and optimization process includes the key nodes of strategy adjustment and the trends of performance indicators;

a feedback and optimization module, configured to feed the visualization results back to the reinforcement learning center as a reference for further optimization of the reinforcement learning strategy; by analyzing the visualization output, potential problems and room for improvement in the reinforcement learning strategy are identified, guiding the adjustment and optimization of the reinforcement learning algorithm and forming a closed-loop optimization process.
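A minimal PyTorch sketch of the time-decay weighting used by the decision path reconstruction module of claim 7: LSTM outputs over a decision sequence are weighted by an exponential decay in each step's age before pooling. The decay rate, the dimensions, and pooling by weighted sum are assumptions; the claim fixes only that the LSTM is weighted by a time-decay factor.

```python
import torch
import torch.nn as nn

class DecayWeightedLSTM(nn.Module):
    def __init__(self, in_dim=8, hidden=16, decay=0.1):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.decay = decay

    def forward(self, x):
        # x: (batch, seq_len, in_dim), oldest step first
        out, _ = self.lstm(x)                 # (batch, seq_len, hidden)
        seq_len = x.size(1)
        age = torch.arange(seq_len - 1, -1, -1, dtype=x.dtype)  # newest: age 0
        w = torch.exp(-self.decay * age)      # time-decay factor per step
        w = w / w.sum()
        return (out * w[None, :, None]).sum(dim=1)  # decay-weighted summary

model = DecayWeightedLSTM()
summary = model(torch.randn(2, 10, 8))        # two decision paths, 10 steps each
print(summary.shape)                           # torch.Size([2, 16])
```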
CN202510503462.3A | 2025-04-22 | 2025-04-22 | Intelligent body service system based on domestic operating system | Active | CN120029517B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510503462.3A (CN120029517B) | 2025-04-22 | 2025-04-22 | Intelligent body service system based on domestic operating system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510503462.3A (CN120029517B) | 2025-04-22 | 2025-04-22 | Intelligent body service system based on domestic operating system

Publications (2)

Publication Number | Publication Date
CN120029517A (en) | 2025-05-23
CN120029517B (en) | 2025-08-19

Family

ID=95737900

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510503462.3A (Active, CN120029517B) | Intelligent body service system based on domestic operating system | 2025-04-22 | 2025-04-22

Country Status (1)

Country | Link
CN (1) | CN120029517B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120338288A (en)* | 2025-06-18 | 2025-07-18 | 北京城建设计发展集团股份有限公司 | Construction method of intelligent operation and maintenance service of urban rail transit based on MCP
CN120336048A (en)* | 2025-06-20 | 2025-07-18 | 北京携云启源科技有限公司 | Bioinformatics MCP service calling method, system, device and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117632087A (en)* | 2023-11-20 | 2024-03-01 | 浪潮软件科技有限公司 | User experience optimization system and method based on machine learning
CN119311943A (en)* | 2024-09-24 | 2025-01-14 | 重庆师范大学 | Digital human intelligent recommendation and decision-making system based on user behavior and context awareness

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11809958B2 (en)* | 2020-06-10 | 2023-11-07 | Capital One Services, Llc | Systems and methods for automatic decision-making with user-configured criteria using multi-channel data inputs
CN119024723A (en)* | 2023-05-24 | 2024-11-26 | 智昌科技集团股份有限公司 | Human-machine collaborative intelligent control method, system and storage medium based on AIGC
US20240427789A1 (en)* | 2023-06-26 | 2024-12-26 | Ingram Micro Inc. | Single pane of glass mobile application including erp agnostic realtime data mesh with data change capture
CN119007942A (en)* | 2024-07-25 | 2024-11-22 | 浪潮云信息技术股份公司 | Large-model-based emotion intelligent intervention and personalized recommendation method and system for medical industry
CN119458331A (en)* | 2024-11-14 | 2025-02-18 | 广州里工实业有限公司 | Robot autonomous programming system and method based on reinforcement learning
CN119377997A (en)* | 2024-12-25 | 2025-01-28 | 中国标准化研究院 | A standard electronic archive management method and system based on artificial intelligence


Also Published As

Publication number | Publication date
CN120029517A (en) | 2025-05-23


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
