Detailed Description
The technical solutions of the embodiments of the present specification are explained and illustrated below with reference to the drawings of the embodiments of the present specification, but the following embodiments are only preferred embodiments of the present specification, and not all the embodiments. Based on the examples in the implementation manner, those skilled in the art may obtain other examples without making any creative effort, which fall within the protection scope of the present specification.
The terms first, second, third and the like in the description and in the claims and in the above drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
In the following description, directional or positional relationships such as the terms "inner", "outer", "upper", "lower", "left", "right", etc., are presented merely to facilitate describing the embodiments and simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operate in a particular orientation, and therefore should not be construed as limiting the description.
The data related to the application are information and data authorized by the user or fully authorized by all parties, and the collection of the related data complies with related laws and regulations and standards of related countries and regions.
Before the technical solution is described, its application scenario and the related technology are introduced.
A causal chain is a model or tool that describes causal relationships between events and explains how a phenomenon arises and develops through a series of causes and effects. In software development, and particularly in front-end applications, a causal chain focuses on the direct association between user operations and changes in application state. A causal-chain-based state management method helps a developer understand and implement how the application state is updated through a series of user operations such as clicking buttons and submitting forms. For example, Redux is a popular state management library that manages state changes through the concepts of actions and reducers. The causal relationship between actions and reducers is clear, and attention is paid to the causal relationship between user interactions and events inside the system, so that the state and behavior of the front-end application can be better managed and optimized.
However, conventional frameworks only provide state management and cannot actively discover loopholes in causal logic, which may lead to causal chain breakage or failure. Causal chain breakage or failure means that, in a software system, the association between events (such as user behaviors) and results (such as state changes) is lost for various reasons, so that the system can no longer be guaranteed to execute according to the expected logic. This may cause inconsistency and unpredictability in the system and seriously affect user experience and system reliability.
Specific phenomena of causal chain breakage or failure include the following. A user operation does not produce the expected effect: for example, after the user clicks a button, the expected state change does not occur or an erroneous state change occurs. An asynchronous request fails or is delayed: network delay or a server response timeout prevents state updates that depend on the response from being executed correctly. Data becomes inconsistent: because the causal chain is broken, data synchronization between different modules goes wrong and inconsistent data appears in the system. Problems are difficult to trace: effective logging or monitoring tools are lacking, so the root cause of a problem is difficult to locate.
Causes of causal chain breakage or failure include the following. Improper management of asynchronous operations: for operations such as API requests, the causal chain breaks if there is no proper mechanism to ensure that the request completes successfully and that the necessary state update is performed based on the request result. Logic defects: logic errors such as omitted conditional branches and improper exception handling prevent specific user behaviors from triggering the correct state change. Memory leaks: a long-running application with a memory leak may exhaust available memory, which in turn causes historical causal chain data to be lost and affects the consistency and traceability of the state. Incorrect binding of event listeners: in a dynamic environment, if event listeners are not bound or unbound correctly, important user interaction events are missed and the causal chain breaks.
To address these phenomena, asynchronous logic generally has to be handled manually, as is required by frameworks such as Vuex and Redux. This increases the risk of human error, because the integrity of the causal chain depends entirely on the quality of the developer's code and no automatic verification mechanism exists. Moreover, troubleshooting of such problems currently relies mainly on logging and breakpoint debugging, which is time-consuming and inefficient, and locating the problem is even more difficult in complex asynchronous scenarios.
For this reason, a causal-chain-based front-end state anomaly detection method is first studied. Please refer to FIG. 1, which is a system diagram of the technique for causal chain integrity detection and shows a scheme in two stages. The first is a static analysis stage, whose main purpose is to identify and understand the interaction relationships between components by analyzing the source code and to process the call relationships between key elements, so that, without running the program, the developer is helped to identify in advance design defects or logic errors that may cause problems. The second is a dynamic verification stage, which tracks the actual behavior of the program in real time while it runs in order to verify whether the causal chain that actually occurs conforms to the data collected in the static analysis stage. A deviation may indicate an abnormal condition that requires further investigation, and in this way it is detected whether the causal chain fails or breaks while the program is running.
In view of the many terms involved in the present application, these terms will be described first.
A causal chain model refers to a logical structure that describes or predicts the behavior of a system by analyzing causal relationships between various components within the system, and is used to identify how different parts of the system interact and how these interactions affect the overall performance and stability of the system.
AI (artificial intelligence): in the present application, AI analyzes, through algorithms and models, the large amount of data generated while the front-end application is running, identifies abnormal patterns, and automatically proposes or implements a repair strategy based on a knowledge base that is trained in advance or learned in real time, thereby ensuring that the application state meets the requirements of the expected causal chain model.
Confidence and risk level: the confidence refers to how reliable the AI model judges a repair strategy to be, and the risk level represents the degree of negative impact that the repair strategy may bring. The risk assessment in AI repair follows the logic of information security risk assessment.
Specifically, referring to FIG. 2 and FIG. 3, the present disclosure first provides a causal-chain-based front-end state anomaly detection method, which includes the following steps:
Step S1, static analysis: identifying the call relationships among event listeners, asynchronous tasks, and state change logic by analyzing the source code, and constructing an expected causal chain model;
Step S2, dynamic verification: verifying, while the application is running, whether the causal chain that actually occurs conforms to the expected model.
Wherein, in step S1, the step of constructing the expected causal chain model comprises:
Step A1, extracting key interaction nodes from the code through static code analysis to provide basic data for causal modeling. This step comprises two parts, AST parsing and key node extraction, as follows:
AST parsing: generating an abstract syntax tree (AST) from the front-end source code using Babel or the TypeScript Compiler API, and performing in-depth code analysis in combination with plug-ins of ESLint or SonarJS;
The key node extraction covers the following:
Event listeners: identifying all event binding code, such as addEventListener and onClick. Example: document.getElementById('btn').addEventListener('click', handler). The extracted information is: event type click, target element btn, callback function handler;
Asynchronous tasks, including explicit and implicit asynchrony: identifying asynchronous function calls such as fetch, setTimeout, and axios.get. Example: fetch('/api/data').then(data => setState(data)). The extracted information is: asynchronous type fetch, Uniform Resource Locator (URL) /api/data, callback function setState;
State changes: identifying state update logic, such as a dispatch in Redux or Vuex, or a direct state assignment. Example: store.dispatch({ type: 'SET_DATA', payload: data }). The extracted information is: action type SET_DATA, trigger condition such as the data being valid.
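The event listener extraction described above can be illustrated with a minimal sketch. It assumes the source has already been parsed into a simplified, ESTree-like tree; the node shapes, the helper names (calledMethod, extractListeners, mkCall and so on) and the ListenerInfo record are hypothetical and cover only the addEventListener example from the text, whereas a real parser such as Babel emits many more node types.

```typescript
// Hypothetical, heavily reduced AST node shapes (not the full Babel/ESTree set).
type AstNode =
  | { type: "Identifier"; name: string }
  | { type: "StringLiteral"; value: string }
  | { type: "MemberExpression"; object: AstNode; property: AstNode }
  | { type: "CallExpression"; callee: AstNode; arguments: AstNode[] };

interface ListenerInfo { eventType: string; target: string; callback: string; }

const literal = (n: AstNode | undefined): string | undefined =>
  n !== undefined && n.type === "StringLiteral" ? n.value : undefined;
const identifier = (n: AstNode | undefined): string | undefined =>
  n !== undefined && n.type === "Identifier" ? n.name : undefined;

// For a call of the form receiver.method(args), return its parts.
function calledMethod(node: AstNode): { receiver: AstNode; method: string; args: AstNode[] } | undefined {
  if (node.type !== "CallExpression") return undefined;
  const callee = node.callee;
  if (callee.type !== "MemberExpression") return undefined;
  const prop = callee.property;
  if (prop.type !== "Identifier") return undefined;
  return { receiver: callee.object, method: prop.name, args: node.arguments };
}

// Walk the tree and collect every addEventListener binding.
function extractListeners(node: AstNode, out: ListenerInfo[] = []): ListenerInfo[] {
  const info = calledMethod(node);
  if (info !== undefined && info.method === "addEventListener") {
    const inner = calledMethod(info.receiver);
    const target = inner !== undefined && inner.method === "getElementById"
      ? literal(inner.args[0])
      : undefined;
    out.push({
      eventType: literal(info.args[0]) ?? "unknown",
      target: target ?? "unknown",
      callback: identifier(info.args[1]) ?? "anonymous",
    });
  }
  // Recurse into children so nested calls are inspected as well.
  if (node.type === "CallExpression") {
    extractListeners(node.callee, out);
    node.arguments.forEach((arg) => extractListeners(arg, out));
  } else if (node.type === "MemberExpression") {
    extractListeners(node.object, out);
    extractListeners(node.property, out);
  }
  return out;
}

// Hand-built tree for: document.getElementById('btn').addEventListener('click', handler)
const id = (name: string): AstNode => ({ type: "Identifier", name });
const str = (value: string): AstNode => ({ type: "StringLiteral", value });
const member = (object: AstNode, property: AstNode): AstNode => ({ type: "MemberExpression", object, property });
const mkCall = (callee: AstNode, args: AstNode[]): AstNode => ({ type: "CallExpression", callee, arguments: args });

const sampleAst = mkCall(
  member(mkCall(member(id("document"), id("getElementById")), [str("btn")]), id("addEventListener")),
  [str("click"), id("handler")],
);
const listeners = extractListeners(sampleAst);
```

Asynchronous tasks and state changes could be collected in the same way by matching other method names such as fetch or dispatch.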
Step A2, mapping the extracted nodes into a causal relation graph, and dividing the hierarchy according to a TRIZ method, wherein the method specifically comprises the following steps:
Defining node types:
Event nodes: represent user interactions or system events, such as click and submit;
Asynchronous nodes: represent asynchronous operations, such as fetch and setTimeout;
State nodes: represent state changes, such as SET_DATA and SET_LOADING;
Interface nodes: represent the mapping of state onto the interface, such as render and updateUI.
Defining edge types:
Trigger edges: represent an event triggering an asynchronous operation, such as click → fetch;
Dependency edges: represent a state change that depends on the result of an asynchronous operation, such as fetch → SET_DATA;
Feedback edges: represent feedback to the user interface after a state change, such as SET_DATA → render;
Hierarchy edges: label the hierarchy and dependencies of the causal chain, for example, the parent node of SET_DATA is fetch.
Constructing a causal graph:
Directed acyclic graph (DAG): ensures that paths are unique and free of loops;
Hierarchical annotation: marks the logical hierarchy of the DAG and defines the depth and priority of the causal chain, including marking direct causes and root causes, in combination with the defect analysis and deduction of TRIZ;
AND/OR relations: logical condition constraints among DAG nodes that describe the preconditions for a causal trigger. In an AND relation, all conditions must be satisfied simultaneously, such as data valid AND API success → SET_DATA. In an OR relation, satisfying any one condition is sufficient, such as user login OR API cache → data display.
Visualization:
The causal graph is generated using Cytoscape.js or Mermaid, with node types and edge relationships annotated.
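As an illustration of step A2, the sketch below models the four node types and four edge types as plain data and renders the click → fetch → SET_DATA → render example as Mermaid flowchart text. The type names, the CausalGraph shape and the toMermaid helper are hypothetical; the nodes are given neutral aliases (n0, n1, ...) in the output so that node names do not collide with Mermaid keywords.

```typescript
type NodeKind = "event" | "async" | "state" | "ui";
type EdgeKind = "trigger" | "dependency" | "feedback" | "hierarchy";

interface CausalNode { id: string; kind: NodeKind; }
interface CausalEdge { from: string; to: string; kind: EdgeKind; }
interface CausalGraph { nodes: CausalNode[]; edges: CausalEdge[]; }

// Expected causal chain for the running example in the text.
const expectedGraph: CausalGraph = {
  nodes: [
    { id: "click", kind: "event" },
    { id: "fetch", kind: "async" },
    { id: "SET_DATA", kind: "state" },
    { id: "render", kind: "ui" },
  ],
  edges: [
    { from: "click", to: "fetch", kind: "trigger" },
    { from: "fetch", to: "SET_DATA", kind: "dependency" },
    { from: "SET_DATA", to: "render", kind: "feedback" },
  ],
};

// Render the graph as Mermaid flowchart text with typed nodes and labeled edges.
function toMermaid(graph: CausalGraph): string {
  const alias = new Map<string, string>();
  graph.nodes.forEach((n, i) => alias.set(n.id, `n${i}`));
  const lines: string[] = ["flowchart LR"];
  for (const n of graph.nodes) {
    lines.push(`  ${alias.get(n.id)}["${n.id} (${n.kind})"]`);
  }
  for (const e of graph.edges) {
    lines.push(`  ${alias.get(e.from)} -->|${e.kind}| ${alias.get(e.to)}`);
  }
  return lines.join("\n");
}

const mermaidText = toMermaid(expectedGraph);
```

The resulting text can be pasted into any Mermaid renderer; the same CausalGraph object could equally be fed to Cytoscape.js.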
Step A3, verifying the rule set and optimizing the model: the integrity of the causal chain model is optimized through static rules and dynamic models, specifically as follows:
Static rule verification: using ESLint plug-ins to check error handling and code conventions, using the TypeScript compiler to verify type definitions so as to analyze the code statically, and using a graph analysis tool such as NetworkX to check the validity of the DAG, its hierarchical relationships, loops, and the like, so as to verify the model structure;
Dynamic model enhancement:
Causal machine learning (CML): using an interactive regression model (IRM) to analyze the code structure and discover implicit causal relationships in a data-driven manner, for example inferring through the IRM the association between a setTimeout threshold and the state update delay;
TRIZ causal chain analysis: dividing the causal chain into direct causes, such as an API failure, and root causes, such as a server configuration error;
Adding hidden risk nodes: predicting potential risks and optimizing model paths, such as network delay, timeout, and a state that is not updated;
Type system integration: ensuring that state and data types meet expectations during dynamic execution and avoiding implicit errors, for example by verifying state conditions with TypeScript type guards.
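The DAG validity check mentioned under static rule verification can be sketched without any graph library. The example below uses Kahn's topological sort, which is one common way to detect loops and is not necessarily the check the described tooling performs; the function name topologicalOrder and the Edge shape are hypothetical.

```typescript
interface Edge { from: string; to: string; }

// Returns one topological order of the graph, or null when the graph is not a
// valid DAG (it contains a cycle, or an edge references an undeclared node).
function topologicalOrder(nodes: string[], edges: Edge[]): string[] | null {
  const indegree = new Map<string, number>();
  const successors = new Map<string, string[]>();
  for (const n of nodes) {
    indegree.set(n, 0);
    successors.set(n, []);
  }
  for (const e of edges) {
    if (!indegree.has(e.from) || !indegree.has(e.to)) return null;
    successors.get(e.from)!.push(e.to);
    indegree.set(e.to, indegree.get(e.to)! + 1);
  }
  // Start from nodes that nothing points to, then peel the graph layer by layer.
  const ready = nodes.filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const current = ready.shift()!;
    order.push(current);
    for (const next of successors.get(current)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Nodes left unprocessed can only be part of a cycle.
  return order.length === nodes.length ? order : null;
}

const chainNodes = ["click", "fetch", "SET_DATA", "render"];
const chainEdges: Edge[] = [
  { from: "click", to: "fetch" },
  { from: "fetch", to: "SET_DATA" },
  { from: "SET_DATA", to: "render" },
];
const validOrder = topologicalOrder(chainNodes, chainEdges);
const withLoop = topologicalOrder(chainNodes, [...chainEdges, { from: "render", to: "click" }]);
```

The order returned for a valid model can later serve as the reference ordering during dynamic verification.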
Step A4, generating a structured model.
Output format: JSON or YAML, containing the nodes, edges, rules, and hierarchical relationships, as a structured representation of the causal chain graph.
Wherein in step S2, the step of verifying the causal link comprises:
Step B1, hijacking key functions through code injection or proxy techniques.
The hijacking targets include:
Event handlers, such as user interaction events and framework hooks;
Asynchronous functions, such as network requests, timers, and Promise chains;
State change functions, such as dispatch in Redux, setState in React, and commit in a state management library.
The hijacking methods include:
Proxy mechanism: intercepting function calls, for example with new Proxy(fn, handler), and inserting monitoring logic before and after the call to record the parameters, return value, and timestamp;
Function wrapping: rewriting a function so that monitoring code runs before and after the original function executes, for example originalFn = fn; fn = wrapper(originalFn);
Hook technique: in Node.js or a native environment, hijacking low-level functions such as socket and open through LD_PRELOAD or DTrace, and recording the low-level operation path.
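The proxy mechanism can be illustrated with the standard JavaScript Proxy object and its apply trap. This is a minimal sketch: the monitor helper, the CallRecord shape, the callLog array and the stand-in dispatch function are hypothetical names introduced only for the example.

```typescript
interface CallRecord { name: string; args: unknown[]; result: unknown; timestamp: number; }

const callLog: CallRecord[] = [];

// Wrap a function in a Proxy whose `apply` trap records each call's
// parameters, return value and timestamp, then returns the original result.
function monitor<T extends (...args: any[]) => unknown>(name: string, fn: T): T {
  return new Proxy(fn, {
    apply(target, thisArg, args) {
      const result = Reflect.apply(target, thisArg, args);
      callLog.push({ name, args, result, timestamp: Date.now() });
      return result;
    },
  });
}

// Hypothetical state-change function standing in for a real store dispatch.
const dispatch = (action: { type: string }): string => action.type;

const monitoredDispatch = monitor("dispatch", dispatch);
monitoredDispatch({ type: "SET_DATA" });
```

For an asynchronous function the recorded result is the returned Promise, so a fuller monitor would also attach handlers to record when and how it settles.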
Step B2, recording the actual execution path. The recorded content includes context information and key data. A data structure is set up to build a complete execution path chain, and the path is converted into an actual DAG in which nodes represent operations and edges represent causal relationships.
The recorded key data includes:
Event triggering: the event type, such as click, keydown, or submit; the trigger time, as a timestamp accurate to the millisecond; the target element, as the ID or path of the DOM element that triggered the event; and the user behavior context, such as click coordinates, input content, and scroll position;
Asynchronous operations: the asynchronous type, key parameters, and execution result;
State changes: the action type; the state value, as a snapshot of the changed state such as loading: true; and the trigger source, as the associated preceding function or event.
The recorded context information includes the network state, device information, an environment snapshot, and the like.
The data structure is a chained hash structure: each record generates a unique hash value and is linked to the previous record, forming a tamper-evident chain, so that the recorded data cannot be altered undetected and a reliable basis is provided for comparison.
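A chained hash record of this kind can be sketched as follows. To keep the example dependency-free it uses FNV-1a, a simple non-cryptographic hash chosen purely for illustration; a tamper-evident log in practice would use a cryptographic hash such as SHA-256. The PathRecord shape and the function names are hypothetical.

```typescript
interface PathRecord {
  index: number;
  operation: string;
  timestamp: number;
  prevHash: string; // hash of the previous record, or "genesis" for the first
  hash: string;     // hash over this record's fields plus prevHash
}

// FNV-1a 32-bit: illustrative stand-in only, NOT collision resistant.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

function digest(index: number, operation: string, timestamp: number, prevHash: string): string {
  return fnv1a(`${index}|${operation}|${timestamp}|${prevHash}`);
}

// Append a record that is linked to the hash of the previous one.
function appendRecord(chain: PathRecord[], operation: string, timestamp: number): void {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : "genesis";
  const index = chain.length;
  chain.push({ index, operation, timestamp, prevHash, hash: digest(index, operation, timestamp, prevHash) });
}

// Recompute every hash and link; any edited record makes this return false.
function verifyChain(chain: PathRecord[]): boolean {
  return chain.every((rec, i) => {
    const expectedPrev = i === 0 ? "genesis" : chain[i - 1].hash;
    return rec.prevHash === expectedPrev &&
      rec.hash === digest(rec.index, rec.operation, rec.timestamp, rec.prevHash);
  });
}

const pathChain: PathRecord[] = [];
appendRecord(pathChain, "click", 1000);
appendRecord(pathChain, "fetch", 1005);
appendRecord(pathChain, "SET_DATA", 1040);
const intact = verifyChain(pathChain);
```

Because each hash covers the previous hash, changing one record invalidates that record and, once its hash is recomputed, every later link as well.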
Step B3, setting up a verification mechanism through which the actual path is compared with the expected causal chain model in real time to detect whether the causal chain is broken or has failed. The verification mechanism includes graph structure comparison and anomaly detection.
The graph structure comparison includes:
Node matching: checking whether the actual node type is consistent with the expected one, such as whether fetch is marked as an asynchronous operation;
Edge matching: verifying whether the triggering relationship conforms to expectations, such as whether fetch is triggered by a click event;
Condition constraints: checking whether the execution condition of a node is satisfied, such as whether SET_DATA is triggered only after fetch succeeds;
Order verification: ensuring that the node execution order conforms to the topological ordering of the DAG, for example fetch must precede SET_DATA.
The anomaly detection specifically includes:
Explicit anomalies: break points, where the actual path lacks an expected node, for example fetch does not trigger SET_DATA; unexpected nodes, where an undefined operation occurs, for example a SET_ERROR state change that is not declared in the model; and order errors, where the node execution order conflicts with the model, for example a state update precedes the asynchronous operation.
Implicit anomalies: causal reasoning, in which the causal relationships of paths are analyzed by causal machine learning (CML) to detect implicit breaks, for example fetch succeeds but does not trigger SET_DATA, possibly because of a data format error; and pattern recognition, in which historical anomaly patterns are learned by a graph neural network (GNN) to predict potential break points, for example a case where fetch returns 200 but the state is not updated is marked as a "data not parsed" anomaly.
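The three explicit anomaly types can be illustrated with a small comparison routine. This sketch makes a simplifying assumption that the expected model is a single linear chain given as one topological order; a full implementation would compare against the partial order of the DAG. The Anomaly type and the function name are hypothetical, and implicit anomalies (CML, GNN) are outside its scope.

```typescript
type Anomaly =
  | { kind: "break"; missing: string }                  // expected node never ran
  | { kind: "unexpected"; node: string }                // node not declared in the model
  | { kind: "order"; node: string; expectedAfter: string }; // node ran before expectedAfter

function detectExplicitAnomalies(expectedOrder: string[], actualPath: string[]): Anomaly[] {
  const anomalies: Anomaly[] = [];
  const expected = new Set(expectedOrder);
  const seen = new Set(actualPath);

  // Break points: nodes the model expects but the run never reached.
  for (const node of expectedOrder) {
    if (!seen.has(node)) anomalies.push({ kind: "break", missing: node });
  }
  // Unexpected nodes: operations that the model does not declare.
  for (const node of actualPath) {
    if (!expected.has(node)) anomalies.push({ kind: "unexpected", node });
  }
  // Order errors: adjacent observed nodes whose model order is reversed.
  const known = actualPath.filter((n) => expected.has(n));
  for (let i = 1; i < known.length; i++) {
    if (expectedOrder.indexOf(known[i]) < expectedOrder.indexOf(known[i - 1])) {
      anomalies.push({ kind: "order", node: known[i - 1], expectedAfter: known[i] });
    }
  }
  return anomalies;
}

const model = ["click", "fetch", "SET_DATA", "render"];
const brokenRun = detectExplicitAnomalies(model, ["click", "fetch", "render"]);
const reorderedRun = detectExplicitAnomalies(model, ["click", "SET_DATA", "fetch", "render"]);
const strayRun = detectExplicitAnomalies(model, ["click", "fetch", "SET_ERROR"]);
```

In the three sample runs, the first omits SET_DATA, the second updates state before the asynchronous operation, and the third contains the undeclared SET_ERROR change, matching the examples given in the text.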
In another aspect, when a causal chain anomaly is detected, the present application proposes AI-assisted repair. Please refer to FIG. 4. The repair specifically includes the following steps:
Step C1, after a causal chain anomaly is detected, that is, when an explicit or implicit anomaly of step B3 is detected, triggering the repair flow: constructing context features comprising multi-dimensional data, including code paths, state snapshots, environment data, user behaviors, and the like; then compressing the high-dimensional features through PCA or hash coding and concatenating the compressed features into a unified input vector, providing comprehensive input for AI inference;
Step C2, combining a rule engine such as Drools with a lightweight model such as an LSTM model to locate the root cause of the break, mapping the root cause to a TRIZ defect classification, and generating a repair template, which provides a basis for generating the repair strategy;
Step C3, setting up a repair strategy library, assigning a risk level to each repair strategy, selecting a repair strategy with the AI model, and evaluating the feasibility of the repair strategy. Specifically, a model based on TensorFlow.js takes the compressed feature vector of step C1 as input and outputs strategy probabilities, that is, a confidence distribution. A composite reliability scoring formula is set to balance risk against confidence, and the strategies are ranked by priority to ensure that the repair is feasible and safe.
Specifically, the composite reliability scoring formula is: composite score = confidence × (1 − risk level × α), where α is a constant in the interval (0, 1) that controls the weight of the risk level's influence on the score, and the value of α can be tuned in real time according to historical data or cases.
Examples of the repair strategy library, with the corresponding risk levels and confidences, are:
Strategy 1: automatic retry. The confidence is 0.8, for example when the LSTM predicts a high probability of consecutive timeouts; the risk level is 1 (low risk), since only a retry is performed;
Strategy 2: inserting state update code. The confidence is 0.75, for example when the rule engine matches a missing condition; the risk level is 2, since the strategy involves a code change;
Strategy 3: triggering a circuit breaker mechanism. The confidence is 0.65, and the risk level is 3 (high risk).
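The composite scoring formula can be applied to the three example strategies as in the sketch below. The value α = 0.2 is a hypothetical choice made only for the illustration (the text requires only that α lie in (0, 1)), and the RepairStrategy shape and function names are likewise assumptions.

```typescript
interface RepairStrategy { name: string; confidence: number; riskLevel: number; }

// composite score = confidence × (1 − risk level × α), as given in the text.
function compositeScore(s: RepairStrategy, alpha: number): number {
  return s.confidence * (1 - s.riskLevel * alpha);
}

// Highest composite score first; the input array is left untouched.
function rankStrategies(strategies: RepairStrategy[], alpha: number): RepairStrategy[] {
  return strategies.slice().sort((a, b) => compositeScore(b, alpha) - compositeScore(a, alpha));
}

// The three example strategies, deliberately listed out of order.
const strategyLibrary: RepairStrategy[] = [
  { name: "trigger circuit breaker", confidence: 0.65, riskLevel: 3 },
  { name: "automatic retry", confidence: 0.8, riskLevel: 1 },
  { name: "insert state update code", confidence: 0.75, riskLevel: 2 },
];

const ALPHA = 0.2; // hypothetical weight, not a value taken from the text
const ranked = rankStrategies(strategyLibrary, ALPHA);
```

With α = 0.2 the scores are approximately 0.8 × 0.8 = 0.64, 0.75 × 0.6 = 0.45 and 0.65 × 0.4 = 0.26, so automatic retry ranks first. Note that for risk level 3 the factor 1 − 3α becomes negative once α exceeds 1/3, which an implementation may wish to guard against.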
Step C4, previewing the repair strategy in a sandbox environment to ensure feasibility and safety, specifically including:
WebAssembly (Wasm) simulation: simulating the repaired code execution path in the browser through Wasm, simulating the network environment and user behaviors, checking whether the critical path is closed, and recording the response time after the repair;
AST transformation verification: parsing the original code with Babel's AST toolchain, inserting the repair code, and ensuring through ESLint or Babel's syntax checker that the patch has no syntax errors;
Failure fallback mechanism: if the sandbox verification fails, falling back along the strategy priority order, for example from strategy 1 to strategy 2.
Step C5, setting strategy execution conditions and executing the repair automatically or semi-automatically according to the risk level of the strategy. Cross-service collaboration is supported: in a microservice architecture, other services are notified through an API.
Examples of execution conditions include:
Automatic execution: the condition is a confidence greater than 0.8 and a risk level less than or equal to 2. For example, strategy 1 retries the API request, such as retrying fetch() 3 times, and strategy 2 dynamically inserts code, such as setState('paid');
Manual confirmation: when the confidence is less than 0.8 or the risk level is greater than 2, a patch suggestion is generated, the code position is highlighted through a VS Code plug-in, and the repair suggestion is displayed for the developer to check and confirm.
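The execution conditions and the retry strategy can be sketched as two small helpers. The names executionMode and retry are hypothetical. The stated thresholds leave a confidence of exactly 0.8 unassigned; this sketch assumes, conservatively, that the boundary case goes to manual confirmation.

```typescript
type ExecutionMode = "automatic" | "manual-confirmation";

// Automatic only when confidence > 0.8 and risk level <= 2. Everything else,
// including the boundary confidence == 0.8 (an assumption of this sketch),
// is routed to manual confirmation.
function executionMode(confidence: number, riskLevel: number): ExecutionMode {
  return confidence > 0.8 && riskLevel <= 2 ? "automatic" : "manual-confirmation";
}

// Generic retry helper in the spirit of "retry fetch() 3 times": runs the task
// up to `attempts` times and rethrows the last error if every attempt fails.
async function retry<T>(task: () => Promise<T>, attempts: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

const lowRiskHighConfidence = executionMode(0.9, 1);
const highRisk = executionMode(0.95, 3);
const boundaryCase = executionMode(0.8, 1);
```

A caller would typically invoke something like retry(() => fetch(url), 3) only after executionMode has returned "automatic" for the selected strategy.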
Step C6, evaluating the repair effect, including comparing paths to confirm whether the break point in the causal chain has been repaired, recording performance indicators after the repair, collecting user feedback, and optimizing the model and the root cause library to form a closed loop of continuous improvement, with support for rollback and auditing.
Wherein the model optimization comprises:
Online learning: collecting repair cases and updating only the last layer of the model, such as a Dense layer, to avoid retraining the whole model;
Transfer learning: extracting repair cases, such as "state machine design defect" and "Circuit Breaker pattern", from GitHub Issues, and improving generalization through data augmentation.
The root cause library optimization includes:
New root cause classification: for example, classifying "missing circuit breaker configuration" as an "architecture design defect" and associating it with design patterns;
Rule base update: adding newly discovered root causes to the rule engine, such as "API return format change" → "switch to the standby interface".
In another aspect, the present application also provides a fault-tolerance mechanism for causal chain failure. When a causal chain, such as a business flow, a system state, or a data dependency, becomes abnormal or breaks, the mechanism preserves system stability and user experience while minimizing data loss and business interruption. It specifically includes the following:
Causal chain snapshot storage: to prevent data loss caused by memory leaks or program crashes and to provide a reliable basis for rollback, causal chain snapshots are persisted using IndexedDB (front end) or a distributed database (back end), and states are recorded by timestamp and version number, which supports rollback to any historical valid state. A snapshot is forcibly generated after key nodes such as state changes and asynchronous operations complete, and possible failures are anticipated, with snapshots triggered in advance, by monitoring indicators such as memory usage and API response time;
Failure detection and rollback: detecting logic breaks, out-of-range parameters, timeouts, and the like, selecting the latest valid snapshot to restore the system state, and recording the rollback reason and the state difference for subsequent analysis.
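Snapshot storage and rollback can be illustrated with a small in-memory store. The array used here is a stand-in for IndexedDB or a distributed database, the SnapshotStore class and its method names are hypothetical, and the deep copy via JSON assumes the state is JSON-serializable.

```typescript
interface Snapshot<S> { version: number; timestamp: number; state: S; valid: boolean; }

class SnapshotStore<S> {
  // In-memory stand-in for persistent storage such as IndexedDB.
  private snapshots: Snapshot<S>[] = [];

  // Store a deep copy so later mutations of `state` cannot alter history.
  save(state: S, timestamp: number, valid: boolean = true): Snapshot<S> {
    const snap: Snapshot<S> = {
      version: this.snapshots.length + 1,
      timestamp,
      state: JSON.parse(JSON.stringify(state)) as S,
      valid,
    };
    this.snapshots.push(snap);
    return snap;
  }

  // Most recent snapshot marked valid, or undefined if none exists.
  latestValid(): Snapshot<S> | undefined {
    for (let i = this.snapshots.length - 1; i >= 0; i--) {
      if (this.snapshots[i].valid) return this.snapshots[i];
    }
    return undefined;
  }
}

interface AppState { loading: boolean; data: number[]; }

const store = new SnapshotStore<AppState>();
const liveState: AppState = { loading: false, data: [1, 2, 3] };
store.save(liveState, 1000);                        // after a successful SET_DATA
liveState.data.push(99);                            // later mutation of live state
store.save({ loading: true, data: [] }, 2000, false); // state captured during a failure
const rollbackTarget = store.latestValid();
```

On failure detection, the system would restore rollbackTarget.state and log the difference between it and the failed state.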
Degradation protocol: setting a tiered strategy that keeps core functions available during a failure, so as to avoid a collapse of the user experience.
The tiered strategy includes:
First tier: a non-critical asynchronous operation times out; the UI displays "loading" or basic information, and core operations are retained;
Second tier: a core API fails across the full link; the UI displays cached data, such as history records, and non-core modules are closed;
Third tier: a critical state is missing; the UI displays static content, and only basic queries are allowed.
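The tiered strategy amounts to a lookup from failure type to degradation plan, which can be sketched as below. The Failure labels, the DegradationPlan shape and the degrade function are hypothetical names; the tier contents paraphrase the three tiers listed above.

```typescript
type Failure = "non-critical-timeout" | "core-api-failure" | "critical-state-missing";

interface DegradationPlan {
  tier: 1 | 2 | 3;
  ui: string;      // what the interface shows
  allowed: string; // what the user can still do
}

// Map each failure type onto its degradation tier.
function degrade(failure: Failure): DegradationPlan {
  switch (failure) {
    case "non-critical-timeout":
      return { tier: 1, ui: "loading indicator or basic information", allowed: "core operations" };
    case "core-api-failure":
      return { tier: 2, ui: "cached data such as history records", allowed: "core modules only" };
    case "critical-state-missing":
      return { tier: 3, ui: "static content", allowed: "basic queries only" };
  }
}

const planForApiFailure = degrade("core-api-failure");
```

Keeping the mapping in one exhaustive switch lets the TypeScript compiler flag any failure type that is added later without a corresponding tier.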
In another aspect, the present application also provides a causal-chain-based front-end state anomaly detection system. Referring to FIG. 5, the system comprises the following modules:
The static analysis and modeling module extracts key nodes such as event monitoring, asynchronous tasks, state change and the like through analyzing codes, constructs a causal relationship graph among the nodes, forms an expected causal chain model and serves as a reference for subsequent dynamic verification;
The dynamic monitoring and verification module records, at run time and through code hijacking, the actual paths of event triggering, asynchronous calls, and state changes in real time, and compares them with the expected causal chain model to detect anomalies;
The restoration module is used for combining a rule engine and an AI model, locating the root cause of the causal link abnormality, generating a restoration scheme based on a strategy library, and finally outputting an executable restoration suggestion;
The fault-tolerant and rollback module is used for recovering the system to a latest stable state through a snapshot rollback mechanism or degrading the function according to a preset strategy when the irreversible abnormality is detected, supporting the selection of rollback points according to time, version or user behavior dimension and recording a difference log;
The data storage module is used for persistently storing causal chain model versions, structured logs of abnormal events, repair strategy execution records and system state snapshots, supporting quick query and analysis and integrating the system into the existing monitoring system;
The alarm and notification module triggers notifications through a tiered strategy according to the severity and scope of impact of the anomaly, supports user-defined alarm rules, and generates periodic reports for team retrospectives;
And the verification and execution module simulates the execution effect of the repair strategy in the isolation environment, ensures that no new problem is introduced in repair, and records the verification result to optimize the subsequent strategy selection.
In another aspect, referring to FIG. 6, a block diagram of an electronic device is provided for embodiments of the present disclosure. The electronic device may include at least one processor, at least one network interface, a user interface, a memory, and at least one communication bus. The communication bus may be used to enable connection and communication among these components. The user interface may optionally also include a standard wired interface and a wireless interface, and the network interface may include, but is not limited to, a Bluetooth module, an NFC module, a Wi-Fi module, and the like. The processor may include one or more processing cores. The processor uses various interfaces and lines to connect the various parts of the electronic device, and performs the various functions of the electronic device and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory and by invoking data stored in the memory. The processor may be implemented in at least one hardware form among DSP, FPGA, and PLA. The processor may integrate one or a combination of a CPU, a GPU, a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed on the display screen; and the modem handles wireless communication. The processor can efficiently handle computation-intensive tasks such as static analysis (AST parsing and DAG construction), dynamic monitoring (function hijacking and real-time recording), and AI-assisted repair (LSTM inference and the CML model).
It will be appreciated that the modem may not be integrated into the processor and may be implemented by a single chip. The memory may include, among other things, RAM and ROM. Optionally, the memory comprises a non-transitory computer readable medium. The memory may be used to store instructions, programs, code sets, or instruction sets. The memory may include a stored program area that may store instructions for implementing the operating system, instructions for at least one function, instructions for implementing the various method embodiments described above, and the like, and a stored data area that may store data and the like referred to in the various method embodiments described above. The memory may optionally also be at least one storage device located remotely from the aforementioned processor. The memory, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and application programs. The processor may be configured to invoke the application stored in the memory and perform the methods of the various embodiments described above, where the memory is capable of meeting the storage requirements of code resolution, causal chain model, execution path record (chain hash structure), and repair policy library, supporting the persistence (IndexedDB) and rollback mechanisms of causal chain snapshots.
The present description also provides a computer-readable storage medium having instructions stored therein, which when executed on a computer or processor, cause the computer or processor to perform the steps of the above embodiments. The above-described constituent modules of the electronic apparatus may be stored in the computer-readable storage medium if implemented in the form of software functional units and sold or used as independent products.
The present description also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above embodiments.
The technical features in the present examples and embodiments may be arbitrarily combined without conflict.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes a plurality of computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present description are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, wireless, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates a plurality of available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)).
When implemented in hardware or firmware, the method flow is programmed into a hardware circuit to obtain the corresponding hardware circuit structure, which realizes the corresponding functions. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system to "integrate" it onto a PLD without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL), of which there is not just one but many. It will also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can be readily obtained merely by lightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.