Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those skilled in the pertinent art, the described embodiments may be modified in numerous different ways without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
In order to facilitate understanding of the technical solutions of the embodiments of the present application, the following describes related technologies of the embodiments of the present application. The following related technologies may be optionally combined with the technical solutions of the embodiments of the present application, which all belong to the protection scope of the embodiments of the present application.
The following terms will be used hereinafter:
Just-In-Time (JIT) compilation is a program optimization technique whose core idea is to dynamically compile intermediate code (such as bytecode) into native machine code while the program is running.
Performance analysis is a method for evaluating and optimizing software performance, and aims to identify bottlenecks and inefficient parts in a program so as to optimize and improve the overall performance of a system.
The operating system is the core software in the computer system for managing and controlling the computer hardware and software resources, provides an interface between the user and the computer hardware, and reasonably organizes and schedules the workflow of the computer, so that all the hardware and software resources of the computer system are utilized efficiently and coordinately.
Continuous performance analysis is a process of continuously collecting performance data from applications running in a production environment, with the aim of optimizing application performance and improving resource utilization efficiency. Unlike traditional performance analysis methods, continuous performance analysis can capture various metrics in real time, providing a more accurate and timely view of application performance and behavior.
In general, in order to ensure stable operation of software such as an operating system or an application program installed on a computing device, performance analysis or continuous performance analysis is required for the software. When performing performance analysis on software of a computing device, it is generally required to collect key performance indicators during the running of the software, such as the CPU utilization, the memory usage, and the frequency of function calls. By analyzing these key performance indicators, performance bottlenecks in the running software can be found in time, and corresponding optimization measures can be adopted. In some application scenarios, such as a distributed computing system or a time-series computing task that needs real-time processing, the real-time requirement on the performance analysis is high. In these scenarios, performance analysis must be performed quickly and efficiently so that performance problems can be responded to and solved promptly. Therefore, a technical solution is needed to quickly and efficiently identify performance anomalies during the running of software.
The related art includes some techniques for performing performance analysis on software. For example, some performance analysis platforms collect, store, and query performance data, and provide performance monitoring and analysis. Such a platform can support a plurality of programming languages, such as Python or Java, collect performance data of a running program through an integrated lightweight probe, upload the performance data to a server for display, and provide developers with visualizations such as flame graphs and performance overviews, which facilitates quick identification of performance bottlenecks of software. Although these performance analysis platforms provide one-stop visualization of performance data, they cannot process the acquired performance data in real time to quickly identify performance anomalies; that is, they cannot meet the real-time requirements of performance analysis in scenarios such as distributed computing systems or time-series computing tasks.
In view of the foregoing, an embodiment of the present application provides an anomaly analysis method for performing performance analysis on software running on a computing device. In the embodiment of the application, the performance abnormality detection rule is compiled into executable code that can be directly executed by the processor and is loaded in the memory of the computing device, so that the execution efficiency of the program is significantly improved, the efficiency of abnormality analysis of the software to be analyzed is further improved, and the requirement of real-time abnormality analysis in some scenarios is met.
In order to facilitate understanding of the embodiments of the present application, an application scenario of the anomaly analysis method provided by the embodiments of the present application is first described briefly. Fig. 1 shows a schematic diagram of an application scenario of an anomaly analysis method provided by an embodiment of the present application. As shown in fig. 1, the application scenario includes an anomaly processing device 110 and at least one computing device 120, with data communication between the anomaly processing device 110 and the at least one computing device 120. The at least one computing device 120 is a computing node for performing service processing in a distributed computing system, and the anomaly processing device 110 may be an anomaly processing node for handling anomalies in the distributed computing system. The anomaly processing device 110 and the computing device 120 may each take the form of a server or the like.
In one embodiment, the performance anomaly detection rule is generated by the anomaly processing device 110, where the performance anomaly detection rule is used to perform anomaly analysis on software running on the computing device 120. The anomaly processing device 110 sends the generated performance anomaly detection rule to each computing device 120 in the distributed computing system. After receiving the performance anomaly detection rule sent by the anomaly processing device 110, each computing device 120 compiles the performance anomaly detection rule to obtain a performance anomaly detection program and loads the performance anomaly detection program into a memory of the computing device 120, where the compiled performance anomaly detection program is an executable program. When any computing device 120 runs the software to be analyzed, the computing device obtains performance data of the software to be analyzed during running, calls the performance anomaly detection program stored in the memory, and detects the performance data by means of the performance anomaly detection program so as to analyze whether the software to be analyzed has a performance anomaly.
It should be noted that, the application scenario or the application example provided in the embodiment of the present application is for convenience of understanding, and the embodiment of the present application does not specifically limit the application of the technical solution. In addition, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related country and region, and are provided with corresponding operation entries for the user to select authorization or rejection.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the foregoing technical problems in detail with specific embodiments. The specific embodiments illustrated may be combined with one another and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 2 is a flowchart illustrating an anomaly analysis method according to an embodiment of the present application, where the method illustrated in fig. 2 may be applied to the computing device 120 in the application scenario illustrated in fig. 1, that is, the execution subject of the method illustrated in fig. 2 may be the computing device 120. As shown in fig. 2, the method may include step S201, step S202, and step S203.
Step S201, performance data of software to be analyzed when running on the computing device is obtained.
Step S202, calling a performance abnormality detection program of the computing device, wherein the performance abnormality detection program is an executable program that is obtained by compiling a performance abnormality detection rule and is loaded in a memory of the computing device, and is used for analyzing whether the software to be analyzed has a performance abnormality according to the performance data and the performance abnormality detection rule.
Step S203, acquiring an analysis result of the software to be analyzed from the performance abnormality detection program.
The software to be analyzed may be an application program running on the computing device, an operating system running on the computing device, or both an application program running on the computing device and an operating system running on the computing device.
By way of example, the performance data may include CPU utilization, memory usage, input/output (I/O) times, function call frequencies, etc. that may be used to perform performance analysis on the software to be analyzed. In implementing this embodiment, the corresponding performance data may be obtained according to actual requirements, and these are only illustrative of several possible performance data, and are not limiting on the present embodiment.
In addition, in the step S201, the trigger condition for acquiring the performance data of the software to be analyzed may be the start of the software to be analyzed. For example, after detecting that the software to be analyzed is started, an operation of acquiring performance data of the runtime thereof is performed. In step S201, the performance data of the software to be analyzed during the running may be obtained in real time, the performance data of the software to be analyzed during the running may be obtained periodically, or the performance data of the software to be analyzed during the running may be obtained in response to the data obtaining instruction. In implementation, the trigger condition for acquiring the performance data of the software to be analyzed and the acquisition mode of the performance data may be set according to actual requirements, which are just illustrative of several possible implementation modes and are not limiting on the scheme of the present application.
In one embodiment, the performance data of the software to be analyzed may be obtained from a performance log of the software to be analyzed.
In the embodiment of the application, after the performance data of the software to be analyzed when running on the computing device is obtained, the performance abnormality detection program is called from the memory of the computing device, so that the performance abnormality detection program checks the software to be analyzed based on the obtained performance data and analyzes whether the software to be analyzed has a performance abnormality.
The performance abnormality detection program is an executable program obtained by compiling a performance abnormality detection rule. Illustratively, the performance abnormality detection program is obtained by JIT-compiling the performance abnormality detection rule.
It should be noted that, in the embodiment of the present application, the memory for storing the performance abnormality detection program may be a random access memory (RAM), a read-only memory (ROM), or another type of memory, which will not be described in detail here.
In one embodiment, the performance anomaly detection rule may be generated by the computing device based on the historical performance data of the software to be analyzed, or may be generated by another device, for example, an anomaly processing device, based on the historical performance data of the software to be analyzed, and then sent to the computing device.
In the embodiment of the application, after the performance abnormality detection program is called, whether the software to be analyzed has a performance abnormality is analyzed according to the performance abnormality detection rule and the acquired performance data of the software to be analyzed.
The analysis result of the software to be analyzed may be that the software to be analyzed has a performance abnormality or the software to be analyzed does not have a performance abnormality, or in an implementation manner, if the software to be analyzed has a performance abnormality, the analysis result may further include a performance abnormality problem of the software to be analyzed. For example, the analysis result may be a performance problem that the software to be analyzed has abnormal resource allocation, or the analysis result may also be a performance problem that the software to be analyzed has memory fragmentation. Of course, the content of the analysis results is described herein by way of example only, and is not limiting of the performance problems of the software to be analyzed.
According to the method provided by the embodiment of the application, the performance abnormality detection program is loaded in the memory of the computing device and is an executable program obtained by compiling the performance abnormality detection rule. When the performance data of the software to be analyzed when running on the computing device is obtained, the performance abnormality detection program of the computing device is called, and by running the performance abnormality detection program, whether the software to be analyzed has a performance abnormality can be analyzed according to the performance data and the performance abnormality detection rule. In the embodiment of the application, the performance abnormality detection program is native machine code obtained by compilation and can therefore be directly executed by a processor, so that the execution efficiency of the program is significantly improved, the efficiency of abnormality analysis of the software to be analyzed is further improved, and the requirement of real-time abnormality analysis in some scenarios is met.
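The compile-once, call-many flow of steps S201 to S203 can be sketched as follows. This is only an illustration under stated assumptions: the rule syntax in `RULE_SOURCE` and the names `compile_rule` and `detector` are hypothetical and do not come from the application. Python's built-in `compile()` produces CPython bytecode rather than native machine code, so it is used here as a stand-in for the JIT compilation step, not as an instance of it.

```python
from typing import Callable, Dict

# Hypothetical rule text; the application does not specify a rule syntax.
RULE_SOURCE = "cpu_utilization >= 90.0 or memory_usage >= 0.8"


def compile_rule(source: str) -> Callable[[Dict[str, float]], bool]:
    """Compile the rule once and return a detector that can be called repeatedly."""
    # Stand-in for JIT compilation: this yields bytecode, not machine code.
    code = compile(source, "<performance-rule>", "eval")

    def detector(performance_data: Dict[str, float]) -> bool:
        # Builtins are withheld so the rule only reads the supplied metrics.
        # This is not a security sandbox; rules are assumed to be trusted.
        return bool(eval(code, {"__builtins__": {}}, dict(performance_data)))

    return detector


# The compiled detector is created once and kept in memory.
detector = compile_rule(RULE_SOURCE)

sample = {"cpu_utilization": 95.0, "memory_usage": 0.4}  # S201: acquired data
has_anomaly = detector(sample)                            # S202: call the program
print("performance anomaly" if has_anomaly else "no anomaly")  # S203: result
```

The point of the design is that parsing and compiling the rule happens once, when the rule arrives, while each new batch of performance data only pays the cost of one call into the already compiled detector.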
In general, different performance problems are determined by detecting different performance indexes, and for one performance problem, one performance index may be detected, or a plurality of performance indexes may be detected. Thus, for different performance problems, there are different performance index detection sub-rules, which may be generated based on one performance index or may be generated based on multiple performance indexes.
Therefore, in one embodiment, the performance abnormality detection rule may include performance index detection sub-rules corresponding to a plurality of performance problems. In this case, the performance abnormality detection program is specifically configured to calculate a performance index value corresponding to the performance data, check the performance index value against the performance index detection sub-rules, and determine whether the software to be analyzed has the performance problem corresponding to the performance index.
The performance problem refers to the problem that the efficiency, speed and stability of the software to be analyzed can be affected when the software to be analyzed is in the running process. For example, the performance problem may be a resource maldistribution problem or a memory fragmentation problem, etc.
In one embodiment, the performance anomaly detection rule may include a plurality of detection sub-rules, and one performance problem corresponds to one detection sub-rule. For example, the performance anomaly detection rule may include a performance problem 1 and its corresponding performance index detection sub-rule 1, a performance problem 2 and its corresponding performance index detection sub-rule 2, and a performance problem 3 and its corresponding performance index detection sub-rule 3.
A performance problem may be detected by only one performance index, or by a plurality of performance indexes. Therefore, in one embodiment, for any performance problem, the corresponding performance index detection sub-rule may be generated from one performance index, or may be composed of a plurality of performance indexes together. For example, continuing with the above example, performance index detection sub-rule 1 may be generated from performance index 1, or may be composed of both performance index 1 and performance index 2.
In the embodiment of the application, after the performance data of the software to be analyzed is obtained when the software runs on the computing equipment, the corresponding performance index value is calculated based on the specific data content of the performance data. Wherein, for different performance data, different calculation modes may be corresponding. For example, an average value of each data in the performance data may be calculated as the performance index value, a data distribution of each data in the performance data may be calculated as the performance index value, or a deviation value of each data from a reference value in the performance data may be calculated as the performance index value. The exemplary list of several possible ways of calculating the performance index value is not intended to limit the embodiments of the present application.
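The three calculation modes mentioned above (average, distribution, and deviation from a reference value) might be sketched as follows. The function names, the fixed-width bucketing used for the distribution, and the use of an absolute difference for the deviation are all assumptions made for illustration; the application does not prescribe them.

```python
from statistics import mean
from typing import Dict, List


def average_value(samples: List[float]) -> float:
    """Performance index value as the average of the samples."""
    return mean(samples)


def distribution(samples: List[float], bucket_width: float) -> Dict[float, int]:
    """Performance index value as a histogram keyed by bucket lower bound."""
    counts: Dict[float, int] = {}
    for sample in samples:
        lower = (sample // bucket_width) * bucket_width
        counts[lower] = counts.get(lower, 0) + 1
    return counts


def deviations(samples: List[float], reference: float) -> List[float]:
    """Performance index value as each sample's absolute deviation from a reference."""
    return [abs(sample - reference) for sample in samples]


cpu_samples = [20.0, 40.0, 60.0, 80.0]          # hypothetical CPU utilization, in percent
print(average_value(cpu_samples))                # 50.0
print(distribution(cpu_samples, 50.0))           # {0.0: 2, 50.0: 2}
print(deviations(cpu_samples, reference=50.0))   # [30.0, 10.0, 10.0, 30.0]
```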
In one embodiment, the performance data of the software to be analyzed obtained in the step S201 may be one performance data or may be multiple performance data. If multiple performance data of the software to be analyzed are obtained, calculating a performance index value corresponding to each performance data. That is, one performance data corresponds to one performance index value. Aiming at the situation, when the performance abnormality of the software to be analyzed is detected, each performance index value is detected respectively to determine whether the software to be analyzed has a corresponding performance problem.
For example, in one embodiment, the performance data of the software to be analyzed obtained in the step S201 includes performance data 1, performance data 2, and performance data 3, where the performance data 1, the performance data 2, and the performance data 3 are respectively different types of performance data. When calculating the performance index value corresponding to the performance data, the performance index value 1 corresponding to the performance data 1, the performance index value 2 corresponding to the performance data 2, and the performance index value 3 corresponding to the performance data 3 are calculated respectively, and the calculation modes corresponding to the performance index value 1, the performance index value 2, and the performance index value 3 can be determined according to the category of the corresponding performance data.
In the embodiment of the application, after the performance index value corresponding to the performance data of the software to be analyzed is calculated, the performance index value is detected based on the performance index detection sub-rule in the performance abnormality detection rule, so as to determine whether the performance problem corresponding to the performance index exists in the software to be analyzed.
Wherein, when the performance index value is detected based on the performance index detection sub-rule, various implementations are possible. For example, in one embodiment, after obtaining the performance index value corresponding to the performance data, a target performance index detection sub-rule having the performance index corresponding to the performance index value may be determined in the performance anomaly detection rule, then the performance index value is detected based on the target performance index detection sub-rule, and if the performance index value meets the target performance index detection sub-rule, it is determined that the performance problem corresponding to the performance index detection sub-rule exists in the software to be analyzed.
Alternatively, in another embodiment, after the performance index value corresponding to the performance data is obtained, the performance index value may be checked against each performance index detection sub-rule in turn, and if the performance index value satisfies any one of the performance index detection sub-rules, it is determined that the performance problem corresponding to that performance index detection sub-rule exists in the software to be analyzed.
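The two ways of applying sub-rules described above can be compared in a short sketch. The `SubRule` structure, the index names, and the thresholds are hypothetical, and each sub-rule is simplified to a single "value is greater than or equal to threshold" condition on one performance index.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SubRule:
    problem: str      # performance problem this sub-rule detects
    index: str        # performance index the sub-rule inspects
    threshold: float  # the sub-rule is met when value >= threshold

    def is_met(self, index_values: Dict[str, float]) -> bool:
        value = index_values.get(self.index)
        return value is not None and value >= self.threshold


SUB_RULES = [
    SubRule("resource allocation abnormality", "cpu_deviation", 30.0),
    SubRule("memory fragmentation", "small_block_ratio", 0.5),
]


def detect_targeted(index_values: Dict[str, float], rules: List[SubRule]) -> List[str]:
    """First approach: look up only the sub-rules that use each computed index."""
    by_index: Dict[str, List[SubRule]] = {}
    for rule in rules:
        by_index.setdefault(rule.index, []).append(rule)
    return [rule.problem
            for name, value in index_values.items()
            for rule in by_index.get(name, [])
            if value >= rule.threshold]


def detect_exhaustive(index_values: Dict[str, float], rules: List[SubRule]) -> List[str]:
    """Second approach: evaluate every sub-rule and keep the ones that are met."""
    return [rule.problem for rule in rules if rule.is_met(index_values)]


values = {"cpu_deviation": 35.0, "small_block_ratio": 0.2}
print(detect_targeted(values, SUB_RULES))    # ['resource allocation abnormality']
print(detect_exhaustive(values, SUB_RULES))  # ['resource allocation abnormality']
```

Both approaches report the same problems here; the targeted lookup avoids evaluating sub-rules whose index was not computed, while the exhaustive pass is simpler when the number of sub-rules is small.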
In the embodiment of the application, the performance index detection sub-rules corresponding to the performance problems are set in the performance abnormality detection rules, so that the performance abnormality detection program can perform abnormality analysis of the performance problems, the comprehensiveness of performing abnormality analysis on software to be analyzed is improved, the requirements on various different performance analyses can be met, and the flexibility is higher.
In one embodiment, the performance problem may be a resource allocation abnormality problem. For this performance problem, the performance abnormality detection program is specifically configured to calculate, according to the performance data, a deviation between the CPU utilization of each CPU core and a reference value, and to determine, in response to there being a CPU core whose deviation is greater than or equal to a first threshold, that the software to be analyzed has the resource allocation abnormality problem, where the reference value is an average value of the CPU utilization of a plurality of CPU cores.
When abnormality detection is performed on the resource allocation of the software to be analyzed, the performance data obtained in the step S201 is the CPU utilization of each CPU core while the software to be analyzed is running. Accordingly, for this performance problem, the corresponding performance index detection sub-rule may be that the deviation between the CPU utilization and the reference value is greater than or equal to the first threshold. A CPU core refers to a separate processing unit within the CPU. In a multi-core processor, each core may independently perform computing tasks, similar to a separate CPU.
In addition, the CPU utilization of each CPU core refers to the percentage of each CPU core that is used to perform a calculation task in a certain period of time, which is an important indicator for measuring the busy degree of the CPU core and reflects the workload of the CPU core. Therefore, when the resource allocation condition of the software to be analyzed needs to be subjected to abnormal analysis, the CPU utilization rate of each CPU core when the software to be analyzed runs can be obtained.
In one embodiment, after the CPU utilization of each CPU core of the software to be analyzed at the time of running is obtained, an average value of the CPU utilization of all the CPU cores is calculated as the above-mentioned reference value. And then, calculating the deviation between the CPU utilization rate of each CPU core and the reference value, judging whether the deviation between the CPU utilization rate of the CPU core and the reference value is larger than or equal to a first threshold value, and if the deviation is larger than or equal to the first threshold value, indicating that the resource allocation is unbalanced, namely the performance problem of abnormal resource allocation of the software to be analyzed exists.
The specific value of the first threshold may be set according to an actual application scenario, and the embodiment of the present application does not limit the specific value of the first threshold.
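A minimal sketch of this check is given below. The function name and the sample values are hypothetical, and the deviation is taken as an absolute difference, which is an assumption; the application only refers to "the deviation" between each core's CPU utilization and the reference value.

```python
from typing import List


def has_resource_allocation_anomaly(core_utilization: List[float],
                                    first_threshold: float) -> bool:
    """Return True if any core deviates from the mean utilization by >= first_threshold."""
    if not core_utilization:
        return False
    # Reference value: average CPU utilization across all cores.
    reference = sum(core_utilization) / len(core_utilization)
    return any(abs(u - reference) >= first_threshold for u in core_utilization)


# One core at 95% while the others idle: mean is 40, largest deviation is 55.
print(has_resource_allocation_anomaly([95.0, 20.0, 25.0, 20.0], first_threshold=30.0))  # True
# Evenly loaded cores: mean is 40, largest deviation is 2.
print(has_resource_allocation_anomaly([42.0, 38.0, 40.0, 40.0], first_threshold=30.0))  # False
```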
In the embodiment of the application, the CPU utilization of each CPU core is obtained while the software to be analyzed is running, and by determining whether there is a CPU core whose CPU utilization deviates from the reference value by an amount greater than or equal to the first threshold, unbalanced use of the CPU cores (for example, excessive use of a particular core) can be accurately identified, so that the resource allocation of the software to be analyzed can be adjusted in time to balance the load and improve overall efficiency.
In one embodiment, the performance problem may include a memory fragmentation problem, and for the performance problem, the performance abnormality detection program is specifically configured to calculate, according to performance data, a ratio between a number of target memory blocks and a total number of memory blocks, and determine, in response to the ratio being greater than or equal to a third threshold, that there is a memory fragmentation problem in software to be analyzed, where the target memory blocks are memory blocks having a memory occupation space that is less than or equal to a second threshold.
Memory fragmentation refers to a phenomenon that a memory space is discontinuous in a memory of a computing device due to discontinuous memory allocation and release, so that a plurality of small and discontinuous memory blocks are formed, and the memory blocks are too small to be effectively utilized but cannot be combined, so that the memory utilization rate is reduced. The memory block refers to a section of continuous storage space allocated in the memory, and the memory occupied space of the memory block refers to the size of the memory actually occupied by the memory block.
Therefore, when analyzing whether the performance problem of memory fragmentation exists during the running of the software to be analyzed, the performance data obtained in the step S201 may be the memory occupied space of each memory block in the memory. Correspondingly, for this performance problem, the corresponding performance index detection sub-rule may be that the ratio of the number of memory blocks whose memory occupied space is less than or equal to the second threshold to the total number of memory blocks is greater than or equal to the third threshold. In general, the larger the proportion of small memory blocks in the memory, the more serious the memory fragmentation. Whether the performance problem of memory fragmentation exists during the running of the software to be analyzed can therefore be judged by detecting whether the ratio of the number of small memory blocks (i.e., the target memory blocks) to the total number of memory blocks is greater than or equal to the third threshold.
In one embodiment, after the memory occupied space of each memory block during the running of the software to be analyzed is obtained, the memory occupied space of each memory block is compared with the second threshold, and the target memory blocks whose memory occupied space is less than or equal to the second threshold are screened out, that is, the small memory blocks are screened out. The ratio of the number of target memory blocks to the total number of memory blocks is then calculated and compared with the third threshold. If the ratio is greater than or equal to the third threshold, the proportion of small memory blocks in the memory is too high, that is, the performance problem of memory fragmentation exists during the running of the software to be analyzed.
The specific values of the second threshold and the third threshold may be set according to an actual application scenario, and the embodiment of the present application does not limit the specific values of the second threshold and the third threshold.
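The memory fragmentation check can be sketched as follows, assuming the performance data is simply a list of block sizes in bytes. The function name, the block sizes, and the threshold values are hypothetical.

```python
from typing import List


def has_memory_fragmentation(block_sizes: List[int],
                             second_threshold: int,
                             third_threshold: float) -> bool:
    """Return True if the share of small blocks is >= third_threshold."""
    if not block_sizes:
        return False
    # Target memory blocks: occupied space less than or equal to second_threshold.
    target_count = sum(1 for size in block_sizes if size <= second_threshold)
    ratio = target_count / len(block_sizes)
    return ratio >= third_threshold


# 5 of 8 blocks are 64 bytes or smaller, so the ratio is 0.625.
fragmented = [16, 32, 4096, 24, 8192, 48, 64, 16384]
print(has_memory_fragmentation(fragmented, second_threshold=64, third_threshold=0.5))  # True

# 1 of 4 blocks is small, so the ratio is 0.25.
healthy = [4096, 8192, 16, 16384]
print(has_memory_fragmentation(healthy, second_threshold=64, third_threshold=0.5))     # False
```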
In the embodiment of the application, the memory occupied space of each memory block is obtained while the software to be analyzed is running, and by detecting whether the proportion of target memory blocks, whose memory occupied space is less than or equal to the second threshold, is greater than or equal to the third threshold, whether the software to be analyzed has the performance problem of memory fragmentation can be accurately judged, so that corresponding optimization measures, such as adjusting the memory allocation strategy, can be adopted in time to reduce memory fragmentation and improve system performance.
In one embodiment, if it is determined that the software to be analyzed has a performance abnormality, an alarm to a target person may be triggered so that the performance abnormality can be resolved in time. Accordingly, the method provided by the embodiment of the application further includes: in response to the analysis result indicating that the software to be analyzed has a performance abnormality, generating performance abnormality alarm information for the software to be analyzed, and sending the performance abnormality alarm information to a target device. The performance abnormality alarm information carries the performance data of the software to be analyzed and the call stack information of the software to be analyzed at runtime, and the target device is used for determining the cause of the performance abnormality of the software to be analyzed according to the performance data and the call stack information.
For example, in one embodiment, the alert information may be sent to the target device in any manner, such as a short message, a mail, a voice message, or the like, or the alert information may be sent to a specified address and notify the user to access the specified address through the target device to obtain the alert information.
The call stack information is a sequential record of the function calls made by the software to be analyzed while running. It records all function or process calls of the software to be analyzed from the start of execution up to a specific moment, including the order and hierarchy of the calls. When the software to be analyzed has a performance abnormality, the call stack information collected during running can be used to learn which functions or processes are called most frequently or consume the most resources, thereby assisting in analyzing the cause of the performance abnormality of the software to be analyzed.
In the embodiment of the application, the alarm information is used for prompting that the software to be analyzed has abnormal performance on one hand, and sending the related information for analyzing the cause of the abnormal performance to the target equipment on the other hand, so as to analyze the cause of the abnormal performance.
In one embodiment, if an analysis result of analyzing the software to be analyzed through the performance abnormality detection program indicates that the software to be analyzed has the performance abnormality, call stack information of the software to be analyzed in a running period is obtained, performance abnormality warning information is generated based on the obtained call stack information and corresponding performance data of the software to be analyzed, and the generated performance abnormality warning information is sent to the target device.
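Assembling the alarm information might look like the sketch below. The JSON layout, the field names, and the function `build_alarm_info` are hypothetical; the application only states that the alarm information carries the performance data and the call stack information. Transport to the target device (short message, mail, and so on) is left out.

```python
import json
import time
from typing import Dict, List, Optional


def build_alarm_info(software: str,
                     has_anomaly: bool,
                     problem: str,
                     performance_data: Dict[str, float],
                     call_stack: List[str]) -> Optional[str]:
    """Return serialized alarm information, or None when no anomaly was found."""
    if not has_anomaly:
        return None
    alarm = {
        "software": software,
        "problem": problem,
        "timestamp": time.time(),
        "performance_data": performance_data,  # data that triggered the sub-rule
        "call_stack": call_stack,              # outermost call first
    }
    return json.dumps(alarm)


payload = build_alarm_info(
    software="order-service",                  # hypothetical software name
    has_anomaly=True,
    problem="memory fragmentation",
    performance_data={"small_block_ratio": 0.625},
    call_stack=["main", "handle_request", "allocate_buffer"],
)
print(payload)
```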
After receiving the performance abnormality warning information sent by the computing device, the target device responds to the graph generation instruction, and generates a corresponding analysis graph based on the performance data and the call stack information in the performance abnormality warning information, wherein the analysis graph is used for assisting in analyzing the reasons for causing the performance abnormality of the software to be analyzed.
For example, in one embodiment, a flame graph may be generated based on the performance data and the call stack information, and the flame graph is used to assist in analyzing what causes the performance abnormality of the software to be analyzed. Specifically, the flame graph includes a plurality of call stacks, and any call stack includes a plurality of cells. The cells are in one-to-one correspondence with a plurality of functions (or processes), the arrangement order of the cells characterizes the call relationship among the functions (or processes), and the width of any cell is related to the performance data of the corresponding function (or process) and characterizes the probability that the function (or process) causes the abnormal state. The wider the cell, the greater the probability that the corresponding function (or process) causes the performance abnormality of the software to be analyzed.
Therefore, in the embodiment of the application, related personnel can analyze the abnormal reason information causing the abnormal performance of the software to be analyzed by combining the flame graph generated based on the performance data and the call stack information, and process the performance problem of the software to be analyzed based on the abnormal reason information obtained by analysis.
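To make the relationship between call stacks and cell widths concrete, the following sketch computes the relative width of every cell from a list of sampled call stacks. It assumes, purely for illustration, that the performance data is a set of stack samples and that width is the fraction of samples in which a call path appears; the function name `frame_widths` is hypothetical, and the embodiment may derive widths from other performance data.

```python
from collections import Counter


def frame_widths(samples):
    """Map each call path (tuple of function names, outermost first) to the
    fraction of samples whose stack starts with that path."""
    total = len(samples)
    if total == 0:
        return {}
    counts = Counter()
    for stack in samples:
        # Every prefix of a sampled stack is one cell in the flame graph.
        for depth in range(1, len(stack) + 1):
            counts[tuple(stack[:depth])] += 1
    return {path: n / total for path, n in counts.items()}
```

A cell whose path covers a large fraction of the samples is drawn wide, which is what directs the analyst to the functions most likely to be responsible for the abnormality.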
In the embodiment of the application, when the analysis result of the software to be analyzed indicates that the software to be analyzed has abnormal performance, the performance data of the software to be analyzed and the call stack information of the software to be analyzed in the running process are carried in the performance abnormality alarming information and sent to the target equipment, so that the target equipment analyzes the reason causing the abnormal performance of the software to be analyzed based on the performance data and the call stack information. In addition, the call stack information and the performance data are directly carried in the performance abnormality warning information and sent to the target equipment, so that the time for waiting for manually collecting the data for carrying out abnormality analysis is reduced, the reason for generating the performance abnormality can be rapidly positioned, and the response efficiency of the system is further improved.
In one implementation manner, the performance abnormality detection rule is generated by an abnormality processing device. Therefore, in the embodiment of the present application, before step S202 is executed and the performance abnormality detection program of the computing device is invoked, the method provided by the embodiment of the present application further includes the steps of: receiving the performance abnormality detection rule sent by the abnormality processing device, compiling the performance abnormality detection rule, and loading the compiled performance abnormality detection program into the memory of the computing device, where the performance abnormality detection rule is generated by the abnormality processing device based on historical performance data, or is obtained by updating the original performance abnormality detection rule based on user feedback information.
The abnormality processing device, also referred to herein as the exception handling device or the anomaly processing device, is a device independent of the computing device on which the software to be analyzed runs. It is used for generating a performance abnormality detection rule, or for optimizing an existing performance abnormality detection rule based on user feedback information, and for sending the resulting rule to the computing device.
In the embodiment of the present application, the performance anomaly detection rule sent by the anomaly processing device to the computing device may be an initially generated performance anomaly detection rule, or may be a performance anomaly detection rule obtained by adjusting the original performance anomaly detection rule based on user feedback, or may be a performance anomaly detection rule in which a new detection rule is newly added to the original performance anomaly detection rule.
Illustratively, after receiving the performance abnormality detection rule sent by the abnormality processing device, the computing device compiles the rule, for example by JIT compilation, to obtain an executable program corresponding to the performance abnormality detection rule.
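As a rough illustration of turning a received rule into something executable, the sketch below uses Python's built-in `compile()` on a rule written as a boolean expression. Several things here are assumptions rather than part of the embodiment: the rule syntax, the metric names, and the use of Python bytecode as a stand-in for the native machine code a real JIT compiler would emit.

```python
def compile_rule(rule_source, name="<rule>"):
    """Compile a rule expression once and return a callable detection program."""
    # compile() parses the rule a single time; later evaluations reuse the code object.
    code = compile(rule_source, name, "eval")

    def program(metrics):
        # Rules are assumed to come from a trusted abnormality processing device;
        # builtins are emptied so the expression can only refer to metric names.
        return bool(eval(code, {"__builtins__": {}}, dict(metrics)))

    return program
```

For instance, `compile_rule("cpu_utilization > 0.9 or memory_occupancy > 0.8")` returns a callable that can be applied to each new batch of metrics without re-parsing the rule text.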
In one embodiment, the exception handling device may generate the performance abnormality detection rule as follows. It first collects historical performance data of the software to be analyzed from a historical performance monitoring database, where the historical performance data may be key performance indicators such as historical CPU utilization, historical memory occupancy, historical function call frequency, and historical I/O counts. The collected historical performance data is then cleaned to improve data quality. The data cleaning may include processing steps such as missing value handling, deduplication, and data format normalization.
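The three cleaning steps named above can be sketched in a few lines. The record layout (`timestamp`, `cpu_utilization`) and the choice to drop incomplete records rather than impute them are assumptions made only for this example.

```python
def clean_records(records, required=("timestamp", "cpu_utilization")):
    """Drop incomplete records, normalize formats, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Missing value handling: skip records lacking any required field.
        if any(rec.get(key) is None for key in required):
            continue
        # Format normalization: accept 0.85, "0.85", or "85%" as the same value.
        cpu = rec["cpu_utilization"]
        if isinstance(cpu, str):
            cpu = float(cpu.rstrip("%")) / 100.0 if cpu.endswith("%") else float(cpu)
        row = (int(rec["timestamp"]), round(float(cpu), 4))
        # Deduplication: keep only the first occurrence of each normalized row.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append({"timestamp": row[0], "cpu_utilization": row[1]})
    return cleaned
```

Normalizing before deduplicating matters here: two records that differ only in formatting would otherwise both survive.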
After the collected historical performance data has been cleaned, it is analyzed with an abnormality analysis algorithm such as cluster analysis, the performance indexes corresponding to the performance problems are identified, and the normal fluctuation range of each performance index is determined. Based on the result of the cluster analysis, an index threshold at which a performance problem exists can be determined for each performance index, that is, a performance index detection threshold corresponding to each performance problem is determined, and a corresponding performance index detection sub-rule is generated according to a set template. For example, one possible form of a performance index detection sub-rule is: when the index value of performance index A is detected to exceed the target threshold, the performance problem corresponding to performance index A is considered to exist. Another possible form is: when the share of a certain call stack in the total hotspots of process B is detected to exceed a preset threshold, a performance problem related to that call stack is considered to exist.
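One simple way to derive a threshold from cluster analysis is to split the historical values of an index into two clusters and place the threshold midway between the cluster centers. The sketch below does this with a one-dimensional two-means procedure and then fills a rule template. The clustering variant, the midpoint choice, and the template text are all assumptions; the embodiment does not specify them.

```python
RULE_TEMPLATE = "{metric} > {threshold:.4f}"  # hypothetical sub-rule template


def two_means_threshold(values, iterations=50):
    """Cluster 1-D values into a low and a high group; return the midpoint
    between the two cluster centers."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("values are identical; no threshold can be derived")
    for _ in range(iterations):
        boundary = (lo + hi) / 2
        low = [v for v in values if v <= boundary]
        high = [v for v in values if v > boundary]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def make_sub_rule(metric, values):
    """Generate a detection sub-rule for one performance index from its history."""
    return RULE_TEMPLATE.format(metric=metric, threshold=two_means_threshold(values))
```

With a history in which normal and abnormal readings are well separated, the generated threshold falls in the gap between the two groups.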
After the performance index detection sub-rules corresponding to the performance problems are obtained, the performance index detection sub-rules corresponding to all the performance problems are combined into a performance abnormality detection rule and sent to each computing device, so that each computing device performs performance analysis on the software to be analyzed based on the performance abnormality detection rule.
In general, while performance analysis is performed on the software to be analyzed, a certain performance index detection sub-rule may turn out to detect its performance problem inaccurately, or new performance problems may appear for which the existing performance abnormality detection rule contains no corresponding performance index detection sub-rule. In either case, the existing performance abnormality detection rule needs to be optimized according to actual requirements.
Therefore, in one embodiment, if the related personnel identify an accuracy problem in the performance index detection sub-rule corresponding to a certain performance problem, the exception handling device may be triggered to optimize the rule. In response to the optimization instruction triggered by the related personnel, the exception handling device can collect the historical performance data again and re-analyze it, so as to adjust the detection threshold of the performance index and update the corresponding performance index detection sub-rule. In another embodiment, the optimization instruction may further carry an adjustment suggestion for the performance index detection sub-rule, and the exception handling device, in response to the optimization instruction, adjusts the performance index detection sub-rule according to the adjustment suggestion carried in the instruction.
Illustratively, after the optimization of the performance abnormality detection sub-rule is completed, the abnormality processing device uses the optimized performance abnormality detection sub-rule to replace the corresponding sub-rule in the performance abnormality detection rule, and sends the replaced whole performance abnormality detection rule to each computing device.
In one embodiment, if a new performance problem occurs, the exception handling device may be triggered to generate a performance index detection sub-rule corresponding to the new performance problem, and the performance index detection sub-rule corresponding to the new performance problem is added to the original performance exception detection rule and sent to each computing device together.
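Both update paths, replacing an optimized sub-rule and adding a sub-rule for a new problem, end with the whole rule being sent to every computing device. A minimal sketch, assuming the rule is represented as a mapping from performance problem to sub-rule text and that each device is reachable through a send callable (both are assumptions made for illustration):

```python
def upsert_sub_rule(rule, problem, sub_rule):
    """Return a new rule in which the sub-rule for `problem` is replaced or added."""
    updated = dict(rule)  # leave the previous rule untouched
    updated[problem] = sub_rule
    return updated


def distribute(rule, devices):
    """Send the complete rule to every device; `devices` maps a device name
    to a callable that delivers the rule to that device."""
    for send in devices.values():
        send(dict(rule))
```

Sending the complete rule rather than a delta keeps every computing device on an identical rule set, which is the consistency property the centralized design relies on.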
In the embodiment of the application, the abnormality processing device generates the performance abnormality detection rule and sends it to each computing device, so that the performance abnormality detection rule is managed centrally and every computing device is guaranteed to use the same rule. This reduces false detections caused by inconsistent performance abnormality detection rules and improves both the reliability of the whole system and the accuracy of performance detection.
In the embodiment of the application, the abnormality processing device can continuously optimize the performance abnormality detection rule based on user feedback and issue the optimized rule to each computing device. Thus, each computing device may receive performance abnormality detection rules multiple times, and each newly received rule needs to be compiled. Therefore, the method provided by the embodiment of the application further comprises the steps of recompiling the updated performance abnormality detection rule, and updating the performance abnormality detection program in the computing device to the recompiled performance abnormality detection program.
Wherein in one embodiment, the updated performance anomaly detection rules are compiled using JIT compilation techniques. After the new performance abnormality detection rule is recompiled, the new performance abnormality detection rule is loaded into the memory of the computing device, and the performance abnormality detection program originally loaded in the memory is replaced. That is, in the embodiment of the present application, only the latest compiled performance abnormality detection program is saved in the memory.
In the embodiment of the application, each time a new performance abnormality detection rule is recompiled, the performance abnormality detection program stored in the memory of the computing device is updated to the recompiled program; that is, the original program in memory is replaced by the most recently compiled version. The computing device can therefore always perform abnormality analysis with the latest performance abnormality detection program, which improves the accuracy and efficiency of the analysis. In addition, because the old program is replaced rather than kept, the memory footprint on the computing device is reduced and resource utilization is optimized.
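The replace-in-memory behavior can be sketched with a small holder object that keeps exactly one compiled program at a time. The class name `DetectorSlot`, the version counter, and the use of Python's `compile()` as a stand-in for JIT compilation are assumptions made for illustration.

```python
import threading


class DetectorSlot:
    """Holds only the most recently compiled detection program."""

    def __init__(self):
        self._lock = threading.Lock()
        self._program = None
        self._version = 0

    def update(self, rule_source):
        # Compile first: if the new rule is invalid, the old program stays loaded.
        new_program = compile(rule_source, "<rule>", "eval")
        with self._lock:
            self._program = new_program  # the old program becomes garbage
            self._version += 1
            return self._version

    def analyze(self, metrics):
        with self._lock:
            program = self._program
        if program is None:
            raise RuntimeError("no detection program loaded")
        # Rules are assumed to come from a trusted source.
        return bool(eval(program, {"__builtins__": {}}, dict(metrics)))
```

Because the swap happens under a lock and only after compilation succeeds, an analysis running concurrently sees either the old program or the new one, never a partially updated state.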
In order to facilitate understanding of the method provided by the embodiment of the present application, a schematic flow chart of the method provided by the embodiment of the present application will be described below. Fig. 3 is a schematic flow chart of an anomaly analysis method provided in an embodiment of the present application, where, as shown in fig. 3, an anomaly processing device collects historical performance index data from a historical database, determines a performance anomaly detection sub-rule corresponding to each performance problem through an anomaly analysis algorithm such as cluster analysis, and generates codes of each performance problem and the corresponding performance anomaly detection sub-rule, so as to obtain a performance anomaly detection rule, and sends the performance anomaly detection rule to each computing device. After receiving the performance abnormality detection rule sent by the abnormality processing device, the computing device compiles the performance abnormality detection rule by using a JIT compiling technology to obtain a corresponding performance abnormality detection program, and loads the corresponding performance abnormality detection program into a memory of the computing device.
When the software to be analyzed runs on the computing device, the performance data of the software to be analyzed during running is obtained, and the performance abnormality detection program is called from the memory of the computing device to perform the analysis. For example, a performance index value corresponding to the performance data can be calculated, and the performance index value is checked against the performance index detection sub-rule to determine whether the software to be analyzed has the performance problem corresponding to that performance index, thereby obtaining the analysis result of the performance abnormality detection program for the software to be analyzed. If the analysis result indicates that the software to be analyzed has a performance abnormality, performance abnormality alarm information for the software to be analyzed is generated and sent to the target device.
Corresponding to the application scenario and method of the method provided by the embodiment of the present application, the embodiment of the present application further provides an anomaly analysis system, and fig. 4 shows a schematic structural diagram of the anomaly analysis system provided by the embodiment of the present application, and as shown in fig. 4, the anomaly analysis system includes an anomaly processing device 110 and at least one computing device 120.
The anomaly processing device 110 is configured to send a performance abnormality detection rule to each computing device 120. Each computing device 120 is configured to compile the received performance abnormality detection rule, load the performance abnormality detection program obtained by compiling into memory, obtain performance data of the software to be analyzed at run time, and call the performance abnormality detection program to analyze, according to the performance data and the performance abnormality detection rule, whether the software to be analyzed has a performance abnormality. The computing device 120 is also configured to obtain the analysis result of the performance abnormality detection program for the software to be analyzed.
Wherein, in one embodiment, the above-mentioned exception handling device 110 is further configured to generate a performance exception detection rule based on the historical performance data.
Optionally, in an embodiment, the exception handling device 110 is further configured to update the original performance abnormality detection rule based on received user feedback information and send the updated performance abnormality detection rule to the computing device 120; the computing device 120 is further configured to recompile the updated performance abnormality detection rule and update the performance abnormality detection program in the computing device 120 to the recompiled performance abnormality detection program.
Optionally, in one embodiment, the performance anomaly detection rule includes a performance index detection sub-rule corresponding to a plurality of performance problems, and the performance anomaly detection program is specifically configured to calculate a performance index value corresponding to the performance data, detect the performance index value based on the performance index detection sub-rule, and determine whether the performance problem corresponding to the performance index exists in the software to be analyzed.
Optionally, in one embodiment, the performance problem includes a resource allocation abnormality problem, and the performance abnormality detection program is specifically configured to calculate, according to the performance data, a deviation between the CPU utilization of each CPU core and a reference value, where the reference value is the average of the CPU utilizations of the respective CPU cores, and to determine that the software to be analyzed has the resource allocation abnormality problem in response to there being a CPU core whose deviation is greater than or equal to a first threshold.
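The resource allocation check reduces to a few lines of arithmetic. In the sketch below the deviation is taken as the absolute difference from the mean, which is an assumption; the text does not say whether the deviation is signed, absolute, or relative. The function and parameter names are likewise illustrative.

```python
def has_resource_allocation_problem(core_utilization, first_threshold):
    """Return True if any core deviates from the mean utilization by at least
    `first_threshold` (absolute deviation is assumed)."""
    if not core_utilization:
        return False
    reference = sum(core_utilization) / len(core_utilization)
    return any(abs(u - reference) >= first_threshold for u in core_utilization)
```

For four cores at 0.9, 0.1, 0.1, and 0.1, the reference value is 0.3 and the first core deviates by 0.6, so a first threshold of 0.5 would flag the software as having unevenly distributed load.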
Optionally, in an embodiment, the performance problem includes a memory fragmentation problem, and the performance anomaly detection program is specifically configured to calculate, according to the performance data, a ratio between a number of target memory blocks and a total number of memory blocks, where the target memory blocks are memory blocks with a memory occupation space smaller than or equal to a second threshold, and determine that the software to be analyzed has the memory fragmentation problem in response to the ratio being greater than or equal to a third threshold.
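The memory fragmentation check can be written the same way. The sketch assumes the performance data exposes a list of memory block sizes; the function and parameter names are hypothetical, while the two comparisons follow the thresholds described above.

```python
def has_memory_fragmentation(block_sizes, second_threshold, third_threshold):
    """Return True if the share of small blocks (size <= second_threshold)
    among all blocks is at least `third_threshold`."""
    if not block_sizes:
        return False
    small = sum(1 for size in block_sizes if size <= second_threshold)
    return small / len(block_sizes) >= third_threshold
```

With block sizes of 16, 32, 4096, 8, and 64 bytes and a second threshold of 64, four of the five blocks count as small, giving a ratio of 0.8; any third threshold at or below 0.8 would report fragmentation.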
Optionally, in an embodiment, the computing device 120 is further configured to generate, in response to the analysis result that the software to be analyzed has the performance abnormality, performance abnormality alert information for the software to be analyzed, where the performance abnormality alert information carries performance data of the software to be analyzed and call stack information when the software to be analyzed runs, and send the performance abnormality alert information to a target device, where the target device is configured to determine, according to the performance data and the call stack information, a cause of the performance abnormality of the software to be analyzed.
The functions of the computing device and the exception handling device in the embodiments of the present application may be described correspondingly in the above methods, and have corresponding beneficial effects, which are not described herein.
Corresponding to the application scenario and the method provided by the embodiment of the application, the embodiment of the application also provides an abnormality analysis device, which comprises a first acquisition module, a calling module and a second acquisition module. The first acquisition module is used for acquiring performance data of the software to be analyzed when the software to be analyzed runs on the computing device. The calling module is used for calling a performance abnormality detection program of the computing device to analyze, according to the performance data and the performance abnormality detection rule, whether the software to be analyzed has a performance abnormality, where the performance abnormality detection program is an executable program obtained by compiling the performance abnormality detection rule and loaded in the memory of the computing device. The second acquisition module is used for acquiring the analysis result of the performance abnormality detection program for the software to be analyzed.
Optionally, in one embodiment, the performance anomaly detection rule includes a performance index detection sub-rule corresponding to a plurality of performance problems, and the performance anomaly detection program is specifically configured to calculate a performance index value corresponding to the performance data, detect the performance index value based on the performance index detection sub-rule, and determine whether the performance problem corresponding to the performance index exists in the software to be analyzed.
Optionally, in one embodiment, the performance problem includes a resource allocation abnormality problem, and the performance abnormality detection program is specifically configured to calculate, according to the performance data, a deviation between the CPU utilization of each CPU core and a reference value, where the reference value is the average of the CPU utilizations of the respective CPU cores, and to determine that the software to be analyzed has the resource allocation abnormality problem in response to there being a CPU core whose deviation is greater than or equal to a first threshold.
Optionally, in an embodiment, the performance problem includes a memory fragmentation problem, and the performance anomaly detection program is specifically configured to calculate, according to the performance data, a ratio between a number of target memory blocks and a total number of memory blocks, where the target memory blocks are memory blocks with a memory occupation space smaller than or equal to a second threshold, and determine that the software to be analyzed has the memory fragmentation problem in response to the ratio being greater than or equal to a third threshold.
Optionally, in an implementation manner, the device provided by the embodiment of the application further comprises a receiving module, a first compiling module and a loading module, wherein the receiving module is used for receiving the performance abnormality detection rule sent by the abnormality processing device, the performance abnormality detection rule is generated by the abnormality processing device based on historical performance data or is obtained by updating the original performance abnormality detection rule based on user feedback information, the first compiling module is used for compiling the performance abnormality detection rule, and the loading module is used for loading the compiled performance abnormality detection program into a memory of the computing device.
Optionally, in an implementation manner, the device provided by the embodiment of the application further comprises a second compiling module and an updating module, wherein the second compiling module is used for recompiling the updated performance abnormality detection rule, and the updating module is used for updating the performance abnormality detection program in the computing device to the recompiled performance abnormality detection program.
Optionally, in one implementation manner, the device provided by the embodiment of the application further comprises a generating module and a sending module, wherein the generating module is used for generating, in response to the analysis result indicating that the software to be analyzed has a performance abnormality, performance abnormality alarm information for the software to be analyzed, the performance abnormality alarm information carrying performance data of the software to be analyzed and call stack information of the software to be analyzed at run time, and the sending module is used for sending the performance abnormality alarm information to a target device, wherein the target device is used for determining the cause of the performance abnormality of the software to be analyzed according to the performance data and the call stack information.
The functions of each module in the device provided by the embodiment of the application can be referred to the corresponding description in the method, and have corresponding beneficial effects, and are not repeated here.
Fig. 5 is a block diagram of an electronic device for implementing an embodiment of the application. As shown in fig. 5, the electronic device comprises a memory 501 and a processor 502, the memory 501 storing a computer program executable on the processor 502. The processor 502, when executing the computer program, implements the methods in the above-described embodiments. The number of memory 501 and processors 502 may be one or more. In a specific implementation, the electronic device may further include a communication interface 503, configured to communicate with an external device for performing data interactive transmission.
In a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are implemented independently, the memory 501, the processor 502, and the communication interface 503 may be connected to each other and communicate with each other through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 5, but this does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are integrated on a chip, the memory 501, the processor 502, and the communication interface 503 may perform communication with each other through internal interfaces.
The embodiment of the application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the method provided in the embodiment of the application.
The embodiment of the application provides a computer program product, comprising a computer program, which when being executed by a processor, realizes the method provided in the embodiment of the application.
The embodiment of the application also provides a chip, which comprises a processor and is used for calling the instructions stored in the memory from the memory and running the instructions stored in the memory, so that the communication equipment provided with the chip executes the method provided by the embodiment of the application.
The embodiment of the application also provides a chip which comprises an input interface, an output interface, a processor and a memory, wherein the input interface, the output interface, the processor and the memory are connected through an internal connection path, the processor is used for executing codes in the memory, and when the codes are executed, the processor is used for executing the method provided by the application embodiment.
It should be appreciated that the processor described above may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor or the like. It is noted that the processor may be a processor supporting the Advanced RISC Machines (ARM) architecture.
Further alternatively, the memory may include a read-only memory and a random access memory. The memory may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may include read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory, among others. The volatile memory may include random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, for example, static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may integrate and combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict each other.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Any process or method described in flow charts or otherwise herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved.
Logic and/or steps described in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. All or part of the steps of the methods of the embodiments described above may be completed by a program instructing the associated hardware; when executed, the program performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules described above, if implemented in the form of software functional modules and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is merely an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various changes or substitutions within the technical scope of the present application, and these should be covered in the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.