Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for monitoring an operation condition, which can predict the operation condition of a monitored system and, by combining the index association relationship, issue intelligent early warnings in advance in complex scenarios.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of monitoring an operation condition.
The method for monitoring the operation condition of the embodiment of the invention comprises the following steps:
acquiring index data of a monitored system;
acquiring an index association relationship corresponding to the index data;
predicting the predicted operation condition of the monitored system according to the index data and the index association relationship;
and sending out early warning based on the predicted operation condition and an early warning threshold value.
Optionally, the acquiring of the index data of the monitored system includes:
collecting system operation data of the monitored system and server operation data of each server; and
extracting index data corresponding to reference indexes from the system operation data and the server operation data.
Optionally, obtaining an index association relationship corresponding to the index data includes:
acquiring a historical early warning result of the monitored system; and
analyzing the historical early warning result by using a correlation degree analysis method to obtain the index association relationship among the reference indexes.
Optionally, predicting the predicted operation condition of the monitored system according to the index data and the index association relationship includes:
performing a calculation on the index data and the index association relationship based on a logistic regression model to obtain the predicted operation condition of the monitored system.
Optionally, the early warning threshold includes dynamic thresholds of the reference indexes corresponding to respective time periods; and
issuing an early warning based on the predicted operating condition and an early warning threshold, comprising:
obtaining early warning results of the reference indexes based on the predicted operation condition and the dynamic threshold;
sending out an early warning corresponding to the reference index according to the early warning result; or sending out early warning corresponding to the reference index according to the index data and the dynamic threshold;
and recording the early warning result.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an apparatus for monitoring an operation condition.
The device for monitoring the operation condition of the embodiment of the invention comprises:
an acquisition module, configured to acquire index data of the monitored system;
an obtaining module, configured to obtain the index association relationship corresponding to the index data;
a prediction module, configured to predict the predicted operation condition of the monitored system according to the index data and the index association relationship; and
an early warning module, configured to issue an early warning based on the predicted operation condition and the early warning threshold.
Optionally, the acquisition module is further configured to:
collect system operation data of the monitored system and server operation data of each server; and
extract index data corresponding to reference indexes from the system operation data and the server operation data.
Optionally, the obtaining module is further configured to:
acquire a historical early warning result of the monitored system; and
analyze the historical early warning result by using a correlation degree analysis method to obtain the index association relationship among the reference indexes.
Optionally, the prediction module is further configured to:
perform a calculation on the index data and the index association relationship based on a logistic regression model to obtain the predicted operation condition of the monitored system.
Optionally, the early warning threshold includes dynamic thresholds of the reference indexes corresponding to respective time periods; and
the early warning module is further configured to:
obtain early warning results of the reference indexes based on the predicted operation condition and the dynamic thresholds;
issue an early warning corresponding to a reference index according to the early warning result, or issue an early warning corresponding to a reference index according to the index data and the dynamic thresholds; and
record the early warning result.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device for monitoring an operation condition.
An electronic device for monitoring an operating condition according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for monitoring an operating condition of the embodiment of the present invention.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of an embodiment of the present invention stores thereon a computer program, which when executed by a processor implements a method of monitoring an operating condition of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or beneficial effects. Index data of the monitored system is collected; an index association relationship corresponding to the index data is acquired; the predicted operation condition of the monitored system is predicted according to the index data and the index association relationship; and an early warning is issued based on the predicted operation condition and an early warning threshold. These technical means overcome the technical problems that early warnings lack association relationships, that the early warning rule is single, that intelligent early warning is not possible in complex scenarios, and that early warning cannot be given in advance. They thereby achieve the technical effect of predicting the operation condition of the monitored system and issuing intelligent early warnings in advance in complex scenarios by combining the index association relationship.
Further effects of the above optional implementations will be described below in connection with the specific embodiments.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
The existing monitoring approach obtains the operation condition of each server in real time and infers the operation condition of the monitored system from it, so as to determine whether an early warning is needed for each reference index of the monitored system. This approach has the following disadvantages:
1. The early warnings lack association relationships. In actual operation, early warnings that appear independent are often associated with one another, and some are even causally related; for example, the more times an interface is called, the more memory is consumed. However, no association relationship is established among the early warnings, so the system cannot infer related problems from a single warning;
2. The approach is not intelligent enough and cannot predict in advance. Taking the insufficient-memory alarm as an example, the alarm threshold is an approximate value estimated by the person in charge of the system from personal experience and then configured in the monitoring system. An early warning is issued only when the real-time memory usage of the system reaches that threshold, which is often too late;
3. The early warning rule is single, and intelligent early warning is not possible in complex scenarios. Taking early warning on the number of interface calls as an example, the number of calls to the same interface differs greatly between scenarios. For a product information interface, the maximum and minimum call volumes differ greatly between a promotion period and a non-promotion period: during a promotion period the maximum may be 1000 calls per second and the minimum 200 calls per second, whereas during a non-promotion period the maximum may be 100 calls per second and the minimum 20 calls per second.
In the method for monitoring the operation condition provided by the embodiment of the present invention, the current operation data of the monitored system and of the servers is analyzed in real time using a logistic regression algorithm, and correlation degree analysis is then performed on the historical analysis results to determine the relationships among the reference indexes, which improves the accuracy of alarms and of advance prediction.
FIG. 1 is a schematic diagram of the main steps of a method of monitoring operating conditions according to an embodiment of the present invention.
As shown in fig. 1, the method for monitoring the operation condition of the embodiment of the present invention mainly includes the following steps:
Step S101: collecting index data of the monitored system.
In order to predict the operation condition of the monitored system for a period of time in the future, index data is collected while the monitored system is running; the index data reflects the current operation condition of the monitored system.
In the embodiment of the present invention, step S101 may be implemented in the following manner: collecting system operation data of the monitored system and server operation data of each server; and extracting index data corresponding to the reference indexes from the system operation data and the server operation data.
The monitored system refers to various systems, application software and the like deployed and operated on a server, and generally, the monitored system is operated on a plurality of servers. A server refers to a dedicated server running various types of systems and applications. The system operation data is data generated by the monitored system during operation, and represents the overall operation condition of the monitored system, such as the total memory amount of the JVM (Java virtual machine), the current usage amount of the JVM, the current thread number, or the maximum call volume of the interface within the last N minutes, and the like. The server operation data is data generated by the server in the monitored system during operation, and represents the operation condition of the server, such as the total memory size, the current memory size, the total network connection number or the current network transmission rate, and the like. The reference index is a reference standard for evaluating the operation condition, such as the current memory consumption value, the number of calls or the number of threads of the JVM.
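The extraction in step S101 can be sketched as follows. This is a minimal illustration only: the field names, the set of reference indexes, and the choice to sum per-server values across servers are all assumptions made for the example and are not specified in this description.

```python
from typing import Dict, Iterable

# Hypothetical reference indexes; the names are illustrative, not from the source.
REFERENCE_INDEXES = (
    "jvm_used_mb",
    "thread_count",
    "interface_calls_per_sec",
    "server_used_mem_mb",
)


def extract_index_data(system_data: Dict[str, float],
                       server_data: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Keep only reference-index fields; per-server values are summed (assumption)."""
    index_data = {k: v for k, v in system_data.items() if k in REFERENCE_INDEXES}
    for record in server_data:
        for key, value in record.items():
            if key in REFERENCE_INDEXES:
                index_data[key] = index_data.get(key, 0.0) + value
    return index_data


# System operation data of the monitored system (made-up sample values).
system_data = {"jvm_total_mb": 4096, "jvm_used_mb": 3100,
               "thread_count": 220, "interface_calls_per_sec": 850}
# Server operation data, one record per server (made-up sample values).
server_data = [{"server_used_mem_mb": 12000, "net_rate_mbps": 40},
               {"server_used_mem_mb": 9000, "net_rate_mbps": 35}]

index_data = extract_index_data(system_data, server_data)
```

Fields that are not reference indexes (here `jvm_total_mb` and `net_rate_mbps`) are dropped, so later steps only see the data used to evaluate the operation condition.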
Step S102: acquiring an index association relationship corresponding to the index data.
In actual operation, reference indexes that appear independent are often associated with one another, and some are even causally related; for example, the more times an interface is called, the more memory is consumed. After the index data is collected, the reference indexes it contains can be identified, and the association relationships among those reference indexes (i.e., the index association relationship) can be acquired.
The association relationships among the reference indexes can be obtained in advance by analyzing historical data (i.e., the historical early warning results) and stored for use in subsequent analysis. In the embodiment of the present invention, step S102 may be implemented in the following manner: acquiring a historical early warning result of the monitored system; and analyzing the historical early warning result by using a correlation degree analysis method to obtain the index association relationship among the reference indexes.
The correlation degree analysis method is a simple and practical analysis technique for finding associations or correlations in large data sets, so as to describe the rules and patterns by which certain attributes appear together. For example, the probability that event B occurs after event A occurs is calculated; if the probability is very high, event B can be said to be associated with event A, and when event A occurs again, event B is likely to occur.
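The "probability of event B given event A" described above can be estimated from historical early warning results as sketched below. This is a simplified illustration: it assumes each historical record is the set of reference indexes that triggered a warning in one time window, and it treats co-occurrence within a window as "B after A"; the index names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def association_confidence(history: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """For each ordered pair (A, B), estimate P(B warned | A warned)."""
    count_a: Dict[str, int] = {}
    count_ab: Dict[Tuple[str, str], int] = {}
    for warned in history:
        for a in warned:
            count_a[a] = count_a.get(a, 0) + 1
            for b in warned:
                if a != b:
                    count_ab[(a, b)] = count_ab.get((a, b), 0) + 1
    # Confidence of A -> B is the share of A's windows that also contain B.
    return {pair: n / count_a[pair[0]] for pair, n in count_ab.items()}


# Each set lists the indexes that warned in one historical window (made-up data).
history = [
    {"call_count", "response_time"},
    {"call_count", "response_time", "jvm_memory"},
    {"call_count"},
    {"jvm_memory"},
]

confidence = association_confidence(history)
```

In this sample, a response-time warning always coincides with a call-count warning, while a call-count warning is accompanied by a response-time warning in two of three windows, so the relationship is directional.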
Step S103: predicting the predicted operation condition of the monitored system according to the index data and the index association relationship.
After the index data and the index association relationship are obtained through the above steps, the upcoming operation condition of the monitored system can be inferred, that is, how the data corresponding to each reference index of the monitored system will change is predicted.
In the embodiment of the present invention, step S103 may be implemented in the following manner: performing a calculation on the index data and the index association relationship by using a logistic regression model to obtain the predicted operation condition of the monitored system.
The logistic regression model is a simple and common binary classification model, and the class of an object is obtained by inputting an attribute feature sequence of the unknown class object. The method for monitoring the operation condition of the embodiment of the invention introduces a machine learning mode, and analyzes and predicts the server operation data of the current server and the system operation data of the monitored system operated on the server by using a logistic regression model so as to pre-judge the possible problems in advance and give the probability of the problems, and if the probability of the problems is higher, the monitoring system gives an alarm. The machine learning-based method does not depend on the experience of system developers and the configured threshold value, so that the method can be applied to complex scenes. In addition, the use of the logistic regression model can refer to the existing technical solution, which is not described herein.
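As a rough illustration of how a logistic regression model turns index data into a problem probability, consider the sketch below. The feature names, weights, and bias are invented for the example (a real model would learn them from historical data), and feeding the index association relationship in as one more feature is an assumption about how the two inputs are combined.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_problem_probability(features: Dict[str, float],
                                weights: Dict[str, float],
                                bias: float) -> float:
    """Weighted sum of the features passed through the logistic function."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical, hand-picked parameters; not trained and not from the source.
weights = {"jvm_used_ratio": 4.0, "thread_ratio": 2.0, "assoc_call_to_memory": 1.5}
bias = -5.0

# Index data scaled to [0, 1], plus one association-strength feature (assumption).
features = {"jvm_used_ratio": 0.9, "thread_ratio": 0.8, "assoc_call_to_memory": 0.6}

probability = predict_problem_probability(features, weights, bias)
should_alarm = probability >= 0.7  # illustrative cut-off for "higher probability"
```

The output is a probability rather than a yes/no flag, which matches the description of giving the probability of a problem and alarming when it is high.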
Step S104: issuing an early warning based on the predicted operation condition and the early warning threshold.
The early warning threshold may be determined according to actual needs, historical data, and the like. When the predicted operation condition reaches the early warning threshold, an early warning may be issued to remind the relevant personnel, for example: a network exception alarm, an insufficient server disk space alarm, an insufficient memory alarm, an abnormal system thread count alarm, a server performance alarm, an alarm for interface call volume above the upper threshold, an alarm for interface call volume below the lower threshold, and the like.
In the embodiment of the present invention, step S104 may be implemented in the following manner: obtaining early warning results of the reference indexes based on the predicted operation condition and the dynamic thresholds; issuing an early warning corresponding to a reference index according to the early warning result, or issuing an early warning corresponding to a reference index according to the index data and the dynamic thresholds; and recording the early warning result.
The method for monitoring the operation condition of the embodiment of the present invention performs analysis by combining the logistic regression model with correlation degree analysis. It thus not only provides real-time alarms for the monitored system but also predicts risks that the monitored system may encounter in the future. If a problem is predicted to be likely in the monitored system or in the server on which it runs, the person in charge of the monitored system or the server can be proactively contacted through early warning channels such as e-mail, short message, or even telephone, and informed of the possible risk.
Because the operation condition of the monitored system may change due to some external factors, for example, the interface call volume at some specific time is extremely high or the memory usage is more, or the performance is degraded due to the aging of the hardware of the monitored system, the early warning threshold may include a dynamic threshold corresponding to each time period for each reference index, that is, a dynamic threshold matching the usage requirement may be set for different times and different reference indexes. In addition, the early warning result can be recorded so as to facilitate subsequent analysis.
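A dynamic threshold keyed by time period and reference index can be sketched as below. The call-volume bounds reuse the promotion and non-promotion figures given earlier in this description; the period names, index name, and result labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# period -> reference index -> (lower bound, upper bound), in calls per second.
DYNAMIC_THRESHOLDS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "promotion": {"interface_calls_per_sec": (200, 1000)},
    "non_promotion": {"interface_calls_per_sec": (20, 100)},
}


def check_warning(period: str, index: str, value: float) -> Optional[str]:
    """Compare a value with the threshold that applies in the given period."""
    lower, upper = DYNAMIC_THRESHOLDS[period][index]
    if value > upper:
        return "above_upper_threshold"
    if value < lower:
        return "below_lower_threshold"
    return None  # within the normal range, no early warning


# Record every early warning result so that it can be analyzed later.
warning_log: List[Tuple[str, float, Optional[str]]] = []
for period, value in (("promotion", 500), ("non_promotion", 500), ("promotion", 150)):
    result = check_warning(period, "interface_calls_per_sec", value)
    warning_log.append((period, value, result))
```

The same 500 calls per second is normal during a promotion period but exceeds the upper bound outside it, which is the situation a single fixed threshold cannot handle.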
According to the method for monitoring the operation condition of the embodiment of the present invention, index data of the monitored system is collected; an index association relationship corresponding to the index data is acquired; the predicted operation condition of the monitored system is predicted according to the index data and the index association relationship; and an early warning is issued based on the predicted operation condition and an early warning threshold. These technical means overcome the technical problems that early warnings lack association relationships, that the early warning rule is single, that intelligent early warning is not possible in complex scenarios, and that early warning cannot be given in advance. They thereby achieve the technical effect of predicting the operation condition of the monitored system and issuing intelligent early warnings in advance in complex scenarios by combining the index association relationship.
Fig. 2 is a schematic diagram of the main flow of a method of monitoring an operating condition according to a referable embodiment of the present invention.
As shown in fig. 2, the method for monitoring the operation condition according to the embodiment of the present invention may be implemented with reference to the following flow:
1. starting an intelligent monitoring system;
2.1. collecting various operation data (namely system operation data) of a specified application system (namely a monitored system) in real time;
2.2. collecting various operation data (namely server operation data) of a specified server in real time;
3. collecting other data that may affect the analysis result, the most important of which is the time parameter, used to distinguish the different threshold ranges that each type of early warning requires in different time periods; early warning results of other types within the most recent time range may also be provided;
4. extracting index data from the data collected in 2.1, 2.2 and 3, using the index data as a parameter of a logistic regression model (LR model), and calculating and analyzing by using the LR model;
wherein the logistic regression formula of the LR model, in its standard form, is as follows:
$$P = \frac{1}{1 + e^{-x}}$$
wherein P is the decision value, x is the characteristic value, and e is the base of the natural logarithm. Taking the JVM memory early warning as an example, x represents parameters such as the total memory value of the server, the memory currently consumed by the JVM, the current thread number, the number of method calls, and the response time; the more calls there are, the more threads there are, the response time is reduced, and the frequency of heap space cleaning (GC) increases, so that the analysis finally indicates a risk that the JVM memory may be exhausted. JVM memory: a memory area that the application system carves out of the physical memory of the server and that is used only by the current application system. Number of method calls: the application system has a plurality of interfaces, i.e., methods, each of which completes a specific logical computing task; for example, a commodity detail method is mainly used for acquiring the detailed information of commodities and is provided to all users, so that as more and more users open the commodity detail page, the call volume of the commodity detail method grows. Response time: the time a method takes to complete a specific logical computing task, from the start of the call to the completion of the computation; the unit may be milliseconds, seconds, or another time unit;
5. obtaining the analysis result of the LR model, namely whether an early warning is needed for each reference index;
6. writing the analysis result into a Hive database;
7. extracting, by the monitoring system, the various types of analysis results (namely the historical early warning results) of the last N days from Hive;
8. analyzing the association relationships among different types of early warnings by using the correlation degree analysis method; for example, an early warning on the number of calls indicates, to a certain extent, that an early warning on response time may follow, possibly even accompanied by an early warning that the JVM memory may be exhausted;
9. taking the correlation degree analysis result obtained in the previous step (namely the index association relationship) as an input to the LR model, where it serves as one of the influencing factors of the current early warning;
10. if an early warning is needed, sending the alarm information through various notification channels, such as e-mail alarms, short message alarms, alarm information pushed through an enterprise office chat tool, and the like.
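The feedback loop of steps 6 to 9 (store each analysis result, read back the last N days, and derive an association value for the next LR calculation) can be sketched as follows. An in-memory list stands in for the Hive table, and the dates, index names, and helper functions are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Set, Tuple

# In-memory stand-in for the Hive table of step 6 (assumption: no real Hive access).
result_store: List[Tuple[datetime, Set[str]]] = []


def record_result(when: datetime, warned_indexes: Set[str]) -> None:
    """Step 6: persist which reference indexes needed an early warning."""
    result_store.append((when, set(warned_indexes)))


def results_in_last_days(now: datetime, n_days: int) -> List[Set[str]]:
    """Step 7: fetch the historical early warning results of the last N days."""
    cutoff = now - timedelta(days=n_days)
    return [warned for when, warned in result_store if when >= cutoff]


def association_feature(history: List[Set[str]], cause: str, effect: str) -> float:
    """Steps 8-9: share of windows warning on `cause` that also warned on `effect`."""
    with_cause = [w for w in history if cause in w]
    if not with_cause:
        return 0.0
    return sum(1 for w in with_cause if effect in w) / len(with_cause)


now = datetime(2024, 1, 31)  # arbitrary illustrative date
record_result(now - timedelta(days=40), {"call_count"})  # outside the window
record_result(now - timedelta(days=3), {"call_count", "response_time"})
record_result(now - timedelta(days=2), {"call_count"})
record_result(now - timedelta(days=1), {"call_count", "response_time", "jvm_memory"})

recent = results_in_last_days(now, n_days=7)
feature = association_feature(recent, "call_count", "response_time")
```

The resulting `feature` value is what would be passed to the LR model as one more input in the next round of analysis.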
Fig. 3 is a schematic diagram of the main modules of an apparatus for monitoring operating conditions according to an embodiment of the present invention.
As shown in fig. 3, the apparatus 300 for monitoring an operating condition according to an embodiment of the present invention includes: an acquisition module 301, an obtaining module 302, a prediction module 303, and an early warning module 304.
Wherein,
the acquisition module 301 is configured to acquire index data of a monitored system;
the obtaining module 302 is configured to obtain an index association relationship corresponding to the index data;
the prediction module 303 is configured to predict a predicted operation condition of the monitored system according to the index data and the index association relationship; and
the early warning module 304 is configured to issue an early warning based on the predicted operation condition and an early warning threshold.
In this embodiment of the present invention, the acquisition module 301 is further configured to:
collect system operation data of the monitored system and server operation data of each server; and
extract index data corresponding to reference indexes from the system operation data and the server operation data.
In this embodiment of the present invention, the obtaining module 302 is further configured to:
acquire a historical early warning result of the monitored system; and
analyze the historical early warning result by using a correlation degree analysis method to obtain the index association relationship among the reference indexes.
In an embodiment of the present invention, the prediction module 303 is further configured to:
perform a calculation on the index data and the index association relationship based on a logistic regression model to obtain the predicted operation condition of the monitored system.
In addition, the early warning threshold value comprises a dynamic threshold value corresponding to each time period of each reference index.
In an embodiment of the present invention, the early warning module 304 is further configured to:
obtain early warning results of the reference indexes based on the predicted operation condition and the dynamic thresholds;
issue an early warning corresponding to a reference index according to the early warning result, or issue an early warning corresponding to a reference index according to the index data and the dynamic thresholds; and
record the early warning result.
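One way to picture how modules 301 to 304 cooperate is as four pluggable callables chained in order, as in the sketch below. The class name, the callable signatures, and the stub behaviors are assumptions made for illustration and are not prescribed by this description.

```python
from typing import Callable, Dict, List

IndexData = Dict[str, float]
Association = Dict[str, float]
Prediction = Dict[str, float]


class MonitoringApparatus:
    """Chains four pluggable modules in the order 301 -> 302 -> 303 -> 304."""

    def __init__(self,
                 acquire: Callable[[], IndexData],
                 obtain: Callable[[IndexData], Association],
                 predict: Callable[[IndexData, Association], Prediction],
                 warn: Callable[[Prediction], List[str]]) -> None:
        self.acquire = acquire    # acquisition module 301
        self.obtain = obtain      # obtaining module 302
        self.predict = predict    # prediction module 303
        self.warn = warn          # early warning module 304

    def run_once(self) -> List[str]:
        index_data = self.acquire()
        association = self.obtain(index_data)
        predicted = self.predict(index_data, association)
        return self.warn(predicted)


# Stub modules with made-up behavior, only to show the data flow.
apparatus = MonitoringApparatus(
    acquire=lambda: {"jvm_used_ratio": 0.95},
    obtain=lambda index_data: {},
    predict=lambda index_data, association: {"jvm_memory": index_data["jvm_used_ratio"]},
    warn=lambda predicted: [name for name, p in predicted.items() if p >= 0.9],
)

warnings_issued = apparatus.run_once()
```

Keeping each module behind a plain callable means any one of them can be swapped (for example, a different prediction model) without touching the others.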
According to the apparatus for monitoring the operation condition of the embodiment of the present invention, index data of the monitored system is collected; an index association relationship corresponding to the index data is acquired; the predicted operation condition of the monitored system is predicted according to the index data and the index association relationship; and an early warning is issued based on the predicted operation condition and an early warning threshold. These technical means overcome the technical problems that early warnings lack association relationships, that the early warning rule is single, that intelligent early warning is not possible in complex scenarios, and that early warning cannot be given in advance. They thereby achieve the technical effect of predicting the operation condition of the monitored system and issuing intelligent early warnings in advance in complex scenarios by combining the index association relationship.
Fig. 4 illustrates an exemplary system architecture 400 of a method of monitoring operating conditions or an apparatus for monitoring operating conditions to which embodiments of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 401, 402, 403 to interact with the server 405 over the network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 401, 402, and 403. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the method for monitoring the operation condition provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the apparatus for monitoring the operation condition is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising an acquisition module, an obtaining module, a prediction module, and an early warning module. The names of the modules do not in some cases limit the modules themselves; for example, the acquisition module may also be described as a "module for collecting index data of a monitored system".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: step S101: collecting index data of a monitored system; step S102: acquiring an index association relationship corresponding to the index data; step S103: predicting the predicted operation condition of the monitored system according to the index data and the index association relationship; step S104: issuing an early warning based on the predicted operation condition and the early warning threshold.
According to the technical solution of the embodiment of the present invention, index data of the monitored system is collected; an index association relationship corresponding to the index data is acquired; the predicted operation condition of the monitored system is predicted according to the index data and the index association relationship; and an early warning is issued based on the predicted operation condition and an early warning threshold. These technical means overcome the technical problems that early warnings lack association relationships, that the early warning rule is single, that intelligent early warning is not possible in complex scenarios, and that early warning cannot be given in advance. They thereby achieve the technical effect of predicting the operation condition of the monitored system and issuing intelligent early warnings in advance in complex scenarios by combining the index association relationship.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.