Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The embodiments of the present application provide a monitoring method, a monitoring device, computer equipment, and a computer-readable storage medium. The monitoring method may be applied to a terminal device, such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, or a wearable device. It may also be applied to a server, which may be a standalone server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments described below, and the features in them, may be combined with each other provided they do not conflict.
Referring to fig. 1, fig. 1 is a schematic flow chart of a monitoring method according to an embodiment of the present application. For example, the monitoring method may be implemented as a Prometheus-based plug-in; this is not limiting, and the method may also be built on other monitoring systems.
As shown in fig. 1, the monitoring method includes steps S101 to S105.
Step S101: obtaining a configuration file, where the configuration file is generated according to configuration parameters input by a user on a user interface and includes an index acquisition rule and an alarm rule.
Illustratively, the preset configuration parameters are determined by the configuration operations that operation and maintenance personnel perform on the user interface. Because the configuration file is generated from the parameters set on the user interface, the time and learning cost of writing the configuration file by hand are saved, and monitoring efficiency is improved.
Specifically, the preset configuration parameters indicate the acquisition path and acquisition period of the index data, the alarm threshold, the alarm information sending mode, and the like.
For example, the configuration file may be generated according to the preset configuration parameters and a preset configuration file framework.
For example, a user may fill in a threshold value for the relevant data, set the time threshold to 2 seconds, and select an alarm level on the user interface; the following corresponding instructions may then be generated in the configuration file:
```yaml
groups:
  - name: default
    rules:
      - alert: Test_Alert
        expr: PACES_EMALL_DB021679 < 0
        for: 2s
        labels:
          severity: warning
```
Each instruction in the configuration file is determined by the configuration parameters the user entered on the user interface: the threshold appears in the expr field, the 2-second duration in the for field, and the selected alarm level in the severity label.
Other parts of the configuration file, such as the index acquisition rule, may likewise be generated from the configuration parameters and are not described again here.
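The generation step can be sketched as a small template renderer. This is a minimal Python sketch under assumptions: the function name render_alert_rule and its parameters are hypothetical, only one group containing one rule is produced, and the output simply follows the layout of a Prometheus alerting-rule file (groups, rules, alert, expr, for, labels). A production tool would more likely build a data structure and serialize it with a YAML library.

```python
def render_alert_rule(group: str, alert: str, metric: str, op: str,
                      threshold: float, duration_s: int, severity: str) -> str:
    """Render one Prometheus-style alerting rule from UI parameters (hypothetical helper)."""
    allowed_ops = {"<", "<=", ">", ">=", "==", "!="}
    if op not in allowed_ops:
        raise ValueError(f"unsupported comparison operator: {op}")
    # Print integral thresholds without a trailing ".0" (0 rather than 0.0).
    value = int(threshold) if float(threshold).is_integer() else threshold
    lines = [
        "groups:",
        f"  - name: {group}",
        "    rules:",
        f"      - alert: {alert}",
        f"        expr: {metric} {op} {value}",
        f"        for: {duration_s}s",        # how long the condition must hold
        "        labels:",
        f"          severity: {severity}",    # alarm level chosen on the UI
    ]
    return "\n".join(lines) + "\n"
```

Calling render_alert_rule("default", "Test_Alert", "PACES_EMALL_DB021679", "<", 0, 2, "warning") produces the same rule as the example above.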
Illustratively, the generated configuration file includes an index acquisition rule and an alarm rule. Specifically, the index acquisition rule includes the acquisition path, the acquisition period, and the like for obtaining the index data, where the acquisition path indicates the acquisition entry of the index data and the acquisition period determines how often the index data is acquired; the alarm rule includes the alarm threshold of the index data, the alarm information sending mode, and the like.
As shown in fig. 2, which is a usage scenario diagram provided in an embodiment of the present application, the configuration parameters may be determined by the configuration operations of operation and maintenance personnel on a terminal device.
Generating the configuration file from the preset configuration parameters reduces operational complexity: operation and maintenance personnel need only perform simple configuration on the user interface to produce the configuration file, which saves the time and learning cost of writing it and improves monitoring efficiency.
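The two rule sections described above can be modelled as plain data types. The following minimal Python sketch assumes hypothetical field names (path, period_s, data_threshold, time_threshold_s, send_mode, receiver) for the parameters collected on the user interface; the application does not prescribe a schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexAcquisitionRule:
    path: str          # acquisition entry of the index data
    period_s: int      # acquisition period, in seconds


@dataclass(frozen=True)
class AlarmRule:
    data_threshold: float
    time_threshold_s: int
    send_mode: str     # e.g. "mail" or "webhook"
    receiver: str


@dataclass(frozen=True)
class ConfigFile:
    acquisition: IndexAcquisitionRule
    alarm: AlarmRule


def build_config(params: dict) -> ConfigFile:
    """Map flat UI parameters (strings from form fields) onto the two rule sections."""
    period = int(params["period_s"])
    if period <= 0:
        raise ValueError("acquisition period must be positive")
    return ConfigFile(
        acquisition=IndexAcquisitionRule(params["path"], period),
        alarm=AlarmRule(
            data_threshold=float(params["data_threshold"]),
            time_threshold_s=int(params["time_threshold_s"]),
            send_mode=params["send_mode"],
            receiver=params["receiver"],
        ),
    )
```

Keeping the sections as separate types makes it straightforward to regenerate only the part of the file whose parameters changed.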
Step S102: acquiring index data related to the monitored object based on the index acquisition rule.
In some embodiments, step S102 includes steps S1021-S1022:
Step S1021: acquiring the service data of the monitored object according to the acquisition path and the acquisition period in the index acquisition rule.
Illustratively, the acquisition path in the index acquisition rule specifies the acquisition entry through which the service data is captured, and the service data is actively captured through that entry at regular intervals determined by the acquisition period in the index acquisition rule. Illustratively, the service data may be data traffic information, data storage information, CPU occupation information of a server, and the like.
For example, the monitoring platform may run multiple threads concurrently to acquire several kinds of service data of the monitored object at once, such as data traffic information and data storage information.
Because the service data is captured through a preset acquisition entry, the monitored object does not need to be aware of the monitoring platform. This reduces the coupling between the monitoring platform and the monitored object and enhances the stability of the monitoring system.
Step S1022: performing preset indexing processing on the service data to obtain index data related to the monitored object.
Illustratively, the index data includes the service data together with an index name (metric name) and an index tag (label name).
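A minimal sketch of periodic, concurrent collection, assuming each acquisition entry is wrapped in a hypothetical zero-argument fetcher function (in a Prometheus-based deployment the entry would typically be an HTTP metrics endpoint). It uses the standard concurrent.futures thread pool to fetch several kinds of service data at once and sleeps for the remainder of each acquisition period.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def collect_once(fetchers: Dict[str, Callable[[], float]]) -> Dict[str, float]:
    """Run every fetcher concurrently and return one sample per service-data source."""
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        return {name: fut.result() for name, fut in futures.items()}


def collect_periodically(fetchers: Dict[str, Callable[[], float]],
                         period_s: float, rounds: int) -> List[Dict[str, float]]:
    """Poll all sources once per acquisition period for a fixed number of rounds."""
    samples = []
    for i in range(rounds):
        start = time.monotonic()
        samples.append(collect_once(fetchers))
        if i < rounds - 1:
            # Sleep only for what is left of the period, so rounds stay on the cycle.
            time.sleep(max(0.0, period_s - (time.monotonic() - start)))
    return samples
```

For example, collect_periodically({"traffic": read_traffic, "storage": read_storage}, 15, 4) would take four rounds of samples 15 seconds apart, where read_traffic and read_storage stand for hypothetical fetchers.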
In some embodiments, performing the preset indexing processing on the service data to obtain the index data related to the monitored object includes: determining an index name and an index tag of the service data according to the configuration file and the source of the service data; and adding the index name and the index tag to the service data to obtain the index data. It can be understood that the index tag may also be determined before the index name; the order is not limited herein.
Illustratively, the index name and the index tag are configured in advance: operation and maintenance personnel assign an index name and an index tag to each data source from which service data is collected, and during the preset indexing processing the pre-configured index name and index tag are looked up according to the source of the collected service data. The index name identifies the index data, so that the location of an anomaly can be identified simply and clearly when an alarm message is sent. The index tag is used to group alarm information, so that when services in a cloud environment are tightly coupled, a burst of simultaneous alarm events does not turn into a flood of alarm notifications that would prevent operation and maintenance personnel from quickly locating the problem.
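The lookup-and-attach step can be sketched as follows. The catalog contents, metric names, and tag keys are hypothetical; the rendering function mimics the Prometheus text exposition style name{label="value"} value, but omits the escaping of special characters that a real exporter would need.

```python
from typing import Dict, Tuple

# Hypothetical pre-configured mapping: data source -> (index name, index tags).
INDEX_CATALOG: Dict[str, Tuple[str, Dict[str, str]]] = {
    "db01/cpu": ("server_cpu_usage_percent", {"service": "emall", "group": "database"}),
    "gw/traffic": ("gateway_traffic_bytes", {"service": "emall", "group": "gateway"}),
}


def to_index_data(source: str, value: float) -> dict:
    """Attach the pre-configured index name and tags to one service-data sample."""
    name, tags = INDEX_CATALOG[source]  # KeyError for an unregistered source
    return {"name": name, "labels": dict(tags), "value": value}


def exposition_line(sample: dict) -> str:
    """Render a sample as name{k="v",...} value, with tags sorted by key."""
    labels = ",".join(f'{k}="{v}"' for k, v in sorted(sample["labels"].items()))
    return f'{sample["name"]}{{{labels}}} {sample["value"]}'
```

For instance, exposition_line(to_index_data("db01/cpu", 85.5)) gives server_cpu_usage_percent{group="database",service="emall"} 85.5.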
Step S103: determining, based on the alarm rule, whether an alarm event occurs according to the index data.
In some embodiments, the alarm rule specifies the alarm threshold, which includes a data threshold.
In some embodiments, determining whether an alarm event occurs according to the index data based on the alarm rule includes: determining whether the value of the service data in the index data is greater than the data threshold corresponding to that index data in the alarm rule; and if it is, determining that an alarm event occurs.
For example, taking CPU occupation information as the index data, if the data threshold is set to 80%, an alarm event is determined to occur when the value of the service data indicating CPU occupation is greater than 80%.
In some embodiments, the alarm threshold further includes a time threshold.
In some embodiments, determining whether an alarm event occurs includes: determining whether the value of the service data in the index data is greater than the data threshold corresponding to the index data in the alarm rule; if it is, determining whether the duration for which the value has remained greater than the data threshold is longer than the time threshold corresponding to the index data in the alarm rule; and if that duration is longer than the time threshold, determining that an alarm event occurs.
For example, taking CPU occupation information as the index data, if the data threshold is set to 80% and the time threshold is set to 15 seconds, an alarm event is determined to occur when the value indicating CPU occupation is greater than 80% and has remained greater than 80% for more than 15 seconds.
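A minimal sketch of the single-threshold check, assuming index data is represented as dictionaries with hypothetical name and value keys and that data thresholds are keyed by index name. The comparison is strict, matching "greater than" in the text.

```python
from typing import Dict, List


def detect_alarm_events(samples: List[dict], data_thresholds: Dict[str, float]) -> List[dict]:
    """Return the index data whose value is strictly greater than its data threshold."""
    events = []
    for s in samples:
        threshold = data_thresholds.get(s["name"])
        # Index data without a configured threshold never raises an alarm here.
        if threshold is not None and s["value"] > threshold:
            events.append({**s, "threshold": threshold})
    return events
```

With a threshold of 80 for CPU occupation, a sample of 85 produces an alarm event while a sample of exactly 80 does not.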
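The combined data-and-time check behaves like a small state machine: it remembers when the value first exceeded the data threshold and resets that timer whenever the value falls back. The sketch below is illustrative; the class name is hypothetical, and with discrete samples the duration is approximated as the time between the first exceeding sample and the current one, which is conceptually similar to the for field of a Prometheus alerting rule.

```python
from typing import Optional


class DurationAlarm:
    """Fires once the value has stayed above data_threshold for longer than time_threshold_s."""

    def __init__(self, data_threshold: float, time_threshold_s: float):
        self.data_threshold = data_threshold
        self.time_threshold_s = time_threshold_s
        self._exceeded_since: Optional[float] = None  # time of the first exceeding sample

    def observe(self, timestamp: float, value: float) -> bool:
        """Feed one sample; return True if an alarm event occurs at this sample."""
        if value <= self.data_threshold:
            self._exceeded_since = None  # condition broken: restart the timer
            return False
        if self._exceeded_since is None:
            self._exceeded_since = timestamp
        return timestamp - self._exceeded_since > self.time_threshold_s
```

With DurationAlarm(80, 15), samples of 85, 90, 88, and 86 at 0, 5, 10, and 16 seconds trigger an alarm only at the 16-second sample; any sample at or below 80 resets the timer.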
Step S104: if it is determined that an alarm event occurs, pushing alarm information related to the alarm event to an alarm component, so that the alarm component merges the alarm information into an alarm notification.
Referring to fig. 2, the alarm component obtains the alarm information related to an alarm event and merges it into an alarm notification. The alarm component may be deployed on the server to which the monitoring method of the embodiments is applied, or on another server or terminal device.
In some embodiments, the alarm component merges the alarm information after receiving it. Specifically, according to the alarm information type indicated by the index tag, alarm information of the same type is placed in the same group and alarm information of different types in different groups, and a corresponding alarm notification is generated from the alarm information of each group. This prevents a large number of simultaneous alarm events, which can arise when services in a cloud environment are tightly coupled, from producing a sudden flood of alarm notifications that would prevent operation and maintenance personnel from quickly locating the problem.
Illustratively, the alarm component may be an alert manager, such as Prometheus Alertmanager.
Step S105: sending the alarm notification according to the alarm sending mode in the alarm rule.
Illustratively, the alarm notification is sent to the alarm notification receiver specified in the alarm rule, using the alarm sending mode in the alarm rule, so that the alarm event can be handled in time.
Illustratively, the alarm sending mode includes at least one of WeChat, email, Slack (a messaging and integration tool), and Webhook (an HTTP callback interface), but is not limited thereto.
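The grouping performed by the alarm component can be sketched as below, assuming each piece of alarm information carries its index tags in a labels dictionary and that a hypothetical tag key, group, identifies the alarm type. This is comparable in spirit to the group_by setting of Prometheus Alertmanager, but it is not that tool's implementation.

```python
from collections import defaultdict
from typing import Dict, List


def group_alarms(alarms: List[dict], group_by: str = "group") -> Dict[str, dict]:
    """Merge alarm information sharing the same value of one index tag into one notification."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for a in alarms:
        groups[a["labels"].get(group_by, "ungrouped")].append(a)
    notifications = {}
    for key, items in groups.items():
        names = sorted({a["name"] for a in items})  # distinct index names, stable order
        notifications[key] = {
            "summary": f"[{key}] {len(items)} alarm(s): {', '.join(names)}",
            "count": len(items),
            "alarms": items,
        }
    return notifications
```

Three alarms from two database metrics and one gateway metric therefore yield two notifications instead of three.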
Illustratively, the alarm notification includes an index name of index data of the alarm event, so that the operation and maintenance personnel can locate the alarm event according to the index name.
Specifically, the alarm sending mode determines the receiver of the alarm notification. Taking email as an example, the receiver may be the mailbox address of an operation and maintenance worker, so that when an alarm event occurs the alarm component sends the alarm notification to that mailbox address.
In some embodiments, if no feedback indicating that the receiver has handled the alarm event is received after the alarm notification is sent, the alarm notification may be sent again at a preset interval, or it may be muted. For example, the importance of the alarm notification may be determined from the group it belongs to and the number of alarm entries in that group, and whether to resend or mute the notification may be decided according to that importance.
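Sending by mode can be organized as a lookup table from the sending mode named in the alarm rule to a sender function. In this sketch the sender registry, dispatch, and build_mail are hypothetical; build_mail only constructs a standard-library EmailMessage, and actually delivering it (for example with smtplib) is left out.

```python
from email.message import EmailMessage
from typing import Callable, Dict

Sender = Callable[[str, str], None]  # (receiver, notification text) -> None


def dispatch(notification: str, send_mode: str, receiver: str,
             senders: Dict[str, Sender]) -> None:
    """Route a notification to the sender registered for the alarm rule's sending mode."""
    try:
        sender = senders[send_mode]
    except KeyError:
        raise ValueError(f"no sender registered for mode {send_mode!r}") from None
    sender(receiver, notification)


def build_mail(receiver: str, body: str, subject: str = "Alarm notification") -> EmailMessage:
    """Build (but do not send) an email carrying the alarm notification."""
    msg = EmailMessage()
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

A registry such as {"mail": lambda to, body: smtp.send_message(build_mail(to, body))}, where smtp is an already-connected smtplib.SMTP instance, would route mail-mode notifications to the operation and maintenance mailbox.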
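One way to decide between resending and muting is sketched below. The importance measure (the number of merged alarm entries in the group), the mute_below cut-off, and the repeat interval are all assumptions introduced for illustration; Alertmanager exposes a comparable repeat_interval setting, but the text does not specify a rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingNotification:
    group: str
    count: int                 # number of merged alarm entries in the group
    last_sent: float           # timestamp of the last send, in seconds
    acknowledged: bool = False # True once the receiver reports handling the event


def next_action(n: PendingNotification, now: float, repeat_interval_s: float,
                mute_below: int) -> Optional[str]:
    """Return "resend", "mute", or None (nothing to do yet)."""
    if n.acknowledged:
        return None                      # feedback received: stop notifying
    if n.count < mute_below:
        return "mute"                    # low-importance group: silence repeats
    if now - n.last_sent >= repeat_interval_s:
        return "resend"                  # repeat interval elapsed without feedback
    return None                          # still inside the repeat interval
```

Muting small groups while repeating large ones keeps persistent, widespread failures visible without re-sending every minor alarm.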
In some embodiments, the monitoring method further comprises:
generating a data baseline by performing a preset operation on the index data;
determining an index data predicted value according to the data baseline;
and determining the data threshold value and the time threshold value according to the index data predicted value.
Illustratively, the data baseline is obtained by performing a preset operation on the index data. Specifically, the data baseline is a smooth curve describing how the index data changes over time, and a baseline predicted value, that is, a prediction of the index data, can be determined from it. The preset operation includes calculating at least one of the standard deviation, the average value, the maximum value, and the minimum value of the index data.
For example, the index data at a future time, i.e., the baseline predicted value, can be determined from points on the data baseline. Taking CPU occupation information as an example, the baseline predicted value of CPU occupation at one or more moments in a future period can be determined from the data baseline generated by performing the preset operation on the CPU occupation information.
Because the baseline predicted value predicts the index data, the alarm threshold of the index data can be adjusted dynamically according to it. Taking the data threshold as an example, for a given moment a value greater than the baseline predicted value at that moment may be chosen as the data threshold; the time threshold may be determined on a similar principle, which is not described again here.
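The baseline and dynamic threshold can be sketched with the standard statistics module. Under the assumptions of this example, the preset operation is a trailing moving average (the average-value operation), the predicted value for the next moment is the last baseline point, and the data threshold is placed k population standard deviations above it, with a hypothetical minimum margin so that it always stays strictly greater than the predicted value. Other choices of operation, window, and k are equally compatible with the text.

```python
import statistics
from typing import List, Tuple


def baseline(values: List[float], window: int) -> List[float]:
    """Smooth the series with a trailing moving average."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def dynamic_threshold(values: List[float], window: int, k: float = 3.0,
                      min_margin: float = 1.0) -> Tuple[float, float]:
    """Return (baseline predicted value, data threshold) for the next moment."""
    if not values:
        raise ValueError("need at least one observation")
    predicted = baseline(values, window)[-1]
    recent = values[-window:]
    spread = statistics.pstdev(recent) if len(recent) > 1 else 0.0
    # Keep the threshold strictly above the prediction even when the data is flat.
    return predicted, predicted + max(k * spread, min_margin)
```

For CPU occupation samples of 50, 52, 48, and 50 percent with a window of 4, the predicted value is 50 and the threshold is about 54.24 (50 plus three times the square root of 2).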
In some embodiments, the monitoring method further comprises:
responding to the modification operation of the configuration parameters to obtain modified configuration parameters;
performing increment judgment on the modified configuration parameters;
and if the result of the increment judgment indicates that the configuration file needs to be updated, updating the configuration file according to the modified configuration parameters.
Illustratively, in response to an operation that modifies configuration parameters on the user interface, the monitoring platform may perform an increment judgment to identify which configuration parameters have changed, and determine from the changed parameters whether the configuration file needs to be updated.
If the result of the increment judgment is that the configuration file needs to be updated, the configuration file is updated according to the changed configuration parameters. For example, if the acquisition path in the configuration parameters changes, only the part of the configuration file that defines the acquisition path is updated. In this way, modifications made on the user interface are reflected in the configuration file in real time, the time and learning cost for operation and maintenance personnel to write the configuration file are reduced, and the user experience is improved.
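The increment judgment amounts to diffing the old and new parameter sets and mapping each changed key to the section of the configuration file it controls. The key names, section names, and helper functions below are hypothetical.

```python
from typing import Dict, Set

# Hypothetical mapping from UI parameter keys to the configuration-file section they control.
SECTION_OF = {
    "path": "index_acquisition_rule",
    "period_s": "index_acquisition_rule",
    "data_threshold": "alarm_rule",
    "time_threshold_s": "alarm_rule",
    "send_mode": "alarm_rule",
    "receiver": "alarm_rule",
}

_MISSING = object()  # distinguishes "key absent" from "key present with value None"


def changed_keys(old: Dict[str, object], new: Dict[str, object]) -> Set[str]:
    """Keys that were added, removed, or whose value changed."""
    return {k for k in old.keys() | new.keys()
            if old.get(k, _MISSING) != new.get(k, _MISSING)}


def sections_to_update(old: Dict[str, object], new: Dict[str, object]) -> Set[str]:
    """Configuration-file sections to regenerate; an empty set means no update is needed."""
    return {SECTION_OF.get(k, "other") for k in changed_keys(old, new)}
```

Changing only the acquisition path therefore touches only the index acquisition rule section, and an unchanged form leaves the file alone.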
In some embodiments, the monitoring method further comprises:
sending the index data to a human-computer interaction subsystem for display.
Illustratively, the human-computer interaction subsystem can obtain the index data collected by the monitoring platform and display it in a preset manner, which improves the readability of the index data and makes the monitoring process more intuitive.
In some embodiments, the human-computer interaction subsystem is a Grafana visualization dashboard. The Grafana dashboard can obtain and display the index data, such as data traffic information, data storage information, and the CPU occupation information of a server, which improves the readability of the data and makes operation and maintenance more intuitive.
In the monitoring method provided by the embodiments of the present application, the configuration file is generated from the preset configuration parameters, and the index data is obtained and alarm events are determined according to the generated configuration file. Operation and maintenance personnel need only perform simple configuration on the user interface to generate the configuration file, which saves the time and learning cost of writing it and improves monitoring efficiency.
Referring to fig. 3 in conjunction with the foregoing embodiment, fig. 3 is a schematic diagram of a monitoring device according to an embodiment of the present application, where the monitoring device may be configured in a server or a terminal for executing the foregoing monitoring method.
As shown in fig. 3, the monitoring apparatus includes: a configuration file obtaining module 110, an index data obtaining module 120, an alarm event judging module 130, an alarm information pushing module 140, and an alarm notification sending module 150.
a configuration file obtaining module 110, configured to obtain a configuration file, where the configuration file is generated according to configuration parameters input by a user on a user interface and includes an index acquisition rule and an alarm rule;
an index data obtaining module 120, configured to obtain index data related to the monitored object based on the index acquisition rule;
an alarm event judging module 130, configured to determine, based on the alarm rule, whether an alarm event occurs according to the index data;
an alarm information pushing module 140, configured to, if it is determined that an alarm event occurs, push alarm information related to the alarm event to an alarm component, so that the alarm component merges the alarm information into an alarm notification;
and an alarm notification sending module 150, configured to send the alarm notification according to an alarm sending mode in the alarm rule.
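The five modules can be wired together as a simple pipeline. In this illustrative sketch the class name MonitoringDevice and every callable signature are assumptions; each module is injected as a plain callable, so the class only fixes the order of steps S101 to S105 and leaves the concrete implementations open.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class MonitoringDevice:
    obtain_config: Callable[[], Any]                              # module 110 (S101)
    obtain_index_data: Callable[[Any], List[dict]]                # module 120 (S102)
    judge_alarm_events: Callable[[Any, List[dict]], List[dict]]   # module 130 (S103)
    push_to_alarm_component: Callable[[List[dict]], List[str]]    # module 140 (S104)
    send_notification: Callable[[Any, str], None]                 # module 150 (S105)

    def run_once(self) -> int:
        """Run one monitoring cycle; return the number of notifications sent."""
        config = self.obtain_config()
        data = self.obtain_index_data(config)
        events = self.judge_alarm_events(config, data)
        if not events:
            return 0  # nothing to push when no alarm event occurred
        notifications = self.push_to_alarm_component(events)
        for n in notifications:
            self.send_notification(config, n)
        return len(notifications)
```

Injecting the modules keeps each one independently replaceable, for example swapping the alarm component for a different alert manager without touching the rest of the cycle.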
Illustratively, the index data obtaining module 120 includes a service data acquisition submodule and an index data generation submodule.
The service data acquisition submodule is configured to acquire the service data of the monitored object according to the acquisition path and the acquisition period in the index acquisition rule.
The index data generation submodule is configured to perform preset indexing processing on the service data to obtain index data related to the monitored object.
Illustratively, the index data generation submodule includes an index name determination submodule, an index tag determination submodule, and an index name and tag addition submodule.
The index name determination submodule and the index tag determination submodule are configured to determine, respectively, the index name and the index tag of the service data according to the configuration file and the source of the service data;
the index name and tag addition submodule is configured to add the index name and the index tag to the service data to obtain the index data.
Illustratively, the alarm event judging module 130 includes a first data threshold judgment submodule and a first alarm event judgment submodule.
And the first data threshold judgment submodule is used for judging whether the numerical value of the service data in the index data is greater than the data threshold corresponding to the index data in the alarm rule.
And the first alarm event judgment submodule is used for judging that an alarm event occurs if the numerical value of the service data in the index data is greater than the data threshold value.
Illustratively, the alarm event judging module 130 includes a second data threshold judgment submodule, a time threshold judgment submodule, and a second alarm event judgment submodule.
A second data threshold judgment submodule, configured to judge whether a numerical value of service data in the index data is greater than a data threshold corresponding to the index data in the alarm rule;
a time threshold judgment submodule, configured to, if the value of the service data in the index data is greater than the data threshold, judge whether a duration that the value of the service data in the index data is greater than the data threshold is greater than a time threshold corresponding to the index data in the alarm rule;
and the second alarm event judgment submodule is used for judging that an alarm event occurs if the duration time that the numerical value of the service data in the index data is greater than the data threshold is greater than the time threshold corresponding to the index data in the alarm rule.
Illustratively, the monitoring device further comprises a data baseline generation module, an index data prediction module and an alarm threshold determination module.
A data baseline generation module, configured to generate a data baseline by performing a preset operation on the index data, where the preset operation includes at least one of: standard deviation operation, average value operation, maximum value operation and minimum value operation;
an index data prediction module, configured to determine the index data predicted value according to the data baseline.
And the alarm threshold value determining module is used for determining the data threshold value and the time threshold value according to the index data predicted value.
Illustratively, the monitoring device further comprises a configuration parameter modification module, an increment judgment module and a configuration file updating module.
And the configuration parameter modification module is used for responding to the modification operation of the configuration parameters to obtain the modified configuration parameters.
And the increment judgment module is used for carrying out increment judgment on the modified configuration parameters.
And the configuration file updating module is used for updating the configuration file according to the modified configuration parameters if the result of the increment judgment indicates that the configuration file needs to be updated.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the apparatus, the modules and the units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The methods, apparatus, and devices of the present application may be deployed in numerous general-purpose or special-purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The above-described methods and apparatuses may be implemented, for example, in the form of a computer program that can be run on a computer device as shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
As shown in fig. 4, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a storage medium and an internal memory.
The storage medium may store an operating system and a computer program. The computer program comprises program instructions which, when executed, cause a processor to perform any of the monitoring methods.
The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.
The internal memory provides an environment for the execution of a computer program on a storage medium, which when executed by a processor causes the processor to perform any of the monitoring methods.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will appreciate that the structure shown in fig. 4 is merely a block diagram of the parts relevant to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
It should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Wherein, in one embodiment, the processor is configured to execute a computer program stored in the memory to implement the steps of:
acquiring a configuration file, wherein the configuration file is generated according to configuration parameters input by a user on a user interface and comprises an index acquisition rule and an alarm rule;
acquiring index data related to the monitored object based on the index acquisition rule;
judging whether an alarm event occurs according to the index data based on the alarm rule;
if it is determined that an alarm event occurs, pushing alarm information related to the alarm event to an alarm component, so that the alarm component merges the alarm information into an alarm notification;
and sending the alarm notification according to an alarm sending mode in the alarm rule.
In some embodiments, the processor, when implementing acquiring the index data related to the monitored object based on the index acquisition rule, is configured to implement:
acquiring the service data of the monitored object according to the acquisition path and the acquisition period in the index acquisition rule;
and performing preset indexing processing on the service data to obtain index data related to the monitored object.
In some embodiments, when the processor implements the preset indexing processing on the service data to obtain the index data related to the monitored object, the processor is configured to implement:
determining an index name and an index tag of the service data according to the configuration file and the source of the service data;
and adding the index name and the index tag to the service data to obtain the index data.
In some embodiments, the processor, when implementing the determination of whether an alarm event occurs according to the index data based on the alarm rule, is configured to implement:
judging whether the numerical value of the service data in the index data is larger than the data threshold value corresponding to the index data in the alarm rule;
and if the numerical value of the service data in the index data is larger than the data threshold, judging that an alarm event occurs.
In some embodiments, the processor, when implementing the determination of whether an alarm event occurs according to the index data based on the alarm rule, is configured to implement:
judging whether the numerical value of the service data in the index data is larger than the data threshold value corresponding to the index data in the alarm rule;
if the numerical value of the service data in the index data is larger than the data threshold, judging whether the duration time that the numerical value of the service data in the index data is larger than the data threshold is larger than the time threshold corresponding to the index data in the alarm rule;
and if the duration time that the numerical value of the service data in the index data is greater than the data threshold value is greater than the time threshold value corresponding to the index data in the alarm rule, judging that an alarm event occurs.
In some embodiments, the processor, when implementing the monitoring method, is configured to implement:
generating a data baseline by performing a preset operation on the index data, wherein the preset operation comprises at least one of the following operations: standard deviation operation, average value operation, maximum value operation and minimum value operation;
determining an index data predicted value according to the data baseline;
and determining the data threshold value and the time threshold value according to the index data predicted value.
In some embodiments, the processor, when implementing the monitoring method, is configured to implement:
responding to the modification operation of the configuration parameters to obtain modified configuration parameters;
performing increment judgment on the modified configuration parameters;
and if the result of the increment judgment indicates that the configuration file needs to be updated, updating the configuration file according to the modified configuration parameters.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working process of the monitoring method described above may refer to the corresponding process in the foregoing monitoring method embodiment, and is not described herein again.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, where the computer program includes program instructions, and a method implemented when the program instructions are executed may refer to various embodiments of the monitoring method of the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.