Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only schematic and illustrate the basic idea of the present application. The drawings show only the components related to the present application and are not drawn according to the number, shape and size of the components in an actual implementation; in an actual implementation the type, quantity and proportion of the components may be changed at will, and the layout of the components may be more complex.
Throughout the specification, when a part is referred to as being "connected" to another part, this includes not only a case of being "directly connected" but also a case of being "indirectly connected" with another element interposed therebetween. In addition, when a certain part is referred to as "including" a certain component, unless otherwise stated, other components are not excluded, but it means that other components may be included.
The terms first, second, third, etc. are used herein to describe various elements, components, regions, layers and/or sections, but are not limited thereto. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the scope of the present application.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition will occur only when a combination of elements, functions or operations is inherently mutually exclusive in some way.
The aforementioned Prometheus is open-source monitoring software hosted by the CNCF. Prometheus mainly works as follows: it pulls data from each node at regular intervals according to its configuration. Pull is the default collection mode, and data from monitored nodes can also be collected in push mode through the Pushgateway. The collected data is stored in a TSDB (a time series database). Once Prometheus has collected the monitoring data, it can be queried with the built-in PromQL. The alarm function is provided by the alarm manager (Alertmanager), the Prometheus component that manages and sends alarms. Because the native charting function of Prometheus is too simple, Prometheus data can be connected to Grafana and managed uniformly there.
The present application addresses the problem that the alarm manager of a Prometheus server offers insufficient alarm configuration. Taking the Prometheus server as a framework, the application provides a visual unified alarm method, system and storage medium based on a service tree, offering different alarm notification levels, diversified alarm channel configuration, user-friendly visual alarm status pages for each level of the service tree and for service dependencies, an automated alarm processing function, and storage of historical alarm notifications.
Fig. 1 is a schematic flow chart of a visualization unified alarm method based on a service tree in an embodiment of the present application. As shown, the method comprises:
Step S101: receiving one or more alarm notification messages from an alarm manager, wherein the various alarm channels are uniformly pre-registered with the alarm manager in Webhook form.
A Webhook is an API concept and one of the usage paradigms of microservice APIs. It is also called a reverse API: the front end does not actively send requests, and data is pushed entirely by the back end. A common example: when a friend posts to WeChat Moments, the back end pushes the post to the clients of all other friends, which is a typical Webhook scenario. In short, a Webhook is a URL that receives HTTP POST (or GET, PUT, DELETE) requests. An API provider that implements Webhooks sends a message to the configured URL when an event occurs. With Webhooks, changes are received in real time, unlike with request-response. This inverts the client-server model: in the traditional approach the client requests data from the server and the server returns it (the client pulls data). In the Webhook paradigm, the server updates a resource and then automatically sends the update to the client (the server pushes data); the client is not a requester but a passive recipient. This reversal of control can replace many exchanges that would otherwise require more complex requests and constant polling of the remote server. By receiving resources rather than requesting them, a remote code base can be updated, resources can be allocated easily, and the data can even be integrated into an existing system to update endpoints and related data as the API requires. Webhooks are also well suited to actively pushing data in scenarios such as third-party platform authentication and login where no front-end interface is available for relaying, and in payment scenarios with strict security requirements.
The alarm manager (Alertmanager) is an independent alarm module. It receives alarms sent by clients such as Prometheus (open-source monitoring software), processes them by grouping, deduplication and the like, and routes them to the correct receiver; alarms can be sent to the persons responsible for different modules according to different rules.
The alarm channels include enterprise WeChat, DingTalk, email, and SMS. For example, Alertmanager supports alarm modes such as Email and Slack, and can connect to domestic IM tools such as DingTalk through a Webhook.
Briefly, by pre-registering a variety of alarm channels (e.g., enterprise WeChat, DingTalk, email, and SMS) with the alarm manager (Alertmanager), each in Webhook form, one or more alarm notification messages from the alarm manager can be received through those channels.
Preferably, the entities of the alarm notification message include: any one or more of a source, environment, type, node, and label.
Therefore, the Alertmanager sends alarm messages to the visual unified alarm system based on the service tree by Webhook, so that a Topic can be set independently for different services and different alarm levels, achieving more accurate and more focused notification delivery.
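As a minimal sketch of step S101, the snippet below parses a Webhook request body of the kind Alertmanager posts and extracts the entities of each alarm. The payload fields used here (alerts, status, labels, startsAt) follow Alertmanager's Webhook JSON; the label names source, env, type and node, and the function name parse_webhook, are assumptions chosen for illustration and are not taken from the application.

```python
import json

# Hypothetical label keys; a real deployment would use whatever labels its alert rules set.
ENTITY_KEYS = ("source", "env", "type", "node")

def parse_webhook(body: bytes) -> list[dict]:
    """Turn one Alertmanager webhook POST body into a list of flat alert records."""
    payload = json.loads(body)
    records = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        record = {key: labels.get(key) for key in ENTITY_KEYS}
        record["status"] = alert.get("status")      # "firing" or "resolved"
        record["starts_at"] = alert.get("startsAt")
        record["labels"] = labels                   # keep the full label set
        records.append(record)
    return records

# Sample body with made-up values, shaped like an Alertmanager notification.
sample = json.dumps({
    "version": "4",
    "status": "firing",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "HighLatency", "env": "prod",
                   "type": "latency", "node": "order-api-1"},
        "startsAt": "2024-01-01T00:00:00Z",
    }],
}).encode()

print(parse_webhook(sample)[0]["node"])
```

In a deployed receiver this function would sit behind an HTTP endpoint whose URL is registered as the Webhook target; the HTTP layer is omitted here to keep the sketch short.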
Step S102: standardizing the alarm notification message, analyzing the content of the alarm notification message to obtain a service tree from a third-party service, and adding the service tree to the standardized alarm notification message.
Preferably, standardizing the alarm notification message means extracting the valid information and discarding irrelevant data. Alarm messages are typically deeply nested and redundant; if every alarm notification message were passed on in full, the processing speed of the message consumers would drop and network resources would be wasted.
Specifically, analyzing the content of the alarm notification message to obtain a service tree from a third-party service works as follows: according to the environment, type and node information in the entities of the alarm notification message, the service tree data corresponding to the alarm is obtained from the third-party service and added to the standardized alarm message.
Preferably, the service tree is a tree-structured abstraction of the information of each service Department, service Project, service Cluster and service Role; this abstraction helps route an alarm message quickly and accurately to the developer responsible for the microservice.
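The standardization and enrichment of step S102 can be sketched as follows. This is an illustration under stated assumptions: the in-memory dictionary SERVICE_TREE stands in for the third-party service, and the field names, sample values and the function name standardize are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServicePath:
    """One path through the service tree: Department / Project / Cluster / Role."""
    department: str
    project: str
    cluster: str
    role: str

# Stand-in for the third-party service tree lookup, keyed by (env, type, node).
SERVICE_TREE = {
    ("prod", "latency", "order-api-1"): ServicePath("retail", "orders", "order-prod", "api"),
}

# Fields kept after standardization; everything else is discarded.
KEEP_FIELDS = ("source", "env", "type", "node", "status", "starts_at")

def standardize(raw: dict) -> dict:
    """Keep only the fields consumers need, then attach the service tree path."""
    message = {field: raw.get(field) for field in KEEP_FIELDS}
    path = SERVICE_TREE.get((raw.get("env"), raw.get("type"), raw.get("node")))
    message["service_tree"] = (
        None if path is None
        else [path.department, path.project, path.cluster, path.role]
    )
    return message
```

An alarm whose environment, type and node match no entry keeps a service_tree of None, which lets a later stage decide how to handle unmapped alarms.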
Step S103: pushing the standardized alarm notification message to one or more topics of a distributed publish-subscribe message system so that it can be consumed by different message consumers.
For example, the standardized alarm notification messages are pushed to one or more topics (Topic) of Kafka (a high-throughput distributed publish-subscribe messaging system) for subsequent consumption by different message consumers (Consumer).
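One possible way to derive a topic per service and alarm level is sketched below. The naming scheme alerts.<project>.<level>, the level field and the InMemoryBroker class are assumptions for illustration; the broker is a plain in-memory stand-in and not a Kafka client.

```python
from collections import defaultdict

def topic_for(message: dict) -> str:
    """Derive a topic name from the project and alarm level (naming scheme is assumed)."""
    tree = message.get("service_tree") or ["unknown", "unknown"]
    project = tree[1]                      # second level of the service tree path
    level = message.get("level", "warning")
    return f"alerts.{project}.{level}"

class InMemoryBroker:
    """Tiny stand-in for a publish-subscribe system; not a Kafka client."""
    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic: str, message: dict) -> None:
        self.topics[topic].append(message)

broker = InMemoryBroker()
msg = {"service_tree": ["retail", "orders", "order-prod", "api"], "level": "critical"}
broker.publish(topic_for(msg), msg)
print(sorted(broker.topics))
```

Keeping the topic choice in one small function means the routing rule can change without touching producers or consumers.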
Thus, compared with the prior art, steps S101 to S103 of the present application receive alarm notification messages from the alarm manager and then process and publish them uniformly with the help of service tree data. Flexible alarm configuration is then achieved at each level or node of the service tree, and on that basis dynamic acquisition of metric values and metric labels is supported. Reference may be made to the flow diagram shown in Fig. 2.
Step S104: starting message consumers of a plurality of different functional modules to consume the alarm notification messages in each topic of the distributed publish-subscribe message system, so as to perform customized processing or storage according to alarm requirements; and/or starting a message consumer that subscribes to the alarm notification messages in each topic of the distributed publish-subscribe message system, so as to receive all historical alarm notification messages and store them in a search engine for analysis.
For example, Consumers of a plurality of different functional modules are started at the same time to consume the alarm notification messages in the Kafka Topics and perform customized processing or storage according to the requirements of the alarm system of the present application. And/or, an additional Consumer is started to subscribe to the alarm messages in the Kafka Topics, receiving a copy of all historical alarm messages and storing it in Elasticsearch (a distributed, multi-tenant-capable full-text search engine) to provide intelligent analysis of alarm notification messages.
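The idea that each functional module and the history archiver read the same topic independently can be illustrated with a small in-memory log that keeps one read offset per consumer group. The class TopicLog and the group names are hypothetical; a real deployment would rely on the consumer groups of the messaging system and index the archive into a search engine.

```python
class TopicLog:
    """Append-only log with one read offset per consumer group (in-memory stand-in)."""
    def __init__(self):
        self.messages = []
        self.offsets = {}

    def append(self, message: dict) -> None:
        self.messages.append(message)

    def poll(self, group: str) -> list:
        """Return the messages this group has not yet seen and advance its offset."""
        start = self.offsets.get(group, 0)
        self.offsets[group] = len(self.messages)
        return self.messages[start:]

log = TopicLog()
log.append({"id": 1, "status": "firing"})
log.append({"id": 2, "status": "resolved"})

# A functional module and the archive consumer read the same log independently.
firing_seen = [m for m in log.poll("status-module") if m["status"] == "firing"]
archive = list(log.poll("history-archiver"))   # stand-in for indexing into a search engine
```

Because offsets are tracked per group, adding another functional module later does not affect what the existing consumers receive.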
In an embodiment of the present application, the method further includes:
A. dynamically and automatically generating an alarm status page for each level of the service tree, and displaying the alarm status pages visually.
For example, this step may correspond to a state management module of the present system, which displays the alarm status pages of each level of the service tree in a user-friendly visual form; the pages are generated dynamically and automatically according to the levels of the service tree. Preferably, when a received alarm notification message has the status firing, the state on the alarm status page turns red and the value increases with the number of such messages received; when a received alarm notification message has the status resolved, the value decreases with the number of such messages received; when the value reaches zero, the state on the alarm status page turns green.
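The counting rule described above can be sketched as a per-node counter over the service tree path. The class StatusBoard and the message fields are assumptions; as a simplification, the sketch counts messages directly and does not deduplicate repeated notifications for the same alarm.

```python
from collections import Counter

class StatusBoard:
    """Counts active alarms at every level of the service tree path."""
    def __init__(self):
        self.active = Counter()

    def apply(self, message: dict) -> None:
        path = tuple(message["service_tree"])
        delta = 1 if message["status"] == "firing" else -1
        # Update every prefix: department, department/project, and so on.
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            self.active[node] = max(0, self.active[node] + delta)

    def color(self, node: tuple) -> str:
        """Red while any alarm is active under the node, green once the count is zero."""
        return "red" if self.active[node] > 0 else "green"
```

For instance, two firing messages for the same leaf raise the count of every ancestor to 2, and two resolved messages bring each ancestor back to zero and therefore to green.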
B. providing alarm level and alarm channel configuration and update functions on the alarm status page of the visual service tree, so that the developers and operations staff of each microservice can configure and subscribe to the alarm messages they need themselves.
For example, this step may correspond to a configuration update module of the present system, which provides alarm level and alarm channel configuration and update functions on a visual interface, so that the developers and operations staff of a microservice can configure and subscribe to the alarm notification messages they want to receive.
In an embodiment of the present application, the method further includes: calling a corresponding fault handling service interface according to the type of each standardized alarm notification message, so as to provide a self-healing function for alarm faults.
For example, the method of this embodiment may correspond to an alarm processing module of the system, and is configured to call a corresponding fault handling service interface according to a standardized alarm notification message type to provide an alarm fault self-healing processing function.
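Dispatching on the alarm type can be sketched with a small handler registry. The alarm type disk_full, the handler clean_disk and the returned strings are hypothetical placeholders; a real handler would call the corresponding fault handling service interface.

```python
from typing import Callable

# Registry mapping an alarm type to its fault handler.
HANDLERS: dict[str, Callable[[dict], str]] = {}

def handles(alert_type: str):
    """Register a function as the fault handler for one alarm type."""
    def register(func):
        HANDLERS[alert_type] = func
        return func
    return register

@handles("disk_full")
def clean_disk(message: dict) -> str:
    # Placeholder: a real handler would call a fault handling service interface.
    return f"cleanup requested on {message['node']}"

def self_heal(message: dict) -> str:
    """Look up the handler for the message type and run it, or fall back to escalation."""
    handler = HANDLERS.get(message.get("type"))
    if handler is None:
        return "no handler; escalate to on-call"
    return handler(message)
```

With this layout, supporting a new alarm type only requires registering one more function.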
In an embodiment of the present application, the method further includes:
A. pushing the alarm notification message, in a given format and style, to the registered and configured alarm channels; the alarm channels include any one of enterprise WeChat, DingTalk, email and SMS;
B. distinguishing alarm channels of the same type by the different groups created according to the service tree, wherein a group may correspond to any node on the service tree, so as to provide notification control of alarm notification messages at different granularities.
For example, this embodiment may correspond to a channel pushing module of the present system, which pushes the alarm notification message, in a given format and style, to the registered and configured alarm channels. The alarm channel types include, but are not limited to, enterprise WeChat, DingTalk, email and SMS. Alarm channels of the same type are further distinguished by the different groups created according to a particular service tree; a group may correspond to any node on that service tree, providing alarm message notification control at different granularities.
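Granularity by service tree node can be illustrated by attaching channel groups to nodes and collecting them from the most specific node up to the root. The table SUBSCRIPTIONS, the group names and the function recipients are assumptions made for this sketch.

```python
# Hypothetical subscriptions: service tree node -> list of (channel type, group name).
SUBSCRIPTIONS = {
    ("retail",): [("email", "retail-managers")],
    ("retail", "orders"): [("enterprise_wechat", "orders-dev")],
    ("retail", "orders", "order-prod", "api"): [("sms", "orders-oncall")],
}

def recipients(service_tree: list) -> list:
    """Collect channel groups from the most specific node up to the root."""
    path = tuple(service_tree)
    found = []
    for depth in range(len(path), 0, -1):
        found.extend(SUBSCRIPTIONS.get(path[:depth], []))
    return found

print(recipients(["retail", "orders", "order-prod", "api"]))
```

A group attached high in the tree therefore receives alarms from everything below it, while a group attached to a leaf receives only that leaf's alarms.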
Briefly, the method and system support alarm escalation and changing the notified party, which helps promote alarm monitoring within an enterprise; alarm topics, alarm content, alarm channels and alarm subscriptions are handled uniformly, making full use of the visual unified alarm system based on the service tree.
It should be noted that, as the number of services increases, the dependencies between microservices become more and more complex. An alarm in a microservice cluster may be triggered by the alarm of a single microservice and then affect upstream and downstream microservices, so service dependencies, alarm timing and the like are of great significance for managing microservices effectively.
To this end, the present application also proposes: based on a cloud-native K8s service registry (a public container K8s cluster) and the service topology tool Kiali, the alarm processing module obtains the call relationships of the microservices in real time, receives the alarm messages of each microservice in chronological order, and quickly locates the source of the incident and the affected services with the help of the topology graph.
For example, based on a cloud-native K8s service registry and the service topology tool Kiali, the call relationships of the microservices are obtained in real time, the alarm messages of each microservice are received in chronological order, and the source of the incident and the affected services are quickly located with the help of the topology graph.
For example, in different K8s clusters, the distributed scheduling platform XXL-JOB pushes data to the Pushgateway; the Pushgateway provides the data to a proxy server, which in turn provides it to the alarm manager Alertmanager and to Grafana, an open-source metrics monitoring and visualization tool; the data is finally aggregated into the present system, so that the call relationships of the microservices are obtained in real time, the alarm messages of each microservice are received in chronological order, and the source of the incident and the affected services are quickly located with the help of the topology graph.
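One simple heuristic for combining call relationships with alarm timing is sketched below: among the services that are alarming, those that call no other alarming service are treated as candidate sources and ordered by their first alarm time. This heuristic, the function name likely_sources and the sample data are assumptions for illustration; the application does not specify the localization algorithm, and the sketch does not call Kiali.

```python
def likely_sources(calls: dict, alert_times: dict) -> list:
    """Rank alarming services that have no alarming dependency, earliest alarm first.

    calls maps a service to the services it calls; alert_times maps a service
    to the time (in seconds) of its first alarm.
    """
    alerting = set(alert_times)
    candidates = [
        service for service in alerting
        if not any(dep in alerting for dep in calls.get(service, []))
    ]
    return sorted(candidates, key=lambda service: alert_times[service])

# Made-up topology: gateway -> orders -> inventory, payments.
calls = {"gateway": ["orders"], "orders": ["inventory", "payments"]}
alert_times = {"gateway": 30.0, "orders": 12.0, "inventory": 5.0}
print(likely_sources(calls, alert_times))
```

In this sample, gateway and orders each call an alarming service, so only inventory remains as a candidate source; the services that call it, directly or indirectly, are the affected ones.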
Kiali is a system with a separate front end and back end; when its image is built, the front end and back end are placed in the same image. Kiali relies on two external services. One is Prometheus, a monitoring and alarm system: Kiali queries data from Prometheus to generate topology graphs and other statistical charts. The other is the Cluster API, from which Kiali obtains data such as services and deployments, as well as the YAML configuration of VirtualService and DestinationRule resources for configuration validation. In addition, Kiali can be configured with two optional services, Jaeger and Grafana. Jaeger is a distributed tracing system developed by Uber, and Grafana is a data visualization system. These functions of Kiali are all based on Istio.
Based on the cloud-native service registry, the present application implements microservice alarm dependency management and visualization, providing a decision basis for automated service governance.
Further, the method further comprises: adopting a corresponding alarm processing strategy according to the severity level and scope of the alarm, so as to recover quickly from microservice cluster faults; the alarm processing strategy comprises any one or more of rate limiting, circuit breaking, degradation, elastic scaling and restarting.
For example, the alarm processing module of the system may have built-in service governance measures such as rate limiting, circuit breaking, degradation, elastic scaling and restarting, and adopts an appropriate alarm processing strategy according to the severity level and scope of the alarm, so as to recover quickly from microservice cluster faults.
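Choosing a strategy from severity and scope can be sketched as a small decision function over the measures named above (rate limiting, circuit breaking, degradation, elastic scaling, restarting). The severity names, the 50% threshold and the mapping itself are illustrative assumptions; the application does not state which strategy applies to which condition.

```python
def choose_strategy(severity: str, affected_nodes: int, total_nodes: int) -> str:
    """Pick one handling strategy from severity and the share of affected nodes.

    The thresholds and the mapping are illustrative assumptions only.
    """
    scope = affected_nodes / total_nodes if total_nodes else 0.0
    if severity == "critical":
        # Wide critical failures are isolated; narrow ones are restarted.
        return "circuit_break" if scope >= 0.5 else "restart"
    if severity == "major":
        return "degrade" if scope >= 0.5 else "rate_limit"
    return "scale_out"
```

For example, a critical alarm affecting 6 of 10 nodes yields circuit_break, while the same severity on a single node yields restart.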
In summary, the present application provides a service tree-based visual unified alarm method, device, equipment and medium, and aims to implement a service tree-based visual unified alarm scheme to provide different alarm notification levels, diversified alarm channel configurations, friendly alarm state pages at each level of the visual service tree, an automated alarm processing function and a historical alarm notification storage medium.
Fig. 3 is a schematic block diagram of a visualization unified alarm system based on a service tree according to an embodiment of the present application. As shown, the system 200 includes:
A receiving module 301, configured to receive one or more alarm notification messages of an alarm manager; wherein, various alarm channels are respectively and uniformly pre-registered to the alarm manager in a Webhook form.
A processing module 302, configured to perform standardization processing on the alarm notification message to obtain an effective message, analyze the content of the alarm notification message to obtain a service tree in the associated third-party service, and add the service tree to the alarm notification message after the standardization processing; and pushing the alarm notification message after standardized processing to one or more topics of a publish-subscribe message system so as to be consumed by different message consumers.
A consumer module 303, configured to start message consumers of multiple different function modules to consume the alarm notification messages in each topic of the distributed publish-subscribe message system, so as to perform customized processing or storage according to alarm requirements; and/or initiating a message consumer to subscribe to the alarm notification messages in the topics of the distributed publish-subscribe message system to receive and store the alarm notification messages of all the histories to a search engine for analysis of the alarm notification messages.
It should be noted that the information interaction and execution processes between the modules/units of the system are based on the same concept as the method embodiments of the present application and bring the same technical effects; for details, refer to the description of the foregoing method embodiments, which is not repeated here.
It should be further noted that the division of the modules of the above system is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these units may all be implemented in the form of software calls by processing elements.
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown, the computer device 400 includes a memory 401 and a processor 402; the memory 401 is used for storing computer instructions; the processor 402 executes the computer instructions to implement the method described in Fig. 1.
In some embodiments, the computer device 400 may include one or more memories 401 and one or more processors 402; Fig. 4 shows one of each as an example.
In an embodiment of the present application, the processor 402 in the computer device 400 loads one or more instructions corresponding to the processes of an application program into the memory 401 according to the steps described in Fig. 1, and the processor 402 executes the application program stored in the memory 401, thereby implementing the method described in Fig. 1.
The memory 401 may include a Random Access Memory (RAM), and may also include a non-volatile memory, such as at least one disk memory. The memory 401 stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an expanded set thereof, wherein the operating instructions may include various operating instructions for implementing various operations. The operating system may include various system programs for implementing various basic services and for handling hardware-based tasks.
The processor 402 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In some specific applications, the various components of the computer device 400 are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of explanation, however, the various buses are collectively shown in Fig. 4 as a bus system.
In an embodiment of the present application, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the method described in fig. 1.
The present application may be embodied as systems, methods, and/or computer program products, in any combination of technical details. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present application.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable programs described herein may be downloaded from a computer-readable storage medium to a variety of computing/processing devices, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present application may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine related instructions, microcode, firmware instructions, state setting data, integrated circuit configuration data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry can execute computer-readable program instructions to implement aspects of the present application by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA).
In summary, the present application provides a visual unified alarm method, apparatus, device and medium based on a service tree, by receiving one or more alarm notification messages from an alarm manager; wherein, various alarm channels are respectively and uniformly pre-registered to the alarm manager in a Webhook form; standardizing the alarm notification message to obtain an effective message, analyzing the content of the alarm notification message to obtain a business service tree in the associated third-party service, and adding the business service tree to the alarm notification message after standardized processing; pushing the alarm notification message after standardized processing to one or more topics of a publish-subscribe message system for consumption by different message consumers; starting message consumers of a plurality of different functional modules to consume the alarm notification messages in the topics of the distributed publish-subscribe message system so as to perform customized processing or storage according to alarm requirements; and/or initiating a message consumer to subscribe to the alarm notification messages in the topics of the distributed publish-subscribe message system to receive and store the alarm notification messages of all the histories to a search engine for analysis of the alarm notification messages.
The application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the invention. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present application.