CN113377626B - Visual unified alarm method, device, equipment and medium based on service tree - Google Patents

Visual unified alarm method, device, equipment and medium based on service tree

Info

Publication number
CN113377626B
Authority
CN
China
Prior art keywords
alarm
alarm notification
message
service
messages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110916619.7A
Other languages
Chinese (zh)
Other versions
CN113377626A (en)
Inventor
何育伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Linkedcare Information Technology Co ltd
Original Assignee
Shanghai Linkedcare Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Linkedcare Information Technology Co ltd
Priority to CN202110916619.7A
Publication of CN113377626A
Application granted
Publication of CN113377626B
Legal status: Active


Abstract

The present application provides a visual unified alarm method, apparatus, device and medium based on a service tree. One or more alarm notification messages are received from an alarm manager; the alarm notification messages are standardized to obtain valid messages, while their content is parsed to obtain the business service tree in the associated third-party service, and the business service tree is added to the standardized alarm notification messages; the standardized alarm notification messages are pushed to one or more topics of a publish-subscribe message system for consumption by different message consumers. The application provides a friendly front-end configuration page, which lowers the learning cost of using Prometheus's AlertManager alarm notification management module and avoids manual configuration errors; it supports alarm escalation and changes of notification recipients, which helps spread monitoring alarms within an enterprise; and it unifies the handling of alarm topics, alarm content, alarm channels and alarm subscriptions.

Description

Visual unified alarm method, device, equipment and medium based on service tree
Technical Field
The invention relates to the technical field of microservices, and in particular to a visual unified alarm method, apparatus, device and medium based on a service tree.
Background
The microservice architecture is a software architecture pattern widely adopted by large internet companies. In a microservice architecture, a system is split into many small, mutually independent services, each of which runs in its own process and can be developed and deployed independently. When business requirements change rapidly, the single responsibility and autonomy of microservices make system boundaries clearer and improve maintainability; deployment is also simplified, since each microservice can be upgraded and released on its own; and when the business grows, each service can be scaled out independently. While the microservice architecture brings many benefits, it also introduces new problems.
In a traditional monolithic application, troubleshooting usually means checking the logs to locate error messages and exception stack traces. In a microservice architecture, however, there are so many services that locating a problem becomes very difficult. In addition, microservices are often built by composing existing services, so the failure of one service can easily cause an avalanche effect that makes the entire system unavailable.
It is therefore extremely important to monitor the running state of microservices and, when an anomaly occurs, to raise the corresponding alarm quickly and accurately so that the responsible development and operations personnel are notified.
Currently, the infrastructure underlying microservices consists of cloud-native technologies such as Kubernetes (a production-grade container orchestration system) for container orchestration and Service Mesh (an infrastructure layer for inter-service communication) for service governance, and the industry commonly builds microservice monitoring and alarm systems on Prometheus to achieve comprehensive monitoring and unified alarming of microservices.
While using Prometheus together with Grafana (an open-source metric monitoring and visualization tool) to monitor microservices is a mature solution, alarming on the monitoring data relies on AlertManager (Prometheus's alarm notification management module), which has the following problems.
1) The Prometheus configuration file must be modified to associate AlertManager with Prometheus; alarm escalation is not supported; and metric values and metric labels cannot be obtained dynamically. Taken together, these problems show that the alarm configuration of Prometheus and AlertManager is not convenient or flexible enough: the configuration files need to be changed frequently and the learning cost is high, which hinders the adoption of monitoring alarms within an enterprise.
2) Faced with a huge number of microservices, the prior art applies a uniform AlertManager-based alarm strategy to all Prometheus systems in an enterprise and cannot adapt to the enterprise's complex and changing business alarm requirements. For example, different development groups need different monitoring alarm rules, monitor different applications, and use different alarm notification levels and recipients. The prior art provides only the alarm configuration capability of AlertManager and no interface for extending the enterprise's alarm system, so its extensibility is poor.
3) With hundreds of microservices, how to reasonably aggregate, display and automatically process alarm messages according to call relationships and timing once they are received is a pain point and a difficulty in practice.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present application to provide a visual unified alarm method, apparatus, device and medium based on service tree to solve the problems in the prior art.
To achieve the above and other related objects, the present application provides a visual unified alarm method based on a service tree, including: receiving one or more alarm notification messages from an alarm manager, wherein the various alarm channels are each pre-registered with the alarm manager in Webhook form; standardizing the alarm notification messages to obtain valid messages, parsing the content of the alarm notification messages to obtain the business service tree in the associated third-party service, and adding the business service tree to the standardized alarm notification messages; pushing the standardized alarm notification messages to one or more topics of a publish-subscribe message system for consumption by different message consumers; starting message consumers of a plurality of different functional modules to consume the alarm notification messages in the topics of the distributed publish-subscribe message system, so as to perform customized processing or storage according to alarm requirements; and/or starting a message consumer that subscribes to the alarm notification messages in the topics of the distributed publish-subscribe message system, so as to receive all historical alarm notification messages and store them in a search engine for analysis.
In an embodiment of the present application, the entities of the alarm notification message include: any one or more of a source, environment, type, node, and label.
In an embodiment of the present application, the method further includes: dynamically and automatically generating, according to each level of the business service tree, an alarm state page corresponding to that level, and displaying it visually; and providing alarm level and alarm channel configuration and update functions on the alarm state page of the visualized business service tree, so that the development and operations personnel of each microservice can configure and subscribe to the alarm messages they need by themselves.
In an embodiment of the present application, the method further includes: when the status of a received alarm notification message is triggered, the node panel corresponding to the service tree on the alarm state page turns red, and its value increases with the number of distinct alarm notification messages received; when the status of a received alarm notification message is processed, the value decreases as the number of distinct processed alarm notification messages received increases; and when the value is zero, the node panel corresponding to the service tree on the alarm state page turns green.
In an embodiment of the present application, the method further includes: calling a corresponding fault handling service interface according to the type of each standardized alarm notification message, so as to provide an alarm fault self-healing function.
In an embodiment of the present application, the method further includes: pushing the alarm notification message to the registered and configured alarm channels in a given format and style; the alarm channels include any one of Enterprise WeChat (WeCom), DingTalk, email and SMS; alarm channels of the same type are further distinguished by the different groups created according to the service tree, wherein a group may correspond to any node on the service tree, so as to provide notification control of alarm notification messages at different granularities.
In an embodiment of the present application, the method further includes: acquiring the call relationships of the microservices in real time based on a cloud-native service registry and a service topology tool; and receiving the alarm messages of each microservice in chronological order and, in combination with the topology graph, quickly locating the source of the incident and the affected services.
In an embodiment of the present application, the method further includes: adopting a corresponding alarm handling strategy according to the alarm severity level and scope, so as to recover quickly from microservice cluster faults; wherein the alarm handling strategy comprises any one or more of rate limiting, circuit breaking, degradation, ejection and restart.
To achieve the above and other related objects, the present application provides a visual unified alarm system based on a service tree, the system comprising: a receiving module for receiving one or more alarm notification messages from an alarm manager, wherein the various alarm channels are each pre-registered with the alarm manager in Webhook form; a processing module for standardizing the alarm notification messages to obtain valid messages, parsing the content of the alarm notification messages to obtain the business service tree in the associated third-party service, adding the business service tree to the standardized alarm notification messages, and pushing the standardized alarm notification messages to one or more topics of a publish-subscribe message system for consumption by different message consumers; and a consumer module for starting message consumers of a plurality of different functional modules to consume the alarm notification messages in the topics of the distributed publish-subscribe message system, so as to perform customized processing or storage according to alarm requirements, and/or starting a message consumer that subscribes to the alarm notification messages in the topics of the distributed publish-subscribe message system, so as to receive all historical alarm notification messages and store them in a search engine for analysis.
To achieve the above and other related objects, the present application provides a computer apparatus, comprising: a memory, and a processor; the memory is to store computer instructions; the processor executes computer instructions to implement the method as described above.
To achieve the above and other related objects, the present application provides a computer readable storage medium storing computer instructions which, when executed, perform the method as described above.
In summary, the visual unified alarm method, apparatus, device and medium based on a service tree according to the present application receive one or more alarm notification messages from an alarm manager, wherein the various alarm channels are each pre-registered with the alarm manager in Webhook form; standardize the alarm notification messages to obtain valid messages, parse the content of the alarm notification messages to obtain the business service tree in the associated third-party service, and add the business service tree to the standardized alarm notification messages; push the standardized alarm notification messages to one or more topics of a publish-subscribe message system for consumption by different message consumers; start message consumers of a plurality of different functional modules to consume the alarm notification messages in the topics of the distributed publish-subscribe message system, so as to perform customized processing or storage according to alarm requirements; and/or start a message consumer that subscribes to the alarm notification messages in the topics of the distributed publish-subscribe message system, so as to receive all historical alarm notification messages and store them in a search engine for analysis.
The present application has the following beneficial effects:
It provides a friendly front-end configuration page, which lowers the learning cost of using Prometheus's AlertManager alarm notification management module and avoids manual configuration errors. It supports alarm escalation and changes of notification recipients, which helps spread monitoring alarms within an enterprise. It unifies the handling of alarm topics, alarm content, alarm channels and alarm subscriptions, making full use of the visual unified alarm system based on the business service tree. Based on a cloud-native service registry, it implements management and visualization of alarm dependencies between microservices, providing a decision basis for automated service governance. AlertManager sends alarm messages to the visual unified alarm system based on the business service tree via webhook, so that a Topic can be set independently for different services and different alarm levels, and notifications can reach and focus on the right recipients more precisely.
Drawings
Fig. 1 is a flowchart illustrating a visual unified alarm method based on a service tree according to an embodiment of the present application.
Fig. 2 is a schematic business flow diagram illustrating a visual unified alarm method based on a service tree according to an embodiment of the present application.
Fig. 3 is a block diagram of a visual unified alarm system based on a service tree according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only schematic and illustrate the basic idea of the present application, and although the drawings only show the components related to the present application and are not drawn according to the number, shape and size of the components in actual implementation, the type, quantity and proportion of the components in actual implementation may be changed at will, and the layout of the components may be more complex.
Throughout the specification, when a part is referred to as being "connected" to another part, this includes not only a case of being "directly connected" but also a case of being "indirectly connected" with another element interposed therebetween. In addition, when a certain part is referred to as "including" a certain component, unless otherwise stated, other components are not excluded, but it means that other components may be included.
The terms first, second, third, etc. are used herein to describe various elements, components, regions, layers and/or sections, but are not limited thereto. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the scope of the present application.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition will occur only when a combination of elements, functions or operations is inherently mutually exclusive in some way.
The aforementioned Prometheus is open-source monitoring software hosted by the CNCF. Its workflow is mainly as follows: it periodically collects data from each node according to its configuration; the default collection mode is pull, and data from monitored nodes can also be pushed through the Pushgateway. The collected data is stored in a TSDB (a time series database). Once Prometheus has the monitoring data, it can be queried with the built-in PromQL. Alarming is provided by the alarm manager (Alertmanager), the Prometheus component that manages and sends alarms. Because the native charting function of Prometheus is too simple, Prometheus data can be fed into Grafana and managed there in a unified way.
The present application addresses the insufficient alarm configuration capability of the alarm manager of a Prometheus server. Taking the Prometheus server as the underlying framework, it provides a visual unified alarm method, system and storage medium based on a service tree, so as to provide different alarm notification levels, diversified alarm channel configurations, friendly visual alarm state pages organized by service tree and service dependency levels, an automatic alarm handling function, and storage of historical alarm notifications.
Fig. 1 is a schematic flow chart of a visualization unified alarm method based on a service tree in an embodiment of the present application. As shown, the method comprises:
step S101: receiving one or more alarm notification messages of an alarm manager; wherein, various alarm channels are respectively and uniformly pre-registered to the alarm manager in a Webhook form.
A Webhook is an API concept and one of the usage paradigms of microservice APIs; it is also called a reverse API: the front end does not actively send requests, and all data is pushed by the back end. As a common example, when a friend posts to WeChat Moments, the back end pushes the message to the clients of all of that person's friends, which is a typical Webhook scenario. Briefly, a Webhook is a URL that receives HTTP POST (or GET, PUT, DELETE) requests. An API provider that implements Webhooks sends a message to the configured URL when an event occurs. Unlike request-response, Webhooks let you receive changes in real time. This inverts the client-server model: in the traditional approach, the client requests data from the server and the server then provides it (the client pulls the data). In the Webhook paradigm, the server updates the resource to be provided and then automatically sends it to the client as an update (the server pushes the data); the client is not a requester but a passive recipient. This inversion of control can replace many communication flows that would otherwise require more complex requests and constant polling of the remote server. By simply receiving resources rather than requesting them directly, a remote code base can be updated, resources can be allocated easily, and the updates can even be integrated into an existing system to refresh endpoints and related data as the API requires. Webhooks are also well suited to actively pushing data in scenarios such as third-party platform authentication and login where no front-end interface is available as an intermediary, or payment scenarios with strong security requirements.
The alarm manager (Alertmanager) is an independent alarm module. It receives alarms sent by clients such as Prometheus (open-source monitoring software), processes them through grouping, deduplication and the like, and routes them to the correct receiver; alarms can be sent to the persons responsible for different modules according to different rules.
The alarm channels include Enterprise WeChat (WeCom), DingTalk, email and SMS. For example, Alertmanager supports alarm methods such as Email and Slack, and can connect to domestic IM tools such as DingTalk through webhooks.
Briefly, by pre-registering the various alarm channels (e.g., Enterprise WeChat, DingTalk, email and SMS) with the alarm manager (Alertmanager), each in Webhook form, one or more alarm notification messages can be received from the alarm manager for the various alarm channels.
Preferably, the entities of the alarm notification message include: any one or more of a source, environment, type, node, and label.
Therefore, AlertManager sends alarm messages to the visual unified alarm system based on the business service tree via webhook, so that a Topic can be set independently for different services and different alarm levels, and notifications can reach and focus on the right recipients more precisely.
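The receiving side of step S101 can be sketched as a small HTTP endpoint. This is a minimal illustration under stated assumptions, not the implementation described in the application: the handler class, the `received_alerts` list and the `serve` helper are hypothetical names, and the body is assumed to follow the JSON layout that Alertmanager posts to webhook receivers (a top-level `status` and an `alerts` array whose items carry `status`, `labels`, `annotations` and `startsAt`).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory stand-in for the downstream pipeline (hypothetical).
received_alerts = []


def parse_webhook_body(body: bytes) -> list:
    """Extract the individual alerts from one webhook POST body."""
    payload = json.loads(body)
    alerts = []
    for alert in payload.get("alerts", []):
        alerts.append({
            # Fall back to the group-level status if the alert has none.
            "status": alert.get("status", payload.get("status")),
            "labels": alert.get("labels", {}),
            "annotations": alert.get("annotations", {}),
            "startsAt": alert.get("startsAt"),
        })
    return alerts


class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alerts = parse_webhook_body(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Malformed JSON or an unexpected top-level type.
            self.send_response(400)
            self.end_headers()
            return
        received_alerts.extend(alerts)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block and serve the webhook endpoint (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), AlertWebhookHandler).serve_forever()
```

Calling `serve()` starts the endpoint; the alarm manager would then be configured with this URL as a webhook receiver.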
Step S102: standardizing the alarm notification message, parsing the content of the alarm notification message to obtain the service tree in the third-party service, and adding the service tree to the standardized alarm notification message.
Preferably, standardizing the alarm notification message means extracting the valid information and discarding irrelevant data. Alarm messages are typically deeply nested and contain considerable redundancy; if they were passed on in full, the processing speed of the message consumers would drop and network resources would be wasted.
Specifically, the content of the alarm notification message is parsed to obtain the service tree in the third-party service as follows: according to the environment, type and node information in the entities of the alarm notification message, the data of the service tree corresponding to the alarm is obtained from the third-party service and added to the standardized alarm message.
Preferably, the service tree is a tree-structured abstraction of the information of each business Department, business Project, service Cluster and service Role; this abstraction helps an alarm message reach the developer responsible for the microservice quickly and accurately.
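As a rough sketch of step S102, the following shows one way to standardize a raw alert and attach a service tree path. Everything here is an assumption made for illustration: the label names (`source`, `env`, `type`, `instance`), the `StandardAlert` fields, and the `SERVICE_TREE` dictionary, which stands in for the third-party service tree lookup that the text does not specify.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the third-party service tree lookup, keyed by
# (environment, type, node). Values are Department/Project/Cluster/Role.
SERVICE_TREE = {
    ("prod", "app", "order-api-1"): ["Retail", "Checkout", "order-cluster", "api"],
}

# Labels kept after standardization; everything else is discarded.
KEPT_LABELS = ("alertname", "severity")


@dataclass
class StandardAlert:
    source: str
    environment: str
    type: str
    node: str
    labels: dict
    status: str
    service_tree: list = field(default_factory=list)


def standardize(raw: dict) -> StandardAlert:
    """Keep only the fields the pipeline needs and drop the rest."""
    labels = raw.get("labels", {})
    return StandardAlert(
        source=labels.get("source", "prometheus"),
        environment=labels.get("env", "unknown"),
        type=labels.get("type", "unknown"),
        node=labels.get("instance", "unknown"),
        labels={k: v for k, v in labels.items() if k in KEPT_LABELS},
        status=raw.get("status", "firing"),
    )


def enrich(alert: StandardAlert, tree=SERVICE_TREE) -> StandardAlert:
    """Attach the service tree path found for (environment, type, node)."""
    key = (alert.environment, alert.type, alert.node)
    alert.service_tree = list(tree.get(key, []))
    return alert
```

An alert whose environment, type and node are not found in the tree keeps an empty `service_tree`, which a consumer could route to a default recipient.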
Step S103: pushing the standardized alarm notification message to one or more topics of a distributed publish-subscribe message system for consumption by different message consumers.
For example, the standardized alarm notification messages are pushed to one or more topics (Topic) of Kafka (a high-throughput distributed publish-subscribe messaging system) for subsequent consumption and processing by different message consumers (Consumer).
Thus, compared with the prior art, steps S101 to S103 of the present application receive the alarm notification messages from the alarm manager and then process and publish them in a unified way with the help of the service tree data. Flexible alarm configuration can then be implemented at each level or node of the business service tree, on the basis of which dynamic acquisition of metric values and metric labels is supported. Reference may be made to the flow diagram shown in Fig. 2.
Step S104: starting message consumers of a plurality of different functional modules to consume the alarm notification messages in the topics of the distributed publish-subscribe message system so as to perform customized processing or storage according to alarm requirements; and/or initiating a message consumer to subscribe to the alarm notification messages in the topics of the distributed publish-subscribe message system to receive and store the alarm notification messages of all the histories to a search engine for analysis of the alarm notification messages.
For example, Consumers of a plurality of different functional modules are started simultaneously to consume the alarm notification messages in the Kafka Topics and to perform customized processing or storage according to the requirements of the alarm system of the present application. And/or, an additional Consumer is started that subscribes to the alarm messages in the Kafka Topics, receives a copy of all historical alarm messages and stores it in ElasticSearch (a distributed, multi-user-capable full-text search engine), providing intelligent analysis capability for alarm notification messages.
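The publish and consume flow of steps S103 and S104 can be illustrated with a toy in-memory broker. This is only a sketch of the fan-out idea: `InMemoryBroker` is a hypothetical stand-in for Kafka, not its client API, and the `alert.<service>.<severity>` topic naming is an assumption chosen to reflect the per-service, per-level Topics mentioned above.

```python
from collections import defaultdict


def topic_for(service: str, severity: str) -> str:
    """Derive one topic per service and alarm level (naming is assumed)."""
    return f"alert.{service}.{severity}"


class InMemoryBroker:
    """Toy publish-subscribe broker; every subscriber sees every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


history = []   # stand-in for the search-engine archive consumer
notified = []  # stand-in for a channel-push consumer

broker = InMemoryBroker()
topic = topic_for("order-api", "critical")
broker.subscribe(topic, history.append)
broker.subscribe(topic, lambda msg: notified.append(msg["alertname"]))
broker.publish(topic, {"alertname": "HighErrorRate", "status": "firing"})
```

With a real broker, each functional module would typically run in its own consumer group so that every module receives its own copy of each message.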
In an embodiment of the present application, the method further includes:
A. Dynamically and automatically generating, according to each level of the business service tree, an alarm state page corresponding to that level, and displaying it visually.
For example, this step may correspond to the state management module of the present system, which displays the alarm state pages of each level of the service tree in a friendly, visual way; the pages are generated dynamically and automatically according to the levels of the service tree. Preferably, when the status of a received alarm notification message is triggered (firing), the state on the alarm state page turns red, and the value increases with the number of alarm notification messages received; when the status of a received alarm notification message is processed or recovered (resolved), the value decreases as the number of such messages received increases; when the value is zero, the state on the alarm state page turns green.
B. Providing alarm level and alarm channel configuration and update functions on the alarm state page of the visualized business service tree, so that the development and operations personnel of each microservice can configure and subscribe to the alarm messages they need by themselves.
For example, this step may correspond to the configuration update module of the present system, which provides alarm level and alarm channel configuration and update functions on a visual interface, so that the development and operations personnel of a microservice can configure and subscribe to the alarm notification messages they want to receive by themselves.
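The red and green node panel described in step A can be modeled as a set of currently firing alerts. This is a minimal sketch: the `NodePanel` class and the use of an alert fingerprint to tell distinct alerts apart are assumptions, not details given in the text.

```python
class NodePanel:
    """Alarm state of one service tree node (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self._firing = set()  # fingerprints of alerts currently firing

    def update(self, fingerprint, status):
        """Apply one alarm notification to the panel."""
        if status == "firing":
            self._firing.add(fingerprint)
        elif status == "resolved":
            self._firing.discard(fingerprint)

    @property
    def value(self):
        """Number of distinct alerts still firing on this node."""
        return len(self._firing)

    @property
    def color(self):
        return "red" if self._firing else "green"


panel = NodePanel("Retail/Checkout")
panel.update("a1", "firing")
panel.update("b2", "firing")
panel.update("a1", "resolved")
```

Tracking a set rather than a bare counter keeps the value correct when the same alert is delivered more than once.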
In an embodiment of the present application, the method further includes: calling a corresponding fault handling service interface according to the type of each standardized alarm notification message, so as to provide an alarm fault self-healing function.
For example, this embodiment may correspond to the alarm processing module of the present system, which calls the corresponding fault handling service interface according to the type of the standardized alarm notification message to provide an alarm fault self-healing function.
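One way to picture this dispatch is a registry that maps an alert type to a handler. The sketch below is illustrative only: the alert types `disk_full` and `process_down`, the handler functions and their return strings are all invented, and a real handler would call the fault handling service interface instead of returning text.

```python
# Registry of self-healing handlers keyed by alert type (hypothetical).
HANDLERS = {}


def handler(alert_type):
    """Decorator that registers a function for one alert type."""
    def register(func):
        HANDLERS[alert_type] = func
        return func
    return register


@handler("disk_full")
def clean_disk(alert):
    return f"cleaned temp files on {alert['node']}"


@handler("process_down")
def restart_process(alert):
    return f"restarted service on {alert['node']}"


def self_heal(alert):
    """Dispatch to the registered handler; None means no automatic action."""
    func = HANDLERS.get(alert["type"])
    return func(alert) if func else None
```

Alert types without a registered handler fall through to `None`, leaving them for manual handling.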
In an embodiment of the present application, the method further includes:
A. pushing the alarm notification message to a registered and configured alarm channel according to a certain format style; the alarm channel includes: any one of enterprise WeChat, nail, mail and SMS;
B. the alarm channels of the same type are distinguished according to different groups or groups created by the service tree; wherein the cluster may correspond to any node on the service tree to provide notification control of alarm notification messages of different granularities.
For example, the method of this embodiment may correspond to a channel pushing module of the present system, which is configured to push the alarm notification message to various alarm channels of the registration configuration according to a certain format style, where the types of the alarm channels include, but are not limited to: enterprise wechat, stapling, email, SMS, etc., the same type of alarm channel also distinguishes different groups or groups created according to a particular business service tree, which may correspond to any node on the particular service tree to provide alarm message notification control of different granularity.
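The granularity control described above can be sketched as prefix matching on the service tree path: a group bound to a node receives every alert raised at or below that node. The function name, the group names and the path layout (Department, Project, Cluster, Role) are assumptions for illustration.

```python
def matching_groups(alert_path, subscriptions):
    """Return the groups whose node is a prefix of the alert's tree path.

    subscriptions maps a group name to the service tree node it is bound to,
    expressed as a path from the root.
    """
    alert_path = tuple(alert_path)
    return [
        group
        for group, node_path in subscriptions.items()
        if alert_path[:len(node_path)] == tuple(node_path)
    ]


subscriptions = {
    "retail-all": ("Retail",),                          # department level
    "checkout-dev": ("Retail", "Checkout"),             # project level
    "search-oncall": ("Retail", "Search", "search-cluster"),
}
groups = matching_groups(
    ["Retail", "Checkout", "order-cluster", "api"], subscriptions
)
```

Binding a group higher in the tree widens what it receives, while binding it to a cluster or role node narrows it.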
In short, the present application provides alarm escalation and changes of notification recipients, which helps popularize monitoring alarms within an enterprise; it also unifies the handling of alarm topics, alarm content, alarm channels and alarm subscriptions, so that the visual unified alarm system based on the business service tree can be fully utilized.
It should be noted that, as the number of services increases, the dependency relationships between microservices become more and more complex. An alarm in a microservice cluster may originate from a single microservice and then affect its upstream and downstream microservices, so the service dependency relationships and the time order of alarms are of great significance for effectively governing the microservices.
To this end, the present application also proposes the following: based on a cloud-native K8s service registry (a public container K8s cluster) and the service topology tool Kiali, the alarm processing module obtains the calling relationships of the microservices in real time, receives the alarm adjustment messages of the microservices in time order, and combines them with the topology graph to quickly locate the source of an incident and the affected services.
For example, based on the cloud-native K8s service registry and the service topology tool Kiali, the calling relationship of each microservice is obtained in real time, the alarm messages of each microservice are received in time order, and the source of the incident and the affected services are quickly located with the help of the topology graph.
For example, in different K8s clusters, the distributed scheduling platform XXL-JOB pushes data to Pushgateway; Pushgateway provides the data to a proxy server, which in turn provides it to the alarm manager AlertManager and to Grafana, an open-source metrics monitoring and visualization tool. The data is finally aggregated in the present system, so that the calling relationship of each microservice is obtained in real time, the alarm adjustment messages of each microservice are received in time order, and the source of the incident and the affected services are quickly located with the help of the topology graph.
Kiali is a system with separate front end and back end; when the container image is built, the front end and the back end are placed in the same image. Kiali relies on two external services. One is Prometheus, a monitoring and alarm system, from which Kiali queries data to generate the topology graph and other statistical charts. The other is the cluster API, from which Kiali obtains data such as services and deployments, as well as the YAML configuration of VirtualService and DestinationRule resources for configuration validation. Kiali can also be configured with two optional services, Jaeger and Grafana. Jaeger is a distributed tracing system developed by Uber, and Grafana is a data visualization system. All of these Kiali functions are based on Istio.
Based on the cloud-native service registry, the present application thus implements management and visualization of microservice alarm dependencies, and provides a decision basis for automated service governance.
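As a non-limiting illustration, combining the call topology with the time order of alarms can be sketched in Python as follows. The heuristic is an assumption of this sketch and is not taken from the present application: the incident source is taken to be the alarming service that has no alarming callee, with the earliest alarm time as tie-breaker, and the affected services are all of its direct and indirect callers. The service names and the shape of the inputs are likewise hypothetical; in practice the call graph would come from the topology tool.

```python
from typing import Dict, List, Set, Tuple


def locate_source(
    calls: Dict[str, List[str]],       # caller -> list of callees
    first_alarm_at: Dict[str, float],  # service -> time of its first alarm
) -> Tuple[str, Set[str]]:
    """Return (suspected incident source, set of upstream services it affects)."""
    if not first_alarm_at:
        raise ValueError("no alarms received")
    alarming = set(first_alarm_at)
    # Candidate sources: alarming services none of whose callees is alarming.
    candidates = [
        s for s in alarming
        if not any(callee in alarming for callee in calls.get(s, []))
    ]
    pool = candidates or sorted(alarming)  # fall back if the graph has a cycle
    source = min(pool, key=lambda s: (first_alarm_at[s], s))

    # Invert the call graph so that it can be walked from callee to callers.
    callers: Dict[str, List[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    affected: Set[str] = set()
    stack = [source]
    while stack:
        current = stack.pop()
        for caller in callers.get(current, []):
            if caller != source and caller not in affected:
                affected.add(caller)
                stack.append(caller)
    return source, affected


calls = {"gateway": ["orders"], "orders": ["payments", "inventory"], "payments": ["db"]}
alarms = {"gateway": 12.0, "orders": 11.0, "payments": 10.0}
source, affected = locate_source(calls, alarms)
```

In this example `payments` is reported as the source, because its only callee `db` raised no alarm, and `orders` and `gateway` are reported as affected.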
Further, the method also comprises: adopting a corresponding alarm handling strategy according to the severity level and scope of the alarm, so as to achieve rapid recovery from microservice cluster faults; the alarm handling strategy includes any one or more of rate limiting, circuit breaking, degradation, elastic scaling and restarting.
For example, the alarm processing module of the system may have built-in service governance measures such as rate limiting, circuit breaking, degradation, elastic scaling and restarting, and adopts an appropriate alarm handling strategy according to the severity level and scope of the alarm, so as to achieve rapid recovery from microservice cluster faults.
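As a non-limiting illustration, choosing a strategy from the severity level and scope can be sketched in Python as follows. The three severity levels, the 50% scope threshold and the particular mapping are hypothetical values chosen for this sketch; the present application names only the available strategies, not the rules for selecting among them.

```python
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Hypothetical severity levels.
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


def choose_strategies(severity: Severity, affected: int, total: int) -> List[str]:
    """Pick handling strategies from severity and the share of affected instances."""
    if total <= 0:
        raise ValueError("total must be positive")
    wide = affected / total >= 0.5  # assumed threshold for a wide-scope alarm
    if severity == Severity.WARNING:
        return ["rate_limit"]
    if severity == Severity.ERROR:
        return ["circuit_break", "degrade"] if wide else ["rate_limit", "scale_out"]
    # CRITICAL: restart in every case, and isolate callers when the scope is wide.
    return ["circuit_break", "restart"] if wide else ["restart"]
```

A table-driven policy of this kind is easy to review and to expose on a configuration page, which is why it is used here instead of hard-coded branching per service.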
In summary, the present application provides a service tree-based visual unified alarm method, device, equipment and medium, and aims to implement a service tree-based visual unified alarm scheme to provide different alarm notification levels, diversified alarm channel configurations, friendly alarm state pages at each level of the visual service tree, an automated alarm processing function and a historical alarm notification storage medium.
Fig. 3 is a schematic block diagram of a visual unified alarm system based on a service tree according to an embodiment of the present application. As shown, the system 200 includes:
A receiving module 301, configured to receive one or more alarm notification messages from an alarm manager; wherein the various alarm channels are uniformly pre-registered with the alarm manager in the form of a Webhook.
A processing module 302, configured to standardize the alarm notification messages to obtain valid messages, parse the content of the alarm notification messages to obtain the business service tree in the associated third-party service, and add the business service tree to the standardized alarm notification messages; and to push the standardized alarm notification messages to one or more topics of a publish-subscribe message system for consumption by different message consumers.
A consumer module 303, configured to start message consumers of multiple different functional modules to consume the alarm notification messages in each topic of the distributed publish-subscribe message system, so as to perform customized processing or storage according to alarm requirements; and/or to start a message consumer that subscribes to the alarm notification messages in the topics of the distributed publish-subscribe message system, so as to receive all historical alarm notification messages and store them in a search engine for analysis.
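As a non-limiting illustration, the flow through the three modules can be sketched in Python as follows. The payload fields read here (`receiver`, `alerts`, `status`, `labels`, `annotations`, `startsAt`) follow the Alertmanager webhook format. Everything else is an assumption of this sketch: the `service` label, the output field names, the lookup table standing in for the third-party service tree query, the `Broker` class standing in for the publish-subscribe message system, and the plain lists standing in for the search engine and the notifier.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def normalize(payload: dict, service_trees: Dict[str, str]) -> List[dict]:
    """Flatten a webhook payload into standardized messages, dropping other fields."""
    messages = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        service = labels.get("service", "")  # assumed label name
        messages.append({
            "source": payload.get("receiver", ""),
            "status": alert.get("status", ""),
            "type": labels.get("alertname", ""),
            "severity": labels.get("severity", ""),
            "summary": alert.get("annotations", {}).get("summary", ""),
            "starts_at": alert.get("startsAt", ""),
            # Stand-in for the lookup in the third-party service tree.
            "service_tree": service_trees.get(service, "unknown"),
        })
    return messages


class Broker:
    """In-memory stand-in for a topic-based publish-subscribe system."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> None:
        for callback in self._subscribers[topic]:
            callback(message)


broker = Broker()
history: List[dict] = []   # stand-in for the search engine index
notified: List[str] = []   # stand-in for a channel pushing consumer
broker.subscribe("alarms", history.append)
broker.subscribe("alarms", lambda m: notified.append(m["service_tree"]))

payload = {
    "receiver": "unified-webhook",
    "status": "firing",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "HighErrorRate", "severity": "critical", "service": "billing-api"},
        "annotations": {"summary": "5xx rate above threshold"},
        "startsAt": "2021-08-11T08:00:00Z",
        "generatorURL": "http://prometheus.example/graph",
    }],
}
for message in normalize(payload, {"billing-api": "company/payments/billing-api"}):
    broker.publish("alarms", message)
```

Because each consumer only subscribes to a topic, a new function such as self-healing or archiving can be added without changing the receiving or processing code.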
It should be noted that the information interaction and execution processes among the modules/units of the system are based on the same concept as the method embodiments of the present application and therefore bring the same technical effects; for details, refer to the description of the foregoing method embodiments, which is not repeated here.
It should be further noted that the division of the modules of the above system is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these units may all be implemented in the form of software calls by processing elements.
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown, the computer device 400 includes: a memory 401 and a processor 402; the memory 401 is used for storing computer instructions; the processor 402 executes the computer instructions to implement the method described in Fig. 1.
In some embodiments, the computer device 400 may include one or more memories 401 and one or more processors 402; Fig. 4 shows one of each as an example.
In an embodiment of the present application, the processor 402 in the computer device 400 loads one or more instructions corresponding to the processes of an application program into the memory 401 according to the steps described in Fig. 1, and the processor 402 executes the application program stored in the memory 401, thereby implementing the method described in Fig. 1.
The memory 401 may include a Random Access Memory (RAM), and may also include a non-volatile memory, such as at least one disk memory. The memory 401 stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an expanded set thereof, wherein the operating instructions may include various operating instructions for implementing various operations. The operating system may include various system programs for implementing various basic services and for handling hardware-based tasks.
The processor 402 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In some specific applications, the various components of the computer device 400 are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of explanation, however, the various buses are collectively shown in Fig. 4 as a bus system.
In an embodiment of the present application, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the method described in fig. 1.
The present application may be embodied as systems, methods, and/or computer program products, in any combination of technical details. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present application.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable programs described herein may be downloaded from a computer-readable storage medium to a variety of computing/processing devices, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present application may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, integrated circuit configuration data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a Field Programmable Gate Array (FPGA) or a Programmable Logic Array (PLA) is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can then execute the computer-readable program instructions to implement aspects of the present application.
In summary, the present application provides a visual unified alarm method, apparatus, device and medium based on a service tree, by receiving one or more alarm notification messages from an alarm manager; wherein, various alarm channels are respectively and uniformly pre-registered to the alarm manager in a Webhook form; standardizing the alarm notification message to obtain an effective message, analyzing the content of the alarm notification message to obtain a business service tree in the associated third-party service, and adding the business service tree to the alarm notification message after standardized processing; pushing the alarm notification message after standardized processing to one or more topics of a publish-subscribe message system for consumption by different message consumers; starting message consumers of a plurality of different functional modules to consume the alarm notification messages in the topics of the distributed publish-subscribe message system so as to perform customized processing or storage according to alarm requirements; and/or initiating a message consumer to subscribe to the alarm notification messages in the topics of the distributed publish-subscribe message system to receive and store the alarm notification messages of all the histories to a search engine for analysis of the alarm notification messages.
The application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the invention. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present application.

Claims (11)

1. A visual unified alarm method based on a service tree, characterized in that the method comprises:
receiving one or more alarm notification messages from an alarm manager; wherein the multiple alarm channels corresponding to each alarm notification message are uniformly pre-registered with the alarm manager in the form of a single Webhook;
standardizing the alarm notification messages to obtain valid messages and discard irrelevant data, while parsing the content of the alarm notification messages to obtain the business service tree in the associated third-party service, and adding the business service tree to the standardized alarm notification messages;
pushing the standardized alarm notification messages to one or more topics of a publish-subscribe message system for consumption by different message consumers;
starting message consumers of a plurality of different functional modules to consume the alarm notification messages in each topic of the distributed publish-subscribe message system, for customized processing or storage according to alarm requirements; and/or starting a message consumer that subscribes to the alarm notification messages in each topic of the distributed publish-subscribe message system, so as to receive all historical alarm notification messages and store them in a search engine for analysis of the alarm notification messages.

2. The method according to claim 1, characterized in that the entity of the alarm notification message comprises any one or a combination of: source, environment, type, node, and label.

3. The method according to claim 1, characterized in that the method further comprises:
dynamically and automatically generating, according to each level of the business service tree, an alarm status page corresponding to each level of the business service tree, and displaying it visually;
providing, on the visualized alarm status page of the business service tree, functions for configuring and updating alarm levels and alarm channels, so that the development and operation-and-maintenance staff of each microservice can configure and subscribe to the alarm messages they need.

4. The method according to claim 3, characterized in that the method further comprises:
when the status of a received alarm notification message is triggered, the status of the node panel corresponding to the business service tree on the alarm status page turns red, and the value increases as the number of different alarm notification messages received increases;
when the status of a received alarm notification message is processing completed, the value decreases as the number of different alarm notification messages received increases; when the value is zero, the status of the node panel corresponding to the business service tree on the alarm status page turns green.

5. The method according to claim 1, characterized in that the method further comprises:
calling a corresponding fault handling service interface according to the type of each standardized alarm notification message, so as to provide an alarm fault self-healing function.

6. The method according to claim 1, characterized in that the method further comprises:
pushing the alarm notification message, in a defined format and style, to the registered and configured alarm channels; the alarm channels include any one of Enterprise WeChat (WeCom), DingTalk, email, and SMS;
alarm channels of the same type are distinguished by the different group chats or groups created according to the business service tree; wherein a group may correspond to any node on the service tree, so as to provide notification control of alarm notification messages at different granularities.

7. The method according to claim 1, characterized in that the method further comprises:
obtaining the calling relationship of each microservice in real time based on a cloud-native service registry and a service topology tool;
receiving the alarm adjustment messages of each microservice in time order and combining them with the topology graph, so as to quickly locate the source of an incident and the affected services.

8. The method according to claim 1, characterized in that the method further comprises:
adopting a corresponding alarm handling strategy according to the severity level and scope of the alarm, so as to achieve rapid recovery from microservice cluster faults; wherein the alarm handling strategy includes any one or a combination of: rate limiting, circuit breaking, degradation, elastic scaling, and restarting.

9. A visual unified alarm system based on a service tree, characterized in that the system comprises:
a receiving module, configured to receive one or more alarm notification messages from an alarm manager; wherein the multiple alarm channels corresponding to each alarm notification message are uniformly pre-registered with the alarm manager in the form of a single Webhook;
a processing module, configured to standardize the alarm notification messages to obtain valid messages and discard irrelevant data, while parsing the content of the alarm notification messages to obtain the business service tree in the associated third-party service, and adding the business service tree to the standardized alarm notification messages; and to push the standardized alarm notification messages to one or more topics of a publish-subscribe message system for consumption by different message consumers;
a consumer module, configured to start message consumers of a plurality of different functional modules to consume the alarm notification messages in each topic of the distributed publish-subscribe message system, for customized processing or storage according to alarm requirements; and/or to start a message consumer that subscribes to the alarm notification messages in each topic of the distributed publish-subscribe message system, so as to receive all historical alarm notification messages and store them in a search engine for analysis of the alarm notification messages.

10. A computer device, characterized in that the device comprises: a memory and a processor; the memory is used to store computer instructions; the processor executes the computer instructions to implement the method according to any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that it stores computer instructions which, when executed, perform the method according to any one of claims 1 to 8.
CN202110916619.7A | 2021-08-11 | 2021-08-11 | Visual unified alarm method, device, equipment and medium based on service tree | Active | CN113377626B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110916619.7A | 2021-08-11 | 2021-08-11 | Visual unified alarm method, device, equipment and medium based on service tree | CN113377626B (en)

Publications (2)

Publication Number | Publication Date
CN113377626A (en) | 2021-09-10
CN113377626B (en) | 2021-11-23

Family

ID=77576694

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110916619.7A (Active, CN113377626B (en)) | Visual unified alarm method, device, equipment and medium based on service tree | 2021-08-11 | 2021-08-11

Country Status (1)

Country | Link
CN (1) | CN113377626B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114048113A (en) * | 2021-11-23 | 2022-02-15 | 中国工商银行股份有限公司 | Data center monitoring alarm fault self-healing method and device and computer equipment
CN114253807B (en) * | 2021-12-20 | 2023-04-07 | 深圳前海微众银行股份有限公司 | Alarm information notification method and device
CN114500248B (en) * | 2022-04-01 | 2022-08-05 | 北京锐融天下科技股份有限公司 | Monitoring and alarming method and system for service in Internet software system
CN117149897B (en) * | 2023-10-31 | 2024-01-26 | 成都交大光芒科技股份有限公司 | Big data alarm information hierarchical display system and method based on double-buffer technology

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111611137A (en) * | 2020-06-30 | 2020-09-01 | 平安银行股份有限公司 | Alarm monitoring method and device, computer equipment and storage medium
CN113141485A (en) * | 2020-10-22 | 2021-07-20 | 西安天和防务技术股份有限公司 | Alarm system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8111814B2 (en) * | 2006-03-20 | 2012-02-07 | Microsoft Corporation | Extensible alert types
US10853160B2 (en) * | 2018-05-04 | 2020-12-01 | Vmware, Inc. | Methods and systems to manage alerts in a distributed computing system
CN109710487A (en) * | 2018-11-29 | 2019-05-03 | 同盾控股有限公司 | A kind of monitoring method and device
CN112511339B (en) * | 2020-11-09 | 2023-04-07 | 宝付网络科技(上海)有限公司 | Container monitoring alarm method, system, equipment and storage medium based on multiple clusters
CN112559281A (en) * | 2020-12-07 | 2021-03-26 | 恩亿科(北京)数据科技有限公司 | Alarm routing system and method based on configuration

Also Published As

Publication number | Publication date
CN113377626A (en) | 2021-09-10

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
PE01 | Entry into force of the registration of the contract for pledge of patent right
    Denomination of invention: Visual unified alarm method, device, equipment, and medium based on service tree
    Effective date of registration: 20231127
    Granted publication date: 20211123
    Pledgee: China Minsheng Banking Corp Shanghai branch
    Pledgor: SHANGHAI LINKEDCARE INFORMATION TECHNOLOGY Co.,Ltd.
    Registration number: Y2023310000785
PC01 | Cancellation of the registration of the contract for pledge of patent right
    Granted publication date: 20211123
    Pledgee: China Minsheng Banking Corp Shanghai branch
    Pledgor: SHANGHAI LINKEDCARE INFORMATION TECHNOLOGY Co.,Ltd.
    Registration number: Y2023310000785
