CN110289981A - A high-performance computing Internet network monitoring method and system - Google Patents

A high-performance computing Internet network monitoring method and system

Info

Publication number
CN110289981A
Authority
CN
China
Prior art keywords
data
data processing
status register
capture program
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910402272.7A
Other languages
Chinese (zh)
Inventor
彭运勇
卢宇彤
杜云飞
刘羽
颜辉
杨杰
曾凌波
蒋迁谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-05-14
Filing date
2019-05-14
Publication date
2019-09-27
Application filed by Sun Yat Sen University
Priority to CN201910402272.7A
Publication of CN110289981A
Legal status (current): Pending


Abstract


The invention relates to a high-performance computing Internet network monitoring method and system. A communication-chip status register collection program is deployed on monitoring data collection nodes to collect status register information, and the data of each monitored item in the status registers is stored in a database. A middle-layer data processing end processes the data stored in the database and checks whether it contains early-warning or alarm information. If it does, the processing end pushes the warnings and alarms to a terminal. The processing end also pushes the processed data to a display front end, which visualizes it. The invention addresses two problems in high-performance computing interconnection networks: monitoring data collection is not sufficiently real-time, and fault location is inefficient. It pushes early-warning and alarm information actively and provides a unified, intuitive and efficient monitoring interface.

Description

A high-performance computing Internet network monitoring method and system

Technical Field

The present invention relates to the field of the Internet, and more particularly to a high-performance computing Internet network monitoring method and system.

Background Art

As high-performance computing clusters keep growing, their interconnection networks are becoming ever larger and more complex. Monitoring such a network means learning, efficiently and in real time, several kinds of information:

- the online status of the communication boards in the network;
- the configuration of the interconnected communication boards;
- link connection status;
- link bandwidth performance;
- link communication quality, and so on.

Automated network monitoring software is indispensable for a full picture of the running state of the whole cluster network. Current techniques have two shortcomings. Monitoring data collection in high-performance computing interconnection networks is not sufficiently real-time, and fault location is inefficient. As a result, monitoring lags behind and problems in the network are not reflected in time.

Summary of the Invention

The prior art has two shortcomings: monitoring data collection in high-performance computing interconnection networks is not sufficiently real-time, and fault location is inefficient. To overcome them, the present invention provides a high-performance computing Internet network monitoring method and system.

To solve the above technical problems, the technical solution of the present invention is as follows:

A high-performance computing Internet network monitoring method comprises the following steps:

Step S101: deploy a communication-chip status register collection program on the monitoring data collection node;

Step S102: start the collection program and check whether it started successfully. If it did, go to step S103; otherwise return to step S101;

Step S103: the collection program periodically collects the information in the status registers of the monitored communication chips;

Step S104: convert the status register information collected by the collection program into a standard format and store it in a database;

Step S105: the middle-layer data processing end checks whether the data in the database contains early-warning or alarm information. If it does, go to step S106; otherwise go directly to step S107;

Step S106: the middle-layer data processing end pushes the early-warning and alarm information to a terminal;

Step S107: the display front end obtains the status register data from the middle-layer data processing end and visualizes it.
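Steps S101 to S103 can be sketched as follows. This is a minimal illustration, not the patented implementation.

- `deploy`, `start` and `read_registers` are hypothetical stand-ins. The text specifies neither the deployment mechanism nor how chip registers are read.
- The `max_attempts` bound is an added assumption. The described flow simply returns to S101 after every failure.
- The `rounds` parameter exists only so the loop terminates. A real collector would presumably run indefinitely.

```python
import time
from typing import Callable, Dict, List


class CollectorStartError(RuntimeError):
    """Raised when the collection program still fails after max_attempts."""


def deploy_and_start(deploy: Callable[[], None],
                     start: Callable[[], bool],
                     max_attempts: int = 3) -> int:
    """Steps S101-S102: deploy, try to start, and redeploy on failure.

    Returns the attempt number on which startup succeeded.
    max_attempts is an assumption; the patent text sets no limit.
    """
    for attempt in range(1, max_attempts + 1):
        deploy()            # S101: (re)deploy the collection program
        if start():         # S102: did it start successfully?
            return attempt  # yes: proceed to S103
    raise CollectorStartError(
        f"collection program failed to start after {max_attempts} attempts")


def collect_periodically(read_registers: Callable[[], Dict[str, int]],
                         interval_s: float,
                         rounds: int) -> List[Dict[str, int]]:
    """Step S103: sample the status registers once every interval_s seconds.

    read_registers is a hypothetical hook for the real chip register access.
    """
    samples: List[Dict[str, int]] = []
    for i in range(rounds):
        samples.append(read_registers())
        if i < rounds - 1:      # no need to wait after the last sample
            time.sleep(interval_s)
    return samples
```

In practice `read_registers` would wrap whatever driver interface exposes the chip's status registers. For the sketch, any zero-argument function that returns a dict of item names to values will do.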

Preferably, the middle-layer data processing end connects to the database through a timer function. It periodically processes the collected data, filters out the early-warning and alarm information, and then pushes that information out.
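A timer-driven scan of this kind might look like the sketch below. The patent does not say how warnings and alarms are detected, so the sketch makes these assumptions:

- Each formatted row carries `item` and `value` fields (hypothetical names).
- Early warnings and alarms are defined by two per-item thresholds (also hypothetical).
- A loop on `threading.Event.wait` stands in for the timer function.

In a fuller version, `job` would query the database and push the result of `scan`.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    warning: float  # at or above this value: early warning (assumed rule)
    alarm: float    # at or above this value: alarm (assumed rule)


def classify(value: float, t: Threshold) -> Optional[str]:
    """Return "alarm", "warning" or None for one monitored value."""
    if value >= t.alarm:
        return "alarm"
    if value >= t.warning:
        return "warning"
    return None


def scan(rows: List[Dict], thresholds: Dict[str, Threshold]) -> List[Dict]:
    """Filter early-warning and alarm events out of formatted rows."""
    events = []
    for row in rows:
        t = thresholds.get(row["item"])
        if t is None:
            continue  # no threshold configured for this item
        level = classify(row["value"], t)
        if level is not None:
            events.append({**row, "level": level})
    return events


def run_every(interval_s: float, job: Callable[[], None],
              stop: threading.Event) -> threading.Thread:
    """Call job() every interval_s seconds until stop is set."""
    def loop() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            job()
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

For example, with a hypothetical `crc_errors` item and `Threshold(warning=10, alarm=100)`, a value of 12 is reported as a warning and 150 as an alarm.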

Preferably, the middle-layer data processing end exposes the collected data as JSON through a RESTful API for the display front end to call.
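The RESTful JSON interface can be sketched with Python's standard library alone. The following are assumptions for illustration:

- the `/api/status` path;
- the row fields;
- the use of `http.server`.

A production middle layer would more likely use a web framework, and `get_latest` would read from the database. `fetch_status` shows what a browser-side front end would do, written in Python for brevity.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict, List


def make_handler(get_latest: Callable[[], List[Dict]]):
    """Build a handler serving GET /api/status (hypothetical path) as JSON."""

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/api/status":
                self.send_error(404, "unknown resource")
                return
            body = json.dumps(get_latest()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the sketch quiet

    return StatusHandler


def serve(get_latest: Callable[[], List[Dict]], port: int = 0) -> ThreadingHTTPServer:
    """Start the API in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(get_latest))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(url: str) -> List[Dict]:
    """Front-end side: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Call `server.shutdown()` and then `server.server_close()` to stop the background server cleanly.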

Preferably, the monitoring data collection nodes and the database must be on the same local area network.
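The same-LAN requirement could be checked before deployment. The sketch below approximates "same local area network" as "same IP subnet", which is an assumption: a LAN may span several subnets. The /24 default prefix length is an arbitrary choice, and the function names are hypothetical.

```python
import ipaddress
from typing import List


def same_lan(node_ip: str, db_ip: str, prefix_len: int = 24) -> bool:
    """Approximate "same LAN" as "db_ip lies in node_ip's subnet"."""
    node_net = ipaddress.ip_interface(f"{node_ip}/{prefix_len}").network
    # Containment is False when one address is IPv4 and the other IPv6.
    return ipaddress.ip_address(db_ip) in node_net


def misplaced_nodes(nodes: List[str], db_ip: str, prefix_len: int = 24) -> List[str]:
    """Return the collection nodes that violate the same-LAN requirement."""
    return [n for n in nodes if not same_lan(n, db_ip, prefix_len)]
```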

Preferably, the monitoring data collection nodes are deployed in a distributed manner.
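The text does not say how work is split among distributed collection nodes. One simple, hypothetical scheme is to assign monitored chips to nodes round-robin, as sketched here. Chip and node names are placeholders.

```python
from typing import Dict, List


def assign_chips(chips: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Spread monitored chips over collection nodes round-robin.

    Chips are sorted first so the same inputs always give the same plan.
    """
    if not nodes:
        raise ValueError("at least one collection node is required")
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, chip in enumerate(sorted(chips)):
        plan[nodes[i % len(nodes)]].append(chip)
    return plan
```

Round-robin keeps node loads within one chip of each other. Its drawback is that adding or removing a node reassigns most chips. Consistent hashing is the usual alternative when that churn matters.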

Preferably, the middle-layer data processing end and the display front end use a browser/server (B/S) architecture.

A high-performance computing Internet network monitoring system, based on the method described above, comprises a bottom-layer data collection end, a database, a middle-layer data processing end and a display front end;

The bottom-layer data collection end comprises a status register collection program, which is deployed on the status registers of the communication chips. After the collection program is started, the system checks whether it started successfully and restarts it if startup failed. Once started, the collection program periodically collects the information in the status registers of the monitored communication chips. This information is converted into a standard format and stored in the database. The middle-layer data processing end then checks whether the data in the database contains early-warning or alarm information. If it does, the middle-layer data processing end pushes the early-warning and alarm information to a terminal. If it does not, the display front end obtains the status register data from the middle-layer data processing end and visualizes it.
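The cooperation between the database, the middle-layer data processing end and its two consumers can be sketched as below. The `status` table schema, the per-item alarm limits and the `push` callback are hypothetical, and SQLite is used only to keep the example self-contained.

One reading choice is worth noting. The text routes only the no-alarm case directly to visualization. This sketch always returns the rows for display after pushing any alarms.

```python
import sqlite3
from typing import Callable, Dict, List

# Hypothetical schema; the patent does not define one.
SCHEMA = """CREATE TABLE IF NOT EXISTS status (
    ts REAL, node TEXT, chip TEXT, item TEXT, value INTEGER)"""


class MiddleLayer:
    """Middle-layer data processing end between the database, the
    alarm terminal and the display front end."""

    def __init__(self, conn: sqlite3.Connection, limits: Dict[str, int],
                 push: Callable[[Dict], None]) -> None:
        self.conn = conn
        self.limits = limits  # hypothetical per-item alarm limits
        self.push = push      # hypothetical notifier that reaches the terminal
        conn.execute(SCHEMA)

    def latest(self) -> List[Dict]:
        """Read the stored rows in a stable order."""
        cur = self.conn.execute(
            "SELECT chip, item, value FROM status ORDER BY chip, item")
        return [{"chip": c, "item": i, "value": v} for c, i, v in cur]

    def process(self) -> List[Dict]:
        """Detect alarms, push them, then hand all rows to the front end."""
        rows = self.latest()
        for row in rows:
            limit = self.limits.get(row["item"])
            if limit is not None and row["value"] >= limit:
                self.push({**row, "level": "alarm"})
        return rows
```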

Compared with the prior art, the technical solution of the present invention has the following beneficial effects:

- It makes monitoring data collection in high-performance computing interconnection networks more real-time and fault location more efficient.
- It actively pushes early-warning and alarm information and provides a unified, intuitive and efficient monitoring interface.
- Because the chip status information is standardized and formatted, system administrators can locate fault points more directly and efficiently.
- Because early-warning and alarm information is pushed automatically from the collected data, human oversight is avoided, making the system stable and reliable.

Description of Drawings

Figure 1 is a flowchart of the method of the present invention.

Figure 2 is a structural diagram of the system of the present invention.

Detailed Description of Embodiments

The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent;

To better illustrate the embodiments, some components in the drawings are omitted, enlarged or reduced and do not represent the dimensions of the actual product;

Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.

The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.

Embodiment 1

As shown in Figure 1, a high-performance computing Internet network monitoring method comprises the following steps:

Step S101: deploy a communication-chip status register collection program on the monitoring data collection node;

Step S102: start the collection program and check whether it started successfully. If it did, go to step S103; otherwise return to step S101;

Step S103: the collection program periodically collects the information in the status registers of the monitored communication chips;

Step S104: convert the status register information collected by the collection program into a standard format and store it in a database;

Step S105: the middle-layer data processing end checks whether the data in the database contains early-warning or alarm information. If it does, go to step S106; otherwise go directly to step S107;

Step S106: the middle-layer data processing end pushes the early-warning and alarm information to a terminal;

Step S107: the display front end obtains the status register data from the middle-layer data processing end and visualizes it.
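Step S104 leaves "standard formatting" undefined. One plausible reading is to decode each raw register word into named monitored items. The result is one (timestamp, node, chip, item, value) row per item. The following are all hypothetical:

- the register layout in `FIELDS`;
- the `status` table;
- the use of SQLite.

They were chosen only to keep the example self-contained and do not describe any real chip.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical register layout, for illustration only: (bit offset, width).
FIELDS: Dict[str, Tuple[int, int]] = {
    "link_up": (0, 1),      # bit 0: link up (1) / down (0)
    "crc_errors": (8, 8),   # bits 8-15: CRC error counter
    "link_width": (16, 8),  # bits 16-23: link width in lanes
}


def decode(raw: int) -> Dict[str, int]:
    """Split one raw status-register word into named monitored items."""
    return {name: (raw >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}


def to_rows(node: str, chip: str, raw: int, ts: float) -> List[Tuple]:
    """Standard format: one (ts, node, chip, item, value) row per item."""
    return [(ts, node, chip, item, value) for item, value in decode(raw).items()]


def store(conn: sqlite3.Connection, rows: List[Tuple]) -> None:
    """Create the (hypothetical) status table if needed and insert rows."""
    conn.execute("""CREATE TABLE IF NOT EXISTS status (
                        ts REAL, node TEXT, chip TEXT, item TEXT, value INTEGER)""")
    conn.executemany("INSERT INTO status VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
```

Storing one row per item, rather than one opaque word per sample, is what lets a later stage filter on individual items such as `crc_errors`.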

As a preferred embodiment, the middle-layer data processing end connects to the database through a timer function. It periodically processes the collected data, filters out the early-warning and alarm information, and then pushes that information out.

As a preferred embodiment, the middle-layer data processing end exposes the collected data as JSON through a RESTful API for the display front end to call.

As a preferred embodiment, the monitoring data collection nodes and the database must be on the same local area network.

As a preferred embodiment, the monitoring data collection nodes are deployed in a distributed manner.

As a preferred embodiment, the middle-layer data processing end and the display front end use a browser/server (B/S) architecture.

Embodiment 2

As shown in Figure 2, a high-performance computing Internet network monitoring system, based on the method described above, comprises a bottom-layer data collection end 1, a database 2, a middle-layer data processing end 3 and a display front end 4;

The bottom-layer data collection end 1 comprises a status register collection program, which is deployed on the status registers of the communication chips. After the collection program is started, the system checks whether it started successfully and restarts it if startup failed. Once started, the collection program periodically collects the information in the status registers of the monitored communication chips. This information is converted into a standard format and stored in the database 2. The middle-layer data processing end 3 then checks whether the data in the database 2 contains early-warning or alarm information. If it does, the middle-layer data processing end 3 pushes the early-warning and alarm information to a terminal. If it does not, the display front end 4 obtains the status register data from the middle-layer data processing end 3 and visualizes it.

Terms describing positional relationships in the drawings are for illustrative purposes only and shall not be construed as limiting this patent;

Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (7)

The bottom-layer data collection end (1) comprises a status register collection program, which is deployed on the status registers of the communication chips. After the collection program is started, the system checks whether it started successfully and restarts it if startup failed. Once started, the collection program periodically collects the information in the status registers of the monitored communication chips. This information is converted into a standard format and stored in the database (2). The middle-layer data processing end (3) then checks whether the data in the database (2) contains early-warning or alarm information. If it does, the middle-layer data processing end (3) pushes the early-warning and alarm information to a terminal. If it does not, the display front end (4) obtains the status register data from the middle-layer data processing end (3) and visualizes it.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910402272.7A (CN110289981A (en)) | 2019-05-14 | 2019-05-14 | A high-performance computing Internet network monitoring method and system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910402272.7A (CN110289981A (en)) | 2019-05-14 | 2019-05-14 | A high-performance computing Internet network monitoring method and system

Publications (1)

Publication Number | Publication Date
CN110289981A (en) | 2019-09-27

Family

ID=68001917

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910402272.7A (Pending, CN110289981A (en)) | A high-performance computing Internet network monitoring method and system | 2019-05-14 | 2019-05-14

Country Status (1)

Country | Link
CN (1) | CN110289981A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113553228A (en) * | 2021-06-21 | 2021-10-26 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | Lightweight computer state monitoring system and method

Citations (7)

Publication number | Priority date | Publication date | Assignee | Title
US5761200A (en) * | 1993-10-27 | 1998-06-02 | Industrial Technology Research Institute | Intelligent distributed data transfer system
CN102647736A (en) * | 2012-04-19 | 2012-08-22 | 华为技术有限公司 | A device status information acquisition system and communication method
CN103500133A (en) * | 2013-09-17 | 2014-01-08 | 华为技术有限公司 | Fault locating method and device
CN108563550A (en) * | 2018-04-23 | 2018-09-21 | 上海达梦数据库有限公司 | A monitoring method, device, server and storage medium for a distributed system
CN108710347A (en) * | 2018-04-16 | 2018-10-26 | 佛山市顺德区中山大学研究院 | A monitoring cloud platform
WO2019058615A1 (en) * | 2017-09-21 | 2019-03-28 | 株式会社東芝 | Industrial plant monitoring device and distributed control system
CN109586999A (en) * | 2018-11-12 | 2019-04-05 | 深圳先进技术研究院 | A container cloud platform condition monitoring and early-warning system, method and electronic equipment


Non-Patent Citations (1)

Title
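赵哲 (Zhao Zhe) et al.: "Network monitoring system based on Zabbix" (基于Zabbix的网络监控系统), Computer Technology and Development (《计算机技术与发展》) *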


Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190927

