CN103812699A - Monitoring management system based on cloud computing - Google Patents

Monitoring management system based on cloud computing

Info

Publication number
CN103812699A
Authority
CN
China
Prior art keywords
fault
monitoring
management system
engine
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410052286.8A
Other languages
Chinese (zh)
Inventor
许广彬
郭晓
张银滨
李德才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Huayun Data Technology Service Co Ltd
Original Assignee
Wuxi Huayun Data Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-02-17
Filing date
2014-02-17
Publication date
2014-05-21
Application filed by Wuxi Huayun Data Technology Service Co Ltd
Priority to CN201410052286.8A
Publication of CN103812699A
Legal status: Pending


Abstract

The invention provides a monitoring management system based on cloud computing. The monitoring management system comprises a data acquisition unit, a fault feature library and a fault processing unit. The data acquisition unit comprises a monitoring client for acquiring node data in a large-scale cluster server in real time, and three monitoring databases for storing the node data. The fault feature library is used for defining and storing fault feature items. The monitoring client checks the node data acquired in real time against the fault feature items in the fault feature library to judge whether a fault has occurred, and transmits a fault instruction to the fault processing unit if so. The fault processing unit responds to the fault instruction transmitted by the monitoring client, generates a fault processing policy, and transmits the policy to the large-scale cluster server. With this monitoring management system, data acquisition, early warning and fault resolution can be performed automatically for faults of each node in a cloud-computing-based large-scale cluster server, thereby improving the stability and availability of the large-scale cluster server.

Description

Monitoring management system based on cloud computing
Technical Field
The invention relates to the technical field of cloud computing, and in particular to a monitoring management system based on cloud computing, which is used for fault data acquisition, fault monitoring, fault early warning and fault recovery for data nodes of large-scale cluster servers in cloud computing.
Background
In a cloud computing system, it is necessary to monitor the operating state of a data node and perform a failure recovery operation when a failure occurs.
In the prior art, a monitoring client is installed in a cloud server, and the running state of a data node is dynamically acquired and reported by turning the monitoring client on or off and by means of multiple concurrent information acquisition and reporting, message mining and automatic processing technologies. When a cloud server fault is discovered, a new node is dynamically created on a healthy physical server. However, this technology is not suitable for a cloud computing system with a large-scale cluster, because its monitoring of each cloud node is relatively limited in breadth and depth.
To meet the requirements of a cloud computing system with a large-scale cluster, the cloud computing service platforms offered by the main cloud service providers at home and abroad currently adopt, for the most part, an open-source architecture. One example is Chinese patent publication CN103024060A, entitled "an open cloud computing large-scale cluster monitoring system and method". That solution mainly adopts a plug-in design: it monitors a virtual machine cluster formed by a plurality of VMs (virtual machines) and a physical machine cluster formed by a plurality of PMs (physical machines), samples node data through an open API (application program interface), and collects the relevant operating parameters of the VMs or PMs as well as parameters of cluster platforms such as Hadoop. However, this technical solution can usually only monitor and raise alarms; it cannot provide automatic fault handling for each node (including VMs and PMs) in cloud computing.
In view of the above, the prior-art techniques for node monitoring and automatic recovery in cloud-computing-based large-scale cluster servers need to be improved to overcome the above technical defects.
Disclosure of Invention
The invention aims to disclose a monitoring management system based on cloud computing, which monitors nodes in cloud computing and automatically recovers them from faults, so that both potential faults and faults that have already occurred are monitored and automatically recovered, thereby ensuring the stability and availability of a cloud-computing-based large-scale cluster server.
In order to achieve the above object, the present invention provides a monitoring management system based on cloud computing, which is used for monitoring and managing the operating state of a large-scale cluster server in cloud computing, and includes:
a data acquisition unit, comprising a monitoring client for collecting node data in the large-scale cluster server in real time, and three monitoring databases for storing the node data;
the monitoring management system further comprises:
a fault feature library and a fault processing unit; wherein,
the fault feature library is used for defining and storing fault feature items; the monitoring client checks the node data acquired in real time against the fault feature items in the fault feature library to judge whether the node data indicates a fault and, if so, sends a fault instruction to the fault processing unit;
and the fault processing unit is used for responding to the fault instruction sent by the monitoring client, generating a fault processing strategy and sending the fault processing strategy to the large-scale cluster server.
As a further improvement of the present invention, the fault processing unit includes a fault monitoring engine, a fault early warning engine, and a fault recovery engine, and the fault monitoring engine receives a fault instruction sent by the monitoring client after verification, sends the fault instruction to the fault early warning engine and the fault recovery engine, generates a fault processing policy by the fault recovery engine, and feeds the fault processing policy back to the fault monitoring engine.
As a further improvement of the invention, the large-scale cluster server comprises a plurality of physical machines, and is virtualized into a plurality of virtual machines with distributed data structures through the physical machines.
As a further improvement of the present invention, the data acquisition unit further includes an administrator interface module, which is used to receive the fault feature item defined by initialization, and output the fault feature item to the fault feature library for storage.
As a further improvement of the present invention, the present invention further includes a Web client remotely connected to the fault processing unit and embedded in the visualization device, so as to create and display the operation status of each data node in the large-scale cluster server in real time, and a user can manually configure user configuration information through the Web client.
As a further improvement of the present invention, the user configuration information includes: fault monitoring strategy, fault early warning strategy, fault recovery strategy and user-defined fault characteristic item.
As a further improvement of the invention, the visualization device comprises a mobile phone and a personal computer.
As a further improvement of the invention, the fault feature library comprises a MySQL database.
Compared with the prior art, the invention has the beneficial effects that: according to the invention, the data acquisition, early warning and fault resolution can be automatically carried out on the fault of each node in the large-scale cluster server based on cloud computing, so that the stability and the availability of the large-scale cluster server are improved.
Drawings
Fig. 1 is a schematic structural diagram of a monitoring management system based on cloud computing according to a first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of another embodiment of the monitoring management system based on cloud computing according to the present invention.
The reference numerals used in the detailed description are as follows:
monitoring management system 100; data acquisition unit 10; monitoring client 102; large-scale cluster server 11; virtual machine 11a; physical machine 11b; monitoring databases 111, 112, 113; fault feature library 103; fault processing unit 40; fault monitoring engine 401; fault early warning engine 402; fault recovery engine 403; administrator interface module 104; Web client 50; visualization device 501.
Detailed Description
The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.
Through the monitoring management system 100 of the specific embodiments shown here, the present invention aims to implement unified state monitoring, fault detection, early warning, alarming and automatic recovery for the physical resources (including physical computing devices, physical storage devices, physical network devices and physical security devices) and the virtual resources (including virtual computing devices, virtual storage devices, virtual network devices and virtual security devices) of a large-scale or super-large-scale cluster server in cloud computing, so as to ensure that the physical and virtual resources of each data node in cloud computing are in a healthy and highly available state.
Embodiment one:
Referring to Fig. 1, the first embodiment discloses a monitoring management system 100 based on cloud computing, configured to monitor and manage the operating state of a large-scale cluster server 11 in cloud computing.
In the present embodiment, the monitoring management system 100 includes:
a data acquisition unit 10 comprising a monitoring client 102 for collecting node data in the large-scale cluster server 11 in real time, and three monitoring databases 111, 112, 113 for storing the node data. Specifically, the monitoring client 102 in the data acquisition unit 10 acquires data such as CPU, disk I/O, port, process and DNS data from different network nodes, or from nodes of different network types, in real time through a Virtual Monitoring Network (VMN), horizontally splits the data into three parts, and sends a storage access request. The three parts are then stored in the monitoring databases 111, 112 and 113 respectively by calling the MySQL data interfaces of those databases, so that the monitoring client 102 can access them in real time.
In the present embodiment, the monitoring databases 111, 112 and 113 are each a MySQL database.
The horizontal splitting of the node data can be completed on the MySQL database side, which avoids the bottleneck that a large or very large data volume and heavily loaded tables would otherwise create, and keeps transaction processing relatively simple. The horizontally split data stored in the monitoring databases 111, 112, 113 helps the monitoring client 102 detect abnormal changes in the data of each node in the large-scale cluster server 11 in time and, by drawing on a large amount of historical data, give early warning of small abnormal changes in each data node.
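The horizontal split described above can be sketched as follows. This is a minimal illustration only: the text does not say which key is used for the split, so routing by a CRC32 hash of a node GUID, and the names `shard_index` and `split_records`, are assumptions made for the example.

```python
import zlib

NUM_SHARDS = 3  # stands in for the three monitoring databases 111, 112, 113


def shard_index(node_guid: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a node GUID to a shard index with a stable hash (assumed split key)."""
    return zlib.crc32(node_guid.encode("utf-8")) % num_shards


def split_records(records, num_shards: int = NUM_SHARDS):
    """Group records (dicts with a 'guid' key) into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        shards[shard_index(record["guid"], num_shards)].append(record)
    return shards


samples = [
    {"guid": "node-a", "cpu": 0.42},
    {"guid": "node-b", "cpu": 0.91},
    {"guid": "node-c", "cpu": 0.13},
    {"guid": "node-a", "cpu": 0.45},
]
shards = split_records(samples)
```

Because the hash is deterministic, all samples for one node land in the same database, which keeps per-node history queries on a single shard.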
In this embodiment, the monitoring management system 100 further includes a fault feature library 103 and a fault processing unit 40.
Specifically, the fault feature library 103 is configured to define and store fault feature items. The monitoring client 102 checks the node data acquired in real time against the fault feature items in the fault feature library 103 to determine whether the node data indicates a fault and, if so, sends a fault instruction to the fault processing unit 40. The fault feature library 103 may be a MySQL database or an Oracle database, and is preferably a MySQL database.
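As a rough sketch of how node data might be checked against fault feature items, the example below models each feature item as a metric name with an upper limit. The text does not define the structure of a fault feature item, so the `FaultFeature` fields, the metric names and the limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultFeature:
    name: str     # hypothetical label of the fault feature item
    metric: str   # key of the monitored value in a node sample
    limit: float  # the sample matches when its value exceeds this limit


def match_faults(sample: dict, features: list) -> list:
    """Return the names of all feature items whose limit the sample exceeds."""
    return [f.name for f in features if sample.get(f.metric, 0.0) > f.limit]


FEATURES = [
    FaultFeature("cpu_saturated", "cpu", 0.95),
    FaultFeature("disk_io_stalled", "disk_io_wait", 0.50),
]

hits = match_faults({"cpu": 0.99, "disk_io_wait": 0.10}, FEATURES)
```

A non-empty result would correspond to the case in which the monitoring client sends a fault instruction to the fault processing unit.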
In this embodiment, the fault processing unit 40 is configured to respond to the fault instruction sent by the monitoring client 102, generate a fault processing policy, and send the fault processing policy to the large-scale cluster server 11.
Specifically, the fault processing unit 40 includes a fault monitoring engine 401, a fault early warning engine 402, and a fault recovery engine 403. The fault monitoring engine 401 receives the fault instruction sent by the monitoring client 102 after verification and forwards it to the fault early warning engine 402 and the fault recovery engine 403; the fault recovery engine 403 generates a fault processing policy and feeds it back to the fault monitoring engine 401.
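The interaction between the three engines can be illustrated with a small sketch. The class names mirror the engines named in the text, but the method names, the shape of a fault instruction (a dict with `guid` and `type`) and the action strings are assumptions made for the example.

```python
class FaultWarningEngine:
    """Stand-in for the fault early warning engine 402."""

    def __init__(self):
        self.alerts = []

    def warn(self, instruction: dict) -> None:
        self.alerts.append(f"WARNING: {instruction['type']} on {instruction['guid']}")


class FaultRecoveryEngine:
    """Stand-in for the fault recovery engine 403."""

    # Hypothetical mapping from fault type to recovery action.
    ACTIONS = {"network": "switch_to_standby_network", "disk": "copy_from_replica"}

    def make_policy(self, instruction: dict) -> dict:
        action = self.ACTIONS.get(instruction["type"], "escalate_to_operator")
        return {"guid": instruction["guid"], "action": action}


class FaultMonitoringEngine:
    """Stand-in for the fault monitoring engine 401."""

    def __init__(self, warning: FaultWarningEngine, recovery: FaultRecoveryEngine):
        self.warning = warning
        self.recovery = recovery
        self.policies = []  # policies fed back by the recovery engine

    def handle(self, instruction: dict) -> dict:
        # Forward the instruction to both engines, then record the returned policy.
        self.warning.warn(instruction)
        policy = self.recovery.make_policy(instruction)
        self.policies.append(policy)
        return policy


warning_engine = FaultWarningEngine()
monitor = FaultMonitoringEngine(warning_engine, FaultRecoveryEngine())
policy = monitor.handle({"guid": "node-7", "type": "disk"})
```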
The large-scale cluster server 11 includes a plurality of physical machines 11b (PMs), which are virtualized into a plurality of virtual machines 11a (VMs) having a distributed data structure.
After the administrator has defined common faults and initialized the fault feature library 103, once the monitoring client 102 in the data acquisition unit 10 finds abnormal monitoring data, it may upload the fault information through the Virtual Monitoring Network (VMN) to the fault feature library 103 in the data acquisition unit 10 for verification.
First, mass data is read from the monitoring databases 111, 112, 113 and analyzed by data mining to obtain data such as I/O, response time, communication rate and online rate for the abnormal monitoring item; this data is compared with normal data, and a deviation amount is calculated from the monitoring data. If the deviation amount exceeds a threshold value entered through the fault recovery engine 403, the item is counted as a fault point;
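The deviation check in this step can be sketched as below. The text does not give a formula for the deviation amount, so the relative deviation from the historical mean used here is an assumption, as are the function names and the sample threshold.

```python
from statistics import mean


def relative_deviation(current: float, history: list) -> float:
    """Deviation of the current value from the historical mean, as a fraction."""
    baseline = mean(history)
    if baseline == 0:
        raise ValueError("historical baseline must be non-zero")
    return abs(current - baseline) / abs(baseline)


def is_fault_point(current: float, history: list, threshold: float) -> bool:
    """Count the item as a fault point when the deviation exceeds the threshold."""
    return relative_deviation(current, history) > threshold


# Response times in milliseconds; 0.3 is an illustrative threshold.
history_ms = [100.0, 100.0, 100.0]
flagged = is_fault_point(150.0, history_ms, threshold=0.3)
```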
Second, the Globally Unique Identifier (GUID) corresponding to the fault point is queried in the monitoring databases 111, 112, 113, and all possible fault types are looked up in the fault feature library 103 by that GUID.
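The GUID lookup can be sketched with an in-memory database. The embodiment uses MySQL; Python's built-in `sqlite3` module is swapped in here only so the example is self-contained, and the table name, column names and fault type labels are hypothetical.

```python
import sqlite3

# In-memory stand-in for the fault feature library 103.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fault_feature (guid TEXT, fault_type TEXT)")
conn.executemany(
    "INSERT INTO fault_feature (guid, fault_type) VALUES (?, ?)",
    [
        ("guid-001", "rack_network_fault"),
        ("guid-001", "node_network_fault"),
        ("guid-001", "cloud_server_fault"),
        ("guid-002", "disk_fault"),
    ],
)


def candidate_fault_types(connection: sqlite3.Connection, guid: str) -> list:
    """Return every fault type registered for the given GUID, sorted by name."""
    rows = connection.execute(
        "SELECT fault_type FROM fault_feature WHERE guid = ? ORDER BY fault_type",
        (guid,),
    )
    return [row[0] for row in rows]


candidates = candidate_fault_types(conn, "guid-001")
```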
Next, the fault recovery engine 403 checks the fault types found in the previous step one by one in an exploratory mode. Taking a cloud server network interruption as an example, the possible fault types include a network fault of the rack where the cloud server is located, a network fault of the computing node where the cloud server is located, a fault of the cloud server itself, and the like.
The handling of a "network fault" and a "disk fault" occurring in a node of the large-scale cluster server 11 is described in detail below.
When a "network fault" occurs, the fault recovery engine 403 first pings the rack gateway. If the gateway cannot be reached, a rack network fault is determined and a recovery measure is started immediately: switching to the standby network and enabling it. If the rack network is reachable, the engine then pings the computing node where the cloud server is located.
Similarly, if the network of the computing node where the cloud server is located cannot be reached, a network fault of that computing node is determined and a recovery measure is started immediately: copies of all cloud servers on that computing node are copied to, and started on, computing nodes that are operating normally.
If the network of the computing node is reachable, the fault is determined to be a network fault of the cloud server itself, and a copy of the cloud server is immediately copied to and started on another computing node that is operating normally. In this way the fault type is determined by successive checks and the faulty link is recovered automatically. After the fault recovery task is completed, the monitoring client 102 verifies the fault processing; if the fault has been handled, the automatic fault recovery process ends.
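The exploratory network check described above amounts to a short decision sequence, sketched below. The reachability probe is passed in as a function so the example does not depend on a real ping; the fault type labels and action names are hypothetical.

```python
from typing import Callable, Tuple


def diagnose_network_fault(
    reachable: Callable[[str], bool],
    rack_gateway: str,
    compute_node: str,
) -> Tuple[str, str]:
    """Probe the rack gateway, then the compute node; return (fault type, action)."""
    if not reachable(rack_gateway):
        return ("rack_network_fault", "switch_to_standby_network")
    if not reachable(compute_node):
        return ("compute_node_network_fault", "start_copies_of_all_servers_elsewhere")
    # Rack and compute node both answer, so the cloud server itself is at fault.
    return ("cloud_server_network_fault", "start_copy_of_server_elsewhere")


# Simulated probe: only the rack gateway answers.
up_hosts = {"rack-gw-1"}
result = diagnose_network_fault(lambda host: host in up_hosts, "rack-gw-1", "node-7")
```

Checking the rack first and the compute node second narrows the fault from the widest scope to the narrowest, so each probe rules out one layer.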
When a "disk fault" occurs and the exploratory troubleshooting identifies a disk fault, the GUID of the failed disk is immediately read from the monitoring databases 111, 112, 113, and the two copies of the data stored on the failed disk are located on other storage servers. A normal storage server node with a low load rate is then selected in the large-scale cluster server 11 according to load rate, and the data is copied from the two copies to that node.
After copying is completed, the VM table in the fault feature library 103 is searched to find the VM associated with the GUID of the failed disk; the mapping table between VMs and disks is modified, the mapping between the failed disk and the VM is deleted, and the GUID of the new storage server is written. If other types of faults are encountered, the above process is repeated.
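The two bookkeeping steps of disk recovery, choosing a lightly loaded target and rewriting the VM-to-disk mapping, can be sketched as follows. The data shapes (a dict of load rates and a dict from VM name to disk GUID) and the function names are assumptions; the actual copy of data from the replicas is outside the sketch.

```python
def pick_target_node(load_by_node: dict, exclude: set) -> str:
    """Pick the storage node with the lowest load rate, skipping excluded nodes."""
    candidates = {n: load for n, load in load_by_node.items() if n not in exclude}
    if not candidates:
        raise ValueError("no healthy storage node available")
    return min(candidates, key=candidates.get)


def remap_vm_disks(vm_to_disk: dict, failed_guid: str, new_guid: str) -> dict:
    """Replace the failed disk GUID with the new one in the VM-to-disk mapping."""
    return {
        vm: (new_guid if disk == failed_guid else disk)
        for vm, disk in vm_to_disk.items()
    }


loads = {"storage-1": 0.80, "storage-2": 0.20, "storage-3": 0.50}
# storage-2 is excluded here as the node that hosts the failed disk.
target = pick_target_node(loads, exclude={"storage-2"})
mapping = remap_vm_disks({"vm-a": "disk-x", "vm-b": "disk-y"}, "disk-x", "disk-new")
```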
In this embodiment, the monitoring management system 100 further includes a Web client 50 that is remotely connected to the fault processing unit 40 and embedded in a visualization device 501, so as to create and display the operating status of each data node in the large-scale cluster server 11 in real time; a user can manually configure user configuration information through the Web client 50.
Specifically, the user configuration information includes a fault monitoring strategy, a fault early warning strategy, a fault recovery strategy and user-defined fault feature items. In a preferred embodiment, the visualization device 501 is a mobile phone, more preferably a smartphone, which displays information such as the operating status of each data node in the large-scale cluster server 11, as presented by the fault processing unit 40, to the user in real time over a 2G/3G/4G wireless network.
Different fault recovery modes are configured on the Web page provided by the fault recovery engine 403, and the fault recovery engine 403 recovers from various faults according to the preset rules.
Through the Web page provided by the fault recovery engine 403, a user who has deployed applications on cloud services such as cloud servers, load balancing and relational data storage can log in to the fault processing unit 40 and, according to the application's requirements for service continuity and data consistency, personalize the fault monitoring frequency, monitoring granularity, monitoring items and fault handling mode; the configuration information can be stored in the fault recovery engine 403 through an API.
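A user configuration of this kind might be represented and serialized as in the sketch below. The text names the four configurable aspects but not their fields or value ranges, so every field name, default value and allowed option here is an assumption, and the JSON payload format for the API is likewise hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MonitoringConfig:
    frequency_seconds: int = 60        # fault monitoring frequency
    granularity: str = "node"          # monitoring granularity
    items: list = field(default_factory=lambda: ["cpu", "disk_io"])  # monitoring items
    handling_mode: str = "automatic"   # fault handling mode

    def validate(self) -> None:
        if self.frequency_seconds <= 0:
            raise ValueError("frequency_seconds must be positive")
        if self.handling_mode not in {"automatic", "manual"}:
            raise ValueError("handling_mode must be 'automatic' or 'manual'")


def to_api_payload(config: MonitoringConfig) -> str:
    """Validate the configuration and serialize it as a JSON string."""
    config.validate()
    return json.dumps(asdict(config), sort_keys=True)


payload = to_api_payload(MonitoringConfig(frequency_seconds=30, items=["cpu", "dns"]))
```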
Embodiment two:
Referring to Fig. 2, another embodiment of the monitoring management system based on cloud computing according to the present invention is shown. The main difference from the first embodiment is that, in this embodiment, the data acquisition unit 10 in the monitoring management system 100 further includes an administrator interface module 104, which receives the fault feature items defined at initialization and outputs them to the fault feature library 103 for storage.
In addition, in this embodiment the visualization device 501 is a personal computer, which may display information such as the operating state of each node in the large-scale cluster server 11, as presented by the fault processing unit 40, to the user in real time over other network connections such as WLAN, the Internet or WAP.
The detailed description above is only a specific description of feasible embodiments of the present invention and is not intended to limit its scope of protection; equivalent embodiments or modifications made without departing from the technical spirit of the present invention shall fall within the scope of the present invention.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted for clarity only; those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims (8)

Application number: CN201410052286.8A
Priority date: 2014-02-17
Filing date: 2014-02-17
Title: Monitoring management system based on cloud computing
Status: Pending
Publication: CN103812699A (en), published 2014-05-21

Family

ID=50708940

Country Status (1)

CN (1): CN103812699A (en)

Cited By (34)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN104092730A (en)*2014-06-202014-10-08裴兆欣Cloud computing system
CN104281483A (en)*2014-09-112015-01-14江苏集群软件股份有限公司Virtual machine control system based on cloud computing platform and control method of virtual machine control system
CN104657622A (en)*2015-03-122015-05-27浪潮集团有限公司Cluster fault analysis method based on event-driven analysis
CN104796299A (en)*2015-03-232015-07-22浪潮集团有限公司Cluster state monitoring method based on wearable equipment
CN104935464A (en)*2015-06-122015-09-23北京奇虎科技有限公司 Fault warning method and device for a website system
CN105024851A (en)*2015-06-252015-11-04四川理工学院 A monitoring and management system based on cloud computing
CN105337999A (en)*2015-12-012016-02-17成都中讯创新信息技术有限公司Method for improving stability of cloud computing environment
CN105450751A (en)*2015-12-012016-03-30成都中讯创新信息技术有限公司System capable of improving stability of cloud computing environment
CN105491108A (en)*2015-11-192016-04-13浪潮集团有限公司System and method for processing remote sensing images
CN105512788A (en)*2015-05-042016-04-20上海北塔软件股份有限公司Intelligent operation and maintenance management method and system
CN105516283A (en)*2015-12-012016-04-20成都中讯创新信息技术有限公司Device for enhancing stability of cloud computing environment
CN106095644A (en)*2016-06-222016-11-09天维尔信息科技股份有限公司A kind of business software monitoring method and system
CN106407030A (en)*2016-09-132017-02-15郑州云海信息技术有限公司Failure processing method and system for storage cluster system
CN106612199A (en)*2015-10-262017-05-03华耀(中国)科技有限公司Network monitoring data collection and analysis system and method
CN106657382A (en)*2017-01-112017-05-10北京学利美科技有限公司Windows and Linux server information acquisition and management control model
CN106789345A (en)*2017-01-202017-05-31厦门集微科技有限公司Passageway switching method and device
WO2017162173A1 (en)*2016-03-222017-09-28中兴通讯股份有限公司Method and device for establishing connection of cloud server cluster
CN107222346A (en)*2017-06-092017-09-29郑州云海信息技术有限公司A kind of clustered node health status method for early warning and system
CN107294786A (en)*2017-07-132017-10-24郑州云海信息技术有限公司A kind of failure information processing method and device
CN107491375A (en)*2017-08-182017-12-19国网山东省电力公司信息通信公司Equipment detection and fault early warning system and method under a kind of cloud computing environment
CN107888437A (en)*2016-09-292018-04-06阿里巴巴集团控股有限公司Cloud monitoring method and equipment
CN108241544A (en)*2016-12-232018-07-03航天星图科技(北京)有限公司A kind of fault handling method based on cluster
CN108289034A (en)*2017-06-212018-07-17新华三大数据技术有限公司A kind of fault discovery method and apparatus
CN108418724A (en)*2018-06-042018-08-17广西电网有限责任公司 Next Generation Critical Information Infrastructure Network Intelligent Management System Based on Cloud Computing
CN108809708A (en)*2018-06-042018-11-13深圳众厉电力科技有限公司A kind of powerline network node failure detecting system
CN109144813A (en)*2018-07-262019-01-04郑州云海信息技术有限公司A kind of cloud computing system server node fault monitoring system and method
CN110287081A (en)*2019-06-212019-09-27腾讯科技(成都)有限公司A kind of service monitoring system and method
CN110781065A (en)*2019-10-282020-02-11北京北信源软件股份有限公司Service application monitoring method and device
CN110825396A (en)*2019-10-312020-02-21Oppo(重庆)智能科技有限公司Exception handling method and related equipment
CN110825632A (en)*2019-11-012020-02-21北京金山云网络技术有限公司 Cloud computing resource metering data testing method, system, device and electronic equipment
CN111224819A (en)*2019-12-302020-06-02上海汇付数据服务有限公司Distributed messaging system
CN112328444A (en)*2020-10-092021-02-05国家电网有限公司Cloud computer management system and management method thereof
CN112749053A (en)*2020-12-142021-05-04北京同有飞骥科技股份有限公司Intelligent fault monitoring and intelligent repair management system based on cloud platform
CN114020503A (en)*2021-10-192022-02-08济南浪潮数据技术有限公司Optimization method, system and device for transparent fault switching of distributed file system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101159539A (en)* | 2007-11-20 | 2008-04-09 | 中国人民解放军信息工程大学 | Intrusion Tolerant Application Server and Intrusion Tolerant Method Based on J2EE Middleware Specification
CN101917460A (en)* | 2010-07-22 | 2010-12-15 | 河南远为网络信息技术有限公司 | Virtual machine technique-based remote maintenance system
CN102523137A (en)* | 2011-12-22 | 2012-06-27 | 华为技术服务有限公司 | Fault monitoring method, device and system
CN102571499A (en)* | 2012-02-14 | 2012-07-11 | 广州亦云信息技术有限公司 | Monitoring method of cloud database server cluster
CN103024060A (en)* | 2012-12-20 | 2013-04-03 | 中国科学院深圳先进技术研究院 | Open type cloud computing monitoring system for large scale cluster and method thereof
CN103200050A (en)* | 2013-04-12 | 2013-07-10 | 北京百度网讯科技有限公司 | Server hardware state monitoring method and server hardware state monitoring system
CN103236949A (en)* | 2013-04-27 | 2013-08-07 | 北京搜狐新媒体信息技术有限公司 | Monitoring method, device and system for server cluster
CN103338261A (en)* | 2013-07-04 | 2013-10-02 | 北京泰乐德信息技术有限公司 | Storage and processing method and system of rail transit monitoring data
CN103391185A (en)* | 2013-08-12 | 2013-11-13 | 北京泰乐德信息技术有限公司 | Cloud security storage and processing method and system for rail transit monitoring data
CN103403689A (en)* | 2012-07-30 | 2013-11-20 | 华为技术有限公司 | Resource failure management method, device and system
CN103440160A (en)* | 2013-08-15 | 2013-12-11 | 华为技术有限公司 | Virtual machine recovering method and virtual machine migration method, device and system
CN103580920A (en)* | 2013-11-07 | 2014-02-12 | 江南大学 | Method for detecting abnormal operation of information system based on cloud computing technology
CN103580924A (en)* | 2013-11-12 | 2014-02-12 | 武汉钢铁(集团)公司 | Fault location method, device and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
董波: "云计算集群服务器系统监控方法的研究", 《中国优秀硕士学位论文全文数据库(电子期刊)》*

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104092730A (en) * | 2014-06-20 | 2014-10-08 | 裴兆欣 | Cloud computing system
CN104281483A (en) * | 2014-09-11 | 2015-01-14 | 江苏集群软件股份有限公司 | Virtual machine control system based on cloud computing platform and control method of virtual machine control system
CN104657622A (en) * | 2015-03-12 | 2015-05-27 | 浪潮集团有限公司 | Cluster fault analysis method based on event-driven analysis
CN104796299A (en) * | 2015-03-23 | 2015-07-22 | 浪潮集团有限公司 | Cluster state monitoring method based on wearable equipment
CN105512788A (en) * | 2015-05-04 | 2016-04-20 | 上海北塔软件股份有限公司 | Intelligent operation and maintenance management method and system
CN104935464A (en) * | 2015-06-12 | 2015-09-23 | 北京奇虎科技有限公司 | Fault warning method and device for a website system
CN104935464B (en) * | 2015-06-12 | 2018-07-06 | 北京奇虎科技有限公司 | The fault early warning method and device of a kind of web station system
CN105024851A (en) * | 2015-06-25 | 2015-11-04 | 四川理工学院 | A monitoring and management system based on cloud computing
CN105024851B (en) * | 2015-06-25 | 2018-07-24 | 四川理工学院 | A kind of monitoring management system based on cloud computing
CN106612199B (en) * | 2015-10-26 | 2019-10-25 | 华耀(中国)科技有限公司 | System and method for collecting and analyzing network monitoring data
CN106612199A (en) * | 2015-10-26 | 2017-05-03 | 华耀(中国)科技有限公司 | Network monitoring data collection and analysis system and method
CN105491108A (en) * | 2015-11-19 | 2016-04-13 | 浪潮集团有限公司 | System and method for processing remote sensing images
CN105516283A (en) * | 2015-12-01 | 2016-04-20 | 成都中讯创新信息技术有限公司 | Device for enhancing stability of cloud computing environment
CN105450751A (en) * | 2015-12-01 | 2016-03-30 | 成都中讯创新信息技术有限公司 | System capable of improving stability of cloud computing environment
CN105337999B (en) * | 2015-12-01 | 2018-11-20 | 南京冠楷信息技术有限公司 | A method of improving cloud computing environment stability
CN105450751B (en) * | 2015-12-01 | 2018-09-25 | 成都中讯创新信息技术有限公司 | A kind of system improving cloud computing environment stability
CN105516283B (en) * | 2015-12-01 | 2018-09-25 | 成都中讯创新信息技术有限公司 | A kind of device improving cloud computing environment stability
CN105337999A (en) * | 2015-12-01 | 2016-02-17 | 成都中讯创新信息技术有限公司 | Method for improving stability of cloud computing environment
WO2017162173A1 (en) * | 2016-03-22 | 2017-09-28 | 中兴通讯股份有限公司 | Method and device for establishing connection of cloud server cluster
CN106095644A (en) * | 2016-06-22 | 2016-11-09 | 天维尔信息科技股份有限公司 | A kind of business software monitoring method and system
CN106407030A (en) * | 2016-09-13 | 2017-02-15 | 郑州云海信息技术有限公司 | Failure processing method and system for storage cluster system
CN107888437A (en) * | 2016-09-29 | 2018-04-06 | 阿里巴巴集团控股有限公司 | Cloud monitoring method and equipment
CN107888437B (en) * | 2016-09-29 | 2021-11-02 | 阿里巴巴集团控股有限公司 | Cloud monitoring method and equipment
CN108241544B (en) * | 2016-12-23 | 2023-06-06 | 中科星图股份有限公司 | Fault processing method based on clusters
CN108241544A (en) * | 2016-12-23 | 2018-07-03 | 航天星图科技(北京)有限公司 | A kind of fault handling method based on cluster
CN106657382A (en) * | 2017-01-11 | 2017-05-10 | 北京学利美科技有限公司 | Windows and Linux server information acquisition and management control model
CN106789345A (en) * | 2017-01-20 | 2017-05-31 | 厦门集微科技有限公司 | Passageway switching method and device
CN106789345B (en) * | 2017-01-20 | 2019-07-23 | 厦门集微科技有限公司 | Passageway switching method and device
CN107222346A (en) * | 2017-06-09 | 2017-09-29 | 郑州云海信息技术有限公司 | A kind of clustered node health status method for early warning and system
WO2018233630A1 (en) * | 2017-06-21 | 2018-12-27 | 新华三大数据技术有限公司 | DISCOVERY OF FAILURE
CN108289034A (en) * | 2017-06-21 | 2018-07-17 | 新华三大数据技术有限公司 | A kind of fault discovery method and apparatus
CN107294786A (en) * | 2017-07-13 | 2017-10-24 | 郑州云海信息技术有限公司 | A kind of failure information processing method and device
CN107491375A (en) * | 2017-08-18 | 2017-12-19 | 国网山东省电力公司信息通信公司 | Equipment detection and fault early warning system and method under a kind of cloud computing environment
CN108418724B (en) * | 2018-06-04 | 2019-01-04 | 广西电网有限责任公司 | Next-generation key message infrastructure network intelligent management system based on cloud computing
CN108418724A (en) * | 2018-06-04 | 2018-08-17 | 广西电网有限责任公司 | Next Generation Critical Information Infrastructure Network Intelligent Management System Based on Cloud Computing
CN108809708A (en) * | 2018-06-04 | 2018-11-13 | 深圳众厉电力科技有限公司 | A kind of powerline network node failure detecting system
CN109144813A (en) * | 2018-07-26 | 2019-01-04 | 郑州云海信息技术有限公司 | A kind of cloud computing system server node fault monitoring system and method
CN109144813B (en) * | 2018-07-26 | 2022-08-05 | 郑州云海信息技术有限公司 | System and method for monitoring server node fault of cloud computing system
CN110287081A (en) * | 2019-06-21 | 2019-09-27 | 腾讯科技(成都)有限公司 | A kind of service monitoring system and method
CN110781065A (en) * | 2019-10-28 | 2020-02-11 | 北京北信源软件股份有限公司 | Service application monitoring method and device
CN110825396A (en) * | 2019-10-31 | 2020-02-21 | Oppo(重庆)智能科技有限公司 | Exception handling method and related equipment
CN110825396B (en) * | 2019-10-31 | 2023-07-25 | Oppo(重庆)智能科技有限公司 | Exception handling method and related equipment
CN110825632A (en) * | 2019-11-01 | 2020-02-21 | 北京金山云网络技术有限公司 | Cloud computing resource metering data testing method, system, device and electronic equipment
CN110825632B (en) * | 2019-11-01 | 2023-10-03 | 北京金山云网络技术有限公司 | Cloud computing resource metering data testing method, system and device and electronic equipment
CN111224819A (en) * | 2019-12-30 | 2020-06-02 | 上海汇付数据服务有限公司 | Distributed messaging system
CN112328444A (en) * | 2020-10-09 | 2021-02-05 | 国家电网有限公司 | Cloud computer management system and management method thereof
CN112749053A (en) * | 2020-12-14 | 2021-05-04 | 北京同有飞骥科技股份有限公司 | Intelligent fault monitoring and intelligent repair management system based on cloud platform
CN114020503A (en) * | 2021-10-19 | 2022-02-08 | 济南浪潮数据技术有限公司 | Optimization method, system and device for transparent fault switching of distributed file system

Similar Documents

Publication | Publication Date | Title
CN103812699A (en) |  | Monitoring management system based on cloud computing
US11663055B2 (en) |  | Dependency analyzer in application dependency discovery, reporting, and management tool
US10459780B2 (en) |  | Automatic application repair by network device agent
CN102231681B (en) |  | High availability cluster computer system and fault treatment method thereof
CN107547273B (en) |  | Method and system for guaranteeing high availability of virtual instance of power system
US11157373B2 (en) |  | Prioritized transfer of failure event log data
CN108039964B (en) |  | Fault handling method, device and system based on network function virtualization
US9450700B1 (en) |  | Efficient network fleet monitoring
US10198284B2 (en) |  | Ensuring operational integrity and performance of deployed converged infrastructure information handling systems
US9841986B2 (en) |  | Policy based application monitoring in virtualized environment
CN102355368B (en) |  | Fault processing method of network equipment and system
US20110225582A1 (en) |  | Snapshot management method, snapshot management apparatus, and computer-readable, non-transitory medium
CN103605722B (en) |  | Database monitoring method and device, equipment
CN102662821A (en) |  | Method, device and system for auxiliary diagnosis of virtual machine failure
WO2016188100A1 (en) |  | Information system fault scenario information collection method and system
CN104252500A (en) |  | Method and device for carrying out fault repairing on database management platform
CN104618161A (en) |  | Application cluster monitoring device and method
CN107360045A (en) |  | The monitoring method and device of a kind of storage cluster system
CN112235300B (en) |  | Cloud virtual network vulnerability detection method, system, device and electronic equipment
CN108845865A (en) |  | A monitoring service deployment method, system and storage medium
CN110063042A (en) |  | A kind of response method and its terminal of database failure
CN102902615A (en) |  | Failure alarm method and system for Lustre parallel file system
BR112017001171B1 (en) |  | METHOD PERFORMED ON A COMPUTING DEVICE, COMPUTING DEVICE AND COMPUTER READABLE MEMORY DEVICE TO RECOVER THE OPERABILITY OF A CLOUD-BASED SERVICE
CN103973516A (en) |  | Method and device for achieving monitoring function in data processing system
CN105068763A (en) |  | Virtual machine fault-tolerant system and method for storage faults

Legal Events

Date | Code | Title | Description
 | C06 | Publication | 
 | PB01 | Publication | 
 | C10 | Entry into substantive examination | 
 | SE01 | Entry into force of request for substantive examination | 
 | RJ01 | Rejection of invention patent application after publication | 

Application publication date: 20140521

