CN108390793A - Method and device for analyzing system stability - Google Patents

Method and device for analyzing system stability

Info

Publication number
CN108390793A
Authority
CN
China
Prior art keywords
data
abnormal
operation data
monitoring
fluctuation range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810083390.1A
Other languages
Chinese (zh)
Inventor
孙迁
叶国华
刘发亮
马翔
杜中原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Commerce Group Co Ltd
Original Assignee
Suning Commerce Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Commerce Group Co Ltd
Priority to CN201810083390.1A
Publication of CN108390793A
Legal status: Pending (current)

Abstract

The embodiments of the invention disclose a method and device for analyzing system stability. They relate to the field of computer technology and can improve the degree of intelligence and the accuracy of system-stability monitoring. The invention includes: collecting operation data associated with monitoring indexes; using the correlation between different monitoring indexes, selecting operation data to be processed from the collected operation data and determining a fluctuation range; and obtaining, according to the fluctuation range, the abnormal conditions of the system's current operation data. The invention is used for analyzing system stability.

Description

Method and device for analyzing system stability
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for analyzing system stability.
Background
With the development of computer and internet technology, the internet industry in China keeps expanding and a large number of online services are continually being designed. To ensure that these online services run normally, the operating state of the systems hosting them needs to be monitored in real time.
Currently, most system monitoring sets a threshold for a particular system operation index and judges whether the system is operating normally by comparing the observed value with that threshold. This static way of configuring monitoring indexes only works for coarse-grained indexes, such as CPU (central processing unit) load or network-port congestion, and can only determine whether the system is overloaded. In practice, such monitoring is neither intelligent nor flexible: existing strategies cover a single monitoring scenario and judge rigidly, which makes it especially difficult to assess system operation correctly in many complex scenarios.
The most common way to improve system stability is to expand system capacity. When a new system is deployed or an existing one is expanded, the required machine configuration and number of machines are likewise estimated by reference to monitoring indexes. However, the thresholds for these indexes are usually set from human experience, so they vary with personal judgment and are often very inaccurate.
Disclosure of Invention
The embodiments of the invention provide a method and device for analyzing system stability that can improve the degree of intelligence and the accuracy of system-stability monitoring.
In existing technology, system abnormalities are usually monitored through indexes set directly by people, which are often shaped by personal experience, and monitoring coarse-grained indexes makes it hard to guarantee monitoring accuracy. Low monitoring accuracy means the system often has to be debugged again after capacity expansion, which consumes considerable time before and after the change. Even after debugging, operation faults and incidents readily occur in online services, so staff must be assigned to troubleshoot them, which raises operating costs and ties up substantial human resources.
To address the shortcomings of judging the system running state by thresholds in traditional monitoring (a single monitoring scenario, rigid judgment, results inconsistent with reality, and so on), this embodiment judges the system's operating condition by collecting data for multiple associated system monitoring items, analyzing them together, establishing a mathematical model, and checking whether the collected monitoring data conform to that model. This abandons the prior-art practice of setting a threshold on a single monitoring item and makes system monitoring more accurate and comprehensive. For example, the order quantity, a static business-data monitoring index, can be combined with monitoring indexes of other dynamic system-operation data; the multi-dimensional monitoring indexes are fused and quantified as correlation coefficients, through which the system's operating state is analyzed.
Because the system's historical performance is analyzed comprehensively through multi-index statistics, technicians no longer need to manually tune the threshold of each system monitoring item for different service scenarios. This avoids inaccurate monitoring and alarms across service scenarios and improves the degree of intelligence and accuracy of existing monitoring means.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art may obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 2a is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of an embodiment of the present invention;
FIGS. 3 and 4 are schematic structural diagrams of apparatuses provided in the embodiments of the present invention.
Detailed Description
To help those skilled in the art better understand the technical solutions of the present invention, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements, or to elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are illustrative, serve only to explain the present invention, and are not to be construed as limiting it. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The method flow in this embodiment may be executed by computer software on the system shown in FIG. 1. It involves performance monitoring of computer software systems, software algorithm programming, integrated analysis of monitoring data, and mathematical modeling.
The system comprises a service system, an analysis server and a background database. The devices in the system can establish channels over the Internet and exchange data through their respective data transmission ports.
At the hardware level, the analysis server disclosed in this embodiment may be a workstation, a supercomputer, or a data-processing server cluster composed of multiple servers. Alternatively, its functions may be integrated into the background database, the service system or another hardware system, which then implements the analysis-server functions by allocating a certain amount of hardware resources; the different computing functions can be integrated on such hardware using current virtual machine or distributed computing technology. The analysis server can collect monitoring data from the monitoring platforms in real time. A monitoring platform monitors the operating state of the service system and records monitoring data, such as logs related to the service system's operation data or system snapshots taken while the service system runs; the monitoring data can be distinguished by the monitoring indexes configured on each platform. Monitoring platforms that may be involved in this embodiment include, but are not limited to: Zabbix (an enterprise-grade, open-source, web-based solution that provides distributed system monitoring and network monitoring), a cross-system synchronous communication framework (RSF), a cross-system asynchronous communication framework (ESB), and the like.
The background database stores running data of the service system during running, such as: price data, logistics data, order data, and the like. The background database may specifically adopt the currently common database architecture and type.
At the hardware level, the service system may consist of multiple servers or other hardware devices with computing capability, such as supercomputing devices, and it runs online services, for example the promotion system, order system and notification system of an online shopping platform.
An embodiment of the present invention provides a method for analyzing system stability, as shown in fig. 2a, including:
S1: Collect operation data associated with the monitoring indexes.
Specifically, the monitoring indexes include at least one of the following: indexes of the computing resources of the hardware that runs the service system, such as the processor idle-time percentage, the processor write/read waiting-time percentage, the percentage of processor time occupied by user programs, memory usage percentage, and disk read/write port utilization; and indexes of the hardware's communication resources, such as the data traffic sent and received by the network card. They also include data generated by the service system during operation, mostly recorded as retrievable logs, for example at least one of: the system's exception count, service call volume, response time, service exception count, and order quantity. A monitoring index can serve as a tag for the operation data associated with it.
For example, as shown in FIG. 2b, the analysis server uses scheduled tasks to access the Zabbix monitoring system, the interface call-volume collection system, various other monitoring platforms and so on, and obtains and stores multiple items of data over a period of time. For instance, it acquires the Zabbix monitoring indexes over a period of time (such as the system's average load over 1 min), which include: processor idle-time percentage, processor write/read waiting-time percentage, percentage of processor time occupied by user programs, memory usage percentage, disk read/write port utilization, and data traffic sent and received by the network card. Specifically, data can be collected many times over a period to form a historical sampling set, recorded as a monitoring comparison table, so that the system's current operation data can be compared against this table and abnormal monitoring indexes can be found quickly.
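The historical sampling set (monitoring comparison table) described above can be pictured as a bounded history kept per monitoring index. The sketch below is a minimal illustration under stated assumptions: the class name `MonitoringTable`, index names such as `cpu_idle_pct`, the 1000-sample cap and the use of the mean as a comparison baseline are hypothetical choices, not taken from the patent, and no Zabbix API is called; `record` simply receives one round of already-collected values.

```python
from collections import defaultdict, deque
from statistics import mean


class MonitoringTable:
    """Hypothetical historical sampling set keyed by monitoring index."""

    def __init__(self, max_samples: int = 1000):
        # One bounded history per monitoring index; the index name acts as the tag.
        self._history = defaultdict(lambda: deque(maxlen=max_samples))

    def record(self, sample: dict[str, float]) -> None:
        """Store one scheduled collection round, e.g. {"cpu_idle_pct": 72.0}."""
        for index, value in sample.items():
            self._history[index].append(value)

    def history(self, index: str) -> list[float]:
        """Return the stored samples for one index, oldest first."""
        return list(self._history[index])

    def baseline(self, index: str) -> float:
        """Mean of the stored samples, used here as a simple comparison value."""
        return mean(self._history[index])
```

Bounding each history with `deque(maxlen=...)` keeps the table a sliding window over recent collection rounds, so older samples drop out automatically.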
The service call volume of the system refers to the number of times the business system calls each service while running, such as query, price-comparison, broadcast and other business services.
The response time of the system refers to the time the business system takes to respond to a given business function while running.
The service exception count of the system refers to the number of business exceptions that the business system detects and reports in real time through its own built-in detection means while running.
The order quantity of the system refers to the number of orders processed by the business system within a given time period, where orders processed may mean orders already processed, orders being processed, or the sum of both.
S2: Using the correlation between different monitoring indexes, select the operation data to be processed from the collected operation data, and determine the fluctuation range.
The correlation among the monitoring indexes is examined; for indexes with significant correlation, a data model corresponding to that correlation is established, the value of its correlation coefficient is determined, and the correlation between the monitoring indexes is thereby quantified. A reasonable fluctuation range is then set for the model output according to the actual application situation and is used to detect whether the operation data to be processed contain abnormal conditions. Each monitoring index points to a certain type of operation data, so the operation data to be processed can be understood as the operation data corresponding to two correlated monitoring indexes.
The operation data to be processed comprise N groups of operation data in which at least one pair of monitoring indexes is correlated; that is, the monitoring index associated with the i-th group and the monitoring index associated with the j-th group are correlated, where N ≥ 2, 1 ≤ i ≤ N, 1 ≤ j ≤ N, and i ≠ j.
For example, the order quantity, a static business-data monitoring index, can be combined with monitoring indexes of other dynamic system-operation data; the multi-dimensional monitoring indexes are fused and quantified as correlation coefficients, through which the system's operating state is analyzed.
In this embodiment, to put the data on a uniform unit basis for analysis, the collected operation data may be flattened and stretched (rescaled to a common unit), after which the variance, covariance and correlation coefficient between each pair of data groups are calculated.
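The patent does not define "flattening and stretching" precisely. One common way to put indexes with different units (percentages, byte counts, order counts) on a common scale is z-score standardization, sketched below as an assumption rather than the patent's own transform; the function name `standardize` is hypothetical. Note that the Pearson correlation coefficient is unchanged by a positive linear rescaling of either series, so this step mainly matters for regression slopes and for comparing magnitudes across indexes.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Z-score standardization: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```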
S3: Obtain the abnormal conditions of the system's current operation data according to the fluctuation range.
In this embodiment, the system's operating state is judged by collecting data for multiple associated system monitoring items, analyzing them together, establishing a mathematical model, and checking whether the collected monitoring data conform to that model. This replaces the former practice of setting a threshold on a single monitoring item and makes system monitoring more accurate and comprehensive. For example, the order quantity, a static business-data monitoring index, can be combined with monitoring indexes of other dynamic system-operation data; the multi-dimensional indexes are fused and quantified as correlation coefficients, through which the system's operating state is analyzed.
To address the shortcomings of judging the system running state by thresholds in traditional monitoring (a single monitoring scenario, rigid judgment, results inconsistent with reality, and so on), the invention integrates multiple items of system monitoring data over a period, establishes a related mathematical model, feeds the system's monitoring data into the model, and analyzes the output to reach a conclusion about the system's running state. Because the system's historical performance is analyzed comprehensively through multi-index statistics, technicians no longer need to manually tune the threshold of each system monitoring item for different service scenarios. This avoids inaccurate monitoring and alarms across service scenarios and improves the degree of intelligence and accuracy of existing monitoring means.
In this embodiment, step S2 (using the correlation between different monitoring indexes to select the operation data to be processed from the collected operation data and determine the fluctuation range) may include:
establishing a data model of the operation data to be processed, determining the value of the correlation coefficient through the data model, and setting the fluctuation range of the correlation coefficient.
The correlation among the groups of data is analyzed by integrating the indexes. For example, fluctuations in CPU load are significantly related to order quantity, somewhat related to network-card traffic, and not directly related to the service system's exception rate; therefore, the correlation coefficients of CPU-load fluctuation with order quantity and with network-card traffic are higher than its correlation coefficient with the exception rate. A data model is established for indexes with obvious correlation (e.g., a high correlation coefficient), the specific value of the correlation coefficient is determined, and a reasonable fluctuation range is set for the model output according to the actual application.
After the collected data are converted to uniform units, the correlation coefficient between each pair of data groups is calculated and the monitoring items with higher correlation coefficients are screened out. An equation between the correlated monitoring items is established according to the correlation coefficient; the system's operation data under various operating states are fed into this model equation to calculate the value of the correlation coefficient; and the ranges of that value under normal and abnormal system conditions are compiled as the basis for judging the system's operating condition.
Specifically, the establishing of the data model of the to-be-processed operation data includes:
collecting at least two different groups of operation data and obtaining the correlation coefficient between each pair of different groups; if the correlation coefficient of two groups exceeds a preset value, establishing a data model of those two groups of operation data. For example, a concrete scenario for judging the system's operating state with the data model is as follows:
Collect the monitored operation data while the system runs normally for a period, labeled data1, data2, data3, data4 …; input the data, convert them to uniform units, then calculate the correlation coefficient for every pair of groups and output all the coefficients p12, p13, p14, p23, p24 and p34.
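The correlation coefficient is calculated as shown in equation 1 below:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}} \qquad (1)$$
where E is the mathematical expectation, cov denotes covariance, σ_X and σ_Y are the standard deviations of X and Y, Σ denotes summation, X and Y represent two system parameters (which can be set according to the specific type of service system), N is the number of values, and μ_X and μ_Y are the means of X and Y.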
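Equation 1 can be evaluated directly in its summation form. The plain-Python sketch below illustrates this (the function names `pearson` and `pairwise_coefficients` are hypothetical) and also produces every pairwise coefficient p_ij for a set of named data groups, matching the data1, data2, data3, data4 scenario. On Python 3.10+ the standard library's `statistics.correlation` computes the same Pearson coefficient.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient, summation form of equation 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length, at least 2 values each")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    # Clamp at 0 so floating-point rounding cannot produce sqrt of a tiny negative.
    denom = sqrt(max(n * sxx - sx * sx, 0.0)) * sqrt(max(n * syy - sy * sy, 0.0))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant series")
    return (n * sxy - sx * sy) / denom


def pairwise_coefficients(groups: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Return p_ij for every unordered pair of named data groups."""
    return {(a, b): pearson(groups[a], groups[b]) for a, b in combinations(groups, 2)}
```

For four groups this yields six coefficients, keyed in the insertion order of the dictionary, e.g. `("data1", "data4")` for p14.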
1. Input all the correlation coefficients, screen them, and output the two groups of data corresponding to each high coefficient; for example, p14 = 0.77, whose corresponding data are data1 and data4. A high correlation coefficient means one whose absolute value exceeds a preset value, indicating that the corresponding parameters are strongly correlated and satisfy a linear relationship. The preferred preset value is 0.6.
2. Substitute the two groups of data into calculation formula 2:
$$b = \frac{\sum_{i=1}^{N} x_i y_i - N\bar{x}\bar{y}}{\sum_{i=1}^{N} x_i^2 - N\bar{x}^2}, \qquad a = \bar{y} - b\bar{x} \qquad (2)$$
where $\bar{x}$ and $\bar{y}$ are the means, Σ denotes summation, b equals the value of the correlation coefficient, and a is an intermediate parameter of the calculation. The linear regression equation is $\hat{y} = bx + a$, where x and y each represent a system parameter (set according to the specific type of service system; the lower-case x, y and the upper-case X, Y may denote the same or different system parameters, depending on the specific service system), and N is the number of paired values used in calculation formula 2.
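Steps 1 and 2 above can be sketched as follows: keep only the pairs whose coefficient exceeds the preset value in absolute terms (0.6 being the preferred value), then fit a straight line to each retained pair. The function names are hypothetical, and the fit uses ordinary least squares in its deviation-from-mean form, which is assumed here to be the regression the text describes; it returns the slope b and the intercept a of y = bx + a.

```python
def screen_pairs(coefficients: dict[tuple[str, str], float],
                 preset: float = 0.6) -> dict[tuple[str, str], float]:
    """Keep the pairs whose |coefficient| exceeds the preset value."""
    return {pair: r for pair, r in coefficients.items() if abs(r) > preset}


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = b*x + a; returns (b, a)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length, at least 2 values each")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return b, a
```

With the example value p14 = 0.77, the pair `("data1", "data4")` survives screening, and `fit_line` is then applied to data1 and data4.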
Specifically, steps 1 and 2 are applied to operation data collected in different time periods; after many calculations, the resulting values of a correlation coefficient form a distribution range. Once a large amount of data has been processed, this distribution becomes relatively stable and is used as the fluctuation range; that is, the distribution of the same correlation coefficient over many calculations serves as its fluctuation range.
In actual operation, the data of the preceding period can be acquired at regular intervals and the correlation-coefficient values calculated, e.g., 100.2 and 50.6. Each value is compared with its fluctuation range; if it falls outside the range and the difference exceeds a set exceeding value, the situation is judged abnormal and an alarm is raised. For example, 100.2 exceeds the upper limit 70.5 by 42.12%, which is more than the set value of 10%, so it is judged abnormal. Alarm information is sent to the configured mobile phone numbers, the email addresses of the relevant personnel, and so on. As another example, when the system is abnormal, its current operation data are captured as a snapshot and fed into the corresponding mathematical model; if the output exceeds the set fluctuation range, the situation is judged abnormal. Suppose the historical data show a CPU load of only 30% at an order quantity of 100,000, but the CPU load suddenly rises to 50% at the same order quantity: this indicates a problem. Under the conventional scheme of manually set thresholds, however, the alarm threshold is often above 50%, so simple threshold detection would still consider the CPU load normal and report no abnormality.
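One way to turn repeated calculations into a fluctuation range is sketched below, under assumptions the patent leaves open: the coefficient is taken to be the regression slope computed over consecutive, non-overlapping windows of paired samples, and the range is the minimum-to-maximum span of those values, optionally after discarding a fraction of the most extreme values on each side (`trim`). The function names, the window size and the trimming fraction are hypothetical.

```python
def slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant within this window")
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


def window_slopes(x: list[float], y: list[float], window: int) -> list[float]:
    """Slope per non-overlapping window; a trailing partial window is ignored."""
    if window < 2 or len(x) != len(y):
        raise ValueError("window must be >= 2 and the series must align")
    return [slope(x[i:i + window], y[i:i + window])
            for i in range(0, len(x) - window + 1, window)]


def fluctuation_range(values: list[float], trim: float = 0.0) -> tuple[float, float]:
    """Min-max span of the observed values after trimming each tail by `trim`."""
    if not values or not 0.0 <= trim < 0.5:
        raise ValueError("need values and 0 <= trim < 0.5")
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return kept[0], kept[-1]
```

Trimming is one simple guard against a few anomalous periods in the "normal" history widening the range; the more data are accumulated, the less the choice matters.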
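The alarm rule in this example (a value outside its fluctuation range counts as abnormal only when it overshoots the violated bound by more than a set exceeding value, 10% in the example) can be expressed as below. The relative-excess formula is inferred from the 100.2 versus 70.5 example, the function names are hypothetical, and the sketch assumes both bounds are positive.

```python
def excess_ratio(value: float, low: float, high: float) -> float:
    """Relative distance by which value lies outside [low, high]; 0.0 if inside."""
    if not 0 < low <= high:
        raise ValueError("this sketch assumes 0 < low <= high")
    if value > high:
        return (value - high) / high
    if value < low:
        return (low - value) / low
    return 0.0


def is_abnormal(value: float, low: float, high: float, set_value: float = 0.10) -> bool:
    """Abnormal only when the value overshoots its range by more than set_value."""
    return excess_ratio(value, low, high) > set_value
```

With a hypothetical lower bound of 40.0 (the example gives only the upper limit 70.5), `is_abnormal(100.2, 40.0, 70.5)` is `True` because the excess is about 0.421, while `is_abnormal(50.6, 40.0, 70.5)` is `False`.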
In this embodiment, step S3 (obtaining the abnormal conditions of the system's current operation data according to the fluctuation range) may be implemented as follows:
acquire the system's current operation data and output a calculation result for it through the established data model; when the calculation result does not conform to the fluctuation range, judge that the system's current operation data are abnormal.
Here, "the calculation result does not conform to the fluctuation range" can be understood in either of two ways. Under the first, it means the value of the calculation result falls entirely outside the numerical range of the fluctuation range; a result that falls partly or completely within the range is regarded as conforming. Under the second, stricter interpretation, it means the value of the calculation result does not fall completely within the fluctuation range; only a result that falls completely within the range is regarded as conforming.
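When the model output is itself an interval (for example, the span of coefficients computed over several recent windows), the two interpretations above differ. The sketch below implements both behind a hypothetical `strict` flag; a single value is treated as the degenerate interval (v, v), for which the two readings coincide. The function name is also hypothetical.

```python
def conforms(result: tuple[float, float], allowed: tuple[float, float],
             strict: bool = True) -> bool:
    """Check a calculation result against the fluctuation range.

    strict=True:  conforms only if the result lies completely inside the range.
    strict=False: conforms unless the result lies completely outside the range.
    """
    r_low, r_high = min(result), max(result)
    a_low, a_high = min(allowed), max(allowed)
    if strict:
        return a_low <= r_low and r_high <= a_high
    return r_high >= a_low and r_low <= a_high
```

For a result spanning (60.0, 80.0) against a range of (40.0, 70.5), the strict reading reports an abnormality and the lenient reading does not.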
Further, the method also comprises:
extracting abnormal information when the system's current operation data are judged abnormal, and issuing an early warning according to the abnormal information.
The abnormal information includes at least the IP address of the system's host, the monitoring index, and the interface information corresponding to the abnormal operation data. For example, the analysis server acquires data and feeds them into the corresponding mathematical model; if the output exceeds the set fluctuation range, the data are judged abnormal, the relevant host IP, monitoring index and interface type are recorded, an alarm is raised, and the related information is sent to the person responsible for the system by email or SMS.
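The abnormal information and the early warning can be modeled as a small record plus a dispatch step. In this sketch the class and field names, the message format and the `senders` callables are hypothetical stand-ins for the SMS and email channels; no real messaging API is assumed, and the example IP address comes from a documentation-only range.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class AbnormalInfo:
    host_ip: str                  # IP address of the host running the system
    monitoring_index: str         # index whose data were judged abnormal
    interface: str                # interface (type) linked to the abnormal data
    value: float                  # calculated result that failed the check
    allowed: tuple[float, float]  # fluctuation range it was checked against


def format_alert(info: AbnormalInfo) -> str:
    """Render the abnormal information as one line of alert text."""
    low, high = info.allowed
    return (f"[ALERT] host={info.host_ip} index={info.monitoring_index} "
            f"interface={info.interface} value={info.value} range=[{low}, {high}]")


def dispatch(info: AbnormalInfo, senders: Iterable[Callable[[str], None]]) -> str:
    """Send the same alert text through every configured channel."""
    message = format_alert(info)
    for send in senders:
        send(message)
    return message
```

Passing the channels in as callables keeps the recipients and transports configurable, in line with the dynamically configurable receivers described for the early-warning module.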
In practical application, the method suits a variety of service scenarios. The data model is programmed for the specific service scenario, and a scheduled task is then set up to acquire the system monitoring-item data related to the model, feed them into the model, obtain the value of the correlation coefficient, and judge whether it lies within the normal range. If it does not, an early-warning module is invoked; this module comprises an SMS early-warning sender and an email early-warning sender, and its recipients can be configured dynamically. The method and device change the judgment of the system's running state from a static comparison of a single monitoring value against a threshold into a comprehensive analysis of whether the correlations among multiple monitoring values are normal, making system monitoring more intelligent, comprehensive and accurate. A typical scenario: before and after a new code release, no index exceeds its threshold, yet with unchanged service volume the system's resource usage rises noticeably; integrated index analysis can determine that the program itself may have a problem and issue a timely early warning. This overcomes the shortcomings of traditional monitoring methods (a single service scenario, no dynamic monitoring, inaccurate analysis results) and makes monitoring more flexible.
In this embodiment, a scheme is further provided for determining whether to perform capacity expansion/capacity reduction of system resources based on the occurrence frequency of the abnormal situation according to the determined abnormal situation, and specifically includes:
and acquiring historical abnormal data of the system, and calculating the capacity expansion or capacity reduction requirement of the system according to the historical abnormal data. And adjusting the quantity of the resources distributed to the system according to the capacity expansion or capacity reduction requirement of the system.
Here, the historical abnormal data comprises the abnormal conditions that occurred in the system within a specified time period, together with the abnormal information corresponding to each of them. The capacity expansion or reduction requirement can be understood as an adjustment value for the system's hardware resources, such as the number of processors, the amount of memory, or the amount of disk space to be added or removed: a positive adjustment value means resources must be added, and a negative value means they can be reduced. For example, based on the intrinsic relationship between order volume and each monitoring index, the analysis server can derive a model relating server scale to order volume, which assists with server expansion and later requests for new machines. When calculating server scale, the required machine configuration and number of machines are roughly estimated from the input order volume, the call volume of each system, the current server scale, the data read/write pattern, and historical server resource utilization. A reference capacity expansion/reduction plan can thus be provided to technicians quickly.
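The server-scale estimate in the example above could be prototyped as a simple regression. The sketch below assumes, purely for illustration, that total CPU demand (average utilization times machine count) grows linearly with order volume, that load spreads evenly across identical machines, and that a 70% utilization ceiling is the target; only machine count is modelled, and `fit_line`, `capacity_adjustment`, and the sample history are hypothetical. The sign convention follows the text: a positive result means machines should be added, a negative one that machines can be released.

```python
import math
from typing import List, Sequence, Tuple

# (order_volume, machine_count, average_cpu_utilization in 0..1) per historical period
Sample = Tuple[float, int, float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct order volumes")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def capacity_adjustment(history: List[Sample], target_orders: float,
                        current_machines: int, ceiling: float = 0.7) -> int:
    """Machines to add (>0) or remove (<0) to serve target_orders under `ceiling`."""
    orders = [s[0] for s in history]
    demand = [s[1] * s[2] for s in history]  # busy machine-equivalents per period
    a, b = fit_line(orders, demand)
    needed = max(1, math.ceil((a * target_orders + b) / ceiling))
    return needed - current_machines


if __name__ == "__main__":
    history = [(1000, 10, 0.30), (2000, 10, 0.50), (3000, 10, 0.70)]
    print(capacity_adjustment(history, 5000, 10))  # 6: expand by 6 machines
    print(capacity_adjustment(history, 1000, 10))  # -5: 5 machines can be released
```

A linear fit keeps the estimate "rough" in the sense the text describes; extending it to memory or disk would mean fitting one demand line per resource type.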
The present invention further provides a device for analyzing system stability, which can run on the analysis server shown in Fig. 1. As shown in Fig. 3, the device includes:
a data acquisition module, configured to acquire operation data associated with the monitoring indexes;
a preprocessing module, configured to select, based on the correlation among different monitoring indexes, the operation data to be processed from the acquired operation data, and to determine a fluctuation range;
and an analysis module, configured to determine, according to the fluctuation range, whether the system's current operation data is abnormal.
The preprocessing module is specifically configured to acquire at least two different groups of operation data and compute the correlation coefficient between each pair of groups; if the correlation coefficient of two groups exceeds a preset value, to establish a data model for those two groups; and then to determine the value of the correlation coefficient through the data model and set the fluctuation range of that coefficient.
The monitoring indexes include at least one of the following: the processor's idle time percentage, write/read (I/O) wait time percentage, and user-program time percentage; memory usage percentage; disk read/write port utilization; data traffic sent and received by the network card; and the system's service call volume, response time, abnormal traffic volume, and order volume.
The analysis module is specifically configured to acquire the system's current operation data, compute a result from it through the established data model, and judge the current operation data to be abnormal when the result falls outside the fluctuation range.
Further, as shown in Fig. 4, the device also includes:
an alarm module, configured to extract abnormal information when the system's current operation data is judged to be abnormal, and to issue an early warning based on that information, where the abnormal information includes at least the host IP address of the system, the monitoring index concerned, and the interface information of the corresponding abnormal operation data;
and a calibration module, configured to acquire the system's historical abnormal data, calculate the system's capacity expansion or reduction requirement from it, and adjust the amount of resources allocated to the system accordingly, where the historical abnormal data comprises the abnormal conditions that occurred in the system within a specified time period and the abnormal information corresponding to them.
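The alarm module's payload could be modelled as a small record. This is a minimal sketch with hypothetical names (`AbnormalInfo`, `build_alerts`) and a simple channel-to-recipients mapping standing in for the dynamically configurable receivers; actual SMS or email delivery is left out. It carries the three fields the text requires (host IP, monitoring index, interface information); the `observed` value is an illustrative addition.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AbnormalInfo:
    """Abnormal information extracted by the alarm module (hypothetical layout)."""
    host_ip: str     # host IP address of the system
    index: str       # monitoring index that triggered the anomaly
    interface: str   # interface tied to the abnormal operation data
    observed: float  # observed value; an illustrative extra field

    def __post_init__(self) -> None:
        ipaddress.ip_address(self.host_ip)  # raises ValueError for a malformed address


def build_alerts(info: AbnormalInfo,
                 routes: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Fan one anomaly out to every (channel, recipient) pair in `routes`.

    `routes` stands in for the dynamically configured receivers, e.g.
    {"sms": [...], "mail": [...]}; delivery itself is out of scope here.
    """
    text = (f"ALERT host={info.host_ip} index={info.index} "
            f"interface={info.interface} value={info.observed}")
    return [(channel, who, text)
            for channel, people in sorted(routes.items())
            for who in people]


if __name__ == "__main__":
    info = AbnormalInfo("10.0.0.12", "cpu_user_pct", "/order/submit", 93.5)
    for alert in build_alerts(info, {"sms": ["oncall-group"], "mail": ["ops@example.com"]}):
        print(alert)
```

Keeping the routes as plain data means recipients can be reloaded from configuration without touching the detection logic.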
In this embodiment, multiple associated system monitoring items are collected and analyzed jointly, a mathematical model is established, and the system's running state is judged by checking whether the collected monitoring data conforms to that model. This replaces the earlier practice of judging the running state by setting a threshold on each individual monitoring item, making system monitoring more accurate and comprehensive. For example, a static business-data index such as order volume can be combined with indexes describing the system's dynamic runtime behavior; indexes from multiple dimensions are thereby fused and quantified as correlation coefficients, through which the system's running state is analyzed.
Traditional system monitoring judges the running state through thresholds, which exposes several shortcomings: it suits only a single monitoring scenario, its judgments are rigid, and its conclusions can be inconsistent with the actual state. To address these, the invention integrates multiple items of the system's monitoring data over a period, establishes a corresponding mathematical model, feeds the system's monitoring data into the model, and analyzes the output to reach a conclusion about the running state. Because the system's historical performance is analyzed statistically across multiple indexes, technicians no longer need to manually tune the threshold of each monitoring item for every business scenario; inaccurate monitoring and alarms across scenarios are avoided, and existing monitoring becomes more intelligent and accurate.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment. The above is only a specific embodiment of the present invention, and the scope of protection of the present invention is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (10)

CN201810083390.1A (priority date 2018-01-29, filed 2018-01-29): A kind of method and device of analysis system stability. Status: Pending. Published as CN108390793A.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810083390.1A (CN108390793A) | 2018-01-29 | 2018-01-29 | A kind of method and device of analysis system stability

Publications (1)

Publication Number | Publication Date
CN108390793A | 2018-08-10

Family ID: 63074226

Country Status (1)

Country | Link
CN (1) | CN108390793A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104820630A * | 2015-05-22 | 2015-08-05 | 上海新炬网络信息技术有限公司 | System resource monitoring device based on business variable quantity
CN106600115A * | 2016-11-28 | 2017-04-26 | 湖北华中电力科技开发有限责任公司 | Intelligent operation and maintenance analysis method for enterprise information system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109522325A * | 2018-09-28 | 2019-03-26 | 中国平安人寿保险股份有限公司 | Business impact analysis method, apparatus, electronic equipment and storage medium
WO2020237433A1 * | 2019-05-24 | 2020-12-03 | 李玄 | Method and apparatus for monitoring digital certificate processing device, and device, medium and product
US11924194B2 | 2019-05-24 | 2024-03-05 | Antpool Technologies Limited | Method and apparatus for monitoring digital certificate processing device, and device, medium, and product
CN112423032A * | 2020-10-21 | 2021-02-26 | 当趣网络科技(杭州)有限公司 | Data monitoring method and device based on smart television, electronic equipment and medium
CN112600705A * | 2020-12-14 | 2021-04-02 | 国网四川省电力公司信息通信公司 | Method for automatic operation and maintenance of network equipment
CN114493378A * | 2022-04-06 | 2022-05-13 | 树根互联股份有限公司 | Index acquisition method and device of industrial equipment and computer equipment
CN116337135A * | 2023-03-18 | 2023-06-27 | 北京嘉联优控科技有限公司 | An instrument fault diagnosis method, system, electronic equipment and readable storage medium
CN118677923A * | 2024-06-22 | 2024-09-20 | 新驰电气集团有限公司 | Internet of things monitoring data grid connection method, grid connection box and storage medium thereof
CN119624087A * | 2025-02-12 | 2025-03-14 | 西冶科技集团股份有限公司 | Smelting equipment monitoring method and system based on operation data analysis

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2018-08-10

