




CN117076270A - System stability evaluation method, device and medium - Google Patents

System stability evaluation method, device and medium

Info

Publication number
CN117076270A
Authority
CN
China
Prior art keywords
service system
pressure
resource
resource allocation
deployment
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310974165.8A
Other languages
Chinese (zh)
Inventor
丁杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur General Software Co Ltd
Original Assignee
Inspur General Software Co Ltd
Application filed by Inspur General Software Co Ltd
Priority to CN202310974165.8A
Publication of CN117076270A
Legal status: Pending




Abstract

The embodiments of this specification disclose a system stability evaluation method, device and medium. The method comprises: in the construction stage, determining a corresponding deployment architecture according to preset construction requirements of a service system, and determining, according to the deployment architecture, the resource configuration required for the service system to run; in the testing stage, determining a pressure test plan for the service system according to the deployment architecture and the resource configuration, and performing a pressure test of the service system according to the pressure test plan to obtain a pressure test result; in the operation and maintenance stage, formulating a collection specification for the resource configuration during operation of the service system, monitoring changes in the resource configuration according to the collection specification, and generating a monitoring result according to those changes; and evaluating the stability of the service system according to the pressure test result and the monitoring result.

Description

System stability evaluation method, device and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a system stability evaluation method, apparatus, and medium.
Background
Enterprise informatization has become a widespread trend, and the stability of an information system directly affects user experience. Many enterprises monitor system operation with conventional situation-monitoring and early-warning tools: when a system resource hits a bottleneck, an early warning is sent in real time to report the abnormality, or system staff periodically analyze the monitoring data to identify potential risks. This traditional approach collects information along a single dimension, and risk identification and analysis lag behind: by the time operation and maintenance staff receive a warning, the system may already be paralyzed. Moreover, system stability rests on whether the architecture and resources can support the load, and pressure testing, combined with the required concurrency and the business scenarios, is an important part of guaranteeing stability on that basis. Stability assurance should therefore not be only an after-the-fact process; it should include evaluation and monitoring at every stage of system construction, testing, and operation and maintenance.
Disclosure of Invention
One or more embodiments of the present disclosure provide a system stability evaluation method, apparatus, and medium, which are used to solve the technical problems set forth in the background art.
One or more embodiments of the present disclosure adopt the following technical solutions:
One or more embodiments of this specification provide a system stability evaluation method, including:
in the construction stage, determining a corresponding deployment architecture according to preset construction requirements of a service system, and determining, according to the deployment architecture, the resource configuration required for the service system to run;
in the testing stage, determining a pressure test plan for the service system according to the deployment architecture and the resource configuration, and performing a pressure test of the service system according to the pressure test plan to obtain a pressure test result;
in the operation and maintenance stage, formulating a collection specification for the resource configuration during operation of the service system, monitoring changes in the resource configuration according to the collection specification, and generating a monitoring result according to the changes;
and evaluating the stability of the service system according to the pressure test result and the monitoring result.
One or more embodiments of this specification provide a system stability evaluation device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
in the construction stage, determine a corresponding deployment architecture according to preset construction requirements of a service system, and determine, according to the deployment architecture, the resource configuration required for the service system to run;
in the testing stage, determine a pressure test plan for the service system according to the deployment architecture and the resource configuration, and perform a pressure test of the service system according to the pressure test plan to obtain a pressure test result;
in the operation and maintenance stage, formulate a collection specification for the resource configuration during operation of the service system, monitor changes in the resource configuration according to the collection specification, and generate a monitoring result according to the changes;
and evaluate the stability of the service system according to the pressure test result and the monitoring result.
One or more embodiments of this specification provide a non-volatile computer storage medium storing computer-executable instructions that, when executed by a computer, cause the computer to:
in the construction stage, determine a corresponding deployment architecture according to preset construction requirements of a service system, and determine, according to the deployment architecture, the resource configuration required for the service system to run;
in the testing stage, determine a pressure test plan for the service system according to the deployment architecture and the resource configuration, and perform a pressure test of the service system according to the pressure test plan to obtain a pressure test result;
in the operation and maintenance stage, formulate a collection specification for the resource configuration during operation of the service system, monitor changes in the resource configuration according to the collection specification, and generate a monitoring result according to the changes;
and evaluate the stability of the service system according to the pressure test result and the monitoring result.
At least one of the above technical solutions adopted in the embodiments of this specification can achieve the following beneficial effect:
by evaluating and monitoring each stage of system construction, testing, and operation and maintenance, the embodiments of this specification evaluate system stability more thoroughly and obtain a more accurate stability evaluation result.
Drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings required for the embodiments or for the description of the prior art are briefly introduced below. The drawings described below show only some of the embodiments of this specification, and a person skilled in the art may derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a flow diagram of a system stability assessment method according to one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram of phase-by-phase evaluation and monitoring according to one or more embodiments of the present disclosure;
FIG. 3 is a schematic structural diagram of a system stability evaluation device according to one or more embodiments of the present disclosure.
Detailed Description
The embodiments of this specification provide a system stability evaluation method, device and medium.
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present disclosure.
FIG. 1 is a schematic flow diagram of a system stability evaluation method according to one or more embodiments of the present disclosure; the flow may be performed by a system stability evaluation system. Some input parameters or intermediate results in the flow can be adjusted manually to help improve accuracy.
The method flow steps of the embodiment of the present specification are as follows:
S102: In the construction stage, determine a corresponding deployment architecture according to preset construction requirements of the service system, and determine, according to the deployment architecture, the resource configuration required for the service system to run.
In the embodiments of this specification, to determine the deployment architecture from the preset construction requirements, a deployment architecture scheme may be drawn up from the concurrency requirement, the processing capability requirement and the hardware requirement of the service system, and the deployment architecture is then determined from that scheme.
It should be noted that, regarding the concurrency requirement and the processing capability requirement, the number of concurrent requests the system needs to support may be determined from the expected number of concurrent users of the service system, which can be obtained by analyzing historical traffic data, user traffic statistics and user behavior patterns. The processing capability requirement for concurrent requests, that is, the number of requests the system must process within a given time, may also be considered.
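The link between the expected number of concurrent users and the request rate the system must process can be made concrete with two standard queueing results: the interactive response-time law and Little's Law. The sketch below is an illustration rather than the patent's procedure; it assumes each user waits an average think time between requests, and the function names and example figures are hypothetical.

```python
def required_throughput(concurrent_users: int, response_time_s: float, think_time_s: float) -> float:
    """Requests per second produced by a closed population of concurrent users.

    Interactive response-time law: N = X * (R + Z), hence X = N / (R + Z),
    where N = concurrent users, R = response time, Z = think time between requests.
    """
    cycle_s = response_time_s + think_time_s
    if cycle_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return concurrent_users / cycle_s


def in_flight_requests(throughput_rps: float, response_time_s: float) -> float:
    """Little's Law: average number of requests being processed at once, L = X * R."""
    return throughput_rps * response_time_s


if __name__ == "__main__":
    # Hypothetical inputs: 2,000 concurrent users, 0.5 s response time, 9.5 s think time.
    x = required_throughput(2000, 0.5, 9.5)
    print(f"required throughput: {x:.0f} req/s")                    # 200 req/s
    print(f"requests in flight: {in_flight_requests(x, 0.5):.0f}")  # 100
```

With long think times, 2,000 concurrent users keep only about 100 requests in flight, which is why the concurrency requirement and the processing capability requirement are worth sizing separately.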
It should be noted that, regarding the hardware requirements mentioned in the embodiments of the present specification, the hardware resources required by the system may be determined according to the concurrency requirement and the processing capability requirement. Hardware resources may include the number, configuration, and specification of servers, as well as storage and network bandwidth requirements. In addition, the scalability of the system, i.e., whether the system can be easily expanded horizontally or vertically as the number of concurrent users increases, can also be considered.
It should be noted that, regarding the deployment architecture scheme, an appropriate scheme may be chosen according to the characteristics and hardware requirements of the service system. Common architectures include stand-alone, clustered and distributed architectures. For systems with higher concurrency requirements, load balancing can be used to spread concurrent requests across servers, improving the stability and performance of the system.
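Round-robin is one of the simplest load-balancing strategies: each new request goes to the next node in a fixed rotation. The sketch below assumes identical, healthy backend nodes (the node names are hypothetical); production deployments usually rely on a dedicated load balancer or reverse proxy that also performs health checks and weighting.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Distributes incoming requests evenly across a fixed pool of backend nodes."""

    def __init__(self, nodes: list):
        if not nodes:
            raise ValueError("at least one backend node is required")
        self._nodes = cycle(nodes)  # endless iterator over the node list

    def pick(self) -> str:
        """Return the node that should handle the next request."""
        return next(self._nodes)


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical node names
    print([lb.pick() for _ in range(5)])  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```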
In combination with the foregoing, the embodiments of the present disclosure may consider factors such as installation location and layout of the servers, network topology, and security when determining the deployment architecture. According to the deployment architecture scheme, a connection mode, a communication protocol, configuration of network equipment and the like between servers are determined. At the same time, security requirements of the system may also be assessed and considered, for example, using firewall, access control, etc. measures to protect the system and data.
In addition, in the embodiments of this specification, the deployment architecture may be evaluated and tested before it is finalized. Simulation tools or actual load-testing tools may be used to verify the performance and reliability of the system, so that potential problems are discovered and the architecture is adjusted and optimized accordingly.
In the embodiments of this specification, the resource configuration required for the service system to run may be determined from the deployment architecture and a preset resource configuration standard, where the standard is set according to the number of online users, the business data volume and the peak concurrency of the service system.
The resource configuration standard is set along these three dimensions as follows:
Regarding the number of online users, the number of concurrent connections and the user access volume the system must support may be determined from the expected number of online users of the service system. Based on experience or historical data, the resources each user consumes, such as CPU, memory and network bandwidth, can be estimated, and the number and configuration of servers can then be calculated from the estimated online user count and the per-user consumption.
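The per-user sizing described above can be written as a small calculation: multiply the per-user consumption of each resource by the expected online users, divide by what one server can usefully provide, and let the scarcest resource decide. The per-user figures, server capacity and the 70% target utilization below are illustrative assumptions, not values from the patent.

```python
import math


def servers_needed(online_users: int,
                   per_user: dict,
                   per_server: dict,
                   target_utilization: float = 0.7) -> int:
    """Smallest server count that keeps every resource below the target utilization.

    per_user maps a resource name (e.g. "cpu_cores", "memory_gb") to the estimated
    consumption of one online user; per_server maps it to one server's capacity.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    counts = [1]  # always keep at least one server
    for resource, demand_per_user in per_user.items():
        total_demand = online_users * demand_per_user
        usable_per_server = per_server[resource] * target_utilization
        counts.append(math.ceil(total_demand / usable_per_server))
    # The scarcest resource decides the server count.
    return max(counts)


if __name__ == "__main__":
    # Hypothetical estimates: 5,000 online users on 16-core / 64 GB servers.
    n = servers_needed(5000,
                       per_user={"cpu_cores": 0.01, "memory_gb": 0.02},
                       per_server={"cpu_cores": 16, "memory_gb": 64})
    print(n)  # CPU needs ceil(50 / 11.2) = 5 servers, memory needs 3, so 5
```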
Regarding the business data volume, the system's storage requirements may be determined from the data storage needs of the service system, including database storage capacity, disk space, and backup and recovery requirements. The capacity and performance required of the storage devices are determined from the estimated business data volume, so that the system can process and store data efficiently.
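A first-order storage estimate multiplies daily data growth by the retention period, then by the number of copies kept, and adds headroom. The sketch assumes full copies for replicas and backups and a 30% headroom; these are hypothetical parameters to be replaced with the service system's own figures.

```python
def storage_required_gb(daily_growth_gb: float,
                        retention_days: int,
                        replicas: int = 1,
                        backup_copies: int = 0,
                        headroom: float = 0.3) -> float:
    """Rough disk capacity estimate for business data.

    live data    = daily growth * retention period
    raw capacity = live data * (replicas + full backup copies kept)
    estimate     = raw capacity * (1 + headroom), leaving room for indexes, logs and growth
    """
    if replicas < 1 or backup_copies < 0 or headroom < 0:
        raise ValueError("invalid replica, backup or headroom setting")
    live_gb = daily_growth_gb * retention_days
    raw_gb = live_gb * (replicas + backup_copies)
    return raw_gb * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical: 5 GB/day kept for a year, 2 database replicas, 1 full backup copy.
    print(round(storage_required_gb(5, 365, replicas=2, backup_copies=1), 1))  # 7117.5
```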
Regarding the peak concurrency, the system's processing power and network bandwidth requirements may be determined from the number of concurrent requests the service system receives during peak hours. At peak the system must withstand a large number of concurrent requests, so the servers' processing power and the network bandwidth must be sufficient for high load.
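Peak concurrency translates into bandwidth and CPU requirements in a similar way. The sketch below assumes the average response size and the CPU time per request are known, for example from an earlier test; the 1.5 safety factor and the 70% target utilization are illustrative choices.

```python
import math


def peak_bandwidth_mbps(peak_rps: float, avg_response_kb: float, safety_factor: float = 1.5) -> float:
    """Outbound bandwidth at peak: requests/s * response size, in Mbit/s (1 KB = 1024 bytes)."""
    bits_per_second = peak_rps * avg_response_kb * 1024 * 8
    return bits_per_second / 1_000_000 * safety_factor


def cores_needed(peak_rps: float, cpu_ms_per_request: float, target_utilization: float = 0.7) -> int:
    """CPU cores at peak: busy core-seconds per second, scaled to stay under the target utilization."""
    busy_cores = peak_rps * cpu_ms_per_request / 1000
    return math.ceil(busy_cores / target_utilization)


if __name__ == "__main__":
    # Hypothetical peak: 1,000 requests/s, 50 KB per response, 20 ms of CPU per request.
    print(f"{peak_bandwidth_mbps(1000, 50):.1f} Mbit/s")  # 614.4 Mbit/s
    print(cores_needed(1000, 20))                         # 29
```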
From the above analysis, the resource configuration required to run the service system can be determined. This may involve the configuration and deployment of multiple servers, including their number, size and performance, as well as the configuration of storage and network devices. An appropriate resource configuration scheme can then be chosen according to the actual requirements of the service system and the resource configuration standard.
It should be noted that the resource configuration standard should be set according to the specific needs and performance requirements of the service system. When determining the resource configuration, factors such as cost, scalability and ease of maintenance can be weighed together to find the optimal scheme.
S104: In the testing stage, determine a pressure test plan for the service system according to the deployment architecture and the resource configuration, and perform a pressure test of the service system according to the plan to obtain a pressure test result.
In the embodiments of this specification, the pressure test plan of the service system includes the pressure test concurrency and a pressure simulation curve for the actual scenarios. Both are determined from the deployment architecture and the resource configuration, and the pressure simulation curve for the actual scenarios is built from pressure test indicators formulated for special scenarios and for mixed scenarios.
It should be noted that determining the pressure test plan from the deployment architecture and the resource configuration is an important step in ensuring the stability and performance of the system. Specifically:
regarding the pressure test concurrency, the number of concurrent requests the system can withstand may be determined from the deployment architecture and the resource configuration; it depends on factors such as the system's hardware configuration, network bandwidth and processing power. An appropriate pressure test concurrency is set according to the actual condition and performance requirements of the system, and the concurrency may be increased step by step until the system's limit is reached or a performance bottleneck is found.
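The step-by-step increase of concurrency can be expressed as a simple search loop. In this sketch, `run_step` is a hypothetical hook that drives whatever load-testing tool is in use and reports the 95th-percentile latency and error rate for one step; the thresholds are assumed targets.

```python
from typing import Callable, Dict


def find_concurrency_limit(run_step: Callable[[int], Dict[str, float]],
                           start: int, step: int, max_concurrency: int,
                           max_p95_ms: float, max_error_rate: float) -> int:
    """Raise concurrency step by step and return the last level that met the targets.

    run_step(concurrency) stands in for the load-testing tool: it should hold the given
    concurrency for a while and return {"p95_ms": ..., "error_rate": ...}.
    Returns 0 if even the first step misses the targets.
    """
    if start <= 0 or step <= 0:
        raise ValueError("start and step must be positive")
    last_ok = 0
    level = start
    while level <= max_concurrency:
        result = run_step(level)
        if result["p95_ms"] > max_p95_ms or result["error_rate"] > max_error_rate:
            break  # bottleneck: latency or errors exceed the target at this level
        last_ok = level
        level += step
    return last_ok


if __name__ == "__main__":
    # Fake system for illustration: latency starts climbing above 400 concurrent users.
    def fake_step(c: int) -> Dict[str, float]:
        return {"p95_ms": 100 + max(0, c - 400) * 2, "error_rate": 0.0}

    print(find_concurrency_limit(fake_step, 100, 100, 2000, 500, 0.01))  # 600
```

Stopping at the first failing step keeps the search cheap; a finer step size near the limit, or repeating each step, gives a more reliable figure.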
Regarding the pressure simulation curve of the actual scene mentioned in the embodiment of the present specification, a reasonable pressure simulation curve may be designed according to the actual service scene and the use situation. The pressure simulation curve can be formulated according to different time periods, business operation types, user behaviors and other factors. For example, pressure is increased during peak hours and critical operating scenarios to simulate the pressure conditions of an actual system.
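One way to encode a time-of-day pressure curve is an hourly table of target concurrency, which a test can replay in compressed time. The base load, peak multiplier and peak hours below are hypothetical; a real curve would come from the system's historical traffic.

```python
def hourly_load_profile(base_users: int, peak_multiplier: float, peak_hours: set) -> list:
    """Target concurrent users for each hour of the day (index 0-23)."""
    return [round(base_users * (peak_multiplier if hour in peak_hours else 1.0))
            for hour in range(24)]


def to_stages(profile: list, seconds_per_hour: int) -> list:
    """Compress the daily curve into (start_second, users) stages for a shorter test run."""
    return [(hour * seconds_per_hour, users) for hour, users in enumerate(profile)]


if __name__ == "__main__":
    # Hypothetical curve: 200 users off-peak, three times that in morning and afternoon peaks.
    profile = hourly_load_profile(200, 3.0, peak_hours={9, 10, 14, 15})
    # Replay each hour as one minute; show the stages around the morning peak.
    for start, users in to_stages(profile, seconds_per_hour=60)[8:11]:
        print(start, users)
```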
Regarding the pressure test indicators for special scenarios, corresponding indicators may be formulated for specific scenarios of the service system. For example, for an e-commerce platform, indicators such as the number of concurrent users, the purchase transaction success rate and the response time may be set. The special-scenario indicators are determined according to the actual condition and performance requirements of the system, and the corresponding pressure tests are performed.
Regarding the pressure test indicators for mixed scenarios, when several different business scenarios of a system are tested together, the indicators of each scenario must be considered jointly. The mixed-scenario indicators are determined from the business flows and user behavior, in combination with the system's resource configuration and architecture, to verify the performance and stability of the system in complex scenarios.
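For a mixed-scenario test, the total concurrency has to be divided among the scenarios according to their share of real traffic. The sketch uses largest-remainder rounding so the per-scenario counts always add up to the total; the scenario names and weights are illustrative.

```python
def split_concurrency(total: int, weights: dict) -> dict:
    """Allocate virtual users across scenarios in proportion to their weights.

    Largest-remainder rounding keeps the per-scenario counts summing exactly to total.
    """
    weight_sum = sum(weights.values())
    if total < 0 or weight_sum <= 0:
        raise ValueError("total must be >= 0 and weights must sum to a positive number")
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down first
    leftover = total - sum(alloc.values())
    # Hand the remaining users to the scenarios with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical e-commerce mix: browsing dominates, checkout is rarer but critical.
    print(split_concurrency(1000, {"browse": 6, "search": 3, "checkout": 1}))
    # {'browse': 600, 'search': 300, 'checkout': 100}
```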
When determining the pressure measurement plan, the hardware configuration, network bandwidth, processing capacity, service scene, performance requirement and other factors of the system need to be comprehensively considered.
In the embodiments of this specification, the pressure test of the service system may be performed according to the pressure test plan, the performance bottlenecks and resource bottlenecks that occur during the test are identified, and the pressure test result is obtained from them.
It should be noted that the pressure test is performed according to the plan to verify the performance and stability of the system and to identify performance and resource bottlenecks. Specifically:
regarding performance bottleneck identification, the system is load-tested at the concurrency and along the pressure simulation curve set in the plan. During the test, system metrics and performance data such as response time, throughput and error rate are monitored; analyzing them identifies the system's performance bottleneck under pressure, that is, the point that most limits system performance.
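Response time, throughput and error rate can be computed directly from the per-request samples a load tool records. The sketch below assumes each sample carries a latency and a success flag; it uses the nearest-rank definition of the 95th percentile, and other tools may interpolate and report slightly different values.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float
    ok: bool


def summarize(samples: list, duration_s: float) -> dict:
    """Core pressure-test metrics: throughput, error rate, mean and 95th-percentile latency."""
    if not samples or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    latencies = sorted(s.latency_ms for s in samples)
    n = len(latencies)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integer math.
    rank = -(-95 * n // 100)
    errors = sum(1 for s in samples if not s.ok)
    return {
        "throughput_rps": n / duration_s,
        "error_rate": errors / n,
        "avg_ms": sum(latencies) / n,
        "p95_ms": latencies[rank - 1],
    }


if __name__ == "__main__":
    # Hypothetical run: 100 requests over 10 s, latencies 1..100 ms, every 20th request failed.
    run = [Sample(float(i), i % 20 != 0) for i in range(1, 101)]
    print(summarize(run, duration_s=10))
```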
Regarding resource bottleneck identification, the utilization and bottleneck point of each system resource under pressure may be analyzed from the pressure test result and the system's resource configuration, using indicators such as CPU utilization, memory usage, number of database connections and network bandwidth. Comparing the configured resources with actual usage identifies resources that are underutilized or overconsumed.
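Comparing configured capacity with actual peak usage reduces to a utilization ratio per resource. The 85% and 20% thresholds below, and the resource names, are hypothetical and would normally come from the system's own performance requirements.

```python
def classify_resources(peak_usage: dict, capacity: dict,
                       high: float = 0.85, low: float = 0.2) -> dict:
    """Compare peak usage during the test with the configured capacity of each resource.

    Returns {resource: (utilization, label)}, where label is "bottleneck" at or above
    `high`, "underused" at or below `low`, and "normal" otherwise.
    """
    report = {}
    for name, used in peak_usage.items():
        utilization = used / capacity[name]
        if utilization >= high:
            label = "bottleneck"
        elif utilization <= low:
            label = "underused"
        else:
            label = "normal"
        report[name] = (round(utilization, 2), label)
    return report


if __name__ == "__main__":
    # Hypothetical peak readings from one pressure-test run.
    usage = {"cpu_cores": 15.2, "memory_gb": 20, "db_connections": 190, "bandwidth_mbps": 50}
    capacity = {"cpu_cores": 16, "memory_gb": 64, "db_connections": 200, "bandwidth_mbps": 1000}
    for name, (util, label) in classify_resources(usage, capacity).items():
        print(f"{name}: {util:.0%} {label}")
```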
Regarding the evaluation of the pressure test result, the performance and stability of the system may be evaluated from the identified performance and resource bottlenecks, judging from the pressure test result and the corresponding indicators whether the system meets the preset performance and user experience requirements. If a performance or resource bottleneck is found, further analysis and optimization are required to improve the performance and stability of the system.
Obtaining and analyzing pressure test results requires appropriate performance testing and monitoring tools, as well as relevant testing and analysis experience.
In the embodiments of this specification, performance bottlenecks may include session blocking and thread blocking during the pressure test; resource bottlenecks may include application-level and database-level resource bottlenecks.
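Session and thread blocking usually show up as chains in which one session or thread waits for a lock held by another. Given a wait-for mapping, for example assembled from a database's session and lock views or from JVM thread dumps, the sketch below groups blocked sessions under the root holder and flags cycles as deadlocks. The input format is an assumption made for illustration.

```python
def root_blockers(waits: dict) -> dict:
    """Group blocked sessions (or threads) under the root holder that ultimately blocks them.

    `waits` maps a waiting session id to the id of the session holding the lock it needs.
    Sessions caught in a wait cycle are reported under the key "deadlock".
    """
    result = {}
    for waiter in waits:
        seen = {waiter}
        holder = waits[waiter]
        # Follow the chain while the current holder is itself waiting on someone else.
        while holder in waits:
            if holder in seen:
                holder = "deadlock"
                break
            seen.add(holder)
            holder = waits[holder]
        result.setdefault(holder, []).append(waiter)
    return result


if __name__ == "__main__":
    # Hypothetical snapshot: s1 blocks s2, s2 blocks s3; t1 and t2 wait on each other.
    snapshot = {"s2": "s1", "s3": "s2", "s5": "s4", "t1": "t2", "t2": "t1"}
    print(root_blockers(snapshot))
    # {'s1': ['s2', 's3'], 's4': ['s5'], 'deadlock': ['t1', 't2']}
```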
S106: In the operation and maintenance stage, formulate a collection specification for the resource configuration during operation of the service system, monitor changes in the resource configuration according to the collection specification, and generate a monitoring result according to the changes.
In the embodiments of this specification, formulating the collection specification may include setting the collection period, the collection scope, the expected values and the fault tolerance values for the resource configuration during operation of the service system.
It should be noted that the collection specification is formulated so that the use of system resources can be monitored and managed effectively and the resource configuration adjusted and optimized in time. The specification covers the following aspects:
Collection period: the time interval for collecting resource data is determined by the characteristics and requirements of the service system; typically, collection every minute, every hour or once a day may be chosen.
Collection scope: the resource items to be collected, including but not limited to CPU utilization, memory usage, disk space occupation and network bandwidth utilization, are selected according to the specific requirements and focus of the service system.
Expected value: a reasonable expected value is preset for each resource item according to the performance requirements and actual condition of the service system; it may be derived from historical data, pressure test results, industry reference standards and the like.
Fault tolerance value: a fault tolerance value is set for each resource item and used to judge whether the resource is abnormal. It may be set according to the acceptable error range around the expected value; usually a reasonable threshold is chosen.
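The four elements of the collection specification map naturally onto a small record type. The sketch assumes the fault tolerance value is an allowed excess above the expected value, which is one reading of the text; the metrics, periods and numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionSpec:
    """One entry of a collection specification (field names and values are illustrative)."""
    metric: str        # collection scope: which resource item to collect
    period_s: int      # collection period in seconds
    expected: float    # expected value under normal operation
    tolerance: float   # fault tolerance: allowed excess over the expected value

    def evaluate(self, value: float) -> str:
        if value <= self.expected:
            return "normal"
        if value <= self.expected + self.tolerance:
            return "warning"   # above expectation but still inside the fault-tolerance band
        return "alarm"         # beyond the fault-tolerance value: treat the resource as abnormal


# Hypothetical specification for three resource items.
SPEC = [
    CollectionSpec("cpu_utilization", period_s=60, expected=0.70, tolerance=0.15),
    CollectionSpec("memory_utilization", period_s=60, expected=0.75, tolerance=0.10),
    CollectionSpec("disk_utilization", period_s=3600, expected=0.70, tolerance=0.15),
]

if __name__ == "__main__":
    cpu = SPEC[0]
    print([cpu.evaluate(v) for v in (0.65, 0.80, 0.90)])  # ['normal', 'warning', 'alarm']
```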
With this collection specification, system resource usage can be collected and monitored in real time. When a resource exceeds its expected value or approaches its fault tolerance value, adjustment and optimization measures can be taken promptly to keep the system stable and performant.
In the embodiments of this specification, changes in the resource configuration may be monitored by periodically collecting system resource data according to the collection specification. One possible flow for generating monitoring results is:
Data collection: system resource data, including CPU utilization, memory usage, disk space occupation and network bandwidth utilization, is collected periodically at the interval set in the collection specification.
Data storage: the collected resource data is stored in a database or log files for subsequent analysis and processing.
Data analysis: the previously collected data is compared with the current data to compute the change in the resource configuration, expressed as a difference or a ratio.
Monitoring result generation: a monitoring result is generated from the change. It may be numerical data, such as an increase or decrease in resource utilization, or an alarm notification, for example when resource utilization exceeds the preset fault tolerance value.
Result display and reporting: the monitoring results are shown on the dashboard or in the reports of the monitoring system, presenting changes in the resource configuration visually as charts, tables and the like. Reports may be sent to the relevant personnel periodically by email or other means.
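The collect, store, analyze and report flow can be sketched with Python's built-in `sqlite3` module standing in for the monitoring database. The table layout, metric names and alarm rule (current value above the fault tolerance value) are assumptions for illustration; reading real CPU or memory figures is left out.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Data storage: an in-memory table here; a real system would use a persistent database or log files."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (ts INTEGER, metric TEXT, value REAL)")
    return conn


def store(conn: sqlite3.Connection, ts: int, readings: dict) -> None:
    """Persist one collection round, given as {metric name: value}."""
    conn.executemany(
        "INSERT INTO samples (ts, metric, value) VALUES (?, ?, ?)",
        [(ts, metric, value) for metric, value in readings.items()],
    )


def analyze(conn: sqlite3.Connection, metric: str, tolerance_value: float) -> Optional[dict]:
    """Data analysis and result generation: compare the latest sample with the previous one."""
    rows = conn.execute(
        "SELECT value FROM samples WHERE metric = ? ORDER BY ts DESC LIMIT 2", (metric,)
    ).fetchall()
    if len(rows) < 2:
        return None  # not enough history to compute a change yet
    current, previous = rows[0][0], rows[1][0]
    return {
        "metric": metric,
        "current": current,
        "difference": current - previous,
        "ratio": current / previous if previous else None,
        "alarm": current > tolerance_value,  # alarm-type result
    }


if __name__ == "__main__":
    conn = open_store()
    # Two hypothetical collection rounds one period apart.
    store(conn, 1, {"cpu_utilization": 0.50, "memory_gb": 30.0})
    store(conn, 2, {"cpu_utilization": 0.92, "memory_gb": 31.0})
    print(analyze(conn, "cpu_utilization", tolerance_value=0.85))
    print(analyze(conn, "memory_gb", tolerance_value=48.0))
```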
S108: Evaluate the stability of the service system according to the pressure test result and the monitoring result.
In the embodiments of this specification, the stability of the service system may be evaluated according to the pressure test result and the monitoring result.
One evaluation approach in the embodiments of this specification is as follows:
Pressure test result analysis: the system's performance under high load is analyzed from the session blocking and thread blocking observed during the pressure test, together with indicators such as response time, number of concurrent users and throughput. Low response time and high throughput under high load indicate good stability.
Monitoring result analysis: the system's resource utilization is analyzed from the resource data collected during operation, focusing on whether utilization exceeds the expected value or approaches the fault tolerance value, and on how the resource configuration changes. Stable utilization within an acceptable range and steady changes in the resource configuration indicate good stability.
Comparative analysis: the pressure test result and the monitoring result are compared. If the system keeps low response time and good throughput during the pressure test and its resource utilization is stable during monitoring, the service system can be considered highly stable.
Exception handling: if an abnormality or performance degradation is found during the pressure test or monitoring, detailed fault diagnosis and problem localization are required; this may involve further analysis of logs or investigation of the database or other related components to find and fix the root cause.
Taking the above analyses together, the stability of the service system is evaluated from the pressure test result and the monitoring result. Good performance under high load with stable resource utilization indicates high stability; degraded performance or abnormal fluctuations in resource utilization indicate that the system needs further optimization and tuning to improve its stability.
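The combined evaluation can be expressed as a rule-based check over the pressure test metrics and the monitoring results. The targets, the field names and the three-level grading (no findings, one finding, several findings) are hypothetical choices, not criteria given in the patent.

```python
def evaluate_stability(pressure: dict, monitoring: list,
                       max_p95_ms: float, min_throughput_rps: float,
                       max_error_rate: float) -> tuple:
    """Combine pressure-test metrics and monitoring results into a stability grade.

    pressure:   e.g. {"p95_ms": ..., "throughput_rps": ..., "error_rate": ...,
                      "blocked_sessions": ..., "blocked_threads": ...}
    monitoring: one dict per monitored metric, each with "metric" and "alarm" keys.
    Returns (grade, findings) where grade is "high", "medium" or "low".
    """
    findings = []
    if pressure["p95_ms"] > max_p95_ms:
        findings.append("response time above target under load")
    if pressure["throughput_rps"] < min_throughput_rps:
        findings.append("throughput below target")
    if pressure["error_rate"] > max_error_rate:
        findings.append("error rate above target")
    if pressure.get("blocked_sessions", 0) or pressure.get("blocked_threads", 0):
        findings.append("session or thread blocking observed")
    alarms = [m["metric"] for m in monitoring if m.get("alarm")]
    if alarms:
        findings.append("resource alarms: " + ", ".join(alarms))
    if not findings:
        grade = "high"
    elif len(findings) == 1:
        grade = "medium"
    else:
        grade = "low"  # several problems: optimize and retest before relying on the system
    return grade, findings


if __name__ == "__main__":
    # Hypothetical results from one test run and one monitoring window.
    pressure = {"p95_ms": 800, "throughput_rps": 250, "error_rate": 0.001, "blocked_threads": 3}
    monitoring = [{"metric": "cpu_utilization", "alarm": True}]
    print(evaluate_stability(pressure, monitoring, 500, 200, 0.01))
```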
It should be noted that, in the prior art, the monitoring means of the system stability guarantee is single, and the risk identification is delayed, so that the system paralysis generates a huge risk, and how to systematically evaluate the overall stability of the system in the system construction stage and the testing stage becomes a difficult problem.
In order to solve the above technical problems, the embodiments of the present disclosure may be implemented by:
the embodiment of the specification provides a system stability evaluation and monitoring implementation method, which comprehensively guarantees the system stability:
In one aspect, this specification provides a system stability evaluation method, comprising evaluation of the system's overall architecture and resources, and evaluation of the system's concurrency capability through automated simulated stress testing.
First, based on the concurrency and processing-capacity requirements of the system being built, and drawing on best practices for the platform architecture, components, modules, and so on, an optimal deployment architecture for the whole system is assembled, for example through the following steps:
a) Formulate the overall deployment architecture of the system based on requirements such as high availability, dynamic scaling, and gray (canary) release;
b) Formulate the deployment architecture for the system's dependencies based on requirements such as service governance, message management, and data caching;
c) Formulate the deployment architecture for system integration based on requirements such as the number of integrated systems, data volume, timeliness, and network conditions;
d) Formulate the deployment architecture for scheduled tasks based on requirements such as task priority, network, business complexity, and sustainability.
Then, according to the overall deployment architecture and the established minimum resource allocation standard, a resource allocation list for the current deployment architecture is generated and used as the reference document for subsequent resource procurement and system construction.
The minimum resource allocation standard is established along dimensions such as the number of online users, the business data volume, and the peak concurrency of the system.
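Deriving a minimum resource list from these dimensions can be sketched as simple capacity arithmetic. This is a minimal sketch, assuming hypothetical per-node capacity constants; in practice those figures would come from benchmarks of the chosen deployment architecture, and the specification does not define them.

```python
import math

# Hypothetical sizing constants, not taken from the specification.
CONCURRENCY_PER_APP_NODE = 200     # peak concurrent requests one app node sustains
ONLINE_USERS_PER_CACHE_GB = 5_000  # online sessions held per GB of cache memory
STORAGE_GROWTH_FACTOR = 1.5        # headroom on top of the current data volume
MIN_NODES_FOR_HA = 2               # high availability needs at least two nodes

def minimum_resource_list(online_users, data_volume_gb, peak_concurrency):
    """Return a minimum resource allocation list for the three sizing dimensions."""
    app_nodes = max(MIN_NODES_FOR_HA, math.ceil(peak_concurrency / CONCURRENCY_PER_APP_NODE))
    cache_gb = math.ceil(online_users / ONLINE_USERS_PER_CACHE_GB)
    storage_gb = math.ceil(data_volume_gb * STORAGE_GROWTH_FACTOR)
    return {"app_nodes": app_nodes, "cache_gb": cache_gb, "db_storage_gb": storage_gb}
```

With 20,000 online users, 800 GB of business data, and a peak of 1,500 concurrent requests, this yields 8 application nodes, 4 GB of cache, and 1,200 GB of database storage.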
Second, a stress test plan for the current system is generated automatically from indicators such as the deployment architecture, resource allocation, concurrency requirements, and data volume. The plan includes stress test indicators for single-scenario tests and for mixed-scenario tests, and a load curve is defined from actual operating scenarios so that real concurrent operation is simulated more realistically.
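A load curve and a mixed-scenario split might look like the following. This is a minimal sketch, assuming a ramp-up / hold / ramp-down profile and fixed scenario weights; the function names and the profile shape are illustrative choices, not the plan format used by the described system.

```python
def load_curve(peak_users, ramp_up_s, hold_s, ramp_down_s, step_s=10):
    """Return (time_s, target_concurrent_users) points for a ramp-up / hold / ramp-down profile."""
    total = ramp_up_s + hold_s + ramp_down_s
    points = []
    for t in range(0, total + 1, step_s):
        if t < ramp_up_s:
            users = peak_users * t / ramp_up_s                  # linear ramp-up
        elif t <= ramp_up_s + hold_s:
            users = peak_users                                  # hold at peak
        else:
            users = peak_users * (total - t) / ramp_down_s      # linear ramp-down
        points.append((t, round(users)))
    return points

def split_mixed_scenario(users, weights):
    """Split concurrent users across scenarios of a mixed test by weight (weights sum to 1)."""
    alloc = {name: int(users * w) for name, w in weights.items()}
    # Give users lost to truncation to the heaviest scenario.
    heaviest = max(weights, key=weights.get)
    alloc[heaviest] += users - sum(alloc.values())
    return alloc
```

For instance, `load_curve(100, 20, 20, 20)` ramps to 100 users by second 20, holds until second 40, and returns to 0 at second 60; a single-scenario test would drive one business transaction along such a curve, while a mixed test splits each point across scenarios.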
Following the generated plan, the system automatically runs simulated stress tests, identifies performance bottlenecks and resource bottlenecks during the test, and generates a stress test report.
The stress test results are analyzed to identify performance bottlenecks, for example by automatically detecting session blocking and thread blocking during the test, collecting process data from the test, and stating the expected requirements for the performance problems found.
The stress test results are also analyzed to identify resource bottlenecks, for example by automatically detecting bottlenecks at the application level, the database level, the network bandwidth level, and so on.
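Grouping resource bottlenecks by layer can be sketched as a threshold check over the peak metrics recorded during a stress test. This is a minimal sketch; the metric names, layer mapping, and limits below are hypothetical and would be set per deployment.

```python
# Hypothetical per-metric limits (percent, or Mbit/s for the network link).
LIMITS = {
    "app_cpu_pct": 85.0,
    "app_heap_pct": 80.0,
    "db_cpu_pct": 80.0,
    "db_io_wait_pct": 20.0,
    "net_mbps": 900.0,  # e.g. for a 1 Gbit/s link
}
LAYER = {
    "app_cpu_pct": "application",
    "app_heap_pct": "application",
    "db_cpu_pct": "database",
    "db_io_wait_pct": "database",
    "net_mbps": "network bandwidth",
}

def resource_bottlenecks(peak_metrics):
    """Group metrics whose stress-test peak exceeded its limit, keyed by layer."""
    found = {}
    for metric, value in peak_metrics.items():
        limit = LIMITS.get(metric)
        if limit is not None and value > limit:
            found.setdefault(LAYER[metric], []).append(metric)
    return found
```

Metrics without a configured limit are ignored rather than guessed at, so the report only names layers that were explicitly budgeted.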
The purpose of the system stability evaluation is to tailor, for the system, a resource configuration that supports the current business's concurrency and processing capacity, and to verify through system stress testing whether that configuration is sufficient for the actual concurrency requirements.
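One way to check whether a configuration is sufficient is Little's Law, which this sketch swaps in as an illustration; the specification does not name a formula. Assuming peak concurrency counts in-flight requests (no user think time), N concurrent requests that each take R seconds require N / R requests per second, which can be compared with the maximum throughput reached in the stress test. The 20% headroom is a hypothetical default.

```python
def config_is_sufficient(peak_concurrency, avg_response_s, tested_max_tps, headroom=0.2):
    """Little's Law check: N in-flight requests of R seconds each need N / R requests per second.

    Returns (sufficient, required_tps); headroom is an assumed safety margin.
    """
    required_tps = peak_concurrency / avg_response_s
    return tested_max_tps >= required_tps * (1 + headroom), required_tps
```

For example, 1,000 concurrent requests at 0.5 s each need 2,000 TPS; with 20% headroom the stress test must sustain at least about 2,400 TPS for the configuration to pass.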
In another aspect, this specification provides a system stability monitoring method, comprising routine monitoring, information collection, and problem identification.
The collection specification covers routine monitoring of system resources and defines an expected value and a fault-tolerance value for each metric; a real-time warning is issued when a monitored value exceeds its expected value or its fault-tolerance value.
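The real-time warning rule can be sketched as a two-threshold classifier. This is a minimal sketch, assuming both thresholds are upper bounds with expected < tolerance (as for CPU or memory utilization); `notify` is a hypothetical hook standing in for whatever alert channel is used.

```python
from enum import Enum

class Level(Enum):
    NORMAL = "normal"
    WARNING = "warning"    # above the expected value
    CRITICAL = "critical"  # above the fault-tolerance value

def classify(value, expected, tolerance):
    """Assumes expected < tolerance and both bound the metric from above."""
    if value > tolerance:
        return Level.CRITICAL
    if value > expected:
        return Level.WARNING
    return Level.NORMAL

def on_sample(metric, value, expected, tolerance, notify=print):
    """Classify one collected sample and send a warning if it is out of range."""
    level = classify(value, expected, tolerance)
    if level is not Level.NORMAL:
        notify(f"[{level.value}] {metric}={value} (expected<={expected}, tolerance<={tolerance})")
    return level
```

Passing `notify=alerts.append` instead of `print` collects the warnings in a list, which is convenient for testing or for batching alerts.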
The collection specification also covers monitoring of key technical indicators such as traffic distribution, thread snapshots, database sessions, and memory reclamation (garbage collection).
By collecting the traffic distribution, the top traffic consumers are identified, information such as the network links and packets behind that consumption is captured, and heartbeat checks are run by simulating connections to the current endpoints.
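Ranking traffic consumers and probing an endpoint can be sketched with the standard library. This is a minimal sketch, assuming flow records arrive as `(remote_host, bytes)` pairs from some flow collector (a hypothetical input format); the heartbeat is a plain TCP connect check via `socket.create_connection`, not packet capture.

```python
import socket
from collections import Counter

def top_traffic(flows, n=3):
    """flows: iterable of (remote_host, bytes) records; returns the n largest consumers."""
    totals = Counter()
    for host, nbytes in flows:
        totals[host] += nbytes
    return totals.most_common(n)

def heartbeat(host, port, timeout=2.0):
    """Open and immediately close a TCP connection; True if the endpoint accepted it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A monitoring loop would typically call `top_traffic` once per collection period and `heartbeat` for each host it returns.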
By collecting thread snapshots, the running states of threads during system operation are counted, and blocked threads and highly concurrent threads are identified automatically.
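Thread-snapshot statistics can be sketched as counting over parsed thread records. This is a minimal sketch, assuming each thread has already been parsed into a dict with `name`, `state`, and `top_frame` (for a JVM, e.g. from a `jstack` thread dump; the parsing step is omitted). Treating many RUNNABLE threads on the same top frame as a "high-concurrency" hot spot, and the `hot_frame_min` threshold, are assumptions.

```python
from collections import Counter

def analyze_thread_snapshot(threads, hot_frame_min=10):
    """Return (state counts, blocked thread names, hot frames with their thread counts)."""
    states = Counter(t["state"] for t in threads)
    blocked = [t["name"] for t in threads if t["state"] == "BLOCKED"]
    # Many runnable threads executing the same frame suggest a high-concurrency hot spot.
    frames = Counter(t["top_frame"] for t in threads if t["state"] == "RUNNABLE")
    hot = {frame: count for frame, count in frames.items() if count >= hot_frame_min}
    return states, blocked, hot
```

Comparing successive snapshots taken one collection period apart helps separate a thread that is briefly blocked from one that stays stuck.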
By collecting database session information, the current running state is examined according to session wait types to identify row-lock blocking, slow SQL, large transactions, or bottlenecks at the level of database parameters, configuration, and so on.
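Classifying sessions by wait type can be sketched with keyword rules. This is a minimal sketch; the session fields, thresholds, and keywords are hypothetical, and real wait-type names differ by database product, so the rules would need to be adapted to the database actually in use.

```python
# Hypothetical thresholds and keywords; adapt to the database product in use.
SLOW_SQL_SECONDS = 5.0
LARGE_TXN_ROWS = 100_000
CONFIG_WAIT_KEYWORDS = ("buffer space", "log file", "connection limit")

def classify_session(s):
    """s: dict with 'sid', 'wait_type', and optional 'elapsed_s', 'rows_modified'."""
    wait = s["wait_type"].lower()
    findings = []
    if "row lock" in wait:
        findings.append("row-lock blocking")
    if s.get("rows_modified", 0) >= LARGE_TXN_ROWS:
        findings.append("large transaction")
    # A long wait on a lock is blocking, not slow SQL, so lock waits are excluded here.
    if s.get("elapsed_s", 0.0) >= SLOW_SQL_SECONDS and "row lock" not in wait:
        findings.append("slow SQL")
    if any(k in wait for k in CONFIG_WAIT_KEYWORDS):
        findings.append("parameter/configuration")
    return findings

def summarize_sessions(sessions):
    """Map each finding to the session ids that exhibit it."""
    summary = {}
    for s in sessions:
        for finding in classify_session(s):
            summary.setdefault(finding, []).append(s["sid"])
    return summary
```

Keeping the rules as data (thresholds and keyword tuples) lets the collection specification tune them per database without changing the classifier.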
By collecting memory reclamation information, if frequent Full GC is detected, a system risk warning is triggered automatically and a thread snapshot and a memory snapshot are captured for subsequent analysis.
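Detecting "frequent" Full GC can be sketched as a sliding-window counter. This is a minimal sketch; the window length and event limit are hypothetical defaults, and `on_alert` is a placeholder hook. In a JVM deployment that hook might, for instance, run `jstack <pid>` for a thread snapshot and `jmap -dump:live,format=b,file=heap.hprof <pid>` for a memory snapshot.

```python
from collections import deque

class FullGcWatcher:
    """Call on_alert once per burst when more than max_full_gcs Full GCs fall within window_s."""

    def __init__(self, on_alert, max_full_gcs=3, window_s=600.0):
        self.on_alert = on_alert
        self.max_full_gcs = max_full_gcs
        self.window_s = window_s
        self.events = deque()
        self.alerted = False

    def record_full_gc(self, timestamp_s):
        self.events.append(timestamp_s)
        # Drop events that have slid out of the window.
        while self.events and timestamp_s - self.events[0] > self.window_s:
            self.events.popleft()
        if len(self.events) > self.max_full_gcs:
            if not self.alerted:
                self.alerted = True
                self.on_alert(list(self.events))  # e.g. capture thread and heap snapshots
        else:
            self.alerted = False  # burst is over; re-arm the alert
```

Alerting once per burst avoids piling up heap dumps, which are large and themselves pause the JVM while being written.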
System stability assurance is developed along two lines, evaluation and monitoring, across the whole life cycle of the system: the construction stage, the testing stage, and the operation and maintenance stage. Referring to the schematic diagram of evaluation and monitoring at each stage shown in Fig. 2, the system undergoes architecture evaluation and resource evaluation in the construction stage, stress test planning and load simulation in the testing stage, and information collection and performance analysis in the operation and maintenance stage.
It should be noted that the embodiments of this specification disclose a method for implementing system stability evaluation and monitoring, comprising: building an optimal deployment architecture for the service system according to its construction requirements and best practices for the platform framework, components, and modules, so that scenarios such as high availability and scalability are supported while the system's processing capacity and high-concurrency capability are guaranteed; evaluating, from the deployment architecture, the minimum resource allocation list required to run the system, which serves as a reference for estimating system resource costs and for building the system; automatically generating a stress test plan from the concurrency requirements and business scenarios, simulating the specified scenarios in a stress test, and evaluating the performance and resource bottlenecks of the current system in light of its deployment architecture and resources; and, according to the system's running condition, defining the collection period, collection samples, preset expected values, and fault-tolerance values, then classifying, counting, and computing the routinely collected information to produce a system risk report.
It should be noted that a method for implementing system stability evaluation and monitoring includes:
In the project construction stage, the optimal deployment architecture for each specific scenario is evaluated according to the service system's construction requirements and best practices, best practices for the platform framework, components, and modules, and the system's concurrency and processing-capacity requirements, so as to achieve high availability, high concurrency, and scalability.
Based on the deployment architecture, the resources required for system operation are evaluated and the required resource allocation list is generated automatically; this list serves as the reference document for subsequent hardware procurement and system construction.
In the project testing stage, a system stress test plan is formulated based on the deployment architecture and resource allocation, and specified scenarios are simulated automatically according to the plan. Formulating the plan includes setting the number of concurrent users and defining the load curve, so that peak-concurrency scenarios are simulated more realistically.
The stress test plan includes stress test indicators for single-scenario tests and for mixed-scenario tests, and the load curve is defined from actual operating scenarios, so that real concurrent operation is simulated more realistically.
Following the generated plan, the system automatically runs simulated stress tests, identifies performance bottlenecks and resource bottlenecks during the test, and generates a stress test report.
Based on the stress test results, the performance bottlenecks and resource bottlenecks in the current system are identified automatically:
The results are analyzed to identify performance bottlenecks, for example by automatically detecting session blocking and thread blocking during the test, collecting process data from the test, and stating the expected requirements for the performance problems found.
The results are also analyzed to identify resource bottlenecks, for example by automatically detecting bottlenecks at the application level, the database level, the network bandwidth level, and so on.
A system collection specification is formulated, including the collection period, collection samples, expected values, fault-tolerance values, and so on, which serves as the checking standard for collected information.
In the project operation and maintenance stage, information on the running state of the system is collected according to the formulated collection specification, then classified, counted, and computed against the established standards to identify expected risks, and a system risk report is generated.
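The collection specification and the resulting risk report can be sketched together. This is a minimal sketch: `CollectionSpec`, its fields, and the low/medium/high risk grading are hypothetical illustrations of "collection period, collection samples, expected value, fault-tolerance value", not a format defined by the specification.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class CollectionSpec:
    metric: str
    period_s: int      # collection period
    min_samples: int   # samples required before the metric is judged
    expected: float    # expected upper value
    tolerance: float   # fault-tolerance upper value

def risk_report(specs, samples):
    """samples: dict metric -> readings collected every spec.period_s seconds (assumed).

    Returns (metric, risk, detail) rows; detail is the mean, or the sample count
    when there is not yet enough data.
    """
    report = []
    for spec in specs:
        values = samples.get(spec.metric, [])
        if len(values) < spec.min_samples:
            report.append((spec.metric, "insufficient data", len(values)))
            continue
        over_tolerance = sum(v > spec.tolerance for v in values)
        over_expected = sum(spec.expected < v <= spec.tolerance for v in values)
        risk = "high" if over_tolerance else "medium" if over_expected else "low"
        report.append((spec.metric, risk, round(mean(values), 2)))
    return report
```

Flagging metrics with too few samples, rather than grading them, keeps a collection gap from being misread as a healthy system.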
The collection specification covers routine monitoring of system resources and defines an expected value and a fault-tolerance value for each metric; a real-time warning is issued when a monitored value exceeds its expected value or its fault-tolerance value.
The collection specification also covers monitoring of key technical indicators such as traffic distribution, thread snapshots, database sessions, and memory reclamation (garbage collection).
By collecting the traffic distribution, the top traffic consumers are identified, information such as the network links and packets behind that consumption is captured, and heartbeat checks are run by simulating connections to the current endpoints.
By collecting thread snapshots, the running states of threads during system operation are counted, and blocked threads and highly concurrent threads are identified automatically.
By collecting database session information, the current running state is examined according to session wait types to identify row-lock blocking, slow SQL, large transactions, or bottlenecks at the level of database parameters, configuration, and so on.
By collecting memory reclamation information, if frequent Full GC is detected, a system risk warning is triggered automatically and a thread snapshot and a memory snapshot are captured for subsequent analysis.
Fig. 3 is a schematic structural diagram of a system stability evaluation device according to one or more embodiments of the present disclosure, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
in the construction stage, determine a corresponding deployment architecture according to preset construction requirements of a service system, and determine, according to the deployment architecture, the resource allocation required for operating the service system;
in the testing stage, determine a stress test plan for the service system according to the deployment architecture and the resource allocation, and perform stress testing of the service system according to the stress test plan to obtain a stress test result;
in the operation and maintenance stage, formulate a collection specification for the resource allocation during operation of the service system, monitor changes in the resource allocation according to the collection specification, and generate a monitoring result from those changes;
and evaluate the stability of the service system according to the stress test result and the monitoring result.
One or more embodiments of this specification provide a non-volatile computer storage medium storing computer-executable instructions that, when executed by a computer, cause the computer to:
in the construction stage, determine a corresponding deployment architecture according to preset construction requirements of a service system, and determine, according to the deployment architecture, the resource allocation required for operating the service system;
in the testing stage, determine a stress test plan for the service system according to the deployment architecture and the resource allocation, and perform stress testing of the service system according to the stress test plan to obtain a stress test result;
in the operation and maintenance stage, formulate a collection specification for the resource allocation during operation of the service system, monitor changes in the resource allocation according to the collection specification, and generate a monitoring result from those changes;
and evaluate the stability of the service system according to the stress test result and the monitoring result.
In this specification, the embodiments are described progressively; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus and non-volatile computer storage medium embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely one or more embodiments of the present description and is not intended to limit the present description. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present description, is intended to be included within the scope of the claims of the present description.

Claims: 10

Application number: CN202310974165.8A
Priority date: 2023-08-03
Filing date: 2023-08-03
Publication: CN117076270A, 2023-11-17
Title: System stability evaluation method, device and medium
Status: Pending
Country: CN
Family ID: 88712537

Cited by (2; * cited by examiner, † cited by third party)
CN119130623A * (priority date 2024-11-08, published 2024-12-13), 深圳市法本信息技术股份有限公司 (Shenzhen Farben Information Technology Co., Ltd.): A testing method and system based on financial environment stability
CN119130623B * (priority date 2024-11-08, published 2025-02-28), 深圳市法本信息技术股份有限公司 (Shenzhen Farben Information Technology Co., Ltd.): A testing method and system based on financial environment stability

Legal Events
PB01 Publication
SE01 Entry into force of request for substantive examination
