Intelligent operation management system
Technical field
The present invention relates to the technical field of system operation management, and in particular to an intelligent operation management system.
Background
The scale of IT operation systems keeps growing. While the system monitors the performance of network devices such as servers, virtual machines and switches, together with their network connectivity, operation and maintenance (O&M) personnel receive an ever-increasing number of monitoring alarms every day. Faced with this massive volume of O&M metrics, O&M personnel find it difficult to quickly locate the root cause of a failure when the system breaks down; the resulting alarm storm greatly slows down problem localization, and the speed of fault recovery depends largely on the experience and response speed of the O&M personnel. It is therefore desirable to establish an intelligent operation platform in which automatic fault diagnosis cooperates with a rapid recovery system: machine learning models and big-data expert systems are built for multiple scenarios, anomalies on the operation platform are diagnosed and located online in real time, and when the system fails, rapid repair is achieved by executing a corresponding strategy so that normal operation is restored.
Summary of the invention
To overcome the above problem, the present invention provides an intelligent operation management system that diagnoses and locates system anomalies online in real time, achieves rapid repair by executing a corresponding strategy when the system fails, and automatically prompts the administrator to analyze and optimize failures whose repair result is unsatisfactory or whose repair took too long.
The technical solution adopted by the present invention to solve the technical problem is as follows:
An intelligent operation management system, comprising a system monitoring module, a fault message identification module, a fault restoration module and a fault restoration evaluation module;
the system monitoring module is used to monitor the running state of the system; when an anomaly is detected, the system monitoring module passes the current state parameters and the detected abnormal condition to the fault message identification module;
the fault message identification module is used to identify the abnormal conditions it receives, confirm whether each is a false alarm, and transmit the information determined to be a fault to the fault restoration module for repair;
the fault restoration module is used to repair the fault according to its fault signature after receiving the alarm information from the fault message identification module;
the fault restoration evaluation module is used to evaluate whether the fault restoration result of the fault restoration module is acceptable; the fault restoration evaluation module further includes a time detecting unit, which is used to detect the time spent on fault restoration and judge whether that time exceeds a threshold.
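The four modules form a one-way pipeline. The sketch below is a minimal illustration of that data flow, not the patented implementation: the class names (`Anomaly`, `RepairOutcome`, `OpsPipeline`) and the idea of injecting each module as a callable are hypothetical choices made so the flow can be shown in isolation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Anomaly:
    metric: str                                 # name of the monitored metric (hypothetical)
    value: float                                # observed value
    state: dict = field(default_factory=dict)   # current state parameters


@dataclass
class RepairOutcome:
    anomaly: Anomaly
    succeeded: bool
    seconds: float                              # time the repair took


class OpsPipeline:
    """Chains monitoring -> identification -> restoration -> evaluation."""

    def __init__(self,
                 is_abnormal: Callable[[str, float], bool],
                 is_false_alarm: Callable[[Anomaly], bool],
                 restore: Callable[[Anomaly], RepairOutcome],
                 evaluate: Callable[[RepairOutcome], bool]) -> None:
        self.is_abnormal = is_abnormal
        self.is_false_alarm = is_false_alarm
        self.restore = restore
        self.evaluate = evaluate

    def on_sample(self, metric: str, value: float, state: dict) -> Optional[bool]:
        # System monitoring module: normal samples stop here.
        if not self.is_abnormal(metric, value):
            return None
        anomaly = Anomaly(metric, value, dict(state))
        # Fault message identification module: discard false alarms.
        if self.is_false_alarm(anomaly):
            return None
        # Fault restoration module repairs; evaluation module judges the result.
        return self.evaluate(self.restore(anomaly))
```

Returning `None` for "nothing to repair" and a boolean for "repaired, and acceptable or not" keeps the two exits of the pipeline distinguishable for a caller.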
Further, after a fault has been repaired, the fault restoration evaluation module scores each repair result according to the running state of the system, periodically submits the self-repair execution processes with low scores to the system administrator for analysis, and prompts the administrator to optimize the corresponding scripts stored in the script calling module.
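The text does not say how a repair is scored. As one possible reading, assumed here, the score can be taken as the percentage of post-repair health checks that pass; repairs below a cut-off (a hypothetical 60) are queued, and a periodic job drains the queue into a per-script report for the administrator. `RepairRecord`, `RepairScorer` and `drain_report` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RepairRecord:
    fault_id: str
    script: str          # hypothetical: script from the script calling module that ran
    score: float = 0.0


class RepairScorer:
    def __init__(self, low_score: float = 60.0) -> None:
        self.low_score = low_score              # assumed cut-off for a "low" score
        self._pending: List[RepairRecord] = []  # low-scored repairs awaiting the next report

    def score(self, record: RepairRecord, health_checks: Dict[str, bool]) -> float:
        # Assumed metric: percentage of post-repair health checks that pass.
        passed = sum(1 for ok in health_checks.values() if ok)
        record.score = 100.0 * passed / len(health_checks) if health_checks else 0.0
        if record.score < self.low_score:
            self._pending.append(record)
        return record.score

    def drain_report(self) -> Dict[str, List[str]]:
        # Run periodically: group low-scored repairs by the script that performed them.
        report: Dict[str, List[str]] = {}
        for rec in self._pending:
            report.setdefault(rec.script, []).append(rec.fault_id)
        self._pending.clear()
        return report
```

Grouping by script matches the instruction to optimize the scripts behind poor repairs: one script that repeatedly scores low shows up as a single entry with several fault IDs.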
Further, the workflow of the time detecting unit is as follows: when the fault restoration module receives the alarm information from the fault message identification module, the time detecting unit detects and records the current system time; after the fault restoration module has repaired the fault, the time detecting unit detects and records the current system time again, calculates the interval between the two recorded times, and judges whether this interval exceeds the threshold. When the interval exceeds the threshold, the self-repair execution process of that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding scripts stored in the script calling module.
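The two-timestamp workflow can be sketched as follows. This is a hedged illustration: `TimeDetectionUnit`, `alarm_received`, `repair_finished` and the `on_overrun` callback are hypothetical names. The sketch also swaps the text's "current system time" for `time.monotonic()`, so that a wall-clock adjustment during a repair cannot distort the measured interval; the clock is injectable for testing.

```python
import time
from typing import Callable, Dict


class TimeDetectionUnit:
    def __init__(self, threshold_s: float,
                 on_overrun: Callable[[str, float], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_s = threshold_s      # repair-time threshold in seconds
        self.on_overrun = on_overrun        # e.g. submit the process to the administrator
        self.clock = clock
        self._started: Dict[str, float] = {}

    def alarm_received(self, fault_id: str) -> None:
        # First reading: the restoration module has just received the alarm.
        self._started[fault_id] = self.clock()

    def repair_finished(self, fault_id: str) -> float:
        # Second reading: the repair is done; compute the interval between readings.
        interval = self.clock() - self._started.pop(fault_id)
        if interval > self.threshold_s:
            # Only a strictly longer repair is reported, as in the text.
            self.on_overrun(fault_id, interval)
        return interval
```

Keying the start times by fault ID lets several repairs be timed concurrently without their intervals getting mixed up.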
Further, the threshold is 2 to 3 times the average time required to repair that fault.
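A minimal sketch of deriving that threshold, assuming the system keeps a history of past repair durations for each fault type (the text does not say where the average comes from). The function name and the default factor of 2.5 are hypothetical; the factor is only checked to lie in the stated 2 to 3 range.

```python
from statistics import mean
from typing import Sequence


def repair_time_threshold(history_s: Sequence[float], factor: float = 2.5) -> float:
    """Threshold = factor x average past repair time (seconds) for one fault type."""
    if not 2.0 <= factor <= 3.0:
        raise ValueError("factor should lie in the 2-3 range given in the text")
    if not history_s:
        raise ValueError("need at least one past repair duration")
    return factor * mean(history_s)
```

For a fault that previously took 100 s, 120 s and 140 s to repair, the average is 120 s and the default factor gives a 300 s threshold.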
The beneficial effects of the present invention are as follows. The monitoring module can monitor the system comprehensively. When an anomaly is detected, the fault message identification module identifies the abnormal information to determine whether it is a fault; for information judged to be a fault, the fault restoration module repairs the fault effectively. The fault restoration evaluation module evaluates the repair results of the fault restoration module and submits failures with unsatisfactory repair results to the system administrator for analysis, and the system also reminds the administrator to analyze repairs that took too long and to optimize the system accordingly. The system therefore not only diagnoses and locates anomalies online and achieves rapid repair by executing a corresponding strategy when the system fails, but also automatically prompts the administrator to analyze and optimize failures whose repair result is unsatisfactory or whose repair took too long, continuously improving both the effectiveness and the efficiency of fault repair.
Brief description of the drawings
Fig. 1 is a structural block diagram of the intelligent operation management system according to a preferred embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that when a component is referred to as being "fixed to" another component, it may be directly on the other component, or an intervening component may be present. When a component is considered to be "connected" to another component, it may be directly connected to the other component, or an intervening component may be present. When a component is considered to be "disposed on" another component, it may be disposed directly on the other component, or an intervening component may be present. The terms "vertical", "horizontal", "left", "right" and similar expressions used herein are for illustrative purposes only.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the invention belongs. The terms used in the description of the invention are intended only to describe specific embodiments and are not intended to limit the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to Fig. 1, a preferred embodiment of the invention provides an intelligent operation management system comprising a system monitoring module 10, a fault message identification module 20, a fault restoration module 30 and a fault restoration evaluation module 40. The system monitoring module 10 monitors the running state of the system; when an anomaly is detected, it passes the current state parameters and the detected abnormal condition to the fault message identification module 20. The fault message identification module 20 identifies the abnormal conditions it receives to confirm whether each is a false alarm, and transmits the information determined to be a fault to the fault restoration module 30 for repair. After receiving the alarm information from the fault message identification module 20, the fault restoration module 30 repairs the fault according to its fault signature. The fault restoration evaluation module 40 evaluates whether the fault restoration result of the fault restoration module 30 is acceptable. The fault restoration evaluation module 40 further includes a time detecting unit 410, which detects the time spent on fault restoration and judges whether that time exceeds a threshold.
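The text does not specify how module 20 separates false alarms from real faults. One common technique, swapped in here purely as an assumption, is to confirm a fault only after a number of consecutive abnormal reports for the same host and metric, treating isolated spikes as false alarms. `FalseAlarmFilter` and `confirm_after` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class FalseAlarmFilter:
    """Confirms a fault after `confirm_after` consecutive abnormal reports
    for the same (host, metric); shorter bursts are treated as false alarms."""

    def __init__(self, confirm_after: int = 3) -> None:
        self.confirm_after = confirm_after
        self._streak: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def report(self, host: str, metric: str, abnormal: bool) -> bool:
        key = (host, metric)
        if not abnormal:
            self._streak[key] = 0       # a normal sample resets the streak
            return False
        self._streak[key] += 1
        # Forward to the restoration module once, when the streak reaches the limit.
        return self._streak[key] == self.confirm_after
```

Returning `True` only at the moment the streak reaches the limit avoids sending the same fault to module 30 again for every further abnormal sample.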
Further, after a fault has been repaired, the fault restoration evaluation module 40 scores each repair result according to the running state of the system, periodically submits the self-repair execution processes with low scores to the system administrator for analysis, and prompts the administrator to optimize the corresponding scripts stored in the script calling module.
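The script calling module is mentioned but not described. A hedged sketch, assuming it maps each fault signature to one repair script and keeps a list of scripts the administrator has been asked to optimize; every name here (`ScriptCallingModule`, `register`, `run`, `flag`, `needs_optimisation`) is hypothetical, and scripts are modeled as Python callables rather than shell scripts.

```python
from typing import Callable, Dict, List


class ScriptCallingModule:
    """Hypothetical registry mapping a fault signature to its repair script."""

    def __init__(self) -> None:
        self._scripts: Dict[str, Callable[[dict], bool]] = {}
        self.needs_optimisation: List[str] = []

    def register(self, signature: str, script: Callable[[dict], bool]) -> None:
        self._scripts[signature] = script

    def run(self, signature: str, context: dict) -> bool:
        script = self._scripts.get(signature)
        if script is None:
            return False                # no script for this signature: manual handling
        return script(context)

    def flag(self, signature: str) -> None:
        # Record, once, the script behind a low-scored or slow repair.
        if signature in self._scripts and signature not in self.needs_optimisation:
            self.needs_optimisation.append(signature)
```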
Further, the workflow of the time detecting unit 410 is as follows: when the fault restoration module 30 receives the alarm information from the fault message identification module 20, the time detecting unit 410 detects and records the current system time; after the fault restoration module 30 has repaired the fault, the time detecting unit 410 detects and records the current system time again, calculates the interval between the two recorded times, and judges whether this interval exceeds the threshold. When the interval exceeds the threshold, the self-repair execution process of that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding scripts stored in the script calling module. The threshold is 2 to 3 times the average time required to repair that fault.
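The timing rule can be written compactly as below, where $t_1$ and $t_2$ are the two recorded times, $\bar{T}$ is the average repair time for the fault and $k$ is the multiplier. The numbers in the worked case ($\bar{T} = 120$ s, $k = 2.5$, a repair finishing 420 s after the alarm) are hypothetical and serve only to illustrate the arithmetic.

```latex
\begin{align*}
\Delta t &= t_2 - t_1, \qquad T_{\mathrm{th}} = k\,\bar{T}, \quad 2 \le k \le 3,\\
\text{submit for analysis} &\iff \Delta t > T_{\mathrm{th}},\\
\text{e.g. } T_{\mathrm{th}} &= 2.5 \times 120\ \mathrm{s} = 300\ \mathrm{s},\quad
\Delta t = 420\ \mathrm{s} - 0\ \mathrm{s} = 420\ \mathrm{s} > 300\ \mathrm{s}.
\end{align*}
```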