Disclosure of Invention
The embodiment of the application aims to provide an Internet of things fault operation and maintenance method to solve the problem of low network fault operation and maintenance efficiency in an Internet of things network.
In order to solve the technical problem, an embodiment of the present application provides an internet of things fault operation and maintenance method, including the following steps:
acquiring fault data;
inputting fault data into a trained fault diagnosis model to determine a fault type corresponding to the fault data;
and triggering operation and maintenance measures corresponding to the fault types according to a preset fault operation and maintenance strategy.
Further, acquiring fault data comprises:
acquiring running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
determining an abnormal state value from the state values according to a preset normal state reference table;
and taking the abnormal state value and the application node corresponding to the abnormal state value as fault data, and storing the fault data in the operation and maintenance database.
Further, when the fault diagnosis model is a deep neural network model, inputting the fault data into the trained fault diagnosis model to determine the fault type corresponding to the fault data includes:
detecting the abnormal state value through a deep neural network model;
if the known fault type corresponding to the abnormal state value is detected, outputting the fault type;
and if the known fault type corresponding to the abnormal state value cannot be detected, taking the abnormal state value as an abnormal variable value, and storing the abnormal variable value into the operation and maintenance database.
Further, the deep neural network model training method includes:
marking the abnormal variable value;
inputting the marked abnormal variable values into a deep neural network model for training, and outputting results;
and if the matching probability of the output result is smaller than a preset matching threshold, performing parameter adjustment on the deep neural network model and continuing training, and stopping training once the matching probability of the output result reaches the matching threshold.
Further, inputting the marked abnormal variable values into a deep neural network model for training, and outputting the result comprises:
calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
In order to solve the technical problem, an embodiment of the present application further provides an internet of things fault operation and maintenance device, where the internet of things fault operation and maintenance device includes:
the acquisition module is used for acquiring fault data;
the diagnosis module is used for inputting the fault data into the trained fault diagnosis model so as to determine the fault type corresponding to the fault data;
and the operation and maintenance module is used for triggering operation and maintenance measures corresponding to the fault types according to a preset fault operation and maintenance strategy.
Further, the acquisition module includes:
the acquisition unit is used for acquiring the running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
the determining unit is used for determining an abnormal state value from the state values according to a preset normal state reference table;
and the storage unit is used for taking the abnormal state value and the application node corresponding to the abnormal state value as fault data and storing the fault data in the operation and maintenance database.
Further, when the fault diagnosis model is a deep neural network model, the diagnosis module includes:
the detection unit is used for detecting the abnormal state value through the deep neural network model;
the output unit is used for outputting the fault type if the known fault type corresponding to the abnormal state value is detected;
and the exception unit is used for taking the abnormal state value as an abnormal variable value and storing the abnormal variable value into the operation and maintenance database if the known fault type corresponding to the abnormal state value is not detected.
Further, the internet of things fault operation and maintenance device further comprises:
the marking module is used for marking the abnormal variable value;
the training module is used for inputting the marked abnormal variable values into the deep neural network model for training and outputting results;
and the adjusting module is used for adjusting parameters of the deep neural network model and continuing training if the matching probability of the output result is smaller than a preset matching threshold value, and stopping training once the matching probability of the output result reaches the matching threshold value.
Further, the training module comprises:
the calculating unit is used for calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and the result unit is used for taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
In order to solve the technical problem, an embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the internet of things fault operation and maintenance method when executing the computer program.
In order to solve the technical problem, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the internet of things fault operation and maintenance method are implemented.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
fault data are acquired and input into the trained fault diagnosis model to determine the fault type corresponding to the fault data, and the operation and maintenance measures corresponding to the fault type are triggered according to the preset fault operation and maintenance strategy, so that the workload of manual operation and maintenance is reduced, and the operation and maintenance efficiency is improved.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop portable computer, a desktop computer, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the internet of things fault operation and maintenance method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the internet of things fault operation and maintenance device is generally arranged in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to fig. 2, fig. 2 is a schematic structural diagram of an embodiment of the internet of things fault operation and maintenance method provided by the present application. The internet of things network is the application scenario of this embodiment, and the method is executed by an internet of things application delivery device. The internet of things application delivery device comprises an internet of things network data analysis module, a machine learning module and an artificial intelligence control module. The internet of things network data analysis module analyzes the operation data of the internet of things network, converts it into a data format that meets requirements, and screens fault data out of the operation data. The fault data are input into a trained fault diagnosis model in the machine learning module to obtain the fault type of the fault data, and the operation and maintenance measures corresponding to the fault type are triggered from the fault operation and maintenance strategy in the artificial intelligence control module, so as to make network operation and maintenance intelligent.
Further, the internet of things network of the present application is a seven-layer network, where the seven-layer network refers to the Open System Interconnection (OSI) model. The OSI model enables reliable communication between different systems on different networks through a seven-layer structure comprising an application layer, a presentation layer, a session layer, a transport layer, a network layer, a data link layer, and a physical layer.
With continued reference to fig. 3, a flowchart of an embodiment of a method for fault operation and maintenance of the internet of things of the present application is shown. The Internet of things fault operation and maintenance method comprises the following steps:
S301: acquiring fault data.
In the embodiment of the application, the running data of each application delivery node, each virtual server and each real server are collected in real time, where the running data are performance data and the performance data include state values such as CPU utilization, memory utilization, number of connections and bandwidth.
Further, the specific process of acquiring the fault data includes:
acquiring running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
determining an abnormal state value from the state values according to a preset normal state reference table;
and taking the abnormal state value and the application node corresponding to the abnormal state value as fault data, and storing the fault data in the operation and maintenance database.
Specifically, when the application node is a virtual server, the running data of each virtual server is collected in real time, and the collected running data includes: the current number of active connections, the current number of active data packets, the current active bandwidth, the maximum/average/minimum response time of the virtual server to the first data packet requested by the client, the maximum/average/minimum response time for all data packets returned after the virtual server processes the client request, the current maximum/average/minimum RTT (round-trip time), the number of WAF (Web Application Firewall) intrusions and the number of WAF-processed events, where the RTT consists of the propagation time of the link, the processing time of the end systems, and the queuing and processing time in the buffers of routers.
Specifically, when the application node is a real server, the collected running data includes: the current number of active connections, the current active bandwidth, the maximum/average/minimum response time of the real server to the first data packet requested by the client, the maximum/average/minimum response time for all data packets returned after the real server processes the client request, the current maximum/average/minimum RTT, and the number of connections currently held by session persistence.
Specifically, when the application node is an application delivery node, the following are collected in real time: CPU utilization, memory utilization, total number of connections, total number of data packets, total bandwidth, the real-time status (UP/DOWN/disable) of the virtual server, the real-time status (UP/DOWN/disable) of the virtual service sub-node, the real-time status (UP/DOWN/disable) of the real server, the current number of active connections, the current number of active data packets, and the current active bandwidth.
The preset normal state reference table holds the running data of different application nodes during normal operation, that is, it records the normal state value range of each application node. The currently acquired state value of an application node is compared with the normal state value range of that application node, and when the state value is outside the normal range, it is marked as an abnormal state value. The mark may be, for example, a tag carrying a preset field, or a color marking on the field where the state value is located, so that the fault source can be located accurately.
Furthermore, the abnormal state value and the application node generating the abnormal state value are used as fault data to be stored in the operation and maintenance database, so that the corresponding fault type can be more accurately identified by adopting a fault diagnosis model subsequently, and a data basis is provided for training the fault diagnosis model.
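The screening described above can be illustrated with a minimal Python sketch. The table contents, metric names, node identifiers and the `FaultRecord` type are hypothetical and chosen only for illustration; the application does not specify a data layout, and a state value with no recorded range is simply skipped here as a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical normal state reference table:
# node kind -> metric name -> (lowest normal value, highest normal value).
NORMAL_STATE_REFERENCE = {
    "virtual_server": {"active_connections": (0, 5000), "rtt_avg_ms": (0.0, 200.0)},
    "real_server": {"active_connections": (0, 2000), "rtt_avg_ms": (0.0, 150.0)},
}


@dataclass(frozen=True)
class FaultRecord:
    """One item of fault data: an abnormal state value plus its source node."""
    node_id: str
    node_kind: str
    metric: str
    value: float


def find_abnormal_states(node_id, node_kind, state_values,
                         reference=NORMAL_STATE_REFERENCE):
    """Return a FaultRecord for every state value outside its normal range."""
    ranges = reference.get(node_kind, {})
    records = []
    for metric, value in state_values.items():
        if metric not in ranges:
            continue  # assumption: metrics without a reference range are ignored
        low, high = ranges[metric]
        if not (low <= value <= high):
            records.append(FaultRecord(node_id, node_kind, metric, value))
    return records


# Example: one virtual server reports too many active connections.
sample = {"active_connections": 7200, "rtt_avg_ms": 35.0}
fault_data = find_abnormal_states("vs-01", "virtual_server", sample)
```

In this sketch the returned records stand in for the fault data that would be written to the operation and maintenance database.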
S302: inputting the fault data into a trained fault diagnosis model to determine the fault type corresponding to the fault data.
The trained fault diagnosis model comprises, for each fault type, information of interest such as an abnormal state value and the corresponding application node, together with a weight value for each item of information of interest, wherein the weight values that different fault types assign to the same information of interest can be the same or different.
Further, determining the fault type corresponding to the fault data specifically includes:
detecting the abnormal state value through a deep neural network model;
if the known fault type corresponding to the abnormal state value is detected, outputting the fault type;
and if the known fault type corresponding to the abnormal state value cannot be detected, taking the abnormal state value as an abnormal variable value, and storing the abnormal variable value into the operation and maintenance database.
In this embodiment, detecting the abnormal state value through the deep neural network model includes detecting whether the abnormal state value matches the information of interest of a fault type. For example, suppose the abnormal state values are that the number of active data packets of the virtual server is m and the response time of the virtual server to the first data packet requested by the client is n, the information of interest of fault type A includes m and n, and the information of interest of fault type B includes m, n and x. Fault type A and fault type B are then initially detected as corresponding to the abnormal state values. Next, m and n are combined with their weights in fault type A and in fault type B to calculate a fault weight value A' of the abnormal state values for fault type A and a fault weight value B' for fault type B, and A' and B' are each compared with a preset weight threshold. A fault weight value greater than the threshold indicates that the abnormal state values are a key factor of that fault type, that is, they affect the normal operation of the application node, so the fault type corresponding to the abnormal state values can be determined. At least one fault type may be output; for example, if A' and B' are both greater than the preset weight threshold, fault type A and fault type B are both output, and they may be output in order of fault weight value. The fault weight value may be calculated by multiplication and accumulation: for example, if the weights corresponding to m and n in fault type A are a and b respectively, then A' = m × a + n × b, where a and b are preset percentage values that both lie in the range (0, 1).
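The multiply-and-accumulate calculation and the threshold comparison can be sketched in Python as follows. This is a plain weighted sum written out for illustration, not the deep neural network itself; the metric names, weights, values and threshold are hypothetical.

```python
def fault_weight_values(abnormal_values, fault_types):
    """Compute a fault weight value per fault type by multiply-and-accumulate.

    abnormal_values: metric name -> abnormal state value (e.g. m, n).
    fault_types: fault type name -> {metric name: weight in (0, 1)}.
    A fault type is considered only if it shares at least one metric
    with the abnormal values.
    """
    scores = {}
    for name, weights in fault_types.items():
        matched = [metric for metric in abnormal_values if metric in weights]
        if matched:
            scores[name] = sum(abnormal_values[m] * weights[m] for m in matched)
    return scores


def diagnose(abnormal_values, fault_types, weight_threshold):
    """Return (fault type, fault weight value) pairs above the threshold,
    ordered from the largest fault weight value to the smallest."""
    scores = fault_weight_values(abnormal_values, fault_types)
    hits = [(name, score) for name, score in scores.items()
            if score > weight_threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical numbers: m = 120 active packets, n = 80 ms first-packet response.
abnormal = {"active_packets": 120, "first_packet_response_ms": 80}
known_types = {
    "A": {"active_packets": 0.5, "first_packet_response_ms": 0.25},
    "B": {"active_packets": 0.25, "first_packet_response_ms": 0.25,
          "waf_intrusions": 0.5},
}
result = diagnose(abnormal, known_types, weight_threshold=40)
```

With these assumed numbers, A' = 120 × 0.5 + 80 × 0.25 = 80 and B' = 120 × 0.25 + 80 × 0.25 = 50, so both fault types exceed the threshold of 40 and fault type A is listed first.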
Further, if no known fault type corresponding to the abnormal state value is detected, for example, the abnormal state value includes a WAF intrusion count of x and this information of interest does not exist in any fault type, the abnormal state value is taken as an abnormal variable value. An alarm can then be issued to prompt manual diagnosis and manual operation and maintenance, and the new fault type defined after the manual operation and maintenance is stored in the operation and maintenance database together with its corresponding operation and maintenance measure, to be used as sample data for subsequently adjusting the deep neural network model.
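A minimal Python sketch of this branch is given below, assuming that "not detected" means the metric appears in no fault type's information of interest. The list standing in for the operation and maintenance database, the logger name and the record layout are hypothetical.

```python
import logging

logger = logging.getLogger("iot_om")  # hypothetical logger name


def route_abnormal_values(abnormal_values, fault_types, om_database):
    """Split abnormal state values into known and unknown ones.

    Values whose metric appears in some fault type are returned for diagnosis.
    The others are stored as unlabeled abnormal variable values in
    om_database (a plain list here) and an alarm is logged so that an
    operator can diagnose them and define a new fault type.
    """
    known_metrics = set()
    for weights in fault_types.values():
        known_metrics.update(weights)

    known = {}
    for metric, value in abnormal_values.items():
        if metric in known_metrics:
            known[metric] = value
        else:
            om_database.append({"metric": metric, "value": value, "label": None})
            logger.warning("abnormal variable value %s=%s needs manual diagnosis",
                           metric, value)
    return known
```

The `label` field is left empty until manual operation and maintenance has assigned a fault type, at which point the record can serve as a training sample.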
Further, the deep neural network model training method includes:
marking the abnormal variable value;
inputting the marked abnormal variable values into a deep neural network model for training, and outputting results;
and if the matching probability of the output result is smaller than a preset matching threshold, performing parameter adjustment on the deep neural network model and continuing training, and stopping training once the matching probability of the output result reaches the matching threshold.
In the embodiment of the application, the training data for training the deep neural network model may be obtained from the operation and maintenance database, and historical running data may be obtained by day, month and year. The historical running data of each application delivery node includes the number of connections, the number of data packets, the bandwidth, TPS (Transactions Per Second), and the number of SSL (Secure Sockets Layer) connections; the historical running data of each virtual server includes the number of active connections, the active bandwidth, the maximum/average/minimum response time of the virtual server to the first data packet requested by the client, the maximum/average/minimum response time for all data packets returned after the virtual server processes the client request, the maximum/average/minimum RTT, the number of WAF intrusions, the number of WAF-processed events and WEB data compression; the historical running data of each real server includes the number of active connections, the active bandwidth, the maximum/average/minimum response time of the real server to the first data packet requested by the client, the maximum/average/minimum response time for all data packets returned after the real server processes the client request, the maximum/average/minimum RTT, session persistence connections, and the like.
Specifically, inputting the marked abnormal variable values into a deep neural network model for training, and outputting the result comprises the following steps:
calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
In the embodiment of the present application, based on supervised learning, the abnormal variable values are labeled, for example in the form of tags, and the labeled abnormal variable values are input into an initial deep neural network model for training, where the initial deep neural network model is configured with parameters corresponding to the known fault types. The output result of training is matched against the fault type corresponding to each labeled abnormal variable value to obtain a matching probability, which is the accuracy of the fault type comparison. For example, the fault types output for 1000 labeled abnormal variable values are compared with the fault types they actually correspond to; a consistent comparison indicates that the abnormal variable value was successfully matched to its fault type. If 950 of the comparisons are consistent, the matching probability is 950 ÷ 1000 × 100% = 95%. If the preset matching threshold is 98%, parameter adjustment is performed on the deep neural network model. The parameter adjustment may take abnormal variable values that have been clustered and show distinct information-of-interest features as the abnormal state values of a new fault type in the deep neural network model, and assign weights to those abnormal state values. The labeled abnormal variable values are then input into the adjusted deep neural network model again for training, so that they match their corresponding fault types better. The deep neural network model is trained repeatedly in this way until the matching probability of the output result reaches the matching threshold, so that fault types can be identified for more abnormal state values, improving identification accuracy and efficiency.
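The matching probability and the stop condition can be sketched in Python as follows. The callables `predict` and `adjust` are hypothetical placeholders for the model's inference and parameter adjustment steps, and the `max_rounds` cap is an added safeguard that the application does not mention.

```python
def matching_probability(predicted, actual):
    """Fraction of samples whose predicted fault type equals the labeled one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    consistent = sum(1 for p, a in zip(predicted, actual) if p == a)
    return consistent / len(actual)


def train_until_match(predict, adjust, samples, labels, threshold, max_rounds=50):
    """Adjust the model until the matching probability reaches the threshold.

    predict(samples) -> list of fault types (hypothetical inference call).
    adjust(samples, labels) -> None (hypothetical parameter adjustment).
    Returns the final matching probability and the number of adjustments made.
    """
    probability = matching_probability(predict(samples), labels)
    rounds = 0
    while probability < threshold and rounds < max_rounds:
        adjust(samples, labels)
        rounds += 1
        probability = matching_probability(predict(samples), labels)
    return probability, rounds


# The worked example from the text: 950 of 1000 outputs are consistent.
example = matching_probability(["A"] * 950 + ["B"] * 50, ["A"] * 1000)
needs_adjustment = example < 0.98
```

Here `example` evaluates to 0.95, which is below the 98% threshold, so `needs_adjustment` is true and a further round of parameter adjustment would follow.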
S303: triggering operation and maintenance measures corresponding to the fault types according to a preset fault operation and maintenance strategy.
The preset fault operation and maintenance strategy consists of the operation and maintenance measures that operation and maintenance personnel have recorded in advance in the operation and maintenance database for each known fault type, and the operation and maintenance measures comprise operation and maintenance code segments, a capacity expansion tool, a file call tool and the like. For example, when the fault type is link jitter, packet loss, or delay, and the operation and maintenance measure obtained for it is capacity expansion, the capacity expansion tool is used to expand capacity.
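The lookup from fault type to operation and maintenance measure can be sketched in Python as a simple dispatch table. The fault type keys, the `expand_capacity` function and the returned strings are hypothetical stand-ins for the real tools and code segments recorded in the operation and maintenance database.

```python
def expand_capacity(node_id):
    """Hypothetical stand-in for invoking the capacity expansion tool."""
    return f"capacity expanded on {node_id}"


# Hypothetical preset strategy: fault type -> operation and maintenance measure.
FAULT_OM_STRATEGY = {
    "link_jitter": expand_capacity,
    "packet_loss": expand_capacity,
    "delay": expand_capacity,
}


def trigger_measures(fault_types, node_id, strategy=FAULT_OM_STRATEGY):
    """Run the measure recorded for each fault type.

    Returns the results of the measures that ran and the list of fault
    types for which no measure is recorded, so those can be escalated
    to manual operation and maintenance.
    """
    results, unhandled = [], []
    for fault_type in fault_types:
        measure = strategy.get(fault_type)
        if measure is None:
            unhandled.append(fault_type)
        else:
            results.append(measure(node_id))
    return results, unhandled


results, unhandled = trigger_measures(["packet_loss", "disk_full"], "vs-01")
```

Keeping the strategy as data rather than branching logic means a newly defined fault type only needs a new table entry once its measure has been recorded.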
In the embodiment of the application, the fault data is acquired and input into the trained fault diagnosis model to determine the fault type corresponding to the fault data, and the operation and maintenance measures corresponding to the fault type are triggered according to the preset fault operation and maintenance strategy, so that the workload of manual operation and maintenance is reduced, and the operation and maintenance efficiency is improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts of the figures may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 4, as an implementation of the method shown in fig. 3, the present application provides an embodiment of an internet of things fault operation and maintenance device, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 3, and the device may be specifically applied to various electronic devices.
As shown in fig. 4, the internet of things fault operation and maintenance device described in this embodiment includes: an acquisition module 401, a diagnosis module 402 and an operation and maintenance module 403. Wherein:
the acquisition module 401 is configured to acquire fault data;
the diagnosis module 402 is configured to input the fault data into a trained fault diagnosis model to determine a fault type corresponding to the fault data;
and the operation and maintenance module 403 is configured to trigger an operation and maintenance measure corresponding to the fault type according to a preset fault operation and maintenance policy.
Further, the acquisition module includes:
the acquisition unit is used for acquiring the running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
the determining unit is used for determining an abnormal state value from the state values according to a preset normal state reference table;
and the storage unit is used for taking the abnormal state value and the application node corresponding to the abnormal state value as fault data and storing the fault data in the operation and maintenance database.
Further, when the fault diagnosis model is a deep neural network model, the diagnosis module includes:
the detection unit is used for detecting the abnormal state value through the deep neural network model;
the output unit is used for outputting the fault type if the known fault type corresponding to the abnormal state value is detected;
and the exception unit is used for taking the abnormal state value as an abnormal variable value and storing the abnormal variable value into the operation and maintenance database if the known fault type corresponding to the abnormal state value is not detected.
Further, the internet of things fault operation and maintenance device further comprises:
the marking module is used for marking the abnormal variable value;
the training module is used for inputting the marked abnormal variable values into the deep neural network model for training and outputting results;
and the adjusting module is used for adjusting parameters of the deep neural network model and continuing training if the matching probability of the output result is smaller than a preset matching threshold value, and stopping training once the matching probability of the output result reaches the matching threshold value.
Further, the training module comprises:
the calculating unit is used for calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and the result unit is used for taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
With regard to the internet of things fault operation and maintenance device in the above embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment related to the method, and will not be elaborated here.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 5 comprises a memory 51, a processor 52, and a network interface 53 communicatively connected to each other via a system bus. It is noted that only a computer device 5 having components 51-53 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 51 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., an SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 51 may be an internal storage unit of the computer device 5, such as a hard disk or a memory of the computer device 5. In other embodiments, the memory 51 may also be an external storage device of the computer device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like, which are provided on the computer device 5. Of course, the memory 51 may also comprise both an internal storage unit of the computer device 5 and an external storage device thereof. In this embodiment, the memory 51 is generally used for storing an operating system installed in the computer device 5 and various types of application software, such as program codes of the internet of things fault operation and maintenance method. Further, the memory 51 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 52 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 52 is typically used to control the overall operation of the computer device 5. In this embodiment, the processor 52 is configured to execute the program code stored in the memory 51 or to process data, for example, to execute the program code of the internet of things fault operation and maintenance method.
The network interface 53 may comprise a wireless network interface or a wired network interface, and the network interface 53 is generally used for establishing communication connections between the computer device 5 and other electronic devices.
The present application further provides another embodiment, that is, a computer-readable storage medium is provided, where an internet of things fault operation and maintenance program is stored in the computer-readable storage medium, and the internet of things fault operation and maintenance program is executable by at least one processor, so as to cause the at least one processor to execute the steps of the internet of things fault operation and maintenance method as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the embodiments described above are only some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application may be embodied in many different forms, and these embodiments are provided so that the disclosure of the application will be thoroughly understood. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.