Detailed Description
Exemplary embodiments, examples of which are illustrated in the accompanying drawings, are now described in detail. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "at the time of …", "when …", or "in response to a determination".
As shown in FIG. 1, in a related-art network device, the main control board expands its interfaces through a PCIe switch chip (or PCIe bridge chip) to connect a plurality of service boards. A switch circuit on each service board selects the PCIe link and thereby determines whether the service board communicates with the main control board or with the standby board. When the network device performs an active/standby switchover, the switch circuit is controlled to carry out the switchover; when a service board is hot-plugged, the standard PCIe hot-plug flow is started to handle the hot plug.
Although such a network device supports active/standby switchover and hot plug, faults occur at unpredictable times. If the main control board and a service board are in the middle of communicating when a switchover or hot-plug event occurs, a late notification may leave the network device in an unexpected abnormal state. For example, if the main control board is forcibly removed while it is communicating normally with a service board, the service board becomes abnormal because it does not receive the main control board's response in time, and it may then fail to respond to instructions from the standby board. If the service board is performing a memory access at that moment, the memory of the standby board is at risk of being illegally modified.
In view of the above technical problems, an embodiment of the present invention provides a more stable and reliable system architecture for a network device. As shown in FIG. 2, a Field Programmable Gate Array (FPGA) is introduced between the main control board (or the standby board) and the service boards, replacing the original PCIe switch chip or PCIe bridge chip. The FPGA monitors the network device for faults and responds as soon as a fault occurs, thereby avoiding unexpected abnormalities of the network device. The present application is further explained by the following examples:
FIG. 3 is a schematic flowchart of the network device failure processing method according to an embodiment of the present application. The method may specifically include the following steps:
S301, monitoring, according to a preset first monitoring period, whether the main control board in the network device meets a preset first requirement, where the preset first requirement is that the main control board is hot-plugged or abnormal;
The embodiment of the present application is applied to the FPGA. The FPGA monitors, according to the preset first monitoring period, whether the main control board in the network device meets the preset first requirement. The preset first requirement may be that the main control board is hot-plugged or that the main control board is abnormal; an abnormal main control board means that an active/standby switchover is required.
If the main control board does not meet the preset first requirement and no service board meets the preset second requirement (see S303), the network device is operating normally. In this case the FPGA receives access instructions from the main control board to a service board and forwards them to the corresponding service board, and it likewise receives access instructions from a service board to the main control board and forwards them to the main control board. Instructions are forwarded based on an address mapping, which comprises the mapping between each service board's address space and the FPGA's own address space, and the mapping between the address space of the main control CPU or standby CPU and the FPGA's own address space, as shown in Tables 1 and 2 below:
| CPU | FPGA address space | Service board address space |
| --- | --- | --- |
| Main control or standby CPU | Address space 1 | Service board 1 address space |
| Main control or standby CPU | Address space 2 | Service board 2 address space |
| Main control or standby CPU | Address space 3 | Service board 3 address space |
TABLE 1
TABLE 2
As can be seen from Table 1, when an access instruction from the main control board to a service board is received, the instruction is forwarded according to the access address space it carries: for example, an instruction that carries address space 1 is forwarded to service board 1.
As can be seen from Table 2, when an access instruction from a service board to the main control board is received, the instruction is likewise forwarded according to the access address space it carries: for example, an instruction that carries address space 4 is forwarded to the main control CPU.
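The address-mapped forwarding described above can be illustrated with a minimal Python sketch. It is not the FPGA implementation: the window base addresses and sizes, the `Window` type, the `route` function, and the target names are all assumptions made for illustration. Only the pairing of address spaces 1 to 3 with service boards 1 to 3, and of address space 4 with the main control CPU, comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Window:
    """One row of the address mapping: an FPGA address window and its target."""
    base: int    # first FPGA-side address of the window (hypothetical value)
    size: int    # window length in bytes (hypothetical value)
    target: str  # where instructions that hit this window are forwarded


# Table 1: accesses from the CPU side. Address spaces 1 to 3 map to
# service boards 1 to 3.
CPU_TO_BOARD = [
    Window(0x1000_0000, 0x0100_0000, "service_board_1"),
    Window(0x1100_0000, 0x0100_0000, "service_board_2"),
    Window(0x1200_0000, 0x0100_0000, "service_board_3"),
]

# Only the single mapping quoted in the text is modeled here:
# address space 4 maps to the main control CPU.
BOARD_TO_CPU = [
    Window(0x2000_0000, 0x0100_0000, "main_control_cpu"),
]


def route(windows: List[Window], address: int) -> Optional[Tuple[str, int]]:
    """Return (target, offset inside the target's space), or None if unmapped."""
    for w in windows:
        if w.base <= address < w.base + w.size:
            return w.target, address - w.base
    return None
```

Under these assumed windows, `route(CPU_TO_BOARD, 0x1100_0040)` selects service board 2 with offset `0x40`, and an address outside every window returns `None`, so an unmapped instruction is not forwarded anywhere.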
S302, if the main control board in the network device meets the preset first requirement, stopping forwarding the access instructions of all service boards to the main control board, and sending an instruction to stop accessing the main control board to all service boards;
If the main control board in the network device meets the preset first requirement, the FPGA stops forwarding the access instructions of all service boards to the main control board and sends an instruction to stop accessing the main control board to all service boards. Because the main control board may have been either hot-plugged or found abnormal, the two cases are handled differently:
If the main control board meets the preset first requirement, the FPGA judges whether the requirement met is that the main control board is abnormal. If the main control board is not abnormal (that is, it has been hot-plugged), the FPGA simply stops forwarding the access instructions of all service boards to the main control board and sends the instruction to stop accessing the main control board to all service boards. If the main control board is abnormal, an active/standby switchover is required: in addition to stopping forwarding and sending the stop instruction as above, the FPGA monitors, according to a preset third monitoring period, whether a service board access instruction sent by the standby board has been received. Receipt of such an instruction indicates that the active/standby switchover is complete; the FPGA then forwards the instruction to the corresponding service board based on the address mapping and sends an instruction to access the standby board to all service boards.
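The two branches of S302 can be read as a small state machine, sketched below in Python under stated assumptions. The class `UpstreamGate`, the state names, and the message strings `STOP_ACCESS_MAIN` and `ACCESS_STANDBY` are hypothetical; the text does not define an instruction format, and real logic would run in the FPGA rather than in software.

```python
from enum import Enum, auto


class MainEvent(Enum):
    HOT_PLUG = auto()   # main control board hot-plugged
    ABNORMAL = auto()   # main control board abnormal, switchover required


class Upstream(Enum):
    MAIN = auto()           # forwarding to the main control board
    BLOCKED = auto()        # forwarding stopped, no switchover pending
    AWAIT_STANDBY = auto()  # forwarding stopped, waiting for the standby board
    STANDBY = auto()        # switchover complete, forwarding to the standby board


class UpstreamGate:
    def __init__(self, service_boards):
        self.service_boards = list(service_boards)
        self.state = Upstream.MAIN
        self.notices = []  # (service board, message) pairs, hypothetical format

    def _notify_all(self, message):
        for board in self.service_boards:
            self.notices.append((board, message))

    def on_main_event(self, event):
        """S302: stop forwarding and tell every service board to stop."""
        self._notify_all("STOP_ACCESS_MAIN")
        if event is MainEvent.ABNORMAL:
            self.state = Upstream.AWAIT_STANDBY
        else:
            self.state = Upstream.BLOCKED

    def on_standby_access(self):
        """Called when a poll (third monitoring period) sees a standby access.

        Returns True if the standby board's instruction may be forwarded.
        """
        if self.state is Upstream.AWAIT_STANDBY:
            self.state = Upstream.STANDBY
            self._notify_all("ACCESS_STANDBY")
        return self.state is Upstream.STANDBY

    def may_forward_upstream(self):
        return self.state in (Upstream.MAIN, Upstream.STANDBY)
```

In this sketch a hot-plug event leaves the gate in `BLOCKED`, so a later standby access is not treated as a completed switchover, whereas an abnormal event moves to `AWAIT_STANDBY` and the first standby access both unblocks forwarding and notifies every service board once.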
S303, monitoring, according to a preset second monitoring period, whether a service board in the network device meets a preset second requirement, where the preset second requirement is that the service board is hot-plugged or abnormal;
In addition to monitoring whether the main control board meets the preset first requirement, the FPGA may monitor, according to the preset second monitoring period, whether a service board in the network device meets the preset second requirement. The preset second requirement may be that the service board is hot-plugged or that the service board is abnormal.
When no service board meets the preset second requirement and the main control board does not meet the preset first requirement, the network device is operating normally, and access instructions from a service board to the main control board or from the main control board to a service board may be forwarded based on the address mapping.
S304, if a service board in the network device meets the preset second requirement, stopping forwarding the access instructions of the main control board to the service board, and sending an instruction to stop accessing the service board to the main control board.
If a service board in the network device meets the preset second requirement, the service board is abnormal or has been hot-plugged. In this case the FPGA stops forwarding the access instructions of the main control board to that service board and sends an instruction to stop accessing that service board to the main control board (or the standby board).
The service board that meets the preset second requirement may be identified by the identifier of the link between the FPGA and the service board, or by the identifier of the service board itself. The FPGA then stops forwarding the main control board's access instructions to the identified service board only, and sends the main control board an instruction to stop accessing the identified service board.
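The per-board handling of S304 can be sketched as follows. This is an illustrative Python model only: the class `DownstreamGate`, the link identifiers, and the `STOP_ACCESS` message are assumptions, since the text states only that the affected service board is determined from a link identifier or a board identifier.

```python
class DownstreamGate:
    """Tracks which service boards the FPGA has stopped forwarding to."""

    def __init__(self, link_to_board):
        # Hypothetical mapping from FPGA link identifier to service board identifier.
        self.link_to_board = dict(link_to_board)
        self.blocked = set()
        self.upstream_notices = []  # messages for the main control (or standby) board

    def on_service_event(self, link_id):
        """S304: a board on this link was hot-plugged or became abnormal."""
        board = self.link_to_board[link_id]
        if board not in self.blocked:
            self.blocked.add(board)
            # Tell the main control board to stop accessing this board only.
            self.upstream_notices.append(("STOP_ACCESS", board))
        return board

    def may_forward(self, board):
        """Forwarding to all other service boards continues unaffected."""
        return board not in self.blocked
```

Keying the blocked set by board means that a fault on one link stops traffic to that board alone, which matches the text's distinction between S302 (all service boards are notified) and S304 (only the affected board is isolated).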
In addition, on the basis of the above scheme, the technical solution provided by the embodiment of the present application may further include the following step: if the main control board in the network device meets the preset first requirement, retraining the PCIe link between the FPGA and the main control board.
If the main control board meets the preset first requirement, it is abnormal or has been hot-plugged, and the existing PCIe link between the FPGA and the main control board can no longer be used. When the main control board returns to normal, the link is retrained to establish a new PCIe link between the FPGA and the main control board.
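The retraining step can be pictured with the simplified Python model below. It is a sketch under assumptions, not the embodiment's logic: the class `UpstreamLink`, its methods, and the `Down` label are hypothetical. The four phase names follow the main training path of the PCIe link training state machine (Detect, Polling, Configuration, L0), which has further states and sub-states that are omitted here, and each call to `step` assumes the current phase succeeded.

```python
# Simplified training path; the real PCIe state machine has more states.
TRAINING_SEQUENCE = ["Detect", "Polling", "Configuration", "L0"]


class UpstreamLink:
    """Link between the FPGA and the main control board (illustrative model)."""

    def __init__(self):
        self.state = "L0"  # link trained and in normal operation

    def on_main_board_fault(self):
        # Main control board hot-plugged or abnormal: the old link is unusable.
        self.state = "Down"  # hypothetical label, not a PCIe state name

    def on_main_board_recovered(self):
        # Retraining starts only after the board is back to normal.
        if self.state == "Down":
            self.state = "Detect"

    def step(self):
        """Advance one training phase, assuming the current phase succeeded."""
        if self.state in TRAINING_SEQUENCE[:-1]:
            self.state = TRAINING_SEQUENCE[TRAINING_SEQUENCE.index(self.state) + 1]
        return self.state

    def usable(self):
        return self.state == "L0"
```

In this model the link stays unusable while the board is faulty no matter how often `step` is called, and forwarding toward the main control board would resume only once `usable()` is true again.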
As can be seen from the above description of the technical solution provided in the embodiment of the present application, introducing a Field Programmable Gate Array (FPGA) between the main control board (or the standby board) and the service boards allows faults in the network device to be monitored and responded to as soon as they occur, thereby avoiding unexpected abnormalities of the network device.
Corresponding to the foregoing embodiment of the network device failure processing method, an embodiment of the present application further provides a network device processing apparatus. As shown in FIG. 4, the apparatus may include: a first monitoring module 410, a first processing module 420, a second monitoring module 430, and a second processing module 440.
The first monitoring module 410 is configured to monitor, according to a preset first monitoring period, whether the main control board in the network device meets a preset first requirement, where the preset first requirement is that the main control board is hot-plugged or abnormal;
the first processing module 420 is configured to, if the main control board in the network device meets the preset first requirement, stop forwarding the access instructions of all service boards to the main control board, and send an instruction to stop accessing the main control board to all service boards;
the second monitoring module 430 is configured to monitor, according to a preset second monitoring period, whether a service board in the network device meets a preset second requirement, where the preset second requirement is that the service board is hot-plugged or abnormal;
the second processing module 440 is configured to, if a service board in the network device meets the preset second requirement, stop forwarding the access instructions of the main control board to the service board, and send an instruction to stop accessing the service board to the main control board.
In a specific implementation of the embodiment of the present application, the first processing module 420 is specifically configured to:
if the main control board in the network device meets the preset first requirement, judge whether the requirement met is that the main control board is abnormal;
if the main control board is not abnormal, stop forwarding the access instructions of all service boards to the main control board, and send the instruction to stop accessing the main control board to all service boards.
In a specific implementation of the embodiment of the present application, the apparatus further includes:
a third processing module 450, configured to, if the main control board is abnormal, stop forwarding the access instructions of all service boards to the main control board, send an instruction to stop accessing the main control board to all service boards, and monitor, according to a preset third monitoring period, whether a service board access instruction sent by the standby board is received;
and, when a service board access instruction sent by the standby board is received, forward the service board access instruction to the corresponding service board, and send an instruction to access the standby board to all service boards.
In a specific implementation of the embodiment of the present application, the second processing module 440 is specifically configured to:
if a service board in the network device meets the preset second requirement, determine the service board that meets the preset second requirement according to the identifier of the link between the FPGA and the service board or the identifier of the service board;
and stop forwarding the access instructions of the main control board to the service board that meets the preset second requirement, and send an instruction to stop accessing that service board to the main control board.
In a specific implementation of the embodiment of the present application, the apparatus further includes:
a fourth processing module 460, configured to retrain the PCIe link between the FPGA and the main control board if the main control board in the network device meets the preset first requirement.
The implementation of each module of the apparatus is detailed in the implementation of the corresponding step of the method and is not described herein again.
As with the method embodiment, the apparatus relies on a Field Programmable Gate Array (FPGA) introduced between the main control board (or the standby board) and the service boards, so that faults in the network device are monitored and responded to as soon as they occur, thereby avoiding unexpected abnormalities of the network device.
Because the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the corresponding parts of the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. One of ordinary skill in the art can understand and implement the solution without inventive effort.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The foregoing is directed to embodiments of the present invention, and it is understood that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention.