CN113055084B

Movatterモバイル変換

Info

Publication number: CN113055084B
Application number: CN202110297482.1A
Authority: CN
Inventors: 盛伟
Original assignee: Fiberhome Telecommunication Technologies Co Ltd
Current assignee: Fiberhome Telecommunication Technologies Co Ltd
Priority date: 2021-03-19
Filing date: 2021-03-19
Publication date: 2022-04-26
Anticipated expiration: 2041-03-19
Also published as: CN113055084A

Abstract

The present invention relates to the field of optical communications, and in particular, to a method and a system for recovering an optical network service failure. The method comprises the following steps: the centralized controller synchronously establishes a whole-network topology information base and a whole-network service connection information base according to whole-network topology information and service connection information reported by all distributed control level nodes in a network; receiving network fault link information report, and determining the calculation type of rerouting according to the fault service type; the centralized controller generates a reroute connection path calculation task corresponding to each fault service, the centralized controller distributes the calculation task to the centralized controller or the distributed control nodes for processing according to the calculation type of the reroute, and the centralized controller sends a reroute path to the control nodes according to the calculation result; and updating the whole network topology information base and the whole network service connection information base according to the connection state after rerouting. The method improves the calculation efficiency of the reconnection path, eliminates the path resource allocation conflict and provides an efficient and reliable method for restoring the optical network fault service.

Description

Method and system for recovering optical network service fault

[ technical field ] A method for producing a semiconductor device

The present invention relates to the field of optical communications, and in particular, to a method and a system for recovering an optical network service failure.

[ background of the invention ]

In the application of the optical communication network, in order to ensure the stability of communication, when a network link or a network node fails, a plurality of services passing through a failure point need to be restored as soon as possible to reduce the loss of service traffic.

The failure recovery of the current optical network intelligent system generally uses two control technologies: the first is a distributed control plane technology based on an Automatic Switching Optical Network (ASON), where control planes where different fault services are located respectively calculate rerouting connection paths in parallel, so as to achieve fast recovery of part of the fault services, but since different fault service source nodes calculate paths and allocate Network resources in parallel based on a local topology information base, rerouting Network resources obtained by calculation at different distributed control nodes may conflict with each other, resulting in failure of establishment of part of rerouting connections. The second is a centralized controller technology based on a Software Defined Network (SDN), where a control core of the SDN employs a centralized Path Computation Element (PCE) technology, and multiple failure service connection paths are computed globally and optimally based on a full Network topology and service connection information, but failure services in all control levels need to be computed by the centralized controller, the Computation pressure of the centralized controller is large, and multiple rerouting connection paths need to be computed in sequence to eliminate resource conflicts, so that a large Computation delay is generated for a sequential failure service, and the requirement on failure service recovery real-time performance is affected.

In view of this, how to overcome the defects in the prior art and solve the respective defects of the two current fault recovery techniques is a problem to be solved in the technical field.

[ summary of the invention ]

In view of the above drawbacks or needs for improvement of the prior art, the present invention solves the problem of inconsistent availability and computational efficiency of computation results when distributed control plane technology is used alone or centralized controller technology is used alone.

The embodiment of the invention adopts the following technical scheme:

in a first aspect, the present invention provides a method for recovering an optical network service failure, which specifically includes: the centralized controller synchronously establishes a whole-network topology information base and a whole-network service connection information base according to whole-network topology information and service connection information reported by all distributed control level nodes in a network; receiving network fault link information report, and determining the calculation type of rerouting according to the fault service type; the centralized controller generates a reroute connection path calculation task corresponding to each fault service, the centralized controller distributes the calculation task to the centralized controller or the distributed control nodes for processing according to the calculation type of the reroute, and the centralized controller sends a reroute path to the control nodes according to the calculation result; and updating the whole network topology information base and the whole network service connection information base according to the connection state after rerouting.

Preferably, the calculation type of the rerouting specifically includes:

the centralized controller calculates a connection path of the rerouting based on the whole network topology information base and the whole network service connection information base; and/or the centralized controller sends a calculation request to a control plane where the fault node is located, distributed calculation is carried out on the rerouting by distributed control level nodes in the control plane based on a full-network topology information base, and a rerouting result is reported to the centralized controller for verification.

Preferably, the rerouting according to the calculation result specifically includes: and issuing a connection path of the rerouting to a control plane where the fault node is located, controlling the corresponding node by the distributed control level node by using an ASON signaling to establish rerouting connection, and performing service recovery on the service before the fault.

Preferably, the centralized controller allocates the computation task to the centralized controller or the distributed leveling node for processing according to the computation type of the rerouting, and further includes: and when the centralized controller has no calculation task, allocating the deepest task in the connection path in the tasks to be calculated of the distributed control level nodes to the centralized controller for calculation.

Preferably, reporting the rerouting result to the centralized controller for verification, specifically including: the centralized controller checks a rerouting connection path returned by the distributed leveling nodes and judges whether the rerouting connection path is available; if the connection path of the rerouting is not available, calculating the connection path of the rerouting by the distributed leveling node again after adding the constraint condition; if the rerouting connection path is available, the rerouting connection path passes the verification, and rerouting is performed on the fault service according to the calculation result.

Preferably, the adding of the constraint condition specifically includes: the centralized controller analyzes the rerouting connection path to obtain the reason of the verification failure; and generating a constraint condition of the rerouting according to the reason of the check failure, adding the constraint condition of the rerouting in the path information to be calculated, and sending the constraint condition back to the distributed leveling control node.

Preferably, before the synchronous establishment of the topology information base of the whole network and the service connection information base of the whole network, the method further includes: and establishing a PCEP server interface, using the centralized controller as a PCE server, and establishing PCEP session connection with each distributed control level node through the PCEP server interface.

Preferably, the synchronous establishment of the topology information base of the whole network and the service connection information base of the whole network specifically includes: the centralized controller receives link topology information reported by a link or each distributed control level node and stores the link topology information into a whole network topology information base; the centralized controller receives the local service connection information reported by the link or each distributed control level node and stores the local service connection information into the whole network service connection information base.

Preferably, the receiving the report of the network fault link information specifically includes: and the distributed control level node receives the alarm message of the link, generates a link fault message and reports the link fault message to the centralized controller.

In another aspect, the present invention provides a system for recovering an optical network service failure, including an integrated controller subsystem and at least one distributed control level node subsystem, where the integrated controller subsystem and each distributed control level node subsystem perform message interaction through a PCEP interface to complete a function of recovering the optical network service failure as in the first aspect.

Compared with the prior art, the embodiment of the invention has the beneficial effects that: by combining the distributed control plane technology and the centralized controller technology, the centralized controller integrates network connection data and service data, performs unified allocation management on computing resources, and schedules the distributed control nodes to perform parallel computation, so that the network computing resources are fully utilized, and the computation efficiency of the reconnection path of the fault service is improved. In addition, in the preferred scheme, the centralized controller checks the calculation result of the distributed control plane node, so that the availability of the rerouted path is ensured, the resource allocation conflict of the distributed calculation path is eliminated, and an efficient and reliable method for recovering the optical network fault service is provided.

[ description of the drawings ]

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 is a flowchart of a method for recovering an optical network service failure according to an embodiment of the present invention;

fig. 2 is a flowchart of another method for recovering from an optical network service failure according to an embodiment of the present invention;

fig. 3 is a schematic diagram of a task queue used in a method for recovering an optical network service failure according to an embodiment of the present invention;

fig. 4 is a schematic diagram of a task queue used in another method for recovering an optical network service failure according to an embodiment of the present invention;

fig. 5 is a schematic diagram of a network topology used in a method for recovering an optical network service failure according to an embodiment of the present invention;

fig. 6 is a schematic diagram of a network topology used in another method for recovering an optical network service failure according to an embodiment of the present invention;

fig. 7 is a schematic diagram of a system architecture for recovering an optical network service failure according to an embodiment of the present invention;

wherein the reference numbers are as follows:

1: centralized controller subsystem, 11: whole network service connection management module, 12: full-network topology information base, 13: global optimized path calculation module, 14: rerouting connection calculation classification management module, 15: the network service is connected to the control module,

2: distributed leveling node subsystem, 21: local service connection management module, 22: network topology management module, 23: local path calculation module, 24: local service connection control module, 25: local resource management module, 26: and a signaling module.

[ detailed description ] embodiments

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The present invention is a system structure of a specific function system, so the functional logic relationship of each structural module is mainly explained in the specific embodiment, and the specific software and hardware implementation is not limited.

In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other. The invention will be described in detail below with reference to the figures and examples.

Example 1:

in the optical communication process, in order to reduce the influence of the network link or network node fault on the service, the optical network intelligent control system supports the automatic establishment of an end-to-end path of the service and the automatic recovery of the fault connection rerouting. According to the transport network topology automatic discovery protocol ITU-T g.7714.1, the optical network control system can automatically discover the link and establish a Traffic Engineering (TE) -based full network topology information base (TED). When the control system receives a connection establishment request of an end-to-end service or receives a fault alarm to trigger service connection end-to-end rerouting recovery, the control system calculates an end-to-end connection path based on the whole network topology information base, allocates transmission resources, and issues service connection configuration to a transmission network element node to realize automatic establishment or rerouting recovery of the end-to-end connection of the optical network service.

Two common control techniques for automatic failure recovery of an optical network intelligent system are as follows:

the First distributed control plane (hereinafter, abbreviated as control plane) technology based on ASON, the ASON control plane core is composed of Link management Protocol (Link Manager Protocol, abbreviated as LMP), Resource ReSerVation Protocol-Traffic Engineering (Resource Reserve Resource ReSerVation Protocol-Traffic Engineering, abbreviated as RSVP-TE), Open Shortest Path First-Traffic Engineering (Open Shortest Path First-Traffic Engineering, abbreviated as OSPF-TE) with Traffic Engineering, and end-to-end Path computation technology, the ASON control plane system is independently operated on each equipment node of the network, the control plane node discovers the TE Link of the node through LMP Protocol, mutually floods and advertises the TE Link through the F-OSPF-TE Protocol, distributively constructs a TE topology information base of the whole network, the service source node control plane is based on the whole network topology, calculates the end-to-end service Label switching connection (Label Switch Path, abbreviated as LSP), and through RSVP-TE protocol, service connection is issued end to end along the LSP path, and service equipment configuration is issued node by node. When a fault occurs, the level control node where the source node of the fault service is located calculates the rerouting connection path of the fault service in the level control in a distributed and parallel mode, and then the rerouting connection is configured in parallel by the path node, so that the quick recovery of the fault service is realized through parallel calculation. However, since the control levels of different failure service source nodes are only based on the local topology information base to calculate paths and allocate network resources in parallel, the rerouting connections calculated by different control level nodes may allocate the same network resources to generate conflicts, which results in failure of establishment of partial rerouting connections, although a retry mechanism of the control levels triggers the recalculation of the failed connections, the restoration performance of the failure service rerouting is reduced, the calculation load of the control level nodes is increased, and the control levels do not have the whole network connection information, so that the calculation of the paths based on the whole network topology information base cannot realize the calculation of the global optimization paths, and the success rate of restoration of the failure service rerouting is reduced.

In the second centralized controller technology based on the SDN, a SDN control core adopts a centralized Path Computation Element (PCE) technology, a deployment control server is connected with each network Element equipment node, the equipment nodes automatically discover links and report to a centralized controller, the centralized controller constructs a whole network TE topology information base and a service connection information base, end-to-end service label switching connection is computed through a centralized global optimization Path algorithm, the centralized controller directly issues Path Computation results to the links, the equipment network Element nodes are configured, and service connection creation and rerouting recovery are controlled in a centralized mode. Because the centralized controller carries out path recalculation based on the whole network topology and the service connection information, the method can support global optimization to calculate a plurality of fault service connection paths, has no problem of resource allocation conflict, and can ensure the success rate of fault service recovery. However, the centralized controller needs to calculate all the failure traffic restoration paths of the entire network, and when the number of failures is large, the calculation pressure of the centralized controller is large. On the other hand, in order to eliminate resource conflicts between different paths, the rerouting connection path of each failed service needs to be calculated in sequence, and a large routing delay is generated for the failed service at the tail of the queue, which affects the requirement on recovery real-time performance of the failed service.

In view of the advantages and disadvantages of the above two control techniques, an improved control method combining centralized and distributed control techniques is proposed in the prior art. When the network fails, the centralized controller preferentially controls the recovery of the failure service, calculates the rerouting connection path through global optimization, and issues the equipment rerouting connection configuration; when the failed service rerouting connection needing to be recovered simultaneously is more, the processing pressure of the integrated controller is higher, and the time delay of the failed service is longer, the integrated controller delegates part of the failed service rerouting connection to the service source node for controlling the level, and the level control node calculates the rerouting connection path in a distributed mode, so that the failed service rerouting recovery is realized. However, in the prior art, the distributed tie nodes do not have full-network service connection information, and do not support global optimization path calculation, and the reroute connection paths calculated in parallel by different tie nodes may have network transmission resource allocation conflicts, which may cause the reroute connection path recalculation or calculation failure, and reduce the performance of fast recovery and success rate of the failed service.

In order to solve the above problems, in the service failure recovery method provided by the present invention, the centralized controller performs synchronous unified management on the topology information of the whole network and the service connection information of the whole network, and performs unified allocation on the computing resources. And after the distributed control level node calculates the connection path of the rerouting, the rerouting is not directly performed, but is reported to the centralized controller to perform the rerouting after being checked according to the whole network service connection information, so that the problem that in the prior art, the calculation result of the distributed control level node only can determine whether the calculation result is correct according to the rerouting result of the link is solved, the time and resources for path issuing, rerouting attempt and rerouting result reporting are saved, and the efficiency of service fault recovery and the accuracy of path selection are improved.

As shown in fig. 1, the method for recovering an optical network service failure provided in the embodiment of the present invention includes the following specific steps:

step 101: the centralized controller synchronously establishes a whole-network topology information base and a whole-network service connection information base according to whole-network topology information and service connection information reported by all distributed control level nodes in the network.

Step 102: and receiving the report of the network fault link information, and determining the calculation type of the rerouting according to the fault service type.

In the existing control scheme, only each distributed control level node processes the fault information in the control level, or only the centralized controller directly processes the fault information submitted by each link. In this embodiment, each control plane receives the alarm information of all links in the control plane, generates a topology link failure message, and reports the topology link failure message to the distributed control plane node of the control plane, and the distributed control plane node receives the alarm information of the links, generates a link failure message, and reports the link failure message to the centralized controller. And the centralized controller updates the whole network topology information base and the whole network service connection information base according to the fault message reported by the control level, traverses all service connections on each fault link and creates a connection to be rerouted of each fault service.

In the specific implementation process, a proper calculation type can be determined according to the service type difference such as whether the service connection has non-homologous connection and the like. In this embodiment, for each individual fault service, the calculation types include: and completing path recalculation and rerouting by using the centralized controller, and completing path recalculation and rerouting for the fault service by using the distributed control nodes. For all fault services, the centralized controller and each distributed leveling node complete calculation tasks in parallel, and only one calculation device may perform calculation at the same time point, or a plurality of calculation devices may perform calculation at the same time.

Step 103: the centralized controller generates a rerouting connection path calculation task corresponding to each fault service, the centralized controller distributes the calculation task to the centralized controller or the distributed control nodes for processing according to the calculation type of rerouting, and the centralized controller sends a rerouting path to the control nodes according to the calculation result.

In order to avoid the problem in the prior art that a centralized controller is used alone or distributed control level nodes are used alone, a service to be recovered is matched with an appropriate calculation type according to a service type, and then an appropriate calculation node is selected according to the calculation type to perform path recalculation and rerouting.

And for different fault services, the centralized controller distributes the calculation tasks to different calculation devices according to the calculation types. The following simply provides examples of the allocation of the computation tasks in some specific implementation scenarios, and different allocations can be made according to the actual situation and the computation type.

(1) And distributing the non-homologous services needing global optimization calculation to the centralized controller, and calculating the connection path of the rerouting by the centralized controller based on the full-network topology information base and the full-network service connection information base to obtain the optimized collision-free path in the global range.

(2) Distributing the service limited in one control level to the distributed control level nodes of the control level, sending a calculation request to the control plane where the fault node is located, performing distributed calculation on the rerouting by the distributed control level nodes in the control plane, and reporting the rerouting result.

(3) The distributed control level nodes calculate the services related to the control level, the centralized controller performs conflict check on the path calculation result of each control level, and sends the path with the check failure back to the distributed control level nodes for recalculation, so that the path conflict or resource conflict caused by parallel calculation or link resource change is avoided under the condition that the calculation efficiency is improved through the distributed calculation.

Under the condition of multitasking, the centralized controller and the plurality of distributed leveling nodes are used for parallel computing, the processing efficiency of the fault service is improved, and the waiting time of fault recovery is reduced.

In order to further improve the computing efficiency and fully utilize the computing resources, when the centralized controller has no computing task, some connection paths to be computed in the distributed control plane nodes may be allocated to the centralized controller for computation. Since the deeper connection path involves a larger number of connections and is more likely to conflict with other paths, in a preferred scheme of this embodiment, the task at the deepest of the connection paths in the tasks to be calculated of the distributed control plane node is allocated to the centralized controller for calculation.

After calculating the connection path of the rerouting, the centralized controller performs rerouting connection issuing to the distributed leveling nodes, and issues device connection configuration of rerouting node by node along the connection path through signaling messages, so as to complete establishment of rerouting connection, realize recovery of service connection, and ensure normal communication at two ends of a service.

Step 104: and updating the whole network topology information base and the whole network service connection information base according to the connection state after rerouting.

After rerouting is performed according to the connection path calculation result, the network link reestablishes the link according to the rerouted path, the link connection relationship between the network nodes changes, and the connection paths at the two ends of the service also change. In order to ensure that the topology information and the service connection information for service restoration after each rerouting are consistent with the actual connection condition of the current network, a result of rerouting connection establishment needs to be reported by a link or a distributed control node, and the whole-network topology information base and the whole-network service connection information base are updated and synchronized to ensure that the latest topology information and service connection information are used by subsequent calculation seeds.

Through thesteps 101 to 104, the centralized controller and the distributed leveling nodes are used as the controllers for path recalculation and rerouting, and the advantages of the distributed control plane technology and the centralized controller technology are combined, so that path conflicts can be reduced, and the efficiency of path calculation can be improved.

In a specific implementation scenario of the embodiment of the present invention, any link protocol and network transmission protocol capable of implementing the method provided by the present invention may also be used. In the preferred scheme, the network link is a TE link, and the whole network topology information base is a whole network TE topology information base; the centralized controller is an SDOTN centralized controller; the distributed level control node is an ASON level control node; the centralized controller is used as a PCE service end to carry out conversation with the distributed control level node; the distributed control level node reports data such as link information, service information, fault information, rerouted path calculation results and the like to the centralized controller through a southbound interface, such as a PCEP (physical layer protocol) or an OSPF-TE (open shortest path first-time Ethernet protocol) protocol and the like; the centralized controller sends path connection calculation tasks, verification results, a whole network topology information base or a whole network service connection information base and other data to the distributed control nodes through a PCEP protocol. In the conventional ASON technology, each of the leveling nodes independently completes the calculation when the failure rerouting is automatically triggered, and in a scenario involving the topology data synchronization of the whole network and the calculation of E2E, a problem of management scale occurs. Therefore, a PCE protocol needs to be used to solve the problems that the standalone performance of the distributed ASON peering node is limited and the management scale cannot be rapidly increased. In practical use, if the computing power of a single PCE service end is insufficient, multi-stage PCE cascade can be performed, and the network management scale is further enlarged.

Further, in an implementation scenario using a PCEP interface, in order to correctly establish a PCEP session connection between the centralized controller and the distributed leveling nodes, a PCEP server interface needs to be established first, and the centralized controller is used as a PCE server to establish the PCEP session connection with each leveling node. Meanwhile, the centralized controller also needs to maintain a PCEP protocol interface to ensure the stability of network connection.

Because the distributed control level nodes cannot acquire the connection information of the whole network service, and because of the reasons of distributed parallel computation or network resource change and the like, resource conflict may occur in a plurality of rerouting connection paths computed by different distributed control level nodes. In the existing scheme, the distributed leveling nodes cannot check and identify resource conflicts, and the rerouting failure and the waste of time and resources can be caused by directly using the calculation result of the distributed leveling. In the embodiment of the invention, after the distributed control level nodes calculate the rerouting connection path, the calculation result is not directly used for rerouting, but the calculation result is reported to the centralized controller, and the centralized controller checks the rerouting connection path returned by different distributed control level nodes and judges whether the rerouting connection path is available. If the connection path of the rerouting is not available, calculating the connection path of the rerouting by the distributed leveling node again after adding the constraint condition; if the rerouting connection path is available, the rerouting connection path passes the verification, and rerouting is performed on the fault service according to the calculation result. In the method provided by this embodiment, the centralized controller manages the topology information base and the service connection information base of the whole network in a unified manner, so that the connection paths calculated by the distributed control level nodes can be checked and managed in a unified manner according to the information of the whole network, and the recalculation step is directly executed after the check fails, so that the rerouting failure and resource conflict possibly caused by the fact that the distributed control level nodes perform rerouting directly are avoided, and the efficiency of service fault recovery is improved.

Further, in order to avoid that the connection path after recalculation reuses the collision resource when the previous calculation failed, and the path after recalculation is still unavailable, the centralized controller is further required to analyze the rerouting connection path, obtain the reason of the verification failure, generate the constraint condition of the rerouting according to the reason of the verification failure, add the constraint condition of the rerouting in the path information to be calculated, and send back to the original distributed leveling node for rerouting calculation. For example, in a certain path, since the link L12 between the network node NE1 and the network node NE2 is already occupied by other traffic, the centralized controller needs to send "L12 unavailable" as a rerouting constraint back to the original distributed leveling node.

Further, after the distributed control nodes report the calculation result to the centralized controller, the centralized controller optimizes the path connection reported by each distributed control node based on the path connection reported by the whole network topology database, the whole network service connection information base and other distributed control nodes, searches for path connections with higher communication efficiency, higher resource idleness or more stable connection, and improves the service communication efficiency after the fault service is recovered.

Specifically, as shown in fig. 2, the calculation of the rerouted connection path, the checksum recalculation may be implemented by the following steps, in combination withsteps 101 to 104.

Step 201: and receiving the report of the network fault link, traversing and establishing all service rerouting connections passing through the fault link, and selecting the calculation type of rerouting according to the type of the fault service.

With reference to step 102, after receiving the report of the failed link, the centralized controller determines whether each service connection passes through the failed link according to the full-network topology information base and the full-network service connection information base, and if the service connection passes through the failed link, it indicates that the service is a failed service, and a connection path that does not pass through the failed link needs to be recalculated. After the report of the network fault link is received, the service fault needs to be classified, and a proper calculation type is selected for calculating the connection path.

Step 202: and according to the calculation type of the rerouting, the distributed centralized controller or the distributed leveling nodes calculate the connection path of the rerouting.

After the calculation type distribution is completed, the primary calculation of the connection path is completed by the centralized controller and the distributed control level nodes respectively. Because the centralized controller calculates based on the full-network topology information base and the full-network service connection information base, path conflict cannot be generated to cause that the path is unavailable, and only the connection paths calculated by different distributed control plane nodes may generate conflict to cause that the path is unavailable. However, each distributed leveling node cannot perform resource integration and verification with each other, and therefore, a centralized controller needs to be used for verification.

Step 203: as shown in fig. 3, a first type connection queue is established for the centralized controller, and the rerouted connection that needs to be calculated by the centralized controller is placed in the first type connection queue, and step 204 is performed. And establishing a second-class connection queue for each distributed control plane, putting the rerouted connections which need to be calculated by the distributed control plane nodes into the respective corresponding second-class connection queues, and turning to step 205. The direction indicated by the arrows in the figure is the order of task computation.

Because the service fault is recovered by adopting a first-come first-serve scheduling mode according to the time sequence of receiving the fault and the time sequence of generating the service, the path connection needing recalculation can be managed by using a first-in first-out queue, and each queue element represents a fault service connection. Further, for ease of management, a separate queue is created for each node that can perform path computations independently. The first type of connection queue stores the fault service connection to be processed by the centralized controller; each second type connection queue stores the fault service connection to be processed by each distributed control node, and each element in the second type connection queue is a different rerouting connection of the same source node.

Further, in the specific implementation, the queue may be added according to the order of receiving the report of the fault service, and the processing may be performed according to a first-come-first-served scheduling manner. The priority can also be divided according to the emergency degree or the importance degree of the fault service, the fault service with higher priority is inserted into the position ahead of the queue, and then the processing is carried out according to the queue sequence, so that the fault recovery time of the emergency or important service is reduced, and the loss caused by longer fault recovery time of the emergency or important service is reduced.

Step 204: the centralized controller performs rerouting path connection calculation for the failed traffic in each queue element according to the first-type connection queue sequence, and goes to step 206.

Step 205: the distributed control level nodes perform rerouting path connection calculation for the fault service in each queue element according to the sequence of the corresponding second-type connection queue, and go to step 207.

The method comprises the steps of distributing proper computing types to service faults, distributing the computing types to different computing nodes respectively, using an integrated controller and a plurality of distributed control flat nodes, fully utilizing all computing resources in a network through multi-path parallel computing of cooperative network resources, improving the overall computing efficiency of the system, reducing the total time of service fault recovery, and avoiding overlong waiting time caused by only using the integrated controller for sequential computing. When the task is distributed, the queue is used for managing the computing resources of each computing node, and the computing tasks of each computing node are organized into corresponding connection queues, so that the computing nodes can conveniently acquire, distribute, transfer, distribute, manage and the like the tasks.

Steps

204 and 205, and the corresponding subsequent steps, are performed concurrently.

Step 206: and judging whether the first type of connection queue is empty or not. If the first type connection queue is empty, go to step 207; if the first type connection queue is not empty, go to step 211.

The centralized controller may be in an idle state due to fewer tasks or higher computational efficiency in the centralized controller. In order to fully utilize the computing capacity of the centralized controller, tasks to be computed in the distributed control level nodes can be distributed to the centralized controller for processing, so that the overall fault recovery efficiency is improved.

Step 207: as shown in fig. 4, when the first-type connection queue is empty, the connection path with the deepest path depth in the second-type connection queue array is placed in the first-type connection queue, and instep 204, the controller performs global optimization route calculation.

When the first-class connection queue is idle, namely the centralized controller is idle, tasks to be calculated in the second-class connection queue array are added into the first-class connection queue, idle calculation nodes are used for assisting in processing the tasks to be calculated in the distributed control plane nodes under the condition of no waiting, and the total waiting time of connection path calculation tasks is reduced. When the connection path is calculated, the deeper the path depth, the more the number of the network link connections involved, the more likely the path collision will occur, and the path collision can be reduced by using the centralized controller to calculate the path. Therefore, the to-be-processed tasks with deeper path depth are preferentially selected to be added into the first-class connection queue, and the centralized controller is used for calculation, so that the reduction of calculation efficiency caused by recalculation is reduced, and the overall fault service recovery efficiency is further improved.

Step 208: and the distributed control flat nodes report the calculation results of the path connection to the centralized controller, and the centralized controller verifies the calculation results of the distributed control flat nodes.

Due to the parallel computing characteristics of the distributed leveling nodes and the dynamic random nature of the network resources, the connection paths computed by different distributed leveling nodes may generate conflicts. In order to avoid the collision of the path connections calculated by different distributed leveling nodes, it is necessary to perform collision check on the path connection calculation results of all the distributed leveling nodes by using the centralized controller.

Step 209: and judging whether the connection path reported by the distributed control level node has path conflict. If there is a path conflict, go to step 211; if there is no path conflict, go to step 212.

And the centralized controller checks the path connection calculation results reported by all the distributed nodes according to the whole network topology information base and the whole network service connection information base, and judges whether path conflicts exist.

Step 210: the centralized controller resolves the reason for the failure of the verification into a rerouting constraint condition, adds the rerouting constraint condition to the path information to be calculated, and goes to step 205 to perform recalculation.

After the path connection check fails, the centralized controller analyzes the part with the conflict, and generates a constraint condition according to the conflict reason, such as that a certain network node or network connection is occupied, and the like, so that the conflict caused again by the same conflict reason during recalculation is avoided. Specifically, when the connection path is verified by the centralized controller, the unavailable network resources in the connection path calculated by the distributed leveling nodes are obtained, when the connection path is re-calculated next time, the unavailable network resources are taken as exclusion conditions and brought into the calculation constraint to be added into the calculation task, and then re-calculation is performed, or the unavailable network resources are distributed to the corresponding distributed leveling nodes for re-calculation.

Step 211: and performing rerouting and fault service recovery by using the calculation result of the successfully verified path connection.

And the centralized controller and the distributed control level nodes perform path connection calculation of fault service recovery, and after no conflict is verified, the calculation result can be used for rerouting to complete fault service recovery.

Through thesteps 201 to 211, the task management optimization, the calculation efficiency optimization and the calculation error processing in the service fault recovery are further realized, the fault service recovery efficiency is further improved, and the task management difficulty is reduced.

By the method for recovering the optical network service failure provided by the embodiment, the controller schedules distributed control plane computing resources to synchronously complete the calculation of the connection path of the rerouting of the whole network, fully utilizes the capacity of the whole network computing path, eliminates the problem of resource conflict of parallel computing of multiple connection paths, and improves the recovery performance of the network failure.

Example 2:

based on the method for recovering from the optical network service failure provided inembodiment 1, in a specific implementation scenario, the following process may be used to perform specific application, and control the connection recovery of the rerouting of multiple failed services.

Fig. 5 is a diagram showing a network topology when a network link used in the present embodiment is not failed. The network topology diagram includes NE1-NE5 five Reconfigurable Optical Add-Drop Multiplexer (ROADM for short) nodes, eight links (two numbers in the link label of this embodiment represent the serial numbers of the nodes at both ends of the link respectively), and five traffic LSPs 1-LSP5 with the type of rerouting protection restoration. Each service connection name includes a number passing through a routing node and an occupied optical channel, such as LSP4(1-2-5, CH3), which indicates that the service connection sequence passes through NE1, NE2, NE5 nodes and L12, L25 links, and uses the 3 rd channel. The two service connections of the LSP4(1-2-5, CH3) and LSP5(3-4, CH1) are associated with each other, that is, the two non-homologous service rerouting connection paths are separated from each other in node and link, and the nodes and links passed by the two service rerouting connection paths are different as much as possible. According to step 101, each of the distributed control nodes reports the data to the centralized controller, and the centralized controller integrates all the data to establish a full-network topology information base and a full-network service connection information base.

As shown in fig. 6, when a link L12 fails, according tostep 102, the centralized controller receives the failure information reported directly by the link or reported by the distributed control node, receives the failure report of the distributed control node where the L12 link or L12 is located, traverses four failure services of LSP1 to LSP4 passing through L12, and creates four rerouting connections corresponding to LSP1', LSP2', LSP3', and LSP4', respectively, which need to be rerouted.

According to thestep 102, the calculation type of the rerouting is selected for the four rerouting connections of LSP1', LSP2', LSP3 'and LSP4' according to the type of the failed service, in this embodiment, the calculation type of the rerouting is selected according to whether the failed service connection has a related non-homologous connection, and the tasks allocated to different calculation nodes are stored in the corresponding connection queues according to thestep 203. As shown in fig. 6, LSP4 'is associated with non-homologous connection LSP5, and LSP4' is divided into first-class reroute connections, and placed in a first-class connection queue, and then the centralized controller globally optimizes the centralized route calculation. The LSP1', LSP2' and LSP3 'have no associated non-homologous connection, are divided into a second type of rerouting connection, and are put into a second type of connection queue array corresponding to each distributed tie node to request the tie nodes of the service source nodes NE1, NE3 and NE4 to calculate connection paths of LSP1', LSP2 'and LSP3' in parallel.

According to step 103, the distributed centralized controller or the distributed leveling node calculates the rerouted connection path. According to thestep 204 and thestep 205, the centralized controller and the distributed leveling nodes respectively calculate the tasks allocated to the corresponding connection queues. And taking out the LSP4' connection from the head of the first-type connection queue, and carrying out global optimization route calculation by the centralized controller. And taking out the LSP1', LSP2' and LSP3 'from the head of the second type connection queue array, respectively initiating a routing calculation request to the distributed control nodes of the corresponding service source nodes NE1, NE3 and NE4, and calculating the paths of LSP1', LSP2 'and LSP3' by the corresponding distributed control nodes.

The centralized controller calculates LSP4 'path LSP4' (1-5, CH1), the L15 link allocates the 1 st channel CH1, the centralized controller verifies that the channel of link CH1 is available, and the path calculation is successful. According to step 103, rerouting the fault service successfully calculated by the path connection according to the calculation result. And (3) issuing an LSP4 '(1-5, CH1) path to an NE1 leveling node, wherein the LSP4' connection is successfully established, and the LSP4 is successfully deleted.

According to step 209, step 210 and step 211, the centralized controller receives a distributed leveling node return path LSP1' (1-5-2, CH1) of the leveling of NE1, the link of L15 and L25 allocates the 1 st channel CH1, the centralized controller detects that the channel of link CH1 is unavailable, the path check of LSP1' (1-5-2, CH1) fails, and puts LSP1' into the tail of the node queue of the second type connection queue array NE1 for recalculation. Further, according tostep 211, before performing the path recalculation, a constraint condition that the CH1 channel is not usable is added to avoid generating the path recalculation failure again.

And in the same way, the calculation result of each distributed control level node is verified, rerouting is completed according to the path successfully verified, and the path failed in verification is sent back to the corresponding distributed control level node for recalculation. The centralized controller receives NE4 to level a node return path LSP3' (4-5-1, CH1), the link of L45 and L15 allocates a 1 st channel CH1, the centralized controller detects that the channel of the link CH1 is unavailable, the path check of LSP3' (4-5-1, CH1) fails, and the LSP3' is placed at the tail of a node queue of a NE4 of a second type connection queue. The centralized controller receives a node return path LSP2' (3-5-2, CH1) controlled by the NE3, a 1 st channel CH1 is allocated to links of the L35 and the L25, the centralized controller verifies that connection path resources are available, path calculation is successful, the NE3 node is controlled to send an LSP2' (3-5-2, CH1) path, LSP2' connection is established successfully, and LSP2 is deleted successfully.

Through the steps, the calculation of the LSP4' task in the first-type connection path is completed, and the first-type connection queue is empty. And the LSP1 'and the LSP3' fail in checking and are respectively retransmitted back to the second-class connection queues corresponding to the NE1 node and the NE4 node.

According to step 207 and step 208, when the first type connection queue is empty, adding the task in the second type connection queue into the first type connection queue, processing by the idle centralized controller, and rerouting after the path calculation is successful. Since the LSP1 'is deeper than the path depth of LSP3', the centralized controller takes out the LSP1 'connection from the tail of the second type of connection queue corresponding to the node of NE1, modifies LSP1' to be the first type of connection, and puts it into the first type of connection queue. Further, in order to perform parallel computation to reduce latency, the centralized controller may fetch an LSP3 'connection from the head of the second type connection queue array in parallel while fetching and globally optimizing the LSP1' path from the first type connection queue, and initiate a routing request to the NE4 node, so as to keep all the compute nodes in a busy state, and avoid the wait of compute tasks due to untimely task scheduling. The centralized controller calculates LSP1 'path LSP1' (1-5-2, CH2), the link of L15 and L25 allocates the 2 nd channel CH2, the centralized controller verifies that the path resource is available, the path calculation is successful, the LSP1'(1-5-2, CH2) path is issued to the NE1 leveling node, the connection establishment of LSP1' is successful, and the deletion of LSP1 is successful. The centralized controller receives a return path LSP3' (4-5-1, CH3) of an NE4 leveling node, a 3 rd channel CH3 is allocated to links of L45 and L15, the centralized controller verifies that path resources are available, path calculation is successful, the NE4 leveling node is issued with an LSP3' (4-5-1, CH3) path, LSP4' connection is successfully established, LSP4 is successfully deleted,

after each rerouting connection is successfully established and the faulty service connection is deleted, the topology structure and the service connection relationship of the network are changed from the topology connection relationship and the service connection relationship shown in fig. 5 to the topology connection relationship and the service connection relationship shown in fig. 6. According to step 104, the centralized controller further needs to update the full-network topology information base and the full-network service connection information base according to the connection state after rerouting reported by the link or the distributed leveling node, so as to perform calculation of fault recovery by using correct topology information and service connection information during subsequent calculation.

In the network topology and the service connection relationship provided in this embodiment, if only the centralized controller technology is used to perform failure recovery, the centralized controller needs to calculate the 4 rerouted path connections of the LSPs 1-4, and needs to spend 4 calculation time units to complete calculation, which may cause long waiting time of the LSPs 2-4 and affect the efficiency of failure service recovery. If only the distributed control plane technology is used, since the LSP4' is associated with the non-homologous connection LSP5, it may not be possible to complete the calculation using a single distributed control node or a collision is easily generated, and the LSP1' and the LSP3' also generate a collision and need recalculation, which affects the efficiency of recovering the failed traffic. If the method for restoring the optical network service failure provided inembodiment 1 is adopted to restore the failure service according to the manner in this embodiment, on one hand, the first calculation of all the path connections can be completed by using only 1 calculation time unit through parallel calculation, and even if a conflict occurs and recalculation is required, the calculation of all the path connections can be completed by only 2 calculation time units, thereby improving the calculation efficiency; on the other hand, the LSP4' associated with the non-homologous connection is distributed to the centralized controller for calculation, and path calculation is completed through the complete information of the global topology information base and the global service connection base, so that the generation of conflict is avoided, and the recalculation times are reduced. Therefore, the method for restoring the optical network service failure provided by theembodiment 1 and the embodiment has advantages over two existing control methods, avoids the problems in the existing control methods, reasonably schedules network distributed computing resources, cooperatively controls the rerouting restoration of the network failure service, improves the network failure service restoration performance, and meets the requirement of quick self-healing intelligent control of the optical network failure.

Example 3:

on the basis of the methods for recovering from an optical network service failure provided in

embodiments

1 and 2, this embodiment also provides a system for recovering from an optical network service failure. The system comprises acentralized controller subsystem 1 and at least one distributed controllevel node subsystem 2, wherein thecentralized controller subsystem 1 and each distributed controllevel node subsystem 2 carry out message interaction through a PCEP interface. Thecentralized controller subsystem 1 is a PCEP server, and each distributed controllevel node subsystem 2 is a PCEP client, and collectively performs the function of recovering the optical network service failure corresponding to the centralized controller and the distributed control level nodes in

embodiments

1 and 2.

As shown in fig. 7, a system structure diagram provided in the present embodiment is provided.

Thecentralized controller subsystem 1 mainly completes functions executed by the centralized controller in

embodiments

1 and 2, and completes corresponding functions of communication connection, information interaction, data storage, and the like. And calculating a rerouting connection path according to the whole network topology information base and the whole network service connection information base, establishing rerouting connection, and realizing whole network fault service recovery. Thecentralized controller subsystem 1 includes the following sub-modules.

The whole network serviceconnection management module 11 issues the faulty service connection allocated to the distributed control plane node instep 103 to each equipment node ASON control plane subsystem, synchronizes service connection data, creates a service connection information base, stores whole network service connection information, and provides an API call interface for an external module to access the service connection information.

Thetopology information base 12 of the whole network receives the report of the TE link state, creates the topology information base according to step 101, stores the topology information of the whole network, provides an API call interface power external module to access the TE topology information so as to facilitate the path connection calculation between the centralized controller and the distributed control node, notifies the network service connection control module of the link failure, and updates the topology information when the network connection state changes according tostep 104.

And the global optimizationpath calculation module 13 receives the network service connection control module routing request, calculates the fault service connection path allocated to the centralized controller based on the full-network topology information base and the full-network service connection information according tostep 103, and returns a path calculation result.

The rerouting connection calculationclassification management module 14 classifies the fault traffic requiring rerouting connection according tostep 102, and provides an API interface for viewing and modifying rerouting connection type data for an external module.

The network serviceconnection control module 15 receives the control level link fault notification according tostep 102, and allocates the fault service to the global optimizedpath calculation module 13 for calculation according to the calculation type or distributes the fault service to the distributedcontrol level subsystem 2 for cooperative calculation according tostep 103.

The distributedcontrol plane subsystem 2 mainly completes functions executed by the distributed control plane nodes in

embodiments

1 and 2, and completes corresponding functions of communication connection, information interaction, data storage and the like. And calculating a local rerouting connection path in the control level according to the whole network topology information base and the whole network service connection information base, establishing rerouting connection, and realizing local fault service recovery. The distributedcontrol plane subsystem 2 includes the following sub-modules.

The local serviceconnection management module 21 manages service connections of which the local nodes are source nodes, receives service connections distributed to distributed leveling nodes of the local leveling by thecentralized controller subsystem 1 according to step 103, issues connection configurations to devices, reports local service connection states to thecentralized controller subsystem 1 according to step 101, and provides an API call interface for an external module to access local service connection information.

The networktopology management module 22 receives the report of the local network resource, advertises the TE link information with the distributed control nodes of each other distributedcontrol subsystem 2, synchronizes and stores the data of the whole network topology information base according to step 104, provides an API call interface for the external module to access the TE topology information, and reports the fault link state to thecentralized controller subsystem 1 according tostep 102.

The localpath calculation module 23 receives the route calculation request issued by the local serviceconnection control module 21, calculates the connection path of the failed service based on the topology information base of the entire network according to step 103, and returns the path calculation result.

The local serviceconnection control module 24 receives the reroute path connection calculation request distributed by thecentralized controller subsystem 1, schedules the localpath calculation module 23 to perform path connection calculation, and returns a path connection calculation result to thecentralized controller subsystem 1.

The localresource management module 25 receives the device failure alarm reported by the link instep 102, and notifies the whole network topology information base of updating and synchronizing the resource change of the link.

Thesignaling module 26 issues the service connection configuration of the local control level to the devices in the local control level, interacts ASON signaling messages with upstream and downstream nodes of the service connection path, and issues the local device configuration according to the connection route, thereby completing the rerouting function instep 103.

Thecentralized controller subsystem 1 and the distributed levelingnode subsystem 2 may be implemented by using existing network equipment, and load functional modules corresponding to the respective sub-modules to complete functions required by the respective sub-modules. In specific implementation, according to actual needs, each sub-module may be integrated in the same hardware device, or may be distributed in multiple hardware devices capable of performing data, control signal and signaling interaction.

The method for recovering the optical network service failure provided in

embodiments

1 and 2 can be completed by the system for recovering the optical network service failure provided in this embodiment, where thecentralized controller subsystem 1 centrally schedules network distributed computing resources, and cooperatively controls each distributed levelingnode subsystem 2 to complete the rerouting recovery of the network failure service in parallel, so that the performance of recovering the network failure service is improved, and the requirement of fast self-healing intelligent control of the optical network failure is met.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A method for recovering from a service failure in an optical network, comprising:

the centralized controller synchronously establishes a whole-network topology information base and a whole-network service connection information base according to whole-network topology information and service connection information reported by all distributed control level nodes in a network;

receiving network fault link information report, determining a calculation type of a rerouting according to a fault service type, distributing an integrated controller or distributed control leveling nodes to calculate a connection path of the rerouting according to the calculation type of the rerouting, establishing a first type connection queue for the integrated controller, putting the rerouting connection required to be calculated by the integrated controller into the first type connection queue, establishing a second type connection queue for each distributed control leveling, and putting the rerouting connection required to be calculated by the distributed control leveling nodes into respective corresponding second type connection queues, wherein the rerouting connections in the first type connection queue and the second type connection queues are ordered according to a fault service report sequence and/or a fault service priority;

the centralized controller generates a rerouting connection path calculation task corresponding to each fault service, the centralized controller sequentially distributes calculation tasks in a first type of connection queue to the centralized controller for processing according to the calculation type of rerouting, sequentially distributes calculation tasks in a second type of connection queue to the distributed control nodes for processing, and the centralized controller sends a rerouting path to the control nodes according to the calculation result;

and updating the whole network topology information base and the whole network service connection information base according to the connection state after rerouting.

2. The method for recovering from an optical network service failure according to claim 1, wherein the calculation type of the rerouting specifically includes:

the centralized controller calculates a connection path of the rerouting based on the whole network topology information base and the whole network service connection information base;

and/or the centralized controller sends a calculation request to a control plane where the fault node is located, distributed calculation is carried out on the rerouting by distributed control level nodes in the control plane based on a full-network topology information base, and a rerouting result is reported to the centralized controller for verification.

3. The method for recovering from an optical network service failure according to claim 1, wherein the rerouting according to the calculation result specifically includes:

and issuing a connection path of the rerouting to a control plane where the fault node is located, controlling the corresponding node by the distributed control level node by using an ASON signaling to establish rerouting connection, and performing service recovery on the service before the fault.

4. The method according to claim 1, wherein the centralized controller allocates a calculation task to the centralized controller or the distributed leveling node for processing according to the calculation type of the rerouting, and further comprising:

and when the centralized controller has no calculation task, allocating the deepest task in the connection path in the tasks to be calculated of the distributed control level nodes to the centralized controller for calculation.

5. The method for recovering from an optical network service failure according to claim 2, wherein the reporting the rerouting result to the centralized controller for verification specifically comprises:

the centralized controller checks a rerouting connection path returned by the distributed leveling nodes and judges whether the rerouting connection path is available;

if the connection path of the rerouting is not available, calculating the connection path of the rerouting by the distributed leveling node again after adding the constraint condition;

if the rerouting connection path is available, the rerouting connection path passes the verification, and rerouting is performed on the fault service according to the calculation result.

6. The method for recovering from an optical network service failure according to claim 5, wherein the adding the constraint condition specifically includes:

the centralized controller analyzes the rerouting connection path to obtain the reason of the verification failure;

and generating a constraint condition of the rerouting according to the reason of the check failure, adding the constraint condition of the rerouting in the path information to be calculated, and sending the constraint condition back to the distributed leveling control node.

7. The method for recovering from an optical network service failure according to claim 1, wherein before the synchronously establishing the topology information base and the service connection information base, the method further comprises:

and establishing a PCEP server interface, using the centralized controller as a PCE server, and establishing PCEP session connection with each distributed control level node through the PCEP server interface.

8. The method for recovering from an optical network service failure according to claim 1, wherein the synchronously establishing a full-network topology information base and a full-network service connection information base specifically comprises:

the centralized controller receives link topology information reported by a link or each distributed control level node and stores the link topology information into a whole network topology information base;

the centralized controller receives the local service connection information reported by the link or each distributed control level node and stores the local service connection information into the whole network service connection information base.

9. The method for recovering from an optical network service failure according to claim 1, wherein the receiving of the report of the network failure link information specifically includes:

and the distributed control level node receives the alarm message of the link, generates a link fault message and reports the link fault message to the centralized controller.

10. A system for optical network service failure recovery, comprising:

the method comprises the steps that a centralized controller subsystem and at least one distributed control level node subsystem are used, message interaction is carried out between the centralized controller subsystem and each distributed control level node subsystem through a PCEP interface, and the corresponding optical network service fault recovery method as claimed in any one of claims 1 to 9 is completed.