BACKGROUND OF THE INVENTION
1. Field of the Invention
Embodiments of the present invention relate generally to network communications and more specifically to a system and method for intelligently failing back network connections in a load-balanced networking environment.
2. Description of the Related Art
Performance and reliability are key requirements for modern computer networks. When a network interface card (“NIC”) fails or becomes unreliable and then returns to a fully functional state, the computing device may realize improved network performance by redistributing network connections to the now-functional NIC. More specifically, if a particular NIC in the computing device is or becomes overloaded, network performance may be improved by redistributing network connections between functional NICs in the computing device, including the recovered NIC. However, the overhead related to transferring connections from one NIC to another NIC may exceed the performance benefits of redistributing traffic among the functional NICs within the computing device. In such cases, overall networking performance may be reduced by attempting to redistribute the network connections. Additionally, the transferred connections may overload an efficiently operating NIC, thereby reducing the performance and reliability of that NIC.
As the foregoing illustrates, what is needed in the art is a technique for transferring network connections to one or more functional NICs in a computing device when failing back network connections that reduces the likelihood of NIC overloading or other phenomena that can impair overall system performance.
SUMMARY OF THE INVENTION
One embodiment of the present invention sets forth a method for failing back network connections to a network interface card (NIC) within a computing device. The method includes the steps of monitoring a failed or unreliable NIC within the computing device, determining that the failed or unreliable NIC has recovered, determining that a functional NIC within the computing device is overloaded, selecting a first connection set communicating through the overloaded NIC, and transferring the first connection set to the recovered NIC.
One advantage of the disclosed method is that, by rehashing connection sets on an overloaded NIC, intelligent decisions can be made regarding whether to fail back a network connection set to a recovered NIC based on the traffic loads on the overloaded NIC and the recovered NIC. Such an approach to balancing network traffic across the functional NICs within a computing device may substantially improve overall performance relative to prior art techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
FIGS. 1A-1C illustrate a computing device in which one or more aspects of the present invention can be implemented; and
FIGS. 2A-2D illustrate a flowchart of method steps for failing back network connections from an overloaded NIC to one or more functional NICs in a computing device, according to one embodiment of the invention.
DETAILED DESCRIPTION
Intelligent failback of network connections from an overloaded NIC to one or more fully functional NICs may be accomplished by using a hash engine that tracks network statistics, including the number of connections and amount of transmitted and received traffic through each NIC, and a TCP/IP stack that tracks the NIC through which each network connection initially communicates. Once a failed or unreliable NIC recovers to once again operate fully functionally, if another NIC within the system becomes overloaded, some network connections on the overloaded NIC are automatically failed back to the recovered NIC until the previously overloaded NIC is no longer overloaded. Such a transfer allows one or more additional network connections to be handled by an alternative NIC without exceeding the capacity of that NIC, thereby avoiding a connection redistribution paradigm that may reduce the overall performance of the computing device. Transferring the connections from the overloaded NIC may also include unoffloading the connections from the hardware offload engine within the overloaded NIC and subsequently offloading those connections to the hardware offload engine(s) within the recovered NIC to which those connections are transferred.
FIGS. 1A-1C illustrate a computing device 100 in which one or more aspects of the present invention can be implemented. As shown, the computing device 100 includes a main memory 102, a memory controller 104, a microprocessor 106, an I/O controller 108, and NICs 110, 111 and 116. NIC 110 includes a multicast list 114 and a hardware offload engine (“HOE”) 112. NIC 111 includes a multicast list 115 and an HOE 113. NIC 116 includes a multicast list 120 and an HOE 118. HOEs 112, 113 and 118 include logic configured for processing network frames associated with network connections between the computing device 100 and one or more remote network computing devices (not shown) that have been selectively offloaded to NICs 110, 111 and 116. By processing network frames with HOEs 112, 113 and 118 (sometimes referred to as “handling connections in hardware”) rather than performing those processing functions in a host software TCP/IP stack (sometimes referred to as “handling connections in software”), as is conventionally done, communications between the NICs 110, 111 and 116 and the microprocessor 106 as well as computations performed by the microprocessor 106 may be substantially reduced.
The memory controller 104 is coupled to the main memory 102 and to the microprocessor 106, and the I/O controller 108 is coupled to the microprocessor 106 and the NICs 110, 111 and 116. In one embodiment of the invention, the microprocessor 106 transmits commands or data to the NICs 110, 111 and 116 by writing commands or data into the I/O controller 108. Once such commands or data are written into the I/O controller 108, the I/O controller 108 optionally translates the commands or data into a format that the target NIC may understand and communicates the commands or data to the target NIC. Similarly, NICs 110, 111 and 116 transmit commands or data to the microprocessor 106 by writing commands or data into the I/O controller 108, and the I/O controller 108 optionally translates the commands or data into a format that the microprocessor 106 may understand and communicates the commands or data to the microprocessor 106. The aforementioned couplings may be implemented as memory busses or I/O busses, such as PCI™ busses, or any combination thereof, or may otherwise be implemented in any other technically feasible manner.
As shown in more detail in FIG. 1B, the main memory 102 includes an operating system 122 and a software driver 124. The software driver 124 includes a Load Balancing and Failover (“LBFO”) module 126 and a TCP/IP stack 130. The LBFO module 126 tracks networking statistics for each NIC (e.g., the number of connections on each NIC, the number of packets sent and received by each NIC) and communicates with the TCP/IP stack 130 when network connections are being moved from one NIC to another NIC within the computing device 100. The LBFO module 126 includes a hash engine 128, which intelligently determines how network connections should be distributed across the different functional NICs in the computing device 100, based on the aforementioned networking statistics. More details regarding the functionality of the hash engine 128 are described in the related U.S. patent application titled, “Intelligent Load Balancing and Failover of Network Traffic,” filed on May 18, 2007 and having Ser. No. 11/750,919. This related patent application is hereby incorporated herein by reference.
As shown in more detail in FIG. 1C, the hash engine 128 includes a transmit hash table 138 and a receive hash table 140. The purpose of the transmit hash table 138 is to select a functional NIC within the computing device 100 for transmitting packets related to a network connection, based on data provided to the transmit hash table 138 by the LBFO module 126. The transmit hash table 138 includes a plurality of hash table entries (e.g., hash table entry 134) and a software hash function (not shown). Additionally, each hash table entry includes a table index (e.g., table index 132) and a table value (e.g., table value 136). The LBFO module 126 directs the hash engine 128 to select a transmit NIC within the computing device 100 by communicating TCP/IP connection data to the hash engine 128, which communicates the TCP/IP connection data to the software hash function in the transmit hash table 138. In response, the software hash function selects a table index within the transmit hash table 138, based on the values of the TCP/IP connection data. From this selected table index, the transmit hash table 138 identifies the corresponding table value, and the hash engine 128 communicates the identified table value back to the LBFO module 126. Since the design and operation of software hash functions are well known to those skilled in the art, these issues will not be discussed herein. In one embodiment, the LBFO module 126 communicates the following five TCP/IP parameters to the hash engine 128: the client internet protocol (“IP”) address, the server IP address, the server TCP port, the client TCP port, and the virtual local area network (“VLAN”) connection ID. In other embodiments, the LBFO module 126 may communicate any technically feasible TCP/IP parameters to the hash engine 128.
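The lookup path through the transmit hash table described above can be illustrated with a minimal Python sketch. It is not the implementation of the hash engine 128: the class names, the table size of 16, the round-robin initial fill, and the use of CRC-32 as the software hash function are all assumptions made for illustration, since the description leaves the hash function unspecified.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionData:
    """The five TCP/IP parameters named in the description (field names are hypothetical)."""
    client_ip: str
    server_ip: str
    server_port: int
    client_port: int
    vlan_id: int


class TransmitHashTable:
    """Hypothetical model of a transmit hash table: table index -> table value (a NIC id)."""

    def __init__(self, nic_ids, size=16):
        self.size = size
        # Assumption: entries are filled round-robin across the functional NICs.
        self.values = [nic_ids[i % len(nic_ids)] for i in range(size)]

    def index_for(self, conn):
        # Assumption: CRC-32 over the joined parameters stands in for the
        # unspecified software hash function.
        key = "|".join(str(v) for v in (
            conn.client_ip, conn.server_ip, conn.server_port,
            conn.client_port, conn.vlan_id))
        return zlib.crc32(key.encode("ascii")) % self.size

    def select_nic(self, conn):
        # The table value stored at the selected index identifies the transmit NIC.
        return self.values[self.index_for(conn)]
```

Because the index depends only on the connection parameters, a given connection keeps mapping to the same entry; changing which NIC handles it is a matter of rewriting the table value at that index.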
The purpose of the receive hash table 140 is to select a functional NIC within the computing device 100 for receiving packets related to a network connection, based on the data provided to the receive hash table 140 by the LBFO module 126. Similar to the transmit hash table 138, the receive hash table 140 includes a plurality of hash table entries and a software hash function (not shown), and each hash table entry includes a table index and a table value. Again, the LBFO module 126 directs the hash engine 128 to select a receive NIC within the computing device 100 by communicating TCP/IP connection data to the hash engine 128, which communicates the TCP/IP connection data to the software hash function in the receive hash table 140. In response, the software hash function selects a table index within the receive hash table 140, based on the values of the TCP/IP connection data. From this selected table index, the receive hash table 140 identifies the corresponding table value, and the hash engine 128 communicates the identified table value back to the LBFO module 126. In one embodiment, the TCP/IP data that the LBFO module 126 communicates to the hash engine 128 includes the server IP address. In other embodiments, the LBFO module 126 may communicate any technically feasible TCP/IP data to the hash engine 128.
The computing device 100 may be a desktop computer, server, laptop computer, palm-sized computer, personal digital assistant, tablet computer, game console, cellular telephone, or any other type of similar device that processes information.
FIGS. 2A-2D illustrate a flowchart of method steps 200 for failing back network connections from an overloaded NIC to a recovered NIC, according to one embodiment of the invention. Although the method is described in reference to the computing device 100, persons skilled in the art will understand that any system configured to perform the method steps, in any order, is within the scope of the present invention.
As shown, the method for failing back network connections begins at step 202, where the LBFO module 126 monitors the status of each NIC for an indication that a failed or unreliable NIC has recovered (i.e., that the failed or unreliable NIC is now fully functional). In one embodiment, a NIC is deemed to have recovered when two conditions are present. First, based on the link indication for the failed or unreliable NIC, the LBFO module 126 determines that a link connection exists for the failed or unreliable NIC, suggesting that a network cable which may have been previously disconnected has now been reconnected. Second, the LBFO module 126 determines that keep-alive packets transmitted between the NICs in the computing device 100 are being received reliably by the failed or unreliable NIC. As described in the related U.S. patent application titled, “Technique for Identifying a Failed Network Interface Card within a Team of Network Interface Cards,” filed on Dec. 15, 2005 and having Ser. No. 11/303,285, failed or functional NICs within a computing device can be identified based on how each NIC is transmitting and/or receiving keep-alive packets. This related patent application is hereby incorporated herein by reference. By monitoring these two conditions, the LBFO module 126 is able to determine whether a failed or unreliable NIC has recovered.
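The two-condition recovery test can be sketched as follows. This is an illustrative model only: the `NicStatus` fields and the 0.9 receive-ratio threshold are assumptions, since the description says only that keep-alive packets must be “received reliably” and does not define a numeric criterion.

```python
from dataclasses import dataclass


@dataclass
class NicStatus:
    """Hypothetical snapshot of a monitored NIC."""
    link_up: bool              # link indication reported for the NIC
    keepalives_sent: int       # keep-alive packets sent to this NIC by its peers
    keepalives_received: int   # keep-alive packets this NIC actually received


def has_recovered(status, min_receive_ratio=0.9):
    """Return True only when both recovery conditions hold.

    Assumption: "received reliably" is modeled as a receive ratio at or
    above min_receive_ratio; the threshold value is illustrative.
    """
    if not status.link_up:
        return False  # condition 1 fails: no link connection
    if status.keepalives_sent == 0:
        return False  # nothing observed yet, so reliability is unknown
    ratio = status.keepalives_received / status.keepalives_sent
    return ratio >= min_receive_ratio  # condition 2: keep-alives arrive reliably
```

Requiring both conditions guards against treating a NIC as recovered when the cable has been reconnected but the path is still dropping traffic.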
In step 204, the LBFO module 126 determines whether a failed or unreliable NIC being monitored in step 202 has recovered. If the failed or unreliable NIC has not recovered, then the method returns to step 202, where the LBFO module 126 continues to monitor the failed or unreliable NIC. If in step 204, however, the LBFO module 126 determines that the failed or unreliable NIC has recovered, then the method proceeds to step 206. For purposes of discussion only, it is assumed that the NIC 110 had previously failed or become unreliable and has now recovered, that one or more connection sets that were initially communicating through NIC 110 were transferred to NIC 116 when NIC 110 failed, and that NIC 116 is currently overloaded. As used herein, a “connection set” is a plurality of connections that were initially communicating through a common NIC. Importantly, transferring connection sets rather than individual connections to a failback NIC prevents connections for a given MAC address from being assigned to more than one NIC.
In step 206, the LBFO module 126 signals the TCP/IP stack 130 that the NIC 110 has recovered. In step 208, the LBFO module 126 signals the hash engine 128 that the NIC 110 has recovered. In step 209, the hash engine 128 configures the transmit hash table 138 and the receive hash table 140 to enable connections to be assigned again to the NIC 110 when the hash engine 128 makes decisions regarding how network connections should be distributed across the functional NICs within the computing device 100.
In step 210, the LBFO module 126 monitors the fully functional NICs 110, 111 and 116 to determine whether any of these NICs is overloaded. In one embodiment, a NIC is deemed to be overloaded when the utilization of that NIC, as a percentage of the transmit or receive capacity of the NIC, is above a certain threshold value. In another embodiment, a NIC is deemed to be overloaded when the error rate for the NIC rises above a certain threshold value. In yet another embodiment, a combination of utilization and error rate may be used to determine whether a NIC is overloaded. In step 212, the LBFO module 126 determines whether any of the NICs 110, 111 and 116 being monitored in step 210 is overloaded, based on the utilization of each NIC and/or the error rate of that NIC. If the LBFO module 126 finds that none of the monitored NICs is overloaded, then the method returns to step 210, where the LBFO module 126 continues monitoring the NICs 110, 111 and 116.
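The overload test of steps 210-212 can be sketched as a simple predicate. The description does not give threshold values or say how utilization and error rate are combined, so the 80% utilization threshold, the 1% error-rate threshold, and the logical OR used below are assumptions, as are all field names.

```python
from dataclasses import dataclass


@dataclass
class NicLoad:
    """Hypothetical per-NIC statistics of the kind an LBFO module might track."""
    tx_bps: float            # current transmit rate
    rx_bps: float            # current receive rate
    tx_capacity_bps: float   # transmit capacity (assumed > 0)
    rx_capacity_bps: float   # receive capacity (assumed > 0)
    errors: int              # packets in error during the sampling window
    packets: int             # total packets during the sampling window


def is_overloaded(load, util_threshold=0.80, error_threshold=0.01):
    """Combined-embodiment sketch: overloaded if either metric exceeds its threshold.

    Assumption: both thresholds and the OR combination are illustrative;
    the description leaves them open.
    """
    utilization = max(load.tx_bps / load.tx_capacity_bps,
                      load.rx_bps / load.rx_capacity_bps)
    error_rate = load.errors / load.packets if load.packets else 0.0
    return utilization > util_threshold or error_rate > error_threshold
```

Taking the larger of the transmit and receive ratios reflects the statement that utilization is measured against the transmit or receive capacity of the NIC.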
If in step 212, however, a NIC is found to be overloaded (e.g., NIC 116), then the method proceeds to steps 214-238, where a plurality of “connection sets” on the overloaded NIC 116 are “rehashed” to reduce the network traffic on the overloaded NIC 116. Here, rehashing a connection set includes determining the “initial NIC” for the connection set and transferring the connection set to the recovered NIC 110. As used herein, “initial NIC” refers to the NIC through which a connection was originally communicated. In step 214, the LBFO module 126 selects a connection set on the overloaded NIC 116 to rehash. In step 216, if necessary, the LBFO module 126 unoffloads the selected connection set from the HOE 118 to the TCP/IP stack 130. As described in the related U.S. patent application titled, “Intelligent Failover in a Load-Balanced Networking Environment,” filed on May 18, 2007 and having Ser. No. 11/750,903, connections may be offloaded or unoffloaded to the hardware offload engines 112, 113 and 118 within the NICs 110, 111 and 116, respectively. This related patent application is hereby incorporated herein by reference.
In step 217, the LBFO module 126 identifies the recovered NIC (in this case, the recovered NIC 110) as the new NIC to which the selected connection set should be transferred. Importantly, once the LBFO module 126 identifies the recovered NIC 110 as the new NIC for the selected connection set, the LBFO module 126 configures itself to intercept packets being communicated from the operating system 122 to a remote computing device (not shown) and rewrites the source MAC address of the intercepted packets to correspond to the MAC address of the recovered NIC 110. Rewriting the source MAC address of the packets of a connection set ensures that the receive traffic for the connection set will be correctly distributed to the recovered NIC 110 by the switch. In step 218, the LBFO module 126 determines which NIC within the computing device 100 was the initial NIC for the selected connection set. In one embodiment, the identity of the initial NIC for each connection set is stored in the TCP/IP stack 130, allowing the LBFO module 126 to query the TCP/IP stack 130 for the identity of the initial NIC for any connection set. In step 219, the TCP/IP stack 130 directs the recovered NIC 110 to send a learning packet to the network switch (again, not shown). The learning packet may be any technically feasible packet type that includes the MAC address of the initial NIC. As is well known, sending such a packet from the recovered NIC 110 causes the switch to reconfigure itself to route subsequent packets destined for the MAC address of the initial NIC for the selected connection set (here, the recovered NIC 110) to the actual NIC 110 and not the overloaded NIC 116. Thus, all network traffic related to the selected connection set being transferred to the recovered NIC 110 is thereafter received by the recovered NIC 110.
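The source MAC rewrite performed on intercepted packets in step 217 can be illustrated on a raw Ethernet II frame, in which bytes 0-5 hold the destination MAC address and bytes 6-11 hold the source MAC address. The function name and the example addresses below are hypothetical; the sketch shows only the byte-level substitution, not how a driver would intercept packets.

```python
def rewrite_source_mac(frame: bytes, new_src_mac: bytes) -> bytes:
    """Return a copy of an Ethernet II frame with its source MAC replaced.

    Layout assumed: destination MAC (6 bytes), source MAC (6 bytes),
    EtherType (2 bytes), then the payload.
    """
    if len(new_src_mac) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet II header")
    # Keep the destination MAC, substitute the source MAC, keep everything else.
    return frame[:6] + new_src_mac + frame[12:]


# Hypothetical addresses: the frame leaves with the old NIC's MAC as source
# and is rewritten to carry the recovered NIC's MAC instead.
dst = bytes.fromhex("ffeeddccbbaa")
old_src = bytes.fromhex("020000000116")
recovered_mac = bytes.fromhex("020000000110")
frame = dst + old_src + bytes.fromhex("0800") + b"payload"
rewritten = rewrite_source_mac(frame, recovered_mac)
```

Only the six source-address bytes change, so the frame length, destination, EtherType, and payload are preserved.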
In step 220, the LBFO module 126 determines whether the initial NIC for the selected connection set was the recovered NIC 110, based on the identity of the initial NIC determined in step 218. If the LBFO module 126 determines that the initial NIC for the selected connection set was the recovered NIC (in this case, NIC 110), then the method proceeds to step 222, where the software driver 124 removes the MAC address of the initial NIC for the selected connection set from the multicast list of the overloaded NIC 116. Removing this MAC address from the multicast list prevents the overloaded NIC 116 from receiving packets that are being transmitted to the MAC address of the initial NIC (here, NIC 110). In step 232, the LBFO module 126 optionally offloads the selected connection set to the HOE 112 within the recovered NIC 110 if the LBFO module 126 determines that the performance benefit from offloading warrants such action.
In step 234, the LBFO module 126 determines whether a sufficient number of connection sets on the overloaded NIC 116 have been rehashed such that the NIC 116 is no longer overloaded. If the LBFO module 126 determines that the NIC 116 is no longer overloaded, then the method terminates at step 238. If, however, the NIC 116 is still overloaded, then the method proceeds to step 236, where the LBFO module 126 selects another connection set on the overloaded NIC 116 to rehash before returning to step 216.
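The loop formed by steps 214, 234 and 236 can be sketched as follows. The model is deliberately simplified: each connection set is reduced to a name and a load figure, overload is a single utilization threshold, and sets are taken in list order. None of these choices comes from the description, which does not say how the next connection set is selected.

```python
def fail_back(connection_sets, capacity, util_threshold=0.80):
    """Move connection sets off an overloaded NIC until it is no longer overloaded.

    connection_sets: list of (name, load) tuples currently on the overloaded NIC.
    capacity: capacity of the overloaded NIC, in the same units as load.
    Returns (remaining, moved), where moved is the list of sets failed back
    to the recovered NIC. Selection order and threshold are assumptions.
    """
    remaining = list(connection_sets)
    moved = []

    def overloaded():
        return sum(load for _, load in remaining) / capacity > util_threshold

    # Step 234: re-check overload after every transfer.
    # Step 236: if still overloaded, select another connection set.
    while remaining and overloaded():
        moved.append(remaining.pop(0))
    return remaining, moved


# Hypothetical loads, as a percentage of the overloaded NIC's capacity.
remaining, moved = fail_back([("set_a", 50), ("set_b", 30), ("set_c", 15)],
                             capacity=100)
```

With these loads the NIC starts at 95% utilization; moving the first set brings it to 45%, which is below the assumed threshold, so the loop stops after a single transfer.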
Returning now to step 220, if the LBFO module 126 determines that the initial NIC for the selected connection set was not the recovered NIC 110, meaning either (i) that the selected connection set was transferred to the overloaded NIC 116 from a functional NIC other than NIC 110 in a previous load-balancing operation, or (ii) that the overloaded NIC 116 was the initial NIC for the selected connection set, then the method proceeds to step 226. In step 226, the software driver 124 removes the MAC address of the initial NIC from the multicast list of the overloaded NIC 116, if the selected connection set was transferred to the overloaded NIC 116 from a functional NIC other than NIC 110 in a previous load-balancing operation.
In step 228, the software driver 124 adds the MAC address of the initial NIC for the selected connection set to the multicast list of the recovered NIC 110, which allows the NIC 110 to receive packets, associated with the selected connection set, that are being transmitted to the MAC address of the initial NIC. The method then proceeds to step 232, as set forth above.
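The multicast list bookkeeping of steps 220 through 228 branches on the identity of the initial NIC, and the three cases can be sketched as below. The `Nic` class, its fields, and the example MAC strings are hypothetical; multicast lists are modeled as Python sets of MAC addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Nic:
    """Hypothetical NIC model: its own MAC plus a multicast list of extra MACs."""
    mac: str
    multicast_list: set = field(default_factory=set)


def update_multicast_lists(initial, recovered, overloaded):
    """Apply steps 220-228 for one connection set moving from overloaded to recovered."""
    if initial.mac == recovered.mac:
        # Step 222: the set is returning to its initial NIC, so the overloaded
        # NIC stops accepting packets addressed to that MAC.
        overloaded.multicast_list.discard(initial.mac)
        return
    if initial.mac != overloaded.mac:
        # Step 226: the set came from a third NIC in an earlier load-balancing
        # operation, so its MAC is removed from the overloaded NIC's list.
        overloaded.multicast_list.discard(initial.mac)
    # Step 228: the recovered NIC must now accept packets addressed to the
    # initial NIC's MAC.
    recovered.multicast_list.add(initial.mac)


# Case (i): set initially on NIC 111, moved earlier to NIC 116, now failing back to NIC 110.
nic110 = Nic("mac-110")
nic111 = Nic("mac-111")
nic116 = Nic("mac-116", {"mac-110", "mac-111"})
update_multicast_lists(initial=nic111, recovered=nic110, overloaded=nic116)
```

After the call, NIC 116 no longer lists the MAC address of NIC 111 and NIC 110 does, while the entry for NIC 110 on NIC 116 is left for a separate step 222 pass over a set whose initial NIC was NIC 110.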
One advantage of the disclosed method is that, by rehashing connection sets on an overloaded NIC, intelligent decisions can be made regarding whether to fail back a network connection set to a recovered NIC based on the traffic loads on the overloaded NIC and the recovered NIC. Such an approach to balancing network traffic across the functional NICs within a computing device may substantially improve overall performance relative to prior art techniques.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. For example, aspects of the present invention may be implemented in hardware or software or in a combination of hardware and software. One embodiment of the invention may be implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of the present invention, are embodiments of the present invention. Therefore, the scope of the present invention is determined by the claims that follow.