BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to network switching and, more particularly, to methods and systems for controlling network data traffic on full-duplex media in switched networks.
2. Background Art
Switched local area networks use a network switch for supplying data frames between network nodes such as network stations, routers, etc., where each network node is connected to the network switch by a media. The switched local area network architecture uses a media access control (MAC) layer enabling a network interface to access the media. The network switch passes data frames received from a transmitting node to a destination node based on the header information in the received data frame.
A network switch such as a workgroup switch typically includes port buffering at both input and output buffers. Specifically, a non-blocking switch typically includes ports having input buffers and output buffers such as first in first out (FIFO) buffers, that are sized to accommodate the transfer of data between a source and destination port at wire speed. However, congestion of an output buffer may occur if multiple data packets from multiple input buffers are directed to a single output buffer. Hence, an output buffer may be unable to keep up with reception of data packets from multiple input buffers within the switch.
Flow control has been proposed to reduce network congestion, where a transmitting node temporarily suspends transmission of data packets. A proposed flow control arrangement for a full-duplex environment, referred to as IEEE 802.3x[2], specifies generation of a flow control message, for example a PAUSE frame. A transmitting station that receives the PAUSE frame enters a pause state in which no data frames are sent on the network for a time interval specified in the PAUSE frame. However, control frames can still be sent during the pause state.
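For illustration only, the following Python sketch assembles a PAUSE frame as laid out in the published IEEE 802.3x MAC Control specification: reserved multicast destination address 01-80-C2-00-00-01, EtherType 0x8808, opcode 0x0001, and a 16-bit pause time counted in units of 512 bit times. The text above refers only to the proposal, so these field values, the function name build_pause_frame, and the omission of the frame check sequence are assumptions of the sketch rather than part of the described arrangement.

```python
import struct

# Field values taken from the published MAC Control specification (assumed here;
# the text above cites only the proposal).
PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_NO_FCS = 60  # 64-byte minimum frame minus the 4-byte FCS


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Return a PAUSE frame without FCS (hypothetical helper)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    body = struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, pause_quanta)
    frame = PAUSE_DEST_MAC + src_mac + body
    # Pad with zeros up to the minimum frame size.
    return frame.ljust(MIN_FRAME_NO_FCS, b"\x00")


frame = build_pause_frame(bytes.fromhex("020000000001"), pause_quanta=7)
```

A pause time of zero in such a frame cancels a pause that is already in effect, which is why the sketch accepts zero as a valid value.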
If flow control is implemented in a switch, however, the transmission of data packets from the respective network nodes is effectively halted until the output congestion eases. One problem associated with implementing flow control in full-duplex links is head of line (HOL) blocking, where a transmitting node sending data packets to the workgroup switch may be forced into a pause interval, even though the transmitting node is attempting to send data packets to a destination node via a network switch port other than the congested network switch port. In addition, outputting flow control PAUSE frames to all the network nodes may unnecessarily reduce network traffic while the congested output buffer is outputting the data frames. Hence, the conventional policy for generation of PAUSE control frames may substantially reduce the throughput of the network unnecessarily.
SUMMARY OF THE INVENTION
There is a need for an arrangement in a network switch for selectively generating a pause control frame from a network switch port to a corresponding network node, where the pause control frame specifies a pause interval having a duration based upon the traffic contribution by the network node relative to the total network traffic received by a congested network switch port.
There is also a need for an arrangement that controls congestion in a transmit buffer of a network switch port, where a pause control frame is selectively output from one of the remaining network switch ports based on a corresponding network node providing the maximum traffic contribution relative to the total network traffic received by the congested network switch port.
There is also a need for an arrangement in a network switch for controlling a detected congestion condition in a transmit buffer of one of the network switch ports, where head of line blocking is minimized in network nodes transmitting data packets to network switch ports other than the congested network switch port.
These and other needs are attained by the present invention, where a network switch outputs a pause control frame to at least one network node to eliminate a congestion condition detected in a transmit buffer in one of the network switch ports, where the pause control frame specifies a pause interval based on the corresponding traffic contribution by the one network node.
According to one aspect of the present invention, a method in a network switch having network switch ports comprises detecting a congestion condition in a transmit buffer of one of the network switch ports, determining for each remaining network switch port a traffic contribution relative to a total network traffic received by the one congested network switch port, and outputting from at least one of the remaining network switch ports, to a corresponding network node, a first pause control frame specifying a first pause interval having a duration based on the corresponding traffic contribution. Outputting the pause control frame to at least one of the remaining network switch ports based on the corresponding traffic contribution optimizes elimination of the congestion condition in the transmit buffer by prioritizing the generation of the pause control frame for the network node most responsible for creating the congestion condition. Moreover, the generation of a pause control frame specifying a pause interval duration based on the corresponding traffic contribution ensures that the pause interval is minimized in stations having little or no traffic contribution to the congested network switch port.
Another aspect of the present invention provides a network switch comprising network switch ports for sending and receiving data packets between respective network nodes, each network switch port comprising an input buffer for receiving a received data packet from a corresponding network node and an output buffer for transmitting a switched data packet to the corresponding network node, a first monitor configured for detecting a congestion condition in the output buffer of one of the network switch ports, a data traffic monitor configured for determining a traffic contribution, relative to a total network traffic received by the one congested network switch port, for each of the remaining network switch ports, and a controller for generating a pause control frame, the pause control frame output from the output buffer of at least one of the remaining network switch ports and specifying a pause interval having a duration based on the corresponding traffic contribution. The data traffic monitor is able to determine the flow of data traffic between the input buffers and output buffers of the network switch, enabling the network switch to identify which input buffer (and corresponding network node) is most responsible for creating a congestion condition in a network switch output buffer. In addition, generation of the pause control frame by the controller based on the traffic contributions detected by the data traffic monitor ensures that the congestion condition is efficiently eliminated by generating a pause frame to the network node most responsible for the congestion condition, while minimizing the pause intervals in other network nodes providing little or no contribution to the detected congestion condition.
Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Reference is made to the attached drawings, wherein elements having the same reference numeral designations represent like elements throughout and wherein:
FIG. 1 is a block diagram of the network switch for selectively generating pause control frames according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the memory in the data traffic monitor for tracking network traffic on a port by port basis.
FIGS. 3A and 3B are flow diagrams illustrating methods for selectively generating pause control frames according to first and second embodiments of the present invention, respectively.
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 is a block diagram of a packet switched network 10, such as an Ethernet (ANSI/IEEE 802.3) network. The packet switched network 10 includes a multiple port switch 12 that enables communication of data packets between network nodes 14. According to the disclosed embodiment, the network switch 12 includes network switch ports 16, each including an input buffer 18 having a receive buffer for receiving a received data packet from a corresponding network node 14. Each network switch port 16 also includes an output buffer 20 for transmitting a switched data packet to the corresponding network node 14. The network switch 12 also includes switching logic 22, also referred to as a switch fabric, for selectively switching a received data packet from an input buffer 18 to a specific output buffer 20 based on address information in the received data frame; the switching logic 22 makes its switching decisions based on the detected address information.
The network switch ports 16 and the network nodes 14 preferably operate in full-duplex mode according to the proposed Ethernet standard IEEE 802.3x Full-Duplex with Flow Control--Working Draft (0.3). The full-duplex environment provides a two-way, point-to-point communication link between each network node 14 and the multiple port switch 12, where the multiple port switch 12 and the respective nodes 14 can simultaneously transmit and receive data packets at 100 Mb/s or Gigabit rates without collisions. The network nodes 14 may be implemented as workstations, servers, or routers for connection to other networks.
The switching logic 22 determines which output buffers 20 should transmit a data packet received by one of the input buffers 18 based on internal switching logic. The switching logic 22 may output a given data packet received by an input buffer 18 to a single port, multiple ports, or all ports (i.e., broadcast). For example, each data packet includes a header having a source and destination address, where the switching logic 22 may identify the appropriate output buffer 20 based upon the destination address. Alternatively, the destination address may correspond to a virtual address that the switching logic 22 identifies as corresponding to a plurality of network nodes 14. Alternatively, the received data packet may include a virtual LAN (VLAN) tagged frame according to IEEE 802.1d protocol that specifies another network (via a router at one of the 100 Mb/s nodes 14) or a prescribed group of workstations.
The network switch 12 also includes a congestion monitor 24 configured for detecting a congestion condition in one of the output buffers 20. Specifically, the congestion monitor 24 includes programmable registers 26 and 28 for storing a first threshold (T1) and a second threshold (T2), respectively. As described in detail below, a congestion condition is detected in a transmit buffer 20 of one of the network switch ports 16 if the stored number of bytes in the corresponding transmit buffer is detected as exceeding one of the thresholds T1 and T2, where T1 is less than T2.
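The threshold comparison performed by the congestion monitor can be sketched as follows. This is a minimal illustration in Python, assuming the thresholds are held as byte counts; the class name CongestionMonitor, the method name level, and the numeric level codes 0, 1, and 2 are hypothetical and do not appear in the disclosed embodiment.

```python
from dataclasses import dataclass


@dataclass
class CongestionMonitor:
    """Hypothetical model of the two programmable threshold registers."""
    t1: int  # first (lower) threshold, in bytes
    t2: int  # second (higher) threshold, in bytes

    def __post_init__(self) -> None:
        # The description requires T1 to be less than T2.
        if not self.t1 < self.t2:
            raise ValueError("T1 must be less than T2")

    def level(self, stored_bytes: int) -> int:
        """Return 0 (no congestion), 1 (above T1), or 2 (above T2)."""
        if stored_bytes > self.t2:
            return 2
        if stored_bytes > self.t1:
            return 1
        return 0


monitor = CongestionMonitor(t1=512, t2=768)
levels = [monitor.level(n) for n in (512, 513, 769)]
```

The comparison is strict, matching the wording "exceeding" in the description: a buffer holding exactly T1 bytes is not yet treated as congested.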
As described above, one problem with employing flow control in full-duplex links is head of line blocking. Since one input buffer may include data packets to be directed by the switching logic 22 to different output buffers 20, issuing a PAUSE frame having a substantially long pause interval to a network node 14 may result in traffic starvation on other output buffers 20.
According to the disclosed embodiment, a PAUSE frame is sent to at least one network node 14ᵢ that specifies a pause interval having a duration based on the corresponding traffic contribution by that network node 14ᵢ in creating the congestion condition. For example, assume that the network switch 12 is a non-blocking switch, where data packets are switched between ports 16 at wire speed (e.g., 100 Mb/s). Assume that the congestion monitor 24 detects a congestion condition in the transmit buffer 20₄, where the stored number of bytes in the transmit buffer for output buffer 20₄ exceeds the lower threshold T1. The output buffer 20₄ may be congested, for example, by input buffers 18₁, 18₂, and 18₃ transmitting packet data to the output buffer 20₄. Upon detecting the congestion condition in the output buffer 20₄, the network switch 12 outputs PAUSE frames from the remaining network switch ports having created the congestion condition (i.e., ports 16₁, 16₂, and 16₃), each with a pause time that is proportional to the traffic flowing from the corresponding port to the congested output buffer.
Specifically, the network switch 12 includes a data traffic monitor 30 configured for monitoring network traffic throughout the switch 12, including the routing of a received data packet from a given input buffer 18ᵢ to at least one output buffer 20ᵢ. The data monitor 30 monitors the network traffic through the switching logic 22, for example, by receiving management information base (MIB) objects from the switching logic 22 for each data packet routed through the switching logic 22. As such, the data monitor 30 monitors for each output buffer 20 the total network traffic received by that output buffer 20, as well as the relative contribution by each input buffer 18 to the total network traffic received by that output buffer 20.
FIG. 2 is a diagram illustrating a memory 40 in the data monitor 30 for storing packet data traffic information based on data packets switched by the switching logic 22. As shown in FIG. 2, the memory 40 is configured as a matrix, referred to as a Traffic Table, that stores for each output buffer 20 the contributions from the input buffers relative to the corresponding received network traffic. For example, the output buffer 20₁, corresponding to column 1, receives forty percent (40%) of the total traffic from input buffer 18₂, forty percent (40%) from input buffer 18₃, and twenty percent (20%) from input buffer 18₄. Hence, input buffer 18₂ provides a 40% traffic contribution to output buffer 20₁, input buffer 18₃ provides a 40% traffic contribution to output buffer 20₁, and input buffer 18₄ provides a 20% traffic contribution to the output buffer 20₁. Thus, the data monitor 30 identifies, for each destination network switch port 20, the traffic contributions from the respective source ports 18 relative to the total network traffic received by the destination network switch port 20, and stores for each destination port 20 the identified traffic contributions for the source ports in terms of percentage of total traffic. Hence, the data monitor module 30 can determine the relative traffic contribution by each input buffer 18 in causing a congestion condition in one of the output buffers 20. The traffic contributions stored in the memory 40 thus enable generation of pause frames having different durations based on the corresponding traffic contribution, as well as enabling a priority-based pause interval generation scheme, where a single PAUSE frame is sent to the network node providing the highest amount of traffic, or PAUSE frames are sent to all the remaining network nodes based on the relative traffic contribution, as described below.
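One possible way to maintain such a traffic table is sketched below in Python. The description states only that the contributions are stored as percentages of total traffic; accumulating byte counts per source and destination port and deriving the percentages on demand is an assumption of this sketch, as are the names TrafficTable, record, and contributions.

```python
from collections import defaultdict
from typing import Dict


class TrafficTable:
    """Hypothetical traffic table: one column per destination port."""

    def __init__(self) -> None:
        # _bytes[destination port][source port] -> bytes switched so far
        self._bytes: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))

    def record(self, src_port: int, dst_port: int, nbytes: int) -> None:
        """Account for one packet switched from src_port to dst_port."""
        self._bytes[dst_port][src_port] += nbytes

    def contributions(self, dst_port: int) -> Dict[int, float]:
        """Return each source port's share of dst_port traffic, in percent."""
        column = self._bytes.get(dst_port, {})
        total = sum(column.values())
        if total == 0:
            return {}
        return {src: 100.0 * n / total for src, n in column.items()}


# Reproduce the column 1 example: ports 2, 3, and 4 feed output port 1.
table = TrafficTable()
table.record(src_port=2, dst_port=1, nbytes=400)
table.record(src_port=3, dst_port=1, nbytes=400)
table.record(src_port=4, dst_port=1, nbytes=200)
column_1 = table.contributions(1)
```

With the byte counts chosen above, column_1 holds the 40%, 40%, and 20% contributions given in the example for column 1.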
FIGS. 3A and 3B are flow diagrams illustrating alternative arrangements for controlling detected congestion conditions in the network switch 12 according to first and second embodiments of the present invention, respectively. FIG. 3A describes a method for controlling congestion detected in a network switch port, where the remaining network switch ports (i.e., the network switch ports supplying data to the congested network output buffer 20) output PAUSE control frames to the respective network nodes 14, each PAUSE control frame having a corresponding pause interval for the corresponding network node 14 based on the corresponding traffic contribution specified in the traffic table 40 of FIG. 2.
As shown in FIG. 3A, the method begins in step 42 by setting a threshold T1 in the programmable register 26. The data monitor 30 then begins to monitor traffic contributions in step 44 by monitoring the routing of data packets via the switching logic 22. Specifically, the switching logic 22 supplies an input packet from an input buffer 18ᵢ to at least one destination output buffer in step 44. The data monitor 30 then updates the traffic table 40 in step 46 by populating the table entries for each output buffer after the input packet is supplied in step 44.
The congestion monitor 24 monitors the stored number of bytes in each of the transmit buffers 20ᵢ, and detects in step 48 whether the stored number of bytes in any transmit buffer 20ᵢ exceeds the congestion threshold T1. The congestion monitor 24 may check each destination output buffer 20ᵢ after each switched packet (e.g., on a per-packet basis), or alternatively operate as a state machine that continuously monitors the output buffer 20ᵢ. If none of the transmit buffers for the output buffers 20ᵢ has a stored number of data bytes greater than the threshold T1, then no congestion condition is present. As shown in FIG. 3A, the data monitor 30 continues to monitor the data traffic, and updates the identified traffic contributions for the respective source ports for each destination port each time a packet is switched (i.e., supplied) to an output buffer.
If in step 48 the congestion monitor 24 determines that the stored number of bytes in one of the transmit buffers (e.g., transmit buffer 20₄) exceeds the prescribed T1 threshold, the pause controller 32 generates a PAUSE control frame for each remaining output buffer by determining the pause interval for each of the remaining network switch ports based on the corresponding traffic contribution. Specifically, the pause controller 32 accesses the traffic table 40 in step 49 to determine the corresponding traffic contribution, and outputs to each of the remaining network nodes 14 in step 50 the corresponding PAUSE control frame specifying the corresponding pause interval.
For example, assuming that the congestion monitor 24 detected the output buffer 20₄ as having a congestion condition, the pause controller 32 would access column 4 of the traffic table 40 to determine the respective traffic contributions 10%, 20%, and 70% for the input buffers 18₁, 18₂, and 18₃, respectively. The pause controller 32 then generates a PAUSE control frame for each corresponding network node 14₁, 14₂, and 14₃, where the pause interval for node 14₁ is PAUSE1 = 1*Pmin, where Pmin equals the minimum pause interval. According to the disclosed embodiment, the minimum pause interval is one slot time, although alternative minimum pause intervals may be used, and the maximum pause interval is 64 K slot times. Similarly, PAUSE2 = 2*Pmin, and PAUSE3 = 7*Pmin. Upon calculating the pause durations, the PAUSE control frames carrying the pause intervals PAUSE1, PAUSE2, and PAUSE3 are output via the output buffers to nodes 14₁, 14₂, and 14₃, respectively. Hence, node 14₃ receives a PAUSE frame specifying a pause interval having the largest duration of 7*Pmin, since the traffic table 40 specifies that the node 14₃ supplies input buffer 18₃ with 70% of the traffic encountered by the output buffer 20₄.
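The arithmetic of this example can be sketched in Python as follows. The example implies one multiple of Pmin per ten percentage points of traffic contribution (10% gives 1, 20% gives 2, 70% gives 7); the description does not state a general rule, so the step of ten percent, the rounding, the floor of one Pmin for any nonzero contribution, and the cap at the largest 16-bit value are all assumptions of the sketch. The function name pause_intervals and its parameters are hypothetical.

```python
from typing import Dict


def pause_intervals(
    contributions: Dict[int, float],
    p_min: int = 1,           # minimum pause interval, in slot times
    step_percent: float = 10.0,  # assumed: one Pmin per 10% of traffic
    p_max: int = 0xFFFF,      # assumed cap: largest 16-bit pause value
) -> Dict[int, int]:
    """Map each contributing source port to a pause interval in slot times."""
    intervals: Dict[int, int] = {}
    for port, percent in contributions.items():
        if percent <= 0:
            continue  # a port with no contribution is not paused
        multiplier = max(1, round(percent / step_percent))
        intervals[port] = min(multiplier * p_min, p_max)
    return intervals


# Column 4 of the example: ports 1, 2, and 3 contribute 10%, 20%, and 70%.
column_4 = {1: 10.0, 2: 20.0, 3: 70.0}
result = pause_intervals(column_4)
```

With Pmin equal to one slot time, result maps ports 1, 2, and 3 to 1, 2, and 7 slot times, matching PAUSE1, PAUSE2, and PAUSE3 above.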
FIG. 3B is a flow diagram illustrating an alternative method for controlling congestion, where thresholds T1 and T2 are used to selectively output PAUSE control frames prioritized on the basis of the maximum traffic contribution.
The method of FIG. 3B begins in step 52, where the thresholds T1 and T2 are set in programmable registers 26 and 28, respectively. The thresholds T1 and T2 may have threshold values, for example, of 50% and 75% of buffer capacity, respectively. The data monitor 30 then begins to monitor traffic and update the traffic table 40 in steps 44 and 46, respectively. The congestion monitor 24 monitors the buffer capacity of each of the output buffers 20ᵢ to determine in step 54 a first-level congestion condition, where the stored number of bytes is greater than the first threshold T1. As described above, the congestion monitor 24 may independently monitor the congestion conditions.
Assuming the output buffer 20₄ has a number of bytes greater than the first threshold T1, the pause controller 32 identifies the input buffer 18ᵢ with the maximum traffic contribution by accessing the column of the traffic table 40 corresponding to the congested output buffer 20₄ in step 55. The pause controller 32 identifies in step 56 the input buffer 18₃ as having the maximum traffic contribution (70%) for the output buffer 20₄, and outputs a PAUSE frame via the output buffer 20₃ to the network node 14₃, where the PAUSE control frame specifies a pause interval of Xmax*Pmin, where Xmax = 7 (step 58). The congestion monitor 24 checks in step 60 whether the stored number of bytes in the output buffer 20ᵢ is greater than the second threshold T2. If the stored number of bytes is less than T2, then normal operations are resumed in step 44.
Hence, the arrangement of FIG. 3B enables prioritized generation of a PAUSE control frame, where a first PAUSE control frame is sent to the identified network switch port having the maximum traffic contribution in causing the detected congestion condition. Hence, the unnecessary generation of PAUSE frames to other network nodes having a minimal contribution to the congestion condition is avoided.
If in step 60 the stored number of bytes in the congested output buffer 20ᵢ is greater than the higher threshold T2, then the pause controller 32 outputs to each of the remaining network switch ports (e.g., 20₁, 20₂, and 20₃) a corresponding PAUSE frame to control the congestion of the congested network switch port 20₄, as described above in FIG. 3A with respect to step 50.
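The two-level selection of FIG. 3B can be summarized by the following Python sketch. It collapses steps 54 through 60 into a single decision per check of the congested buffer, which is a simplification: in the flow described above, the node with the maximum contribution is paused first, and the remaining nodes are paused only if the second threshold is then found to be exceeded. The function name select_pause_targets and its parameters are hypothetical.

```python
from typing import Dict, List


def select_pause_targets(
    stored_bytes: int,
    t1: int,
    t2: int,
    contributions: Dict[int, float],
) -> List[int]:
    """Return the source ports whose nodes should receive a PAUSE frame."""
    # Only ports that actually feed the congested buffer are candidates.
    active = {port: pct for port, pct in contributions.items() if pct > 0}
    if stored_bytes <= t1 or not active:
        return []  # no congestion condition, or nothing to pause
    if stored_bytes > t2:
        return sorted(active)  # second level: pause every contributing port
    # First level: pause only the port with the maximum contribution.
    return [max(active, key=active.get)]


# Thresholds at 50% and 75% of an assumed 1024-byte buffer.
column_4 = {1: 10.0, 2: 20.0, 3: 70.0}
first_level = select_pause_targets(600, 512, 768, column_4)
second_level = select_pause_targets(800, 512, 768, column_4)
```

Here first_level names only port 3, the 70% contributor, while second_level names ports 1, 2, and 3. The pause interval for each selected port would then be derived from its traffic contribution as in FIG. 3A.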
According to the disclosed embodiment, a data monitor module within the network switch monitors the data utilization between ports to provide efficient generation of PAUSE frames without unnecessarily reducing network activity from network nodes that do not provide a substantial contribution to a congestion condition. Rather, a PAUSE control frame is output to at least one network node from a corresponding network switch port, where the PAUSE control frame specifies a pause interval having a duration based on the traffic contribution by the corresponding network node in creating the congestion condition.
While this invention has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.