This application is a continuation of U.S. patent application Ser. No. 10/200,189, filed Jul. 23, 2003, currently pending.
FIELD OF THE INVENTION The present invention relates generally to digital network communications, and specifically to network adapters for interfacing between a computing device and a packet data network.
BACKGROUND OF THE INVENTION The computer industry is moving toward fast, packetized, serial input/output (I/O) bus architectures, in which computing hosts and peripherals are linked by a switching network, commonly referred to as a switching fabric. A number of architectures of this type have been proposed, culminating in the “InfiniBand™” (IB) architecture, which has been advanced by a consortium led by a group of industry leaders (including Intel, Sun Microsystems, Hewlett Packard, IBM, Compaq, Dell and Microsoft). The IB architecture is described in detail in the InfiniBand Architecture Specification, Release 1.0 (October, 2000), which is incorporated herein by reference. This document is available from the InfiniBand Trade Association at www.infinibandta.org.
Computing devices (hosts or peripherals) connect to the IB fabric via a network interface adapter, which is referred to in IB parlance as a channel adapter. The IB specification defines both a host channel adapter (HCA) for connecting a host processor to the fabric, and a target channel adapter (TCA), intended mainly for connecting peripheral devices to the fabric. Typically, the channel adapter is implemented as a single chip, with connections to the computing device and to the network. Client processes (referred to hereinafter as clients) running on a host processor communicate with the transport layer of the IB fabric by manipulating a transport service instance, known as a “queue pair” (QP), made up of a send work queue and a receive work queue. The IB specification permits the HCA to allocate as many as 16 million (2²⁴) QPs, each with a distinct queue pair number (QPN). A given client may open and use multiple QPs simultaneously.
To send and receive messages over the network using a HCA, the client initiates work requests (WRs), which cause work items, called work queue elements (WQEs), to be placed onto the appropriate work queues. Normally, each WR has a data buffer associated with it, to be used for holding the data that is sent or received in executing the WQE. Each QP has its own WQE chain and associated data buffers. Each WQE in the chain and the buffer associated with it are passed to the control of the HCA when the WQE is posted. The HCA then executes the WQEs, so as to communicate with the corresponding QP of the channel adapter at the other end of the link. After it has finished servicing a WQE, the HCA typically writes a completion queue element (CQE) to a completion queue, to be read by the client. The buffer associated with the WQE is freed for use by the client only after the CQE is generated.
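By way of illustration, the following minimal C sketch shows this lifecycle: a client posts a receive WQE whose buffer passes to the control of the HCA, and the buffer is reclaimed only after the corresponding CQE is read. All type, field, and function names here are illustrative assumptions, not structures defined by the IB specification.

```c
#include <stdint.h>

struct wqe {                 /* one work queue element */
    uint64_t buf_addr;       /* data buffer associated with this WQE */
    uint32_t buf_len;
    uint32_t opcode;
};

struct cqe {                 /* one completion queue element */
    uint32_t qpn;            /* QP on which the work completed */
    uint32_t status;
    uint64_t wqe_id;         /* identifies the WQE that completed */
};

/* The client posts a receive WQE; the buffer passes to the control of the
 * HCA and is freed for reuse only after the corresponding CQE appears. */
static void post_recv_wqe(struct wqe *rq, uint32_t *tail, uint32_t rq_size,
                          uint64_t buf, uint32_t len)
{
    rq[*tail].buf_addr = buf;
    rq[*tail].buf_len  = len;
    rq[*tail].opcode   = 0;          /* receive operation */
    *tail = (*tail + 1) % rq_size;   /* advance the receive queue tail */
}
```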
The QP that initiates a particular operation, i.e. injects a message into the fabric, is referred to as the requester, while the QP that receives the message is referred to as the responder. An IB operation is defined to include a request message generated by the requester and, as appropriate, its corresponding response generated by the responder. (Not all request messages have responses.) Each message consists of one or more IB packets. Typically, a given HCA will serve simultaneously both as a requester, transmitting requests and receiving responses on behalf of local clients, and as a responder, receiving requests from other channel adapters and returning responses accordingly. Each QP is configured for a certain transport service type, which determines how the requesting and responding QPs interact. Both the source and destination QPs must be configured for the same service type. The IB specification defines four service types: reliable connection, unreliable connection, reliable datagram and unreliable datagram.
IB request messages include, inter alia, remote direct memory access (RDMA) write and send requests, RDMA read requests, and atomic read-modify-write requests. Both RDMA write and send requests carry data sent by the requester and cause the responder to write the data to a memory address at its own end of the link. Whereas RDMA write requests specify the address in the remote responder's memory to which the data are to be written, send requests rely on the responder to determine the memory location at the request destination. The send operation is sometimes referred to as a “push” operation, since the initiator of the data transfer pushes data to the remote QP. The receiving node's channel adapter places the data into the next available receive buffer for that QP. The send operation is also referred to as having channel semantics, because it moves data much like a mainframe I/O channel: each packet of data is tagged with a discriminator, and the destination processor chooses where to place the data based on the discriminator. In the case of IB send packets, the discriminator is the destination address (i.e., the local identifier, or LID) of the receiving channel adapter and the destination QP number.
To specify the receive buffers to use for incoming send requests received by a channel adapter, a client on the host computing device must generate receive WQEs and place them in the receive queues of the appropriate QPs. Each time a valid send request is received, the destination channel adapter takes the next WQE from the receive queue and places the received data in the memory location specified in that WQE. Thus, every valid incoming send request engenders a receive queue operation by the responder.
It follows from this paradigm of send message handling that the destination channel adapter can receive and process incoming send packets on a given QP only when there is an appropriate WQE waiting to be read from the receive queue of the QP. To meet this requirement, the host computing device must prepare and hold in memory at least one receive WQE for every QP that is configured to receive send messages. When an incoming send packet arrives at the destination channel adapter on a given QP, and there is no receive WQE available, the channel adapter cannot process the packet and must therefore discard it. In the case of reliable services, when there is no receive WQE on hand, the channel adapter returns a “Receiver Not Ready” (RNR) NACK packet to the requester. The requester may then retry the send request after a suitable waiting period has passed.
To avoid this situation, the IB specification provides a flow control mechanism for send messages using reliable connection services, based on end-to-end credits. As a rule, a requester cannot send a request message unless it has the appropriate credits to do so. These credits are passed to the requester by the responder, wherein each credit represents the resources needed by the responder to receive one inbound request message. Specifically, each credit represents one WQE posted to the receive queue of the responding QP.
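The credit rule can be summarized in a few lines of C. The sketch below is an assumed illustration of the credit accounting only; the structure and function names are not taken from the IB specification, and the actual grant encoding (piggybacked in acknowledgment packets) is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

struct rc_requester {
    uint32_t credits;        /* credits most recently granted by responder */
};

/* A requester may transmit a send request only while it holds credits,
 * each credit standing for one WQE posted to the responder's receive
 * queue. */
static bool may_send(const struct rc_requester *r)
{
    return r->credits > 0;   /* no credit, no send request */
}

static void consume_credit(struct rc_requester *r)
{
    r->credits--;            /* one request message spends one credit */
}

static void update_credits(struct rc_requester *r, uint32_t granted)
{
    r->credits = granted;    /* grants arrive with the responder's replies */
}
```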
Given the large number of QPs (up to 16 million) that can be in use at any one time, the need to keep a WQE available in every receive queue can consume a great deal of memory. Practically speaking, it is much more efficient for both the host computing device and the channel adapter to create and maintain several WQEs in the receive queue at any given time, thus increasing even further the memory and computing resources needed for each QP. It can be seen that a prohibitive amount of memory is thus required if a large complement of QPs is to be supported, as provided by the IB specification.
SUMMARY OF THE INVENTION It is an object of some aspects of the present invention to provide methods for receive queue management that enable efficient handling of incoming send messages without restricting the number of QPs that a channel adapter can support, and without consuming excessive amounts of memory. It is a further object of these aspects of the present invention to provide channel adapters that implement such methods.
It is a further object of some aspects of the present invention to provide improved methods for allocating buffers to receive data pushed over a network to a destination device using multiple transport service instances.
In preferred embodiments of the present invention, a network adapter is configured to receive incoming messages over a network, containing data to be pushed to local memory locations that are to be determined by the receiving network adapter. The messages are carried over the network using multiple different transport service instances. At least some of the multiple transport service instances share a common pool of receive descriptors, indicating the buffers in the local memory to which the data are to be pushed. When the adapter receives a message on any of these transport service instances that contains data to be pushed to the local memory, it reads the next descriptor from the pool and writes the data to the indicated location. The shared pool of descriptors thus obviates the need to maintain a separate queue of receive descriptors for each transport service instance.
In some preferred embodiments of the present invention, the network comprises an IB switch fabric. In these embodiments, the network adapter is a host channel adapter (HCA), the messages pushing data to the adapter are send messages, the transport service instances correspond to QPs serviced by the HCA, and the descriptors correspond to WQEs that would ordinarily be placed in the receive queues of the QPs. Sharing a pool of descriptors among multiple QPs, in place of maintaining an individual receive queue for each QP, is not an option that is provided by the IB specification. This novel technique, however, allows channel adapters to support a large number of QPs without allocating excessive amounts of memory to hold WQEs on all the receive queues. It also reduces the burden imposed on the host processor to monitor all the receive queues and keep them filled with descriptors. Instead, the host processor need deal with only a single pool of descriptors that can serve many QPs.
The methods of memory allocation provided by the present invention are conducive to balanced communication system design. In balanced designs, host applications running in conjunction with a channel adapter should, on average, consume data at the rate the data arrives from the network. This rate depends on the speed of the network, and not on the number of connections (QPs) served. Therefore, even when the host is serving many connections at once, the total amount of memory that must be allocated by the host to hold incoming data from the channel adapter should depend mainly (or exclusively) on the expected fluctuations in the rates of data arrival and processing. In this respect, the IB specification is wasteful of memory, as it requires memory allocation to scale up with the number of open connections. The present invention provides an alternative solution that maintains balance and efficiency in the use of memory resources.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for network communication, including:
providing a pool of descriptors to be shared among a plurality of transport service instances used in communicating over a network, each of the descriptors in the pool including a scatter list indicating a buffer that is available in a local memory;
receiving a message over the network on one of the transport service instances, the message including data to be pushed to the local memory; and
responsive to receiving the message, reading one of the descriptors from the pool, and writing the data included in the message to the buffer indicated by the scatter list included in the one of the descriptors.
Preferably, providing the pool of descriptors includes writing the descriptors to the local memory using a host processor, and receiving the message includes receiving the message at a network adapter, which reads the one of the descriptors and writes the data to the buffer for processing by the host processor. In a preferred embodiment, the network includes a switch fabric, and the network adapter includes a channel adapter. Most preferably, writing the data includes, upon completion of writing the data to the buffer, submitting a completion report from the channel adapter to the host processor, the completion report including a pointer to the scatter list for use by the host processor in processing the data.
Preferably, providing the pool of descriptors includes providing multiple pools of descriptors, each belonging to a respective group of the transport service instances, and reading the one of the descriptors includes determining the group to which the one of the transport service instances belongs, and reading the one of the descriptors from the pool belonging to the group.
Further preferably, providing the pool of descriptors includes writing the descriptors to a circular list having a head and a tail, and reading the one of the descriptors includes selecting the one of the descriptors from the head of the list. Most preferably, selecting the one of the descriptors includes comparing the head and the tail of the list, and abstaining from writing the data to the buffer in the local memory if the head and tail coincide.
In a preferred embodiment, receiving the message includes receiving one or more packets having payloads including the data. Preferably, the network includes a switch fabric, and the transport service instances include queue pairs, and the message includes a send message conveyed over the fabric by a remote requester. Most preferably, each of the descriptors corresponds to a work queue element in a receive queue of one of the queue pairs.
There is also provided, in accordance with a preferred embodiment of the present invention, a network adapter, including:
a network interface, adapted to receive a message over a network on one of a plurality of transport service instances, the message including data to be pushed to a local memory;
a host interface, adapted to be coupled to a host processor and to the local memory associated therewith; and
message processing circuitry, coupled between the network interface and the host interface, which circuitry is adapted, responsive to receiving the message at the network interface, to read a descriptor from a pool of descriptors that is shared among the plurality of transport service instances, each of the descriptors in the pool including a scatter list indicating a buffer that is available in the local memory, and to write the data included in the message via the host interface to the buffer indicated by the scatter list included in the descriptor.
Preferably, the descriptors in the pool are prepared in the local memory by the host processor, and the message processing circuitry is adapted to read the descriptors from the pool via the host interface.
The present invention will be more fully understood from the following detailed description of the preferred embodiments thereof, taken together with the drawings in which:
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram that schematically illustrates a network communication system, in accordance with a preferred embodiment of the present invention;
FIG. 2 is a block diagram that schematically shows details of a host network adapter, in accordance with a preferred embodiment of the present invention;
FIG. 3 is a block diagram that schematically illustrates data structures used by a network adapter in processing incoming messages, in accordance with a preferred embodiment of the present invention; and
FIG. 4 is a flow chart that schematically illustrates a method for processing an incoming message, in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS FIG. 1 is a block diagram that schematically illustrates an IB network communication system 20, in accordance with a preferred embodiment of the present invention. In system 20, a HCA 22 couples a host processor 24 to an IB network (or fabric) 26. Typically, processor 24 comprises an Intel Pentium™ processor or other general-purpose computing device with suitable software. HCA 22 typically communicates via network 26 with other HCAs, such as a remote HCA 28 with a remote host 30, as well as with target channel adapters (TCAs), such as a TCA 32 connected to an input/output (I/O) device 34.
Host 24 and HCA 22 are connected to a local system memory 38 via a suitable memory controller 36, as is known in the art. The HCA and memory typically occupy certain ranges of physical addresses in a defined address space on a bus connected to the controller, such as a Peripheral Component Interconnect (PCI) bus. In addition to the host operating system, applications and other data (not shown), memory 38 holds data structures that are accessed and used by HCA 22. These data structures preferably include QP context information 42 maintained by the HCA, and descriptors 44 corresponding to WQEs to be carried out by HCA 22. Although memory 38 is shown in FIG. 1 as a single unit, which holds both control information and message payload data, the functions of the memory may be broken up among several units for purposes of convenient organization and access by host 24 and HCA 22. The term system memory, as used in the present patent application and in the claims, should be understood broadly to encompass all areas of local memory that can be accessed by host 24.
Descriptors 44 are written to memory 38 by client processes running on host 24. They include send descriptors, corresponding to outgoing request messages to be sent over fabric 26 by HCA 22, and receive descriptors, used by the HCA to handle incoming send messages from remote requesters, such as HCA 28. The send descriptors are placed in the appropriate send queues of QPs for service by HCA 22, and are not of direct relevance to the present invention. At least a portion of the receive descriptors, however, are not placed directly in the receive queues of individual QPs. Rather, they are held in a descriptor pool, as described below, which is shared among multiple QPs. Each of the receive descriptors contains a scatter list, comprising one or more scatter entries, each indicating a range of addresses in memory 38 to which HCA 22 should write the data contained in the send message. Preferably, each scatter entry includes a base address and a length of the data to be written beginning at the base address.
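A receive descriptor of this kind might be represented as in the following C sketch; the field names and the fixed maximum list length are assumptions for illustration.

```c
#include <stdint.h>

struct scatter_entry {
    uint64_t base_addr;      /* start of a buffer in memory 38 */
    uint32_t length;         /* bytes that may be written at base_addr */
};

#define MAX_SCATTER_ENTRIES 4   /* assumed; fixed per pool in practice */

struct recv_descriptor {
    uint32_t num_entries;                           /* valid scatter entries */
    uint32_t ctrl;                                  /* control/signaling bits */
    struct scatter_entry sge[MAX_SCATTER_ENTRIES];  /* the scatter list */
};
```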
FIG. 2 is a block diagram that schematically shows details of HCA 22, in accordance with a preferred embodiment of the present invention. For the sake of simplicity, elements of HCA 22 that are not essential to an understanding of the present invention are omitted. The blocks and links that must be added to create a fully-operational HCA will be apparent to those skilled in the art. Further details of such a HCA are described in U.S. patent application Ser. No. 10/000,456, filed Dec. 4, 2001, which is assigned to the assignee of the present patent application, and whose disclosure is incorporated herein by reference.
The various blocks that make up HCA 22 may be implemented either as hardware circuits or as software processes running on a programmable processor, or as a combination of hardware- and software-implemented elements. Although certain functional elements of HCA 22 are shown as separate blocks in the figure for the sake of conceptual clarity, the functions represented by these blocks may actually be carried out by different software processes on a single embedded processor. Preferably, all of the elements of the HCA are implemented in a single integrated circuit chip, but multi-chip implementations are also within the scope of the present invention.
Incoming packets from fabric 26 are received by HCA 22 at an input port 50, which serves as a network interface. A transport check unit (TCU) 52 processes and verifies transport-layer information contained in the incoming packets, in order to confirm the validity of the packets and to determine how they are to be handled. For this purpose, the TCU reads the destination QP of each incoming packet, and then looks up the relevant context information 42 for the QP. Preferably, a cache 54 holds a copy of at least a portion of the context information that is required by the elements of HCA 22 for processing incoming and outgoing packets on active QPs. If the TCU does not find the required context information in cache 54, the information is loaded into the cache from memory 38. Further details of the operation of cache 54 are described in U.S. patent application Ser. No. 10/052,413, entitled “Queue Pair Context Cache,” filed Jan. 23, 2002, which is assigned to the assignee of the present patent application and whose disclosure is incorporated herein by reference.
When an incoming packet contains data to be written to memory 38, such as an RDMA write or send request packet, TCU 52 passes the packet to a receive data engine (RDE) 56, which attends to executing such requests. A write engine 58 in RDE 56 writes the packet data, via a translation protection table (TPT) 58, to addresses in memory 38 that are specified in the applicable scatter list. TPT 58 acts as a host interface in HCA 22, performing address translation and protection checks to control access to memory 38 both by elements of HCA 22 and by other, remote entities connected to network 26. Each RDMA write message carries its own scatter list, which is prepared by the remote requester. To process incoming send messages, however, write engine 58 must read a receive descriptor from memory 38, and use the scatter list provided by the descriptor. Processing of incoming send messages by RDE 56 is described in greater detail hereinbelow. After processing of an incoming send message has been completed (and likewise, processing of other types of messages, when required), a completion reporter 60 writes a CQE to a completion queue in memory 38. Write engine 58 and completion reporter 60 also use QP and completion queue context information that is held in cache 54.
Preferably, when a given QP on HCA 22 is configured to receive send messages, the QP can be set up by host 24 either to have its own queue of receive descriptors, as prescribed by the IB specification, or to share descriptors in a pool with other QPs. Most preferably, multiple pools of this sort are supported by HCA 22. The configuration of the QP as a pool member is preferably indicated by a flag in QP context 42, as well as by a field in the context identifying the pool to which the QP belongs.
FIG. 3 is a block diagram that schematically illustrates data structures 70 maintained in memory 38, which are used by RDE 56 in processing incoming send messages, in accordance with a preferred embodiment of the present invention. These structures are built around a descriptor pool 71, which serves a group of QPs. Other QPs may have their own, individual receive queues (not shown in the figures), as provided by the IB specification. Optionally, for more efficient operation of RDE 56, some or all of the data in structures 70 are copied to cache 54, as well.
Host 24 writes descriptors 78, or WQEs, to each open descriptor pool 71. Preferably, for efficient operation, the host writes sequences of multiple descriptors without waiting for the pool to empty. The descriptors are preferably arranged in a cyclic buffer. When the host writes a descriptor to the buffer, it moves a producer index (PI) so that it points to the tail of the pool, i.e., to the last descriptor it has created. A consumer index (CI) points to the head of the queue and is manipulated by RDE 56 to indicate the next descriptor to be read from the pool. For each pool 71, the producer and consumer indices are recorded and maintained relative to a predetermined base address. The use of these pointers is described in greater detail hereinbelow.
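The following C sketch illustrates the host side of this cyclic buffer. The names are assumptions, and the index convention shown (PI pointing at the next free slot, with one slot kept empty so that PI == CI unambiguously means the pool is empty) is an assumed variant consistent with the index check at step 90 described below.

```c
#include <stdint.h>
#include <string.h>

struct descriptor_pool {
    uint8_t *ring;          /* base address of pool 71 in memory 38 */
    uint32_t desc_size;     /* size of each descriptor, per the DPT entry */
    uint32_t size;          /* total descriptor slots in the pool */
    volatile uint32_t pi;   /* tail: advanced by host 24 */
    volatile uint32_t ci;   /* head: advanced by RDE 56 */
};

/* Host 24 appends one descriptor and publishes the new PI. Returns 0 on
 * success, -1 if the pool is full (one slot is kept empty so that PI == CI
 * means "empty" rather than "full"). */
static int host_post_descriptor(struct descriptor_pool *p, const void *desc)
{
    uint32_t next = (p->pi + 1) % p->size;
    if (next == p->ci)
        return -1;                                       /* pool full */
    memcpy(p->ring + (size_t)p->pi * p->desc_size, desc, p->desc_size);
    p->pi = next;    /* written to the PI address recorded in the DPT entry */
    return 0;
}
```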
Upon receiving a send message on a given QP, and determining that the QP belongs to a receive descriptor pool, RDE 56 reads a pool number 72 for the QP from QP context information 42 (which is preferably held in cache 54). The pool number serves as an index (relative to a predetermined base address) to an entry 76 in a descriptor pool table (DPT) 74. Entry 76 contains information for use by the RDE in finding descriptors 78 to read from the descriptor pool 71 that is assigned to this QP. The same entry 76 is used for all the QPs belonging to the same pool.
Each entry 76 in DPT 74 preferably includes the following information, as reflected in the sketch following this list:
- Start address—base address of pool 71 in memory 38.
- Size of the descriptors in this pool. The size determines the length of the scatter lists that can be used.
- Total size of the pool, i.e., the maximum number of descriptors that the pool can hold. When the producer or consumer index reaches this value, it wraps back to the base address.
- Owner of the pool (client software or HCA hardware—when the owner is “hardware,” it means that the descriptors in the pool are available for use by the HCA).
- Producer index address—memory location to which host 24 writes and updates the value of the producer index of pool 71.
- Consumer index.
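The following C sketch mirrors these fields in a hypothetical descriptor pool table entry, together with the lookup performed using pool number 72; all names are assumptions for illustration.

```c
#include <stdint.h>

enum pool_owner { OWNER_SOFTWARE, OWNER_HARDWARE };

struct dpt_entry {
    uint64_t start_addr;   /* base address of pool 71 in memory 38 */
    uint32_t desc_size;    /* descriptor size; bounds the scatter list length */
    uint32_t pool_size;    /* maximum descriptors; indices wrap at this value */
    enum pool_owner owner; /* OWNER_HARDWARE: descriptors usable by the HCA */
    uint64_t pi_addr;      /* where host 24 writes the producer index */
    uint32_t ci;           /* consumer index */
};

/* Pool number 72 indexes the table relative to its base address. */
static struct dpt_entry *dpt_lookup(struct dpt_entry *dpt_base,
                                    uint32_t pool_number)
{
    return &dpt_base[pool_number];
}
```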
Each descriptor 78 comprises one or more scatter entries, each indicating a buffer in memory 38 to which write engine 58 should write the data contained in an incoming send message. Preferably, each scatter entry includes a base address and a length of the data that can be written beginning at the base address. In addition, descriptor 78 may include other fields used for control and signaling to HCA 22. The structure of descriptors 78 in pool 71 is preferably the same as that of the descriptors that are placed in the receive queues of QPs that are not pool members.
FIG. 4 is a flow chart that schematically illustrates a method by which HCA 22 processes incoming send request messages, in accordance with a preferred embodiment of the present invention. The method is initiated when TCU 52 receives a send packet from a remote requester via fabric 26, at a send reception step 80. After completing the required transport checks, the TCU passes the packet to RDE 56 for service. Note that IB send messages may comprise multiple packets, depending on the volume of data carried by the message and the maximum transfer unit (MTU) of the links over which the message travels. In the description that follows, it is assumed that the packet received at step 80 is the first or only packet in the send message. For multi-packet messages, the same descriptor that is fetched and used to scatter the data in the first packet is retained by RDE 56 for use in processing the subsequent packets in the message, as well.
Write engine 58 determines the destination QP of the send packet, based on the packet header, and then looks up the context of the QP in cache 54, at a pool membership checking step 82. As noted above, the context indicates whether or not this QP belongs to a descriptor pool. If the QP is not a pool member, then in order to receive a send message, there must be a WQE available in the specific receive queue of this QP. The write engine reads the WQE address from the QP context and then fetches the WQE from the receive queue, at a descriptor fetching step 84. It then processes the send message in the usual way, as provided by the IB specification.
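The branch taken at step 82 can be sketched as follows in C; the context fields and helper functions are assumed for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct qp_context {
    bool     pool_member;    /* flag in QP context 42 */
    uint32_t pool_number;    /* pool number 72; valid when pool_member */
    uint64_t rq_wqe_addr;    /* next WQE in this QP's own receive queue */
};

void fetch_from_pool(uint32_t pool_number);        /* assumed: steps 86-96 */
void fetch_from_receive_queue(uint64_t wqe_addr);  /* assumed: step 84 */

/* Step 82: choose the source of the next receive descriptor for the QP. */
static void fetch_receive_descriptor(const struct qp_context *qpc)
{
    if (qpc->pool_member)
        fetch_from_pool(qpc->pool_number);
    else
        fetch_from_receive_queue(qpc->rq_wqe_addr);
}
```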
If write engine 58 determines at step 82 that the destination QP does, in fact, belong to a receive descriptor pool, it reads the number of the pool from the QP context, at a pool number reading step 86. It uses this number to find the information necessary to access the descriptor pool 71 to which this QP belongs, at an information lookup step 88. This information is typically contained in entry 76 in table 74 (FIG. 3), which is indexed by pool number 72. Additionally, in order to access descriptor pool 71 in memory 38, the write engine may need an access key, as is known in the art. This key is typically held in the QP context, and is preferably the same for all QPs belonging to the pool.
Using the information in entry 76, write engine 58 reads the consumer index (CI) and producer index (PI) of descriptor pool 71, at an index checking step 90. If the values of these indices are equal, it means that all descriptors 78 in pool 71 have already been used. Without a valid descriptor, the write engine is unable to process the current send packet. Under these circumstances, the send packet is typically discarded. If the send packet was sent on a reliable service, write engine 58 instructs a send data engine (not shown) in HCA 22 to return an RNR NACK packet to the sender, at a NACK return step 92. The sender may subsequently resend the packet. Meanwhile, in such a case, the write engine preferably triggers an event, at an event submission step 94, which is placed in an event queue to be read by host 24. Optionally, an interrupt may be generated, as well, to prompt the host to service the event queue. When the host reads the event, it will be alerted to the fact that descriptors 78 in pool 71 have been exhausted. The host software should then generate new descriptors to replenish the pool.
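The empty-pool path of steps 90-94 can be sketched as follows; the helper functions are assumptions standing in for the send data engine and event queue machinery.

```c
#include <stdbool.h>
#include <stdint.h>

void return_rnr_nack(void);       /* assumed helper: send data engine, step 92 */
void post_pool_empty_event(void); /* assumed helper: event queue entry, step 94 */

/* Step 90: equal indices mean every descriptor 78 has been consumed. */
static bool pool_empty(uint32_t ci, uint32_t pi)
{
    return ci == pi;
}

static void handle_send_without_descriptor(bool reliable_service)
{
    /* With no valid scatter list, the send packet cannot be processed
     * and is discarded. */
    if (reliable_service)
        return_rnr_nack();        /* the sender may retry after a delay */
    post_pool_empty_event();      /* may also raise an interrupt so that
                                     the host replenishes the pool */
}
```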
As long as the values of CI and PI are not equal, write engine 58 reads descriptor 78 from the head of the circular buffer in pool 71, at the location indicated by the CI, at a descriptor reading step 96. It increments the CI to point to the next descriptor in the pool, at an index incrementation step 98. The write engine then uses the scatter list provided by the descriptor it has read in processing the send packet data, at a packet processing step 100. To perform this processing, the write engine reads the first scatter entry from the scatter list in descriptor 78, which points to the first buffer in memory 38 to receive the data. The write engine pushes the data from the packet to this first buffer, until the buffer is filled. It then reads the next scatter entry, and continues pushing the data to the location that this scatter entry indicates. For multi-packet send messages, as long as HCA 22 continues to receive additional packets in the same message, the write engine proceeds through the scatter list entries of the descriptor it has read from the pool, until the message is completed.
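Steps 96-100 can be sketched in C as follows. The types are abbreviated from the earlier sketches, and a plain memcpy stands in for the HCA's DMA write (which in practice passes through TPT 58 for address translation and protection checks).

```c
#include <stdint.h>
#include <string.h>

struct scatter_entry { uint64_t base_addr; uint32_t length; };
struct recv_descriptor { uint32_t num_entries; struct scatter_entry sge[4]; };

/* Step 96 and step 98: take the descriptor at the head of the pool and
 * advance the consumer index, wrapping at the pool size. */
static const struct recv_descriptor *
consume_descriptor(const struct recv_descriptor *ring,
                   uint32_t *ci, uint32_t pool_size)
{
    const struct recv_descriptor *d = &ring[*ci];
    *ci = (*ci + 1) % pool_size;
    return d;
}

/* Step 100: fill each buffer in the scatter list in turn until the
 * payload is exhausted. */
static void scatter_payload(const struct recv_descriptor *d,
                            const uint8_t *payload, uint32_t len)
{
    for (uint32_t i = 0; i < d->num_entries && len > 0; i++) {
        uint32_t n = len < d->sge[i].length ? len : d->sge[i].length;
        memcpy((void *)(uintptr_t)d->sge[i].base_addr, payload, n);
        payload += n;
        len -= n;
    }
}
```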
Upon completion of an incoming send message, write engine 58 instructs completion reporter 60 to generate a completion queue element (CQE), at a CQE generation step 102. The completion reporter places the CQE in a completion queue in memory 38, to be read by client software on host 24. Optionally, an event or interrupt may also be generated to notify the host that there are new data in memory 38 waiting to be read. Preferably, the CQE indicates the QP on which the incoming send message was received and includes a pointer to the descriptor 78 in pool 71 that was used in processing the message that has now been completed. Host 24 reads the scatter list from the descriptor in order to determine the location of the data to be read in memory 38. Once the host has read the data, the descriptor is no longer needed and can be overwritten by a new descriptor.
As noted above, for send messages using reliable connection services, the IB specification provides a flow control mechanism based on end-to-end credits. Typically, each credit represents one WQE posted to the receive queue of the responding QP. A QP that draws its WQEs from a shared descriptor pool, however, has no WQEs posted to its receive queue. Instead, such QPs may send credits to the corresponding requesters based on the number of descriptors 78 posted to pool 71 (preferably a smaller number of credits on each QP than there are actual descriptors in the pool). As long as an appropriate statistical relationship is maintained between the number of credits and the number of descriptors in the pool, there will usually be a descriptor available to handle each send message that arrives. Alternatively, even if the QPs belonging to pool 71 do not send credits to their corresponding requesters, or if a requester exhausts its credits, the requester may still transmit send packets in limited mode, as described in section 9.7.7.2.5 of the IB specification.
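One possible credit policy of this kind is sketched below; the proportional rule is purely an assumed example of one such "appropriate statistical relationship," not a formula given in the specification.

```c
#include <stdint.h>

/* Grant each QP sharing the pool a fraction of the pool occupancy, so
 * that fewer credits are outstanding per QP than there are descriptors
 * in the pool. */
static uint32_t credits_for_qp(uint32_t descriptors_in_pool,
                               uint32_t active_pool_qps)
{
    if (active_pool_qps == 0 || descriptors_in_pool == 0)
        return 0;
    uint32_t share = descriptors_in_pool / active_pool_qps;
    return share > 0 ? share : 1;   /* at least one credit while any remain */
}
```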
Although preferred embodiments are described herein with specific reference to IB terminology and conventions, the principles of the present invention may similarly be applied to the handling of data “push” operations and message transfers using channel semantics in networks of other types. For example, the methods described hereinabove can be used in protocol bridge applications, in which multiple connections on a first network are served by a single sink to a second network. In this manner, multiple hosts on the first network (for instance, on an IB fabric) can be connected to a converter that channels their traffic to the second network (such as an Ethernet network). By means of this mechanism, the amount of memory required by the protocol bridge is substantially reduced.
It will thus be appreciated that the preferred embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.