InfiniBand and Remote DMA (RDMA) Interfaces

Introduction and Overview

TBD

InfiniBand core interfaces

struct iwpm_nlmsg_request *iwpm_get_nlmsg_request(__u32 nlmsg_seq, u8 nl_client, gfp_t gfp)

Allocate and initialize netlink message request

Parameters

__u32 nlmsg_seq

Sequence number of the netlink message

u8 nl_client

The index of the netlink client

gfp_t gfp

Indicates how the memory for the request should be allocated

Description

Returns the newly allocated netlink request object if successful, otherwise returns NULL

void iwpm_free_nlmsg_request(struct kref *kref)

Deallocate netlink message request

Parameters

struct kref *kref

Holds reference of netlink message request

struct iwpm_nlmsg_request *iwpm_find_nlmsg_request(__u32 echo_seq)

Find netlink message request in the request list

Parameters

__u32 echo_seq

Sequence number of the netlink request to find

Description

Returns the netlink message request if found, otherwise returns NULL

int iwpm_wait_complete_req(struct iwpm_nlmsg_request *nlmsg_request)

Block while servicing the netlink request

Parameters

struct iwpm_nlmsg_request *nlmsg_request

Netlink message request to service

Description

Wakes up after the request is completed or has expired. Returns 0 if the request completed without error

int iwpm_get_nlmsg_seq(void)

Get the sequence number for a netlink message to send to the port mapper

Parameters

void

no arguments

Description

Returns the sequence number for the netlink message.

void iwpm_add_remote_info(struct iwpm_remote_info *reminfo)

Add remote address info of the connecting peer to the remote info hash table

Parameters

struct iwpm_remote_info *reminfo

The remote info to be added

u32 iwpm_check_registration(u8 nl_client, u32 reg)

Check if the client registration matches the given one

Parameters

u8 nl_client

The index of the netlink client

u32 reg

The given registration type to compare with

Description

Call iwpm_register_pid() to register a client. Returns true if the client registration matches reg, otherwise returns false

void iwpm_set_registration(u8 nl_client, u32 reg)

Set the client registration

Parameters

u8 nl_client

The index of the netlink client

u32 reg

Registration type to set

u32 iwpm_get_registration(u8 nl_client)

Get the client registration

Parameters

u8 nl_client

The index of the netlink client

Description

Returns the client registration type

int iwpm_send_mapinfo(u8 nl_client, int iwpm_pid)

Send local and mapped IPv4/IPv6 address info of a client to the user space port mapper

Parameters

u8 nl_client

The index of the netlink client

int iwpm_pid

The pid of the user space port mapper

Description

If successful, returns the number of sent mapping info records

int iwpm_mapinfo_available(void)

Check if any mapping info records are available in the hash table

Parameters

void

no arguments

Description

Returns 1 if mapping information is available, otherwise returns 0

int iwpm_compare_sockaddr(struct sockaddr_storage *a_sockaddr, struct sockaddr_storage *b_sockaddr)

Compare two sockaddr storage structs

Parameters

struct sockaddr_storage *a_sockaddr

first sockaddr to compare

struct sockaddr_storage *b_sockaddr

second sockaddr to compare

Return

0 if they are holding the same ip/tcp address info, otherwise returns 1
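
The comparison contract above (0 for the same IP address and TCP port, 1 otherwise) can be illustrated with a userspace sketch. compare_sockaddr_sketch() and make_v4() are hypothetical names used only for illustration; this is a model of the documented behaviour, not the kernel implementation, and it assumes that a differing or unknown address family counts as a mismatch.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Userspace model only (hypothetical helper, not the kernel function).
 * Returns 0 when both addresses carry the same family, IP address and
 * TCP port, and 1 otherwise.
 */
static int compare_sockaddr_sketch(const struct sockaddr_storage *a,
                                   const struct sockaddr_storage *b)
{
    assert(a && b);

    if (a->ss_family != b->ss_family)
        return 1;

    if (a->ss_family == AF_INET) {
        const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
        const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

        return !(a4->sin_addr.s_addr == b4->sin_addr.s_addr &&
                 a4->sin_port == b4->sin_port);
    }

    if (a->ss_family == AF_INET6) {
        const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
        const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;

        return !(memcmp(&a6->sin6_addr, &b6->sin6_addr,
                        sizeof(a6->sin6_addr)) == 0 &&
                 a6->sin6_port == b6->sin6_port);
    }

    /* Assumption of this sketch: unknown families never match. */
    return 1;
}

/* Hypothetical helper: build an IPv4 address inside a sockaddr_storage. */
static struct sockaddr_storage make_v4(unsigned int ip_host_order,
                                       unsigned short port)
{
    struct sockaddr_storage ss;
    struct sockaddr_in *in4 = (struct sockaddr_in *)&ss;

    memset(&ss, 0, sizeof(ss));
    in4->sin_family = AF_INET;
    in4->sin_addr.s_addr = htonl(ip_host_order);
    in4->sin_port = htons(port);
    return ss;
}
```

Two addresses built with the same IP and port compare equal (0); changing only the port yields 1.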

int iwpm_validate_nlmsg_attr(struct nlattr *nltb[], int nla_count)

Check for NULL netlink attributes

Parameters

struct nlattr *nltb[]

Holds the address of each netlink message attribute

int nla_count

Number of netlink message attributes

Description

Returns error if any of the nla_count attributes is NULL
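
A minimal userspace sketch of such a NULL check follows. struct nlattr_model and validate_attrs_sketch() are hypothetical stand-ins, and the sketch assumes that slot 0 of the table is unused (attribute type 0 is conventionally unspecified in netlink tables) and that the error is -EINVAL; the kernel function may differ in both respects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's struct nlattr; only its address matters here. */
struct nlattr_model {
    int type;
};

/*
 * Hypothetical model of a NULL-attribute check. Slot 0 is skipped on the
 * assumption that attribute type 0 is unused in the table.
 */
static int validate_attrs_sketch(struct nlattr_model *tb[], int nla_count)
{
    assert(tb);

    for (int i = 1; i < nla_count; i++) {
        if (tb[i] == NULL)
            return -EINVAL;   /* a required attribute is missing */
    }
    return 0;
}
```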

struct sk_buff *iwpm_create_nlmsg(u32 nl_op, struct nlmsghdr **nlh, int nl_client)

Allocate skb and form a netlink message

Parameters

u32 nl_op

Netlink message opcode

struct nlmsghdr **nlh

Holds address of the netlink message header in skb

int nl_client

The index of the netlink client

Description

Returns the newly allocated skb, or NULL if the tailroom of the skb is insufficient to store the message header and payload

int iwpm_parse_nlmsg(struct netlink_callback *cb, int policy_max, const struct nla_policy *nlmsg_policy, struct nlattr *nltb[], const char *msg_type)

Validate and parse the received netlink message

Parameters

struct netlink_callback *cb

Netlink callback structure

int policy_max

Maximum attribute type to be expected

const struct nla_policy *nlmsg_policy

Validation policy

struct nlattr *nltb[]

Array to store policy_max parsed elements

const char *msg_type

Type of netlink message

Description

Returns 0 on success or a negative error code

void iwpm_print_sockaddr(struct sockaddr_storage *sockaddr, char *msg)

Print IPv4/IPv6 address and TCP port

Parameters

struct sockaddr_storage *sockaddr

Socket address to print

char *msg

Message to print

int iwpm_send_hello(u8 nl_client, int iwpm_pid, u16 abi_version)

Send hello response to iwpmd

Parameters

u8 nl_client

The index of the netlink client

int iwpm_pid

The pid of the user space port mapper

u16 abi_version

The kernel’s abi_version

Description

Returns 0 on success or a negative error code

int ib_process_cq_direct(struct ib_cq *cq, int budget)

process a CQ in caller context

Parameters

struct ib_cq *cq

CQ to process

int budget

number of CQEs to poll for

Description

This function is used to process all outstanding CQ entries. It does not offload CQ processing to a different context and does not ask for completion interrupts from the HCA. Using direct processing on CQ with non IB_POLL_DIRECT type may trigger concurrent processing.

Note

do not pass -1 as budget unless it is guaranteed that the number of completions that will be processed is small.

struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe, int comp_vector, enum ib_poll_context poll_ctx, const char *caller)

allocate a completion queue

Parameters

struct ib_device *dev

device to allocate the CQ for

void *private

driver private data, accessible from cq->cq_context

int nr_cqe

number of CQEs to allocate

int comp_vector

HCA completion vectors for this CQ

enum ib_poll_context poll_ctx

context to poll the CQ from.

const char *caller

module owner name.

Description

This is the proper interface to allocate a CQ for in-kernel users. A CQ allocated with this interface will automatically be polled from the specified context. The ULP must use wr->wr_cqe instead of wr->wr_id to use this CQ abstraction.

struct ib_cq *__ib_alloc_cq_any(struct ib_device *dev, void *private, int nr_cqe, enum ib_poll_context poll_ctx, const char *caller)

allocate a completion queue

Parameters

struct ib_device *dev

device to allocate the CQ for

void *private

driver private data, accessible from cq->cq_context

int nr_cqe

number of CQEs to allocate

enum ib_poll_context poll_ctx

context to poll the CQ from

const char *caller

module owner name

Description

Attempt to spread ULP Completion Queues over each device’s interrupt vectors. A simple best-effort mechanism is used.

void ib_free_cq(struct ib_cq *cq)

free a completion queue

Parameters

struct ib_cq *cq

completion queue to free.

struct ib_cq *ib_cq_pool_get(struct ib_device *dev, unsigned int nr_cqe, int comp_vector_hint, enum ib_poll_context poll_ctx)

Find the least used completion queue that matches a given cpu hint (or least used for wild card affinity) and fits nr_cqe.

Parameters

struct ib_device *dev

rdma device

unsigned int nr_cqe

number of needed cqe entries

int comp_vector_hint

completion vector hint (-1 for the driver to assign a comp vector based on an internal counter)

enum ib_poll_context poll_ctx

cq polling context

Description

Finds a cq that satisfies the comp_vector_hint and nr_cqe requirements and claims entries in it for us. In case there is no available cq, allocates a new cq with the requirements and adds it to the device pool. IB_POLL_DIRECT cannot be used for shared cqs so it is not a valid value for poll_ctx.
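
The selection policy described above (least used CQ that matches the hint and still has room, otherwise a new CQ added to the pool) can be modelled in userspace as below. Every name here (cq_model, cq_pool_model, pool_get_sketch(), pool_put_sketch(), POOL_MAX, POOL_CQ_SIZE) is hypothetical, and the fixed pool capacity, the fixed size of a new CQ, and the use of vector 0 for a wild card hint are simplifying assumptions of the sketch rather than kernel behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_MAX 8          /* arbitrary pool capacity for this sketch */
#define POOL_CQ_SIZE 1024   /* assumed size of each newly created CQ */

struct cq_model {
    int comp_vector;
    unsigned int cqe_total;
    unsigned int cqe_used;
};

struct cq_pool_model {
    struct cq_model cqs[POOL_MAX];
    size_t count;
};

/* Pick the least used CQ that matches the hint and has nr_cqe free entries. */
static struct cq_model *pool_get_sketch(struct cq_pool_model *pool,
                                        unsigned int nr_cqe, int vector_hint)
{
    struct cq_model *best = NULL;

    assert(pool && nr_cqe <= POOL_CQ_SIZE);

    for (size_t i = 0; i < pool->count; i++) {
        struct cq_model *cq = &pool->cqs[i];

        if (vector_hint >= 0 && cq->comp_vector != vector_hint)
            continue;                   /* wrong completion vector */
        if (cq->cqe_total - cq->cqe_used < nr_cqe)
            continue;                   /* not enough free entries */
        if (!best || cq->cqe_used < best->cqe_used)
            best = cq;                  /* least used candidate so far */
    }

    if (!best) {
        if (pool->count == POOL_MAX)
            return NULL;                /* limit that exists only in this sketch */
        best = &pool->cqs[pool->count++];
        best->comp_vector = vector_hint >= 0 ? vector_hint : 0;
        best->cqe_total = POOL_CQ_SIZE;
        best->cqe_used = 0;
    }

    best->cqe_used += nr_cqe;           /* claim the entries for the caller */
    return best;
}

/* Return previously claimed entries to the CQ they were taken from. */
static void pool_put_sketch(struct cq_model *cq, unsigned int nr_cqe)
{
    assert(cq && cq->cqe_used >= nr_cqe);
    cq->cqe_used -= nr_cqe;
}
```

In this model a caller returns exactly the number of entries it claimed, mirroring the nr_cqe argument of ib_cq_pool_put().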

void ib_cq_pool_put(struct ib_cq *cq, unsigned int nr_cqe)

Return a CQ taken from a shared pool.

Parameters

struct ib_cq *cq

The CQ to return.

unsigned int nr_cqe

The max number of cqes that the user had requested.

int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id)

Initiates listening on the specified service ID for connection and service ID resolution requests.

Parameters

struct ib_cm_id *cm_id

Connection identifier associated with the listen request.

__be64 service_id

Service identifier matched against incoming connection and service ID resolution requests. The service ID should be specified in network byte order. If set to IB_CM_ASSIGN_SERVICE_ID, the CM will assign a service ID to the caller.

struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device, ib_cm_handler cm_handler, __be64 service_id)

Create a new listening ib_cm_id and listen on the given service ID.

Parameters

struct ib_device *device

Device associated with the cm_id. All related communication will be associated with the specified device.

ib_cm_handler cm_handler

Callback invoked to notify the user of CM events.

__be64 service_id

Service identifier matched against incoming connection and service ID resolution requests. The service ID should be specified in network byte order. If set to IB_CM_ASSIGN_SERVICE_ID, the CM will assign a service ID to the caller.

Description

If there’s an existing ID listening on that same device and service ID, return it.

Callers should call ib_destroy_cm_id when done with the listener ID.

int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num, struct scatterlist *sg, u32 sg_cnt, u32 sg_offset, u64 remote_addr, u32 rkey, enum dma_data_direction dir)

initialize a RDMA READ/WRITE context

Parameters

struct rdma_rw_ctx *ctx

context to initialize

struct ib_qp *qp

queue pair to operate on

u32 port_num

port num to which the connection is bound

struct scatterlist *sg

scatterlist to READ/WRITE from/to

u32 sg_cnt

number of entries in sg

u32 sg_offset

current byte offset into sg

u64 remote_addr

remote address to read/write (relative to rkey)

u32 rkey

remote key to operate on

enum dma_data_direction dir

DMA_TO_DEVICE for RDMA WRITE, DMA_FROM_DEVICE for RDMA READ

Description

Returns the number of WQEs that will be needed on the workqueue if successful, or a negative error code.

int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num, struct scatterlist *sg, u32 sg_cnt, struct scatterlist *prot_sg, u32 prot_sg_cnt, struct ib_sig_attrs *sig_attrs, u64 remote_addr, u32 rkey, enum dma_data_direction dir)

initialize a RW context with signature offload

Parameters

struct rdma_rw_ctx *ctx

context to initialize

struct ib_qp *qp

queue pair to operate on

u32 port_num

port num to which the connection is bound

struct scatterlist *sg

scatterlist to READ/WRITE from/to

u32 sg_cnt

number of entries in sg

struct scatterlist *prot_sg

scatterlist to READ/WRITE protection information from/to

u32 prot_sg_cnt

number of entries in prot_sg

struct ib_sig_attrs *sig_attrs

signature offloading algorithms

u64 remote_addr

remote address to read/write (relative to rkey)

u32 rkey

remote key to operate on

enum dma_data_direction dir

DMA_TO_DEVICE for RDMA WRITE, DMA_FROM_DEVICE for RDMA READ

Description

Returns the number of WQEs that will be needed on the workqueue if successful, or a negative error code.

struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr)

return chain of WRs for a RDMA READ or WRITE operation

Parameters

struct rdma_rw_ctx *ctx

context to operate on

struct ib_qp *qp

queue pair to operate on

u32 port_num

port num to which the connection is bound

struct ib_cqe *cqe

completion queue entry for the last WR

struct ib_send_wr *chain_wr

WR to append to the posted chain

Description

Return the WR chain for the set of RDMA READ/WRITE operations described by ctx, as well as any memory registration operations needed. If chain_wr is non-NULL the WR it points to will be appended to the chain of WRs posted. If chain_wr is not set cqe must be set so that the caller gets a completion notification.

int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr)

post a RDMA READ or RDMA WRITE operation

Parameters

struct rdma_rw_ctx *ctx

context to operate on

struct ib_qp *qp

queue pair to operate on

u32 port_num

port num to which the connection is bound

struct ib_cqe *cqe

completion queue entry for the last WR

struct ib_send_wr *chain_wr

WR to append to the posted chain

Description

Post the set of RDMA READ/WRITE operations described by ctx, as well as any memory registration operations needed. If chain_wr is non-NULL the WR it points to will be appended to the chain of WRs posted. If chain_wr is not set cqe must be set so that the caller gets a completion notification.

void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num, struct scatterlist *sg, u32 sg_cnt, enum dma_data_direction dir)

release all resources allocated by rdma_rw_ctx_init

Parameters

struct rdma_rw_ctx *ctx

context to release

struct ib_qp *qp

queue pair to operate on

u32 port_num

port num to which the connection is bound

struct scatterlist *sg

scatterlist that was used for the READ/WRITE

u32 sg_cnt

number of entries in sg

enum dma_data_direction dir

DMA_TO_DEVICE for RDMA WRITE, DMA_FROM_DEVICE for RDMA READ

void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num, struct scatterlist *sg, u32 sg_cnt, struct scatterlist *prot_sg, u32 prot_sg_cnt, enum dma_data_direction dir)

release all resources allocated by rdma_rw_ctx_signature_init

Parameters

struct rdma_rw_ctx *ctx

context to release

struct ib_qp *qp

queue pair to operate on

u32 port_num

port num to which the connection is bound

struct scatterlist *sg

scatterlist that was used for the READ/WRITE

u32 sg_cnt

number of entries in sg

struct scatterlist *prot_sg

scatterlist that was used for the READ/WRITE of the PI

u32 prot_sg_cnt

number of entries in prot_sg

enum dma_data_direction dir

DMA_TO_DEVICE for RDMA WRITE, DMA_FROM_DEVICE for RDMA READ

unsigned int rdma_rw_mr_factor(struct ib_device *device, u32 port_num, unsigned int maxpages)

return number of MRs required for a payload

Parameters

struct ib_device *device

device handling the connection

u32 port_num

port num to which the connection is bound

unsigned int maxpages

maximum payload pages per rdma_rw_ctx

Description

Returns the number of MRs the device requires to move a payload of maxpages pages. The returned value is used during transport creation to compute max_rdma_ctxts and the size of the transport’s Send and Send Completion Queues.
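
As a rough illustration of the arithmetic, the sketch below assumes the MR count is the ceiling of the payload page count divided by the number of pages a single MR can cover. mr_factor_sketch() and its pages_per_mr parameter are hypothetical; how the real per-MR page limit is derived from the device is not described here.

```c
#include <assert.h>

/*
 * Hypothetical model: number of MRs needed to cover maxpages pages when
 * one MR can map at most pages_per_mr pages (ceiling division).
 */
static unsigned int mr_factor_sketch(unsigned int maxpages,
                                     unsigned int pages_per_mr)
{
    assert(pages_per_mr > 0);
    return (maxpages + pages_per_mr - 1) / pages_per_mr;
}
```

With a limit of 256 pages per MR, a 256-page payload needs one MR and a 257-page payload needs two.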

bool rdma_dev_access_netns(const struct ib_device *dev, const struct net *net)

Return whether an rdma device can be accessed from a specified net namespace or not.

Parameters

const struct ib_device *dev

Pointer to rdma device which needs to be checked

const struct net *net

Pointer to net namespace for which access is to be checked

Description

When the rdma device is in shared mode, it ignores the net namespace. When the rdma device is exclusive to a net namespace, rdma device net namespace is checked against the specified one.

bool rdma_dev_has_raw_cap(const struct ib_device *dev)

Returns whether a specified rdma device has CAP_NET_RAW capability or not.

Parameters

const struct ib_device *dev

Pointer to rdma device whose capability is to be checked

Description

Returns true if a rdma device’s owning user namespace has CAP_NET_RAW capability, otherwise false. When the rdma subsystem is in legacy shared network namespace mode, the default net namespace is considered.

void ib_device_put(struct ib_device *device)

Release IB device reference

Parameters

struct ib_device *device

device whose reference is to be released

Description

ib_device_put() releases reference to the IB device to allow it to be unregistered and eventually freed.

struct ib_device *ib_device_get_by_name(const char *name, enum rdma_driver_id driver_id)

Find an IB device by name

Parameters

const char *name

The name to look for

enum rdma_driver_id driver_id

The driver ID that must match (RDMA_DRIVER_UNKNOWN matches all)

Description

Find and hold an ib_device by its name. The caller must call ib_device_put() on the returned pointer.

struct ib_device *_ib_alloc_device(size_t size, struct net *net)

allocate an IB device struct

Parameters

size_t size

size of structure to allocate

struct net *net

network namespace device should be located in, namespace must stay valid until ib_register_device() is completed.

Description

Low-level drivers should use ib_alloc_device() to allocate struct ib_device. size is the size of the structure to be allocated, including any private data used by the low-level driver. ib_dealloc_device() must be used to free structures allocated with ib_alloc_device().

void ib_dealloc_device(struct ib_device *device)

free an IB device struct

Parameters

struct ib_device *device

structure to free

Description

Free a structure allocated with ib_alloc_device().

const struct ib_port_immutable *ib_port_immutable_read(struct ib_device *dev, unsigned int port)

Read rdma port’s immutable data

Parameters

struct ib_device *dev

IB device

unsigned int port

port number whose immutable data to read. It starts with index 1 and is valid up to and including rdma_end_port().

int ib_register_device(struct ib_device *device, const char *name, struct device *dma_device)

Register an IB device with IB core

Parameters

struct ib_device *device

Device to register

const char *name

unique string device name. This may include a ‘%’ which will cause a unique index to be added to the passed device name.

struct device *dma_device

pointer to a DMA-capable device. If NULL, then the IB device will be used. In this case the caller should fully setup the ibdev for DMA. This usually means using dma_virt_ops.

Description

Low-level drivers use ib_register_device() to register their devices with the IB core. All registered clients will receive a callback for each device that is added. device must be allocated with ib_alloc_device().

If the driver uses ops.dealloc_driver and calls any ib_unregister_device() asynchronously then the device pointer may become freed as soon as this function returns.

void ib_unregister_device(struct ib_device *ib_dev)

Unregister an IB device

Parameters

struct ib_device *ib_dev

The device to unregister

Description

Unregister an IB device. All clients will receive a remove callback.

Callers should call this routine only once, and protect against races with registration. Typically it should only be called as part of a remove callback in an implementation of driver core’s struct device_driver and related.

If ops.dealloc_driver is used then ib_dev will be freed upon return from this function.

void ib_unregister_device_and_put(struct ib_device *ib_dev)

Unregister a device while holding a ‘get’

Parameters

struct ib_device *ib_dev

The device to unregister

Description

This is the same as ib_unregister_device(), except it includes an internal ib_device_put() that should match a ‘get’ obtained by the caller.

It is safe to call this routine concurrently from multiple threads while holding the ‘get’. When the function returns the device is fully unregistered.

Drivers using this flow MUST use the driver_unregister callback to clean up their resources associated with the device and dealloc it.

void ib_unregister_driver(enum rdma_driver_id driver_id)

Unregister all IB devices for a driver

Parameters

enum rdma_driver_id driver_id

The driver to unregister

Description

This implements a fence for device unregistration. It only returns once all devices associated with the driver_id have fully completed their unregistration and returned from ib_unregister_device*().

If devices are not yet unregistered it goes ahead and starts unregistering them.

This does not block creation of new devices with the given driver_id; that is the responsibility of the caller.

void ib_unregister_device_queued(struct ib_device *ib_dev)

Unregister a device using a work queue

Parameters

struct ib_device *ib_dev

The device to unregister

Description

This schedules an asynchronous unregistration using a WQ for the device. A driver should use this to avoid holding locks while doing unregistration, such as holding the RTNL lock.

Drivers using this API must use ib_unregister_driver before module unload to ensure that all scheduled unregistrations have completed.

int ib_register_client(struct ib_client *client)

Register an IB client

Parameters

struct ib_client *client

Client to register

Description

Upper level users of the IB drivers can use ib_register_client() to register callbacks for IB device addition and removal. When an IB device is added, each registered client’s add method will be called (in the order the clients were registered), and when a device is removed, each client’s remove method will be called (in the reverse order that clients were registered). In addition, when ib_register_client() is called, the client will receive an add callback for all devices already registered.
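
The callback ordering described above (add in registration order, remove in reverse order) can be shown with a small userspace model. client_model, register_client_sketch(), device_added_sketch(), device_removed_sketch() and call_log are all hypothetical names; the model records calls in a log instead of invoking real callbacks, and it leaves out the add callbacks a client receives for devices that were registered before it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENTS 4
#define MAX_LOG 16

struct client_model {
    int id;
};

static struct client_model *clients[MAX_CLIENTS];
static size_t nr_clients;
static int call_log[MAX_LOG];   /* +id records an add, -id a remove */
static size_t nr_calls;

/* Append a client to the registration list. */
static void register_client_sketch(struct client_model *c)
{
    assert(c && nr_clients < MAX_CLIENTS);
    clients[nr_clients++] = c;
}

/* A device appears: notify clients in registration order. */
static void device_added_sketch(void)
{
    for (size_t i = 0; i < nr_clients; i++) {
        assert(nr_calls < MAX_LOG);
        call_log[nr_calls++] = clients[i]->id;
    }
}

/* A device goes away: notify clients in reverse registration order. */
static void device_removed_sketch(void)
{
    for (size_t i = nr_clients; i-- > 0; ) {
        assert(nr_calls < MAX_LOG);
        call_log[nr_calls++] = -clients[i]->id;
    }
}
```

Registering clients 1 and 2 and then adding and removing one device leaves the log as 1, 2, -2, -1.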

void ib_unregister_client(struct ib_client *client)

Unregister an IB client

Parameters

struct ib_client *client

Client to unregister

Description

Upper level users use ib_unregister_client() to remove their client registration. When ib_unregister_client() is called, the client will receive a remove callback for each IB device still registered.

This is a full fence: once it returns, no client callbacks will be called or be running in another thread.

void ib_set_client_data(struct ib_device *device, struct ib_client *client, void *data)

Set IB client context

Parameters

struct ib_device *device

Device to set context for

struct ib_client *client

Client to set context for

void *data

Context to set

Description

ib_set_client_data() sets client context data that can be retrieved with ib_get_client_data(). This can only be called while the client is registered to the device; once the ib_client remove() callback returns, this cannot be called.

void ib_register_event_handler(struct ib_event_handler *event_handler)

Register an IB event handler

Parameters

struct ib_event_handler *event_handler

Handler to register

Description

ib_register_event_handler() registers an event handler that will be called back when asynchronous IB events occur (as defined in chapter 11 of the InfiniBand Architecture Specification). This callback occurs in workqueue context.

void ib_unregister_event_handler(struct ib_event_handler *event_handler)

Unregister an event handler

Parameters

struct ib_event_handler *event_handler

Handler to unregister

Description

Unregister an event handler registered with ib_register_event_handler().

int ib_query_port(struct ib_device *device, u32 port_num, struct ib_port_attr *port_attr)

Query IB port attributes

Parameters

struct ib_device *device

Device to query

u32 port_num

Port number to query

struct ib_port_attr *port_attr

Port attributes

Description

ib_query_port() returns the attributes of a port through the port_attr pointer.

int ib_device_set_netdev(struct ib_device *ib_dev, struct net_device *ndev, u32 port)

Associate the ib_dev with an underlying net_device

Parameters

struct ib_device *ib_dev

Device to modify

struct net_device *ndev

net_device to affiliate, may be NULL

u32 port

IB port the net_device is connected to

Description

Drivers should use this to link the ib_device to a netdev so the netdev shows up in interfaces like ib_enum_roce_netdev. Only one netdev may be affiliated with any port.

The caller must ensure that the given ndev is not unregistered or unregistering, and that either the ib_device is unregistered or ib_device_set_netdev() is called with NULL when the ndev sends a NETDEV_UNREGISTER event.

int ib_query_netdev_port(struct ib_device *ibdev, struct net_device *ndev, u32 *port)

Query the port number of a net_device associated with an ibdev

Parameters

struct ib_device *ibdev

IB device

struct net_device *ndev

Network device

u32 *port

IB port the net_device is connected to

struct ib_device *ib_device_get_by_netdev(struct net_device *ndev, enum rdma_driver_id driver_id)

Find an IB device associated with a netdev

Parameters

struct net_device *ndev

netdev to locate

enum rdma_driver_id driver_id

The driver ID that must match (RDMA_DRIVER_UNKNOWN matches all)

Description

Find and hold an ib_device that is associated with a netdev via ib_device_set_netdev(). The caller must call ib_device_put() on the returned pointer.

int ib_query_pkey(struct ib_device *device, u32 port_num, u16 index, u16 *pkey)

Get P_Key table entry

Parameters

struct ib_device *device

Device to query

u32 port_num

Port number to query

u16 index

P_Key table index to query

u16 *pkey

Returned P_Key

Description

ib_query_pkey() fetches the specified P_Key table entry.

int ib_modify_device(struct ib_device *device, int device_modify_mask, struct ib_device_modify *device_modify)

Change IB device attributes

Parameters

struct ib_device *device

Device to modify

int device_modify_mask

Mask of attributes to change

struct ib_device_modify *device_modify

New attribute values

Description

ib_modify_device() changes a device’s attributes as specified by the device_modify_mask and device_modify structure.

int ib_modify_port(struct ib_device *device, u32 port_num, int port_modify_mask, struct ib_port_modify *port_modify)

Modifies the attributes for the specified port.

Parameters

struct ib_device *device

The device to modify.

u32 port_num

The number of the port to modify.

int port_modify_mask

Mask used to specify which attributes of the port to change.

struct ib_port_modify *port_modify

New attribute values for the port.

Description

ib_modify_port() changes a port’s attributes as specified by the port_modify_mask and port_modify structure.

int ib_find_gid(struct ib_device *device, union ib_gid *gid, u32 *port_num, u16 *index)

Returns the port number and GID table index where a specified GID value occurs. It searches only for the IB link layer.

Parameters

struct ib_device *device

The device to query.

union ib_gid *gid

The GID value to search for.

u32 *port_num

The port number of the device where the GID value was found.

u16 *index

The index into the GID table where the GID was found. This parameter may be NULL.

int ib_find_pkey(struct ib_device *device, u32 port_num, u16 pkey, u16 *index)

Returns the PKey table index where a specified PKey value occurs.

Parameters

struct ib_device *device

The device to query.

u32 port_num

The port number of the device to search for the PKey.

u16 pkey

The PKey value to search for.

u16 *index

The index into the PKey table where the PKey was found.

struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u32 port, u16 pkey, const union ib_gid *gid, const struct sockaddr *addr)

Return the appropriate net_dev for a received CM request

Parameters

struct ib_device *dev

An RDMA device on which the request has been received.

u32 port

Port number on the RDMA device.

u16 pkey

The Pkey the request came on.

const union ib_gid *gid

A GID that the net_dev uses to communicate.

const struct sockaddr *addr

Contains the IP address that the request specified as its destination.

struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags, const char *caller)

Allocates an unused protection domain.

Parameters

struct ib_device *device

The device on which to allocate the protection domain.

unsigned int flags

protection domain flags

const char *caller

caller’s build-time module name

Description

A protection domain object provides an association between QPs, shared receive queues, address handles, memory regions, and memory windows.

Every PD has a local_dma_lkey which can be used as the lkey value for local memory operations.

int ib_dealloc_pd_user(struct ib_pd *pd, struct ib_udata *udata)

Deallocates a protection domain.

Parameters

struct ib_pd *pd

The protection domain to deallocate.

struct ib_udata *udata

Valid user data or NULL for kernel object

Description

It is an error to call this function while any resources in the pd still exist. The caller is responsible for synchronously destroying them and guaranteeing that no new allocations will happen.

void rdma_copy_ah_attr(struct rdma_ah_attr *dest, const struct rdma_ah_attr *src)

Copy rdma ah attribute from source to destination.

Parameters

struct rdma_ah_attr *dest

Pointer to destination ah_attr. The contents of the destination are assumed to be invalid and its attributes are overwritten.

const struct rdma_ah_attr *src

Pointer to source ah_attr.

void rdma_replace_ah_attr(struct rdma_ah_attr *old, const struct rdma_ah_attr *new)

Replace valid ah_attr with new one.

Parameters

struct rdma_ah_attr *old

Pointer to existing ah_attr which needs to be replaced. old is assumed to be valid or zero’d

const struct rdma_ah_attr *new

Pointer to the new ah_attr.

Description

rdma_replace_ah_attr() first releases any reference in the old ah_attr if the old ah_attr is valid; after that it copies the new attribute and holds the reference to the replaced ah_attr.

void rdma_move_ah_attr(struct rdma_ah_attr *dest, struct rdma_ah_attr *src)

Move ah_attr pointed by source to destination.

Parameters

struct rdma_ah_attr *dest

Pointer to destination ah_attr to copy to. dest is assumed to be valid or zero’d

struct rdma_ah_attr *src

Pointer to the new ah_attr.

Description

rdma_move_ah_attr() first releases any reference in the destination ah_attr if it is valid. This also transfers ownership of internal references from src to dest, making src invalid in the process. No new reference of the src ah_attr is taken.

struct ib_ah *rdma_create_ah(struct ib_pd *pd, struct rdma_ah_attr *ah_attr, u32 flags)

Creates an address handle for the given address vector.

Parameters

struct ib_pd *pd

The protection domain associated with the address handle.

struct rdma_ah_attr *ah_attr

The attributes of the address vector.

u32 flags

Create address handle flags (see enum rdma_create_ah_flags).

Description

Returns the new address handle on success, or an error pointer (ERR_PTR) on failure. The address handle is used to reference a local or global destination in all UD QP post sends.

struct ib_ah *rdma_create_user_ah(struct ib_pd *pd, struct rdma_ah_attr *ah_attr, struct ib_udata *udata)

Creates an address handle for the given address vector. It resolves destination mac address for ah attribute of RoCE type.

Parameters

struct ib_pd *pd

The protection domain associated with the address handle.

struct rdma_ah_attr *ah_attr

The attributes of the address vector.

struct ib_udata *udata

pointer to user’s input output buffer information needed by provider driver.

Description

Returns the new address handle on success, or an error pointer (ERR_PTR) on failure. The address handle is used to reference a local or global destination in all UD QP post sends.

void rdma_move_grh_sgid_attr(struct rdma_ah_attr *attr, union ib_gid *dgid, u32 flow_label, u8 hop_limit, u8 traffic_class, const struct ib_gid_attr *sgid_attr)

Sets the sgid attribute of GRH, taking ownership of the reference

Parameters

struct rdma_ah_attr *attr

Pointer to AH attribute structure

union ib_gid *dgid

Destination GID

u32 flow_label

Flow label

u8 hop_limit

Hop limit

u8 traffic_class

traffic class

const struct ib_gid_attr *sgid_attr

Pointer to SGID attribute

Description

This takes ownership of the sgid_attr reference. The caller must ensure rdma_destroy_ah_attr() is called before destroying the rdma_ah_attr after calling this function.

void rdma_destroy_ah_attr(struct rdma_ah_attr *ah_attr)

Release reference to SGID attribute of ah attribute.

Parameters

struct rdma_ah_attr *ah_attr

Pointer to ah attribute

Description

Release reference to the SGID attribute of the ah attribute if it is non NULL. It is safe to call this multiple times, and safe to call it on a zero initialized ah_attr.
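
The "safe to call multiple times" property follows naturally if the release also clears the pointer. The userspace sketch below illustrates that pattern; gid_attr_model, ah_attr_model and destroy_ah_attr_sketch() are hypothetical names, and the plain integer refcount is an assumption standing in for whatever reference mechanism the kernel uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SGID attribute and the AH attribute. */
struct gid_attr_model {
    int refcount;
};

struct ah_attr_model {
    struct gid_attr_model *sgid_attr;
};

/*
 * Drop the reference if one is held, then clear the pointer so that a
 * repeated call, or a call on a zero-initialized struct, does nothing.
 */
static void destroy_ah_attr_sketch(struct ah_attr_model *ah)
{
    assert(ah);

    if (ah->sgid_attr) {
        assert(ah->sgid_attr->refcount > 0);
        ah->sgid_attr->refcount--;
        ah->sgid_attr = NULL;
    }
}
```

Calling the function twice on the same attribute drops the reference only once.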

struct ib_srq *ib_create_srq_user(struct ib_pd *pd, struct ib_srq_init_attr *srq_init_attr, struct ib_usrq_object *uobject, struct ib_udata *udata)

Creates a SRQ associated with the specified protection domain.

Parameters

struct ib_pd *pd

The protection domain associated with the SRQ.

struct ib_srq_init_attr *srq_init_attr

A list of initial attributes required to create the SRQ. If SRQ creation succeeds, then the attributes are updated to the actual capabilities of the created SRQ.

struct ib_usrq_object *uobject

uobject pointer if this is not a kernel SRQ

struct ib_udata *udata

udata pointer if this is not a kernel SRQ

Description

srq_attr->max_wr and srq_attr->max_sge are read to determine the requested size of the SRQ, and set to the actual values allocated on return. If ib_create_srq() succeeds, then max_wr and max_sge will always be at least as large as the requested values.

struct ib_qp *ib_create_qp_user(struct ib_device *dev, struct ib_pd *pd, struct ib_qp_init_attr *attr, struct ib_udata *udata, struct ib_uqp_object *uobj, const char *caller)

Creates a QP associated with the specified protection domain.

Parameters

struct ib_device *dev

IB device

struct ib_pd *pd

The protection domain associated with the QP.

struct ib_qp_init_attr *attr

A list of initial attributes required to create the QP. If QP creation succeeds, then the attributes are updated to the actual capabilities of the created QP.

struct ib_udata *udata

User data

struct ib_uqp_object *uobj

uverbs object

const char *caller

caller’s build-time module name

int ib_modify_qp_with_udata(struct ib_qp *ib_qp, struct ib_qp_attr *attr, int attr_mask, struct ib_udata *udata)

Modifies the attributes for the specified QP.

Parameters

struct ib_qp *ib_qp

The QP to modify.

struct ib_qp_attr *attr

On input, specifies the QP attributes to modify. On output, the current values of selected QP attributes are returned.

int attr_mask

A bit-mask used to specify which attributes of the QP are being modified.

struct ib_udata *udata

pointer to user’s input output buffer information

Description

It returns 0 on success and an appropriate error code on error.

struct ib_mr *ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_num_sg)

Allocates a memory region

Parameters

struct ib_pd *pd

protection domain associated with the region

enum ib_mr_type mr_type

memory region type

u32 max_num_sg

maximum sg entries available for registration.

Notes

Memory registration page/sg lists must not exceed max_num_sg. For mr_type IB_MR_TYPE_MEM_REG, the total length cannot exceed max_num_sg * used_page_size.

struct ib_mr *ib_alloc_mr_integrity(struct ib_pd *pd, u32 max_num_data_sg, u32 max_num_meta_sg)

Allocates an integrity memory region

Parameters

struct ib_pd *pd

protection domain associated with the region

u32 max_num_data_sg

maximum data sg entries available for registration

u32 max_num_meta_sg

maximum metadata sg entries available for registration

Notes

Memory registration page/sg lists must not exceed max_num_data_sg; the integrity page/sg lists must not exceed max_num_meta_sg.

struct ib_xrcd *ib_alloc_xrcd_user(struct ib_device *device, struct inode *inode, struct ib_udata *udata)

Allocates an XRC domain.

Parameters

struct ib_device *device

The device on which to allocate the XRC domain.

struct inode *inode

inode to connect XRCD

struct ib_udata *udata

Valid user data or NULL for kernel object

int ib_dealloc_xrcd_user(struct ib_xrcd *xrcd, struct ib_udata *udata)

Deallocates an XRC domain.

Parameters

struct ib_xrcd *xrcd

The XRC domain to deallocate.

struct ib_udata *udata

Valid user data or NULL for kernel object

struct ib_wq *ib_create_wq(struct ib_pd *pd, struct ib_wq_init_attr *wq_attr)

Creates a WQ associated with the specified protection domain.

Parameters

struct ib_pd *pd

The protection domain associated with the WQ.

struct ib_wq_init_attr *wq_attr

A list of initial attributes required to create the WQ. If WQ creation succeeds, then the attributes are updated to the actual capabilities of the created WQ.

Description

wq_attr->max_wr and wq_attr->max_sge determine the requested size of the WQ, and are set to the actual values allocated on return. If ib_create_wq() succeeds, then max_wr and max_sge will always be at least as large as the requested values.

int ib_destroy_wq_user(struct ib_wq *wq, struct ib_udata *udata)

Destroys the specified user WQ.

Parameters

struct ib_wq *wq

The WQ to destroy.

struct ib_udata *udata

Valid user data

int ib_map_mr_sg_pi(struct ib_mr *mr, struct scatterlist *data_sg, int data_sg_nents, unsigned int *data_sg_offset, struct scatterlist *meta_sg, int meta_sg_nents, unsigned int *meta_sg_offset, unsigned int page_size)

Map the dma mapped SG lists for PI (protection information) and set an appropriate memory region for registration.

Parameters

struct ib_mr *mr

memory region

struct scatterlist *data_sg

dma mapped scatterlist for data

int data_sg_nents

number of entries in data_sg

unsigned int *data_sg_offset

offset in bytes into data_sg

struct scatterlist *meta_sg

dma mapped scatterlist for metadata

int meta_sg_nents

number of entries in meta_sg

unsigned int *meta_sg_offset

offset in bytes into meta_sg

unsigned int page_size

page vector desired page size

Description

Constraints:

  • The MR must be allocated with type IB_MR_TYPE_INTEGRITY.

After this completes successfully, the memory region is ready for registration.

Return

0 on success.

int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, unsigned int *sg_offset, unsigned int page_size)

Map the largest prefix of a dma mapped SG list and set it in the memory region.

Parameters

struct ib_mr *mr

memory region

struct scatterlist *sg

dma mapped scatterlist

int sg_nents

number of entries in sg

unsigned int *sg_offset

offset in bytes into sg

unsigned int page_size

page vector desired page size

Description

Constraints:

  • The first sg element is allowed to have an offset.

  • Each sg element must either be aligned to page_size or virtually contiguous to the previous element. In case an sg element has a non-contiguous offset, the mapping prefix will not include it.

  • The last sg element is allowed to have length less than page_size.

  • If sg_nents total byte length exceeds the mr max_num_sg * page_size then only max_num_sg entries will be mapped.

  • If the MR was allocated with type IB_MR_TYPE_SG_GAPS, none of these constraints holds and the page_size argument is ignored.

Returns the number of sg elements that were mapped to the memory region.

After this completes successfully, the memory region is ready for registration.
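The alignment and contiguity constraints above determine how long the mapped prefix can be. The following user-space sketch models only that rule; struct demo_sg and demo_mappable_prefix() are hypothetical names, and the model ignores the max_num_sg limit and the IB_MR_TYPE_SG_GAPS case. It assumes page_size is a power of two.

```c
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist element. */
struct demo_sg {
    uint64_t addr; /* DMA address of the first byte */
    uint32_t len;  /* length in bytes */
};

/*
 * Count how many leading elements can be part of the mapping.
 * An element after the first is accepted if the boundary to the
 * previous element is page aligned on both sides, or if it starts
 * exactly where the previous element ended.
 */
static int demo_mappable_prefix(const struct demo_sg *sg, int nents,
                                uint32_t page_size)
{
    uint64_t mask = (uint64_t)page_size - 1;
    uint64_t prev_end = 0;
    int i;

    for (i = 0; i < nents; i++) {
        uint64_t start = sg[i].addr;

        if (i > 0) {
            int prev_end_aligned = (prev_end & mask) == 0;
            int start_aligned = (start & mask) == 0;

            /* A gap is only tolerated on a page boundary. */
            if ((!prev_end_aligned || !start_aligned) && start != prev_end)
                break;
        }
        prev_end = start + sg[i].len;
    }
    return i;
}
```

A caller of the real function would compare the returned count with sg_nents to detect a partial mapping; the model returns the same kind of count.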

int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents, unsigned int *sg_offset_p, int (*set_page)(struct ib_mr *, u64))

Convert the largest prefix of a sg list to a page vector

Parameters

struct ib_mr *mr

memory region

struct scatterlist *sgl

dma mapped scatterlist

int sg_nents

number of entries in sg

unsigned int *sg_offset_p

IN

start offset in bytes into sg

OUT

offset in bytes for element n of the sg of the first byte that has not been processed where n is the return value of this function.

int (*set_page)(struct ib_mr *, u64)

driver page assignment function pointer

Description

Core service helper for drivers to convert the largest prefix of given sg list to a page vector. The sg list prefix converted is the prefix that meets the requirements of ib_map_mr_sg.

Returns the number of sg elements that were assigned to a page vector.

void ib_drain_sq(struct ib_qp *qp)

Block until all SQ CQEs have been consumed by the application.

Parameters

struct ib_qp *qp

queue pair to drain

Description

If the device has a provider-specific drain function, then call that. Otherwise call the generic drain function __ib_drain_sq().

The caller must:

  • ensure there is room in the CQ and SQ for the drain work request and completion.

  • allocate the CQ using ib_alloc_cq().

  • ensure that there are no other contexts that are posting WRs concurrently. Otherwise the drain is not guaranteed.

void ib_drain_rq(struct ib_qp *qp)

Block until all RQ CQEs have been consumed by the application.

Parameters

struct ib_qp *qp

queue pair to drain

Description

If the device has a provider-specific drain function, then call that. Otherwise call the generic drain function __ib_drain_rq().

The caller must:

  • ensure there is room in the CQ and RQ for the drain work request and completion.

  • allocate the CQ using ib_alloc_cq().

  • ensure that there are no other contexts that are posting WRs concurrently. Otherwise the drain is not guaranteed.

void ib_drain_qp(struct ib_qp *qp)

Block until all CQEs have been consumed by the application on both the RQ and SQ.

Parameters

struct ib_qp *qp

queue pair to drain

Description

The caller must:

  • ensure there is room in the CQ(s), SQ, and RQ for drain work requests and completions.

  • allocate the CQs using ib_alloc_cq().

  • ensure that there are no other contexts that are posting WRs concurrently. Otherwise the drain is not guaranteed.

struct rdma_hw_stats *rdma_alloc_hw_stats_struct(const struct rdma_stat_desc *descs, int num_counters, unsigned long lifespan)

Helper function to allocate dynamic struct for the drivers.

Parameters

const struct rdma_stat_desc *descs

array of static descriptors

int num_counters

number of elements in array

unsigned long lifespan

milliseconds between updates

void rdma_free_hw_stats_struct(struct rdma_hw_stats *stats)

Helper function to release rdma_hw_stats

Parameters

struct rdma_hw_stats *stats

statistics to release

void ib_pack(const struct ib_field *desc, int desc_len, void *structure, void *buf)

Pack a structure into a buffer

Parameters

const struct ib_field *desc

Array of structure field descriptions

int desc_len

Number of entries in desc

void *structure

Structure to pack from

void *buf

Buffer to pack into

Description

ib_pack() packs a list of structure fields into a buffer, controlled by the array of fields in desc.

void ib_unpack(const struct ib_field *desc, int desc_len, void *buf, void *structure)

Unpack a buffer into a structure

Parameters

const struct ib_field *desc

Array of structure field descriptions

int desc_len

Number of entries in desc

void *buf

Buffer to unpack from

void *structure

Structure to unpack into

Description

ib_unpack() unpacks a list of structure fields from a buffer, controlled by the array of fields in desc.
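The descriptor-driven approach used by ib_pack() and ib_unpack() can be sketched in user space as follows. This is a simplified model, not the kernel implementation: struct demo_field, struct demo_hdr and the demo_ functions are hypothetical, and the descriptors here work on whole bytes (1, 2 or 4 bytes, stored big-endian), whereas the kernel's struct ib_field describes fields at bit granularity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-granular field descriptor. */
struct demo_field {
    size_t struct_off; /* offset of the field in the host structure */
    size_t size;       /* field width in bytes: 1, 2 or 4 */
    size_t buf_off;    /* offset of the field in the wire buffer */
};

/* Hypothetical host structure described by demo_desc below. */
struct demo_hdr {
    uint8_t  version;
    uint16_t lid;
    uint32_t qpn;
};

static const struct demo_field demo_desc[] = {
    { offsetof(struct demo_hdr, version), sizeof(uint8_t),  0 },
    { offsetof(struct demo_hdr, lid),     sizeof(uint16_t), 2 },
    { offsetof(struct demo_hdr, qpn),     sizeof(uint32_t), 4 },
};

/* Read a host-endian field of the given width. */
static uint32_t demo_load(const void *p, size_t size)
{
    uint8_t v8;
    uint16_t v16;
    uint32_t v32;

    switch (size) {
    case 1: memcpy(&v8, p, 1); return v8;
    case 2: memcpy(&v16, p, 2); return v16;
    default: memcpy(&v32, p, 4); return v32;
    }
}

/* Write a host-endian field of the given width. */
static void demo_store(void *p, size_t size, uint32_t v)
{
    uint8_t v8 = (uint8_t)v;
    uint16_t v16 = (uint16_t)v;

    switch (size) {
    case 1: memcpy(p, &v8, 1); break;
    case 2: memcpy(p, &v16, 2); break;
    default: memcpy(p, &v, 4); break;
    }
}

/* Walk the descriptor table and emit each field big-endian. */
static void demo_pack(const struct demo_field *desc, int desc_len,
                      const void *structure, void *buf)
{
    for (int i = 0; i < desc_len; i++) {
        uint32_t v = demo_load((const uint8_t *)structure +
                               desc[i].struct_off, desc[i].size);
        uint8_t *dst = (uint8_t *)buf + desc[i].buf_off;

        for (size_t b = 0; b < desc[i].size; b++)
            dst[b] = (uint8_t)(v >> (8 * (desc[i].size - 1 - b)));
    }
}

/* Walk the same table in the other direction. */
static void demo_unpack(const struct demo_field *desc, int desc_len,
                        const void *buf, void *structure)
{
    for (int i = 0; i < desc_len; i++) {
        const uint8_t *src = (const uint8_t *)buf + desc[i].buf_off;
        uint32_t v = 0;

        for (size_t b = 0; b < desc[i].size; b++)
            v = (v << 8) | src[b];
        demo_store((uint8_t *)structure + desc[i].struct_off,
                   desc[i].size, v);
    }
}
```

Because both directions are driven by one table, adding a field to the wire format in this model means adding one descriptor entry rather than editing two functions.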

void ib_sa_cancel_query(int id, struct ib_sa_query *query)

try to cancel an SA query

Parameters

int id

ID of query to cancel

struct ib_sa_query *query

query pointer to cancel

Description

Try to cancel an SA query. If the id and query don’t match up or the query has already completed, nothing is done. Otherwise the query is canceled and will complete with a status of -EINTR.

int ib_init_ah_attr_from_path(struct ib_device *device, u32 port_num, struct sa_path_rec *rec, struct rdma_ah_attr *ah_attr, const struct ib_gid_attr *gid_attr)

Initialize address handle attributes based on an SA path record.

Parameters

struct ib_device *device

Device associated with the ah attributes initialization.

u32 port_num

Port on the specified device.

struct sa_path_rec *rec

path record entry to use for ah attributes initialization.

struct rdma_ah_attr *ah_attr

address handle attributes to initialize from the path record.

const struct ib_gid_attr *gid_attr

SGID attribute to consider during initialization.

Description

When ib_init_ah_attr_from_path() returns success, (a) for the IB link layer, ah_attr optionally contains a reference to the SGID attribute when GRH is present; (b) for the RoCE link layer, it contains a reference to the SGID attribute. The user must invoke rdma_destroy_ah_attr() to release the reference to SGID attributes which are initialized using ib_init_ah_attr_from_path().

int ib_sa_path_rec_get(struct ib_sa_client *client, struct ib_device *device, u32 port_num, struct sa_path_rec *rec, ib_sa_comp_mask comp_mask, unsigned long timeout_ms, gfp_t gfp_mask, void (*callback)(int status, struct sa_path_rec *resp, unsigned int num_paths, void *context), void *context, struct ib_sa_query **sa_query)

Start a Path get query

Parameters

struct ib_sa_client *client

SA client

struct ib_device *device

device to send query on

u32 port_num

port number to send query on

struct sa_path_rec *rec

Path Record to send in query

ib_sa_comp_mask comp_mask

component mask to send in query

unsigned long timeout_ms

time to wait for response

gfp_t gfp_mask

GFP mask to use for internal allocations

void (*callback)(int status, struct sa_path_rec *resp, unsigned int num_paths, void *context)

function called when query completes, times out or is canceled

void *context

opaque user context passed to callback

struct ib_sa_query **sa_query

query context, used to cancel query

Description

Send a Path Record Get query to the SA to look up a path. The callback function will be called when the query completes (or fails); status is 0 for a successful response, -EINTR if the query is canceled, -ETIMEDOUT if the query timed out, or -EIO if an error occurred sending the query. The resp parameter of the callback is only valid if status is 0.

If the return value of ib_sa_path_rec_get() is negative, it is an error code. Otherwise it is a query ID that can be used to cancel the query.
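The status contract of the completion callback can be modelled in user space. The sketch below is illustrative only: struct demo_ctx, demo_sa_status_str() and demo_path_cb() are hypothetical names, and resp is treated as an opaque pointer. The point it shows is that the callback looks at resp and num_paths only when status is 0.

```c
#include <errno.h>
#include <stddef.h>

/* Map the documented status values to a short description. */
static const char *demo_sa_status_str(int status)
{
    switch (status) {
    case 0:
        return "response received";
    case -EINTR:
        return "query canceled";
    case -ETIMEDOUT:
        return "query timed out";
    case -EIO:
        return "error sending query";
    default:
        return "unexpected status";
    }
}

/* Hypothetical per-query context owned by the caller. */
struct demo_ctx {
    int done;                  /* set once the callback has run */
    unsigned int usable_paths; /* number of paths that may be read */
};

/* Callback body: resp is only consulted when status is 0. */
static void demo_path_cb(int status, const void *resp,
                         unsigned int num_paths, void *context)
{
    struct demo_ctx *ctx = context;

    ctx->done = 1;
    ctx->usable_paths = (status == 0 && resp != NULL) ? num_paths : 0;
}
```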

int ib_sa_service_rec_get(struct ib_sa_client *client, struct ib_device *device, u32 port_num, struct sa_service_rec *rec, ib_sa_comp_mask comp_mask, unsigned long timeout_ms, gfp_t gfp_mask, void (*callback)(int status, struct sa_service_rec *resp, unsigned int num_services, void *context), void *context, struct ib_sa_query **sa_query)

Start a Service get query

Parameters

struct ib_sa_client *client

SA client

struct ib_device *device

device to send query on

u32 port_num

port number to send query on

struct sa_service_rec *rec

Service Record to send in query

ib_sa_comp_mask comp_mask

component mask to send in query

unsigned long timeout_ms

time to wait for response

gfp_t gfp_mask

GFP mask to use for internal allocations

void (*callback)(int status, struct sa_service_rec *resp, unsigned int num_services, void *context)

function called when query completes, times out or is canceled

void *context

opaque user context passed to callback

struct ib_sa_query **sa_query

query context, used to cancel query

Description

Send a Service Record Get query to the SA to look up a service record. The callback function will be called when the query completes (or fails); status is 0 for a successful response, -EINTR if the query is canceled, -ETIMEDOUT if the query timed out, or -EIO if an error occurred sending the query. The resp parameter of the callback is only valid if status is 0.

If the return value of ib_sa_service_rec_get() is negative, it is an error code. Otherwise it is a query ID that can be used to cancel the query.

int ib_ud_header_init(int payload_bytes, int lrh_present, int eth_present, int vlan_present, int grh_present, int ip_version, int udp_present, int immediate_present, struct ib_ud_header *header)

Initialize UD header structure

Parameters

int payload_bytes

Length of packet payload

int lrh_present

specify if LRH is present

int eth_present

specify if Eth header is present

int vlan_present

packet is tagged vlan

int grh_present

GRH flag (if non-zero, GRH will be included)

int ip_version

if non-zero, IP header, V4 or V6, will be included

int udp_present

if non-zero, UDP header will be included

int immediate_present

specify if immediate data is present

struct ib_ud_header *header

Structure to initialize

int ib_ud_header_pack(struct ib_ud_header *header, void *buf)

Pack UD header struct into wire format

Parameters

struct ib_ud_header *header

UD header struct

void *buf

Buffer to pack into

Description

ib_ud_header_pack() packs the UD header structure header into wire format in the buffer buf.

unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem, unsigned long pgsz_bitmap, unsigned long virt)

Find best HW page size to use for this MR

Parameters

struct ib_umem *umem

umem struct

unsigned long pgsz_bitmap

bitmap of HW supported page sizes

unsigned long virt

IOVA

Description

This helper is intended for HW that supports multiple page sizes but can do only a single page size in an MR.

Returns 0 if the umem requires page sizes not supported by the driver to be mapped. Drivers always supporting PAGE_SIZE or smaller will never see a 0 result.
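The selection step can be pictured with a reduced user-space model. This sketch is an assumption-laden simplification, not the kernel algorithm: demo_pick_pgsz() is a hypothetical helper, and diff_mask is assumed to already hold a 1 in every bit position where the IOVA and the DMA addresses disagree or where a chunk boundary breaks alignment. Given that mask, the largest usable page size is the largest supported size that does not exceed the lowest set bit of the mask.

```c
/*
 * Return the largest page size present in pgsz_bitmap that is not
 * larger than the lowest set bit of diff_mask, or 0 if there is none.
 * With diff_mask == 0 the largest supported size is returned.
 */
static unsigned long demo_pick_pgsz(unsigned long pgsz_bitmap,
                                    unsigned long diff_mask)
{
    unsigned long best = 0;
    unsigned int nbits = (unsigned int)(sizeof(unsigned long) * 8);

    for (unsigned int bit = 0; bit < nbits; bit++) {
        unsigned long sz = 1UL << bit;

        if (pgsz_bitmap & sz)
            best = sz;      /* largest supported size seen so far */
        if (diff_mask & sz)
            break;          /* larger pages would need this bit equal */
    }
    return best;
}
```

The 0 result in the model corresponds to the documented case where the memory needs a page size smaller than anything the hardware bitmap offers.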

struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr, size_t size, int access)

Pin and DMA map userspace memory.

Parameters

struct ib_device *device

IB device to connect UMEM

unsigned long addr

userspace virtual address to start at

size_t size

length of region to pin

int access

IB_ACCESS_xxx flags for memory being pinned

void ib_umem_release(struct ib_umem *umem)

release memory pinned with ib_umem_get

Parameters

struct ib_umem *umem

umem struct to release

struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device, int access)

Allocate a parent implicit ODP umem

Parameters

struct ib_device *device

IB device to create UMEM

int access

ib_reg_mr access flags

Description

Implicit ODP umems do not have a VA range and do not have any page lists. They exist only to hold the per_mm reference to help the driver create children umems.

struct ib_umem_odp *ib_umem_odp_alloc_child(struct ib_umem_odp *root, unsigned long addr, size_t size, const struct mmu_interval_notifier_ops *ops)

Allocate a child ODP umem under an implicit parent ODP umem

Parameters

struct ib_umem_odp *root

The parent umem enclosing the child. This must be allocated using ib_umem_odp_alloc_implicit()

unsigned long addr

The starting userspace VA

size_t size

The length of the userspace VA

const struct mmu_interval_notifier_ops *ops

MMU interval ops, currently only invalidate

struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device, unsigned long addr, size_t size, int access, const struct mmu_interval_notifier_ops *ops)

Create a umem_odp for a userspace va

Parameters

struct ib_device *device

IB device struct to get UMEM

unsigned long addr

userspace virtual address to start at

size_t size

length of region to pin

int access

IB_ACCESS_xxx flags for memory being pinned

const struct mmu_interval_notifier_ops *ops

MMU interval ops, currently only invalidate

Description

The driver should use this when the access flags indicate ODP memory. It avoids pinning and instead stores the mm for future page fault handling in conjunction with MMU notifiers.

int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt, u64 bcnt, u64 access_mask, bool fault)

DMA map userspace memory in an ODP MR and lock it.

Parameters

struct ib_umem_odp *umem_odp

the umem to map and pin

u64 user_virt

the address from which we need to map.

u64 bcnt

the minimal number of bytes to pin and map. The mapping might be bigger due to alignment, and may also be smaller in case of an error pinning or mapping a page. The actual number of pages mapped is returned in the return value.

u64 access_mask

bit mask of the requested access permissions for the given range.

bool fault

is faulting required for the given range

Description

Maps the range passed in the argument to DMA addresses. Upon success the ODP MR will be locked to let the caller complete its device page table update.

Returns the number of pages mapped on success, or a negative error code on failure.

RDMA Verbs transport library

int rvt_fast_reg_mr(struct rvt_qp *qp, struct ib_mr *ibmr, u32 key, int access)

fast register physical MR

Parameters

struct rvt_qp *qp

the queue pair where the work request comes from

struct ib_mr *ibmr

the memory region to be registered

u32 key

updated key for this memory region

int access

access flags for this memory region

Description

Returns 0 on success.

int rvt_invalidate_rkey(struct rvt_qp *qp, u32 rkey)

invalidate an MR rkey

Parameters

struct rvt_qp *qp

queue pair associated with the invalidate op

u32 rkey

rkey to invalidate

Description

Returns 0 on success.

int rvt_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd, struct rvt_sge *isge, struct rvt_sge *last_sge, struct ib_sge *sge, int acc)

check IB SGE for validity and initialize

Parameters

struct rvt_lkey_table *rkt

table containing lkey to check SGE against

struct rvt_pd *pd

protection domain

struct rvt_sge *isge

outgoing internal SGE

struct rvt_sge *last_sge

last outgoing SGE written

struct ib_sge *sge

SGE to check

int acc

access flags

Description

Check the IB SGE for validity and initialize our internal version of it.

Increments the reference count when a new sge is stored.

Return

0 if compressed, 1 if added, otherwise returns -errno.

int rvt_rkey_ok(struct rvt_qp *qp, struct rvt_sge *sge, u32 len, u64 vaddr, u32 rkey, int acc)

check the IB virtual address, length, and RKEY

Parameters

struct rvt_qp *qp

qp for validation

struct rvt_sge *sge

SGE state

u32 len

length of data

u64 vaddr

virtual address to place data

u32 rkey

rkey to check

int acc

access flags

Return

1 if successful, otherwise 0.

Description

Increments the reference count upon success.

__be32 rvt_compute_aeth(struct rvt_qp *qp)

compute the AETH (syndrome + MSN)

Parameters

struct rvt_qp *qp

the queue pair to compute the AETH for

Description

Returns the AETH.

void rvt_get_credit(struct rvt_qp *qp, u32 aeth)

flush the send work queue of a QP

Parameters

struct rvt_qp *qp

the qp whose send work queue to flush

u32 aeth

the Acknowledge Extended Transport Header

Description

The QP s_lock should be held.

u32 rvt_restart_sge(struct rvt_sge_state *ss, struct rvt_swqe *wqe, u32 len)

rewind the sge state for a wqe

Parameters

struct rvt_sge_state *ss

the sge state pointer

struct rvt_swqe *wqe

the wqe to rewind

u32 len

the data length from the start of the wqe in bytes

Description

Returns the remaining data length.

int rvt_check_ah(struct ib_device *ibdev, struct rdma_ah_attr *ah_attr)

validate the attributes of AH

Parameters

struct ib_device *ibdev

the ib device

struct rdma_ah_attr *ah_attr

the attributes of the AH

Description

If the driver supports a more detailed check_ah function, call back to it; otherwise just check the basics.

Return

0 on success

struct rvt_dev_info *rvt_alloc_device(size_t size, int nports)

allocate rdi

Parameters

size_t size

how big of a structure to allocate

int nports

number of ports to allocate array slots for

Description

Use IB core device alloc to allocate space for the rdi which is assumed to be inside of the ib_device. Any extra space that drivers require should be included in size.

We also allocate a port array based on the number of ports.

Return

pointer to allocated rdi

void rvt_dealloc_device(struct rvt_dev_info *rdi)

deallocate rdi

Parameters

struct rvt_dev_info *rdi

structure to free

Description

Free a structure allocated with rvt_alloc_device()

int rvt_register_device(struct rvt_dev_info *rdi)

register a driver

Parameters

struct rvt_dev_info *rdi

main dev structure for all of rdmavt operations

Description

It is up to drivers to allocate the rdi and fill in the appropriate information.

Return

0 on success otherwise an errno.

void rvt_unregister_device(struct rvt_dev_info *rdi)

remove a driver

Parameters

struct rvt_dev_info *rdi

rvt dev struct

int rvt_init_port(struct rvt_dev_info *rdi, struct rvt_ibport *port, int port_index, u16 *pkey_table)

init internal data for driver port

Parameters

struct rvt_dev_info *rdi

rvt_dev_info struct

struct rvt_ibport *port

rvt port

int port_index

0 based index of ports, different from IB core port num

u16 *pkey_table

pkey_table for port

Description

Keep track of a list of ports. No need to have a detach port. They persist until the driver goes away.

Return

always 0

bool rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited)

add a new entry to the completion queue

Parameters

struct rvt_cq *cq

completion queue

struct ib_wc *entry

work completion entry to add

bool solicited

true if entry is solicited

Description

This may be called with qp->s_lock held.

Return

true on success, false if the cq is full.
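The "false when full" contract can be illustrated with a small ring-buffer model. This is a user-space sketch under stated assumptions: struct demo_cq, DEMO_CQ_SLOTS and demo_cq_enter() are hypothetical, only a work request ID is stored per entry, and no locking or event notification is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CQ_SLOTS 4 /* hypothetical fixed ring size */

/* Hypothetical completion ring with one producer and one consumer. */
struct demo_cq {
    uint64_t wr_id[DEMO_CQ_SLOTS];
    unsigned int head; /* next slot the producer writes */
    unsigned int tail; /* next slot the consumer reads */
};

/*
 * Add one entry. One slot is always left empty so that a full ring
 * can be told apart from an empty one.
 */
static bool demo_cq_enter(struct demo_cq *cq, uint64_t wr_id)
{
    unsigned int next = (cq->head + 1) % DEMO_CQ_SLOTS;

    if (next == cq->tail)
        return false; /* ring is full, entry is not stored */

    cq->wr_id[cq->head] = wr_id;
    cq->head = next;
    return true;
}
```

With four slots the model accepts three entries before reporting full, which is the usual cost of the one-empty-slot convention.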

int rvt_error_qp(struct rvt_qp *qp, enum ib_wc_status err)

put a QP into the error state

Parameters

struct rvt_qp *qp

the QP to put into the error state

enum ib_wc_status err

the receive completion error to signal if a RWQE is active

Description

Flushes both send and receive work queues.

Return

true if last WQE event should be generated. The QP r_lock and s_lock should be held and interrupts disabled. If we are already in error state, just return.

int rvt_get_rwqe(struct rvt_qp *qp, bool wr_id_only)

copy the next RWQE into the QP’s RWQE

Parameters

struct rvt_qp *qp

the QP

bool wr_id_only

update qp->r_wr_id only, not qp->r_sge

Description

Return -1 if there is a local error, 0 if no RWQE is available, otherwise return 1.

Can be called from interrupt level.

void rvt_comm_est(struct rvt_qp *qp)

handle trap with QP established

Parameters

struct rvt_qp *qp

the QP

void rvt_add_rnr_timer(struct rvt_qp *qp, u32 aeth)

add/start an rnr timer on the QP

Parameters

struct rvt_qp *qp

the QP

u32 aeth

aeth of RNR timeout, simulated aeth for loopback

void rvt_stop_rc_timers(struct rvt_qp *qp)

stop all timers

Parameters

struct rvt_qp *qp

the QP

Description

Stop any pending timers.

void rvt_del_timers_sync(struct rvt_qp *qp)

wait for any timeout routines to exit

Parameters

struct rvt_qp *qp

the QP

struct rvt_qp_iter *rvt_qp_iter_init(struct rvt_dev_info *rdi, u64 v, void (*cb)(struct rvt_qp *qp, u64 v))

initialize for QP iteration

Parameters

struct rvt_dev_info *rdi

rvt devinfo

u64 v

u64 value

void (*cb)(struct rvt_qp *qp, u64 v)

user-defined callback

Description

This returns an iterator suitable for iterating QPs in the system.

The cb is a user-defined callback and v is a 64-bit value passed to and relevant for processing in the cb. An example use case would be to alter QP processing based on criteria not part of the rvt_qp.

Use cases that require memory allocation to succeed must preallocate appropriately.

Return

a pointer to an rvt_qp_iter or NULL

int rvt_qp_iter_next(struct rvt_qp_iter *iter)

return the next QP in iter

Parameters

struct rvt_qp_iter *iter

the iterator

Description

Fine grained QP iterator suitable for use with debugfs seq_file mechanisms.

Updates iter->qp with the current QP when the return value is 0.

Return

0 if iter->qp is valid, 1 if there are no more QPs.
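The return convention (0 means iter->qp is valid, 1 means the walk is finished) leads to a simple loop shape on the caller side. The following user-space sketch shows that shape only; struct demo_qp, struct demo_qp_iter and the demo_ functions are hypothetical, and the QPs live in a flat array instead of the driver's real lookup table.

```c
#include <stddef.h>

/* Hypothetical stand-in for a QP. */
struct demo_qp {
    int qpn;
};

/* Hypothetical iterator over a flat table of QPs. */
struct demo_qp_iter {
    struct demo_qp *table; /* entries to walk */
    int n;                 /* number of entries in table */
    int pos;               /* index of the next entry to visit */
    struct demo_qp *qp;    /* current QP, valid after a 0 return */
};

/* Advance the iterator: 0 if iter->qp is valid, 1 if no QPs remain. */
static int demo_qp_iter_next(struct demo_qp_iter *iter)
{
    if (iter->pos >= iter->n)
        return 1;
    iter->qp = &iter->table[iter->pos++];
    return 0;
}

/* Typical caller loop: keep going while the iterator returns 0. */
static int demo_sum_qpns(struct demo_qp *table, int n)
{
    struct demo_qp_iter iter = { .table = table, .n = n, .pos = 0, .qp = NULL };
    int sum = 0;

    while (demo_qp_iter_next(&iter) == 0)
        sum += iter.qp->qpn;
    return sum;
}
```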

void rvt_qp_iter(struct rvt_dev_info *rdi, u64 v, void (*cb)(struct rvt_qp *qp, u64 v))

iterate all QPs

Parameters

struct rvt_dev_info *rdi

rvt devinfo

u64 v

a 64-bit value

void (*cb)(struct rvt_qp *qp, u64 v)

a callback

Description

This provides a way for iterating all QPs.

The cb is a user-defined callback and v is a 64-bit value passed to and relevant for processing in the cb. An example use case would be to alter QP processing based on criteria not part of the rvt_qp.

The code has an internal iterator to simplify non seq_file use cases.

void rvt_copy_sge(struct rvt_qp *qp, struct rvt_sge_state *ss, void *data, u32 length, bool release, bool copy_last)

copy data to SGE memory

Parameters

struct rvt_qp *qp

associated QP

struct rvt_sge_state *ss

the SGE state

void *data

the data to copy

u32 length

the length of the data

bool release

boolean to release MR

bool copy_last

do a separate copy of the last 8 bytes

void rvt_ruc_loopback(struct rvt_qp *sqp)

handle UC and RC loopback requests

Parameters

struct rvt_qp *sqp

the sending QP

Description

This is called from rvt_do_send() to forward a WQE addressed to the same HFI. Note that although we are single threaded due to the send engine, we still have to protect against post_send(). We don’t have to worry about receive interrupts since this is a connected protocol and all packets will pass through here.

struct rvt_mcast *rvt_mcast_find(struct rvt_ibport *ibp, union ib_gid *mgid, u16 lid)

search the global table for the given multicast GID/LID

Parameters

struct rvt_ibport *ibp

the IB port structure

union ib_gid *mgid

the multicast GID to search for

u16 lid

the multicast LID portion of the multicast address (host order)

NOTE

It is valid to have 1 MLID with multiple MGIDs. It is not valid to have 1 MGID with multiple MLIDs.

Description

The caller is responsible for decrementing the reference count if found.

Return

NULL if not found.

Upper Layer Protocols

iSCSI Extensions for RDMA (iSER)

struct iser_data_buf

iSER data buffer

Definition:

struct iser_data_buf {
    struct scatterlist *sg;
    int size;
    unsigned long data_len;
    int dma_nents;
};

Members

sg

pointer to the sg list

size

num entries of this sg

data_len

total buffer byte len

dma_nents

returned by dma_map_sg

struct iser_mem_reg

iSER memory registration info

Definition:

struct iser_mem_reg {
    struct ib_sge sge;
    u32 rkey;
    struct iser_fr_desc *desc;
};

Members

sge

memory region sg element

rkey

memory region remote key

desc

pointer to fast registration context

struct iser_tx_desc

iSER TX descriptor

Definition:

struct iser_tx_desc {
    struct iser_ctrl iser_header;
    struct iscsi_hdr iscsi_header;
    enum iser_desc_type type;
    u64 dma_addr;
    struct ib_sge tx_sg[2];
    int num_sge;
    struct ib_cqe cqe;
    bool mapped;
    struct ib_reg_wr reg_wr;
    struct ib_send_wr send_wr;
    struct ib_send_wr inv_wr;
};

Members

iser_header

iser header

iscsi_header

iscsi header

type

command/control/dataout

dma_addr

header buffer dma_address

tx_sg

sg[0] points to iser/iscsi headers; sg[1] optionally points to either immediate data, unsolicited data-out, or control

num_sge

number sges used on this TX task

cqe

completion handler

mapped

Is the task header mapped

reg_wr

registration WR

send_wr

send WR

inv_wr

invalidate WR

struct iser_rx_desc

iSER RX descriptor

Definition:

struct iser_rx_desc {
    struct iser_ctrl iser_header;
    struct iscsi_hdr iscsi_header;
    char data[ISER_RECV_DATA_SEG_LEN];
    u64 dma_addr;
    struct ib_sge rx_sg;
    struct ib_cqe cqe;
    char pad[ISER_RX_PAD_SIZE];
};

Members

iser_header

iser header

iscsi_header

iscsi header

data

received data segment

dma_addr

receive buffer dma address

rx_sg

ib_sge of receive buffer

cqe

completion handler

pad

for sense data TODO: Modify to maximum sense length supported

struct iser_login_desc

iSER login descriptor

Definition:

struct iser_login_desc {
    void *req;
    void *rsp;
    u64 req_dma;
    u64 rsp_dma;
    struct ib_sge sge;
    struct ib_cqe cqe;
};

Members

req

pointer to login request buffer

rsp

pointer to login response buffer

req_dma

DMA address of login request buffer

rsp_dma

DMA address of login response buffer

sge

IB sge for login post recv

cqe

completion handler

struct iser_device

iSER device handle

Definition:

struct iser_device {
    struct ib_device *ib_device;
    struct ib_pd *pd;
    struct ib_event_handler event_handler;
    struct list_head ig_list;
    int refcount;
};

Members

ib_device

RDMA device

pd

Protection Domain for this device

event_handler

IB events handle routine

ig_list

entry in devices list

refcount

Reference counter, dominated by open iser connections

struct iser_reg_resources

Fast registration resources

Definition:

struct iser_reg_resources {
    struct ib_mr *mr;
    struct ib_mr *sig_mr;
};

Members

mr

memory region

sig_mr

signature memory region

struct iser_fr_desc

Fast registration descriptor

Definition:

struct iser_fr_desc {
    struct list_head list;
    struct iser_reg_resources rsc;
    bool sig_protected;
    struct list_head all_list;
};

Members

list

entry in connection fastreg pool

rsc

data buffer registration resources

sig_protected

is region protected indicator

all_list

first and last list members

struct iser_fr_pool

connection fast registration pool

Definition:

struct iser_fr_pool {
    struct list_head list;
    spinlock_t lock;
    int size;
    struct list_head all_list;
};

Members

list

list of fastreg descriptors

lock

protects fastreg pool

size

size of the pool

all_list

first and last list members

struct ib_conn

InfiniBand related objects

Definition:

struct ib_conn {
    struct rdma_cm_id *cma_id;
    struct ib_qp *qp;
    struct ib_cq *cq;
    u32 cq_size;
    struct iser_device *device;
    struct iser_fr_pool fr_pool;
    bool pi_support;
    struct ib_cqe reg_cqe;
};

Members

cma_id

rdma_cm connection manager handle

qp

Connection Queue-pair

cq

Connection completion queue

cq_size

The number of max outstanding completions

device

reference to iser device

fr_pool

connection fast registration pool

pi_support

Indicate device T10-PI support

reg_cqe

completion handler

struct iser_conn

iSER connection context

Definition:

struct iser_conn {
    struct ib_conn ib_conn;
    struct iscsi_conn *iscsi_conn;
    struct iscsi_endpoint *ep;
    enum iser_conn_state state;
    unsigned qp_max_recv_dtos;
    u16 max_cmds;
    char name[ISER_OBJECT_NAME_SIZE];
    struct work_struct release_work;
    struct mutex state_mutex;
    struct completion stop_completion;
    struct completion ib_completion;
    struct completion up_completion;
    struct list_head conn_list;
    struct iser_login_desc login_desc;
    struct iser_rx_desc *rx_descs;
    u32 num_rx_descs;
    unsigned short scsi_sg_tablesize;
    unsigned short pages_per_mr;
    bool snd_w_inv;
};

Members

ib_conn

connection RDMA resources

iscsi_conn

link to matching iscsi connection

ep

transport handle

state

connection logical state

qp_max_recv_dtos

maximum number of data outs, corresponds to max number of post recvs

max_cmds

maximum cmds allowed for this connection

name

connection peer portal

release_work

deferred work for release job

state_mutex

protects iser connection state

stop_completion

conn_stop completion

ib_completion

RDMA cleanup completion

up_completion

connection establishment completed (state is ISER_CONN_UP)

conn_list

entry in ig conn list (the iser global connection list)

login_desc

login descriptor

rx_descs

rx buffers array (cyclic buffer)

num_rx_descs

number of rx descriptors

scsi_sg_tablesize

scsi host sg_tablesize

pages_per_mr

maximum pages available for registration

snd_w_inv

connection uses remote invalidation

struct iscsi_iser_task

iser task context

Definition:

struct iscsi_iser_task {
    struct iser_tx_desc          desc;
    struct iser_conn             *iser_conn;
    enum iser_task_status        status;
    struct scsi_cmnd             *sc;
    int command_sent;
    int dir[ISER_DIRS_NUM];
    struct iser_mem_reg          rdma_reg[ISER_DIRS_NUM];
    struct iser_data_buf         data[ISER_DIRS_NUM];
    struct iser_data_buf         prot[ISER_DIRS_NUM];
};

Members

desc

TX descriptor

iser_conn

link to iser connection

status

current task status

sc

link to scsi command

command_sent

indicate if command was sent

dir

iser data direction

rdma_reg

task rdma registration desc

data

iser data buffer desc

prot

iser protection buffer desc

struct iser_global

iSER global context

Definition:

struct iser_global {
    struct mutex      device_list_mutex;
    struct list_head  device_list;
    struct mutex      connlist_mutex;
    struct list_head  connlist;
    struct kmem_cache *desc_cache;
};

Members

device_list_mutex

protects device_list

device_list

iser devices global list

connlist_mutex

protects connlist

connlist

iser connections global list

desc_cache

kmem cache for tx dataout

int iscsi_iser_pdu_alloc(struct iscsi_task *task, uint8_t opcode)

allocate an iscsi-iser PDU

Parameters

struct iscsi_task *task

iscsi task

uint8_t opcode

iscsi command opcode

Description

Notes: This routine can’t fail; it just assigns the iscsi task hdr and max hdr size.

int iser_initialize_task_headers(struct iscsi_task *task, struct iser_tx_desc *tx_desc)

Initialize task headers

Parameters

struct iscsi_task *task

iscsi task

struct iser_tx_desc *tx_desc

iser tx descriptor

Notes

This routine may race with iser teardown flow for scsi error handling TMFs. So for TMF we should acquire the state mutex to avoid dereferencing the IB device which may have already been terminated.

int iscsi_iser_task_init(struct iscsi_task *task)

Initialize iscsi-iser task

Parameters

struct iscsi_task *task

iscsi task

Description

Initialize the task for the scsi command or mgmt command.

Return

Returns zero on success or -ENOMEM when failing to init task headers (dma mapping error).

int iscsi_iser_mtask_xmit(struct iscsi_conn *conn, struct iscsi_task *task)

xmit management (immediate) task

Parameters

struct iscsi_conn *conn

iscsi connection

struct iscsi_task *task

task management task

Notes

The function can return -EAGAIN, in which case the caller must call it again later, or recover. A return code of ‘0’ means a successful xmit.
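The retry contract in the note above can be sketched in userspace as follows. This is only an illustration: fake_mtask_xmit() is a hypothetical stand-in for the transmit routine, and the bounded retry count is an assumption of the sketch, not something the driver specifies.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the transmit routine: fails with -EAGAIN
 * until *budget reaches zero, then reports success (0). */
static int fake_mtask_xmit(int *budget)
{
    if (*budget > 0) {
        (*budget)--;
        return -EAGAIN;
    }
    return 0;
}

/* Call the transmit routine again while it asks for a retry, giving up
 * after max_tries attempts. Returns the last return code seen. */
static int xmit_with_retry(int *budget, int max_tries)
{
    int rc = -EAGAIN;

    assert(budget && max_tries > 0);
    for (int i = 0; i < max_tries && rc == -EAGAIN; i++)
        rc = fake_mtask_xmit(budget);
    return rc;
}
```

A caller that still sees -EAGAIN after the last attempt has to fall back to its own recovery path.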

int iscsi_iser_task_xmit(struct iscsi_task *task)

xmit iscsi-iser task

Parameters

struct iscsi_task *task

iscsi task

Return

zero on success or escalates $error on failure.

void iscsi_iser_cleanup_task(struct iscsi_task *task)

cleanup an iscsi-iser task

Parameters

struct iscsi_task *task

iscsi task

Notes

In case the RDMA device is already NULL (it might have been removed in a DEVICE_REMOVAL CM event), it will bail out without doing dma unmapping.

u8 iscsi_iser_check_protection(struct iscsi_task *task, sector_t *sector)

check protection information status of task.

Parameters

struct iscsi_task *task

iscsi task

sector_t *sector

error sector if exists (output)

Return

zero if no data-integrity errors have occurred
0x1: data-integrity error occurred in the guard-block
0x2: data-integrity error occurred in the reference tag
0x3: data-integrity error occurred in the application tag

Description

In addition the error sector is marked.
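A caller might decode the status byte listed under Return along these lines. This is a minimal userspace sketch; pi_status_str() and its strings are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <string.h>

/* Map the status byte documented above to a short description.
 * The strings are illustrative only. */
static const char *pi_status_str(unsigned char status)
{
    switch (status) {
    case 0x0: return "no data-integrity error";
    case 0x1: return "guard-block error";
    case 0x2: return "reference tag error";
    case 0x3: return "application tag error";
    default:  return "unknown status";
    }
}

/* Usage sketch: only a zero status decodes to the "no error" string. */
static void pi_status_example(void)
{
    assert(strcmp(pi_status_str(0x0), "no data-integrity error") == 0);
    assert(strcmp(pi_status_str(0x1), "no data-integrity error") != 0);
}
```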

struct iscsi_cls_conn *iscsi_iser_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx)

create a new iscsi-iser connection

Parameters

struct iscsi_cls_session *cls_session

iscsi class session

uint32_t conn_idx

connection index within the session (for MCS)

Return

iscsi_cls_conn when iscsi_conn_setup succeeds or NULL otherwise.

int iscsi_iser_conn_bind(struct iscsi_cls_session *cls_session, struct iscsi_cls_conn *cls_conn, uint64_t transport_eph, int is_leading)

bind iscsi and iser connection structures

Parameters

struct iscsi_cls_session *cls_session

iscsi class session

struct iscsi_cls_conn *cls_conn

iscsi class connection

uint64_t transport_eph

transport end-point handle

int is_leading

indicate if this is the session leading connection (MCS)

Return

zero on success, $error if iscsi_conn_bind fails and -EINVAL in case the end-point doesn’t exist anymore or the iser connection state is not UP (teardown already started).

int iscsi_iser_conn_start(struct iscsi_cls_conn *cls_conn)

start iscsi-iser connection

Parameters

struct iscsi_cls_conn *cls_conn

iscsi class connection

Notes

Here iser initializes (or re-initializes) stop_completion, as from this point iscsi must call conn_stop in session/connection teardown, so the iser transport must wait for it.

void iscsi_iser_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)

stop iscsi-iser connection

Parameters

struct iscsi_cls_conn *cls_conn

iscsi class connection

int flag

indicate if recover or terminate (passed as is)

Notes

Calling iscsi_conn_stop might theoretically race with a DEVICE_REMOVAL event and dereference a previously freed RDMA device handle, so we call it under the iser state lock to protect against this kind of race.

void iscsi_iser_session_destroy(struct iscsi_cls_session *cls_session)

destroy iscsi-iser session

Parameters

struct iscsi_cls_session *cls_session

iscsi class session

Description

Removes and frees the iscsi host.

struct iscsi_cls_session *iscsi_iser_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max, uint16_t qdepth, uint32_t initial_cmdsn)

create an iscsi-iser session

Parameters

struct iscsi_endpoint *ep

iscsi end-point handle

uint16_t cmds_max

maximum commands in this session

uint16_t qdepth

session command queue depth

uint32_t initial_cmdsn

initiator command sequence number

Description

Allocates and adds a scsi host, exposes DIF support if it exists, and sets up an iscsi session.

struct iscsi_endpoint *iscsi_iser_ep_connect(struct Scsi_Host *shost, struct sockaddr *dst_addr, int non_blocking)

Initiate iSER connection establishment

Parameters

struct Scsi_Host *shost

scsi_host

struct sockaddr *dst_addr

destination address

int non_blocking

indicate if routine can block

Description

Allocate an iscsi endpoint, an iser_conn structure and bind them. After that start RDMA connection establishment via rdma_cm. We don’t allocate iser_conn embedded in iscsi_endpoint since in teardown the endpoint will be destroyed at ep_disconnect while iser_conn will cleanup its resources asynchronously.

Return

iscsi_endpoint created by iscsi layer or ERR_PTR(error) if it fails.
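The error-pointer convention used by the return value can be illustrated in userspace. The helpers below are simplified re-implementations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() written for this sketch, and fake_ep_connect() and struct fake_endpoint are hypothetical stand-ins for the real routine and type.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for encoded errnos. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_endpoint { int id; };

/* Hypothetical connect routine: returns a valid endpoint, or an errno
 * encoded in the pointer when "fail" is nonzero. */
static struct fake_endpoint *fake_ep_connect(int fail)
{
    static struct fake_endpoint ep = { .id = 1 };

    if (fail)
        return ERR_PTR(-ENOMEM);
    return &ep;
}

/* Usage sketch: turn the result into 0 or a negative errno. */
static int connect_status(int fail)
{
    struct fake_endpoint *ep = fake_ep_connect(fail);

    if (IS_ERR(ep))
        return (int)PTR_ERR(ep);
    assert(ep->id == 1);
    return 0;
}
```

The point of the idiom is that the caller must test the pointer with IS_ERR() rather than compare it against NULL.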

int iscsi_iser_ep_poll(struct iscsi_endpoint *ep, int timeout_ms)

poll for iser connection establishment to complete

Parameters

struct iscsi_endpoint *ep

iscsi endpoint (created at ep_connect)

int timeout_ms

polling timeout allowed in ms.

Description

This routine boils down to waiting for up_completion signaling that cma_id got CONNECTED event.

Return

1 if connection establishment succeeded, 0 if the timeout expired (the libiscsi retry will kick in) or -1 if interrupted by a signal or, more likely, the iser connection state transitioned to TERMINATING or DOWN during the wait period.
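The three documented outcomes can be told apart as in the sketch below. The enum and classify_ep_poll() are hypothetical names introduced for illustration; only the 1 / 0 / -1 convention comes from the text above.

```c
#include <assert.h>

enum poll_outcome { POLL_CONNECTED, POLL_RETRY, POLL_FAILED };

/* Classify the return value described above: 1 means connected,
 * 0 means the timeout expired, a negative value means failure. */
static enum poll_outcome classify_ep_poll(int rc)
{
    if (rc > 0)
        return POLL_CONNECTED;
    if (rc == 0)
        return POLL_RETRY;
    return POLL_FAILED;
}

/* Usage sketch covering the three documented outcomes. */
static void classify_example(void)
{
    assert(classify_ep_poll(1) == POLL_CONNECTED);
    assert(classify_ep_poll(0) == POLL_RETRY);
    assert(classify_ep_poll(-1) == POLL_FAILED);
}
```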

void iscsi_iser_ep_disconnect(struct iscsi_endpoint *ep)

Initiate connection teardown process

Parameters

struct iscsi_endpoint *ep

iscsi endpoint handle

Description

This routine is not blocked by iser and RDMA termination process completion as we queue a deferred work for iser/RDMA destruction and cleanup or actually call it immediately in case we didn’t pass iscsi conn bind/start stage, thus it is safe.

int iser_send_command(struct iscsi_conn *conn, struct iscsi_task *task)

send command PDU

Parameters

struct iscsi_conn *conn

link to matching iscsi connection

struct iscsi_task *task

SCSI command task

int iser_send_data_out(struct iscsi_conn *conn, struct iscsi_task *task, struct iscsi_data *hdr)

send data out PDU

Parameters

struct iscsi_conn *conn

link to matching iscsi connection

struct iscsi_task *task

SCSI command task

struct iscsi_data *hdr

pointer to the LLD’s iSCSI message header

int iser_alloc_fastreg_pool(struct ib_conn *ib_conn, unsigned cmds_max, unsigned int size)

Creates pool of fast_reg descriptors for fast registration work requests.

Parameters

struct ib_conn *ib_conn

connection RDMA resources

unsigned cmds_max

max number of SCSI commands for this connection

unsigned int size

max number of pages per map request

Return

0 on success, or errno code on failure

void iser_free_fastreg_pool(struct ib_conn *ib_conn)

releases the pool of fast_reg descriptors

Parameters

struct ib_conn *ib_conn

connection RDMA resources

void iser_free_ib_conn_res(struct iser_conn *iser_conn, bool destroy)

release IB related resources

Parameters

struct iser_conn *iser_conn

iser connection struct

bool destroy

indicator if we need to try to release the iser device and memory regions pool (only iscsi shutdown and DEVICE_REMOVAL will use this).

Description

This routine is called with the iser state mutex held so the cm_id removal is out of here. It is safe to be invoked multiple times.
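One common way to make a release routine safe to call repeatedly is to clear each pointer after freeing it. The userspace sketch below shows that pattern under stated assumptions: struct conn_res and its fields are hypothetical, and the sketch does not claim to mirror how the driver achieves the same property.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource bundle standing in for the connection's
 * IB resources; the field names are illustrative. */
struct conn_res {
    int *qp;        /* per-connection resource, always released */
    int *fr_pool;   /* released only when destroy is set */
};

/* Release resources and clear the pointers so that a second call
 * finds nothing left to free. */
static void conn_res_free(struct conn_res *res, int destroy)
{
    assert(res);
    free(res->qp);          /* free(NULL) is a no-op */
    res->qp = NULL;
    if (destroy) {
        free(res->fr_pool);
        res->fr_pool = NULL;
    }
}
```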

void iser_conn_release(struct iser_conn *iser_conn)

Frees all conn objects and deallocs conn descriptor

Parameters

struct iser_conn *iser_conn

iSER connection context

int iser_conn_terminate(struct iser_conn *iser_conn)

triggers start of the disconnect procedures and waits for them to be done

Parameters

struct iser_conn *iser_conn

iSER connection context

Description

Called with state mutex held

int iser_post_send(struct ib_conn *ib_conn, struct iser_tx_desc *tx_desc)

Initiate a Send DTO operation

Parameters

struct ib_conn *ib_conn

connection RDMA resources

struct iser_tx_desc *tx_desc

iSER TX descriptor

Return

0 on success, -1 on failure

Omni-Path (OPA) Virtual NIC support

struct opa_vnic_ctrl_port

OPA virtual NIC control port

Definition:

struct opa_vnic_ctrl_port {
    struct ib_device           *ibdev;
    struct opa_vnic_ctrl_ops   *ops;
    u8 num_ports;
};

Members

ibdev

pointer to ib device

ops

opa vnic control operations

num_ports

number of opa ports

struct opa_vnic_adapter

OPA VNIC netdev private data structure

Definition:

struct opa_vnic_adapter {
    struct net_device             *netdev;
    struct ib_device              *ibdev;
    struct opa_vnic_ctrl_port     *cport;
    const struct net_device_ops   *rn_ops;
    u8 port_num;
    u8 vport_num;
    struct mutex lock;
    struct __opa_veswport_info  info;
    u8 vema_mac_addr[ETH_ALEN];
    u32 umac_hash;
    u32 mmac_hash;
    struct hlist_head    *mactbl;
    struct mutex mactbl_lock;
    spinlock_t stats_lock;
    u8 flow_tbl[OPA_VNIC_FLOW_TBL_SIZE];
    unsigned long trap_timeout;
    u8 trap_count;
};

Members

netdev

pointer to associated netdev

ibdev

ib device

cport

pointer to opa vnic control port

rn_ops

rdma netdev’s net_device_ops

port_num

OPA port number

vport_num

vesw port number

lock

adapter lock

info

virtual ethernet switch port information

vema_mac_addr

mac address configured by vema

umac_hash

unicast maclist hash

mmac_hash

multicast maclist hash

mactbl

hash table of MAC entries

mactbl_lock

mac table lock

stats_lock

statistics lock

flow_tbl

flow to default port redirection table

trap_timeout

trap timeout

trap_count

no. of traps allowed within timeout period

struct opa_vnic_mac_tbl_node

OPA VNIC mac table node

Definition:

struct opa_vnic_mac_tbl_node {
    struct hlist_node                    hlist;
    u16 index;
    struct __opa_vnic_mactable_entry     entry;
};

Members

hlist

hash list handle

index

index of entry in the mac table

entry

entry in the table

struct opa_vesw_info

OPA vnic switch information

Definition:

struct opa_vesw_info {
    __be16 fabric_id;
    __be16 vesw_id;
    u8 rsvd0[6];
    __be16 def_port_mask;
    u8 rsvd1[2];
    __be16 pkey;
    u8 rsvd2[4];
    __be32 u_mcast_dlid;
    __be32 u_ucast_dlid[OPA_VESW_MAX_NUM_DEF_PORT];
    __be32 rc;
    u8 rsvd3[56];
    __be16 eth_mtu;
    u8 rsvd4[2];
};

Members

fabric_id

10-bit fabric id

vesw_id

12-bit virtual ethernet switch id

rsvd0

reserved bytes

def_port_mask

bitmask of default ports

rsvd1

reserved bytes

pkey

partition key

rsvd2

reserved bytes

u_mcast_dlid

unknown multicast dlid

u_ucast_dlid

array of unknown unicast dlids

rc

routing control

rsvd3

reserved bytes

eth_mtu

Ethernet MTU

rsvd4

reserved bytes

struct opa_per_veswport_info

OPA vnic per port information

Definition:

struct opa_per_veswport_info {
    __be32 port_num;
    u8 eth_link_status;
    u8 rsvd0[3];
    u8 base_mac_addr[ETH_ALEN];
    u8 config_state;
    u8 oper_state;
    __be16 max_mac_tbl_ent;
    __be16 max_smac_ent;
    __be32 mac_tbl_digest;
    u8 rsvd1[4];
    __be32 encap_slid;
    u8 pcp_to_sc_uc[OPA_VNIC_MAX_NUM_PCP];
    u8 pcp_to_vl_uc[OPA_VNIC_MAX_NUM_PCP];
    u8 pcp_to_sc_mc[OPA_VNIC_MAX_NUM_PCP];
    u8 pcp_to_vl_mc[OPA_VNIC_MAX_NUM_PCP];
    u8 non_vlan_sc_uc;
    u8 non_vlan_vl_uc;
    u8 non_vlan_sc_mc;
    u8 non_vlan_vl_mc;
    u8 rsvd2[48];
    __be16 uc_macs_gen_count;
    __be16 mc_macs_gen_count;
    u8 rsvd3[8];
};

Members

port_num

port number

eth_link_status

current ethernet link state

rsvd0

reserved bytes

base_mac_addr

base mac address

config_state

configured port state

oper_state

operational port state

max_mac_tbl_ent

max number of mac table entries

max_smac_ent

max smac entries in mac table

mac_tbl_digest

mac table digest

rsvd1

reserved bytes

encap_slid

base slid for the port

pcp_to_sc_uc

sc by pcp index for unicast ethernet packets

pcp_to_vl_uc

vl by pcp index for unicast ethernet packets

pcp_to_sc_mc

sc by pcp index for multicast ethernet packets

pcp_to_vl_mc

vl by pcp index for multicast ethernet packets

non_vlan_sc_uc

sc for non-vlan unicast ethernet packets

non_vlan_vl_uc

vl for non-vlan unicast ethernet packets

non_vlan_sc_mc

sc for non-vlan multicast ethernet packets

non_vlan_vl_mc

vl for non-vlan multicast ethernet packets

rsvd2

reserved bytes

uc_macs_gen_count

generation count for unicast macs list

mc_macs_gen_count

generation count for multicast macs list

rsvd3

reserved bytes

struct opa_veswport_info

OPA vnic port information

Definition:

struct opa_veswport_info {
    struct opa_vesw_info          vesw;
    struct opa_per_veswport_info  vport;
};

Members

vesw

OPA vnic switch information

vport

OPA vnic per port information

Description

On host, each of the virtual ethernet ports belongs to a different virtual ethernet switch.

struct opa_veswport_mactable_entry

single entry in the forwarding table

Definition:

struct opa_veswport_mactable_entry {
    u8 mac_addr[ETH_ALEN];
    u8 mac_addr_mask[ETH_ALEN];
    __be32 dlid_sd;
};

Members

mac_addr

MAC address

mac_addr_mask

MAC address bit mask

dlid_sd

Matching DLID and side data

Description

On the host each virtual ethernet port will have a forwarding table. These tables are used to map a MAC to a LID and other data. For more details see struct opa_veswport_mactable_entries. This is the structure of a single mactable entry.
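The MAC-to-DLID mapping described above can be pictured with a small userspace lookup. This is a sketch under stated assumptions: struct mactable_entry mirrors the documented layout with stdint types, the search is a plain linear exact match, and mac_addr_mask is not consulted because the text does not say how the mask is applied.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Userspace mirror of the entry layout documented above;
 * dlid_sd is stored big-endian. */
struct mactable_entry {
    uint8_t  mac_addr[ETH_ALEN];
    uint8_t  mac_addr_mask[ETH_ALEN];
    uint32_t dlid_sd;
};

/* Linear exact-match lookup (mac_addr_mask is not consulted here).
 * Returns 0 and stores the host-order dlid_sd on a hit, -1 on a miss. */
static int mactable_lookup(const struct mactable_entry *tbl, size_t n,
                           const uint8_t mac[ETH_ALEN], uint32_t *dlid_sd)
{
    assert(mac && dlid_sd);
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].mac_addr, mac, ETH_ALEN) == 0) {
            *dlid_sd = ntohl(tbl[i].dlid_sd);
            return 0;
        }
    }
    return -1;
}
```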

struct opa_veswport_mactable

Forwarding table array

Definition:

struct opa_veswport_mactable {
    __be16 offset;
    __be16 num_entries;
    __be32 mac_tbl_digest;
    struct opa_veswport_mactable_entry  tbl_entries[];
};

Members

offset

mac table starting offset

num_entries

Number of entries to get or set

mac_tbl_digest

mac table digest

tbl_entries

Array of table entries

Description

The EM sends down this structure in a MAD indicating the starting offset in the forwarding table that this entry is to be loaded into and the number of entries that this MAD instance contains.

The mac_tbl_digest has been added to this MAD structure. It will be set by the EM and it will be used by the EM to check if there are any discrepancies with this value and the value maintained by the EM in the case of VNIC port being deleted or unloaded. A new instantiation of a VNIC will always have a value of zero. This value is stored as part of the vnic adapter structure and will be accessed by the GET and SET routines for both the mactable entries and the veswport info.

struct opa_veswport_summary_counters

summary counters

Definition:

struct opa_veswport_summary_counters {
    __be16 vp_instance;
    __be16 vesw_id;
    __be32 veswport_num;
    __be64 tx_errors;
    __be64 rx_errors;
    __be64 tx_packets;
    __be64 rx_packets;
    __be64 tx_bytes;
    __be64 rx_bytes;
    __be64 tx_unicast;
    __be64 tx_mcastbcast;
    __be64 tx_untagged;
    __be64 tx_vlan;
    __be64 tx_64_size;
    __be64 tx_65_127;
    __be64 tx_128_255;
    __be64 tx_256_511;
    __be64 tx_512_1023;
    __be64 tx_1024_1518;
    __be64 tx_1519_max;
    __be64 rx_unicast;
    __be64 rx_mcastbcast;
    __be64 rx_untagged;
    __be64 rx_vlan;
    __be64 rx_64_size;
    __be64 rx_65_127;
    __be64 rx_128_255;
    __be64 rx_256_511;
    __be64 rx_512_1023;
    __be64 rx_1024_1518;
    __be64 rx_1519_max;
    __be64 reserved[16];
};

Members

vp_instance

vport instance on the OPA port

vesw_id

virtual ethernet switch id

veswport_num

virtual ethernet switch port number

tx_errors

transmit errors

rx_errors

receive errors

tx_packets

transmit packets

rx_packets

receive packets

tx_bytes

transmit bytes

rx_bytes

receive bytes

tx_unicast

unicast packets transmitted

tx_mcastbcast

multicast/broadcast packets transmitted

tx_untagged

non-vlan packets transmitted

tx_vlan

vlan packets transmitted

tx_64_size

transmit packet length is 64 bytes

tx_65_127

transmit packet length is >= 65 and <= 127 bytes

tx_128_255

transmit packet length is >= 128 and <= 255 bytes

tx_256_511

transmit packet length is >= 256 and <= 511 bytes

tx_512_1023

transmit packet length is >= 512 and <= 1023 bytes

tx_1024_1518

transmit packet length is >= 1024 and <= 1518 bytes

tx_1519_max

transmit packet length >= 1519 bytes

rx_unicast

unicast packets received

rx_mcastbcast

multicast/broadcast packets received

rx_untagged

non-vlan packets received

rx_vlan

vlan packets received

rx_64_size

received packet length is 64 bytes

rx_65_127

received packet length is >= 65 and <= 127 bytes

rx_128_255

received packet length is >= 128 and <= 255 bytes

rx_256_511

received packet length is >= 256 and <= 511 bytes

rx_512_1023

received packet length is >= 512 and <= 1023 bytes

rx_1024_1518

received packet length is >= 1024 and <= 1518 bytes

rx_1519_max

received packet length >= 1519 bytes

reserved

reserved bytes

Description

All the above are counters of corresponding conditions.
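The size counters form a histogram over packet length. The sketch below shows one way to pick the bucket, assuming each range is inclusive at both ends as the field names suggest; the enum, size_bucket_for() and count_frame() are hypothetical names, and folding anything up to 64 bytes into the first bucket is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Bucket indices, in the order the size counters appear in the struct. */
enum size_bucket {
    B_64, B_65_127, B_128_255, B_256_511,
    B_512_1023, B_1024_1518, B_1519_MAX, B_COUNT
};

/* Pick the counter for a frame of len bytes. Anything up to 64 bytes
 * is folded into the first bucket in this sketch. */
static enum size_bucket size_bucket_for(size_t len)
{
    if (len <= 64)   return B_64;
    if (len <= 127)  return B_65_127;
    if (len <= 255)  return B_128_255;
    if (len <= 511)  return B_256_511;
    if (len <= 1023) return B_512_1023;
    if (len <= 1518) return B_1024_1518;
    return B_1519_MAX;
}

/* Bump the matching counter in a caller-provided histogram. */
static void count_frame(unsigned long long counters[B_COUNT], size_t len)
{
    assert(counters);
    counters[size_bucket_for(len)]++;
}
```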

struct opa_veswport_error_counters

error counters

Definition:

struct opa_veswport_error_counters {
    __be16 vp_instance;
    __be16 vesw_id;
    __be32 veswport_num;
    __be64 tx_errors;
    __be64 rx_errors;
    __be64 rsvd0;
    __be64 tx_smac_filt;
    __be64 rsvd1;
    __be64 rsvd2;
    __be64 rsvd3;
    __be64 tx_dlid_zero;
    __be64 rsvd4;
    __be64 tx_logic;
    __be64 rsvd5;
    __be64 tx_drop_state;
    __be64 rx_bad_veswid;
    __be64 rsvd6;
    __be64 rx_runt;
    __be64 rx_oversize;
    __be64 rsvd7;
    __be64 rx_eth_down;
    __be64 rx_drop_state;
    __be64 rx_logic;
    __be64 rsvd8;
    __be64 rsvd9[16];
};

Members

vp_instance

vport instance on the OPA port

vesw_id

virtual ethernet switch id

veswport_num

virtual ethernet switch port number

tx_errors

transmit errors

rx_errors

receive errors

rsvd0

reserved bytes

tx_smac_filt

smac filter errors

rsvd1

reserved bytes

rsvd2

reserved bytes

rsvd3

reserved bytes

tx_dlid_zero

transmit packets with invalid dlid

rsvd4

reserved bytes

tx_logic

other transmit errors

rsvd5

reserved bytes

tx_drop_state

packet transmission in non-forward port state

rx_bad_veswid

received packet with invalid vesw id

rsvd6

reserved bytes

rx_runt

received ethernet packet with length < 64 bytes

rx_oversize

received ethernet packet with length > MTU size

rsvd7

reserved bytes

rx_eth_down

received packets when interface is down

rx_drop_state

received packets in non-forwarding port state

rx_logic

other receive errors

rsvd8

reserved bytes

rsvd9

reserved bytes

Description

All the above are counters of corresponding error conditions.

struct opa_veswport_trap

Trap message sent to EM by VNIC

Definition:

struct opa_veswport_trap {
    __be16 fabric_id;
    __be16 veswid;
    __be32 veswportnum;
    __be16 opaportnum;
    u8 veswportindex;
    u8 opcode;
    __be32 reserved;
};

Members

fabric_id

10 bit fabric id

veswid

12 bit virtual ethernet switch id

veswportnum

logical port number on the Virtual switch

opaportnum

physical port num (redundant on host)

veswportindex

switch port index on opa port 0 based

opcode

operation

reserved

32 bit for alignment

Description

The VNIC will send trap messages to the Ethernet manager to inform it about changes to the VNIC config, behaviour etc. This is the format of the trap payload.
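Filling such a payload mostly comes down to byte-order conversion. The userspace sketch below mirrors the documented layout with stdint types; trap_fill() is a hypothetical helper, and masking the ids to their documented widths (10 and 12 bits) is an assumption of the sketch rather than documented driver behaviour.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the trap payload; multi-byte fields are big-endian. */
struct veswport_trap {
    uint16_t fabric_id;
    uint16_t veswid;
    uint32_t veswportnum;
    uint16_t opaportnum;
    uint8_t  veswportindex;
    uint8_t  opcode;
    uint32_t reserved;
};

/* The fields are naturally aligned, so no padding is expected. */
_Static_assert(sizeof(struct veswport_trap) == 16, "unexpected padding");

/* Fill a trap payload, masking the ids to their documented widths
 * (10-bit fabric id, 12-bit switch id). */
static void trap_fill(struct veswport_trap *t, uint16_t fabric_id,
                      uint16_t veswid, uint32_t veswportnum,
                      uint16_t opaportnum, uint8_t index, uint8_t opcode)
{
    assert(t);
    memset(t, 0, sizeof(*t));
    t->fabric_id = htons(fabric_id & 0x3ff);
    t->veswid = htons(veswid & 0xfff);
    t->veswportnum = htonl(veswportnum);
    t->opaportnum = htons(opaportnum);
    t->veswportindex = index;
    t->opcode = opcode;
}
```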

struct opa_vnic_iface_mac_entry

single entry in the mac list

Definition:

struct opa_vnic_iface_mac_entry {
    u8 mac_addr[ETH_ALEN];
};

Members

mac_addr

MAC address

struct opa_veswport_iface_macs

Msg to set globally administered MAC

Definition:

struct opa_veswport_iface_macs {
    __be16 start_idx;
    __be16 num_macs_in_msg;
    __be16 tot_macs_in_lst;
    __be16 gen_count;
    struct opa_vnic_iface_mac_entry entry[];
};

Members

start_idx

position of first entry (0 based)

num_macs_in_msg

number of MACs in this message

tot_macs_in_lst

The total number of MACs the agent has

gen_count

gen_count to indicate change

entry

The mac list entry

Description

Same attribute IDs and attribute modifiers as in locally administered addresses are used to set globally administered addresses.
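The start_idx, num_macs_in_msg and tot_macs_in_lst fields suggest that a long MAC list is carried in several messages. The sketch below computes how many entries would go into the message that starts at a given index; macs_in_msg() and the per-message capacity are hypothetical, since the text does not state the actual limit.

```c
#include <assert.h>
#include <stdint.h>

/* Number of MAC entries to place in the message that starts at
 * start_idx, given the list total and a per-message capacity. */
static uint16_t macs_in_msg(uint16_t start_idx, uint16_t tot_macs,
                            uint16_t capacity)
{
    uint16_t remaining;

    assert(capacity > 0);
    if (start_idx >= tot_macs)
        return 0;
    remaining = (uint16_t)(tot_macs - start_idx);
    return remaining < capacity ? remaining : capacity;
}
```

With 10 addresses and room for 4 per message, the messages starting at indices 0, 4 and 8 carry 4, 4 and 2 entries.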

struct opa_vnic_vema_mad

Generic VEMA MAD

Definition:

struct opa_vnic_vema_mad {
    struct ib_mad_hdr  mad_hdr;
    struct ib_rmpp_hdr rmpp_hdr;
    u8 reserved;
    u8 oui[3];
    u8 data[OPA_VNIC_EMA_DATA];
};

Members

mad_hdr

Generic MAD header

rmpp_hdr

RMPP header for vendor specific MADs

reserved

reserved bytes

oui

Unique org identifier

data

MAD data

struct opa_vnic_notice_attr

Generic Notice MAD

Definition:

struct opa_vnic_notice_attr {
    u8 gen_type;
    u8 oui_1;
    u8 oui_2;
    u8 oui_3;
    __be16 trap_num;
    __be16 toggle_count;
    __be32 issuer_lid;
    __be32 reserved;
    u8 issuer_gid[16];
    u8 raw_data[64];
};

Members

gen_type

Generic/Specific bit and type of notice

oui_1

Vendor ID byte 1

oui_2

Vendor ID byte 2

oui_3

Vendor ID byte 3

trap_num

Trap number

toggle_count

Notice toggle bit and count value

issuer_lid

Trap issuer’s lid

reserved

reserved bytes

issuer_gid

Issuer GID (only if Report method)

raw_data

Trap message body

struct opa_vnic_vema_mad_trap

Generic VEMA MAD Trap

Definition:

struct opa_vnic_vema_mad_trap {
    struct ib_mad_hdr            mad_hdr;
    struct ib_rmpp_hdr           rmpp_hdr;
    u8 reserved;
    u8 oui[3];
    struct opa_vnic_notice_attr  notice;
};

Members

mad_hdr

Generic MAD header

rmpp_hdr

RMPP header for vendor specific MADs

reserved

reserved bytes

oui

Unique org identifier

notice

Notice structure

void opa_vnic_vema_report_event(struct opa_vnic_adapter *adapter, u8 event)

send trap to report the specified event

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

u8 event

event to be reported

Description

This function calls vema api to send a trap for the given event.

void opa_vnic_get_summary_counters(struct opa_vnic_adapter *adapter, struct opa_veswport_summary_counters *cntrs)

get summary counters

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

struct opa_veswport_summary_counters *cntrs

pointer to destination summary counters structure

Description

This function copies the summary counters that are maintained by the given adapter to the destination address provided.

void opa_vnic_get_error_counters(struct opa_vnic_adapter *adapter, struct opa_veswport_error_counters *cntrs)

get error counters

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

struct opa_veswport_error_counters *cntrs

pointer to destination error counters structure

Description

This function copies the error counters that are maintained by the given adapter to the destination address provided.

void opa_vnic_get_vesw_info(struct opa_vnic_adapter *adapter, struct opa_vesw_info *info)

Get the vesw information

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

struct opa_vesw_info *info

pointer to destination vesw info structure

Description

This function copies the vesw info that is maintained by the given adapter to the destination address provided.

void opa_vnic_set_vesw_info(struct opa_vnic_adapter *adapter, struct opa_vesw_info *info)

Set the vesw information

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

struct opa_vesw_info *info

pointer to vesw info structure

Description

This function updates the vesw info that is maintained by the given adapter with the vesw info provided. Reserved fields are stored and returned back to EM as is.

void opa_vnic_get_per_veswport_info(struct opa_vnic_adapter *adapter, struct opa_per_veswport_info *info)

Get the vesw per port information

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

struct opa_per_veswport_info *info

pointer to destination vport info structure

Description

This function copies the vesw per port info that is maintained by the given adapter to the destination address provided. Note that the read only fields are not copied.

void opa_vnic_set_per_veswport_info(struct opa_vnic_adapter *adapter, struct opa_per_veswport_info *info)

Set vesw per port information

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

struct opa_per_veswport_info *info

pointer to vport info structure

Description

This function updates the vesw per port info that is maintained by the given adapter with the vesw per port info provided. Reserved fields are stored and returned back to EM as is.

void opa_vnic_query_mcast_macs(struct opa_vnic_adapter *adapter, struct opa_veswport_iface_macs *macs)

query multicast mac list

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

struct opa_veswport_iface_macs *macs

pointer to mac list

Description

This function populates the provided mac list with the configured multicast addresses in the adapter.

void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter, struct opa_veswport_iface_macs *macs)

query unicast mac list

Parameters

struct opa_vnic_adapter *adapter

vnic port adapter

struct opa_veswport_iface_macs *macs

pointer to mac list

Description

This function populates the provided mac list with the configured unicast addresses in the adapter.

struct opa_vnic_vema_port

VNIC VEMA port details

Definition:

struct opa_vnic_vema_port {
    struct opa_vnic_ctrl_port      *cport;
    struct ib_mad_agent            *mad_agent;
    struct opa_class_port_info      class_port_info;
    u64 tid;
    u8 port_num;
    struct xarray                   vports;
    struct ib_event_handler         event_handler;
    struct mutex                    lock;
};

Members

cport

pointer to port

mad_agent

pointer to mad agent for port

class_port_info

Class port info information.

tid

Transaction id

port_num

OPA port number

vports

vnic ports

event_handler

ib event handler

lock

adapter interface lock

u8 vema_get_vport_num(struct opa_vnic_vema_mad *recvd_mad)

Get the vnic from the mad

Parameters

struct opa_vnic_vema_mad *recvd_mad

Received mad

Return

returns value of the vnic port number

struct opa_vnic_adapter *vema_get_vport_adapter(struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_port *port)

Get vnic port adapter from recvd mad

Parameters

struct opa_vnic_vema_mad *recvd_mad

received mad

struct opa_vnic_vema_port *port

ptr to port struct on which MAD was recvd

Return

vnic adapter

bool vema_mac_tbl_req_ok(struct opa_veswport_mactable *mac_tbl)

Check if mac request has correct values

Parameters

struct opa_veswport_mactable *mac_tbl

mac table

Description

This function checks for the validity of the offset and number of entries required.

Return

true if offset and num_entries are valid
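A range check of this kind can be sketched in userspace as shown below. Both limits are hypothetical values chosen for illustration, and struct mactable_req only mirrors the two header fields involved; the actual bounds used by the driver are not stated in the text above.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits chosen for illustration only. */
#define SKETCH_TBL_MAX_ENTRIES 2048u
#define SKETCH_MAX_PER_MSG       64u

/* Header fields of the request, stored big-endian as on the wire. */
struct mactable_req {
    uint16_t offset;
    uint16_t num_entries;
};

/* A request is accepted when its entries fit in one message and the
 * range [offset, offset + num_entries) stays inside the table. */
static bool mac_tbl_req_ok(const struct mactable_req *req)
{
    uint32_t offset, num_entries;

    assert(req);
    offset = ntohs(req->offset);
    num_entries = ntohs(req->num_entries);
    return num_entries <= SKETCH_MAX_PER_MSG &&
           offset + num_entries <= SKETCH_TBL_MAX_ENTRIES;
}
```

Widening both values to 32 bits before adding them keeps the sum from wrapping around.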

struct opa_vnic_adapter *vema_add_vport(struct opa_vnic_vema_port *port, u8 vport_num)

Add a new vnic port

Parameters

struct opa_vnic_vema_port *port

ptr to opa_vnic_vema_port struct

u8 vport_num

vnic port number (to be added)

Description

Return a pointer to the vnic adapter structure

void vema_get_class_port_info(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)

Get class info for port

Parameters

struct opa_vnic_vema_port *port

Port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

pointer to the received mad

struct opa_vnic_vema_mad *rsp_mad

pointer to response mad

Description

This function copies the latest class port info value set for the port and stores it for generating traps

void vema_set_class_port_info(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)

Set class info for port

Parameters

struct opa_vnic_vema_port *port

Port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

pointer to the received mad

struct opa_vnic_vema_mad *rsp_mad

pointer to response mad

Description

This function updates the port class info for the specific vnic and sets up the response mad data

void vema_get_veswport_info(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)

Get veswport info

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

pointer to the received mad

struct opa_vnic_vema_mad *rsp_mad

pointer to response mad

void vema_set_veswport_info(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)

Set veswport info

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

pointer to the received mad

struct opa_vnic_vema_mad *rsp_mad

pointer to response mad

Description

This function sets the veswport info for the vnic

void vema_get_mac_entries(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)

Get MAC entries in VNIC MAC table

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

pointer to the received mad

struct opa_vnic_vema_mad *rsp_mad

pointer to response mad

Description

This function gets the MAC entries that are programmed into the VNIC MAC forwarding table. It checks for the validity of the index into the MAC table and the number of entries that are to be retrieved.

void vema_set_mac_entries(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)

Set MAC entries in VNIC MAC table

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

pointer to the received mad

struct opa_vnic_vema_mad *rsp_mad

pointer to response mad

Description

This function sets the MAC entries in the VNIC forwarding table. It checks for the validity of the index and the number of forwarding table entries to be programmed.

void vema_set_delete_vesw(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)

Reset VESW info to POD (power-on default) values

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

pointer to the received mad

struct opa_vnic_vema_mad *rsp_mad

pointer to response mad

Description

This function clears all the fields of veswport info for the requested vesw and sets them back to the power-on default values. It does not delete the vesw.

void vema_get_mac_list(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad, u16 attr_id)
  • Get the unicast/multicast macs.

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

Received mad contains fields to set vnic parameters

struct opa_vnic_vema_mad *rsp_mad

Response mad to be built

u16 attr_id

Attribute ID indicating multicast or unicast mac list

void vema_get_summary_counters(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)
  • Gets summary counters.

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

Received mad contains fields to set vnic parameters

struct opa_vnic_vema_mad *rsp_mad

Response mad to be built

void vema_get_error_counters(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)
  • Gets error counters.

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

Received mad contains fields to set vnic parameters

struct opa_vnic_vema_mad *rsp_mad

Response mad to be built

void vema_get(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)
  • Process received get MAD

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

Received mad

struct opa_vnic_vema_mad *rsp_mad

Response mad to be built

void vema_set(struct opa_vnic_vema_port *port, struct opa_vnic_vema_mad *recvd_mad, struct opa_vnic_vema_mad *rsp_mad)
  • Process received set MAD

Parameters

struct opa_vnic_vema_port *port

source port on which MAD was received

struct opa_vnic_vema_mad *recvd_mad

Received mad contains fields to set vnic parameters

struct opa_vnic_vema_mad *rsp_mad

Response mad to be built

void vema_send(struct ib_mad_agent *mad_agent, struct ib_mad_send_wc *mad_wc)
  • Send handler for VEMA MAD agent

Parameters

struct ib_mad_agent *mad_agent

pointer to the mad agent

struct ib_mad_send_wc *mad_wc

pointer to mad send work completion information

Description

Free all the data structures associated with the sent MAD

void vema_recv(struct ib_mad_agent *mad_agent, struct ib_mad_send_buf *send_buf, struct ib_mad_recv_wc *mad_wc)
  • Recv handler for VEMA MAD agent

Parameters

struct ib_mad_agent *mad_agent

pointer to the mad agent

struct ib_mad_send_buf *send_buf

Send buffer if found, else NULL

struct ib_mad_recv_wc *mad_wc

pointer to mad receive work completion information

Description

Handle only set and get methods and respond to other methods as unsupported. Allocate response buffer and address handle for the response MAD.
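The dispatch rule in this description (serve Get and Set, answer everything else as unsupported) can be sketched as follows. The sketch is a user-space illustration only: every `ex_` name, the status values and the `handled_by` tag are hypothetical, and buffer and address handle allocation are left out. The two method constants mirror the Get and Set method codes of IB management datagrams.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the Get and Set method codes of IB management datagrams. */
#define EX_METHOD_GET 0x01
#define EX_METHOD_SET 0x02

/* Hypothetical status and handler tags, defined for this sketch only. */
enum ex_status { EX_STATUS_OK, EX_STATUS_UNSUPPORTED };
enum ex_handler { EX_HANDLER_NONE, EX_HANDLER_GET, EX_HANDLER_SET };

struct ex_mad {
    uint8_t method;
    enum ex_status status;
    enum ex_handler handled_by;
};

static void ex_get(const struct ex_mad *req, struct ex_mad *rsp)
{
    (void)req;
    rsp->handled_by = EX_HANDLER_GET;
}

static void ex_set(const struct ex_mad *req, struct ex_mad *rsp)
{
    (void)req;
    rsp->handled_by = EX_HANDLER_SET;
}

/* Build a response for req: Get and Set are served, the rest rejected. */
static void ex_recv(const struct ex_mad *req, struct ex_mad *rsp)
{
    assert(req && rsp);

    rsp->method = req->method;
    rsp->status = EX_STATUS_OK;
    rsp->handled_by = EX_HANDLER_NONE;

    switch (req->method) {
    case EX_METHOD_GET:
        ex_get(req, rsp);
        break;
    case EX_METHOD_SET:
        ex_set(req, rsp);
        break;
    default:
        rsp->status = EX_STATUS_UNSUPPORTED;
        break;
    }
}
```

A response is produced for every request, so a sender of an unsupported method still receives an explicit status instead of a timeout.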

struct opa_vnic_vema_port *vema_get_port(struct opa_vnic_ctrl_port *cport, u8 port_num)
  • Gets the opa_vnic_vema_port

Parameters

struct opa_vnic_ctrl_port *cport

pointer to control dev

u8 port_num

Port number

Description

This function loops through the ports and returns the opa_vnic_vema port structure that is associated with the OPA port number.

Return

Pointer to the requested opa_vnic_vema_port structure on success, NULL otherwise.
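A lookup with this contract (matching entry on success, NULL otherwise) might look like the sketch below. The `ex_` types, the fixed array of four ports and the linear search are assumptions made for illustration; the driver's actual data layout is not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the control port and its per-port data. */
struct ex_port {
    uint8_t port_num;
};

struct ex_ctrl_port {
    uint8_t num_ports;
    struct ex_port ports[4];
};

/* Return the entry whose port_num matches, or NULL when there is none. */
static struct ex_port *ex_get_port(struct ex_ctrl_port *cport, uint8_t port_num)
{
    assert(cport != NULL);

    for (uint8_t i = 0; i < cport->num_ports; i++) {
        if (cport->ports[i].port_num == port_num)
            return &cport->ports[i];
    }
    return NULL;
}
```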

void opa_vnic_vema_send_trap(struct opa_vnic_adapter *adapter, struct __opa_veswport_trap *data, u32 lid)
  • This function sends a trap to the EM

Parameters

struct opa_vnic_adapter *adapter

pointer to vnic adapter

struct __opa_veswport_trap *data

pointer to trap data filled by calling function

u32 lid

issuer's lid (encap_slid from vesw_port_info)

Description

This function is called from the VNIC driver to send a trap if there is something the EM should be notified about. These events currently are:

1) UNICAST INTERFACE MACADDRESS changes
2) MULTICAST INTERFACE MACADDRESS changes
3) ETHERNET LINK STATUS changes

While allocating the send mad, the remote site qpn used is 1, as this is the well-known QP.

void vema_unregister(struct opa_vnic_ctrl_port *cport)
  • Unregisters agent

Parameters

struct opa_vnic_ctrl_port *cport

pointer to control port

Description

This deletes the registration by VEMA for MADs

int vema_register(struct opa_vnic_ctrl_port *cport)
  • Registers agent

Parameters

struct opa_vnic_ctrl_port *cport

pointer to control port

Description

This function registers the handlers for the VEMA MADs

Return

Returns 0 on success, non-zero otherwise.

void opa_vnic_ctrl_config_dev(struct opa_vnic_ctrl_port *cport, bool en)
  • This function sends a trap to the EM by way of ib_modify_port to indicate support for ethernet on the fabric.

Parameters

struct opa_vnic_ctrl_port *cport

pointer to control port

bool en

enable or disable ethernet on fabric support

int opa_vnic_vema_add_one(struct ib_device *device)
  • Handle new ib device

Parameters

struct ib_device *device

ib device pointer

Description

Allocate the vnic control port and initialize it.

void opa_vnic_vema_rem_one(struct ib_device *device, void *client_data)
  • Handle ib device removal

Parameters

struct ib_device *device

ib device pointer

void *client_data

ib client data

Description

Uninitialize and free the vnic control port.

InfiniBand SCSI RDMA protocol target support

enum srpt_command_state

SCSI command state managed by SRPT

Constants

SRPT_STATE_NEW

New command arrived and is being processed.

SRPT_STATE_NEED_DATA

Processing a write or bidir command and waiting for data arrival.

SRPT_STATE_DATA_IN

Data for the write or bidir command arrived and is being processed.

SRPT_STATE_CMD_RSP_SENT

SRP_RSP for SRP_CMD has been sent.

SRPT_STATE_MGMT

Processing a SCSI task management command.

SRPT_STATE_MGMT_RSP_SENT

SRP_RSP for SRP_TSK_MGMT has been sent.

SRPT_STATE_DONE

Command processing finished successfully, was aborted, or failed.

struct srpt_ioctx

shared SRPT I/O context information

Definition:

struct srpt_ioctx {
    struct ib_cqe           cqe;
    void *buf;
    dma_addr_t dma;
    uint32_t offset;
    uint32_t index;
};

Members

cqe

Completion queue element.

buf

Pointer to the buffer.

dma

DMA address of the buffer.

offset

Offset of the first byte in buf and dma that is actually used.

index

Index of the I/O context in its ioctx_ring array.

struct srpt_recv_ioctx

SRPT receive I/O context

Definition:

struct srpt_recv_ioctx {
    struct srpt_ioctx       ioctx;
    struct list_head        wait_list;
    int byte_len;
};

Members

ioctx

See above.

wait_list

Node for insertion in srpt_rdma_ch.cmd_wait_list.

byte_len

Number of bytes in ioctx.buf.

struct srpt_send_ioctx

SRPT send I/O context

Definition:

struct srpt_send_ioctx {
    struct srpt_ioctx       ioctx;
    struct srpt_rdma_ch     *ch;
    struct srpt_recv_ioctx  *recv_ioctx;
    struct srpt_rw_ctx      s_rw_ctx;
    struct srpt_rw_ctx      *rw_ctxs;
    struct scatterlist      imm_sg;
    struct ib_cqe           rdma_cqe;
    enum srpt_command_state state;
    struct se_cmd           cmd;
    u8 n_rdma;
    u8 n_rw_ctx;
    bool queue_status_only;
    u8 sense_data[TRANSPORT_SENSE_BUFFER];
};

Members

ioctx

See above.

ch

Channel pointer.

recv_ioctx

Receive I/O context associated with this send I/O context. Only used for processing immediate data.

s_rw_ctx

rw_ctxs points here if only a single rw_ctx is needed.

rw_ctxs

RDMA read/write contexts.

imm_sg

Scatterlist for immediate data.

rdma_cqe

RDMA completion queue element.

state

I/O context state.

cmd

Target core command data structure.

n_rdma

Number of work requests needed to transfer this ioctx.

n_rw_ctx

Size of rw_ctxs array.

queue_status_only

Send a SCSI status back to the initiator but no data.

sense_data

Sense data to be sent to the initiator.
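The relationship between s_rw_ctx, rw_ctxs and n_rw_ctx listed above is a common way to avoid a heap allocation in the single-context case. A minimal user-space sketch of that idea follows; the `ex_` names, the use of `calloc()` and the `-1` error value are assumptions for illustration and do not reproduce the driver code.

```c
#include <assert.h>
#include <stdlib.h>

struct ex_rw_ctx {
    int placeholder; /* stands in for the real RDMA read/write context */
};

struct ex_send_ioctx {
    struct ex_rw_ctx s_rw_ctx;  /* inline storage for the single-context case */
    struct ex_rw_ctx *rw_ctxs;  /* points at s_rw_ctx or at a heap array */
    unsigned char n_rw_ctx;     /* number of elements rw_ctxs refers to */
};

/* Returns 0 on success and -1 when the heap allocation fails. */
static int ex_alloc_rw_ctxs(struct ex_send_ioctx *ioctx, unsigned char n)
{
    assert(ioctx && n > 0);

    if (n == 1) {
        ioctx->rw_ctxs = &ioctx->s_rw_ctx;
    } else {
        ioctx->rw_ctxs = calloc(n, sizeof(*ioctx->rw_ctxs));
        if (!ioctx->rw_ctxs)
            return -1;
    }
    ioctx->n_rw_ctx = n;
    return 0;
}

static void ex_free_rw_ctxs(struct ex_send_ioctx *ioctx)
{
    /* Only a heap array is freed; the inline slot lives inside ioctx. */
    if (ioctx->rw_ctxs != &ioctx->s_rw_ctx)
        free(ioctx->rw_ctxs);
    ioctx->rw_ctxs = NULL;
    ioctx->n_rw_ctx = 0;
}
```

The free path has to compare against the inline slot, otherwise it would pass a pointer into the middle of the I/O context to `free()`.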

enum rdma_ch_state

SRP channel state

Constants

CH_CONNECTING

QP is in RTR state; waiting for RTU.

CH_LIVE

QP is in RTS state.

CH_DISCONNECTING

DREQ has been sent and waiting for DREP or DREQ has been received.

CH_DRAINING

DREP has been received or waiting for DREP timed out and last work request has been queued.

CH_DISCONNECTED

Last completion has been received.
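The constants above describe a channel lifetime that moves from connecting towards disconnected. One way to enforce such an ordering is to accept only forward transitions, as in the sketch below. The forward-only rule is an assumption of this sketch that the text above does not state, the `ex_` names are hypothetical, and the locking the real channel would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Same order as the constants documented above. */
enum ex_ch_state {
    EX_CH_CONNECTING,
    EX_CH_LIVE,
    EX_CH_DISCONNECTING,
    EX_CH_DRAINING,
    EX_CH_DISCONNECTED,
};

struct ex_channel {
    enum ex_ch_state state;
};

/*
 * Move the channel to new_state only if that is a step forward.
 * Returns true if and only if the state was changed.
 */
static bool ex_set_ch_state(struct ex_channel *ch, enum ex_ch_state new_state)
{
    assert(ch);

    if (new_state <= ch->state)
        return false;
    ch->state = new_state;
    return true;
}
```

With this rule a late event, for example a disconnect request that arrives while the channel is already draining, cannot move the channel back to an earlier state.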

struct srpt_rdma_ch

RDMA channel

Definition:

struct srpt_rdma_ch {
    struct srpt_nexus       *nexus;
    struct ib_qp            *qp;
    union {
        struct {
            struct ib_cm_id         *cm_id;
        } ib_cm;
        struct {
            struct rdma_cm_id       *cm_id;
        } rdma_cm;
    };
    struct ib_cq            *cq;
    u32 cq_size;
    struct ib_cqe           zw_cqe;
    struct rcu_head         rcu;
    struct kref             kref;
    struct completion       *closed;
    int rq_size;
    u32 max_rsp_size;
    atomic_t sq_wr_avail;
    struct srpt_port        *sport;
    int max_ti_iu_len;
    atomic_t req_lim;
    atomic_t req_lim_delta;
    u16 imm_data_offset;
    spinlock_t spinlock;
    enum rdma_ch_state      state;
    struct kmem_cache       *rsp_buf_cache;
    struct srpt_send_ioctx  **ioctx_ring;
    struct kmem_cache       *req_buf_cache;
    struct srpt_recv_ioctx  **ioctx_recv_ring;
    struct list_head        list;
    struct list_head        cmd_wait_list;
    uint16_t pkey;
    bool using_rdma_cm;
    bool processing_wait_list;
    struct se_session       *sess;
    u8 sess_name[40];
    struct work_struct      release_work;
};

Members

nexus

I_T nexus this channel is associated with.

qp

IB queue pair used for communicating over this channel.

{unnamed_union}

anonymous

ib_cm

See below.

ib_cm.cm_id

IB CM ID associated with the channel.

rdma_cm

See below.

rdma_cm.cm_id

RDMA CM ID associated with the channel.

cq

IB completion queue for this channel.

cq_size

Number of CQEs in cq.

zw_cqe

Zero-length write CQE.

rcu

RCU head.

kref

kref for this channel.

closed

Completion object that will be signaled as soon as a new channel object with the same identity can be created.

rq_size

IB receive queue size.

max_rsp_size

Maximum size of an RSP response message in bytes.

sq_wr_avail

number of work requests available in the send queue.

sport

pointer to the information of the HCA port used by this channel.

max_ti_iu_len

maximum target-to-initiator information unit length.

req_lim

request limit: maximum number of requests that may be sent by the initiator without having received a response.

req_lim_delta

Number of credits not yet sent back to the initiator.

imm_data_offset

Offset from start of SRP_CMD for immediate data.

spinlock

Protects free_list and state.

state

channel state. See also enum rdma_ch_state.

rsp_buf_cache

kmem_cache for ioctx_ring.

ioctx_ring

Send ring.

req_buf_cache

kmem_cache for ioctx_recv_ring.

ioctx_recv_ring

Receive I/O context ring.

list

Node in srpt_nexus.ch_list.

cmd_wait_list

List of SCSI commands that arrived before the RTU event. This list contains struct srpt_ioctx elements and is protected against concurrent modification by the cm_id spinlock.

pkey

P_Key of the IB partition for this SRP channel.

using_rdma_cm

Whether the RDMA/CM or IB/CM is used for this channel.

processing_wait_list

Whether or not cmd_wait_list is being processed.

sess

Session information associated with this SRP channel.

sess_name

Session name.

release_work

Allows scheduling of srpt_release_channel().

struct srpt_nexus

I_T nexus

Definition:

struct srpt_nexus {
    struct rcu_head         rcu;
    struct list_head        entry;
    struct list_head        ch_list;
    u8 i_port_id[16];
    u8 t_port_id[16];
};

Members

rcu

RCU head for this data structure.

entry

srpt_port.nexus_list list node.

ch_list

struct srpt_rdma_ch list. Protected by srpt_port.mutex.

i_port_id

128-bit initiator port identifier copied from SRP_LOGIN_REQ.

t_port_id

128-bit target port identifier copied from SRP_LOGIN_REQ.

struct srpt_port_attrib

attributes for SRPT port

Definition:

struct srpt_port_attrib {
    u32 srp_max_rdma_size;
    u32 srp_max_rsp_size;
    u32 srp_sq_size;
    bool use_srq;
};

Members

srp_max_rdma_size

Maximum size of SRP RDMA transfers for new connections.

srp_max_rsp_size

Maximum size of SRP response messages in bytes.

srp_sq_size

Shared receive queue (SRQ) size.

use_srq

Whether or not to use SRQ.

struct srpt_tpg

information about a single “target portal group”

Definition:

struct srpt_tpg {
    struct list_head        entry;
    struct srpt_port_id     *sport_id;
    struct se_portal_group  tpg;
};

Members

entry

Entry in sport_id->tpg_list.

sport_id

Port name this TPG is associated with.

tpg

LIO TPG data structure.

Description

Zero or more target portal groups are associated with each port name (srpt_port_id). An ACL list is associated with each TPG.

struct srpt_port_id

LIO RDMA port information

Definition:

struct srpt_port_id {
    struct mutex            mutex;
    struct list_head        tpg_list;
    struct se_wwn           wwn;
    char name[64];
};

Members

mutex

Protects tpg_list changes.

tpg_list

TPGs associated with the RDMA port name.

wwn

WWN associated with the RDMA port name.

name

ASCII representation of the port name.

Description

Multiple sysfs directories can be associated with a single RDMA port. This data structure represents a single (port, name) pair.

struct srpt_port

SRPT RDMA port information

Definition:

struct srpt_port {
    struct srpt_device      *sdev;
    struct ib_mad_agent     *mad_agent;
    bool enabled;
    u8 port;
    u32 sm_lid;
    u32 lid;
    union ib_gid            gid;
    struct work_struct      work;
    char guid_name[64];
    struct srpt_port_id     *guid_id;
    char gid_name[64];
    struct srpt_port_id     *gid_id;
    struct srpt_port_attrib port_attrib;
    atomic_t refcount;
    struct completion       *freed_channels;
    struct mutex            mutex;
    struct list_head        nexus_list;
};

Members

sdev

backpointer to the HCA information.

mad_agent

per-port management datagram processing information.

enabled

Whether or not this target port is enabled.

port

one-based port number.

sm_lid

cached value of the port’s sm_lid.

lid

cached value of the port’s lid.

gid

cached value of the port’s gid.

work

work structure for refreshing the aforementioned cached values.

guid_name

port name in GUID format.

guid_id

LIO target port information for the port name in GUID format.

gid_name

port name in GID format.

gid_id

LIO target port information for the port name in GID format.

port_attrib

Port attributes that can be accessed through configfs.

refcount

Number of objects associated with this port.

freed_channels

Completion that will be signaled once refcount becomes 0.

mutex

Protects nexus_list.

nexus_list

Nexus list. See also srpt_nexus.entry.

struct srpt_device

information associated by SRPT with a single HCA

Definition:

struct srpt_device {
    struct kref             refcnt;
    struct ib_device        *device;
    struct ib_pd            *pd;
    u32 lkey;
    struct ib_srq           *srq;
    struct ib_cm_id         *cm_id;
    int srq_size;
    struct mutex            sdev_mutex;
    bool use_srq;
    struct kmem_cache       *req_buf_cache;
    struct srpt_recv_ioctx  **ioctx_ring;
    struct ib_event_handler event_handler;
    struct list_head        list;
    struct srpt_port        port[];
};

Members

refcnt

Reference count for this device.

device

Backpointer to the struct ib_device managed by the IB core.

pd

IB protection domain.

lkey

L_Key (local key) with write access to all local memory.

srq

Per-HCA SRQ (shared receive queue).

cm_id

Connection identifier.

srq_size

SRQ size.

sdev_mutex

Serializes use_srq changes.

use_srq

Whether or not to use SRQ.

req_buf_cache

kmem_cache for ioctx_ring buffers.

ioctx_ring

Per-HCA SRQ.

event_handler

Per-HCA asynchronous IB event handler.

list

Node in srpt_dev_list.

port

Information about the ports owned by this HCA.

void srpt_event_handler(struct ib_event_handler *handler, struct ib_event *event)

asynchronous IB event callback function

Parameters

struct ib_event_handler *handler

IB event handler registered by ib_register_event_handler().

struct ib_event *event

Description of the event that occurred.

Description

Callback function called by the InfiniBand core when an asynchronous IB event occurs. This callback may occur in interrupt context. See also section 11.5.2, Set Asynchronous Event Handler in the InfiniBand Architecture Specification.

void srpt_srq_event(struct ib_event *event, void *ctx)

SRQ event callback function

Parameters

struct ib_event *event

Description of the event that occurred.

void *ctx

Context pointer specified at SRQ creation time.

void srpt_qp_event(struct ib_event *event, void *ptr)

QP event callback function

Parameters

struct ib_event *event

Description of the event that occurred.

void *ptr

SRPT RDMA channel.

void srpt_set_ioc(u8 *c_list, u32 slot, u8 value)

initialize an IOUnitInfo structure

Parameters

u8 *c_list

controller list.

u32 slot

one-based slot number.

u8 value

four-bit value.

Description

Copies the lowest four bits of value into element slot of the array of four-bit elements called c_list (controller list). The index slot is one-based.
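Packing four-bit elements two per byte with a one-based index can be sketched as below. The placement of odd slots in the high nibble and even slots in the low nibble is an assumption of this sketch, and `ex_set_ioc` and `ex_get_ioc` are hypothetical names; the reader helper is added only to make the layout visible.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store the low four bits of value in one-based element slot of c_list.
 * Assumed layout: odd slots use the high nibble, even slots the low one.
 */
static void ex_set_ioc(uint8_t *c_list, uint32_t slot, uint8_t value)
{
    assert(c_list && slot >= 1);

    uint32_t idx = (slot - 1) / 2;

    if (slot & 1)
        c_list[idx] = (uint8_t)(((value & 0x0f) << 4) | (c_list[idx] & 0x0f));
    else
        c_list[idx] = (uint8_t)((value & 0x0f) | (c_list[idx] & 0xf0));
}

/* Read back the four-bit element stored in one-based slot. */
static uint8_t ex_get_ioc(const uint8_t *c_list, uint32_t slot)
{
    assert(c_list && slot >= 1);

    uint8_t byte = c_list[(slot - 1) / 2];

    return (slot & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0f);
}
```

Each write masks the neighbouring nibble back in, so updating one slot leaves the other element of the same byte untouched.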

void srpt_get_class_port_info(struct ib_dm_mad *mad)

copy ClassPortInfo to a management datagram

Parameters

struct ib_dm_mad *mad

Datagram that will be sent as response to DM_ATTR_CLASS_PORT_INFO.

Description

See also section 16.3.3.1 ClassPortInfo in the InfiniBand Architecture Specification.

void srpt_get_iou(struct ib_dm_mad *mad)

write IOUnitInfo to a management datagram

Parameters

struct ib_dm_mad *mad

Datagram that will be sent as response to DM_ATTR_IOU_INFO.

Description

See also section 16.3.3.3 IOUnitInfo in the InfiniBand Architecture Specification. See also section B.7, table B.6 in the SRP r16a document.

void srpt_get_ioc(struct srpt_port *sport, u32 slot, struct ib_dm_mad *mad)

write IOControllerProfile to a management datagram

Parameters

struct srpt_port *sport

HCA port through which the MAD has been received.

u32 slot

Slot number specified in DM_ATTR_IOC_PROFILE query.

struct ib_dm_mad *mad

Datagram that will be sent as response to DM_ATTR_IOC_PROFILE.

Description

See also section 16.3.3.4 IOControllerProfile in the InfiniBand Architecture Specification. See also section B.7, table B.7 in the SRP r16a document.

void srpt_get_svc_entries(u64 ioc_guid, u16 slot, u8 hi, u8 lo, struct ib_dm_mad *mad)

write ServiceEntries to a management datagram

Parameters

u64 ioc_guid

I/O controller GUID to use in reply.

u16 slot

I/O controller number.

u8 hi

End of the range of service entries to be specified in the reply.

u8 lo

Start of the range of service entries to be specified in the reply.

struct ib_dm_mad *mad

Datagram that will be sent as response to DM_ATTR_SVC_ENTRIES.

Description

See also section 16.3.3.5 ServiceEntries in the InfiniBand Architecture Specification. See also section B.7, table B.8 in the SRP r16a document.

void srpt_mgmt_method_get(struct srpt_port *sp, struct ib_mad *rq_mad, struct ib_dm_mad *rsp_mad)

process a received management datagram

Parameters

struct srpt_port *sp

HCA port through which the MAD has been received.

struct ib_mad *rq_mad

received MAD.

struct ib_dm_mad *rsp_mad

response MAD.

void srpt_mad_send_handler(struct ib_mad_agent *mad_agent, struct ib_mad_send_wc *mad_wc)

MAD send completion callback

Parameters

struct ib_mad_agent *mad_agent

Return value of ib_register_mad_agent().

struct ib_mad_send_wc *mad_wc

Work completion reporting that the MAD has been sent.

void srpt_mad_recv_handler(struct ib_mad_agent *mad_agent, struct ib_mad_send_buf *send_buf, struct ib_mad_recv_wc *mad_wc)

MAD reception callback function

Parameters

struct ib_mad_agent *mad_agent

Return value of ib_register_mad_agent().

struct ib_mad_send_buf *send_buf

Not used.

struct ib_mad_recv_wc *mad_wc

Work completion reporting that a MAD has been received.

int srpt_refresh_port(struct srpt_port *sport)

configure a HCA port

Parameters

struct srpt_port *sport

SRPT HCA port.

Description

Enable InfiniBand management datagram processing, update the cached sm_lid, lid and gid values, and register a callback function for processing MADs on the specified port.

Note

It is safe to call this function more than once for the same port.

void srpt_unregister_mad_agent(struct srpt_device *sdev, int port_cnt)

unregister MAD callback functions

Parameters

struct srpt_device *sdev

SRPT HCA pointer.

int port_cnt

number of ports with registered MAD

Note

It is safe to call this function more than once for the same device.

struct srpt_ioctx *srpt_alloc_ioctx(struct srpt_device *sdev, int ioctx_size, struct kmem_cache *buf_cache, enum dma_data_direction dir)

allocate a SRPT I/O context structure

Parameters

struct srpt_device *sdev

SRPT HCA pointer.

int ioctx_size

I/O context size.

struct kmem_cache *buf_cache

I/O buffer cache.

enum dma_data_direction dir

DMA data direction.

void srpt_free_ioctx(struct srpt_device *sdev, struct srpt_ioctx *ioctx, struct kmem_cache *buf_cache, enum dma_data_direction dir)

free a SRPT I/O context structure

Parameters

struct srpt_device *sdev

SRPT HCA pointer.

struct srpt_ioctx *ioctx

I/O context pointer.

struct kmem_cache *buf_cache

I/O buffer cache.

enum dma_data_direction dir

DMA data direction.

struct srpt_ioctx **srpt_alloc_ioctx_ring(struct srpt_device *sdev, int ring_size, int ioctx_size, struct kmem_cache *buf_cache, int alignment_offset, enum dma_data_direction dir)

allocate a ring of SRPT I/O context structures

Parameters

struct srpt_device *sdev

Device to allocate the I/O context ring for.

int ring_size

Number of elements in the I/O context ring.

int ioctx_size

I/O context size.

struct kmem_cache *buf_cache

I/O buffer cache.

int alignment_offset

Offset in each ring buffer at which the SRP information unit starts.

enum dma_data_direction dir

DMA data direction.
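Allocating a ring of I/O contexts means allocating an array of pointers plus one context and one buffer per element, and undoing everything if any step fails. The sketch below shows that shape in user space with `calloc()` and `malloc()`. It assumes nothing about the driver beyond the ring idea: the `ex_` names are hypothetical, and the buffer cache, the alignment offset and DMA mapping are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

struct ex_ioctx {
    void *buf;          /* per-element I/O buffer */
    unsigned int index; /* position of this element in the ring */
};

/* Free a ring; tolerates NULL and partially populated rings. */
static void ex_free_ioctx_ring(struct ex_ioctx **ring, int ring_size)
{
    if (!ring)
        return;
    for (int i = 0; i < ring_size; i++) {
        if (ring[i]) {
            free(ring[i]->buf);
            free(ring[i]);
        }
    }
    free(ring);
}

/* Returns the ring, or NULL if any allocation failed. */
static struct ex_ioctx **ex_alloc_ioctx_ring(int ring_size, size_t buf_size)
{
    assert(ring_size > 0 && buf_size > 0);

    /* calloc() zeroes the slots so the error path can skip unused ones. */
    struct ex_ioctx **ring = calloc((size_t)ring_size, sizeof(*ring));

    if (!ring)
        return NULL;
    for (int i = 0; i < ring_size; i++) {
        ring[i] = calloc(1, sizeof(*ring[i]));
        if (!ring[i])
            goto err;
        ring[i]->buf = malloc(buf_size);
        if (!ring[i]->buf)
            goto err;
        ring[i]->index = (unsigned int)i;
    }
    return ring;

err:
    ex_free_ioctx_ring(ring, ring_size);
    return NULL;
}
```

Sharing one free routine between the normal teardown and the error path keeps the two from drifting apart.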

void srpt_free_ioctx_ring(struct srpt_ioctx **ioctx_ring, struct srpt_device *sdev, int ring_size, struct kmem_cache *buf_cache, enum dma_data_direction dir)

free the ring of SRPT I/O context structures

Parameters

struct srpt_ioctx **ioctx_ring

I/O context ring to be freed.

struct srpt_device *sdev

SRPT HCA pointer.

int ring_size

Number of ring elements.

struct kmem_cache *buf_cache

I/O buffer cache.

enum dma_data_direction dir

DMA data direction.

enum srpt_command_state srpt_set_cmd_state(struct srpt_send_ioctx *ioctx, enum srpt_command_state new)

set the state of a SCSI command

Parameters

struct srpt_send_ioctx *ioctx

Send I/O context.

enum srpt_command_state new

New I/O context state.

Description

Does not modify the state of aborted commands. Returns the previous command state.

bool srpt_test_and_set_cmd_state(struct srpt_send_ioctx *ioctx, enum srpt_command_state old, enum srpt_command_state new)

test and set the state of a command

Parameters

struct srpt_send_ioctx *ioctx

Send I/O context.

enum srpt_command_state old

Current I/O context state.

enum srpt_command_state new

New I/O context state.

Description

Returns true if and only if the previous command state was equal to ‘old’.
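The two state helpers documented above differ in when they write: one always reports the previous state and skips the write for finished commands, the other writes only when the current state matches an expected value. A user-space sketch of both follows. It assumes that an aborted command is represented by the DONE state, uses hypothetical `ex_` names, and omits any synchronization the real code may rely on.

```c
#include <assert.h>
#include <stdbool.h>

/* Same order as the command states documented earlier. */
enum ex_cmd_state {
    EX_STATE_NEW,
    EX_STATE_NEED_DATA,
    EX_STATE_DATA_IN,
    EX_STATE_CMD_RSP_SENT,
    EX_STATE_MGMT,
    EX_STATE_MGMT_RSP_SENT,
    EX_STATE_DONE,
};

struct ex_ioctx {
    enum ex_cmd_state state;
};

/* Set the state unless the command is already done; return the old state. */
static enum ex_cmd_state ex_set_cmd_state(struct ex_ioctx *ioctx,
                                          enum ex_cmd_state new_state)
{
    assert(ioctx);

    enum ex_cmd_state previous = ioctx->state;

    if (previous != EX_STATE_DONE)
        ioctx->state = new_state;
    return previous;
}

/* Set the state only if it currently equals old_state. */
static bool ex_test_and_set_cmd_state(struct ex_ioctx *ioctx,
                                      enum ex_cmd_state old_state,
                                      enum ex_cmd_state new_state)
{
    assert(ioctx);

    if (ioctx->state != old_state)
        return false;
    ioctx->state = new_state;
    return true;
}
```

The test-and-set form lets a caller claim a transition exactly once, for example moving a command from NEED_DATA to DATA_IN only if nothing else has finished it in the meantime.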

int srpt_post_recv(struct srpt_device *sdev, struct srpt_rdma_ch *ch, struct srpt_recv_ioctx *ioctx)

post an IB receive request

Parameters

struct srpt_device *sdev

SRPT HCA pointer.

struct srpt_rdma_ch *ch

SRPT RDMA channel.

struct srpt_recv_ioctx *ioctx

Receive I/O context pointer.

int srpt_zerolength_write(struct srpt_rdma_ch *ch)

perform a zero-length RDMA write

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

Description

A quote from the InfiniBand specification: C9-88: For an HCA responder using Reliable Connection service, for each zero-length RDMA READ or WRITE request, the R_Key shall not be validated, even if the request includes Immediate data.

int srpt_get_desc_tbl(struct srpt_recv_ioctx *recv_ioctx, struct srpt_send_ioctx *ioctx, struct srp_cmd *srp_cmd, enum dma_data_direction *dir, struct scatterlist **sg, unsigned int *sg_cnt, u64 *data_len, u16 imm_data_offset)

parse the data descriptors of a SRP_CMD request

Parameters

struct srpt_recv_ioctx *recv_ioctx

I/O context associated with the received command srp_cmd.

struct srpt_send_ioctx *ioctx

I/O context that will be used for responding to the initiator.

struct srp_cmd *srp_cmd

Pointer to the SRP_CMD request data.

enum dma_data_direction *dir

Pointer to the variable to which the transfer direction will be written.

struct scatterlist **sg

[out] scatterlist for the parsed SRP_CMD.

unsigned int *sg_cnt

[out] length of sg.

u64 *data_len

Pointer to the variable to which the total data length of all descriptors in the SRP_CMD request will be written.

u16 imm_data_offset

[in] Offset in SRP_CMD requests at which immediate data starts.

Description

This function initializes ioctx->nrbuf and ioctx->r_bufs.

Returns -EINVAL when the SRP_CMD request contains inconsistent descriptors; -ENOMEM when memory allocation fails and zero upon success.

int srpt_init_ch_qp(struct srpt_rdma_ch *ch, struct ib_qp *qp)

initialize queue pair attributes

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

struct ib_qp *qp

Queue pair pointer.

Description

Initializes the attributes of queue pair ‘qp’ by allowing local write, remote read and remote write. Also transitions ‘qp’ to state IB_QPS_INIT.

int srpt_ch_qp_rtr(struct srpt_rdma_ch *ch, struct ib_qp *qp)

change the state of a channel to ‘ready to receive’ (RTR)

Parameters

struct srpt_rdma_ch *ch

channel of the queue pair.

struct ib_qp *qp

queue pair to change the state of.

Description

Returns zero upon success and a negative value upon failure.

Note

Currently a struct ib_qp_attr takes 136 bytes on a 64-bit system. If this structure ever becomes larger, it might be necessary to allocate it dynamically instead of on the stack.

int srpt_ch_qp_rts(struct srpt_rdma_ch *ch, struct ib_qp *qp)

change the state of a channel to ‘ready to send’ (RTS)

Parameters

struct srpt_rdma_ch *ch

channel of the queue pair.

struct ib_qp *qp

queue pair to change the state of.

Description

Returns zero upon success and a negative value upon failure.

Note

Currently a struct ib_qp_attr takes 136 bytes on a 64-bit system. If this structure ever becomes larger, it might be necessary to allocate it dynamically instead of on the stack.

int srpt_ch_qp_err(struct srpt_rdma_ch *ch)

set the channel queue pair state to ‘error’

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

struct srpt_send_ioctx *srpt_get_send_ioctx(struct srpt_rdma_ch *ch)

obtain an I/O context for sending to the initiator

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

int srpt_abort_cmd(struct srpt_send_ioctx *ioctx)

abort a SCSI command

Parameters

struct srpt_send_ioctx *ioctx

I/O context associated with the SCSI command.

void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)

RDMA read completion callback

Parameters

struct ib_cq *cq

Completion queue.

struct ib_wc *wc

Work completion.

Description

XXX: what is now target_execute_cmd used to be asynchronous, and unmapping the data that has been transferred via IB RDMA had to be postponed until the check_stop_free() callback. None of this is necessary anymore and needs to be cleaned up.

int srpt_build_cmd_rsp(struct srpt_rdma_ch *ch, struct srpt_send_ioctx *ioctx, u64 tag, int status)

build a SRP_RSP response

Parameters

struct srpt_rdma_ch *ch

RDMA channel through which the request has been received.

struct srpt_send_ioctx *ioctx

I/O context associated with the SRP_CMD request. The response will be built in the buffer ioctx->buf points at and hence this function will overwrite the request data.

u64 tag

tag of the request for which this response is being generated.

int status

value for the STATUS field of the SRP_RSP information unit.

Description

Returns the size in bytes of the SRP_RSP response.

An SRP_RSP response contains a SCSI status or service response. See also section 6.9 in the SRP r16a document for the format of an SRP_RSP response. See also SPC-2 for more information about sense data.

int srpt_build_tskmgmt_rsp(struct srpt_rdma_ch *ch, struct srpt_send_ioctx *ioctx, u8 rsp_code, u64 tag)

build a task management response

Parameters

struct srpt_rdma_ch *ch

RDMA channel through which the request has been received.

struct srpt_send_ioctx *ioctx

I/O context in which the SRP_RSP response will be built.

u8 rsp_code

RSP_CODE that will be stored in the response.

u64 tag

Tag of the request for which this response is being generated.

Description

Returns the size in bytes of the SRP_RSP response.

An SRP_RSP response contains a SCSI status or service response. See also section 6.9 in the SRP r16a document for the format of an SRP_RSP response.

void srpt_handle_cmd(struct srpt_rdma_ch *ch, struct srpt_recv_ioctx *recv_ioctx, struct srpt_send_ioctx *send_ioctx)

process a SRP_CMD information unit

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

struct srpt_recv_ioctx *recv_ioctx

Receive I/O context.

struct srpt_send_ioctx *send_ioctx

Send I/O context.

void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch, struct srpt_recv_ioctx *recv_ioctx, struct srpt_send_ioctx *send_ioctx)

process a SRP_TSK_MGMT information unit

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

struct srpt_recv_ioctx *recv_ioctx

Receive I/O context.

struct srpt_send_ioctx *send_ioctx

Send I/O context.

Description

Returns 0 if and only if the request will be processed by the target core.

For more information about SRP_TSK_MGMT information units, see also section 6.7 in the SRP r16a document.

bool srpt_handle_new_iu(struct srpt_rdma_ch *ch, struct srpt_recv_ioctx *recv_ioctx)

process a newly received information unit

Parameters

struct srpt_rdma_ch *ch

RDMA channel through which the information unit has been received.

struct srpt_recv_ioctx *recv_ioctx

Receive I/O context associated with the information unit.

void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc)

send completion callback

Parameters

struct ib_cq *cq

Completion queue.

struct ib_wc *wc

Work completion.

Note

Although this has not yet been observed during tests, at least in theory it is possible that the srpt_get_send_ioctx() call invoked by srpt_handle_new_iu() fails. This is possible because the req_lim_delta value in each response is set to one, and it is possible that this response makes the initiator send a new request before the send completion for that response has been processed. This could e.g. happen if the call to srpt_put_send_ioctx() is delayed because of a higher priority interrupt or if IB retransmission causes generation of the send completion to be delayed. Incoming information units for which srpt_get_send_ioctx() fails are queued on cmd_wait_list. The code below processes these delayed requests one at a time.

int srpt_create_ch_ib(struct srpt_rdma_ch *ch)

create receive and send completion queues

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

bool srpt_close_ch(struct srpt_rdma_ch *ch)

close a RDMA channel

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

Description

Make sure all resources associated with the channel will be deallocated at an appropriate time.

Returns true if and only if the channel state has been modified into CH_DRAINING.

int srpt_cm_req_recv(struct srpt_device *const sdev, struct ib_cm_id *ib_cm_id, struct rdma_cm_id *rdma_cm_id, u8 port_num, __be16 pkey, const struct srp_login_req *req, const char *src_addr)

process the event IB_CM_REQ_RECEIVED

Parameters

struct srpt_device *const sdev

HCA through which the login request was received.

struct ib_cm_id *ib_cm_id

IB/CM connection identifier in case of IB/CM.

struct rdma_cm_id *rdma_cm_id

RDMA/CM connection identifier in case of RDMA/CM.

u8 port_num

Port through which the REQ message was received.

__be16 pkey

P_Key of the incoming connection.

const struct srp_login_req *req

SRP login request.

const char *src_addr

GID (IB/CM) or IP address (RDMA/CM) of the port that submitted the login request.

Description

Ownership of the cm_id is transferred to the target session if this function returns zero. Otherwise the caller remains the owner of cm_id.

void srpt_cm_rtu_recv(struct srpt_rdma_ch *ch)

process an IB_CM_RTU_RECEIVED or USER_ESTABLISHED event

Parameters

struct srpt_rdma_ch *ch

SRPT RDMA channel.

Description

An RTU (ready to use) message indicates that the connection has been established and that the recipient may begin transmitting.

int srpt_cm_handler(struct ib_cm_id *cm_id, const struct ib_cm_event *event)

IB connection manager callback function

Parameters

struct ib_cm_id *cm_id

IB/CM connection identifier.

const struct ib_cm_event *event

IB/CM event.

Description

A non-zero return value will cause the caller to destroy the CM ID.

Note

srpt_cm_handler() must only return a non-zero value when transferring ownership of the cm_id to a channel by srpt_cm_req_recv() failed. Returning a non-zero value in any other case will trigger a race with the ib_destroy_cm_id() call in srpt_release_channel().
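The return value rule in the note can be sketched as a dispatcher in which only the login request path is allowed to propagate a failure. This is a simplified userspace illustration: the model_* event names and the function-pointer stand-in for the login path are hypothetical and do not reflect the real IB/CM event set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of connection manager events. */
enum model_cm_event {
	MODEL_CM_REQ_RECEIVED,
	MODEL_CM_RTU_RECEIVED,
	MODEL_CM_DREQ_RECEIVED,
};

/* Stand-in for the login path: 0 means the channel took ownership of
 * the CM ID, non-zero means ownership stayed with the caller. */
typedef int (*model_req_recv_fn)(void);

static int model_req_ok(void)
{
	return 0;
}

static int model_req_fail(void)
{
	return -1;
}

static int model_cm_handler(enum model_cm_event event,
			    model_req_recv_fn req_recv)
{
	int ret = 0;

	switch (event) {
	case MODEL_CM_REQ_RECEIVED:
		/* Only here may a failure reach the caller, because the
		 * CM ID has not been handed over to a channel. */
		assert(req_recv != NULL);
		ret = req_recv();
		break;
	case MODEL_CM_RTU_RECEIVED:
	case MODEL_CM_DREQ_RECEIVED:
		/* The channel owns the CM ID: always report success so
		 * that the caller does not destroy it a second time. */
		break;
	}
	return ret;
}
```

Every event other than the login request returns zero regardless of what happened while handling it, which is the property that avoids the double-destroy race described above.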

void srpt_queue_response(struct se_cmd *cmd)

transmit the response to a SCSI command

Parameters

struct se_cmd *cmd

SCSI target command.

Description

Callback function called by the TCM core. Must not block since it can be invoked in the context of the IB completion handler.

int srpt_release_sport(struct srpt_port *sport)

disable login and wait for associated channels

Parameters

struct srpt_port *sport

SRPT HCA port.

struct port_and_port_id srpt_lookup_port(const char *name)

Look up an RDMA port by name

Parameters

const char *name

ASCII port name

Description

Increments the RDMA port reference count if an RDMA port pointer is returned. The caller must drop that reference count by calling srpt_port_put_ref().
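The lookup-then-put discipline can be shown with a minimal userspace model. Everything named model_* below is hypothetical; in particular the sketch returns a plain pointer instead of a struct port_and_port_id, and it uses a fixed table with no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical port object with a plain integer reference count. */
struct model_port {
	const char *name;
	int refcount;
};

/* Fixed table standing in for the list of known ports. */
static struct model_port model_ports[] = {
	{ "port_a", 1 },
	{ "port_b", 1 },
};

/* Returns the matching port with its reference count raised,
 * or NULL (and no reference taken) if the name is unknown. */
static struct model_port *model_lookup_port(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(model_ports) / sizeof(model_ports[0]); i++) {
		if (strcmp(model_ports[i].name, name) == 0) {
			model_ports[i].refcount++;
			return &model_ports[i];
		}
	}
	return NULL;
}

/* Drops the reference obtained from model_lookup_port(). */
static void model_port_put_ref(struct model_port *port)
{
	assert(port != NULL && port->refcount > 0);
	port->refcount--;
}
```

A caller pairs every successful lookup with exactly one put. A failed lookup takes no reference, so there is nothing to drop in that case.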

int srpt_add_one(struct ib_device *device)

InfiniBand device addition callback function

Parameters

struct ib_device *device

Describes an HCA.

void srpt_remove_one(struct ib_device *device, void *client_data)

InfiniBand device removal callback function

Parameters

struct ib_device *device

Describes an HCA.

void *client_data

The value passed as the third argument to ib_set_client_data().

void srpt_close_session(struct se_session *se_sess)

forcibly close a session

Parameters

struct se_session *se_sess

SCSI target session.

Description

Callback function invoked by the TCM core to clean up sessions associated with a node ACL when the user invokes rmdir /sys/kernel/config/target/$driver/$port/$tpg/acls/$i_port_id

int srpt_parse_i_port_id(u8 i_port_id[16], const char *name)

parse an initiator port ID

Parameters

u8 i_port_id[16]

Binary 128-bit port ID.

const char *name


ASCII representation of a 128-bit initiator port ID.
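Converting an ASCII port ID into its 16-byte binary form could look roughly like the sketch below. It is an illustration only: the model_* names are hypothetical, and the accepted syntax (optional 0x prefix, an even number of hexadecimal digits, short inputs left-padded with zero bytes) is an assumption of this example rather than a statement about what the driver accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the value of one hexadecimal digit, or -1 if c is not one. */
static int model_hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parses up to 32 hex digits into a big-endian 128-bit ID.
 * Returns 0 on success and -1 on malformed input. */
static int model_parse_i_port_id(uint8_t id[16], const char *name)
{
	size_t len, nbytes, pad, i;

	assert(id != NULL && name != NULL);

	/* Assumption: an optional "0x" or "0X" prefix is allowed. */
	if (name[0] == '0' && (name[1] == 'x' || name[1] == 'X'))
		name += 2;

	len = strlen(name);
	if (len == 0 || len % 2 != 0 || len > 32)
		return -1;

	nbytes = len / 2;
	pad = 16 - nbytes;      /* leading bytes that stay zero */
	memset(id, 0, pad);

	for (i = 0; i < nbytes; i++) {
		int hi = model_hex_digit(name[2 * i]);
		int lo = model_hex_digit(name[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		id[pad + i] = (uint8_t)((hi << 4) | lo);
	}
	return 0;
}
```

On failure the contents of the output buffer are unspecified in this sketch, so a caller should only use the ID after a zero return.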

struct se_portal_group *srpt_make_tpg(struct se_wwn *wwn, const char *name)

configfs callback invoked for mkdir /sys/kernel/config/target/$driver/$port/$tpg

Parameters

struct se_wwn *wwn

Corresponds to $driver/$port.

const char *name

$tpg.

void srpt_drop_tpg(struct se_portal_group *tpg)

configfs callback invoked for rmdir /sys/kernel/config/target/$driver/$port/$tpg

Parameters

struct se_portal_group *tpg

Target portal group to deregister.

struct se_wwn *srpt_make_tport(struct target_fabric_configfs *tf, struct config_group *group, const char *name)

configfs callback invoked for mkdir /sys/kernel/config/target/$driver/$port

Parameters

struct target_fabric_configfs *tf

Not used.

struct config_group *group

Not used.

const char *name

$port.

void srpt_drop_tport(struct se_wwn *wwn)

configfs callback invoked for rmdir /sys/kernel/config/target/$driver/$port

Parameters

struct se_wwn *wwn

$port.

int srpt_init_module(void)

kernel module initialization

Parameters

void

no arguments

Note

Since ib_register_client() registers callback functions, and since at least one of these callback functions (srpt_add_one()) calls target core functions, this driver must be registered with the target core before ib_register_client() is called.

iSCSI Extensions for RDMA (iSER) target support

void isert_conn_terminate(struct isert_conn *isert_conn)

Initiate connection termination

Parameters

struct isert_conn *isert_conn

isert connection struct

Notes

If the connection state is BOUND, move the state to TERMINATING and start the teardown sequence (rdma_disconnect). If the connection state is UP, complete the flush as well.

This routine must be called with the mutex held. Thus it is safe to call multiple times.
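Why repeated calls are harmless can be seen in a small model where the state check and the state change happen together. This is a sketch with hypothetical model_* names and an assumed ordering of states; the real mutex is replaced by a comment and the disconnect by a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection states, ordered from earliest to latest. */
enum model_conn_state {
	MODEL_CONN_INIT,
	MODEL_CONN_BOUND,
	MODEL_CONN_UP,
	MODEL_CONN_TERMINATING,
	MODEL_CONN_DOWN,
};

struct model_conn {
	enum model_conn_state state;
	int disconnects;   /* how often teardown was started */
};

/* Caller is assumed to hold the connection mutex, so the check and
 * the update below cannot interleave with another caller. */
static void model_conn_terminate(struct model_conn *conn)
{
	assert(conn != NULL);

	/* Already terminating or down: nothing left to do. */
	if (conn->state >= MODEL_CONN_TERMINATING)
		return;

	conn->state = MODEL_CONN_TERMINATING;
	conn->disconnects++;   /* stands in for starting the teardown */
}
```

Because the early return is taken on every call after the first, teardown is started exactly once no matter how many code paths request termination.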

void isert_put_unsol_pending_cmds(struct iscsit_conn *conn)

Drop commands waiting for unsolicited dataout

Parameters

struct iscsit_conn *conn

iscsi connection

Description

We might still have commands that are waiting for unsolicited dataout messages. We must put the extra reference on those before blocking on target_wait_for_session_cmds().