NTB Drivers
NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects the separate memory systems of two or more computers to the same PCI-Express fabric. Existing NTB hardware supports a common feature set: doorbell registers and memory translation windows, as well as non-common features like scratchpad and message registers. Scratchpad registers are readable and writable registers that are accessible from either side of the device, so that peers can exchange a small amount of information at a fixed address. Message registers can be used for the same purpose. Additionally, they are provided with special status bits to make sure the information isn’t overwritten by another peer. Doorbell registers provide a way for peers to send interrupt events. Memory windows allow translated read and write access to the peer memory.
NTB Core Driver (ntb)
The NTB core driver defines an API wrapping the common feature set, and allows clients interested in NTB features to discover the NTB devices supported by hardware drivers. The term “client” is used here to mean an upper layer component making use of the NTB API. The term “driver,” or “hardware driver,” is used here to mean a driver for a specific vendor and model of NTB hardware.
NTB Client Drivers
NTB client drivers should register with the NTB core driver. After registering, the client probe and remove functions will be called appropriately as NTB hardware, or hardware drivers, are inserted and removed. The registration uses the Linux Device framework, so it should feel familiar to anyone who has written a PCI driver.
NTB Typical client driver implementation
The primary purpose of NTB is to share a piece of memory between at least two systems. NTB device features like scratchpad and message registers are therefore mainly used to perform the proper memory window initialization. Typically there are two types of memory window interfaces supported by the NTB API: inbound translation configured on the local NTB port, and outbound translation configured by the peer on the peer NTB port. The first type is depicted in the following figure:
Inbound translation:

 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|
A typical scenario for initializing a memory window of the first type is: 1) allocate a memory region, 2) put the translated address into the NTB config, 3) somehow notify the peer device that initialization has been performed, 4) the peer device maps the corresponding outbound memory window to gain access to the shared memory region.
The second type of interface, in which the shared windows are initialized by a peer device, is depicted in the following figure:
Outbound translation:

 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|
A typical scenario for initializing the second type of interface is: 1) allocate a memory region, 2) somehow deliver the translated address to the peer device, 3) the peer puts the translated address into the NTB config, 4) the peer device maps the outbound memory window to gain access to the shared memory region.
As one can see, the described scenarios can be combined into one portable algorithm.
- Local device:
  - Allocate memory for a shared window
  - Initialize the memory window with the translated address of the allocated region (it may fail if local memory window initialization is unsupported)
  - Send the translated address and memory window index to the peer device
- Peer device:
  - Initialize the memory window with the retrieved address of the memory region allocated by the other device (it may fail if peer memory window initialization is unsupported)
  - Map the outbound memory window
In accordance with this scenario, the NTB Memory Window API can be used as follows:
- Local device:
  - ntb_mw_count(pidx) - retrieve the number of memory ranges that can be allocated for memory windows between the local device and the peer device of the port with the specified index.
  - ntb_mw_get_align(pidx, midx) - retrieve the parameters restricting the shared memory region alignment and size, so that the memory can be properly allocated.
  - Allocate a physically contiguous memory region in compliance with the restrictions retrieved in the previous step.
  - ntb_mw_set_trans(pidx, midx) - try to set the translation address of the memory window with the specified index for the defined peer device (it may fail if setting the local translated address is not supported).
  - Send the translated base address (usually together with the memory window number) to the peer device using, for instance, scratchpad or message registers.
- Peer device:
  - ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address received from the other device (related to pidx) for the specified memory window. It may fail if the retrieved address, for instance, exceeds the maximum possible address or isn’t properly aligned.
  - ntb_peer_mw_get_addr(widx) - retrieve the MMIO address used to map the memory window in order to access the shared memory.
It is also worth noting that ntb_mw_count(pidx) should return the same value as ntb_peer_mw_count() on the peer with port index pidx.
NTB Transport Client (ntb_transport) and NTB Netdev (ntb_netdev)
The primary client for NTB is the Transport client, used in tandem with NTB Netdev. These drivers function together to create a logical link to the peer, across the NTB, to exchange packets of network data. The Transport client establishes a logical link to the peer, and creates queue pairs to exchange messages and data. The NTB Netdev then creates an ethernet device using a Transport queue pair. Network data is copied between socket buffers and the Transport queue pair buffer. The Transport client may be used for other things besides Netdev; however, no other applications have yet been written.
NTB Ping Pong Test Client (ntb_pingpong)
The Ping Pong test client serves as a demonstration to exercise the doorbell and scratchpad registers of NTB hardware, and as an example of a simple NTB client. Ping Pong enables the link when started, waits for the NTB link to come up, and then proceeds to read and write the doorbell and scratchpad registers of the NTB. The peers interrupt each other using a bit mask of doorbell bits, which is shifted by one in each round, to test the behavior of multiple doorbell bits and interrupt vectors. In each round, before writing the peer doorbell register, the Ping Pong driver also reads the first local scratchpad and writes the value plus one to the first peer scratchpad.
Module Parameters:
- unsafe - Some hardware has known issues with scratchpad and doorbell registers. By default, Ping Pong will not attempt to exercise such hardware. You may override this behavior at your own risk by setting unsafe=1.
- delay_ms - Specify the delay between receiving a doorbell interrupt event and setting the peer doorbell register for the next round.
- init_db - Specify the doorbell bits to start a new series of rounds. A new series begins once all the doorbell bits have been shifted out of range.
- dyndbg - It is suggested to specify dyndbg=+p when loading this module, and then to observe debugging output on the console.
NTB Tool Test Client (ntb_tool)
The Tool test client serves primarily for debugging NTB hardware and drivers. The Tool provides access through debugfs for reading, setting, and clearing the NTB doorbell, and reading and writing scratchpads.
The Tool does not currently have any module parameters.
Debugfs Files:
- debugfs/ntb_tool/hw/
- A directory in debugfs will be created for each NTB device probed by the tool. This directory is shortened to hw below.
- hw/db
- This file is used to read, set, and clear the local doorbell. Not all operations may be supported by all hardware. To read the doorbell, read the file. To set the doorbell, write s followed by the bits to set (e.g. echo 's 0x0101' > db). To clear the doorbell, write c followed by the bits to clear.
- hw/mask
- This file is used to read, set, and clear the local doorbell mask. See db for details.
- hw/peer_db
- This file is used to read, set, and clear the peer doorbell. See db for details.
- hw/peer_mask
- This file is used to read, set, and clear the peer doorbell mask. See db for details.
- hw/spad
- This file is used to read and write local scratchpads. To read the values of all scratchpads, read the file. To write values, write a series of pairs of scratchpad number and value (e.g. echo '4 0x123 7 0xabc' > spad sets scratchpads 4 and 7 to 0x123 and 0xabc, respectively).
- hw/peer_spad
- This file is used to read and write peer scratchpads. See spad for details.
NTB MSI Test Client (ntb_msi_test)
The MSI test client serves to test and debug the MSI library, which allows for passing MSI interrupts across NTB memory windows. The test client is interacted with through the debugfs filesystem:
- debugfs/ntb_msi_test/hw/
- A directory in debugfs will be created for each NTB device probed by the tool. This directory is shortened to hw below.
- hw/port
- This file describes the local port number
- hw/irq*_occurrences
- One occurrences file exists for each interrupt and, when read, returns the number of times the interrupt has been triggered.
- hw/peer*/port
- This file describes the port number for each peer
- hw/peer*/count
- This file describes the number of interrupts that can be triggered on each peer.
- hw/peer*/trigger
- Writing an interrupt number (any number less than the value specified in count) will trigger the interrupt on the specified peer. The occurrences file for that interrupt on that peer should be incremented.
NTB Hardware Drivers
NTB hardware drivers should register devices with the NTB core driver. After registering, the clients’ probe and remove functions will be called.
NTB Intel Hardware Driver (ntb_hw_intel)
The Intel hardware driver supports NTB on Xeon and Atom CPUs.
Module Parameters:
- b2b_mw_idx
- If the peer NTB is to be accessed via a memory window, then use this memory window to access the peer NTB. A value of zero or positive starts from the first mw idx, and a negative value starts from the last mw idx. Both sides MUST set the same value here! The default value is -1.
- b2b_mw_share
- If the peer NTB is to be accessed via a memory window, and if the memory window is large enough, still allow the client to use the second half of the memory window for address translation to the peer.
- xeon_b2b_usd_bar2_addr64
- If using B2B topology on Xeon hardware, use this 64-bit address on the bus between the NTB devices for the window at BAR2, on the upstream side of the link.
- xeon_b2b_usd_bar4_addr64 - See xeon_b2b_usd_bar2_addr64.
- xeon_b2b_usd_bar4_addr32 - See xeon_b2b_usd_bar2_addr64.
- xeon_b2b_usd_bar5_addr32 - See xeon_b2b_usd_bar2_addr64.
- xeon_b2b_dsd_bar2_addr64 - See xeon_b2b_usd_bar2_addr64.
- xeon_b2b_dsd_bar4_addr64 - See xeon_b2b_usd_bar2_addr64.
- xeon_b2b_dsd_bar4_addr32 - See xeon_b2b_usd_bar2_addr64.
- xeon_b2b_dsd_bar5_addr32 - See xeon_b2b_usd_bar2_addr64.