Buffer Sharing and Synchronization (dma-buf)

The dma-buf subsystem provides the framework for sharing buffers for hardware (DMA) access across multiple device drivers and subsystems, and for synchronizing asynchronous hardware access.

As an example, it is used extensively by the DRM subsystem to exchange buffers between processes, contexts, library APIs within the same process, and also to exchange buffers with other subsystems such as V4L2.

This document describes the way in which kernel subsystems can use and interact with the three main primitives offered by dma-buf:

  • dma-buf, representing a sg_table and exposed to userspace as a file descriptor to allow passing between processes, subsystems, devices, etc;

  • dma-fence, providing a mechanism to signal when an asynchronous hardware operation has completed; and

  • dma-resv, which manages a set of dma-fences for a particular dma-buf, allowing implicit (kernel-ordered) synchronization of work to preserve the illusion of coherent access.

Userspace API principles and use

For more details on how to design your subsystem’s API for dma-buf use, please see Exchanging pixel buffers.

Shared DMA Buffers

This document serves as a guide to device-driver writers on what the dma-buf buffer sharing API is and how to use it for exporting and using shared buffers.

Any device driver which wishes to be a part of DMA buffer sharing can do so as either the ‘exporter’ of buffers, or the ‘user’ or ‘importer’ of buffers.

Say a driver A wants to use buffers created by driver B; then we call B the exporter, and A the buffer-user/importer.

The exporter

  • implements and manages operations in struct dma_buf_ops for the buffer,

  • allows other users to share the buffer by using dma_buf sharing APIs,

  • manages the details of buffer allocation, wrapped in a struct dma_buf,

  • decides about the actual backing storage where this allocation happens,

  • and takes care of any migration of scatterlist - for all (shared) users of this buffer.

The buffer-user

  • is one of (many) sharing users of the buffer.

  • doesn’t need to worry about how the buffer is allocated, or where.

  • and needs a mechanism to get access to the scatterlist that makes up this buffer in memory, mapped into its own address space, so it can access the same area of memory. This interface is provided by struct dma_buf_attachment.

Any exporters or users of the dma-buf buffer sharing framework must have a ‘select DMA_SHARED_BUFFER’ in their respective Kconfigs.

Userspace Interface Notes

Mostly a DMA buffer file descriptor is simply an opaque object for userspace, and hence the generic interface exposed is very minimal. There are a few things to consider though:

  • Since kernel 3.12 the dma-buf FD supports the llseek system call, but only with offset=0 and whence=SEEK_END|SEEK_SET. SEEK_SET is supported to allow the usual size discovery pattern size = SEEK_END(0); SEEK_SET(0). Every other llseek operation will report -EINVAL.

    If llseek on dma-buf FDs isn’t supported the kernel will report -ESPIPE for all cases. Userspace can use this to detect support for discovering the dma-buf size using llseek.

  • In order to avoid fd leaks on exec, the FD_CLOEXEC flag must be set on the file descriptor. This is not just a resource leak, but a potential security hole. It could give the newly exec’d application access to buffers, via the leaked fd, to which it should otherwise not be permitted access.

    The problem with doing this via a separate fcntl() call, versus doing it atomically when the fd is created, is that this is inherently racy in a multi-threaded app[3]. The issue is made worse when it is library code opening/creating the file descriptor, as the application may not even be aware of the fds.

    To avoid this problem, userspace must have a way to request that the O_CLOEXEC flag be set when the dma-buf fd is created. So any API provided by the exporting driver to create a dmabuf fd must provide a way to let userspace control setting of the O_CLOEXEC flag passed in to dma_buf_fd().

  • Memory mapping the contents of the DMA buffer is also supported. See the discussion below on CPU Access to DMA Buffer Objects for the full details.

  • The DMA buffer FD is also pollable, see Implicit Fence Poll Support below for details.

  • The DMA buffer FD also supports a few dma-buf-specific ioctls, see DMA Buffer ioctls below for details.

Basic Operation and Device DMA Access

For device DMA access to a shared DMA buffer the usual sequence of operations is fairly simple:

  1. The exporter defines its exporter instance using DEFINE_DMA_BUF_EXPORT_INFO() and calls dma_buf_export() to wrap a private buffer object into a dma_buf. It then exports that dma_buf to userspace as a file descriptor by calling dma_buf_fd().

  2. Userspace passes this file descriptor to all drivers it wants this buffer to share with: First the file descriptor is converted to a dma_buf using dma_buf_get(). Then the buffer is attached to the device using dma_buf_attach().

    Up to this stage the exporter is still free to migrate or reallocate the backing storage.

  3. Once the buffer is attached to all devices userspace can initiate DMA access to the shared buffer. In the kernel this is done by calling dma_buf_map_attachment() and dma_buf_unmap_attachment().

  4. Once a driver is done with a shared buffer it needs to call dma_buf_detach() (after cleaning up any mappings) and then release the reference acquired with dma_buf_get() by calling dma_buf_put().

For the detailed semantics exporters are expected to implement see dma_buf_ops.
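A minimal importer-side sketch of steps 2-4 above, assuming a static (non-dynamic) importer; the function name, the device argument and the use of DMA_BIDIRECTIONAL are illustrative assumptions and error handling is abbreviated:

/* Hypothetical importer: take a dma-buf fd from userspace and DMA to/from it. */
#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>

static int my_driver_use_dmabuf(struct device *dev, int fd)
{
    struct dma_buf *dmabuf;
    struct dma_buf_attachment *attach;
    struct sg_table *sgt;
    int ret = 0;

    dmabuf = dma_buf_get(fd);                   /* fd -> struct dma_buf reference */
    if (IS_ERR(dmabuf))
        return PTR_ERR(dmabuf);

    attach = dma_buf_attach(dmabuf, dev);       /* static attachment for this device */
    if (IS_ERR(attach)) {
        ret = PTR_ERR(attach);
        goto put;
    }

    sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
    if (IS_ERR(sgt)) {
        ret = PTR_ERR(sgt);
        goto detach;
    }

    /* ... program the device with the addresses and lengths in sgt ... */

    dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);
detach:
    dma_buf_detach(dmabuf, attach);
put:
    dma_buf_put(dmabuf);                        /* drop the dma_buf_get() reference */
    return ret;
}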

CPU Access to DMA Buffer Objects

There are multiple reasons for supporting CPU access to a dma buffer object:

  • Fallback operations in the kernel, for example when a device is connected over USB and the kernel needs to shuffle the data around first before sending it away. Cache coherency is handled by bracketing any transactions with calls to dma_buf_begin_cpu_access() and dma_buf_end_cpu_access().

    Since most kernel-internal dma-buf accesses need the entire buffer, a vmap interface is introduced. Note that on very old 32-bit architectures vmalloc space might be limited and result in vmap calls failing.

    Interfaces:

    int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
    void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)

    The vmap call can fail if there is no vmap support in the exporter, or if it runs out of vmalloc space. Note that the dma-buf layer keeps a reference count for all vmap access and calls down into the exporter’s vmap function only when no vmapping exists, and only unmaps it once. Protection against concurrent vmap/vunmap calls is provided by taking the dma_buf.lock mutex.

  • For full compatibility on the importer side with existing userspace interfaces, which might already support mmap’ing buffers. This is needed in many processing pipelines (e.g. feeding a software-rendered image into a hardware pipeline, thumbnail creation, snapshots, ...). Also, Android’s ION framework already supported this, and for DMA buffer file descriptors to replace ION buffers mmap support was needed.

    There are no special interfaces, userspace simply calls mmap on the dma-buf fd. But like for CPU access there’s a need to bracket the actual access, which is handled by the DMA_BUF_IOCTL_SYNC ioctl. Note that DMA_BUF_IOCTL_SYNC can fail with -EAGAIN or -EINTR, in which case it must be restarted.

    Some systems might need some sort of cache coherency management, e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers that forward directly to the existing dma-buf device drivers’ vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used as follows:

    • mmap dma-buf fd

    • for each drawing/upload cycle in the CPU: 1. SYNC_START ioctl, 2. read/write to the mmap area, 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by, say, the GPU or the scanout device)

    • munmap once you don’t need the buffer any more

    For correctness and optimal performance, it is always required to use SYNC_START and SYNC_END before and after, respectively, when accessing the mapped address. Userspace cannot rely on coherent access, even when there are systems where it just works without calling these ioctls. A minimal userspace sketch of this bracketing is shown at the end of this section.

  • And as a CPU fallback in userspace processing pipelines.

    Similar to the motivation for kernel cpu access it is again important that the userspace code of a given importing subsystem can use the same interfaces with an imported dma-buf buffer object as with a native buffer object. This is especially important for drm where the userspace part of contemporary OpenGL, X, and other drivers is huge, and reworking them to use a different way to mmap a buffer would be rather invasive.

    The assumption in the current dma-buf interfaces is that redirecting the initial mmap is all that’s needed. A survey of some of the existing subsystems shows that no driver seems to do any nefarious thing like syncing up with outstanding asynchronous processing on the device or allocating special resources at fault time. So hopefully this is good enough, since adding interfaces to intercept pagefaults and allow pte shootdowns would increase the complexity quite a bit.

    Interface:

    int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long);

    If the importing subsystem simply provides a special-purpose mmap call to set up a mapping in userspace, calling do_mmap with dma_buf.file will equally achieve that for a dma-buf object.
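A minimal userspace sketch of the mmap and DMA_BUF_IOCTL_SYNC bracketing described above; the helper name, the buffer size and where the dma-buf fd comes from are assumptions:

/* Write CPU data into a dma-buf, bracketed by SYNC_START/SYNC_END. */
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

static int fill_dmabuf(int dmabuf_fd, const void *src, size_t size)
{
    struct dma_buf_sync sync = { 0 };
    void *map;
    int ret;

    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dmabuf_fd, 0);
    if (map == MAP_FAILED)
        return -errno;

    /* SYNC_START: tell the kernel CPU writes are about to happen. */
    sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret && (errno == EAGAIN || errno == EINTR));

    memcpy(map, src, size);            /* the actual CPU access */

    /* SYNC_END: finish the CPU write access. */
    sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret && (errno == EAGAIN || errno == EINTR));

    munmap(map, size);
    return 0;
}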

Implicit Fence Poll Support

To support cross-device and cross-driver synchronization of buffer access, implicit fences (represented internally in the kernel with struct dma_fence) can be attached to a dma_buf. The glue for that and a few related things are provided in the dma_resv structure.

Userspace can query the state of these implicitly tracked fences using poll() and related system calls:

  • Checking for EPOLLIN, i.e. read access, can be used to query the state of the most recent write or exclusive fence.

  • Checking for EPOLLOUT, i.e. write access, can be used to query the state of all attached fences, shared and exclusive ones.

Note that this only signals the completion of the respective fences, i.e. the DMA transfers are complete. Cache flushing and any other necessary preparations before CPU access can begin still need to happen.
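A hedged userspace sketch of such a poll()-based wait; the helper name and the timeout are assumptions:

/* Wait (with timeout) until the most recent write to a dma-buf has completed. */
#include <poll.h>

static int wait_for_dmabuf_writers(int dmabuf_fd, int timeout_ms)
{
    struct pollfd pfd = {
        .fd = dmabuf_fd,
        .events = POLLIN,   /* POLLIN: most recent write/exclusive fence signalled */
    };

    /* > 0: fence signalled, 0: timeout, < 0: error (check errno) */
    return poll(&pfd, 1, timeout_ms);
}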

As an alternative to poll(), the set of fences on a DMA buffer can be exported as a sync_file using the DMA_BUF_IOCTL_EXPORT_SYNC_FILE ioctl described below.

DMA Buffer ioctls

struct dma_buf_sync

Synchronize with CPU access.

Definition:

struct dma_buf_sync {
    __u64 flags;
};

Members

flags

Set of access flags

DMA_BUF_SYNC_START:

Indicates the start of a map access session.

DMA_BUF_SYNC_END:

Indicates the end of a map access session.

DMA_BUF_SYNC_READ:

Indicates that the mapped DMA buffer will be read by the client via the CPU map.

DMA_BUF_SYNC_WRITE:

Indicates that the mapped DMA buffer will be written by the client via the CPU map.

DMA_BUF_SYNC_RW:

An alias for DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE.

Description

When a DMA buffer is accessed from the CPU via mmap, it is not always possible to guarantee coherency between the CPU-visible map and underlying memory. To manage coherency, DMA_BUF_IOCTL_SYNC must be used to bracket any CPU access to give the kernel the chance to shuffle memory around if needed.

Prior to accessing the map, the client must call DMA_BUF_IOCTL_SYNC with DMA_BUF_SYNC_START and the appropriate read/write flags. Once the access is complete, the client should call DMA_BUF_IOCTL_SYNC with DMA_BUF_SYNC_END and the same read/write flags.

The synchronization provided via DMA_BUF_IOCTL_SYNC only provides cache coherency. It does not prevent other processes or devices from accessing the memory at the same time. If synchronization with a GPU or other device driver is required, it is the client’s responsibility to wait for the buffer to be ready for reading or writing before calling this ioctl with DMA_BUF_SYNC_START. Likewise, the client must ensure that follow-up work is not submitted to the GPU or other device driver until after this ioctl has been called with DMA_BUF_SYNC_END.

If the driver or API with which the client is interacting uses implicit synchronization, waiting for prior work to complete can be done via poll() on the DMA buffer file descriptor. If the driver or API requires explicit synchronization, the client may have to wait on a sync_file or other synchronization primitive outside the scope of the DMA buffer API.

struct dma_buf_export_sync_file

Get a sync_file from a dma-buf

Definition:

struct dma_buf_export_sync_file {
    __u32 flags;
    __s32 fd;
};

Members

flags

Read/write flags

Must be DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE, or both.

If DMA_BUF_SYNC_READ is set and DMA_BUF_SYNC_WRITE is not set, the returned sync file waits on any writers of the dma-buf to complete. Waiting on the returned sync file is equivalent to poll() with POLLIN.

If DMA_BUF_SYNC_WRITE is set, the returned sync file waits on any users of the dma-buf (read or write) to complete. Waiting on the returned sync file is equivalent to poll() with POLLOUT. If both DMA_BUF_SYNC_WRITE and DMA_BUF_SYNC_READ are set, this is equivalent to just DMA_BUF_SYNC_WRITE.

fd

Returned sync file descriptor

Description

Userspace can perform a DMA_BUF_IOCTL_EXPORT_SYNC_FILE to retrieve the current set of fences on a dma-buf file descriptor as a sync_file. CPU waits via poll() or other driver-specific mechanisms typically wait on whatever fences are on the dma-buf at the time the wait begins. This is similar except that it takes a snapshot of the current fences on the dma-buf for waiting later instead of waiting immediately. This is useful for modern graphics APIs such as Vulkan which assume an explicit synchronization model but still need to inter-operate with dma-buf.

The intended usage pattern is the following:

  1. Export a sync_file with flags corresponding to the expected GPU usage via DMA_BUF_IOCTL_EXPORT_SYNC_FILE.

  2. Submit rendering work which uses the dma-buf. The work should wait on the exported sync file before rendering and produce another sync_file when complete.

  3. Import the rendering-complete sync_file into the dma-buf with flags corresponding to the GPU usage via DMA_BUF_IOCTL_IMPORT_SYNC_FILE.

Unlike doing implicit synchronization via a GPU kernel driver’s exec ioctl, the above is not a single atomic operation. If userspace wants to ensure ordering via these fences, it is the responsibility of userspace to use locks or other mechanisms to ensure that no other context adds fences or submits work between steps 1 and 3 above.
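A hedged userspace sketch of steps 1-3; submit_rendering() stands in for a driver-specific, hypothetical submission path that waits on one sync_file fd and returns another:

/* Explicit-sync (e.g. Vulkan) interop with implicit-sync dma-buf consumers. */
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

int submit_rendering(int wait_fence_fd);   /* hypothetical driver-specific submit */

static int render_to_dmabuf(int dmabuf_fd)
{
    struct dma_buf_export_sync_file exp = { .flags = DMA_BUF_SYNC_WRITE };
    struct dma_buf_import_sync_file imp = { .flags = DMA_BUF_SYNC_WRITE };

    /* 1. Snapshot the fences the rendering has to wait for. */
    if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &exp))
        return -1;

    /* 2. Submit GPU work that waits on exp.fd and signals a new sync_file. */
    imp.fd = submit_rendering(exp.fd);
    if (imp.fd < 0)
        return -1;

    /* 3. Attach the rendering-complete fence back as an implicit write fence. */
    return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &imp);
}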

struct dma_buf_import_sync_file

Insert a sync_file into a dma-buf

Definition:

struct dma_buf_import_sync_file {
    __u32 flags;
    __s32 fd;
};

Members

flags

Read/write flags

Must be DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE, or both.

If DMA_BUF_SYNC_READ is set and DMA_BUF_SYNC_WRITE is not set, this inserts the sync_file as a read-only fence. Any subsequent implicitly synchronized writes to this dma-buf will wait on this fence but reads will not.

If DMA_BUF_SYNC_WRITE is set, this inserts the sync_file as a write fence. All subsequent implicitly synchronized access to this dma-buf will wait on this fence.

fd

Sync file descriptor

Description

Userspace can perform a DMA_BUF_IOCTL_IMPORT_SYNC_FILE to insert a sync_file into a dma-buf for the purposes of implicit synchronization with other dma-buf consumers. This allows clients using explicitly synchronized APIs such as Vulkan to inter-op with dma-buf consumers which expect implicit synchronization such as OpenGL or most media drivers/video.

DMA-BUF locking convention

In order to avoid deadlock situations between dma-buf exporters and importers, all dma-buf API users must follow the common dma-buf locking convention.

Convention for importers

  1. Importers must hold the dma-buf reservation lock when calling these functions:

  2. Importers must not hold the dma-buf reservation lock when calling these functions:

Convention for exporters

  1. These dma_buf_ops callbacks are invoked with unlocked dma-buf reservation and the exporter can take the lock:

  2. These dma_buf_ops callbacks are invoked with locked dma-buf reservation and the exporter can’t take the lock:

  3. Exporters must hold the dma-buf reservation lock when calling these functions:

Kernel Functions and Structures Reference

struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)

Creates a new dma_buf, and associates an anon file with this buffer, so it can be exported. Also connect the allocator-specific data and ops to the buffer. Additionally, provide a name string for exporter; useful in debugging.

Parameters

const struct dma_buf_export_info *exp_info

[in] holds all the export related information provided by the exporter. See struct dma_buf_export_info for further details.

Description

Returns, on success, a newly created struct dma_buf object, which wraps the supplied private data and operations for struct dma_buf_ops. On either missing ops, or error in allocating struct dma_buf, will return a negative error.

For most cases the easiest way to create exp_info is through the DEFINE_DMA_BUF_EXPORT_INFO macro.
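A minimal exporter-side sketch; struct my_buffer and my_dmabuf_ops are driver-specific assumptions defined elsewhere, and error handling is abbreviated:

/* Hypothetical exporter: wrap a private buffer object and hand out an fd. */
#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/fs.h>

struct my_buffer {
    size_t size;
    /* driver-specific backing storage */
};

extern const struct dma_buf_ops my_dmabuf_ops;  /* assumed: populated by the driver */

static int my_export_to_fd(struct my_buffer *buf, int fd_flags)
{
    DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
    struct dma_buf *dmabuf;
    int fd;

    exp_info.ops = &my_dmabuf_ops;
    exp_info.size = buf->size;
    exp_info.flags = O_RDWR;            /* file mode flags */
    exp_info.priv = buf;

    dmabuf = dma_buf_export(&exp_info);
    if (IS_ERR(dmabuf))
        return PTR_ERR(dmabuf);

    fd = dma_buf_fd(dmabuf, fd_flags);  /* fd_flags may include O_CLOEXEC */
    if (fd < 0)
        dma_buf_put(dmabuf);            /* drop the export reference on failure */
    return fd;
}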

int dma_buf_fd(struct dma_buf *dmabuf, int flags)

returns a file descriptor for the given struct dma_buf

Parameters

struct dma_buf *dmabuf

[in] pointer to dma_buf for which fd is required.

int flags

[in] flags to give to fd

Description

On success, returns an associated ‘fd’. Else, returns error.

struct dma_buf *dma_buf_get(int fd)

returns the struct dma_buf related to an fd

Parameters

int fd

[in] fd associated with the struct dma_buf to be returned

Description

On success, returns the struct dma_buf associated with an fd; uses file’s refcounting done by fget to increase the refcount. Returns ERR_PTR otherwise.

void dma_buf_put(struct dma_buf *dmabuf)

decreases refcount of the buffer

Parameters

struct dma_buf *dmabuf

[in] buffer to reduce refcount of

Description

Uses file’s refcounting done implicitly by fput().

If, as a result of this call, the refcount becomes 0, the ‘release’ file operation related to this fd is called. It calls the dma_buf_ops.release vfunc in turn, and frees the memory allocated for dmabuf when exported.

struct dma_buf_attachment *dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev, const struct dma_buf_attach_ops *importer_ops, void *importer_priv)

Add the device to dma_buf’s attachments list

Parameters

struct dma_buf *dmabuf

[in] buffer to attach device to.

struct device *dev

[in] device to be attached.

const struct dma_buf_attach_ops *importer_ops

[in] importer operations for the attachment

void *importer_priv

[in] importer private pointer for the attachment

Description

Returns a struct dma_buf_attachment pointer for this attachment. Attachments must be cleaned up by calling dma_buf_detach().

Optionally this calls dma_buf_ops.attach to allow device-specific attach functionality.

A pointer to the newly created dma_buf_attachment on success, or a negative error code wrapped into a pointer on failure.

Note that this can fail if the backing storage of dmabuf is in a place not accessible to dev, and cannot be moved to a more suitable place. This is indicated with the error code -EBUSY.

struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, struct device *dev)

Wrapper for dma_buf_dynamic_attach

Parameters

struct dma_buf *dmabuf

[in] buffer to attach device to.

struct device *dev

[in] device to be attached.

Description

Wrapper to call dma_buf_dynamic_attach() for drivers which still use a static mapping.

void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)

Remove the given attachment from dmabuf’s attachments list

Parameters

struct dma_buf *dmabuf

[in] buffer to detach from.

struct dma_buf_attachment *attach

[in] attachment to be detached; is free’d after this call.

Description

Clean up a device attachment obtained by calling dma_buf_attach().

Optionally this calls dma_buf_ops.detach for device-specific detach.

int dma_buf_pin(struct dma_buf_attachment *attach)

Lock down the DMA-buf

Parameters

struct dma_buf_attachment *attach

[in] attachment which should be pinned

Description

Only dynamic importers (who set up attach with dma_buf_dynamic_attach()) may call this, and only for limited use cases like scanout and not for temporary pin operations. It is not permitted to allow userspace to pin arbitrary amounts of buffers through this interface.

Buffers must be unpinned by calling dma_buf_unpin().

Return

0 on success, negative error code on failure.

void dma_buf_unpin(struct dma_buf_attachment *attach)

Unpin a DMA-buf

Parameters

struct dma_buf_attachment *attach

[in] attachment which should be unpinned

Description

This unpins a buffer pinned by dma_buf_pin() and allows the exporter to move any mapping of attach again and inform the importer through dma_buf_attach_ops.invalidate_mappings.

struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach, enum dma_data_direction direction)

Returns the scatterlist table of the attachment; mapped into _device_ address space. Is a wrapper for map_dma_buf() of the dma_buf_ops.

Parameters

struct dma_buf_attachment *attach

[in] attachment whose scatterlist is to be returned

enum dma_data_direction direction

[in] direction of DMA transfer

Description

Returns sg_table containing the scatterlist to be returned; returns ERR_PTR on error. May return -EINTR if it is interrupted by a signal.

On success, the DMA addresses and lengths in the returned scatterlist are PAGE_SIZE aligned.

A mapping must be unmapped by using dma_buf_unmap_attachment(). Note that the underlying backing storage is pinned for as long as a mapping exists, therefore users/importers should not hold onto a mapping for undue amounts of time.

Important: Dynamic importers must wait for the exclusive fence of the struct dma_resv attached to the DMA-BUF first.

struct sg_table *dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach, enum dma_data_direction direction)

Returns the scatterlist table of the attachment; mapped into _device_ address space. Is a wrapper for map_dma_buf() of the dma_buf_ops.

Parameters

struct dma_buf_attachment *attach

[in] attachment whose scatterlist is to be returned

enum dma_data_direction direction

[in] direction of DMA transfer

Description

Unlocked variant of dma_buf_map_attachment().

void dma_buf_unmap_attachment(struct dma_buf_attachment *attach, struct sg_table *sg_table, enum dma_data_direction direction)

unmaps and decreases usecount of the buffer; might deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of dma_buf_ops.

Parameters

struct dma_buf_attachment *attach

[in] attachment to unmap buffer from

struct sg_table *sg_table

[in] scatterlist info of the buffer to unmap

enum dma_data_direction direction

[in] direction of DMA transfer

Description

This unmaps a DMA mapping for attach obtained by dma_buf_map_attachment().

void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach, struct sg_table *sg_table, enum dma_data_direction direction)

unmaps and decreases usecount of the buffer; might deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of dma_buf_ops.

Parameters

struct dma_buf_attachment *attach

[in] attachment to unmap buffer from

struct sg_table *sg_table

[in] scatterlist info of the buffer to unmap

enum dma_data_direction direction

[in] direction of DMA transfer

Description

Unlocked variant of dma_buf_unmap_attachment().

void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)

notify attachments that DMA-buf is moving

Parameters

struct dma_buf *dmabuf

[in] buffer which is moving

Description

Informs all attachments that they need to destroy and recreate all their mappings.

int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction)

Must be called before accessing a dma_buf from the cpu in the kernel context. Calls begin_cpu_access to allow exporter-specific preparations. Coherency is only guaranteed in the specified range for the specified access direction.

Parameters

struct dma_buf *dmabuf

[in] buffer to prepare cpu access for.

enum dma_data_direction direction

[in] direction of access.

Description

After the cpu access is complete the caller should call dma_buf_end_cpu_access(). Only when cpu access is bracketed by both calls is it guaranteed to be coherent with other DMA access.

This function will also wait for any DMA transactions tracked through implicit synchronization in dma_buf.resv. For DMA transactions with explicit synchronization this function will only ensure cache coherency, callers must ensure synchronization with such DMA transactions on their own.

Can return negative error values, returns 0 on success.

int dma_buf_end_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction)

Must be called after accessing a dma_buf from the cpu in the kernel context. Calls end_cpu_access to allow exporter-specific actions. Coherency is only guaranteed in the specified range for the specified access direction.

Parameters

struct dma_buf *dmabuf

[in] buffer to complete cpu access for.

enum dma_data_direction direction

[in] direction of access.

Description

This terminates CPU access started with dma_buf_begin_cpu_access().

Can return negative error values, returns 0 on success.

int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma, unsigned long pgoff)

Set up a userspace mmap with the given vma

Parameters

struct dma_buf *dmabuf

[in] buffer that should back the vma

struct vm_area_struct *vma

[in] vma for the mmap

unsigned long pgoff

[in] offset in pages where this mmap should start within the dma-buf buffer.

Description

This function adjusts the passed-in vma so that it points at the file of the dma_buf operation. It also adjusts the starting pgoff and does bounds checking on the size of the vma. Then it calls the exporter’s mmap function to set up the mapping.

Can return negative error values, returns 0 on success.

int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)

Create virtual mapping for the buffer object into kernel address space. Same restrictions as for vmap and friends apply.

Parameters

struct dma_buf *dmabuf

[in] buffer to vmap

struct iosys_map *map

[out] returns the vmap pointer

Description

This call may fail due to lack of virtual mapping address space. These calls are optional in drivers. The intended use for them is for mapping objects linear in kernel space for high use objects.

To ensure coherency users must call dma_buf_begin_cpu_access() and dma_buf_end_cpu_access() around any cpu access performed through this mapping.

Returns 0 on success, or a negative errno code otherwise.

int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)

Create virtual mapping for the buffer object into kernel address space. Same restrictions as for vmap and friends apply.

Parameters

struct dma_buf *dmabuf

[in] buffer to vmap

struct iosys_map *map

[out] returns the vmap pointer

Description

Unlocked version of dma_buf_vmap()

Returns 0 on success, or a negative errno code otherwise.
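A sketch of kernel CPU access through the unlocked vmap interface, bracketed as described above; the helper name, the copy length handling and the DMA_FROM_DEVICE direction are assumptions:

/* Copy the first len bytes of a dma-buf into a kernel buffer. */
#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/iosys-map.h>
#include <linux/string.h>

static int my_peek_dmabuf(struct dma_buf *dmabuf, void *dst, size_t len)
{
    struct iosys_map map;
    int ret;

    ret = dma_buf_vmap_unlocked(dmabuf, &map);
    if (ret)
        return ret;

    ret = dma_buf_begin_cpu_access(dmabuf, DMA_FROM_DEVICE);
    if (!ret) {
        if (!map.is_iomem)
            memcpy(dst, map.vaddr, len);    /* plain system-memory mapping */
        ret = dma_buf_end_cpu_access(dmabuf, DMA_FROM_DEVICE);
    }

    dma_buf_vunmap_unlocked(dmabuf, &map);
    return ret;
}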

void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)

Unmap a vmap obtained by dma_buf_vmap.

Parameters

struct dma_buf *dmabuf

[in] buffer to vunmap

struct iosys_map *map

[in] vmap pointer to vunmap

void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)

Unmap a vmap obtained by dma_buf_vmap.

Parameters

struct dma_buf *dmabuf

[in] buffer to vunmap

struct iosys_map *map

[in] vmap pointer to vunmap

struct dma_buf_ops

operations possible on struct dma_buf

Definition:

struct dma_buf_ops {
    int (*attach)(struct dma_buf *, struct dma_buf_attachment *);
    void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
    int (*pin)(struct dma_buf_attachment *attach);
    void (*unpin)(struct dma_buf_attachment *attach);
    struct sg_table * (*map_dma_buf)(struct dma_buf_attachment *, enum dma_data_direction);
    void (*unmap_dma_buf)(struct dma_buf_attachment *, struct sg_table *, enum dma_data_direction);
    void (*release)(struct dma_buf *);
    int (*begin_cpu_access)(struct dma_buf *, enum dma_data_direction);
    int (*end_cpu_access)(struct dma_buf *, enum dma_data_direction);
    int (*mmap)(struct dma_buf *, struct vm_area_struct *vma);
    int (*vmap)(struct dma_buf *dmabuf, struct iosys_map *map);
    void (*vunmap)(struct dma_buf *dmabuf, struct iosys_map *map);
};

Members

attach

This is called from dma_buf_attach() to make sure that a given dma_buf_attachment.dev can access the provided dma_buf. Exporters which support buffer objects in special locations like VRAM or device-specific carveout areas should check whether the buffer could be moved to system memory (or directly accessed by the provided device), and otherwise need to fail the attach operation.

The exporter should also in general check whether the current allocation fulfills the DMA constraints of the new device. If this is not the case, and the allocation cannot be moved, it should also fail the attach operation.

Any exporter-private housekeeping data can be stored in the dma_buf_attachment.priv pointer.

This callback is optional.

Returns:

0 on success, negative error code on failure. It might return -EBUSY to signal that backing storage is already allocated and incompatible with the requirements of the requesting device.

detach

This is called by dma_buf_detach() to release a dma_buf_attachment. Provided so that exporters can clean up any housekeeping for a dma_buf_attachment.

This callback is optional.

pin

This is called by dma_buf_pin() and lets the exporter know that the DMA-buf can’t be moved any more. Ideally, the exporter should pin the buffer so that it is generally accessible by all devices.

This is called with the dmabuf.resv object locked and is mutually exclusive with cache_sgt_mapping.

This is called automatically for non-dynamic importers from dma_buf_attach().

Note that similar to non-dynamic exporters in their map_dma_buf callback the driver must guarantee that the memory is available for use and cleared of any old data by the time this function returns. Drivers which pipeline their buffer moves internally must wait for all moves and clears to complete.

Returns:

0 on success, negative error code on failure.

unpin

This is called by dma_buf_unpin() and lets the exporter know that the DMA-buf can be moved again.

This is called with the dmabuf->resv object locked and is mutually exclusive with cache_sgt_mapping.

This callback is optional.

map_dma_buf

This is called by dma_buf_map_attachment() and is used to map a shared dma_buf into device address space, and it is mandatory. It can only be called if attach has been called successfully.

This call may sleep, e.g. when the backing storage first needs to be allocated, or moved to a location suitable for all currently attached devices.

Note that any specific buffer attributes required for this function should get added to device_dma_parameters accessible via device.dma_params from the dma_buf_attachment. The attach callback should also check these constraints.

If this is being called for the first time, the exporter can now choose to scan through the list of attachments for this buffer, collate the requirements of the attached devices, and choose an appropriate backing storage for the buffer.

Based on enum dma_data_direction, it might be possible to have multiple users accessing at the same time (for reading, maybe), or any other kind of sharing that the exporter might wish to make available to buffer-users.

This is always called with the dmabuf->resv object locked when the dynamic_mapping flag is true.

Note that for non-dynamic exporters the driver must guarantee that the memory is available for use and cleared of any old data by the time this function returns. Drivers which pipeline their buffer moves internally must wait for all moves and clears to complete. Dynamic exporters do not need to follow this rule: for non-dynamic importers the buffer is already pinned through pin, which has the same requirements. Dynamic importers otoh are required to obey the dma_resv fences.

Returns:

A sg_table scatter list of the backing storage of the DMA buffer, already mapped into the device address space of the device attached with the provided dma_buf_attachment. The addresses and lengths in the scatter list are PAGE_SIZE aligned.

On failure, returns a negative error value wrapped into a pointer. May also return -EINTR when a signal was received while being blocked.

Note that exporters should not try to cache the scatter list, or return the same one for multiple calls. Caching is done either by the DMA-BUF code (for non-dynamic importers) or the importer. Ownership of the scatter list is transferred to the caller, and returned by unmap_dma_buf.

unmap_dma_buf

This is called by dma_buf_unmap_attachment() and should unmap and release the sg_table allocated in map_dma_buf, and it is mandatory. For static dma_buf handling this might also unpin the backing storage if this is the last mapping of the DMA buffer.

release

Called after the last dma_buf_put to release the dma_buf, and mandatory.

begin_cpu_access

This is called from dma_buf_begin_cpu_access() and allows the exporter to ensure that the memory is actually coherent for cpu access. The exporter also needs to ensure that cpu access is coherent for the access direction. The direction can be used by the exporter to optimize the cache flushing, i.e. access with a different direction (read instead of write) might return stale or even bogus data (e.g. when the exporter needs to copy the data to temporary storage).

Note that this is both called through the DMA_BUF_IOCTL_SYNC IOCTL command for userspace mappings established through mmap, and also for kernel mappings established with vmap.

This callback is optional.

Returns:

0 on success or a negative error code on failure. This can for example fail when the backing storage can’t be allocated. Can also return -ERESTARTSYS or -EINTR when the call has been interrupted and needs to be restarted.

end_cpu_access

This is called from dma_buf_end_cpu_access() when the importer is done accessing the CPU. The exporter can use this to flush caches and undo anything else done in begin_cpu_access.

This callback is optional.

Returns:

0 on success or a negative error code on failure. Can return -ERESTARTSYS or -EINTR when the call has been interrupted and needs to be restarted.

mmap

This callback is used by the dma_buf_mmap() function

Note that the mapping needs to be incoherent, userspace is expected to bracket CPU access using the DMA_BUF_IOCTL_SYNC interface.

Because dma-buf buffers have invariant size over their lifetime, the dma-buf core checks whether a vma is too large and rejects such mappings. The exporter hence does not need to duplicate this check.

If an exporter needs to manually flush caches and hence needs to fake coherency for mmap support, it needs to be able to zap all the ptes pointing at the backing storage. Now linux mm needs a struct address_space associated with the struct file stored in vma->vm_file to do that with the function unmap_mapping_range. But the dma_buf framework only backs every dma_buf fd with the anon_file struct file, i.e. all dma_bufs share the same file.

Hence exporters need to set up their own file (and address_space) association by setting vma->vm_file and adjusting vma->vm_pgoff in the dma_buf mmap callback. In the specific case of a gem driver the exporter could use the shmem file already provided by gem (and set vm_pgoff = 0). Exporters can then zap ptes by unmapping the corresponding range of the struct address_space associated with their own file.

This callback is optional.

Returns:

0 on success or a negative error code on failure.

vmap

[optional] creates a virtual mapping for the buffer into kernel address space. Same restrictions as for vmap and friends apply.

vunmap

[optional] unmaps a vmap from the buffer

struct dma_buf

shared buffer object

Definition:

struct dma_buf {
    size_t size;
    struct file *file;
    struct list_head attachments;
    const struct dma_buf_ops *ops;
    unsigned vmapping_counter;
    struct iosys_map vmap_ptr;
    const char *exp_name;
    const char *name;
    spinlock_t name_lock;
    struct module *owner;
    struct list_head list_node;
    void *priv;
    struct dma_resv *resv;
    wait_queue_head_t poll;
    struct dma_buf_poll_cb_t {
        struct dma_fence_cb cb;
        wait_queue_head_t *poll;
        __poll_t active;
    } cb_in, cb_out;
};

Members

size

Size of the buffer; invariant over the lifetime of the buffer.

file

File pointer used for sharing buffers across, and for refcounting. See dma_buf_get() and dma_buf_put().

attachments

List of dma_buf_attachment that denotes all devices attached, protected by the dma_resv lock resv.

ops

dma_buf_ops associated with this buffer object.

vmapping_counter

Used internally to refcnt the vmaps returned by dma_buf_vmap(). Protected by lock.

vmap_ptr

The current vmap ptr if vmapping_counter > 0. Protected by lock.

exp_name

Name of the exporter; useful for debugging. Must not be NULL

name

Userspace-provided name. Default value is NULL. If not NULL, length cannot be longer than DMA_BUF_NAME_LEN, including NIL char. Useful for accounting and debugging. Read/Write accesses are protected by name_lock.

See the IOCTLs DMA_BUF_SET_NAME or DMA_BUF_SET_NAME_A/B

name_lock

Spinlock to protect name access for read access.

owner

Pointer to exporter module; used for refcounting when exporter is akernel module.

list_node

node for dma_buf accounting and debugging.

priv

exporter specific private data for this buffer object.

resv

Reservation object linked to this dma-buf.

IMPLICIT SYNCHRONIZATION RULES:

Drivers which support implicit synchronization of buffer access as e.g. exposed in Implicit Fence Poll Support must follow the below rules.

  • Drivers must add a read fence through dma_resv_add_fence() with the DMA_RESV_USAGE_READ flag for anything the userspace API considers a read access. This highly depends upon the API and window system.

  • Similarly drivers must add a write fence through dma_resv_add_fence() with the DMA_RESV_USAGE_WRITE flag for anything the userspace API considers write access.

  • Drivers may just always add a write fence, since that only causes unnecessary synchronization, but no correctness issues.

  • Some drivers only expose a synchronous userspace API with no pipelining across drivers. These do not set any fences for their access. An example here is v4l.

  • Drivers should use dma_resv_usage_rw() when retrieving fences as a dependency for implicit synchronization.

DYNAMIC IMPORTER RULES:

Dynamic importers, see dma_buf_attachment_is_dynamic(), have additional constraints on how they set up fences:

  • Dynamic importers must obey the write fences and wait for them to signal before allowing access to the buffer’s underlying storage through the device.

  • Dynamic importers should set fences for any access that they can’t disable immediately from their dma_buf_attach_ops.invalidate_mappings callback.

IMPORTANT:

All drivers and memory management related functions must obey the struct dma_resv rules, specifically the rules for updating and obeying fences. See enum dma_resv_usage for further descriptions.

poll

for userspace poll support

cb_in

for userspace poll support

cb_out

for userspace poll support

Description

This represents a shared buffer, created by calling dma_buf_export(). The userspace representation is a normal file descriptor, which can be created by calling dma_buf_fd().

Shared dma buffers are reference counted using dma_buf_put() and get_dma_buf().

Device DMA access is handled by the separate struct dma_buf_attachment.

struct dma_buf_attach_ops

importer operations for an attachment

Definition:

struct dma_buf_attach_ops {
    bool allow_peer2peer;
    void (*invalidate_mappings)(struct dma_buf_attachment *attach);
};

Members

allow_peer2peer

If this is set to true the importer must be able to handle peer resources without struct pages.

invalidate_mappings

[optional] notification that the DMA-buf is moving

If this callback is provided the framework can avoid pinning the backing store while mappings exist.

This callback is called with the lock of the reservation object associated with the dma_buf held and the mapping function must be called with this lock held as well. This makes sure that no mapping is created concurrently with an ongoing move operation.

Mappings stay valid and are not directly affected by this callback. But the DMA-buf can now be in a different physical location, so all mappings should be destroyed and re-created as soon as possible.

New mappings can be created after this callback returns, and will point to the new location of the DMA-buf.

Description

Attachment operations implemented by the importer.

struct dma_buf_attachment

holds device-buffer attachment data

Definition:

struct dma_buf_attachment {
    struct dma_buf *dmabuf;
    struct device *dev;
    struct list_head node;
    bool peer2peer;
    const struct dma_buf_attach_ops *importer_ops;
    void *importer_priv;
    void *priv;
};

Members

dmabuf

buffer for this attachment.

dev

device attached to the buffer.

node

list of dma_buf_attachment, protected by dma_resv lock of the dmabuf.

peer2peer

true if the importer can handle peer resources without pages.

importer_ops

importer operations for this attachment; if provided, dma_buf_map/unmap_attachment() must be called with the dma_resv lock held.

importer_priv

importer specific attachment data.

priv

exporter specific attachment data.

Description

This structure holds the attachment information between the dma_buf buffer and its user device(s). The list contains one attachment struct per device attached to the buffer.

An attachment is created by calling dma_buf_attach(), and released again by calling dma_buf_detach(). The DMA mapping itself needed to initiate a transfer is created by dma_buf_map_attachment() and freed again by calling dma_buf_unmap_attachment().

struct dma_buf_export_info

holds information needed to export a dma_buf

Definition:

struct dma_buf_export_info {
    const char *exp_name;
    struct module *owner;
    const struct dma_buf_ops *ops;
    size_t size;
    int flags;
    struct dma_resv *resv;
    void *priv;
};

Members

exp_name

name of the exporter - useful for debugging.

owner

pointer to exporter module - used for refcounting kernel module

ops

Attach allocator-defined dma buf ops to the new buffer

size

Size of the buffer - invariant over the lifetime of the buffer

flags

mode flags for the file

resv

reservation-object, NULL to allocate default one

priv

Attach private data of allocator to this buffer

Description

This structure holds the information required to export the buffer. Used with dma_buf_export() only.

struct dma_buf_phys_vec

describes a continuous chunk of memory

Definition:

struct dma_buf_phys_vec {
    phys_addr_t paddr;
    size_t len;
};

Members

paddr

physical address of that chunk

len

Length of this chunk

DEFINE_DMA_BUF_EXPORT_INFO

DEFINE_DMA_BUF_EXPORT_INFO(name)

helper macro for exporters

Parameters

name

export-info name

Description

The DEFINE_DMA_BUF_EXPORT_INFO macro defines the struct dma_buf_export_info, zeroes it out and pre-populates exp_name in it.

void get_dma_buf(struct dma_buf *dmabuf)

convenience wrapper for get_file.

Parameters

struct dma_buf *dmabuf

[in] pointer to dma_buf

Description

Increments the reference count on the dma-buf, needed in case of drivers that need to create additional references to the dmabuf on the kernel side. For example, an exporter that needs to keep a dmabuf ptr so that subsequent exports don’t create a new dmabuf.

bool dma_buf_is_dynamic(struct dma_buf *dmabuf)

check if a DMA-buf uses dynamic mappings.

Parameters

struct dma_buf *dmabuf

the DMA-buf to check

Description

Returns true if a DMA-buf exporter wants to be called with the dma_resv locked for the map/unmap callbacks, false if it doesn’t want to be called with the lock held.

Reservation Objects

The reservation object provides a mechanism to manage a container of dma_fence objects associated with a resource. A reservation object can have any number of fences attached to it. Each fence carries a usage parameter determining how the operation represented by the fence is using the resource. The RCU mechanism is used to protect read access to fences from locked write-side updates.

See struct dma_resv for more details.

void dma_resv_init(struct dma_resv *obj)

initialize a reservation object

Parameters

struct dma_resv *obj

the reservation object

void dma_resv_fini(struct dma_resv *obj)

destroys a reservation object

Parameters

struct dma_resv *obj

the reservation object

int dma_resv_reserve_fences(struct dma_resv *obj, unsigned int num_fences)

Reserve space to add fences to a dma_resv object.

Parameters

struct dma_resv *obj

reservation object

unsigned int num_fences

number of fences we want to add

Description

Should be called before dma_resv_add_fence(). Must be called with obj locked through dma_resv_lock().

Note that the preallocated slots need to be re-reserved if obj is unlocked at any time before calling dma_resv_add_fence(). This is validated when CONFIG_DEBUG_MUTEXES is enabled.

RETURNS

Zero for success, or -errno
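A minimal sketch of the reserve-then-add pattern this implies; the helper name, locking with a NULL context and the caller-provided usage are assumptions:

/* Attach one fence to a reservation object. */
#include <linux/dma-resv.h>

static int my_attach_fence(struct dma_resv *resv, struct dma_fence *fence,
                           enum dma_resv_usage usage)
{
    int ret;

    ret = dma_resv_lock(resv, NULL);            /* no ww_acquire_ctx: single object */
    if (ret)
        return ret;

    ret = dma_resv_reserve_fences(resv, 1);     /* make sure a slot exists */
    if (!ret)
        dma_resv_add_fence(resv, fence, usage); /* cannot fail once reserved */

    dma_resv_unlock(resv);
    return ret;
}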

void dma_resv_reset_max_fences(struct dma_resv *obj)

reset fences for debugging

Parameters

struct dma_resv *obj

the dma_resv object to reset

Description

Reset the number of pre-reserved fence slots to test that drivers do correct slot allocation using dma_resv_reserve_fences(). See also dma_resv_list.max_fences.

void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence, enum dma_resv_usage usage)

Add a fence to the dma_resv obj

Parameters

struct dma_resv *obj

the reservation object

struct dma_fence *fence

the fence to add

enum dma_resv_usage usage

how the fence is used, see enum dma_resv_usage

Description

Add a fence to a slot; obj must be locked with dma_resv_lock(), and dma_resv_reserve_fences() must have been called.

See also dma_resv.fence for a discussion of the semantics.

void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, struct dma_fence *replacement, enum dma_resv_usage usage)

replace fences in the dma_resv obj

Parameters

struct dma_resv *obj

the reservation object

uint64_t context

the context of the fences to replace

struct dma_fence *replacement

the new fence to use instead

enum dma_resv_usage usage

how the new fence is used, see enum dma_resv_usage

Description

Replace fences with a specified context with a new fence. Only valid if the operation represented by the original fence no longer has access to the resources represented by the dma_resv object when the new fence completes.

An example for using this is replacing a preemption fence with a page table update fence which makes the resource inaccessible.

struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor)

first fence in an unlocked dma_resv obj.

Parameters

struct dma_resv_iter *cursor

the cursor with the current position

Description

Subsequent fences are iterated with dma_resv_iter_next_unlocked().

Beware that the iterator can be restarted. Code which accumulates statistics or similar needs to check for this with dma_resv_iter_is_restarted(). For this reason prefer the locked dma_resv_iter_first() whenever possible.

Returns the first fence from an unlocked dma_resv obj.

struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor)

next fence in an unlocked dma_resv obj.

Parameters

struct dma_resv_iter *cursor

the cursor with the current position

Description

Beware that the iterator can be restarted. Code which accumulates statistics or similar needs to check for this with dma_resv_iter_is_restarted(). For this reason prefer the locked dma_resv_iter_next() whenever possible.

Returns the next fence from an unlocked dma_resv obj.

struct dma_fence *dma_resv_iter_first(struct dma_resv_iter *cursor)

first fence from a locked dma_resv object

Parameters

struct dma_resv_iter *cursor

cursor to record the current position

Description

Subsequent fences are iterated with dma_resv_iter_next_unlocked().

Return the first fence in the dma_resv object while holding the dma_resv.lock.

struct dma_fence *dma_resv_iter_next(struct dma_resv_iter *cursor)

next fence from a locked dma_resv object

Parameters

struct dma_resv_iter *cursor

cursor to record the current position

Description

Return the next fence from the dma_resv object while holding the dma_resv.lock.

int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)

Copy all fences from src to dst.

Parameters

struct dma_resv *dst

the destination reservation object

struct dma_resv *src

the source reservation object

Description

Copy all fences from src to dst. dst-lock must be held.

int dma_resv_get_fences(struct dma_resv *obj, enum dma_resv_usage usage, unsigned int *num_fences, struct dma_fence ***fences)

Get an object’s fences without the update side lock held

Parameters

struct dma_resv *obj

the reservation object

enum dma_resv_usage usage

controls which fences to include, see enum dma_resv_usage.

unsigned int *num_fences

the number of fences returned

struct dma_fence ***fences

the array of fence ptrs returned (array is krealloc’d to the required size, and must be freed by the caller)

Description

Retrieve all fences from the reservation object. Returns either zero or -ENOMEM.

int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage, struct dma_fence **fence)

Get a single fence for all the fences

Parameters

struct dma_resv *obj

the reservation object

enum dma_resv_usage usage

controls which fences to include, see enum dma_resv_usage.

struct dma_fence **fence

the resulting fence

Description

Get a single fence representing all the fences inside the resv object. Returns either 0 for success or -ENOMEM.

Warning: This can’t be used like this when adding the fence back to the resv object since that can lead to stack corruption when finalizing the dma_fence_array.

Returns 0 on success and negative error values on failure.

long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage, bool intr, unsigned long timeout)

Wait on reservation’s objects fences

Parameters

struct dma_resv *obj

the reservation object

enum dma_resv_usage usage

controls which fences to include, see enum dma_resv_usage.

bool intr

if true, do interruptible wait

unsigned long timeout

timeout value in jiffies or zero to return immediately

Description

Callers are not required to hold specific locks, but may hold dma_resv_lock() already.

RETURNS

Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or greater than zero on success.
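A sketch of an interruptible, bounded wait built on this function; the helper name, the one-second timeout and selecting the usage via dma_resv_usage_rw() are assumptions:

/* Wait up to one second for the fences relevant to an upcoming access. */
#include <linux/dma-resv.h>
#include <linux/errno.h>
#include <linux/jiffies.h>

static int my_wait_for_idle(struct dma_resv *resv, bool for_write)
{
    long ret;

    ret = dma_resv_wait_timeout(resv, dma_resv_usage_rw(for_write),
                                true, msecs_to_jiffies(1000));
    if (ret < 0)
        return ret;                 /* -ERESTARTSYS if interrupted */
    return ret == 0 ? -ETIMEDOUT : 0;
}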

void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage, ktime_t deadline)

Set a deadline on reservation’s objects fences

Parameters

struct dma_resv *obj

the reservation object

enum dma_resv_usage usage

controls which fences to include, see enum dma_resv_usage.

ktime_t deadline

the requested deadline (MONOTONIC)

Description

May be called without holding the dma_resv lock. Sets deadline on all fences filtered by usage.

bool dma_resv_test_signaled(struct dma_resv *obj, enum dma_resv_usage usage)

Test if a reservation object’s fences have been signaled.

Parameters

struct dma_resv *obj

the reservation object

enum dma_resv_usage usage

controls which fences to include, see enum dma_resv_usage.

Description

Callers are not required to hold specific locks, but may hold dma_resv_lock() already.

RETURNS

True if all fences signaled, else false.

void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq)

Dump description of the resv object into seq_file

Parameters

struct dma_resv *obj

the reservation object

struct seq_file *seq

the seq_file to dump the description into

Description

Dump a textual description of the fences inside a dma_resv object into the seq_file.

enumdma_resv_usage

how the fences from a dma_resv obj are used

Constants

DMA_RESV_USAGE_KERNEL

For in kernel memory management only.

This should only be used for things like copying or clearing memory with a DMA hardware engine for the purpose of kernel memory management.

Drivers always must wait for those fences before accessing the resource protected by the dma_resv object. The only exception for that is when the resource is known to be locked down in place by pinning it previously.

DMA_RESV_USAGE_WRITE

Implicit write synchronization.

This should only be used for userspace command submissions which add an implicit write dependency.

DMA_RESV_USAGE_READ

Implicit read synchronization.

This should only be used for userspace command submissions which add an implicit read dependency.

DMA_RESV_USAGE_BOOKKEEP

No implicit sync.

This should be used by submissions which don’t want to participate in any implicit synchronization.

The most common cases are preemption fences, page table updates, TLB flushes as well as explicitly synced user submissions.

Explicitly synced user submissions can be promoted to DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE as needed using dma_buf_import_sync_file() when implicit synchronization should become necessary after the initial adding of the fence.

Description

This enum describes the different use cases for a dma_resv object and controls which fences are returned when queried.

An important fact is that there is the order KERNEL<WRITE<READ<BOOKKEEP and when the dma_resv object is asked for fences for one use case the fences for the lower use case are returned as well.

For example when asking for WRITE fences then the KERNEL fences are returned as well. Similarly, when asked for READ fences then both WRITE and KERNEL fences are returned as well.

Already used fences can be promoted in the sense that a fence with DMA_RESV_USAGE_BOOKKEEP could become DMA_RESV_USAGE_READ by adding it again with this usage. But fences can never be degraded in the sense that a fence with DMA_RESV_USAGE_WRITE could become DMA_RESV_USAGE_READ.

enum dma_resv_usage dma_resv_usage_rw(bool write)

helper for implicit sync

Parameters

bool write

true if we create a new implicit sync write

Description

This returns the implicit synchronization usage for write or read accesses, see enum dma_resv_usage and dma_buf.resv.

struct dma_resv

a reservation object manages fences for a buffer

Definition:

struct dma_resv {
    struct ww_mutex lock;
    struct dma_resv_list *fences;
};

Members

lock

Update side lock. Don’t use directly, instead use the wrapper functions like dma_resv_lock() and dma_resv_unlock().

Drivers which use the reservation object to manage memory dynamically also use this lock to protect buffer object state like placement, allocation policies or throughout command submission.

fences

Array of fences which were added to the dma_resv object

A new fence is added by calling dma_resv_add_fence(). Since this often needs to be done past the point of no return in command submission it cannot fail, and therefore sufficient slots need to be reserved by calling dma_resv_reserve_fences().

Description

This is a container for dma_fence objects which needs to handle multiple use cases.

One use is to synchronize cross-driver access to a struct dma_buf, either for dynamic buffer management or just to handle implicit synchronization between different users of the buffer in userspace. See dma_buf.resv for a more in-depth discussion.

The other major use is to manage access and locking within a driver in a buffer based memory manager. struct ttm_buffer_object is the canonical example here, since this is where reservation objects originated from. But use in drivers is spreading and some drivers also manage struct drm_gem_object with the same scheme.

struct dma_resv_iter

current position into the dma_resv fences

Definition:

struct dma_resv_iter {
    struct dma_resv *obj;
    enum dma_resv_usage usage;
    struct dma_fence *fence;
    enum dma_resv_usage fence_usage;
    unsigned int index;
    struct dma_resv_list *fences;
    unsigned int num_fences;
    bool is_restarted;
};

Members

obj

The dma_resv object we iterate over

usage

Return fences with this usage or lower.

fence

the currently handled fence

fence_usage

the usage of the current fence

index

index into the shared fences

fences

the shared fences; private, MUST not dereference

num_fences

number of fences

is_restarted

true if this is the first returned fence

Description

Don’t touch this directly in the driver, use the accessor function instead.

IMPORTANT

When using the lockless iterators like dma_resv_iter_next_unlocked() or dma_resv_for_each_fence_unlocked() beware that the iterator can be restarted. Code which accumulates statistics or similar needs to check for this with dma_resv_iter_is_restarted().

void dma_resv_iter_begin(struct dma_resv_iter *cursor, struct dma_resv *obj, enum dma_resv_usage usage)

initialize a dma_resv_iter object

Parameters

struct dma_resv_iter *cursor

The dma_resv_iter object to initialize

struct dma_resv *obj

The dma_resv object which we want to iterate over

enum dma_resv_usage usage

controls which fences to include, see enum dma_resv_usage.

voiddma_resv_iter_end(structdma_resv_iter*cursor)

cleanup a dma_resv_iter object

Parameters

structdma_resv_iter*cursor

the dma_resv_iter object which should be cleaned up

Description

Make sure that the reference to the fence in the cursor is properlydropped.

enumdma_resv_usagedma_resv_iter_usage(structdma_resv_iter*cursor)

Return the usage of the current fence

Parameters

structdma_resv_iter*cursor

the cursor of the current position

Description

Returns the usage of the currently processed fence.

booldma_resv_iter_is_restarted(structdma_resv_iter*cursor)

test if this is the first fence after a restart

Parameters

structdma_resv_iter*cursor

the cursor with the current position

Description

Return true if this is the first fence in an iteration after a restart.

dma_resv_for_each_fence_unlocked

dma_resv_for_each_fence_unlocked(cursor,fence)

unlocked fence iterator

Parameters

cursor

astructdma_resv_iter pointer

fence

the current fence

Description

Iterate over the fences in a struct dma_resv object without holding the dma_resv.lock and using RCU instead. The cursor needs to be initialized with dma_resv_iter_begin() and cleaned up with dma_resv_iter_end(). Inside the iterator a reference to the dma_fence is held and the RCU lock dropped.

Beware that the iterator can be restarted when the struct dma_resv for cursor is modified. Code which accumulates statistics or similar needs to check for this with dma_resv_iter_is_restarted(). For this reason prefer the locked iterator dma_resv_for_each_fence() whenever possible.
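
A minimal sketch of an unlocked walk which handles restarts, assuming "resv" is the reservation object to inspect:

struct dma_resv_iter cursor;
struct dma_fence *fence;
unsigned int count = 0;

dma_resv_iter_begin(&cursor, resv, DMA_RESV_USAGE_READ);
dma_resv_for_each_fence_unlocked(&cursor, fence) {
    /* a restart invalidates everything accumulated so far */
    if (dma_resv_iter_is_restarted(&cursor))
        count = 0;
    count++;
}
dma_resv_iter_end(&cursor);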

dma_resv_for_each_fence

dma_resv_for_each_fence(cursor,obj,usage,fence)

fence iterator

Parameters

cursor

astructdma_resv_iter pointer

obj

a dma_resv object pointer

usage

controls which fences to return

fence

the current fence

Description

Iterate over the fences in a struct dma_resv object while holding the dma_resv.lock. usage controls which fences are returned. The cursor initialisation is part of the iterator and the fence stays valid as long as the lock is held, so no extra reference to the fence is taken.
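
A minimal sketch of the locked variant; "resv" is hypothetical and error handling is omitted since a lock with a NULL context cannot report -EDEADLK:

struct dma_resv_iter cursor;
struct dma_fence *fence;

dma_resv_lock(resv, NULL);
dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_BOOKKEEP, fence) {
    /* fence stays valid while the lock is held, no extra reference taken */
}
dma_resv_unlock(resv);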

intdma_resv_lock(structdma_resv*obj,structww_acquire_ctx*ctx)

lock the reservation object

Parameters

structdma_resv*obj

the reservation object

structww_acquire_ctx*ctx

the locking context

Description

Locks the reservation object for exclusive access and modification. Note that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.

As the reservation object may be locked by multiple parties in an undefined order, a #ww_acquire_ctx is passed to unwind if a cycle is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation object may be locked by itself by passing NULL as ctx.

When a die situation is indicated by returning -EDEADLK all locks held by ctx must be unlocked and then dma_resv_lock_slow() called on obj.

Unlocked by calling dma_resv_unlock().

See also dma_resv_lock_interruptible() for the interruptible variant.
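
The -EDEADLK handling usually takes the form of an acquire/backoff loop, sketched below following the ww_mutex locking pattern; the job object, its buffer list and "struct my_bo" are hypothetical driver structures:

struct ww_acquire_ctx ctx;
struct my_bo *bo, *failed, *contended = NULL;
int ret;

ww_acquire_init(&ctx, &reservation_ww_class);
retry:
list_for_each_entry(bo, &job->bos, entry) {
    if (bo == contended) {
        contended = NULL;       /* already locked via the slowpath */
        continue;
    }
    ret = dma_resv_lock(bo->resv, &ctx);
    if (ret) {
        failed = bo;
        goto err;
    }
}
ww_acquire_done(&ctx);
/* ... command submission, dma_resv_add_fence(), ... */
return 0;

err:
/* drop everything acquired before the contended object */
list_for_each_entry_continue_reverse(bo, &job->bos, entry)
    dma_resv_unlock(bo->resv);
if (contended)
    dma_resv_unlock(contended->resv);
if (ret == -EDEADLK) {
    /* we lost a cycle, sleep on the contended lock and start over */
    dma_resv_lock_slow(failed->resv, &ctx);
    contended = failed;
    goto retry;
}
ww_acquire_fini(&ctx);
return ret;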

intdma_resv_lock_interruptible(structdma_resv*obj,structww_acquire_ctx*ctx)

lock the reservation object

Parameters

structdma_resv*obj

the reservation object

structww_acquire_ctx*ctx

the locking context

Description

Locks the reservation object interruptibly for exclusive access and modification. Note that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.

As the reservation object may be locked by multiple parties in an undefined order, a #ww_acquire_ctx is passed to unwind if a cycle is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation object may be locked by itself by passing NULL as ctx.

When a die situation is indicated by returning -EDEADLK all locks held by ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on obj.

Unlocked by callingdma_resv_unlock().

voiddma_resv_lock_slow(structdma_resv*obj,structww_acquire_ctx*ctx)

slowpath lock the reservation object

Parameters

structdma_resv*obj

the reservation object

structww_acquire_ctx*ctx

the locking context

Description

Acquires the reservation object after a die case. This function will sleep until the lock becomes available. See dma_resv_lock() as well.

See also dma_resv_lock_slow_interruptible() for the interruptible variant.

intdma_resv_lock_slow_interruptible(structdma_resv*obj,structww_acquire_ctx*ctx)

slowpath lock the reservation object, interruptible

Parameters

structdma_resv*obj

the reservation object

structww_acquire_ctx*ctx

the locking context

Description

Acquires the reservation object interruptibly after a die case. This function will sleep until the lock becomes available. See dma_resv_lock_interruptible() as well.

booldma_resv_trylock(structdma_resv*obj)

trylock the reservation object

Parameters

structdma_resv*obj

the reservation object

Description

Tries to lock the reservation object for exclusive access and modification. Note that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.

Also note that since no context is provided, no deadlock protection ispossible, which is also not needed for a trylock.

Returns true if the lock was acquired, false otherwise.

booldma_resv_is_locked(structdma_resv*obj)

is the reservation object locked

Parameters

structdma_resv*obj

the reservation object

Description

Returns true if the mutex is locked, false if unlocked.

structww_acquire_ctx*dma_resv_locking_ctx(structdma_resv*obj)

returns the context used to lock the object

Parameters

structdma_resv*obj

the reservation object

Description

Returns the context used to lock a reservation object or NULL if no context was used or the object is not locked at all.

WARNING: This interface is pretty horrible, but TTM needs it because it doesn’t pass the struct ww_acquire_ctx around in some very long callchains. Everyone else just uses it to check whether they’re holding a reservation or not.

voiddma_resv_unlock(structdma_resv*obj)

unlock the reservation object

Parameters

structdma_resv*obj

the reservation object

Description

Unlocks the reservation object following exclusive access.

DMA Fences

DMA fences, represented by struct dma_fence, are the kernel internal synchronization primitive for DMA operations like GPU rendering, video encoding/decoding, or displaying buffers on a screen.

A fence is initialized using dma_fence_init() and completed using dma_fence_signal(). Fences are associated with a context, allocated through dma_fence_context_alloc(), and all fences on the same context are fully ordered.

Since the purpose of fences is to facilitate cross-device and cross-application synchronization, there are multiple ways to use one:

  • Individual fences can be exposed as a sync_file, accessed as a file descriptor from userspace, created by calling sync_file_create(). This is called explicit fencing, since userspace passes around explicit synchronization points.

  • Some subsystems also have their own explicit fencing primitives, like drm_syncobj. Compared to sync_file, a drm_syncobj allows the underlying fence to be updated.

  • Then there’s also implicit fencing, where the synchronization points are implicitly passed around as part of shared dma_buf instances. Such implicit fences are stored in struct dma_resv through the dma_buf.resv pointer.

DMA Fence Cross-Driver Contract

Since dma_fence provides a cross-driver contract, all drivers must follow the same rules:

  • Fences must complete in a reasonable time. Fences which represent kernels and shaders submitted by userspace, which could run forever, must be backed up by timeout and gpu hang recovery code. Minimally that code must prevent further command submission and force complete all in-flight fences, e.g. when the driver or hardware do not support gpu reset, or if the gpu reset failed for some reason. Ideally the driver supports gpu recovery which only affects the offending userspace context, and no other userspace submissions.

  • Drivers may have different ideas of what completion within a reasonable time means. Some hang recovery code uses a fixed timeout, others a mix between observing forward progress and increasingly strict timeouts. Drivers should not try to second guess timeout handling of fences from other drivers.

  • To ensure there are no deadlocks of dma_fence_wait() against other locks drivers should annotate all code required to reach dma_fence_signal(), which completes the fences, with dma_fence_begin_signalling() and dma_fence_end_signalling().

  • Drivers are allowed to call dma_fence_wait() while holding dma_resv_lock(). This means any code required for fence completion cannot acquire a dma_resv lock. Note that this also pulls in the entire established locking hierarchy around dma_resv_lock() and dma_resv_unlock().

  • Drivers are allowed to call dma_fence_wait() from their shrinker callbacks. This means any code required for fence completion cannot allocate memory with GFP_KERNEL.

  • Drivers are allowed to call dma_fence_wait() from their mmu_notifier respectively mmu_interval_notifier callbacks. This means any code required for fence completion cannot allocate memory with GFP_NOFS or GFP_NOIO. Only GFP_ATOMIC is permissible, which might fail.

Note that only GPU drivers have a reasonable excuse for both requiring mmu_interval_notifier and shrinker callbacks at the same time as having to track asynchronous compute work using dma_fence. No driver outside of drivers/gpu should ever call dma_fence_wait() in such contexts.

DMA Fence Signalling Annotations

Proving correctness of all the kernel code around dma_fence through code review and testing is tricky for a few reasons:

  • It is a cross-driver contract, and therefore all drivers must follow the same rules for lock nesting order, calling contexts for various functions and anything else significant for in-kernel interfaces. But it is also impossible to test all drivers in a single machine, hence brute-force N vs. N testing of all combinations is impossible. Even just limiting to the possible combinations is infeasible.

  • There is an enormous amount of driver code involved. For render drivers there’s the tail of command submission, after fences are published, scheduler code, interrupt and workers to process job completion, and timeout, gpu reset and gpu hang recovery code. Plus for integration with core mm we have mmu_notifier, respectively mmu_interval_notifier, and shrinker. For modesetting drivers there’s the commit tail functions between when fences for an atomic modeset are published, and when the corresponding vblank completes, including any interrupt processing and related workers. Auditing all that code, across all drivers, is not feasible.

  • Due to how many other subsystems are involved and the locking hierarchies this pulls in there is extremely thin wiggle-room for driver-specific differences. dma_fence interacts with almost all of the core memory handling through page fault handlers via dma_resv, dma_resv_lock() and dma_resv_unlock(). On the other side it also interacts through all allocation sites through mmu_notifier and shrinker.

Furthermore lockdep does not handle cross-release dependencies, which means any deadlocks between dma_fence_wait() and dma_fence_signal() can’t be caught at runtime with some quick testing. The simplest example is one thread waiting on a dma_fence while holding a lock:

lock(A);
dma_fence_wait(B);
unlock(A);

while the other thread is stuck trying to acquire the same lock, which prevents it from signalling the fence the previous thread is stuck waiting on:

lock(A);
unlock(A);
dma_fence_signal(B);

By manually annotating all code relevant to signalling a dma_fence we can teach lockdep about these dependencies, which also helps with the validation headache since now lockdep can check all the rules for us:

cookie = dma_fence_begin_signalling();
lock(A);
unlock(A);
dma_fence_signal(B);
dma_fence_end_signalling(cookie);

For using dma_fence_begin_signalling() and dma_fence_end_signalling() to annotate critical sections the following rules need to be observed:

  • All code necessary to complete a dma_fence must be annotated, from the point where a fence is accessible to other threads, to the point where dma_fence_signal() is called. Un-annotated code can contain deadlock issues, and due to the very strict rules and many corner cases it is infeasible to catch these just with review or normal stress testing.

  • struct dma_resv deserves a special note, since the readers are only protected by RCU. This means the signalling critical section starts as soon as the new fences are installed, even before dma_resv_unlock() is called.

  • The only exception are fast paths and opportunistic signalling code, which calls dma_fence_signal() purely as an optimization, but is not required to guarantee completion of a dma_fence. The usual example is a wait IOCTL which calls dma_fence_signal(), while the mandatory completion path goes through a hardware interrupt and possible job completion worker.

  • To aid composability of code, the annotations can be freely nested, as long as the overall locking hierarchy is consistent. The annotations also work both in interrupt and process context. Due to implementation details this requires that callers pass an opaque cookie from dma_fence_begin_signalling() to dma_fence_end_signalling().

  • Validation against the cross driver contract is implemented by priming lockdep with the relevant hierarchy at boot-up. This means even just testing with a single device is enough to validate a driver, at least as far as deadlocks with dma_fence_wait() against dma_fence_signal() are concerned.

DMA Fence Deadline Hints

In an ideal world, it would be possible to pipeline a workload sufficiently that a utilization based device frequency governor could arrive at a minimum frequency that meets the requirements of the use-case, in order to minimize power consumption. But in the real world there are many workloads which defy this ideal. For example, but not limited to:

  • Workloads that ping-pong between device and CPU, with alternating periods of CPU waiting for device, and device waiting on CPU. This can result in devfreq and cpufreq seeing idle time in their respective domains and in result reduce frequency.

  • Workloads that interact with a periodic time based deadline, such as double buffered GPU rendering vs vblank sync’d page flipping. In this scenario, missing a vblank deadline results in an increase in idle time on the GPU (since it has to wait an additional vblank period), sending a signal to the GPU’s devfreq to reduce frequency, when in fact the opposite is what is needed.

To this end, deadline hint(s) can be set on a dma_fence via dma_fence_set_deadline (or indirectly via userspace facing ioctls like sync_set_deadline). The deadline hint provides a way for the waiting driver, or userspace, to convey an appropriate sense of urgency to the signaling driver.

A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace facing APIs). The time could either be some point in the future (such as the vblank based deadline for page-flipping, or the start of a compositor’s composition cycle), or the current time to indicate an immediate deadline hint (i.e. forward progress cannot be made until this fence is signaled).

Multiple deadlines may be set on a given fence, even in parallel. See thedocumentation fordma_fence_ops.set_deadline.

The deadline hint is just that, a hint. The driver that created the fence may react by increasing frequency, making different scheduling choices, or doing nothing at all.
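
A minimal sketch of setting a deadline hint; "next_vblank_time" is a hypothetical ktime_t computed by the waiting driver:

/* e.g. from a page-flip path: ask for the fence to signal before vblank */
dma_fence_set_deadline(fence, next_vblank_time);

/* or: an immediate deadline, forward progress is blocked on this fence */
dma_fence_set_deadline(fence, ktime_get());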

DMA Fences Functions Reference

structdma_fence*dma_fence_get_stub(void)

return a signaled fence

Parameters

void

no arguments

Description

Return a stub fence which is already signaled. The fence’s timestampcorresponds to the initialisation time of the linux kernel.

structdma_fence*dma_fence_allocate_private_stub(ktime_ttimestamp)

return a private, signaled fence

Parameters

ktime_ttimestamp

timestamp when the fence was signaled

Description

Return a newly allocated and signaled stub fence.

u64dma_fence_context_alloc(unsignednum)

allocate an array of fence contexts

Parameters

unsignednum

amount of contexts to allocate

Description

This function will return the first index of the number of fence contexts allocated. The fence context is used for setting dma_fence.context to a unique number by passing the context to dma_fence_init().

booldma_fence_begin_signalling(void)

begin a critical DMA fence signalling section

Parameters

void

no arguments

Description

Drivers should use this to annotate the beginning of any code section required to eventually complete a dma_fence by calling dma_fence_signal().

The end of these critical sections are annotated with dma_fence_end_signalling().

Returns an opaque cookie needed by the implementation, which needs to be passed to dma_fence_end_signalling().

voiddma_fence_end_signalling(boolcookie)

end a critical DMA fence signalling section

Parameters

boolcookie

opaque cookie fromdma_fence_begin_signalling()

Description

Closes a critical section annotation opened bydma_fence_begin_signalling().

voiddma_fence_signal_timestamp_locked(structdma_fence*fence,ktime_ttimestamp)

signal completion of a fence

Parameters

structdma_fence*fence

the fence to signal

ktime_ttimestamp

fence signal timestamp in kernel’s CLOCK_MONOTONIC time domain

Description

Signal completion for software callbacks on a fence, this will unblockdma_fence_wait() calls and run all the callbacks added withdma_fence_add_callback(). Can be called multiple times, but since a fencecan only go from the unsignaled to the signaled state and not back, it willonly be effective the first time. Set the timestamp provided as the fencesignal timestamp.

Unlikedma_fence_signal_timestamp(), this function must be called withdma_fence.lock held.

voiddma_fence_signal_timestamp(structdma_fence*fence,ktime_ttimestamp)

signal completion of a fence

Parameters

structdma_fence*fence

the fence to signal

ktime_ttimestamp

fence signal timestamp in kernel’s CLOCK_MONOTONIC time domain

Description

Signal completion for software callbacks on a fence, this will unblockdma_fence_wait() calls and run all the callbacks added withdma_fence_add_callback(). Can be called multiple times, but since a fencecan only go from the unsignaled to the signaled state and not back, it willonly be effective the first time. Set the timestamp provided as the fencesignal timestamp.

voiddma_fence_signal_locked(structdma_fence*fence)

signal completion of a fence

Parameters

structdma_fence*fence

the fence to signal

Description

Signal completion for software callbacks on a fence, this will unblockdma_fence_wait() calls and run all the callbacks added withdma_fence_add_callback(). Can be called multiple times, but since a fencecan only go from the unsignaled to the signaled state and not back, it willonly be effective the first time.

Unlikedma_fence_signal(), this function must be called withdma_fence.lockheld.

booldma_fence_check_and_signal_locked(structdma_fence*fence)

signal the fence if it’s not yet signaled

Parameters

structdma_fence*fence

the fence to check and signal

Description

Checks whether a fence was signaled and signals it if it was not yet signaled.

Unlikedma_fence_check_and_signal(), this function must be called withstructdma_fence.lock being held.

Return

true if fence has been signaled already, false otherwise.

booldma_fence_check_and_signal(structdma_fence*fence)

signal the fence if it’s not yet signaled

Parameters

structdma_fence*fence

the fence to check and signal

Description

Checks whether a fence was signaled and signals it if it was not yet signaled.All this is done in a race-free manner.

Return

true if fence has been signaled already, false otherwise.

voiddma_fence_signal(structdma_fence*fence)

signal completion of a fence

Parameters

structdma_fence*fence

the fence to signal

Description

Signal completion for software callbacks on a fence, this will unblockdma_fence_wait() calls and run all the callbacks added withdma_fence_add_callback(). Can be called multiple times, but since a fencecan only go from the unsignaled to the signaled state and not back, it willonly be effective the first time.

signedlongdma_fence_wait_timeout(structdma_fence*fence,boolintr,signedlongtimeout)

sleep until the fence gets signaled or until timeout elapses

Parameters

structdma_fence*fence

the fence to wait on

boolintr

if true, do an interruptible wait

signedlongtimeout

timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT

Description

Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the remaining timeout in jiffies on success. Other error values may be returned on custom implementations.

Performs a synchronous wait on this fence. It is assumed the caller directly or indirectly (buf-mgr between reservation and committing) holds a reference to the fence, otherwise the fence might be freed before return, resulting in undefined behavior.

See also dma_fence_wait() and dma_fence_wait_any_timeout().
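
A minimal sketch of interpreting the return value; the 100 ms timeout and the -ETIMEDOUT translation are hypothetical policy choices:

signed long ret;

ret = dma_fence_wait_timeout(fence, true, msecs_to_jiffies(100));
if (ret < 0)
    return ret;             /* interrupted, typically -ERESTARTSYS */
if (ret == 0)
    return -ETIMEDOUT;      /* the fence did not signal in time */
/* otherwise signaled, "ret" jiffies of the timeout were left over */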

voiddma_fence_release(structkref*kref)

default release function for fences

Parameters

structkref*kref

dma_fence.refcount

Description

This is the default release function for dma_fence. Drivers shouldn’t call this directly, but instead call dma_fence_put().

voiddma_fence_free(structdma_fence*fence)

default release function fordma_fence.

Parameters

structdma_fence*fence

fence to release

Description

This is the default implementation fordma_fence_ops.release. It callskfree_rcu() onfence.

voiddma_fence_enable_sw_signaling(structdma_fence*fence)

enable signaling on fence

Parameters

structdma_fence*fence

the fence to enable

Description

This will request for sw signaling to be enabled, to make the fence complete as soon as possible. This calls dma_fence_ops.enable_signaling internally.

intdma_fence_add_callback(structdma_fence*fence,structdma_fence_cb*cb,dma_fence_func_tfunc)

add a callback to be called when the fence is signaled

Parameters

structdma_fence*fence

the fence to wait on

structdma_fence_cb*cb

the callback to register

dma_fence_func_tfunc

the function to call

Description

Add a software callback to the fence. The caller should keep a reference tothe fence.

cb will be initialized by dma_fence_add_callback(), no initialization by the caller is required. Any number of callbacks can be registered to a fence, but a callback can only be registered to one fence at a time.

If fence is already signaled, this function will return -ENOENT (andnot call the callback).

Note that the callback can be called from an atomic context or irq context.

Returns 0 in case of success, -ENOENT if the fence is already signaledand -EINVAL in case of error.
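
As a sketch, a driver would usually embed the dma_fence_cb in its own tracking structure; "struct my_waiter" and its completion payload are hypothetical:

struct my_waiter {
    struct dma_fence_cb cb;
    struct completion done;
};

static void my_waiter_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
    struct my_waiter *w = container_of(cb, struct my_waiter, cb);

    /* may be called from atomic/irq context, keep it short */
    complete(&w->done);
}

/* ... */
init_completion(&w->done);
if (dma_fence_add_callback(fence, &w->cb, my_waiter_cb) == -ENOENT)
    complete(&w->done);     /* fence already signaled, callback not installed */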

intdma_fence_get_status(structdma_fence*fence)

returns the status upon completion

Parameters

structdma_fence*fence

the dma_fence to query

Description

This wraps dma_fence_get_status_locked() to return the error status condition on a signaled fence. See dma_fence_get_status_locked() for more details.

Returns 0 if the fence has not yet been signaled, 1 if the fence hasbeen signaled without an error condition, or a negative error codeif the fence has been completed in err.

booldma_fence_remove_callback(structdma_fence*fence,structdma_fence_cb*cb)

remove a callback from the signaling list

Parameters

structdma_fence*fence

the fence to wait on

structdma_fence_cb*cb

the callback to remove

Description

Remove a previously queued callback from the fence. This function returns true if the callback is successfully removed, or false if the fence has already been signaled.

WARNING: Cancelling a callback should only be done if you really know what you’re doing, since deadlocks and race conditions could occur all too easily. For this reason, it should only ever be done on hardware lockup recovery, with a reference held to the fence.

Behaviour is undefined if cb has not been added to fence using dma_fence_add_callback() beforehand.

signedlongdma_fence_default_wait(structdma_fence*fence,boolintr,signedlongtimeout)

default sleep until the fence gets signaled or until timeout elapses

Parameters

structdma_fence*fence

the fence to wait on

boolintr

if true, do an interruptible wait

signedlongtimeout

timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT

Description

Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or theremaining timeout in jiffies on success. If timeout is zero the value one isreturned if the fence is already signaled for consistency with otherfunctions taking a jiffies timeout.

signedlongdma_fence_wait_any_timeout(structdma_fence**fences,uint32_tcount,boolintr,signedlongtimeout,uint32_t*idx)

sleep until any fence gets signaled or until timeout elapses

Parameters

structdma_fence**fences

array of fences to wait on

uint32_tcount

number of fences to wait on

boolintr

if true, do an interruptible wait

signedlongtimeout

timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT

uint32_t*idx

used to store the first signaled fence index, meaningful only onpositive return

Description

Returns -EINVAL on custom fence wait implementation, -ERESTARTSYS ifinterrupted, 0 if the wait timed out, or the remaining timeout in jiffieson success.

Synchronous waits for the first fence in the array to be signaled. Thecaller needs to hold a reference to all fences in the array, otherwise afence might be freed before return, resulting in undefined behavior.

See alsodma_fence_wait() anddma_fence_wait_timeout().

voiddma_fence_set_deadline(structdma_fence*fence,ktime_tdeadline)

set desired fence-wait deadline hint

Parameters

structdma_fence*fence

the fence that is to be waited on

ktime_tdeadline

the time by which the waiter hopes for the fence to besignaled

Description

Give the fence signaler a hint about an upcoming deadline, such as vblank, by which point the waiter would prefer the fence to be signaled. This is intended to give feedback to the fence signaler to aid in power management decisions, such as boosting GPU frequency if a periodic vblank deadline is approaching but the fence is not yet signaled.

voiddma_fence_describe(structdma_fence*fence,structseq_file*seq)

Dump fence description into seq_file

Parameters

structdma_fence*fence

the fence to describe

structseq_file*seq

the seq_file to put the textual description into

Description

Dump a textual description of the fence and its state into the seq_file.

voiddma_fence_init(structdma_fence*fence,conststructdma_fence_ops*ops,spinlock_t*lock,u64context,u64seqno)

Initialize a custom fence.

Parameters

structdma_fence*fence

the fence to initialize

conststructdma_fence_ops*ops

the dma_fence_ops for operations on this fence

spinlock_t*lock

the irqsafe spinlock to use for locking this fence

u64context

the execution context this fence is run on

u64seqno

a linear increasing sequence number for this context

Description

Initializes an allocated fence, the caller doesn’t have to keep its refcount after committing with this fence, but it will need to hold a refcount again if dma_fence_ops.enable_signaling gets called.

context and seqno are used for easy comparison between fences, allowing to check which fence is later by simply using dma_fence_later().
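
A minimal sketch of a driver-private fence implementation; the "my_fence"/"timeline" names are hypothetical, and only the two mandatory dma_fence_ops callbacks are provided:

static const char *my_fence_get_driver_name(struct dma_fence *fence)
{
    return "my-driver";
}

static const char *my_fence_get_timeline_name(struct dma_fence *fence)
{
    return "my-timeline";
}

static const struct dma_fence_ops my_fence_ops = {
    .get_driver_name = my_fence_get_driver_name,
    .get_timeline_name = my_fence_get_timeline_name,
};

/* once per timeline */
spin_lock_init(&timeline->lock);
timeline->context = dma_fence_context_alloc(1);

/* per fence, with a seqno increasing along the timeline */
dma_fence_init(&f->base, &my_fence_ops, &timeline->lock,
               timeline->context, ++timeline->seqno);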

voiddma_fence_init64(structdma_fence*fence,conststructdma_fence_ops*ops,spinlock_t*lock,u64context,u64seqno)

Initialize a custom fence with 64-bit seqno support.

Parameters

structdma_fence*fence

the fence to initialize

conststructdma_fence_ops*ops

the dma_fence_ops for operations on this fence

spinlock_t*lock

the irqsafe spinlock to use for locking this fence

u64context

the execution context this fence is run on

u64seqno

a linear increasing sequence number for this context

Description

Initializes an allocated fence, the caller doesn’t have to keep itsrefcount after committing with this fence, but it will need to hold arefcount again ifdma_fence_ops.enable_signaling gets called.

Context and seqno are used for easy comparison between fences, allowingto check which fence is later by simply usingdma_fence_later().

constchar__rcu*dma_fence_driver_name(structdma_fence*fence)

Access the driver name

Parameters

structdma_fence*fence

the fence to query

Description

Returns a driver name backing the dma-fence implementation.

IMPORTANT CONSIDERATION: The dma-fence contract stipulates that access to driver provided data (data not directly embedded into the object itself), such as the dma_fence.lock and memory potentially accessed by the dma_fence.ops functions, is forbidden after the fence has been signalled. Drivers are allowed to free that data, and some do.

To allow safe access drivers are mandated to guarantee an RCU grace period between signalling the fence and freeing said data.

As such access to the driver name is only valid inside an RCU locked section. The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded by the rcu_read_lock and rcu_read_unlock pair.

constchar__rcu*dma_fence_timeline_name(structdma_fence*fence)

Access the timeline name

Parameters

structdma_fence*fence

the fence to query

Description

Returns a timeline name provided by the dma-fence implementation.

IMPORTANT CONSIDERATION: The dma-fence contract stipulates that access to driver provided data (data not directly embedded into the object itself), such as the dma_fence.lock and memory potentially accessed by the dma_fence.ops functions, is forbidden after the fence has been signalled. Drivers are allowed to free that data, and some do.

To allow safe access drivers are mandated to guarantee an RCU grace period between signalling the fence and freeing said data.

As such access to the timeline name is only valid inside an RCU locked section. The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded by the rcu_read_lock and rcu_read_unlock pair.

structdma_fence

software synchronization primitive

Definition:

struct dma_fence {
    spinlock_t *lock;
    const struct dma_fence_ops *ops;
    union {
        struct list_head cb_list;
        ktime_t timestamp;
        struct rcu_head rcu;
    };
    u64 context;
    u64 seqno;
    unsigned long flags;
    struct kref refcount;
    int error;
};

Members

lock

spin_lock_irqsave used for locking

ops

dma_fence_ops associated with this fence

{unnamed_union}

anonymous

cb_list

list of all callbacks to call

timestamp

Timestamp when the fence was signaled.

rcu

used for releasing fence with kfree_rcu

context

execution context this fence belongs to, returned bydma_fence_context_alloc()

seqno

the sequence number of this fence inside the execution context,can be compared to decide which fence would be signaled later.

flags

A mask of DMA_FENCE_FLAG_* defined below

refcount

refcount for this fence

error

Optional, only valid if < 0, must be set before callingdma_fence_signal, indicates that the fence has completed with an error.

Description

the flags member must be manipulated and read using the appropriateatomic ops (bit_*), so taking the spinlock will not be needed mostof the time.

DMA_FENCE_FLAG_INITIALIZED_BIT - fence was initialized
DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
DMA_FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the implementer of the fence for its own purposes. Can be used in different ways by different fence implementers, so do not rely on this.

Since atomic bitops are used, this is not guaranteed to be the case. Particularly, if the bit was set, but dma_fence_signal was called right before this bit was set, it would have been able to set the DMA_FENCE_FLAG_SIGNALED_BIT, before enable_signaling was called. Adding a check for DMA_FENCE_FLAG_SIGNALED_BIT after setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT closes this race, and makes sure that after dma_fence_signal was called, any enable_signaling call will have either been completed, or never called at all.

structdma_fence_cb

callback fordma_fence_add_callback()

Definition:

struct dma_fence_cb {
    struct list_head node;
    dma_fence_func_t func;
};

Members

node

used bydma_fence_add_callback() to append thisstructto fence::cb_list

func

dma_fence_func_t to call

Description

Thisstructwill be initialized bydma_fence_add_callback(), additionaldata can be passed along by embedding dma_fence_cb in another struct.

structdma_fence_ops

operations implemented for fence

Definition:

struct dma_fence_ops {
    const char * (*get_driver_name)(struct dma_fence *fence);
    const char * (*get_timeline_name)(struct dma_fence *fence);
    bool (*enable_signaling)(struct dma_fence *fence);
    bool (*signaled)(struct dma_fence *fence);
    signed long (*wait)(struct dma_fence *fence, bool intr, signed long timeout);
    void (*release)(struct dma_fence *fence);
    void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
};

Members

get_driver_name

Returns the driver name. This is a callback to allow drivers to compute the name at runtime, without having to store it permanently for each fence, or build a cache of some sort.

This callback is mandatory.

get_timeline_name

Return the name of the context this fence belongs to. This is a callback to allow drivers to compute the name at runtime, without having to store it permanently for each fence, or build a cache of some sort.

This callback is mandatory.

enable_signaling

Enable software signaling of fence.

For fence implementations that have the capability for hw->hw signaling, they can implement this op to enable the necessary interrupts, or insert commands into cmdstream, etc, to avoid these costly operations for the common case where only hw->hw synchronization is required. This is called in the first dma_fence_wait() or dma_fence_add_callback() path to let the fence implementation know that there is another driver waiting on the signal (i.e. the hw->sw case).

This is called with IRQs disabled, so only spinlocks which disable IRQs can be used in the code outside of this callback.

A return value of false indicates the fence already passed,or some failure occurred that made it impossible to enablesignaling. True indicates successful enabling.

dma_fence.error may be set in enable_signaling, but only when falseis returned.

Since many implementations can call dma_fence_signal() even before enable_signaling has been called, there’s a race window where the dma_fence_signal() might result in the final fence reference being released and its memory freed. To avoid this, implementations of this callback should grab their own reference using dma_fence_get(), to be released when the fence is signalled (through e.g. the interrupt handler).

This callback is optional. If this callback is not present, then thedriver must always have signaling enabled.

signaled

Peek whether the fence is signaled, as a fastpath optimization for e.g. dma_fence_wait() or dma_fence_add_callback(). Note that this callback does not need to make any guarantees beyond that a fence once indicated as signaled must always return true from this callback. This callback may return false even if the fence has completed already, in this case information hasn’t propagated through the system yet. See also dma_fence_is_signaled().

May setdma_fence.error if returning true.

This callback is optional.

wait

Custom wait implementation, defaults todma_fence_default_wait() ifnot set.

Deprecated and should not be used by new implementations. Only usedby existing implementations which need special handling for theirhardware reset procedure.

Must return -ERESTARTSYS if the wait is intr = true and the wait was interrupted, the remaining jiffies if the fence has signaled, or 0 if the wait timed out. Can also return other error values on custom implementations, which should be treated as if the fence is signaled. For example a hardware lockup could be reported like that.

release

Called on destruction of fence to release additional resources.Can be called from irq context. This callback is optional. If it isNULL, thendma_fence_free() is instead called as the defaultimplementation.

set_deadline

Callback to allow a fence waiter to inform the fence signaler ofan upcoming deadline, such as vblank, by which point the waiterwould prefer the fence to be signaled by. This is intended togive feedback to the fence signaler to aid in power managementdecisions, such as boosting GPU frequency.

This is called withoutdma_fence.lock held, it can be calledmultiple times and from any context. Locking is up to the calleeif it has some state to manage. If multiple deadlines are set,the expectation is to track the soonest one. If the deadline isbefore the current time, it should be interpreted as an immediatedeadline.

This callback is optional.

booldma_fence_was_initialized(structdma_fence*fence)

test if fence was initialized

Parameters

structdma_fence*fence

fence to test

Return

True if fence was ever initialized, false otherwise. Works correctlyonly when memory backing the fence structure is zero initialized onallocation.

voiddma_fence_put(structdma_fence*fence)

decreases refcount of the fence

Parameters

structdma_fence*fence

fence to reduce refcount of

structdma_fence*dma_fence_get(structdma_fence*fence)

increases refcount of the fence

Parameters

structdma_fence*fence

fence to increase refcount of

Description

Returns the same fence, with refcount increased by 1.

structdma_fence*dma_fence_get_rcu(structdma_fence*fence)

get a fence from a dma_resv_list with rcu read lock

Parameters

structdma_fence*fence

fence to increase refcount of

Description

Function returns NULL if no refcount could be obtained, or the fence.

structdma_fence*dma_fence_get_rcu_safe(structdma_fence__rcu**fencep)

acquire a reference to an RCU tracked fence

Parameters

structdma_fence__rcu**fencep

pointer to fence to increase refcount of

Description

Function returns NULL if no refcount could be obtained, or the fence.This function handles acquiring a reference to a fence that may bereallocated within the RCU grace period (such as with SLAB_TYPESAFE_BY_RCU),so long as the caller is using RCU on the pointer to the fence.

An alternative mechanism is to employ a seqlock to protect a bunch offences, such as used bystructdma_resv. When using a seqlock,the seqlock must be taken before and checked after a reference to thefence is acquired (as shown here).

The caller is required to hold the RCU read lock.

booldma_fence_is_signaled_locked(structdma_fence*fence)

Return an indication if the fence is signaled yet.

Parameters

structdma_fence*fence

the fence to check

Description

Returns true if the fence was already signaled, false if not. Since thisfunction doesn’t enable signaling, it is not guaranteed to ever returntrue ifdma_fence_add_callback(),dma_fence_wait() ordma_fence_enable_sw_signaling() haven’t been called before.

This function requiresdma_fence.lock to be held.

See alsodma_fence_is_signaled().

booldma_fence_is_signaled(structdma_fence*fence)

Return an indication if the fence is signaled yet.

Parameters

structdma_fence*fence

the fence to check

Description

Returns true if the fence was already signaled, false if not. Since thisfunction doesn’t enable signaling, it is not guaranteed to ever returntrue ifdma_fence_add_callback(),dma_fence_wait() ordma_fence_enable_sw_signaling() haven’t been called before.

It’s recommended for seqno fences to call dma_fence_signal when the operation is complete, it makes it possible to prevent issues from wraparound between time of issue and time of use by checking the return value of this function before calling hardware-specific wait instructions.

See alsodma_fence_is_signaled_locked().

bool__dma_fence_is_later(structdma_fence*fence,u64f1,u64f2)

return if f1 is chronologically later than f2

Parameters

structdma_fence*fence

fence in whose context to do the comparison

u64f1

the first fence’s seqno

u64f2

the second fence’s seqno from the same context

Description

Returns true if f1 is chronologically later than f2. Both fences must befrom the same context, since a seqno is not common across contexts.

booldma_fence_is_later(structdma_fence*f1,structdma_fence*f2)

return if f1 is chronologically later than f2

Parameters

structdma_fence*f1

the first fence from the same context

structdma_fence*f2

the second fence from the same context

Description

Returns true if f1 is chronologically later than f2. Both fences must befrom the same context, since a seqno is not re-used across contexts.

booldma_fence_is_later_or_same(structdma_fence*f1,structdma_fence*f2)

return true if f1 is later or same as f2

Parameters

structdma_fence*f1

the first fence from the same context

structdma_fence*f2

the second fence from the same context

Description

Returns true if f1 is chronologically later than f2 or the same fence. Bothfences must be from the same context, since a seqno is not re-used acrosscontexts.

structdma_fence*dma_fence_later(structdma_fence*f1,structdma_fence*f2)

return the chronologically later fence

Parameters

structdma_fence*f1

the first fence from the same context

structdma_fence*f2

the second fence from the same context

Description

Returns NULL if both fences are signaled, otherwise the fence that would besignaled last. Both fences must be from the same context, since a seqno isnot re-used across contexts.

intdma_fence_get_status_locked(structdma_fence*fence)

returns the status upon completion

Parameters

structdma_fence*fence

the dma_fence to query

Description

Drivers can supply an optional error status condition before they signalthe fence (to indicate whether the fence was completed due to an errorrather than success). The value of the status condition is only validif the fence has been signaled,dma_fence_get_status_locked() first checksthe signal state before reporting the error status.

Returns 0 if the fence has not yet been signaled, 1 if the fence hasbeen signaled without an error condition, or a negative error codeif the fence has been completed in err.

voiddma_fence_set_error(structdma_fence*fence,interror)

flag an error condition on the fence

Parameters

structdma_fence*fence

the dma_fence

interror

the error to store

Description

Drivers can supply an optional error status condition before they signal the fence, to indicate that the fence was completed due to an error rather than success. This must be set before signaling (so that the value is visible before any waiters on the signal callback are woken). This helper exists to help catch erroneous setting of #dma_fence.error.

Examples of error codes which drivers should use:

  • -ENODATA This operation produced no data, no other operation affected.

  • -ECANCELED All operations from the same context have been canceled.

  • -ETIME Operation caused a timeout and potentially device reset.
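
A minimal sketch of the ordering this implies, e.g. from hypothetical gpu hang recovery code: the error is flagged first, then the fence is completed:

/* mark the fence as failed before completing it */
dma_fence_set_error(fence, -ETIME);
dma_fence_signal(fence);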

ktime_tdma_fence_timestamp(structdma_fence*fence)

helper to get the completion timestamp of a fence

Parameters

structdma_fence*fence

fence to get the timestamp from.

Description

After a fence is signaled the timestamp is updated with the signaling time,but setting the timestamp can race with tasks waiting for the signaling. Thishelper busy waits for the correct timestamp to appear.

signedlongdma_fence_wait(structdma_fence*fence,boolintr)

sleep until the fence gets signaled

Parameters

structdma_fence*fence

the fence to wait on

boolintr

if true, do an interruptible wait

Description

This function will return -ERESTARTSYS if interrupted by a signal,or 0 if the fence was signaled. Other error values may bereturned on custom implementations.

Performs a synchronous wait on this fence. It is assumed the callerdirectly or indirectly holds a reference to the fence, otherwise thefence might be freed before return, resulting in undefined behavior.

See alsodma_fence_wait_timeout() anddma_fence_wait_any_timeout().

booldma_fence_is_array(structdma_fence*fence)

check if a fence is from the array subclass

Parameters

structdma_fence*fence

the fence to test

Description

Return true if it is a dma_fence_array and false otherwise.

booldma_fence_is_chain(structdma_fence*fence)

check if a fence is from the chain subclass

Parameters

structdma_fence*fence

the fence to test

Description

Return true if it is a dma_fence_chain and false otherwise.

booldma_fence_is_container(structdma_fence*fence)

check if a fence is a container for other fences

Parameters

structdma_fence*fence

the fence to test

Description

Return true if this fence is a container for other fences, false otherwise. This is important since we can’t build up large fence structures, otherwise we run into recursion during operation on those fences.

DMA Fence Array

structdma_fence_array*dma_fence_array_alloc(intnum_fences)

Allocate a custom fence array

Parameters

intnum_fences

[in] number of fences to add in the array

Description

Return dma fence array on success, NULL on failure

voiddma_fence_array_init(structdma_fence_array*array,intnum_fences,structdma_fence**fences,u64context,unsignedseqno,boolsignal_on_any)

Init a custom fence array

Parameters

structdma_fence_array*array

[in] dma fence array to arm

intnum_fences

[in] number of fences to add in the array

structdma_fence**fences

[in] array containing the fences

u64context

[in] fence context to use

unsignedseqno

[in] sequence number to use

boolsignal_on_any

[in] signal on any fence in the array

Description

Implementation ofdma_fence_array_create without allocation. Useful to inita preallocated dma fence array in the path of reclaim or dma fence signaling.

structdma_fence_array*dma_fence_array_create(intnum_fences,structdma_fence**fences,u64context,unsignedseqno,boolsignal_on_any)

Create a custom fence array

Parameters

intnum_fences

[in] number of fences to add in the array

structdma_fence**fences

[in] array containing the fences

u64context

[in] fence context to use

unsignedseqno

[in] sequence number to use

boolsignal_on_any

[in] signal on any fence in the array

Description

Allocate a dma_fence_array object and initialize the base fence with dma_fence_init(). In case of error it returns NULL.

The caller should allocate the fences array with num_fences size and fill it with the fences it wants to add to the object. Ownership of this array is taken and dma_fence_put() is used on each fence on release.

If signal_on_any is true the fence array signals if any fence in the array signals, otherwise it signals when all fences in the array signal.
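
A minimal sketch of combining two fences "a" and "b" (hypothetical) into an array which signals once both have signaled:

struct dma_fence **fences;
struct dma_fence_array *array;

fences = kmalloc_array(2, sizeof(*fences), GFP_KERNEL);
if (!fences)
    return ERR_PTR(-ENOMEM);

fences[0] = dma_fence_get(a);
fences[1] = dma_fence_get(b);

/* the array takes ownership of "fences" and of the references in it */
array = dma_fence_array_create(2, fences, dma_fence_context_alloc(1),
                               1, false);
if (!array) {
    dma_fence_put(fences[0]);
    dma_fence_put(fences[1]);
    kfree(fences);
    return ERR_PTR(-ENOMEM);
}

return &array->base;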

booldma_fence_match_context(structdma_fence*fence,u64context)

Check if all fences are from the given context

Parameters

structdma_fence*fence

[in] fence or fence array

u64context

[in] fence context to check all fences against

Description

Checks the provided fence or, for a fence array, all fences in the arrayagainst the given context. Returns false if any fence is from a differentcontext.

structdma_fence_array_cb

callback helper for fence array

Definition:

struct dma_fence_array_cb {
    struct dma_fence_cb cb;
    struct dma_fence_array *array;
};

Members

cb

fence callback structure for signaling

array

reference to the parent fence array object

structdma_fence_array

fence to represent an array of fences

Definition:

struct dma_fence_array {
    struct dma_fence base;
    spinlock_t lock;
    unsigned num_fences;
    atomic_t num_pending;
    struct dma_fence **fences;
    struct irq_work work;
    struct dma_fence_array_cb callbacks[];
};

Members

base

fence base class

lock

spinlock for fence handling

num_fences

number of fences in the array

num_pending

fences in the array still pending

fences

array of the fences

work

internal irq_work function

callbacks

array of callback helpers

structdma_fence_array*to_dma_fence_array(structdma_fence*fence)

cast a fence to a dma_fence_array

Parameters

structdma_fence*fence

fence to cast to a dma_fence_array

Description

Returns NULL if the fence is not a dma_fence_array,or the dma_fence_array otherwise.

dma_fence_array_for_each

dma_fence_array_for_each(fence,index,head)

iterate over all fences in array

Parameters

fence

current fence

index

index into the array

head

potential dma_fence_array object

Description

Test if head is a dma_fence_array object and if yes iterate over all fences in the array. If not, just iterate over the fence in head itself.

For a deep dive iterator seedma_fence_unwrap_for_each().

DMA Fence Chain

structdma_fence*dma_fence_chain_walk(structdma_fence*fence)

chain walking function

Parameters

structdma_fence*fence

current chain node

Description

Walk the chain to the next node. Returns the next fence or NULL if we are atthe end of the chain. Garbage collects chain nodes which are alreadysignaled.

intdma_fence_chain_find_seqno(structdma_fence**pfence,uint64_tseqno)

find fence chain node by seqno

Parameters

structdma_fence**pfence

pointer to the chain node where to start

uint64_tseqno

the sequence number to search for

Description

Advance the fence pointer to the chain node which will signal this sequencenumber. If no sequence number is provided then this is a no-op.

Returns -EINVAL if the fence is not a chain node or the sequence number has not yet advanced far enough.

voiddma_fence_chain_init(structdma_fence_chain*chain,structdma_fence*prev,structdma_fence*fence,uint64_tseqno)

initialize a fence chain

Parameters

structdma_fence_chain*chain

the chain node to initialize

structdma_fence*prev

the previous fence

structdma_fence*fence

the current fence

uint64_tseqno

the sequence number to use for the fence chain

Description

Initialize a new chain node and either start a new chain or add the node to the existing chain of the previous fence.
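
A minimal sketch of appending a point to a timeline, assuming "prev" is the current head fence, "fence" the new payload and "seqno" the timeline point; ownership of the passed-in references transfers to the chain node:

struct dma_fence_chain *chain;

chain = dma_fence_chain_alloc();
if (!chain)
    return -ENOMEM;

dma_fence_chain_init(chain, dma_fence_get(prev), dma_fence_get(fence), seqno);

/* &chain->base is the new head fence of the timeline */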

structdma_fence_chain

fence to represent a node of a fence chain

Definition:

struct dma_fence_chain {
    struct dma_fence base;
    struct dma_fence *prev;
    u64 prev_seqno;
    struct dma_fence *fence;
    union {
        struct dma_fence_cb cb;
        struct irq_work work;
    };
    spinlock_t lock;
};

Members

base

fence base class

prev

previous fence of the chain

prev_seqno

original previous seqno before garbage collection

fence

encapsulated fence

{unnamed_union}

anonymous

cb

callback for signaling

This is used to add the callback for signaling the completion of the fence chain. Never used at the same time as the irq work.

work

irq work item for signaling

Irq work structure to allow us to add the callback withoutrunning into lock inversion. Never used at the same time asthe callback.

lock

spinlock for fence handling

structdma_fence_chain*to_dma_fence_chain(structdma_fence*fence)

cast a fence to a dma_fence_chain

Parameters

structdma_fence*fence

fence to cast to a dma_fence_chain

Description

Returns NULL if the fence is not a dma_fence_chain,or the dma_fence_chain otherwise.

structdma_fence*dma_fence_chain_contained(structdma_fence*fence)

return the contained fence

Parameters

structdma_fence*fence

the fence to test

Description

If the fence is a dma_fence_chain the function returns the fence containedinside the chain object, otherwise it returns the fence itself.

dma_fence_chain_alloc

dma_fence_chain_alloc()

Description

Returns a newstructdma_fence_chain object or NULL on failure.

This specialized allocator has to be a macro for its allocations to beaccounted separately (to have a separate alloc_tag). The typecast isintentional to enforce typesafety.

voiddma_fence_chain_free(structdma_fence_chain*chain)

Parameters

structdma_fence_chain*chain

chain node to free

Description

Frees up an allocated but not used struct dma_fence_chain object. This doesn’t need an RCU grace period since the fence was never initialized nor published. After dma_fence_chain_init() has been called the fence must be released by calling dma_fence_put(), and not through this function.

dma_fence_chain_for_each

dma_fence_chain_for_each(iter,head)

iterate over all fences in chain

Parameters

iter

current fence

head

starting point

Description

Iterate over all fences in the chain. We keep a reference to the currentfence while inside the loop which must be dropped when breaking out.

For a deep dive iterator seedma_fence_unwrap_for_each().

DMA Fence unwrap

structdma_fence_unwrap

cursor into the container structure

Definition:

struct dma_fence_unwrap {
    struct dma_fence *chain;
    struct dma_fence *array;
    unsigned int index;
};

Members

chain

potential dma_fence_chain, but can be other fence as well

array

potential dma_fence_array, but can be other fence as well

index

last returned index ifarray is really a dma_fence_array

Description

Should be used withdma_fence_unwrap_for_each() iterator macro.

dma_fence_unwrap_for_each

dma_fence_unwrap_for_each(fence,cursor,head)

iterate over all fences in containers

Parameters

fence

current fence

cursor

current position inside the containers

head

starting point for the iterator

Description

Unwrap dma_fence_chain and dma_fence_array containers and deep dive into all potential fences in them. If head is just a normal fence only that one is returned.
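
A minimal sketch, checking whether every leaf fence behind a possibly-nested container "fence" (hypothetical) has signaled:

struct dma_fence_unwrap cursor;
struct dma_fence *f;

dma_fence_unwrap_for_each(f, &cursor, fence) {
    /* f is each leaf fence; containers themselves are never returned */
    if (!dma_fence_is_signaled(f))
        return false;
}
return true;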

dma_fence_unwrap_merge

dma_fence_unwrap_merge(...)

unwrap and merge fences

Parameters

...

variable arguments

Description

All fences given as parameters are unwrapped and merged back together as flatdma_fence_array. Useful if multiple containers need to be merged together.

Implemented as a macro to allocate the necessary arrays on the stack andaccount the stack frame size to the caller.

Returns NULL on memory allocation failure, a dma_fence object representingall the given fences otherwise.

DMA Fence Sync File

structsync_file*sync_file_create(structdma_fence*fence)

creates a sync file

Parameters

structdma_fence*fence

fence to add to the sync_fence

Description

Creates a sync_file containing fence. This function acquires an additional reference to fence for the newly-created sync_file, if it succeeds. The sync_file can be released with fput(sync_file->file). Returns the sync_file or NULL in case of error.
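
A minimal sketch of exporting a fence to userspace as an explicit sync point, as a driver ioctl path might do; note the O_CLOEXEC flag, matching the fd leak advice in the userspace interface notes:

struct sync_file *sync_file;
int fd;

fd = get_unused_fd_flags(O_CLOEXEC);
if (fd < 0)
    return fd;

sync_file = sync_file_create(fence);
if (!sync_file) {
    put_unused_fd(fd);
    return -ENOMEM;
}

fd_install(fd, sync_file->file);
return fd;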

structdma_fence*sync_file_get_fence(intfd)

get the fence related to the sync_file fd

Parameters

intfd

sync_file fd to get the fence from

Description

Ensures fd references a valid sync_file and returns a fence that represents all fences in the sync_file. On error NULL is returned.

structsync_file

sync file to export to the userspace

Definition:

struct sync_file {
    struct file *file;
    char user_name[32];
#ifdef CONFIG_DEBUG_FS
    struct list_head sync_file_list;
#endif
    wait_queue_head_t wq;
    unsigned long flags;
    struct dma_fence *fence;
    struct dma_fence_cb cb;
};

Members

file

file representing this fence

user_name

Name of the sync file provided by userspace, for merged fences.Otherwise generated through driver callbacks (in which case theentire array is 0).

sync_file_list

membership in global file list

wq

wait queue for fence signaling

flags

flags for the sync_file

fence

fence with the fences in the sync_file

cb

fence callback information

Description

flags: POLL_ENABLED: whether userspace is currently poll()’ing or not

DMA Fence Sync File uABI

structsync_merge_data

SYNC_IOC_MERGE: merge two fences

Definition:

struct sync_merge_data {
    char name[32];
    __s32 fd2;
    __s32 fence;
    __u32 flags;
    __u32 pad;
};

Members

name

name of new fence

fd2

file descriptor of second fence

fence

returns the fd of the new fence to userspace

flags

merge_data flags

pad

padding for 64-bit alignment, should always be zero

Description

Creates a new fence containing copies of the sync_pts in both the calling fd and sync_merge_data.fd2. Returns the new fence’s fd in sync_merge_data.fence

structsync_fence_info

detailed fence information

Definition:

struct sync_fence_info {
    char obj_name[32];
    char driver_name[32];
    __s32 status;
    __u32 flags;
    __u64 timestamp_ns;
};

Members

obj_name

name of parent sync_timeline

driver_name

name of driver implementing the parent

status

status of the fence 0:active 1:signaled <0:error

flags

fence_info flags

timestamp_ns

timestamp of status change in nanoseconds

structsync_file_info

SYNC_IOC_FILE_INFO: get detailed information on a sync_file

Definition:

struct sync_file_info {
    char name[32];
    __s32 status;
    __u32 flags;
    __u32 num_fences;
    __u32 pad;
    __u64 sync_fence_info;
};

Members

name

name of fence

status

status of fence. 1: signaled 0:active <0:error

flags

sync_file_info flags

num_fences

number of fences in the sync_file

pad

padding for 64-bit alignment, should always be zero

sync_fence_info

pointer to array of structsync_fence_info with allfences in the sync_file

Description

Takes a struct sync_file_info. If num_fences is 0, the field is updated with the actual number of fences. If num_fences is > 0, the system will use the pointer provided on sync_fence_info to return up to num_fences of struct sync_fence_info, with detailed fence information.

struct sync_set_deadline

SYNC_IOC_SET_DEADLINE - set a deadline hint on a fence

Definition:

struct sync_set_deadline {
    __u64 deadline_ns;
    __u64 pad;
};

Members

deadline_ns

absolute time of the deadline

pad

must be zero

Description

Allows userspace to set a deadline on a fence, see dma_fence_set_deadline()

The timebase for the deadline is CLOCK_MONOTONIC (same as vblank). For example:

clock_gettime(CLOCK_MONOTONIC, &t);
deadline_ns = (t.tv_sec * 1000000000L) + t.tv_nsec + ns_until_deadline;
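
Putting this together, a hedged userspace sketch that asks for a fence to be signaled roughly 16 ms from now; the fd parameter, function name and the 16 ms figure are assumptions:

#include <string.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

int set_deadline_in_16ms(int fd)
{
        struct sync_set_deadline data;
        struct timespec t;

        clock_gettime(CLOCK_MONOTONIC, &t);

        memset(&data, 0, sizeof(data));   /* pad must be zero */
        data.deadline_ns = (t.tv_sec * 1000000000ULL) + t.tv_nsec + 16000000ULL;

        return ioctl(fd, SYNC_IOC_SET_DEADLINE, &data);
}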

Indefinite DMA Fences

At various times struct dma_fence with an indefinite time until dma_fence_wait() finishes have been proposed. Examples include:

  • Future fences, used in HWC1 to signal when a buffer isn't used by the display any longer, and created with the screen update that makes the buffer visible. The time this fence completes is entirely under userspace's control.

  • Proxy fences, proposed to handle &drm_syncobj for which the fence has not yet been set. Used to asynchronously delay command submission.

  • Userspace fences or gpu futexes, fine-grained locking within a command buffer that userspace uses for synchronization across engines or with the CPU, which are then imported as a DMA fence for integration into existing winsys protocols.

  • Long-running compute command buffers, while still using traditional end of batch DMA fences for memory management instead of context preemption DMA fences which get reattached when the compute job is rescheduled.

Common to all these schemes is that userspace controls the dependencies of these fences and controls when they fire. Mixing indefinite fences with normal in-kernel DMA fences does not work, even when a fallback timeout is included to protect against malicious userspace:

  • Only the kernel knows about all DMA fence dependencies, userspace is not aware of dependencies injected due to memory management or scheduler decisions.

  • Only userspace knows about all dependencies in indefinite fences and when exactly they will complete, the kernel has no visibility.

Furthermore the kernel has to be able to hold up userspace command submission for memory management needs, which means we must support indefinite fences being dependent upon DMA fences. If the kernel also supports indefinite fences in the kernel like a DMA fence, as any of the above proposals would, there is the potential for deadlocks.

digraph "Fencing Cycle" {   node [shape=box bgcolor=grey style=filled]   kernel [label="Kernel DMA Fences"]   userspace [label="userspace controlled fences"]   kernel -> userspace [label="memory management"]   userspace -> kernel [label="Future fence, fence proxy, ..."]   { rank=same; kernel userspace }}

Indefinite Fencing Dependency Cycle

This means that the kernel might accidentally create deadlocks through memory management dependencies which userspace is unaware of, which randomly hangs workloads until the timeout kicks in: workloads which, from userspace's perspective, do not contain a deadlock. In such a mixed fencing architecture there is no single entity with knowledge of all dependencies. Therefore preventing such deadlocks from within the kernel is not possible.

The only solution to avoid dependency loops is by not allowing indefinite fences in the kernel. This means:

  • No future fences, proxy fences or userspace fences imported as DMA fences, with or without a timeout.

  • No DMA fences that signal end of batchbuffer for command submission where userspace is allowed to use userspace fencing or long running compute workloads. This also means no implicit fencing for shared buffers in these cases.

Recoverable Hardware Page Faults Implications

Modern hardware supports recoverable page faults, which has a lot of implications for DMA fences.

First, a pending page fault obviously holds up the work that's running on the accelerator and a memory allocation is usually required to resolve the fault. But memory allocations are not allowed to gate completion of DMA fences, which means any workload using recoverable page faults cannot use DMA fences for synchronization. Synchronization fences controlled by userspace must be used instead.

On GPUs this poses a problem, because current desktop compositor protocols on Linux rely on DMA fences, which means without an entirely new userspace stack built on top of userspace fences, they cannot benefit from recoverable page faults. Specifically this means implicit synchronization will not be possible. The exception is when page faults are only used as migration hints and never to on-demand fill a memory request. For now this means recoverable page faults on GPUs are limited to pure compute workloads.

Furthermore GPUs usually have shared resources between the 3D rendering and compute side, like compute units or command submission engines. If both a 3D job with a DMA fence and a compute workload using recoverable page faults are pending they could deadlock:

  • The 3D workload might need to wait for the compute job to finish and release hardware resources first.

  • The compute workload might be stuck in a page fault, because the memory allocation is waiting for the DMA fence of the 3D workload to complete.

There are a few options to prevent this problem, one of which drivers need to ensure:

  • Compute workloads can always be preempted, even when a page fault is pending and not yet repaired. Not all hardware supports this.

  • DMA fence workloads and workloads which need page fault handling have independent hardware resources to guarantee forward progress. This could be achieved e.g. through dedicated engines and minimal compute unit reservations for DMA fence workloads.

  • The reservation approach could be further refined by only reserving the hardware resources for DMA fence workloads when they are in-flight. This must cover the time from when the DMA fence is visible to other threads up to the moment when the fence is completed through dma_fence_signal().

  • As a last resort, if the hardware provides no useful reservation mechanics, all workloads must be flushed from the GPU when switching between jobs requiring DMA fences or jobs requiring page fault handling: This means all DMA fences must complete before a compute job with page fault handling can be inserted into the scheduler queue. And vice versa, before a DMA fence can be made visible anywhere in the system, all compute workloads must be preempted to guarantee all pending GPU page faults are flushed.

  • Only a fairly theoretical option would be to untangle these dependencies when allocating memory to repair hardware page faults, either through separate memory blocks or runtime tracking of the full dependency graph of all DMA fences. This results in a very wide impact on the kernel, since resolving the page fault on the CPU side can itself involve a page fault. It is much more feasible and robust to limit the impact of handling hardware page faults to the specific driver.

Note that workloads that run on independent hardware like copy engines or other GPUs do not have any impact. This allows us to keep using DMA fences internally in the kernel even for resolving hardware page faults, e.g. by using copy engines to clear or copy memory needed to resolve the page fault.

In some ways this page fault problem is a special case of the Indefinite DMA Fences discussion: indefinite fences from compute workloads are allowed to depend on DMA fences, but not the other way around. And not even the page fault problem is new, because some other CPU thread in userspace might hit a page fault which holds up a userspace fence - supporting page faults on GPUs doesn't add anything fundamentally new.