User space mappable dma buffer device driver for Linux.
ikwzm/udmabuf
u-dma-buf is a Linux device driver that allocates contiguous memory blocks in the kernel space as DMA buffers and makes them available from the user space. These memory blocks are intended to be used as DMA buffers when a user application implements a device driver in user space using UIO (User space I/O).
A DMA buffer allocated by u-dma-buf can be accessed from the user space in two ways. You can open the device file (e.g. `/dev/udmabuf0`) and map it into the user memory space, or you can use the `read()`/`write()` functions.
The CPU cache for the allocated DMA buffer can be disabled by setting the `O_SYNC` flag when opening the device file. It is also possible to flush or invalidate the CPU cache while keeping the CPU cache enabled.
The physical address of a DMA buffer allocated by u-dma-buf can be obtained by reading `/sys/class/u-dma-buf/udmabuf0/phys_addr`.
The size of a DMA buffer and the device minor number can be specified when the device driver is loaded (e.g. when it is loaded via the `insmod` command). Some platforms also allow them to be specified in the device tree.
Figure 1. Architecture
- OS : Linux Kernel Version 3.6 - 3.8, 3.18, 4.4, 4.8, 4.12, 4.14, 4.19, 5.0 - 5.10, 6.1 (the author tested on 3.18, 4.4, 4.8, 4.12, 4.14, 4.19, 5.4, 5.10, 6.1).
- CPU: ARM Cortex-A9 (Xilinx ZYNQ / Altera CycloneV SoC)
- CPU: ARM64 Cortex-A53 (Xilinx ZYNQ UltraScale+ MPSoC)
- CPU: x86 (64-bit). However, this platform has not been verified sufficiently, and test reports from users are welcome. The following limitations currently apply:
  - The CPU cache cannot be controlled by the `O_SYNC` flag. The CPU cache is always enabled.
  - Settings cannot be made through the device tree.

The predecessor of u-dma-buf is udmabuf. The kernel module name has been changed from "udmabuf" to "u-dma-buf" to avoid a name conflict. The reason is that another kernel module named "udmabuf" has been added to the Linux Kernel since 5.x.
Category | udmabuf | u-dma-buf |
---|---|---|
module name | udmabuf.ko | u-dma-buf.ko |
source file | udmabuf.c | u-dma-buf.c |
sys class name | /sys/class/udmabuf/ | /sys/class/u-dma-buf/ |
DT compatible prop. | "ikwzm,udmabuf-0.10.a" | "ikwzm,u-dma-buf" |
This repository contains a Makefile. The Makefile has the following parameters:
Parameter Name | Description | Default Value |
---|---|---|
ARCH | Architecture Name | `$(shell uname -m \| sed -e s/arm.*/arm/ -e s/aarch64.*/arm64/ -e s/riscv.*/riscv/ )` |
KERNEL_SRC | Kernel Source Directory | /lib/modules/$(shell uname -r)/build |
CONFIG_U_DMA_BUF_DEBUG | Enable debug report | y |
CONFIG_U_DMA_BUF_QUIRK_MMAP | Enable quirk-mmap | y |
CONFIG_U_DMA_BUF_IN_KERNEL_FUNCTIONS | Enable in-kernel functions | y |
CONFIG_U_DMA_BUF_IOCTL | Enable ioctl | y |
CONFIG_U_DMA_BUF_EXPORT | Enable PRIME DMA-BUFS export | y |
If you have a cross-compilation environment for the target system, you can compile with:
```console
shell$ make ARCH=arm KERNEL_SRC=/home/fpga/src/linux-5.10.120-zynqmp-fpga-generic all
```
The ARCH variable specifies the architecture name.
The KERNEL_SRC variable specifies the Linux Kernel source code path.
If your target system is capable of self-compiling the Linux Kernel module, you can compile it with:
```console
shell$ make all
```
You need the kernel source code in `/lib/modules/$(shell uname -r)/build` to compile.
It can also be compiled into the Linux Kernel Source Tree.
```console
shell$ mkdir <linux-source-tree>/drivers/staging/u-dma-buf
shell$ cp Kconfig Makefile u-dma-buf.c <linux-source-tree>/drivers/staging/u-dma-buf
shell$ diff <linux-source-tree>/drivers/staging/Kconfig
:
+source "drivers/staging/u-dma-buf/Kconfig"
+
shell$ diff <linux-source-tree>/drivers/staging/Makefile
:
+obj-$(CONFIG_U_DMA_BUF)                += u-dma-buf/
```
For `make menuconfig`, set the following:
```
Device Drivers --->
  Staging drivers --->
    <M> u-dma-buf(User space mappable DMA Buffer) --->
```
If you write it directly in defconfig:
```console
shell$ diff <linux-source-tree>/arch/arm64/configs/xilinx_zynqmp_defconfig
:
+CONFIG_U_DMA_BUF=m
```
Load the u-dma-buf kernel driver using `insmod`, and provide the size of the DMA buffer as an argument as shown below. The device driver is then created and allocates a DMA buffer of the specified size. Up to 8 DMA buffers can be allocated using `insmod` (udmabuf0/1/2/3/4/5/6/7).
```console
zynq$ insmod u-dma-buf.ko udmabuf0=1048576
u-dma-buf udmabuf0: driver version = 5.2.0
u-dma-buf udmabuf0: major number   = 248
u-dma-buf udmabuf0: minor number   = 0
u-dma-buf udmabuf0: phys address   = 0x1e900000
u-dma-buf udmabuf0: buffer size    = 1048576
u-dma-buf u-dma-buf.0: driver installed.
zynq$ ls -la /dev/udmabuf0
crw------- 1 root root 248, 0 Dec  1 09:34 /dev/udmabuf0
```
In the above result, only root can read or write the device. To change the permissions when the kernel module is loaded, create `/etc/udev/rules.d/99-u-dma-buf.rules` with the following content:
```
SUBSYSTEM=="u-dma-buf", GROUP="root", MODE="0666"
```
The module can be uninstalled with the `rmmod` command.
```console
zynq$ rmmod u-dma-buf
u-dma-buf u-dma-buf.0: driver removed.
```
For details, refer to the following URL.
The u-dma-buf kernel module has the following module parameters:
Parameter Name | Type | Default | Description |
---|---|---|---|
udmabuf0 | ulong | 0 | u-dma-buf0 buffer size |
udmabuf1 | ulong | 0 | u-dma-buf1 buffer size |
udmabuf2 | ulong | 0 | u-dma-buf2 buffer size |
udmabuf3 | ulong | 0 | u-dma-buf3 buffer size |
udmabuf4 | ulong | 0 | u-dma-buf4 buffer size |
udmabuf5 | ulong | 0 | u-dma-buf5 buffer size |
udmabuf6 | ulong | 0 | u-dma-buf6 buffer size |
udmabuf7 | ulong | 0 | u-dma-buf7 buffer size |
info_enable | int | 1 | install/uninstall information enable |
dma_mask_bit | int | 32 | dma mask bit size |
bind | charp | "" | bind device name |
quirk_mmap_mode | int | 2 or 3 | quirk mmap mode(1:off,2:on,3:auto,4:page) |
The `udmabuf0` to `udmabuf7` parameters specify the capacity, in bytes, of each u-dma-buf to be created. Up to 8 u-dma-bufs can be created with these parameters, and their device names are udmabuf[0-7]. If a parameter is 0, the corresponding u-dma-buf is not created.
The `info_enable` parameter specifies whether detailed information about the u-dma-buf is displayed when it is created.
The `dma_mask_bit` parameter specifies the DMA mask bit size. **Note: The value of dma-mask is system dependent. Make sure you understand what dma-mask means before setting it.**
The `bind` parameter specifies the parent device of the u-dma-buf. If this parameter is an empty string (the default value), u-dma-buf is created as a new platform device. If a parent device name is specified, u-dma-buf is created as a child device of that device.
The string specified in this parameter has the format `"<bus>/<device-name>"`.
`<bus>` is the bus name. Currently `pci` is supported. The bus name can be omitted, and if it is, the platform bus is used.
`<device-name>` specifies the name of the device managed by that bus.
For example, to designate "0000:00:15.0" on the pci bus as the parent device, run the following:
```console
shell$ sudo insmod u-dma-buf.ko udmabuf0=0x10000 info_enable=3 bind="pci/0000:00:15.0"
[13422.022482] u-dma-buf udmabuf0: driver version = 5.2.0
[13422.022483] u-dma-buf udmabuf0: major number   = 238
[13422.022483] u-dma-buf udmabuf0: minor number   = 0
[13422.022484] u-dma-buf udmabuf0: phys address   = 0x0000000070950000
[13422.022485] u-dma-buf udmabuf0: buffer size    = 65536
[13422.022485] u-dma-buf udmabuf0: dma device     = 0000:00:15.0
[13422.022486] u-dma-buf udmabuf0: dma bus        = pci
[13422.022486] u-dma-buf udmabuf0: dma coherent   = 1
[13422.022487] u-dma-buf udmabuf0: dma mask       = 0x00000000ffffffff
[13422.022487] u-dma-buf udmabuf0: iommu domain   = NONE
[13422.022487] u-dma-buf udmabuf0: mmap mode      = 3
[13422.022487] u-dma-buf udmabuf0: mmap           = dma_mmap_coherent
[13422.022488] u-dma-buf: udmabuf0 installed.
```
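A small user-space parser shows how the `"<bus>/<device-name>"` rule works. This sketch was written for this explanation and is not taken from u-dma-buf. `struct bind_target` and `parse_bind()` are hypothetical names, and the field sizes are arbitrary. The parser also rejects the empty default string instead of handling it. In the driver, that string selects the "new platform device" behavior.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type, for illustration only. */
struct bind_target {
    char bus[16];          /* e.g. "pci" or "platform" */
    char device_name[64];  /* e.g. "0000:00:15.0" */
};

/* Split a bind string of the form "<bus>/<device-name>".
 * If no '/' is present the bus was omitted, so "platform" is assumed.
 * Returns 0 on success, -1 if a field is empty or does not fit. */
static int parse_bind(const char *bind, struct bind_target *t)
{
    const char *slash = strchr(bind, '/');
    const char *name  = bind;

    if (slash == NULL) {
        snprintf(t->bus, sizeof t->bus, "%s", "platform");
    } else {
        size_t bus_len = (size_t)(slash - bind);
        if (bus_len == 0 || bus_len >= sizeof t->bus)
            return -1;
        memcpy(t->bus, bind, bus_len);
        t->bus[bus_len] = '\0';
        name = slash + 1;
    }
    if (name[0] == '\0' || strlen(name) >= sizeof t->device_name)
        return -1;
    snprintf(t->device_name, sizeof t->device_name, "%s", name);
    return 0;
}
```

For example, `parse_bind("pci/0000:00:15.0", &t)` yields bus `pci` and device `0000:00:15.0`. By contrast, `parse_bind("0000:00:15.0", &t)` falls back to the platform bus.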
The `quirk_mmap_mode` parameter specifies the default value of quirk-mmap-mode. quirk-mmap is described in detail below.
If this parameter is 1, quirk-mmap is prohibited.
If this parameter is 2, quirk-mmap is used.
If this parameter is 3, quirk-mmap is not used if the device is dma-coherent, and is used only if dma-coherent is false.
If this parameter is 4, quirk-mmap is used in quirk-mmap-page mode.
If the architecture is ARM or ARM64, this parameter defaults to 2.
On any other architecture, this parameter defaults to 3.
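The rules above amount to a small decision table. The following C sketch encodes them for illustration only. `enum quirk_mmap_mode`, `quirk_mmap_used()` and `default_quirk_mmap_mode()` are hypothetical names, and this is not the driver's implementation. Mode 4 counts as "used" because quirk-mmap is still active, just in its page variant.

```c
#include <stdbool.h>

/* Values of quirk_mmap_mode as listed in the module parameter table. */
enum quirk_mmap_mode {
    QUIRK_MMAP_OFF  = 1,
    QUIRK_MMAP_ON   = 2,
    QUIRK_MMAP_AUTO = 3,
    QUIRK_MMAP_PAGE = 4,
};

/* Returns true when quirk-mmap would be used, following the rules above. */
static bool quirk_mmap_used(enum quirk_mmap_mode mode, bool dma_coherent)
{
    switch (mode) {
    case QUIRK_MMAP_OFF:  return false;
    case QUIRK_MMAP_ON:   return true;
    case QUIRK_MMAP_AUTO: return !dma_coherent;  /* only when not coherent */
    case QUIRK_MMAP_PAGE: return true;           /* quirk-mmap-page mode */
    }
    return false;
}

/* Default mode: 2 on ARM/ARM64, 3 on other architectures. */
static enum quirk_mmap_mode default_quirk_mmap_mode(void)
{
#if defined(__arm__) || defined(__aarch64__)
    return QUIRK_MMAP_ON;
#else
    return QUIRK_MMAP_AUTO;
#endif
}
```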
DMA buffers can also be allocated by specifying their size in the device tree file, in addition to allocating them with `insmod` arguments. When a device tree file contains an entry like the following, u-dma-buf allocates the buffers and creates the device drivers when it is loaded by `insmod`.
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        device-name = "udmabuf0";
        minor-number = <0>;
        size = <0x00100000>;
};
```
```console
zynq$ insmod u-dma-buf.ko
u-dma-buf udmabuf0: driver version = 5.2.0
u-dma-buf udmabuf0: major number   = 248
u-dma-buf udmabuf0: minor number   = 0
u-dma-buf udmabuf0: phys address   = 0x1e900000
u-dma-buf udmabuf0: buffer size    = 1048576
u-dma-buf amba:udmabuf@0x00: driver installed.
zynq$ ls -la /dev/udmabuf0
crw------- 1 root root 248, 0 Dec  1 09:34 /dev/udmabuf0
```
The following properties can be set in the device tree.
- `compatible`
- `size`
- `minor-number`
- `device-name`
- `sync-mode`
- `sync-always`
- `sync-offset`
- `sync-size`
- `sync-direction`
- `dma-coherent`
- `dma-mask`
- `quirk-mmap-off`
- `quirk-mmap-on`
- `quirk-mmap-auto`
- `quirk-mmap-page`
- `memory-region`

The `compatible` property selects the corresponding device driver when u-dma-buf is loaded. The `compatible` property is mandatory. Be sure to set the `compatible` property to "ikwzm,u-dma-buf" (for u-dma-buf.ko) or "ikwzm,udmabuf-0.10.a" (for udmabuf.ko).
The `size` property sets the capacity of the DMA buffer in bytes. The `size` property is mandatory.
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        size = <0x00100000>;
};
```
To specify a buffer size of 4GiB or more, specify a 64-bit value as follows. A 64-bit value is written as two 32-bit cells: the upper 32 bits first, followed by the lower 32 bits.
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        size = <0x01 0x00000000>; // size = 0x1_0000_0000
};
```
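Splitting a size into two cells is plain shift-and-mask arithmetic. This sketch uses hypothetical helper names. It computes the cells for a 64-bit size and formats a `size` property line like the one above. For 0x1_0000_0000 it produces `size = <0x01 0x00000000>;`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Split a 64-bit size into the two device tree cells:
 * cells[0] = upper 32 bits, cells[1] = lower 32 bits. */
static void size_to_cells(uint64_t size, uint32_t cells[2])
{
    cells[0] = (uint32_t)(size >> 32);
    cells[1] = (uint32_t)(size & 0xFFFFFFFFu);
}

/* Recombine the two cells into the 64-bit size. */
static uint64_t cells_to_size(const uint32_t cells[2])
{
    return ((uint64_t)cells[0] << 32) | cells[1];
}

/* Format the property line; returns the snprintf() result. */
static int format_size_property(uint64_t size, char *buf, size_t len)
{
    uint32_t cells[2];
    size_to_cells(size, cells);
    return snprintf(buf, len, "size = <0x%02" PRIx32 " 0x%08" PRIx32 ">;",
                    cells[0], cells[1]);
}
```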
The `minor-number` property sets the minor number. The valid range is 0 to 255. A minor number provided as an `insmod` argument takes precedence. If a definition in the device tree uses a colliding number, creation of the device defined in the device tree will fail.
The `minor-number` property is optional. When the `minor-number` property is not specified, u-dma-buf automatically assigns an appropriate one.
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        minor-number = <0>;
        size = <0x00100000>;
};
```
The `device-name` property sets the name of the device.
The `device-name` property is optional. The device name is determined as follows:

- If the `device-name` property is specified, its value is used.
- If the `device-name` property is not present but the `minor-number` property is specified, `sprintf("udmabuf%d", minor-number)` is used.
- If neither the `device-name` property nor the `minor-number` property is present, the entry name of the device tree is used (`udmabuf@0x00` in this example).

```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        device-name = "udmabuf0";
        size = <0x00100000>;
};
```
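The three naming rules can be expressed as a short function. This is an illustrative sketch, not the driver's source, and `choose_device_name()` is a hypothetical name. A `NULL` `device_name` stands for an absent `device-name` property. A negative `minor_number` stands for an absent `minor-number` property.

```c
#include <stdio.h>

/* Pick the device name following the three rules above.
 * device_name:  value of device-name, or NULL if the property is absent
 * minor_number: value of minor-number, or negative if absent
 * node_name:    the device tree entry name, e.g. "udmabuf@0x00" */
static void choose_device_name(const char *device_name, int minor_number,
                               const char *node_name, char *out, size_t len)
{
    if (device_name != NULL)
        snprintf(out, len, "%s", device_name);          /* rule 1 */
    else if (minor_number >= 0)
        snprintf(out, len, "udmabuf%d", minor_number);  /* rule 2 */
    else
        snprintf(out, len, "%s", node_name);            /* rule 3 */
}
```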
The `sync-mode` property configures the behavior when u-dma-buf is opened with the `O_SYNC` flag.

- `sync-mode`=<1>: If `O_SYNC` or the `sync-always` property is specified, the CPU cache is disabled. Otherwise the CPU cache is enabled.
- `sync-mode`=<2>: If `O_SYNC` or the `sync-always` property is specified, the CPU cache is disabled, but the CPU uses write-combining when writing data to the DMA buffer. Combining multiple write accesses improves performance. Otherwise the CPU cache is enabled.
- `sync-mode`=<3>: If `O_SYNC` or the `sync-always` property is specified, DMA coherency mode is used. Otherwise the CPU cache is enabled.

The `sync-mode` property is optional. When the `sync-mode` property is not specified, `sync-mode` is set to <1>.
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        size = <0x00100000>;
        sync-mode = <2>;
};
```
Details on `O_SYNC` and cache management will be described in the next section.
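From user space, whether `O_SYNC` is passed to `open()` determines which of these behaviors applies. The sketch below assumes a hypothetical helper name, `map_udmabuf()`. The effect of `O_SYNC` (an uncached, write-combined, or DMA-coherent mapping) is decided by `sync-mode` and `sync-always` as described above, not by this code.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Open a u-dma-buf device file, optionally with O_SYNC, and map `size`
 * bytes of it read/write. Returns MAP_FAILED on error. */
static void *map_udmabuf(const char *path, size_t size, int use_sync)
{
    int flags = O_RDWR | (use_sync ? O_SYNC : 0);
    int fd = open(path, flags);
    void *buf;

    if (fd == -1)
        return MAP_FAILED;
    buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping keeps its own reference to the file */
    return buf;
}
```

Release the mapping with `munmap(buf, size)` when you are done.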
If the `sync-always` property is specified, the operation selected by the `sync-mode` property is always performed when u-dma-buf is opened, whether or not `O_SYNC` is specified.
The `sync-always` property is optional.
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        size = <0x00100000>;
        sync-mode = <2>;
        sync-always;
};
```
Details on `O_SYNC` and cache management will be described in the next section.
The `sync-offset` property sets the start of the buffer range when the cache of u-dma-buf is controlled manually.
The `sync-offset` property is optional. When the `sync-offset` property is not specified, `sync-offset` is set to <0>.
Details on cache management will be described in the next section.
The `sync-size` property sets the size of the buffer range when the cache of u-dma-buf is controlled manually.
The `sync-size` property is optional. When the `sync-size` property is not specified, `sync-size` is set to <0>.
Details on cache management will be described in the next section.
The `sync-direction` property sets the direction of DMA when the cache of u-dma-buf is controlled manually.

- `sync-direction`=<0>: DMA_BIDIRECTIONAL
- `sync-direction`=<1>: DMA_TO_DEVICE
- `sync-direction`=<2>: DMA_FROM_DEVICE

The `sync-direction` property is optional. When the `sync-direction` property is not specified, `sync-direction` is set to <0>.
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        size = <0x00100000>;
        sync-offset = <0x00010000>;
        sync-size = <0x000F0000>;
        sync-direction = <2>;
};
```
Details on cache management will be described in the next section.
If the `dma-coherent` property is specified, hardware can guarantee coherency between the DMA buffer and the CPU cache.
The `dma-coherent` property is optional. When the `dma-coherent` property is not specified, hardware cannot guarantee coherency between the DMA buffer and the CPU cache.
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        size = <0x00100000>;
        dma-coherent;
};
```
Details on cache management will be described in the next section.
The `dma-mask` property sets the DMA mask of u-dma-buf as a bit size. **Note: The value of dma-mask is system dependent. Make sure you understand what dma-mask means before setting it.**
```dts
udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        size = <0x00100000>;
        dma-mask = <64>;
};
```
If the `quirk-mmap-off` property is specified, quirk-mmap is not used.
If the `quirk-mmap-on` property is specified, quirk-mmap is used.
If the `quirk-mmap-auto` property is specified, quirk-mmap is not used if the device is dma-coherent, and is used only if dma-coherent is false.
If the `quirk-mmap-page` property is specified, quirk-mmap is used in quirk-mmap-page mode.
In quirk-mmap-page mode, no error occurs when u-dma-buf is the target of O_DIRECT.
This mode is currently under development. Please use it with caution.
Linux allows reserved memory areas to be specified in the device tree. The Linux kernel excludes the physical memory specified by the `reserved-memory` property from normal memory allocation. To access such a reserved memory area, you must either use a general-purpose memory access driver such as `/dev/mem` or associate the area with a device driver in the device tree.
The `memory-region` property associates a reserved memory area with u-dma-buf.
```dts
reserved-memory {
        #address-cells = <1>;
        #size-cells = <1>;
        ranges;
        image_buf0: image_buf@0 {
                compatible = "shared-dma-pool";
                reusable;
                reg = <0x3C000000 0x04000000>;
                label = "image_buf0";
        };
};
udmabuf@0 {
        compatible = "ikwzm,u-dma-buf";
        device-name = "udmabuf0";
        size = <0x04000000>; // 64MiB
        memory-region = <&image_buf0>;
};
```
In this example, the 64MiB from 0x3C000000 to 0x3FFFFFFF is reserved as "image_buf0". "image_buf0" has "shared-dma-pool" in its `compatible` property and also has the `reusable` property. With these properties, this reserved memory area is allocated by CMA. Also, be careful about address and size alignment.
The `memory-region` property associates "image_buf0" with "udmabuf@0". With this association, "udmabuf@0" reserves physical memory from the CMA area specified by "image_buf0".
The `memory-region` property is optional. When the `memory-region` property is not specified, u-dma-buf allocates the DMA buffer from the CMA area allocated to the Linux kernel.
Since u-dma-buf v4.0, u-dma-buf devices can be created or deleted using u-dma-buf-mgr. See https://github.com/ikwzm/u-dma-buf-mgr for more information.
When u-dma-buf is loaded into the kernel, the following device files are created. `<device-name>` is a placeholder for the device name described in the previous section.

- `/dev/<device-name>`
- `/sys/class/u-dma-buf/<device-name>/phys_addr`
- `/sys/class/u-dma-buf/<device-name>/size`
- `/sys/class/u-dma-buf/<device-name>/sync_mode`
- `/sys/class/u-dma-buf/<device-name>/sync_offset`
- `/sys/class/u-dma-buf/<device-name>/sync_size`
- `/sys/class/u-dma-buf/<device-name>/sync_direction`
- `/sys/class/u-dma-buf/<device-name>/sync_owner`
- `/sys/class/u-dma-buf/<device-name>/sync_for_cpu`
- `/sys/class/u-dma-buf/<device-name>/sync_for_device`
- `/sys/class/u-dma-buf/<device-name>/dma_coherent`

`/dev/<device-name>` is used when the buffer is `mmap()`-ed into the user space or accessed via `read()`/`write()`.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    /* Do some read/write access to buf */
    close(fd);
}
```
The device file can also be read or written directly from the shell by using it as the target of `dd`.
```console
zynq$ dd if=/dev/urandom of=/dev/udmabuf0 bs=4096 count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 3.07516 s, 1.4 MB/s
```
```console
zynq$ dd if=/dev/udmabuf4 of=random.bin
8192+0 records in
8192+0 records out
4194304 bytes (4.2 MB) copied, 0.173866 s, 24.1 MB/s
```
The physical address of a DMA buffer can be retrieved by reading `/sys/class/u-dma-buf/<device-name>/phys_addr`.
```c
char           attr[1024];
unsigned long  phys_addr;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/phys_addr", O_RDONLY)) != -1) {
    ssize_t n = read(fd, attr, sizeof(attr) - 1);
    attr[(n > 0) ? n : 0] = '\0';       /* read() does not NUL-terminate */
    sscanf(attr, "%lx", &phys_addr);     /* %lx matches unsigned long */
    close(fd);
}
```
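The size of a DMA buffer can be retrieved by reading `/sys/class/u-dma-buf/<device-name>/size`.
```c
char           attr[1024];
unsigned long  buf_size;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/size", O_RDONLY)) != -1) {
    ssize_t n = read(fd, attr, sizeof(attr) - 1);
    attr[(n > 0) ? n : 0] = '\0';       /* read() does not NUL-terminate */
    sscanf(attr, "%lu", &buf_size);      /* unsigned long also covers sizes >= 4GiB on 64-bit */
    close(fd);
}
```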
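Both attributes are short text values. The examples above parse `phys_addr` as hexadecimal and `size` as decimal. The sketch below is a slightly more defensive reader with a hypothetical helper name, `read_sysfs_ull()`. It calls `strtoull()` with an explicit base. With base 16, `strtoull()` also accepts an optional `0x` prefix.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an unsigned integer attribute from a sysfs file.
 * Use base 16 for phys_addr and base 10 for size.
 * Returns 0 on success, -1 on any error. */
static int read_sysfs_ull(const char *path, int base, unsigned long long *value)
{
    char attr[64];
    char *end;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(attr, sizeof attr, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    errno = 0;
    *value = strtoull(attr, &end, base);
    if (errno != 0 || end == attr)  /* overflow or no digits */
        return -1;
    return 0;
}
```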
The device file `/sys/class/u-dma-buf/<device-name>/sync_mode` configures the behavior when u-dma-buf is opened with the `O_SYNC` flag.
```c
char           attr[1024];
unsigned long  sync_mode = 2;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_mode", O_WRONLY)) != -1) {
    sprintf(attr, "%lu", sync_mode);     /* %lu matches unsigned long */
    write(fd, attr, strlen(attr));
    close(fd);
}
```
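Writable attributes such as `sync_mode`, `sync_offset`, `sync_size` and `sync_direction` all take a number as text. The sketch below wraps that pattern in a hypothetical helper, `write_sysfs_ul()`, using stdio. Because stdio buffers its output, the actual `write()` happens in `fclose()`, so the helper also reports a failure from `fclose()`.

```c
#include <stdio.h>

/* Write an unsigned value, in decimal, to a writable sysfs attribute.
 * Returns 0 on success, -1 on error. */
static int write_sysfs_ul(const char *path, unsigned long value)
{
    FILE *fp = fopen(path, "w");
    int ok;

    if (fp == NULL)
        return -1;
    ok = fprintf(fp, "%lu", value) > 0;
    if (fclose(fp) != 0)  /* flush errors surface here */
        ok = 0;
    return ok ? 0 : -1;
}
```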
Details on `O_SYNC` and cache management will be described in the next section.
The device file `/sys/class/u-dma-buf/<device-name>/sync_offset` specifies the start address of the memory block whose cache is managed manually.
```c
char           attr[1024];
unsigned long  sync_offset = 0x00000000;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
    sprintf(attr, "%lu", sync_offset); /* or sprintf(attr, "0x%lx", sync_offset); */
    write(fd, attr, strlen(attr));
    close(fd);
}
```
Details of manual cache management are described in the next section.
The device file `/sys/class/u-dma-buf/<device-name>/sync_size` specifies the size of the memory block whose cache is managed manually.
```c
char           attr[1024];
unsigned long  sync_size = 1024;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
    sprintf(attr, "%lu", sync_size); /* or sprintf(attr, "0x%lx", sync_size); */
    write(fd, attr, strlen(attr));
    close(fd);
}
```
Details of manual cache management are described in the next section.
The device file `/sys/class/u-dma-buf/<device-name>/sync_direction` sets the direction of DMA transfer to/from the DMA buffer whose cache is managed manually.

- 0: sets DMA_BIDIRECTIONAL
- 1: sets DMA_TO_DEVICE
- 2: sets DMA_FROM_DEVICE

```c
char           attr[1024];
unsigned long  sync_direction = 1;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
    sprintf(attr, "%lu", sync_direction);
    write(fd, attr, strlen(attr));
    close(fd);
}
```
Details of manual cache management are described in the next section.
The device file `/sys/class/u-dma-buf/<device-name>/dma_coherent` reports whether hardware can guarantee coherency between the DMA buffer and the CPU cache. This can be specified with the `dma-coherent` property in the device tree, but this device file itself is read-only.
If this value is 1, hardware can guarantee coherency between the DMA buffer and the CPU cache. If this value is 0, hardware cannot guarantee it.
```c
char  attr[1024];
int   dma_coherent;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/dma_coherent", O_RDONLY)) != -1) {
    ssize_t n = read(fd, attr, sizeof(attr) - 1);
    attr[(n > 0) ? n : 0] = '\0';
    sscanf(attr, "%d", &dma_coherent);
    close(fd);
}
```
The device file `/sys/class/u-dma-buf/<device-name>/sync_owner` reports the owner of the memory block in manual cache management mode. If this value is 1, the device owns the buffer. If this value is 0, the CPU owns the buffer.
```c
char  attr[1024];
int   sync_owner;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_owner", O_RDONLY)) != -1) {
    ssize_t n = read(fd, attr, sizeof(attr) - 1);
    attr[(n > 0) ? n : 0] = '\0';
    sscanf(attr, "%d", &sync_owner);
    close(fd);
}
```
Details of manual cache management are described in the next section.
In manual cache management mode, the CPU becomes the owner of the buffer when a non-zero value is written to the device file `/sys/class/u-dma-buf/<device-name>/sync_for_cpu`. This device file is write-only.
Suppose '1' is written to this device file while `sync_direction` is 2 (=DMA_FROM_DEVICE) or 0 (=DMA_BIDIRECTIONAL). The write then invalidates the cache in the range specified by `sync_offset` and `sync_size`.
```c
char           attr[1024];
unsigned long  sync_for_cpu = 1;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
    sprintf(attr, "%lu", sync_for_cpu);
    write(fd, attr, strlen(attr));
    close(fd);
}
```
The value written to this device file can include sync_offset, sync_size, and sync_direction.
```c
char           attr[1024];
unsigned long  sync_offset    = 0;
unsigned long  sync_size      = 0x10000;
unsigned int   sync_direction = 0;
unsigned long  sync_for_cpu   = 1;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
    /* upper 32 bits: offset, lower 32 bits: size | direction << 2 | sync flag */
    sprintf(attr, "0x%08lX%08lX", (sync_offset & 0xFFFFFFFF),
            (sync_size & 0xFFFFFFF0) | (sync_direction << 2) | sync_for_cpu);
    write(fd, attr, strlen(attr));
    close(fd);
}
```
The sync_offset/sync_size/sync_direction specified through `sync_for_cpu` are temporary. They do not affect the `sync_offset`, `sync_size`, or `sync_direction` device files.
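The packed value is 64 bits wide, laid out as follows:

- The upper 32 bits carry `sync_offset`.
- The lower 32 bits carry `sync_size` with its low 4 bits cleared.
- `sync_direction` occupies bits 3..2.
- Bit 0 holds the sync request.

The sketch below, with hypothetical helper names, builds that value and formats it the same way as the example above. The examples mask the size with `0xFFFFFFF0` and the offset with `0xFFFFFFFF`. So in this encoding, the offset must fit in 32 bits, and a size that is not a multiple of 16 loses its low bits.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Pack the value written to sync_for_cpu / sync_for_device:
 *   bits 63..32 : sync_offset
 *   bits 31..4  : sync_size (low 4 bits dropped)
 *   bits  3..2  : sync_direction (0, 1 or 2)
 *   bit      0  : 1 = perform the sync */
static uint64_t pack_sync_value(uint32_t offset, uint32_t size,
                                unsigned direction)
{
    return ((uint64_t)offset << 32)
         | (uint64_t)(size & 0xFFFFFFF0u)
         | ((uint64_t)(direction & 0x3u) << 2)
         | 1u;
}

/* Format as the hex string written to the sysfs file. */
static int format_sync_value(uint64_t value, char *buf, size_t len)
{
    return snprintf(buf, len, "0x%016" PRIX64, value);
}
```

For instance, `pack_sync_value(0, 0x10000, 0)` is `0x10001`, which formats as `0x0000000000010001`.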
Details of manual cache management are described in the next section.
In manual cache management mode, the device becomes the owner of the buffer when a non-zero value is written to the device file `/sys/class/u-dma-buf/<device-name>/sync_for_device`. This device file is write-only.
Suppose '1' is written to this device file while `sync_direction` is 1 (=DMA_TO_DEVICE) or 0 (=DMA_BIDIRECTIONAL). The write then flushes the cache in the range specified by `sync_offset` and `sync_size`. That is, any dirty cached data is written back to DDR memory.
```c
char           attr[1024];
unsigned long  sync_for_device = 1;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
    sprintf(attr, "%lu", sync_for_device);
    write(fd, attr, strlen(attr));
    close(fd);
}
```
The value written to this device file can include sync_offset, sync_size, and sync_direction.
```c
char           attr[1024];
unsigned long  sync_offset     = 0;
unsigned long  sync_size       = 0x10000;
unsigned int   sync_direction  = 0;
unsigned long  sync_for_device = 1;
if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
    /* upper 32 bits: offset, lower 32 bits: size | direction << 2 | sync flag */
    sprintf(attr, "0x%08lX%08lX", (sync_offset & 0xFFFFFFFF),
            (sync_size & 0xFFFFFFF0) | (sync_direction << 2) | sync_for_device);
    write(fd, attr, strlen(attr));
    close(fd);
}
```
The sync_offset/sync_size/sync_direction specified through `sync_for_device` are temporary. They do not affect the `sync_offset`, `sync_size`, or `sync_direction` device files.
Details of manual cache management are described in the next section.
Starting with u-dma-buf v4.7.0, devices can be controlled by issuing ioctl commands to the device file. The following ioctl commands are available:

- `U_DMA_BUF_IOCTL_GET_DRV_INFO`
- `U_DMA_BUF_IOCTL_GET_SIZE`
- `U_DMA_BUF_IOCTL_GET_DMA_ADDR`
- `U_DMA_BUF_IOCTL_GET_SYNC_OWNER`
- `U_DMA_BUF_IOCTL_SET_SYNC_FOR_CPU`
- `U_DMA_BUF_IOCTL_SET_SYNC_FOR_DEVICE`
- `U_DMA_BUF_IOCTL_GET_DEV_INFO`
- `U_DMA_BUF_IOCTL_GET_SYNC`
- `U_DMA_BUF_IOCTL_SET_SYNC`
- `U_DMA_BUF_IOCTL_EXPORT`

The following header file is required to use ioctl.
```c
#ifndef  U_DMA_BUF_IOCTL_H
#define  U_DMA_BUF_IOCTL_H
#include <linux/ioctl.h>

#define DEFINE_U_DMA_BUF_IOCTL_FLAGS(name,type,lo,hi)                     \
static const  int      U_DMA_BUF_IOCTL_FLAGS_ ## name ## _SHIFT = (lo);  \
static const  uint64_t U_DMA_BUF_IOCTL_FLAGS_ ## name ## _MASK  = (((uint64_t)1UL << ((hi)-(lo)+1))-1); \
static inline void SET_U_DMA_BUF_IOCTL_FLAGS_ ## name(type *p, int value) \
{                                                                         \
    const int      shift = U_DMA_BUF_IOCTL_FLAGS_ ## name ## _SHIFT;     \
    const uint64_t mask  = U_DMA_BUF_IOCTL_FLAGS_ ## name ## _MASK;      \
    p->flags &= ~(mask << shift);                                         \
    p->flags |= ((value & mask) << shift);                                \
}                                                                         \
static inline int GET_U_DMA_BUF_IOCTL_FLAGS_ ## name(type *p)            \
{                                                                         \
    const int      shift = U_DMA_BUF_IOCTL_FLAGS_ ## name ## _SHIFT;     \
    const uint64_t mask  = U_DMA_BUF_IOCTL_FLAGS_ ## name ## _MASK;      \
    return (int)((p->flags >> shift) & mask);                             \
}

typedef struct {
    uint64_t flags;
    char     version[16];
} u_dma_buf_ioctl_drv_info;

DEFINE_U_DMA_BUF_IOCTL_FLAGS(IOCTL_VERSION      , u_dma_buf_ioctl_drv_info ,  0,  7)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(IN_KERNEL_FUNCTIONS, u_dma_buf_ioctl_drv_info ,  8,  8)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(USE_OF_DMA_CONFIG  , u_dma_buf_ioctl_drv_info , 12, 12)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(USE_OF_RESERVED_MEM, u_dma_buf_ioctl_drv_info , 13, 13)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(USE_QUIRK_MMAP     , u_dma_buf_ioctl_drv_info , 16, 16)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(USE_QUIRK_MMAP_PAGE, u_dma_buf_ioctl_drv_info , 17, 17)

typedef struct {
    uint64_t flags;
    uint64_t size;
    uint64_t addr;
} u_dma_buf_ioctl_dev_info;

DEFINE_U_DMA_BUF_IOCTL_FLAGS(DMA_MASK    , u_dma_buf_ioctl_dev_info ,  0,  7)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(DMA_COHERENT, u_dma_buf_ioctl_dev_info ,  9,  9)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(MMAP_MODE   , u_dma_buf_ioctl_dev_info , 10, 12)

typedef struct {
    uint64_t flags;
    uint64_t size;
    uint64_t offset;
} u_dma_buf_ioctl_sync_args;

DEFINE_U_DMA_BUF_IOCTL_FLAGS(SYNC_CMD    , u_dma_buf_ioctl_sync_args,  0,  1)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(SYNC_DIR    , u_dma_buf_ioctl_sync_args,  2,  3)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(SYNC_MODE   , u_dma_buf_ioctl_sync_args,  8, 15)
DEFINE_U_DMA_BUF_IOCTL_FLAGS(SYNC_OWNER  , u_dma_buf_ioctl_sync_args, 16, 16)

enum {
    U_DMA_BUF_IOCTL_FLAGS_SYNC_CMD_FOR_CPU    = 1,
    U_DMA_BUF_IOCTL_FLAGS_SYNC_CMD_FOR_DEVICE = 3
};

typedef struct {
    uint64_t flags;
    uint64_t size;
    uint64_t offset;
    uint64_t addr;
    int      fd;
} u_dma_buf_ioctl_export_args;

DEFINE_U_DMA_BUF_IOCTL_FLAGS(EXPORT_FD_FLAGS, u_dma_buf_ioctl_export_args, 0, 31)

#define U_DMA_BUF_IOCTL_MAGIC               'U'
#define U_DMA_BUF_IOCTL_GET_DRV_INFO        _IOR (U_DMA_BUF_IOCTL_MAGIC,  1, u_dma_buf_ioctl_drv_info)
#define U_DMA_BUF_IOCTL_GET_SIZE            _IOR (U_DMA_BUF_IOCTL_MAGIC,  2, uint64_t)
#define U_DMA_BUF_IOCTL_GET_DMA_ADDR        _IOR (U_DMA_BUF_IOCTL_MAGIC,  3, uint64_t)
#define U_DMA_BUF_IOCTL_GET_SYNC_OWNER      _IOR (U_DMA_BUF_IOCTL_MAGIC,  4, uint32_t)
#define U_DMA_BUF_IOCTL_SET_SYNC_FOR_CPU    _IOW (U_DMA_BUF_IOCTL_MAGIC,  5, uint64_t)
#define U_DMA_BUF_IOCTL_SET_SYNC_FOR_DEVICE _IOW (U_DMA_BUF_IOCTL_MAGIC,  6, uint64_t)
#define U_DMA_BUF_IOCTL_GET_DEV_INFO        _IOR (U_DMA_BUF_IOCTL_MAGIC,  7, u_dma_buf_ioctl_dev_info)
#define U_DMA_BUF_IOCTL_GET_SYNC            _IOR (U_DMA_BUF_IOCTL_MAGIC,  8, u_dma_buf_ioctl_sync_args)
#define U_DMA_BUF_IOCTL_SET_SYNC            _IOW (U_DMA_BUF_IOCTL_MAGIC,  9, u_dma_buf_ioctl_sync_args)
#define U_DMA_BUF_IOCTL_EXPORT              _IOWR(U_DMA_BUF_IOCTL_MAGIC, 10, u_dma_buf_ioctl_export_args)

#endif /* #ifndef U_DMA_BUF_IOCTL_H */
```
The examples below include this header together with the following system headers:
```c
#include <inttypes.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include "u-dma-buf-ioctl.h"
```
`U_DMA_BUF_IOCTL_GET_DRV_INFO` gets driver information, namely the supported driver features and the version number.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    u_dma_buf_ioctl_drv_info drv_info = {0};
    int status = ioctl(fd, U_DMA_BUF_IOCTL_GET_DRV_INFO, &drv_info);
    int ioctl_version       = GET_U_DMA_BUF_IOCTL_FLAGS_IOCTL_VERSION(&drv_info);
    int in_kernel_function  = GET_U_DMA_BUF_IOCTL_FLAGS_IN_KERNEL_FUNCTIONS(&drv_info);
    int use_of_dma_config   = GET_U_DMA_BUF_IOCTL_FLAGS_USE_OF_DMA_CONFIG(&drv_info);
    int use_of_reserved_mem = GET_U_DMA_BUF_IOCTL_FLAGS_USE_OF_RESERVED_MEM(&drv_info);
    int use_quirk_mmap      = GET_U_DMA_BUF_IOCTL_FLAGS_USE_QUIRK_MMAP(&drv_info);
    int use_quirk_mmap_page = GET_U_DMA_BUF_IOCTL_FLAGS_USE_QUIRK_MMAP_PAGE(&drv_info);
    char* drv_version = strdup(&drv_info.version[0]);
    close(fd);
}
```
`U_DMA_BUF_IOCTL_GET_SIZE` gets the size of a DMA buffer.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    uint64_t buf_size;
    status = ioctl(fd, U_DMA_BUF_IOCTL_GET_SIZE, &buf_size);
    close(fd);
}
```
`U_DMA_BUF_IOCTL_GET_DMA_ADDR` gets the physical address of a DMA buffer.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    uint64_t phys_addr;
    status = ioctl(fd, U_DMA_BUF_IOCTL_GET_DMA_ADDR, &phys_addr);
    close(fd);
}
```
`U_DMA_BUF_IOCTL_GET_SYNC_OWNER` gets the owner of the memory block in manual cache management mode. If this value is 1, the device owns the buffer. If this value is 0, the CPU owns the buffer.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    uint32_t sync_owner;   /* the ioctl is declared with uint32_t */
    status = ioctl(fd, U_DMA_BUF_IOCTL_GET_SYNC_OWNER, &sync_owner);
    close(fd);
}
```
`U_DMA_BUF_IOCTL_SET_SYNC_FOR_CPU` writes a value to sync_for_cpu. Suppose '1' is written to sync_for_cpu while `sync_direction` is 2 (=DMA_FROM_DEVICE) or 0 (=DMA_BIDIRECTIONAL). The cache in the range specified by `sync_offset` and `sync_size` is then invalidated.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    uint64_t sync_for_cpu = 1;
    status = ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC_FOR_CPU, &sync_for_cpu);
    close(fd);
}
```
The value written to sync_for_cpu can include sync_offset, sync_size, and sync_direction.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    unsigned long  sync_offset    = 0;
    unsigned long  sync_size      = 0x10000;
    unsigned int   sync_direction = 0;
    uint64_t       sync_for_cpu   = ((uint64_t)(sync_offset    & 0xFFFFFFFF) << 32) |
                                    ((uint64_t)(sync_size      & 0xFFFFFFF0) <<  0) |
                                    ((uint64_t)(sync_direction & 0x00000003) <<  2) |
                                    0x00000001;
    status = ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC_FOR_CPU, &sync_for_cpu);
    close(fd);
}
```
The sync_offset/sync_size/sync_direction specified through `sync_for_cpu` are temporary. They do not affect the `sync_offset`, `sync_size`, or `sync_direction` device files.
Details of manual cache management are described in the next section.
`U_DMA_BUF_IOCTL_SET_SYNC_FOR_DEVICE` writes a value to sync_for_device. Suppose '1' is written to sync_for_device while `sync_direction` is 1 (=DMA_TO_DEVICE) or 0 (=DMA_BIDIRECTIONAL). The cache in the range specified by `sync_offset` and `sync_size` is then flushed. That is, any dirty cached data is written back to DDR memory.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    uint64_t sync_for_device = 1;
    status = ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC_FOR_DEVICE, &sync_for_device);
    close(fd);
}
```
The value written to sync_for_device can include sync_offset, sync_size, and sync_direction.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    unsigned long  sync_offset     = 0;
    unsigned long  sync_size       = 0x10000;
    unsigned int   sync_direction  = 0;
    uint64_t       sync_for_device = ((uint64_t)(sync_offset    & 0xFFFFFFFF) << 32) |
                                     ((uint64_t)(sync_size      & 0xFFFFFFF0) <<  0) |
                                     ((uint64_t)(sync_direction & 0x00000003) <<  2) |
                                     0x00000001;
    status = ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC_FOR_DEVICE, &sync_for_device);
    close(fd);
}
```
The sync_offset/sync_size/sync_direction specified through `sync_for_device` are temporary. They do not affect the `sync_offset`, `sync_size`, or `sync_direction` device files.
Details of manual cache management are described in the next section.
`U_DMA_BUF_IOCTL_GET_DEV_INFO` gets device information: the physical address and size of the DMA buffer, as well as the DMA mask, DMA coherency, and mmap mode.
```c
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    u_dma_buf_ioctl_dev_info dev_info = {0};
    status = ioctl(fd, U_DMA_BUF_IOCTL_GET_DEV_INFO, &dev_info);
    int      dma_mask     = GET_U_DMA_BUF_IOCTL_FLAGS_DMA_MASK(&dev_info);
    int      dma_coherent = GET_U_DMA_BUF_IOCTL_FLAGS_DMA_COHERENT(&dev_info);
    int      mmap_mode    = GET_U_DMA_BUF_IOCTL_FLAGS_MMAP_MODE(&dev_info);
    uint64_t phys_addr    = dev_info.addr;
    uint64_t buf_size     = dev_info.size;
    close(fd);
}
```
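The `GET_U_DMA_BUF_IOCTL_FLAGS_*` accessors are shift-and-mask operations on `flags`. To show the bit layout explicitly, this sketch decodes the same fields by hand. It uses the bit ranges declared in `u-dma-buf-ioctl.h`: DMA_MASK is bits 0..7, DMA_COHERENT is bit 9, and MMAP_MODE is bits 10..12. `struct dev_info_fields` and the function names are hypothetical.

```c
#include <stdint.h>

/* Hand-decoded fields of u_dma_buf_ioctl_dev_info.flags. */
struct dev_info_fields {
    int dma_mask;      /* bits 0..7   */
    int dma_coherent;  /* bit  9      */
    int mmap_mode;     /* bits 10..12 */
};

/* Extract bits lo..hi (inclusive) of a 64-bit word. */
static uint64_t extract_bits(uint64_t word, int lo, int hi)
{
    return (word >> lo) & (((uint64_t)1 << (hi - lo + 1)) - 1);
}

static struct dev_info_fields decode_dev_info_flags(uint64_t flags)
{
    struct dev_info_fields f;
    f.dma_mask     = (int)extract_bits(flags, 0, 7);
    f.dma_coherent = (int)extract_bits(flags, 9, 9);
    f.mmap_mode    = (int)extract_bits(flags, 10, 12);
    return f;
}
```

For example, a `flags` value of `0xE20` decodes to a 32-bit DMA mask, `dma_coherent` = 1 and `mmap_mode` = 3.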
This ioctl is for get sync_offset/sync_size/sync_direction/sync_owner/sync_mode.
if ((fd=open("/dev/udmabuf0",O_RDWR))!=-1) {u_dma_buf_ioctl_sync_argssync_args= {0};status=ioctl(fd,U_DMA_BUF_IOCTL_GET_SYNC,&sync_args);uint64_tsync_offset=sync_args.offset;uint64_tsync_size=sync_args.size;intsync_direction=GET_U_DMA_BUF_IOCTL_FLAGS_SYNC_DIR(&sync_args);intsync_owner=GET_U_DMA_BUF_IOCTL_FLAGS_SYNC_OWNER(&sync_args);intsync_mode=GET_U_DMA_BUF_IOCTL_FLAGS_SYNC_MODE(&sync_args);close(fd); }
The `U_DMA_BUF_IOCTL_SET_SYNC` ioctl sets sync_offset/sync_size/sync_direction/sync_mode.
```C
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    u_dma_buf_ioctl_sync_args sync_args = {0};
    uint64_t sync_offset    = 0;
    uint64_t sync_size      = 0x10000;
    int      sync_direction = 0;
    sync_args.offset = sync_offset;
    sync_args.size   = sync_size;
    SET_U_DMA_BUF_IOCTL_FLAGS_SYNC_DIR(&sync_args, sync_direction);
    status = ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC, &sync_args);
    close(fd);
}
```
Also, by specifying a sync command in the flags of `sync_args` for this ioctl, sync_for_cpu or sync_for_device can be triggered. The following example triggers sync_for_cpu.
```C
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    u_dma_buf_ioctl_sync_args sync_args = {0};
    uint64_t sync_offset    = 0;
    uint64_t sync_size      = 0x10000;
    int      sync_direction = 0;
    sync_args.offset = sync_offset;
    sync_args.size   = sync_size;
    SET_U_DMA_BUF_IOCTL_FLAGS_SYNC_DIR(&sync_args, sync_direction);
    SET_U_DMA_BUF_IOCTL_FLAGS_SYNC_CMD(&sync_args, U_DMA_BUF_IOCTL_FLAGS_SYNC_CMD_FOR_CPU);
    status = ioctl(fd, U_DMA_BUF_IOCTL_SET_SYNC, &sync_args);
    close(fd);
}
```
Details of manual cache management are described in the next section.
The `U_DMA_BUF_IOCTL_EXPORT` ioctl is currently under development. Please use it with caution.
This ioctl exports the specified range of a u-dma-buf buffer as a PRIME DMA-BUF. Here, PRIME DMA-BUF refers to the Linux kernel's internal DMA buffer sharing API, which provides a general mechanism for sharing DMA buffers between multiple devices managed by different types of device drivers.
The `offset` field of `u_dma_buf_ioctl_export_args` specifies the offset of the area, and the `size` field specifies its size. The `fd_flags` field specifies the flags of the new file descriptor (O_CLOEXEC, O_SYNC, O_RDWR, O_RDONLY, O_WRONLY). Then execute the `U_DMA_BUF_IOCTL_EXPORT` ioctl. If it succeeds, the `fd` field of `u_dma_buf_ioctl_export_args` contains a file descriptor referring to the PRIME DMA-BUF, and the buffer can be accessed through it with mmap(). In some cases, the CPU cache must be synchronized before and after accessing the buffer; in that case, execute the `DMA_BUF_IOCTL_SYNC` ioctl on the PRIME DMA-BUF file descriptor.
An example is shown below.
```C
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    u_dma_buf_ioctl_export_args export_args = {0};
    export_args.offset = 0x00000000;
    export_args.size   = buf_size;
    SET_U_DMA_BUF_IOCTL_FLAGS_EXPORT_FD_FLAGS(&export_args, O_CLOEXEC | O_RDWR);
    status = ioctl(fd, U_DMA_BUF_IOCTL_EXPORT, &export_args);
    buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, export_args.fd, 0);
    struct dma_buf_sync sync_start = {.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW};
    status = ioctl(export_args.fd, DMA_BUF_IOCTL_SYNC, &sync_start);
    /* Do some read/write access to buf */
    struct dma_buf_sync sync_end   = {.flags = DMA_BUF_SYNC_END   | DMA_BUF_SYNC_RW};
    status = ioctl(export_args.fd, DMA_BUF_IOCTL_SYNC, &sync_end);
    /* Release the mapping and the exported DMA-BUF file descriptor */
    munmap(buf, buf_size);
    close(export_args.fd);
    close(fd);
}
```
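The same begin/end synchronization on an exported DMA-BUF can also be issued from Python with `fcntl.ioctl`. The sketch below rests on stated assumptions: the request number is built with the generic Linux `_IOW` encoding used on x86, ARM and ARM64 (some architectures encode ioctl numbers differently), `struct dma_buf_sync` is the single 64-bit `flags` field from `<linux/dma-buf.h>`, and `dma_buf_sync_begin`/`dma_buf_sync_end` are hypothetical helper names. Obtaining `export_args.fd` itself still requires the `U_DMA_BUF_IOCTL_EXPORT` call shown in C.

```python
import fcntl
import struct

# Flag values from <linux/dma-buf.h>.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_RW = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2


def _iow(type_char, nr, size):
    """Generic Linux _IOW() encoding (assumed: x86, ARM, ARM64)."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# struct dma_buf_sync { __u64 flags; } is 8 bytes.
DMA_BUF_IOCTL_SYNC = _iow('b', 0, struct.calcsize("=Q"))


def _dma_buf_sync(dmabuf_fd, flags):
    # Pass the struct by value as an 8-byte buffer; the kernel reads it.
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, struct.pack("=Q", flags))


def dma_buf_sync_begin(dmabuf_fd, flags=DMA_BUF_SYNC_RW):
    """Call before the CPU touches the mmap()ed DMA-BUF."""
    _dma_buf_sync(dmabuf_fd, DMA_BUF_SYNC_START | flags)


def dma_buf_sync_end(dmabuf_fd, flags=DMA_BUF_SYNC_RW):
    """Call after the CPU has finished accessing the DMA-BUF."""
    _dma_buf_sync(dmabuf_fd, DMA_BUF_SYNC_END | flags)
```

Typical use is `dma_buf_sync_begin(dmabuf_fd)`, then reads and writes through an `mmap.mmap` of `dmabuf_fd`, then `dma_buf_sync_end(dmabuf_fd)`, in the same order as the C example.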
The CPU usually accesses a DMA buffer in main memory through its cache, while hardware accelerator logic accesses the data stored in the DMA buffer in main memory directly. In this situation, the coherency between data held in the CPU cache and data in main memory must be considered carefully.
When hardware assures the coherency, the CPU cache can be turned on without any additional handling. For example, ZYNQ provides the ACP (Accelerator Coherency Port), and coherency is maintained by hardware as long as the accelerator accesses the main memory via this port.
In this case, CPU accesses to the main memory are fast because the CPU cache is used as usual. To enable the CPU cache on the DMA buffer allocated by u-dma-buf, open u-dma-buf without specifying the `O_SYNC` flag.
```C
/* To enable CPU cache on the DMA buffer, */
/* open u-dma-buf without specifying the `O_SYNC` flag. */
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    /* Read/write access to the buffer */
    close(fd);
}
```
The manual cache management described in the following section is not necessary when hardware maintains the coherency.
If the `dma-coherent` property is specified in the device tree, it declares that coherency is guaranteed by hardware. In this case, the cache control described later in "2. Manual cache management with the CPU cache still being enabled" is not performed.
When hardware does not maintain coherency, another mechanism is necessary to keep data coherent between the CPU and the main memory. u-dma-buf supports two different ways of maintaining coherency: one is to disable the CPU cache, and the other is to perform manual cache flush/invalidation with the CPU cache enabled.
To disable the CPU cache for the allocated DMA buffer, specify the `O_SYNC` flag when opening u-dma-buf.
```C
/* To disable CPU cache on the DMA buffer, */
/* open u-dma-buf with the `O_SYNC` flag. */
if ((fd  = open("/dev/udmabuf0", O_RDWR | O_SYNC)) != -1) {
    buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    /* Read/write access to the buffer */
    close(fd);
}
```
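The two C snippets above (with and without `O_SYNC`) translate directly to Python's `os` and `mmap` modules. `map_udmabuf` below is a hypothetical helper for illustration; whether `O_SYNC` really disables the cache still depends on `sync_mode`, described next.

```python
import mmap
import os


def map_udmabuf(path, size, cached=True):
    """mmap() a u-dma-buf device file read/write and shared.

    cached=False adds O_SYNC to open(); the resulting cache behavior is
    governed by the device's sync_mode.
    """
    flags = os.O_RDWR
    if not cached:
        flags |= os.O_SYNC
    fd = os.open(path, flags)
    try:
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # Python's mmap keeps its own duplicate of the descriptor, so
        # closing ours is safe, mirroring close(fd) in the C examples.
        os.close(fd)
```

For example, `buf = map_udmabuf("/dev/udmabuf0", buf_size, cached=False)` corresponds to the `O_SYNC` snippet; call `buf.close()` when done.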
As listed below, `sync_mode` can be used to configure the cache behavior when the `O_SYNC` flag is present in `open()`:
- sync_mode=0: CPU cache is enabled regardless of the `O_SYNC` flag presence.
- sync_mode=1: If `O_SYNC` is specified, CPU cache is disabled. If `O_SYNC` is not specified, CPU cache is enabled.
- sync_mode=2: If `O_SYNC` is specified, CPU cache is disabled, but the CPU uses write-combining when writing data to the DMA buffer, which improves performance by combining multiple write accesses. If `O_SYNC` is not specified, CPU cache is enabled.
- sync_mode=3: If `O_SYNC` is specified, DMA coherency mode is used. If `O_SYNC` is not specified, CPU cache is enabled.
- sync_mode=4: CPU cache is enabled regardless of the `O_SYNC` flag presence.
- sync_mode=5: CPU cache is disabled regardless of the `O_SYNC` flag presence.
- sync_mode=6: CPU uses write-combining to write data to the DMA buffer regardless of `O_SYNC` presence.
- sync_mode=7: DMA coherency mode is used regardless of `O_SYNC` presence.
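The list above can be condensed into a lookup table, for example to label the test conditions in the measurements that follow. `cache_behavior` and its result strings are hypothetical names chosen for this sketch, not values reported by the driver.

```python
# Cache behavior per sync_mode, transcribed from the list above.
# Each entry is (behavior with O_SYNC, behavior without O_SYNC).
SYNC_MODE_TABLE = {
    0: ("cached", "cached"),
    1: ("noncached", "cached"),
    2: ("write-combine", "cached"),
    3: ("dma-coherent", "cached"),
    4: ("cached", "cached"),
    5: ("noncached", "noncached"),
    6: ("write-combine", "write-combine"),
    7: ("dma-coherent", "dma-coherent"),
}


def cache_behavior(sync_mode, o_sync):
    """Return the CPU cache behavior for a sync_mode and O_SYNC choice."""
    if sync_mode not in SYNC_MODE_TABLE:
        raise ValueError("sync_mode must be 0..7, got %r" % (sync_mode,))
    with_sync, without_sync = SYNC_MODE_TABLE[sync_mode]
    return with_sync if o_sync else without_sync
```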
As a practical example, the execution times of the sample program listed below were measured under several test conditions, as presented in the tables.
```C
#include <string.h>  /* memset() */

int check_buf(unsigned char* buf, unsigned int size)
{
    int m = 256;
    int n = 10;
    int i, k;
    int error_count = 0;
    while(--n > 0) {
      for(i = 0; i < size; i = i + m) {
        m = (i+256 < size) ? 256 : (size-i);
        for(k = 0; k < m; k++) {
          buf[i+k] = (k & 0xFF);
        }
        for(k = 0; k < m; k++) {
          if (buf[i+k] != (k & 0xFF)) {
            error_count++;
          }
        }
      }
    }
    return error_count;
}

int clear_buf(unsigned char* buf, unsigned int size)
{
    int n = 100;
    int error_count = 0;
    while(--n > 0) {
      memset((void*)buf, 0, size);
    }
    return error_count;
}
```
Table-1 The execution time of the sample program `check_buf`
| sync_mode | O_SYNC | 1MByte buffer | 5MByte buffer | 10MByte buffer |
|---|---|---|---|---|
| 0 | Not specified | 0.437[sec] | 2.171[sec] | 4.340[sec] |
| 0 | Specified | 0.437[sec] | 2.171[sec] | 4.340[sec] |
| 1 | Not specified | 0.434[sec] | 2.179[sec] | 4.337[sec] |
| 1 | Specified | 2.283[sec] | 11.414[sec] | 22.830[sec] |
| 2 | Not specified | 0.434[sec] | 2.169[sec] | 4.337[sec] |
| 2 | Specified | 1.616[sec] | 8.262[sec] | 16.562[sec] |
| 3 | Not specified | 0.434[sec] | 2.169[sec] | 4.337[sec] |
| 3 | Specified | 1.600[sec] | 8.391[sec] | 16.587[sec] |
| 4 | Not specified | 0.437[sec] | 2.171[sec] | 4.337[sec] |
| 4 | Specified | 0.437[sec] | 2.171[sec] | 4.337[sec] |
| 5 | Not specified | 2.283[sec] | 11.414[sec] | 22.809[sec] |
| 5 | Specified | 2.283[sec] | 11.414[sec] | 22.840[sec] |
| 6 | Not specified | 1.655[sec] | 8.391[sec] | 16.587[sec] |
| 6 | Specified | 1.655[sec] | 8.391[sec] | 16.587[sec] |
| 7 | Not specified | 1.655[sec] | 8.391[sec] | 16.587[sec] |
| 7 | Specified | 1.655[sec] | 8.391[sec] | 16.587[sec] |
Table-2 The execution time of the sample program `clear_buf`
| sync_mode | O_SYNC | 1MByte buffer | 5MByte buffer | 10MByte buffer |
|---|---|---|---|---|
| 0 | Not specified | 0.067[sec] | 0.359[sec] | 0.713[sec] |
| 0 | Specified | 0.067[sec] | 0.362[sec] | 0.716[sec] |
| 1 | Not specified | 0.067[sec] | 0.362[sec] | 0.718[sec] |
| 1 | Specified | 0.912[sec] | 4.563[sec] | 9.126[sec] |
| 2 | Not specified | 0.068[sec] | 0.360[sec] | 0.721[sec] |
| 2 | Specified | 0.063[sec] | 0.310[sec] | 0.620[sec] |
| 3 | Not specified | 0.068[sec] | 0.361[sec] | 0.715[sec] |
| 3 | Specified | 0.062[sec] | 0.310[sec] | 0.620[sec] |
| 4 | Not specified | 0.068[sec] | 0.360[sec] | 0.718[sec] |
| 4 | Specified | 0.067[sec] | 0.360[sec] | 0.710[sec] |
| 5 | Not specified | 0.913[sec] | 4.562[sec] | 9.126[sec] |
| 5 | Specified | 0.913[sec] | 4.562[sec] | 9.126[sec] |
| 6 | Not specified | 0.062[sec] | 0.310[sec] | 0.618[sec] |
| 6 | Specified | 0.062[sec] | 0.310[sec] | 0.619[sec] |
| 7 | Not specified | 0.062[sec] | 0.310[sec] | 0.620[sec] |
| 7 | Specified | 0.062[sec] | 0.310[sec] | 0.621[sec] |
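To put Table-1 and Table-2 in perspective, the short calculation below divides selected 10 MByte timings by a cached baseline. The choice of rows is ours for illustration: the baseline is sync_mode=0 without `O_SYNC`, "noncached" is sync_mode=1 with `O_SYNC`, and "write_combine" is sync_mode=2 with `O_SYNC`.

```python
# 10 MByte timings (seconds) copied from Table-1 and Table-2.
# "cached"        = sync_mode=0, O_SYNC not specified
# "noncached"     = sync_mode=1, O_SYNC specified
# "write_combine" = sync_mode=2, O_SYNC specified
CHECK_BUF = {"cached": 4.340, "noncached": 22.830, "write_combine": 16.562}
CLEAR_BUF = {"cached": 0.713, "noncached": 9.126, "write_combine": 0.620}


def ratio_to_cached(times, mode):
    """Execution time of `mode` relative to the cached baseline."""
    return times[mode] / times["cached"]


for name, times in (("check_buf", CHECK_BUF), ("clear_buf", CLEAR_BUF)):
    for mode in ("noncached", "write_combine"):
        print("%-9s %-13s x%.2f" % (name, mode, ratio_to_cached(times, mode)))
```

With these numbers, disabling the cache makes `check_buf` roughly 5.3 times and `clear_buf` roughly 12.8 times slower, while write-combining leaves `clear_buf` slightly faster than the cached baseline (a ratio of about 0.87) but still slows `check_buf`, which also reads the buffer back, by a factor of about 3.8.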
Note: on using the `O_SYNC` flag on ARM64
For v2.1.1 or earlier, udmabuf used `pgprot_writecombine()` on ARM64 for sync_mode=1 (noncached). The reason is that a bus error occurred in memset() in udmabuf_test.c when using `pgprot_noncached()`.
However, as reported in #28, using `pgprot_writecombine()` on ARM64 was found to cause a cache coherency problem.
Therefore, since v2.1.2, sync_mode=1 uses `pgprot_noncached()` instead. Cache coherency issues are very difficult to understand and debug, so rather than risk them, we decided it was better to accept a bus error, whose cause is easier to understand when it occurs.
Because of this change, you must pay attention to access alignment when using O_SYNC cache control on ARM64. You probably won't be able to use memset() on such a buffer.
If this causes a problem, either have cache coherency maintained by hardware, or use the method described below: manual cache management with the CPU cache still enabled.
As explained above, the CPU cache can be left turned on by opening u-dma-buf without specifying the `O_SYNC` flag. However, on ARM or ARM64, this is only possible if quirk-mmap is enabled. quirk-mmap is discussed in detail later.
```C
/* To enable CPU cache on the DMA buffer, */
/* open u-dma-buf without specifying the `O_SYNC` flag. */
if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
    buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    /* Read/write access to the buffer */
    close(fd);
}
```
To manually manage cache coherency, users need to follow these steps:

- Specify the memory area shared between the CPU and the accelerator via the `sync_offset` and `sync_size` device files. `sync_offset` accepts an offset in bytes from the start address of the allocated buffer, and the size of the shared memory area, also in bytes, should be set in `sync_size`.
- Set the data transfer direction in `sync_direction`. If the accelerator performs only read accesses to the memory area, `sync_direction` should be set to `1(=DMA_TO_DEVICE)`; if it performs only write accesses, it should be set to `2(=DMA_FROM_DEVICE)`.
- If the accelerator both reads and writes data from/to the memory area, `sync_direction` should be set to `0(=DMA_BIDIRECTIONAL)`.
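The three steps above amount to three sysfs writes. The sketch assumes the device files live under `/sys/class/u-dma-buf/<name>/`, the same path the Python class later in this document uses; `configure_sync_area` is a hypothetical helper, and `sysfs_root` is only there so it can be tried against a scratch directory.

```python
import os

# sync_direction values from the list above.
DMA_BIDIRECTIONAL = 0  # accelerator reads and writes the area
DMA_TO_DEVICE = 1      # accelerator only reads the area
DMA_FROM_DEVICE = 2    # accelerator only writes the area


def configure_sync_area(name, offset, size, direction,
                        sysfs_root="/sys/class/u-dma-buf"):
    """Write sync_offset, sync_size and sync_direction for one device."""
    if direction not in (DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE):
        raise ValueError("unsupported sync_direction: %r" % (direction,))
    base = os.path.join(sysfs_root, name)
    for attr, value in (("sync_offset", offset),
                        ("sync_size", size),
                        ("sync_direction", direction)):
        with open(os.path.join(base, attr), "w") as f:
            f.write(str(value))
```

For example, `configure_sync_area("udmabuf0", 0, 0x10000, DMA_TO_DEVICE)` describes a 64 KiB area at the start of the buffer that the accelerator only reads.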
Following the above configuration, `sync_for_cpu` and/or `sync_for_device` should be used to set the owner of the buffer area specified by the above-mentioned offset and size.
When the CPU accesses the buffer, '1' should be written to `sync_for_cpu` to set the CPU as the owner. Upon the write to `sync_for_cpu`, the CPU cache is invalidated if `sync_direction` is `2(=DMA_FROM_DEVICE)` or `0(=DMA_BIDIRECTIONAL)`. Once the CPU becomes the owner of the buffer, the accelerator should not access the buffer.
On the other hand, when the accelerator needs to access the buffer, '1' should be written to `sync_for_device` to change ownership of the buffer to the accelerator. Upon the write to `sync_for_device`, the CPU cache of the specified memory area is flushed, i.e. cached data is written back to the main memory.
However, if the `dma-coherent` property is specified in the device tree, the CPU cache is neither invalidated nor flushed.
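Putting the ownership rules together, a typical accelerator round trip writes `sync_for_device` before starting the hardware and `sync_for_cpu` after it finishes. The sketch below assumes the sync area has already been configured as described above and that the sysfs path used by the Python class later in this document applies; `run_dma_transaction`, `write_attr`, and the two callbacks are hypothetical.

```python
import os


def write_attr(name, attr, value, sysfs_root="/sys/class/u-dma-buf"):
    """Write one u-dma-buf sysfs attribute."""
    with open(os.path.join(sysfs_root, name, attr), "w") as f:
        f.write(str(value))


def run_dma_transaction(name, start_accelerator, wait_accelerator,
                        sysfs_root="/sys/class/u-dma-buf"):
    """Hand the configured sync area to the accelerator and back.

    start_accelerator and wait_accelerator are caller-supplied callables,
    placeholders for whatever actually drives the hardware.
    """
    # Give ownership to the accelerator; depending on sync_direction and
    # dma-coherent, the driver flushes the CPU cache for the area.
    write_attr(name, "sync_for_device", 1, sysfs_root)
    start_accelerator()
    wait_accelerator()
    # Take ownership back; depending on sync_direction and dma-coherent,
    # the driver invalidates the CPU cache so the CPU sees new data.
    write_attr(name, "sync_for_cpu", 1, sysfs_root)
```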
Note: What is quirk-mmap?
The Linux kernel mainline turns off caching in mmap() on architectures such as ARM and ARM64, where cache aliasing problems can occur.
However, u-dma-buf provides quirk-mmap to enable caching in cases where these architectures do not actually suffer from cache aliasing problems. quirk-mmap is u-dma-buf's own mmap mechanism and does not use dma_mmap_coherent() provided by the dma-mapping API in the Linux kernel. This may cause problems in some cases, so please be careful when using it.
The programming language "Python" provides an extension called "NumPy". This section explains how to perform the same operations as on an "ndarray" by mapping the DMA buffer allocated in the kernel with NumPy's `memmap` through u-dma-buf.
```python
import numpy as np

class Udmabuf:
    """A simple u-dma-buf class"""
    def __init__(self, name):
        self.name           = name
        self.device_name    = '/dev/%s' % self.name
        self.class_path     = '/sys/class/u-dma-buf/%s' % self.name
        self.phys_addr      = self.get_value('phys_addr', 16)
        self.buf_size       = self.get_value('size')
        self.sync_offset    = None
        self.sync_size      = None
        self.sync_direction = None

    def memmap(self, dtype, shape):
        self.item_size = np.dtype(dtype).itemsize
        self.array     = np.memmap(self.device_name, dtype=dtype, mode='r+', shape=shape)
        return self.array

    def get_value(self, name, radix=10):
        value = None
        with open(self.class_path + '/' + name) as f:
            for line in f:
                value = int(line, radix)
                break
        return value

    def set_value(self, name, value):
        # Use a context manager so the sysfs file is actually closed.
        with open(self.class_path + '/' + name, 'w') as f:
            f.write(str(value))

    def set_sync_area(self, direction=None, offset=None, size=None):
        if offset is None:
            self.sync_offset = self.get_value('sync_offset')
        else:
            self.set_value('sync_offset', offset)
            self.sync_offset = offset
        if size is None:
            self.sync_size = self.get_value('sync_size')
        else:
            self.set_value('sync_size', size)
            self.sync_size = size
        if direction is None:
            self.sync_direction = self.get_value('sync_direction')
        else:
            self.set_value('sync_direction', direction)
            self.sync_direction = direction

    def set_sync_to_device(self, offset=None, size=None):
        self.set_sync_area(1, offset, size)  # 1 = DMA_TO_DEVICE

    def set_sync_to_cpu(self, offset=None, size=None):
        self.set_sync_area(2, offset, size)  # 2 = DMA_FROM_DEVICE

    def set_sync_to_bidirectional(self, offset=None, size=None):
        self.set_sync_area(0, offset, size)  # 0 = DMA_BIDIRECTIONAL

    def sync_for_cpu(self):
        self.set_value('sync_for_cpu', 1)

    def sync_for_device(self):
        self.set_value('sync_for_device', 1)
```
```python
from udmabuf import Udmabuf
import numpy as np
import time

def test_1(a):
    for i in range(0, 9):
        a *= 0
        a += 0x31

if __name__ == '__main__':
    udmabuf    = Udmabuf('udmabuf0')
    test_dtype = np.uint8
    test_size  = udmabuf.buf_size // (np.dtype(test_dtype).itemsize)
    udmabuf.memmap(dtype=test_dtype, shape=(test_size))
    comparison = np.zeros(test_size, dtype=test_dtype)
    print("test_size : %d" % test_size)
    start = time.time()
    test_1(udmabuf.array)
    elapsed_time = time.time() - start
    print("udmabuf0 : elapsed_time:{0}".format(elapsed_time) + "[sec]")
    start = time.time()
    test_1(comparison)
    elapsed_time = time.time() - start
    print("comparison : elapsed_time:{0}".format(elapsed_time) + "[sec]")
    if np.array_equal(udmabuf.array, comparison):
        print("udmabuf0 == comparison : OK")
    else:
        print("udmabuf0 != comparison : NG")
```
Install u-dma-buf. In this example, an 8 MiB DMA buffer is reserved as "udmabuf0".
```
zynq# insmod u-dma-buf.ko udmabuf0=8388608
[ 1183.911189] u-dma-buf udmabuf0: driver version = 5.2.0
[ 1183.921238] u-dma-buf udmabuf0: major number   = 240
[ 1183.931275] u-dma-buf udmabuf0: minor number   = 0
[ 1183.936063] u-dma-buf udmabuf0: phys address   = 0x0000000041600000
[ 1183.942328] u-dma-buf udmabuf0: buffer size    = 8388608
[ 1183.947641] u-dma-buf u-dma-buf.0: driver installed.
```
Executing the script in the previous section gives the following results.
```
zynq# python3 udmabuf_test.py
test_size : 8388608
udmabuf0 : elapsed_time:0.11204075813293457[sec]
comparison : elapsed_time:0.11488151550292969[sec]
udmabuf0 == comparison : OK
```
The execution times for "udmabuf0" (the buffer area allocated in the kernel) and for the same operation on an ordinary ndarray (comparison) were almost the same. That is, the CPU cache appears to be effective for "udmabuf0" as well.
I confirmed the contents of "udmabuf0" after running this script.
```
zynq# dd if=/dev/udmabuf0 of=udmabuf0.bin bs=8388608
1+0 records in
1+0 records out
8388608 bytes (8.4 MB) copied, 0.151531 s, 55.4 MB/s
shell#
shell# od -t x1 udmabuf0.bin
0000000 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
*
40000000
```
After executing the script, it was confirmed that the result of the execution remains in the buffer. Just to be sure, let's check that NumPy can read it.
```
zynq# python
Python 2.7.9 (default, Aug 13 2016, 17:56:53)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.memmap('/dev/udmabuf0', dtype=np.uint8, mode='r+', shape=(8388608))
>>> a
memmap([49, 49, 49, ..., 49, 49, 49], dtype=uint8)
>>> a.itemsize
1
>>> a.size
8388608
>>>
```