DRM Driver uAPI
drm/i915 uAPI
uevents generated by i915 on its device node
- I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch
  event from the GPU L3 cache. Additional information supplied is ROW, BANK,
  SUBBANK, SLICE of the affected cacheline. Userspace should keep track of
  these events, and if a specific cache-line seems to have a persistent error,
  remap it with the L3 remapping tool supplied in intel-gpu-tools. The value
  supplied with the event is always 1.
- I915_ERROR_UEVENT - Generated upon error detection, currently only via
  hangcheck. The error detection event is a good indicator of when things
  began to go badly. The value supplied with the event is a 1 upon error
  detection, and a 0 upon reset completion, signifying no more error exists.
  NOTE: Disabling hangcheck or reset via module parameter will cause the
  related events to not be seen.
- I915_RESET_UEVENT - Event is generated just before an attempt to reset the
  GPU. The value supplied with the event is always 1. NOTE: Disabling reset
  via module parameter will cause this event to not be seen.
- struct i915_user_extension
Base class for defining a chain of extensions
Definition:
struct i915_user_extension {
    __u64 next_extension;
    __u32 name;
    __u32 flags;
    __u32 rsvd[4];
};

Members

next_extension
    Pointer to the next struct i915_user_extension, or zero if the end.

name
    Name of the extension.

    Note that the name here is just some integer.

    Also note that the name space for this is not global for the whole
    driver, but rather its scope/meaning is limited to the specific piece
    of uAPI which has embedded the struct i915_user_extension.

flags
    MBZ. All undefined bits must be zero.

rsvd
    MBZ. Reserved for future use; must be zero.
Description
Many interfaces need to grow over time. In most cases we can simply extend the
struct and have userspace pass in more data. Another option, as demonstrated by
Vulkan's approach to providing extensions for forward and backward
compatibility, is to use a list of optional structs to provide those extra
details.

The key advantage to using an extension chain is that it allows us to redefine
the interface more easily than an ever growing struct of increasing complexity,
and for large parts of that interface to be entirely optional. The downside is
more pointer chasing; chasing across the __user boundary with pointers
encapsulated inside u64.
Example chaining:
struct i915_user_extension ext3 {
    .next_extension = 0, // end
    .name = ...,
};
struct i915_user_extension ext2 {
    .next_extension = (uintptr_t)&ext3,
    .name = ...,
};
struct i915_user_extension ext1 {
    .next_extension = (uintptr_t)&ext2,
    .name = ...,
};
Typically the struct i915_user_extension would be embedded in some uAPI struct,
and in this case we would feed it the head of the chain (i.e ext1), which would
then apply all of the above extensions.
- enum drm_i915_gem_engine_class
uapi engine type enumeration
Constants
I915_ENGINE_CLASS_RENDER
    Render engines support instructions used for 3D, Compute (GPGPU), and
    programmable media workloads. These instructions fetch data and dispatch
    individual work items to threads that operate in parallel. The threads run
    small programs (called "kernels" or "shaders") on the GPU's execution
    units (EUs).

I915_ENGINE_CLASS_COPY
    Copy engines (also referred to as "blitters") support instructions that
    move blocks of data from one location in memory to another, or that fill a
    specified location of memory with fixed data. Copy engines can perform
    pre-defined logical or bitwise operations on the source, destination, or
    pattern data.

I915_ENGINE_CLASS_VIDEO
    Video engines (also referred to as "bit stream decode" (BSD) or "vdbox")
    support instructions that perform fixed-function media decode and encode.

I915_ENGINE_CLASS_VIDEO_ENHANCE
    Video enhancement engines (also referred to as "vebox") support
    instructions related to image enhancement.

I915_ENGINE_CLASS_COMPUTE
    Compute engines support a subset of the instructions available on render
    engines: compute engines support Compute (GPGPU) and programmable media
    workloads, but do not support the 3D pipeline.

I915_ENGINE_CLASS_INVALID
    Placeholder value to represent an invalid engine class assignment.
Description
Different engines serve different roles, and there may be more than one engine
serving each role. This enum provides a classification of the role of the
engine, which may be used when requesting operations to be performed on a
certain subset of engines, or for providing information about that group.
- struct i915_engine_class_instance
Engine class/instance identifier
Definition:
struct i915_engine_class_instance {
    __u16 engine_class;
#define I915_ENGINE_CLASS_INVALID_NONE -1
#define I915_ENGINE_CLASS_INVALID_VIRTUAL -2
    __u16 engine_instance;
};

Members

engine_class
    Engine class from enum drm_i915_gem_engine_class

engine_instance
    Engine instance.
Description
There may be more than one engine fulfilling any role within the system. Each
engine of a class is given a unique instance number and therefore any engine
can be specified by its class:instance tuple. APIs that allow access to any
engine in the system will use struct i915_engine_class_instance for this
identification.
perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
- struct drm_i915_getparam
Driver parameter query structure.
Definition:
struct drm_i915_getparam {
    __s32 param;
    int __user *value;
};

Members

param
    Driver parameter to query.

value
    Address of memory where queried value should be put.

    WARNING: Using pointers instead of fixed-size u64 means we need to write
    compat32 code. Don't repeat this mistake.
- type drm_i915_getparam_t
Driver parameter query structure. See struct drm_i915_getparam.
- struct drm_i915_gem_mmap_offset
Retrieve an offset so we can mmap this buffer object.
Definition:
struct drm_i915_gem_mmap_offset {
    __u32 handle;
    __u32 pad;
    __u64 offset;
    __u64 flags;
#define I915_MMAP_OFFSET_GTT 0
#define I915_MMAP_OFFSET_WC 1
#define I915_MMAP_OFFSET_WB 2
#define I915_MMAP_OFFSET_UC 3
#define I915_MMAP_OFFSET_FIXED 4
    __u64 extensions;
};

Members

handle
    Handle for the object being mapped.

pad
    Must be zero

offset
    The fake offset to use for subsequent mmap call

    This is a fixed-size type for 32/64 compatibility.

flags
    Flags for extended behaviour.

    It is mandatory that one of the MMAP_OFFSET types should be included:

    - I915_MMAP_OFFSET_GTT: Use mmap with the object bound to GTT.
      (Write-Combined)
    - I915_MMAP_OFFSET_WC: Use Write-Combined caching.
    - I915_MMAP_OFFSET_WB: Use Write-Back caching.
    - I915_MMAP_OFFSET_FIXED: Use object placement to determine caching.

    On devices with local memory I915_MMAP_OFFSET_FIXED is the only valid
    type. On devices without local memory, this caching mode is invalid.

    As caching mode when specifying I915_MMAP_OFFSET_FIXED, WC or WB will be
    used, depending on the object placement on creation. WB will be used when
    the object can only exist in system memory, WC otherwise.

extensions
    Zero-terminated chain of extensions.

    No current extensions defined; mbz.
Description
This struct is passed as argument to the DRM_IOCTL_I915_GEM_MMAP_OFFSET ioctl,
and is used to retrieve the fake offset to mmap an object specified by handle.

The legacy way of using DRM_IOCTL_I915_GEM_MMAP is removed on gen12+.
DRM_IOCTL_I915_GEM_MMAP_GTT is an older supported alias to this struct, but
will behave as setting the extensions to 0, and flags to
I915_MMAP_OFFSET_GTT.
- struct drm_i915_gem_set_domain
Adjust the object's write or read domain, in preparation for accessing the pages via some CPU domain.
Definition:
struct drm_i915_gem_set_domain {
    __u32 handle;
    __u32 read_domains;
    __u32 write_domain;
};

Members

handle
    Handle for the object.

read_domains
    New read domains.

write_domain
    New write domain.

    Note that having something in the write domain implies it's in the read
    domain, and only that read domain.
Description
Specifying a new write or read domain will flush the object out of the previous
domain (if required), before then updating the object's domain tracking with
the new domain.
Note this might involve waiting for the object first if it is still active onthe GPU.
Supported values for read_domains and write_domain:

- I915_GEM_DOMAIN_WC: Uncached write-combined domain
- I915_GEM_DOMAIN_CPU: CPU cache domain
- I915_GEM_DOMAIN_GTT: Mappable aperture domain

All other domains are rejected.

Note that for discrete, starting from DG1, this is no longer supported, and is
instead rejected. On such platforms the CPU domain is effectively static, where
we also only support a single drm_i915_gem_mmap_offset cache mode, which can't
be set explicitly and instead depends on the object placements, as per the
below.
Implicit caching rules, starting from DG1:
If any of the object placements (see struct
drm_i915_gem_create_ext_memory_regions) contain I915_MEMORY_CLASS_DEVICE then
the object will be allocated and mapped as write-combined only.

Everything else is always allocated and mapped as write-back, with the
guarantee that everything is also coherent with the GPU.
Note that this is likely to change in the future again, where we might need
more flexibility on future devices, so making this all explicit as part of a
new drm_i915_gem_create_ext extension is probable.
- struct drm_i915_gem_exec_fence
An input or output fence for the execbuf ioctl.
Definition:
struct drm_i915_gem_exec_fence {
    __u32 handle;
    __u32 flags;
#define I915_EXEC_FENCE_WAIT (1<<0)
#define I915_EXEC_FENCE_SIGNAL (1<<1)
#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
};

Members

handle
    User's handle for a drm_syncobj to wait on or signal.

flags
    Supported flags are:

    I915_EXEC_FENCE_WAIT:
        Wait for the input fence before request submission.

    I915_EXEC_FENCE_SIGNAL:
        Return request completion fence as output.
Description
The request will wait for input fence to signal before submission.
The returned output fence will be signaled after the completion of the request.
- struct drm_i915_gem_execbuffer_ext_timeline_fences
Timeline fences for execbuf ioctl.
Definition:
struct drm_i915_gem_execbuffer_ext_timeline_fences {
#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
    struct i915_user_extension base;
    __u64 fence_count;
    __u64 handles_ptr;
    __u64 values_ptr;
};

Members

base
    Extension link. See struct i915_user_extension.

fence_count
    Number of elements in the handles_ptr & values_ptr arrays.

handles_ptr
    Pointer to an array of struct drm_i915_gem_exec_fence of length
    fence_count.

values_ptr
    Pointer to an array of u64 values of length fence_count. Values must be 0
    for a binary drm_syncobj. A value of 0 for a timeline drm_syncobj is
    invalid as it turns a drm_syncobj into a binary one.
Description
This structure describes an array of drm_syncobj and associated points for
timeline variants of drm_syncobj. It is invalid to append this structure to the
execbuf if I915_EXEC_FENCE_ARRAY is set.
- struct drm_i915_gem_execbuffer2
Structure for DRM_I915_GEM_EXECBUFFER2 ioctl.
Definition:
struct drm_i915_gem_execbuffer2 {
    __u64 buffers_ptr;
    __u32 buffer_count;
    __u32 batch_start_offset;
    __u32 batch_len;
    __u32 DR1;
    __u32 DR4;
    __u32 num_cliprects;
    __u64 cliprects_ptr;
    __u64 flags;
#define I915_EXEC_RING_MASK (0x3f)
#define I915_EXEC_DEFAULT (0<<0)
#define I915_EXEC_RENDER (1<<0)
#define I915_EXEC_BSD (2<<0)
#define I915_EXEC_BLT (3<<0)
#define I915_EXEC_VEBOX (4<<0)
#define I915_EXEC_CONSTANTS_MASK (3<<6)
#define I915_EXEC_CONSTANTS_REL_GENERAL (0<<6)
#define I915_EXEC_CONSTANTS_ABSOLUTE (1<<6)
#define I915_EXEC_CONSTANTS_REL_SURFACE (2<<6)
#define I915_EXEC_GEN7_SOL_RESET (1<<8)
#define I915_EXEC_SECURE (1<<9)
#define I915_EXEC_IS_PINNED (1<<10)
#define I915_EXEC_NO_RELOC (1<<11)
#define I915_EXEC_HANDLE_LUT (1<<12)
#define I915_EXEC_BSD_SHIFT (13)
#define I915_EXEC_BSD_MASK (3 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_DEFAULT (0 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_RING1 (1 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_RING2 (2 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_RESOURCE_STREAMER (1<<15)
#define I915_EXEC_FENCE_IN (1<<16)
#define I915_EXEC_FENCE_OUT (1<<17)
#define I915_EXEC_BATCH_FIRST (1<<18)
#define I915_EXEC_FENCE_ARRAY (1<<19)
#define I915_EXEC_FENCE_SUBMIT (1 << 20)
#define I915_EXEC_USE_EXTENSIONS (1 << 21)
#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1))
    __u64 rsvd1;
    __u64 rsvd2;
};

Members

buffers_ptr
    Pointer to a list of gem_exec_object2 structs

buffer_count
    Number of elements in buffers_ptr array

batch_start_offset
    Offset in the batchbuffer to start execution from.

batch_len
    Length in bytes of the batch buffer, starting from the
    batch_start_offset. If 0, length is assumed to be the batch buffer object
    size.

DR1
    deprecated

DR4
    deprecated

num_cliprects
    See cliprects_ptr

cliprects_ptr
    Kernel clipping was a DRI1 misfeature.

    It is invalid to use this field if I915_EXEC_FENCE_ARRAY or
    I915_EXEC_USE_EXTENSIONS flags are not set.

    If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array of
    struct drm_i915_gem_exec_fence and num_cliprects is the length of the
    array.

    If I915_EXEC_USE_EXTENSIONS is set, then this is a pointer to a single
    struct i915_user_extension and num_cliprects is 0.

flags
    Execbuf flags

rsvd1
    Context id

rsvd2
    in and out sync_file file descriptors.

    When I915_EXEC_FENCE_IN or I915_EXEC_FENCE_SUBMIT flag is set, the lower
    32 bits of this field will have the in sync_file fd (input).

    When I915_EXEC_FENCE_OUT flag is set, the upper 32 bits of this field
    will have the out sync_file fd (output).
- struct drm_i915_gem_caching
Set or get the caching for given object handle.
Definition:
struct drm_i915_gem_caching {
    __u32 handle;
#define I915_CACHING_NONE 0
#define I915_CACHING_CACHED 1
#define I915_CACHING_DISPLAY 2
    __u32 caching;
};

Members

handle
    Handle of the buffer to set/get the caching level.

caching
    The GTT caching level to apply or possible return value.

    The supported caching values:

    I915_CACHING_NONE:
        GPU access is not coherent with CPU caches. Default for machines
        without an LLC. This means manual flushing might be needed, if we
        want GPU access to be coherent.

    I915_CACHING_CACHED:
        GPU access is coherent with CPU caches and furthermore the data is
        cached in last-level caches shared between CPU cores and the GPU GT.

    I915_CACHING_DISPLAY:
        Special GPU caching mode which is coherent with the scanout engines.
        Transparently falls back to I915_CACHING_NONE on platforms where no
        special cache mode (like write-through or gfdt flushing) is
        available. The kernel automatically sets this mode when using a
        buffer as a scanout target. Userspace can manually set this mode to
        avoid a costly stall and clflush in the hotpath of drawing the first
        frame.
Description
Allow userspace to control the GTT caching bits for a given object when the
object is later mapped through the ppGTT (or GGTT on older platforms lacking
ppGTT support, or if the object is used for scanout). Note that this might
require unbinding the object from the GTT first, if its current caching value
doesn't match.

Note that this all changes on discrete platforms, starting from DG1, the
set/get caching is no longer supported, and is now rejected. Instead the CPU
caching attributes (WB vs WC) will become an immutable creation time property
for the object, along with the GTT caching level. For now we don't expose any
new uAPI for this, instead on DG1 this is all implicit, although this largely
shouldn't matter since DG1 is coherent by default (without any way of
controlling it).
Implicit caching rules, starting from DG1:
If any of the object placements (see struct
drm_i915_gem_create_ext_memory_regions) contain I915_MEMORY_CLASS_DEVICE then
the object will be allocated and mapped as write-combined only.

Everything else is always allocated and mapped as write-back, with the
guarantee that everything is also coherent with the GPU.
Note that this is likely to change in the future again, where we might need
more flexibility on future devices, so making this all explicit as part of a
new drm_i915_gem_create_ext extension is probable.
Side note: Part of the reason for this is that changing the at-allocation-time
CPU caching attributes for the pages might be required (and is expensive) if we
need to then CPU map the pages later with different caching attributes. This
inconsistent caching behaviour, while supported on x86, is not universally
supported on other architectures. So for simplicity we opt for setting
everything at creation time, whilst also making it immutable, on discrete
platforms.
- struct drm_i915_gem_context_create_ext
Structure for creating contexts.
Definition:
struct drm_i915_gem_context_create_ext {
    __u32 ctx_id;
    __u32 flags;
#define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS (1u << 0)
#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE (1u << 1)
#define I915_CONTEXT_CREATE_FLAGS_UNKNOWN (-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
    __u64 extensions;
#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
#define I915_CONTEXT_CREATE_EXT_CLONE 1
};

Members

ctx_id
    Id of the created context (output)

flags
    Supported flags are:

    I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS:
        Extensions may be appended to this structure and driver must check
        for those. See extensions.

    I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE:
        Created context will have single timeline.

extensions
    Zero-terminated chain of extensions.

    I915_CONTEXT_CREATE_EXT_SETPARAM:
        Context parameter to set or query during context creation. See
        struct drm_i915_gem_context_create_ext_setparam.

    I915_CONTEXT_CREATE_EXT_CLONE:
        This extension has been removed. On the off chance someone somewhere
        has attempted to use it, never re-use this extension number.
- struct drm_i915_gem_context_param
Context parameter to set or query.
Definition:
struct drm_i915_gem_context_param {
    __u32 ctx_id;
    __u32 size;
    __u64 param;
#define I915_CONTEXT_PARAM_BAN_PERIOD 0x1
#define I915_CONTEXT_PARAM_NO_ZEROMAP 0x2
#define I915_CONTEXT_PARAM_GTT_SIZE 0x3
#define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE 0x4
#define I915_CONTEXT_PARAM_BANNABLE 0x5
#define I915_CONTEXT_PARAM_PRIORITY 0x6
#define I915_CONTEXT_MAX_USER_PRIORITY 1023
#define I915_CONTEXT_DEFAULT_PRIORITY 0
#define I915_CONTEXT_MIN_USER_PRIORITY -1023
#define I915_CONTEXT_PARAM_SSEU 0x7
#define I915_CONTEXT_PARAM_RECOVERABLE 0x8
#define I915_CONTEXT_PARAM_VM 0x9
#define I915_CONTEXT_PARAM_ENGINES 0xa
#define I915_CONTEXT_PARAM_PERSISTENCE 0xb
#define I915_CONTEXT_PARAM_RINGSIZE 0xc
#define I915_CONTEXT_PARAM_PROTECTED_CONTENT 0xd
#define I915_CONTEXT_PARAM_LOW_LATENCY 0xe
#define I915_CONTEXT_PARAM_CONTEXT_IMAGE 0xf
    __u64 value;
};

Members

ctx_id
    Context id

size
    Size of the parameter value

param
    Parameter to set or query

value
    Context parameter value to be set or queried
Virtual Engine uAPI
Virtual engine is a concept where userspace is able to configure a set of
physical engines, submit a batch buffer, and let the driver execute it on any
engine from the set as it sees fit.

This is primarily useful on parts which have multiple instances of the same
class of engine, like for example GT3+ Skylake parts with their two VCS
engines.

For instance userspace can enumerate all engines of a certain class using the
previously described Engine Discovery uAPI. After that userspace can create a
GEM context with a placeholder slot for the virtual engine (using
I915_ENGINE_CLASS_INVALID and I915_ENGINE_CLASS_INVALID_NONE for class and
instance respectively) and finally using the
I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE extension place a virtual engine in the
same reserved slot.
Example of creating a virtual engine and submitting a batch buffer to it:
I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(virtual, 2) = {
    .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
    .engine_index = 0, // Place this virtual engine into engine map slot 0
    .num_siblings = 2,
    .engines = { { I915_ENGINE_CLASS_VIDEO, 0 },
                 { I915_ENGINE_CLASS_VIDEO, 1 }, },
};
I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 1) = {
    .engines = { { I915_ENGINE_CLASS_INVALID,
                   I915_ENGINE_CLASS_INVALID_NONE } },
    .extensions = to_user_pointer(&virtual), // Chains after load_balance extension
};
struct drm_i915_gem_context_create_ext_setparam p_engines = {
    .base = {
        .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
    },
    .param = {
        .param = I915_CONTEXT_PARAM_ENGINES,
        .value = to_user_pointer(&engines),
        .size = sizeof(engines),
    },
};
struct drm_i915_gem_context_create_ext create = {
    .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
    .extensions = to_user_pointer(&p_engines),
};

ctx_id = gem_context_create_ext(drm_fd, &create);

// Now we have created a GEM context with its engine map containing a
// single virtual engine. Submissions to this slot can go either to
// vcs0 or vcs1, depending on the load balancing algorithm used inside
// the driver. The load balancing is dynamic from one batch buffer to
// another and transparent to userspace.

...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 0; // Submits to index 0 which is the virtual engine
gem_execbuf(drm_fd, &execbuf);
- struct i915_context_engines_parallel_submit
Configure engine for parallel submission.
Definition:
struct i915_context_engines_parallel_submit {
    struct i915_user_extension base;
    __u16 engine_index;
    __u16 width;
    __u16 num_siblings;
    __u16 mbz16;
    __u64 flags;
    __u64 mbz64[3];
    struct i915_engine_class_instance engines[];
};

Members

base
    Base user extension.

engine_index
    Slot for parallel engine

width
    Number of contexts per parallel engine or in other words the number of
    batches in each submission

num_siblings
    Number of siblings per context or in other words the number of possible
    placements for each submission

mbz16
    Reserved for future use; must be zero

flags
    All undefined flags must be zero; currently no flags are defined

mbz64
    Reserved for future use; must be zero

engines
    2-d array of engine instances to configure parallel engine

    length = width (i) * num_siblings (j)
    index = j + i * num_siblings
Description
Setup a slot in the context engine map to allow multiple BBs to be submitted in
a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU in
parallel. Multiple hardware contexts are created internally in the i915 to run
these BBs. Once a slot is configured for N BBs only N BBs can be submitted in
each execbuf IOCTL and this is implicit behavior, e.g. the user doesn't tell
the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how many BBs there
are based on the slot's configuration. The N BBs are the last N buffer objects
or first N if I915_EXEC_BATCH_FIRST is set.

The default placement behavior is to create implicit bonds between each context
if each context maps to more than 1 physical engine (e.g. context is a virtual
engine). Also we only allow contexts of same engine class and these contexts
must be in logically contiguous order. Examples of the placement behavior are
described below. Lastly, the default is to not allow BBs to be preempted
mid-batch; rather, coordinated preemption points are inserted on all hardware
contexts between each set of BBs. Flags could be added in the future to change
both of these default behaviors.
Returns -EINVAL if hardware context placement configuration is invalid or if
the placement configuration isn't supported on the platform / submission
interface.

Returns -ENODEV if extension isn't supported on the platform / submission
interface.
Examples syntax:
    CS[X] = generic engine of same class, logical instance X
    INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE

Example 1 pseudo code:
    set_engines(INVALID)
    set_parallel(engine_index=0, width=2, num_siblings=1,
                 engines=CS[0],CS[1])

    Results in the following valid placement:
    CS[0], CS[1]

Example 2 pseudo code:
    set_engines(INVALID)
    set_parallel(engine_index=0, width=2, num_siblings=2,
                 engines=CS[0],CS[2],CS[1],CS[3])

    Results in the following valid placements:
    CS[0], CS[1]
    CS[2], CS[3]

    This can be thought of as two virtual engines, each containing two
    engines thereby making a 2D array. However, there are bonds tying the
    entries together and placing restrictions on how they can be scheduled.
    Specifically, the scheduler can choose only vertical columns from the 2D
    array. That is, CS[0] is bonded to CS[1] and CS[2] to CS[3]. So if the
    scheduler wants to submit to CS[0], it must also choose CS[1] and vice
    versa. Same for CS[2] requires also using CS[3].

    VE[0] = CS[0], CS[2]
    VE[1] = CS[1], CS[3]

Example 3 pseudo code:
    set_engines(INVALID)
    set_parallel(engine_index=0, width=2, num_siblings=2,
                 engines=CS[0],CS[1],CS[1],CS[3])

    Results in the following valid and invalid placements:
    CS[0], CS[1]
    CS[1], CS[3] - Not logically contiguous, return -EINVAL
Context Engine Map uAPI
Context engine map is a new way of addressing engines when submitting
batch buffers, replacing the existing way of using identifiers like
I915_EXEC_BLT inside the flags field of struct drm_i915_gem_execbuffer2.

To use it, created GEM contexts need to be configured with a list of engines
the user is intending to submit to. This is accomplished using the
I915_CONTEXT_PARAM_ENGINES parameter and struct i915_context_param_engines.

For such contexts the I915_EXEC_RING_MASK field becomes an index into the
configured map.
Example of creating such context and submitting against it:
I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
    .engines = { { I915_ENGINE_CLASS_RENDER, 0 },
                 { I915_ENGINE_CLASS_COPY, 0 } }
};
struct drm_i915_gem_context_create_ext_setparam p_engines = {
    .base = {
        .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
    },
    .param = {
        .param = I915_CONTEXT_PARAM_ENGINES,
        .value = to_user_pointer(&engines),
        .size = sizeof(engines),
    },
};
struct drm_i915_gem_context_create_ext create = {
    .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
    .extensions = to_user_pointer(&p_engines),
};

ctx_id = gem_context_create_ext(drm_fd, &create);

// We have now created a GEM context with two engines in the map:
// Index 0 points to rcs0 while index 1 points to bcs0. Other engines
// will not be accessible from this context.

...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 0; // Submits to index 0, which is rcs0 for this context
gem_execbuf(drm_fd, &execbuf);
...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 1; // Submits to index 1, which is bcs0 for this context
gem_execbuf(drm_fd, &execbuf);
- struct drm_i915_gem_context_create_ext_setparam
Context parameter to set or query during context creation.
Definition:
struct drm_i915_gem_context_create_ext_setparam {
    struct i915_user_extension base;
    struct drm_i915_gem_context_param param;
};

Members

base
    Extension link. See struct i915_user_extension.

param
    Context parameter to set or query. See struct drm_i915_gem_context_param.
- struct drm_i915_gem_vm_control
Structure to create or destroy VM.
Definition:
struct drm_i915_gem_vm_control {
    __u64 extensions;
    __u32 flags;
    __u32 vm_id;
};

Members

extensions
    Zero-terminated chain of extensions.

flags
    Reserved for future usage, currently MBZ

vm_id
    Id of the VM created or to be destroyed
Description
DRM_I915_GEM_VM_CREATE -

Create a new virtual memory address space (ppGTT) for use within a context on
the same file. Extensions can be provided to configure exactly how the address
space is setup upon creation.

The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
returned in the outparam id.

An extension chain may be provided, starting with extensions, and terminated by
the next_extension being 0. Currently, no extensions are defined.

DRM_I915_GEM_VM_DESTROY -

Destroys a previously created VM id, specified in vm_id.

No extensions or flags are allowed currently, and so must be zero.
- struct drm_i915_gem_userptr
Create GEM object from user allocated memory.
Definition:
struct drm_i915_gem_userptr {
    __u64 user_ptr;
    __u64 user_size;
    __u32 flags;
#define I915_USERPTR_READ_ONLY 0x1
#define I915_USERPTR_PROBE 0x2
#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
    __u32 handle;
};

Members

user_ptr
    The pointer to the allocated memory.

    Needs to be aligned to PAGE_SIZE.

user_size
    The size in bytes for the allocated memory. This will also become the
    object size.

    Needs to be aligned to PAGE_SIZE, and should be at least PAGE_SIZE, or
    larger.

flags
    Supported flags:

    I915_USERPTR_READ_ONLY:
        Mark the object as readonly, this also means GPU access can only be
        readonly. This is only supported on HW which supports readonly
        access through the GTT. If the HW can't support readonly access, an
        error is returned.

    I915_USERPTR_PROBE:
        Probe the provided user_ptr range and validate that the user_ptr is
        indeed pointing to normal memory and that the range is also valid.
        For example if some garbage address is given to the kernel, then
        this should complain.

        Returns -EFAULT if the probe failed.

        Note that this doesn't populate the backing pages, and also doesn't
        guarantee that the object will remain valid when the object is
        eventually used.

        The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
        returns a non-zero value.

    I915_USERPTR_UNSYNCHRONIZED:
        NOT USED. Setting this flag will result in an error.

handle
    Returned handle for the object.

    Object handles are nonzero.
Description
Userptr objects have several restrictions on what ioctls can be used with theobject handle.
- struct drm_i915_perf_oa_config
Definition:
struct drm_i915_perf_oa_config {
    char uuid[36];
    __u32 n_mux_regs;
    __u32 n_boolean_regs;
    __u32 n_flex_regs;
    __u64 mux_regs_ptr;
    __u64 boolean_regs_ptr;
    __u64 flex_regs_ptr;
};

Members

uuid
    String formatted like "%08x-%04x-%04x-%04x-%012x"

n_mux_regs
    Number of mux regs in mux_regs_ptr.

n_boolean_regs
    Number of boolean regs in boolean_regs_ptr.

n_flex_regs
    Number of flex regs in flex_regs_ptr.

mux_regs_ptr
    Pointer to tuples of u32 values (register address, value) for mux
    registers. Expected length of buffer is (2 * sizeof(u32) * n_mux_regs).

boolean_regs_ptr
    Pointer to tuples of u32 values (register address, value) for boolean
    registers. Expected length of buffer is (2 * sizeof(u32) *
    n_boolean_regs).

flex_regs_ptr
    Pointer to tuples of u32 values (register address, value) for flex
    registers. Expected length of buffer is (2 * sizeof(u32) * n_flex_regs).
Description
Structure to upload perf dynamic configuration into the kernel.
- struct drm_i915_query_item
An individual query for the kernel to process.
Definition:
struct drm_i915_query_item {
    __u64 query_id;
#define DRM_I915_QUERY_TOPOLOGY_INFO 1
#define DRM_I915_QUERY_ENGINE_INFO 2
#define DRM_I915_QUERY_PERF_CONFIG 3
#define DRM_I915_QUERY_MEMORY_REGIONS 4
#define DRM_I915_QUERY_HWCONFIG_BLOB 5
#define DRM_I915_QUERY_GEOMETRY_SUBSLICES 6
#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION 7
    __s32 length;
    __u32 flags;
#define DRM_I915_QUERY_PERF_CONFIG_LIST 1
#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2
#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID 3
    __u64 data_ptr;
};

Members

query_id
    The id for this query. Currently accepted query IDs are:

    - DRM_I915_QUERY_TOPOLOGY_INFO (see struct drm_i915_query_topology_info)
    - DRM_I915_QUERY_ENGINE_INFO (see struct drm_i915_engine_info)
    - DRM_I915_QUERY_PERF_CONFIG (see struct drm_i915_query_perf_config)
    - DRM_I915_QUERY_MEMORY_REGIONS (see struct drm_i915_query_memory_regions)
    - DRM_I915_QUERY_HWCONFIG_BLOB (see GuC HWCONFIG blob uAPI)
    - DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct drm_i915_query_topology_info)
    - DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct drm_i915_query_guc_submission_version)

length
    When set to zero by userspace, this is filled with the size of the data
    to be written at the data_ptr pointer. The kernel sets this value to a
    negative value to signal an error on a particular query item.

flags
    When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0.

    When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the
    following:

    - DRM_I915_QUERY_PERF_CONFIG_LIST
    - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID
    - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID

    When query_id == DRM_I915_QUERY_GEOMETRY_SUBSLICES, must contain a
    struct i915_engine_class_instance that references a render engine.

data_ptr
    Data will be written at the location pointed by data_ptr when the value
    of length matches the length of the data to be written by the kernel.
Description
The behaviour is determined by the query_id. Note that exactly what data_ptr is
also depends on the specific query_id.
- struct drm_i915_query
Supply an array of struct drm_i915_query_item for the kernel to fill out.
Definition:
struct drm_i915_query {
    __u32 num_items;
    __u32 flags;
    __u64 items_ptr;
};

Members

num_items
    The number of elements in the items_ptr array

flags
    Unused for now. Must be cleared to zero.

items_ptr
    Pointer to an array of struct drm_i915_query_item. The number of array
    elements is num_items.
Description
Note that this is generally a two step process for each struct
drm_i915_query_item in the array:

1. Call the DRM_IOCTL_I915_QUERY, giving it our array of struct
   drm_i915_query_item, with drm_i915_query_item.length set to zero. The
   kernel will then fill in the size, in bytes, which tells userspace how
   much memory it needs to allocate for the blob (say for an array of
   properties).

2. Next we call DRM_IOCTL_I915_QUERY again, this time with the
   drm_i915_query_item.data_ptr equal to our newly allocated blob. Note that
   the drm_i915_query_item.length should still be the same as what the
   kernel previously set. At this point the kernel can fill in the blob.
Note that for some query items it can make sense for userspace to just pass in a buffer/blob equal to or larger than the required size. In this case only a single ioctl call is needed. For some smaller query items this can work quite well.
- struct drm_i915_query_topology_info¶
Definition:
struct drm_i915_query_topology_info {
    __u16 flags;
    __u16 max_slices;
    __u16 max_subslices;
    __u16 max_eus_per_subslice;
    __u16 subslice_offset;
    __u16 subslice_stride;
    __u16 eu_offset;
    __u16 eu_stride;
    __u8 data[];
};

Members
flags
Unused for now. Must be cleared to zero.

max_slices
The number of bits used to express the slice mask.

max_subslices
The number of bits used to express the subslice mask.

max_eus_per_subslice
The number of bits in the EU mask that correspond to a single subslice's EUs.

subslice_offset
Offset in data[] at which the subslice masks are stored.

subslice_stride
Stride at which each of the subslice masks for each slice are stored.

eu_offset
Offset in data[] at which the EU masks are stored.

eu_stride
Stride at which each of the EU masks for each subslice are stored.
data
Contains 3 pieces of information:

The slice mask, with one bit per slice telling whether a slice is available. The availability of slice X can be queried with the following formula:

(data[X / 8] >> (X % 8)) & 1
Starting with Xe_HP platforms, Intel hardware no longer has traditional slices, so i915 will always report a single slice (hardcoded slicemask = 0x1) which contains all of the platform's subslices. I.e., the mask here does not reflect any of the newer hardware concepts such as "gslices" or "cslices", since userspace is capable of inferring those from the subslice mask.
The subslice mask for each slice, with one bit per subslice telling whether a subslice is available. Starting with Gen12 we use the term "subslice" to refer to what the hardware documentation describes as a "dual-subslice". The availability of subslice Y in slice X can be queried with the following formula:

(data[subslice_offset + X * subslice_stride + Y / 8] >> (Y % 8)) & 1
The EU mask for each subslice in each slice, with one bit per EU telling whether an EU is available. The availability of EU Z in subslice Y in slice X can be queried with the following formula:

(data[eu_offset + (X * max_subslices + Y) * eu_stride + Z / 8] >> (Z % 8)) & 1
Description
Describes slice/subslice/EU information queried by DRM_I915_QUERY_TOPOLOGY_INFO.
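The three availability formulas above can be wrapped in small helpers. The following is a minimal sketch, assuming the offset and stride values have already been read out of a kernel-filled struct drm_i915_query_topology_info; the helper names are illustrative, not part of the uAPI:

```c
#include <assert.h>
#include <stdint.h>

/* Availability checks mirroring the formulas above. The offset/stride
 * parameters would normally come from a kernel-filled
 * struct drm_i915_query_topology_info. */
static int slice_available(const uint8_t *data, int x)
{
    return (data[x / 8] >> (x % 8)) & 1;
}

static int subslice_available(const uint8_t *data, uint16_t subslice_offset,
                              uint16_t subslice_stride, int x, int y)
{
    return (data[subslice_offset + x * subslice_stride + y / 8] >> (y % 8)) & 1;
}

static int eu_available(const uint8_t *data, uint16_t eu_offset,
                        uint16_t max_subslices, uint16_t eu_stride,
                        int x, int y, int z)
{
    return (data[eu_offset + (x * max_subslices + y) * eu_stride + z / 8]
            >> (z % 8)) & 1;
}
```

Note that the helpers only index into the data[] blob; the caller remains responsible for validating X/Y/Z against max_slices, max_subslices and max_eus_per_subslice.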
Engine Discovery uAPI
Engine discovery uAPI is a way of enumerating physical engines present in a GPU associated with an open i915 DRM file descriptor. This supersedes the old way of using DRM_IOCTL_I915_GETPARAM and engine identifiers like I915_PARAM_HAS_BLT.

The need for this interface came starting with Icelake and newer GPUs, which started to establish a pattern of having multiple engines of the same class, where not all instances were always completely functionally equivalent.

Entry point for this uapi is DRM_IOCTL_I915_QUERY with the DRM_I915_QUERY_ENGINE_INFO as the queried item id.
Example for getting the list of engines:
struct drm_i915_query_engine_info *info;
struct drm_i915_query_item item = {
    .query_id = DRM_I915_QUERY_ENGINE_INFO,
};
struct drm_i915_query query = {
    .num_items = 1,
    .items_ptr = (uintptr_t)&item,
};
int err, i;

// First query the size of the blob we need, this needs to be large
// enough to hold our array of engines. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
//
// Alternatively a large buffer can be allocated straight away enabling
// querying in one pass, in which case item.length should contain the
// length of the provided buffer.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with info on all engines.
item.data_ptr = (uintptr_t)info;
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

// We can now access each engine in the array
for (i = 0; i < info->num_engines; i++) {
    struct drm_i915_engine_info einfo = info->engines[i];
    u16 class = einfo.engine.class;
    u16 instance = einfo.engine.instance;
    ....
}

free(info);
Each of the enumerated engines, apart from being defined by its class and instance (see struct i915_engine_class_instance), can also have flags and capabilities defined as documented in i915_drm.h.
For instance video engines which support HEVC encoding will have theI915_VIDEO_CLASS_CAPABILITY_HEVC capability bit set.
Engine discovery only fully comes into its own when combined with the new way of addressing engines when submitting batch buffers using contexts with engine maps configured.
- struct drm_i915_engine_info¶
Definition:
struct drm_i915_engine_info {
    struct i915_engine_class_instance engine;
    __u32 rsvd0;
    __u64 flags;
#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE (1 << 0)
    __u64 capabilities;
#define I915_VIDEO_CLASS_CAPABILITY_HEVC (1 << 0)
#define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC (1 << 1)
    __u16 logical_instance;
    __u16 rsvd1[3];
    __u64 rsvd2[3];
};

Members
engine
Engine class and instance.

rsvd0
Reserved field.

flags
Engine flags.

capabilities
Capabilities of this engine.

logical_instance
Logical instance of engine.

rsvd1
Reserved fields.

rsvd2
Reserved fields.
Description
Describes one engine and its capabilities as known to the driver.
- struct drm_i915_query_engine_info¶
Definition:
struct drm_i915_query_engine_info {
    __u32 num_engines;
    __u32 rsvd[3];
    struct drm_i915_engine_info engines[];
};

Members
num_engines
Number of struct drm_i915_engine_info structs following.

rsvd
MBZ

engines
Marker for drm_i915_engine_info structures.
Description
Engine info query enumerates all engines known to the driver by filling in an array of struct drm_i915_engine_info structures.
- struct drm_i915_query_perf_config¶
Definition:
struct drm_i915_query_perf_config {
    union {
        __u64 n_configs;
        __u64 config;
        char uuid[36];
    };
    __u32 flags;
    __u8 data[];
};

Members
{unnamed_union}
anonymous

n_configs
When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 sets this field to the number of configurations available.

config
When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID, i915 will use the value in this field as configuration identifier to decide what data to write into config_ptr.

uuid
When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID, i915 will use the value in this field as configuration identifier to decide what data to write into config_ptr.

String formatted like "%08x-%04x-%04x-%04x-%012x"

flags
Unused for now. Must be cleared to zero.

data
When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 will write an array of __u64 of configuration identifiers.

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA, i915 will write a struct drm_i915_perf_oa_config. If the following fields of struct drm_i915_perf_oa_config are not set to 0, i915 will write into the associated pointers the values submitted when the configuration was created:
Description
Data written by the kernel with query DRM_I915_QUERY_PERF_CONFIG and DRM_I915_QUERY_GEOMETRY_SUBSLICES.
- enum drm_i915_gem_memory_class¶
Supported memory classes
Constants
I915_MEMORY_CLASS_SYSTEM
System memory

I915_MEMORY_CLASS_DEVICE
Device local-memory
- struct drm_i915_gem_memory_class_instance¶
Identify particular memory region
Definition:
struct drm_i915_gem_memory_class_instance {
    __u16 memory_class;
    __u16 memory_instance;
};

Members

memory_class
See enum drm_i915_gem_memory_class.

memory_instance
Which instance.
- struct drm_i915_memory_region_info¶
Describes one region as known to the driver.
Definition:
struct drm_i915_memory_region_info {
    struct drm_i915_gem_memory_class_instance region;
    __u32 rsvd0;
    __u64 probed_size;
    __u64 unallocated_size;
    union {
        __u64 rsvd1[8];
        struct {
            __u64 probed_cpu_visible_size;
            __u64 unallocated_cpu_visible_size;
        };
    };
};

Members
region
The class:instance pair encoding.

rsvd0
MBZ

probed_size
Memory probed by the driver.
Note that it should not be possible to ever encounter a zero valuehere, also note that no current region type will ever return -1 here.Although for future region types, this might be a possibility. Thesame applies to the other size fields.
unallocated_size
Estimate of memory remaining.

Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this (or if this is an older kernel) the value here will always equal the probed_size. Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).
{unnamed_union}
anonymous

rsvd1
MBZ

{unnamed_struct}
anonymous

probed_cpu_visible_size
Memory probed by the driver that is CPU accessible.
This will always be <= probed_size, and the remainder (if there is any) will not be CPU accessible.

On systems without small BAR, the probed_size will always equal the probed_cpu_visible_size, since all of it will be CPU accessible.

Note this is only tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).
Note that if the value returned here is zero, then this must be an old kernel which lacks the relevant small-bar uAPI support (including I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on such systems we should never actually end up with a small BAR configuration, assuming we are able to load the kernel module. Hence it should be safe to treat this the same as when probed_cpu_visible_size == probed_size.
unallocated_cpu_visible_size
Estimate of CPU visible memory remaining.

Note this is only tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_cpu_visible_size).

Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this the value here will always equal the probed_cpu_visible_size. Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will also always equal the probed_cpu_visible_size).

If this is an older kernel the value here will be zero, see also probed_cpu_visible_size.
Description
Note this is using both struct drm_i915_query_item and struct drm_i915_query. For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS at drm_i915_query_item.query_id.
- struct drm_i915_query_memory_regions¶
Definition:
struct drm_i915_query_memory_regions {
    __u32 num_regions;
    __u32 rsvd[3];
    struct drm_i915_memory_region_info regions[];
};

Members
num_regions
Number of supported regions.

rsvd
MBZ

regions
Info about each supported region.
Description
The region info query enumerates all regions known to the driver by filling in an array of struct drm_i915_memory_region_info structures.
Example for getting the list of supported regions:
struct drm_i915_query_memory_regions *info;
struct drm_i915_query_item item = {
    .query_id = DRM_I915_QUERY_MEMORY_REGIONS,
};
struct drm_i915_query query = {
    .num_items = 1,
    .items_ptr = (uintptr_t)&item,
};
int err, i;

// First query the size of the blob we need, this needs to be large
// enough to hold our array of regions. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with all the region info.
item.data_ptr = (uintptr_t)info;
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

// We can now access each region in the array
for (i = 0; i < info->num_regions; i++) {
    struct drm_i915_memory_region_info mr = info->regions[i];
    u16 class = mr.region.memory_class;
    u16 instance = mr.region.memory_instance;
    ....
}

free(info);
- struct drm_i915_query_guc_submission_version¶
query GuC submission interface version
Definition:
struct drm_i915_query_guc_submission_version {
    __u32 branch;
    __u32 major;
    __u32 minor;
    __u32 patch;
};

Members
branch
Firmware branch version.

major
Firmware major version.

minor
Firmware minor version.

patch
Firmware patch version.
GuC HWCONFIG blob uAPI
The GuC produces a blob with information about the current device. i915 reads this blob from GuC and makes it available via this uAPI.
The format and meaning of the blob content are documented in theProgrammer’s Reference Manual.
- struct drm_i915_gem_create_ext¶

Existing gem_create behaviour, with added extension support using struct i915_user_extension.
Definition:
struct drm_i915_gem_create_ext {
    __u64 size;
    __u32 handle;
#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
    __u32 flags;
#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
#define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
#define I915_GEM_CREATE_EXT_SET_PAT 2
    __u64 extensions;
};

Members
size
Requested size for the object.

The (page-aligned) allocated size for the object will be returned.
On platforms like DG2/ATS the kernel will always use 64K or largerpages for I915_MEMORY_CLASS_DEVICE. The kernel also requires aminimum of 64K GTT alignment for such objects.
NOTE: Previously the ABI here required a minimum GTT alignment of 2M on DG2/ATS, due to how the hardware implemented 64K GTT page support, where we had the following complications:
1) The entire PDE (which covers a 2MB virtual address range), must contain only 64K PTEs, i.e. mixing 4K and 64K PTEs in the same PDE is forbidden by the hardware.
2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEMobjects.
However on actual production HW this was completely changed to now allow setting a TLB hint at the PTE level (see PS64), which is a lot more flexible than the above. With this the 2M restriction was dropped, where we now only require 64K.
handle
Returned handle for the object.

Object handles are nonzero.

flags
Optional flags.

Supported values:
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that the object will need to be accessed via the CPU.
Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only strictly required on configurations where some subset of the device memory is directly visible/mappable through the CPU (which we also call small BAR), like on some DG2+ systems. Note that this is quite undesirable, but due to various factors like the client CPU, BIOS etc it's something we can expect to see in the wild. See drm_i915_memory_region_info.probed_cpu_visible_size for how to determine if this system applies.

Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to ensure the kernel can always spill the allocation to system memory, if the object can't be allocated in the mappable part of I915_MEMORY_CLASS_DEVICE.
Also note that since the kernel only supports flat-CCS on objects that can only be placed in I915_MEMORY_CLASS_DEVICE, we therefore don't support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with flat-CCS.
Without this hint, the kernel will assume that non-mappable I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the kernel can still migrate the object to the mappable part, as a last resort, if userspace ever CPU faults this object, but this might be expensive, and so ideally should be avoided.
On older kernels which lack the relevant small-bar uAPI support (see also drm_i915_memory_region_info.probed_cpu_visible_size), usage of the flag will result in an error, but it should NEVER be possible to end up with a small BAR configuration, assuming we can also successfully load the i915 kernel module. In such cases the entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as such there are zero restrictions on where the object can be placed.

extensions
The chain of extensions to apply to this object.

This will be useful in the future when we need to support several different extensions, and we need to apply more than one when creating the object. See struct i915_user_extension.

If we don't supply any extensions then we get the same old gem_create behaviour.

For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see struct drm_i915_gem_create_ext_memory_regions.

For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see struct drm_i915_gem_create_ext_protected_content.

For I915_GEM_CREATE_EXT_SET_PAT usage see struct drm_i915_gem_create_ext_set_pat.
Description
Note that new buffer flags should be added here, at least for the stuff that is immutable. Previously we would have two ioctls, one to create the object with gem_create, and another to apply various parameters, however this creates some ambiguity for the params which are considered immutable. Also in general we're phasing out the various SET/GET ioctls.
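Chaining more than one extension works through i915_user_extension.next_extension: each extension's base points at the next one, and the last link stays zero. The sketch below uses abbreviated local mirrors of the uAPI structs purely for illustration; real code should include <drm/i915_drm.h> and use the definitions there:

```c
#include <assert.h>
#include <stdint.h>

/* Abbreviated local mirrors of the uAPI structs, for illustration only --
 * real code should include <drm/i915_drm.h>. */
struct i915_user_extension {
    uint64_t next_extension;
    uint32_t name;
    uint32_t flags;
    uint32_t rsvd[4];
};

#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
#define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1

struct drm_i915_gem_create_ext_memory_regions {
    struct i915_user_extension base;
    uint32_t pad;
    uint32_t num_regions;
    uint64_t regions;
};

struct drm_i915_gem_create_ext_protected_content {
    struct i915_user_extension base;
    uint32_t flags;
};

/* Chain two extensions: the first extension's base.next_extension points
 * at the second; the last link's next_extension stays zero. The returned
 * value is what goes into drm_i915_gem_create_ext.extensions. */
static uint64_t build_chain(struct drm_i915_gem_create_ext_memory_regions *regions,
                            struct drm_i915_gem_create_ext_protected_content *prot)
{
    regions->base.name = I915_GEM_CREATE_EXT_MEMORY_REGIONS;
    regions->base.next_extension = (uintptr_t)prot;
    prot->base.name = I915_GEM_CREATE_EXT_PROTECTED_CONTENT;
    prot->base.next_extension = 0; /* end of chain */
    return (uintptr_t)regions;
}
```

The kernel walks the chain at object-creation time, so every linked struct must stay alive until the DRM_IOCTL_I915_GEM_CREATE_EXT call returns.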
- struct drm_i915_gem_create_ext_memory_regions¶
The I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
Definition:
struct drm_i915_gem_create_ext_memory_regions {
    struct i915_user_extension base;
    __u32 pad;
    __u32 num_regions;
    __u64 regions;
};

Members
base
Extension link. See struct i915_user_extension.

pad
MBZ

num_regions
Number of elements in the regions array.

regions
The regions/placements array.

An array of struct drm_i915_gem_memory_class_instance.
Description
Set the object with the desired set of placements/regions in priority order. Each entry must be unique and supported by the device.
This is provided as an array of struct drm_i915_gem_memory_class_instance, or an equivalent layout of class:instance pair encodings. See struct drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to query the supported regions for a device.
As an example, on discrete devices, if we wish to set the placement asdevice local-memory we can do something like:
struct drm_i915_gem_memory_class_instance region_lmem = {
    .memory_class = I915_MEMORY_CLASS_DEVICE,
    .memory_instance = 0,
};
struct drm_i915_gem_create_ext_memory_regions regions = {
    .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
    .regions = (uintptr_t)&region_lmem,
    .num_regions = 1,
};
struct drm_i915_gem_create_ext create_ext = {
    .size = 16 * PAGE_SIZE,
    .extensions = (uintptr_t)&regions,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...
At which point we get the object handle in drm_i915_gem_create_ext.handle, along with the final object size in drm_i915_gem_create_ext.size, which should account for any rounding up, if required.
Note that userspace has no means of knowing the current backing region for objects where num_regions is larger than one. The kernel will only ensure that the priority order of the regions array is honoured, either when initially placing the object, or when moving memory around due to memory pressure.
On Flat-CCS capable HW, compression is supported for objects residing in I915_MEMORY_CLASS_DEVICE. When such (compressed) objects have another memory class in regions and are migrated (by i915, due to memory constraints) to a non-I915_MEMORY_CLASS_DEVICE region, i915 needs to decompress the content. But i915 doesn't have the required information to decompress userspace-compressed objects.

So i915 supports Flat-CCS only on objects which can reside solely in I915_MEMORY_CLASS_DEVICE regions.
- struct drm_i915_gem_create_ext_protected_content¶
The I915_OBJECT_PARAM_PROTECTED_CONTENT extension.
Definition:
struct drm_i915_gem_create_ext_protected_content {
    struct i915_user_extension base;
    __u32 flags;
};

Members
base
Extension link. See struct i915_user_extension.

flags
Reserved for future usage, currently MBZ.
Description
If this extension is provided, buffer contents are expected to be protected by PXP encryption and require decryption for scan out and processing. This is only possible on platforms that have PXP enabled; in all other scenarios using this extension will cause the ioctl to fail and return -ENODEV. The flags parameter is reserved for future expansion and must currently be set to zero.
The buffer contents are considered invalid after a PXP session teardown.
The encryption is guaranteed to be processed correctly only if the object is submitted with a context created using the I915_CONTEXT_PARAM_PROTECTED_CONTENT flag. This will also enable extra checks at submission time on the validity of the objects involved.
Below is an example on how to create a protected object:
struct drm_i915_gem_create_ext_protected_content protected_ext = {
    .base = { .name = I915_GEM_CREATE_EXT_PROTECTED_CONTENT },
    .flags = 0,
};
struct drm_i915_gem_create_ext create_ext = {
    .size = PAGE_SIZE,
    .extensions = (uintptr_t)&protected_ext,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...
- struct drm_i915_gem_create_ext_set_pat¶
The I915_GEM_CREATE_EXT_SET_PAT extension.
Definition:
struct drm_i915_gem_create_ext_set_pat {
    struct i915_user_extension base;
    __u32 pat_index;
    __u32 rsvd;
};

Members
base
Extension link. See struct i915_user_extension.

pat_index
PAT index to be set.

PAT index is a bit field in the Page Table Entry to control caching behaviors for GPU accesses. The definition of PAT index is platform dependent and can be found in hardware specifications.

rsvd
Reserved for future use.
Description
If this extension is provided, the specified caching policy (PAT index) isapplied to the buffer object.
Below is an example of how to create an object with a specific caching policy:
struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
    .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
    .pat_index = 0,
};
struct drm_i915_gem_create_ext create_ext = {
    .size = PAGE_SIZE,
    .extensions = (uintptr_t)&set_pat_ext,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...
drm/nouveau uAPI¶
VM_BIND / EXEC uAPI¶
Nouveau’s VM_BIND / EXEC UAPI consists of three ioctls: DRM_NOUVEAU_VM_INIT,DRM_NOUVEAU_VM_BIND and DRM_NOUVEAU_EXEC.
In order to use the UAPI, a user client must first initialize the VA space using the DRM_NOUVEAU_VM_INIT ioctl, specifying which region of the VA space should be managed by the kernel and which by the UMD.
The DRM_NOUVEAU_VM_BIND ioctl provides clients an interface to manage the userspace-manageable portion of the VA space. It provides operations to map and unmap memory. Mappings may be flagged as sparse. Sparse mappings are not backed by a GEM object and the kernel will ignore GEM handles provided alongside a sparse mapping.
Userspace may request memory backed mappings either within or outside of the bounds (but not crossing those bounds) of a previously mapped sparse mapping. Subsequently requested memory backed mappings within a sparse mapping will take precedence over the corresponding range of the sparse mapping. If such memory backed mappings are unmapped the kernel will make sure that the corresponding sparse mapping will take their place again. Requests to unmap a sparse mapping that still contains memory backed mappings will result in those memory backed mappings being unmapped first.
Unmap requests are not bound to the range of existing mappings and can even overlap the bounds of sparse mappings. For such a request the kernel will make sure to unmap all memory backed mappings within the given range, splitting up memory backed mappings which are only partially contained within the given range. Unmap requests with the sparse flag set must match the range of a previously mapped sparse mapping exactly though.
While the kernel generally permits arbitrary sequences and ranges of memory backed mappings being mapped and unmapped, either within a single or multiple VM_BIND ioctl calls, there are some restrictions for sparse mappings.
- The kernel does not permit to:
unmap non-existent sparse mappings
unmap a sparse mapping and map a new sparse mapping overlapping the range of the previously unmapped sparse mapping within the same VM_BIND ioctl
unmap a sparse mapping and map new memory backed mappings overlapping the range of the previously unmapped sparse mapping within the same VM_BIND ioctl
When using the VM_BIND ioctl to request the kernel to map memory to a given virtual address in the GPU's VA space there is no guarantee that the actual mappings are created in the GPU's MMU. If the given memory is swapped out at the time the bind operation is executed the kernel will stash the mapping details into its internal allocator and create the actual MMU mappings once the memory is swapped back in. While this is transparent for userspace, it is guaranteed that all the backing memory is swapped back in and all the memory mappings, as requested by userspace previously, are actually mapped once the DRM_NOUVEAU_EXEC ioctl is called to submit an exec job.
A VM_BIND job can be executed either synchronously or asynchronously. If executed asynchronously, userspace may provide a list of syncobjs this job will wait for and/or a list of syncobjs the kernel will signal once the VM_BIND job finished execution. If executed synchronously the ioctl will block until the bind job is finished. For synchronous jobs the kernel will not permit any syncobjs submitted to the kernel.
To execute a push buffer the UAPI provides the DRM_NOUVEAU_EXEC ioctl. EXEC jobs are always executed asynchronously and, like VM_BIND jobs, provide the option to synchronize them with syncobjs.
Besides that, EXEC jobs can be scheduled for a specified channel to execute on.
Since VM_BIND jobs update the GPU's VA space on job submit, EXEC jobs do have an up to date view of the VA space. However, the actual mappings might still be pending. Hence, EXEC jobs require the fences of the corresponding VM_BIND jobs they depend on to be attached to them.
- struct drm_nouveau_sync¶
sync object
Definition:
struct drm_nouveau_sync {
    __u32 flags;
#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
    __u32 handle;
    __u64 timeline_value;
};

Members
flags
The flags for a sync object.

The first 8 bits are used to determine the type of the sync object.

handle
The handle of the sync object.

timeline_value
The timeline point of the sync object in case the syncobj is of type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
Description
This structure serves as a synchronization mechanism for (potentially) asynchronous operations such as EXEC or VM_BIND.
- struct drm_nouveau_vm_init¶
GPU VA space init structure
Definition:
struct drm_nouveau_vm_init {
    __u64 kernel_managed_addr;
    __u64 kernel_managed_size;
};

Members
kernel_managed_addr
Start address of the kernel managed VA space region.

kernel_managed_size
Size of the kernel managed VA space region in bytes.
Description
Used to initialize the GPU's VA space for a user client, telling the kernel which portion of the VA space is managed by the UMD and kernel respectively.
For the UMD to use the VM_BIND uAPI, this must be called before any BOs or channels are created; if called afterwards DRM_IOCTL_NOUVEAU_VM_INIT fails with -ENOSYS.
- struct drm_nouveau_vm_bind_op¶
VM_BIND operation
Definition:
struct drm_nouveau_vm_bind_op {
    __u32 op;
#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x0
#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x1
    __u32 flags;
#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
    __u32 handle;
    __u32 pad;
    __u64 addr;
    __u64 bo_offset;
    __u64 range;
};

Members
op
The operation type.

Supported values:

DRM_NOUVEAU_VM_BIND_OP_MAP - Map a GEM object to the GPU's VA space. Optionally, the DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to instruct the kernel to create sparse mappings for the given range.

DRM_NOUVEAU_VM_BIND_OP_UNMAP - Unmap an existing mapping in the GPU's VA space. If the region the mapping is located in is a sparse region, new sparse mappings are created where the unmapped (memory backed) mapping was mapped previously. To remove a sparse region the DRM_NOUVEAU_VM_BIND_SPARSE flag must be set.

flags
The flags for a drm_nouveau_vm_bind_op.

Supported values:

DRM_NOUVEAU_VM_BIND_SPARSE - Indicates that an allocated VA space region should be sparse.

handle
The handle of the DRM GEM object to map.

pad
32 bit padding, should be 0.

addr
The address the VA space region or (memory backed) mapping should be mapped to.

bo_offset
The offset within the BO backing the mapping.

range
The size of the requested mapping in bytes.
Description
This structure represents a single VM_BIND operation. UMDs should pass an array of this structure via struct drm_nouveau_vm_bind's op_ptr field.
- struct drm_nouveau_vm_bind¶
structure for DRM_IOCTL_NOUVEAU_VM_BIND
Definition:
struct drm_nouveau_vm_bind {
    __u32 op_count;
    __u32 flags;
#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
    __u32 wait_count;
    __u32 sig_count;
    __u64 wait_ptr;
    __u64 sig_ptr;
    __u64 op_ptr;
};

Members
op_count
The number of drm_nouveau_vm_bind_ops.

flags
The flags for a drm_nouveau_vm_bind ioctl.

Supported values:

DRM_NOUVEAU_VM_BIND_RUN_ASYNC - Indicates that the given VM_BIND operation should be executed asynchronously by the kernel.

If this flag is not supplied the kernel executes the associated operations synchronously and doesn't accept any drm_nouveau_sync objects.

wait_count
The number of wait drm_nouveau_syncs.

sig_count
The number of drm_nouveau_syncs to signal when finished.

wait_ptr
Pointer to drm_nouveau_syncs to wait for.

sig_ptr
Pointer to drm_nouveau_syncs to signal when finished.

op_ptr
Pointer to the drm_nouveau_vm_bind_ops to execute.
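Assembling a VM_BIND request mostly means filling in an op array and pointing op_ptr at it. The sketch below maps one GEM object asynchronously; the struct layouts are abbreviated local mirrors for illustration only, and real code should include <drm/nouveau_drm.h>:

```c
#include <assert.h>
#include <stdint.h>

/* Abbreviated local mirrors of the nouveau uAPI structs, for illustration
 * only -- real code should include <drm/nouveau_drm.h>. */
#define DRM_NOUVEAU_VM_BIND_OP_MAP    0x0
#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1

struct drm_nouveau_vm_bind_op {
    uint32_t op;
    uint32_t flags;
    uint32_t handle;
    uint32_t pad;
    uint64_t addr;
    uint64_t bo_offset;
    uint64_t range;
};

struct drm_nouveau_vm_bind {
    uint32_t op_count;
    uint32_t flags;
    uint32_t wait_count;
    uint32_t sig_count;
    uint64_t wait_ptr;
    uint64_t sig_ptr;
    uint64_t op_ptr;
};

/* Build an asynchronous VM_BIND request mapping one GEM object at a given
 * GPU virtual address. The result would be passed to
 * DRM_IOCTL_NOUVEAU_VM_BIND. */
static struct drm_nouveau_vm_bind map_one(struct drm_nouveau_vm_bind_op *op,
                                          uint32_t handle, uint64_t gpu_addr,
                                          uint64_t size)
{
    *op = (struct drm_nouveau_vm_bind_op){
        .op = DRM_NOUVEAU_VM_BIND_OP_MAP,
        .handle = handle,
        .addr = gpu_addr,
        .bo_offset = 0,
        .range = size,
    };
    return (struct drm_nouveau_vm_bind){
        .op_count = 1,
        .flags = DRM_NOUVEAU_VM_BIND_RUN_ASYNC,
        .op_ptr = (uint64_t)(uintptr_t)op,
    };
}
```

Since the request runs asynchronously, a real caller would typically also populate sig_ptr/sig_count with a drm_nouveau_sync the kernel signals once the bind job finishes.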
- struct drm_nouveau_exec_push¶
EXEC push operation
Definition:
struct drm_nouveau_exec_push {
    __u64 va;
    __u32 va_len;
    __u32 flags;
#define DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH 0x1
};

Members
va
The virtual address of the push buffer mapping.

va_len
The length of the push buffer mapping.

flags
The flags for this push buffer mapping.
Description
This structure represents a single EXEC push operation. UMDs should pass an array of this structure via struct drm_nouveau_exec's push_ptr field.
- struct drm_nouveau_exec¶
structure for DRM_IOCTL_NOUVEAU_EXEC
Definition:
struct drm_nouveau_exec {
    __u32 channel;
    __u32 push_count;
    __u32 wait_count;
    __u32 sig_count;
    __u64 wait_ptr;
    __u64 sig_ptr;
    __u64 push_ptr;
};

Members
channel
The channel to execute the push buffer in.

push_count
The number of drm_nouveau_exec_push ops.

wait_count
The number of wait drm_nouveau_syncs.

sig_count
The number of drm_nouveau_syncs to signal when finished.

wait_ptr
Pointer to drm_nouveau_syncs to wait for.

sig_ptr
Pointer to drm_nouveau_syncs to signal when finished.

push_ptr
Pointer to drm_nouveau_exec_push ops.
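An EXEC submission follows the same pattern: fill a push array, attach syncobjs for completion tracking, and point the main struct at both. A minimal sketch with one push buffer and one binary signal syncobj (struct layouts are abbreviated local mirrors for illustration; real code includes <drm/nouveau_drm.h>):

```c
#include <assert.h>
#include <stdint.h>

/* Abbreviated local mirrors of the nouveau uAPI structs, for illustration
 * only -- real code should include <drm/nouveau_drm.h>. */
#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0

struct drm_nouveau_sync {
    uint32_t flags;
    uint32_t handle;
    uint64_t timeline_value;
};

struct drm_nouveau_exec_push {
    uint64_t va;
    uint32_t va_len;
    uint32_t flags;
};

struct drm_nouveau_exec {
    uint32_t channel;
    uint32_t push_count;
    uint32_t wait_count;
    uint32_t sig_count;
    uint64_t wait_ptr;
    uint64_t sig_ptr;
    uint64_t push_ptr;
};

/* Assemble a DRM_IOCTL_NOUVEAU_EXEC request executing one push buffer on
 * the given channel and signalling one binary syncobj on completion. */
static struct drm_nouveau_exec exec_one(uint32_t channel,
                                        struct drm_nouveau_exec_push *push,
                                        uint64_t va, uint32_t va_len,
                                        struct drm_nouveau_sync *sig,
                                        uint32_t syncobj_handle)
{
    *push = (struct drm_nouveau_exec_push){ .va = va, .va_len = va_len };
    *sig = (struct drm_nouveau_sync){
        .flags = DRM_NOUVEAU_SYNC_SYNCOBJ,
        .handle = syncobj_handle,
    };
    return (struct drm_nouveau_exec){
        .channel = channel,
        .push_count = 1,
        .push_ptr = (uint64_t)(uintptr_t)push,
        .sig_count = 1,
        .sig_ptr = (uint64_t)(uintptr_t)sig,
    };
}
```

Because EXEC jobs run asynchronously, userspace waits on the signalled syncobj (e.g. via drmSyncobjWait) rather than on the ioctl itself.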
drm/panthor uAPI¶
Introduction
This documentation describes the Panthor IOCTLs.
Just a few generic rules about the data passed to the Panthor IOCTLs:
Structures must be aligned on 64-bit/8-byte boundaries. If the object is not naturally aligned, a padding field must be added.
Fields must be explicitly aligned to their natural type alignment with pad[0..N] fields.
All padding fields will be checked by the driver to make sure they are zeroed.
Flags can be added, but not removed/replaced.
New fields can be added to the main structures (the structures directly passed to the ioctl). Those fields can be added at the end of the structure, or replace existing padding fields. Any new field being added must preserve the behavior that existed before those fields were added when a value of zero is passed.
New fields can be added to indirect objects (objects pointed to by the main structure), iff those objects are passed a size to reflect the size known by the userspace driver (see drm_panthor_obj_array::stride or drm_panthor_dev_query::size).
If the kernel driver is too old to know some fields, those will be ignored if zero, and otherwise rejected (and so will be zero on output).
If userspace is too old to know some fields, those will be zeroed (input) before the structure is parsed by the kernel driver.
Each new flag/field addition must come with a driver version update so the userspace driver doesn't have to resort to trial and error to know which flags are supported.
Structures should not contain unions, as this would defeat the extensibility of such structures.
IOCTLs can’t be removed or replaced. New IOCTL IDs should be placedat the end of the drm_panthor_ioctl_id enum.
MMIO regions exposed to userspace.
- DRM_PANTHOR_USER_MMIO_OFFSET¶
File offset for all MMIO regions being exposed to userspace. Don't use this value directly, use DRM_PANTHOR_USER_<name>_OFFSET values instead. The pgoffset passed to mmap2() is an unsigned long, which forces us to use a different offset on 32-bit and 64-bit systems.
- DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET¶
File offset for the LATEST_FLUSH_ID register. The userspace driver controls GPU cache flushing through CS instructions, but the flush reduction mechanism requires a flush_id. This flush_id could be queried with an ioctl, but Arm provides a well-isolated register page containing only this read-only register, so let's expose this page through a static mmap offset and allow direct mapping of this MMIO region so we can avoid the user <-> kernel round-trip.
IOCTL IDs
enum drm_panthor_ioctl_id - IOCTL IDs
Place new ioctls at the end, don’t re-order, don’t replace or remove entries.
These IDs are not meant to be used directly. Use the DRM_IOCTL_PANTHOR_xxx definitions instead.
IOCTL arguments
- struct drm_panthor_obj_array¶
Object array.
Definition:
struct drm_panthor_obj_array { __u32 stride; __u32 count; __u64 array;};Members
stride: Stride of object struct. Used for versioning.
count: Number of objects in the array.
array: User pointer to an array of objects.
Description
This object is used to pass an array of objects whose size is subject to changes in future versions of the driver. In order to support this mutability, we pass a stride describing the size of the object as known by userspace.
You shouldn’t fill drm_panthor_obj_array fields directly. You should instead use the DRM_PANTHOR_OBJ_ARRAY() macro that takes care of initializing the stride to the object size.
- DRM_PANTHOR_OBJ_ARRAY¶
DRM_PANTHOR_OBJ_ARRAY(cnt,ptr)
Initialize a drm_panthor_obj_array field.
Parameters
cnt: Number of elements in the array.
ptr: Pointer to the array to pass to the kernel.
Description
Macro initializing a drm_panthor_obj_array based on the object size as known by userspace.
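As a minimal sketch of how such a macro can derive the stride automatically, the snippet below mirrors the drm_panthor_obj_array layout documented above. The local struct definitions and the fake_sync_op element type are illustrative stand-ins for the real <drm/panthor_drm.h> declarations, and no ioctl is issued.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mirror of the uAPI struct documented above. */
struct drm_panthor_obj_array {
	uint32_t stride; /* sizeof() one element, used for versioning */
	uint32_t count;  /* number of elements */
	uint64_t array;  /* user pointer to the first element */
};

/* Derives stride from the element type, so the kernel knows how much of
 * each object this userspace build understands. */
#define DRM_PANTHOR_OBJ_ARRAY(cnt, ptr) \
	{ .stride = sizeof((ptr)[0]), .count = (cnt), \
	  .array = (uint64_t)(uintptr_t)(ptr) }

/* Hypothetical element type standing in for e.g. drm_panthor_sync_op. */
struct fake_sync_op { uint32_t flags, handle; uint64_t timeline_value; };
```

Because the stride comes from `sizeof((ptr)[0])`, rebuilding userspace against a newer header automatically reports the larger object size to the kernel.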
- enum drm_panthor_sync_op_flags¶
Synchronization operation flags.
Constants
DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_MASK: Synchronization handle type mask.
DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_SYNCOBJ: Synchronization object type.
DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ: Timeline synchronization object type.
DRM_PANTHOR_SYNC_OP_WAIT: Wait operation.
DRM_PANTHOR_SYNC_OP_SIGNAL: Signal operation.
- struct drm_panthor_sync_op¶
Synchronization operation.
Definition:
struct drm_panthor_sync_op { __u32 flags; __u32 handle; __u64 timeline_value;};Members
flags: Synchronization operation flags. Combination of DRM_PANTHOR_SYNC_OP values.
handle: Sync handle.
timeline_value: MBZ if (flags & DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_MASK) != DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ.
- enum drm_panthor_dev_query_type¶
Query type
Constants
DRM_PANTHOR_DEV_QUERY_GPU_INFO: Query GPU information.
DRM_PANTHOR_DEV_QUERY_CSIF_INFO: Query command-stream interface information.
DRM_PANTHOR_DEV_QUERY_TIMESTAMP_INFO: Query timestamp information.
DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO: Query allowed group priorities information.
Description
Place new types at the end, don’t re-order, don’t remove or replace.
- enum drm_panthor_gpu_coherency¶
Type of GPU coherency
Constants
DRM_PANTHOR_GPU_COHERENCY_ACE_LITE: ACE Lite coherency.
DRM_PANTHOR_GPU_COHERENCY_ACE: ACE coherency.
DRM_PANTHOR_GPU_COHERENCY_NONE: No coherency.
- struct drm_panthor_gpu_info¶
GPU information
Definition:
struct drm_panthor_gpu_info {
    __u32 gpu_id;
#define DRM_PANTHOR_ARCH_MAJOR(x)      ((x) >> 28)
#define DRM_PANTHOR_ARCH_MINOR(x)      (((x) >> 24) & 0xf)
#define DRM_PANTHOR_ARCH_REV(x)        (((x) >> 20) & 0xf)
#define DRM_PANTHOR_PRODUCT_MAJOR(x)   (((x) >> 16) & 0xf)
#define DRM_PANTHOR_VERSION_MAJOR(x)   (((x) >> 12) & 0xf)
#define DRM_PANTHOR_VERSION_MINOR(x)   (((x) >> 4) & 0xff)
#define DRM_PANTHOR_VERSION_STATUS(x)  ((x) & 0xf)
    __u32 gpu_rev;
    __u32 csf_id;
#define DRM_PANTHOR_CSHW_MAJOR(x)      (((x) >> 26) & 0x3f)
#define DRM_PANTHOR_CSHW_MINOR(x)      (((x) >> 20) & 0x3f)
#define DRM_PANTHOR_CSHW_REV(x)        (((x) >> 16) & 0xf)
#define DRM_PANTHOR_MCU_MAJOR(x)       (((x) >> 10) & 0x3f)
#define DRM_PANTHOR_MCU_MINOR(x)       (((x) >> 4) & 0x3f)
#define DRM_PANTHOR_MCU_REV(x)         ((x) & 0xf)
    __u32 l2_features;
    __u32 tiler_features;
    __u32 mem_features;
    __u32 mmu_features;
#define DRM_PANTHOR_MMU_VA_BITS(x)     ((x) & 0xff)
    __u32 thread_features;
    __u32 max_threads;
    __u32 thread_max_workgroup_size;
    __u32 thread_max_barrier_size;
    __u32 coherency_features;
    __u32 texture_features[4];
    __u32 as_present;
    __u32 selected_coherency;
    __u64 shader_present;
    __u64 l2_present;
    __u64 tiler_present;
    __u32 core_features;
    __u32 pad;
    __u64 gpu_features;
};
Members
gpu_id: GPU ID.
gpu_rev: GPU revision.
csf_id: Command stream frontend ID.
l2_features: L2-cache features.
tiler_features: Tiler features.
mem_features: Memory features.
mmu_features: MMU features.
thread_features: Thread features.
max_threads: Maximum number of threads.
thread_max_workgroup_size: Maximum workgroup size.
thread_max_barrier_size: Maximum number of threads that can wait simultaneously on a barrier.
coherency_features: Coherency features.
Combination of drm_panthor_gpu_coherency flags.
Note that this only reflects the coherency protocols supported by the GPU; the actual coherency in place depends on the SoC integration and is reflected by drm_panthor_gpu_info::selected_coherency.
texture_features: Texture features.
as_present: Bitmask encoding the number of address spaces exposed by the MMU.
selected_coherency: Coherency selected for this device.
One of drm_panthor_gpu_coherency.
shader_present: Bitmask encoding the shader cores exposed by the GPU.
l2_present: Bitmask encoding the L2 caches exposed by the GPU.
tiler_present: Bitmask encoding the tiler units exposed by the GPU.
core_features: Used to discriminate core variants when they exist.
pad: MBZ.
gpu_features: Bitmask describing supported GPU-wide features.
Description
Structure grouping all queryable information relating to the GPU.
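For instance, the gpu_id bitfield macros quoted in the definition above can be exercised in isolation; the gpu_id value used below is fabricated purely for illustration, not a real product ID.

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction macros as listed in the drm_panthor_gpu_info
 * definition above. */
#define DRM_PANTHOR_ARCH_MAJOR(x)     ((x) >> 28)
#define DRM_PANTHOR_ARCH_MINOR(x)     (((x) >> 24) & 0xf)
#define DRM_PANTHOR_ARCH_REV(x)       (((x) >> 20) & 0xf)
#define DRM_PANTHOR_PRODUCT_MAJOR(x)  (((x) >> 16) & 0xf)
#define DRM_PANTHOR_VERSION_MAJOR(x)  (((x) >> 12) & 0xf)
#define DRM_PANTHOR_VERSION_MINOR(x)  (((x) >> 4) & 0xff)
#define DRM_PANTHOR_VERSION_STATUS(x) ((x) & 0xf)
```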
- struct drm_panthor_csif_info¶
Command stream interface information
Definition:
struct drm_panthor_csif_info { __u32 csg_slot_count; __u32 cs_slot_count; __u32 cs_reg_count; __u32 scoreboard_slot_count; __u32 unpreserved_cs_reg_count; __u32 pad;};Members
csg_slot_count: Number of command stream group slots exposed by the firmware.
cs_slot_count: Number of command stream slots per group.
cs_reg_count: Number of command stream registers.
scoreboard_slot_count: Number of scoreboard slots.
unpreserved_cs_reg_count: Number of command stream registers reserved by the kernel driver to call a userspace command stream.
All registers can be used by a userspace command stream, but the [cs_slot_count - unpreserved_cs_reg_count .. cs_slot_count] registers are used by the kernel when DRM_PANTHOR_IOCTL_GROUP_SUBMIT is called.
pad: Padding field, set to zero.
Description
Structure grouping all queryable information relating to the command stream interface.
- struct drm_panthor_timestamp_info¶
Timestamp information
Definition:
struct drm_panthor_timestamp_info { __u64 timestamp_frequency; __u64 current_timestamp; __u64 timestamp_offset;};Members
timestamp_frequency: The frequency of the timestamp timer, or 0 if unknown.
current_timestamp: The current timestamp.
timestamp_offset: The offset of the timestamp timer.
Description
Structure grouping all queryable information relating to the GPU timestamp.
- struct drm_panthor_group_priorities_info¶
Group priorities information
Definition:
struct drm_panthor_group_priorities_info { __u8 allowed_mask; __u8 pad[3];};Members
allowed_mask: Bitmask of the allowed group priorities.
Each bit represents a variant of the enum drm_panthor_group_priority.
pad: Padding fields, MBZ.
Description
Structure grouping all queryable information relating to the allowed group priorities.
- struct drm_panthor_dev_query¶
Arguments passed to DRM_PANTHOR_IOCTL_DEV_QUERY
Definition:
struct drm_panthor_dev_query { __u32 type; __u32 size; __u64 pointer;};Members
type: The query type (see drm_panthor_dev_query_type).
size: Size of the type being queried.
If pointer is NULL, size is updated by the driver to provide the output structure size. If pointer is not NULL, the driver will only copy min(size, actual_structure_size) bytes to the pointer, and update the size accordingly. This allows us to extend query types without breaking userspace.
pointer: User pointer to a query type struct.
Pointer can be NULL, in which case nothing is copied, but the actual structure size is returned. If not NULL, it must point to a location that’s large enough to hold size bytes.
- struct drm_panthor_vm_create¶
Arguments passed to DRM_PANTHOR_IOCTL_VM_CREATE
Definition:
struct drm_panthor_vm_create { __u32 flags; __u32 id; __u64 user_va_range;};Members
flags: VM flags, MBZ.
id: Returned VM ID.
user_va_range: Size of the VA space reserved for user objects.
The kernel will pick the remaining space to map kernel-only objects to the VM (heap chunks, heap context, ring buffers, kernel synchronization objects, ...). If the space left for kernel objects is too small, kernel object allocation will fail further down the road. One can use drm_panthor_gpu_info::mmu_features to extract the total virtual address range, and choose a user_va_range that leaves some space to the kernel.
If user_va_range is zero, the kernel will pick a sensible value based on TASK_SIZE and the virtual range supported by the GPU MMU (the kernel/user split should leave enough VA space for userspace processes to support SVM, while still allowing the kernel to map some amount of kernel objects in the kernel VA range). The value chosen by the driver will be returned in user_va_range.
User VA space always starts at 0x0; kernel VA space is always placed after the user VA range.
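A userspace driver can derive the total VA range from drm_panthor_gpu_info::mmu_features and carve out its own split. The quarter-for-kernel policy below is an arbitrary illustration, not the kernel's algorithm; passing 0 and letting the kernel choose is the simpler option.

```c
#include <assert.h>
#include <stdint.h>

#define DRM_PANTHOR_MMU_VA_BITS(x) ((x) & 0xff)

/* Illustrative policy: reserve a quarter of the GPU VA space for
 * kernel-only objects and hand the rest to userspace. */
static uint64_t pick_user_va_range(uint32_t mmu_features)
{
	uint64_t total = 1ull << DRM_PANTHOR_MMU_VA_BITS(mmu_features);

	return total - total / 4;
}
```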
- struct drm_panthor_vm_destroy¶
Arguments passed to DRM_PANTHOR_IOCTL_VM_DESTROY
Definition:
struct drm_panthor_vm_destroy { __u32 id; __u32 pad;};Members
id: ID of the VM to destroy.
pad: MBZ.
- enum drm_panthor_vm_bind_op_flags¶
VM bind operation flags
Constants
DRM_PANTHOR_VM_BIND_OP_MAP_READONLY: Map the memory read-only.
Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.
DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC: Map the memory not-executable.
Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.
DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED: Map the memory uncached.
Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.
DRM_PANTHOR_VM_BIND_OP_TYPE_MASK: Mask used to determine the type of operation.
DRM_PANTHOR_VM_BIND_OP_TYPE_MAP: Map operation.
DRM_PANTHOR_VM_BIND_OP_TYPE_UNMAP: Unmap operation.
DRM_PANTHOR_VM_BIND_OP_TYPE_SYNC_ONLY: No VM operation.
Just serves as a synchronization point on a VM queue.
Only valid if DRM_PANTHOR_VM_BIND_ASYNC is set in drm_panthor_vm_bind::flags, and drm_panthor_vm_bind_op::syncs contains at least one element.
- struct drm_panthor_vm_bind_op¶
VM bind operation
Definition:
struct drm_panthor_vm_bind_op { __u32 flags; __u32 bo_handle; __u64 bo_offset; __u64 va; __u64 size; struct drm_panthor_obj_array syncs;};Members
flags: Combination of drm_panthor_vm_bind_op_flags flags.
bo_handle: Handle of the buffer object to map. MBZ for unmap or sync-only operations.
bo_offset: Buffer object offset. MBZ for unmap or sync-only operations.
va: Virtual address to map/unmap. MBZ for sync-only operations.
size: Size to map/unmap. MBZ for sync-only operations.
syncs: Array of struct drm_panthor_sync_op synchronization operations.
This array must be empty if DRM_PANTHOR_VM_BIND_ASYNC is not set on the drm_panthor_vm_bind object containing this VM bind operation. This array shall not be empty for sync-only operations.
- enum drm_panthor_vm_bind_flags¶
VM bind flags
Constants
DRM_PANTHOR_VM_BIND_ASYNC: VM bind operations are queued to the VM queue instead of being executed synchronously.
- struct drm_panthor_vm_bind¶
Arguments passed to DRM_IOCTL_PANTHOR_VM_BIND
Definition:
struct drm_panthor_vm_bind { __u32 vm_id; __u32 flags; struct drm_panthor_obj_array ops;};Members
vm_id: VM targeted by the bind request.
flags: Combination of drm_panthor_vm_bind_flags flags.
ops: Array of struct drm_panthor_vm_bind_op bind operations.
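Putting the two structs together, a single map operation wrapped in a bind request might be staged as below. The struct layouts are local mirrors of the uAPI text in this section, the operation type flag is deliberately left to be taken from the real <drm/panthor_drm.h>, and no ioctl is issued; this is only a sketch of the plumbing.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the uAPI structs documented in this section. */
struct drm_panthor_obj_array { uint32_t stride, count; uint64_t array; };
struct drm_panthor_vm_bind_op {
	uint32_t flags;     /* drm_panthor_vm_bind_op_flags */
	uint32_t bo_handle; /* MBZ for unmap/sync-only */
	uint64_t bo_offset; /* MBZ for unmap/sync-only */
	uint64_t va;        /* MBZ for sync-only */
	uint64_t size;      /* MBZ for sync-only */
	struct drm_panthor_obj_array syncs;
};
struct drm_panthor_vm_bind {
	uint32_t vm_id;
	uint32_t flags;     /* drm_panthor_vm_bind_flags */
	struct drm_panthor_obj_array ops;
};

static struct drm_panthor_vm_bind
stage_single_map(uint32_t vm_id, struct drm_panthor_vm_bind_op *op,
		 uint32_t bo_handle, uint64_t va, uint64_t size)
{
	struct drm_panthor_vm_bind bind = { .vm_id = vm_id };

	/* For a real request, also set DRM_PANTHOR_VM_BIND_OP_TYPE_MAP
	 * (value from <drm/panthor_drm.h>) in op->flags. */
	*op = (struct drm_panthor_vm_bind_op){
		.bo_handle = bo_handle,
		.va = va,
		.size = size,
	};
	bind.ops.stride = sizeof(*op);
	bind.ops.count = 1;
	bind.ops.array = (uint64_t)(uintptr_t)op;
	return bind;
}
```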
- enum drm_panthor_vm_state¶
VM states.
Constants
DRM_PANTHOR_VM_STATE_USABLE: VM is usable.
New VM operations will be accepted on this VM.
DRM_PANTHOR_VM_STATE_UNUSABLE: VM is unusable.
Something put the VM in an unusable state (like an asynchronous VM_BIND request failing for any reason).
Once the VM is in this state, all new MAP operations will be rejected, and any GPU job targeting this VM will fail. UNMAP operations are still accepted.
The only way to recover from an unusable VM is to create a new VM, and destroy the old one.
- struct drm_panthor_vm_get_state¶
Get VM state.
Definition:
struct drm_panthor_vm_get_state { __u32 vm_id; __u32 state;};Members
vm_id: VM targeted by the get_state request.
state: State returned by the driver.
Must be one of the enum drm_panthor_vm_state values.
- enum drm_panthor_bo_flags¶
Buffer object flags, passed at creation time.
Constants
DRM_PANTHOR_BO_NO_MMAP: The buffer object will never be CPU-mapped in userspace.
DRM_PANTHOR_BO_WB_MMAP: Force “Write-Back Cacheable” CPU mapping.
CPU map the buffer object in userspace by forcing the “Write-Back Cacheable” cacheability attribute. The mapping otherwise uses the “Non-Cacheable” attribute if the GPU is not IO coherent.
- struct drm_panthor_bo_create¶
Arguments passed to DRM_IOCTL_PANTHOR_BO_CREATE.
Definition:
struct drm_panthor_bo_create { __u64 size; __u32 flags; __u32 exclusive_vm_id; __u32 handle; __u32 pad;};Members
size: Requested size for the object.
The (page-aligned) allocated size for the object will be returned.
flags: Flags. Must be a combination of drm_panthor_bo_flags flags.
exclusive_vm_id: Exclusive VM this buffer object will be mapped to.
If not zero, the field must refer to a valid VM ID, and implies that:
- the buffer object will only ever be bound to that VM
- the buffer object cannot be exported as a PRIME fd
handle: Returned handle for the object.
Object handles are nonzero.
pad: MBZ.
- struct drm_panthor_bo_mmap_offset¶
Arguments passed to DRM_IOCTL_PANTHOR_BO_MMAP_OFFSET.
Definition:
struct drm_panthor_bo_mmap_offset { __u32 handle; __u32 pad; __u64 offset;};Members
handle: Handle of the object we want an mmap offset for.
pad: MBZ.
offset: The fake offset to use for subsequent mmap calls.
- struct drm_panthor_queue_create¶
Queue creation arguments.
Definition:
struct drm_panthor_queue_create { __u8 priority; __u8 pad[3]; __u32 ringbuf_size;};Members
priority: Defines the priority of queues inside a group. Goes from 0 to 15, 15 being the highest priority.
pad: Padding fields, MBZ.
ringbuf_size: Size of the ring buffer to allocate to this queue.
- enum drm_panthor_group_priority¶
Scheduling group priority
Constants
PANTHOR_GROUP_PRIORITY_LOW: Low priority group.
PANTHOR_GROUP_PRIORITY_MEDIUM: Medium priority group.
PANTHOR_GROUP_PRIORITY_HIGH: High priority group.
Requires CAP_SYS_NICE or DRM_MASTER.
PANTHOR_GROUP_PRIORITY_REALTIME: Realtime priority group.
Requires CAP_SYS_NICE or DRM_MASTER.
- struct drm_panthor_group_create¶
Arguments passed to DRM_IOCTL_PANTHOR_GROUP_CREATE
Definition:
struct drm_panthor_group_create { struct drm_panthor_obj_array queues; __u8 max_compute_cores; __u8 max_fragment_cores; __u8 max_tiler_cores; __u8 priority; __u32 pad; __u64 compute_core_mask; __u64 fragment_core_mask; __u64 tiler_core_mask; __u32 vm_id; __u32 group_handle;};Members
queues: Array of drm_panthor_queue_create elements.
max_compute_cores: Maximum number of cores that can be used by compute jobs across CS queues bound to this group.
Must be less than or equal to the number of bits set in compute_core_mask.
max_fragment_cores: Maximum number of cores that can be used by fragment jobs across CS queues bound to this group.
Must be less than or equal to the number of bits set in fragment_core_mask.
max_tiler_cores: Maximum number of tilers that can be used by tiler jobs across CS queues bound to this group.
Must be less than or equal to the number of bits set in tiler_core_mask.
priority: Group priority (see enum drm_panthor_group_priority).
pad: Padding field, MBZ.
compute_core_mask: Mask encoding cores that can be used for compute jobs.
This field must have at least max_compute_cores bits set.
The bits set here should also be set in drm_panthor_gpu_info::shader_present.
fragment_core_mask: Mask encoding cores that can be used for fragment jobs.
This field must have at least max_fragment_cores bits set.
The bits set here should also be set in drm_panthor_gpu_info::shader_present.
tiler_core_mask: Mask encoding cores that can be used for tiler jobs.
This field must have at least max_tiler_cores bits set.
The bits set here should also be set in drm_panthor_gpu_info::tiler_present.
vm_id: VM ID to bind this group to.
All submissions to queues bound to this group will use this VM.
group_handle: Returned group handle. Passed back when submitting jobs or destroying a group.
- struct drm_panthor_group_destroy¶
Arguments passed to DRM_IOCTL_PANTHOR_GROUP_DESTROY
Definition:
struct drm_panthor_group_destroy { __u32 group_handle; __u32 pad;};Members
group_handle: Group to destroy.
pad: Padding field, MBZ.
- struct drm_panthor_queue_submit¶
Job submission arguments.
Definition:
struct drm_panthor_queue_submit { __u32 queue_index; __u32 stream_size; __u64 stream_addr; __u32 latest_flush; __u32 pad; struct drm_panthor_obj_array syncs;};Members
queue_index: Index of the queue inside a group.
stream_size: Size of the command stream to execute.
Must be 64-bit/8-byte aligned (the size of a CS instruction).
Can be zero if stream_addr is zero too.
When the stream size is zero, the queue submit serves as a synchronization point.
stream_addr: GPU address of the command stream to execute.
Must be 64-byte aligned.
Can be zero if stream_size is zero too.
latest_flush: FLUSH_ID read at the time the stream was built.
This allows cache flush elimination for the automatic flush+invalidate(all) done at submission time, which is needed to ensure the GPU doesn’t get garbage when reading the indirect command stream buffers. If you want the cache flush to happen unconditionally, pass a zero here.
Ignored when stream_size is zero.
pad: MBZ.
syncs: Array of struct drm_panthor_sync_op sync operations.
Description
This describes the userspace command stream to call from the kernel command stream ring-buffer. Queue submission is always part of a group submission, taking one or more jobs to submit to the underlying queues.
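The alignment rules above can be captured in a small pre-submission check. This is only a sketch of the documented constraints (8-byte stream size granularity, 64-byte stream address alignment, size and address zero together for a sync-only submit), not a function from the real uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool queue_submit_stream_valid(uint64_t stream_addr,
				      uint32_t stream_size)
{
	if (stream_size == 0)
		return stream_addr == 0; /* synchronization-only submit */
	return stream_addr != 0 &&
	       stream_size % 8 == 0 &&  /* one CS instruction is 8 bytes */
	       stream_addr % 64 == 0;   /* stream must be 64-byte aligned */
}
```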
- struct drm_panthor_group_submit¶
Arguments passed to DRM_IOCTL_PANTHOR_GROUP_SUBMIT
Definition:
struct drm_panthor_group_submit { __u32 group_handle; __u32 pad; struct drm_panthor_obj_array queue_submits;};Members
group_handle: Handle of the group to queue jobs to.
pad: MBZ.
queue_submits: Array of drm_panthor_queue_submit objects.
- enum drm_panthor_group_state_flags¶
Group state flags
Constants
DRM_PANTHOR_GROUP_STATE_TIMEDOUT: Group had unfinished jobs.
When a group ends up with this flag set, no jobs can be submitted to its queues.
DRM_PANTHOR_GROUP_STATE_FATAL_FAULT: Group had fatal faults.
When a group ends up with this flag set, no jobs can be submitted to its queues.
DRM_PANTHOR_GROUP_STATE_INNOCENT: Group was killed during a reset caused by other groups.
This flag can only be set if DRM_PANTHOR_GROUP_STATE_TIMEDOUT is set and DRM_PANTHOR_GROUP_STATE_FATAL_FAULT is not.
- struct drm_panthor_group_get_state¶
Arguments passed to DRM_IOCTL_PANTHOR_GROUP_GET_STATE
Definition:
struct drm_panthor_group_get_state { __u32 group_handle; __u32 state; __u32 fatal_queues; __u32 pad;};Members
group_handle: Handle of the group to query state on.
state: Combination of DRM_PANTHOR_GROUP_STATE_* flags encoding the group state.
fatal_queues: Bitmask of queues that faced fatal faults.
pad: MBZ
Description
Used to query the state of a group and decide whether a new group should be created to replace it.
- struct drm_panthor_tiler_heap_create¶
Arguments passed to DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE
Definition:
struct drm_panthor_tiler_heap_create { __u32 vm_id; __u32 initial_chunk_count; __u32 chunk_size; __u32 max_chunks; __u32 target_in_flight; __u32 handle; __u64 tiler_heap_ctx_gpu_va; __u64 first_heap_chunk_gpu_va;};Members
vm_id: VM ID the tiler heap should be mapped to.
initial_chunk_count: Initial number of chunks to allocate. Must be at least one.
chunk_size: Chunk size.
Must be page-aligned and lie in the [128k:8M] range.
max_chunks: Maximum number of chunks that can be allocated.
Must be at least initial_chunk_count.
target_in_flight: Maximum number of in-flight render passes.
If the heap has more than target_in_flight tiler jobs in-flight, the FW will wait for render passes to finish before queuing new tiler jobs.
handle: Returned heap handle. Passed back to DESTROY_TILER_HEAP.
tiler_heap_ctx_gpu_va: Returned GPU virtual address of the tiler heap context.
first_heap_chunk_gpu_va: First heap chunk.
The tiler heap is formed of heap chunks forming a singly-linked list. This is the first element in the list.
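The chunk constraints above translate into a straightforward validation sketch; the 4 KiB page size used in the test is an assumption for illustration, and a real caller should use the actual system page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool tiler_heap_args_valid(uint32_t chunk_size,
				  uint32_t initial_chunk_count,
				  uint32_t max_chunks,
				  uint32_t page_size)
{
	if (chunk_size % page_size)     /* must be page-aligned */
		return false;
	if (chunk_size < (128u << 10) || chunk_size > (8u << 20))
		return false;           /* must lie in [128k:8M] */
	if (initial_chunk_count == 0)   /* at least one chunk */
		return false;
	if (max_chunks < initial_chunk_count)
		return false;
	return true;
}
```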
- struct drm_panthor_tiler_heap_destroy¶
Arguments passed to DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY
Definition:
struct drm_panthor_tiler_heap_destroy { __u32 handle; __u32 pad;};Members
handle: Handle of the tiler heap to destroy.
Must be a valid heap handle returned by DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE.
pad: Padding field, MBZ.
- struct drm_panthor_bo_set_label¶
Arguments passed to DRM_IOCTL_PANTHOR_BO_SET_LABEL
Definition:
struct drm_panthor_bo_set_label { __u32 handle; __u32 pad; __u64 label;};Members
handle: Handle of the buffer object to label.
pad: MBZ.
label: User pointer to a NUL-terminated string.
Length cannot be greater than 4096.
- struct drm_panthor_set_user_mmio_offset¶
Arguments passed to DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET
Definition:
struct drm_panthor_set_user_mmio_offset { __u64 offset;};Members
offset: User MMIO offset to use.
Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or DRM_PANTHOR_USER_MMIO_OFFSET_64BIT.
Use DRM_PANTHOR_USER_MMIO_OFFSET (which selects OFFSET_32BIT or OFFSET_64BIT based on the size of an unsigned long) unless you have a very good reason to overrule this decision.
Description
This ioctl is only really useful if you want to support userspace CPU emulation environments where the size of an unsigned long differs between the host and the guest architectures.
- enum drm_panthor_bo_sync_op_type¶
BO sync type
Constants
DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH: Flush CPU caches.
DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE: Flush and invalidate CPU caches.
- struct drm_panthor_bo_sync_op¶
BO map sync op
Definition:
struct drm_panthor_bo_sync_op { __u32 handle; __u32 type; __u64 offset; __u64 size;};Members
handle: Handle of the buffer object to sync.
type: Type of operation.
offset: Offset into the BO at which the sync range starts.
This will be rounded down to the nearest cache line as needed.
size: Size of the range to sync.
size + offset will be rounded up to the nearest cache line as needed.
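The rounding described for offset and size can be sketched as follows; the 64-byte cache-line size is an assumption for illustration, whereas the kernel uses the actual CPU cache-line size.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* illustrative; the real value is CPU-specific */

/* Rounds offset down and offset+size up to cache-line granularity,
 * mirroring what the documentation says the kernel does. */
static void sync_range_align(uint64_t *offset, uint64_t *size)
{
	uint64_t start = *offset & ~(uint64_t)(CACHE_LINE_SIZE - 1);
	uint64_t end = (*offset + *size + CACHE_LINE_SIZE - 1) &
		       ~(uint64_t)(CACHE_LINE_SIZE - 1);

	*offset = start;
	*size = end - start;
}
```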
- struct drm_panthor_bo_sync¶
BO map sync request
Definition:
struct drm_panthor_bo_sync { struct drm_panthor_obj_array ops;};Members
ops: Array of struct drm_panthor_bo_sync_op sync operations.
- enum drm_panthor_bo_extra_flags¶
Set of flags returned on a BO_QUERY_INFO request
Constants
DRM_PANTHOR_BO_IS_IMPORTED: BO has been imported from an external driver.
Note that imported dma-buf handles are not flagged as imported if they were exported by panthor; only buffers coming from other drivers (dma heaps, other GPUs, display controllers, V4L, ...) are.
It’s also important to note that all imported BOs are mapped cached and can’t be considered IO-coherent even if the GPU is. This means they require explicit syncs that must go through the DRM_PANTHOR_BO_SYNC ioctl (userland cache maintenance is not allowed in that case, because extra operations might be needed to make changes visible to the CPU/device, like buffer migration when the exporter is a GPU with its own VRAM).
Description
Those are flags reflecting BO properties that are not directly coming from the flags passed at creation time, or information on BOs that were imported from other drivers.
- struct drm_panthor_bo_query_info¶
Query BO info
Definition:
struct drm_panthor_bo_query_info { __u32 handle; __u32 extra_flags; __u32 create_flags; __u32 pad;};Members
handle: Handle of the buffer object to query flags on.
extra_flags: Combination of enum drm_panthor_bo_extra_flags flags.
create_flags: Flags passed at creation time.
Combination of enum drm_panthor_bo_flags flags. Will be zero if the buffer comes from a different driver.
pad: Will be zero on return.
- DRM_IOCTL_PANTHOR¶
DRM_IOCTL_PANTHOR(__access,__id,__type)
Build a Panthor IOCTL number
Parameters
__access: Access type. Must be R, W or RW.
__id: One of the DRM_PANTHOR_xxx ids.
__type: Suffix of the type being passed to the IOCTL.
Description
Don’t use this macro directly; use the DRM_IOCTL_PANTHOR_xxx values instead.
Return
An IOCTL number to be passed to ioctl() from userspace.
drm/xe uAPI¶
Xe Device Block Diagram
The diagram below represents a high-level simplification of a discrete GPU supported by the Xe driver. It shows some device components which are necessary to understand this API, as well as their relations to each other. This diagram does not represent real hardware:
┌──────────────────────────────────────────────────────────────────┐│ ┌──────────────────────────────────────────────────┐ ┌─────────┐ ││ │ ┌───────────────────────┐ ┌─────┐ │ │ ┌─────┐ │ ││ │ │ VRAM0 ├───┤ ... │ │ │ │VRAM1│ │ ││ │ └───────────┬───────────┘ └─GT1─┘ │ │ └──┬──┘ │ ││ │ ┌──────────────────┴───────────────────────────┐ │ │ ┌──┴──┐ │ ││ │ │ ┌─────────────────────┐ ┌─────────────────┐ │ │ │ │ │ │ ││ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ ││ │ │ │ │EU│ │EU│ │EU│ │EU│ │ │ │RCS0 │ │BCS0 │ │ │ │ │ │ │ │ ││ │ │ │ └──┘ └──┘ └──┘ └──┘ │ │ └─────┘ └─────┘ │ │ │ │ │ │ │ ││ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ ││ │ │ │ │EU│ │EU│ │EU│ │EU│ │ │ │VCS0 │ │VCS1 │ │ │ │ │ │ │ │ ││ │ │ │ └──┘ └──┘ └──┘ └──┘ │ │ └─────┘ └─────┘ │ │ │ │ │ │ │ ││ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ ││ │ │ │ │EU│ │EU│ │EU│ │EU│ │ │ │VECS0│ │VECS1│ │ │ │ │ │ ... │ │ ││ │ │ │ └──┘ └──┘ └──┘ └──┘ │ │ └─────┘ └─────┘ │ │ │ │ │ │ │ ││ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ ││ │ │ │ │EU│ │EU│ │EU│ │EU│ │ │ │CCS0 │ │CCS1 │ │ │ │ │ │ │ │ ││ │ │ │ └──┘ └──┘ └──┘ └──┘ │ │ └─────┘ └─────┘ │ │ │ │ │ │ │ ││ │ │ └─────────DSS─────────┘ │ ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ ││ │ │ │ │CCS2 │ │CCS3 │ │ │ │ │ │ │ │ ││ │ │ ┌─────┐ ┌─────┐ ┌─────┐ │ └─────┘ └─────┘ │ │ │ │ │ │ │ ││ │ │ │ ... │ │ ... │ │ ... │ │ │ │ │ │ │ │ │ ││ │ │ └─DSS─┘ └─DSS─┘ └─DSS─┘ └─────Engines─────┘ │ │ │ │ │ │ ││ │ └───────────────────────────GT0────────────────┘ │ │ └─GT2─┘ │ ││ └────────────────────────────Tile0─────────────────┘ └─ Tile1──┘ │└─────────────────────────────Device0───────┬──────────────────────┘ │ ───────────────────────┴────────── PCI bus
Xe uAPI Overview
This section aims to describe the Xe’s IOCTL entries, its structs, and other Xe related uAPI such as uevents and PMU (Platform Monitoring Unit) related entries and usage.
- List of supported IOCTLs:
DRM_IOCTL_XE_DEVICE_QUERY
DRM_IOCTL_XE_GEM_CREATE
DRM_IOCTL_XE_GEM_MMAP_OFFSET
DRM_IOCTL_XE_VM_CREATE
DRM_IOCTL_XE_VM_DESTROY
DRM_IOCTL_XE_VM_BIND
DRM_IOCTL_XE_EXEC_QUEUE_CREATE
DRM_IOCTL_XE_EXEC_QUEUE_DESTROY
DRM_IOCTL_XE_EXEC_QUEUE_GET_PROPERTY
DRM_IOCTL_XE_EXEC
DRM_IOCTL_XE_WAIT_USER_FENCE
DRM_IOCTL_XE_OBSERVATION
DRM_IOCTL_XE_MADVISE
DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS
Xe IOCTL Extensions
Before detailing the IOCTLs and their structs, it is important to highlight that every IOCTL in Xe is extensible.
Many interfaces need to grow over time. In most cases we can simply extend the struct and have userspace pass in more data. Another option, as demonstrated by Vulkan’s approach to providing extensions for forward and backward compatibility, is to use a list of optional structs to provide those extra details.
The key advantage to using an extension chain is that it allows us to redefine the interface more easily than an ever growing struct of increasing complexity, and for large parts of that interface to be entirely optional. The downside is more pointer chasing; chasing across the __user boundary with pointers encapsulated inside u64.
Example chaining:
struct drm_xe_user_extension ext3 {
    .next_extension = 0, // end
    .name = ...,
};
struct drm_xe_user_extension ext2 {
    .next_extension = (uintptr_t)&ext3,
    .name = ...,
};
struct drm_xe_user_extension ext1 {
    .next_extension = (uintptr_t)&ext2,
    .name = ...,
};
Typically the struct drm_xe_user_extension would be embedded in some uAPI struct, and in this case we would feed it the head of the chain (i.e. ext1), which would then apply all of the above extensions.
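A small self-contained sketch of the chaining pattern: the struct mirrors the drm_xe_user_extension definition below, and the chain walk (normally done by the kernel) is emulated in userspace purely to show how the __u64 links resolve.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the uAPI struct defined below. */
struct drm_xe_user_extension {
	uint64_t next_extension; /* user pointer to the next extension */
	uint32_t name;
	uint32_t pad;
};

/* Emulates the kernel-side walk over the chain of extensions. */
static int count_extensions(uint64_t head)
{
	int n = 0;
	const struct drm_xe_user_extension *ext =
		(const struct drm_xe_user_extension *)(uintptr_t)head;

	for (; ext; ext = (const struct drm_xe_user_extension *)
			  (uintptr_t)ext->next_extension)
		n++;
	return n;
}
```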
- struct drm_xe_user_extension¶
Base class for defining a chain of extensions
Definition:
struct drm_xe_user_extension { __u64 next_extension; __u32 name; __u32 pad;};Members
next_extension: Pointer to the next struct drm_xe_user_extension, or zero if the end.
name: Name of the extension.
Note that the name here is just some integer.
Also note that the name space for this is not global for the whole driver, but rather its scope/meaning is limited to the specific piece of uAPI which has embedded the struct drm_xe_user_extension.
pad: MBZ
All undefined bits must be zero.
- struct drm_xe_ext_set_property¶
Generic set property extension
Definition:
struct drm_xe_ext_set_property { struct drm_xe_user_extension base; __u32 property; __u32 pad; union { __u64 value; __u64 ptr; }; __u64 reserved[2];};Members
base: Base user extension.
property: Property to set.
pad: MBZ
{unnamed_union}: anonymous
value: Property value.
ptr: Pointer to user value.
reserved: Reserved.
Description
A generic struct that allows any of the Xe’s IOCTLs to be extended with a set_property operation.
- struct drm_xe_engine_class_instance¶
instance of an engine class
Definition:
struct drm_xe_engine_class_instance {
#define DRM_XE_ENGINE_CLASS_RENDER        0
#define DRM_XE_ENGINE_CLASS_COPY          1
#define DRM_XE_ENGINE_CLASS_VIDEO_DECODE  2
#define DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE 3
#define DRM_XE_ENGINE_CLASS_COMPUTE       4
#define DRM_XE_ENGINE_CLASS_VM_BIND       5
    __u16 engine_class;
    __u16 engine_instance;
    __u16 gt_id;
    __u16 pad;
};
Members
engine_class: Engine class id.
engine_instance: Engine instance id.
gt_id: Unique ID of this GT within the PCI Device.
pad: MBZ
Description
It is returned as part of the drm_xe_engine, but it also is used as the input of engine selection for both drm_xe_exec_queue_create and drm_xe_query_engine_cycles.
- The engine_class can be:
DRM_XE_ENGINE_CLASS_RENDER
DRM_XE_ENGINE_CLASS_COPY
DRM_XE_ENGINE_CLASS_VIDEO_DECODE
DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE
DRM_XE_ENGINE_CLASS_COMPUTE
DRM_XE_ENGINE_CLASS_VM_BIND - Kernel only classes (not actual hardware engine class). Used for creating ordered queues of VM bind operations.
- struct drm_xe_engine¶
describe hardware engine
Definition:
struct drm_xe_engine { struct drm_xe_engine_class_instance instance; __u64 reserved[3];};Members
instance: The drm_xe_engine_class_instance.
reserved: Reserved.
- struct drm_xe_query_engines¶
describe engines
Definition:
struct drm_xe_query_engines { __u32 num_engines; __u32 pad; struct drm_xe_engine engines[];};Members
num_engines: Number of engines returned in engines.
pad: MBZ
engines: The returned engines for this device.
Description
If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_ENGINES, then the reply uses an array of struct drm_xe_query_engines in .data.
- enum drm_xe_memory_class¶
Supported memory classes.
Constants
DRM_XE_MEM_REGION_CLASS_SYSMEM: Represents system memory.
DRM_XE_MEM_REGION_CLASS_VRAM: On discrete platforms, this represents the memory that is local to the device, which we call VRAM. Not valid on integrated platforms.
- struct drm_xe_mem_region¶
Describes some region as known to the driver.
Definition:
struct drm_xe_mem_region { __u16 mem_class; __u16 instance; __u32 min_page_size; __u64 total_size; __u64 used; __u64 cpu_visible_size; __u64 cpu_visible_used; __u64 reserved[6];};Members
mem_class: The memory class describing this region.
See enum drm_xe_memory_class for supported values.
instance: The unique ID for this region, which serves as the index in the placement bitmask used as argument for DRM_IOCTL_XE_GEM_CREATE.
min_page_size: Min page-size in bytes for this region.
When the kernel allocates memory for this region, the underlying pages will be at least min_page_size in size. Buffer objects with an allowable placement in this region must be created with a size aligned to this value. GPU virtual address mappings of (parts of) buffer objects that may be placed in this region must also have their GPU virtual address and range aligned to this value. Affected IOCTLs will return -EINVAL if alignment restrictions are not met.
total_size: The usable size in bytes for this region.
used: Estimate of the memory used in bytes for this region.
cpu_visible_size: How much of this region can be CPU accessed, in bytes.
This will always be <= total_size, and the remainder (if any) will not be CPU accessible. If the CPU accessible part is smaller than total_size then this is referred to as a small BAR system.
On systems without small BAR (full BAR), the probed_size will always equal the total_size, since all of it will be CPU accessible.
Note this is only tracked for DRM_XE_MEM_REGION_CLASS_VRAM regions (for other types the value here will always equal zero).
cpu_visible_used: Estimate of CPU visible memory used, in bytes.
Note this is only currently tracked for DRM_XE_MEM_REGION_CLASS_VRAM regions (for other types the value here will always be zero).
reserved: Reserved.
- struct drm_xe_query_mem_regions¶
describe memory regions
Definition:
struct drm_xe_query_mem_regions {
        __u32 num_mem_regions;
        __u32 pad;
        struct drm_xe_mem_region mem_regions[];
};
Members
num_mem_regions: number of memory regions returned in mem_regions
pad: MBZ
mem_regions: The returned memory regions for this device
Description
If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_MEM_REGIONS, then the reply uses struct drm_xe_query_mem_regions in .data.
- struct drm_xe_query_config¶
describe the device configuration
Definition:
struct drm_xe_query_config {
        __u32 num_params;
        __u32 pad;
#define DRM_XE_QUERY_CONFIG_REV_AND_DEVICE_ID           0
#define DRM_XE_QUERY_CONFIG_FLAGS                       1
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM               (1 << 0)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY        (1 << 1)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR    (1 << 2)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT (1 << 3)
#define DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT               2
#define DRM_XE_QUERY_CONFIG_VA_BITS                     3
#define DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY     4
        __u64 info[];
};
Members
num_params: number of parameters returned in info
pad: MBZ
info: array of elements containing the config info
Description
If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_CONFIG, then the reply uses struct drm_xe_query_config in .data.
- The index in info can be:
  - DRM_XE_QUERY_CONFIG_REV_AND_DEVICE_ID: Device ID (lower 16 bits) and the device revision (next 8 bits)
  - DRM_XE_QUERY_CONFIG_FLAGS: Flags describing the device configuration, see list below
    - DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM: Flag is set if the device has usable VRAM
    - DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY: Flag is set if the device has low latency hint support
    - DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR: Flag is set if the device has CPU address mirroring support
    - DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT: Flag is set if the device supports the userspace hint DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION. This is exposed only on Xe2+.
  - DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT: Minimal memory alignment required by this device, typically SZ_4K or SZ_64K
  - DRM_XE_QUERY_CONFIG_VA_BITS: Maximum bits of a virtual address
  - DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY: Value of the highest available exec queue priority
- struct drm_xe_gt¶
describe an individual GT.
Definition:
struct drm_xe_gt {
#define DRM_XE_QUERY_GT_TYPE_MAIN   0
#define DRM_XE_QUERY_GT_TYPE_MEDIA  1
        __u16 type;
        __u16 tile_id;
        __u16 gt_id;
        __u16 pad[3];
        __u32 reference_clock;
        __u64 near_mem_regions;
        __u64 far_mem_regions;
        __u16 ip_ver_major;
        __u16 ip_ver_minor;
        __u16 ip_ver_rev;
        __u16 pad2;
        __u64 reserved[7];
};
Members
type: GT type: Main or Media
tile_id: Tile ID where this GT lives (Information only)
gt_id: Unique ID of this GT within the PCI Device
pad: MBZ
reference_clock: A clock frequency for timestamp
near_mem_regions: Bit mask of instances from drm_xe_query_mem_regions that are nearest to the current engines of this GT. Each index in this mask refers directly to the struct drm_xe_query_mem_regions' instance, no assumptions should be made about order. The type of each region is described by struct drm_xe_query_mem_regions' mem_class.
far_mem_regions: Bit mask of instances from drm_xe_query_mem_regions that are far from the engines of this GT. In general, they have extra indirections when compared to the near_mem_regions. For a discrete device this could mean system memory and memory living in a different tile. Each index in this mask refers directly to the struct drm_xe_query_mem_regions' instance, no assumptions should be made about order. The type of each region is described by struct drm_xe_query_mem_regions' mem_class.
ip_ver_major: Graphics/media IP major version on GMD_ID platforms
ip_ver_minor: Graphics/media IP minor version on GMD_ID platforms
ip_ver_rev: Graphics/media IP revision version on GMD_ID platforms
pad2: MBZ
reserved: Reserved
Description
To be used with drm_xe_query_gt_list, which will return a list with all the existing GT individual descriptions. Graphics Technology (GT) is a subset of a GPU/tile that is responsible for implementing graphics and/or media operations.
- The index in type can be:
  - DRM_XE_QUERY_GT_TYPE_MAIN
  - DRM_XE_QUERY_GT_TYPE_MEDIA
- struct drm_xe_query_gt_list¶
A list with GT description items.
Definition:
struct drm_xe_query_gt_list {
        __u32 num_gt;
        __u32 pad;
        struct drm_xe_gt gt_list[];
};
Members
num_gt: number of GT items returned in gt_list
pad: MBZ
gt_list: The GT list returned for this device
Description
If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_GT_LIST, then the reply uses struct drm_xe_query_gt_list in .data.
- struct drm_xe_query_topology_mask¶
describe the topology mask of a GT
Definition:
struct drm_xe_query_topology_mask {
        __u16 gt_id;
#define DRM_XE_TOPO_DSS_GEOMETRY      1
#define DRM_XE_TOPO_DSS_COMPUTE       2
#define DRM_XE_TOPO_L3_BANK           3
#define DRM_XE_TOPO_EU_PER_DSS        4
#define DRM_XE_TOPO_SIMD16_EU_PER_DSS 5
        __u16 type;
        __u32 num_bytes;
        __u8 mask[];
};
Members
gt_id: GT ID the mask is associated with
type: type of mask
num_bytes: number of bytes in requested mask
mask: little-endian mask of num_bytes
Description
This is the hardware topology which reflects the internal physicalstructure of the GPU.
If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_GT_TOPOLOGY, then the reply uses struct drm_xe_query_topology_mask in .data.
- The type can be:
  - DRM_XE_TOPO_DSS_GEOMETRY: To query the mask of Dual Sub Slices (DSS) available for geometry operations. For example a query response containing the following in mask: DSS_GEOMETRY ffffffff00000000 means 32 DSS are available for geometry.
  - DRM_XE_TOPO_DSS_COMPUTE: To query the mask of Dual Sub Slices (DSS) available for compute operations. For example a query response containing the following in mask: DSS_COMPUTE ffffffff00000000 means 32 DSS are available for compute.
  - DRM_XE_TOPO_L3_BANK: To query the mask of enabled L3 banks. This type may be omitted if the driver is unable to query the mask from the hardware.
  - DRM_XE_TOPO_EU_PER_DSS: To query the mask of Execution Units (EU) available per Dual Sub Slices (DSS). For example a query response containing the following in mask: EU_PER_DSS ffff000000000000 means each DSS has 16 SIMD8 EUs. This type may be omitted if the device doesn't have SIMD8 EUs.
  - DRM_XE_TOPO_SIMD16_EU_PER_DSS: To query the mask of SIMD16 Execution Units (EU) available per Dual Sub Slices (DSS). For example a query response containing the following in mask: SIMD16_EU_PER_DSS ffff000000000000 means each DSS has 16 SIMD16 EUs. This type may be omitted if the device doesn't have SIMD16 EUs.
- struct drm_xe_query_engine_cycles¶
correlate CPU and GPU timestamps
Definition:
struct drm_xe_query_engine_cycles {
        struct drm_xe_engine_class_instance eci;
        __s32 clockid;
        __u32 width;
        __u64 engine_cycles;
        __u64 cpu_timestamp;
        __u64 cpu_delta;
};
Members
eci: This is input by the user and is the engine for which command streamer cycles are queried.
clockid: This is input by the user and is the reference clock id for CPU timestamp. For definition, see clock_gettime(2) and perf_event_open(2). Supported clock ids are CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME, CLOCK_TAI.
width: Width of the engine cycle counter in bits.
engine_cycles: Engine cycles as read from its register at 0x358 offset.
cpu_timestamp: CPU timestamp in ns. The timestamp is captured before reading the engine_cycles register using the reference clockid set by the user.
cpu_delta: Time delta in ns captured around reading the lower dword of the engine_cycles register.
Description
If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_ENGINE_CYCLES, then the reply uses struct drm_xe_query_engine_cycles in .data. struct drm_xe_query_engine_cycles is allocated by the user and .data points to this allocated structure.
The query returns the engine cycles, which along with the GT's reference_clock, can be used to calculate the engine timestamp. In addition the query returns a set of CPU timestamps that indicate when the command streamer cycle count was captured.
- struct drm_xe_query_uc_fw_version¶
query a micro-controller firmware version
Definition:
struct drm_xe_query_uc_fw_version {
#define XE_QUERY_UC_TYPE_GUC_SUBMISSION 0
#define XE_QUERY_UC_TYPE_HUC            1
        __u16 uc_type;
        __u16 pad;
        __u32 branch_ver;
        __u32 major_ver;
        __u32 minor_ver;
        __u32 patch_ver;
        __u32 pad2;
        __u64 reserved;
};
Members
uc_type: The micro-controller type to query firmware version
pad: MBZ
branch_ver: branch uc fw version
major_ver: major uc fw version
minor_ver: minor uc fw version
patch_ver: patch uc fw version
pad2: MBZ
reserved: Reserved
Description
Given a uc_type this will return the branch, major, minor and patch versionof the micro-controller firmware.
- struct drm_xe_query_pxp_status¶
query if PXP is ready
Definition:
struct drm_xe_query_pxp_status {
        __u32 status;
        __u32 supported_session_types;
};
Members
status: current PXP status
supported_session_types: bitmask of supported PXP session types
Description
If PXP is enabled and no fatal error has occurred, the status will be set to one of the following values:
- 0: PXP init still in progress
- 1: PXP init complete
If PXP is not enabled or something has gone wrong, the query will be failed with one of the following error codes:
- -ENODEV: PXP not supported or disabled;
- -EIO: fatal error occurred during init, so PXP will never be enabled;
- -EINVAL: incorrect value provided as part of the query;
- -EFAULT: error copying the memory between kernel and userspace.
The status can only be 0 in the first few seconds after driver load. Ifeverything works as expected, the status will transition to init complete inless than 1 second, while in case of errors the driver might take longer tostart returning an error code, but it should still take less than 10 seconds.
The supported session type bitmask is based on the values in enum drm_xe_pxp_session_type. TYPE_NONE is always supported and therefore is not reported in the bitmask.
- struct drm_xe_device_query¶
Input of DRM_IOCTL_XE_DEVICE_QUERY - main structure to query device information
Definition:
struct drm_xe_device_query {
        __u64 extensions;
#define DRM_XE_DEVICE_QUERY_ENGINES       0
#define DRM_XE_DEVICE_QUERY_MEM_REGIONS   1
#define DRM_XE_DEVICE_QUERY_CONFIG        2
#define DRM_XE_DEVICE_QUERY_GT_LIST       3
#define DRM_XE_DEVICE_QUERY_HWCONFIG      4
#define DRM_XE_DEVICE_QUERY_GT_TOPOLOGY   5
#define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES 6
#define DRM_XE_DEVICE_QUERY_UC_FW_VERSION 7
#define DRM_XE_DEVICE_QUERY_OA_UNITS      8
#define DRM_XE_DEVICE_QUERY_PXP_STATUS    9
#define DRM_XE_DEVICE_QUERY_EU_STALL      10
        __u32 query;
        __u32 size;
        __u64 data;
        __u64 reserved[2];
};
Members
extensions: Pointer to the first extension struct, if any
query: The type of data to query
size: Size of the queried data
data: Queried data is placed here
reserved: Reserved
Description
The user selects the type of data to query among DRM_XE_DEVICE_QUERY_* and sets the value in the query member. This determines the type of the structure provided by the driver in data, among struct drm_xe_query_*.
- The query can be:
  - DRM_XE_DEVICE_QUERY_ENGINES
  - DRM_XE_DEVICE_QUERY_MEM_REGIONS
  - DRM_XE_DEVICE_QUERY_CONFIG
  - DRM_XE_DEVICE_QUERY_GT_LIST
  - DRM_XE_DEVICE_QUERY_HWCONFIG: Query type to retrieve the hardware configuration of the device such as information on slices, memory, caches, and so on. It is provided as a table of key / value attributes.
  - DRM_XE_DEVICE_QUERY_GT_TOPOLOGY
  - DRM_XE_DEVICE_QUERY_ENGINE_CYCLES
  - DRM_XE_DEVICE_QUERY_PXP_STATUS
If size is set to 0, the driver fills it with the required size forthe requested type of data to query. If size is equal to the requiredsize, the queried information is copied into data. If size is set toa value different from 0 and different from the required size, theIOCTL call returns -EINVAL.
For example the following code snippet allows retrieving and printinginformation about the device engines with DRM_XE_DEVICE_QUERY_ENGINES:
struct drm_xe_query_engines *engines;
struct drm_xe_device_query query = {
        .extensions = 0,
        .query = DRM_XE_DEVICE_QUERY_ENGINES,
        .size = 0,
        .data = 0,
};

ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);
engines = malloc(query.size);
query.data = (uintptr_t)engines;
ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);

for (int i = 0; i < engines->num_engines; i++) {
        printf("Engine %d: %s\n", i,
               engines->engines[i].instance.engine_class ==
                       DRM_XE_ENGINE_CLASS_RENDER ? "RENDER" :
               engines->engines[i].instance.engine_class ==
                       DRM_XE_ENGINE_CLASS_COPY ? "COPY" :
               engines->engines[i].instance.engine_class ==
                       DRM_XE_ENGINE_CLASS_VIDEO_DECODE ? "VIDEO_DECODE" :
               engines->engines[i].instance.engine_class ==
                       DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE ? "VIDEO_ENHANCE" :
               engines->engines[i].instance.engine_class ==
                       DRM_XE_ENGINE_CLASS_COMPUTE ? "COMPUTE" : "UNKNOWN");
}
free(engines);
- struct drm_xe_gem_create¶
Input of DRM_IOCTL_XE_GEM_CREATE - A structure for gem creation
Definition:
struct drm_xe_gem_create {
#define DRM_XE_GEM_CREATE_EXTENSION_SET_PROPERTY 0
#define DRM_XE_GEM_CREATE_SET_PROPERTY_PXP_TYPE  0
        __u64 extensions;
        __u64 size;
        __u32 placement;
#define DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING      (1 << 0)
#define DRM_XE_GEM_CREATE_FLAG_SCANOUT            (1 << 1)
#define DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM (1 << 2)
#define DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION     (1 << 3)
        __u32 flags;
        __u32 vm_id;
        __u32 handle;
#define DRM_XE_GEM_CPU_CACHING_WB 1
#define DRM_XE_GEM_CPU_CACHING_WC 2
        __u16 cpu_caching;
        __u16 pad[3];
        __u64 reserved[2];
};
Members
extensions: Pointer to the first extension struct, if any
size: Size of the object to be created, must match region (system or vram) minimum alignment (min_page_size).
placement: A mask of memory instances of where BO can be placed. Each index in this mask refers directly to the struct drm_xe_query_mem_regions' instance, no assumptions should be made about order. The type of each region is described by struct drm_xe_query_mem_regions' mem_class.
flags: Flags, currently a mask of memory instances of where BO can be placed
vm_id: Attached VM, if any. If a VM is specified, this BO must:
- Only ever be bound to that VM.
- Cannot be exported as a PRIME fd.
handle: Returned handle for the object. Object handles are nonzero.
cpu_caching: The CPU caching mode to select for this object. If mmaping the object the mode selected here will also be used. The exception is when mapping system memory (including data evicted to system) on discrete GPUs. The caching mode selected will then be overridden to DRM_XE_GEM_CPU_CACHING_WB, and coherency between GPU and CPU is guaranteed. The caching mode of existing CPU-mappings will be updated transparently to user-space clients.
pad: MBZ
reserved: Reserved
Description
- The flags can be:
  - DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING: Modify the GEM object allocation strategy by deferring physical memory allocation until the object is either bound to a virtual memory region via VM_BIND or accessed by the CPU. As a result, no backing memory is reserved at the time of GEM object creation.
  - DRM_XE_GEM_CREATE_FLAG_SCANOUT: Indicates that the GEM object is intended for scanout via the display engine. When set, the kernel ensures that the allocation is placed in a memory region compatible with the display engine requirements. This may impose restrictions on tiling, alignment, and memory placement to guarantee proper display functionality.
  - DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM: When using VRAM as a possible placement, ensure that the corresponding VRAM allocation will always use the CPU accessible part of VRAM. This is important for small-bar systems (on full-bar systems this gets turned into a noop). Note1: System memory can be used as an extra placement if the kernel should spill the allocation to system memory, if space can't be made available in the CPU accessible part of VRAM (giving the same behaviour as the i915 interface, see I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS). Note2: For clear-color CCS surfaces the kernel needs to read the clear-color value stored in the buffer, and on discrete platforms we need to use VRAM for display surfaces, therefore the kernel requires setting this flag for such objects, otherwise an error is thrown on small-bar systems.
  - DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION: Allows userspace to hint that compression (CCS) should be disabled for the buffer being created. This can avoid unnecessary memory operations and CCS state management. On pre-Xe2 platforms, this flag is currently rejected as compression control is not supported via PAT index. On Xe2+ platforms, compression is controlled via PAT entries. If this flag is set, the driver will reject any VM bind that requests a PAT index enabling compression for this BO.
Note
On dGPU platforms, there is currently no change in behavior with this flag, but future improvements may leverage it. The current benefit is primarily applicable to iGPU platforms.
- cpu_caching supports the following values:
  - DRM_XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching. On iGPU this can't be used for scanout surfaces. Currently not allowed for objects placed in VRAM.
  - DRM_XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is uncached. Scanout surfaces should likely use this. All objects that can be placed in VRAM must use this.
This ioctl supports setting the following properties via the DRM_XE_GEM_CREATE_EXTENSION_SET_PROPERTY extension, which uses the generic drm_xe_ext_set_property struct:
- DRM_XE_GEM_CREATE_SET_PROPERTY_PXP_TYPE: set the type of PXP session this object will be used with. Valid values are listed in enum drm_xe_pxp_session_type. DRM_XE_PXP_TYPE_NONE is the default behavior, so there is no need to explicitly set that. Objects used with a session of type DRM_XE_PXP_TYPE_HWDRM will be marked as invalid if a PXP invalidation event occurs after their creation. Attempting to flip an invalid object will cause a black frame to be displayed instead. Submissions with invalid objects mapped in the VM will be rejected.
- struct drm_xe_gem_mmap_offset¶
Input of DRM_IOCTL_XE_GEM_MMAP_OFFSET
Definition:
struct drm_xe_gem_mmap_offset {
        __u64 extensions;
        __u32 handle;
#define DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER (1 << 0)
        __u32 flags;
        __u64 offset;
        __u64 reserved[2];
};
Members
extensions: Pointer to the first extension struct, if any
handle: Handle for the object being mapped.
flags: Flags
offset: The fake offset to use for subsequent mmap call
reserved: Reserved
Description
- The flags can be:
  - DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER: For user to query a special offset for use in the mmap ioctl. Writing to the returned mmap address will generate a PCI memory barrier with low overhead (avoiding an IOCTL call as well as writing to VRAM which would also add overhead), acting like an MI_MEM_FENCE instruction.
Note
The mmap size can be at most 4K, due to HW limitations. As a resultthis interface is only supported on CPU architectures that support 4K pagesize. The mmap_offset ioctl will detect this and gracefully return anerror, where userspace is expected to have a different fallback method fortriggering a barrier.
Roughly the usage would be as follows:
struct drm_xe_gem_mmap_offset mmo = {
        .handle = 0,            /* must be set to 0 */
        .flags = DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER,
};

err = ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo);
map = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, mmo.offset);
map[i] = 0xdeadbeaf;            /* issue barrier */
- struct drm_xe_vm_create¶
Input of DRM_IOCTL_XE_VM_CREATE
Definition:
struct drm_xe_vm_create {
        __u64 extensions;
#define DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE (1 << 0)
#define DRM_XE_VM_CREATE_FLAG_LR_MODE      (1 << 1)
#define DRM_XE_VM_CREATE_FLAG_FAULT_MODE   (1 << 2)
        __u32 flags;
        __u32 vm_id;
        __u64 reserved[2];
};
Members
extensions: Pointer to the first extension struct, if any
flags: Flags
vm_id: Returned VM ID
reserved: Reserved
Description
- The flags can be:
  - DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE: Map the whole virtual address space of the VM to a scratch page. A vm_bind would overwrite the scratch page mapping. This flag is mutually exclusive with the DRM_XE_VM_CREATE_FLAG_FAULT_MODE flag, with an exception on xe2 and xe3 platforms.
  - DRM_XE_VM_CREATE_FLAG_LR_MODE: An LR, or Long Running VM accepts exec submissions to its exec_queues that don't have an upper time limit on the job execution time. But exec submissions to these don't allow any of the sync types DRM_XE_SYNC_TYPE_SYNCOBJ, DRM_XE_SYNC_TYPE_TIMELINE_SYNCOBJ, used as out-syncobjs, that is, together with sync flag DRM_XE_SYNC_FLAG_SIGNAL. LR VMs can be created in recoverable page-fault mode using DRM_XE_VM_CREATE_FLAG_FAULT_MODE, if the device supports it. If that flag is omitted, the UMD can not rely on the slightly different per-VM overcommit semantics that are enabled by DRM_XE_VM_CREATE_FLAG_FAULT_MODE (see below), but KMD may still enable recoverable pagefaults if supported by the device.
  - DRM_XE_VM_CREATE_FLAG_FAULT_MODE: Requires also DRM_XE_VM_CREATE_FLAG_LR_MODE. It allows memory to be allocated on demand when accessed, and also allows per-VM overcommit of memory. The xe driver internally uses recoverable pagefaults to implement this.
- struct drm_xe_vm_destroy¶
Input of DRM_IOCTL_XE_VM_DESTROY
Definition:
struct drm_xe_vm_destroy {
        __u32 vm_id;
        __u32 pad;
        __u64 reserved[2];
};
Members
vm_id: VM ID
pad: MBZ
reserved: Reserved
- struct drm_xe_vm_bind_op¶
run bind operations
Definition:
struct drm_xe_vm_bind_op {
        __u64 extensions;
        __u32 obj;
        __u16 pat_index;
        __u16 pad;
        union {
                __u64 obj_offset;
                __u64 userptr;
                __s64 cpu_addr_mirror_offset;
        };
        __u64 range;
        __u64 addr;
#define DRM_XE_VM_BIND_OP_MAP         0x0
#define DRM_XE_VM_BIND_OP_UNMAP       0x1
#define DRM_XE_VM_BIND_OP_MAP_USERPTR 0x2
#define DRM_XE_VM_BIND_OP_UNMAP_ALL   0x3
#define DRM_XE_VM_BIND_OP_PREFETCH    0x4
        __u32 op;
#define DRM_XE_VM_BIND_FLAG_READONLY          (1 << 0)
#define DRM_XE_VM_BIND_FLAG_IMMEDIATE         (1 << 1)
#define DRM_XE_VM_BIND_FLAG_NULL              (1 << 2)
#define DRM_XE_VM_BIND_FLAG_DUMPABLE          (1 << 3)
#define DRM_XE_VM_BIND_FLAG_CHECK_PXP         (1 << 4)
#define DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR   (1 << 5)
#define DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET (1 << 6)
        __u32 flags;
#define DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC -1
        __u32 prefetch_mem_region_instance;
        __u32 pad2;
        __u64 reserved[3];
};
Members
extensions: Pointer to the first extension struct, if any
obj: GEM object to operate on, MBZ for MAP_USERPTR, MBZ for UNMAP
pat_index: The platform defined pat_index to use for this mapping. The index basically maps to some predefined memory attributes, including things like caching, coherency, compression etc. The exact meaning of the pat_index is platform specific and defined in the Bspec and PRMs. When the KMD sets up the binding the index here is encoded into the ppGTT PTE.
For coherency the pat_index needs to be at least 1way coherent when drm_xe_gem_create.cpu_caching is DRM_XE_GEM_CPU_CACHING_WB. The KMD will extract the coherency mode from the pat_index and reject if there is a mismatch (see note below for pre-MTL platforms).
Note: On pre-MTL platforms there is only a caching mode and no explicit coherency mode, but on such hardware there is always a shared-LLC (or is dgpu) so all GT memory accesses are coherent with CPU caches even with the caching mode set as uncached. It's only the display engine that is incoherent (on dgpu it must be in VRAM which is always mapped as WC on the CPU). However to keep the uapi somewhat consistent with newer platforms the KMD groups the different cache levels into the following coherency buckets on all pre-MTL platforms:
  ppGTT UC -> COH_NONE
  ppGTT WC -> COH_NONE
  ppGTT WT -> COH_NONE
  ppGTT WB -> COH_AT_LEAST_1WAY
In practice UC/WC/WT should only ever be used for scanout surfaces on such platforms (or perhaps in general for dma-buf if shared with another device) since it is only the display engine that is actually incoherent. Everything else should typically use WB given that we have a shared-LLC. On MTL+ this completely changes and the HW defines the coherency mode as part of the pat_index, where incoherent GT access is possible.
Note: For userptr and externally imported dma-buf the kernel expects either 1WAY or 2WAY for the pat_index.
For DRM_XE_VM_BIND_FLAG_NULL bindings there are no KMD restrictions on the pat_index. For such mappings there is no actual memory being mapped (the address in the PTE is invalid), so the various PAT memory attributes likely do not apply. Simply leaving as zero is one option (still a valid pat_index). Same applies to DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR bindings as for such mappings there is no actual memory being mapped.
pad: MBZ
{unnamed_union}: anonymous
obj_offset: Offset into the object, MBZ for CLEAR_RANGE, ignored for unbind
userptr: user pointer to bind on
cpu_addr_mirror_offset: Offset from GPU addr to create CPU address mirror mappings. MBZ with current level of support (e.g. 1 to 1 mapping between GPU and CPU mappings only supported).
range: Number of bytes from the object to bind to addr, MBZ for UNMAP_ALL
addr: Address to operate on, MBZ for UNMAP_ALL
op: Bind operation to perform
flags: Bind flags
prefetch_mem_region_instance: Memory region to prefetch VMA to. It is a region instance, not a mask. To be used only with the DRM_XE_VM_BIND_OP_PREFETCH operation.
pad2: MBZ
reserved: Reserved
Description
- The op can be:
  - DRM_XE_VM_BIND_OP_MAP
  - DRM_XE_VM_BIND_OP_UNMAP
  - DRM_XE_VM_BIND_OP_MAP_USERPTR
  - DRM_XE_VM_BIND_OP_UNMAP_ALL
  - DRM_XE_VM_BIND_OP_PREFETCH
- and the flags can be:
  - DRM_XE_VM_BIND_FLAG_READONLY: Setup the page tables as read-only to ensure write protection
  - DRM_XE_VM_BIND_FLAG_IMMEDIATE: On a faulting VM, do the MAP operation immediately rather than deferring the MAP to the page fault handler. This is implied on a non-faulting VM as there is no fault handler to defer to.
  - DRM_XE_VM_BIND_FLAG_NULL: When the NULL flag is set, the page tables are setup with a special bit which indicates writes are dropped and all reads return zero. In the future, the NULL flag will only be valid for DRM_XE_VM_BIND_OP_MAP operations, the BO handle MBZ, and the BO offset MBZ. This flag is intended to implement VK sparse bindings.
  - DRM_XE_VM_BIND_FLAG_CHECK_PXP: If the object is encrypted via PXP, reject the binding if the encryption key is no longer valid. This flag has no effect on BOs that are not marked as using PXP.
  - DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR: When the CPU address mirror flag is set, no mappings are created, rather the range is reserved for CPU address mirroring, which will be populated on GPU page faults or prefetches. Only valid on VMs with DRM_XE_VM_CREATE_FLAG_FAULT_MODE set. The CPU address mirror flag is only valid for DRM_XE_VM_BIND_OP_MAP operations, the BO handle MBZ, and the BO offset MBZ.
  - DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET: Can be used in combination with DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR to reset madvises when the underlying CPU address space range is unmapped (typically with munmap(2) or brk(2)). The madvise values set with DRM_IOCTL_XE_MADVISE are reset to the values that were present immediately after the DRM_IOCTL_XE_VM_BIND. The reset GPU virtual address range is the intersection of the range bound using DRM_IOCTL_XE_VM_BIND and the virtual CPU address space range unmapped. This functionality is present to mimic the behaviour of CPU address space madvises set using madvise(2), which are typically reset on unmap.
Note
free(3) may or may not call munmap(2) and/or brk(2), and may thus not invoke autoreset. Neither will stack variables going out of scope. Therefore it's recommended to always explicitly reset the madvises when freeing the memory backing a region used in a DRM_IOCTL_XE_MADVISE call.
- The prefetch_mem_region_instance for DRM_XE_VM_BIND_OP_PREFETCH can also be DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC, which ensures prefetching occurs in the memory region advised by madvise.
- struct drm_xe_vm_bind¶
Input of DRM_IOCTL_XE_VM_BIND
Definition:
struct drm_xe_vm_bind {
        __u64 extensions;
        __u32 vm_id;
        __u32 exec_queue_id;
        __u32 pad;
        __u32 num_binds;
        union {
                struct drm_xe_vm_bind_op bind;
                __u64 vector_of_binds;
        };
        __u32 pad2;
        __u32 num_syncs;
        __u64 syncs;
        __u64 reserved[2];
};
Members
extensions: Pointer to the first extension struct, if any
vm_id: The ID of the VM to bind to
exec_queue_id: exec_queue_id, must be of class DRM_XE_ENGINE_CLASS_VM_BIND and the exec queue must have the same vm_id. If zero, the default VM bind engine is used.
pad: MBZ
num_binds: number of binds in this IOCTL
{unnamed_union}: anonymous
bind: used if num_binds == 1
vector_of_binds: userptr to array of struct drm_xe_vm_bind_op if num_binds > 1
pad2: MBZ
num_syncs: amount of syncs to wait on
syncs: pointer to struct drm_xe_sync array
reserved: Reserved
Description
Below is an example of a minimal use of drm_xe_vm_bind to asynchronously bind the buffer data at address BIND_ADDRESS to illustrate userptr. It can be synchronized by using the example provided for drm_xe_sync.
data = aligned_alloc(ALIGNMENT, BO_SIZE);
struct drm_xe_vm_bind bind = {
        .vm_id = vm,
        .num_binds = 1,
        .bind.obj = 0,
        .bind.obj_offset = to_user_pointer(data),
        .bind.range = BO_SIZE,
        .bind.addr = BIND_ADDRESS,
        .bind.op = DRM_XE_VM_BIND_OP_MAP_USERPTR,
        .bind.flags = 0,
        .num_syncs = 1,
        .syncs = &sync,
        .exec_queue_id = 0,
};
ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);
- struct drm_xe_exec_queue_create¶
Input of DRM_IOCTL_XE_EXEC_QUEUE_CREATE
Definition:
struct drm_xe_exec_queue_create {
#define DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY            0
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY             0
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE            1
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE             2
#define DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE             3
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP          4
#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 5
        __u64 extensions;
        __u16 width;
        __u16 num_placements;
        __u32 vm_id;
#define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT (1 << 0)
        __u32 flags;
        __u32 exec_queue_id;
        __u64 instances;
        __u64 reserved[2];
};
Members
extensions: Pointer to the first extension struct, if any
width: submission width (number of BBs per exec) for this exec queue
num_placements: number of valid placements for this exec queue
vm_id: VM to use for this exec queue
flags: flags to use for this exec queue
exec_queue_id: Returned exec queue ID
instances: user pointer to a 2-d array of struct drm_xe_engine_class_instance; length = width (i) * num_placements (j), index = j + i * width
reserved: Reserved
Description
This ioctl supports setting the following properties via the DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY extension, which uses the generic drm_xe_ext_set_property struct:
- DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY: set the queue priority. CAP_SYS_NICE is required to set a value above normal.
- DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE: set the queue timeslice duration in microseconds.
- DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE: set the type of PXP session this queue will be used with. Valid values are listed in enum drm_xe_pxp_session_type. DRM_XE_PXP_TYPE_NONE is the default behavior, so there is no need to explicitly set that. When a queue of type DRM_XE_PXP_TYPE_HWDRM is created, the PXP default HWDRM session (XE_PXP_HWDRM_DEFAULT_SESSION) will be started, if it isn't already running. The user is expected to query the PXP status via the query ioctl (see DRM_XE_DEVICE_QUERY_PXP_STATUS) and to wait for PXP to be ready before attempting to create a queue with this property. When a queue is created before PXP is ready, the ioctl will return -EBUSY if init is still in progress or -EIO if init failed. Given that going into a power-saving state kills PXP HWDRM sessions, runtime PM will be blocked while queues of this type are alive. All PXP queues will be killed if a PXP invalidation event occurs.
- DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP: Create a multi-queue group or add secondary queues to a multi-queue group. If the extension's 'value' field has the DRM_XE_MULTI_GROUP_CREATE flag set, then a new multi-queue group is created with this queue as the primary queue (Q0). Otherwise, the queue gets added to the multi-queue group whose primary queue's exec_queue_id is specified in the lower 32 bits of the 'value' field. All the other non-relevant bits of the extension's 'value' field while adding the primary or the secondary queues of the group must be set to 0.
- DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY: Set the queue priority within the multi-queue group. Current valid priority values are 0-2 (default is 1), with higher values indicating higher priority.
The example below shows how to use drm_xe_exec_queue_create to create a simple exec_queue (no parallel submission) of class DRM_XE_ENGINE_CLASS_RENDER.
struct drm_xe_engine_class_instance instance = {
    .engine_class = DRM_XE_ENGINE_CLASS_RENDER,
};
struct drm_xe_exec_queue_create exec_queue_create = {
     .extensions = 0,
     .vm_id = vm,
     .num_bb_per_exec = 1,
     .num_eng_per_bb = 1,
     .instances = to_user_pointer(&instance),
};
ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create);

Allow users to provide a hint to the kernel for cases demanding a low-latency profile. Please note it will have an impact on power consumption. The user can indicate the low-latency hint with a flag while creating the exec queue as shown below:

struct drm_xe_exec_queue_create exec_queue_create = {
     .flags = DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT,
     .extensions = 0,
     .vm_id = vm,
     .num_bb_per_exec = 1,
     .num_eng_per_bb = 1,
     .instances = to_user_pointer(&instance),
};
ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create);
- structdrm_xe_exec_queue_destroy¶
Input of
DRM_IOCTL_XE_EXEC_QUEUE_DESTROY
Definition:
struct drm_xe_exec_queue_destroy {
    __u32 exec_queue_id;
    __u32 pad;
    __u64 reserved[2];
};
Members
exec_queue_idExec queue ID
padMBZ
reservedReserved
- structdrm_xe_exec_queue_get_property¶
Input of
DRM_IOCTL_XE_EXEC_QUEUE_GET_PROPERTY
Definition:
struct drm_xe_exec_queue_get_property {
    __u64 extensions;
    __u32 exec_queue_id;
#define DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN 0
    __u32 property;
    __u64 value;
    __u64 reserved[2];
};
Members
extensionsPointer to the first extension struct, if any
exec_queue_idExec queue ID
propertyproperty to get
valueproperty value
reservedReserved
Description
- The property can be:
DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN
- structdrm_xe_sync¶
sync object
Definition:
struct drm_xe_sync {
    __u64 extensions;
#define DRM_XE_SYNC_TYPE_SYNCOBJ          0x0
#define DRM_XE_SYNC_TYPE_TIMELINE_SYNCOBJ 0x1
#define DRM_XE_SYNC_TYPE_USER_FENCE       0x2
    __u32 type;
#define DRM_XE_SYNC_FLAG_SIGNAL (1 << 0)
    __u32 flags;
    union {
        __u32 handle;
        __u64 addr;
    };
    __u64 timeline_value;
    __u64 reserved[2];
};
Members
extensionsPointer to the first extension struct, if any
typeType of this sync object
flagsSync Flags
{unnamed_union}anonymous
handleHandle for the object
addrAddress of the user fence. When the sync is passed in via the exec IOCTL this is a GPU address in the VM. When the sync is passed in via the VM bind IOCTL this is a user pointer. In either case, it is the user's responsibility that this address is present and mapped when the user fence is signalled. Must be qword aligned.
timeline_valueInput for the timeline sync object. Must be non-zero when used with DRM_XE_SYNC_TYPE_TIMELINE_SYNCOBJ.
reservedReserved
Description
- The type can be:
DRM_XE_SYNC_TYPE_SYNCOBJ
DRM_XE_SYNC_TYPE_TIMELINE_SYNCOBJ
DRM_XE_SYNC_TYPE_USER_FENCE
- and the flags can be:
DRM_XE_SYNC_FLAG_SIGNAL
A minimal use of drm_xe_sync looks like this:
struct drm_xe_sync sync = {
    .flags = DRM_XE_SYNC_FLAG_SIGNAL,
    .type = DRM_XE_SYNC_TYPE_SYNCOBJ,
};
struct drm_syncobj_create syncobj_create = { 0 };
ioctl(fd, DRM_IOCTL_SYNCOBJ_CREATE, &syncobj_create);
sync.handle = syncobj_create.handle;

... use of &sync in drm_xe_exec or drm_xe_vm_bind ...

struct drm_syncobj_wait wait = {
    .handles = &sync.handle,
    .timeout_nsec = INT64_MAX,
    .count_handles = 1,
    .flags = 0,
    .first_signaled = 0,
    .pad = 0,
};
ioctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);
- structdrm_xe_exec¶
Input of
DRM_IOCTL_XE_EXEC
Definition:
struct drm_xe_exec {
    __u64 extensions;
    __u32 exec_queue_id;
#define DRM_XE_MAX_SYNCS 1024
    __u32 num_syncs;
    __u64 syncs;
    __u64 address;
    __u16 num_batch_buffer;
    __u16 pad[3];
    __u64 reserved[2];
};
Members
extensionsPointer to the first extension struct, if any
exec_queue_idExec queue ID for the batch buffer
num_syncsNumber of struct drm_xe_sync entries in the array.
syncsPointer to the struct drm_xe_sync array.
addressaddress of the batch buffer if num_batch_buffer == 1, or an array of batch buffer addresses
num_batch_buffernumber of batch buffers in this exec; must match the width of the engine
padMBZ
reservedReserved
Description
This is an example of using drm_xe_exec for execution of the object at BIND_ADDRESS (see the example in drm_xe_vm_bind) by an exec_queue (see the example in drm_xe_exec_queue_create). It can be synchronized by using the example provided for drm_xe_sync.
struct drm_xe_exec exec = {
    .exec_queue_id = exec_queue,
    .syncs = &sync,
    .num_syncs = 1,
    .address = BIND_ADDRESS,
    .num_batch_buffer = 1,
};
ioctl(fd, DRM_IOCTL_XE_EXEC, &exec);
- structdrm_xe_wait_user_fence¶
Input of
DRM_IOCTL_XE_WAIT_USER_FENCE
Definition:
struct drm_xe_wait_user_fence {
    __u64 extensions;
    __u64 addr;
#define DRM_XE_UFENCE_WAIT_OP_EQ  0x0
#define DRM_XE_UFENCE_WAIT_OP_NEQ 0x1
#define DRM_XE_UFENCE_WAIT_OP_GT  0x2
#define DRM_XE_UFENCE_WAIT_OP_GTE 0x3
#define DRM_XE_UFENCE_WAIT_OP_LT  0x4
#define DRM_XE_UFENCE_WAIT_OP_LTE 0x5
    __u16 op;
#define DRM_XE_UFENCE_WAIT_FLAG_ABSTIME (1 << 0)
    __u16 flags;
    __u32 pad;
    __u64 value;
    __u64 mask;
    __s64 timeout;
    __u32 exec_queue_id;
    __u32 pad2;
    __u64 reserved[2];
};
Members
extensionsPointer to the first extension struct, if any
addruser pointer address to wait on; must be qword aligned
opwait operation (type of comparison)
flagswait flags
padMBZ
valuecompare value
maskcomparison mask
timeouthow long to wait before bailing, value in nanoseconds. Without the DRM_XE_UFENCE_WAIT_FLAG_ABSTIME flag set (relative timeout), it contains the timeout expressed in nanoseconds to wait (the fence will expire at now() + timeout). When the DRM_XE_UFENCE_WAIT_FLAG_ABSTIME flag is set (absolute timeout), the wait will end at timeout (uses the system MONOTONIC_CLOCK). Passing a negative timeout leads to a never-ending wait. On a relative timeout this value is updated with the timeout left (for restarting the call in case of signal delivery). On an absolute timeout this value stays intact (a restarted call will still expire at the same point in time).
exec_queue_idexec_queue_id returned from xe_exec_queue_create_ioctl
pad2MBZ
reservedReserved
Description
Wait on a user fence. XE will wake up on every HW engine interrupt in the instances list and check if the user fence is complete:
(*addr & MASK) OP (VALUE & MASK)
Returns to user on user fence completion or timeout.
- The op can be:
DRM_XE_UFENCE_WAIT_OP_EQ
DRM_XE_UFENCE_WAIT_OP_NEQ
DRM_XE_UFENCE_WAIT_OP_GT
DRM_XE_UFENCE_WAIT_OP_GTE
DRM_XE_UFENCE_WAIT_OP_LT
DRM_XE_UFENCE_WAIT_OP_LTE
- and the flags can be:
DRM_XE_UFENCE_WAIT_FLAG_ABSTIME
DRM_XE_UFENCE_WAIT_FLAG_SOFT_OP
- The mask values can be, for example:
0xffu for u8
0xffffu for u16
0xffffffffu for u32
0xffffffffffffffffu for u64
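The completion condition above can be modeled in plain C. This is an illustrative userspace sketch of the comparison the kernel performs, not part of the uAPI; the ufence_complete helper and the inlined op values (copied from the defines above) are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the kernel's user-fence check:
 * (*addr & MASK) OP (VALUE & MASK)
 * Op values mirror the DRM_XE_UFENCE_WAIT_OP_* defines above. */
static int ufence_complete(uint64_t fence, uint16_t op,
                           uint64_t value, uint64_t mask)
{
    uint64_t lhs = fence & mask;
    uint64_t rhs = value & mask;

    switch (op) {
    case 0x0: return lhs == rhs;  /* DRM_XE_UFENCE_WAIT_OP_EQ */
    case 0x1: return lhs != rhs;  /* DRM_XE_UFENCE_WAIT_OP_NEQ */
    case 0x2: return lhs > rhs;   /* DRM_XE_UFENCE_WAIT_OP_GT */
    case 0x3: return lhs >= rhs;  /* DRM_XE_UFENCE_WAIT_OP_GTE */
    case 0x4: return lhs < rhs;   /* DRM_XE_UFENCE_WAIT_OP_LT */
    case 0x5: return lhs <= rhs;  /* DRM_XE_UFENCE_WAIT_OP_LTE */
    }
    return 0;
}
```

Note how the mask lets a wait observe only part of the fence value, e.g. 0xff to compare just the low byte (u8) of a qword fence.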
- enumdrm_xe_observation_type¶
Observation stream types
Constants
DRM_XE_OBSERVATION_TYPE_OAOA observation stream type
DRM_XE_OBSERVATION_TYPE_EU_STALLEU stall sampling observation stream type
- enumdrm_xe_observation_op¶
Observation stream ops
Constants
DRM_XE_OBSERVATION_OP_STREAM_OPENOpen an observation stream
DRM_XE_OBSERVATION_OP_ADD_CONFIGAdd observation stream config
DRM_XE_OBSERVATION_OP_REMOVE_CONFIGRemove observation stream config
- structdrm_xe_observation_param¶
Input of
DRM_XE_OBSERVATION
Definition:
struct drm_xe_observation_param {
    __u64 extensions;
    __u64 observation_type;
    __u64 observation_op;
    __u64 param;
};
Members
extensionsPointer to the first extension struct, if any
observation_typeobservation stream type, of enum drm_xe_observation_type
observation_opobservation stream op, of enum drm_xe_observation_op
paramPointer to actual stream params
Description
The observation layer enables multiplexing observation streams of multiple types. The actual params for a particular stream operation are supplied via the param pointer (use __copy_from_user to get these params).
- enumdrm_xe_observation_ioctls¶
Observation stream fd ioctl’s
Constants
DRM_XE_OBSERVATION_IOCTL_ENABLEEnable data capture for an observation stream
DRM_XE_OBSERVATION_IOCTL_DISABLEDisable data capture for an observation stream
DRM_XE_OBSERVATION_IOCTL_CONFIGChange observation stream configuration
DRM_XE_OBSERVATION_IOCTL_STATUSReturn observation stream status
DRM_XE_OBSERVATION_IOCTL_INFOReturn observation stream info
Description
Information exchanged between userspace and the kernel for observation fd ioctls is stream-type specific
- enumdrm_xe_oa_unit_type¶
OA unit types
Constants
DRM_XE_OA_UNIT_TYPE_OAGOAG OA unit. OAR/OAC are considered sub-types of OAG. For OAR/OAC, use OAG.
DRM_XE_OA_UNIT_TYPE_OAMOAM OA unit
DRM_XE_OA_UNIT_TYPE_OAM_SAGOAM_SAG OA unit
DRM_XE_OA_UNIT_TYPE_MERTMERT OA unit
- structdrm_xe_oa_unit¶
describe OA unit
Definition:
struct drm_xe_oa_unit {
    __u64 extensions;
    __u32 oa_unit_id;
    __u32 oa_unit_type;
    __u64 capabilities;
#define DRM_XE_OA_CAPS_BASE             (1 << 0)
#define DRM_XE_OA_CAPS_SYNCS            (1 << 1)
#define DRM_XE_OA_CAPS_OA_BUFFER_SIZE   (1 << 2)
#define DRM_XE_OA_CAPS_WAIT_NUM_REPORTS (1 << 3)
#define DRM_XE_OA_CAPS_OAM              (1 << 4)
#define DRM_XE_OA_CAPS_OA_UNIT_GT_ID    (1 << 5)
    __u64 oa_timestamp_freq;
    __u16 gt_id;
    __u16 reserved1[3];
    __u64 reserved[3];
    __u64 num_engines;
    struct drm_xe_engine_class_instance eci[];
};
Members
extensionsPointer to the first extension struct, if any
oa_unit_idOA unit ID
oa_unit_typeOA unit type ofdrm_xe_oa_unit_type
capabilitiesOA capabilities bit-mask
oa_timestamp_freqOA timestamp freq
gt_idgt id for this OA unit
reserved1MBZ
reservedMBZ
num_enginesnumber of engines ineci array
eciengines attached to this OA unit
- structdrm_xe_query_oa_units¶
describe OA units
Definition:
struct drm_xe_query_oa_units {
    __u64 extensions;
    __u32 num_oa_units;
    __u32 pad;
    __u64 oa_units[];
};
Members
extensionsPointer to the first extension struct, if any
num_oa_unitsnumber of OA units returned in oau[]
padMBZ
oa_unitsstruct drm_xe_oa_unit array returned for this device. Written below as a u64 array to avoid problems with nested flexible arrays with some compilers
Description
If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_OA_UNITS, then the reply uses struct drm_xe_query_oa_units in .data.
OA unit properties for all OA units can be accessed using a code block such as the one below:
struct drm_xe_query_oa_units *qoa;
struct drm_xe_oa_unit *oau;
u8 *poau;

// malloc qoa and issue DRM_XE_DEVICE_QUERY_OA_UNITS. Then:
poau = (u8 *)&qoa->oa_units[0];
for (int i = 0; i < qoa->num_oa_units; i++) {
    oau = (struct drm_xe_oa_unit *)poau;
    // Access 'struct drm_xe_oa_unit' fields here
    poau += sizeof(*oau) + oau->num_engines * sizeof(oau->eci[0]);
}
- enumdrm_xe_oa_format_type¶
OA format types as specified in PRM/Bspec 52198/60942
Constants
DRM_XE_OA_FMT_TYPE_OAGOAG report format
DRM_XE_OA_FMT_TYPE_OAROAR report format
DRM_XE_OA_FMT_TYPE_OAMOAM report format
DRM_XE_OA_FMT_TYPE_OACOAC report format
DRM_XE_OA_FMT_TYPE_OAM_MPECOAM SAMEDIA or OAM MPEC report format
DRM_XE_OA_FMT_TYPE_PECPEC report format
- enumdrm_xe_oa_property_id¶
OA stream property id’s
Constants
DRM_XE_OA_PROPERTY_OA_UNIT_IDID of the OA unit on which to open the OA stream; see oa_unit_id in 'struct drm_xe_query_oa_units'. Defaults to 0 if not provided.
DRM_XE_OA_PROPERTY_SAMPLE_OAA value of 1 requests inclusion of raw OA unit reports or stream samples in a global buffer attached to an OA unit.
DRM_XE_OA_PROPERTY_OA_METRIC_SETOA metrics defining contents of OA reports, previously added via DRM_XE_OBSERVATION_OP_ADD_CONFIG.
DRM_XE_OA_PROPERTY_OA_FORMATOA counter report format
DRM_XE_OA_PROPERTY_OA_PERIOD_EXPONENTRequests periodic OA unit sampling with sampling frequency proportional to 2^(period_exponent + 1)
DRM_XE_OA_PROPERTY_OA_DISABLEDA value of 1 will open the OA stream in a DISABLED state (see DRM_XE_OBSERVATION_IOCTL_ENABLE).
DRM_XE_OA_PROPERTY_EXEC_QUEUE_IDOpen the stream for a specific exec_queue_id. OA queries can be executed on this exec queue.
DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCEOptional engine instance to pass along with DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID, or will default to 0.
DRM_XE_OA_PROPERTY_NO_PREEMPTAllow preemption and timeslicing to be disabled for the stream exec queue.
DRM_XE_OA_PROPERTY_NUM_SYNCSNumber of syncs in the sync array specified in DRM_XE_OA_PROPERTY_SYNCS
DRM_XE_OA_PROPERTY_SYNCSPointer to struct drm_xe_sync array with array size specified via DRM_XE_OA_PROPERTY_NUM_SYNCS. OA configuration will wait till input fences signal. Output fences will signal after the new OA configuration takes effect. For DRM_XE_SYNC_TYPE_USER_FENCE, addr is a user pointer, similar to the VM bind case.
DRM_XE_OA_PROPERTY_OA_BUFFER_SIZESize of the OA buffer to be allocated by the driver in bytes. Supported sizes are powers of 2 from 128 KiB to 128 MiB. When not specified, a 16 MiB OA buffer is allocated by default.
DRM_XE_OA_PROPERTY_WAIT_NUM_REPORTSNumber of reports to wait for before unblocking poll or read
Description
Stream params are specified as a chain of drm_xe_ext_set_property structs, with property values from enum drm_xe_oa_property_id and drm_xe_user_extension base.name set to DRM_XE_OA_EXTENSION_SET_PROPERTY. The param field in struct drm_xe_observation_param points to the first drm_xe_ext_set_property struct.
Exactly the same mechanism is also used for stream reconfiguration using the DRM_XE_OBSERVATION_IOCTL_CONFIG observation stream fd ioctl, though only a subset of the properties below can be specified for stream reconfiguration.
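The property-chain mechanism can be sketched with mock struct layouts that mirror drm_xe_user_extension and drm_xe_ext_set_property. The field layout and the property/name values here are illustrative assumptions; use the real uapi/drm/xe_drm.h definitions in practice:

```c
#include <assert.h>
#include <stdint.h>

/* Mock layouts mirroring the xe uAPI extension chain; illustrative only. */
struct xe_user_extension {
    uint64_t next_extension; /* user pointer to next extension, or 0 */
    uint32_t name;           /* e.g. DRM_XE_OA_EXTENSION_SET_PROPERTY */
    uint32_t pad;
};

struct xe_ext_set_property {
    struct xe_user_extension base;
    uint32_t property;       /* from enum drm_xe_oa_property_id */
    uint32_t pad;
    uint64_t value;
};

/* Build a two-entry chain and walk it the way the kernel would:
 * follow base.next_extension until it is 0. */
static int demo_chain_len(void)
{
    struct xe_ext_set_property sample_oa = {
        .base = { .next_extension = 0, .name = 0 /* illustrative */ },
        .property = 2,  /* hypothetical DRM_XE_OA_PROPERTY_SAMPLE_OA */
        .value = 1,
    };
    struct xe_ext_set_property unit_id = {
        .base = {
            .next_extension = (uint64_t)(uintptr_t)&sample_oa,
            .name = 0, /* illustrative */
        },
        .property = 1,  /* hypothetical DRM_XE_OA_PROPERTY_OA_UNIT_ID */
        .value = 0,
    };
    int n = 0;

    for (const struct xe_ext_set_property *p = &unit_id; p;
         p = (const struct xe_ext_set_property *)(uintptr_t)p->base.next_extension)
        n++;
    return n;
}
```

The param field of struct drm_xe_observation_param would then point at the head of such a chain.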
- structdrm_xe_oa_config¶
OA metric configuration
Definition:
struct drm_xe_oa_config {
    __u64 extensions;
    char uuid[36];
    __u32 n_regs;
    __u64 regs_ptr;
};
Members
extensionsPointer to the first extension struct, if any
uuidString formatted like “%08x-%04x-%04x-%04x-%012x”
n_regsNumber of regs inregs_ptr
regs_ptrPointer to (register address, value) pairs for OA config registers. Expected length of the buffer is: (2 * sizeof(u32) * n_regs).
Description
Multiple OA configs can be added using DRM_XE_OBSERVATION_OP_ADD_CONFIG. A particular config can be specified when opening an OA stream using the DRM_XE_OA_PROPERTY_OA_METRIC_SET property.
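Building the regs_ptr payload is a matter of packing (address, value) u32 pairs into a flat buffer of the length given above. A minimal sketch; the pack_oa_regs helper and the register addresses in the usage below are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack n_regs (register address, value) pairs into a flat buffer of
 * 2 * sizeof(u32) * n_regs bytes, the layout expected via regs_ptr.
 * Returns the number of bytes written. */
static size_t pack_oa_regs(uint32_t *buf, const uint32_t (*pairs)[2],
                           uint32_t n_regs)
{
    for (uint32_t i = 0; i < n_regs; i++) {
        buf[2 * i]     = pairs[i][0]; /* register address */
        buf[2 * i + 1] = pairs[i][1]; /* value */
    }
    return 2 * sizeof(uint32_t) * n_regs;
}
```

The returned length matches what the driver derives from n_regs, so userspace can size the buffer before setting regs_ptr.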
- structdrm_xe_oa_stream_status¶
OA stream status returned from the DRM_XE_OBSERVATION_IOCTL_STATUS observation stream fd ioctl. Userspace can call the ioctl to query stream status in response to an EIO errno from an observation fd read().
Definition:
struct drm_xe_oa_stream_status {
    __u64 extensions;
    __u64 oa_status;
#define DRM_XE_OASTATUS_MMIO_TRG_Q_FULL  (1 << 3)
#define DRM_XE_OASTATUS_COUNTER_OVERFLOW (1 << 2)
#define DRM_XE_OASTATUS_BUFFER_OVERFLOW  (1 << 1)
#define DRM_XE_OASTATUS_REPORT_LOST      (1 << 0)
    __u64 reserved[3];
};
Members
extensionsPointer to the first extension struct, if any
oa_statusOA stream status (see Bspec 46717/61226)
reservedreserved for future use
- structdrm_xe_oa_stream_info¶
OA stream info returned from the DRM_XE_OBSERVATION_IOCTL_INFO observation stream fd ioctl
Definition:
struct drm_xe_oa_stream_info {
    __u64 extensions;
    __u64 oa_buf_size;
    __u64 reserved[3];
};
Members
extensionsPointer to the first extension struct, if any
oa_buf_sizeOA buffer size
reservedreserved for future use
- enumdrm_xe_pxp_session_type¶
Supported PXP session types.
Constants
DRM_XE_PXP_TYPE_NONEPXP not used
DRM_XE_PXP_TYPE_HWDRMHWDRM sessions are used for content that ends up on the display.
Description
We currently only support HWDRM sessions, which are used for protected content that ends up being displayed, but the HW supports multiple types, so we might extend support in the future.
- enumdrm_xe_eu_stall_property_id¶
EU stall sampling input property ids.
Constants
DRM_XE_EU_STALL_PROP_GT_IDgt_id of the GT on which EU stall data will be captured.
DRM_XE_EU_STALL_PROP_SAMPLE_RATESampling rate in GPU cycles, from sampling_rates in struct drm_xe_query_eu_stall
DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTSMinimum number of EU stall data reports to be present in the kernel buffer before unblocking a blocked poll or read.
Description
These properties are passed to the driver at open as a chain of drm_xe_ext_set_property structures, with property set to these properties' enums and value set to the corresponding values of these properties. drm_xe_user_extension base.name should be set to DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY.
With the file descriptor obtained from open, user space must enable the EU stall stream fd with DRM_XE_OBSERVATION_IOCTL_ENABLE before calling read(). An EIO errno from read() indicates the HW dropped data due to a full buffer.
- structdrm_xe_query_eu_stall¶
Information about EU stall sampling.
Definition:
struct drm_xe_query_eu_stall {
    __u64 extensions;
    __u64 capabilities;
#define DRM_XE_EU_STALL_CAPS_BASE (1 << 0)
    __u64 record_size;
    __u64 per_xecore_buf_size;
    __u64 reserved[5];
    __u64 num_sampling_rates;
    __u64 sampling_rates[];
};
Members
extensionsPointer to the first extension struct, if any
capabilitiesEU stall capabilities bit-mask
record_sizesize of each EU stall data record
per_xecore_buf_sizeinternal per XeCore buffer size
reservedReserved
num_sampling_ratesNumber of sampling rates in the sampling_rates array
sampling_ratesFlexible array of sampling rates, sorted from fastest to slowest. Sampling rates are specified in GPU clock cycles.
Description
If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_EU_STALL, then the reply uses struct drm_xe_query_eu_stall in .data.
- structdrm_xe_madvise¶
Input of
DRM_IOCTL_XE_MADVISE
Definition:
struct drm_xe_madvise {
    __u64 extensions;
    __u64 start;
    __u64 range;
    __u32 vm_id;
#define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC 0
#define DRM_XE_MEM_RANGE_ATTR_ATOMIC        1
#define DRM_XE_MEM_RANGE_ATTR_PAT           2
    __u32 type;
    union {
        struct {
#define DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE 0
#define DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM -1
            __u32 devmem_fd;
#define DRM_XE_MIGRATE_ALL_PAGES         0
#define DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES 1
            __u16 migration_policy;
            __u16 region_instance;
            __u64 reserved;
        } preferred_mem_loc;
        struct {
#define DRM_XE_ATOMIC_UNDEFINED 0
#define DRM_XE_ATOMIC_DEVICE    1
#define DRM_XE_ATOMIC_GLOBAL    2
#define DRM_XE_ATOMIC_CPU       3
            __u32 val;
            __u32 pad;
            __u64 reserved;
        } atomic;
        struct {
            __u32 val;
            __u32 pad;
            __u64 reserved;
        } pat_index;
    };
    __u64 reserved[2];
};
Members
extensionsPointer to the first extension struct, if any
startstart of the virtual address range
rangesize of the virtual address range
vm_idvm_id of the virtual range
typetype of attribute
{unnamed_union}anonymous
preferred_mem_locpreferred memory location
Used when type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC
- Supported values for preferred_mem_loc.devmem_fd:
DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE: set vram of the fault tile as the preferred loc
DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM: set smem as the preferred loc
- Supported values for preferred_mem_loc.migration_policy:
DRM_XE_MIGRATE_ALL_PAGES
DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES
preferred_mem_loc.devmem_fdDevice file descriptor of the device where the preferred memory is located, or one of the above special values. Please also see preferred_mem_loc.region_instance below.
preferred_mem_loc.region_instanceRegion instance. MBZ if devmem_fd <= DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE. Otherwise should point to the desired device VRAM instance of the device indicated by preferred_mem_loc.devmem_fd.
atomicAtomic access policy
Used when type == DRM_XE_MEM_RANGE_ATTR_ATOMIC.
- Supported values for atomic.val:
DRM_XE_ATOMIC_UNDEFINED: Undefined or default behaviour. Supports both GPU and CPU atomic operations for the system allocator; supports GPU atomic operations for the normal (bo) allocator.
DRM_XE_ATOMIC_DEVICE: Support GPU atomic operations.
DRM_XE_ATOMIC_GLOBAL: Support both GPU and CPU atomic operations.
DRM_XE_ATOMIC_CPU: Support CPU atomic only, no GPU atomics supported.
pat_indexPage attribute table index
Used when type == DRM_XE_MEM_RANGE_ATTR_PAT.
reservedReserved
Description
This structure is used to set memory attributes for a virtual address range in a VM. The type of attribute is specified by type, and the corresponding union member is used to provide additional parameters for type.
- Supported attribute types:
DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC: Set preferred memory location.
DRM_XE_MEM_RANGE_ATTR_ATOMIC: Set atomic access policy.
DRM_XE_MEM_RANGE_ATTR_PAT: Set page attribute table index.
Example
struct drm_xe_madvise madvise = {
    .vm_id = vm_id,
    .start = 0x100000,
    .range = 0x2000,
    .type = DRM_XE_MEM_RANGE_ATTR_ATOMIC,
    .atomic.val = DRM_XE_ATOMIC_DEVICE,
};
ioctl(fd, DRM_IOCTL_XE_MADVISE, &madvise);
- structdrm_xe_mem_range_attr¶
Output of
DRM_IOCTL_XE_VM_QUERY_MEM_RANGES_ATTRS
Definition:
struct drm_xe_mem_range_attr {
    __u64 extensions;
    __u64 start;
    __u64 end;
    struct {
        __u32 devmem_fd;
        __u32 migration_policy;
    } preferred_mem_loc;
    struct {
        __u32 val;
        __u32 reserved;
    } atomic;
    struct {
        __u32 val;
        __u32 reserved;
    } pat_index;
    __u64 reserved[2];
};
Members
extensionsPointer to the first extension struct, if any
startstart of the memory range
endend of the memory range
preferred_mem_locpreferred memory location
atomicAtomic access policy
pat_indexPage attribute table index
reservedReserved
Description
This structure is provided by userspace and filled by the KMD in response to the DRM_IOCTL_XE_VM_QUERY_MEM_RANGES_ATTRS ioctl. It describes the memory attributes of memory ranges within a user-specified address range in a VM.
The structure includes information such as atomic access policy, page attribute table (PAT) index, and preferred memory location. Userspace allocates an array of these structures and passes a pointer to the ioctl to retrieve attributes for each memory range.
- structdrm_xe_vm_query_mem_range_attr¶
Input of
DRM_IOCTL_XE_VM_QUERY_MEM_ATTRIBUTES
Definition:
struct drm_xe_vm_query_mem_range_attr { __u64 extensions; __u32 vm_id; __u32 num_mem_ranges; __u64 start; __u64 range; __u64 sizeof_mem_range_attr; __u64 vector_of_mem_attr; __u64 reserved[2];};Members
extensionsPointer to the first extension struct, if any
vm_idvm_id of the virtual range
num_mem_rangesnumber of mem_ranges in range
startstart of the virtual address range
rangesize of the virtual address range
sizeof_mem_range_attrsize of struct drm_xe_mem_range_attr
vector_of_mem_attruserptr to an array of struct drm_xe_mem_range_attr
reservedReserved
Description
This structure is used to query memory attributes of memory regions within a user-specified address range in a VM. It provides detailed information about each memory range, including atomic access policy, page attribute table (PAT) index, and preferred memory location.
Userspace first calls the ioctl with num_mem_ranges = 0, sizeof_mem_ranges_attr = 0 and vector_of_vma_mem_attr = NULL to retrieve the number of memory regions and the size of each memory range attribute. Then, it allocates a buffer of that size and calls the ioctl again to fill the buffer with memory range attributes.
If the second call fails with -ENOSPC, the memory ranges changed between the first call and now; retry the IOCTL again with num_mem_ranges = 0, sizeof_mem_ranges_attr = 0 and vector_of_vma_mem_attr = NULL, followed by the second ioctl call.
Example
struct drm_xe_vm_query_mem_range_attr query = {
    .vm_id = vm_id,
    .start = 0x100000,
    .range = 0x2000,
};

// First ioctl call to get num of mem regions and sizeof each attribute
ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, &query);

// Allocate buffer for the memory region attributes
void *ptr = malloc(query.num_mem_ranges * query.sizeof_mem_range_attr);
void *ptr_start = ptr;

query.vector_of_mem_attr = (uintptr_t)ptr;

// Second ioctl call to actually fill the memory attributes
ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, &query);

// Iterate over the returned memory region attributes
for (unsigned int i = 0; i < query.num_mem_ranges; ++i) {
    struct drm_xe_mem_range_attr *attr = (struct drm_xe_mem_range_attr *)ptr;

    // Do something with attr

    // Move pointer by one entry
    ptr += query.sizeof_mem_range_attr;
}
free(ptr_start);
- structdrm_xe_exec_queue_set_property¶
exec queue set property
Definition:
struct drm_xe_exec_queue_set_property {
    __u64 extensions;
    __u32 exec_queue_id;
    __u32 property;
    __u64 value;
    __u64 reserved[2];
};
Members
extensionsPointer to the first extension struct, if any
exec_queue_idExec queue ID
propertyproperty to set
valueproperty value
reservedReserved
Description
Sets execution queue properties dynamically. Currently only the DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY property can be dynamically set.
drm/asahi uAPI¶
Introduction to the Asahi UAPI
This documentation describes the Asahi IOCTLs.
Just a few generic rules about the data passed to the Asahi IOCTLs (cribbed from Panthor):
Structures must be aligned on 64-bit/8-byte. If the object is not naturally aligned, a padding field must be added.
Fields must be explicitly aligned to their natural type alignment with pad[0..N] fields.
All padding fields will be checked by the driver to make sure they are zeroed.
Flags can be added, but not removed/replaced.
New fields can be added to the main structures (the structures directly passed to the ioctl). Those fields can be added at the end of the structure, or replace existing padding fields. Any new field being added must preserve the behavior that existed before those fields were added when a value of zero is passed.
New fields can be added to indirect objects (objects pointed by the main structure), iff those objects are passed a size to reflect the size known by the userspace driver (see drm_asahi_cmd_header::size).
If the kernel driver is too old to know some fields, those will be ignored if zero, and otherwise rejected (and so will be zero on output).
If userspace is too old to know some fields, those will be zeroed (input) before the structure is parsed by the kernel driver.
Each new flag/field addition must come with a driver version update so the userspace driver doesn't have to guess which flags are supported.
Structures should not contain unions, as this would defeat the extensibility of such structures.
IOCTLs can't be removed or replaced. New IOCTL IDs should be placed at the end of the drm_asahi_ioctl_id enum.
- enumdrm_asahi_ioctl_id¶
IOCTL IDs
Constants
DRM_ASAHI_GET_PARAMSQuery device properties.
DRM_ASAHI_GET_TIMEQuery device time.
DRM_ASAHI_VM_CREATECreate a GPU VM address space.
DRM_ASAHI_VM_DESTROYDestroy a VM.
DRM_ASAHI_VM_BINDBind/unbind memory to a VM.
DRM_ASAHI_GEM_CREATECreate a buffer object.
DRM_ASAHI_GEM_MMAP_OFFSETGet offset to pass to mmap() to map a given GEM handle.
DRM_ASAHI_GEM_BIND_OBJECTBind memory as a special object
DRM_ASAHI_QUEUE_CREATECreate a scheduling queue.
DRM_ASAHI_QUEUE_DESTROYDestroy a scheduling queue.
DRM_ASAHI_SUBMITSubmit commands to a queue.
Description
Place new ioctls at the end, don’t re-order, don’t replace or remove entries.
These IDs are not meant to be used directly. Use the DRM_IOCTL_ASAHI_xxxdefinitions instead.
- structdrm_asahi_params_global¶
Global parameters.
Definition:
struct drm_asahi_params_global {
    __u64 features;
    __u32 gpu_generation;
    __u32 gpu_variant;
    __u32 gpu_revision;
    __u32 chip_id;
    __u32 num_dies;
    __u32 num_clusters_total;
    __u32 num_cores_per_cluster;
    __u32 max_frequency_khz;
    __u64 core_masks[DRM_ASAHI_MAX_CLUSTERS];
    __u64 vm_start;
    __u64 vm_end;
    __u64 vm_kernel_min_size;
    __u32 max_commands_per_submission;
    __u32 max_attachments;
    __u64 command_timestamp_frequency_hz;
};
Members
featuresFeature bits from drm_asahi_feature
gpu_generationGPU generation, e.g. 13 for G13G
gpu_variantGPU variant as a character, e.g. ‘C’ for G13C
gpu_revisionGPU revision in BCD, e.g. 0x00 for ‘A0’ or0x21 for ‘C1’
chip_idChip ID in BCD, e.g. 0x8103 for T8103
num_diesNumber of dies in the SoC
num_clusters_totalNumber of GPU clusters (across all dies)
num_cores_per_clusterNumber of logical cores per cluster (including inactive/nonexistent)
max_frequency_khzMaximum GPU core clock frequency
core_masksBitmask of present/enabled cores per cluster
vm_startVM range start VMA. Together with vm_end, this defines the window of valid GPU VAs. Userspace is expected to subdivide VAs out of this window.
This window contains all virtual addresses that userspace needs to know about. There may be kernel-internal GPU VAs outside this range, but that detail is not relevant here.
vm_endVM range end VMA
vm_kernel_min_sizeMinimum kernel VMA window size.
When creating a VM, userspace is required to carve out a section of virtual addresses (within the range given by vm_start and vm_end). The kernel will allocate various internal structures within the specified VA range.
Allowing userspace to choose the VA range for the kernel, rather than the kernel reserving VAs and requiring userspace to cope, can assist in implementing SVM.
max_commands_per_submissionMaximum number of supported commands per submission. This mirrors firmware limits. Userspace must split up larger command buffers, which may require inserting additional synchronization.
max_attachmentsMaximum number of drm_asahi_attachment's per command
command_timestamp_frequency_hzTimebase frequency for timestamps written during command execution, specified via drm_asahi_timestamp structures. As this rate is controlled by the firmware, it is a queryable parameter.
Userspace must divide by this frequency to convert timestamps to seconds, rather than hardcoding a particular firmware's rate.
Description
This struct may be queried by drm_asahi_get_params.
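The timestamp conversion described for command_timestamp_frequency_hz can be done with integer arithmetic to avoid overflow. A minimal sketch; the helper name and the 24 MHz rate used in the test are assumptions, not a documented firmware rate:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a raw command timestamp to nanoseconds using the queried
 * timebase frequency, without hardcoding a particular firmware's rate.
 * Split into whole-second and remainder parts so the multiply by 1e9
 * does not overflow for large tick counts. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    uint64_t secs = ticks / freq_hz;
    uint64_t rem  = ticks % freq_hz;

    return secs * 1000000000ull + rem * 1000000000ull / freq_hz;
}
```

In real use, freq_hz comes from drm_asahi_params_global::command_timestamp_frequency_hz as returned by DRM_IOCTL_ASAHI_GET_PARAMS.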
- enumdrm_asahi_feature¶
Feature bits
Constants
DRM_ASAHI_FEATURE_SOFT_FAULTSGPU has "soft fault" enabled. Shader loads of unmapped memory will return zero. Shader stores to unmapped memory will be silently discarded. Note that only shader load/store is affected. Other hardware units are not affected, notably including texture sampling.
Soft fault is set when initializing the GPU and cannot be runtime toggled. Therefore, it is exposed as a feature bit and not a userspace-settable flag on the VM. When soft fault is enabled, userspace can speculate memory accesses more aggressively.
Description
This covers only features that userspace cannot infer from the architecture version. Most features don't need to be here.
- structdrm_asahi_get_params¶
Arguments passed to DRM_IOCTL_ASAHI_GET_PARAMS
Definition:
struct drm_asahi_get_params {
    __u32 param_group;
    __u32 pad;
    __u64 pointer;
    __u64 size;
};
Members
param_groupParameter group to fetch (MBZ)
padMBZ
pointerUser pointer to write parameter struct
sizeSize of the user buffer. In the case of older userspace, this may be less than sizeof(struct drm_asahi_params_global). The kernel will not write past the length specified here, allowing extensibility.
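The size-bounded write can be modeled in userspace terms. This is a hedged sketch of the compatibility mechanism, not kernel code; the struct names old_params/new_params and the write_params helper are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layouts: an "old" userspace struct that is a prefix
 * of a "new" kernel struct that grew an extra field at the end. */
struct old_params {
    uint64_t features;
    uint32_t gpu_generation;
    uint32_t gpu_variant;
};

struct new_params {
    uint64_t features;
    uint32_t gpu_generation;
    uint32_t gpu_variant;
    uint64_t extra; /* field unknown to old userspace */
};

/* Model of the kernel side: fill at most user_size bytes, so older
 * userspace with a smaller struct still gets the prefix it knows. */
static size_t write_params(void *user_ptr, size_t user_size,
                           const struct new_params *kernel)
{
    size_t n = user_size < sizeof(*kernel) ? user_size : sizeof(*kernel);

    memcpy(user_ptr, kernel, n);
    return n;
}
```

New fields are appended at the end of the struct, so truncating the copy at the caller's size preserves the old layout byte-for-byte.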
- structdrm_asahi_vm_create¶
Arguments passed to DRM_IOCTL_ASAHI_VM_CREATE
Definition:
struct drm_asahi_vm_create {
    __u64 kernel_start;
    __u64 kernel_end;
    __u32 vm_id;
    __u32 pad;
};
Members
kernel_startStart of the kernel-reserved address range. See drm_asahi_params_global::vm_kernel_min_size.
Both kernel_start and kernel_end must be within the range of valid VAs given by drm_asahi_params_global::vm_start and drm_asahi_params_global::vm_end. The size of the kernel range (kernel_end - kernel_start) must be at least drm_asahi_params_global::vm_kernel_min_size.
Userspace must not bind any memory on this VM into this reserved range; it is for kernel use only.
kernel_endEnd of the kernel-reserved address range. Seekernel_start.
vm_idReturned VM ID
padMBZ
- structdrm_asahi_vm_destroy¶
Arguments passed to DRM_IOCTL_ASAHI_VM_DESTROY
Definition:
struct drm_asahi_vm_destroy {
    __u32 vm_id;
    __u32 pad;
};
Members
vm_idVM ID to be destroyed
padMBZ
- enumdrm_asahi_gem_flags¶
Flags for GEM creation
Constants
DRM_ASAHI_GEM_WRITEBACKBO should be CPU-mapped as writeback.
Map as writeback instead of write-combine. This optimizes for CPU reads.
DRM_ASAHI_GEM_VM_PRIVATEBO is private to this GPU VM (no exports).
- structdrm_asahi_gem_create¶
Arguments passed to DRM_IOCTL_ASAHI_GEM_CREATE
Definition:
struct drm_asahi_gem_create {
    __u64 size;
    __u32 flags;
    __u32 vm_id;
    __u32 handle;
    __u32 pad;
};
Members
sizeSize of the BO
flagsCombination of drm_asahi_gem_flags flags.
vm_idVM ID to assign to the BO, if DRM_ASAHI_GEM_VM_PRIVATE is set
handleReturned GEM handle for the BO
padMBZ
- structdrm_asahi_gem_mmap_offset¶
Arguments passed to DRM_IOCTL_ASAHI_GEM_MMAP_OFFSET
Definition:
struct drm_asahi_gem_mmap_offset {
    __u32 handle;
    __u32 flags;
    __u64 offset;
};
Members
handleHandle for the object being mapped.
flagsMust be zero
offsetThe fake offset to use for the subsequent mmap() call
- enumdrm_asahi_bind_flags¶
Flags for GEM binding
Constants
DRM_ASAHI_BIND_UNBINDInstead of binding a GEM object to the range, simply unbind the GPU VMA range.
DRM_ASAHI_BIND_READMap BO with GPU read permission
DRM_ASAHI_BIND_WRITEMap BO with GPU write permission
DRM_ASAHI_BIND_SINGLE_PAGEMap a single page of the BO repeatedly across the VA range.
This is useful to fill a VA range with scratch pages or zero pages. It is intended as a mechanism to accelerate sparse.
- structdrm_asahi_gem_bind_op¶
Description of a single GEM bind operation.
Definition:
struct drm_asahi_gem_bind_op { __u32 flags; __u32 handle; __u64 offset; __u64 range; __u64 addr;};Members
flagsCombination of drm_asahi_bind_flags flags.
handleGEM object to bind (except for UNBIND)
offsetOffset into the object (except for UNBIND).
For a regular bind, this is the beginning of the region of the GEM object to bind.
For a single-page bind, this is the offset to the single page that will be repeatedly bound.
Must be page-size aligned.
rangeNumber of bytes to bind/unbind to addr.
Must be page-size aligned.
addrAddress to bind to.
Must be page-size aligned.
- structdrm_asahi_vm_bind¶
Arguments passed to DRM_IOCTL_ASAHI_VM_BIND
Definition:
struct drm_asahi_vm_bind { __u32 vm_id; __u32 num_binds; __u32 stride; __u32 pad; __u64 userptr;};Members
vm_idThe ID of the VM to bind to
num_bindsNumber of binds in this IOCTL.
strideStride in bytes between consecutive binds. This allows extensibility of drm_asahi_gem_bind_op.
padMBZ
userptrUser pointer to an array of num_binds structures of type drm_asahi_gem_bind_op and size stride bytes.
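The stride mechanism lets old userspace keep working against a kernel whose bind-op struct has grown. A sketch of how the batch arguments are typically assembled, with layouts mirrored from the definitions above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layouts mirrored from the definitions above; the authoritative
 * versions live in the asahi_drm.h uAPI header. */
struct drm_asahi_gem_bind_op {
    uint32_t flags;
    uint32_t handle;
    uint64_t offset;
    uint64_t range;
    uint64_t addr;
};

struct drm_asahi_vm_bind {
    uint32_t vm_id;
    uint32_t num_binds;
    uint32_t stride;
    uint32_t pad;
    uint64_t userptr;
};

/* Point the ioctl arguments at an array of bind ops. Userspace built
 * against the current header passes stride == sizeof(one op); a future
 * kernel with a grown drm_asahi_gem_bind_op would still parse arrays
 * with this smaller stride. */
static void vm_bind_prepare(struct drm_asahi_vm_bind *args, uint32_t vm_id,
                            const struct drm_asahi_gem_bind_op *ops,
                            uint32_t count)
{
    args->vm_id = vm_id;
    args->num_binds = count;
    args->stride = (uint32_t)sizeof(*ops);
    args->pad = 0; /* MBZ */
    args->userptr = (uint64_t)(uintptr_t)ops;
}
```

The helper name is illustrative; only the struct layouts and the stride convention come from the documentation above.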
- enumdrm_asahi_bind_object_op¶
Special object bind operation
Constants
DRM_ASAHI_BIND_OBJECT_OP_BINDBind a BO as a special GPU object
DRM_ASAHI_BIND_OBJECT_OP_UNBINDUnbind a special GPU object
- enumdrm_asahi_bind_object_flags¶
Special object bind flags
Constants
DRM_ASAHI_BIND_OBJECT_USAGE_TIMESTAMPSMap a BO as a timestamp buffer.
- structdrm_asahi_gem_bind_object¶
Arguments passed to DRM_IOCTL_ASAHI_GEM_BIND_OBJECT
Definition:
struct drm_asahi_gem_bind_object { __u32 op; __u32 flags; __u32 handle; __u32 vm_id; __u64 offset; __u64 range; __u32 object_handle; __u32 pad;};Members
opBind operation (enum drm_asahi_bind_object_op)
flagsCombination of drm_asahi_bind_object_flags flags.
handleGEM object to bind/unbind (BIND)
vm_idThe ID of the VM to operate on (MBZ currently)
offsetOffset into the object (BIND only)
rangeNumber of bytes to bind/unbind (BIND only)
object_handleObject handle (out for BIND, in for UNBIND)
padMBZ
- enumdrm_asahi_cmd_type¶
Command type
Constants
DRM_ASAHI_CMD_RENDERRender command, executing on the render subqueue. Combined vertex and fragment operation.
Followed by a drm_asahi_cmd_render payload.
DRM_ASAHI_CMD_COMPUTECompute command on the compute subqueue.
Followed by a drm_asahi_cmd_compute payload.
DRM_ASAHI_SET_VERTEX_ATTACHMENTSSoftware command to set attachments for subsequent vertex shaders in the same submit.
Followed by (possibly multiple) drm_asahi_attachment payloads.
DRM_ASAHI_SET_FRAGMENT_ATTACHMENTSSoftware command to set attachments for subsequent fragment shaders in the same submit.
Followed by (possibly multiple) drm_asahi_attachment payloads.
DRM_ASAHI_SET_COMPUTE_ATTACHMENTSSoftware command to set attachments for subsequent compute shaders in the same submit.
Followed by (possibly multiple) drm_asahi_attachment payloads.
- enumdrm_asahi_priority¶
Scheduling queue priority.
Constants
DRM_ASAHI_PRIORITY_LOWLow priority queue.
DRM_ASAHI_PRIORITY_MEDIUMMedium priority queue.
DRM_ASAHI_PRIORITY_HIGHHigh priority queue.
Reserved for future extension.
DRM_ASAHI_PRIORITY_REALTIMEReal-time priority queue.
Reserved for future extension.
Description
These priorities are forwarded to the firmware to influence firmware scheduling. The exact policy is ultimately decided by firmware, but these enums allow userspace to communicate its intentions.
- structdrm_asahi_queue_create¶
Arguments passed to DRM_IOCTL_ASAHI_QUEUE_CREATE
Definition:
struct drm_asahi_queue_create { __u32 flags; __u32 vm_id; __u32 priority; __u32 queue_id; __u64 usc_exec_base;};Members
flagsMBZ
vm_idThe ID of the VM this queue is bound to
priorityOne of drm_asahi_priority
queue_idThe returned queue ID
usc_exec_baseGPU base address for all USC binaries (shaders) on this queue. USC addresses are 32-bit relative to this 64-bit base.
This sets the following registers on all queue commands:
USC_EXEC_BASE_TA (vertex)
USC_EXEC_BASE_ISP (fragment)
USC_EXEC_BASE_CP (compute)
While the hardware lets us configure these independently per command, we do not have a use case for this. Instead, we expect userspace to fix a 4GiB VA carveout for USC memory and pass its base address here.
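Given a 4 GiB USC carveout, resolving a shader's 32-bit USC address from its 64-bit GPU VA is a simple subtraction. A sketch; the helper name is illustrative, and the only contract taken from the text above is that USC addresses are offsets from usc_exec_base:

```c
#include <assert.h>
#include <stdint.h>

/* Resolve a shader's 32-bit USC address from its 64-bit GPU VA and the
 * queue's usc_exec_base. The shader must have been placed inside the
 * 4 GiB carveout above the base. */
static uint32_t usc_addr(uint64_t usc_exec_base, uint64_t shader_va)
{
    uint64_t off = shader_va - usc_exec_base;
    assert(off < (1ull << 32)); /* must lie in the 4 GiB USC window */
    return (uint32_t)off;
}
```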
- structdrm_asahi_queue_destroy¶
Arguments passed to DRM_IOCTL_ASAHI_QUEUE_DESTROY
Definition:
struct drm_asahi_queue_destroy { __u32 queue_id; __u32 pad;};Members
queue_idThe queue ID to be destroyed
padMBZ
- enumdrm_asahi_sync_type¶
Sync item type
Constants
DRM_ASAHI_SYNC_SYNCOBJBinary sync object
DRM_ASAHI_SYNC_TIMELINE_SYNCOBJTimeline sync object
- structdrm_asahi_sync¶
Sync item
Definition:
struct drm_asahi_sync { __u32 sync_type; __u32 handle; __u64 timeline_value;};Members
sync_typeOne of drm_asahi_sync_type
handleThe sync object handle
timeline_valueTimeline value for timeline sync objects
- DRM_ASAHI_BARRIER_NONE¶
DRM_ASAHI_BARRIER_NONE
Command index for no barrier
Description
This special value may be passed in to drm_asahi_command::vdm_barrier or drm_asahi_command::cdm_barrier to indicate that the respective subqueue should not wait on any previous work.
- structdrm_asahi_cmd_header¶
Top level command structure
Definition:
struct drm_asahi_cmd_header { __u16 cmd_type; __u16 size; __u16 vdm_barrier; __u16 cdm_barrier;};Members
cmd_typeOne of drm_asahi_cmd_type
sizeSize of this command, not including this header.
For hardware commands, this enables extensibility of commands without requiring extra command types. Passing a command that is shorter than expected is explicitly allowed for backwards-compatibility. Truncated fields will be zeroed.
For the synthetic attachment setting commands, this implicitly encodes the number of attachments. These commands take multiple fixed-size drm_asahi_attachment structures as their payload, so size equals number of attachments * sizeof(struct drm_asahi_attachment).
vdm_barrierVDM (render) command index to wait on.
Barriers are indices relative to the beginning of a given submit. A barrier of 0 waits on commands submitted to the respective subqueue in previous submit ioctls. A barrier of N waits on N previous commands on the subqueue within the current submit ioctl. As a special case, passing DRM_ASAHI_BARRIER_NONE avoids waiting on any commands in the subqueue.
Examples:
0: This waits on all previous work.
NONE: This does not wait for anything on this subqueue.
1: This waits on the first render command in the submit. This is valid only if there are multiple render commands in the same submit.
Barriers are valid only for hardware commands. Synthetic software commands to set attachments must pass NONE here.
cdm_barrierCDM (compute) command index to wait on.
See vdm_barrier, and replace VDM/render with CDM/compute.
Description
This struct is core to the command buffer definition and therefore is not extensible.
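Putting the header rules together, a flat command buffer is built by appending header-plus-payload pairs. A sketch with the header layout mirrored from the definition above; the numeric value chosen for DRM_ASAHI_BARRIER_NONE here is an assumption for illustration, not taken from the uAPI header:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout mirrored from the definition above. */
struct drm_asahi_cmd_header {
    uint16_t cmd_type;
    uint16_t size;        /* payload size; the header is not included */
    uint16_t vdm_barrier;
    uint16_t cdm_barrier;
};

/* Assumed value for illustration only. */
#define EXAMPLE_BARRIER_NONE 0xffff

/* Append one command (header followed by its payload) to a flat
 * command buffer, returning the new write offset. */
static size_t cmdbuf_push(uint8_t *buf, size_t off, uint16_t cmd_type,
                          uint16_t vdm_barrier, uint16_t cdm_barrier,
                          const void *payload, uint16_t payload_size)
{
    struct drm_asahi_cmd_header hdr = {
        .cmd_type = cmd_type,
        .size = payload_size,
        .vdm_barrier = vdm_barrier,
        .cdm_barrier = cdm_barrier,
    };
    memcpy(buf + off, &hdr, sizeof(hdr));
    memcpy(buf + off + sizeof(hdr), payload, payload_size);
    return off + sizeof(hdr) + payload_size;
}
```

Each push advances the offset by 8 bytes of header plus the payload size, matching the "fixed-size header followed by a variable-length payload" layout described for drm_asahi_submit.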
- structdrm_asahi_submit¶
Arguments passed to DRM_IOCTL_ASAHI_SUBMIT
Definition:
struct drm_asahi_submit { __u64 syncs; __u64 cmdbuf; __u32 flags; __u32 queue_id; __u32 in_sync_count; __u32 out_sync_count; __u32 cmdbuf_size; __u32 pad;};Members
syncsAn optional pointer to an array of drm_asahi_sync. The first in_sync_count elements are in-syncs, then the remaining out_sync_count elements are out-syncs. Using a single array with explicit partitioning simplifies handling.
cmdbufPointer to the command buffer to submit.
This is a flat command buffer. By design, it contains no CPU pointers, which makes it suitable for a virtgpu wire protocol without requiring any serializing/deserializing step.
It consists of a series of commands. Each command begins with a fixed-size drm_asahi_cmd_header header and is followed by a variable-length payload according to the type and size in the header.
The combined count of “real” hardware commands must be nonzero and at most drm_asahi_params_global::max_commands_per_submission.
flagsFlags for command submission (MBZ)
queue_idThe queue ID to be submitted to
in_sync_countNumber of sync objects to wait on before starting this job.
out_sync_countNumber of sync objects to signal upon completion of this job.
cmdbuf_sizeCommand buffer size in bytes
padMBZ
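The partitioned syncs array can be wired up as follows. A sketch with layouts mirrored from the definitions in this document; the helper name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Layouts mirrored from the definitions in this document. */
struct drm_asahi_sync {
    uint32_t sync_type;
    uint32_t handle;
    uint64_t timeline_value;
};

struct drm_asahi_submit {
    uint64_t syncs;
    uint64_t cmdbuf;
    uint32_t flags;
    uint32_t queue_id;
    uint32_t in_sync_count;
    uint32_t out_sync_count;
    uint32_t cmdbuf_size;
    uint32_t pad;
};

/* Fill the submit arguments. The syncs array is partitioned: the first
 * n_in entries are in-syncs, the remaining n_out are out-syncs. */
static void submit_prepare(struct drm_asahi_submit *s, uint32_t queue_id,
                           const void *cmdbuf, uint32_t cmdbuf_size,
                           const struct drm_asahi_sync *syncs,
                           uint32_t n_in, uint32_t n_out)
{
    s->syncs = (uint64_t)(uintptr_t)syncs;
    s->cmdbuf = (uint64_t)(uintptr_t)cmdbuf;
    s->flags = 0;    /* MBZ */
    s->queue_id = queue_id;
    s->in_sync_count = n_in;
    s->out_sync_count = n_out;
    s->cmdbuf_size = cmdbuf_size;
    s->pad = 0;      /* MBZ */
}
```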
- structdrm_asahi_attachment¶
Describe an “attachment”.
Definition:
struct drm_asahi_attachment { __u64 pointer; __u64 size; __u32 pad; __u32 flags;};Members
pointerBase address of the attachment
sizeSize of the attachment in bytes
padMBZ
flagsMBZ
Description
Attachments are any memory written by shaders, notably including render target attachments written by the end-of-tile program. This is purely a hint about the accessed memory regions. It is optional to specify, which is fortunate as it cannot be specified precisely with bindless access anyway. But where possible, it’s probably a good idea for userspace to include these hints, forwarded to the firmware.
This struct is implicitly sized and therefore is not extensible.
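Because the struct is implicitly sized, the header's size field for the synthetic SET_*_ATTACHMENTS commands directly encodes the attachment count, as described for drm_asahi_cmd_header. A sketch, with the layout mirrored from the definition above:

```c
#include <assert.h>
#include <stdint.h>

/* Layout mirrored from the definition above. */
struct drm_asahi_attachment {
    uint64_t pointer;
    uint64_t size;
    uint32_t pad;
    uint32_t flags;
};

/* For the synthetic SET_*_ATTACHMENTS commands, the header size field
 * is simply the attachment count times the fixed struct size. */
static uint16_t attachment_payload_size(uint16_t n_attachments)
{
    return (uint16_t)(n_attachments * sizeof(struct drm_asahi_attachment));
}
```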
- structdrm_asahi_zls_buffer¶
Describe a depth or stencil buffer.
Definition:
struct drm_asahi_zls_buffer { __u64 base; __u64 comp_base; __u32 stride; __u32 comp_stride;};Members
baseBase address of the buffer
comp_baseIf the load buffer is compressed, address of the compression metadata section.
strideIf layered rendering is enabled, the number of bytes between each layer of the buffer.
comp_strideIf layered rendering is enabled, the number of bytes between each layer of the compression metadata.
Description
These fields correspond to hardware registers in the ZLS (Z Load/Store) unit. There are three hardware registers for each field, respectively for loads, stores, and partial renders. In practice, it makes sense to set all to the same values, except in exceptional cases not yet implemented in userspace, so we do not duplicate here for simplicity/efficiency.
This struct is embedded in other structs and therefore is not extensible.
- structdrm_asahi_timestamp¶
Describe a timestamp write.
Definition:
struct drm_asahi_timestamp { __u32 handle; __u32 offset;};Members
handleHandle of the timestamp buffer, or 0 to skip this timestamp. If nonzero, this must equal the value returned in drm_asahi_gem_bind_object::object_handle.
offsetOffset to write into the timestamp buffer
Description
The firmware can optionally write the GPU timestamp at render pass granularity, but it needs to be mapped specially via DRM_IOCTL_ASAHI_GEM_BIND_OBJECT. This structure therefore describes where to write as a handle-offset pair, rather than a GPU address like normal.
This struct is embedded in other structs and therefore is not extensible.
- structdrm_asahi_timestamps¶
Describe timestamp writes.
Definition:
struct drm_asahi_timestamps { struct drm_asahi_timestamp start; struct drm_asahi_timestamp end;};Members
startTimestamp recorded at the start of the operation
endTimestamp recorded at the end of the operation
Description
Each operation that can be timestamped can be timestamped at the start and end. Therefore, drm_asahi_timestamp structs always come in pairs, bundled together into drm_asahi_timestamps.
This struct is embedded in other structs and therefore is not extensible.
- structdrm_asahi_helper_program¶
Describe helper program configuration.
Definition:
struct drm_asahi_helper_program { __u32 binary; __u32 cfg; __u64 data;};Members
binaryUSC address to the helper program binary. This is a tagged pointer with configuration in the bottom bits.
cfgAdditional configuration bits for the helper program.
dataData passed to the helper program. This value is not interpreted by the kernel, firmware, or hardware in any way. It is simply a sideband for userspace, set with the submit ioctl and read via special registers inside the helper program.
In practice, userspace will pass a 64-bit GPU VA here pointing to the actual arguments, which presumably don’t fit in 64 bits.
Description
The helper program is a compute-like kernel required for various hardware functionality. Its most important role is dynamically allocating scratch/stack memory for individual subgroups, by partitioning a static allocation shared for the whole device. It is supplied by userspace via drm_asahi_helper_program and internally dispatched by the hardware as needed.
This struct is embedded in other structs and therefore is not extensible.
- structdrm_asahi_bg_eot¶
Describe a background or end-of-tile program.
Definition:
struct drm_asahi_bg_eot { __u32 usc; __u32 rsrc_spec;};Members
uscUSC address of the hardware USC words binding resources (including images and uniforms) and the program itself. Note this is an additional layer of indirection compared to the helper program, avoiding the need for a sideband for data. This is a tagged pointer with additional configuration in the bottom bits.
rsrc_specResource specifier for the program. This is a packed hardware data structure describing the required number of registers, uniforms, bound textures, and bound samplers.
Description
The background and end-of-tile programs are dispatched by the hardware at the beginning and end of rendering. As the hardware “tilebuffer” is simply local memory, these programs are necessary to implement API-level render targets. The fragment-like background program is responsible for loading either the clear colour or the existing render target contents, while the compute-like end-of-tile program stores the tilebuffer contents to memory.
This struct is embedded in other structs and therefore is not extensible.
- structdrm_asahi_cmd_render¶
Command to submit 3D
Definition:
struct drm_asahi_cmd_render { __u32 flags; __u32 isp_zls_pixels; __u64 vdm_ctrl_stream_base; struct drm_asahi_helper_program vertex_helper; struct drm_asahi_helper_program fragment_helper; __u64 isp_scissor_base; __u64 isp_dbias_base; __u64 isp_oclqry_base; struct drm_asahi_zls_buffer depth; struct drm_asahi_zls_buffer stencil; __u64 zls_ctrl; __u64 ppp_multisamplectl; __u64 sampler_heap; __u32 ppp_ctrl; __u16 width_px; __u16 height_px; __u16 layers; __u16 sampler_count; __u8 utile_width_px; __u8 utile_height_px; __u8 samples; __u8 sample_size_B; __u32 isp_merge_upper_x; __u32 isp_merge_upper_y; struct drm_asahi_bg_eot bg; struct drm_asahi_bg_eot eot; struct drm_asahi_bg_eot partial_bg; struct drm_asahi_bg_eot partial_eot; __u32 isp_bgobjdepth; __u32 isp_bgobjvals; struct drm_asahi_timestamps ts_vtx; struct drm_asahi_timestamps ts_frag;};Members
flagsCombination of drm_asahi_render_flags flags.
isp_zls_pixelsISP_ZLS_PIXELS register value. This contains the depth/stencil width/height, which may differ from the framebuffer width/height.
vdm_ctrl_stream_baseVDM_CTRL_STREAM_BASE register value. GPU address to the beginning of the VDM control stream.
vertex_helperHelper program used for the vertex shader
fragment_helperHelper program used for the fragment shader
isp_scissor_baseISP_SCISSOR_BASE register value. GPU address of an array of scissor descriptors indexed in the render pass.
isp_dbias_baseISP_DBIAS_BASE register value. GPU address of an array of depth bias values indexed in the render pass.
isp_oclqry_baseISP_OCLQRY_BASE register value. GPU address of an array of occlusion query results written by the render pass.
depthDepth buffer
stencilStencil buffer
zls_ctrlZLS_CTRL register value
ppp_multisamplectlPPP_MULTISAMPLECTL register value
sampler_heapBase address of the sampler heap. This heap is used for both vertex shaders and fragment shaders. The registers are per-stage, but there is no known use case for separate heaps.
ppp_ctrlPPP_CTRL register value
width_pxFramebuffer width in pixels
height_pxFramebuffer height in pixels
layersNumber of layers in the framebuffer
sampler_countNumber of samplers in the sampler heap.
utile_width_pxWidth of a logical tilebuffer tile in pixels
utile_height_pxHeight of a logical tilebuffer tile in pixels
samples# of samples in the framebuffer. Must be 1, 2, or 4.
sample_size_B# of bytes in the tilebuffer required per sample.
isp_merge_upper_x32-bit float used in the hardware triangle merging. Calculate as: tan(60 deg) * width.
Making these values UAPI avoids requiring floating-point calculations in the kernel in the hot path.
isp_merge_upper_y32-bit float. Calculate as: tan(60 deg) * height. See isp_merge_upper_x.
bgBackground program run for each tile at the start
eotEnd-of-tile program run for each tile at the end
partial_bgBackground program run at the start of each tile when resuming the render pass during a partial render.
partial_eotEnd-of-tile program run at the end of each tile when pausing the render pass during a partial render.
isp_bgobjdepthISP_BGOBJDEPTH register value. This is the depth buffer clear value, encoded in the depth buffer’s format: either a 32-bit float or a 16-bit unorm (with upper bits zeroed).
isp_bgobjvalsISP_BGOBJVALS register value. The bottom 8 bits contain the stencil buffer clear value.
ts_vtxTimestamps for the vertex portion of the render
ts_fragTimestamps for the fragment portion of the render
Description
This command submits a single render pass. The hardware control stream may include many draws and subpasses, but within the command, the framebuffer dimensions and attachments are fixed.
The hardware requires the firmware to program a large number of Control Registers, setting up state at render pass granularity, before each command rendering 3D. The firmware bundles this state into data structures. Unfortunately, we cannot expose any of that directly to userspace, because the kernel-firmware ABI is not stable. Although we can guarantee the firmware updates in tandem with the kernel, we cannot break old userspace when upgrading the firmware and kernel. Therefore, we need to abstract the data structures well to avoid tying our hands with future firmware.
The bulk of drm_asahi_cmd_render therefore consists of values of hardware control registers, marshalled via the firmware interface.
The framebuffer/tilebuffer dimensions are also specified here. In addition to being passed to the firmware/hardware, the kernel requires these dimensions to calculate various essential tiling-related data structures. It is unfortunate that our submits are heavier than on vendors with saner hardware-software interfaces. The upshot is that all of this information is readily available to userspace with all current APIs.
It looks odd, but it’s not overly burdensome, and it ensures we can remain compatible with old userspace.
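As an example of userspace-side precomputation, the isp_merge_upper_x and isp_merge_upper_y floats described above follow directly from the documented formula. A sketch; the helper name is illustrative, and tan(60°) = √3 is hardcoded so the example avoids a libm dependency:

```c
#include <assert.h>
#include <stdint.h>

/* Compute an ISP merge-upper value per the documented formula:
 * tan(60 deg) * dimension. tan(60 deg) = sqrt(3) ~= 1.7320508. */
static float isp_merge_upper(uint16_t dimension_px)
{
    const float tan_60_deg = 1.7320508f;
    return tan_60_deg * (float)dimension_px;
}
```

Userspace would call this with width_px for isp_merge_upper_x and height_px for isp_merge_upper_y, keeping floating-point math out of the kernel's hot path.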
- structdrm_asahi_cmd_compute¶
Command to submit compute
Definition:
struct drm_asahi_cmd_compute { __u32 flags; __u32 sampler_count; __u64 cdm_ctrl_stream_base; __u64 cdm_ctrl_stream_end; __u64 sampler_heap; struct drm_asahi_helper_program helper; struct drm_asahi_timestamps ts;};Members
flagsMBZ
sampler_countNumber of samplers in the sampler heap.
cdm_ctrl_stream_baseCDM_CTRL_STREAM_BASE register value. GPU address to the beginning of the CDM control stream.
cdm_ctrl_stream_endGPU base address to the end of the hardware control stream. Note this only considers the first contiguous segment of the control stream, as the stream might jump elsewhere.
sampler_heapBase address of the sampler heap.
helperHelper program used for this compute command
tsTimestamps for the compute command
Description
This command submits a control stream consisting of compute dispatches. There is essentially no limit on how many compute dispatches may be included in a single compute command, although timestamps are at command granularity.
- structdrm_asahi_get_time¶
Arguments passed to DRM_IOCTL_ASAHI_GET_TIME
Definition:
struct drm_asahi_get_time { __u64 flags; __u64 gpu_timestamp;};Members
flagsMBZ.
gpu_timestampOn return, the GPU timestamp in nanoseconds.
- DRM_IOCTL_ASAHI¶
DRM_IOCTL_ASAHI(__access,__id,__type)
Build an Asahi IOCTL number
Parameters
__accessAccess type. Must be R, W or RW.
__idOne of the DRM_ASAHI_xxx ids.
__typeSuffix of the type being passed to the IOCTL.
Description
Don’t use this macro directly; use the DRM_IOCTL_ASAHI_xxx values instead.
Return
An IOCTL number to be passed to ioctl() from userspace.
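For background, DRM driver-private ioctl numbers on Linux are conventionally composed from the DRM ioctl magic 'd' and an offset from DRM_COMMAND_BASE (0x40). The following is a sketch of that composition, not the actual DRM_IOCTL_ASAHI definition; the __id value used in the test is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Sketch of the conventional composition: 'd' is the DRM ioctl magic,
 * and driver-specific ioctls start at DRM_COMMAND_BASE (0x40). The
 * real macro lives in the asahi_drm.h uAPI header. */
#define EXAMPLE_DRM_IOCTL_ASAHI(__access, __id, __type) \
    _IO##__access('d', 0x40 + (__id), struct drm_asahi_##__type)

/* Layout mirrored from the definition above. */
struct drm_asahi_get_time {
    uint64_t flags;
    uint64_t gpu_timestamp;
};
```

The encoded number packs the access direction, the magic, the command number, and the argument size, which is how the kernel validates the ioctl argument struct size at dispatch time.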